Friday, January 9, 2009

Sandboxing Applications in OS X

Last month I spent most of my time studying the OS X Mach-O file format and dyld. After playing around with Sandboxie on Windows, I thought it'd be fun to see how something similar could be implemented on Mac OS X. What follows is basically a dump of what I learned as I ventured into the world of dyld on Mac OS X.

First, I should define what sandboxing means in this context. Sandboxing an application involves wrapping it in a lightweight container that restricts what it is able to do and, in some cases, changes the behavior of system functions completely. The goal of this type of sandboxing is to prevent an application from arbitrarily writing to the filesystem: reads come from the source filesystem, but writes are redirected into the sandbox filesystem. In this way, sandboxed applications cannot change system files.

Is there a need for sandboxing on OS X?
I'm not a security expert, so I cannot comment on whether sandboxing in this form is effective, beneficial, or necessary on OS X. From an average developer's standpoint, though, there are benefits such as:

  • Security: There may not be (m)any malicious apps targeting OS X yet, but this could change. Assuming the sandboxing application is lightweight enough to run transparently to the user, there would be little reason not to run inside a sandbox.

  • Testing: Run the app under test inside a sandbox, then delete the sandbox when done. No extra cleanup is required, since the system was never actually affected by the app.

  • Auditing: Track which files an application adds, removes, or modifies. To be fair, OS X already has auditing abilities built in; see audit(2). The files a process has open can also be seen in Activity Monitor under Inspect > 'Open Files and Ports'.


Ways to Sandbox Applications on OS X

  • Kernel Extension: From a quick look, Sandboxie seems to use a kernel-mode driver to get most of its work done. Having never done kernel development before, I took a different approach. A kernel extension may well be the cleanest way to achieve app sandboxing; however, a poorly implemented kext could itself become a target.

  • Some Sort of Emulation: This approach would involve running the process under an emulation layer like Valgrind. Patches do exist to get Valgrind running on OS X, so its feasibility would be worth investigating. One downside would be the extreme performance penalty.

  • Function Interposing via Dynamic Library Insertion: Probably the most viable user-level way to achieve sandboxing. Below I have described how it would work.

  • Function Interposing via Executable Modification: The details would be pretty nasty, but you could modify the symbol table within the Mach-O executable to point to functions in other libraries. The problem is that all the other loaded dynamic libraries would need to be modified as well, and that's where it would get really messy.

  • Other ways? I'm probably missing other (easier) methods of doing this. If so, let me know!



Function Interposing via Dynamic Library Insertion
(Before continuing, note that OS X binaries use two-level namespaces by default, which means that the Linux method of overriding functions (LD_PRELOAD with a library that simply exports the same symbol names) may not work. Interposing provides a nice way around this issue.)

This approach involves directing dyld (the dynamic link editor) to load a specified library at process launch time. This is done simply by setting the 'DYLD_INSERT_LIBRARIES' environment variable to the path of the library before executing the application.

The inserted library contains a special '__interpose' section in the __DATA segment, which dyld interprets. Amit Singh's fantastic book, Mac OS X Internals, has a chapter on interposing. Essentially, the section is a table of new and original function pointer pairs. Whenever dyld needs to resolve a symbol, it first looks at the interposing table, which means that any dynamically linked symbol may be overridden by the library you inserted. So now you can redirect open/read/write calls into your own code.
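The launch step is normally done from a shell, for example 'DYLD_INSERT_LIBRARIES=/path/to/libsandbox.dylib /Applications/Safari.app/Contents/MacOS/Safari'. A sandbox launcher could do the same thing programmatically. The following is a minimal sketch, assuming POSIX posix_spawn; the helper name run_with_inserted_library and the library path are hypothetical. It adds the variable to the child's environment only, so the launcher itself is never affected.

```c
#include <assert.h>
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

static const char insert_var[] = "DYLD_INSERT_LIBRARIES=";

/* Hypothetical launcher: runs argv[0] (an absolute path) with
   DYLD_INSERT_LIBRARIES=dylib_path added to the child's environment only.
   Returns the child's exit status, or -1 on failure. */
static int run_with_inserted_library(const char *dylib_path, char *const argv[])
{
    size_t n = 0;
    while (environ[n] != NULL)
        n++;

    /* Our entry + up to n inherited entries + terminating NULL. */
    char **envp = calloc(n + 2, sizeof *envp);
    size_t len = sizeof insert_var + strlen(dylib_path); /* includes the NUL */
    char *entry = malloc(len);
    if (envp == NULL || entry == NULL) {
        free(envp);
        free(entry);
        return -1;
    }
    snprintf(entry, len, "%s%s", insert_var, dylib_path);

    size_t j = 0;
    envp[j++] = entry;
    for (size_t i = 0; i < n; i++)   /* drop any inherited setting */
        if (strncmp(environ[i], insert_var, sizeof insert_var - 1) != 0)
            envp[j++] = environ[i];
    envp[j] = NULL;

    pid_t pid;
    int status;
    int result = -1;
    if (posix_spawn(&pid, argv[0], NULL, NULL, argv, envp) == 0 &&
        waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    free(entry);
    free(envp);
    return result;
}
```

The full path in argv[0] matters because posix_spawn, unlike posix_spawnp, does not search PATH.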

The caveat is that the real functions can only be called from within the inserted library itself, because dyld does not apply a library's interpose table to that library's own calls. This means that everything you need to do has to happen within the library.
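A minimal sketch of such an inserted library, assuming gcc or clang on OS X: sandbox_open is a hypothetical replacement that only logs and then calls through to the original open(), which works precisely because of the caveat above. The struct mirrors the (replacement, original) pointer pairs dyld reads from __DATA,__interpose; the table itself is only emitted when building for Apple platforms.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical replacement for open(2). A real sandbox would rewrite
   `path` here; this version only logs the call. */
static int sandbox_open(const char *path, int flags, ...)
{
    mode_t mode = 0;
    if (flags & O_CREAT) {                 /* mode is only passed with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);    /* mode travels through varargs as an int-sized value */
        va_end(ap);
    }
    fprintf(stderr, "[sandbox] open(\"%s\", 0x%x)\n", path, (unsigned)flags);

    /* dyld does not apply this library's interpose table to the library's
       own calls, so this reaches the original open(). */
    return open(path, flags, mode);
}

/* Layout of one entry in __DATA,__interpose: a (replacement, original) pair. */
struct interpose_entry {
    const void *replacement;
    const void *original;
};

#ifdef __APPLE__
/* "used" keeps the compiler from discarding the otherwise unreferenced table. */
__attribute__((used, section("__DATA,__interpose")))
static struct interpose_entry interpose_table[] = {
    { (const void *)(unsigned long)&sandbox_open,
      (const void *)(unsigned long)&open },
};
#endif
```

On OS X this would be built with something like 'gcc -dynamiclib -o libsandbox.dylib sandbox.c' and loaded through DYLD_INSERT_LIBRARIES. Note that &open refers only to whichever 'open' symbol the headers select when the library is compiled, so the other variants discussed next are not covered by this table.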

So now we can insert our own versions of functions to be used by arbitrary applications. The question now is which functions should be interposed.
Nearly all the system calls live in libSystem on OS X, but most of them also come in multiple versions. For the average developer this is handled transparently, but here it causes issues if the library is to robustly handle any OS X executable. For example, there are actually three different 'open' symbols:
open
open$UNIX2003
open$NOCANCEL$UNIX2003

Since we can't be sure which version a given executable will use, all three versions of the function need to be interposed. See the Multiple Symbol Variants link below for further explanation.
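One way to cover every variant is to give each decorated symbol its own thin wrapper that forwards to shared logic together with a pointer to the matching original, so that, for example, a NOCANCEL caller still ends up in the non-cancellable original. This is a sketch under assumptions: the decorated names come from the list above and are assumed to be exported by 32-bit libSystem on 10.5, so that part is only compiled for __i386__ Apple builds; sandbox_open_common and the my_open_* wrappers are hypothetical names. The asm labels let C code name a specific variant rather than whichever one the headers select.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>
#include <unistd.h>

typedef int (*open_fn)(const char *, int, ...);

/* Hypothetical shared logic for every open variant; `real` is the
   original variant that the calling wrapper replaced. */
static int sandbox_open_common(open_fn real, const char *path, int flags,
                               mode_t mode)
{
    /* Path redirection into the sandbox would happen here. */
    return real(path, flags, mode);
}

#if defined(__APPLE__) && defined(__i386__)
/* Assumption: 32-bit libSystem exports these decorated names. */
extern int open_legacy(const char *, int, ...) __asm("_open");
extern int open_unix2003(const char *, int, ...) __asm("_open$UNIX2003");
extern int open_nocancel_unix2003(const char *, int, ...)
    __asm("_open$NOCANCEL$UNIX2003");

/* Defines one wrapper that extracts the optional mode and forwards. */
#define DEFINE_OPEN_WRAPPER(name, real)                          \
    static int name(const char *path, int flags, ...)            \
    {                                                            \
        mode_t mode = 0;                                         \
        if (flags & O_CREAT) {                                   \
            va_list ap;                                          \
            va_start(ap, flags);                                 \
            mode = (mode_t)va_arg(ap, int);                      \
            va_end(ap);                                          \
        }                                                        \
        return sandbox_open_common(real, path, flags, mode);     \
    }

DEFINE_OPEN_WRAPPER(my_open_legacy, open_legacy)
DEFINE_OPEN_WRAPPER(my_open_unix2003, open_unix2003)
DEFINE_OPEN_WRAPPER(my_open_nocancel_unix2003, open_nocancel_unix2003)

struct interpose_entry {
    const void *replacement;
    const void *original;
};

/* One (replacement, original) pair per variant. */
__attribute__((used, section("__DATA,__interpose")))
static struct interpose_entry open_interposers[] = {
    { (const void *)(unsigned long)&my_open_legacy,
      (const void *)(unsigned long)&open_legacy },
    { (const void *)(unsigned long)&my_open_unix2003,
      (const void *)(unsigned long)&open_unix2003 },
    { (const void *)(unsigned long)&my_open_nocancel_unix2003,
      (const void *)(unsigned long)&open_nocancel_unix2003 },
};
#endif
```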

Of the system calls, we are mainly interested in those that deal directly with file and directory manipulation:
open
close
read
write
fstat
link
unlink
mkdir
mknod
mmap
munmap
opendir
...etc...


The list ends up being very long, and there are a huge number of special cases, such as how mmap is handled or how the /dev/* special nodes are handled.
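To make a few of these special cases concrete, here is a hypothetical policy sketch that decides when a call has to be redirected into the sandbox: read-only opens can go to the original file, device nodes under /dev/ pass straight through, and an mmap only writes back to its file when the mapping is both writable and MAP_SHARED. The helper names are illustrative, and the /dev/ check assumes absolute, already-normalized paths.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mman.h>

/* Device nodes such as /dev/null or /dev/tty are passed straight through.
   Assumes `path` is absolute and already normalized. */
static bool is_device_path(const char *path)
{
    return strncmp(path, "/dev/", 5) == 0;
}

/* Conservatively true if an open() with these flags could change the file. */
static bool open_may_modify(int flags)
{
    return (flags & O_ACCMODE) != O_RDONLY ||
           (flags & (O_CREAT | O_TRUNC | O_APPEND)) != 0;
}

/* Hypothetical policy: does this open() need a private copy in the sandbox? */
static bool open_needs_shadow(const char *path, int flags)
{
    return !is_device_path(path) && open_may_modify(flags);
}

/* Only writable shared mappings write back to the underlying file;
   MAP_PRIVATE pages are copied on write and never reach it. */
static bool mmap_may_modify(int prot, int flags)
{
    return (prot & PROT_WRITE) != 0 && (flags & MAP_SHARED) != 0;
}
```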

Current Development State
So that's where I'm at now. Early on, I was able to run Safari inside a shadow directory simply by hijacking the open call: the replacement copied the original file into the shadow directory, opened the shadow copy, and returned its file descriptor. This worked surprisingly well, but I wanted to get more ambitious and run any application from a single shadow volume file. To do this, I've chosen to use sqlfs with SQLite. The result should be an injectable library that runs in user mode, needs no special privileges, and writes into a single file that can easily be discarded. Well, that's the idea, anyway; there's still quite a bit of work to be done, and I'm not sure there is any demand for such a product.
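The shadow-directory approach can be sketched roughly as follows, assuming absolute paths and a shadow tree that mirrors the real filesystem under one root directory. Where the description above copies the original first, this hypothetical shadow_open copies lazily, only when a call may modify the file, matching the read-from-source, write-to-sandbox goal; once a shadow copy exists, every later open uses it. All function names here are hypothetical, and error handling is minimal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map an absolute path into the shadow tree: /etc/hosts -> <root>/etc/hosts. */
static int shadow_path(const char *root, const char *path, char *out, size_t n)
{
    int len = snprintf(out, n, "%s%s", root, path);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}

/* Create each missing parent directory of `file`, like mkdir -p. */
static int make_parents(const char *file)
{
    char buf[PATH_MAX];
    if (strlen(file) >= sizeof buf)
        return -1;
    strcpy(buf, file);
    for (char *p = buf + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(buf, 0755) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    return 0;
}

/* Copy src to dst, keeping the permission bits of src. */
static int copy_file(const char *src, const char *dst)
{
    struct stat st;
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) != 0) {
        close(in);
        return -1;
    }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) {
        close(in);
        return -1;
    }
    char buf[8192];
    ssize_t r;
    int rc = 0;
    while ((r = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)r) != r) {   /* short writes count as errors */
            rc = -1;
            break;
        }
    }
    if (r < 0)
        rc = -1;
    close(in);
    if (close(out) != 0)
        rc = -1;
    return rc;
}

/* Copy-on-write open: unmodified files are read from their original
   location; the first open that may modify a file copies it into the
   shadow tree, and every later open uses that copy. */
static int shadow_open(const char *root, const char *path, int flags, mode_t mode)
{
    char shadow[PATH_MAX];
    if (shadow_path(root, path, shadow, sizeof shadow) != 0) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (access(shadow, F_OK) == 0)
        return open(shadow, flags, mode);

    int may_modify = (flags & O_ACCMODE) != O_RDONLY ||
                     (flags & (O_CREAT | O_TRUNC | O_APPEND)) != 0;
    if (!may_modify)
        return open(path, flags, mode);

    if (make_parents(shadow) != 0)
        return -1;
    if (access(path, F_OK) == 0 && copy_file(path, shadow) != 0)
        return -1;
    return open(shadow, flags, mode);
}
```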



Miscellaneous and References:

dyld was rewritten in C++ from a C/asm version for Rosetta in 10.4/10.5? However, interposing is not included in the open source release?

Guard Malloc (libgmalloc) uses only the DYLD_INSERT_LIBRARIES environment variable to override malloc, and it does so with an interpose table.

Two Level Namespaces

Executing Mach-O Files

dyld Library Reference

Mach-O Reference

Multiple Symbol Variants

OS X Kernel Programming