NOWSECURE UNVEILS FIRST AUTOMATED OWASP MASVS V2.1 MOBILE APP SECURITY AND NEW PRIVACY TESTING

The depth and scope of NowSecure Platform testing gives customers assurance that their mobile AppSec programs meet the highest industry standard.

Reversing iOS System Libraries Using Radare2: A Deep Dive into Dyld Cache (Part 1)

Posted by

Francesco Tamagni

Android/iOS Security Research Engineer
Through research and development, Francesco Tamagni makes NowSecure automated iOS security testing tools better. Previously a mobile application engineer, Francesco is driven by the will to create and reverse-engineer various things. He is an avid Frida user and occasional contributor to Radare.

Editor’s Note: Keeping up with ever-changing low-level operating system internals is no small task, especially when it comes to the iOS dyld cache, a key element that underpins how iOS apps interact with the OS layer. This post, by Francesco Tamagni, focuses on leveraging foundational pieces of radare2 to read and interpret the formats critical to performing mobile security investigations. 

The often-overlooked backbone of our industry is the individuals who build and maintain the tools, along with the security researchers and pen testers who uncover critical vulnerabilities and privacy issues. Their work keeps us all safe, and Francesco’s continued effort to educate newcomers and experienced practitioners alike allows the broader community to do their jobs effectively. We are proud that the NowSecure team is among those leading the charge, tackling these challenges, creating educational material, and working to ensure that the tools we rely on remain functional and robust to maximize the effectiveness of all pen testers and researchers, even as the ground shifts beneath the entire community’s feet.

For the last eight years, my day-to-day job as a research engineer for NowSecure has been to help create tools to automate dynamic mobile application security testing of iOS apps, and its orchestration on real devices. A big part of that happens via dynamic binary instrumentation, thanks to Frida. This requires a deep understanding of how the system works, how apps interact with it, and how data flows. That understanding can only be achieved by reverse engineering mobile apps and system components.

When I first began reversing iOS apps, I soon discovered that the system library files don’t reside on the file system, certainly not in the path pointed to by the dynamic linking information. That baffled me at first, as it probably does most people who embark on reverse engineering Apple platforms.

It turns out that all those libraries are instead prelinked together in a single big executable file referred to as the dyld shared library cache. This file is then mapped into the address space of all executables running on the system by the dynamic loader and linker (dyld).

To look inside those libraries, you need to get familiar with the dyld shared library cache (DSC) and the tools available to navigate those big binary blobs. My tool of choice for this task is radare2, which I love, and to which I’ve contributed a lot of code to support the constantly evolving structure of Apple’s DSC.

This first installment of a three-part series of blog posts covers the basics: how to obtain the DSC and use radare2 to open and navigate dyld shared caches, their metadata, and the code they contain. 

That lays the foundation for the following posts, which cover finding cross-references, a basic aspect of reverse engineering, and guide you through it with examples.

DSC from Above

The DSC resides on the device’s file system. Its path and the number of files it is split into depend on the OS, its version and the hardware type:

  • macOS, one of:
    • /System/Library/dyld/dyld_shared_cache_<arch>
    • /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_<arch>
  • iOS, one of:
    • /System/Library/Caches/com.apple.dyld/dyld_shared_cache_<arch>
    • /private/preboot/Cryptexes/OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_<arch>
  • Simulators: they’re under ~/Library/Developer/CoreSimulator/Caches/dyld for each installed simulator runtime.

Starting around the time of iOS 15.x, Apple started to split the DSC into multiple files, with the same naming convention as above, plus a progressive-number suffix. The number of splits varies wildly across systems and versions, between one and a few tens.

The DSC contains most of the system libraries and the bulk of the executable code shipped with the OS, so it is large by nature, totalling between 1 and 4 GB per OS version.
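To make the split-file naming concrete, here is a minimal Python sketch that collects the main cache file together with its numbered companions. It assumes the sub-caches sit next to the main file and are named "<main name>.<digits>"; the function name list_cache_files is made up for this illustration, and other companion files (for example a symbols file) are deliberately ignored.

```python
import re
from pathlib import Path


def list_cache_files(main_file):
    """Return the main cache file followed by its numbered sub-caches.

    Assumption: sub-caches live in the same directory and are named
    "<main name>.<digits>". Anything else is ignored.
    """
    main = Path(main_file)
    suffix_re = re.compile(re.escape(main.name) + r"\.(\d+)")
    numbered = []
    for candidate in main.parent.iterdir():
        match = suffix_re.fullmatch(candidate.name)
        if match:
            numbered.append((int(match.group(1)), candidate))
    # Sort numerically so that ".10" comes after ".9"
    return [main] + [path for _, path in sorted(numbered)]
```

Only the first entry of the returned list needs to be handed to a tool that understands split caches; the rest is useful mostly for copying or checking that nothing is missing.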

How to Get the DSC Files

There are a few different ways to obtain the DSC file(s):

  • The easiest and most reliable way is to use the fine ipsw open source tool by Blacktop which (among other amazing things) automates the download of the right IPSW file and the extraction of the dyld cache from its file system:
    • Example download: ipsw download appledb --os iOS --version '17.2.1' --device iPhone15,4
    • Example extract: ipsw extract iPhone15,4_17.2.1_21C66_Restore.ipsw -d
  • Alternatively, it’s possible to retrieve them manually from the IPSW for the target OS (downloaded, for example, from ipsw.me):
    • Extract the second-largest DMG from the IPSW zip (it was the largest one for pre-cryptex era versions).
      • Note that file system images in iOS 18 IPSW files are also encrypted.
    • Mount it on macOS and fetch the files according to the paths above.
  • You can even extract them from a running device, but this requires extra care and effort to succeed and potentially a separate blog post so I won’t go into details about it now.

Inside the DSC

Before opening the DSC using radare2, let’s keep in mind a few guiding principles:

  • We want to open the whole cache, not extract single libraries in separate files.
    • There are tools out there (including Xcode) which produce single dylib files from the cache, but the result is incomplete because the libraries get transformed while being linked together in a single executable. In particular, each library can call others just by direct (or indirect) branches, instead of relying on the traditional mechanism of dynamic importing of public symbols. This makes the bundling of libraries in the DSC quite a one-way and “lossy” process.
  • The cache contains a lot of code, so we don’t want to analyze all of it blindly and wait hours; instead we’ll have to focus on just what’s necessary.
    • The files extracted by Xcode, located at “~/Library/Developer/Xcode/iOS DeviceSupport/”, can still be useful for some preliminary grepping in order to narrow down the libraries we’re interested in.
  • As always with radare2, the more a priori knowledge we bring about the problem we’re trying to solve, the higher the reward in terms of performance we get from the tool.

Warning: keep calm and use radare2 from GitHub!

This is an old mantra (there are real t-shirts out there with this slogan) but it’s still true!

Since radare2 is constantly evolving and bleeding edge by nature, we always recommend using the latest code straight from the Git repository. The project’s README has the full installation instructions, but if you’re on a Unix-based machine (including macOS), it’s simply a matter of opening a terminal, cloning https://github.com/radareorg/radare2 and running the sys/install.sh script it contains.

Opening the Whole Dyld Cache

To open the DSC, we need to specify the path using the dsc:// URL scheme, which tells r2 to use the DSC-specific I/O plugin. This takes care of rebasing pointers under the hood, and abstracts the presence of multiple files in the cache. If the cache is composed of multiple files, just point it to the first one (the one without the numerical suffix).

If we open the cache right away, without any further setup, r2 immediately prints a warning suggesting that we set the R_DYLDCACHE_FILTER environment variable to restrict the set of libraries for which metadata (like symbols, strings, etc.) will be loaded:

  • It’s a colon-separated list of library names
  • They will be matched against the actual list of libraries present in the cache (case sensitive, partial match on full paths)
  • The matching libraries are loaded, along with their direct dependencies
    • But the whole cache will still be mapped and visible

This cuts down the initial loading time and memory overhead, but requires knowing ahead of time which libraries we’re interested in (at least vaguely). For example, if we want the Foundation framework, the libSystem sub-libraries and libdispatch, the filter lists one name fragment for each of them.

This filter will be the one applied to all examples below, unless specified otherwise.
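The filter rules listed above (colon-separated entries, case-sensitive partial match on full library paths) can be mimicked in a few lines of Python. This is only a sketch of the described behavior, not r2’s actual implementation: the helper names build_filter, matches_filter and r2_launch are invented here, and the three name fragments used in the usage note are illustrative guesses rather than the exact filter used for the examples in this post.

```python
import os


def build_filter(names):
    """Join library name fragments into a colon-separated filter value."""
    return ":".join(names)


def matches_filter(library_path, filter_value):
    """Case-sensitive partial match of any filter entry on the full path."""
    entries = [entry for entry in filter_value.split(":") if entry]
    return any(entry in library_path for entry in entries)


def r2_launch(cache_path, names):
    """Return (argv, env) for starting r2 on a cache with the filter set.

    Nothing is executed here; pass the result to subprocess.run() if wanted.
    """
    env = dict(os.environ, R_DYLDCACHE_FILTER=build_filter(names))
    return ["r2", "dsc://" + cache_path], env
```

With a filter built from "Foundation", "libsystem" and "libdispatch", matches_filter() accepts /usr/lib/system/libsystem_kernel.dylib and rejects /usr/lib/libobjc.A.dylib. Because the match is partial, "Foundation" also selects the CoreFoundation framework, which is worth keeping in mind when choosing fragments.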

Navigating the Libraries

Once the cache file is open, all the resident libraries contribute their own Mach-O sections, and r2 treats the result as a single executable, where each section name is prefixed with the path of the library it originates from.

For example, let’s list all sections from all the libraries which are currently loaded using the iS command. Here, the output is truncated after the first 100 lines (for brevity):

Strings, symbols and classes are all loaded for the libraries matching the filter (and their dependencies), and can be accessed via the r2 commands used for doing the same on regular executables.

The most convenient way to get to named entities in r2 is to list flags (f command) and then grep that list (~) for partial names. For example:

To visualize long lists and grep them interactively, r2 provides the ~... command, which is very handy. It also has the visual browse mode, reachable via the Vb command, which allows users to visually navigate items like flags and classes. These commands are quite hard to show in a blog post, though, so I encourage readers to try them firsthand.
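The list-flags-then-grep approach also carries over to scripts. The Python sketch below assumes that each line of the flag listing has the shape "<hex address> <size> <name>"; the helper names and the sample listing in the usage note are invented for illustration. Unlike r2’s ~, which matches against the whole line, this version matches the flag name only.

```python
def parse_flags(listing):
    """Parse lines shaped like '<hex address> <size> <name>' into tuples."""
    flags = []
    for line in listing.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        address, size, name = parts
        flags.append((int(address, 16), int(size), name.strip()))
    return flags


def grep_flags(flags, needle):
    """Keep the flags whose name contains needle (case-sensitive)."""
    return [flag for flag in flags if needle in flag[2]]
```

Given a listing with made-up addresses for sym._dispatch_async, sym._dispatch_sync and sym._malloc, grep_flags(parse_flags(listing), "dispatch") keeps only the first two entries.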

Any virtual address can then be seeked to (s command):

Then, for example, let’s disassemble a few instructions using pd:

In essence, any r2 command can be used just the same as it would be on single executables / libraries.

Be careful to avoid commands that require analyzing the whole code or mapped memory. We’ll later share examples of finding cross-references, where for performance reasons we’ll have to restrict analysis to only the portions of code we’re interested in.

Get Information About the Cache Itself

Each of the files composing the DSC defines a set of maps which dyld then uses to load specific portions of code and data into the corresponding virtual memory addresses.

The DSC memory maps can be shown using the iSS (info, list segments) command:

In the iSS output, the paddr column shows the offset in the DSC file, and the vaddr column shows the corresponding address in memory (with no ASLR slide applied). The Mach-O segments of the single libraries are all clumped together in the cache maps with the corresponding memory access permissions.

Note that the default address that r2 takes us to when the DSC is opened (0x180000000) is the virtual address of the first map defined in the first file, which is also where the “main” DSC header is located.
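The relationship between the paddr and vaddr columns boils down to a range lookup. The Python sketch below translates an unslid virtual address into a file offset; the function name, the dictionary keys and the sample maps are hypothetical, and it ignores the question of which file of a split cache a given map lives in.

```python
def vaddr_to_paddr(maps, vaddr):
    """Translate an unslid virtual address into a file offset.

    Each map is a dict with "vaddr", "paddr" and "size" keys (names chosen
    for this sketch). Returns None when no map covers the address.
    """
    for entry in maps:
        start = entry["vaddr"]
        if start <= vaddr < start + entry["size"]:
            return entry["paddr"] + (vaddr - start)
    return None


# Made-up maps: the first one starts at the default address mentioned above
SAMPLE_MAPS = [
    {"vaddr": 0x180000000, "paddr": 0x0, "size": 0x1000000},
    {"vaddr": 0x182000000, "paddr": 0x1000000, "size": 0x4000},
]
```

With these sample maps, 0x180000010 translates to offset 0x10, while an address in the gap between the two maps, such as 0x181800000, yields None. An address observed in a running process would first need its ASLR slide subtracted.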

In order to get information about the DSC header itself and the executable images contained in the cache, the iH command (info, header) can be used (here truncated for brevity):

The output is in JSON by default, and it’s handy for automation tasks (like r2pipe scripting, as we’ll see shortly).

Exported Symbols vs. Debug Symbols

The complete list of symbols defined by the libraries which match the loading filter is visible with the is command (and its variants). This includes both internal and external (exported) ones.

To get only the list of exported symbols, the iE command (and its variants) can be used instead.

As an example, let’s look up the swift_retainCount symbol using the isq command (faster and less verbose than plain is):

To check whether it’s exported, we can grep for its name (or its address) again in the output of iEq (info, exported, quiet):

Since it shows up again, it’s an exported symbol.

If we do the same with a symbol which is internal, we get instead:

Where the second command doesn’t yield results, meaning that _swift_xpc_retain is not exported, therefore it (normally) can’t be called from outside the library which defines it.
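The same check can be scripted by comparing the two listings. The Python sketch below assumes that every line of the quiet symbol listings has the shape "<address> <size> <name>"; the helper names are invented, and the addresses in the usage note are made up. In a live session the two texts could come from an r2pipe handle, for example r2.cmd("isq") and r2.cmd("iEq").

```python
def symbol_names(listing):
    """Collect the symbol names from '<address> <size> <name>' lines."""
    names = set()
    for line in listing.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            names.add(parts[2].strip())
    return names


def classify(symbol, all_symbols, exported_symbols):
    """Tell whether a symbol is exported, internal, or not known at all."""
    if symbol in exported_symbols:
        return "exported"
    if symbol in all_symbols:
        return "internal"
    return "unknown"
```

If the full listing contains both swift_retainCount and _swift_xpc_retain but the exported listing contains only the former, classify() returns "exported" for swift_retainCount and "internal" for _swift_xpc_retain.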

Keep in mind that all symbols are available as flags too, prefixed with the sym. “flag space” identifier. For example:

Exploring Objective-C Classes

Even though most native app development on Apple platforms has transitioned to Swift nowadays, many system libraries are still implemented in Objective-C, making DSC Objective-C classes an important reversing target.

All the Objective-C classes present in the libraries which match the filter are available in r2, just as when opening single executables, using the ic command and subcommands.


However, over time, Apple has been adding various optimizations specific to the DSC for how Objective-C metadata is encoded and retrieved — and that’s something reverse engineering tools must be constantly updated to support.

For example, on recent caches which encode Objective-C class metadata in the “list of lists” format, when you enumerate the methods of a class, all methods from categories on that class are present, regardless of the actual library which defines them. On one hand, this can result in quite long lists of methods; on the other hand, it makes it easier to find arcane methods defined by categories.

Fortunately, as we already saw in the examples above, restricting long lists by grepping in r2 is just a matter of using the ~ command on the output of other commands.

A good way to visualize this abundance of methods from categories is to list all methods of the NSObject class containing the word “perform”:

Another recent optimization Apple added to Objective-C is the ability to exclude some methods from the Objective-C runtime, effectively turning them into normal C functions. When that happens in the DSC, we can still see the debug symbol (if present) for those methods, but they won’t appear in the ic output.

This can be seen by comparing symbols or flags with the ic output. For example, the _NSPredicateUtilities class has a _predicateSecurityAction class method which is called internally from within the Foundation framework, but it is not part of the Objective-C runtime, so the ic command doesn’t list it:

However, it is still present as a debug symbol, visible to the isq command (and also listed as a flag):

This makes it possible to go look at the disassembly and inspect its logic.

Another handy feature when exploring Objective-C classes is listing the ivars too. An easy way to do it in r2 is to query flags again and grep for the class name, “field.class” and “var”:

Gaining visibility into ivars is important because they are offsets into the internal state of the class, which in turn may or may not be exposed through getters and setters. Even when accessor functions are present, though, the class code usually refers to its own fields directly, and being able to see their names greatly helps in understanding what the code does.

Automating Tasks with r2pipe

There are repetitive tasks which can easily be automated using r2 scripting functionalities. The one I usually go for is r2pipe, which is a way to execute r2 commands from scripts in one of the (many) supported languages.

For the sake of this blog post series I’m going to use Python, but r2pipe can be used in the same way from many languages.

Now I’m going to introduce an r2pipe script which will be useful multiple times across the series, and is simple enough to talk through chunk-by-chunk.

The “dyld_what” Script

This script’s purpose is to print the path of the library (if any) to which any address in the DSC belongs. It comes in handy when it’s necessary to refine the loading filter, as we’ll see in the example below.

The address to look up (or the corresponding flag name) can be passed as an argument. If no argument is passed, it looks up the current address.

This works regardless of the initial filter we set, by using the metadata from the DSC header to binary-search all the executable images for the address we provide.

It’s designed to work from within an r2 session, and can be invoked from the r2 prompt using the #!pipe command:

The full code of the script can be found in this GitHub gist: https://gist.github.com/mrmacete/e061f0f0d38a96c75f8177747c26ea01.

It starts with the import statements where, among other things needed for this particular script, we import the r2pipe python package (make sure you install it first using pip, as stated in the docs). After that we can open the pipe to the existing r2 session, by calling r2pipe.open() without arguments:

Now we can use the r2 variable to execute commands and get the output from them as string with r2.cmd() or as a parsed JSON object using r2.cmdj() and that’s pretty much all there is to know about r2pipe!

So let’s see it in action, as the script proceeds to get the array of images from the header using the iH command:

From that array we can build the heart of this script, which performs a binary search on the images, assuming they can’t overlap. To do that we need to sort the array we just got, and extract a helper array with just the addresses, so we can leverage Python’s bisect module:

The get_path_if_contains function double-checks that the address falls into the candidate image, because images are actually interleaved with stub islands, which don’t belong to any particular image but are shared among different ones (we’ll see that in a later post about finding references across libraries).
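The lookup described above can be sketched as follows. This is not the code from the gist, just a minimal illustration of the same idea: the dictionary keys "addr", "size" and "path" and the helper names build_index and find_image_path are assumptions made for this sketch (only get_path_if_contains is named in the text), and each image is treated as one contiguous address range.

```python
import bisect


def build_index(images):
    """Sort images by start address and extract the helper list of starts."""
    ordered = sorted(images, key=lambda image: image["addr"])
    starts = [image["addr"] for image in ordered]
    return ordered, starts


def get_path_if_contains(image, address):
    """Return the image path only if the address really falls inside it."""
    if image["addr"] <= address < image["addr"] + image["size"]:
        return image["path"]
    return None


def find_image_path(ordered, starts, address):
    """Binary-search the last image starting at or before the address."""
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # the address is below the first image
    return get_path_if_contains(ordered[index], address)
```

For two made-up images with a gap between them, an address inside either image resolves to that image’s path, while an address in the gap (where a stub island could sit) resolves to None thanks to the containment check.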

What remains of the script deals with the input argument, which could be a numeric address, a flag name or nothing at all.

In case no argument is passed we have to resolve the value of the $$ variable which holds the current address:

If instead we got an argument, it may be a named flag, so we have to resolve its address by converting whatever the input is to a numeric value:
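Both cases can be handled by letting r2 evaluate the expression itself. The Python sketch below is one possible way to do it, not necessarily how the gist does it: it relies on r2’s ?v command, which prints the value of an expression in hexadecimal, and the function name resolve_address is made up. The r2 parameter is any object with a cmd() method, such as the handle returned by r2pipe.open().

```python
def resolve_address(r2, argument=None):
    """Resolve a numeric address, a flag name, or the current seek ($$).

    r2 is anything exposing cmd(command) -> str, e.g. an r2pipe handle.
    """
    expression = "$$" if argument is None else argument
    output = r2.cmd("?v " + expression).strip()
    # Base 0 lets int() accept the "0x" prefix printed by ?v
    return int(output, 0)
```

Taking the pipe as a parameter keeps the function easy to exercise without a live r2 session, since a small stand-in object with a cmd() method is enough.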

Finally, the main entry point puts it all together:

To use this script, we can define an alias called $what, either from the r2 prompt or in ~/.radare2rc (quotes are important):

In this way we’ll have the $what command to which we can provide the argument directly instead of writing the long pipe command.

Example: using $what to refine the filter

As we already know, on modern DSCs, Objective-C classes contain methods combined from all categories on all frameworks, even if they’re not present in the filter. To narrow down the above example, on NSObject we have:

But if we look at the categories on NSObject loaded with the current filter, there’s nothing obviously responsible for that:

Let’s use the $what alias to run our r2pipe script and discover which library the implementation of that method belongs to, by passing it the address:

Now, if we reopen the DSC with /IMSharedUtilities.framework added to the filter:

And list again the categories on NSObject, this time we get:

Where the last one seems promising:

And we can see the full set of features this category is adding, and since now the responsible framework belongs to the filter, we can also see all symbols and classes related to it in case we have to dig deeper into how these functionalities are implemented.

Conclusion

If you reached this point, you made it through the first blog post about reversing iOS system libraries using radare2. That was just the beginning, but now you are equipped to start experimenting on your own. The next posts will go in depth about finding cross-references, first within single libraries, then across different libraries. Until then, hang tight and feel free to reach out with any questions. Issues and pull requests are also appreciated on radare2’s GitHub.