Reversing iOS System Libraries Using Radare2: A Deep Dive into Dyld Cache (Part 3)

Francesco Tamagni

Android/iOS Security Research Engineer

Through research and development, Francesco Tamagni makes NowSecure automated iOS security testing tools better. Previously a mobile application engineer, Francesco is driven by the will to create and reverse-engineer various things. He is an avid Frida user and occasional contributor to Radare.

September 13, 2024


Welcome to the final blog post in our series about reverse engineering iOS system libraries with radare2. This time we’ll focus on finding cross-references across different libraries present in the dyld shared library cache (or DSC if you’re into acronyms).

We’ll discuss various techniques to achieve this, each with its trade-offs between performance and amount of prior knowledge required.

If you missed the previous two blog installments, now is a good time to catch up, because they cover the concepts needed to understand today’s topic.


Cross-References Across Libraries

Functions that are exported symbols can be called by other libraries within the DSC. This usually happens through an “import stub,” a small function that loads the address of the final function and jumps to it.

Typically, each executable that calls functions exported by other libraries embeds its own import stubs in a specific section of its own Mach-O file. While this remains true for executables embedded into the macOS and simulator DSCs, it differs on iOS because Apple has implemented several optimizations over time to reduce the size of the entire DSC.

On iOS 15.x, libraries “near enough” in the DSC memory layout (but not necessarily related to each other) were able to reuse each other’s import stubs if they happened to depend on the same imports.

More recently, by leveraging the capabilities of multi-file caches, all import stubs are instead grouped together into “stub islands”, which in turn are cleverly mapped to maximize the number of executables that can reuse the same import stub.

To implement these optimizations, the original libraries are transformed while being embedded into the DSC. That is why extracting individual libraries from the DSC can be cumbersome and lead to incomplete information; it’s simpler to work with the entire DSC.


Challenges with Performance

The main problem when finding cross-references across libraries in the DSC is performance. The challenge is to reduce the size of the problem as much as possible while still accomplishing the task.

Since for any exported symbol there can be many libraries referencing it, we want to load only the “interesting libraries” when opening the DSC with r2. We need to know both the library that holds the function we want to find references to, and ideally, the caller libraries where we want to find the caller code.
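If you drive r2 from Python rather than from the shell, you can set the load filter before the r2 process is spawned. The sketch below is minimal and rests on two assumptions. First, it assumes that r2’s dyldcache loader reads a colon-separated list of library names from the R_DYLDCACHE_FILTER environment variable; verify this against your r2 version. Second, any cache path you pass in is your own. r2pipe starts r2 as a child process, so the variable only needs to be in os.environ when r2pipe.open() runs. The helper names are hypothetical.

```python
import os

FILTER_VAR = "R_DYLDCACHE_FILTER"  # assumption: verify against your r2 version


def dsc_filter_env(libraries, base_env=None):
    """Return a copy of `base_env` (default: os.environ) with the filter set."""
    env = dict(os.environ if base_env is None else base_env)
    env[FILTER_VAR] = ":".join(libraries)  # colon-separated library names
    return env


def open_filtered(cache_path, libraries):
    """Open `cache_path` with r2pipe so that only `libraries` get loaded."""
    import r2pipe  # imported here so dsc_filter_env() works without r2pipe

    # r2pipe spawns r2 as a child process, which inherits os.environ.
    os.environ[FILTER_VAR] = ":".join(libraries)
    return r2pipe.open(cache_path)
```

For example, `open_filtered(path_to_cache, ["Foundation", "CoreFoundation"])` would load only those two libraries, provided the assumed variable name is right.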

This means finding all references to a given exported symbol is a tedious job that requires:

Finding all import stubs that reference the target symbol.
Finding references to the stubs.

Even when “stub islands” are available and easy to emulate in a reasonable time, finding all the references to a subset of them may still require emulating most of the executable code in the DSC. This is not ideal and consumes significant time and memory.

Instead of blindly searching for references, I typically opt for a more interactive discovery of calls to imported functions, provided I know the caller code I’m interested in. This approach requires some help from r2pipe, which is the easiest way to automate tasks in radare2. In addition to the $what script we already encountered in the first episode, here’s another r2pipe Python script that I frequently use for this task.

The “namestubs” Script

This script uses r2’s emulation powers to name an import stub after the function it wraps. It also supports the relatively recent Objective-C method stubs, which encapsulate calls to objc_msgSend using a specific selector.

You can find the script on GitHub: Gist link. The alias I use for this script is called $namestubs:

The resulting $namestubs command must be provided with a list of addresses. If everything works correctly, it produces no output and creates functions with the correct names. The produced names have the stub. prefix, and if multiple stubs have the same name, they’re suffixed with an incremental number.

The stub function creation is idempotent, in the sense that if $namestubs is called multiple times on the same address, or if the same address appears multiple times in the argument list, the script will detect that a stub name is already defined and move on.
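The naming rules just described fit in a few lines of Python. They are: a stub. prefix, an incremental suffix on clashes, and skipping addresses that already have a name. This is a hypothetical sketch, not the code from the Gist. A plain dict stands in for r2’s flag table, and the “.N” suffix format is an assumption.

```python
def assign_stub_name(existing, addr, target_name):
    """Return the stub name for `addr`, creating one if needed.

    `existing` maps stub address -> assigned name (a stand-in for r2 flags).
    """
    if addr in existing:  # idempotent: already named, move on
        return existing[addr]
    base = "stub." + target_name
    taken = set(existing.values())
    name, n = base, 1
    while name in taken:  # clash: append an incremental suffix (assumed format)
        name = f"{base}.{n}"
        n += 1
    existing[addr] = name
    return name
```

Calling it twice for the same address returns the same name. A second address wrapping the same symbol gets the next free suffix.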

Example: Exploring Calls to Exports Interactively

Let’s put it all together with an example where we analyze the code of a very simple function that calls exports from other libraries.

The function in question is +[_NSPredicateUtilities _predicateSecurityAction], an Objective-C class method that has been stripped out of the Objective-C runtime by means of compiler optimizations (as pointed out in the first post).

That function lives in the Foundation framework, so we can open the DSC with a filter like

If we disassemble that function as it is now, here’s what we see:

This output isn’t very informative since it only shows raw addresses without symbol names. However, if we examine the first call target, the 0x18bf71170 address:

We can see that it’s computing an address in x16 then jumping to it.

If we enable r2’s emulation and disassemble that again, we can see that the computed address refers to a symbol with a name:

That means it is a stub for objc_opt_self().

That’s exactly what the $namestubs script does, so let’s feed it with all the function addresses called by the +[_NSPredicateUtilities _predicateSecurityAction] function:

Now the stub functions are created automatically with the correct names, so if we disassemble the function again, we get:

We can now use the $what script to check which library those stubs belong to, but that will fail to resolve the image. By using the iSq command to reveal the corresponding section, though, we can see all those stubs belong to the same stubs island:

Stub islands don’t belong to any particular image because they’re shared between neighboring libraries.

Example: Interactive Exploration of Older DSC

For completeness, let’s examine the same function on iOS 15.7, where stub islands weren’t a thing yet:

The first thing that stands out is that dyld_program_sdk_at_least is not a stub. We can confirm this by checking the section it belongs to:

This reveals that this function calls a symbol from libdyld directly without any stub.

As for the other two, if we run $what and iSq. on them:

We can see that they are indeed stubs, reused from CoreFoundation and libicucore, respectively. Note that iSq. would not have returned any results if the stubs had been reused from libraries not present in the load filter, whereas $what works regardless.

Let’s name them using $namestubs and print the function’s disassembly again:

Here, we can see they were indeed objc_opt_self and abort, confirming that the stubs were reused because they were near in memory, not because CoreFoundation or libicucore are responsible for those symbols.

Example: Find All References Leveraging Stub Islands

If interactive exploration isn’t an option and you still need to enumerate all the calls to a specific exported symbol, you can still do it by leveraging stub islands — just be prepared for disappointing performance compared to the interactive technique.

Things you should know in advance:

The function you want to find calls to
Ideally, a superset of the caller libraries to set the filter properly at load time.

The steps are:

Emulate all stub islands first
Find references to the target function within stub islands to locate the corresponding stubs
Find references to the stubs you’re interested in, limiting the search to code near enough to call the target stubs directly (which is the point of having stubs).

First, we need to locate all stub islands. This is easy because r2 creates a section for each of them, so they’re obtained by filtering the sections list by name, similar to what we do for libraries:

Here, the first column is the section’s start address and the second column is its end address.
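When you script this step, a small parser can turn the section listing into address ranges. This sketch assumes the column layout described here: each line starts with the start and end addresses in hex and ends with the section name. The name_filter value and the section names in the test data are hypothetical placeholders, since island naming depends on your r2 version.

```python
def island_ranges(isq_output, name_filter):
    """Parse `iSq`-style text into (start, end, name) tuples.

    Only sections whose name contains `name_filter` are kept.
    """
    ranges = []
    for line in isq_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name = fields[-1]
        if name_filter not in name:
            continue
        # int(..., 16) accepts the "0x" prefix r2 prints
        ranges.append((int(fields[0], 16), int(fields[1], 16), name))
    return ranges
```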

To emulate all stub islands with a single one-liner, we can call:

The command runs aaex once at the beginning of each stub island section, emulating $SS (section size) bytes of each.
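The same loop can also be driven from r2pipe. In this minimal sketch, r2 is any object with a cmd() method, and an r2pipe session qualifies. $SS is left unexpanded so that r2 substitutes the size of the section at each seek, as described above. The function name is hypothetical.

```python
def emulate_islands(r2, island_starts):
    """Run `aaex $SS` at the start of each stub island, one island at a time.

    `r2` is any object with a cmd(str) method, such as an r2pipe session.
    """
    for start in island_starts:
        # "$SS" is expanded by r2 to the size of the section at this seek
        r2.cmd(f"aaex $SS @ {start:#x}")
```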

The process takes a while (not too long — about 1.5 minutes on my MacBook Pro M1), but after it finishes, we can query references to any symbol and we’ll immediately find out where the stubs for it are.

For example:

All those references point within the stub functions, specifically to the second instruction of each stub.

Let’s examine the first one (they all have the same structure). By disassembling three instructions starting from four bytes above the found reference, we can see the entire stub code:
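To see what r2 is doing when it resolves such a stub, you can decode the three instructions by hand. The sketch below assumes the plain arm64 pattern `adrp x16` / `ldr x16, [x16, #off]` / `br x16`; arm64e stubs may look different. It returns the address of the pointer slot the stub loads from, and the slot’s contents are what the stub finally jumps to. Because the references found above land on the second instruction, pass the reference address minus 4 as the stub address. The helper names are hypothetical.

```python
import struct

ADRP_MASK, ADRP_BITS = 0x9F000000, 0x90000000
LDR_X_UIMM_MASK, LDR_X_UIMM_BITS = 0xFFC00000, 0xF9400000
BR_MASK, BR_BITS = 0xFFFFFC1F, 0xD61F0000


def adrp_target(insn, pc):
    """Page address computed by an ADRP instruction located at `pc`."""
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):  # sign-extend the 21-bit immediate
        imm -= 1 << 21
    return (pc & ~0xFFF) + (imm << 12)


def stub_pointer_slot(code, stub_addr):
    """Address of the pointer an `adrp/ldr/br x16` stub jumps through.

    `code` holds the 12 little-endian bytes at `stub_addr`.
    Returns None if the bytes don't match the pattern.
    """
    adrp, ldr, br = struct.unpack("<3I", code[:12])
    if (adrp & ADRP_MASK) != ADRP_BITS or (adrp & 0x1F) != 16:
        return None
    if (ldr & LDR_X_UIMM_MASK) != LDR_X_UIMM_BITS:
        return None
    if (ldr & 0x1F) != 16 or ((ldr >> 5) & 0x1F) != 16:
        return None
    if (br & BR_MASK) != BR_BITS or ((br >> 5) & 0x1F) != 16:
        return None
    offset = ((ldr >> 10) & 0xFFF) * 8  # imm12 is scaled by 8 for X registers
    return adrp_target(adrp, stub_addr) + offset
```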

The next step is to collect the ranges of addresses we need to emulate to find direct call (or branch) references to each of these stubs.
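This post uses emulation for this step. Direct branches can also be spotted with a cruder linear scan, which is a different technique from the one described here. The sketch below decodes every 32-bit word as a potential B or BL and reports those whose target is one of the given stub addresses. Because the scan doesn’t follow control flow, data embedded in code can produce false positives. How you obtain the raw bytes, for example through r2pipe, is left out, and the function name is hypothetical.

```python
import struct


def direct_branches_to(code, base, targets):
    """Yield (site, target) for every B/BL in `code` whose target is in `targets`.

    `code` is raw little-endian AArch64 code mapped at `base`.
    """
    usable = len(code) - len(code) % 4  # iter_unpack needs whole words
    for index, (insn,) in enumerate(struct.iter_unpack("<I", code[:usable])):
        if (insn & 0x7C000000) != 0x14000000:  # B is 0x14..., BL is 0x94...
            continue
        imm26 = insn & 0x03FFFFFF
        if imm26 & (1 << 25):  # sign-extend the 26-bit word offset
            imm26 -= 1 << 26
        site = base + index * 4
        target = site + imm26 * 4
        if target in targets:
            yield site, target
```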

This is where things get problematic performance-wise, because the code we need to emulate can very well be all the executable code in the DSC.

Assuming an arm64 DSC, it’s possible to restrict the emulation space if the target symbol isn’t widely used and its stubs appear only in a subset of the islands. On arm64, the b/bl instructions can only reach code within about ±128 MB of the instruction itself. We can therefore focus on the parts of the __text sections that could call our stub functions directly. This should reduce the work needed to emulate them and find the references to the stubs, and it’s easy to automate with r2pipe.
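You can compute that restriction up front. A bl at address pc reaches targets from pc - 2^27 to pc + 2^27 - 4 bytes (roughly ±128 MiB). A caller of a given stub must therefore lie between stub - (2^27 - 4) and stub + 2^27. This sketch intersects that window with each __text section and merges overlapping results. The input format, a list of (start, end) pairs with exclusive ends, is an assumption, and the function name is hypothetical.

```python
BL_REACH_BACK = 1 << 27        # a bl at pc can target as far back as pc - 2**27
BL_REACH_FWD = (1 << 27) - 4   # ... and as far forward as pc + 2**27 - 4


def caller_windows(stubs, text_sections):
    """Merged (start, end) ranges of __text that could `bl` a stub directly.

    `text_sections` holds (start, end) pairs with `end` exclusive.
    """
    wanted = []
    for stub in stubs:
        # caller pc must satisfy stub - BL_REACH_FWD <= pc <= stub + BL_REACH_BACK
        lo, hi = stub - BL_REACH_FWD, stub + BL_REACH_BACK + 4  # hi exclusive
        for start, end in text_sections:
            a, b = max(lo, start), min(hi, end)
            if a < b:
                wanted.append((a, b))
    wanted.sort()
    merged = []
    for a, b in wanted:
        if merged and a <= merged[-1][1]:  # overlapping or touching: extend
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged
```

Each resulting range could then be emulated instead of the whole __text section. This only pays off when the target’s stubs are concentrated in a few islands.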

However, if the stubs for the target symbol are scattered across the DSC, the brute-force approach might be our only option. This would involve emulating all the __text sections in the DSC with the current filter, which is generally not advisable unless the filter already narrows the loaded libraries down to a few dozen.

Here’s a proof-of-concept (PoC) r2pipe script that automates the above (Gist link), which can be invoked directly or through an alias like this:

The script requires an address or flag name of any exported symbol and will output all the references to the corresponding stubs found in the stub islands. A few warnings to note:

This works only on caches with stub islands (iOS 16+)
It can take several hours to complete!
It’s just a PoC and should be tailored/optimized for your own task

Here’s an example of using this script where we open an unfiltered cache (the worst-case scenario) and look for all references to the mach_continuous_approximate_time function. To speed up loading an unfiltered cache, you can disable the parsing of strings and classes, as well as symbol demangling, by passing the -e command line switch multiple times to set the corresponding configuration variables:

Alternatively, you could use Blacktop’s ipsw tool, as described here. Be prepared for some waiting, as analyzing a few gigabytes of code with about 9 million symbols takes time. Once the focus is narrower and ready for fine-grained interactive exploration, you can return to r2, where it really shines.

Conclusion

That’s it for now. Thank you for taking the time to digest all this. Hopefully, you can use this information for profit, or to have fun opening issues and pull requests on radare2’s GitHub.
