Last we left off, I was hoping to analyze some malware for my OSU capstone project. And the good news is that’s exactly the project I’ve been assigned! I’m working with two incredibly talented classmates, Andrea Irwin and Sean Mack, to analyze malware samples from the book Practical Malware Analysis, and so far we’ve been making fantastic progress.
This week’s capstone blog assignment asks us to talk about our experiences using AI for our project. Well, we don’t plan to use AI tools such as ChatGPT or Bard as part of our analysis. But during a previous internship, I did have the incredible opportunity to design and build a tool that uses ChatGPT to help reverse engineer black-box binaries – a task similar to analyzing malware samples.
The result of this work is called OFRAK AI. It uses the OFRAK toolkit’s capabilities to unpack and analyze closed-source binaries, then makes requests to ChatGPT to help interpret the results at the binary and disassembly levels.
So without further ado, here are some of my thoughts on the use of AI to assist in reverse engineering software.
The Good
Right off the bat, getting started with ChatGPT is a breeze. The API is well-documented, and you can find a considerable number of examples for establishing and maintaining conversations all over the web. The time savings are welcome, too, because you’ll end up spending the majority of your time fine-tuning your prompts to get the kind of results you’re looking for.
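For what it’s worth, the conversation-management side really is simple enough to sketch in a few lines. This is a minimal sketch, not OFRAK AI’s actual code: the `complete` callable is an assumed stand-in for the real API request (e.g. the openai package’s chat completion call), injected so the history-tracking logic is visible on its own.

```python
class Conversation:
    """Minimal multi-turn chat wrapper. `complete` is a stand-in for the
    real API call: it takes the message history and returns a reply string."""

    def __init__(self, complete, system_prompt):
        self.complete = complete
        # Chat APIs take a running list of role-tagged messages.
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, prompt):
        # Append the user turn, get a reply, and record it so the next
        # question carries the full conversation as context.
        self.messages.append({"role": "user", "content": prompt})
        reply = self.complete(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Swapping `complete` for a real network call is all it takes to go from this sketch to a working prompt loop, which is why most of the effort ends up in the prompts themselves.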
Unsurprisingly, when used for analyzing and manipulating executables, ChatGPT excelled at tasks involving natural language. The majority of the information it was able to glean about a binary came from strings and symbol information, such as identifying the compiler from the .comment section of an ELF or recognizing common library functions by their symbol names.
One particularly fun use of ChatGPT was to patch a binary’s debugging strings after altering their voice. For example, by extracting strings of a certain length from a file, asking ChatGPT to “sassify” them, and then patching them back into the original binary, we were able to considerably improve the console output of a Cisco router.
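That extract–rewrite–patch loop can be sketched roughly as below. The helper names are mine, and the actual rewriting step (the ChatGPT request) is left out; the key constraint the sketch captures is that a patched string must never outgrow the original, or the surrounding file layout breaks.

```python
import re

MIN_LEN = 8  # only bother with reasonably long debug strings


def extract_strings(data, min_len=MIN_LEN):
    """Find printable-ASCII runs and their byte offsets, in the spirit of
    `strings -t d`. Returns a list of (offset, text) pairs."""
    return [(m.start(), m.group().decode())
            for m in re.finditer(rb"[ -~]{%d,}" % min_len, data)]


def patch_string(data, offset, original, replacement):
    """Overwrite a string in place. The replacement is truncated to fit and
    NUL-padded so the file never grows or shifts."""
    if len(replacement) > len(original):
        replacement = replacement[:len(original)]
    patched = replacement.encode().ljust(len(original), b"\x00")
    return data[:offset] + patched + data[offset + len(original):]
```

In between the two helpers, each extracted string would be sent off to be "sassified" before being patched back at its original offset.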
The Bad
Alas, ChatGPT certainly isn’t the panacea that some may think it is, and that’s especially true when it comes to reverse engineering binary files. While ChatGPT excels at operating on natural languages such as English, it performs much worse on formal languages like those found in disassembly output. And while it may be able to regurgitate answers to common Leetcode questions – thanks, no doubt, to data it’s seen thousands of times in its training corpus – it leaves much to be desired when trying to reason about the purpose of a function given only its assembly instructions. (Perhaps more people should be solving Leetcode problems in assembly, one wonders…)
For starters, the ChatGPT tokenizer simply isn’t built for formal languages. Given the unnatural character combinations found in assembly instructions (how many words contain “xmm1”?), you rapidly chew through tokens when feeding it input. Combined with the limited token budget for conversation history, this makes it impossible to feed ChatGPT the entire disassembly of any nontrivial program, which leaves it unable to use contextual information to form a larger understanding of what the program does.
What about the analysis ChatGPT was able to perform on human-readable strings? Unfortunately, those results aren’t as promising as you might think, either. For example, it doesn’t take an experienced reverse engineer to spot the GNU compiler version in the .comment section using standard RE tools. Furthermore, symbol tables and their associated symbol names are likely to be stripped in production code, especially if you’re dealing with malware.
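As a quick illustration of just how little is needed here, a naive scan for the version string GCC leaves behind does the job without any AI at all. This is a hypothetical helper of my own: a real tool would walk the ELF section headers to locate .comment rather than grepping raw bytes, but the string itself is plain text either way.

```python
import re


def find_compiler_note(data):
    """Scan a binary blob for the GCC version string that the compiler
    drops into an ELF's .comment section, e.g. "GCC: (GNU) 12.2.0".
    Returns the matched string, or None if no such note is present."""
    m = re.search(rb"GCC: \([^)]*\) [0-9][0-9.]*", data)
    return m.group().decode() if m else None
```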
The Uncertain
So does that mean there’s no future for AI in reverse engineering and closed-source binary analysis? I wouldn’t be quite so quick to write it off. On the one hand, there is an art to reverse engineering – the forward compilation process is lossy after all, and thus the reversing process is non-deterministic – and so it requires creativity (which Large Language Models like ChatGPT arguably do not possess).
On the other hand, LLMs and other machine learning models trained specifically on formal languages should be expected to perform better than a general-purpose chatbot like ChatGPT. Take binary diffing (also known as binary similarity or binary matching), a reverse engineering technique for comparing an unknown binary file or function to a known one. For many years, research into binary diffing focused on analytic approaches, such as comparing the control flow and call graphs of functions. Indeed, this forms part of the approach used by the industry-standard tool BinDiff. More recent research, though, shows a clear trend toward machine learning, and those papers report strong results at identifying functions based on their disassembly. Disappointingly, when code is published alongside this research at all, it has only been for re-training and fine-tuning the models; I have yet to find a widely available binary diffing tool that uses one of these machine learning models for its core functionality.
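To make the analytic flavor concrete, here is a toy sketch in that spirit – not BinDiff’s actual algorithm, just the core idea of summarizing each function by a few structural features of its control flow graph and matching on similarity. All names and features here are illustrative assumptions.

```python
def features(func):
    """Summarize a function by coarse structural counts: basic blocks,
    control-flow edges, and outgoing calls. `func` is a plain dict here;
    a real tool would compute these from disassembly."""
    return (func["blocks"], func["edges"], func["calls"])


def similarity(a, b):
    """Score in [0, 1]; 1.0 means identical feature vectors. Each feature's
    difference is normalized by the larger value, then averaged."""
    fa, fb = features(a), features(b)
    diffs = [abs(x - y) / max(x, y, 1) for x, y in zip(fa, fb)]
    return 1.0 - sum(diffs) / len(diffs)


def best_match(unknown, known_funcs):
    """Return the known function most structurally similar to `unknown`."""
    return max(known_funcs, key=lambda f: similarity(unknown, f))
```

Even this crude version hints at why the approach works on graphs but not names: it needs no symbols at all, only structure – and structure is exactly what survives stripping.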
So what’s the verdict – can AI help with reverse engineering? Based on my own experiences using ChatGPT for the task, there’s still a long way to go before it produces the same kind of success it has for forward engineering. Considering the amount of active research being conducted in this problem space, though, I’m excited to see what improvements the next several years will bring.
Marc Zalik