As people quite frequently ask me how I triage potentially malicious Microsoft Office documents, I decided to run through a quick analysis here.
Our specimen for this tutorial is a Word document from the malware collection published by @0xffff0800 on http://iec56w4ibovnb4wc.onion (the URL might change; check the current address at 0day.coffee). @0xffff0800 attributes the file to an Iranian threat actor dubbed APT34 by Mandiant/FireEye. You can download the file directly from the repository or from VirusTotal (https://www.virustotal.com/#/file/db53b4157868fffd0331c1498e2209c11499b14f5aa980fe4fb3453858ed90b5/detection).
As you can see, they didn’t really care about crafting a convincing decoy document. I trust your red team does better than that.
Triage vs. Full Analysis
The main goal of triage is to allow a moderately experienced forensic analyst, who probably has no background in malware reverse engineering, to figure out whether a document is malicious and even get some IOCs out of it. The approach I’m suggesting here is low-risk, as it works entirely without opening the file in Microsoft Word or executing any PowerShell code. For the sake of completeness, I’ll also give some hints on how a malware analyst could continue dissecting the malware.
1.) Finding the Macro
We are looking at the old Word document format, as indicated by the .doc suffix. Those documents are stored in the OLECF (OLE compound file) format (further reading). oledump is a nice tool written in Python that lets you extract the various streams contained in an OLECF file. So let’s take a closer look at the file and see whether it is even malicious.
Running oledump.py against the file without any further options lists all elements of the compound file. For the given specimen, note stream number 7: it is the only stream that contains a macro.
oledump offers a fast and easy way to extract individual streams. As macros are usually stored compressed, we also need the -v flag to decompress the content.
oledump.py -s7 -v MagicHoundAPT34.doc > macro.txt
Looking at the macro, what we see are typical PowerShell parameters.
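If you want to confirm that a file really is an OLE compound file before pointing oledump at it (a .doc suffix alone proves nothing), you can check its first eight bytes. The following is a minimal sketch using only the Python standard library; the function name is my own and not part of oledump.

```python
# Signature that starts every OLE2 compound file (legacy .doc, .xls, .ppt).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_compound_file(path):
    """Return True if the file starts with the OLE2 compound file signature.

    The helper name is hypothetical; it only reads the first eight bytes
    and never parses or opens the document itself.
    """
    with open(path, "rb") as handle:
        return handle.read(len(OLE_MAGIC)) == OLE_MAGIC


# Example call (file name taken from the oledump command above):
# is_ole_compound_file("MagicHoundAPT34.doc")
```

A file that fails this check but still carries a .doc suffix may be an RTF or a ZIP-based Office file, and would need different tooling.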
2.) Revealing the actual code
Looking at the macro in the text editor of your choice (I use Sublime), preferably one that supports syntax highlighting, we see that the main payload seems to be PowerShell-based; the VBA macro only executes PowerShell and shows an error message.
So apparently we need to decode the Base64-encoded PowerShell source next. You probably noticed that it is not a single string block but multiple concatenated string blocks. So we need to clean that up a bit, save the result to a file called base64.txt, and then decode it:
macOS: base64 -D base64.txt > powershell1.txt
Most Linux distributions: base64 -d base64.txt > powershell1.txt
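The clean-up and decoding can also be scripted instead of done by hand. This is a rough sketch, not the exact procedure I used: it assumes you pass in only the macro lines that hold the payload fragments, and that those fragments are ordinary double-quoted VBA string literals. Both function names are made up for this example.

```python
import base64
import re


def join_vba_string_literals(macro_excerpt):
    """Concatenate all double-quoted string literals in order of appearance.

    Assumption: macro_excerpt contains only the lines with the Base64
    fragments, so no unrelated literals get mixed into the result.
    """
    # VBA escapes a quote inside a literal by doubling it ("").
    literals = re.findall(r'"((?:[^"\n]|"")*)"', macro_excerpt)
    return "".join(literal.replace('""', '"') for literal in literals)


def decode_base64_payload(blob):
    """Drop characters outside the Base64 alphabet, fix padding, decode."""
    cleaned = re.sub(r"[^A-Za-z0-9+/]", "", blob)
    cleaned += "=" * (-len(cleaned) % 4)
    return base64.b64decode(cleaned)


# Example call, reading the file produced by the oledump command:
# with open("macro.txt") as handle:
#     payload = decode_base64_payload(join_vba_string_literals(handle.read()))
```

If the decoded bytes show a null byte after every character, the text is UTF-16LE and can be made readable with payload.decode("utf-16-le").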
That gives us an interesting piece of PowerShell code. First, we see that there is some additional PowerShell code in a string variable called $G8t. At the end of the file, that same PowerShell code gets Base64-encoded again and is then executed with the 32-bit version of PowerShell (note the location of powershell.exe in a subfolder of syswow64). A very common reason to use 32-bit binaries is that the attacker has 32-bit shellcode they want to execute. This code wouldn’t run in a 64-bit process. So let’s look for some shellcode. I’m sure you spotted it already: the variable called $z is an array of byte values.
For an incident responder with no malware analysis background, this would be the right moment to hand the sample over to a malware analyst.
3.) Extracting and analysing the shellcode
If you don’t know what shellcode is, Wikipedia has an easy-to-read article on the topic.
I want to use CyberChef to create a binary file out of the shellcode. For that to work, I need to get rid of either all the 0x prefixes or all the commas, as CyberChef only accepts one separator. I’ll get rid of the commas and paste the rest into CyberChef’s input window. Selecting the “From Hex” recipe gives me some gibberish characters in the output window. That’s exactly what machine code is supposed to look like in ASCII. So let’s save that to a file by clicking on the save icon. I choose shellcode.bin.
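If you prefer to stay on the command line, the same conversion can be sketched in a few lines of Python. This is an alternative to the CyberChef recipe, not what I used above, and it assumes the array is written as comma-separated 0x-prefixed values; the function names are invented for this example.

```python
import re


def hex_array_to_bytes(array_text):
    """Convert text such as '0xfc,0xe8,0x82' into the raw bytes it describes."""
    # Only one- or two-digit 0x values are treated as byte values.
    values = re.findall(r"0x([0-9a-fA-F]{1,2})\b", array_text)
    return bytes(int(value, 16) for value in values)


def save_shellcode(array_text, out_path="shellcode.bin"):
    """Write the converted bytes to out_path and return the byte count."""
    data = hex_array_to_bytes(array_text)
    with open(out_path, "wb") as handle:
        handle.write(data)
    return len(data)


# Example call with placeholder bytes (not the real contents of $z):
# save_shellcode("0xfc,0xe8,0x82,0x00")
```

This only rewrites the bytes into a file; nothing gets executed at this point.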
So now that we have the shellcode, what do we do next? Shellcode is not a complete binary. It usually relies on functions provided by the operating system, in this case Windows, to do whatever it needs to do. There are several ways to run it anyway. For a first glance, I’ll use a tool called scdbg. It allows us to run shellcode without first wrapping it in a complete Windows executable. The downside is that we can’t really debug it from there. One note: if you do the same thing, be aware that scdbg actually runs the code. This can harm your system. So let’s see if we can get it to run.
OK, so now we know a bit more about the shellcode. It seems to use WSASocket to open a connection to a local IP address on port 5555. This IP is not in the subnet of the analysis workstation, so there is no way it would get a response. Wouldn’t it be nice if we could look more closely at what it is trying to do there? I guess it is worth a try. There is a nice little tool written by Adam Kramer that can help us out. It is called jmp2it, and it allows us to debug the shellcode. But that is already well beyond simple malware triage for incident responders. I’ll put up a separate blog entry on how to proceed with that sample as soon as I have time. So happy hunting and have a great weekend.