Quick Office Document Triage

As people quite frequently ask me how I triage potentially malicious Microsoft Office documents, I decided to run through a quick analysis here. 

Our specimen for this tutorial is a Word document from the malware collection published by @0xffff0800 on http://iec56w4ibovnb4wc.onion (the URL might change; check the current address at 0day.coffee). @0xffff0800 attributes the file to an Iranian threat actor dubbed APT34 by Mandiant/FireEye. You can download the file directly from the repository or from VirusTotal (https://www.virustotal.com/#/file/db53b4157868fffd0331c1498e2209c11499b14f5aa980fe4fb3453858ed90b5/detection).

Specimen Details

Size: 39’936 bytes
This is what the file looks like when you open it in Word (don’t do it, but if you really have to, don’t enable macros 😜).

As you can see, they didn’t really care about crafting a nice fake document. I trust your red team does better than that.

Triage vs. Full Analysis 

The main goal of triage is to allow a moderately experienced forensic analyst, who probably has no background in malware reverse engineering, to figure out whether a document is malicious and even get some IOCs out of it. The approach I’m suggesting here is low-risk, as it works completely without opening the file in Microsoft Word or executing any PowerShell code. For the sake of completeness, I’ll give some hints on how a malware analyst could continue dissecting the malware.

Tools used

1.) Finding the Macro

We are looking at an old Word document format, as indicated by the .doc suffix. Those documents are stored in the OLE compound file (OLECF) format. oledump is a nice tool written in Python that allows you to extract the various streams contained in an OLECF file. So let’s take a closer look at the file and see if it is even malicious.

oledump.py MagicHoundAPT34.doc 

This will scan the file for all subelements in the compound file. For the given specimen it shows the following output. Note the stream number 7. This is the only stream that contains a macro.

Scanning the document

Oledump offers a fast and easy way to extract individual streams. As macros are usually compressed, we need the -v flag as well to decompress the content.

Stream 7 in uncompressed form

Looking at the macro, we see typical PowerShell parameters and a lot of Base64-encoded sections. Let’s dump the macro into a separate file to look at it more closely.

oledump.py -s7 -v MagicHoundAPT34.doc > macro.txt 

2.) Revealing the actual code

Looking at the macro in the text editor of your choice (I use Sublime), preferably one that supports syntax highlighting, we see that the main payload seems to be PowerShell-based and that the VB macro only executes PowerShell and shows an error message.

powershell.exe called using the VB Shell command

So apparently we need to decode the Base64 PowerShell source next. You probably noticed that it is not a single string block but multiple concatenated string blocks. So we need to clean that up a bit, save the joined Base64 text to a new file called base64.txt, and then decode it.

Mac OS:
base64 -D base64.txt > powershell1.txt
Most Linux:
base64 -d base64.txt > powershell1.txt
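
If you would rather script the clean-up than do it by hand, the following is a minimal sketch of the idea. It assumes the Base64 fragments sit in double-quoted VBA string literals in macro.txt and that they appear in the right order. The function names and the `min_len` threshold are hypothetical helpers for illustration, not part of oledump. Short literals such as message-box texts are filtered out by length, so a very short final fragment could be dropped as well. Compare the result with the macro before trusting it.

```python
import base64
import re

# A quoted run of characters from the standard Base64 alphabet.
B64_LITERAL = re.compile(r'"([A-Za-z0-9+/=]+)"')


def join_base64_fragments(macro_source: str, min_len: int = 16) -> str:
    """Collect quoted Base64-looking literals (in order) and join them."""
    fragments = B64_LITERAL.findall(macro_source)
    # Assumption: real payload fragments are longer than ordinary short strings.
    return "".join(f for f in fragments if len(f) >= min_len)


def decode_payload(b64_text: str) -> str:
    """Decode Base64 and guess whether the text is UTF-16LE or UTF-8."""
    raw = base64.b64decode(b64_text)
    # UTF-16LE text (as used by PowerShell's -EncodedCommand) is full of NUL bytes.
    if raw.count(b"\x00") > len(raw) // 4:
        return raw.decode("utf-16-le", errors="replace")
    return raw.decode("utf-8", errors="replace")


def decode_macro_file(macro_path: str, out_path: str) -> None:
    """Read the dumped macro, join the fragments and write the decoded script."""
    with open(macro_path, encoding="utf-8", errors="replace") as handle:
        joined = join_base64_fragments(handle.read())
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(decode_payload(joined))
```

Calling `decode_macro_file("macro.txt", "powershell1.txt")` would cover both the clean-up and the decoding step. Nothing in the sketch executes the decoded script, so it stays within the low-risk approach.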

That gives us an interesting piece of PowerShell code. First, we see that there seems to be some additional PowerShell code in a string variable called $G8t. At the end of the file, that same PowerShell code gets Base64-encoded again and then executed with the 32-bit version of PowerShell (note the location of powershell.exe in a subfolder of syswow64). A very common reason to use 32-bit binaries is that the attacker has 32-bit shellcode he wants to execute. This code wouldn’t run using 64-bit binaries.

So let’s look for some shellcode. I’m sure you spotted it already: the variable called $z is an array of byte values. If you look at the code more closely, you see that the attacker leverages the memset function to write the shellcode byte by byte into memory he allocated using VirtualAlloc. The malware is flexible about the size of the shellcode it loads. Normally it allocates $g bytes, which would be 1000 bytes. If the shellcode is longer, it changes $g to reflect the actual size of the shellcode.

PowerShell code in powershell1.txt

For an incident responder with no malware analysis background, that would be the right moment to hand the sample over to a malware analyst. 

3.) Extracting and analysing the shellcode 

If you don’t know what shellcode is, Wikipedia has an easy-to-read article on that topic. So let’s get the shellcode out in hex first. It looks like this.


I want to use CyberChef to create a binary file out of the shellcode. For that to work, I need to get rid of either all the 0x prefixes or the commas, as CyberChef only accepts one separator. I’ll get rid of the commas and paste the result into CyberChef’s input window. Selecting the “From Hex” recipe gives me some gibberish characters in the output window. That’s exactly what machine code is supposed to look like in ASCII. So let’s save that to a file by clicking on the save icon. I chose shellcode.bin as the file name.

Creating binary shellcode from the HEX string
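
The same conversion can be done offline with a few lines of Python, which is handy if you don’t want to paste the bytes into a browser. This is only a sketch under the assumption that the byte array was copied out as text in the form 0x..,0x..,... The function names are hypothetical and the byte values in the usage note are placeholders, not the bytes of this specimen.

```python
import re

# One byte written as 0x followed by one or two hex digits, e.g. 0x0 or 0xfc.
HEX_BYTE = re.compile(r"\b0x([0-9a-fA-F]{1,2})\b")


def shellcode_from_text(text: str) -> bytes:
    """Turn text such as '0x90,0x90,0xcc' into the raw bytes it describes."""
    return bytes(int(value, 16) for value in HEX_BYTE.findall(text))


def save_shellcode(text: str, path: str = "shellcode.bin") -> int:
    """Write the converted bytes to a file and return how many were written."""
    data = shellcode_from_text(text)
    with open(path, "wb") as out:
        out.write(data)
    return len(data)
```

For example, `shellcode_from_text("0x90,0x90,0xcc")` returns the three bytes 90 90 cc. Separators and whitespace between the values are ignored, so it does not matter whether the commas were removed beforehand.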

So now that we have the shellcode, what do we do next? Shellcode is not a complete binary. It usually uses functions provided by the operating system, in this case Windows, to do whatever it needs to do. There is more than one way to run it. For a first glance, I’ll use a tool called scdbg. It allows us to run shellcode without first putting it into a complete Windows executable. The downside is that we can’t really debug it from there. One note if you do the same thing: be aware that scdbg actually executes the code, which can harm your system. So let’s see if we can get it to run.

Shellcode loaded into scdbg
Results after running the shellcode through scdbg

OK, so now we know a bit more about the shellcode. It seems to use WSASocket to open a connection to a local IP address on port 5555. This IP is not in the subnet of the analysis workstation, so there is no way it would get a response. Wouldn’t it be nice if we could look more closely at what it is trying to do there? I guess it is worth a try. There is a nice little tool written by Adam Kramer that can help us out. It is called jmp2it, and it allows us to debug the shellcode. But that’s already way beyond simple malware triage for incident responders. I’ll put up a separate blog entry on how to proceed with that sample as soon as I have time. So happy hunting and have a great weekend.

Attackers and RDP MRUs

I finally found the time to continue mapping the data from my Tanium RDP MRU sensor.

But first, a couple of things. Two people responded to my last blog entry and pointed me to the HKEY_USERS hive (HKU) as an easier way to get my data. And they are partly right. HKU holds the ntuser.dat content for all active users. The problem is that “active” means currently logged on. So for my scenario, investigating lateral RDP movement, this will not be enough. Long story short, you’ll still need some coding if you want to get all the information.

Today I acquired some data from a test network at my workplace using Tanium. As it is Sunday, it contains only a small subset of the machines that are usually there. The dataset contains mostly servers, including a fair share of jumpservers, so it’s actually a good dataset to start with.

My test dataset

My theory is that if an attacker uses a known account, or even an account he created himself, we should be able to follow his trail through the network. What I’m looking for is something like this.

Expected outcome

Creating a graph like that requires source, destination and username for each RDP MRU entry. That’s nice, because I already have that as a CSV export of my Tanium sensor results. To see something that looks like an attack, I added some fake data to the mix: essentially someone using an account called attacker who jumps from a notebook to a jumpserver, from there to another server, then to yet another server, and so on. You get the idea. I also assume that I’ll discover some logon habits of the people on the test systems.

Once the data is ready, the question is how to filter it, plot it, and do that all over again to support dynamic analysis. After some searching I decided to start with an existing tool rather than building something myself. The tool I chose is Gephi (https://gephi.org), a very well-known graphing tool in data science. Best of all, it’s free.
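
To get such a CSV into Gephi, it helps to reshape it into an edge table first. The sketch below assumes the export has the columns source, destination and username. Those column names are my placeholders and will likely differ in a real sensor export. It writes the Source, Target, Label and Weight columns that Gephi’s spreadsheet import understands for edge tables, with the weight counting how often the same connection was seen.

```python
import csv
import io
from collections import Counter


def build_edges(rows):
    """Count each (source, destination, username) triple, ignoring case."""
    return Counter(
        (
            row["source"].strip().lower(),
            row["destination"].strip().lower(),
            row["username"].strip().lower(),
        )
        for row in rows
    )


def write_gephi_edges(edges, out_file):
    """Write an edge table with the username as label and the count as weight."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target", "Label", "Weight"])
    for (src, dst, user), count in sorted(edges.items()):
        writer.writerow([src, dst, user, count])


def convert(in_path, out_path):
    """Convert an exported CSV file into a Gephi edge table file."""
    with open(in_path, newline="", encoding="utf-8") as src:
        edges = build_edges(csv.DictReader(src))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        write_gephi_edges(edges, dst)
```

Keeping the username as the edge label means filtering by account can later be done inside Gephi instead of re-exporting the data for every user.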

So I had to figure out how that tool works. After investing an hour in it, I got pretty much the result I wanted.

Plotting the full dataset

The first thing I checked was the most active user in the dataset. The data immediately shows typical, and quite disciplined, administrative behavior. Looking up the username confirmed that it is indeed an administrator. So even though the data looked chaotic at first, graphing it did allow me to draw conclusions.
For obvious reasons I had to remove the node names (hostnames) from the picture. The two central nodes are jumpservers and are therefore the targets of connections from this user. The more remote nodes are servers the admin jumped to from a jumpserver. The thick line between the two jumpservers indicates that the administrator used both of his accounts to move between the servers in both directions, which is perfectly in line with the policy.

Normal admin

So let’s see what a simulated attack looks like. My assumption here is that the attacker used an account that normally never uses RDP, or an account he created himself for lateral movement. Filtering the dataset for that username will then show only the adversary moving. If the attacker uses more than one account, it’s perfectly possible and even quite easy to plot all the action in one run.

Simulated attacker
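
The filtering step can also be sketched outside of Gephi. The following is an illustration only, assuming the records are available as (source, destination, username) tuples. The function names are hypothetical. It keeps the hops of one account and walks them from every host that never appears as a destination. MRU data carries no timestamps, so the order comes purely from how the hops connect, and a trail that forms a closed loop without a clear starting host would not be reported.

```python
from collections import defaultdict


def hops_for_user(records, username):
    """Return the (source, destination) pairs recorded for one account."""
    return [(src, dst) for src, dst, user in records if user == username]


def trace_paths(hops):
    """Follow the hops depth-first from each host that is never a destination."""
    graph = defaultdict(list)
    for src, dst in hops:
        if dst not in graph[src]:
            graph[src].append(dst)
    destinations = {dst for _, dst in hops}
    starts = [src for src in graph if src not in destinations]
    paths = []

    def walk(host, path):
        # Skip hosts already on the path so that loops cannot recurse forever.
        next_hosts = [h for h in graph.get(host, []) if h not in path]
        if not next_hosts:
            paths.append(path)
            return
        for nxt in next_hosts:
            walk(nxt, path + [nxt])

    for start in starts:
        walk(start, [start])
    return paths
```

With the fake data described above, the hops of the account called attacker would come out as a single chain from the notebook through the jumpserver to the servers behind it, while an administrator’s hops would fan out from the jumpservers.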

I’m quite happy with the results and plan to give it a shot in a bigger, real network. That opens up a new way of hunting for me, so I guess it was worth the while. There are some additional things I need to mention. First and foremost, finding the attacker’s targets works whenever he uses RDP for lateral movement and you are able to differentiate between legitimate users and attackers. In the worst case you need to check with the owner of the compromised account which hosts he usually accesses using RDP and filter those out.
Having said that, there is a limitation in some networks. As soon as roaming profiles are widely used, the ntuser.dat of the users will roam as well. This means that the machine reporting an MRU entry is not necessarily the machine on which the entry was added to the registry. In this case real step-by-step tracking does not work anymore. Secondly, there is almost no timing aspect to that data. You can usually deduce which move came first, but not when. For that you would need to leverage event logs, if they have not rolled over already (which they most likely have, as you’d have used them in the first place if you could). The huge advantage is that the data I leveraged here stays on the system for a very long time. I remember the profiles of some admins still being present on my company notebook after I had been with the company for several years. Those guys had already left the company by then.

So to summarize, RDP MRUs are a great way to track lateral movement using RDP, even a long time after the attacker has left. It does not work as well when profiles are roaming, but it still provides some value in that case. If you have any questions, just ping me on Twitter @mathias_fuchs.

Another DFIR Blog? Really?


I’ve not been maintaining a blog for quite some time now. So why do I feel that it now makes sense to start over again? Well, first and foremost, whenever I develop fancy new threat detection mechanisms and strategies, run incident response engagements in my day job, or teach SANS classes around the world, people tell me cool stuff that makes me better at what I am doing. Now it’s time to contribute more to the community and give back what the community gave and still gives to me.


I’m not sure yet how that’s gonna turn out, but in the end I want to be able to point my customers and my students to my blog when they ask me what the heck I’m doing all day long. I also want to incubate and conserve ideas here. Right now I have ideas for maybe my first 3-4 blog entries. I want to keep those entries technical and make whatever I describe usable to others. The DFIR community is a great community and I’m proud to be in a position to contribute at least a little bit to it.


So my full name is Mathias Fuchs, but I’d rather go by Mat (yes, just one t, like in my full name). My current day job is building a Cyberdefence Center in its own league together with awesome colleagues. My office is in Switzerland, but I still live in Austria and commute on a weekly basis.

Besides that, I teach the “Advanced Incident Response and Threat Hunting” class for SANS (www.sans.org). I love to travel to conferences and meet the best of the best, to learn from them and maybe also help them become even better incident responders.


If you have any questions, don’t hesitate to ping me on Twitter, via mail or wherever you can find me. If you look to the right, you can see where I’m currently at when I’m at conferences.