Por Mandiant em 23 de janeiro de 2014
Rastreamento grupos de ameaças ao longo do tempo é uma ferramenta importante para ajudar os defensores caçar para o mal em redes e conduzir resposta a incidentes eficaz. Saber como certos grupos operam faz para uma investigação eficiente e ajuda a identificar facilmente a atividade ator ameaça.
No Mandiant, nós utilizamos vários métodos para ajudar a identificar e correlacionar atividade de grupo ameaça. A peça fundamental do nosso trabalho envolve o acompanhamento diversos itens operacionais, tais como infra-estrutura de atacante e endereços de email. Além disso, acompanhar os backdoors específicas de cada grupo ameaça utiliza - uma das principais formas de acompanhar as atividades de um grupo ao longo do tempo. Por exemplo, alguns grupos podem favorecer o backdoor SOGU, enquanto outros usam HOMEUNIX.
Uma forma original que acompanha Mandiant backdoors grupos de ameaças específicas "é rastrear executável portátil (PE) importações. As importações são as funções que um pedaço de software (neste caso, o backdoor) denomina de outros arquivos (tipicamente várias DLLs que fornecem funcionalidade para o sistema operacional Windows). Para acompanhar essas importações, Mandiant cria um hash com base nos nomes de biblioteca / API e sua ordem específica dentro do executável. Referimo-nos a esta convenção como um "imphash" (para "hash de importação"). Devido à forma como a tabela de importação de um PE é gerado (e, portanto, como sua imphash é calculado), podemos usar o valor imphash para identificar amostras de malware relacionados. Também pode usá-lo para procurar novas amostras semelhantes, que o mesmo grupo ameaça pode ter criado e usado.
Embora Mandiant vem alavancando essa técnica há mais de um ano internamente, nós não somos os primeiros a discutir publicamente isso. Um imphash é uma poderosa forma de identificar o malware relacionado porque o próprio valor deve ser relativamente único. Isto porque vinculador do compilador gera e constrói a tabela de endereços de importação (IAT) com base na ordem específica de funções dentro do arquivo de origem. Tome o seguinte exemplo de código fonte:
#include <windows.h> #include <stdio.h> #include <stdlib.h> #include <wininet.h> #pragma comment(lib, "ws2_32.lib") #pragma comment(lib, "wininet.lib") int makeMutexA() { CreateMutexA(NULL, FALSE, "TestMutex"); return 0; } int makeMutexW() { CreateMutexW(NULL, FALSE, L"TestMutex"); return 0; } int makeUserAgent() { HANDLE hInet=0, hConn=0; char buf[sizeof(struct hostent)] = {0}; hInet = InternetOpenA("User-Agent: (Windows; 5.1)", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0); hConn = InternetConnectA(hInet, "www.google.com", 443, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0); WSAAsyncGetHostByName(NULL, 3, "www.yahoo.com", buf, sizeof(struct hostent)); return 0; } int main(int argc, char *argv[]) { makeMutexA(); makeMutexW(); makeUserAgent(); return 0; }When that source file is compiled, the resulting import table looks as follows:
ws2_32.dllws2_32.dll.WSAAsyncGetHostByName
wininet.dllwininet.dll.InternetOpenAwininet.dll.InternetConnectA
kernel32.dll kernel32.dll.InterlockedIncrement kernel32.dll.IsProcessorFeaturePresent kernel32.dll.GetStringTypeW kernel32.dll.MultiByteToWideChar kernel32.dll.LCMapStringWkernel32.dll.CreateMutexA kernel32.dll.CreateMutexW
kernel32.dll.GetCommandLineA kernel32.dll.HeapSetInformation kernel32.dll.TerminateProcessImphash: 0c6803c4e922103c4dca5963aad36ddf
We abbreviated the table to save space, but the red/bolded APIs are the ones referenced in the source code. Note the order in which they appear in the table, and compare that to the order in which they appear in the source file.
If an author were to change the order of the functions and/or the order of the API calls in the source code, this would in turn affect the compiled import table. Take the previous example, modified:
#include <windows.h> #include <stdio.h> #include <stdlib.h> #include <wininet.h> #pragma comment(lib, "ws2_32.lib") #pragma comment(lib, "wininet.lib") int makeMutexW() { CreateMutexW(NULL, FALSE, L"TestMutex"); return 0; } int makeMutexA() { CreateMutexA(NULL, FALSE, "TestMutex"); return 0; } int makeUserAgent() { HANDLE hInet=0, hConn=0; char buf[sizeof(struct hostent)] = {0}; hConn = InternetConnectA(hInet, "www.google.com", 443, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0); hInet = InternetOpenA("User-Agent: (Windows; 5.1)", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0); WSAAsyncGetHostByName(NULL, 3, "www.yahoo.com", buf, sizeof(struct hostent)); return 0; } int main(int argc, char *argv[]) { makeMutexA(); makeMutexW(); makeUserAgent(); return 0; }In this example, we have reversed the order of makeMutexW and makeMutexA, and of InternetConnectA and InternetOpenA. (Note that this would be an invalid sequence of API calls, but we use it here to illustrate the point.) Below is the import table generated from this modified source code (again abbreviated); note the changes when compared to the original IAT, above, as well as the different imphash value:
ws2_32.dllws2_32.dll.WSAAsyncGetHostByName
wininet.dllwininet.dll.InternetConnectAwininet.dll.InternetOpenA
kernel32.dll kernel32.dll.InterlockedIncrement kernel32.dll.IsProcessorFeaturePresent kernel32.dll.GetStringTypeW kernel32.dll.MultiByteToWideChar kernel32.dll.LCMapStringWkernel32.dll.CreateMutexWkernel32.dll.CreateMutexA
kernel32.dll.GetCommandLineA kernel32.dll.HeapSetInformation kernel32.dll.TerminateProcessImphash: b8bb385806b89680e13fc0cf24f4431e
The final example shows how the ordering of included files at compile time will affect the resulting IAT (and thus the resulting imphash value). We’ll expand on our original example by adding files
imphash1.c
and imphash2.c
, to be included with our original source file imphash.c
:-- imphash1.c -- int makeNamedPipeA() { HANDLE ph = CreateNamedPipeA("\\\\.\\pipe\\test_pipe", PIPE_ACCESS_DUPLEX, PIPE_TYPE_MESSAGE, 1, 128, 64, 200, NULL); return 0; }
-- imphash2.c -- int makeNamedPipeW() { HANDLE ph2 = CreateNamedPipeW(L"\\\\.\\pipe\\test_pipeW", PIPE_ACCESS_DUPLEX, PIPE_TYPE_MESSAGE, 1, 128, 64, 200, NULL); return 0; }
-- imphash.c -- #include <windows.h> #include <stdio.h> #include <stdlib.h> #include <wininet.h> #include “imphash1.h” #include “imphash2.h” #pragma comment(lib, "ws2_32.lib") #pragma comment(lib, "wininet.lib") int makeMutexW() { CreateMutexW(NULL, FALSE, L"TestMutex"); return 0; } int makeMutexA() { CreateMutexA(NULL, FALSE, "TestMutex"); return 0; } int makeUserAgent() { HANDLE hInet = 0, hConn = 0; char buf[sizeof(struct hostent)] = {0}; hConn = InternetConnectA(hInet, "www.google.com", 443, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0); hInet = InternetOpenA("User-Agent: (Windows; 5.1)", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0); WSAAsyncGetHostByName(NULL, 3, "www.yahoo.com", buf, sizeof(struct hostent)); return 0; } int main(int argc, char *argv[]) { makeMutexA(); makeMutexW(); makeUserAgent(); makeNamedPipeA(); makeNamedPipeW(); return 0; }Using the following command to build the EXE:
cl imphash.c imphash1.c imphash2.c /W3 /WX /linkThe resulting IAT is:
ws2_32.dll ws2_32.dll.WSAAsyncGetHostByName wininet.dll wininet.dll.InternetConnectA wininet.dll.InternetOpenA kernel32.dll kernel32.dll.TlsFree kernel32.dll.IsProcessorFeaturePresent kernel32.dll.GetStringTypeW kernel32.dll.MultiByteToWideChar kernel32.dll.LCMapStringW kernel32.dll.CreateMutexW kernel32.dll.CreateMutexAkernel32.dll.CreateNamedPipeAkernel32.dll.CreateNamedPipeW
kernel32.dll.GetCommandLineA kernel32.dll.HeapSetInformation kernel32.dll.TerminateProcessImphash: 9129bdbc18cfd1aba498c94e809567d5
Changing the order of includes for
imphash1.h
and imphash2.h
within the source file imphash.c
will have no effect on the ordering of the IAT. However, changing the
order of the files on the command line and recompiling will affect the
IAT; note the re-ordering of CreateNamedPipeW
and CreateNamedPipeA
:cl imphash.c imphash2.c imphash1.c /W3 /WX /link ws2_32.dll ws2_32.dll.WSAAsyncGetHostByName wininet.dll wininet.dll.InternetConnectA wininet.dll.InternetOpenA kernel32.dll kernel32.dll.TlsFree kernel32.dll.IsProcessorFeaturePresent kernel32.dll.GetStringTypeW kernel32.dll.MultiByteToWideChar kernel32.dll.LCMapStringW kernel32.dll.CreateMutexW kernel32.dll.CreateMutexAkernel32.dll.CreateNamedPipeWkernel32.dll.CreateNamedPipeA
kernel32.dll.GetCommandLineA kernel32.dll.HeapSetInformation kernel32.dll.TerminateProcessImphash: c259e28326b63577c31ee2c01b25d3fa
These examples show that both the ordering of functions within the original source code – as well as the ordering of source files at compile time – will affect the resulting IAT, and therefore the resulting imphash value. Because the source code is not organized the same way, two different binaries with exactly the same imports are highly likely to have different import hashes. Conversely, if two files have the same imphash value, they have the same IAT, which implies that the files were compiled from the same source code, and in the same manner.
For packed samples, simple tools or utilities (with few imports and, based on their simplicity, likely compiled in the same way), the imphash value may not be unique enough to be useful for attribution. In other words, it may be possible for two different threat actors to independently generate tools with the same imphash based on those factors.
However, for more complex and/or custom tools (like backdoors), where there are a sufficient number of imports present, the imphash should be relatively unique, and can therefore be used to identify code families that are structurally similar. While files with the same imphash are not guaranteed to originate from the same threat group (it’s possible, for example, for the files were generated by a common builder that is shared among groups) the files can at least be reasonably assumed to have a common origin and may eventually be attributable to a single threat group with additional corroborating information.
Employing this method has given us great success for verifying attacker backdoors over a period of time and demonstrating relationships between backdoors and their associated threat groups.
Mandiant has submitted a patch that enables the calculation of the imphash value for a given PE to Ero Carrera’s pefile (http://code.google.com/p/pefile/).
Example code:
import pefile pe = pefile.PE(sys.argv[1]) print “Import Hash: %s” % pe.get_imphash()Mandiant uses an imphash convention that requires that the ordinals for a given import be mapped to a specific function. We’ve added a lookup for a couple of DLLs that export functions commonly looked up by ordinal to pefile.
Mandiant’s imphash convention requires the following:
- Resolving ordinals to function names when they appear
- Converting both DLL names and function names to all lowercase
- Removing the file extensions from imported module names
- Building and storing the lowercased string . in an ordered list
- Generating the MD5 hash of the ordered list
If imphash values serve as relatively unique identifiers for malware families (and potentially for specific threat groups), won’t discussing this technique alert attackers and cause them to change their methods? Attackers would need to modify source code (in a way that did not affect the functionality of the malware itself) or change the file order at compile time (assuming the source code is spread across multiple files). While attackers could write tools to modify the imphash, we don’t expect many attackers to care enough to do this.
We believe it is important to add imphash to the lexicon as a way to discuss malware samples at a higher level and to exchange information about attackers and threat groups. For example, incident responders can use imphash values to discuss malware without specifically disclosing which exact sample (specific MD5) is being discussed.
Consider a scenario where an attacker compiles 30 variants of its backdoor with different C2 locations and campaign IDs and deploys them to various companies. If a blog post comes out stating that a specific MD5 was identified as part of a campaign, then based on that MD5 the attacker immediately knows what infrastructure (such as C2 domains or associated IP addresses) is at stake and which campaign may be in jeopardy. However, if the malware was identified just by its imphash value, it is possible that the imphash is shared across all 30 of the attacker’s variants. The malware is still identifiable by and can be discussed within the security community, but the attacker doesn’t know which specific samples have been identified or which parts of their infrastructure are in jeopardy.
To demonstrate the effectiveness of this analysis method, we’ve decided to share the imphash values of a few malware families from the Mandiant APT1 report:
Family Name | Import Hash | Total Imports | Number of matched Samples |
---|---|---|---|
GREENCAT | 2c26ec4a570a502ed3e8484295581989 | 74 | 23 |
GREENCAT | b722c33458882a1ab65a13e99efe357e | 74 | 18 |
GREENCAT | 2d24325daea16e770eb82fa6774d70f1 | 113 | 13 |
GREENCAT | 0d72b49ed68430225595cc1efb43ced9 | 100 | 13 |
STARSYPOUND | 959711e93a68941639fd8b7fba3ca28f | 62 | 31 |
COOKIEBAG | 4cec0085b43f40b4743dc218c585f2ec | 79 | 10 |
NEWSREELS | 3b10d6b16f135c366fc8e88cba49bc6c | 77 | 41 |
NEWSREELS | 4f0aca83dfe82b02bbecce448ce8be00 | 80 | 10 |
TABMSGSQL | ee22b62aa3a63b7c17316d219d555891 | 102 | 9 |
WEBC2 | a1a42f57ff30983efda08b68fedd3cfc | 63 | 25 |
WEBC2 | 7276a74b59de5761801b35c672c9ccb4 | 52 | 13 |
Imphash analysis, like any other method, has its limitations and should not be considered a single point of success. Just because two binaries have the same imphash value does not mean they belong to the same threat group, or even that they are part of the same malware family (though there is an increased likelihood that this is the case). Imphash analysis is a low-cost, efficient and valuable way to triage potential malware samples and expand discovery by identifying “interesting” samples that merit further analysis. The imphash value gives analysts another pivot point when conducting discovery on threat groups and their tools. Employing this method can also yield results in tracking and verifying attacker backdoors over time, and it can assist in exposing relationships between backdoors and threat groups. Happy Hunting!
Nenhum comentário:
Postar um comentário