
Tuesday, November 18, 2014

Tracking Malware with Import Hashing

By Mandiant on January 23, 2014

Tracking threat groups over time is an important tool to help defenders hunt for evil on networks and conduct effective incident response. Knowing how certain groups operate makes for an efficient investigation and helps identify threat actor activity easily.

At Mandiant, we use several methods to help identify and correlate threat group activity. A critical piece of our work involves tracking various operational items, such as attacker infrastructure and email addresses. In addition, we track the specific backdoors each threat group uses, which is one of the key ways to follow a group's activities over time. For example, some groups may favor the SOGU backdoor, while others use HOMEUNIX.

One unique way that Mandiant tracks specific threat groups' backdoors is to track portable executable (PE) imports. Imports are the functions that a piece of software (in this case, the backdoor) calls from other files (typically various DLLs that provide functionality to the Windows operating system). To track these imports, Mandiant creates a hash based on the library/API names and their specific order within the executable. We refer to this convention as an "imphash" (for "import hash"). Because of the way a PE's import table is generated (and therefore how its imphash is calculated), we can use the imphash value to identify related malware samples. We can also use it to search for new, similar samples that the same threat group may have created and used.

Though Mandiant has been leveraging this technique internally for more than a year, we are not the first to discuss it publicly. An imphash is a powerful way to identify related malware because the value itself should be relatively unique. This is because the compiler's linker generates and builds the Import Address Table (IAT) based on the specific order of functions within the source file. Take the following source code as an example:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <wininet.h>

#pragma comment(lib, "ws2_32.lib")
#pragma comment(lib, "wininet.lib")

int makeMutexA()
{
 CreateMutexA(NULL, FALSE, "TestMutex");
 return 0;
}

int makeMutexW()
{
 CreateMutexW(NULL, FALSE, L"TestMutex");
 return 0;
}

int makeUserAgent()
{
 HANDLE hInet=0, hConn=0;
 char buf[sizeof(struct hostent)] = {0};

 hInet = InternetOpenA("User-Agent: (Windows; 5.1)", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
 hConn = InternetConnectA(hInet, "www.google.com", 443, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);
 WSAAsyncGetHostByName(NULL, 3, "www.yahoo.com", buf, sizeof(struct hostent));
 return 0;
}

int main(int argc, char *argv[])
{

 makeMutexA();
 makeMutexW();
 makeUserAgent();

 return 0;
}
When that source file is compiled, the resulting import table looks as follows:
ws2_32.dll
        ws2_32.dll.WSAAsyncGetHostByName
wininet.dll
        wininet.dll.InternetOpenA
        wininet.dll.InternetConnectA
kernel32.dll
        kernel32.dll.InterlockedIncrement
        kernel32.dll.IsProcessorFeaturePresent
        kernel32.dll.GetStringTypeW
        kernel32.dll.MultiByteToWideChar
        kernel32.dll.LCMapStringW
        kernel32.dll.CreateMutexA
        kernel32.dll.CreateMutexW
        kernel32.dll.GetCommandLineA
        kernel32.dll.HeapSetInformation
        kernel32.dll.TerminateProcess
Imphash: 0c6803c4e922103c4dca5963aad36ddf
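We abbreviated the table to save space. The APIs referenced in the source code (highlighted in red and bold in the original post) are WSAAsyncGetHostByName, InternetOpenA, InternetConnectA, CreateMutexA, and CreateMutexW. Note the order in which they appear in the table, and compare that to the order in which they appear in the source file.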
If an author were to change the order of the functions and/or the order of the API calls in the source code, this would in turn affect the compiled import table. Take the previous example, modified:
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <wininet.h>

#pragma comment(lib, "ws2_32.lib")
#pragma comment(lib, "wininet.lib")

int makeMutexW()
{
 CreateMutexW(NULL, FALSE, L"TestMutex");
 return 0;
}

int makeMutexA()
{
 CreateMutexA(NULL, FALSE, "TestMutex");
 return 0;
}

int makeUserAgent()
{
 HANDLE hInet=0, hConn=0;
 char buf[sizeof(struct hostent)] = {0};

 hConn = InternetConnectA(hInet, "www.google.com", 443, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);
 hInet = InternetOpenA("User-Agent: (Windows; 5.1)", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
 WSAAsyncGetHostByName(NULL, 3, "www.yahoo.com", buf, sizeof(struct hostent));
 return 0;
}

int main(int argc, char *argv[])
{

 makeMutexA();
 makeMutexW();
 makeUserAgent();

 return 0;
}
In this example, we have reversed the order of makeMutexW and makeMutexA, and of InternetConnectA and InternetOpenA. (Note that this would be an invalid sequence of API calls, but we use it here to illustrate the point.) Below is the import table generated from this modified source code (again abbreviated); note the changes when compared to the original IAT, above, as well as the different imphash value:
ws2_32.dll
        ws2_32.dll.WSAAsyncGetHostByName
wininet.dll
        wininet.dll.InternetConnectA
        wininet.dll.InternetOpenA
kernel32.dll
        kernel32.dll.InterlockedIncrement
        kernel32.dll.IsProcessorFeaturePresent
        kernel32.dll.GetStringTypeW
        kernel32.dll.MultiByteToWideChar
        kernel32.dll.LCMapStringW
        kernel32.dll.CreateMutexW
        kernel32.dll.CreateMutexA
        kernel32.dll.GetCommandLineA
        kernel32.dll.HeapSetInformation
        kernel32.dll.TerminateProcess
Imphash: b8bb385806b89680e13fc0cf24f4431e
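The order sensitivity can be illustrated without a compiler. The Python sketch below hashes two abbreviated import lists that contain the same APIs in a different order. It assumes a lowercase, comma-joined, MD5 convention along the lines of the one listed later in this post (the comma separator is an assumption), and the helper name `order_sensitive_hash` is hypothetical. Because the lists are abbreviated, the digests will not equal the imphash values shown above.

```python
import hashlib

def order_sensitive_hash(imports):
    """MD5 over 'dll.function' entries, lowercased and comma-joined, in list order."""
    joined = ",".join(entry.lower() for entry in imports)
    return hashlib.md5(joined.encode("ascii")).hexdigest()

# Abbreviated import order from the original source file.
original = [
    "ws2_32.WSAAsyncGetHostByName",
    "wininet.InternetOpenA",
    "wininet.InternetConnectA",
    "kernel32.CreateMutexA",
    "kernel32.CreateMutexW",
]

# Same APIs, with the two pairs swapped as in the modified source file.
modified = [
    "ws2_32.WSAAsyncGetHostByName",
    "wininet.InternetConnectA",
    "wininet.InternetOpenA",
    "kernel32.CreateMutexW",
    "kernel32.CreateMutexA",
]

same_set = sorted(original) == sorted(modified)
same_hash = order_sensitive_hash(original) == order_sensitive_hash(modified)
print(same_set)   # True: both lists import exactly the same functions
print(same_hash)  # False: the ordering alone changes the digest
```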
The final example shows how the ordering of included files at compile time will affect the resulting IAT (and thus the resulting imphash value). We’ll expand on our original example by adding files imphash1.c and imphash2.c, to be included with our original source file imphash.c:
-- imphash1.c --
#include <windows.h>

int makeNamedPipeA()
{
 HANDLE ph = CreateNamedPipeA("\\\\.\\pipe\\test_pipe", PIPE_ACCESS_DUPLEX,
          PIPE_TYPE_MESSAGE, 1, 128,
          64, 200, NULL);

 return 0;
}
-- imphash2.c --
#include <windows.h>

int makeNamedPipeW()
{
 HANDLE ph2 = CreateNamedPipeW(L"\\\\.\\pipe\\test_pipeW", PIPE_ACCESS_DUPLEX,
          PIPE_TYPE_MESSAGE, 1, 128,
          64, 200, NULL);

 return 0;
}
-- imphash.c --
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <wininet.h>
#include "imphash1.h"
#include "imphash2.h"

#pragma comment(lib, "ws2_32.lib")
#pragma comment(lib, "wininet.lib")

int makeMutexW()
{
 CreateMutexW(NULL, FALSE, L"TestMutex");
 return 0;
}

int makeMutexA()
{
 CreateMutexA(NULL, FALSE, "TestMutex");
 return 0;
}

int makeUserAgent()
{
 HANDLE hInet = 0, hConn = 0;
 char buf[sizeof(struct hostent)] = {0};

 hConn = InternetConnectA(hInet, "www.google.com", 443, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);
 hInet = InternetOpenA("User-Agent: (Windows; 5.1)", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
 WSAAsyncGetHostByName(NULL, 3, "www.yahoo.com", buf, sizeof(struct hostent));
 return 0;
}

int main(int argc, char *argv[])
{

 makeMutexA();
 makeMutexW();
 makeUserAgent();
 makeNamedPipeA();
 makeNamedPipeW();

 return 0;
}
We used the following command to build the EXE:
cl imphash.c imphash1.c imphash2.c /W3 /WX /link
The resulting IAT is:
ws2_32.dll
        ws2_32.dll.WSAAsyncGetHostByName
wininet.dll
        wininet.dll.InternetConnectA
        wininet.dll.InternetOpenA
kernel32.dll
        kernel32.dll.TlsFree
        kernel32.dll.IsProcessorFeaturePresent
        kernel32.dll.GetStringTypeW
        kernel32.dll.MultiByteToWideChar
        kernel32.dll.LCMapStringW
        kernel32.dll.CreateMutexW
        kernel32.dll.CreateMutexA
        kernel32.dll.CreateNamedPipeA
        kernel32.dll.CreateNamedPipeW
        kernel32.dll.GetCommandLineA
        kernel32.dll.HeapSetInformation
        kernel32.dll.TerminateProcess
Imphash: 9129bdbc18cfd1aba498c94e809567d5
Changing the order of includes for imphash1.h and imphash2.h within the source file imphash.c will have no effect on the ordering of the IAT. However, changing the order of the files on the command line and recompiling will affect the IAT; note the re-ordering of CreateNamedPipeW and CreateNamedPipeA:
cl imphash.c imphash2.c imphash1.c /W3 /WX /link

ws2_32.dll
        ws2_32.dll.WSAAsyncGetHostByName
wininet.dll
        wininet.dll.InternetConnectA
        wininet.dll.InternetOpenA
kernel32.dll
        kernel32.dll.TlsFree
        kernel32.dll.IsProcessorFeaturePresent
        kernel32.dll.GetStringTypeW
        kernel32.dll.MultiByteToWideChar
        kernel32.dll.LCMapStringW
        kernel32.dll.CreateMutexW
        kernel32.dll.CreateMutexA
        kernel32.dll.CreateNamedPipeW
        kernel32.dll.CreateNamedPipeA
        kernel32.dll.GetCommandLineA
        kernel32.dll.HeapSetInformation
        kernel32.dll.TerminateProcess
Imphash: c259e28326b63577c31ee2c01b25d3fa
These examples show that both the ordering of functions within the original source code and the ordering of source files at compile time affect the resulting IAT, and therefore the resulting imphash value. Because source code is rarely organized the same way, two different binaries with exactly the same imports are highly likely to have different import hashes. Conversely, if two files have the same imphash value, they have the same IAT, which implies that the files were compiled from the same source code, and in the same manner.
For packed samples, simple tools or utilities (with few imports and, based on their simplicity, likely compiled in the same way), the imphash value may not be unique enough to be useful for attribution. In other words, it may be possible for two different threat actors to independently generate tools with the same imphash based on those factors.
However, for more complex and/or custom tools (like backdoors), where there are a sufficient number of imports present, the imphash should be relatively unique, and can therefore be used to identify code families that are structurally similar. While files with the same imphash are not guaranteed to originate from the same threat group (it’s possible, for example, that the files were generated by a common builder that is shared among groups), the files can at least be reasonably assumed to have a common origin and may eventually be attributable to a single threat group with additional corroborating information.
Employing this method has given us great success for verifying attacker backdoors over a period of time and demonstrating relationships between backdoors and their associated threat groups.
Mandiant has submitted a patch that enables the calculation of the imphash value for a given PE to Ero Carrera’s pefile (http://code.google.com/p/pefile/).
Example code:
import sys
import pefile
pe = pefile.PE(sys.argv[1])  # path to the PE file, given on the command line
print("Import Hash: %s" % pe.get_imphash())
Mandiant uses an imphash convention that requires that the ordinals for a given import be mapped to a specific function. We’ve added a lookup for a couple of DLLs that export functions commonly looked up by ordinal to pefile.
Mandiant’s imphash convention requires the following:
  • Resolving ordinals to function names when they appear
  • Converting both DLL names and function names to all lowercase
  • Removing the file extensions from imported module names
  • Building and storing the lowercased string <DLL name>.<function name> in an ordered list
  • Generating the MD5 hash of the ordered list
This convention is implemented in pefile.py version 1.2.10-139 starting at line 3618.
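The steps listed above can be sketched in plain Python, independent of pefile. This is a minimal illustration under stated assumptions, not the pefile implementation: the function name `imphash_sketch`, the tiny `ORDINAL_NAMES` table, the `ord<N>` fallback for unknown ordinals, the set of stripped extensions, and the comma used to join the list are all assumptions made for the example.

```python
import hashlib

# Hypothetical, deliberately tiny ordinal table for illustration only.
# A real lookup needs complete per-DLL tables.
ORDINAL_NAMES = {
    "ws2_32": {115: "WSAStartup", 116: "WSACleanup"},
}

def imphash_sketch(imports):
    """imports: ordered list of (dll_name, function_name_or_ordinal) pairs."""
    entries = []
    for dll, func in imports:
        lib = dll.lower()
        # Remove the file extension from the imported module name.
        stem, dot, ext = lib.rpartition(".")
        if dot and ext in ("dll", "ocx", "sys"):
            lib = stem
        # Resolve ordinals to function names when the table knows them;
        # otherwise fall back to a placeholder name (assumption).
        if isinstance(func, int):
            func = ORDINAL_NAMES.get(lib, {}).get(func, "ord%d" % func)
        # Store the lowercased "dll.function" string, preserving order.
        entries.append("%s.%s" % (lib, func.lower()))
    # MD5 over the ordered list, joined with commas (assumed separator).
    return hashlib.md5(",".join(entries).encode("ascii")).hexdigest()

by_name = [("WS2_32.dll", "WSAStartup"), ("KERNEL32.DLL", "CreateMutexA")]
by_ordinal = [("ws2_32.dll", 115), ("kernel32.dll", "CreateMutexA")]

# Case, extension, and import-by-ordinal differences are normalized away.
print(imphash_sketch(by_name) == imphash_sketch(by_ordinal))  # True
```

Normalizing before hashing means that cosmetic differences in how the same import is recorded do not change the digest, while the order of the entries still does.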
If imphash values serve as relatively unique identifiers for malware families (and potentially for specific threat groups), won’t discussing this technique alert attackers and cause them to change their methods? Attackers would need to modify source code (in a way that did not affect the functionality of the malware itself) or change the file order at compile time (assuming the source code is spread across multiple files). While attackers could write tools to modify the imphash, we don’t expect many attackers to care enough to do this.
We believe it is important to add imphash to the lexicon as a way to discuss malware samples at a higher level and to exchange information about attackers and threat groups. For example, incident responders can use imphash values to discuss malware without specifically disclosing which exact sample (specific MD5) is being discussed.
Consider a scenario where an attacker compiles 30 variants of its backdoor with different C2 locations and campaign IDs and deploys them to various companies. If a blog post comes out stating that a specific MD5 was identified as part of a campaign, then based on that MD5 the attacker immediately knows what infrastructure (such as C2 domains or associated IP addresses) is at stake and which campaign may be in jeopardy. However, if the malware was identified just by its imphash value, it is possible that the imphash is shared across all 30 of the attacker’s variants. The malware can still be identified and discussed within the security community, but the attacker doesn’t know which specific samples have been identified or which parts of their infrastructure are in jeopardy.
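The scenario can be mimicked with a toy model. In the sketch below, each "variant" is a stand-in for a compiled sample: an identical import list plus a different embedded configuration. The structure, the helper names `file_md5` and `import_hash`, and the `example.com` domains are all hypothetical; real samples are PE files, not Python dictionaries.

```python
import hashlib

# Identical import list shared by every variant (illustrative entries).
IMPORTS = [
    "wininet.internetopena",
    "wininet.internetconnecta",
    "kernel32.createmutexa",
]

def import_hash(variant):
    """Hash only the ordered import list, as an imphash-style value."""
    return hashlib.md5(",".join(variant["imports"]).encode("ascii")).hexdigest()

def file_md5(variant):
    """Hash the whole stand-in sample: imports plus embedded configuration."""
    blob = ",".join(variant["imports"]).encode("ascii") + variant["config"]
    return hashlib.md5(blob).hexdigest()

# 30 variants that differ only in C2 location and campaign ID.
variants = [
    {
        "imports": IMPORTS,
        "config": ("c2-%02d.example.com;campaign-%02d" % (i, i)).encode("ascii"),
    }
    for i in range(30)
]

file_hashes = {file_md5(v) for v in variants}
import_hashes = {import_hash(v) for v in variants}
print(len(file_hashes), len(import_hashes))  # 30 1
```

Publishing the single shared value identifies the family without revealing which of the 30 file hashes, and therefore which configuration, was actually recovered.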
To demonstrate the effectiveness of this analysis method, we’ve decided to share the imphash values of a few malware families from the Mandiant APT1 report:
Family Name   Import Hash                        Total Imports   Number of Matched Samples
GREENCAT      2c26ec4a570a502ed3e8484295581989   74              23
GREENCAT      b722c33458882a1ab65a13e99efe357e   74              18
GREENCAT      2d24325daea16e770eb82fa6774d70f1   113             13
GREENCAT      0d72b49ed68430225595cc1efb43ced9   100             13
STARSYPOUND   959711e93a68941639fd8b7fba3ca28f   62              31
COOKIEBAG     4cec0085b43f40b4743dc218c585f2ec   79              10
NEWSREELS     3b10d6b16f135c366fc8e88cba49bc6c   77              41
NEWSREELS     4f0aca83dfe82b02bbecce448ce8be00   80              10
TABMSGSQL     ee22b62aa3a63b7c17316d219d555891   102             9
WEBC2         a1a42f57ff30983efda08b68fedd3cfc   63              25
WEBC2         7276a74b59de5761801b35c672c9ccb4   52              13
We derived the malware families and corresponding imphash values above from the set of malware in the Mandiant APT1 report released in February 2013. Using the imphash method described above, we calculated imphash values over all the samples, and then counted the total number of samples that matched on each imphash. Using 356 total samples from the report, we were able to identify 11 imphash values that provided significant coverage of their respective families. Pivoting from these imphash values, we were able to identify additional malware samples that further analysis showed were part of the same malware families and attributable to the same threat group.
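The counting step described here amounts to grouping samples by imphash. The following sketch shows one way to do that in Python; the function name `count_by_imphash` and the demo data are hypothetical, and the hash function is passed in so the example runs without any PE files. With pefile installed, `imphash_of` could be something like `lambda path: pefile.PE(path).get_imphash()`.

```python
from collections import Counter

def count_by_imphash(samples, imphash_of):
    """Count how many samples share each imphash.

    samples:    iterable of sample identifiers (for example, file paths)
    imphash_of: callable returning the imphash string for one sample,
                or an empty value when no import hash is available
    """
    counts = Counter()
    for sample in samples:
        value = imphash_of(sample)
        if value:  # skip samples without a usable import hash
            counts[value] += 1
    # Most frequently shared imphash values first.
    return counts.most_common()

# Placeholder data: sample names and hash labels are made up for the demo.
demo_hashes = {
    "sample_a": "hash_x",
    "sample_b": "hash_x",
    "sample_c": "hash_y",
    "sample_d": "hash_x",
    "sample_e": "",  # e.g. a file with no imports
}

ranking = count_by_imphash(demo_hashes, demo_hashes.get)
print(ranking)  # [('hash_x', 3), ('hash_y', 1)]
```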

Imphash analysis, like any other method, has its limitations and should not be considered a single point of success. Just because two binaries have the same imphash value does not mean they belong to the same threat group, or even that they are part of the same malware family (though there is an increased likelihood that this is the case). Imphash analysis is a low-cost, efficient and valuable way to triage potential malware samples and expand discovery by identifying “interesting” samples that merit further analysis. The imphash value gives analysts another pivot point when conducting discovery on threat groups and their tools. Employing this method can also yield results in tracking and verifying attacker backdoors over time, and it can assist in exposing relationships between backdoors and threat groups. Happy Hunting!
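As a rough illustration of using imphash as a triage pivot, the sketch below checks sample imphash values against the values published in the table above. The function name `triage` and the sample names are hypothetical, and a match is only a lead for further analysis, not an attribution.

```python
# Imphash values and family names taken from the APT1 table in this post.
KNOWN_IMPHASHES = {
    "2c26ec4a570a502ed3e8484295581989": "GREENCAT",
    "b722c33458882a1ab65a13e99efe357e": "GREENCAT",
    "2d24325daea16e770eb82fa6774d70f1": "GREENCAT",
    "0d72b49ed68430225595cc1efb43ced9": "GREENCAT",
    "959711e93a68941639fd8b7fba3ca28f": "STARSYPOUND",
    "4cec0085b43f40b4743dc218c585f2ec": "COOKIEBAG",
    "3b10d6b16f135c366fc8e88cba49bc6c": "NEWSREELS",
    "4f0aca83dfe82b02bbecce448ce8be00": "NEWSREELS",
    "ee22b62aa3a63b7c17316d219d555891": "TABMSGSQL",
    "a1a42f57ff30983efda08b68fedd3cfc": "WEBC2",
    "7276a74b59de5761801b35c672c9ccb4": "WEBC2",
}

def triage(sample_hashes):
    """Return {sample name: family} for samples whose imphash is in the table.

    sample_hashes: dict mapping a sample name to its imphash string.
    """
    return {
        name: KNOWN_IMPHASHES[value.lower()]
        for name, value in sample_hashes.items()
        if value.lower() in KNOWN_IMPHASHES
    }

# Hypothetical inputs: the names are made up, and the all-zero hash is a placeholder.
candidates = {
    "unknown_1.exe": "959711E93A68941639FD8B7FBA3CA28F",
    "unknown_2.exe": "00000000000000000000000000000000",
}

print(triage(candidates))  # {'unknown_1.exe': 'STARSYPOUND'}
```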