Sunday, September 11, 2011

Malware Analysis 3: int2d anti-debugging (Part I)

Learning Goals:
  1. Understand the general interrupt handling mechanism on the x86 platform.
  2. Understand the byte scission anti-debugging technique.
  3. Know how to use a binary debugger to patch an executable program.
Applicable to:
  1. Computer Architecture
  2. Operating Systems
  3. Principles of Programming Languages

Challenge of the Day:
  1. Analyze the code between 0xaaaa and 0xaaaa. What is its purpose?

1. Introduction

  To prolong the life of a piece of malware, its authors frequently use anti-debugging techniques to delay the analysis performed by security experts. This lesson presents "int 2d", one of the anti-debugging techniques employed by Max++. Bonfa gives a brief introduction to this technique in [1]. Our analysis complements [1] and presents an in-depth analysis of the debugger vulnerabilities involved.

  The purpose of anti-debugging is to hinder reverse engineering. There are two general approaches: (1) detect the existence of a debugger and behave differently when one is attached to the current process; (2) disrupt or crash the debugger. Approach (1) is the most frequently applied (see the excellent survey in [2]). Approach (2), which targets and attacks the debugger itself, is rare; we will see several examples in Max++ later. Today, we concentrate on Approach (1).

  As Shields points out in [2], there are many ways to detect a debugger. For example, an anti-debugging program can call system library functions such as "IsDebuggerPresent()", or examine the Thread Information Block (TIB/TEB) data structure of the operating system. A debugger can easily evade these techniques by deliberately masking the return value or the contents of the data structure.

  The instruction we are trying to analyze is the "INT 2D" instruction located at 0x00413BD5 (as shown in Figure 1). By single-stepping the malware, you might notice that the program's entry point is 0x00413BC8. After the execution of the first 8 instructions, right before the "INT 2D" instruction, the value of EAX is 0x1. This is an important fact you should remember in the later analysis.


Figure 1. Snapshot of Max++ Entry Point

2. Background Information

  Now let us watch the behavior of the Immunity Debugger (IMM). When we step over (using F8) the instruction "INT 2D" at 0x413BD5, we expect to stop at the next instruction, "RETN" (0x00413BD7). However, we do not: the new EIP value (i.e., the location of the next instruction to be executed) is 0x00413A38! Now the big question: is the behavior of IMM correct (i.e., is it exactly the same as the normal execution of Max++ without a debugger attached)?

  We need to read some background information on "INT 2D". Please take an hour to read the following related articles carefully. (Simply search for "int 2d" and ignore the other parts.)

  1. Giuseppe Bonfa, "Step-by-Step Reverse Engineering Malware: ZeroAccess / Max++ / Smiscer Crimeware Rootkit", Available at http://resources.infosecinstitute.com/step-by-step-tutorial-on-reverse-engineering-malware-the-zeroaccessmaxsmiscer-crimeware-rootkit/
  2. Tyler Shields, "Anti-Debugging - A Developer's View", Available at http://www.shell-storm.org/papers/files/764.pdf
  3. P. Ferrie, "Anti-Unpacker Tricks - Part Three", Virus Bulletin Feb 2009. Available at http://pferrie.tripod.com/papers/unpackers23.pdf, Retrieved 09/07/2011.
Let's summarize the conclusions of the above related work:
  1. Bonfa in [1] points out that the "int 2d" instruction triggers an interrupt (exception). When a debugger is attached, the debugger handles the exception; when no debugger is attached, the program (Max++) sees the exception itself. Executing "int 2d" causes a byte scission (the byte immediately following "int 2d" is skipped). However, [1] does not explain why this byte scission happens. It does give a solution: use the StrongOD plug-in for OllyDbg to execute "int 2d" correctly. We could not repeat the success of StrongOD on IMM; readers are nevertheless encouraged to try it on OllyDbg.
  2. Shields in [2] gives a high-level language example of the int 2d anti-debugging trick. The example is adapted from Section III.A of [2] (the int 3 example). It explains how the malware can "see" the debugger using a try-catch structure. When a debugger IS attached, the "try-catch" cannot capture the exception (because the debugger has already handled it). When no debugger is attached, the "try-catch" structure captures the "int 3 (or 2d)" exception (and thus sets a flag indicating that no debugger is attached).
  3. Ferrie in [3] explains why there is a byte scission in program execution and gives an excellent example in Section 1.3 of [3]. Listing 1 reproduces it; Listing 2 in Section 3.3 adds our comments for each instruction. This example corresponds to the high-level language example in [2], but it works at the assembly level and relies on an OS facility for exception handling called "SEH" (Structured Exception Handling). We will come back to this example and explain its details after introducing SEH in Section 3.
               ----------------------------------------------------------------------------------
      1      xor eax, eax           
      2      push offset l1         
      3      push d fs:[eax]
      4      mov fs:[eax], esp
      5      int 2dh                
      6      inc eax                
      7      je being_debugged      
      8          ...
      9  l1: xor al, al             
      10     ret                     
                 ----------------------------------------------------------------------------------
          Listing 1. The int 2dh example from P. Ferrie, "Anti-Unpacker Tricks - Part Three", VB2009

    3. Structured Exception Handling

    3.1 Interrupt and Exceptions

      When a program executes an instruction such as "int 2d", it raises an exception and triggers the whole interrupt handling procedure. It is worth understanding the technical details of this procedure completely. We recommend the Intel IA32 Manual [5] (Chapter 6: interrupt and exception overview). Some important facts are listed below:
    1. Interrupts are caused by hardware signals (e.g., I/O completion signals) or by executing INT xx instructions. Hardware interrupts arrive at unpredictable times, whereas an INT instruction raises its interrupt exactly where it executes.
    2. Exceptions occur when the CPU detects an error while executing an instruction.
    3. When an interrupt/exception occurs, normal execution is interrupted and the CPU jumps to the interrupt handler (a piece of code that handles the interrupt/exception). When the interrupt handler completes, normal execution resumes. Interrupt handlers are loaded by the OS during system boot, and an interrupt vector table (also called the interrupt descriptor table, IDT) defines which handler deals with which interrupt.
    4. In general there are the following kinds of interrupts/exceptions: (1) software-generated exceptions (INT 3 and other INT n instructions; note the discussion of "not pushing error code into stack" for INT n instructions); (2) machine-check interrupts (not interesting to us at this point); (3) faults, i.e., exceptions that can be corrected: when execution resumes, the instruction that triggered the exception is executed again; (4) traps, which differ from faults in that execution resumes at the instruction immediately after the trapping one; (5) aborts (severe errors, not interesting to us at this point). If you look at Table 6-1, the divide-by-0 exception and the protection error are faults, and INT 3 (software breakpoint) is a trap. Section 6.6 gives a clear idea of the difference between a fault and a trap.
    5. When an interrupt/exception happens, the CPU pushes the following information onto the stack (the details vary with the type of interrupt/exception): EIP, CS, the flags register, and an ERROR CODE. It then looks up the entry address of the interrupt handler in the IDT and jumps to it. Note that the saved EIP/CS (the return address) depends on whether the event is a fault or a trap! The interrupt handler then takes over, and when execution resumes it uses the saved EIP/CS.
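
    The difference between the saved return address of a fault and that of a trap can be sketched in a few lines of Python. This is only a model of the rule stated above, not CPU or OS code; the Instruction class, the saved_eip function, and the example address 0x1000 are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    address: int  # address of the first byte of the instruction
    length: int   # encoded length in bytes


def saved_eip(instr: Instruction, kind: str) -> int:
    """Return the EIP value saved on the stack for a fault or a trap."""
    if kind == "fault":
        # A fault resumes by re-executing the instruction that caused it,
        # so the saved EIP points at that instruction.
        return instr.address
    if kind == "trap":
        # A trap resumes at the instruction that follows the trapping one.
        return instr.address + instr.length
    raise ValueError(f"unknown exception kind: {kind}")


# Hypothetical 2-byte instruction located at 0x1000.
instr = Instruction(address=0x1000, length=2)
print(hex(saved_eip(instr, "fault")))  # 0x1000
print(hex(saved_eip(instr, "trap")))   # 0x1002
```

    The same arithmetic matters later for "int 2d": which of the two addresses is used on resume decides whether a byte of the program is skipped.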


    3.2 Structured Exception Handling

       Unlike the Intel IA32 Manual, Microsoft Win32 encapsulates the details of interrupt handling. An MSDN article [6] provides an overview. In the Win32 portable interrupt handling service, all hardware signals (non-reproducible and asynchronous) are treated as "interrupts", and all other reproducible events (including faults, traps, and INT xx instructions) are treated as exceptions. All exceptions are handled using a mechanism called Structured Exception Handling (SEH) [this includes the case of int 2dh!]. M. Pietrek provides an excellent article [4] in the Microsoft Systems Journal, which reveals the internals of SEH. We recommend that you read [4] thoroughly before proceeding to the discussion below.

      Figure 2 displays the general procedure for handling an exception. When a program generates an error (e.g., a divide-by-0 error), the CPU raises an exception. By looking at the IDT (interrupt descriptor table), the CPU retrieves the entry address of the interrupt service routine (ISR). In most cases, the Win32 ISR calls KiDispatchException (we will come back to this function later). The ISR then looks for user-defined exception handlers until one handles the error successfully. There are several interesting points here:
    1. The ISR needs to find a user-defined handler (e.g., the catch clause in the program). Where does it find it? The memory word at FS:[0] contains the address. Here FS, like CS, DS, and SS, is one of the segment registers of an x86 CPU. In Win32, the FS register always points to a per-thread operating system data structure called the TIB (Thread Information Block). The TIB records important system information (such as the stack top, the last error, and the process ID) about the thread currently being executed. The first memory word of the TIB is the address of the exception handler record, which contains the handler's entry address. Thus, starting from FS:[0] (meaning the word at offset 0 from the segment base of FS), the ISR can invoke the user-defined handlers. For more information on the TIB, see [8].
    2. Notice that there is a CHAIN OF HANDLERS! This is natural because try-catch statements can be nested. In addition, if the error is not handled by the user program, the system provides a default handler, which terminates the user application and pops up a Windows error dialog saying something like "Program error at 0xaabbcc, debug or terminate it?". Where is this chain of handlers stored? On the stack of the user program. Each element of the chain is an instance of the _EXCEPTION_REGISTRATION data structure. Read [4] for more details! To complete the story, the _EXCEPTION_REGISTRATION struct from [4] is shown below. Here "dd" stands for "double word" (a 32-bit memory word). The "prev" field points to the previous exception registration record, and "handler" is the entry address of the handler.

                      
    _EXCEPTION_REGISTRATION struc
    prev    dd      ?
    handler dd      ?
    _EXCEPTION_REGISTRATION ends


    3. How does the ISR know when to stop? When a user-defined handler returns 0 (ExceptionContinueExecution), the ISR can resume the user process. When a handler returns 1 (ExceptionContinueSearch), the ISR has to move on to the next handler in the chain. ExceptionContinueExecution is defined as part of EXCEPTION_DISPOSITION in EXCPT.h (the source file is easy to find with a web search).
    Figure 2. General Procedure of Handling an Exception
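
    The chain walk described in points 1 to 3 can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not Windows code: a real SEH handler receives several arguments, whereas the hypothetical handlers here take only an exception code, and the names ExceptionRegistration, dispatch, inner_handler and outer_handler are made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Disposition values as quoted in the text (EXCEPTION_DISPOSITION in EXCPT.h).
EXCEPTION_CONTINUE_EXECUTION = 0  # handled: resume the interrupted thread
EXCEPTION_CONTINUE_SEARCH = 1     # declined: try the next record in the chain


@dataclass
class ExceptionRegistration:
    """Python stand-in for one _EXCEPTION_REGISTRATION record."""
    prev: Optional["ExceptionRegistration"]  # previously installed record
    handler: Callable[[int], int]            # exception code -> disposition


def dispatch(head: Optional[ExceptionRegistration], code: int) -> Tuple[bool, List[str]]:
    """Walk the chain starting at the record that FS:[0] points to.

    Returns (handled, names of the handlers tried, in order).
    """
    tried: List[str] = []
    record = head
    while record is not None:
        tried.append(record.handler.__name__)
        if record.handler(code) == EXCEPTION_CONTINUE_EXECUTION:
            return True, tried
        record = record.prev
    return False, tried


def inner_handler(code: int) -> int:
    # Hypothetical handler that only accepts integer divide-by-zero.
    return EXCEPTION_CONTINUE_EXECUTION if code == 0xC0000094 else EXCEPTION_CONTINUE_SEARCH


def outer_handler(code: int) -> int:
    # Hypothetical handler that only accepts breakpoint exceptions.
    return EXCEPTION_CONTINUE_EXECUTION if code == 0x80000003 else EXCEPTION_CONTINUE_SEARCH


outer = ExceptionRegistration(prev=None, handler=outer_handler)
inner = ExceptionRegistration(prev=outer, handler=inner_handler)  # newest record

print(dispatch(inner, 0x80000003))  # (True, ['inner_handler', 'outer_handler'])
```

    In this model an exhausted chain simply reports "not handled"; on a real system the default handler installed by the OS sits at the end of the chain and terminates the process.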

    3.3 Revisit of Ferrie's Example [3]

      With the information of 3.2, we are now able to completely understand the details of Ferrie's example. Some important points are listed below:
    1. Instructions 2 to 4 build a new _EXCEPTION_REGISTRATION record. Instruction 2 sets up the handler entry address, instruction 3 sets the "prev" link, and instruction 4 makes FS:[0] point to the new record.
    2. Instruction 9 sets the AL register to 0, which in effect returns 0 (ExceptionContinueExecution). This informs the ISR that the error has been handled and that there is no need to look for other handlers. The ISR then resumes normal execution (either the old instruction is re-executed or execution starts from the next instruction, depending on whether the event is a fault or a trap; see Chapter 6 of the Intel IA32 manual).

           ----------------------------------------------------------------------------------
          1      xor eax, eax           ; EAX = 0 (this also sets the zero flag)
          2      push offset l1         ; push the entry address of the new handler onto the stack
          3      push d fs:[eax]        ; push the old record address (the "prev" link) onto the stack
          4      mov fs:[eax], esp      ; make fs:[0] point to the new _EXCEPTION_REGISTRATION record
          5      int 2dh                ; raise the exception -> the handler at l1 gets invoked
          6      inc eax                ; EAX = 1; skipped when a debugger is attached
          7      je being_debugged      ; taken if the zero flag is still set (EAX = 0): a debugger is there
          8          ...
          9  l1: xor al, al             ; handler: set AL = 0 (this is to return 0)
          10     ret
              ----------------------------------------------------------------------------------
              Listing 2. Ferrie's Example with Comments, from "Anti-Unpacker Tricks - Part Three", VB2009
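
    To make the control flow of Listing 2 concrete, here is a small Python model of instructions 1, 5, 6 and 7. It is a sketch under assumptions taken from Ferrie's description in [3]: without a debugger, execution resumes at the exception address (the "inc eax" byte); with a debugger that consumes the exception, it resumes one byte later. The function name ferrie_check and the relative offsets are hypothetical, and the model assumes that the flags set by "xor eax, eax" survive the exception.

```python
def ferrie_check(debugger_consumes_exception: bool) -> bool:
    """Return True if the modeled snippet jumps to being_debugged."""
    # Instruction 1: xor eax, eax -> EAX = 0 and the zero flag is set.
    eax = 0
    zero_flag = True

    # Instruction 5: int 2dh. Offsets are relative to the byte that follows
    # the two-byte 'CD 2D' opcode, which is where the one-byte 'inc eax' sits.
    exception_address = 0                 # points at 'inc eax'
    eip_register = exception_address + 1  # one byte later, per [3]

    # Assumption from [3]: no debugger -> resume at the exception address;
    # debugger consumed the exception -> resume at the EIP register value.
    resume_at = eip_register if debugger_consumes_exception else exception_address

    # Instruction 6: 'inc eax' runs only if execution resumes exactly on it.
    if resume_at == exception_address:
        eax += 1
        zero_flag = (eax == 0)

    # Instruction 7: 'je being_debugged' is taken while the zero flag is set.
    return zero_flag


print(ferrie_check(debugger_consumes_exception=False))  # False
print(ferrie_check(debugger_consumes_exception=True))   # True
```

    The single skipped byte is the whole trick: the program never asks the OS whether a debugger exists, it only observes whether "inc eax" was executed.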


    3.4 Int 2D Service

      We now examine some important facts related to INT 2d. Almeida provides an excellent article about the INT 2d service and kernel debugging. We recommend a thorough reading of this article [7].

      INT 2d is the interface through which the Win32 kernel provides kernel debugging services to user-level debuggers and remote debuggers such as IMM, Kd and WinDbg. User-level debuggers usually invoke the service through:

     NTSTATUS DebugService(UCHAR ServiceClass, PVOID arg1, PVOID arg2)

    According to [7], there are four service classes (1: debug printing, 2: interactive prompt, 3: load image, 4: unload image). A call to DebugService essentially translates to the following register setup followed by the interrupt:

      EAX <- ServiceClass
      ECX <- Arg1
      EDX <- Arg2
      INT 2d

    The interrupt causes the CPU to jump to KiDispatchException, which later calls KdpTrap (provided that debug mode was enabled in boot.ini when Windows XP booted). KdpTrap takes an EXCEPTION_RECORD constructed by KiDispatchException. The EXCEPTION_RECORD contains the following information: ExceptionCode: BREAKPOINT, arg0: EAX, arg1: ECX, and arg2: EDX. Note that according to [7] (Section "Notifying Debugging Events"), INT 3 interrupts (software breakpoints) are also handled by KdpTrap, except that arg0 is 0.
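
    The mapping from registers to the exception record can be pictured with the following Python sketch. It only mirrors the fields named above and is not the real EXCEPTION_RECORD layout; the class ExceptionRecord and the helper functions are hypothetical. The EAX value of 1 is the one observed at the Max++ entry point; the ECX and EDX values are not given in the text, so the zeros below are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple

EXCEPTION_BREAKPOINT = 0x80000003  # exception code as quoted from [3]

# Service classes as listed in the text, following [7].
SERVICE_CLASSES = {
    1: "debug printing",
    2: "interactive prompt",
    3: "load image",
    4: "unload image",
}


@dataclass
class ExceptionRecord:
    """Simplified stand-in holding only the fields named in the text."""
    exception_code: int
    args: Tuple[int, int, int]  # (arg0, arg1, arg2)


def record_for_int2d(eax: int, ecx: int, edx: int) -> ExceptionRecord:
    # INT 2d: arg0, arg1 and arg2 are taken from EAX, ECX and EDX.
    return ExceptionRecord(EXCEPTION_BREAKPOINT, (eax, ecx, edx))


def service_class_name(record: ExceptionRecord) -> str:
    # arg0 carries the service class placed in EAX before the interrupt.
    return SERVICE_CLASSES.get(record.args[0], "not a listed service class")


# EAX = 1 right before the INT 2D in Max++; ECX and EDX are placeholders.
rec = record_for_int2d(eax=1, ecx=0, edx=0)
print(hex(rec.exception_code), service_class_name(rec))  # 0x80000003 debug printing
```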

     KiDispatchException deserves some special attention. Nebbett in his book [9] (p. 439; sample chapters can sometimes be viewed on Google Books) lists the implementation code of KiDispatchException (in Example C.1). You should read the code in [9]; it contains several interesting points. First, let's concentrate on the case where the previous mode of the program is kernel mode (i.e., it is kernel code that invokes the interrupt):
    1. At line 4 of the function body, KiDispatchException decrements EIP by 1 if the exception code is STATUS_BREAKPOINT (which is the case when int 2dh or int 3 is invoked). Note that in [3], P. Ferrie gives an excellent explanation of why the code decrements EIP by 1!
    2. It calls KiDebugRoutine several times. KiDebugRoutine is a function pointer. It points to KdpTrap if debugging is enabled in boot.ini, and otherwise to KdpTrapStub (which does nothing).
    3. KdpTrap/KiDebugRoutine is invoked first; then the user handler is invoked (given that frame searching is enabled); and then KiDebugRoutine is invoked a second time if the user handler did not finish the job.
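
    The order of calls in point 3 can be sketched as follows. This is a rough model of the sequence described above, not the code from [9]; the function dispatch_kernel_mode, its return strings, and the callable signatures are assumptions made for illustration, and the frame-searching switch is left out for brevity.

```python
from typing import Callable, Iterable


def kdp_trap_stub(second_chance: bool) -> bool:
    """Stand-in for KdpTrapStub: it does nothing and never handles anything."""
    return False


def dispatch_kernel_mode(
    debug_routine: Callable[[bool], bool],
    frame_handlers: Iterable[Callable[[], bool]],
) -> str:
    """Model the call order: debug routine, frame handlers, debug routine again."""
    # First call of the routine that KiDebugRoutine points to.
    if debug_routine(False):
        return "debug routine (first call)"
    # Then the frame-based handlers get their turn.
    for handler in frame_handlers:
        if handler():
            return "frame-based handler"
    # Nobody finished the job: the debug routine is called a second time.
    if debug_routine(True):
        return "debug routine (second call)"
    return "unhandled"


# Debugging disabled: only the frame-based handler can handle the exception.
print(dispatch_kernel_mode(kdp_trap_stub, [lambda: True]))
# A hypothetical debug routine that only reacts on its second call.
print(dispatch_kernel_mode(lambda second_chance: second_chance, [lambda: False]))
```

    With the stub installed, the first and last steps are no-ops, which is why the boot.ini debug setting can change what a user-level debugger observes.
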
    For "user mode" (i.e., it is the user program that invokes int 2d):
    1. It first checks whether no user debugger is attached (by checking DEBUG_PORT). If that is the case, the kernel debugging routine (KiDebugRoutine) is called first to handle the exception.
    2. Then there is a complex set of nested if-else statements that use DbgkForwardException to forward the exception to the user debugger. (Unfortunately, these functions are not sufficiently documented.) Our guess is that DbgkForwardException invokes the user debugger to handle the exception, and that KiUserExceptionDispatcher is called to search for frame-based user handlers if the user debugger could not handle it.
    3. If the Search Frames attribute is false, the above (1 and 2) are not tried at all. The exception is forwarded directly to the user debugger (which gets two tries), and if it is still not processed, the user process is terminated.
    Now let's look back at Ferrie's article [3] again. The following description is complex, and we will verify it in our later experiments (in Part II). Here the "exception address" is the "EIP value of the context" (which is to be copied back to the user process), and the "EIP register value" is the real EIP value of the user process when the exception occurs.

    "After an exception has occurred, and in the absence of a
    debugger, execution will resume by default at the exception
    address.
    The assumption is that the cause of the exception
    will have been corrected, and the faulting instruction will
    now succeed. In the presence of a debugger, and if the
    debugger consumed the exception, execution will resume at
    the current EIP register value."

    Even more important is the following description from [3]. What it describes should happen even before KiDispatchException is called.

    "
    However, when interrupt 0x2D is executed, Windows
    uses the current EIP register value as the exception
    address and increases the EIP register value by one.

    Finally, it issues an EXCEPTION_BREAKPOINT
    (0x80000003) exception. Thus, if the ‘CD 2D’ opcode
    (‘INT 0x2D’ instruction) is used, the exception address
    points to the instruction immediately following the
    interrupt 0x2D instruction, as for other interrupts, and the
    EIP register value points to a memory location that is one
    byte after that.
    "

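    Applied to the Max++ instruction at 0x00413BD5, the two quoted passages give the following arithmetic. The Python sketch below only restates Ferrie's description; the function names are hypothetical, and it does not claim to explain the 0x00413A38 value observed in IMM, which is left to the experiments in Part II.

```python
INT_2D_OPCODE = bytes([0xCD, 0x2D])  # the 'CD 2D' encoding quoted from [3]


def int2d_addresses(instruction_address: int) -> tuple:
    """Return (exception_address, eip_register) after 'int 2d', following [3]."""
    # The exception address points to the instruction right after 'int 2d'.
    exception_address = instruction_address + len(INT_2D_OPCODE)
    # Windows additionally increases the EIP register value by one.
    eip_register = exception_address + 1
    return exception_address, eip_register


def resume_address(instruction_address: int, debugger_consumed: bool) -> int:
    """Pick the resume address according to the first quoted passage."""
    exception_address, eip_register = int2d_addresses(instruction_address)
    return eip_register if debugger_consumed else exception_address


exc, eip = int2d_addresses(0x00413BD5)
print(hex(exc), hex(eip))  # 0x413bd7 0x413bd8
```

    Under these assumptions, a normal run would resume at the RETN at 0x00413BD7, while a debugger that consumes the exception would resume one byte further on, at 0x00413BD8.
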
    According to [3], this behavior of Win32 exception handling can cause byte scission. When a user debugger (e.g., OllyDbg) decides to resume execution using the EIP register value, the program behaves differently from a normal execution. We will verify this argument in our later experiments. In summary, we want to consider the following questions in our experiments:

    1. How does the debug mode (enabled in boot.ini) affect the user debugger behavior?
    2. How would user defined handlers affect the behavior?
    3. In summary, is the behavior of IMM correct regarding the code at 0x413BD5?
    We will explore them in our experiments in Part II of this series.

    References

    [1] Giuseppe Bonfa, "Step-by-Step Reverse Engineering Malware: ZeroAccess / Max++ / Smiscer Crimeware Rootkit", Available at http://resources.infosecinstitute.com/step-by-step-tutorial-on-reverse-engineering-malware-the-zeroaccessmaxsmiscer-crimeware-rootkit/

    [2] Tyler Shields, "Anti-Debugging - A Developer's View", Available at http://www.shell-storm.org/papers/files/764.pdf

    [3] P. Ferrie, "Anti-Unpacker Tricks - Part Three", Virus Bulletin Feb 2009. Available at http://pferrie.tripod.com/papers/unpackers23.pdf, Retrieved 09/07/2011.

    [4] M. Pietrek, "A Crash Course on the Depths of Win32 Structured Exception Handling," Microsoft Systems Journal, 1997/01. Available at http://www.microsoft.com/msj/0197/exception/exception.aspx.

    [5] Intel, "Intel 64 and IA-32 Architectures Software Developer's Manual (5 Volumes)", Available at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

    [6] Microsoft, "Lesson 8 - Interrupt and Exception Handling", MSDNAA. Available at
    http://technet.microsoft.com/en-us/library/cc767887.aspx

    [7] A. Almeida, "Kernel and Remote Debuggers", Developer Fusion. Available at
    http://www.developerfusion.com/article/84367/kernel-and-remote-debuggers/

    [8] Wikipedia, "Win32 Thread Information Block", Available at http://en.wikipedia.org/wiki/Win32_Thread_Information_Block.

    [9] G. Nebbett, "Windows NT/2000 Native API Reference", pp. 439-441, ISBN: 1578701996.