Tuesday, January 8, 2013

About shellcodes

In this post we document a beginner's introduction to shellcode writing. We go from zero to a super simple shellcode using tools you will probably find already installed on any serious operating system. If you are looking for a more mature and polished way of generating shellcode, you should check out InlineEgg, MOSDEF or impurity first.


A shellcode is an opaque byte array that, once it is in memory and control is passed to it, executes a shell (or anything else).
Let's start with a program simulating a post-exploit situation: http://pastebin.com/Fus1b46S
This program should:
  1. Allocate executable memory.
  2. Read/load the content of a file (shellcode).
  3. Pass control to it.
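The pastebin program is not reproduced here. As a rough stand-in, this is a minimal Python sketch of the same three steps, assuming Linux and the standard mmap and ctypes modules; the names load_bytes, load_shellcode and as_function are made up for this illustration. Since the shellcode in this post is 32-bit i386 code using int $0x80, it would have to be run from a 32-bit Python build.

```python
import ctypes
import mmap
import sys


def load_bytes(code: bytes) -> mmap.mmap:
    """Steps 1 and 2: allocate executable memory and copy the shellcode into it."""
    if not code:
        raise ValueError("empty shellcode")
    # 1. Anonymous mapping that is readable, writable and executable.
    mem = mmap.mmap(-1, len(code),
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    # 2. Copy the opaque byte array into it.
    mem.write(code)
    return mem


def load_shellcode(path: str) -> mmap.mmap:
    """Read the shellcode bytes from a file and load them."""
    with open(path, "rb") as f:
        return load_bytes(f.read())


def as_function(mem: mmap.mmap):
    """Wrap the start of the mapping as a C function pointer, void (*)(void).

    The caller must keep `mem` alive for as long as the pointer is used.
    """
    address = ctypes.addressof(ctypes.c_char.from_buffer(mem))
    return ctypes.CFUNCTYPE(None)(address)


if __name__ == "__main__" and len(sys.argv) == 2:
    shellcode = load_shellcode(sys.argv[1])
    # 3. Pass control. The bytes must match this process's CPU mode and ABI.
    as_function(shellcode)()
```

An mmap is used instead of a plain bytes object because ordinary heap memory is normally mapped non-executable (NX), so jumping into it would crash.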
Now, let's build a shellcode: a piece of memory that, when control is passed to it, prints "YOU WIN!".
Consider the following hello world program in assembly:

    .section .data
    message:
        .ascii "YOU WIN!\n"
        len = . - message
    .section .text
    .globl _start
    _start:
        # write message to stdout
        movl $len, %edx              # LEN
        movl $message, %ecx          # BUFFER
        movl $1, %ebx                # FD
        movl $4, %eax                # WRITE
        int $0x80                    # SYSCALL
        # exit
        movl $0, %ebx                # RETVALUE
        movl $1, %eax                # EXIT
        int $0x80                    # SYSCALL

Clearly, this code has two different sections, or pieces of memory: .text and .data. The code lives in .text and the message in .data.

To assemble it, run:

    $ as --32 01-helloworld.s -o 01-helloworld.o

and then you can inspect it with objdump (or nm, readelf, od):

    $ objdump -s 01-helloworld.o 

    01-helloworld.o:     file format elf32-i386

    Contents of section .text:
     0000 ba0b0000 00b90000 0000bb01 000000b8  ................
     0010 04000000 cd80bb00 000000b8 01000000  ................
     0020 cd80                                 ..              
    Contents of section .data:
     0000 486f6c61 204d756e 646f0a             Hola Mundo.

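The interesting thing here is the presence of relocations, i.e. places where one section refers to another. In this case, the .text section references message, which lives in the .data section. (This dump was produced from a version of the file whose message was "Hola Mundo\n": 11 bytes, which is why the first instruction is ba0b000000, i.e. movl $0xb, %edx; the layout is the same for "YOU WIN!\n".) Relocations can be inspected with the -r option.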

    $ objdump -r 01-helloworld.o

    01-helloworld.o:     file format elf32-i386

    RELOCATION RECORDS FOR [.text]:
    OFFSET   TYPE              VALUE 
    00000006 R_386_32          .data

Generally, the linker (man ld) resolves the relocations and lays out the sections, and the loader then maps them into memory with the right permissions. In the words of the ld man page: "ld combines a number of object and archive files, relocates their data and ties up symbol references. Usually the last step in compiling a program is to run ld."
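To make "resolving a relocation" concrete, here is a small Python sketch of what the linker does for the single R_386_32 record shown above. On i386, R_386_32 is computed as S + A (symbol address plus addend), and because i386 uses REL-style records the addend A is stored in the patched field itself (here 0, since message sits at the start of .data). The .data address 0x080490a4 is hypothetical; the real one depends on how ld lays out the executable.

```python
import struct


def apply_r386_32(section: bytearray, offset: int, symbol_address: int) -> None:
    """Patch one R_386_32 relocation in place: field = S + A.

    The implicit addend A is read from the 32-bit field being patched.
    """
    (addend,) = struct.unpack_from("<I", section, offset)
    value = (symbol_address + addend) & 0xFFFFFFFF
    struct.pack_into("<I", section, offset, value)


# .text bytes of 01-helloworld.o, copied from the objdump -s output above
text = bytearray.fromhex(
    "ba0b000000b900000000bb01000000b8"
    "04000000cd80bb00000000b801000000"
    "cd80")

# Hypothetical address where the linker placed .data (and thus message).
apply_r386_32(text, 6, 0x080490A4)
print(text[5:10].hex())  # b9a4900408, i.e. movl $0x80490a4, %ecx
```

A shellcode gets no such fix-up step: whatever sits at offset 6 when the bytes are copied is what the CPU will use, which is why the next step is to get rid of relocations altogether.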
To build a shellcode with the classic toolchain we need code without relocations. The easy way is to generate code that uses only one section (the alternative is to implement the loader by hand). To accomplish this we need to do two things:
  1. Merge the sections into a single one => .text.
    • Data will appear right after the code (you can invent other layouts).
  2. Variable references must be calculated from static offsets and the current position of the code in memory (you can get it from the stack by making a call, since call pushes the return address).
The resulting assembly code, using just one section, looks like this:

    # BASIC SHELLCODE
    .section .text                         # everything in one section
    .globl _start
    _start:
        call dummy
    dummy: #<--------------------------------\
        popl %ecx                           # address of dummy in ecx
        # write message to stdout using int 0x80
        movl $len, %edx                     # LEN
        # add the distance from dummy to message to get
        # the absolute pointer to message,
        # no matter where this code is placed in memory
        addl $offset_dummy_to_message, %ecx # BUFFER
        movl $1, %ebx                       # FD
        movl $4, %eax                       # WRITE
        int $0x80                           # SYSCALL
        # exit
        movl $0, %ebx                       # RETVALUE
        movl $1, %eax                       # EXIT
        int $0x80                           # SYSCALL
        # The GNU Assembler knows the sizes of instructions
        # and is able to calculate distances between labels
        # statically at assembly time
        offset_dummy_to_message = message - dummy   # This won't generate
                                                    # anything in the binary
    message:
        # This is in the .text section, immediately
        # after the last instruction
        .ascii "YOU WIN!\n"
        len = . - message
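The distances GAS computes can be reproduced by hand, which also shows why the code works at any address. The Python sketch below encodes the same instruction sequence with standard i386 opcodes. These are hand-picked encodings: we assume GAS uses the 32-bit immediate forms here because len and offset_dummy_to_message are forward references, so compare with objdump -d on your own build. It then replays the call/popl/addl arithmetic for a couple of load addresses; in each case %ecx lands on message.

```python
import struct

MESSAGE = b"YOU WIN!\n"


def imm32(value: int) -> bytes:
    """Little-endian 32-bit immediate, as i386 stores it."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def build_shellcode(message: bytes = MESSAGE) -> bytes:
    def body(offset: int) -> bytes:
        # Everything from the dummy label up to (not including) message.
        return b"".join([
            b"\x59",                        # popl %ecx
            b"\xba" + imm32(len(message)),  # movl $len, %edx
            b"\x81\xc1" + imm32(offset),    # addl $offset_dummy_to_message, %ecx
            b"\xbb" + imm32(1),             # movl $1, %ebx
            b"\xb8" + imm32(4),             # movl $4, %eax
            b"\xcd\x80",                    # int $0x80
            b"\xbb" + imm32(0),             # movl $0, %ebx
            b"\xb8" + imm32(1),             # movl $1, %eax
            b"\xcd\x80",                    # int $0x80
        ])

    # Every instruction above has a fixed size, so one sizing pass with a
    # placeholder yields offset_dummy_to_message = message - dummy.
    offset = len(body(0))
    call = b"\xe8" + imm32(0)  # call dummy: rel32 = 0, dummy is the next instruction
    return call + body(offset) + message


def message_address(code: bytes, load_address: int) -> int:
    """Replay call/popl/addl to find where %ecx points for a given load address."""
    ecx = load_address + 5                          # call pushes the next address; popl reads it
    (offset,) = struct.unpack_from("<I", code, 13)  # imm32 operand of the addl
    return (ecx + offset) & 0xFFFFFFFF


code = build_shellcode()
print(len(code), code.hex())  # 50 bytes in total
for base in (0x1000, 0x08048000):
    addr = message_address(code, base)
    print(hex(base), code[addr - base:])
```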

As usual, it is assembled with:

    $ as --32 02-helloworld-shellcode.s -o 02-helloworld-shellcode.o

This generates an ELF object file whose only non-empty section is .text, and which has no relocations. You can inspect it with:

    $ objdump -S 02-helloworld-shellcode.o
    $ objdump -r 02-helloworld-shellcode.o

To extract the shellcode from the .text section (you can also read it off the output of objdump -D), we use this Python program, which needs the pyelftools package:

    import sys
    from elftools.elf.elffile import ELFFile

    if len(sys.argv) != 3:
        print("Usage:\t%s file.o file.bin" % sys.argv[0])
        sys.exit(1)

    # read the ELF object file (pyelftools reads lazily, so keep it open)
    with open(sys.argv[1], "rb") as f:
        elf = ELFFile(f)
        # read .text data (abort if not found)
        section = elf.get_section_by_name(".text")
        if section is None:
            sys.exit("No .text section in %s" % sys.argv[1])
        text = section.data()

    print("Section .text is %d bytes long" % len(text))
    print("Dumping .text section to %s ..." % sys.argv[2])
    with open(sys.argv[2], "wb") as out:
        out.write(text)
    print("done.")
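If pyelftools is not available, the same extraction can be done with the standard struct module by walking the section header table. This is a minimal sketch, assuming a little-endian 32-bit ELF object like the i386 one built here; text_section is a hypothetical helper name, and it validates nothing beyond the magic number, class and byte-order bytes.

```python
import struct
import sys

ELF32_EHDR = "<16sHHIIIIIHHHHHH"  # Elf32_Ehdr, 52 bytes
ELF32_SHDR = "<10I"               # Elf32_Shdr, 40 bytes


def text_section(elf: bytes, name: bytes = b".text") -> bytes:
    """Return the raw contents of the named section of a 32-bit LE ELF file."""
    # EI_CLASS (byte 4) == 1 means 32-bit; EI_DATA (byte 5) == 1 means little-endian.
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("not a little-endian 32-bit ELF file")
    ehdr = struct.unpack_from(ELF32_EHDR, elf, 0)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = ehdr[6], ehdr[11], ehdr[12], ehdr[13]

    def shdr(index: int):
        return struct.unpack_from(ELF32_SHDR, elf, e_shoff + index * e_shentsize)

    names_offset = shdr(e_shstrndx)[4]  # sh_offset of the section-name string table
    for i in range(e_shnum):
        sh_name, _, _, _, sh_offset, sh_size = shdr(i)[:6]
        start = names_offset + sh_name
        if elf[start:elf.index(b"\0", start)] == name:
            return elf[sh_offset:sh_offset + sh_size]
    raise ValueError("section %r not found" % name)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f:
        data = text_section(f.read())
    with open(sys.argv[2], "wb") as out:
        out.write(data)
    print("Wrote %d bytes of .text to %s" % (len(data), sys.argv[2]))
```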


Okay, this is all good, but someone could argue that assembly is not really comfortable for writing large or even medium-sized programs. I'm not sure about that, but I'm going to show how to write shellcode in C in a following post anyway.
