Writing an ELF executable from scratch
The ELF header layout
An ELF file begins with a few magic bytes and data structures that instruct the operating system on how to load and run the executable. A basic C representation of this would be:
struct ELFHeader {
char magic_bytes[4];
uint8_t address_width;
uint8_t endianness;
uint8_t elf_version;
uint8_t os_abi;
uint8_t abi_version;
char padding[7];
uint16_t file_type;
uint16_t instruction_set;
uint32_t elf_version_copy;
void* entry_point;
void* program_header_table_pointer;
void* section_header_table_pointer;
uint8_t flags[4];
uint16_t header_size;
uint16_t program_header_table_entry_size;
uint16_t program_header_table_number_of_entries;
uint16_t section_header_table_entry_size;
uint16_t section_header_table_number_of_entries;
uint16_t section_header_table_section_names_index;
}
Don’t worry if some of these fields seem confusing, they will be explained as we add them to the executable.
Building our ELF header
We begin by defining the four byte “Magic Number” that specifies that
this is an ELF executable. This is the byte 0x7F
followed
by the ASCII representation of “ELF”
0x7F // ELF magic number
0x45 // 'E'
0x4C // 'L'
0x46 // 'F'
We next must fill in the next five bytes to represent what platform we are targeting; followed by seven reserved bytes of padding which should be filled with zeroes. Since we will be targeting 64 Bit Linux we set these bytes to:
0x02 // 64 Bit Executable (0x01 represents 32 Bit)
0x01 // Little Endian (any Intel or AMD x86_64 processor will always be little endian)
0x01 // This is the current version of ELF
0x00 // System V UNIX ABI (There are a few valid values of this which aren't reproduced here)
0x00 // Ignored ABI Version Specifier on Linux
0x00 0x00 0x00 0x00 0x00 0x00 0x00 // Seven bytes of padding
We now have to set the flags that determine what type of ELF binary we are creating and the targeted instruction set. Since we are creating a executable program this will be
0x02 0x00 // '2' (The bytes appear out of order since they are little endian)
0x3E 0x00 // 0x3E is the hex representation for the x86_64 processor architecture
We follow this by four bytes representing another copy of the current version of ELF
0x01 0x00 0x00 0x00 // the same as previously only this time with extra bytes
The ELF version number is followed by three pointers, these go to: 1.
The entry point of the function (Since we aren’t doing anything fancy
this will be right after the program header table) 2. The start of the
program header table (Immediately following the ELF header) 3. The
Section Header Table (We won’t be using any special sections so for our
program this will be empty) At the moment we know two of these. The
first is the offset to the start of the program header table
(0x40
followed by 7 null bytes to make up a 64 bit pointer)
since a 64 bit ELF header is (coincidentally) 64 bytes long. The second
is the Section Header Table pointer, which will be the null pointer
since we won’t be using it.
We don’t know what the pointer for the entry point should be yet, but we can calculate it by adding together the sizes of the Program header and the ELF header and then offsetting that by the location that we are going to have our program loaded at.
Element | Size (In bytes) |
---|---|
ELF Header | 64 |
Program Header | 56 |
Total | 120 |
The total size of the two headers is 120 bytes, which means that the
offset is 0x78
. We now need to pick a location to load our
program at. The default location for GNU ld to link a program’s code
section at is 0x400000
so we add our 0x78
offset to that to get 0x400078
as our entry point address.
We can now write the next portion of our header starting with that entry
point.
0x78 0x00 0x40 0x00 0x00 0x00 0x00 0x00 // Entry point address (In little endian encoding and extended to 64 bits)
0x40 0x00 0x00 0x00 0x00 0x00 0x00 0x00 // Program header offset
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 // The section header table offset (We aren't using this so it is null)
We don’t need any special architecture flags for our program so the next four bytes will also be null
0x00 0x00 0x00 0x00
We now need bytes representing the sizes and numbers of the various ELF header entries
0x40 0x00 // The ELF Header Size (64 bytes)
0x38 0x00 // The size of a program header entry (56 bytes)
0x01 0x00 // The number of program header entries (We will only have one)
0x00 0x00 // The size of a section header entry (0 bytes since we aren't using it)
0x00 0x00 // The number of section header entries (We aren't using any sections)
0x00 0x00 // The index of the section header names entry (null, since we aren't using it)
We now have a complete ELF header which should be parsable by the
file
command on linux, or readelf
with the
-h
flag which tells it to display the ELF header of the
file. It should correctly show the data, although readelf
will output an error since we told it that we have a program header
entry and didn’t actually provide one.
Here is an example of what the output of readelf -h
will
be for the program:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400078
Start of program headers: 64 (bytes into file)
Start of section headers: 0 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 1
Size of section headers: 0 (bytes)
Number of section headers: 0
Section header string table index: 0
You can see that most of the data that we programmed into the header is very nicely rendered here. You can even mess with the values of some of the fields and see how that changes the output.
The end of readelf
’s output will be an error message
telling you that it tried to read the program headers but it got the end
of the file instead. This is expected, because we have not created any
program headers yet, but we told the operating system that there would
be one header with a size of 56 bytes.
readelf: Error: Reading 56 bytes extends past end of file for program headers
That’s it for this post. In the next post we will build the program header entry and actually get our basic executable to run.