From .rodata to .rwdata: an introduction to memory mapping and LD scripts

Guy Ezer, Jun 4

A few days ago a colleague of mine, who had just started to learn C, was wondering about the following piece of code:

    char *foo = "AAAAAA";
    foo[0] = 'B';

The tutorial he followed described this as valid code, yet running it ends in a segmentation fault:

    guy@localhost ~/b/string_elf> gcc sample_1.c
    guy@localhost ~/b/string_elf> ./a.out
    fish: "./a.out" terminated by signal SIGSEGV (Address boundary error)

Why is there a segmentation fault?

We can easily find the instruction that caused the segmentation fault by debugging the process under GDB:

    guy@localhost ~/b/string_elf> gdb a.out
    ...
    (gdb) set disassemble-next-line on
    (gdb) r
    Starting program: /home/guy/blog/string_elf/a.out

    Program received signal SIGSEGV, Segmentation fault.
    0x00000000004004dd in main ()
    => 0x00000000004004dd <main+16>: c6 00 42 movb $0x42,(%rax)

It is easy to see that writing to the address held in $rax caused the crash: the instruction

    movb $0x42,(%rax)

was the last instruction that ran before the segmentation fault.

We can see that $rax points to the initial value of the string by using the examine (x) command:

    (gdb) x/6c $rax
    0x400580: 65 'A' 65 'A' 65 'A' 65 'A' 65 'A' 65 'A'

In order to display the memory mapping of the process, we can use the file /proc/PID/maps.

The info proc GDB command returns the PID:

    (gdb) info proc
    process 8596
    cmdline = '/home/guy/blog/string_elf/a.out'
    cwd = '/home/guy/blog/string_elf'
    exe = '/home/guy/blog/string_elf/a.out'
    (gdb) shell cat /proc/8596/maps
    00400000-00401000 r-xp 00000000 fd:02 134817229 ...
    00600000-00601000 r--p 00000000 fd:02 134817229 ...
    00601000-00602000 rw-p 00001000 fd:02 134817229 ...
    ...

As seen from the memory mapping, the address stored in $rax, 0x400580, lies in the range 0x00400000-0x00401000, which is mapped without write permission:

    00400000-00401000 r-xp

Writing to it causes a segmentation fault, as the page is marked as non-writable.

An interesting question popped up: how can we make the string writable?

Before we continue any further, we need to make a quick detour and introduce the ELF file format.

A brief introduction to the ELF file format

The ELF file format, among other things, consists of sections that describe the logical memory layout of the binary.

Some typical sections one may find in an ELF file are:

.text, which stores the instructions that make up the program itself. It is marked as executable and read-only (r-x).

.data, which is used to store initialized static and global variables (non-static local variables are stored on the stack). It is marked as read-write and non-executable (rw-).

.bss, which stores uninitialized static and global variables. It is marked as read-write and non-executable (rw-).

.rodata, which stores constant data. One should expect string literals and other constant values to reside there. It is marked as read-only (although it usually resides in a readable and executable segment).

The section header table stores information on the various sections, including their permissions, virtual address range, etc. One can view the different sections using the readelf tool:

    guy@localhost ~/b/string_elf> readelf -S a.out
    There are 31 section headers, starting at offset 0x1908:

    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      ...
      [14] .text             PROGBITS         00000000004003e0  000003e0
      ...
      [16] .rodata           PROGBITS         0000000000400570  00000570
           000000000000003b  0000000000000000   A       0     0     8
      ...
      [25] .data             PROGBITS         0000000000601020  00001020
           0000000000000004  0000000000000000  WA       0     0     1
      [26] .bss              NOBITS           0000000000601024  00001024
           0000000000000004  0000000000000000  WA       0     0     1
      ...

objdump is another great tool. The VMA column stands for Virtual Memory Address and describes the base virtual address of the section.

    guy@localhost ~/b/string_elf> objdump -h a.out

    a.out:     file format elf64-x86-64

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
    ...
     13 .text         00000182  00000000004003e0  00000000004003e0  000003e0  2**4
    ...
     15 .rodata       0000003b  0000000000400570  0000000000400570  00000570  2**3
    ...
     24 .data         00000004  0000000000601020  0000000000601020  00001020  2**0
                      CONTENTS, ALLOC, LOAD, DATA
     25 .bss          00000004  0000000000601024  0000000000601024  00001024  2**0
    ...

In addition to sections, the ELF file format contains a program header table, which describes how the different sections are grouped into segments in memory. The ELF loader creates the memory mapping of the process according to those segments.

When an ELF file is loaded into memory via a call to execve, the kernel only examines the segments in order to set up the memory mapping of the process. The kernel does not care about individual sections.

Here is a snippet of the load_elf_binary function (kernel 3.18). It can be seen that the kernel only considers the program headers (segments) and calls elf_map (which in turn calls vm_mmap), which mmaps each segment to its VMA with its given flags:

    static int load_elf_binary(struct linux_binprm *bprm)
    {
        ...
        /* iterate over the program headers */
        for (i = 0, elf_ppnt = elf_phdata;
             i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
            int elf_prot = 0, elf_flags;
            unsigned long k, vaddr;
            ...
            /* store the virtual address of the segment */
            vaddr = elf_ppnt->p_vaddr;
            ...
            /* mmap the segment to its virtual address with the permissions
               that are specified in the program header table */
            error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                            elf_prot, elf_flags, 0);
            ...
        }
        ...
    }

Let’s find the section that the address in $rax belongs to via GDB (since we already covered readelf and objdump):

    (gdb) maintenance info sections
    Exec file:
        `/home/guy/blog/string_elf/a.out', file type elf64-x86-64.
     ...
     0x00400570->0x004005ab at 0x00000570: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
     ...

We can then see that the string was put in the “.rodata” section, which is marked as READONLY. Writing to it would obviously cause a segmentation fault.

A run-time patch: mprotect

mprotect is a syscall that sets the protection on a region of memory:

    #include <sys/mman.h>
    int mprotect(void *addr, size_t len, int prot);

It sets the permissions of the memory region starting at addr and ending at addr+len to the following permissions (which are passed via prot):

    PROT_NONE   The memory cannot be accessed at all.
    PROT_READ   The memory can be read.
    PROT_WRITE  The memory can be modified.
    PROT_EXEC   The memory can be executed.

This means that we can call mprotect with the address of the page containing the string and set it to:

    PROT_WRITE | PROT_READ | PROT_EXEC

(which is equal to 7).

Why would we mark it as executable? Because “.text” maps to this page as well! We can see that by examining the section-to-segment mapping using readelf -l:

    guy@localhost ~/b/string_elf> readelf -l a.out

    Elf file type is EXEC (Executable file)
    Entry point 0x4003e0
    There are 9 program headers, starting at offset 64

    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                     0x00000000000001f8 0x00000000000001f8  R E    8
      INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                     0x000000000000001c 0x000000000000001c  R      1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                     0x00000000000006b4 0x00000000000006b4  R E    200000
      ...

     Section to Segment mapping:
      Segment Sections...
       00
       01     .interp
       02     .interp .note.ABI-tag ... .plt.got .text .fini .rodata ...
       03     .init_array .fini_array ...
       04     .dynamic
       05     .note.ABI-tag .note.gnu.build-id
       06     .eh_frame_hdr
       07
       08     .init_array .fini_array .jcr .dynamic .got

We can find the page address by aligning $rax down to the page size:

    (gdb) p $rax - $rax%4096

Here is the continuation of the GDB session, in which a breakpoint was set before the write. A call to mprotect was then performed, making the page in which $rax resides writable:

    (gdb) p mprotect($rax - $rax%4096, 4096, 7)

And success! The binary exited successfully!

    (gdb) b
    Breakpoint 1 at 0x4004dd
    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /home/guy/blog/string_elf/a.out

    Breakpoint 1, 0x00000000004004dd in main ()
    => 0x00000000004004dd <main+16>: c6 00 42 movb $0x42,(%rax)
    (gdb) p mprotect($rax - $rax%4096, 4096, 7)
    $1 = 0
    (gdb) si
    0x00000000004004e0 in main ()
    => 0x00000000004004e0 <main+19>: 5d pop %rbp
    (gdb) x/6c $rax
    0x400580: 66 'B' 65 'A' 65 'A' 65 'A' 65 'A' 65 'A'

While this method works, it is not ideal and suffers from several problems:

This is a run-time solution, meaning that additional instructions have to be run.

We have to know the size of the buffer we want to change in advance: a single-page call like the one above will fail when the data in .rodata extends beyond that one page.

Since multiple sections can reside in a single page, a call to mprotect may also change the permissions of other sections.

We can do better: LD scripts

If we could move “.rodata” into a writable segment, that would be perfect! That is where LD scripts come into play.

The section-to-segment mapping, among other things, is determined when the program is linked. GCC, the compiler, creates an object file that already contains some sections, such as “.text”, “.data”, and “.bss”. According to the GCC documentation, every output must contain at least a text section.

The linker, LD, takes a bunch of object files and combines them into an ELF file. Commands are passed to LD using an LD script. The main purpose of the linker script is to describe how the sections in the input files should be mapped into the output file and to control the memory layout of the output file. The linker always uses a linker script.

Using the default linker script

The default linker script can be obtained by running:

    ld --verbose

The output is quite long so I put it into the following gist. I’ve removed some parts in order to create a simplified view of the default linker script:

    SECTIONS
    {
      /* Read-only sections, merged into text segment: */
      . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
      .text : { *(.text) }
      .rodata : { *(.rodata .rodata.*) }
      /* Read-write sections */
      . = DATA_SEGMENT_ALIGN (...);
      .data : { *(.data .data.*) }
    }

The SECTIONS block describes how sections from the object files that are given as input to the linker map to sections in the output ELF file. For example, the line:

    .text : { *(.text) }

defines an output section named “.text” (left-hand side), which contains the “.text” section from all the input files, selected with the * wildcard.

The “.” symbol is used to describe the current memory location and is called the location counter. Output sections are placed at the location counter, and the location counter is incremented by the size of the output section.

The line

    . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;

sets the location counter to the start of the text segment, at VMA 0x400000, plus the size of the ELF headers. Since the text segment is marked as readable and executable, sections that are put there will be non-writable, which in this case are “.text” and “.rodata”.

Further down, the location counter is moved to the data segment, and the data section is put there. An easy hack would be to place the .rodata input sections after the location counter has been moved to the data segment.

Let’s create a section called “.rwdata” which will replace “.rodata”:

    .rwdata : { *(.rodata .rodata.*) }

And we will change the linker script as follows: we will remove the “.rodata” output section and insert our “.rwdata” into the data segment, right after “.data”:

    SECTIONS
    {
      /* Read-only sections, merged into text segment: */
      . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
      .text : { *(.text) }
      /* Read-write sections */
      . = DATA_SEGMENT_ALIGN (...);
      .data : { *(.data .data.*) }
      /* -- our evil hack -- */
      .rwdata : { *(.rodata .rodata.*) }
    }

The full linker script can be found here.

GCC can take a non-default linker script using the -T option. Let’s try to compile the following code using our modified linker script:

    #include <stdio.h>

    int main(void) {
        char *foo = "AAAAAA";
        printf("printing string foo %s\n", foo);
        foo[0] = 'B';
        printf("printing string foo %s\n", foo);
        return 0;
    }

I’ve written a Makefile that invokes GCC using the modified linker script:

    guy@localhost ~/b/ld_script_elf_blog_post> make
    mkdir -p build
    gcc -T rwdata.ld sample.c -o build/sample

And when running:

    guy@localhost ~/b/ld_script_elf_blog_post> ./build/sample
    printing string foo AAAAAA
    printing string foo BAAAAA

It works!

Listing the section-to-segment mapping of the ELF, we see a new “.rwdata” section and no “.rodata” section:

    guy@localhost ~/b/ld_script_elf_blog_post> readelf -l build/sample
    ...
     Section to Segment mapping:
      Segment Sections...
       ...
       02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .eh_frame_hdr .eh_frame
       03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .rwdata .bss
       ...

Conclusion

LD scripts can be very useful when tight control over the memory mapping is needed. That is sometimes the case when programming for an embedded target (see the following example), or for other esoteric needs, such as making the “.text” section writable for a self-modifying binary.

I hope you found this post interesting. This is my first-ever post, so any comments would be much appreciated.

Originally published at https://guyonbits.com.
