How does debugger locate source code of an executable

I always wonder how can debugger tools such as GDB would be able to locate the source code of an instruction. Turn out, this information is embebeded inside the executable, GDB just need to parse and search for this information.

So let’s say we have a simple C program as such:

// fizzbuzz.c
#include <stdio.h>
int main(){
  for (int i = 0; i < 100; i++)
    if (i % 15 == 0)
      printf("FizzBuzz\n");
    else if (i % 3 == 0)
      printf("Fizz\n");
    else if (i % 5 == 0)
      printf("Buzz\n");
    else
      printf("%d\n", i);
}

Normally, compiling with gcc gives no special information inside the executable. We can check it by using objdump

# Compile to object file
gcc -c fizz.c

# List sections in this object file
objdump -h fizz.o

The output looks like:

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         000000fb  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  0000013b  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  0000013b  2**0
                  ALLOC
  3 .rodata       00000017  0000000000000000  0000000000000000  0000013b  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment      0000002c  0000000000000000  0000000000000000  00000152  2**0
                  CONTENTS, READONLY
  5 .note.GNU-stack 00000000  0000000000000000  0000000000000000  0000017e  2**0
                  CONTENTS, READONLY
  6 .eh_frame     00000038  0000000000000000  0000000000000000  00000180  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

Just normal .text and .data sections

With this kind of output, debugger won’t be able to locate any information about the source code of an instruction.

We need to set the debug flag to attach this information to object file.

gcc -gstabs -c fizz.c

Here we set the debug symbol using a STABS format albeit rather old(there is another format called DWARF, but let’s save it for another post).

Now this time, there are debug sections in the result object file(.stab and .stabstr):

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
... 
  3 .stab         0000069c  0000000000000000  0000000000000000  0000013c  2**2
                  CONTENTS, RELOC, READONLY, DEBUGGING
  4 .stabstr      0000110a  0000000000000000  0000000000000000  000007d8  2**0
                  CONTENTS, READONLY, DEBUGGING
...

Let’s have a quick review on STABS format.

The Stabs format

We can check the STABS format that gcc emits by using -S flags

gcc -S -gstabs fizz.c

this will create an fizz.s in compiled assembly syntax. There we can see the following information

# extract from fizz.s
.stabs  "fizz.c",100,0,2,.Ltext0
.stabs  "main:F(0,1)",36,0,0,main
.stabn  68,0,2,.LM0-.LFBB1

according to STABS documentation, there are 4 kinds of stab record where stabs and stabn are the most used. Their format are as follow:

.stabs "string",type,other,desc,value
.stabn type,other,desc,value

stabs

Take .stabs "fizz.c",100,0,2,.Ltext0 for example, we know that this is a stabs kind, with the string = "fizz.c", type is 100. In order to know that 100 means, we have to check the reference code stab type code

over there, we will know that it’s a N_SO type, i.e:

Path and name of source file containing main routine

the other field is 0 so it just means NULL, desc is 2 meaning the source code is written in K&R traditionally C(this field is optional). Finally, value is .Ltext0 meaning the address of this file when it runs.

Now let check another one, .stabs "main:F(0,1)",36,0,0,main

string = “main:F(0, 1)”, F means it’s a global function, 1 is the return type(int).
type = 26, meaning a N_FUN

Function name or text segment variable for C
value = main ie: address of this function when running

stabn

Let check the next stabn

.stabn 68,0,2,.LM0-.LFBB1

Remember the syntax of stabn is .stabn type,other,desc,value, we can deduce the following information:

type = 68(N_SLINE)

Line number in text segment
desc = 2, line 2 of source file
value = .LM0-.LFBB1, the address in memory when it is run(note that this address will turn into a memory address when it is executed)

When generating assembly code, these stab information is intermingled between asm instructions. But when assembler compiles into machine code(or object file), all stab information will be put into 2 sections: stabs and stabstr and by parsing these section, debugger can trace back to the source information.

We can also list all of the exported stab symbols of the executable by using objdump

objdump -G fizz

So giving an instruction’s address in memory(for example, a value of IP register), we can deduce the source’s information by the following 3 steps:

locate the source file(using N_SO type)
locate the method inside this source file(using N_FUN type)
location the line number inside this method(using N_SLINE type)

In my next post, I will use these 3 steps to write a program that can print its own source information.