The GCC provides various levels of compiler optimization. The option can be specified by -On
where n contains integral values {0,1,2,3}.
-O0
Reduce compilation time and make debugging produce the expected results. This is the default.
-O1
Low level optimization. The compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
-O2
Medium level optimization. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O1, this option increases both compilation time and the performance of the generated code.
-O3
High level optimization. -O3 turns on all optimizations specified by -O2 and some additional optimizations, which increases the total compilation time.
Consider the following C code snippet:
int flag = 1;
void main()
{
while(flag);
}
The following is the disassembly for an ARM architecture based device like the Raspberry Pi.
It is compiled with -O0
optimization.
8424: e59f301c ldr r3, [pc, #28] ; 8448 <main+0x30>
8428: e5933000 ldr r3, [r3]
842c: e3530000 cmp r3, #0
8430: 1afffffb bne 8424 <main+0xc>
Register R3
is loaded with the global variable flag
’s value. It is compared with zero and if not equal it branches to the 0x8424
to repeat the procedure again.
The following is the disassembly under O3
optimization.
82f4: e59f3020 ldr r3, [pc, #32] ; 831c <main+0x28>
82f8: e92d4010 push {r4, lr}
82fc: e5934000 ldr r4, [r3]
8300: e3540000 cmp r4, #0
8304: 0a000000 beq 830c <main+0x18>
8308: eafffffe b 8308 <main+0x14>
Note the instruction at 0x8308
the code branches to itself which although seems odd is the correct behavior expected from our program. It is faster too as there is no memory address being accessed like the previous code.
We can avoid this kind of optimization by using a function in another source file to return the flag
variable value.
file1.c
int main()
{
while(check_var());
}
file2.c
int flag=1;
int check_var()
{
return flag;
}
Now when after compiling the above two source files to produce the executable and disassemble main
:
<main>
[.]
8424: eb000008 bl 844c <check_var>
8428: e1a03000 mov r3, r0
842c: e3530000 cmp r3, #0
8430: 1afffffb bne 8424 <main+0xc>
[.]
Similarly for check_var
:
<check_var>
844c: e52db004 push {fp} ; (str fp, [sp, #-4]!)
8450: e28db000 add fp, sp, #0
8454: e59f3010 ldr r3, [pc, #16] ; 846c <check_var+0x20>
8458: e5933000 ldr r3, [r3]
845c: e1a00003 mov r0, r3
8460: e24bd000 sub sp, fp, #0
8464: e49db004 pop {fp} ; (ldr fp, [sp], #4)
8468: e12fff1e bx lr
846c: 00010614 andeq r0, r1, r4, lsl r6
Register R0
is used to store the return value.
Now the program is forced to load the variable with the value at the memory address and is thus not optimized away.