home  /   who am i  /   my work  /   soap box

2012.03.23 - inlining, qsort, and std::sort

C++ Sucks C++ enthusiasts are often quick to claim std::sort is faster than qsort.  The typical citation they provide is from Scott Meyers' book "Effective STL", Item 46: "At runtime, sort makes inline calls to its comparison function while qsort calls its comparison function through a pointer."

His statement, however, is only true when the definition of qsort is unavailable.  Because templates usually aren't external, the compiler tends to have the necessary definitions at compile time and can inline as appropriate.  What Scott failed to mention is that if the definition of qsort and the comparison function are available, C compilers can perform the same inlining!

This claim is often met with skepticism.  The experiment below demonstrates this skepticism is unwarranted.

The setup:

The C source code and disassembly can be found here.  The C++ source and disassembly can be found here.

Compilation was done with GCC version 4.5.0, using the following command for the C code:

Prompt> gcc -O3 -Wall -Winline --std=c99 qsort_main.c external_qsort.c

Similarly, the C++ version was compiled with:

Prompt> g++ -O3 -Wall stlsort.cpp

When invoking the C version, the program takes two arguments.  The first indicates whether to use inline or external qsort (0=external, 1=inline).  The second sets how many elements to sort.  The C++ version only takes how many elements to sort.

Finally, the test was run, each program sorted 100 000 000 elements.  The results can be seen below:

# External qsort
Prompt> qsort 0 100000000
Total time: 31.290000
# Inline qsort
Prompt> qsort 1 100000000
Total time: 11.228000
# STL sort
Prompt> stlsort 100000000
Total time: 9.291000

When the C compiler has the same information the C++ one has the programs perform nearly the same.  When the compiler is forced to use an indirect function call, the performance is terrible.  Looking at the disassembly, the call to external qsort has the following code:

movl    $_foo_compare, 12(%esp)
movl    $4, 8(%esp)
movl    %esi, 4(%esp)
movl    -308(%ebp), %ecx
movl    %ecx, (%esp)
call    _external_qsort

Here it is obvious no inlining is performed.  The comparison function address is pushed onto the stack, and a call made to an ignorant external qsort.  The remaining assembly for the function main contains no direct references to either inline_qsort or foo_comparison since both were completely inlined.  Inlined instances of foo_compare can easily be found if the comparison code is modified to make a call to the function foo_token.  Looking at the disassembled version, inlined instances can be found simply by searching for foo_token.  For instance:

movl    $LC0, (%esp)
call    _foo_token
movl    (%edi), %eax
cmpl    %eax, (%ebx)
jns    L28
movl    %ebx, %edi
addl    $4, %ebx
cmpl    %ebx, %esi
jae    L29

Many such inlined instances are seen scattered throughout the main function.

Hopefully the above experiment demonstrates the inaccuracy of Scott Meyers' initial claim.