Sunday, May 12, 2013

Uncharted Territory

Different projects demand different things from C++. A developer writing medical software would probably focus on making the code correct rather than fast; a developer writing trading software would want it fast; a developer writing in-house tools would want it readable. C++ allows us to do all of these things. Keep in mind that these are priorities, not exclusive choices. Just because a trading software engineer focuses on speed doesn't mean that safety goes out the window; it just means it might have a lower priority.

I've been thinking about ways to measure the efficiency of compiled programs. Most of the time I would use a profiler (callgrind) to do the work, but that is overkill for small snippets of code that I want to compare.

The first alternative would be the Linux "time" command. It's simple: just call time with the name of your executable, and it reports real time, user time, and sys time. This could be a valuable way to test small programs; I tried searching for the pitfalls of this approach. Granted, it wouldn't work for large-scale applications that may have threading/GUI components unless, of course, there was a tremendous bottleneck (it would have to be enormous).
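
When a snippet runs too briefly for "time" to tell apart from process startup, one option is to time just the region of interest from inside the program. The following is only a minimal sketch and not something I measured for this post: the helper name time_ms and the workload sum_to are hypothetical, and it relies on std::chrono::steady_clock from the standard library.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double time_ms(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical workload: sums the integers 0..n-1. The volatile accumulator
// keeps the optimizer from deleting the loop entirely.
std::uint64_t sum_to(std::uint64_t n)
{
    volatile std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        total = total + i;
    return total;
}
```

Calling time_ms([] { sum_to(1000000); }) from main would give one wall-clock sample; a single sample is noisy, so repeating the call and comparing runs is probably wise.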

The second alternative is to compile to assembly language. I think this is a perfect solution if you can understand the assembly behind it. After some quick googling I found out that clang can do this:

clang++ -S -mllvm --x86-asm-syntax=intel main.cpp

This produces a main.s file full of assembly language output. For alternative #2, I am suggesting that small programs get written and then compiled into assembly language so the instructions can be counted and analyzed. Let's try a small program:

int main(int argc, char** argv)   
{                                 
    int i = 0;                    
    ++i;                          
}                                 

The assembly for this program is:

    .file    "main.cpp"
    .text
    .globl    main
    .align    16, 0x90
    .type    main,@function
main:                                   # @main
# BB#0:
    sub    ESP, 16
    mov    EAX, DWORD PTR [ESP + 24]
    mov    ECX, DWORD PTR [ESP + 20]
    mov    DWORD PTR [ESP + 12], 0
    mov    DWORD PTR [ESP + 8], ECX
    mov    DWORD PTR [ESP + 4], EAX
    mov    DWORD PTR [ESP], 0
    mov    EAX, DWORD PTR [ESP]
    add    EAX, 1
    mov    DWORD PTR [ESP], EAX
    mov    EAX, DWORD PTR [ESP + 12]
    add    ESP, 16
    ret
.Ltmp0:
    .size    main, .Ltmp0-main


    .section    ".note.GNU-stack","",@progbits


Wow. That looks intimidating. The first thing I noticed was a bit of meta-information at the top:

    .file    "main.cpp"
    .text
    .globl    main
    .align    16, 0x90
    .type    main,@function


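These are assembler directives rather than instructions: .file records the source file name, .text switches to the code section, .globl makes main visible to the linker, .align pads to a 16-byte boundary (0x90 is the x86 NOP opcode), and .type marks main as a function. It is a collection of extra information that is helpful to us (the developers), to the assembler and linker, and to debuggers [1].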

main:                 # @main

This is the start of our main function area (a label).

    sub    ESP, 16

"As program adds data to the stack, the stack grows downward from high memory to low memory." [2]. Try the additional program below the references to see an example. The deeper we go into a call stack, the lower the addresses of the stack variables! I would have thought it would go the other way, but I'm also directionally challenged.

ESP is the register that contains the stack pointer. So when we subtract 16 from the stack pointer we are essentially moving the stack pointer to make room for more "stuff". The function arguments (in this case argc and argv) are now located in memory at [ESP + 20] and [ESP + 24]; it's good to note that before the subtraction they were at [ESP + 4] and [ESP + 8], just above the return address at [ESP].
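
The offset arithmetic can be checked with a few lines of C++. This is a sketch under one assumption that the listing itself does not state: the usual 32-bit cdecl layout at function entry, with the return address at [ESP], the first argument at [ESP + 4], and the second at [ESP + 8]. The names offset_after_sub, kArgcAtEntry, kArgvAtEntry, and kFrameBytes are made up for the illustration.

```cpp
// Offset of a stack slot relative to ESP after ESP has been lowered by
// `frame_bytes`. The slot itself does not move; only ESP does.
constexpr int offset_after_sub(int offset_at_entry, int frame_bytes)
{
    return offset_at_entry + frame_bytes;
}

// Assumption: 32-bit cdecl layout at function entry, where [ESP] holds the
// return address, [ESP + 4] the first argument and [ESP + 8] the second.
constexpr int kArgcAtEntry = 4;
constexpr int kArgvAtEntry = 8;
constexpr int kFrameBytes = 16;  // from "sub ESP, 16"

static_assert(offset_after_sub(kArgcAtEntry, kFrameBytes) == 20,
              "argc is read from [ESP + 20]");
static_assert(offset_after_sub(kArgvAtEntry, kFrameBytes) == 24,
              "argv is read from [ESP + 24]");
```

If either offset were wrong the file would fail to compile, which makes this a cheap way to double-check the bookkeeping.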

    mov    EAX, DWORD PTR [ESP + 24]
    mov    ECX, DWORD PTR [ESP + 20]


Since this is assembly, we need to take the contents of the source inputs and put them into local variables. x86 has no memory-to-memory mov, so we first take the contents of memory and put them into temporary registers. This is taking argv and argc and putting the contents into EAX and ECX, respectively.

    mov    DWORD PTR [ESP + 12], 0
    mov    DWORD PTR [ESP + 8], ECX
    mov    DWORD PTR [ESP + 4], EAX
    mov    DWORD PTR [ESP], 0


This is establishing the local variables. A difficult thing to remember is that the stack pointer (ESP) has already been shifted down enough to contain all of the local variables. The local variables will have the address of the stack pointer PLUS some offset in the stack.

Also, remember how we had previously stuffed the original argv/argc into registers EAX and ECX? This step is COPYING those values into local stack slots (argc at [ESP + 8] and argv at [ESP + 4]). The first assignment, of 0 to [ESP + 12], is our return value. The last assignment, of 0 to the memory location of the stack pointer, is our "i" value.

    mov    EAX, DWORD PTR [ESP]
    add    EAX, 1
    mov    DWORD PTR [ESP], EAX


This takes the contents of the stack slot for i ([ESP]), moves it into the EAX register, adds 1 to it, then moves the contents of EAX back to the address where "i" lives ([ESP]). Whew, I'm getting exhausted just looking at this!

    mov    EAX, DWORD PTR [ESP + 12]
    add    ESP, 16

Remember a couple of steps up we stuffed a return value into [ESP + 12]? Yeah, it's back. We simply move the contents into register EAX (our return value). Finally, we add 16 to the stack pointer, which completely restores our prior state, and ret returns to the caller.
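
To convince myself the walkthrough hangs together, the whole function can be replayed on a toy model of the frame. This is a sketch, not a real emulator: run_main_model is a hypothetical name, the four-slot array stands in for the 16 bytes reserved by sub ESP, 16, and argc/argv are treated as plain 32-bit values under the cdecl assumption that argc is the first argument.

```cpp
#include <array>
#include <cstdint>

// Toy model of the 16-byte frame: slot[k] stands for DWORD PTR [ESP + 4*k]
// after "sub ESP, 16". Returns the value left in EAX at "ret".
// Assumption: argc and argv are passed in as plain 32-bit values.
std::int32_t run_main_model(std::int32_t argc, std::int32_t argv,
                            std::int32_t* final_i = nullptr)
{
    std::array<std::int32_t, 4> slot{};  // [ESP], [ESP+4], [ESP+8], [ESP+12]
    std::int32_t eax = argv;   // mov EAX, DWORD PTR [ESP + 24]
    std::int32_t ecx = argc;   // mov ECX, DWORD PTR [ESP + 20]
    slot[3] = 0;               // mov DWORD PTR [ESP + 12], 0   (return value)
    slot[2] = ecx;             // mov DWORD PTR [ESP + 8], ECX  (copy of argc)
    slot[1] = eax;             // mov DWORD PTR [ESP + 4], EAX  (copy of argv)
    slot[0] = 0;               // mov DWORD PTR [ESP], 0        (int i = 0)
    eax = slot[0];             // mov EAX, DWORD PTR [ESP]
    eax += 1;                  // add EAX, 1
    slot[0] = eax;             // mov DWORD PTR [ESP], EAX      (++i)
    eax = slot[3];             // mov EAX, DWORD PTR [ESP + 12]
    if (final_i) *final_i = slot[0];  // expose i for inspection
    return eax;                // ret (EAX carries the return value)
}
```

Whatever is passed for argc and argv, the model returns 0 and leaves i at 1, which matches the C++ source: main falls off the end (an implicit return 0) after incrementing i once.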

This post was all about converting to assembly and trying to read/interpret the results. In the future we can use this tool to detect possible areas where we are copying values around too much, accessing memory too much, etc. Note that the command above passes no optimization flag, which is most likely why every value takes a round trip through the stack here. Assembly language is probably the best solution for analyzing small snippets of what the compiler produces (as long as you can read it).
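
As a sketch of the kind of comparison this enables, two variants of a snippet can live in one file and be compiled with the clang command shown earlier. The function names count_pre and count_post are hypothetical; extern "C" is used only so the labels in main.s are not name-mangled and are easy to find.

```cpp
// extern "C" turns off C++ name mangling, so the labels in the .s file read
// "count_pre" and "count_post" instead of mangled names.
extern "C" int count_pre(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)  // pre-increment
        total += i;
    return total;
}

extern "C" int count_post(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)  // post-increment
        total += i;
    return total;
}
```

Comparing the instructions under the two labels shows whether the compiler treats the variants differently, without needing a timer at all.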


REFERENCES

[1] - http://www.cs.wfu.edu/~torgerse/Kokua/More_SGI/007-2418-006/sgi_html/ch07.html

[2] - http://www.c-jump.com/CIS77/ASM/Stack/S77_0040_esp_register.htm


ADDITIONAL PROGRAMS

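// Prints the address of a local variable at three call depths to show
// that deeper stack frames sit at lower addresses.
#include <iostream>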

void funcall2()
{
    int i3 = 10;
    std::cout << "Address of i3: " << &i3 << std::endl;
}

void funcall()
{
    int i2 = 13;

    std::cout << "Address of i2: " << &i2 << std::endl;
    funcall2();
}

int main(int argc, char** argv)
{
    int i1 = 0;

    std::cout << "Address of i1: " << &i1 << std::endl;
    funcall();
}


OUTPUT:

Address of i1: 0xbf8b749c
Address of i2: 0xbf8b746c
Address of i3: 0xbf8b743c
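
The gaps between those addresses can be worked out mechanically. This is a small sketch using the three addresses printed above; frame_gap and the kAddr constants are names invented for the illustration, and the addresses themselves may well differ on another machine or another run.

```cpp
#include <cstdint>

// Distance in bytes between two stack addresses; positive when `callee`
// lies below `caller`, i.e. the stack grew downward.
constexpr std::int64_t frame_gap(std::uint32_t caller, std::uint32_t callee)
{
    return static_cast<std::int64_t>(caller) - static_cast<std::int64_t>(callee);
}

// Addresses copied from the output above; they may differ on other runs.
constexpr std::uint32_t kAddrI1 = 0xbf8b749c;
constexpr std::uint32_t kAddrI2 = 0xbf8b746c;
constexpr std::uint32_t kAddrI3 = 0xbf8b743c;

static_assert(frame_gap(kAddrI1, kAddrI2) == 48, "i2 sits 48 bytes below i1");
static_assert(frame_gap(kAddrI2, kAddrI3) == 48, "i3 sits 48 bytes below i2");
```

Each call placed its local 0x30 (48) bytes lower than the caller's, which is the downward growth described in [2].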
