Tuesday, July 16, 2013

std::stringstream vs. scanf

Back in the olden days, C programmers would use the scanf family (sscanf in particular) to read values out of string (char array) data. The more modern and usable facility is std::stringstream. I like to focus on efficiency over everything else, so if that's not your bag I would suggest skipping this entry.

The inspiration for this blog entry came from "Exceptional C++ Style" by Herb Sutter, Item #2. I would strongly advise you to check it out if you want more detailed explanations of what I'm writing about. There's a cool chart in the article that shows the trade-offs between stringstream, boost::lexical_cast, and scanf. The book claims that scanf is more efficient because it avoids one memory allocation call. This blog entry investigates that claim.

To understand each, let's write a typical program including both styles:

// Ahh, these make me feel good!
#include <iostream>
#include <sstream>
#include <string>
// Yikes, this makes me feel sick.
#include <cstdio>

enum { BUFFER_SIZE = 100 };

int main(int argc, char** argv)
{
    std::string input("Hello World!");
 
    std::string cpp_contents;
    std::stringstream ss;
    ss << input;
    ss >> cpp_contents;
    char c_contents[BUFFER_SIZE];
    sscanf( input.c_str(), "%s", c_contents);

    std::cout << "CPP Contents: " << cpp_contents << std::endl;
    std::cout << "C Contents: " << c_contents << std::endl;
}

So obviously this comparison is skewed toward the C side in terms of efficiency. But is there anything that can be done to improve the C++ side? Let's go line by line for the analysis:

std::string input("Hello World!");

This line is not what we want to focus on. I used a string here, but we could easily replace it with a null-terminated character buffer. In both C++ and C land we are simply taking its contents (via .c_str() on the C side) and using that as the input.

std::string cpp_contents;
std::stringstream ss;

These two lines are just setup so far. The stringstream is a separate object, so creating it is going to cost something. cpp_contents is the string we are going to read into.

ss << input;
ss >> cpp_contents;

So to fake things out, we stream input into the stringstream to populate it, then stream a single whitespace-delimited string out into cpp_contents. Thus far the overhead is the creation of two objects (cpp_contents, ss), streaming in to populate the stringstream, then streaming out to get the single string.
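
One detail worth making explicit: operator>> into a std::string skips leading whitespace and stops at the next whitespace, so for "Hello World!" only "Hello" comes out (sscanf's %s conversion behaves the same way). Here is a minimal sketch of the populate-then-extract steps, wrapped in a helper I'm calling first_token_two_step. That name is made up for illustration and is not part of the program above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the original program): populate a
// stringstream with `input`, then extract one token from it.
std::string first_token_two_step(const std::string& input)
{
    std::stringstream ss;
    std::string token;
    ss << input;   // stream in: copies the characters into the stream's buffer
    ss >> token;   // stream out: skips leading whitespace, stops at the next whitespace
    return token;  // stays empty if there was nothing to extract
}
```

Calling first_token_two_step("Hello World!") yields "Hello", which is why both outputs in the program above print only the first word.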

char c_contents[BUFFER_SIZE];
sscanf(input.c_str(), "%s", c_contents);

The comparison hardly seems fair at this point: sscanf is obviously more efficient because the only thing created is a character buffer. Character buffers make me cringe because they are fragile, prone to buffer overruns, and require more maintenance. Still, sscanf does a very efficient job of filling a buffer. Can we make the C++ better?
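
On the buffer-overrun point: a plain %s keeps writing until the input token ends, no matter how big the destination buffer is. One common mitigation is a field width in the format string. This is a sketch, assuming the 100-byte BUFFER_SIZE from the program above; first_token_bounded is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

enum { BUFFER_SIZE = 100 };

// Hypothetical helper: read one token with sscanf, capped at
// BUFFER_SIZE - 1 characters so the buffer cannot overflow.
std::string first_token_bounded(const std::string& input)
{
    char buffer[BUFFER_SIZE] = "";
    // "%99s" stores at most 99 characters plus the terminating '\0'.
    // The width is hard-coded, so it must be kept in sync with
    // BUFFER_SIZE by hand (part of the maintenance burden).
    const int matched = std::sscanf(input.c_str(), "%99s", buffer);
    if (matched != 1)
        return std::string();  // nothing to read (empty or all-whitespace input)
    return std::string(buffer);
}
```

The price is silent truncation: a token longer than 99 characters is cut off rather than overflowing the buffer.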

The first thing I thought of was replacing the string cpp_contents with a character buffer. This is a stupid idea because all we would be doing is trading one object for another instead of getting rid of an object altogether.

The second thing I thought about was directly populating the stringstream. The stringstream object has a second constructor that takes a string! Therefore we can eliminate that first stream operation that populates the object. But our tally still stands at two objects and one operation for the C++ implementation versus one object and one operation for the C implementation. How sad.
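
Here is roughly what the constructor version looks like. In this sketch I'm using std::istringstream, the input-only sibling of std::stringstream. That is my own substitution; std::stringstream has the same string-taking constructor. first_token_from_ctor is a made-up helper name.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: build the stream directly from the input
// string instead of default-constructing it and streaming in.
std::string first_token_from_ctor(const std::string& input)
{
    std::istringstream ss(input);  // the constructor copies input into the stream's buffer
    std::string token;
    ss >> token;                   // the one remaining stream operation
    return token;
}
```

Note that the constructor still copies the string into the stream's internal buffer, so this saves an operation but not the object.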

As a final test, I'm taking my two implementations and racing them:

#include <iostream>
#include <sstream>
#include <string>
#include <cstdio>

enum { BUFFER_SIZE = 100, MAX = 700000 };

//#define CPP_IMPLEMENTATION
#define C_IMPLEMENTATION

int main(int argc, char** argv)
{
    std::string input("Hello World!");

#if defined CPP_IMPLEMENTATION
    for(unsigned long i=0; i<MAX; ++i)
    {
        char cpp_contents[BUFFER_SIZE];
        std::stringstream ss(input);
        ss >> cpp_contents;
    }
#endif

#if defined C_IMPLEMENTATION
    for(unsigned long i=0; i<MAX; ++i)
    {
        char c_contents[BUFFER_SIZE];
        sscanf(input.c_str(), "%s", c_contents);
    }
#endif

}

 

What I found:

CPP_IMPLEMENTATION
    real     0m0.547s
    user     0m0.544s
    sys      0m0.000s

C_IMPLEMENTATION
    real     0m0.062s
    user     0m0.056s
    sys      0m0.004s

There it is: in this test the sscanf loop finished roughly nine times faster (0.062s vs. 0.547s real). If you are coding for safety, use C++'s modern standard stringstream. However, if you are focusing your endeavours on efficiency, then seriously consider using a function from the scanf family. It's faster for the obvious reason: no extra object to construct and allocate on every pass.
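
If you want to rerun the race without flipping a #define and recompiling, something like the following could time both loops in one process with std::chrono::steady_clock. This is only a sketch, and the numbers above did not come from this code. The helper names (time_loop_ms, time_stream_ms, time_sscanf_ms) are mine, the C++ loop reads into a std::string instead of a char buffer, and the sscanf call uses a bounded %99s. It needs C++11 or later, and an optimizing compiler may treat the two loops differently, so take any numbers it produces with a grain of salt.

```cpp
#include <chrono>
#include <cstdio>
#include <sstream>
#include <string>

enum { BUFFER_SIZE = 100 };

// Hypothetical helper: run `body` `iterations` times and return the
// elapsed wall-clock time in milliseconds.
template <typename Body>
double time_loop_ms(unsigned long iterations, Body body)
{
    const auto start = std::chrono::steady_clock::now();
    for (unsigned long i = 0; i < iterations; ++i)
        body();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times the stringstream approach: one stream and one string per pass.
double time_stream_ms(const std::string& input, unsigned long iterations)
{
    return time_loop_ms(iterations, [&input] {
        std::string token;
        std::stringstream ss(input);
        ss >> token;
    });
}

// Times the sscanf approach: one stack buffer per pass.
double time_sscanf_ms(const std::string& input, unsigned long iterations)
{
    return time_loop_ms(iterations, [&input] {
        char buffer[BUFFER_SIZE];
        std::sscanf(input.c_str(), "%99s", buffer);
    });
}
```

Calling time_stream_ms(input, 700000) and time_sscanf_ms(input, 700000) from main and printing both results gives a side-by-side comparison on your own machine and compiler.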



