Tuesday, April 30, 2013
Boost Threads (Part 1)
/**
* Boost Threads (Part 1)
*
* In this example I want to cover basics about Boost Thread. I am completely
* new to this topic so I'll be learning as I write. This is being compiled
* on Ubuntu 12.10 with Boost 1.49.
*/
// The following must be included to get boost threads.
#include <boost/thread.hpp>
#include <iostream>
#include <string>
/**
* A scoped printer object. When constructed it prints that the scope
* starts, when finished it prints that the scope ends. Look up info
* on RAII if you are unsure of what this is doing.
*/
struct printer
{
//! The label output.
std::string label;
printer(std::string const& labelIn)
: label(labelIn)
{
std::cout << "START(" << this->label << ")" << std::endl;
}
~printer()
{
std::cout << "END(" << this->label << ")" << std::endl;
}
};
/**
* Worker #1 has no arguments. This illustrates basic execution of a thread. [1]
*/
void worker1()
{
printer P("worker1");
// You have to create a time object that will denote the duration of
// how long this thread will sleep.
boost::posix_time::seconds workTime(3);
// Then using the boost::this_thread object, perform the sleep
// operation.
boost::this_thread::sleep(workTime);
}
/**
* Worker #2 is a little cooler because it demonstrates how to pass arguments
* into the worker functions. [2]
*/
void worker2(std::string const& extraLabel, int pauseTime)
{
printer P("worker2 - " + extraLabel);
boost::posix_time::seconds work2Time(pauseTime);
boost::this_thread::sleep(work2Time);
}
int main(int, char**)
{
printer P("main");
// Create a Worker #1 Thread.
boost::thread worker1Thread(worker1);
// Create a couple Worker #2 Threads.
boost::thread worker2Thread1(worker2, "Pause 4", 4);
boost::thread worker2Thread2(worker2, "Pause 2", 2);
std::cout << "WAITING(worker2Thread1 - Pause4)" << std::endl;
worker2Thread1.join();
std::cout << "WAITING(worker1)" << std::endl;
worker1Thread.join();
// This thread should have only paused for 2 seconds, wonder what
// happens?
std::cout << "WAITING(worker2Thread1 - Pause 2)" << std::endl;
worker2Thread2.join();
return 0;
}
/**
Results:
START(main)
WAITING(worker2Thread1 - Pause4)
START(worker2 - Pause 2)
START(worker2 - Pause 4)
START(worker1)
END(worker2 - Pause 2)
END(worker1)
END(worker2 - Pause 4)
WAITING(worker1)
WAITING(worker2Thread1 - Pause 2)
END(main)
As you can see, the join doesn't seem to care if the worker's end in a
random way. This is a neat API- unlike serialization where I had to keep
tweaking and tweaking to make it work right, the threading seemed to
work out-of-the-box. However this is only a really basic example. Next
time I will try to do something more interesting.
*/
/**
* REFERENCES
*
* [1] - http://www.codeproject.com/Articles/279053/How-to-get-started-using-Boost-threads
*
* [2] - http://www.drdobbs.com/cpp/whats-new-in-boost-threads/211600441
*
*/
Monday, April 29, 2013
Boost Serialization (Part 1)
/**
* The following is an example of how to use Boost Serialization.
* After googling around, I wasn't able to find many quality
* examples that did anything interesting. And those examples I
* did find were always either too complicated or incomplete.
* This example fully compiles, but make sure to link in
* boost_serialization.
*/
// You must include this header in order to perform serialization
// on shared pointers in boost.
#include <boost/serialization/shared_ptr.hpp>
#include <boost/shared_ptr.hpp>
// This class is required for use of the macro BOOST_CLASS_EXPORT_GUID
#include <boost/serialization/export.hpp>
// These includes are necessary to build an archive. The archive
// handles operator &, which provides a generic serialization
// operation; this can serialize in or out.
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <iostream>
// My favorite stream type: stringstream. It's so flexible and
// easy to use.
#include <sstream>
#include <string>
/**
* The character class, which is the class we are going to serialize.
*/
class Character
{
public:
Character() {}
virtual ~Character() {}
void setName(std::string const& n) { this->name = n; }
std::string const& getName() const { return this->name; }
void setAge(int a) { this->age = a; }
int getAge() const { return this->age; }
void dump()
{
std::cout << this->name << "(" << this->age << ")" << std::endl;
dump_();
}
/**
* This is a brilliant mechanism provided by boost. If the
* method is never used it is never generated. The Archive
* can provide either input or output serialization because
* we pass by reference. Also, the Archive as a template
* parameter allows us to serialize it in so many different
* ways (xml, text, binary, etc.). Like I said, simply
* brilliant design!
*/
template <typename Archive>
void serialize(Archive& ar, unsigned int const /* version */)
{
ar & this->name;
ar & this->age;
}
private:
std::string name;
int age;
private:
virtual void dump_() { }
};
/**
* This is a simple derived class that has a little more information
* to it. This illustrates how boost takes care of serialization of
* derived classes.
*/
class ComicCharacter : public Character
{
public:
ComicCharacter() : Character() { }
template <typename Archive>
void serialize(Archive& ar, unsigned int const version)
{
// This is the way that is outlined in the boost documentation
// for serializing the derived/base classes. Without this line,
// you will get: "unregistered void cast" errors.
boost::serialization::void_cast_register( static_cast<ComicCharacter*>(0)
, static_cast<Character*>(0));
Character::serialize(ar, version);
ar & this->strip;
}
void setStrip(std::string const& s) { this->strip = s; }
std::string const& getStrip() const { return this->strip; }
private:
std::string strip;
private:
void dump_() { std::cout << " from strip: " << this->strip << std::endl; }
};
BOOST_CLASS_EXPORT_GUID(Character, "Character")
BOOST_CLASS_EXPORT_GUID(ComicCharacter, "ComicCharacter")
/**
* Main entry point.
*/
int main(int, char**)
{
// Create a stream object to write our output archive to.
std::stringstream out;
// Create an archive. This passes an ostream object in as a place to
// output the results of the serialization.
boost::archive::text_oarchive oa(out);
// Create a shared pointer to a comic character AS a
// comic character.
boost::shared_ptr<ComicCharacter> comicCharacter(new ComicCharacter());
comicCharacter->setName("Charlie Brown");
comicCharacter->setAge(8); // [2]
comicCharacter->setStrip("Peanuts");
// When I write to the archive I want to make sure that the
// operation occurs on the BASE CLASS. This illustrates that the
// polymorphic data is also written to the stream.
boost::shared_ptr<Character> baseCharacter = comicCharacter;
// Finally now write the base object to the stream.
oa << baseCharacter;
std::string const contents = out.str();
std::cout << "ARCHIVE CONTENTS: " << contents << std::endl;
// Create a stream for input and populate it with the serialized
// contents.
std::stringstream in;
in << contents;
// Now we create a text input archive, again passing in an ostream
// object.
boost::archive::text_iarchive ia(in);
// Create an empty Character shared pointer. Note that character is
// the base class. We serialized a ComicCharacter into the archive-
// therefore we should get one out.
boost::shared_ptr<Character> mysteryCharacter;
// Here is the deserialization call.
ia >> mysteryCharacter;
// Finally to prove that we indeed received a Comic Character
// named Charlie Brown:
mysteryCharacter->dump();
}
#if 0
The following is the result of running this example:
ARCHIVE CONTENTS: 22 serialization::archive 10 0 1 2 14 ComicCharacter 1 0
0 13 Charlie Brown 8 7 Peanuts
Charlie Brown(8)
from strip: Peanuts
In conclusion, using the serialization capabilities in boost to serialize
polymorphic types is difficult. Hopefully this thorough example helps
folks muddling their way through.
#endif
* The following is an example of how to use Boost Serialization.
* After googling around, I wasn't able to find many quality
* examples that did anything interesting. And those examples I
* did find were always either too complicated or incomplete.
* This example fully compiles, but make sure to link in
* boost_serialization.
*/
// You must include this header in order to perform serialization
// on shared pointers in boost.
#include <boost/serialization/shared_ptr.hpp>
#include <boost/shared_ptr.hpp>
// This class is required for use of the macro BOOST_CLASS_EXPORT_GUID
#include <boost/serialization/export.hpp>
// These includes are necessary to build an archive. The archive
// handles operator &, which provides a generic serialization
// operation; this can serialize in or out.
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <iostream>
// My favorite stream type: stringstream. It's so flexible and
// easy to use.
#include <sstream>
#include <string>
/**
* The character class, which is the class we are going to serialize.
*/
class Character
{
public:
Character() {}
virtual ~Character() {}
void setName(std::string const& n) { this->name = n; }
std::string const& getName() const { return this->name; }
void setAge(int a) { this->age = a; }
int getAge() const { return this->age; }
void dump()
{
std::cout << this->name << "(" << this->age << ")" << std::endl;
dump_();
}
/**
* This is a brilliant mechanism provided by boost. If the
* method is never used it is never generated. The Archive
* can provide either input or output serialization because
* we pass by reference. Also, the Archive as a template
* parameter allows us to serialize it in so many different
* ways (xml, text, binary, etc.). Like I said, simply
* brilliant design!
*/
template <typename Archive>
void serialize(Archive& ar, unsigned int const /* version */)
{
ar & this->name;
ar & this->age;
}
private:
std::string name;
int age;
private:
virtual void dump_() { }
};
/**
* This is a simple derived class that has a little more information
* to it. This illustrates how boost takes care of serialization of
* derived classes.
*/
class ComicCharacter : public Character
{
public:
ComicCharacter() : Character() { }
template <typename Archive>
void serialize(Archive& ar, unsigned int const version)
{
// This is the way that is outlined in the boost documentation
// for serializing the derived/base classes. Without this line,
// you will get: "unregistered void cast" errors.
boost::serialization::void_cast_register( static_cast<ComicCharacter*>(0)
, static_cast<Character*>(0));
Character::serialize(ar, version);
ar & this->strip;
}
void setStrip(std::string const& s) { this->strip = s; }
std::string const& getStrip() const { return this->strip; }
private:
std::string strip;
private:
void dump_() { std::cout << " from strip: " << this->strip << std::endl; }
};
BOOST_CLASS_EXPORT_GUID(Character, "Character")
BOOST_CLASS_EXPORT_GUID(ComicCharacter, "ComicCharacter")
/**
* Main entry point.
*/
int main(int, char**)
{
// Create a stream object to write our output archive to.
std::stringstream out;
// Create an archive. This passes an ostream object in as a place to
// output the results of the serialization.
boost::archive::text_oarchive oa(out);
// Create a shared pointer to a comic character AS a
// comic character.
boost::shared_ptr<ComicCharacter> comicCharacter(new ComicCharacter());
comicCharacter->setName("Charlie Brown");
comicCharacter->setAge(8); // [2]
comicCharacter->setStrip("Peanuts");
// When I write to the archive I want to make sure that the
// operation occurs on the BASE CLASS. This illustrates that the
// polymorphic data is also written to the stream.
boost::shared_ptr<Character> baseCharacter = comicCharacter;
// Finally now write the base object to the stream.
oa << baseCharacter;
std::string const contents = out.str();
std::cout << "ARCHIVE CONTENTS: " << contents << std::endl;
// Create a stream for input and populate it with the serialized
// contents.
std::stringstream in;
in << contents;
// Now we create a text input archive, again passing in an ostream
// object.
boost::archive::text_iarchive ia(in);
// Create an empty Character shared pointer. Note that character is
// the base class. We serialized a ComicCharacter into the archive-
// therefore we should get one out.
boost::shared_ptr<Character> mysteryCharacter;
// Here is the deserialization call.
ia >> mysteryCharacter;
// Finally to prove that we indeed received a Comic Character
// named Charlie Brown:
mysteryCharacter->dump();
}
#if 0
The following is the result of running this example:
ARCHIVE CONTENTS: 22 serialization::archive 10 0 1 2 14 ComicCharacter 1 0
0 13 Charlie Brown 8 7 Peanuts
Charlie Brown(8)
from strip: Peanuts
In conclusion, using the serialization capabilities in boost to serialize
polymorphic types is difficult. Hopefully this thorough example helps
folks muddling their way through.
#endif
Saturday, April 27, 2013
Muddling Through Endians..
Awhile back I was looking for a job. I was scared to death of the possible C++ questions I would get so I studied the crap out of containers, templates, etc. The interviewer asked me on a scale of 1-10 what my skill level for C++ was. I answered 6.
Not to brag or anything but I answered almost every single C++ question correctly in the phone screen. Everything from multiple virtual inheritance to big O of the standard algorithms to the nature of sizeof. I was on a roll until I ran into too many endians. The interviewer asked, "How could you tell if your program is reading in big endian numbers or little endian numbers?" I vaguely knew what endian-ness was and tried to muddle through the question; I hate it when interviewee's try to muddle through something they don't know. Yet there I was, muddling with the best of em.
I think my answer had something to do with performing an AND operation on 1. Needless to say the answer I gave was wrong and the interview was over. Didn't matter how good my C++ knowledge was if I didn't know what Endian-ness was all about.
Big Endian vs. Little Endian [1]. Basically it has to do with byte order (like I had tried to muddle). Assume you put the number "1" into an int. How is the int layed out? Where is the byte with the 1 in it located?
If you are little endian:
Address : 0 1 2 3
Bytes : 00000001 00000000 00000000 00000000
Big Endian:
Address : 0 1 2 3
Bytes : 00000000 00000000 00000000 00000001
By default, x86 machines seem to use little-endian encoding. Therefore the simple way to test whether or not you have a big endian or little endian is:
int main(int argc, char** argv)
{
int a = 1;
char* c = reinterpret_cast<char*>(&a);
if (*c == 1)
{
std::cout << "Little Endian" << std::endl;
}
else
{
std::cout << "Big Endian" << std::endl;
}
}
I feel much more ignorant than when I started because this is a simple question. Guess it doesn't matter sometimes how much C++ you know if you don't know the basics. I checked the above example with other code online [2]. For more information about big/little endian see reference [3].
References
I Did a Google Search...
In an effort to start learning the finer points of C++ efficiency I did a Google search for C++ Efficiency. The first article I found was "The Top 20 C++ Tips of All Time" [1]. When I read stuff like this the information never sticks unless I try it out. Therefore for each item cover in this article I'll be trying it out with a compiler. The compiler I am using is g++ 4.7.2 which is the stock compiler for Ubuntu 12.10.
The first element I want to cover is Item #9 in the article above which deals with member alignment. I assumed that the compiler would find the most efficient layout for POD data in a class/struct. I was wrong. Try the following program yourself:
#include <iostream>
struct A
{
bool a;
int b;
bool c;
};
struct B
{
bool a;
bool c;
int b;
};
int main(int argc, char** argv)
{
std::cout << "sizeof(A) = " << sizeof(A) << std::endl;
std::cout << "sizeof(B) = " << sizeof(B) << std::endl;
}
The result you'll see is:
sizeof(A) = 12
sizeof(B) = 8
Wow. I wasn't expecting that! I guess the rule of thumb here is that each member needs to align on a four byte boundary. After each bool is inserts 3 padding bytes. Therefore, in A you'll see 3 padding bytes added to 1 byte for member a, then 4 bytes for b, then another byte for c followed by 3 more padding bytes.
struct A
{
bool a; //< 1 Byte for value, 3 Bytes Padding
int b; //< 4 Bytes
bool c; //< 1 Byte for value, 3 bytes for padding.
}; // SUM = 12 bytes.
The lesson of the story is to try and group smaller types together. The same example could be applied if you replaced bool with char or short. I think that's also an important consideration to make: what values can be placed into a variable? Can you put the value into an unsigned char, char, short, or unsigned short? At least on this compiler, the sizeof(short) is less than sizeof(int). Obviously this is not a guarantee across all platforms [2].
I've never tried writing anything that used bit-packing. Bit-packing is a technique where developers can specify how many bits to use in a data type. The example that follows was inspired by [3] but as the example progressed I asked a question, typed it into the compiler, then collected the answer. The first example I have is:
#include <iostream>
struct D
{
unsigned char a : 1; // sum = 1
unsigned char b : 2; // sum = 3
unsigned char c : 2; // sum = 5
unsigned char d : 3; // sum = 8
};
int main(int argc, char** argv)
{
std::cout << "sizeof(D) = " << sizeof(D) << std::endl;
}
The result ends up being:
sizeof(D) = 1
So why 1 and not 4? Try adding another member to the structure:
struct D
{
unsigned char a : 1; // sum = 1
unsigned char b : 2; // sum = 3
unsigned char c : 2; // sum = 5
unsigned char d : 3; // sum = 8
unsigned char e : 4; // sum = 12
};
sizeof(D) = 2
So why 2 now instead of 4? Member alignment is confusing. I understand why this is 2: we want to align on a byte boundary. Even though we only use 4 bits of the second byte in D, we need to align on a byte boundary.
Then I tried the following:
struct D
{
unsigned char a : 1; // sum = 1 bit
unsigned char b : 2; // sum = 3 bits
unsigned char c : 2; // sum = 5 bits
unsigned char d : 3; // sum = 8 bits
int e; // sum = (32 bits + 8 bits) == 40 bits == 5 bytes
};
References
[1] - http://www.devx.com/cplus/Article/16328
[2] - http://www.cplusplus.com/doc/tutorial/variables/
[3] - http://cplus.about.com/od/learningc/ss/lowlevel_10.htm
The first element I want to cover is Item #9 in the article above which deals with member alignment. I assumed that the compiler would find the most efficient layout for POD data in a class/struct. I was wrong. Try the following program yourself:
#include <iostream>
struct A
{
bool a;
int b;
bool c;
};
struct B
{
bool a;
bool c;
int b;
};
int main(int argc, char** argv)
{
std::cout << "sizeof(A) = " << sizeof(A) << std::endl;
std::cout << "sizeof(B) = " << sizeof(B) << std::endl;
}
The result you'll see is:
sizeof(A) = 12
sizeof(B) = 8
Wow. I wasn't expecting that! I guess the rule of thumb here is that each member needs to align on a four byte boundary. After each bool is inserts 3 padding bytes. Therefore, in A you'll see 3 padding bytes added to 1 byte for member a, then 4 bytes for b, then another byte for c followed by 3 more padding bytes.
struct A
{
bool a; //< 1 Byte for value, 3 Bytes Padding
int b; //< 4 Bytes
bool c; //< 1 Byte for value, 3 bytes for padding.
}; // SUM = 12 bytes.
struct B
{
bool a; //< 1 Byte for value.
bool c; //< 1 Byte for value, 2 more for Byte Padding
int b; //< 4 Bytes.
}; // SUM = 8 bytes
{
bool a; //< 1 Byte for value.
bool c; //< 1 Byte for value, 2 more for Byte Padding
int b; //< 4 Bytes.
}; // SUM = 8 bytes
I've never tried writing anything that used bit-packing. Bit-packing is a technique where developers can specify how many bits to use in a data type. The example that follows was inspired by [3] but as the example progressed I asked a question, typed it into the compiler, then collected the answer. The first example I have is:
#include <iostream>
struct D
{
unsigned char a : 1; // sum = 1
unsigned char b : 2; // sum = 3
unsigned char c : 2; // sum = 5
unsigned char d : 3; // sum = 8
};
int main(int argc, char** argv)
{
std::cout << "sizeof(D) = " << sizeof(D) << std::endl;
}
The result ends up being:
sizeof(D) = 1
So why 1 and not 4? Try adding another member to the structure:
struct D
{
unsigned char a : 1; // sum = 1
unsigned char b : 2; // sum = 3
unsigned char c : 2; // sum = 5
unsigned char d : 3; // sum = 8
unsigned char e : 4; // sum = 12
};
sizeof(D) = 2
So why 2 now instead of 4? Member alignment is confusing. I understand why this is 2: we want to align on a byte boundary. Even though we only use 4 bits of the second byte in D, we need to align on a byte boundary.
Then I tried the following:
struct D
{
unsigned char a : 1; // sum = 1 bit
unsigned char b : 2; // sum = 3 bits
unsigned char c : 2; // sum = 5 bits
unsigned char d : 3; // sum = 8 bits
int e; // sum = (32 bits + 8 bits) == 40 bits == 5 bytes
};
sizeof(D) = 8
That's what I would have expected. The first four characters are allocated into an unsigned char which is 1 Byte. Add 3 Bytes of padding. Then e will take up another 4 bytes. Ultimately the total is 5 bytes and if we are allocating based on a 4 byte boundary the total becomes 8 bytes.
That was a lot of work! It still doesn't answer the question as to why structures with bit packed unsigned char values only take up 1 byte. I tried the following:
struct E
{
unsigned char a;
};
sizeof(E) = 1
Then I tried:
struct F
{
unsigned short a;
};
sizeof(F) = 2;
Finally I tried:
struct G
{
unsigned short a;
unsigned short b;
unsigned short c;
};
sizeof(G) = 6;
I guess the theory I'm rolling with is that we always align based on the largest type in a structure. To test the theory, we should be able to make the largest size a short with two other char's:
struct H
{
unsigned char a;
unsigned short b;
unsigned char c;
};
Drum-Roll.... Size is 6! So then I tried something a bit more unusual (and probably warrants its own post):
#include <iostream>
struct I
{
char a;
long long b;
char c;
};
int main(int argc, char** argv)
{
std::cout << "sizeof(long long) = " << sizeof(long long) << std::endl;
std::cout << "sizeof(I) = " << sizeof(I) << std::endl;
}
This breaks the pattern:
sizeof(long long) = 8
sizeof(I) = 16
My guess that I will try to back up in another post is that a long long is actually just aligned using two long's (which would only make sense). This post has already gone too far- it just flies when you are having fun.
References
[1] - http://www.devx.com/cplus/Article/16328
[2] - http://www.cplusplus.com/doc/tutorial/variables/
[3] - http://cplus.about.com/od/learningc/ss/lowlevel_10.htm
Friday, April 26, 2013
I Can Do It Better...
At work I have to deal with QtCreator and building it from source. Please keep in mind this was Qt 2.5.0 that I was dealing with; so things may have changed. When looking through the code I was baffled by their design motivations. I guess given that it's an open source project the design is probably more of an afterthought. The goal of the project was simple, use their "model" to store QML so that we could use it in something that could run physical hardware. QML is an excellent language because of its declarative syntax it makes a world of sense to use when writing software that needs to map information onto hardware.
The more I dug my heels into the code the more I loathed it. Despised it. I thought, "What in the hell are you guys doing?!" There were even times that I ventured onto the message boards to attempt to get answers only to be ignored. I can't imagine why.
The final conclusion that was made at the time was that the Designer plugin didn't offer everything we needed to write our hardware descriptions. Extending the QML via plugins or static Declarative Item's was not going to work either because at the time (2.5.0) it was impossible for the QML Designer to display custom C++ Declarative Items. The only viable solution was to write a front end ourselves. Thus stating clearly, "I can do it better."
The only piece of QtCreator I borrowed was the QmlJS library which contains code that can parse QML Documents ( I definitely did not want to rewrite this! ). What does all of this have to do with Boost? In summary, the performance of my custom Model was not very good. It lags when trying to enter a couple hundred items. The underlying data structure I used was a sorted vector because I felt it gave me the best performance when looking up items. Find functionality was key to the model and an efficient solution would mean that the Model should be efficient. Sorted vector lookups take O(log n) time; if using equal_range the insertion takes O(log n).
The sorted vector isn't working. Today I'd like to demonstrate how to replace the sorted std::vector with a boost::unordered_map then remark about what the difference in performance feels like- it could be nothing at all. In order to provide a comparison I'm going to use C-style macro definitions to optionally enable the unordered map:
#define USE_UNORDERED
Let's say that my model looks like this:
struct Item
{
std::string id;
std::string description;
};
bool ModelComparator( boost::shared_ptr<Item> lhs
, boost::shared_ptr<Item> rhs )
{
return (lhs->id < rhs->id);
}
class Model : public boost::noncopyable
{
public:
typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
typedef ItemsContainer::iterator ItemsIterator;
public:
Model() { }
void AddItem(boost::shared_ptr<Item> item)
{
ItemIterator begin;
ItemIterator end;
Find( item->id, begin, end );
this->items.insert(end, item);
}
boost::shared_ptr<Item> Get(std::string const& itemKey)
{
ItemIterator begin;
ItemIterator end;
Find( itemKey, begin, end );
return *begin;
}
ItemsContainer const& GetContainer() const
{
return this->items;
}
private:
ItemsContainer items;
private:
/**
* The key method in this model for performing O(log n) lookups
* using equal_range.
*/
void Find( std::string const& id
, ItemsIterator& begin
, ItemsIterator& end )
{
if (this->items.empty())
{
begin = this->items.begin();
end = this->items.end();
}
else
{
boost::shared_ptr<Item> item(new Item());
item->id = id;
ItemRange range = std::equal_range( this->items.begin()
, this->items.end()
, item
, ModelComparator );
}
}
};
This is the original code to perform lookups. I even threw in a general "GetContainer" method because this might throw a wrench into what we are trying to accomplish. The first task we have is to set up boost::hash and std::equal_to template specializations [1]:
#if defined USE_UNORDERED
namespace std
{
template <>
struct equal_to
{
bool operator()( boost::shared_ptr<Item> const& lhs
, boost::shared_ptr<Item> const& rhs) const
{
return (lhs->id == rhs->id);
}
};
} // End of std.
namespace boost
{
template <>
struct hash
{
size_t operator()(boost::shared_ptr<Item> const& item) const
{
// Just use the boost::hash with a string, because that's what we
// need.
return boost::hash<std::string>(item->id)
}
};
} // End of boost.
#endif
That sets up the basic hashing mechanism for the items. Now the class member also needs to be redefined if we are using USE_UNORDERED:
typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
That's simple enough. Note that the example will fail now because the Add/Get methods do not work with equal_range. Let's fix that by modifying the Add operation:
void AddItem(boost::shared_ptr<Item> item)
{
ItemIterator begin;
ItemIterator end;
Find( item->id, begin, end );
this->items.insert(end, item);
}
#define USE_UNORDERED
struct Item
{
std::string id;
std::string description;
};
namespace std
{
template <>
struct equal_to
{
bool operator()( boost::shared_ptr<Item> const& lhs
, boost::shared_ptr<Item> const& rhs) const
{
return (lhs->id == rhs->id);
}
};
} // End of std.
namespace boost
{
template <>
struct hash
{
size_t operator()(boost::shared_ptr<Item> const& item) const
{
// Just use the boost::hash with a string, because that's what we
// need.
return boost::hash<std::string>(item->id)
}
};
} // End of boost.
#endif
bool ModelComparator( boost::shared_ptr<Item> lhs
, boost::shared_ptr<Item> rhs )
{
return (lhs->id < rhs->id);
}
class Model : public boost::noncopyable
{
public:
public:
Model() { }
void AddItem(boost::shared_ptr<Item> item)
{
#if defined USE_UNORDERED
this->items.insert(item);
#else
ItemIterator begin;
ItemIterator end;
Find( item->id, begin, end );
this->items.insert(end, item);
#endif
}
boost::shared_ptr<Item> Get(std::string const& itemKey)
{
#if defined USE_UNORDERED
boost::unordered_set<boost::shared_ptr>::iterator iter =
this->items.find(itemKey);
#else
ItemIterator begin;
ItemIterator end;
Find( itemKey, begin, end );
return *begin;
#endif
ItemsContainer const& GetContainer() const
{
return this->items;
}
private:
private:
#if !defined USE_UNORDERED /**
* The key method in this model for performing O(log n) lookups
* using equal_range.
*/
void Find( std::string const& id
, ItemsIterator& begin
, ItemsIterator& end )
{
if (this->items.empty())
{
begin = this->items.begin();
end = this->items.end();
}
else
{
boost::shared_ptr<Item> item(new Item());
item->id = id;
ItemRange range = std::equal_range( this->items.begin()
, this->items.end()
, item
, ModelComparator );
}
}
#endif
};
References
[1] - http://jrscppboostnotes.blogspot.com/2013/04/unordered-set.html
The more I dug my heels into the code the more I loathed it. Despised it. I thought, "What in the hell are you guys doing?!" There were even times that I ventured onto the message boards to attempt to get answers only to be ignored. I can't imagine why.
The final conclusion that was made at the time was that the Designer plugin didn't offer everything we needed to write our hardware descriptions. Extending the QML via plugins or static Declarative Item's was not going to work either because at the time (2.5.0) it was impossible for the QML Designer to display custom C++ Declarative Items. The only viable solution was to write a front end ourselves. Thus stating clearly, "I can do it better."
The only piece of QtCreator I borrowed was the QmlJS library which contains code that can parse QML Documents ( I definitely did not want to rewrite this! ). What does all of this have to do with Boost? In summary, the performance of my custom Model was not very good. It lags when trying to enter a couple hundred items. The underlying data structure I used was a sorted vector because I felt it gave me the best performance when looking up items. Find functionality was key to the model and an efficient solution would mean that the Model should be efficient. Sorted vector lookups take O(log n) time; if using equal_range the insertion takes O(log n).
The sorted vector isn't working. Today I'd like to demonstrate how to replace the sorted std::vector with a boost::unordered_map then remark about what the difference in performance feels like- it could be nothing at all. In order to provide a comparison I'm going to use C-style macro definitions to optionally enable the unordered map:
#define USE_UNORDERED
Let's say that my model looks like this:
struct Item
{
std::string id;
std::string description;
};
bool ModelComparator( boost::shared_ptr<Item> lhs
, boost::shared_ptr<Item> rhs )
{
return (lhs->id < rhs->id);
}
class Model : public boost::noncopyable
{
public:
typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
typedef ItemsContainer::iterator ItemsIterator;
typedef std::pair<ItemsIterator, ItemsIterator> ItemRange;
public:
Model() { }
void AddItem(boost::shared_ptr<Item> item)
{
ItemIterator begin;
ItemIterator end;
Find( item->id, begin, end );
this->items.insert(end, item);
}
boost::shared_ptr<Item> Get(std::string const& itemKey)
{
ItemIterator begin;
ItemIterator end;
Find( itemKey, begin, end );
return *begin;
}
ItemsContainer const& GetContainer() const
{
return this->items;
}
private:
ItemsContainer items;
private:
/**
* The key method in this model for performing O(log n) lookups
* using equal_range.
*/
void Find( std::string const& id
, ItemsIterator& begin
, ItemsIterator& end )
{
if (this->items.empty())
{
begin = this->items.begin();
end = this->items.end();
}
else
{
boost::shared_ptr<Item> item(new Item());
item->id = id;
ItemRange range = std::equal_range( this->items.begin()
, this->items.end()
, item
, ModelComparator );
}
}
};
This is the original code to perform lookups. I even threw in a general "GetContainer" method because this might throw a wrench into what we are trying to accomplish. The first task we have is to set up boost::hash and std::equal_to template specializations [1]:
#if defined USE_UNORDERED
namespace std
{
template <>
struct equal_to
{
bool operator()( boost::shared_ptr<Item> const& lhs
, boost::shared_ptr<Item> const& rhs) const
{
return (lhs->id == rhs->id);
}
};
} // End of std.
namespace boost
{
template <>
struct hash
{
size_t operator()(boost::shared_ptr<Item> const& item) const
{
// Just use the boost::hash with a string, because that's what we
// need.
return boost::hash<std::string>(item->id)
}
};
} // End of boost.
#endif
That sets up the basic hashing mechanism for the items. Now the class member also needs to be redefined if we are using USE_UNORDERED:
typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
becomes
#if defined USE_UNORDERED
typedef boost::unordered_set<boost::shared_ptr<Item> > ItemsContainer;
#else
typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
#endif
That's simple enough. Note that the example will fail now because the Add/Get methods do not work with equal_range. Let's fix that by modifying the Add operation:
void AddItem(boost::shared_ptr<Item> item)
{
ItemIterator begin;
ItemIterator end;
Find( item->id, begin, end );
this->items.insert(end, item);
}
becomes:
void AddItem(boost::shared_ptr<Item> item)
{
#if defined USE_UNORDERED
this->items.insert(item);
#else
ItemIterator begin;
ItemIterator end;
Find( item->id, begin, end );
this->items.insert(end, item);
#endif
}
{
#if defined USE_UNORDERED
this->items.insert(item);
#else
ItemIterator begin;
ItemIterator end;
Find( item->id, begin, end );
this->items.insert(end, item);
#endif
}
Notice that the code for unordered_set is much simpler and easier to read. The macros are doing a great job in making the code unreadable; my advice is to take them out once you've proven/disproven there is a performance increase. The second to last item is the Get operation:
boost::shared_ptr<Item> Get(std::string const& itemKey)
{
ItemIterator begin;
ItemIterator end;
Find( itemKey, begin, end );
return *begin;
}
{
ItemIterator begin;
ItemIterator end;
Find( itemKey, begin, end );
return *begin;
}
becomes:
boost::shared_ptr<Item> Get(std::string const& itemKey)
{
#if defined USE_UNORDERED
boost::unordered_set<boost::shared_ptr>::iterator iter =
this->items.find(itemKey);
#else
ItemIterator begin;
ItemIterator end;
Find( itemKey, begin, end );
return *begin;
#endif
}
{
#if defined USE_UNORDERED
boost::unordered_set<boost::shared_ptr>::iterator iter =
this->items.find(itemKey);
#else
ItemIterator begin;
ItemIterator end;
Find( itemKey, begin, end );
return *begin;
#endif
}
The last part of the example that needs modification is the GetContainer() method. There are two solutions for refactoring the GetContainer() method:
1) Leave it alone and fix compiler errors. Just replace use of operator[] with equivalent iterator operations. This is the solution I used at work because the second solution while probably being more correct would have required more refactoring.
2) Replace GetContainer() with begin() and end() methods that return iterators to the underlying container. This is the preferable solution! However it requires more work. It's a much more CPlusPlusee solution.
I'd like to say that the story above resulted in a happy ending and that after all of the refactoring the code was so fast that there were no hiccups. The good news is that it made the model MUCH faster; but the bad news was that it wasn't as fast as it needed to be. There are always other opportunities for optimization! Also please remember if you find the better solution get rid of the macros! Here is the final code from above:
1) Leave it alone and fix compiler errors. Just replace use of operator[] with equivalent iterator operations. This is the solution I used at work because the second solution while probably being more correct would have required more refactoring.
2) Replace GetContainer() with begin() and end() methods that return iterators to the underlying container. This is the preferable solution! However it requires more work. It's a much more CPlusPlusee solution.
I'd like to say that the story above resulted in a happy ending and that after all of the refactoring the code was so fast that there were no hiccups. The good news is that it made the model MUCH faster; but the bad news was that it wasn't as fast as it needed to be. There are always other opportunities for optimization! Also please remember if you find the better solution get rid of the macros! Here is the final code from above:
#define USE_UNORDERED
struct Item
{
std::string id;
std::string description;
};
namespace std
{
template <>
struct equal_to
{
bool operator()( boost::shared_ptr<Item> const& lhs
, boost::shared_ptr<Item> const& rhs) const
{
return (lhs->id == rhs->id);
}
};
} // End of std.
namespace boost
{
template <>
struct hash
{
size_t operator()(boost::shared_ptr<Item> const& item) const
{
// Just use the boost::hash with a string, because that's what we
// need.
return boost::hash<std::string>(item->id)
}
};
} // End of boost.
#endif
bool ModelComparator( boost::shared_ptr<Item> lhs
, boost::shared_ptr<Item> rhs )
{
return (lhs->id < rhs->id);
}
class Model : public boost::noncopyable
{
public:
#if defined USE_UNORDERED
typedef boost::unordered_set<boost::shared_ptr<Item> > ItemsContainer;
#else
typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
#endif
typedef ItemsContainer::iterator ItemsIterator; typedef std::pair<ItemsIterator, ItemsIterator> ItemRange;
public:
Model() { }
void AddItem(boost::shared_ptr<Item> item)
{
#if defined USE_UNORDERED
this->items.insert(item);
#else
ItemIterator begin;
ItemIterator end;
Find( item->id, begin, end );
this->items.insert(end, item);
#endif
}
boost::shared_ptr<Item> Get(std::string const& itemKey)
{
#if defined USE_UNORDERED
boost::unordered_set<boost::shared_ptr>::iterator iter =
this->items.find(itemKey);
#else
ItemIterator begin;
ItemIterator end;
Find( itemKey, begin, end );
return *begin;
#endif
ItemsContainer const& GetContainer() const
{
return this->items;
}
private:
#if defined USE_UNORDERED
typedef boost::unordered_set<boost::shared_ptr<Item> > ItemsContainer;
#else
typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
#endif
private:
#if !defined USE_UNORDERED /**
* The key method in this model for performing O(log n) lookups
* using equal_range.
*/
void Find( std::string const& id
, ItemsIterator& begin
, ItemsIterator& end )
{
if (this->items.empty())
{
begin = this->items.begin();
end = this->items.end();
}
else
{
boost::shared_ptr<Item> item(new Item());
item->id = id;
ItemRange range = std::equal_range( this->items.begin()
, this->items.end()
, item
, ModelComparator );
}
}
#endif
};
References
[1] - http://jrscppboostnotes.blogspot.com/2013/04/unordered-set.html
Wednesday, April 24, 2013
Unordered Set
The first container in Boost I want to take a look at is unordered_set [1]. The unordered_map is also very similar. In either case the goal is simple: put together an unordered_set that contains classes other than strings.
Let's start by creating a main function (complete with includes for using unordered_set):
// boost dependencies
#include <boost/unordered_set.hpp>
// std dependencies
#include <string>
/**
* A class that has a couple of members, just a simple number
* and string. In a real world class you would probably put
* more "stuff" in here.
*/
class Item
{
public:
Item(int importanceIn, std::string const& descriptionIn)
: importance(importanceIn)
, description(descriptionIn)
{
}
int importance;
std::string description;
};
std::ostream& operator<<(std::ostream& out, Item const& i)
{
out << "ITEM(" << i.importance << "," << i.description << ");
return out;
}
int main(int argc, char** argv)
{
boost::unordered_set<Item> items;
return 0;
}
Before you type all of this in, know that the example will NOT compile as it is! The reason it will not compile should be apparent in the definition of unordered set [1]:
namespace boost {
template <
class Key,
class Hash =boost::hash
<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key> >
classunordered_set
;
The second template argument to the unordered_set is assumed to be boost::hash<Item> in our case. Currently that specialization doesn't exist. Therefore we must provide a specialization to appease our compiler. To specialize, simply refer to the original template definition of boost::hash [2]. Our specialization is simply going to use the "importance" member variable as our hash value:
namespace boost
{
template <>
struct hash<Item>
{
std::size_t operator()(Item const& item) const
{
// SHOULD USE boost::numeric_cast INSTEAD!
return static_cast<size_t>(item.importance);
}
};
}
This will please our compiler; but it's not the last of our problems! Currently we still do not have a specialization for std::equal_to [3]:
namespace std //< You aren't supposed to do this!
{
template <>
struct equal_to<Item>
{
bool operator()(Item const& lhs, Item const& rhs) const
{
return (lhs.importance == rhs.importance) && (lhs.description == rhs.description);
}
};
}
Because we added an equal_to specialization we must also include <functional>. Our implementation now looks like this:
// boost dependencies
#include <boost/unordered_set.hpp>
// std dependencies
#include <functional>
#include <iostream>
#include <ostream>
#include <string>
/**
* A class that has a couple of members, just a simple number
* and string. In a real world class you would probably put
* more "stuff" in here.
*/
class Item
{
public:
Item(int importanceIn, std::string const& descriptionIn)
: importance(importanceIn)
, description(descriptionIn)
{
}
int importance;
std::string description;
};
std::ostream& operator<<(std::ostream& out, Item const& i)
{
out << "ITEM(" << i.importance << "," << i.description << ");
return out;
}
//-----------------------------
// Our Specialization for Hash!
//-----------------------------
namespace boost
{
template <>
struct hash<Item>
{
std::size_t operator()(Item const& item) const
{
// SHOULD USE boost::numeric_cast INSTEAD!
return static_cast<size_t>(item.importance);
}
};
}
//---------------------------------
// Our Specialization for equal_to!
//---------------------------------
namespace std //< You aren't supposed to do this!
{
template <>
struct equal_to<Item>
{
bool operator()(Item const& lhs, Item const& rhs) const
{
return (lhs.importance == rhs.importance) && (lhs.description == rhs.description);
}
};
}
int main(int argc, char** argv)
{
typedef boost::unordered_set<Item> ItemsContainer;
typedef ItemsContainer::iterator Iterator;
ItemsContainer items;
items.insert(Item(5, "Fiver"));
items.insert(Item(15, "Fifteen"));
items.insert(Item(15, "Second Fifteen!"));
for(Iterator iter=items.begin(); iter != items.end(); ++iter)
{
std::cout << *iter << std::endl;
}
return 0;
}
I found the use of boost::unordered_set really simple and easy! I think what I like the best about this is that it is a header only class; this means we don't need to do any linking to use! All that the developer has to do is implement their own boost::hash and std::equal_to and they are good to go. I hope later classes are as easy to use. The full listing of unordered_set is at [4].
References
Subscribe to:
Posts (Atom)