Friday, April 26, 2013

I Can Do It Better...

At work I have to deal with QtCreator and building it from source. Please keep in mind this was Qt 2.5.0 that I was dealing with; so things may have changed. When looking through the code I was baffled by their design motivations. I guess given that it's an open source project the design is probably more of an afterthought. The goal of the project was simple, use their "model" to store QML so that we could use it in something that could run physical hardware. QML is an excellent language because of its declarative syntax it makes a world of sense to use when writing software that needs to map information onto hardware.

The more I dug my heels into the code the more I loathed it. Despised it. I thought, "What in the hell are you guys doing?!" There were even times that I ventured onto the message boards to attempt to get answers only to be ignored. I can't imagine why.

The final conclusion that was made at the time was that the Designer plugin didn't offer everything we needed to write our hardware descriptions. Extending the QML via plugins or static Declarative Item's was not going to work either because at the time (2.5.0) it was impossible for the QML Designer to display custom C++ Declarative Items. The only viable solution was to write a front end ourselves. Thus stating clearly, "I can do it better."

The only piece of QtCreator I borrowed was the QmlJS library which contains code that can parse QML Documents ( I definitely did not want to rewrite this! ). What does all of this have to do with Boost? In summary, the performance of my custom Model was not very good. It lags when trying to enter a couple hundred items. The underlying data structure I used was a sorted vector because I felt it gave me the best performance when looking up items. Find functionality was key to the model and an efficient solution would mean that the Model should be efficient. Sorted vector lookups take O(log n) time; if using equal_range the insertion takes O(log n).

The sorted vector isn't working. Today I'd like to demonstrate how to replace the sorted std::vector with a boost::unordered_map then remark about what the difference in performance feels like- it could be nothing at all. In order to provide a comparison I'm going to use C-style macro definitions to optionally enable the unordered map:

#define USE_UNORDERED

Let's say that my model looks like this:

struct Item
{
    std::string id;
    std::string description;
};


bool ModelComparator( boost::shared_ptr<Item> lhs
                    , boost::shared_ptr<Item> rhs )
{
    return (lhs->id < rhs->id);
}


class Model : public boost::noncopyable
{
    public:

        typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
        typedef ItemsContainer::iterator ItemsIterator;
        typedef std::pair<ItemsIterator, ItemsIterator> ItemRange;


    public:
        Model() { }
        void AddItem(boost::shared_ptr<Item> item)
        {
            ItemIterator begin;
            ItemIterator end;
            Find( item->id, begin, end );
            this->items.insert(end, item);
        }

        boost::shared_ptr<Item> Get(std::string const& itemKey)
        {
            ItemIterator begin;
            ItemIterator end;
            Find( itemKey, begin, end );
            return *begin;
        }

        ItemsContainer const& GetContainer() const
        {
            return this->items;
        }

    private:
        ItemsContainer items;


    private:

        /**
         * The key method in this model for performing O(log n) lookups
         * using equal_range.
         */
        void Find( std::string const& id
                 , ItemsIterator& begin
                 , ItemsIterator& end )
        {
            if (this->items.empty()) 
            {
                begin = this->items.begin();
                end = this->items.end();
            }
            else
            {
                 boost::shared_ptr<Item> item(new Item());
                 item->id = id;
                 ItemRange range = std::equal_range( this->items.begin()
                                                   , this->items.end()
                                                   , item
                                                   , ModelComparator );
            }
        }
};


This is the original code to perform lookups. I even threw in a general "GetContainer" method because this might throw a wrench into what we are trying to accomplish. The first task we have is to set up boost::hash and std::equal_to template specializations [1]:

#if defined USE_UNORDERED

namespace std
{

template <>
struct equal_to 
{
    bool operator()( boost::shared_ptr<Item> const& lhs
                   , boost::shared_ptr<Item> const& rhs) const
    {
        return (lhs->id == rhs->id);
    }
};

} // End of std.

namespace boost
{

template <>
struct hash 
{
    size_t operator()(boost::shared_ptr<Item> const& item) const
    {
        // Just use the boost::hash with a string, because that's what we 
        // need.
        return boost::hash<std::string>(item->id)
    }
};

} // End of boost.

#endif

That sets up the basic hashing mechanism for the items. Now the class member also needs to be redefined if we are using USE_UNORDERED:



        typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;

becomes

#if defined USE_UNORDERED
        typedef boost::unordered_set<boost::shared_ptr<Item> > ItemsContainer;
#else
        typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
#endif



That's simple enough. Note that the example will fail now because the Add/Get methods do not work with equal_range. Let's fix that by modifying the Add operation:


        void AddItem(boost::shared_ptr<Item> item)
        {
            ItemIterator begin;
            ItemIterator end;
            Find( item->id, begin, end );
            this->items.insert(end, item);
        }

becomes:

        void AddItem(boost::shared_ptr<Item> item)
        {
#if defined USE_UNORDERED
            this->items.insert(item);
#else
            ItemIterator begin;
            ItemIterator end;
            Find( item->id, begin, end );
            this->items.insert(end, item);
#endif
        }


Notice that the code for unordered_set is much simpler and easier to read. The macros are doing a great job in making the code unreadable; my advice is to take them out once you've proven/disproven there is a performance increase. The second to last item is the Get operation:

        boost::shared_ptr<Item> Get(std::string const& itemKey)
        {
            ItemIterator begin;
            ItemIterator end;
            Find( itemKey, begin, end );
            return *begin;
        }

becomes:

        boost::shared_ptr<Item> Get(std::string const& itemKey)
        {
#if defined USE_UNORDERED
            boost::unordered_set<boost::shared_ptr>::iterator iter = 
                this->items.find(itemKey);
#else
            ItemIterator begin;
            ItemIterator end;
            Find( itemKey, begin, end );
            return *begin;
#endif
        }

The last part of the example that needs modification is the GetContainer() method. There are two solutions for refactoring the GetContainer() method:

1) Leave it alone and fix compiler errors. Just replace use of operator[] with equivalent iterator operations. This is the solution I used at work because the second solution while probably being more correct would have required more refactoring.

2) Replace GetContainer() with begin() and end() methods that return iterators to the underlying container. This is the preferable solution! However it requires more work. It's a much more CPlusPlusee solution. 

I'd like to say that the story above resulted in a happy ending and that after all of the refactoring the code was so fast that there were no hiccups. The good news is that it made the model MUCH faster; but the bad news was that it wasn't as fast as it needed to be. There are always other opportunities for optimization! Also please remember if you find the better solution get rid of the macros! Here is the final code from above:


#define USE_UNORDERED

struct Item
{
    std::string id;
    std::string description;
};



namespace std
{

template <>
struct equal_to 
{
    bool operator()( boost::shared_ptr<Item> const& lhs
                   , boost::shared_ptr<Item> const& rhs) const
    {
        return (lhs->id == rhs->id);
    }
};

} // End of std.

namespace boost
{

template <>
struct hash 
{
    size_t operator()(boost::shared_ptr<Item> const& item) const
    {
        // Just use the boost::hash with a string, because that's what we 
        // need.
        return boost::hash<std::string>(item->id)
    }
};

} // End of boost.

#endif


bool ModelComparator( boost::shared_ptr<Item> lhs
                    , boost::shared_ptr<Item> rhs )
{
    return (lhs->id < rhs->id);
}


class Model : public boost::noncopyable
{
    public:

#if defined USE_UNORDERED
        typedef boost::unordered_set<boost::shared_ptr<Item> > ItemsContainer;
#else
        typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
#endif
        typedef ItemsContainer::iterator ItemsIterator;
        typedef std::pair<ItemsIterator, ItemsIterator> ItemRange;


    public:
        Model() { }


        void AddItem(boost::shared_ptr<Item> item)
        {
#if defined USE_UNORDERED
            this->items.insert(item);
#else
            ItemIterator begin;
            ItemIterator end;
            Find( item->id, begin, end );
            this->items.insert(end, item);
#endif
        }



        boost::shared_ptr<Item> Get(std::string const& itemKey)
        {
#if defined USE_UNORDERED
            boost::unordered_set<boost::shared_ptr>::iterator iter = 
                this->items.find(itemKey);
#else
            ItemIterator begin;
            ItemIterator end;
            Find( itemKey, begin, end );
            return *begin;
#endif


        ItemsContainer const& GetContainer() const
        {
            return this->items;
        }

    private:

#if defined USE_UNORDERED
        typedef boost::unordered_set<boost::shared_ptr<Item> > ItemsContainer;
#else
        typedef std::vector<boost::shared_ptr<Item> > ItemsContainer;
#endif




    private:

#if !defined USE_UNORDERED        /**
         * The key method in this model for performing O(log n) lookups
         * using equal_range.
         */
        void Find( std::string const& id
                 , ItemsIterator& begin
                 , ItemsIterator& end )
        {
            if (this->items.empty()) 
            {
                begin = this->items.begin();
                end = this->items.end();
            }
            else
            {
                 boost::shared_ptr<Item> item(new Item());
                 item->id = id;
                 ItemRange range = std::equal_range( this->items.begin()
                                                   , this->items.end()
                                                   , item
                                                   , ModelComparator );
            }
        }
#endif

};



References
[1] - http://jrscppboostnotes.blogspot.com/2013/04/unordered-set.html


No comments:

Post a Comment