Thursday, January 1, 2015

Standard Remove Functions

The subject of this blog entry is removing elements from containers in C++. When it comes to removing elements I either 1) have to look up how to do removals or 2) I handwrite the loop and feel guilty all day. This entry focuses most specifically on the std::remove family of functions and what they actually do.

What I Know

I know that in a standard vector, erase is the weapon of choice for removing elements from a container. Erase is of course only one part of the equation; the standard library also has a function that does removals. At this point I know of the "erase-remove" idiom [1] but am unsure of how to apply it.

In the case of associative containers that return back constant iterators the erase-remove idiom does not apply (see [1] for details). It makes sense because std::remove actually reorders the elements in the container.

I want to make this next assertion in bold: remove/remove_if do NOT resize the containers and do not necessarily erase values from a container. That statement is the key to understanding what remove actually does.

The Remove Part of Erase-Remove

Let's start with an example program to clearly explain what's going on with std::remove:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main(int argc, char** argv)
{
    std::vector<int> v = { 8, 3, 5, 7, 9, 2, 4, 1 };
    
    std::copy( v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    std::remove(v.begin(), v.end(), 9);

    std::copy( v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
}

What I thought would happen would be that 9 moves to the end of the container. So the output I thought to expect was:

8 3 5 7 9 2 4 1
8 3 5 7 2 4 1 9

This is NOT what I observed when I ran the program. Instead what I got was:

8 3 5 7 9 2 4 1
8 3 5 7 2 4 1 1

That makes more sense. Why copy 9 to the end of the container if it's just going to be removed anyways? It's an efficiency thing. If that's the case, why not just have remove simply get rid of the elements? Again- it's more efficient to remove the elements then erase all of items in bulk at the end of a container.

I guess the moral of this part of the entry is that don't count on remove keeping the value of whatever it is you are removing- as is illustrated with this simple program it does the remove (as advertised) but does NOT move it to the end of the container! This may matter if you are using a container of shared pointers.

Remove-If

Often times when I'm writing code to remove elements from a container it's not as simple as "remove this exact item from the container". Most of the time it involves some sort of predicate case. That's where remove_if comes into play. With the addition of lambda expressions it's even more addictive to use:


#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main(int argc, char** argv)
{
    std::vector<int> v = { 8, 3, 5, 7, 9, 2, 4, 1 };
    
    std::copy( v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    std::remove(v.begin(), v.end(),[](int n) -> bool { return (n % 2 == 1); });

    std::copy( v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
}
The example above will remove any odd number from the container. From the remove example above, I'd expect the output of my program now to be:

8 3 5 7 9 2 4 1
8 2 4 7 9 1 1 1

The reason I think this is the container is because it moves all of the elements with the non-matching criteria (i.e. the elements to NOT be removed) to the front of the container. Anything else in the back is irrelevant. Therefore I move all the non-matches to the front in order (8, 2, 4). At this point I have no idea what is at the end of the container. The real answer is actually:

8 3 5 7 9 2 4 1
8 2 4 7 9 2 4 1
This makes sense. Why would it put 1's at the end? It just left the good values in because remove_if doesn't give a flying whatever as to what is removed.

Remove Copy

Okay, what the hell is this one? The remove_copy function is weird because it is meant to be used differently than remove/remove_if. It is so different than remove/remove_if I've made the subheading red. Instead of removing from within a container it copies the preserved values to an output iterator. Here's an example:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>
#include <vector>

int main(int argc, char** argv)
{
    std::vector<int> v = { 8, 3, 5, 7, 9, 2, 4, 1 };
    std::list<int> outputs;

    std::copy( v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    std::remove_copy(v.begin(), v.end(), std::back_inserter(outputs), 5);

    std::copy( v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    std::copy( outputs.begin(), outputs.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
}
One more time let's play the game of "Expected Outputs". I would have expected:

8 3 5 7 9 2 4 1
8 3 7 9 2 4 1 1
8 3 7 9 2 4 1

Once again I was wrong- the original vector is no longer modified! So the correct answer is:

8 3 5 7 9 2 4 1
8 3 5 7 9 2 4 1
8 3 7 9 2 4 1

Wow. The other thing that this example is missing is that the iterator that is returned from the remove_copy does not actually point at the container that is passed in. This makes sense because the container passed in is never altered. Instead the iterator returned is an output iterator that points at the end of the output container [2]. Did not see that coming.

Almost wish I could have attended the meeting where they decided to add remove_copy. It's a strange addition and breaks with how remove and remove_if are used. It's got great utility and I'd almost prefer using remove_copy in most code that doesn't have to go fast since it is a bit more understandable (i.e. doesn't need an additional erase step).

REFERENCES

[1] - http://en.wikipedia.org/wiki/Erase-remove_idiom

[2] - http://www.cplusplus.com/reference/algorithm/remove_copy/



No comments:

Post a Comment