Want speed? Don’t (always) pass by value.

As the title indicates, this post is motivated by Dave Abrahams’ influential blog post “Want speed? Pass by Value”. In that post, Abrahams makes the case that often passing and returning by value is cheaper than passing by reference, or trying to engineer some means to return pointers or references. This is due to copy elision, which has been a part of the language since standardization, and move semantics, which were introduced with C++11. Often, passing and returning by value results in code that is both clearer and safer. This in itself is a compelling argument, speed considerations aside.

This trivial example illustrates this point:

void create_sequence(std::vector<int>& output);

Here, the role of create_sequence is to set the contents of a vector to a secret sequence. The author passes a vector by reference because passing by reference is cheap. However,

  • The caller has to instantiate a vector anyway.
  • The author has to document what happens to the “output” parameter. Is it cleared? Is the sequence appended?
  • The function’s implementation cannot be made safe to use in a multi-threaded environment.

It would seem much clearer to do this:

std::vector<int> create_sequence();

Now there is no need to document, thread-safe implementations can be provided, and, provided the implementation follows some simple rules, the return value is more than likely to be elided. Similar arguments can be made for passing parameters by value, with and without move semantics.
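As a rough sketch of what "some simple rules" can look like in practice: build the result in a single named local object and return that same object from every return statement. The contents of the sequence below are an assumption made purely for illustration (the first five squares); the post does not say what the secret sequence is.

```cpp
#include <vector>

// Hypothetical implementation: the "secret sequence" is assumed here to be
// the first five squares, for illustration only.
std::vector<int> create_sequence()
{
  std::vector<int> result;   // the single named local that gets returned
  result.reserve(5);
  for (int i = 1; i <= 5; ++i)
    result.push_back(i * i);
  return result;             // names a local object: a candidate for elision
}
```

A caller simply writes std::vector<int> v = create_sequence(); and there is no question about whether an existing vector is cleared or appended to.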

So far so good. This kind of reasoning has been extended to all kinds of applications, for instance, binary arithmetic operators. This addition operator implementation might look familiar to you:

Foo operator+(const Foo& lhs, const Foo& rhs)
{
  Foo result = lhs;
  result += rhs;
  return result;
}

This implementation leverages the member operator+=. There’s nothing much wrong with it. But, “Want speed, pass by value” has inspired the following implementation, which is increasingly common and is, at the time of writing, the “canonical” example shown on the stackoverflow c++ operator overloading page:

Foo operator+(Foo lhs, const Foo& rhs)
{
  lhs += rhs;
  return lhs;
}

What is going on here? Well, the operator+= is leveraged as in the first example, but this implementation takes advantage of the fact that a copy of one of the operands is required. Wanting speed, we pass by value, hoping to benefit from copy elision, or at least from move semantics. We also have one less line of code, for no apparent loss of clarity. This has to be a good thing. But there are some issues to think about:

  • The signature could be said to leak implementation details: Why is only one parameter taken by value? Why the first and not the second?
  • The operator is asymmetric WRT its operands. I used to be a physicist. This is the kind of thing that keeps me awake at night.
  • It inhibits return value optimization (RVO).

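The first two bullet points most people could probably live with. But the third one is quite important: the object being returned is now a function parameter, and the language does not allow the construction of a return value from a function parameter to be elided. At best it is a move; for a type without a move constructor it is a copy. So, in the process of trying to sometimes optimize away a copy, we have made sure a well understood and implemented type of copy elision, RVO, does not take place. This is not a good thing.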
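To make the difference concrete, here is a minimal sketch using a hypothetical Tracer type that counts constructions instead of printing them. The names Tracer, add_trad and add_clever are assumptions introduced for this illustration and are not part of the test code used for the results in this post.

```cpp
// Hypothetical tracer type: counts copy and move constructions.
struct Tracer
{
  static int copies;
  static int moves;

  Tracer() = default;
  Tracer(const Tracer&) { ++copies; }
  Tracer(Tracer&&) noexcept { ++moves; }
  Tracer& operator+=(const Tracer&) { return *this; }
};

int Tracer::copies = 0;
int Tracer::moves = 0;

// "Trad": the returned object is a named local variable, so the compiler
// is allowed to construct it directly in the caller's storage.
Tracer add_trad(const Tracer& lhs, const Tracer& rhs)
{
  Tracer result = lhs;   // the one unavoidable copy
  result += rhs;
  return result;         // eligible for elision
}

// "Clever": the returned object is a function parameter. Parameters are
// excluded from copy elision on return, so this is a move construction
// (or a copy, for a type without a move constructor).
Tracer add_clever(Tracer lhs, const Tracer& rhs)
{
  lhs += rhs;
  return lhs;            // not elided; treated as an rvalue since C++11
}
```

With two lvalue operands, add_clever performs one copy (to initialize lhs) plus one move (to return it), whereas add_trad needs only the one copy when the compiler elides the return, which is what the results below show for the compilers tested.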

So we have a good thing vs. not a good thing problem. The question is, does the good thing outweigh the not a good thing enough to make this clever implementation worthwhile? This is something that can only be answered empirically, and the answer is likely to depend on the application.

I tested these implementations of Foo operator+ in four different but very simple scenarios, with and without move semantics, and with two different compilers. The constructor and destructor calls generated by

Case 1:

Foo a, b;
Foo c = a + b;

Case 2:

Foo a, b, c;
Foo d = a + b + c;

Case 3:

Foo f = Foo() + Foo();

Case 4:

Foo f = Foo() + Foo() + Foo();

were tested using a verbose class Foo:

#include <iostream>

struct Foo
{
  Foo() { std::cout << "Default constructor\n"; }

  ~Foo() { std::cout << "Destructor\n"; }

  Foo(const Foo&) { std::cout << "Copy constructor\n"; }

  Foo(Foo&&) { std::cout << "Move copy constructor\n"; }

  Foo& operator=(const Foo&)
  {
    std::cout << "Assignment operator\n";
    return *this;
  }

  Foo& operator=(Foo&&)
  {
    std::cout << "Move assignment operator\n";
    return *this;
  }

  Foo& operator+=(const Foo&) { return *this; }
};

A version without move copy and move assignment was used where applicable. Note that the assignment operators are only shown for completeness. These do not get invoked in the tests.

These are the results. The numbers in parentheses correspond to types without move copy constructors or move assignment operators. In what follows, we refer to the “traditional” implementation, Foo operator+(const Foo&, const Foo&), as Trad, and the alternative, pass-by-value implementation, Foo operator+(Foo, const Foo&), as Clever, because that is what it is trying to be.

First, we look at the cases involving only lvalues on the RHS:

Case 1: Foo c = a + b;

Implementation     Default ctor   Copy ctor   Move copy ctor   Destructor
Trad (no move)     2 (2)          1 (1)       0 (0)            3 (3)
Clever (no move)   2 (2)          1 (2)       1 (0)            4 (4)

Case 2: Foo d = a + b + c;

Implementation     Default ctor   Copy ctor   Move copy ctor   Destructor
Trad (no move)     3 (3)          2 (2)       0 (0)            5 (5)
Clever (no move)   3 (3)          1 (3)       2 (0)            6 (6)

As can be seen in cases involving only lvalues, in the absence of move semantics, the “pass by value” version results in one extra object being created, via a copy construction. With move semantics, the “pass by value” version incurs one extra move copy in the first case, whereas in the second case, it swaps a copy construction for two move constructions.

Next, we look at the cases involving only rvalues on the RHS.

Case 3: Foo f = Foo() + Foo();

Implementation     Default ctor   Copy ctor   Move copy ctor   Destructor
Trad (no move)     2 (2)          1 (1)       0 (0)            3 (3)
Clever (no move)   2 (2)          0 (1)       1 (0)            3 (3)

Case 4: Foo f = Foo() + Foo() + Foo();

Implementation     Default ctor   Copy ctor   Move copy ctor   Destructor
Trad (no move)     3 (3)          2 (2)       0 (0)            5 (5)
Clever (no move)   3 (3)          0 (2)       2 (0)            5 (5)

In the latter two cases, involving rvalues, the “pass by value” implementation would be expected to perform better. It does not create an extra object as in case 1, and it swaps copy constructions for move constructions. In the presence of move semantics, and in the case of an efficiently movable type, it shows a benefit. Otherwise, it has the same performance as the traditional implementation.

This all leads me to believe this alternative implementation is not something that should be adopted blindly, if at all. It looks like a premature optimization, or even a pessimization in C++03, for types without move copy constructors, or for types that are not cheap to move. I would suggest the “pass by value” implementation only be used once it has been empirically established that it yields a benefit in a given domain.

Note that, in C++11, it is possible to get close to the best of all worlds for types that are efficiently movable by providing suitable overloads:

Foo operator+(const Foo& lhs, const Foo& rhs);
Foo operator+(Foo&& lhs, Foo&& rhs);
Foo operator+(Foo&& lhs, const Foo& rhs);
Foo operator+(const Foo& lhs, Foo&& rhs);

This would swap some copy elisions for move constructions in expressions involving rvalues and has higher maintenance costs than a single operator implementation, so, again, the possible benefits must be determined empirically.
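One possible way to implement these four overloads is sketched below. Num is a hypothetical stand-in for Foo that carries an int and counts copy constructions. The overload that reuses a temporary right-hand operand assumes that addition is commutative for the type, which is an assumption of this sketch and will not hold for every class.

```cpp
#include <utility>

// Hypothetical stand-in for Foo: holds a value and counts copy constructions.
struct Num
{
  static int copies;
  int value;

  explicit Num(int v = 0) : value(v) {}
  Num(const Num& other) : value(other.value) { ++copies; }
  Num(Num&& other) noexcept : value(other.value) {}
  Num& operator+=(const Num& rhs) { value += rhs.value; return *this; }
};

int Num::copies = 0;

// Both operands are lvalues: one copy is unavoidable, and returning the
// named local keeps the return eligible for elision.
Num operator+(const Num& lhs, const Num& rhs)
{
  Num result = lhs;
  result += rhs;
  return result;
}

// Left operand is a temporary: accumulate into it and move it out.
// std::move is required because lhs is a reference, not a local object.
Num operator+(Num&& lhs, const Num& rhs)
{
  lhs += rhs;
  return std::move(lhs);
}

// Right operand is a temporary: reusing it assumes a + b == b + a.
Num operator+(const Num& lhs, Num&& rhs)
{
  rhs += lhs;
  return std::move(rhs);
}

// Both operands are temporaries: reuse the left one. Without this overload
// the call would be ambiguous between the previous two.
Num operator+(Num&& lhs, Num&& rhs)
{
  lhs += rhs;
  return std::move(lhs);
}
```

With these in place, an expression such as a + b + c copies once (for a + b) and then moves, while Num(1) + Num(2) + Num(3) does not invoke the copy constructor at all.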


The tests were performed with g++ 4.8.2 and clang 3.4, both installed from MacPorts on Mac OS X. The code used for the tests, as well as results, can be found on GitHub.

4 comments

  1. Also a former physicist here: I totally agree with your comment about losing sleep over asymmetric syntactical arguments for operators with symmetric semantics! (I also for some reason tend to care very deeply about physical layout of code and comments)

  2. You can also “Swaptimize.” Take by value, make an empty Foo, swap, work on the local Foo and return it. If your class is fast to swap, you get both RVO and elision.

  3. Great post! Could you provide more explanation of why and how RVO is inhibited, and where the additional copies occur?
