
I’m moving this week, which means tearing down my living room server. So much for my uptime!

 17:59:28 up 310 days, 22:29,  4 users,  load average: 0.41, 1.09, 1.33

In the meantime, I’ll be backing up my files over SSH to my dad’s homemade NAS box… we’ll see how long that takes with my RCN connection.

Here are a few tidbits that I’ve run into when building latency-sensitive applications in Linux…

gcc -c -fno-guess-branch-probabilities

If GCC is not provided with branch probability data (either via -fprofile-generate or __builtin_expect), it will make its own estimates. Unfortunately, very minor changes in source code can cause major swings in those estimates, which can foil someone attempting to measure the effect of source-level optimizations. My preference is to use -fprofile-generate and -fprofile-use for maximum performance, but the next best thing is to get consistent results while developing, which is what this flag provides.

Also, see this brief gcc mailing list discussion about the topic.
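
For reference, here’s a rough sketch of the two ways to feed gcc branch information yourself; the likely/unlikely macro names, handle_packet, and app.cpp are just placeholder conventions for the example:

// common wrappers around __builtin_expect
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int handle_packet(const char *buf, int len)
{
    if (unlikely(len <= 0))   // hint: the error path is rare
        return -1;
    // ... hot path ...
    return 0;
}

The profile-guided route is a two-pass build:

g++ -O2 -fprofile-generate -o app app.cpp   # instrumented build
./app < representative_input                # exercise typical code paths
g++ -O2 -fprofile-use -o app app.cpp        # rebuild using the recorded profile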

ld -z now
Equivalent to gcc -Wl,-z,now (which is probably preferable, since I write C++ and always use the g++ frontend), this tells the dynamic linker to resolve symbols at program start instead of on first use. I’ve noticed that sometimes the first run through a particular code path incurs a latency of 800 milliseconds. That’s huge; that’s big enough for a human to notice with the naked eye. For server processes, it generally also means that the first request serviced after startup blows chunks, latency-wise. While it may not make a huge difference in the grand scheme of things, it’s nice to be able to pay the cost of symbol lookup up front instead of at first use.
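
As a rough sketch (the binary and source names here are placeholders), the link step and a quick sanity check that the flag took effect look like this:

g++ -O2 -Wl,-z,now -o server server.cpp
readelf -d server | grep -i now   # expect to see BIND_NOW / FLAGS_1 (NOW) in the dynamic section

The dynamic linker also honors LD_BIND_NOW=1 in the environment, which is a convenient way to try out the behavior on an existing binary without relinking.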

Most software developers have stumbled across the humble C atoi() function. And for most simple jobs, it is sufficient. However, I’ve actually run into cases where it becomes the performance bottleneck, so I’ve spent a little bit of time playing around with various optimizations.

So, what are the problems with the standard atoi() that would cause me to even think about reimplementing it myself for a custom case?

  • It doesn’t provide useful error indicators; it just returns 0 on failure. It also doesn’t consider trailing non-digits to be errors, so its return value can’t really be trusted to signal a complete conversion anyway.
  • It’s at least an order of magnitude slower than rolling your own.

Overall, I’d probably stick with boost::lexical_cast or std::istringstream for the average case, where the performance cost is negligible. Since I always use C++, I don’t see any reason to drop down to strtol, but I suppose I would use it if I were writing C code.
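
For comparison, here’s roughly what those everyday alternatives look like; this is just a sketch, and parse_int and friends are names I made up for illustration:

#include <boost/lexical_cast.hpp>
#include <sstream>
#include <stdexcept>
#include <cstdlib>
#include <string>

int parse_int(const std::string &s)
{
    return boost::lexical_cast<int>(s);   // throws boost::bad_lexical_cast on garbage
}

int parse_int_stream(const std::string &s)
{
    std::istringstream iss(s);
    int value = 0;
    if (!(iss >> value))
        throw std::runtime_error("not an integer: " + s);
    return value;
}

long parse_long_c(const char *s)   // the C-style fallback
{
    char *end = 0;
    long value = std::strtol(s, &end, 10);   // end points just past the digits consumed
    return (end == s) ? 0 : value;           // crude: mimic atoi()'s 0-on-failure behavior
}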

A Simple Replacement

For your edification, here’s a simple replacement function that handles most of what the stock C version does:

int atoi(const char *str) {
    while (*str == ' ') ++str; // skip leading whitespace
    int sign = 1;
    switch (*str) { // handle leading +/-
        case '-': ++str; sign = -1; break;
        case '+': ++str; break;
    }
    int value = 0;
    while (char c = *str) {
        if (c < '0' || c > '9' || value < 0) return 0; // non-digit, or overflow
        value *= 10;
        value += c - '0';
        ++str; // advance to the next character
    }
    if (value < 0) return 0; // integer overflow
    return value * sign;
}

Optimizing my Special Case

In the cases I care about, there are a few additional constraints I can typically take advantage of:

  • Negative numbers can show up, but positive numbers never have a leading + character. Leading 0s never show up.
  • Size of the string is known a priori, so I can work from the back to the front.
  • Maximum size of the string is known (a 32-bit integer isn’t ever going to be more than 10 characters), so I can manually unroll the loop.
  • The caller has validated that there are no non-digits (or doesn’t care if an input error cascades to an output error silently), so I don’t need to check if all values are between 0 and 9 inclusive.

So here’s the optimized version. For some reason, wrapping this up in a function object (called many times per instantiation) performed significantly better than using a plain old global function.

class atoi_func
{
public:
    atoi_func(): value_() {}

    inline int value() const { return value_; }

    inline bool operator() (const char *str, size_t len)
    {
        value_ = 0;
        int sign = 1;
        if (str[0] == '-') { // handle negative
            sign = -1;
            ++str;
            --len;
        }

        switch (len) { // handle up to 10 digits, assume we're 32-bit; cases intentionally fall through, one digit each
            case 10:    value_ += (str[len-10] - '0') * 1000000000;
            case  9:    value_ += (str[len- 9] - '0') * 100000000;
            case  8:    value_ += (str[len- 8] - '0') * 10000000;
            case  7:    value_ += (str[len- 7] - '0') * 1000000;
            case  6:    value_ += (str[len- 6] - '0') * 100000;
            case  5:    value_ += (str[len- 5] - '0') * 10000;
            case  4:    value_ += (str[len- 4] - '0') * 1000;
            case  3:    value_ += (str[len- 3] - '0') * 100;
            case  2:    value_ += (str[len- 2] - '0') * 10;
            case  1:    value_ += (str[len- 1] - '0');
                value_ *= sign;
                return value_ > 0;
            default:
                return false;
        }
    }
private:
    int value_;
};
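
To make the calling convention concrete, here’s how I’d drive it; the field contents and length are made up for the example, and the atoi_func class above is assumed to be in scope:

#include <iostream>

int main()
{
    atoi_func to_int;                          // reuse one instance across many calls
    const char *field = "12345";               // hypothetical fixed-width field
    if (to_int(field, 5))                      // length is known a priori in my case
        std::cout << to_int.value() << "\n";   // prints 12345
    return 0;
}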

Results

I compared this optimized version with the standard library function by running each 1 million times in a row. I know that methodology is flawed, particularly because running the calls back to back reduces the impact of instruction cache misses. However, I expect that flaw mostly works against my specialization, which should exhibit much better locality in real code due to aggressive inlining. Here are the times:

standard atoi()
79142 microseconds, or approximately 79 nanoseconds per iteration, amortized.
class atoi_func
131 milliseconds. On my machine, that works out to about 4 clock cycles per iteration, which can’t be right. My best guess is that the numbers are too small for my clock resolution. I’ll take this to work and try it on our machines there; they have much higher resolution for clock_gettime(CLOCK_REALTIME).
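
For what it’s worth, the harness is nothing fancier than a clock_gettime() sample around a tight loop. Here’s a rough sketch, shown timing the standard atoi(); the iteration count and test string are placeholders, and older glibc needs -lrt at link time:

#include <time.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    const int iterations = 1000000;
    timespec start, stop;
    long sink = 0;   // accumulate results so the compiler can't discard the loop

    clock_gettime(CLOCK_REALTIME, &start);
    for (int i = 0; i < iterations; ++i)
        sink += atoi("12345");
    clock_gettime(CLOCK_REALTIME, &stop);

    long long ns = (stop.tv_sec - start.tv_sec) * 1000000000LL
                 + (stop.tv_nsec - start.tv_nsec);
    printf("%lld ns total, %lld ns per call (sink=%ld)\n",
           ns, ns / iterations, sink);
    return 0;
}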

Sometime in the recent past, I ran across a bit of code that looked something like this:

// C++ code to parse an ASCII message sent across the network
void parseMessage(std::string serializedObject)
{
    std::string sString = serializedObject.substr(0, 1); // first character codifies the data type
    if (sString == "1") { ... }
    else if (sString == "2") { ... }
    else if (sString == "3") { ... }
    else { throw std::runtime_error("invalid message '" + serializedObject + "'"); }
}

Seriously? Let’s enumerate the ways that this could be improved:

  1. The first character describes the object type. A std::string isn’t necessary; a single character will do just fine.
  2. if/elseif/elseif/else should really be replaced with a switch block. If I were writing this in Python, I’d probably create a dict of first-class functions and call the appropriate one (a rough C++ analogue of that idea is sketched just after this list). But in C++, switch is the closest built-in construct; it’s more efficient than chained ifs, and (more importantly) IMHO its intent is clearer.
  3. The declaration std::string sString tells us three times that it is a string, but it never tells us what it means. At best, it’s a combination of misuse of Hungarian Notation and a lazy variable name. What’s worse—because it’s only ever a single character (and not a full string), the declaration is lying to us three times.
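
As an aside, the closest C++ analogue to that Python dispatch dict is a map from the type character to a handler function. Here’s a rough sketch with made-up handler names:

#include <map>
#include <stdexcept>
#include <string>

typedef void (*Handler)(const std::string &);

void handleFoo(const std::string &) { /* ... */ }
void handleBar(const std::string &) { /* ... */ }
void handleBaz(const std::string &) { /* ... */ }

void dispatchMessage(const std::string &serializedObject)
{
    static std::map<char, Handler> handlers;
    if (handlers.empty()) {   // one-time setup
        handlers['1'] = handleFoo;
        handlers['2'] = handleBar;
        handlers['3'] = handleBaz;
    }

    std::map<char, Handler>::const_iterator it = serializedObject.empty()
        ? handlers.end()
        : handlers.find(serializedObject[0]);
    if (it == handlers.end())
        throw std::runtime_error("invalid message '" + serializedObject + "'");
    it->second(serializedObject);
}

That said, for three fixed cases the plain switch below is simpler and keeps everything in one place.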

So what would I do to clean this up?

// C++ code to parse an ASCII message sent across the network
void parseMessage(std::string serializedObject)
{
    const char TYPE_FOO = '1'; // maybe these are defined elsewhere...
    const char TYPE_BAR = '2';
    const char TYPE_BAZ = '3';

    char messageType = serializedObject.empty() ? '\0' : serializedObject[0];
    switch (messageType)
    {
        case TYPE_FOO:
            ...
            break;
        case TYPE_BAR:
            ...
            break;
        case TYPE_BAZ:
            ...
            break;
        default:
            throw std::runtime_error("invalid message '" + serializedObject + "'");
    }
}

Ok, so that rant hit more than just variable names. I guess you could argue with me about some of the other style issues, but there’s no forgiveness for sString. That’s a lazy mind at work, and IMHO I’d never want to work next to someone who would check that into source control.

For some more references on naming variables, functions, etc… I’d recommend checking out these pages:

  • Tim Ottinger’s Rules for Variable and Class Naming. I don’t recall where I ran into this, but I agree with most of what he’s saying. The main place where I would extend what he says is where he describes using names from either the Problem Domain or the Solution Domain. It’s quite likely that I just haven’t worked on systems of the same scope as he has, but my general opinion is this:
    • If it’s relevant to the business, it should always be described in terms of the problem domain.
    • If it’s a necessary piece of getting the business parts to work, but it is not at all relevant to the business, it should always be described in terms of the solution domain.
    • Some code may fit in between… choose the solution domain, if possible. But for Christ’s sake, don’t call something a FactoryManager!

    That way, properly-layered software can be described at a high level entirely using business-relevant concepts and structure. Anyone looking at the implementation details and lower levels will naturally need to understand basic software development concepts such as data structures and algorithms. It’s really a matter of identifying your audience and writing specifically to them. As to the people who name their design patterns explicitly… fie on you. It’s a pattern because of its behavior, not because of its name. The executable code should eschew pattern-speak if possible, and focus more on the problem domain.

  • Ubiquitous Language by Eric Evans in his book Domain-Driven Design (also see the c2.com page, spartan though it may be). Evans’ entire book is based around making the problem domain explicit in any software implementation, and a Ubiquitous Language is the first way to describe it. I worked at an ISP in the past, and differentiating between Customer, Account, Device, Access Point, etc. was critically important for clarifying how the system was supposed to work. When the time came to add multiple Devices per Customer, the structural code changes practically wrote themselves. The best code I wrote there embedded that knowledge into the code itself, so anyone reading it would learn that part of the domain by learning the code.