You are currently browsing the category archive for the ‘Coding Tips’ category.

Sometime in the recent past, I ran across a bit of code that looked something like this:

// C++ code to parse an ASCII message sent across the network
void parseMessage(std::string serializedObject)
{
    std::string sString = message.substr(0, 1); // first character codifies the data type
    if (sString == "1") { ... }
    else if (sString == "2") { ... }
    else if (sString == "3") { ... }
    else { throw std::runtime_error("invalid message '" + serializedObject + "'"); }
}

Seriously? Let’s enumerate the ways that this could be improved:

  1. The first character describes the object type. A std::string isn’t necessary, a single character will do just fine.
  2. if/elseif/elseif/else should really be replaced with a switch block. If I were writing this in Python, I’d probably create a Dict of first-class functions and call the appropriate one. But in C++, switch is the closest thing; it’s more efficient than chained ifs, and (more importantly) IMHO its intent is clearer.
  3. The declaration std::string sString tells us three times that it is a string, but it never tells us what it means. At best, it’s a combination of misuse of Hungarian Notation and a lazy variable name. What’s worse—because it’s only ever a single character (and not a full string), the declaration is lying to us three times.

So what would I do to clean this up?

// C++ code to parse an ASCII message sent across the network
void parseMessage(std::string serializedObject)
{
    const char TYPE_FOO = '1'; // maybe these are defined elsewhere...
    const char TYPE_BAR = '2';
    const char TYPE_BAZ = '3';

    // wordpress doesn't like a literal ASCII null value, ignore the extra space
    char messageType = message.size() ? message.front() : '\ 0';
    switch (messageType)
    {
        case TYPE_FOO:
            ...
            break;
        case TYPE_BAR:
            ...
            break;
        case TYPE_BAZ:
            ...
            break;
        default:
            throw std::runtime_error("invalid message '" + serializedObject + "'");
    }
}

Ok, so that rant hit more than just variable names. I guess you could argue with me about some of the other style issues, but there’s no forgiveness for sString. That’s a lazy mind, and IMHO I’d never want to work next to someone who bothered checking that into source control.

For some more references on naming variables, functions, etc… I’d recommend checking out these pages:

  • Tim Ottinger’s Rules for Variable and Class Naming. I don’t recall where I ran into this, but I agree with most of what he’s saying. The main place where I would extend what he says is when he describes using names from either the Problem Domain or the Solution Domain. It’s quite likely that I just haven’t worked on systems of the same scope as him, but my general opinion is thus:
    • If it’s relevant to the business, it should always be described in terms of the problem domain.
    • If it’s a necessary piece of getting the business parts to work, but it is not at all relevant to the business, it should always be described in terms of the solution domain.
    • Some code may fit in between… choose the solution domain, if possible. But for christsake, don’t call something a FactoryManager!

    That way, properly-layered software can be described at a high level entirely using business-relevant concepts and structure. Anyone looking at the implementation details and lower levels will naturally need to understand basic software development concepts such as data structures and algorithms. It’s really a matter of identifying your audience and writing specifically to them. As to the people who name their design patterns explicitly… fie on you. It’s a pattern because of its behavior, not because of its name. The executable code should eschew pattern-speak if possible, and focus more on the problem domain.

  • Ubiquitous Language by Eric Evans in his book Domain-Driven Design (also see the c2.com page, spartan that it may be). Evans’ entire book is based around making the problem domain explicit in any software implementation, and a Ubiquitous Language is the first way to describe it. I worked at an ISP in the past, and differentiating between Customer, Account, Device, Access Point, etc. were all critically important for clarifying how the system was supposed to work—particularly when the time came to add multiple Devices per Customer, and the structural code changes practially wrote themselves. The best code that I wrote there all embedded that knowledge into the code itself, so anyone reading it would learn that part of the domain by learning the code.

Today was my last day of work at Pavlov Media. Sad day, the last day of a job. Anyways, I don’t keep this blog to talk about personal ramblings, so I’ll keep it short. One of the guys pointed out the large number of obscure tidbits I know about PostgreSQL and PHP development (most of the PHP is stuff you won’t see in a book, but as tiny repeatable timesavers or readability improvements in a large). So, I’m going to try to post random things I know about Postgres that I consider on the obscure side. Most of them are already well-documented, but stuff a newcomer would probably miss.

Postgres Tip #1: Use COALESCE and friends

The functions COALESCE(...), NULLIF(a,b), GREATEST(...), and LEAST(...) are all well-hidden ways to improve the readability of SQL queries. I’m going to talk about the first today.

COALESCE takes an arbitrary number of parameters, and simply returns the first one that’s not NULL. This has come in handy at work in two specific cases:

  1. If table vendors has field website, and table products (with a foreign key into vendors) also has field website, I can pull the most specific website of a product available with the SQL query:

    SELECT COALESCE(p.website, v.website) FROM products p JOIN vendors v...

    The alternative would have been to use a rather obtuse-looking CASE statement like this:

    SELECT CASE WHEN p.website IS NULL THEN v.website ELSE p.website END FROM products p JOIN vendors v...

  2. If a NULL value in a table is logically equivalent to another non-NULL value, it simplifies conditionals. For example, the Freeside billing system treats NULL and 0 both as “empty” in date fields, so I can check if a date is “empty” with only one comparison:

    ... WHERE COALESCE(date_field, 0) <> 0

    Again, the alternative is bulkier and harder to read:

    ... WHERE (date_field IS NOT NULL OR date_field <> 0)

    This is particularly useful for string fields, where the empty string and the NULL string are sometimes interchangeable. In this case, COALESCE really helps when dealing with someone else’s sloppy database design.

In both of these cases, the advantage is small in a toy example like this, but a SQL query with 3 default-values and 2 “maybe-NULL” conditionals, the difference is huge.