You are currently browsing the category archive for the ‘Postgres’ category.

The other day I was thinking wayyyy too hard about nothing in particular, and I thought about the simple ambiguity of naming ourselves.

From the movie Orgasmo:

Maxxx Orbison:
What’s your name, again?
Sancho:
I am Sancho.
Maxxx Orbison:
Look, I get a lot of people auditioning all the time. What makes you think that you’d be good enough for porno?
Sancho:
I am Sancho.
Maxxx Orbison:
Great… but what do you do?
Sancho:
What do I do? I am Sancho.
Maxxx Orbison:
And…?
Sancho:
And there are many Jeffs in the world, and many Toms as well. But I… am Sancho.
Maxxx Orbison:
And…?
Sancho:
Are you Sancho? No you are not. Neither is Scott Baio Sancho. Frank Gifford is not Sancho. But I…
Maxxx Orbison:
You… are Sancho!
Sancho:
That’s right.
Maxxx Orbison:
Okay, you’re hired.

In a nutshell, Sancho is stating something more abstract than just his name. I can say “I am Tom Barta”, but typically I just mean “My name is Tom Barta.” This guy isn’t just named Sancho, he is Sancho, and there is no other.

Think about it: what do you really mean if you go up to a stranger and ask, “Who are you?” The answer could be “John”, “a businessman”, “your neighbor”, or “a pround Republican”. Similarly, if you approach someone and say “Hi, I am John,” I can imagine it would take a tiny bit more processing than saying “Hi, my name is John.” If your name is rare (like Moon Unit or Apple), it’s even more important to remove any ambiguity. “I am Apple” sounds like nonsense or pidgin.

There’s a Nerd-Tangent Hidden in Every Real-Life Thought

There is a parallel between this and programming. The advent of object-oriented programming has popularized the notion of “object identity” versus “object equality”. I can have two objects sitting in memory with identical data. Sometimes, that’s just a programming error, and the same object has been copied unnecessarily (this happens frequently with caching or persistent systems). Sometimes, the two objects genuinely are different. How can I tell if one is just an alias, or if they are logically distinct?

It depends, of course. If I am looking at Value Objects, there’s generally not a reason to distinguish them by identity. The color Red is always Red, even if I have two Reds. However, with Entities, identity is of utmost importance. Two John Smiths in a customer database represent different people. Another way to think about it is in the context of the Flyweight pattern. The two Reds could be replaced with a flyweight without affecting the program. However, the John Smiths couldn’t.

Enter Programming Languages

Of course, programming languages that use objects must have some way of distinguishing object identity from object equality.

Language Identity Equality
C, C++ &a == &b a == b
Java Anyone care to fill this one in for me? I’m unaware of the semantics of == and equals().
PHP nothing! a == b (coerce types to match) or a === b (check types)
Python a is b a == b

Whoops! Looks like PHP doesn’t even have object identity! I’d like for someone to be able to refute this, but I haven’t been able to figure it out. PHP documentation claims === means identical and == means equal, but that certainly doesn’t match the notion of object identity I just explained. Sadly, this essentially means that object identity will never truly work in PHP. Instead, we are left with “equal” and “more equal”.

Does it Matter?

In the big picture, I don’t think it hurts PHP programmers to lose object identity. Most PHP applications are business-logic interfaces sitting on top of relational databases. What’s special about the RDBMS in this context? Well, object identity doesn’t exist. I know Postgres has oid and there are probably others, but using them for general applications seems to be unfavorable. In a database table, objects (tuples/records/rows) are identified by a primary key that disallows duplicates and frequently uses auto-incrementing integers. It’s a trivial solution, really, to just assign a number to everyone who walks in the door (until you run out of numbers, of course).

Since the database enforces this uniqueness, I know that two customers both named John Smith will at least have different customer IDs. Social Security, credit cards, university student IDs, and phone numbers all revolve around this notion of unique numeric identity. Consequently, almost any PHP application using a RDBMS can simply piggyback upon the database’s IDs and trivially state that === is now identical to ==.

Recently, I took a look at ADOdb, ADOdb-Lite, and PDO to find a replacement for PEAR::DB. I’ve looked at PEAR::MDB2, and not been happy with it for the same reasons I’m not happy with PEAR::DB:

  1. Since the application is tied to Postgres, there’s not much benefit from getting a database-agnostic driver. I’d rather have something that supports my database well, instead of all databases in a mediocre and outdated fashion.
  2. Not directly a complaint of PEAR itself, but my manager prefers wrapping existing OSS solutions to modifying the source (less concern about patches). Wrapping around PEAR::DB introduces all sorts of efficiency problems, since whatever data processing happens gets done twice. So not anything for the fault of PEAR, but it doesn’t fit with the development method of “wrap” vs “inject”.
  3. Following from #1, PEAR supports older versions of PHP/Postgres than I need, with the result that it won’t use all of the modern functionality I want.

For those of you who are curious, I did discover that ADOdb does use the existing pg_*() functions. Plus, the ADOdbs both target database independence through API-level SQL generation, instead of raw SQL. I ended up going with PDO for my project with a few modifications:

  • Exceptions are enabled by default
  • query() can take parametrized statements (why isn’t this done by default?)
  • Fetchmode is associative by default, so i can foreach a record
  • Statement execute() returns $this instead of an error code (I’m using exceptions anyways) so I can chain other functions onto it
  • I’ve added a few fetch*() functions for clarity in common use-cases

All of these modifications are intended to improve PDO’s overall usability (interfaces aren’t just for end users).

class PDO_ extends PDO {

  function __construct($dsn, $username, $password) {
    parent::__construct($dsn, $username, $password);
    $this->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  }

  function prepare($sql) {
    $stmt = parent::prepare($sql, array(
      PDO::ATTR_STATEMENT_CLASS => array('PDOStatement_')
    ));

    $stmt->setFetchMode(PDO::FETCH_ASSOC);
    return $stmt;
  }

  function query($sql, $params = array()) {
    $stmt = $this->prepare($sql);
    $stmt->execute($params);
    return $stmt;
  }
}

class PDOStatement_ extends PDOStatement {

  function execute($params = array()) {
    parent::execute($params);
    return $this;
  }

  function fetchSingle() {
    return $this->fetchColumn(0);
  }

  function fetchAssoc() {
    $this->setFetchMode(PDO::FETCH_NUM);
    $data = array();
    while ($row = $this->fetch()) {
      $data[$row[0]] = $row[1];
    }
    return $data;
  }
}

The end result of this is that I can use the connection much more naturally. It’s not quite a fluent interface, but you can see that it’s improved:

chained fetches after statement execution:
$stmt = $db->prepare(
    'SELECT first, last FROM users WHERE uid = ?');
$user1 = $stmt->execute(array(1))->fetch();
$user2 = $stmt->execute(array(2))->fetch();
parametrized direct querying:
$uid = $db->query(
    'SELECT uid FROM users WHERE first = ? AND last = ?',
    array($first, $last));
fetching an associative array of $uid => $username:
$users = $db->query('SELECT uid, username FROM users')
    ->fetchAssoc();
selecting a single aggregate:
$count = $db->query('SELECT COUNT(*) FROM users')
    ->fetchSingle();

I also add RAII-style transactional programming. This object can be created at the top of a function and committed at the bottom of the function. If an exception is thrown in between, it will automatically roll back the transaction that it started upon creation. This is useful in cases where exceptions are not being handled, but must bubble up to higher layers in the code.

class Transaction {

  private $db = NULL;
  private $finished = FALSE;

  function __construct($db) {
    $this->db = $db;
    $this->db->beginTransaction();
  }

  function __destruct() {
    if (!$this->finished) {
      $this->db->rollback();
    }
  }

  function commit() {
    $this->finished = TRUE;
    $this->db->commit();
  }

  function rollback() {
    $this->finished = TRUE;
    $this->db->rollback();
  }
}

This little guy is really convenient. Any model-level code that works with the database typically doesn’t need to address database errors (actually, I usually let them bubble up to the global error handler, which can then log it and print an apologetic 500 HTTP response).

function addUser($db, $username, $friend_users) {
  $txn = new Transaction($db);

  $db->query('INSERT INTO users(username) VALUES (?)', array($username);
  $uid = $db->query('SELECT currval(?)', array('username_id_seq'))
    ->fetchSingle();

  $stmt = $db->prepare('INSERT INTO friendships(uid1, uid2) VALUES (?, ?)
  foreach ($friend_users as $fuid => $fname) {
    $stmt->execute(array($uid, $fuid));
  }
  $txn->commit();
}

No exception handling required at all. If anything throws an error (e.g. an invalid friend), the stack unwinds and $txn automatically rolls back the transaction in its destructor.

Today was my last day of work at Pavlov Media. Sad day, the last day of a job. Anyways, I don’t keep this blog to talk about personal ramblings, so I’ll keep it short. One of the guys pointed out the large number of obscure tidbits I know about PostgreSQL and PHP development (most of the PHP is stuff you won’t see in a book, but as tiny repeatable timesavers or readability improvements in a large). So, I’m going to try to post random things I know about Postgres that I consider on the obscure side. Most of them are already well-documented, but stuff a newcomer would probably miss.

Postgres Tip #1: Use COALESCE and friends

The functions COALESCE(...), NULLIF(a,b), GREATEST(...), and LEAST(...) are all well-hidden ways to improve the readability of SQL queries. I’m going to talk about the first today.

COALESCE takes an arbitrary number of parameters, and simply returns the first one that’s not NULL. This has come in handy at work in two specific cases:

  1. If table vendors has field website, and table products (with a foreign key into vendors) also has field website, I can pull the most specific website of a product available with the SQL query:

    SELECT COALESCE(p.website, v.website) FROM products p JOIN vendors v...

    The alternative would have been to use a rather obtuse-looking CASE statement like this:

    SELECT CASE WHEN p.website IS NULL THEN v.website ELSE p.website END FROM products p JOIN vendors v...

  2. If a NULL value in a table is logically equivalent to another non-NULL value, it simplifies conditionals. For example, the Freeside billing system treats NULL and 0 both as “empty” in date fields, so I can check if a date is “empty” with only one comparison:

    ... WHERE COALESCE(date_field, 0) <> 0

    Again, the alternative is bulkier and harder to read:

    ... WHERE (date_field IS NOT NULL OR date_field <> 0)

    This is particularly useful for string fields, where the empty string and the NULL string are sometimes interchangeable. In this case, COALESCE really helps when dealing with someone else’s sloppy database design.

In both of these cases, the advantage is small in a toy example like this, but a SQL query with 3 default-values and 2 “maybe-NULL” conditionals, the difference is huge.