You are currently browsing the category archive for the ‘PHP’ category.

The other day I was thinking wayyyy too hard about nothing in particular, and I thought about the simple ambiguity of naming ourselves.

From the movie Orgasmo:

Maxxx Orbison:
What’s your name, again?
Sancho:
I am Sancho.
Maxxx Orbison:
Look, I get a lot of people auditioning all the time. What makes you think that you’d be good enough for porno?
Sancho:
I am Sancho.
Maxxx Orbison:
Great… but what do you do?
Sancho:
What do I do? I am Sancho.
Maxxx Orbison:
And…?
Sancho:
And there are many Jeffs in the world, and many Toms as well. But I… am Sancho.
Maxxx Orbison:
And…?
Sancho:
Are you Sancho? No you are not. Neither is Scott Baio Sancho. Frank Gifford is not Sancho. But I…
Maxxx Orbison:
You… are Sancho!
Sancho:
That’s right.
Maxxx Orbison:
Okay, you’re hired.

In a nutshell, Sancho is stating something more abstract than just his name. I can say “I am Tom Barta”, but typically I just mean “My name is Tom Barta.” This guy isn’t just named Sancho, he is Sancho, and there is no other.

Think about it: what do you really mean if you go up to a stranger and ask, “Who are you?” The answer could be “John”, “a businessman”, “your neighbor”, or “a pround Republican”. Similarly, if you approach someone and say “Hi, I am John,” I can imagine it would take a tiny bit more processing than saying “Hi, my name is John.” If your name is rare (like Moon Unit or Apple), it’s even more important to remove any ambiguity. “I am Apple” sounds like nonsense or pidgin.

There’s a Nerd-Tangent Hidden in Every Real-Life Thought

There is a parallel between this and programming. The advent of object-oriented programming has popularized the notion of “object identity” versus “object equality”. I can have two objects sitting in memory with identical data. Sometimes, that’s just a programming error, and the same object has been copied unnecessarily (this happens frequently with caching or persistent systems). Sometimes, the two objects genuinely are different. How can I tell if one is just an alias, or if they are logically distinct?

It depends, of course. If I am looking at Value Objects, there’s generally not a reason to distinguish them by identity. The color Red is always Red, even if I have two Reds. However, with Entities, identity is of utmost importance. Two John Smiths in a customer database represent different people. Another way to think about it is in the context of the Flyweight pattern. The two Reds could be replaced with a flyweight without affecting the program. However, the John Smiths couldn’t.

Enter Programming Languages

Of course, programming languages that use objects must have some way of distinguishing object identity from object equality.

Language Identity Equality
C, C++ &a == &b a == b
Java Anyone care to fill this one in for me? I’m unaware of the semantics of == and equals().
PHP nothing! a == b (coerce types to match) or a === b (check types)
Python a is b a == b

Whoops! Looks like PHP doesn’t even have object identity! I’d like for someone to be able to refute this, but I haven’t been able to figure it out. PHP documentation claims === means identical and == means equal, but that certainly doesn’t match the notion of object identity I just explained. Sadly, this essentially means that object identity will never truly work in PHP. Instead, we are left with “equal” and “more equal”.

Does it Matter?

In the big picture, I don’t think it hurts PHP programmers to lose object identity. Most PHP applications are business-logic interfaces sitting on top of relational databases. What’s special about the RDBMS in this context? Well, object identity doesn’t exist. I know Postgres has oid and there are probably others, but using them for general applications seems to be unfavorable. In a database table, objects (tuples/records/rows) are identified by a primary key that disallows duplicates and frequently uses auto-incrementing integers. It’s a trivial solution, really, to just assign a number to everyone who walks in the door (until you run out of numbers, of course).

Since the database enforces this uniqueness, I know that two customers both named John Smith will at least have different customer IDs. Social Security, credit cards, university student IDs, and phone numbers all revolve around this notion of unique numeric identity. Consequently, almost any PHP application using a RDBMS can simply piggyback upon the database’s IDs and trivially state that === is now identical to ==.

Advertisements

Recently, I took a look at ADOdb, ADOdb-Lite, and PDO to find a replacement for PEAR::DB. I’ve looked at PEAR::MDB2, and not been happy with it for the same reasons I’m not happy with PEAR::DB:

  1. Since the application is tied to Postgres, there’s not much benefit from getting a database-agnostic driver. I’d rather have something that supports my database well, instead of all databases in a mediocre and outdated fashion.
  2. Not directly a complaint of PEAR itself, but my manager prefers wrapping existing OSS solutions to modifying the source (less concern about patches). Wrapping around PEAR::DB introduces all sorts of efficiency problems, since whatever data processing happens gets done twice. So not anything for the fault of PEAR, but it doesn’t fit with the development method of “wrap” vs “inject”.
  3. Following from #1, PEAR supports older versions of PHP/Postgres than I need, with the result that it won’t use all of the modern functionality I want.

For those of you who are curious, I did discover that ADOdb does use the existing pg_*() functions. Plus, the ADOdbs both target database independence through API-level SQL generation, instead of raw SQL. I ended up going with PDO for my project with a few modifications:

  • Exceptions are enabled by default
  • query() can take parametrized statements (why isn’t this done by default?)
  • Fetchmode is associative by default, so i can foreach a record
  • Statement execute() returns $this instead of an error code (I’m using exceptions anyways) so I can chain other functions onto it
  • I’ve added a few fetch*() functions for clarity in common use-cases

All of these modifications are intended to improve PDO’s overall usability (interfaces aren’t just for end users).

class PDO_ extends PDO {

  function __construct($dsn, $username, $password) {
    parent::__construct($dsn, $username, $password);
    $this->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  }

  function prepare($sql) {
    $stmt = parent::prepare($sql, array(
      PDO::ATTR_STATEMENT_CLASS => array('PDOStatement_')
    ));

    $stmt->setFetchMode(PDO::FETCH_ASSOC);
    return $stmt;
  }

  function query($sql, $params = array()) {
    $stmt = $this->prepare($sql);
    $stmt->execute($params);
    return $stmt;
  }
}

class PDOStatement_ extends PDOStatement {

  function execute($params = array()) {
    parent::execute($params);
    return $this;
  }

  function fetchSingle() {
    return $this->fetchColumn(0);
  }

  function fetchAssoc() {
    $this->setFetchMode(PDO::FETCH_NUM);
    $data = array();
    while ($row = $this->fetch()) {
      $data[$row[0]] = $row[1];
    }
    return $data;
  }
}

The end result of this is that I can use the connection much more naturally. It’s not quite a fluent interface, but you can see that it’s improved:

chained fetches after statement execution:
$stmt = $db->prepare(
    'SELECT first, last FROM users WHERE uid = ?');
$user1 = $stmt->execute(array(1))->fetch();
$user2 = $stmt->execute(array(2))->fetch();
parametrized direct querying:
$uid = $db->query(
    'SELECT uid FROM users WHERE first = ? AND last = ?',
    array($first, $last));
fetching an associative array of $uid => $username:
$users = $db->query('SELECT uid, username FROM users')
    ->fetchAssoc();
selecting a single aggregate:
$count = $db->query('SELECT COUNT(*) FROM users')
    ->fetchSingle();

I also add RAII-style transactional programming. This object can be created at the top of a function and committed at the bottom of the function. If an exception is thrown in between, it will automatically roll back the transaction that it started upon creation. This is useful in cases where exceptions are not being handled, but must bubble up to higher layers in the code.

class Transaction {

  private $db = NULL;
  private $finished = FALSE;

  function __construct($db) {
    $this->db = $db;
    $this->db->beginTransaction();
  }

  function __destruct() {
    if (!$this->finished) {
      $this->db->rollback();
    }
  }

  function commit() {
    $this->finished = TRUE;
    $this->db->commit();
  }

  function rollback() {
    $this->finished = TRUE;
    $this->db->rollback();
  }
}

This little guy is really convenient. Any model-level code that works with the database typically doesn’t need to address database errors (actually, I usually let them bubble up to the global error handler, which can then log it and print an apologetic 500 HTTP response).

function addUser($db, $username, $friend_users) {
  $txn = new Transaction($db);

  $db->query('INSERT INTO users(username) VALUES (?)', array($username);
  $uid = $db->query('SELECT currval(?)', array('username_id_seq'))
    ->fetchSingle();

  $stmt = $db->prepare('INSERT INTO friendships(uid1, uid2) VALUES (?, ?)
  foreach ($friend_users as $fuid => $fname) {
    $stmt->execute(array($uid, $fuid));
  }
  $txn->commit();
}

No exception handling required at all. If anything throws an error (e.g. an invalid friend), the stack unwinds and $txn automatically rolls back the transaction in its destructor.

I downloaded the WordPress codebase the other day and I’ve been browsing through it. I’d like to contribute to the project, since I use it myself and am starting to encourage others to do so as well. I don’t really know where to start, but documentation and simple refactorings seem to be a good start.

Here are a few suggestions that I’ve been able to think of while browsing. Some of them are purely style-based (I’m dreading hostile responses from the Big Ball Of Mud natives), but I’d like to clean up the code without actually changing anything.

Read the rest of this entry »

I love RAII. Unfortunately, most popular enterprisey languages don’t support it. I’ve seen people claim that Java’s finally is essentially RAII, but that’s not quite right. I hate seeing try/catch/finally scattered throughout application code.

One thing I definitely haven’t seen is any mention of RAII in PHP. Nasty ugly weak PHP that’s fallen out of favor with the hip. I can’t possibly be the only persion to have thought about this, because one of the bugfixes in PHP 5.2 is properly ordered destructors. However, since most people aren’t developing for PHP 5.2, I suppose it’s not a surprise that the subject isn’t googleable. Anyhow, prior to PHP 5.2, objects in the same scope would be destructed in arbitrary scope (usually the order in which they were declared, which sounds backwards), so multiple resources acquired in one function couldn’t be interdependent without invoking undefined behavior. That’s bad, but it’s not a show-stopper.

Read the rest of this entry »

You know what really grinds my gears? Reading “PHP Security” articles on the internet and discovering nothing but crap in them. I keep seeing people attempting to work around register_globals, or using regular expressions to attempt to filter data that they should merely be escaping.
The only good source I’ve seen is Chris Shiflett, and this is probably going to be the same as what he says:

  • Disable register_globals
  • Disable all magic_quotes (or include a stripslashes() in your input filtering process)
  • Filter all input (regexes are ok for this) to bring it into a “pure” state
  • Escape all output that can be traced back to user input, so malicious input cannot alter the syntactical structure of the output
    • SQL querys are output
    • HTML is output
    • Shell execution is output
  • Use SafeHTML if you really need to allow HTML syntax… but realize that you’re still not guaranteed to be safe and design your application to be minimally dangerous when compromised
  • Enable error_reporting(E_ALL) (E_STRICT too, if you’re using PHP5 exclusively)
  • Log errors internally instead of printing them for production servers
  • Define all variables before use
  • Don’t store sensitive information or logic client-side
    • Cookies can be hijacked
    • Javascript validation must be considered optional, and supported by server-side validation.
  • Be aware of character encodingsĀ 

The one thing I was looking for was session-ID protection, to minimize the damage possible if a malicious user does manage to grab cookies using a XSS exploit.

I’m a PHP programmer. I’ve been using it since ’02 or ’03, and I’m very familiar with the language and “enough” of the packages to get by. Over the last few days, I’ve been taking another look at Django. I’ve looked through most of its code (it’s surprisingly small, I figured it would be bigger) and am trying to wrap my head around it to see how effectively I could actually use it. Most of the time, I’m really comparing it to how I’ve solved a particular problem with PHP in the past, or seeing what sort of Program Transformation is involved between what I’ve done and what the Django devs have done.

It’s really interesting to see a big common area (and makes me satisfied, that my design isn’t crazy), but then to see something that, for me, has worked with a single method, but that in their framework has been extracted out into their entire MiddleWare suite. Me likey.

Read the rest of this entry »