You are currently browsing the category archive for the ‘ASM’ category.

I like to sneak bitshifts into interviews—not because they’re used commonly in modern C++ code, but because they used to be common, as a way of getting good performance out of poor compilers. It’s very useful to know the tricks of the past, if you ever find yourself maintaining code written by an earlier generation of programmer. Take this example:

unsigned imul(unsigned x)
{
    return x * 10;
}

unsigned bitshift(unsigned x)
{
    return (x << 3) + (x << 1);
}

Below is the assembler produced by gcc -c -O3 -march=core2, using objdump -d --no-show-raw-insn to get the assembler from the compiled output. There are two interesting things to note:

  1. The compiler uses address calculation hardware for simple arithmetic: lea is basically a strided array access, base + stride * i.
  2. The compiler doesn’t use any shifts at all.
00000000 <imul>:
   0:	push   %ebp
   1:	mov    %esp,%ebp
   3:	mov    0x8(%ebp),%eax
   6:	pop    %ebp
   7:	lea    (%eax,%eax,4),%eax    # n + 4n
   a:	add    %eax,%eax             # (n + 4n) + (n + 4n)
   c:	ret    

00000010 <bitshift>:
  10:	push   %ebp
  11:	mov    %esp,%ebp
  13:	mov    0x8(%ebp),%eax
  16:	pop    %ebp
  17:	lea    0x0(,%eax,8),%edx     # 8n
  1e:	lea    (%edx,%eax,2),%eax    # 8n + 2n
  21:	ret    

For comparison, here is the same code compiled with gcc -c -O0 -march=i386. Note that shifts are used in both cases. If you try a few other values of -O and -march, you’ll see some other interesting results, but I’m not going to bother to paste them all here.

00000000 <imul>:
   0:	push   %ebp
   1:	mov    %esp,%ebp
   3:	mov    0x8(%ebp),%edx
   6:	mov    %edx,%eax
   8:	shl    $0x2,%eax    # n << 2
   b:	add    %edx,%eax    # (n << 2) + n
   d:	shl    %eax         # ((n << 2) + n) << 1
   f:	leave  
  10:	ret    

00000011 <bitshift>:
  11:	push   %ebp
  12:	mov    %esp,%ebp
  14:	mov    0x8(%ebp),%eax
  17:	lea    0x0(,%eax,8),%edx      # 8n
  1e:	mov    0x8(%ebp),%eax
  21:	shl    %eax                   # n << 1
  23:	lea    (%edx,%eax,1),%eax     # 8n + (n << 1)
  26:	leave  
  27:	ret    

If you go through some of the major Intel processor models, you will see that the actual assembler output varies quite a bit. What does this mean? Mostly that micro-optimizations designed to produce ever-so-slightly better assembler are usually the wrong approach for long-lived software. Yet, it’s a fact of software that this code will be seen on any sufficiently large system, and it must be understood and fixed if possible.

Developers working on embedded systems with a restricted set of compilers… YMMV. Sorry.

Follow

Get every new post delivered to your Inbox.