Conventional wisdom says that no programming language is faster than C, and all higher level languages (such as Haskell) are doomed to be much slower because of their distance from the real machine.
TL;DR: Conventional wisdom is wrong. Nothing can beat highly micro-optimised C, but real everyday C code is not like that, and is often several times slower than the micro-optimised version would be. Meanwhile the high level of Haskell means that the compiler has lots of scope for doing micro-optimisation of its own. As a result it is quite common for everyday Haskell code to run faster than everyday C. Not always, of course, but enough to make the speed difference moot unless you actually plan on doing lots of micro-optimisation.
This view seems to be borne out by the
Computer Language Benchmarks Game (aka the Shootout), which collects implementations of various programs written in different languages and measures their execution speed and memory usage. The winningest programs are always written in highly optimised C. The
Haskell versions run anything from slightly to several times slower than the C versions. Furthermore, if you read the Haskell it is also highly optimised, to the point where it no longer looks like Haskell. Instead it reads like C translated into Haskell. Based on this, you would be justified in concluding that everyday Haskell (that is, Haskell that plays to the strengths of the language in brevity, correctness and composability) must be irredeemably slow.
But then I read
this presentation
by Tony Albrecht, in which he talks about how some seemingly innocent everyday
C++ code is actually very inefficient. Two things in particular caught
my eye:
- A fetch from the main memory when there is a cache miss costs 400 cycles. Pointer indirection in particular tends to fill up cache lines, and hence increase the number of cache misses overall.
- A wrong branch prediction costs over 20 cycles. A branch that skips a
simple calculation can actually run slower than the calculation.
To put it another way, C is no longer close to the real machine. The real machine has 3 levels of CPU cache (some of them shared between multiple cores), long instruction pipelines, branch prediction, multiple ALUs and FPUs, and hardware data flow analysis done while the program is being executed in order to schedule all this in a way that makes it look like a simple processor executing one instruction at a time. C doesn't expose all that to the programmer, so it seems to me that the only way to write highly optimized C is to have a sophisticated mental model of the processor and its memory architecture, decide what you want this machine to do, and then reverse-engineer the C which is going to make that happen. The result is difficult to understand and hence hard to maintain. Look at the C implementations of the Shootout problems for examples of what I mean.
But most code isn't written like that. Tony Albrecht is a games programmer, an expert at squeezing cycles out of the rendering loop. Most developers do not live in that world. For them the objective is to produce code that meets the requirements, which includes being fast enough. This is not laziness or waste, but practicality. First design the optimal algorithm, then implement it in idiomatic code. Only if that does not run fast enough should you start detailed profiling and micro-optimisation. Not only is the micro-optimisation process itself expensive, but it makes the code hard to understand for future maintenance.
So I wondered: the high level of Haskell gives the compiler many more opportunities for micro-optimisation than C. Rather than comparing micro-optimised programs therefore, it seemed sensible to compare everyday programs of the sort that might be produced when readability is more important than raw speed. I wanted to compare programs that solved a problem large enough to have a useful mix of work, but small enough that I could write Haskell and C versions fairly quickly. After poking around the Shootout website I settled on the
reverse-complement problem.
A potential issue was that one of my programs might inadvertently use something highly non-optimal, so I decided I would profile the code and remove anything that turned out to be pointlessly slow, but not change the structure of the program or add complexity merely for the sake of speed. With that in mind I wrote Haskell and C versions. I also downloaded the Shootout winner to get some feel for how my programs compared. You can see my code at the bottom of this post.
The first version of the Haskell took 30 seconds (compared with the Shootout time of about 0.6 seconds). As I had feared, profiling did indeed reveal something pointlessly slow in it. In order to filter out carriage returns from the input I used "
isLetter", but unlike the C
char type the Haskell
Char covers the whole of Unicode, and determining if one of those is a letter is not trivial. I put the filter after the complement operation and compared the result with zero, which in addition to being faster is also the Right Thing if the input contains invalid characters. Once I had this fixed it dropped down to a much more respectable 4.3 seconds. Interestingly, profiling suggests that about half the time is being spent writing out the 60 character lines; merely printing out the result with no line breaks cut execution down to around 2 seconds.
The C version, meanwhile, took 8.2 seconds. Profiling did not directly reveal the cause, but it seemed to imply that the processor was spending most of its time somewhere other than my code. I strongly suspect that this time is being spent in
putc(3) and
getc(3). The obvious optimisation would use
fread(3) and
fwrite(3) to read and write characters in blocks instead of one at a time, but that would require significant changes to the code; extra bookkeeping to deal with the start of the next sequence (signalled by a "
>" character) when it is found half way through a block, and to insert newlines similarly. Unlike the replacement of
isLetter in Haskell this would require new variables and control structures driven by the cost of the solution rather than a simple switch to a less expensive expression
It might be argued that I have tilted the playing field against C by not making these changes, and that any halfway competent C programmer would do so when faced with code that runs too slowly. If
isLetter is pointlessly slow, isn't the same thing true of
putc(3) and
getc(3)? But I think there is a clear
difference. Both programs are written in a character-oriented way because the problem is described in terms of characters. I wrote the inner loop of the C to operate on a linked list of blocks because it looked like a faster and simpler choice than copying the whole string into a new buffer twice the size every time it overflowed (on average this algorithm copies each character once or twice; see Knuth for details). I might have considered reading or writing the characters in reverse order rather than doing the in-memory reverse in a separate function, but profiling didn't show that as a significant time sink. Overall, getting decent performance out of the C is going to take about the same amount of work as writing the code in the first place.
On the other hand the Haskell has decent performance out of the box because the compiler automates a lot of the micro-optimisation that C forces you to do manually. It may not do as good a job as a human with plenty of time might do, but it does it automatically and reasonably well.
This isn't principally about code size, but for the record the Haskell has 42 SLOC, of which 21 are executable. The C has 115 SLOC, of which 63 are executable. The Shootout C winner has 70 SLOC, of which 46 are executable (counting comma separated statements as one line per statement).
So here is the bottom line. If you really need your code to run as fast as possible, and you are planning to spend half your development time doing profiling and optimisation, then go ahead and write it in C because no other language (except possibly Assembler) will give you the level of control you need. But if you aren't planning to do a significant amount of micro-optimisation then you don't need C. Haskell will give you the same performance, or better, and cut your development and maintenance costs by 75 to 90 percent as well.
Update: here are the compilation and run commands:
ghc -main-is HaskellText -O2 -fllvm -funbox-strict-fields HaskellText.hs
gcc -O2 naive-c.c -o naive
time ./HaskellText < big-fasta.txt > /dev/null
time ./naive < big-fasta.txt > /dev/null