by Tom Ellis on 27th October 2012
(a rhetorical question)
Honza Pokorny set Haskell Reddit ablaze with his criticism of Haskell’s string-handling performance. It was discussed at length on Reddit, and CharlesStain wrote a response with a proposed solution.
The benchmark left a lot vague. Python’s str.capitalize upper-cases the first letter of the string but lower-cases the rest, whereas his C and Haskell versions upper-case the initial letter of each word and leave the rest unchanged; his C version forces lines to end after 1024 characters; and he does not specify a character encoding. Still, it’s worth pointing out that idiomatic Haskell can be astonishingly fast at this kind of task.
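To make the semantic difference concrete, here is a rough sketch of the two behaviours in Haskell (the helper names are mine, purely for illustration):

```haskell
import Data.Char (toLower, toUpper)

-- Python's str.capitalize: upper-case the first character,
-- lower-case everything else.
pythonCapitalize :: String -> String
pythonCapitalize []       = []
pythonCapitalize (c : cs) = toUpper c : map toLower cs

-- The benchmarked C and Haskell versions: upper-case the initial
-- letter and leave everything else untouched.
benchCapitalize :: String -> String
benchCapitalize []       = []
benchCapitalize (c : cs) = toUpper c : cs
```

So pythonCapitalize "hELLO" gives "Hello", while benchCapitalize "hELLO" gives "HELLO" — already a different task before any timing happens.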
I generated a 175MB file of random words using a Python program:
import random

letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def text():
    for i in xrange(175 * 1024 * 1024):
        if random.random() < 0.01:
            yield "\n"
        elif random.random() < 0.1:
            yield " "
        else:
            yield letters[int(random.random() * len(letters))]

print ''.join(text())
This is Honza’s C version:
#include <stdio.h>
#include <string.h>
#include <ctype.h>  /* needed for isalpha and toupper */

int main(void) {
    static const char filename[] = "file";
    FILE *file = fopen(filename, "r");
    if (file != NULL) {
        char line[1024];
        while (fgets(line, sizeof line, file) != NULL) {
            line[strcspn(line, "\n")] = '\0';
            int lengthOfLine = strlen(line);
            int word = 0;
            int i;
            for (i = 0; i < lengthOfLine; i++) {
                if (isalpha(line[i])) {
                    if (!word) {
                        line[i] = (char) toupper(line[i]);
                        word = 1;
                    }
                } else {
                    word = 0;
                }
            }
            printf("%s\n", line);
        }
        fclose(file);
    } else {
        perror(filename);
    }
    return 0;
}
On my machine the -O3 optimized version runs in 2.6 seconds. It iterates over line three times: once in strcspn, once in strlen, and once in the for loop. I tried making it iterate only once, but that wasn’t actually any faster.
Here is a simplification of CharlesStain’s version:
import Control.Monad
import qualified Data.ByteString as B
import Data.Word8

{- | Converts a Char to a Word8. Taken from MissingH. -}
c2w8 :: Char -> Word8
c2w8 = fromIntegral . fromEnum

capitalize :: B.ByteString -> B.ByteString
capitalize = B.tail . B.scanl (\a b -> if isSpace a then toUpper b else b) (c2w8 ' ')

main :: IO ()
main = (B.putStr <=< (return . capitalize) <=< B.readFile) "file"
On my machine the -O2 optimized version runs in 1.8 seconds. It is significantly faster than the C!
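The scanl trick is worth spelling out: B.scanl emits the accumulator at every position, seeded here with a space, so each output byte is determined by the byte before it, and B.tail drops the seed. Here is the same idea sketched with Data.ByteString.Char8 for readability (the original operates on Word8):

```haskell
import qualified Data.ByteString.Char8 as C
import Data.Char (isSpace, toUpper)

-- The accumulator `a` is the previous *output* character; since
-- upper-casing never turns a space into a non-space (or vice versa),
-- testing it is equivalent to testing the previous input character.
capitalize :: C.ByteString -> C.ByteString
capitalize = C.tail . C.scanl step ' '
  where step a b = if isSpace a then toUpper b else b
```

The space seed means the very first character of the file counts as word-initial, exactly as if it followed a space.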
The ByteString version reads the whole file into memory. Haskell can work through it 1024 bytes at a time too, but the only way I found to do that while maintaining performance is to use the hardcore mutability of foreign pointers:
import Data.Word8 (Word8, toUpper)
import Control.Monad (when)
import System.IO (Handle, openBinaryFile, IOMode(ReadMode), hGetBuf, hPutBuf, stdout)
import GHC.ForeignPtr (mallocPlainForeignPtrBytes, ForeignPtr)
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import Foreign.Ptr (Ptr)

main :: IO ()
main = do
  ptr <- mallocPlainForeignPtrBytes 1024
  handle <- openBinaryFile "file" ReadMode
  untilDone (readAndPrint handle ptr) False

readAndPrint :: Handle -> ForeignPtr Word8 -> Bool -> IO (Bool, More)
readAndPrint h ptr s = withForeignPtr ptr $ \p -> do
  count <- hGetBuf h p 1024
  s' <- capitalize s count p
  hPutBuf stdout p count
  return $ if count < 1024 then (s', Done) else (s', More)

capitalizeLetter :: Word8 -> Bool -> (Maybe Word8, Bool)
capitalizeLetter c previouslyWithinWord =
  if isAlpha_ c
    then if not previouslyWithinWord
           then (Just $ toUpper c, True)
           else (Nothing, True)
    else (Nothing, False)

capitalize :: Bool -> Int -> Ptr Word8 -> IO Bool
capitalize = mapAccumL_ capitalizeLetter

data More = More | Done deriving Eq

untilDone :: (s -> IO (s, More)) -> s -> IO ()
untilDone f = again
  where again s = do
          (s', more) <- f s
          when (more == More) (again s')

mapAccumL_ :: (Word8 -> s -> (Maybe Word8, s)) -> s -> Int -> Ptr Word8 -> IO s
mapAccumL_ f s_ total p = go 0 s_
  where go i s =
          if i == total
            then return s
            else do
              old <- peekByteOff p i :: IO Word8
              let (maybe_new, next_s) = f old s
              case maybe_new of
                Nothing  -> return ()
                Just new -> pokeByteOff p i new
              go (i + 1) next_s

isAlpha_ :: Word8 -> Bool
isAlpha_ c = ((65 <= c) && (c <= 90)) || ((97 <= c) && (c <= 122))
With -O2 it takes 2.1s, uses barely any memory, and no inlining is required. This code looks longer and more complicated than the C, but that is an illusion: it is full of simple combinators and utility functions, and there is no reason they, or something like them, should not be part of a standard library. (Perhaps they already are! This is my first time using Ptr and ForeignPtr.) A special implementation of isAlpha_ is necessary for this to be fast, because the one provided by Data.Word8 is more complicated.
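On the “perhaps they already are” point: a pure analogue of mapAccumL_ does ship with base, as Data.List.mapAccumL (with the accumulator first in both the argument function and the result). A list-based sketch of the same state machine, just to show the shape:

```haskell
import Data.Char (isAlpha, toUpper)
import Data.List (mapAccumL)

-- Same state machine as capitalizeLetter: the Bool records whether
-- we are already inside a word.
step :: Bool -> Char -> (Bool, Char)
step inWord c
  | isAlpha c = (True, if inWord then c else toUpper c)
  | otherwise = (False, c)

capitalize :: String -> String
capitalize = snd . mapAccumL step False
```

Here capitalize "hello world" gives "Hello World"; the mutable mapAccumL_ is the same idea with writes skipped when a byte is unchanged.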
This version used to be as fast as CharlesStain’s version, until Verroq on Reddit pointed out a bug. Thanks Verroq!
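As an aside, untilDone is a completely generic state-threading loop; nothing about it is specific to buffers. Repeating its definition, here is a toy countdown (my own example, not from the benchmark):

```haskell
import Control.Monad (when)

data More = More | Done deriving Eq

-- Run an effectful step repeatedly, threading the state through,
-- until the step reports Done.
untilDone :: (s -> IO (s, More)) -> s -> IO ()
untilDone f = again
  where again s = do
          (s', more) <- f s
          when (more == More) (again s')

-- Toy usage: print 3, 2, 1 and stop.
countdown :: IO ()
countdown = untilDone step (3 :: Int)
  where step n = do
          print n
          return (n - 1, if n > 1 then More else Done)
```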
Why are these simple text processing programs so much faster than the C? I don’t know. For one thing, they do not all perform exactly the same task! The behaviour of C’s toupper depends on the locale, so perhaps that is making it slower than it would be if it behaved exactly like the Haskell versions.
On Reddit, Syzygies has produced a neat piece of Haskell which is even faster. Someone sufficiently enthusiastic could no doubt wring every last drop of performance out of C and make it faster than the Haskell. However, that’s not the point. The point is that writing simple, idiomatic Haskell is productive and a pleasure, and we should not, and do not, have to sacrifice performance for it.