Sunday, February 26, 2012

Programming Language Readability

Lets compare some Python to Haskell for solving the same problem.  The problem we’ll pick is Trie data-structure for auto-completions.  We are interested not so much in the nitty gritty of the algorithm, but in the language style itself.  Auto-complete has been in the programming news a lot recently; both a Python and a Haskell solver have turned up.

Here’s the Python:

"""
A fast data structure for searching strings with autocomplete support.
"""

class Trie(object):
    def __init__(self, value=None):
        self.children = {}
        self.value = value
        self.flag = False # Flag to represent that a word ends at this node

    def add(self, char):
        val = self.value + char if self.value else char
        self.children[char] = Trie(val)

    def insert(self, word):
        node = self
        for char in word:
            if char not in node.children:
                node.add(char)
            node = node.children[char]
        node.flag = True

    def find(self, word):
        node = self
        for char in word:
            if char not in node.children:
                return None
            node = node.children[char]
        return node.value

    def all_prefixes(self):
        results = set()
        if self.flag:
            results.add(self.value)
        if not self.children: return results
        return reduce(lambda a, b: a | b, [node.all_prefixes() for node in self.children.values()]) | results

    def autocomplete(self, prefix):
        node = self
        for char in prefix:
            if char not in node.children:
                return set()
            node = node.children[char]
        return node.all_prefixes() 

And here’s the Haskell:

module TST
       ( TST
       , empty
       , singleton
       , toList
       , fromList
       , insert
       , prefix
       , matchWL
       , lookup
       ) where

import Control.Arrow (first)

import Wildcard

import Prelude hiding (lookup)

data TST c a = Branch c (TST c a) (TST c a) (TST c a)
             | End a (TST c a)
             | Null

instance (Ord c, Show c, Show a) => Show (TST c a) where
  show = ("fromList " ++) . show . toList

instance (Ord c, Eq c, Eq a) => Eq (TST c a) where
  t1 == t2 = toList t1 == toList t2

empty :: TST l a
empty = Null

singleton :: [c] -> a -> TST c a
singleton [] v      = End v Null
singleton (c : s) v = Branch c Null (singleton s v) Null

toList :: Ord c => TST c a -> [([c], a)]
toList = prefix []

fromList :: Ord c => [([c], a)] -> TST c a
fromList = foldr (uncurry insert) empty

insert :: Ord c => [c] -> a -> TST c a -> TST c a
insert []       v  Null              = End v Null
insert []       v  (End _ t)         = End v t
insert []       v  (Branch c l m r)  = Branch c (insert [] v l) m r
insert s        v  Null              = singleton s v
insert s        v1 (End v2 t)        = End v2 (insert s v1 t)
insert (c1 : s) v  (Branch c2 l m r) =
  case compare c1 c2 of
    LT -> Branch c2 (insert (c1 : s) v l) m r
    EQ -> Branch c2 l (insert s v m) r
    GT -> Branch c2 l m (insert (c1 : s) v r)

prefix :: Ord c => [c] -> TST c a -> [([c], a)]
prefix _        Null              = []
prefix []       (End v t)         = ([], v) : prefix [] t
prefix []       (Branch c l m r)  =
  prefix [] l ++ map (first (c :)) (prefix [] m) ++ prefix [] r
prefix s        (End _ t)         = prefix s t
prefix (c1 : s) (Branch c2 l m r) =
  case compare c1 c2 of
    LT -> prefix (c1 : s) l
    EQ -> map (first (c1 :)) (prefix s m)
    GT -> prefix (c1 : s) r

matchWL :: Ord c => WildList c -> TST c a -> [([c], a)]
matchWL _       Null              = []
matchWL []      (End v _)         = [([], v)]
matchWL []      (Branch _ l _ _)  = matchWL [] l
matchWL s       (End _ t)         = matchWL s t
matchWL (w : s) (Branch c2 l m r) =
  let left   = matchWL (w : s) l
      middle = map (first (c2 :)) (matchWL s m)
      right  = matchWL (w : s) r
  in case w of
    Wildcard -> left ++ middle ++ right
    El c1 -> case compare c1 c2 of
      LT -> left
      EQ -> middle
      GT -> right

lookup :: Ord c => [c] -> TST c a -> Maybe a
lookup s t =
  case prefix s t of
    ((s', v):_) -> if s == s' then Just v else Nothing
    _           -> Nothing

Wow.  Stand back from the monitor and squint at them :)

My opinion is that Python is massively more readable than Haskell.

Why is Python more readable?

The Python is executable pseudo-code.  Python has this wonderful ability to read like a description of the algorithm you are implementing.  If a list comprehension or something isn’t obvious, you could just rewrite the Python in a less-pretentious way and recover this readability.

The Python has indent and this is important to speed-reading.  It is telling you which lines you have to study - the function names, mostly, at first glance - and which bits you don’t.  Indent is very important to readability.

Second is the use of short words instead of symbols.  Perhaps the Haskell is trying a little to hard to feel like algebra?  In that direction lays APL (mind blowing cool video) but few call it readable.

Python goes further than most to use short concise words; COBOL used long words (I studied COBOL; mercifully forgotten it all now) and the curly-brackets club use more symbols than words, as though being less typing is faster.  (FWIW, typing curly brackets on non-US keyboards requires an ambidextrous octopus.  And just you try and find the back-tick!)

Python is so readable because of significant whitespace, its conciseness and use of small words instead of symbols.

Haskell is doing neither of these.  There is some nicely indented functions but you get the feeling they are frowned upon by Haskellers.  Haskell can move to `case` instead of all those duplicate left-aligned pattern preambles, but it seems sorely under-used.  And still the symbols add noise in the squint test.

You really have to read each line of Haskell code, building a mental model of the Python-style indented functions and such in your head as, examining each line, you mentally juggle how it mutates this model.

(Side topic: I don’t think its about being declarative or imperative.)

Most of the training material on Haskell that I’ve read is aimed at students with no prior programming experience.  Almost as though, by learning Haskell first, you can avoid the tainting impurities of imperativeness and be a better programmer!  Perhaps Haskell’s unreadability is meant as a cleverness-filter to keep the practitioners elitist?  Just joking.  Perhaps more likely is that after staring at chalk-filled blackboards the idea of notation instead of short words seems no barrier at all, and they just have to find ascii line-art to bare a passing resemblance to such algebra.  Perhaps C was popular because its Pascal with more symbols, so it must be cooler?  If its hard to write it should be hard to understand mentality?

You can make functional languages in the Python readability style; you don’t have to try so hard to look like algebra.  You can make static fast languages in Python readability style too.  This readability vs symbols is really the defining difference between Ruby and Python too.

There are as many new languages as ever; my dream programming language post got suggestions other than Haskell, including Rust, D and Tart.  Those three are curly bracket languages again though.  Big opportunity miss.

End is the same as a curly bracket; look at this Ruby:

class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users do |t|
      t.string :role
      t.string :name

      t.timestamps
    end
  end
end

I once asked a friend to do a line-count on their Lua code.  Fully 30% of non-empty lines contained just the keyword `end`.  Talk about noise cluttering up and making the nesting less readable!

I lament that all languages aren’t trying to be readable the Python way; to my thinking, it ought to be goal number 0.

Spelling Suggestions Problem

The Haskell goes on to use Levenshtein distance - edit distance - (here) to offer spelling corrections.  Interestingly, Dr Pete Norvig has written a neat essay on this with Python source-code.  Lets focus on readability over algorithm.  Compare and contrast them too.

Test-Driven-Problem-Solving

Dr Pete Norvig’s spelling corrector leads us off on two nice tangents.

Firstly, A long time ago, the Test-Driven-Development gurus imagined that by writing test-cases and evolving the code to pass it they could solve problems they couldn’t actually sit down and solve the traditional understanding-the-problem way.  And it is so worth a read :)

As usual, tangents spring to mind: TDD evolving solutions is really paralleled in my old adventures in genetic programming and also Kolmogorov Complexity.

Secondly and sadly less compelling reading, I’ve been looking at tries vs sets for spelling correctors in the Causes challenge.  Just thought experiments.  A very long time ago I wrote a fast C crossword solver for an otherwise-Python entry in Al Zimmerman’s word search contest.  (Luckily a link that works; newer Al Zimmerman’s contests have been pushed off-line and lost forever it seems.)  It was a trie where at each depth it stored a linked list across words so you could find all words with an unknown at a particular point by iterating down the peers on each side.  Its an idea that hasn’t found so much favor in compression but which I think could be applied to LZ4 encoders perhaps.  (Yet another rainy-day project.)  And so I could imagine using a node in trie to navigate its peers to iterate over those that are edit-distance-1 -away, rather than generating all the possible words and seeing if they are in a word-list.  This would be seeing if cache locality can ever beat O(1) random lookups in a hash table :)

(This post is inspired to explain why “looks like Python” is all through my dream programming language requirements.  If you liked this, you may also like the rest of the blog ;) )

[Every blog post contains a small word-misuse for the grammar glue-chewers; this post its bare instead of bear.  Do tell me if you spot any others that are accidental :) ]

Notes

  1. download-movies-new-movies reblogged this from williamedwardscoder and added:
    Movie Thea
  2. katychuang reblogged this from williamedwardscoder
  3. mt3-666 reblogged this from williamedwardscoder
  4. mlkell reblogged this from inky
  5. inky reblogged this from williamedwardscoder
  6. katsyoshi reblogged this from williamedwardscoder
  7. williamedwardscoder posted this

 ↓ click the "share" button below!