Cause’s Levenshtein Distance Challenge
I hadn’t seen this before, but I bumped into a Python version on the web: Alexandros Marino
His time of 11 seconds - 12 seconds on my laptop - seems a little slow :)
From the beginning, two things jump out: that he’s using the same splices a lot and that he’s using yield.
So lets tackle the re-use of slices first: if you want the left-part of the current word then you can just getting the slice more than once is redudant; simply put b[:i] into a local variable called left:
import string
w = set(open("word.list").read().splitlines())
f, nf = set(), set(["causes"])
#from b, yield all unused words where levdist==1
def nextgen(b):
for i in xrange(len(b)): #for each index in b
left, ch, at, right = b[:i], b[i], b[i:], b[i+1:]
for c in string.ascii_lowercase: #for letters [a..z]
if c != ch:
#substitute b[i] with c
x = left + c + right
if x in w: yield x
#inject c before b[i]
x = left + c + at
if x in w: yield x
#remove b[i]
x = left + right
if x in w: yield x
for c in string.ascii_lowercase: #for letters [a..z]
x = b + c
if x in w: yield x #append c after b
while len(nf):
cf = nf
nf = set(j for i in cf for j in nextgen(i) if j not in f)
w -= nf
f |= nf
print len(f)
assert (len(f) == 78482)
There, runtime down to 9.45 seconds on my laptop.
Next, lets tackle yield. If we just put them into a temporary set instead:
f, nf, w = set(), ["causes"], set(open("word.list").read().splitlines())
def next():
f = []
for b in nf:
for i in range(len(b)): #for each index in b
left, at, right = b[:i], b[i:], b[i+1:]
for c in 'abcdefghijklmnopqrstuvwxyz':
#substitute b[i] with c
x = left + c + right
if x in w: f.append(x)
#inject c before b[i]
x = left + c + at
if x in w: f.append(x)
#remove b[i]
x = left + right
if x in w: f.append(x)
for c in 'abcdefghijklmnopqrstuvwxyz':
x = (b + c)
if x in w: f.append(x) #append c after b
return set(f)
while nf:
nf = next()
w -= nf
f |= nf
print len(f)
assert (len(f) == 78482)
Hmm, only a marginal improvement. Under 9 seconds.
It would be fun with stackless python or pypy to test out a recursive approach. But I’m not going to zap my daily python for that test!
With small tweaks, like renaming next(), it runs in shedskin:
w.difference_update(nf)
f.update(nf)
Hmm. 22 seconds. Not going in the right direction :) I’ll send it in to the shedskin list so they can consider how its hurting.
↓ click the "like" button to share on twitter and facebook!