How to Write a Spelling Corrector

All you need is 20 lines of Python 2.5 code. 

  1: import re, string, collections
  2: 
  3: def words(text): return re.findall('[a-z]+', text.lower()) 
  4: 
  5: def train(features):
  6:     model = collections.defaultdict(lambda: 1)
  7:     for f in features:
  8:         model[f] += 1
  9:     return model
 10: 
 11: NWORDS = train(words(file('Documents/holmes.txt').read()))
 12: 
 13: def edits1(word):
 14:     n = len(word)
 15:     return set([word[0:i]+word[i+1:] for i in range(n)] + ## deletion
 16:                [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + ## transposition
 17:                [word[0:i]+c+word[i+1:] for i in range(n) for c in string.lowercase] + ## alteration
 18:                [word[0:i]+c+word[i:] for i in range(n+1) for c in string.lowercase]) ## insertion
 19: 
 20: def known_edits2(word):
 21:     return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)
 22: 
 23: def known(words): return set(w for w in words if w in NWORDS)
 24: 
 25: def correct(word):
 26:     return max(known([word]) or known(edits1(word)) or known_edits2(word) or [word],
 27:                key=lambda w: NWORDS[w])
 28: 
 29:  
 30: 
 31: 

A complete analasis is availabe here: http://norvig.com/spell-correct.html

Advertisements
Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: