Levenshtein distance calculation in Excel with xlOil and the Python Levenshtein module
A very quick experiment with a traditional data-cleaning task in Excel and xlOil: calculating the Levenshtein distance between every word in a list and a set of example words. In this case I'm using the Python-Levenshtein module, and the xlOil wrapping is trivial: each pair is calculated separately.
All that is needed is to install the Levenshtein module with pip (pip install python-Levenshtein) and write a wrapper like this:
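import Levenshtein  # pip install python-Levenshtein
import xloil

@xloil.func
def lvdist(s1, s2):
    return Levenshtein.distance(s1, s2)  # number of single-character edits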
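For reference, Levenshtein.distance returns the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below is a plain-Python version of the standard Wagner-Fischer dynamic programme for that quantity. It is only an illustration, not the module's own code (the module is a compiled extension and much faster), and the name `levenshtein` is hypothetical.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning s1 into s2 (Wagner-Fischer, two rows of the table)."""
    # The distance is symmetric, so keep the shorter string in the inner loop.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    # At the start of row i, previous[j] = distance(s1[:i-1], s2[:j]).
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # distance(s1[:i], "") = i deletions
        for j, c2 in enumerate(s2, start=1):
            insert_cost = current[j - 1] + 1             # insert c2
            delete_cost = previous[j] + 1                # delete c1
            substitute_cost = previous[j - 1] + (c1 != c2)  # free if chars match
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("flaw", "lawn"))       # 2
```

A function like this could be registered with @xloil.func in the same way as lvdist, but for half a million calls the compiled module is the better choice.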
In the example below I'm matching a small dictionary of 57,764 words against 10 example words, for a total of 57,764 × 10 = 577,640 (roughly half a million) Levenshtein distance calculations. This takes about 3 seconds in total on my laptop.
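To sanity-check this kind of timing outside Excel, the same grid of calculations can be run in plain Python. This is a minimal sketch assuming python-Levenshtein is installed (with a pure-Python fallback if it isn't). `distance_matrix` and `closest_words` are hypothetical helper names, and the short word lists are placeholders, not the 57,764-word dictionary used above.

```python
import time

try:
    from Levenshtein import distance  # pip install python-Levenshtein
except ImportError:
    # Pure-Python fallback (two-row dynamic programming) if the module is missing.
    def distance(s1: str, s2: str) -> int:
        previous = list(range(len(s2) + 1))
        for i, c1 in enumerate(s1, start=1):
            current = [i]
            for j, c2 in enumerate(s2, start=1):
                current.append(min(current[j - 1] + 1,
                                   previous[j] + 1,
                                   previous[j - 1] + (c1 != c2)))
            previous = current
        return previous[-1]


def distance_matrix(words, examples):
    """One row per word, one column per example: len(words) * len(examples)
    calls, the same shape as a grid of =lvdist(...) cells in the sheet."""
    return [[distance(w, e) for e in examples] for w in words]


def closest_words(words, examples):
    """For each example, the word with the smallest distance (ties: first in list)."""
    return {e: min(words, key=lambda w: distance(w, e)) for e in examples}


if __name__ == "__main__":
    # Hypothetical stand-ins for the real word list and examples.
    words = ["apple", "apply", "maple", "ample", "angle", "applet"]
    examples = ["appel", "mapel"]

    start = time.perf_counter()
    matrix = distance_matrix(words, examples)
    elapsed = time.perf_counter() - start
    print(f"{len(words) * len(examples)} distances in {elapsed:.6f} s")
    print(closest_words(words, examples))
```

closest_words is the usual data-cleaning step that follows: for each example it picks the dictionary word with the smallest distance. Absolute timings will depend on the machine and on word lengths.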
