Niels Ott

Computational Linguist

GenericLevenshtein

Yet another Implementation of Levenshtein Distance

GenericLevenshtein is an implementation of Minimum Edit Distance, also called Levenshtein Distance, written by Ramon Ziai and Niels Ott. The algorithm is widely used to measure how similar two strings are. What sets this implementation apart is that it can operate on sequences of any Java objects that implement equals(Object). So whether you want to compare genome sequences, sequences of numbers, or just strings, here you go!
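To illustrate the idea of computing edit distance over arbitrary objects, here is a minimal sketch of the standard dynamic-programming algorithm with unit costs. It is not the library's source code: the class name GenericEditDistanceSketch and the method distance are made up for this example, and elements are assumed to be non-null.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; class and method names are hypothetical,
// not part of the GenericLevenshtein library.
final class GenericEditDistanceSketch {

    /**
     * Unit-cost edit distance between two sequences.
     * Elements are compared with equals(Object) and must be non-null.
     */
    static <T> int distance(List<T> source, List<T> target) {
        int n = source.size();
        int m = target.size();
        // Only two rows of the DP table are needed at any time.
        int[] previous = new int[m + 1];
        int[] current = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            previous[j] = j; // turning an empty prefix into j elements takes j inserts
        }
        for (int i = 1; i <= n; i++) {
            current[0] = i; // turning i elements into an empty prefix takes i deletes
            for (int j = 1; j <= m; j++) {
                boolean same = source.get(i - 1).equals(target.get(j - 1));
                int replace = previous[j - 1] + (same ? 0 : 1);
                int delete = previous[j] + 1;
                int insert = current[j - 1] + 1;
                current[j] = Math.min(replace, Math.min(delete, insert));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[m];
    }

    public static void main(String[] args) {
        // Sequences of numbers: delete 2, insert 5.
        System.out.println(distance(Arrays.asList(1, 2, 3, 4), Arrays.asList(1, 3, 4, 5))); // prints 2
        // Sequences of strings standing in for bases: delete "C".
        System.out.println(distance(Arrays.asList("A", "C", "G", "T"), Arrays.asList("A", "G", "T"))); // prints 1
    }
}
```

Because the only operation applied to elements is equals(Object), the same method works for integers, strings, characters, or any user-defined type.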

Furthermore, the costs of the replace, insert, and delete operations can be customized by implementing the simple WeightCalculator<T> interface. In that case, equals(Object) is no longer required, because your implementation can compare objects in whatever way you like.
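How custom costs change the computation can be sketched as follows. The exact methods of WeightCalculator<T> are not shown on this page, so the example defines its own hypothetical CostFunction<T> interface instead; every name in it is an assumption made for illustration, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these names come from the GenericLevenshtein library.
final class WeightedEditDistanceSketch {

    /** Hypothetical stand-in for a cost interface; NOT the library's WeightCalculator<T>. */
    interface CostFunction<T> {
        int insertCost(T inserted);
        int deleteCost(T deleted);
        int replaceCost(T from, T to); // should return 0 when the two count as equal
    }

    /** Edit distance where each operation's cost is supplied by the caller. */
    static <T> int distance(List<T> source, List<T> target, CostFunction<T> costs) {
        int n = source.size();
        int m = target.size();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            d[i][0] = d[i - 1][0] + costs.deleteCost(source.get(i - 1));
        }
        for (int j = 1; j <= m; j++) {
            d[0][j] = d[0][j - 1] + costs.insertCost(target.get(j - 1));
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                T s = source.get(i - 1);
                T t = target.get(j - 1);
                int replace = d[i - 1][j - 1] + costs.replaceCost(s, t);
                int delete = d[i - 1][j] + costs.deleteCost(s);
                int insert = d[i][j - 1] + costs.insertCost(t);
                d[i][j] = Math.min(replace, Math.min(delete, insert));
            }
        }
        return d[n][m];
    }

    /** Example costs: characters match case-insensitively; a real replacement costs 2. */
    static final CostFunction<Character> CASE_INSENSITIVE = new CostFunction<Character>() {
        public int insertCost(Character inserted) { return 1; }
        public int deleteCost(Character deleted) { return 1; }
        public int replaceCost(Character from, Character to) {
            boolean same = Character.toLowerCase(from.charValue()) == Character.toLowerCase(to.charValue());
            return same ? 0 : 2;
        }
    };

    /** Helper: turns a string into a list of its characters. */
    static List<Character> toList(String text) {
        List<Character> result = new ArrayList<Character>();
        for (int i = 0; i < text.length(); i++) {
            result.add(text.charAt(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distance(toList("Niels"), toList("niels"), CASE_INSENSITIVE)); // prints 0
        System.out.println(distance(toList("abc"), toList("abd"), CASE_INSENSITIVE));     // prints 2
    }
}
```

Note that the comparison here never calls equals(Object): whether two elements match is decided entirely by replaceCost, which is what makes a case-insensitive or otherwise fuzzy comparison possible.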

Usage Examples

There is a simple convenience method for comparing strings:
System.out.println(SimpleLevenshtein.getStringDistance("Quasselsack", "Niels"));

This demonstrates the use of the generic algorithm:
LevenshteinDistance<Character> levDistance = new LevenshteinDistance<Character>();
System.out.println(levDistance.getDistance(
    Conversion.convertToArray("Quasselsack"),
    Conversion.convertToArray("Niels")));

System Requirements

The GenericLevenshtein library requires Java 1.5 or later.

Download

Please be aware that this package is released under the terms of the Apache License v.2.

Version History

  • CHANGELOG
  • Version 0.1.0: doc (Version 0.1.0 contains a mean bug and its usage is discouraged. It is not provided here any more.)
  • Version 0.2.0: doc (Version 0.2.0 contains a mean bug and its usage is discouraged. It is not provided here any more.)
Posted by Niels Ott • 2010-08-24