Niels Ott

Computational Linguist

ClusterLib

A Java Library for Hierarchical Bottom-up Clustering

ClusterLib is a free (open source) library for agglomerative hierarchical clustering written in Java. It was developed by Ramon Ziai and Niels Ott as a programming task for the seminar Computational Dialectometry taught by Thomas Zastrow and Erhard Hinrichs at the Seminar für Sprachwissenschaft (linguistics department) at Tübingen University.

ClusterLib can work with arrays of Java's Double as well as with custom data types: in principle, any class implementing the library's DataPoint interface can be clustered. Agglomerative clustering relies on linkage methods, which are implemented here using the strategy pattern (cf. Gamma et al. 1995, p. 315), so new linkage methods can be added easily. The current implementation covers Nearest Neighbor, Furthest Neighbor, Cluster Centroid Distance, Average Cluster Distance, and Ward's Method, all implemented as described in Schulte im Walde (2003, p. 186).
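To illustrate how a strategy pattern keeps linkage methods interchangeable, here is a minimal sketch. It is not ClusterLib's actual API: the names Linkage, SingleLinkage, CompleteLinkage, and LinkageSketch are hypothetical, plain double[] vectors stand in for the library's DataPoint interface, and Euclidean distance is an assumption. Only Nearest Neighbor and Furthest Neighbor are shown, and the merge loop is a naive version written for readability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strategy interface (not ClusterLib's API): one method that
// measures the distance between two clusters.
interface Linkage {
    double distance(List<double[]> a, List<double[]> b);
}

// Nearest Neighbor: the smallest pairwise distance between the clusters.
final class SingleLinkage implements Linkage {
    public double distance(List<double[]> a, List<double[]> b) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : a) {
            for (double[] q : b) {
                best = Math.min(best, LinkageSketch.euclidean(p, q));
            }
        }
        return best;
    }
}

// Furthest Neighbor: the largest pairwise distance between the clusters.
final class CompleteLinkage implements Linkage {
    public double distance(List<double[]> a, List<double[]> b) {
        double worst = 0.0;
        for (double[] p : a) {
            for (double[] q : b) {
                worst = Math.max(worst, LinkageSketch.euclidean(p, q));
            }
        }
        return worst;
    }
}

final class LinkageSketch {
    // Euclidean distance between two vectors of equal length (assumed metric).
    static double euclidean(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = p[i] - q[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Naive bottom-up clustering: start with one cluster per point and
    // repeatedly merge the two closest clusters until only k remain.
    static List<List<double[]>> cluster(List<double[]> points, Linkage linkage, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        List<List<double[]>> clusters = new ArrayList<List<double[]>>();
        for (double[] p : points) {
            List<double[]> single = new ArrayList<double[]>();
            single.add(p);
            clusters.add(single);
        }
        while (clusters.size() > k) {
            int bestI = 0;
            int bestJ = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = linkage.distance(clusters.get(i), clusters.get(j));
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // bestJ > bestI, so removing bestJ leaves index bestI valid.
            List<double[]> merged = clusters.remove(bestJ);
            clusters.get(bestI).addAll(merged);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<double[]>();
        points.add(new double[] {0.0});
        points.add(new double[] {1.0});
        points.add(new double[] {10.0});
        points.add(new double[] {11.0});
        // Swapping the strategy object is the only change needed to switch linkage.
        System.out.println(cluster(points, new SingleLinkage(), 2).size());
        System.out.println(cluster(points, new CompleteLinkage(), 2).size());
    }
}
```

In this sketch, adding a further linkage method means writing one more class that implements the hypothetical Linkage interface; the merge loop itself does not change.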

Documentation

Part of the programming task was to write documentation on how to use the library. The ClusterLib documentation is intended to enable any Java programmer to make use of it.

System Requirements

ClusterLib requires Java 1.5 or newer.

Download

Please be aware that this library is released under the GNU Lesser General Public License (version 2.1).

References

Gamma, Erich, Helm, Richard, Johnson, Ralph, and Vlissides, John (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Boston: Addison-Wesley.

Schulte im Walde, Sabine (2003). Experiments on the Automatic Induction of German Semantic Verb Classes. Ph.D. thesis, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart. Published as AIMS Report 9(2). [PDF version online]

Posted by Niels Ott • 2010-08-24