ClusterLib
A Java Library for Hierarchical Bottom-up Clustering
ClusterLib is a free (open source) library for agglomerative hierarchical clustering written in Java. It was developed by Ramon Ziai and Niels Ott as a programming task for the seminar Computational Dialectometry taught by Thomas Zastrow and Erhard Hinrichs at the Seminar für Sprachwissenschaft (linguistics department) at Tübingen University.
ClusterLib can cluster arrays of Java's Double as well
as custom data types. In principle, any class implementing the
DataPoint interface of this library can be used in clustering.
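To illustrate how a custom data type can plug into a clustering routine, here is a minimal sketch. It does not reproduce ClusterLib's actual DataPoint interface, whose methods are documented in the JavaDoc. The interface SketchPoint, its distanceTo method, and the classes VectorPoint and WordPoint are all hypothetical names invented for this example. It assumes that the only thing a clustering routine needs from a data point is a distance to another point of the same type.

```java
public class DataPointSketch {

    // Hypothetical stand-in for a data point abstraction (NOT ClusterLib's DataPoint).
    interface SketchPoint<T> {
        double distanceTo(T other);
    }

    // A numeric vector compared by Euclidean distance.
    static final class VectorPoint implements SketchPoint<VectorPoint> {
        private final double[] values;

        VectorPoint(double... values) {
            this.values = values.clone();
        }

        public double distanceTo(VectorPoint other) {
            if (values.length != other.values.length) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) {
                double diff = values[i] - other.values[i];
                sum += diff * diff;
            }
            return Math.sqrt(sum);
        }
    }

    // A word compared by Levenshtein (edit) distance.
    static final class WordPoint implements SketchPoint<WordPoint> {
        private final String word;

        WordPoint(String word) {
            this.word = word;
        }

        public double distanceTo(WordPoint other) {
            String a = word;
            String b = other.word;
            // prev[j] holds the edit distance between a[0..i-1) and b[0..j).
            int[] prev = new int[b.length() + 1];
            for (int j = 0; j <= b.length(); j++) {
                prev[j] = j;
            }
            for (int i = 1; i <= a.length(); i++) {
                int[] cur = new int[b.length() + 1];
                cur[0] = i;
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                }
                prev = cur;
            }
            return prev[b.length()];
        }
    }

    public static void main(String[] args) {
        System.out.println(new VectorPoint(0, 0).distanceTo(new VectorPoint(3, 4)));      // prints 5.0
        System.out.println(new WordPoint("kitten").distanceTo(new WordPoint("sitting"))); // prints 3.0
    }
}
```

Both classes expose the same single operation, so a clustering routine written against the hypothetical SketchPoint type would not need to know whether it is grouping numeric vectors or words.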
This type of clustering relies on linkage methods, which are implemented
using the strategy pattern (cf. Gamma et al. 1995, p. 315). New linkage methods can therefore be added easily.
The current implementation covers Nearest Neighbor, Furthest Neighbor,
Cluster Centroid Distance, Average Cluster Distance, and Ward's Method.
All of these were implemented as described in Schulte
im Walde (2003, p. 186).
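The following sketch shows how linkage methods can be swapped via the strategy pattern in a bottom-up clustering loop. It is not ClusterLib's code or API: the Linkage interface, the three strategy classes, and the cluster method are hypothetical names for this example, and the data are plain one-dimensional doubles to keep it short. Only three of the five methods listed above are sketched; Cluster Centroid Distance and Ward's Method are omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkageSketch {

    // Strategy interface (hypothetical): distance between two clusters of 1-D points.
    interface Linkage {
        double distance(List<Double> a, List<Double> b);
    }

    // Nearest Neighbor (single linkage): smallest pairwise distance.
    static final class NearestNeighbor implements Linkage {
        public double distance(List<Double> a, List<Double> b) {
            double best = Double.POSITIVE_INFINITY;
            for (double x : a) for (double y : b) best = Math.min(best, Math.abs(x - y));
            return best;
        }
    }

    // Furthest Neighbor (complete linkage): largest pairwise distance.
    static final class FurthestNeighbor implements Linkage {
        public double distance(List<Double> a, List<Double> b) {
            double worst = 0.0;
            for (double x : a) for (double y : b) worst = Math.max(worst, Math.abs(x - y));
            return worst;
        }
    }

    // Average Cluster Distance: mean of all pairwise distances.
    static final class AverageDistance implements Linkage {
        public double distance(List<Double> a, List<Double> b) {
            double sum = 0.0;
            for (double x : a) for (double y : b) sum += Math.abs(x - y);
            return sum / (a.size() * b.size());
        }
    }

    // Starts with one cluster per point and repeatedly merges the two closest
    // clusters (as judged by the given strategy) until only k clusters remain.
    static List<List<Double>> cluster(double[] points, int k, Linkage linkage) {
        List<List<Double>> clusters = new ArrayList<List<Double>>();
        for (double p : points) {
            List<Double> single = new ArrayList<Double>();
            single.add(p);
            clusters.add(single);
        }
        while (clusters.size() > k) {
            int bestI = 0;
            int bestJ = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = linkage.distance(clusters.get(i), clusters.get(j));
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // bestJ > bestI, so removing bestJ leaves the index bestI valid.
            List<Double> merged = clusters.remove(bestJ);
            clusters.get(bestI).addAll(merged);
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[] points = {0.0, 1.0, 3.0, 5.6};
        System.out.println(cluster(points, 2, new NearestNeighbor()));  // [[0.0, 1.0, 3.0], [5.6]]
        System.out.println(cluster(points, 2, new FurthestNeighbor())); // [[0.0, 1.0], [3.0, 5.6]]
        System.out.println(cluster(points, 2, new AverageDistance()));  // [[0.0, 1.0, 3.0], [5.6]]
    }
}
```

The merge loop never changes when a new linkage method is added; only a new class implementing the strategy interface is needed. On this small data set, Nearest Neighbor and Furthest Neighbor already produce different groupings.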
Documentation
Part of the programming task was to write documentation on how to use the library. The ClusterLib documentation is intended to enable any Java programmer to make use of it.
- ClusterLib Documentation (PDF file)
System Requirements
ClusterLib requires Java 1.5 or newer.
Download
Please be aware that this library is released under the GNU Lesser General Public License, version 2.1.
- clusterlib-0.1.1.jar is the ready-to-use Java archive containing the class files.
- Get the source code as a zip file.
- View the JavaDoc online or download it as a zip archive.
References
Gamma, Erich, Helm, Richard, Johnson, Ralph, and Vlissides, John (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Boston: Addison-Wesley.
Schulte im Walde, Sabine (2003). Experiments on the Automatic Induction of German Semantic Verb Classes. Ph.D. thesis, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart. Published as AIMS Report 9(2). [PDF version online]