Tom Brøndsted: Document Classifier


Version 2.0, April 2008 (new: agglomerative clustering)


Example 1: The Bible. The preset URIs include three texts from the Old Testament (The First Book of Samuel, The First Book of the Kings, and Ezra) and three texts from the New Testament (the Gospels According to St. Matthew, St. Mark, and St. Luke). The vector space model separates them cleanly into two groups. Press "Measure similarity" at the bottom of the page, or select "Clear URIs" and enter your own URIs and settings.


Options: inverse document frequency; apply English stemming (Porter)
Term unit: word, word pair, or word triplet

(Patience! Calculation can take 10-40 sec.)


An experimental document classifier based on the vector space model and agglomerative clustering. The input is a set of links to the documents to be analyzed. The output is a distance matrix showing how similar the documents are to one another and how they cluster.
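
The pipeline described above can be sketched in a few lines of Python. This is only an illustration, not the code behind this page: the tokenizer, the use of cosine distance, the average-linkage merge rule, and all function names (`terms`, `tfidf_vectors`, `cosine_distance`, `distance_matrix`, `agglomerate`) are assumptions, and Porter stemming is left out. The toy strings stand in for documents that would normally be fetched from the submitted URIs.

```python
import math
import re
from collections import Counter


def terms(text, n=1):
    """Return lowercased word n-grams (n=1 words, n=2 word pairs, n=3 word triplets)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tfidf_vectors(docs, n=1, use_idf=True):
    """Build one sparse term vector (a dict) per document, optionally IDF-weighted."""
    counts = [Counter(terms(d, n)) for d in docs]
    doc_freq = Counter(t for c in counts for t in c)  # documents containing each term
    total = len(docs)
    return [
        {t: f * (math.log(total / doc_freq[t]) if use_idf else 1.0) for t, f in c.items()}
        for c in counts
    ]


def cosine_distance(a, b):
    """1 - cosine similarity; assumed here as the distance measure."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an all-zero vector shares nothing measurable with any other
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(vectors):
    """Symmetric matrix of pairwise distances between document vectors."""
    return [[cosine_distance(a, b) for b in vectors] for a in vectors]


def agglomerate(dist, k):
    """Merge the two closest clusters (average linkage, an assumption) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[a][b] for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy stand-ins for fetched documents (hypothetical content).
docs = [
    "king david saul israel",
    "king solomon temple israel",
    "jesus disciples parable galilee",
    "jesus disciples galilee crowd",
]
dist = distance_matrix(tfidf_vectors(docs, n=1, use_idf=True))
clusters = agglomerate(dist, k=2)
print(clusters)
```

Setting `n=2` or `n=3` corresponds to the word pair and word triplet options, and `use_idf=False` switches off inverse document frequency weighting. The nested loop recomputes every cluster-to-cluster distance on each pass, which is fine for a handful of documents but would need a smarter update rule for large collections.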