Tom Brøndsted: Document Classifier


[Load Example 1] Determine whether Bible texts originate from the Old or the New Testament.
[Load Example 2] Looking for fingerprints in spam emails.
[Load Example 3] Separating spam from not-spam.

inverse doc. freq. | apply English stemming (Porter)
term=word | term=wordpair | term=wordtriplet

(Patience! Calculation can take 10-40 sec.)


An experimental document classifier based on the vector space model and agglomerative clustering. The input is a set of links to the documents to be analyzed. The output is a distance matrix showing how similar the documents are to one another and how they cluster.
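
The pipeline described above can be illustrated with a minimal Python sketch. It is not the classifier's actual code, and several details are assumptions made for illustration only: the function names (`terms`, `tfidf_vectors`, `cosine_distance`, `distance_matrix`, `agglomerate`) are hypothetical, the inverse document frequency is taken as log(N/df) + 1, distance is taken as 1 minus cosine similarity, and clusters are merged by average linkage. Documents are passed as plain strings rather than fetched from links, and Porter stemming is left out to keep the sketch dependency-free.

```python
import math
import re
from collections import Counter


def terms(text, n=1):
    """Split text into lowercase word n-grams (n=1 word, n=2 word pair, n=3 word triplet)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tfidf_vectors(docs, n=1, use_idf=True):
    """Turn each document into a sparse term-weight vector (dict: term -> weight)."""
    counts = [Counter(terms(doc, n)) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for count in counts for term in count)
    vectors = []
    for count in counts:
        vec = {}
        for term, tf in count.items():
            # Assumed weighting: log(N / df) + 1, or plain term frequency without idf.
            idf = math.log(len(docs) / df[term]) + 1.0 if use_idf else 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors (1.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    norm = norm_a * norm_b
    return 1.0 - dot / norm if norm else 1.0


def distance_matrix(vectors):
    """Pairwise distances between all document vectors."""
    return [[cosine_distance(a, b) for b in vectors] for a in vectors]


def agglomerate(dist):
    """Average-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of document indices.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Calling `distance_matrix(tfidf_vectors(docs))` on a list of document strings gives the matrix, and `agglomerate` applied to that matrix gives the order in which documents join into clusters. Passing `n=2` or `n=3` corresponds to the word pair and word triplet term settings, and `use_idf=False` switches off the inverse document frequency weighting.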