Moby thesaurus extractor

From WandoraWiki
Revision as of 14:50, 9 January 2010 by Akivela

Wandora's Moby thesaurus extractor converts the Moby thesaurus to topic map format. The Moby thesaurus is a specially formatted text file in which each line contains a root word followed by similar words:

rootword similar1 similar2 similar3

The number of similar words varies from line to line. The extractor converts the example line above into three binary associations:

rootword, similar1
rootword, similar2
rootword, similar3
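
The conversion described above can be sketched in a few lines of Python. This is only an illustration of the line-to-pairs idea, not Wandora's actual (Java) implementation. The function names `line_to_pairs` and `extract_pairs` are hypothetical, and the delimiter is a parameter because the real thesaurus file may separate words differently from the example line (for instance with commas).

```python
import re
from typing import Iterable, Iterator, Tuple


def line_to_pairs(line: str, delimiter: str = r"\s+") -> Iterator[Tuple[str, str]]:
    """Yield one (root word, similar word) pair per similar word on the line.

    The delimiter is a regular expression; whitespace is assumed here
    only because the example line in this page uses it.
    """
    words = [w.strip() for w in re.split(delimiter, line.strip()) if w.strip()]
    if not words:
        return  # blank line: nothing to extract
    root, similar = words[0], words[1:]
    for word in similar:
        yield (root, word)


def extract_pairs(lines: Iterable[str], delimiter: str = r"\s+") -> Iterator[Tuple[str, str]]:
    """Process lines one at a time so the whole file need not be held in memory."""
    for line in lines:
        yield from line_to_pairs(line, delimiter)


pairs = list(line_to_pairs("rootword similar1 similar2 similar3"))
print(pairs)
```

Because `extract_pairs` is a generator, it can be fed an open file object directly and will read the thesaurus line by line.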

The association type and roles are the same in all associations. Because the Moby thesaurus is very large, the JRE needs at least 2 GB of memory to process the whole thesaurus successfully (for example, by setting the JVM's maximum heap size with -Xmx2g). To start the extractor, select the menu option File > Extract > Language > Moby thesaurus extractor.
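
As a rough illustration of what it means for the association type and roles to stay constant, the following Python sketch builds binary associations that all share one type and the same two roles. The identifiers `similar-words`, `root-word`, and `similar-word` are made up for this example; the names the Wandora extractor really uses are not stated here.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical identifiers, chosen for this sketch only.
ASSOCIATION_TYPE = "similar-words"
ROOT_ROLE = "root-word"
SIMILAR_ROLE = "similar-word"


@dataclass(frozen=True)
class Association:
    """A binary association: one type, two (role, player) pairs."""
    type: str
    players: Tuple[Tuple[str, str], Tuple[str, str]]


def make_association(root: str, similar: str) -> Association:
    # Every association uses the same type and the same two roles;
    # only the players (the words themselves) change.
    return Association(
        ASSOCIATION_TYPE,
        ((ROOT_ROLE, root), (SIMILAR_ROLE, similar)),
    )


associations = [
    make_association("rootword", word)
    for word in ("similar1", "similar2", "similar3")
]
for association in associations:
    print(association)
```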

The Moby thesaurus is not included in the Wandora application, but it is in the public domain and easy to find. Project Gutenberg, for example, hosts a copy.