MobyThesaurusExtractor

From WandoraWiki

The tool reads a Moby thesaurus file and converts it to a topic map. A Moby thesaurus file is a simple text file in which each line defines a single word followed by its related words. For example:

word1 relatedWord1 relatedWord2 relatedWord3 relatedWord4 ...
word2 relatedWord1 relatedWord2 relatedWord3 relatedWord4 ...
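
The Python sketch below shows one way to read this format. It is for illustration only and is not Wandora's parser; the function names and the `delimiter` parameter are hypothetical. By default it assumes fields are separated by commas, because the distributed thesaurus file is assumed to be comma-delimited and some entries are multi-word phrases that contain spaces. Pass `delimiter=" "` for text laid out like the schematic example above.

```python
def parse_moby_line(line, delimiter=","):
    """Split one thesaurus line into (word, related_words).

    Assumption: fields are separated by `delimiter` (comma by default).
    Returns None for blank lines.
    """
    fields = [f.strip() for f in line.strip().split(delimiter)]
    fields = [f for f in fields if f]  # drop empty fields, e.g. from a trailing delimiter
    if not fields:
        return None
    return fields[0], fields[1:]


def parse_moby_file(lines, delimiter=","):
    """Yield (word, related_words) for every non-blank line."""
    for line in lines:
        entry = parse_moby_line(line, delimiter)
        if entry is not None:
            yield entry


if __name__ == "__main__":
    sample = ["word1,relatedWord1,relatedWord2,relatedWord3,relatedWord4", ""]
    for word, related in parse_moby_file(sample):
        print(word, related)
```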

The extractor creates a topic for each word (including related words) and a binary association for each word-relatedWord pair. If a word has four related words, the extractor creates four associations for that line. Note that a word may also appear as a related word of some other word, which increases the total number of associations the word eventually takes part in.
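
The following Python sketch illustrates the topic and association counting described above, assuming the file has already been split into (word, related words) pairs. Topics are modelled as a plain set and associations as a list of tuples; these stand-ins and the function names are hypothetical and do not reflect the Wandora API.

```python
from collections import Counter


def build_topic_map(entries):
    """Build topics and binary associations from (word, related_words) pairs.

    One topic per distinct word (head words and related words alike) and
    one association per word-relatedWord pair.
    """
    topics = set()
    associations = []
    for word, related_words in entries:
        topics.add(word)
        for related in related_words:
            topics.add(related)
            associations.append((word, related))
    return topics, associations


def associations_per_word(associations):
    """Count how many associations each word takes part in, as either member."""
    counts = Counter()
    for word, related in associations:
        counts[word] += 1
        counts[related] += 1
    return counts


if __name__ == "__main__":
    entries = [
        ("word1", ["a", "b", "c", "d"]),  # four related words -> four associations
        ("word2", ["a", "word1"]),        # word1 also appears as a related word here
    ]
    topics, associations = build_topic_map(entries)
    print(len(topics), len(associations))  # prints: 6 6
    print(associations_per_word(associations)["word1"])  # prints: 5
```

In this sketch, if word A lists B and B lists A, two separate associations are recorded; whether Wandora merges such pairs into a single association is not stated on this page.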

The Moby thesaurus is in the public domain and can be downloaded from http://www.gutenberg.org/etext/3202

Because the Moby thesaurus contains hundreds of thousands of words, Wandora requires at least 2 GB of memory to extract the complete thesaurus. Even with 2 GB of memory, the application is rather unstable after extraction. This is also why the tool is not available in the Wandora GUI by default.

GUI name

The tool is not currently available in the Wandora GUI. Use the Tool manager to initialize the MobyThesaurusExtractor.

Tool Class

org.wandora.application.tools.extractors.MobyThesaurusExtractor
