Moby thesaurus extractor

Latest revision as of 16:01, 11 July 2011

Wandora's Moby thesaurus extractor converts the Moby thesaurus to topic map format. The Moby thesaurus is a specially formatted text file in which each line contains a root word followed by similar words:

rootword similar1 similar2 similar3

The number of similar words varies from line to line. The extractor converts the example line above into three binary associations:

rootword, similar1
rootword, similar2
rootword, similar3
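
The line-to-association mapping described above can be sketched in Python. This is only an illustration of the idea, not Wandora's actual implementation: the function names `line_to_pairs` and `iter_pairs` and the `delimiter` parameter are hypothetical. By default the sketch splits on whitespace, as in the example line above; if your copy of the thesaurus file separates words with commas, pass `delimiter=","`.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def line_to_pairs(line: str, delimiter: Optional[str] = None) -> List[Tuple[str, str]]:
    """Turn one thesaurus line into (rootword, similar) pairs.

    delimiter=None splits on any whitespace (hypothetical default that
    matches the example line); pass "," for comma-separated files.
    """
    # Split the line, trim each word, and drop empty fragments.
    words = [w.strip() for w in line.strip().split(delimiter)]
    words = [w for w in words if w]
    if not words:
        return []
    root, similar = words[0], words[1:]
    # One binary association per similar word, always anchored at the root word.
    return [(root, s) for s in similar]


def iter_pairs(lines: Iterable[str], delimiter: Optional[str] = None) -> Iterator[Tuple[str, str]]:
    """Yield pairs line by line instead of collecting them all in a list."""
    for line in lines:
        yield from line_to_pairs(line, delimiter)


if __name__ == "__main__":
    for root, similar in line_to_pairs("rootword similar1 similar2 similar3"):
        print(f"{root}, {similar}")
```

Because `iter_pairs` accepts any iterable of lines, an open file object can be passed to it directly, so the pairs are produced one line at a time.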

The association type and roles are the same in all associations. Because the Moby thesaurus is very large, the JRE needs at least 2 GB of memory to process the whole thesaurus. Start the extractor with the menu option File > Extract > Language > Moby thesaurus extractor.

The Moby thesaurus is not included in the Wandora application, but it is in the public domain and easy to find. See, for example, Project Gutenberg (http://www.gutenberg.org/etext/3202).

Note also that the input format can be used to construct any kind of association, not only word relations. Just change the default association type and roles after extraction.

See also

* Topic map conversion of Moby Thesaurus II