Topic map conversion of WordNet
Revision as of 10:32, 11 July 2007
WordNet is a large lexical database of English, developed at the Cognitive Science Laboratory of Princeton University. The topic map conversion is based on the W3C's RDF version of WordNet 2.0.
Download WordNet topic map
There are two versions of WordNet topic map available:
- Wandora project file with separate layers (6.9 MB). This version is aimed at Wandora users who want to explore, filter, and modify WordNet.
- Single XTM dump (zipped 5.4 MB, unzipped 144 MB). This version is a general topic map file usable in many topic map applications.
The glossary is available as a separate topic map:
- WordNet glossary XTM dump (zipped 6.0 MB, unzipped 52 MB). The glossary contains only topic stubs, each with a single subject identifier and glossary occurrences. Merge the glossary topic map into the WordNet topic map, or import the glossary into the WordNet project as a separate layer.
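Merging works because topic maps merge topics that share a subject identifier, so each glossary stub joins the matching WordNet word topic. The following is a minimal sketch of that rule. The `Topic` class, `merge_topic_maps` and the example.org identifiers are hypothetical and are not Wandora's API. The sketch also simplifies full topic map merging: a stub merges into the first matching topic only.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical minimal topic model (not Wandora's API).
    subject_identifiers: set
    base_names: set = field(default_factory=set)
    occurrences: dict = field(default_factory=dict)  # occurrence type -> list of values


def merge_topic_maps(main, extra):
    """Merge topics from `extra` into `main` when they share a subject identifier.

    Simplified: a stub merges into the first matching topic only, and topics
    in `main` are updated in place.
    """
    index = {si: topic for topic in main for si in topic.subject_identifiers}
    result = list(main)
    for stub in extra:
        target = next((index[si] for si in stub.subject_identifiers if si in index), None)
        if target is None:
            # No shared identifier: the stub becomes a new topic.
            target = stub
            result.append(stub)
        else:
            target.subject_identifiers |= stub.subject_identifiers
            target.base_names |= stub.base_names
            for occ_type, values in stub.occurrences.items():
                target.occurrences.setdefault(occ_type, []).extend(values)
        for si in stub.subject_identifiers:
            index[si] = target
    return result


# Hypothetical identifiers and gloss text, for illustration only.
si = "http://example.org/wordnet/meeting"
wordnet = [Topic({si}, {"meeting"})]
glossary = [
    Topic({si}, occurrences={"gloss": ["example gloss text"]}),
    Topic({"http://example.org/wordnet/other"}),
]
merged = merge_topic_maps(wordnet, glossary)
```

Here the first stub merges into the `meeting` topic and gains its gloss occurrence. The second stub has no matching identifier, so it is kept as a separate topic.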
Metrics of WordNet topic map
The WordNet topic map contains:
- 115486 topics
- 137383 associations
Word topics:
- 115424 word topics
- 79689 noun topics
- 13508 verb topics
- 7482 adjective topics
- 3664 adverb topics
- 11081 adjective satellite topics
Associations:
- 648 attribute associations
- 218 causes associations
- 1280 classified-by-region associations
- 6166 classified-by-topic associations
- 983 classified-by-usage associations
- 409 entails associations
- 94842 hyponym-of associations
- 12205 member-meronym-of associations
- 8636 part-meronym-of associations
- 874 same-verb-group associations
- 11098 similar-to associations
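The breakdowns above can be checked against the totals with simple arithmetic. The word topic counts add up exactly to 115,424, which leaves 62 topics that are not word topics. The listed association types add up to 137,359, which is 24 fewer than the stated total of 137,383. The page does not say what those remaining topics and associations are.

```python
# Counts copied from the metrics lists above.
word_topics = {
    "noun": 79689,
    "verb": 13508,
    "adjective": 7482,
    "adverb": 3664,
    "adjective satellite": 11081,
}
associations = {
    "attribute": 648,
    "causes": 218,
    "classified-by-region": 1280,
    "classified-by-topic": 6166,
    "classified-by-usage": 983,
    "entails": 409,
    "hyponym-of": 94842,
    "member-meronym-of": 12205,
    "part-meronym-of": 8636,
    "same-verb-group": 874,
    "similar-to": 11098,
}
TOTAL_TOPICS = 115486
TOTAL_ASSOCIATIONS = 137383

word_sum = sum(word_topics.values())
assoc_sum = sum(associations.values())
print(word_sum, TOTAL_TOPICS - word_sum)          # 115424 62
print(assoc_sum, TOTAL_ASSOCIATIONS - assoc_sum)  # 137359 24
```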
Moreover:
- Each word topic has a unique base name and an English display name variant
- Each word topic has a synsetId occurrence that refers to the original synset ID assigned by Princeton
- The clustering coefficient of the WordNet topic map is 0.0265
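The page does not state which clustering coefficient definition produced 0.0265. The sketch below assumes the average local clustering coefficient. It treats the topic map as an undirected graph with one edge per binary association, and counts topics with fewer than two neighbours as 0. Whether the published figure was computed exactly this way is an assumption. The function name `average_clustering` and the toy graph are hypothetical.

```python
from itertools import combinations


def average_clustering(edges, nodes=()):
    """Average local clustering coefficient of an undirected graph.

    `edges` are (topic, topic) pairs, one per binary association; `nodes`
    optionally lists topics that have no associations at all.
    """
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return 0.0
    total = 0.0
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            continue  # local coefficient taken as 0 for degree < 2
        # Count links that exist between pairs of this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbours)


# Toy graph: triangle a-b-c plus a topic d attached only to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(average_clustering(edges))  # about 0.583, i.e. 7/12
```

A large, tree-like hierarchy such as hyponym-of has few closed triangles. That is consistent with the low value reported for WordNet, though the exact figure depends on the definition used.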
Using WordNet topic map in Wandora
The topic map version of WordNet contains over 100,000 topics and associations, and it requires at least 2 GB of memory to be used properly in Wandora. To give Wandora this much memory, start the application with bin/Wandora-huge.bat or adjust the Java memory settings in bin/Wandora.bat. Below is a screenshot of Wandora with WordNet's meeting topic open. Note the layer structure.
Conversion details
The topic map conversion of WordNet is based on the W3C's RDF version of WordNet. The conversion involved the following steps (somewhat simplified):
- Import each RDF file of WordNet into Wandora as a separate layer. For each imported layer:
  - Manually convert RDF triples into topic map associations
  - Map the RDF subject and object to topic map roles
  - Manually fix certain subject identifiers of the imported topics
- Create a lightweight topic hierarchy to connect WordNet topics to Wandora's topic tree.
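The triple-to-association and subject/object-to-role steps can be sketched as follows. Each RDF triple becomes a binary association. Its type comes from the predicate, and the triple's subject and object become role players. The role names in `ROLE_MAP`, the predicate names under the W3C schema namespace and the example.org synset URIs are all assumptions for illustration. They are not necessarily the roles chosen in the actual conversion.

```python
from dataclasses import dataclass

WN_SCHEMA = "http://www.w3.org/2006/03/wn/wn20/schema/"


@dataclass(frozen=True)
class Association:
    type_si: str   # association type, taken from the RDF predicate
    players: tuple  # ((role, player subject identifier), ...)


# Hypothetical role choices per predicate: (role of RDF subject, role of RDF object).
ROLE_MAP = {
    WN_SCHEMA + "hyponymOf": ("hyponym", "hypernym"),
    WN_SCHEMA + "partMeronymOf": ("part", "whole"),
}
# Fallback roles for predicates without a chosen mapping.
DEFAULT_ROLES = ("subject", "object")


def triples_to_associations(triples):
    """Turn (subject, predicate, object) triples into binary associations."""
    for subject, predicate, obj in triples:
        subject_role, object_role = ROLE_MAP.get(predicate, DEFAULT_ROLES)
        yield Association(predicate, ((subject_role, subject), (object_role, obj)))


triples = [
    ("http://example.org/synset-meeting", WN_SCHEMA + "hyponymOf",
     "http://example.org/synset-gathering"),
]
associations = list(triples_to_associations(triples))
```

Keeping the role choice in a single table reflects the observation that deciding which roles to use was the most demanding part of the conversion. Changing a role then means editing one entry.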
I (akivela) was actually a little surprised at how easily the RDF version converted to a topic map. The whole job took about two working days. The most demanding step was deciding which roles to use in associations. The next sections describe the most important base names and subject identifiers of the topic map conversion.
Synsets
Synsets are classes that group words into word categories. The categories follow the W3C and WordNet categories. Individual words are instances of these class topics.
Base name | Subject identifiers |
AdjectiveSatelliteSynset (wordnet) | http://www.w3.org/2006/03/wn/wn20/schema/AdjectiveSatelliteSynset |
AdjectiveSynset (wordnet) | http://www.w3.org/2006/03/wn/wn20/schema/AdjectiveSynset |
AdverbSynset (wordnet) | http://www.w3.org/2006/03/wn/wn20/schema/AdverbSynset |
FullSynset (wordnet) | http://www.wandora.net/wordnet/synset |
NounSynset (wordnet) | http://www.w3.org/2006/03/wn/wn20/schema/NounSynset |
VerbSynset (wordnet) | http://www.w3.org/2006/03/wn/wn20/schema/VerbSynset |
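The class-instance structure can be illustrated by grouping word topics under the synset class subject identifiers listed in the table. The input format, pairs of a word subject identifier and a class base name, and the function name `group_by_synset_class` are hypothetical. They are not part of the conversion itself.

```python
# Subject identifiers copied from the table above.
SYNSET_CLASSES = {
    "AdjectiveSatelliteSynset": "http://www.w3.org/2006/03/wn/wn20/schema/AdjectiveSatelliteSynset",
    "AdjectiveSynset": "http://www.w3.org/2006/03/wn/wn20/schema/AdjectiveSynset",
    "AdverbSynset": "http://www.w3.org/2006/03/wn/wn20/schema/AdverbSynset",
    "FullSynset": "http://www.wandora.net/wordnet/synset",
    "NounSynset": "http://www.w3.org/2006/03/wn/wn20/schema/NounSynset",
    "VerbSynset": "http://www.w3.org/2006/03/wn/wn20/schema/VerbSynset",
}


def group_by_synset_class(words):
    """Map each class subject identifier to the word topics that are its instances.

    `words` is a hypothetical list of (word subject identifier, class base name) pairs.
    """
    instances = {}
    for word_si, class_name in words:
        if class_name not in SYNSET_CLASSES:
            raise ValueError(f"unknown synset class: {class_name}")
        instances.setdefault(SYNSET_CLASSES[class_name], []).append(word_si)
    return instances


# Hypothetical word identifiers, for illustration only.
groups = group_by_synset_class([
    ("http://example.org/meeting", "NounSynset"),
    ("http://example.org/meet", "VerbSynset"),
    ("http://example.org/gathering", "NounSynset"),
])
```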
Association types
Association types define the different relations between word topics. They follow the W3C WordNet schema. Each association type topic has also been given an extra subject identifier that connects it to Wandora.
Association roles
The W3C WordNet does not contain association roles, because RDF has no comparable structure. For this reason, role topics have no corresponding subject identifier in the W3C RDF schema.
Occurrence types
synsetId (wordnet) | http://www.w3.org/2006/03/wn/wn20/schema/synsetId |
Limitations of the WordNet topic map
To limit the size of the resulting topic map, some RDF files of WordNet were left out of the conversion. For example, the current WordNet topic map does not contain the glossary. However, the current version is easy to extend by importing the required RDF files into Wandora.
WordNet license
WordNet was originally created at the Cognitive Science Laboratory of Princeton University. The topic map conversion of WordNet is based on the W3C's work on the RDF version of WordNet. Read more: