Akivela at 07:39, 1 May 2015

2015-05-01T07:39:19Z

Eero: Created page with "The word similarity extractor uses a given similarity metric to associate words with sufficiently similar topic data. The configuration dialog for the extractor allows setting..."

2015-04-27T13:04:37Z


The word similarity extractor uses a given similarity metric to associate words
with sufficiently similar topic data. The configuration dialog for the extractor
sets the metric, the similarity threshold, and a few other options that control
the association process:

; Similarity metric
: Which similarity metric to use. The implementation relies on [https://github.com/Simmetrics/simmetrics SimMetrics].
; Threshold
: The level of similarity required to consider data similar. A threshold of 0 treats everything as similar; a threshold of 1 treats only identical data as similar.
; Case sensitive
: Whether to match words in a case-sensitive manner.
; Variant Name
: Whether to consider the variant name of the topic when determining a match.
; Base Name
: Whether to consider the base name of the topic when determining a match.
; Instance Data
: Whether to consider instance data of the topic when determining a match.
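
The way these options interact can be illustrated with a minimal Python sketch. It is not Wandora's implementation: <code>Topic</code>, <code>Config</code> and every function name are hypothetical, the default values are placeholders rather than the dialog's real defaults, and Python's <code>difflib.SequenceMatcher</code> stands in for whichever SimMetrics metric is selected. The sketch also assumes each selected piece of topic data is compared as a whole string.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List


@dataclass
class Topic:
    """Hypothetical stand-in for a topic; not Wandora's API."""
    base_name: str = ""
    variant_names: List[str] = field(default_factory=list)
    instance_data: List[str] = field(default_factory=list)


@dataclass
class Config:
    """Mirrors the dialog options; names and defaults are illustrative."""
    threshold: float = 0.5
    case_sensitive: bool = False
    use_variant_name: bool = True
    use_base_name: bool = True
    use_instance_data: bool = True


def similarity(a: str, b: str) -> float:
    """Stand-in metric returning a value in [0, 1] (not SimMetrics)."""
    return SequenceMatcher(None, a, b).ratio()


def candidate_strings(topic: Topic, cfg: Config) -> List[str]:
    """Collect only the topic data enabled by the three checkboxes."""
    strings: List[str] = []
    if cfg.use_base_name and topic.base_name:
        strings.append(topic.base_name)
    if cfg.use_variant_name:
        strings.extend(topic.variant_names)
    if cfg.use_instance_data:
        strings.extend(topic.instance_data)
    return strings


def matches(word: str, topic: Topic, cfg: Config) -> bool:
    """True if any selected string reaches the similarity threshold."""
    for text in candidate_strings(topic, cfg):
        if cfg.case_sensitive:
            a, b = word, text
        else:
            a, b = word.lower(), text.lower()
        if similarity(a, b) >= cfg.threshold:
            return True
    return False
```

With a threshold of 0 every word matches any topic that has some selected data, and with a threshold of 1 only an exact match (after optional case folding) passes, which mirrors the description of the '''Threshold''' option.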

The extractor is found in '''File > Extract > Classification > Word Similarity Extractor'''.

== Example ==

The extractor's operation is demonstrated using the current description of
Wandora found on [http://wandora.org wandora.org]:

<blockquote>
Wandora is a tool for people who collect and process information, especially networked knowledge and knowledge about WWW resources. With Wandora you can aggregate and combine information from various different sources. You can manipulate the collected knowledge flexible and efficiently, and without programming skills. More generally speaking Wandora is a general purpose information extraction, management and publishing application based on Topic Maps and Java. Wandora suits well for constructing and maintaining vocabularies, ontologies and information mashups. Application areas include linked data, open data, data integration, business intelligence, digital preservation and data journalism. Wandora's license is GNU GPL. Wandora application is developed actively by a small number of experienced software developers. We call ourselves as the Wandora Team.
</blockquote>

First, create a topic to hold this text via '''Topics > New topic'''.

[[File:word_similarity_1.png]]

Open the topic and add the text as English occurrence data. Use the default occurrence type by pressing '''Use default'''.

[[File:word_similarity_2.png]]
[[File:word_similarity_3.png]]

The configuration dialog shows the default configuration options.

[[File:word_similarity_4.png]]

Open the extractor via '''File > Extract > Classification > Word Similarity Extractor'''. In the '''raw''' panel, specify words that are similar to those appearing in the description text.

[[File:word_similarity_5.png]]

The extractor then associates the description topic with the words according to the given extraction parameters.

[[File:word_similarity_6.png]]

Each generated association contains three players: the topic, the word, and the similarity score given by the specified metric. Here '''preserve''' was considered to have a similarity score of 0.875 to the word '''preservation''' found in the description text.
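
The shape of such an association can be sketched in Python as follows. This is an illustration under stated assumptions, not the extractor's code: the <code>Association</code> type and the <code>associate</code> function are hypothetical, the text is assumed to be split on whitespace with surrounding punctuation stripped, and <code>difflib.SequenceMatcher</code> is a stand-in metric, so its score for '''preserve''' and '''preservation''' differs from the 0.875 reported by the metric chosen in the dialog.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class Association(NamedTuple):
    """One generated association with its three players (hypothetical type)."""
    topic: str    # the topic whose data was matched
    word: str     # the word given to the extractor
    score: float  # similarity reported by the metric


def score(a: str, b: str) -> float:
    """Case-insensitive stand-in metric in [0, 1]; not a SimMetrics metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def associate(topic_name: str, text: str, words: List[str],
              threshold: float) -> List[Association]:
    """Associate each word with the topic if some token in the text is similar enough."""
    # Assumption: tokens are whitespace-separated, punctuation stripped.
    tokens = [t.strip(".,;:'\"()") for t in text.split()]
    tokens = [t for t in tokens if t]
    result: List[Association] = []
    for word in words:
        # Keep the best score over all tokens of the text.
        best = max((score(word, t) for t in tokens), default=0.0)
        if best >= threshold:
            result.append(Association(topic_name, word, best))
    return result


if __name__ == "__main__":
    text = "digital preservation and data journalism."
    for assoc in associate("Wandora description", text, ["preserve", "zebra"], 0.6):
        print(assoc)
```

Only words whose best score reaches the threshold produce an association, so in this usage example '''preserve''' is associated with the topic while '''zebra''' is not.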

[[File:word_similarity_7.png]]

== See also ==
*[[Simple Word Matching Extractor]]

