Atom extractor
Revision as of 15:07, 6 January 2009
Wandora's Atom extractor converts generic Atom syndication feeds to the Topic Maps format. The extractor is started with the menu option File > Extract > Atom extractor.... Wandora requires a feed file or a URL resource before the transformation is possible.
See also
Example
In this example we show how a Wandora user extracts an Atom feed provided by Citeseer. The user has searched for Topic Maps in Citeseer and receives the result set (or its first 100 matches) as an Atom feed, as shown below.
The user copies the feed URL, starts the Wandora application, and chooses the Atom extractor with the File > Extract > Atom extractor option. The user then selects the Urls tab, pastes the Atom feed URL into the text area, and clicks the Extract button.
After a successful extraction, Wandora displays some information about the extraction process.
Wandora now has topics for the extracted Atom feed and for all of its entries. Each entry topic is associated with the feed topic.
Next, the user opens one of the extracted entry topics.
The Atom feed also contains abstracts of the Topic Maps related papers. Each abstract is stored as an occurrence of the corresponding entry topic.
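The mapping described above can be illustrated with a small Java sketch that uses only the JDK's DOM parser. In this mapping, a feed becomes a topic, each entry becomes a topic associated with the feed topic, and each abstract becomes an occurrence of its entry topic. This is not Wandora's implementation. The `Topic` class is a hypothetical stand-in, and associations are modelled as plain lists. The sketch also assumes that the abstract is carried in each entry's Atom `summary` element and that the authors are in a single `author/name` element. The sample feed in `main` is made up, not real Citeseer output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class AtomToTopics {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Hypothetical stand-in for a topic map topic; this is NOT Wandora's Topic API.
    static class Topic {
        final String baseName;
        final Map<String, String> occurrences = new LinkedHashMap<>();
        final List<Topic> associated = new ArrayList<>(); // one-directional association links
        Topic(String baseName) { this.baseName = baseName; }
    }

    // First direct child element in the Atom namespace with the given local name, or null.
    static Element firstChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ATOM_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    static String text(Element e) {
        return e == null ? null : e.getTextContent().trim();
    }

    static Topic convert(String atomXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // Atom elements live in the Atom namespace
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(atomXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse Atom feed", e);
        }
        Element feed = doc.getDocumentElement();
        Topic feedTopic = new Topic(text(firstChild(feed, "title")));

        NodeList entries = feed.getElementsByTagNameNS(ATOM_NS, "entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            Topic entryTopic = new Topic(text(firstChild(entry, "title")));
            feedTopic.associated.add(entryTopic); // feed-entry association

            // Assumption: the abstract is carried in the entry's summary element.
            String summary = text(firstChild(entry, "summary"));
            if (summary != null) {
                entryTopic.occurrences.put("summary", summary);
            }

            // A single author element yields a single author topic,
            // even when its name lists several people.
            Element author = firstChild(entry, "author");
            if (author != null) {
                entryTopic.associated.add(new Topic(text(firstChild(author, "name"))));
            }
        }
        return feedTopic;
    }

    public static void main(String[] args) {
        // Made-up sample feed, not real Citeseer output.
        String xml = "<feed xmlns='http://www.w3.org/2005/Atom'><title>Sample feed</title>"
                + "<entry><title>Sample paper</title>"
                + "<author><name>Jane Doe, John Smith</name></author>"
                + "<summary>Sample abstract.</summary></entry></feed>";
        Topic feed = convert(xml);
        for (Topic entry : feed.associated) {
            System.out.println(entry.baseName + " / " + entry.occurrences.get("summary")
                    + " / author topic: " + entry.associated.get(0).baseName);
        }
    }
}
```

Running `main` prints `Sample paper / Sample abstract. / author topic: Jane Doe, John Smith`. Both people end up in one author topic, which is the situation the following paragraph addresses.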
As you might have noticed above, the authors of an Atom feed entry are the authors of the scientific paper that the entry represents. If a paper has multiple authors, as in the example above, all of them are stacked into a single topic. This is a consequence of the original Atom feed structure, where a single XML element is used to represent the authors of an entry. The next images show how a Wandora user may split up the author topic. The user first selects the author topic, then right-clicks the author topic cell and chooses Topics > Split topics > Split topic with base name... in the context menu. As the authors are separated with a comma and a white space, the regular expression used for the topic split is
\,
Although it is not visible, the comma in the regular expression above must be followed by a white space character.
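The same split can be reproduced outside Wandora with Java's `String.split`, which takes a regular expression. This is a minimal sketch that assumes the names are separated by exactly one comma and one space. The class name and sample names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class AuthorSplit {
    // Pattern from the text: an escaped comma followed by one space.
    // The backslash is doubled because this is a Java string literal.
    static final String AUTHOR_SEPARATOR = "\\, ";

    // Splits a stacked author base name into one name per author.
    static List<String> splitAuthors(String stackedName) {
        return Arrays.asList(stackedName.split(AUTHOR_SEPARATOR));
    }

    public static void main(String[] args) {
        // Hypothetical stacked author name, not taken from the Citeseer feed.
        System.out.println(splitAuthors("Jane Doe, John Smith")); // prints [Jane Doe, John Smith]
    }
}
```

A pattern such as `\\s*,\\s*` (in Java string-literal form) would also tolerate a missing or repeated space around the comma.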
After the topic split, the Atom feed entry has two authors instead of one, and each author topic represents one person instead of a group of persons. The entry topic after the split is shown below.
Note that Wandora also contains a Bibtex extractor, which may be better suited to bibliographic extraction than the Atom feed extractor.