Importing RDF

Latest revision as of 15:02, 29 May 2015

Wandora reads RDF in XML, N3, Turtle and JSON-LD formats. To import a file, choose File > Import > Simple RDF XML Import..., File > Import > Simple RDF N3 Import..., File > Import > Simple RDF Turtle Import... or File > Import > Simple RDF JSON-LD Import.... Alternatively, drag and drop RDF files onto the layer stack. The layer stack imports each dropped RDF file automatically and creates a new layer for it. Wandora converts the imported RDF triples to topics, associations and occurrences. The conversion schema is very simple and pays no attention to the semantics of the RDF file. The conversion works as follows:

  • A topic is always created for the RDF subject and for the predicate. Both topics are typed with Wandora's predefined type topics.
  • If the object is an RDF literal, an occurrence (text data) is created for the subject topic. The occurrence's type is the predicate topic and its value is the RDF literal. The occurrence's scope is derived from the lang attribute. If no lang attribute is found, the scope is language independent.
  • If the object is not an RDF literal, a topic is created for the object and associated with the subject topic. The association's type is the predicate topic, and both roles are Wandora's predefined topics. The object topic is typed with Wandora's predefined type topic.

Created topics have no base names or variant names. Each created topic gets a single subject identifier, which is the URI of the corresponding RDF resource. Wandora uses the Jena RDF framework (http://jena.apache.org/) to read RDF files. Below is the Java code snippet Wandora uses to handle RDF statements.


   public void handleStatement(Statement stmt, TopicMap map,
                               Topic subjectType,
                               Topic predicateType,
                               Topic objectType) throws TopicMapException {
       
       Resource subject   = stmt.getSubject();     // get the subject
       Property predicate = stmt.getPredicate();   // get the predicate
       RDFNode object     = stmt.getObject();      // get the object
       String lan         = null;                  // language attribute
       
       Topic subjectTopic = getOrCreateTopic(map, subject.toString());
       Topic predicateTopic = getOrCreateTopic(map, predicate.toString());
       
       subjectTopic.addType(subjectType);
       predicateTopic.addType(predicateType);
      
       if(object.isLiteral()) {
           try { lan = stmt.getLanguage(); } catch(Exception e) { /* LANG ATTRIBUTE NOT FOUND! */ }
           if(lan==null || lan.length()==0) {
              subjectTopic.setData(predicateTopic,
                                getOrCreateTopic(map, occurrenceScopeSI),
                                                 ((Literal) object).getString());
           }
           else {
              subjectTopic.setData(predicateTopic,
                                getOrCreateTopic(map, XTMPSI.getLang(lan)),
                                                 ((Literal) object).getString());
           }
       }
       else if(object.isResource()) {
           Topic objectTopic = getOrCreateTopic(map, object.toString());
           Association association = map.createAssociation(predicateTopic);
           association.addPlayer(subjectTopic, subjectType);
           association.addPlayer(objectTopic, objectType);
           objectTopic.addType(objectType);
       }
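       else if(object.isURIResource()) {
           // Never reached: a URI resource is also a resource, so it is
           // already handled by the isResource() branch above.
           log("URIResource found but not handled!");
       }        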
   }
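
To trace these rules on a concrete input, here is a minimal, self-contained sketch that uses neither Jena nor Wandora classes. The class RdfConversionSketch, its convert method and the example.org URIs are hypothetical and exist only for illustration. The sketch merely reports which construct the rules would produce for a single triple.

```java
/** Toy model of the conversion rules; this is not Wandora's or Jena's API. */
final class RdfConversionSketch {

    /** Describes the construct created for one triple (hypothetical helper). */
    static String convert(String subject, String predicate, String object,
                          boolean objectIsLiteral, String lang) {
        if (objectIsLiteral) {
            // Literal object: occurrence on the subject topic, typed by the predicate.
            // A missing or empty language tag gives a language independent scope.
            String scope = (lang == null || lang.isEmpty()) ? "language-independent" : lang;
            return "occurrence(topic=" + subject + ", type=" + predicate
                    + ", scope=" + scope + ", value=" + object + ")";
        }
        // Resource object: association typed by the predicate, using the two
        // predefined roles for the subject and the object.
        return "association(type=" + predicate
                + ", rdf-subject=" + subject + ", rdf-object=" + object + ")";
    }

    public static void main(String[] args) {
        // A literal with a language tag becomes a scoped occurrence.
        System.out.println(convert("http://example.org/book1",
                "http://example.org/title", "Kalevala", true, "fi"));
        // A resource object becomes an association between two topics.
        System.out.println(convert("http://example.org/book1",
                "http://example.org/author", "http://example.org/person1", false, null));
    }
}
```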


Post-processing the imported RDF

To make the imported RDF more topic-map-like, you may want to modify it after the import. This chapter describes post-processing techniques that make an RDF-imported topic map more convenient to use.

Constructing base names

Topics originating from RDF have no base names, so the first step is to add base names to the imported topics. You can create a base name from a topic's subject identifier with the Make base name with SI tool, found in the topic table's context menu under Topics > Base names. The base name is constructed automatically from the filename and anchor of the subject identifier URL. If the created topic map contains subject identifiers with identical filenames, take extra care with these topics to prevent them from merging automatically.
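
The following sketch approximates what "filename and anchor" could mean for a subject identifier. It is an illustration under assumptions, not the tool's actual implementation: the class and method names are hypothetical, the filename is taken to be the last path segment, and joining filename and anchor with '#' is a guess at the format.

```java
import java.net.URI;

/** Illustrative only: approximates "filename and anchor" of a subject identifier. */
final class BaseNameSketch {

    static String baseNameFromSubjectIdentifier(String subjectIdentifier) {
        URI uri = URI.create(subjectIdentifier);
        String path = uri.getPath() == null ? "" : uri.getPath();
        // The "filename" is assumed to be the last path segment.
        String filename = path.substring(path.lastIndexOf('/') + 1);
        String anchor = uri.getFragment(); // text after '#', or null if absent
        if (anchor == null || anchor.isEmpty()) {
            return filename;
        }
        // Joining with '#' is an assumption, not necessarily the tool's format.
        return filename.isEmpty() ? anchor : filename + "#" + anchor;
    }

    public static void main(String[] args) {
        System.out.println(baseNameFromSubjectIdentifier("http://www.w3.org/2000/01/rdf-schema#label"));
        // Two different identifiers with the same filename give the same name.
        System.out.println(baseNameFromSubjectIdentifier("http://example.org/a/person"));
        System.out.println(baseNameFromSubjectIdentifier("http://example.org/b/person"));
    }
}
```

The last two calls return the same name even though the identifiers differ. This is the case that needs extra care, because topics that end up with identical base names merge.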

The second step is to clean up the base names. Use Topics > Base names > Regex replace... to filter out undesired parts of the base names. If you start the tool in the context of a layer, it processes all base names in the layer's topic map. For example, to strip a leading prefix string from base names, you could use the regular expression

prefix(.+)

and replacement

$1
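
The substitution can be tried out in plain Java before running it on a topic map. This is a sketch: it assumes the tool's pattern and replacement behave like String.replaceAll from java.util.regex, which this page does not state, and the class and method names are made up for the example.

```java
/** Tries the pattern and replacement with Java's own regex engine. */
final class BaseNameRegexSketch {

    /** Applies the pattern exactly as written above. */
    static String stripPrefix(String baseName) {
        return baseName.replaceAll("prefix(.+)", "$1");
    }

    /** Variant anchored with ^ so that only a leading "prefix" is removed. */
    static String stripLeadingPrefix(String baseName) {
        return baseName.replaceAll("^prefix(.+)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("prefixPerson"));
        // Without an anchor the pattern also matches in the middle of a name.
        System.out.println(stripPrefix("my_prefixName"));
        System.out.println(stripLeadingPrefix("my_prefixName"));
    }
}
```

Under this assumption the unanchored pattern also removes a "prefix" that occurs in the middle of a name, so the anchored variant is the safer choice when only leading prefixes should go.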

Constructing variant names

The third step is to generate variant names from RDF label occurrences. RDF documents generally carry labels attached to RDF concepts, and the labels may be language dependent. If such labels exist, each one becomes a label occurrence of the corresponding RDF topic. To generate variant names from the label occurrences, select all RDF topics and use the tool Topics > Variant names > Make display variants with occurrences. The tool copies the occurrence texts to variant names.

If the variant construction was successful, you may want to remove the label occurrences. To remove occurrences of a given type, use the tool Topics > Occurrences > Delete occurrences with type.... The tool collects all occurrence types and asks which occurrences to remove. Again, to process every topic in the topic map, start the tool in the context of a layer.

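Processing associations

The final step is to change the roles of the associations originating from RDF. By default these roles are

  • http://wandora.net/si/core/rdf-subject
  • http://wandora.net/si/core/rdf-object

You cannot simply rename the role topics, because all players share the same roles. Instead, modify the associations with the Change association role... and Change association type... tools found in the context menu of the association table. In general this step includes the following subtasks:

  • Create all new role and association type topics
  • For each association type
      ◦ Open the association type topic
      ◦ Select all associations within the association table
      ◦ Use the tool Change association role... to change each role
      ◦ Use the tool Change association type... if necessary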
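
The effect of a role change can be pictured with a toy model in which an association is a map from role to player. The sketch below is not Wandora's API; the class RoleChangeSketch, its changeRole method and the example.org role and player URIs are hypothetical. It shows why the change is made per association: the player stays the same and only the role it plays is replaced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model: an association is a map from role URI to player URI. */
final class RoleChangeSketch {

    /** Returns a copy of the association in which oldRole is replaced by newRole. */
    static Map<String, String> changeRole(Map<String, String> association,
                                          String oldRole, String newRole) {
        Map<String, String> changed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : association.entrySet()) {
            // Keep every player; swap only the matching role.
            String role = entry.getKey().equals(oldRole) ? newRole : entry.getKey();
            changed.put(role, entry.getValue());
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> association = new LinkedHashMap<>();
        association.put("http://wandora.net/si/core/rdf-subject", "http://example.org/book1");
        association.put("http://wandora.net/si/core/rdf-object", "http://example.org/person1");

        // Replace both default roles with roles that fit this association type.
        Map<String, String> step1 = changeRole(association,
                "http://wandora.net/si/core/rdf-subject", "http://example.org/role/work");
        Map<String, String> step2 = changeRole(step1,
                "http://wandora.net/si/core/rdf-object", "http://example.org/role/author");
        System.out.println(step2);
    }
}
```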

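See also

Wandora also contains several RDF extractors that automatically recognize the RDF namespace and create valid base names and association roles for the extracted topics and associations. By default these simple RDF extractors are located in the File > Extract > Simple RDF extract menu. The current RDF extractors are

  • Twine RDF extractor
  • SKOS RDF extractor
  • Dublin Core RDF extractor
  • FOAF RDF extractor
  • IIIF RDF extractor
  • OWL Extractor
  • RDFS Extractor
  • RSS 1.0 RDF Extractor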

