Topic map conversion of OpenCyc

From WandoraWiki
Revision as of 11:43, 29 July 2008

OpenCyc is a large general knowledge base and commonsense reasoning engine. OpenCyc is an open-source, limited version of Cyc. The topic map conversion of OpenCyc is based on the RDF conversion of OpenCyc provided by Stephen L. Reed for his Texai project. The topic map was created with Wandora's RDF import feature, followed by light manual processing.

Download

There are two versions of the OpenCyc topic map available:

  • OpenCyc Wandora project file (14.6 MB) is intended for Wandora users. Wandora requires at least 1.4 GB of memory to open the OpenCyc project file successfully.
  • OpenCyc XTM dump (zipped 14.8 MB, uncompressed 250 MB) is intended for any topic map application capable of importing the XTM format.

History

  • 2008-07-15. First version published.

Metrics

The metrics were measured on the OpenCyc layer of the Wandora project file. The metrics of the XTM dump may differ slightly.

  • Number of topics: 120410
  • Number of associations: 424064
  • Number of topic base names: 120409
  • Number of subject identifiers: 120415
  • Number of subject locators: 0
  • Number of occurrences: 244173
  • Number of distinct topic classes: 1
  • Number of distinct types of associations: 73
  • Number of distinct roles in associations: 4
  • Number of distinct players in associations: 116212
  • Average clustering coefficient: 0.16878


Opencyc association stats.gif

Conversion details

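The topic map conversion of OpenCyc has the following navigation structure of topics:

  • Thing (http://www.w3.org/2002/07/owl#Thing) is the root node of the OpenCyc ontology. It can be used to navigate anywhere in the ontology. However, it appears to have far more subclasses than the OpenCyc Upper Ontology diagrams (http://www.cyc.com/cycdoc/upperont-diagram.html) usually suggest. Thing is also a subclass of the OpenCyc topic.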

Each OpenCyc topic has a subject identifier of the form http://sw.cyc.com/2006/07/27/cyc/Concept, where Concept is the CycLConstant, i.e. #$Concept. The subject identifier resolves to a web page describing the concept. In some cases the subject identifier is instead a term of the RDFS or OWL vocabulary. For example, domain has the SI http://www.w3.org/2000/01/rdf-schema#domain and subPropertyOf has the SI http://www.w3.org/2000/01/rdf-schema#subPropertyOf.

Each OpenCyc topic has a base name equal to the CycLConstant. For example, the topic for the concept #$DistributedFilesystem has the base name DistributedFilesystem.
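
The naming rules above can be sketched in a few lines of Python. This is only an illustration of the mapping described in the text, not code taken from Wandora or the conversion itself. The function names are hypothetical, and the sketch assumes that the Concept part of the subject identifier is the constant without its #$ prefix, as the base name example suggests. It does not cover the concepts whose subject identifier comes from the RDFS or OWL vocabulary.

```python
# Namespace taken from the subject identifier format described above.
CYC_NAMESPACE = "http://sw.cyc.com/2006/07/27/cyc/"


def base_name(cycl_constant: str) -> str:
    """Return the topic base name for a CycL constant such as '#$Concept'.

    Hypothetical helper: it only strips the leading '#$' marker.
    """
    if not cycl_constant.startswith("#$"):
        raise ValueError("expected a CycL constant starting with '#$'")
    return cycl_constant[2:]


def subject_identifier(cycl_constant: str) -> str:
    """Return the subject identifier for a CycL constant.

    Assumption: the identifier is the Cyc namespace followed by the
    constant name without its '#$' prefix.
    """
    return CYC_NAMESPACE + base_name(cycl_constant)


if __name__ == "__main__":
    print(base_name("#$DistributedFilesystem"))
    print(subject_identifier("#$DistributedFilesystem"))
```

Keeping the two helpers separate mirrors the text: the base name and the subject identifier are distinct topic properties that happen to be derived from the same constant.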

Most OpenCyc topics contain occurrences for the prettyStrings of the OpenCyc concept. A prettyString is a string representation of the concept. You could think of it as a variant name of the OpenCyc topic. However, variant names are not used to model prettyStrings; this design decision was made to keep changes to OpenCyc minimal. The occurrence type is prettyString (http://sw.cyc.com/2006/07/27/cyc/prettyString) and the scope is Lang.indep. (http://www.wandora.org/core/langindependent).
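
As a rough illustration of how such an occurrence is put together, the following Python sketch models it as a plain record with a type, a scope and a value. The Occurrence class and the pretty_string_occurrence function are hypothetical names used only for this example and are not part of Wandora's API. The sample text is likewise made up. Only the two identifiers come from the description above.

```python
from dataclasses import dataclass

# Identifiers quoted in the text above.
PRETTY_STRING_TYPE = "http://sw.cyc.com/2006/07/27/cyc/prettyString"
LANG_INDEPENDENT_SCOPE = "http://www.wandora.org/core/langindependent"


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical record for one topic occurrence."""
    type_si: str   # subject identifier of the occurrence type
    scope_si: str  # subject identifier of the occurrence scope
    value: str     # the occurrence text itself


def pretty_string_occurrence(text: str) -> Occurrence:
    """Build a prettyString occurrence with the language-independent scope."""
    return Occurrence(PRETTY_STRING_TYPE, LANG_INDEPENDENT_SCOPE, text)


if __name__ == "__main__":
    # The sample string is an invented value, not taken from OpenCyc.
    print(pretty_string_occurrence("distributed file system"))
```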

Most OpenCyc topics also contain occurrences for prettyString-Canonical. These occurrences are similar to prettyStrings, except that the name is canonical.

Many OpenCyc topics contain an occurrence for the comment. A comment is a free-text description of the OpenCyc concept. The comment usually refers to other OpenCyc concepts using the prefix #$. Topic maps have no standard mechanism for linking to a topic from occurrence text, so a Wandora user must follow such references using other tools in Wandora. Wandora features a special tool, Topics > Associations > Find associations in occurrences..., for extracting associations from occurrence texts.
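
To make the idea of #$ references concrete, the sketch below pulls the referenced constants out of a comment string with a regular expression. It is not the implementation of Wandora's tool. The function name is hypothetical, the sample comment is invented, and the pattern assumes that constant names consist of letters, digits, hyphens and underscores.

```python
import re
from typing import List

# Assumed shape of a reference: '#$' followed by a constant name made of
# letters, digits, hyphens and underscores.
CYC_REFERENCE = re.compile(r"#\$([A-Za-z0-9][A-Za-z0-9_\-]*)")


def referenced_constants(comment: str) -> List[str]:
    """Return the constant names referenced in a comment.

    Names are returned without the '#$' prefix, in order of first
    appearance and without duplicates.
    """
    seen: List[str] = []
    for name in CYC_REFERENCE.findall(comment):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Invented sample text, not an actual OpenCyc comment.
    sample = "A #$FileSystem that spans several #$Computer instances. See #$FileSystem."
    print(referenced_constants(sample))
```

Each returned name equals the base name of the referenced topic, so it could be used to look that topic up before creating an association.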



Opencyc example.gif

Limitations

  • The topic map conversion contains only OpenCyc's binary relations.
  • Non-atomic terms are not included.
  • Topic Maps do not support the semantics of many OpenCyc relations.
  • Each Cyc topic contains at most one arbitrarily selected PrettyString and PrettyStringCanonical.

License

GNU General Public License (GPL)
