Topic map conversion of OpenCyc
Revision as of 12:15, 29 July 2008
OpenCyc is a large general knowledge base and commonsense reasoning engine. OpenCyc is an open-source, limited version of Cyc. The topic map conversion of OpenCyc is based on the RDF conversion of OpenCyc that Stephen L. Reed provided for his Texai project. The topic map was created with Wandora's RDF import feature plus some light manual processing.
Download
There are two versions of the OpenCyc topic map available:
- OpenCyc Wandora project file (14.6 MB) is intended for Wandora users. Wandora needs at least 1.4 GB of memory to open the OpenCyc project file successfully.
- OpenCyc XTM dump (zipped 14.8 MB, uncompressed 250 MB) is intended for any topic map application that can import the XTM format.
History
- 2008-07-15. First version published.
Metrics
Metrics were measured on the OpenCyc layer of the Wandora project file. Metrics for the XTM dump may differ slightly.
- Number of topics: 120410
- Number of associations: 424064
- Number of topic base names: 120409
- Number of subject identifiers: 120415
- Number of subject locators: 0
- Number of occurrences: 244173
- Number of distinct topic classes: 1
- Number of distinct types of associations: 73
- Number of distinct roles in associations: 4
- Number of distinct players in associations: 116212
- Average clustering coefficient: 0.16878
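The page does not state how the average clustering coefficient was computed. As a minimal sketch, assuming the usual definition (for each node, the fraction of pairs of its neighbours that are themselves linked, averaged over all nodes) applied to an undirected graph whose edges join the two players of each binary association, the metric could be computed as below. The function name is hypothetical, and nodes with fewer than two neighbours count as 0, which is one common convention (also the default of networkx's `average_clustering`).

```python
from itertools import combinations


def average_clustering(edges):
    """Average local clustering coefficient of an undirected graph.

    edges: iterable of (a, b) pairs. Here each pair is assumed to be the two
    players of one binary association; Wandora's own method may differ.
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # self-loops do not contribute to clustering
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return 0.0

    total = 0.0
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient taken as 0 for nodes with degree < 2
        # Count neighbour pairs that are directly linked to each other.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbours)


# A triangle a-b-c with an extra node d attached to a.
print(average_clustering([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]))
```

For this small graph the coefficients are 1/3 for a, 1 for b and c, and 0 for d, so the average is 7/12 (about 0.583).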
Conversion details
The topic map conversion of OpenCyc has the following navigation structure of topics:
- OpenCyc (http://www.wandora.org/opencyc) is a subclass of the Wandora class topic. It groups together the OpenCyc types and the root node of OpenCyc, i.e. the Thing topic.
- OpenCyc Types (http://www.wandora.org/opencyc/types) is a subclass of OpenCyc. Its instances are all of OpenCyc's association and occurrence types.
- Thing (http://www.w3.org/2002/07/owl#Thing) is the root node of the OpenCyc ontology and can be used to navigate anywhere in it. However, it appears to have many more subclasses than OpenCyc Upper Ontology diagrams usually suggest. Thing is also a subclass of the OpenCyc topic.
Each OpenCyc topic has a subject identifier of the form http://sw.cyc.com/2006/07/27/cyc/Concept, where Concept is the CycL constant, i.e. #$Concept. The subject identifier resolves to a web page describing the concept. In some cases the subject identifier is that of an equivalent concept in the RDFS or OWL vocabulary, for example domain with the SI http://www.w3.org/2000/01/rdf-schema#domain and subPropertyOf with the SI http://www.w3.org/2000/01/rdf-schema#subPropertyOf.
Each OpenCyc topic has a base name equal to its CycL constant without the #$ prefix. For example, the topic for the concept #$DistributedFilesystem has the base name DistributedFilesystem.
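The naming convention for ordinary OpenCyc topics can be summarised in a few lines. This is a minimal sketch covering only the sw.cyc.com namespace; it does not handle the topics whose subject identifiers come from the RDFS or OWL vocabularies, and the function names are hypothetical.

```python
CYC_NS = "http://sw.cyc.com/2006/07/27/cyc/"


def cycl_to_topic_ids(constant: str) -> tuple[str, str]:
    """Return (subject identifier, base name) for a CycL constant such as '#$Thing'."""
    if not constant.startswith("#$"):
        raise ValueError(f"expected a CycL constant starting with '#$': {constant!r}")
    name = constant[2:]  # the base name is the constant without its '#$' prefix
    return CYC_NS + name, name


def si_to_cycl(si: str) -> str:
    """Inverse mapping: recover '#$Concept' from an sw.cyc.com subject identifier."""
    if not si.startswith(CYC_NS):
        raise ValueError(f"not an OpenCyc subject identifier: {si!r}")
    return "#$" + si[len(CYC_NS):]


si, base_name = cycl_to_topic_ids("#$DistributedFilesystem")
print(si)         # http://sw.cyc.com/2006/07/27/cyc/DistributedFilesystem
print(base_name)  # DistributedFilesystem
```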
Most OpenCyc topics contain an occurrence for the prettyString of the OpenCyc concept. A prettyString is a string representation of the concept; you could think of it as a variant name of the OpenCyc topic. However, variant names are not used to model prettyStrings, because the aim was to keep changes to OpenCyc minimal. The occurrence type is prettyString (http://sw.cyc.com/2006/07/27/cyc/prettyString) and the scope is Lang.indep. (http://www.wandora.org/core/langindependent).
Most OpenCyc topics also contain an occurrence for prettyString-Canonical. These occurrences are similar to prettyStrings, except that the name is the canonical one.
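The shape of these occurrences can be sketched as a small record holding the occurrence type, the scope and the text. The prettyString type SI and the Lang.indep. scope SI come from the description above; the prettyString-Canonical SI and its scope are assumptions (the SI is guessed by analogy with prettyString), and the class and function names are hypothetical.

```python
from dataclasses import dataclass

CYC_NS = "http://sw.cyc.com/2006/07/27/cyc/"
LANG_INDEPENDENT = "http://www.wandora.org/core/langindependent"


@dataclass(frozen=True)
class Occurrence:
    type_si: str   # subject identifier of the occurrence type topic
    scope_si: str  # subject identifier of the scope topic
    text: str      # occurrence data


def pretty_string_occurrence(text: str, canonical: bool = False) -> Occurrence:
    # ASSUMPTION: prettyString-Canonical uses CYC_NS + "prettyString-Canonical"
    # and the same Lang.indep. scope as prettyString.
    type_name = "prettyString-Canonical" if canonical else "prettyString"
    return Occurrence(CYC_NS + type_name, LANG_INDEPENDENT, text)


# Illustrative text only; not taken from OpenCyc.
occ = pretty_string_occurrence("example pretty string")
print(occ.type_si)   # http://sw.cyc.com/2006/07/27/cyc/prettyString
print(occ.scope_si)  # http://www.wandora.org/core/langindependent
```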
Many OpenCyc topics contain a comment occurrence. A comment is a free-text description of the OpenCyc concept and usually contains references to other OpenCyc concepts, marked with the prefix #$. Topic maps have no standard mechanism for linking to a topic from occurrence text, so Wandora users must follow these references with other Wandora tools. In particular, the tool Topics > Associations > Find associations in occurrences... extracts associations from occurrence texts.
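Outside Wandora, the #$ references in a comment can be pulled out with a regular expression and turned into OpenCyc subject identifiers. This is a minimal sketch, not how Wandora's Find associations in occurrences... tool works internally; it assumes that constant names consist of ASCII letters, digits, hyphens and underscores, and the function name is hypothetical.

```python
import re

CYC_NS = "http://sw.cyc.com/2006/07/27/cyc/"

# ASSUMPTION: a referenced constant is '#$' followed by letters, digits, '_' or '-'.
CYCL_REF = re.compile(r"#\$([A-Za-z0-9][A-Za-z0-9_-]*)")


def cycl_references(comment: str) -> list[str]:
    """Return the constant names referenced in a comment, in order, without duplicates."""
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(CYCL_REF.findall(comment)))


comment = "Each #$Dog is a #$Mammal; compare #$Dog with #$prettyString-Canonical."
for name in cycl_references(comment):
    print(CYC_NS + name)  # subject identifier of the referenced topic
```

Each extracted subject identifier could then be used to look up the referenced topic and create an association between it and the commented topic.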
The two basic relations in OpenCyc are isa and genls. The first, isa, is an individual-collection relation identical to the class-instance relation specified in the Topic Maps standard. However, the standard Topic Maps construct was not used to represent OpenCyc's isa relations, because Wandora's data model has no explicit association type topic or role topics for the class-instance relation. Without such topics, the class-instance association type cannot take part in other associations; for example, it is impossible to specify a subclass of the class-instance relation. OpenCyc, however, contains not only subclasses of the class-instance relation but also inverse and subproperty relations for it. Therefore a separate association type and roles were constructed to represent OpenCyc's isa relations. The association type has the base name isa and the SI http://www.w3.org/1999/02/22-rdf-syntax-ns#type. The role topics are discussed below. If you need a different association type or different roles, use Wandora's Change type and Change role tools in the context of the isa association table.
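The isa mapping amounts to turning each RDF triple whose predicate is rdf:type into a binary association typed by the rdf:type topic. The sketch below shows that shape under stated assumptions: the type SI and base name are from the description above, but the role labels are hypothetical placeholders (the actual role topics are not given here), as are the class, function and example URIs.

```python
from dataclasses import dataclass
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# HYPOTHETICAL role labels; the conversion's real role topics may differ.
INSTANCE_ROLE = "instance"
CLASS_ROLE = "class"


@dataclass(frozen=True)
class Association:
    type_si: str
    type_base_name: str
    players: tuple  # ((role, player_si), ...)


def isa_association(subject_si: str, predicate_si: str, object_si: str) -> Optional[Association]:
    """Map an RDF triple (x, rdf:type, C) to an isa association; return None otherwise."""
    if predicate_si != RDF_TYPE:
        return None
    return Association(
        type_si=RDF_TYPE,
        type_base_name="isa",
        players=((INSTANCE_ROLE, subject_si), (CLASS_ROLE, object_si)),
    )


# Placeholder URIs, for illustration only.
assoc = isa_association("http://example.org/x", RDF_TYPE, "http://example.org/C")
print(assoc)
```

Keeping isa as an ordinary association type topic is what allows it to take part in further associations, such as subproperty or inverse relations, unlike the built-in class-instance relation.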
The other typical OpenCyc relation is genls (generalizes), which is equivalent to the superclass-subclass relation. Because the superclass-subclass relation has explicit type and role topics, genls relations were mapped to the standard Topic Maps constructs. However, the base name of the association type has been overwritten with genls.
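A genls mapping along these lines is sketched below. It assumes that genls appears in the source RDF as rdfs:subClassOf and that the superclass-subclass association uses the XTM 1.0 core published subject identifiers; neither assumption is confirmed by this page, and the function name is hypothetical. In (#$genls A B), A is the subclass and B the superclass.

```python
from typing import Optional

# ASSUMPTION: XTM 1.0 core PSIs for the superclass-subclass association and roles.
XTM_CORE = "http://www.topicmaps.org/xtm/1.0/core.xtm#"
SUPERCLASS_SUBCLASS = XTM_CORE + "superclass-subclass"
SUPERCLASS = XTM_CORE + "superclass"
SUBCLASS = XTM_CORE + "subclass"

# ASSUMPTION: genls is expressed as rdfs:subClassOf in the RDF source.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def genls_association(subject_si: str, predicate_si: str, object_si: str) -> Optional[dict]:
    """Map (A, rdfs:subClassOf, B) to a superclass-subclass association; None otherwise."""
    if predicate_si != RDFS_SUBCLASS_OF:
        return None
    return {
        "type": SUPERCLASS_SUBCLASS,
        "type_base_name": "genls",  # the base name overwritten in the conversion
        "roles": {SUBCLASS: subject_si, SUPERCLASS: object_si},
    }


# Placeholder URIs, for illustration only.
print(genls_association("http://example.org/A", RDFS_SUBCLASS_OF, "http://example.org/B"))
```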
Limitations
- The topic map conversion contains only OpenCyc's binary relations.
- Non-atomic terms are not included.
- Topic Maps do not support the semantics of many OpenCyc relations.
- Each Cyc topic contains at most one arbitrarily selected prettyString and one prettyString-Canonical.
License
GNU General Public License (GPL)