mowl.datasets

Base dataset

This module contains classes intended to deal with mOWL datasets.

class mowl.datasets.base.Dataset(ontology, validation=None, testing=None)[source]

Bases: object

This class represents an mOWL dataset.

Parameters
  • ontology (org.semanticweb.owlapi.model.OWLOntology) – The ontology containing the training data of the dataset.

  • validation (org.semanticweb.owlapi.model.OWLOntology, optional) – The ontology containing the validation data of the dataset, defaults to None.

  • testing (org.semanticweb.owlapi.model.OWLOntology, optional) – The ontology containing the testing data of the dataset, defaults to None.

property ontology

Training dataset

Return type

org.semanticweb.owlapi.model.OWLOntology

property validation

Validation dataset

Return type

org.semanticweb.owlapi.model.OWLOntology

property testing

Testing ontology

Return type

org.semanticweb.owlapi.model.OWLOntology

property classes

List of classes in the dataset. The classes are collected from training, validation and testing ontologies using the OWLAPI method ontology.getClassesInSignature().

Return type

OWLClasses

property individuals

List of individuals in the dataset. The individuals are collected from training, validation and testing ontologies using the OWLAPI method ontology.getIndividualsInSignature().

Return type

OWLIndividuals

property object_properties

List of object properties (relations) in the dataset. The object properties are collected from training, validation and testing ontologies using the OWLAPI method ontology.getObjectPropertiesInSignature().

Return type

OWLObjectProperties

property evaluation_classes

List of classes used for evaluation. Depending on the dataset, this property returns either a single OWLClasses object (as in PPIYeastDataset) or a tuple of OWLClasses objects (as in GDAHumanDataset). If not overridden, it returns the classes in the testing ontology, obtained from the OWLAPI method getClassesInSignature(), as an OWLClasses object.

property labels

Returns the labels of entities as a dictionary. To use this property, the training ontology must contain axioms of the form \(class_1 \sqsubseteq \exists http://has\_label . class_2\).

Return type

dict

class mowl.datasets.base.PathDataset(ontology_path: str, validation_path: Optional[str] = None, testing_path: Optional[str] = None)[source]

Bases: Dataset

Loads the dataset from ontology documents.

Parameters
  • ontology_path (str) – Training dataset

  • validation_path (str, optional) – Validation dataset. Defaults to None.

  • testing_path (str, optional) – Testing dataset. Defaults to None.
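A minimal usage sketch, assuming mOWL is installed and that the JVM must be started with `mowl.init_jvm` before anything from `mowl.datasets` is imported. The function name `load_path_dataset`, the heap size and the file paths supplied by the caller are choices made for this illustration only.

```python
def load_path_dataset(train_path, valid_path=None, test_path=None):
    """Load a dataset from local ontology files (paths supplied by the caller)."""
    # Imports are deferred so the JVM is running before mowl.datasets is imported.
    import mowl
    mowl.init_jvm("4g")  # heap size is an arbitrary choice for this sketch

    from mowl.datasets.base import PathDataset

    dataset = PathDataset(
        train_path,
        validation_path=valid_path,
        testing_path=test_path,
    )
    # Entity collections documented in this module, rendered as strings.
    summary = {
        "classes": len(dataset.classes.as_str),
        "object_properties": len(dataset.object_properties.as_str),
    }
    return dataset, summary
```

Calling `load_path_dataset("train.owl", "valid.owl", "test.owl")` with real ontology files would return the dataset together with the entity counts; the file names here are placeholders.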

class mowl.datasets.base.TarFileDataset(tarfile_path: str, *args, **kwargs)[source]

Bases: PathDataset

Loads the dataset from a tar file.

Parameters
  • tarfile_path (str) – Location of the tar file

  • **kwargs – See below

Keyword Arguments
  • dataset_name (str): Name of the dataset

class mowl.datasets.base.RemoteDataset(url: str, data_root='./')[source]

Bases: TarFileDataset

Loads the dataset from a remote URL.

Parameters
  • url (str) – URL location of the dataset

  • data_root (str) – Root directory

class mowl.datasets.base.Entities(collection)[source]

Bases: object

Abstract class containing OWL entities indexed by their IRIs

check_owl_type(collection)[source]

This method checks whether the elements in the provided collection are of the correct type.

to_dict()[source]

Generates a dictionary indexed by OWL entity IRIs whose values are the corresponding OWL entities.

to_index_dict()[source]

Generates a dictionary indexed by OWL objects whose values are the corresponding indices.

property as_str

Returns the list of entities as string names.

property as_owl

Returns the list of entities as OWL objects.

property as_dict

Returns the dictionary of entities indexed by their names.

property as_index_dict

Returns the dictionary of entity indices, indexed by entity name.
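The two kinds of dictionary described above can be pictured with a small pure-Python sketch. It does not use mOWL: plain strings stand in for OWL objects, the helper `index_entities` is hypothetical, and ordering the entries by IRI is an assumption made so that the indices are reproducible.

```python
def index_entities(named_objects):
    """Return (name -> object, name -> index) dictionaries, ordered by name."""
    # Sort (name, object) pairs by name so the ordering is deterministic.
    by_name = dict(sorted(named_objects, key=lambda pair: pair[0]))
    # Number the names in that order.
    index_of = {name: i for i, name in enumerate(by_name)}
    return by_name, index_of


# Placeholder IRIs and objects, for illustration only.
pairs = [
    ("http://example.org/B", "object-B"),
    ("http://example.org/A", "object-A"),
]
by_name, index_of = index_entities(pairs)
```

Here `by_name` plays the role of a name-indexed dictionary and `index_of` the role of an index dictionary.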

class mowl.datasets.base.OWLClasses(collection)[source]

Bases: Entities

Class containing OWL classes indexed by their IRIs

check_owl_type(collection)[source]

This method checks whether the elements in the provided collection are of the correct type.

class mowl.datasets.base.OWLIndividuals(collection)[source]

Bases: Entities

Class containing OWL individuals indexed by their IRIs

check_owl_type(collection)[source]

This method checks whether the elements in the provided collection are of the correct type.

class mowl.datasets.base.OWLObjectProperties(collection)[source]

Bases: Entities

Class containing OWL object properties indexed by their IRIs

check_owl_type(collection)[source]

This method checks whether the elements in the provided collection are of the correct type.

Built-in datasets

class mowl.datasets.builtin.PPIYeastDataset(url=None)[source]

Bases: RemoteDataset

This dataset represents protein–protein interactions in yeast. It is built from the Gene Ontology release of 20-10-2021 and the protein interaction data in STRING database version 11.5. The protein interaction data were randomly split 90:5:5 across the training, validation and testing ontologies; the Gene Ontology functional annotations of proteins are part of the training ontology only. Protein interactions are represented as axioms of the form \(protein_1 \sqsubseteq \exists interacts\_with . protein_2\).
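A random 90:5:5 split like the one described above can be sketched in plain Python. This is an illustration of the idea only, not the procedure used to build the dataset: the helper `split_pairs`, the seed and the synthetic protein names are all assumptions.

```python
import random


def split_pairs(pairs, fractions=(0.9, 0.05, 0.05), seed=0):
    """Shuffle pairs and cut them into train/validation/test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * fractions[0])
    n_valid = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to testing
    return train, valid, test


# Synthetic interaction pairs, for illustration only.
interactions = [(f"protein_{i}", f"protein_{i + 1}") for i in range(200)]
train, valid, test = split_pairs(interactions)
```

Each pair lands in exactly one of the three lists, mirroring how each interaction belongs to only one of the training, validation and testing ontologies.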

property evaluation_classes

Classes that are used in evaluation

class mowl.datasets.builtin.PPIYeastSlimDataset(*args, **kwargs)[source]

Bases: PPIYeastDataset

Reduced version of PPIYeastDataset. The training ontology is built from the Slim Yeast subset of Gene Ontology.

class mowl.datasets.builtin.GDADataset(url=None)[source]

Bases: RemoteDataset

Abstract class for gene–disease association datasets. Each dataset represents the gene–disease associations in a particular species and is built using phenotypic annotations of genes and diseases. Gene annotations were taken from the Mouse/Human Orthology with Phenotype Annotations document, and disease annotations from the HPO annotations for rare disease document. These annotations were added to the Unified Phenotype Ontology (uPheno) to build the training ontology. Furthermore, gene–disease associations were obtained from the Associations of Mouse Genes with DO Diseases file. Associations for human and mouse were extracted separately (to build separate datasets), and each set was randomly split 80:10:10: the first part was added to the training ontology, and the remaining two parts form the validation and testing ontologies, respectively.

property evaluation_classes

List of classes used for evaluation. Depending on the dataset, this property returns either a single OWLClasses object (as in PPIYeastDataset) or a tuple of OWLClasses objects (as in GDAHumanDataset). If not overridden, it returns the classes in the testing ontology, obtained from the OWLAPI method getClassesInSignature(), as an OWLClasses object.

class mowl.datasets.builtin.GDAHumanDataset[source]

Bases: GDADataset

class mowl.datasets.builtin.GDAHumanELDataset[source]

Bases: GDADataset

This dataset is a reduced version of GDAHumanDataset. The training ontology contains axioms in the \(\mathcal{EL}\) language.

class mowl.datasets.builtin.GDAMouseDataset[source]

Bases: GDADataset

class mowl.datasets.builtin.GDAMouseELDataset[source]

Bases: GDADataset

This dataset is a reduced version of GDAMouseDataset. The training ontology contains axioms in the \(\mathcal{EL}\) language.

class mowl.datasets.builtin.FamilyDataset(url=None)[source]

Bases: RemoteDataset

This dataset represents a family domain. It is a short ontology with 12 axioms describing family relationships. The axioms are:

\[\begin{aligned}
Male & \sqsubseteq Person \\
Female & \sqsubseteq Person \\
Father & \sqsubseteq Male \\
Mother & \sqsubseteq Female \\
Father & \sqsubseteq Parent \\
Mother & \sqsubseteq Parent \\
Female \sqcap Male & \sqsubseteq \perp \\
Female \sqcap Parent & \sqsubseteq Mother \\
Male \sqcap Parent & \sqsubseteq Father \\
\exists hasChild.Person & \sqsubseteq Parent \\
Parent & \sqsubseteq Person \\
Parent & \sqsubseteq \exists hasChild. \top
\end{aligned}\]

property evaluation_classes

List of classes used for evaluation. Depending on the dataset, this property returns either a single OWLClasses object (as in PPIYeastDataset) or a tuple of OWLClasses objects (as in GDAHumanDataset). If not overridden, it returns the classes in the testing ontology, obtained from the OWLAPI method getClassesInSignature(), as an OWLClasses object.
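To get a feel for what the FamilyDataset axioms entail, the sketch below follows only the seven axioms of the form "named class is a subclass of named class" and collects every superclass reachable from a given class. It is a toy illustration in plain Python, not a reasoner and not part of mOWL: the five axioms involving intersection, existential restriction or bottom are ignored, and the helper `superclasses` is hypothetical.

```python
# The seven FamilyDataset axioms whose two sides are both named classes.
ATOMIC_SUBCLASS_AXIOMS = [
    ("Male", "Person"),
    ("Female", "Person"),
    ("Father", "Male"),
    ("Mother", "Female"),
    ("Father", "Parent"),
    ("Mother", "Parent"),
    ("Parent", "Person"),
]


def superclasses(name, axioms=ATOMIC_SUBCLASS_AXIOMS):
    """Return every named class reachable from `name` via subclass axioms."""
    found = set()
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


father_supers = superclasses("Father")
```

For example, Father is a subclass of Person both through Male and through Parent, so Person appears in `father_supers`.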

Dataset for \(\mathcal{EL}\) language

class mowl.datasets.el.ELDataset(ontology, class_index_dict=None, object_property_index_dict=None, extended=True, device='cpu')[source]

Bases: object

This class provides data-related methods for working with the \(\mathcal{EL}\) description logic language. It receives an ontology, normalizes it into 4 or 7 \(\mathcal{EL}\) normal forms, and returns one torch.utils.data.Dataset per normal form. In the process, class and object property names are mapped to integer values to create the datasets; the corresponding dictionaries can be passed in or created from scratch.

Parameters
  • ontology (org.semanticweb.owlapi.model.OWLOntology) – Input ontology that will be normalized into \(\mathcal{EL}\) normal forms

  • extended (bool, optional) – If true, the normalization process will return 7 normal forms. If false, only 4 normal forms. See Embedding the EL language for more information. Defaults to True.

  • class_index_dict (dict, optional) – Dictionary mapping class name –> index. If not provided, a dictionary will be created from the ontology classes. Defaults to None.

  • object_property_index_dict (dict, optional) – Dictionary mapping object property name –> index. If not provided, a dictionary will be created from the ontology object properties. Defaults to None.
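The name-to-index mapping described above can be sketched without mOWL or PyTorch. The example assumes the axioms are already normalized and shows just two axiom shapes; the axioms themselves, the helper `build_index` and the sorted ordering are hypothetical choices for illustration, not the library's implementation.

```python
def build_index(names):
    """Map each distinct name to a consecutive integer, in sorted order."""
    return {name: i for i, name in enumerate(sorted(set(names)))}


# Hypothetical, already-normalized axioms.
subclass_axioms = [("Father", "Male"), ("Male", "Person")]  # shape: C ⊑ D
existential_axioms = [("Parent", "hasChild", "Person")]     # shape: C ⊑ ∃R.D

# Collect every class name mentioned in either group of axioms.
class_names = [c for pair in subclass_axioms for c in pair]
class_names += [c for sub, _, filler in existential_axioms for c in (sub, filler)]

class_index = build_index(class_names)
property_index = build_index(rel for _, rel, _ in existential_axioms)

# Replace names by integers; each list would back one dataset per axiom shape.
encoded_subclass = [(class_index[c], class_index[d]) for c, d in subclass_axioms]
encoded_existential = [
    (class_index[c], property_index[r], class_index[d])
    for c, r, d in existential_axioms
]
```

Passing existing dictionaries instead of building new ones, as the class_index_dict and object_property_index_dict parameters allow, keeps the integer ids consistent across several ontologies.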

get_gci_datasets()[source]

Returns a dictionary containing the name of the normal forms as keys and the corresponding datasets as values. This method will return 7 datasets if the class parameter extended is True, otherwise it will return only 4 datasets.

Return type

dict

property class_index_dict

Returns the dictionary that maps the class names present in the dataset to their indices.

Return type

dict

property object_property_index_dict

Returns the dictionary that maps the object property names present in the dataset to their indices.

Return type

dict

class mowl.datasets.el.GCI0Dataset(*args, **kwargs)[source]

Bases: GCIDataset

class mowl.datasets.el.GCI1Dataset(*args, **kwargs)[source]

Bases: GCIDataset

class mowl.datasets.el.GCI2Dataset(*args, **kwargs)[source]

Bases: GCIDataset

class mowl.datasets.el.GCI3Dataset(*args, **kwargs)[source]

Bases: GCIDataset