Datasets

mOWL is designed to handle input in OWL format, so datasets are built from OWL ontologies. A mOWL dataset contains three ontologies: training, validation and testing.

Built-in datasets

There are several built-in datasets for bioinformatics tasks such as protein-protein interaction prediction and gene-disease association prediction. The full list can be found in the Datasets API docs.

To access any of these datasets you can use:

from mowl.datasets.builtin import PPIYeastSlimDataset
ds = PPIYeastSlimDataset()
train_ontology = ds.ontology
valid_ontology = ds.validation
test_ontology = ds.testing

evaluation_classes = ds.evaluation_classes

Built-in datasets already provide the evaluation_classes attribute, which is used to evaluate a model on the dataset. In the PPI example, the evaluation classes correspond to ontology classes representing proteins.

Your own dataset

If you have your own training, validation and testing ontologies, you can easily turn them into a mOWL dataset as follows:

from mowl.datasets.base import PathDataset
ds = PathDataset("training_ontology.owl",
                 validation_path="validation_ontology.owl",
                 testing_path="testing_ontology.owl")

training_axioms = ds.ontology.getAxioms()
validation_axioms = ds.validation.getAxioms()
testing_axioms = ds.testing.getAxioms()

Note

Validation and testing ontologies are optional when using PathDataset. By default they are set to None.
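
Because the validation and testing ontologies may be None, code that consumes a dataset should check for them before calling methods such as getAxioms(). The following is a minimal sketch of such a check. The helper name available_splits is hypothetical and not part of mOWL; it assumes only the ontology, validation and testing attributes shown above, and uses a plain stand-in object so that it runs without loading any ontology.

```python
from types import SimpleNamespace


def available_splits(dataset):
    """Return the names of the splits that are present on a dataset.

    Assumes only that the dataset exposes `ontology`, `validation` and
    `testing` attributes, with missing splits set to None.
    """
    splits = {
        "training": dataset.ontology,
        "validation": dataset.validation,
        "testing": dataset.testing,
    }
    return [name for name, onto in splits.items() if onto is not None]


# Stand-in for a PathDataset created from a training ontology only.
training_only = SimpleNamespace(ontology=object(), validation=None, testing=None)
print(available_splits(training_only))  # prints ['training']
```

With a real PathDataset, the same function could be called as available_splits(ds) before iterating over the axioms of each split.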

Attention

Custom datasets must implement the evaluation_classes attribute. This can be done as follows:

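from mowl.datasets.base import PathDataset

class CustomDataset(PathDataset):
    def __init__(self, *args, **kwargs):
        # Forward the ontology paths to PathDataset unchanged.
        super().__init__(*args, **kwargs)

    @property
    def evaluation_classes(self):
        #################
        # your code here
        #################
        raise NotImplementedError("Return the classes used for evaluation.")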
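
One possible way to fill in the body of evaluation_classes is to select classes by their IRI, in the same spirit as the PPI example where the evaluation classes are the ones representing proteins. The sketch below shows only the selection step on plain strings. The function name select_iris_by_prefix and the example.org IRIs are hypothetical, and the exact type that evaluation_classes should return is not described here, so check the Datasets API docs before wiring this into a real dataset.

```python
def select_iris_by_prefix(class_iris, prefix):
    """Keep the class IRIs that start with `prefix`, sorted for reproducibility."""
    return sorted(iri for iri in class_iris if iri.startswith(prefix))


# Hypothetical IRIs; in a real dataset they would be read from the ontology's classes.
class_iris = [
    "http://example.org/protein/P2",
    "http://example.org/function/F1",
    "http://example.org/protein/P1",
]

protein_iris = select_iris_by_prefix(class_iris, "http://example.org/protein/")
print(protein_iris)
```

Sorting the result keeps the order of evaluation classes stable across runs, which helps when comparing evaluation results.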