Datasets

mOWL is designed to take input in OWL format, that is, OWL ontologies. A mOWL dataset contains three ontologies: training, validation and testing.
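To make the three-part structure concrete, here is a minimal pure-Python sketch. It is not mOWL's dataset class: `ToyDataset` and `sizes` are hypothetical names, and each ontology is modelled as a frozenset of axiom strings rather than a real OWL ontology. Only the attribute names `ontology`, `validation` and `testing` follow the mOWL examples in this section.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ToyDataset:
    """Hypothetical stand-in for a mOWL dataset (not mOWL's class).

    Each ontology is modelled as a frozenset of axiom strings.
    """
    ontology: FrozenSet[str]    # training ontology
    validation: FrozenSet[str]  # validation ontology
    testing: FrozenSet[str]     # testing ontology

    def sizes(self) -> Dict[str, int]:
        """Return the number of axioms in each of the three ontologies."""
        return {"training": len(self.ontology),
                "validation": len(self.validation),
                "testing": len(self.testing)}


ds = ToyDataset(
    ontology=frozenset({"Dog SubClassOf Animal", "Cat SubClassOf Animal"}),
    validation=frozenset({"Cow SubClassOf Animal"}),
    testing=frozenset({"Horse SubClassOf Animal"}),
)
print(ds.sizes())  # {'training': 2, 'validation': 1, 'testing': 1}
```

In mOWL itself the three attributes hold OWL ontologies loaded from files, so the axioms carry full OWL semantics rather than plain strings.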

Built-in datasets

There are several built-in datasets related to bioinformatics tasks, such as protein-protein interaction prediction and gene-disease association prediction. The full list is available in the Datasets API docs.

To access one of these datasets, for example PPIYeastSlimDataset, you can use:

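from mowl.datasets.builtin import PPIYeastSlimDataset

ds = PPIYeastSlimDataset()
train_ontology = ds.ontology    # training ontology
valid_ontology = ds.validation  # validation ontology
test_ontology = ds.testing      # testing ontology

# Classes used to evaluate a model on this dataset
evaluation_classes = ds.evaluation_classes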

Built-in datasets already define the evaluation_classes attribute, which specifies the classes used to evaluate a model on the dataset. In the PPI example, the evaluation classes correspond to the ontology classes that represent proteins.
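The following pure-Python sketch illustrates what evaluation classes are for. It assumes, hypothetically, that protein classes can be recognised by an IRI prefix (the `http://4932.` IRIs below are illustrative, not taken from a real file), and it uses a common rank-based protocol: score every evaluation class as a candidate partner and report where the true partner ranks. The functions `select_evaluation_classes` and `rank_of_true_tail` are hypothetical helpers, not mOWL's evaluator.

```python
from typing import Callable, Iterable, List


def select_evaluation_classes(class_iris: Iterable[str], prefix: str) -> List[str]:
    """Keep the class IRIs that start with `prefix`, sorted for reproducibility."""
    return sorted(iri for iri in class_iris if iri.startswith(prefix))


def rank_of_true_tail(head: str, true_tail: str, candidates: List[str],
                      score: Callable[[str, str], float]) -> int:
    """Return the 1-based rank of `true_tail` among `candidates` (higher score = better)."""
    true_score = score(head, true_tail)
    better = sum(1 for c in candidates
                 if c != true_tail and score(head, c) > true_score)
    return better + 1


# Hypothetical class IRIs: one GO term and three protein classes.
classes = [
    "http://purl.obolibrary.org/obo/GO_0005634",
    "http://4932.YAL001C",
    "http://4932.YAL002W",
    "http://4932.YAL003W",
]
proteins = select_evaluation_classes(classes, "http://4932.")
print(proteins)  # ['http://4932.YAL001C', 'http://4932.YAL002W', 'http://4932.YAL003W']

# Toy scores for (head, tail) pairs; a trained model would compute these.
toy_scores = {
    ("http://4932.YAL001C", "http://4932.YAL001C"): 0.1,
    ("http://4932.YAL001C", "http://4932.YAL002W"): 0.8,
    ("http://4932.YAL001C", "http://4932.YAL003W"): 0.6,
}
rank = rank_of_true_tail("http://4932.YAL001C", "http://4932.YAL003W",
                         proteins, lambda h, t: toy_scores[(h, t)])
print(rank)  # 2
```

Restricting candidates to the evaluation classes keeps the ranking meaningful: a GO term is never scored as a possible interaction partner of a protein.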

Your own dataset

If you have your own training, validation and testing ontologies, you can easily turn them into a mOWL dataset as follows:

from mowl.datasets.base import PathDataset

ds = PathDataset("training_ontology.owl",
                 validation_path="validation_ontology.owl",
                 testing_path="testing_ontology.owl")

# Each attribute holds an OWL ontology; getAxioms() returns its axioms
training_axioms = ds.ontology.getAxioms()
validation_axioms = ds.validation.getAxioms()
testing_axioms = ds.testing.getAxioms()

Note

Validation and testing ontologies are optional when using PathDataset: validation_path and testing_path default to None.
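Because these splits may be missing, downstream code should not assume all three are present. The sketch below is a hypothetical helper, `available_splits`, that collects only the splits that are set. It relies solely on the attribute names `ontology`, `validation` and `testing` used above; a `SimpleNamespace` stands in for a dataset built from a training path only, so the example runs without mOWL installed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def available_splits(ds: Any) -> Dict[str, Any]:
    """Return the dataset's ontologies keyed by split name, skipping any that are None."""
    candidates = {"training": ds.ontology,
                  "validation": ds.validation,
                  "testing": ds.testing}
    return {name: onto for name, onto in candidates.items() if onto is not None}


# Stand-in for a dataset created with a training ontology only.
train_only = SimpleNamespace(ontology="training-ontology", validation=None, testing=None)
print(list(available_splits(train_only)))  # ['training']
```

Checking for None explicitly, rather than relying on truthiness, avoids treating an empty but present split as missing.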

Attention

Custom datasets require an implementation of the evaluation_classes attribute. This can be done by subclassing PathDataset:

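from mowl.datasets.base import PathDataset

class CustomDataset(PathDataset):
    def __init__(self, *args, **kwargs):
        # Forward the ontology paths (and optional keyword arguments) to PathDataset
        super().__init__(*args, **kwargs)

    @property
    def evaluation_classes(self):
        #################
        # your code here
        #################
        raise NotImplementedError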
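As one possible way to fill in the template, the sketch below selects evaluation classes by IRI prefix. To keep it runnable without mOWL or a JVM, `StubPathDataset` is a hypothetical stand-in for PathDataset that receives class IRIs directly instead of reading them from OWL files; the `eval_prefix` parameter and the `http://example.org/...` IRIs are likewise hypothetical. The subclass wiring (forwarding `*args`/`**kwargs` to the base class and exposing `evaluation_classes` as a property) mirrors the template.

```python
from functools import cached_property
from typing import FrozenSet, Iterable, Optional


class StubPathDataset:
    """Hypothetical stand-in for mOWL's PathDataset (not the real class).

    It only records the paths; class IRIs are passed in directly instead of
    being read from the OWL files.
    """
    def __init__(self, ontology_path: str, validation_path: Optional[str] = None,
                 testing_path: Optional[str] = None, class_iris: Iterable[str] = ()):
        self.ontology_path = ontology_path
        self.validation_path = validation_path
        self.testing_path = testing_path
        self.class_iris = list(class_iris)


class CustomDataset(StubPathDataset):
    def __init__(self, *args, eval_prefix: str = "http://example.org/gene/", **kwargs):
        super().__init__(*args, **kwargs)  # forward the paths unchanged
        self.eval_prefix = eval_prefix      # hypothetical selection rule

    @cached_property
    def evaluation_classes(self) -> FrozenSet[str]:
        # Computed on first access, then cached on the instance.
        return frozenset(iri for iri in self.class_iris
                         if iri.startswith(self.eval_prefix))


ds = CustomDataset("training_ontology.owl",
                   validation_path="validation_ontology.owl",
                   class_iris=["http://example.org/gene/BRCA1",
                               "http://example.org/disease/D1"])
print(sorted(ds.evaluation_classes))  # ['http://example.org/gene/BRCA1']
print(ds.testing_path)  # None
```

`functools.cached_property` is used here so the selection runs only once per dataset; a plain `@property`, as in the template, behaves the same apart from recomputing on every access.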