Datasets
mOWL takes OWL ontologies as input. A mOWL dataset groups three ontologies: one for training, one for validation and one for testing.
Built-in datasets
There are several built-in datasets for bioinformatics tasks such as protein-protein interaction prediction and gene-disease association prediction. They are listed in the Datasets API docs.
To access any of these datasets you can use:
import mowl
mowl.init_jvm("10g")  # start the JVM before importing other mOWL modules
from mowl.datasets.builtin import PPIYeastSlimDataset
ds = PPIYeastSlimDataset()
train_ontology = ds.ontology
valid_ontology = ds.validation
test_ontology = ds.testing
evaluation_classes = ds.evaluation_classes
Built-in datasets already provide the attribute evaluation_classes, which is used to evaluate a model on the dataset. In the PPI example, the evaluation classes correspond to the ontology classes that represent proteins.
Your own dataset
If you have your own training, validation and testing ontologies, you can easily turn them into a mOWL dataset as follows:
from mowl.datasets.base import PathDataset
ds = PathDataset("training_ontology.owl",
                 validation_path="validation_ontology.owl",
                 testing_path="testing_ontology.owl")
training_axioms = ds.ontology.getAxioms()
validation_axioms = ds.validation.getAxioms()
testing_axioms = ds.testing.getAxioms()
Note
Validation and testing ontologies are optional when using PathDataset. By default they are set to None.
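Because the validation and testing ontologies may be None, code that consumes a dataset should check for that before calling methods on them. The following is a minimal sketch, assuming that a missing ontology is returned as None and that a present one exposes getAxioms() as in the example above. The helper name axioms_or_empty is hypothetical and not part of mOWL.

```python
def axioms_or_empty(ontology):
    """Return the axioms of `ontology` as a list, or [] if it is None.

    Assumption: `ontology` exposes getAxioms() and the result is iterable.
    This helper is illustrative only and is not part of mOWL.
    """
    if ontology is None:
        # No validation/testing ontology was provided.
        return []
    return list(ontology.getAxioms())
```

With a dataset built from a training ontology only, a call such as axioms_or_empty(ds.validation) would then yield an empty list instead of raising an AttributeError.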
Attention
Custom datasets require the implementation of the evaluation_classes attribute. This can be done as:
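class CustomDataset(PathDataset):
    def __init__(self, *args, **kwargs):
        # Forward the training path and the optional validation/testing paths to PathDataset
        super().__init__(*args, **kwargs)

    @property
    def evaluation_classes(self):
        #################
        # your code here
        #################
        raise NotImplementedError  # remove once implemented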
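One possible way to decide which classes belong in evaluation_classes is to select them by a naming convention, for example a shared IRI prefix. The sketch below works on plain IRI strings so that it runs without a JVM. The function name select_by_prefix and the example.org IRIs are hypothetical. How you obtain the IRIs from your ontology, and which type evaluation_classes is expected to return, depends on your mOWL version, so check the Datasets API docs before adapting it.

```python
def select_by_prefix(class_iris, prefix):
    """Return the sorted, de-duplicated IRIs that start with `prefix`."""
    return sorted({iri for iri in class_iris if iri.startswith(prefix)})


# Hypothetical class IRIs; in practice these would come from your ontology.
iris = [
    "http://example.org/protein/P1",
    "http://example.org/function/F1",
    "http://example.org/protein/P2",
    "http://example.org/protein/P1",  # duplicate, dropped by the set
]

proteins = select_by_prefix(iris, "http://example.org/protein/")
print(proteins)
```

Only the two distinct protein IRIs are kept; the function class and the duplicate entry are filtered out.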