ELBoxEmbeddings

This example is based on the paper Description Logic EL++ Embeddings with Intersectional Closure. That work builds on the idea of EL Embeddings, and its main goal is to solve the intersectional closure problem.

In the case of EL Embeddings, the geometric objects representing ontology classes are \(n\)-dimensional balls. One of the normal forms in EL is:

\[C_1 \sqcap C_2 \sqsubseteq D\]

This normal form contains the intersection \(C_1 \sqcap C_2\). Balls are not closed under intersection, because the region shared by two balls is in general not a ball. To solve this issue, the paper proposes changing the geometric objects to boxes, for which the intersection operation has the closure property.
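
The closure property can be checked with a small sketch. It assumes axis-aligned boxes stored as a pair of lower and upper corners; the `Box` alias and the `intersect_boxes` helper are hypothetical names for illustration and are not part of mOWL.

```python
from typing import List, Optional, Tuple

# A box is a (lower corner, upper corner) pair. This representation is an
# assumption made for illustration, not the one used inside mOWL.
Box = Tuple[List[float], List[float]]


def intersect_boxes(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection of two axis-aligned boxes, or None if it is empty."""
    # Per dimension, the intersection starts at the larger lower bound
    # and ends at the smaller upper bound.
    lower = [max(la, lb) for la, lb in zip(a[0], b[0])]
    upper = [min(ua, ub) for ua, ub in zip(a[1], b[1])]
    if any(lo > up for lo, up in zip(lower, upper)):
        return None  # the boxes are disjoint in at least one dimension
    return lower, upper


c1 = ([0.0, 0.0], [2.0, 2.0])
c2 = ([1.0, -1.0], [3.0, 1.0])
print(intersect_boxes(c1, c2))  # ([1.0, 0.0], [2.0, 1.0])
```

The result is again described by a lower and an upper corner, so it is a box whenever it is non-empty. No analogous description by a single center and radius exists for the overlap of two balls.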

This example is quite similar to the one found in EL Embeddings. There might be slight changes in the training part, but the most important changes are in the definitions of the loss functions for each normal form.

import mowl
mowl.init_jvm("10g")  # start the JVM with a 10 GB memory limit
import torch as th

ELBoxEmbeddings (PyTorch) module

ELBoxEmbeddings defines a geometric model for all the GCIs in the EL language. The implementation of the ELBoxEmbeddings module can be found at mowl.nn.el.elem.module.ELBoxModule.
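
To give an idea of what such a geometric loss can look like, here is a minimal sketch for the simplest normal form, \(C \sqsubseteq D\). It assumes each class is a box given by a center and a non-negative offset (half side length) per dimension. The function name `inclusion_loss` and the sign convention for `margin` are assumptions for illustration; the actual ELBoxModule works on PyTorch tensors and may differ in detail.

```python
import math
from typing import Sequence


def inclusion_loss(
    center_c: Sequence[float],
    offset_c: Sequence[float],
    center_d: Sequence[float],
    offset_d: Sequence[float],
    margin: float = 0.0,
) -> float:
    """Sketch of a loss for C ⊑ D: zero when box C lies inside box D.

    With margin = 0, box C is inside box D exactly when, in every dimension,
    |center_c - center_d| + offset_c <= offset_d. Each dimension contributes
    the amount by which this is violated (assumed convention: margin is added).
    """
    violations = [
        max(abs(cc - cd) + oc - od + margin, 0.0)
        for cc, oc, cd, od in zip(center_c, offset_c, center_d, offset_d)
    ]
    # Euclidean norm of the per-dimension violations.
    return math.sqrt(sum(v * v for v in violations))


small = ([0.0, 0.0], [1.0, 1.0])  # (center, offset)
large = ([0.5, 0.0], [2.0, 2.0])

print(inclusion_loss(*small, *large))  # 0.0, the small box is inside the large one
print(inclusion_loss(*large, *small))  # positive, the large box sticks out
```

The loss is zero for satisfied axioms and grows with the size of the violation, which is what lets gradient descent push boxes into the required arrangement.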

ELBoxEmbeddings model

The module mowl.nn.el.elem.module.ELBoxModule is used in mowl.models.elboxembeddings.model.ELBoxEmbeddings. In this example, we test the model on a biological problem: predicting protein-protein interactions. Given two proteins \(p_1,p_2\), the statement “\(p_1\) interacts with \(p_2\)” is encoded using GCI 2 as:

\[p_1 \sqsubseteq \exists interacts\_with. p_2\]
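
One common way to score an axiom of this shape in translation-based geometric models is to shift the box of \(p_1\) by a relation vector and require the shifted box to lie inside the box of \(p_2\). The sketch below follows that idea under the assumption of center/offset boxes. The name `existential_loss`, the relation vector values, and the margin convention are hypothetical, and the sketch is not claimed to match the ELBoxModule implementation exactly.

```python
import math
from typing import Sequence


def existential_loss(
    center_c: Sequence[float],
    offset_c: Sequence[float],
    relation: Sequence[float],
    center_d: Sequence[float],
    offset_d: Sequence[float],
    margin: float = 0.0,
) -> float:
    """Sketch of a loss for C ⊑ ∃R.D: translate box C by the relation
    vector and penalise the part of it that falls outside box D."""
    violations = [
        max(abs(cc + r - cd) + oc - od + margin, 0.0)
        for cc, oc, r, cd, od in zip(center_c, offset_c, relation, center_d, offset_d)
    ]
    return math.sqrt(sum(v * v for v in violations))


# Toy 2-dimensional embeddings (made-up values for illustration).
p1_center, p1_offset = [0.0, 0.0], [0.5, 0.5]
p2_center, p2_offset = [2.0, 0.0], [1.0, 1.0]
interacts_with = [2.0, 0.0]

# With the relation vector, the translated box of p1 lies inside the box of p2.
print(existential_loss(p1_center, p1_offset, interacts_with, p2_center, p2_offset))  # 0.0
# Without any translation, the box of p1 is partly outside the box of p2.
print(existential_loss(p1_center, p1_offset, [0.0, 0.0], p2_center, p2_offset))  # 1.5
```

At prediction time, a score of this kind can be computed for every candidate protein \(p_2\) and used to rank the candidates.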

For that, we can use the mowl.models.elboxembeddings.examples.model_ppi.ELBoxPPI model, which uses the mowl.datasets.builtin.PPIYeastSlimDataset dataset.

Training the model

from mowl.datasets.builtin import PPIYeastSlimDataset
from mowl.models.elboxembeddings.examples.model_ppi import ELBoxPPI

dataset = PPIYeastSlimDataset()

model = ELBoxPPI(dataset,
                 embed_dim=30,
                 margin=-0.05,
                 reg_norm=1,
                 learning_rate=0.001,
                 epochs=20,
                 batch_size=20000,
                 model_filepath=None,
                 device='cpu')

model.train()
100%|██████████| 20/20 [00:04<00:00,  4.49it/s]

1

Evaluating the model

Now it is time to evaluate the embeddings. For this, we use the PPIEvaluator class.

from mowl.evaluation import PPIEvaluator

model.set_evaluator(PPIEvaluator)
model.evaluate(dataset.testing)
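
Evaluators for link prediction tasks such as this one usually report rank-based metrics. As a rough illustration of what those metrics mean, the sketch below computes the mean rank and Hits@k from a list of candidate scores. The helpers `rank_of_target` and `rank_metrics` are hypothetical and are not the PPIEvaluator implementation; the scores are made-up values.

```python
from typing import Dict, Sequence


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of the target candidate, assuming a higher score is better."""
    return 1 + sum(1 for s in scores if s > scores[target])


def rank_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 10)) -> Dict[str, float]:
    """Mean rank and Hits@k over a collection of ranks."""
    metrics = {"mean_rank": sum(ranks) / len(ranks)}
    for k in ks:
        # Fraction of test cases whose true answer is ranked within the top k.
        metrics[f"hits@{k}"] = sum(r <= k for r in ranks) / len(ranks)
    return metrics


# Scores of four candidate proteins for one query (made-up values).
scores = [0.1, 0.9, 0.4, 0.7]
ranks = [rank_of_target(scores, 1), rank_of_target(scores, 2)]  # ranks 1 and 3
print(rank_metrics(ranks, ks=(1, 3)))
# {'mean_rank': 2.0, 'hits@1': 0.5, 'hits@3': 1.0}
```

A lower mean rank and higher Hits@k values indicate that true interactions are ranked closer to the top.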

Total running time of the script: (0 minutes 37.188 seconds)

Estimated memory usage: 1681 MB
