ELBoxEmbeddings
This example is based on the paper Description Logic EL++ Embeddings with Intersectional Closure. The paper builds on the idea of EL Embeddings, and its main contribution is a solution to the intersectional closure problem.
In the case of EL Embeddings, the geometric objects representing ontology classes are \(n\)-dimensional balls. One of the normal forms in EL is:
\[C_1 \sqcap C_2 \sqsubseteq D\]
As we can see, there is an intersection operation \(C_1 \sqcap C_2\). Balls are not closed under this operation, because the region contained in the intersection of two balls is generally not a ball. To solve that issue, the paper proposes changing the geometric objects to boxes, for which the intersection operation has the closure property.
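The closure property can be illustrated with a minimal sketch. It assumes each box is axis-aligned and stored as a center vector plus a non-negative offset vector (half the side lengths). The function name `box_intersection` is hypothetical and is not part of mOWL.

```python
import torch as th


def box_intersection(center1, offset1, center2, offset2):
    """Intersect two axis-aligned boxes given as (center, offset) pairs.

    Illustrative helper, not an mOWL function. Returns the center and
    offset of the intersection box and a flag that is True when the
    boxes do not overlap in at least one dimension.
    """
    # Lower and upper corners of the intersection, one value per dimension
    lower = th.maximum(center1 - offset1, center2 - offset2)
    upper = th.minimum(center1 + offset1, center2 + offset2)
    is_empty = bool((lower > upper).any())
    # The result is described by a center and an offset, so it is again a box
    center = (lower + upper) / 2
    offset = th.relu(upper - lower) / 2
    return center, offset, is_empty


# Box 1 spans [-1, 1] x [-1, 1]; box 2 spans [0, 2] x [-0.5, 1.5]
c, o, empty = box_intersection(
    th.tensor([0.0, 0.0]), th.tensor([1.0, 1.0]),
    th.tensor([1.0, 0.5]), th.tensor([1.0, 1.0]),
)
print(c, o, empty)
```

The intersection is again described by a center and an offset, which is exactly what fails for balls.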
This example is quite similar to the one found in EL Embeddings. There might be slight changes in the training part, but the most important changes are in the definition of the loss functions for each normal form.
import mowl
mowl.init_jvm("10g")
import torch as th
ELBoxEmbeddings (PyTorch) module
ELBoxEmbeddings defines a geometric modelling for all the GCIs in the EL language.
The implementation of the ELBoxEmbeddings module can be found at mowl.nn.el.elem.module.ELBoxModule.
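To give an idea of what a geometric loss for a GCI can look like, here is a minimal sketch of a box-inclusion loss for the simplest normal form, \(C \sqsubseteq D\). It is not the code of ELBoxModule. The function name `inclusion_loss`, the center/offset parametrization, and the sign convention for the margin are all assumptions made for illustration.

```python
import torch as th


def inclusion_loss(center_c, offset_c, center_d, offset_d, margin=0.0):
    """Penalty for box C not lying inside box D (illustrative sketch).

    All inputs have shape (batch, dim). The result has shape (batch,)
    and is zero when C is contained in D (for margin=0).
    """
    # Per-dimension amount by which C sticks out of D; negative means inside
    overflow = th.abs(center_c - center_d) + offset_c - offset_d - margin
    # Only dimensions that violate the inclusion contribute to the loss
    return th.linalg.norm(th.relu(overflow), dim=-1)


# Row 0: C is inside D. Row 1: C is shifted out of D along the first axis.
center_c = th.tensor([[0.0, 0.0], [2.0, 0.0]])
offset_c = th.tensor([[0.5, 0.5], [0.5, 0.5]])
center_d = th.tensor([[0.0, 0.0], [0.0, 0.0]])
offset_d = th.tensor([[1.0, 1.0], [1.0, 1.0]])

loss = inclusion_loss(center_c, offset_c, center_d, offset_d)
print(loss)
```

Because the loss is built from differentiable tensor operations, it can be minimized with a standard PyTorch optimizer.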
ELBoxEmbeddings model
The module mowl.nn.el.elem.module.ELBoxModule is used in mowl.models.elboxembeddings.model.ELBoxEmbeddings.
In this example, we test the model on a biological problem: the prediction of
protein-protein interactions. Given two proteins \(p_1, p_2\), the phenomenon
“\(p_1\) interacts with \(p_2\)” is encoded using GCI 2 as:
\[p_1 \sqsubseteq \exists \mathit{interacts\_with}. p_2\]
For that, we can use the class mowl.models.elboxembeddings.examples.model_ppi.ELBoxPPI,
which uses the mowl.datasets.builtin.PPIYeastSlimDataset dataset.
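One possible way to score an axiom of the form \(p_1 \sqsubseteq \exists R. p_2\) with boxes is sketched below. It assumes that the relation is embedded as a translation vector that moves the box of \(p_1\) before checking inclusion in the box of \(p_2\). This is a hedged illustration, not the loss used by ELBoxPPI. The name `gci2_loss` and all tensor values are hypothetical.

```python
import torch as th


def gci2_loss(center_c, offset_c, relation, center_d, offset_d, margin=0.0):
    """Sketch of a loss for C subsumed by (exists R. D).

    Assumption: the relation R acts as a translation of the center of C.
    The loss is zero when the translated box of C lies inside the box of D.
    """
    translated = center_c + relation
    overflow = th.abs(translated - center_d) + offset_c - offset_d - margin
    return th.linalg.norm(th.relu(overflow), dim=-1)


# Toy embeddings: one query protein, one relation, two candidate partners
p1_center = th.tensor([[0.0, 0.0]])
p1_offset = th.tensor([[0.5, 0.5]])
interacts_with = th.tensor([[1.0, 0.0]])
cand_centers = th.tensor([[1.0, 0.0], [3.0, 0.0]])
cand_offsets = th.tensor([[1.0, 1.0], [1.0, 1.0]])

# Broadcasting evaluates the query against both candidates at once
scores = gci2_loss(p1_center, p1_offset, interacts_with, cand_centers, cand_offsets)
print(scores)
```

Under this convention a lower value means a more plausible interaction, so candidates can be ranked by their score.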
Training the model
from mowl.datasets.builtin import PPIYeastSlimDataset
from mowl.models.elboxembeddings.examples.model_ppi import ELBoxPPI
dataset = PPIYeastSlimDataset()
model = ELBoxPPI(dataset,
                 embed_dim=30,
                 margin=-0.05,
                 reg_norm=1,
                 learning_rate=0.001,
                 epochs=20,
                 batch_size=20000,
                 model_filepath=None,
                 device='cpu')
model.train()
100%|██████████| 20/20 [00:04<00:00, 4.49it/s]
1
Evaluating the model
Now it is time to evaluate the embeddings. For this, we use the
PPIEvaluator class.
from mowl.evaluation import PPIEvaluator
model.set_evaluator(PPIEvaluator)
model.evaluate(dataset.testing)
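Evaluators for link prediction tasks such as this one are typically rank-based: for each test pair, the true partner is ranked among all candidates by score. The sketch below shows how two common metrics, mean rank and hits@k, could be computed from such scores. It is a generic illustration in plain Python, not the implementation of PPIEvaluator. The function names and the toy scores are hypothetical, and it assumes that lower scores are better.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate, assuming lower score is better."""
    true_score = scores[true_index]
    # Count the candidates that are scored strictly better than the true one
    return 1 + sum(1 for s in scores if s < true_score)


def summarize(ranks, k=10):
    """Return (mean rank, hits@k) for a list of 1-based ranks."""
    mean_rank = sum(ranks) / len(ranks)
    hits_at_k = sum(1 for r in ranks if r <= k) / len(ranks)
    return mean_rank, hits_at_k


# Two toy test cases with three candidates each; the true partner is index 1
ranks = [
    rank_of_true([0.9, 0.1, 0.4], true_index=1),
    rank_of_true([0.2, 0.8, 0.5], true_index=1),
]
mean_rank, hits_at_1 = summarize(ranks, k=1)
print(ranks, mean_rank, hits_at_1)
```

A lower mean rank and a higher hits@k both indicate that true interactions are scored better than the alternatives.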
Total running time of the script: (0 minutes 37.188 seconds)
Estimated memory usage: 1681 MB