Embedding with KGE methods

PyKEEN integration

Generating graphs from ontologies makes it possible to apply Knowledge Graph Embedding (KGE) methods to them. PyKEEN is a Python package for reproducible, facile knowledge graph embeddings. mOWL provides functionality to ease integration with PyKEEN models that are subclasses of pykeen.models.base.EntityRelationEmbeddingModel or pykeen.models.nbase.ERModel. Projecting an ontology into a graph produces a list of Edge objects, which can be transformed into a PyKEEN pykeen.triples.TriplesFactory object:

from mowl.projection.edge import Edge
from mowl.datasets.builtin import PPIYeastSlimDataset
from mowl.projection import TaxonomyProjector

ds = PPIYeastSlimDataset()
proj = TaxonomyProjector(True)

edges = proj.project(ds.ontology)

# Alternatively, build the list of edges by hand:
# edges = [Edge("node1", "rel1", "node3"), Edge("node5", "rel2", "node1"), Edge("node2", "rel1", "node1")]
triples_factory = Edge.as_pykeen(edges, create_inverse_triples=True)

Note

The create_inverse_triples parameter is not specific to mOWL: it belongs to the PyKEEN triples factory.
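To make the role of a triples factory more concrete, the following is a minimal pure-Python sketch of the idea: labeled edges are mapped to integer IDs and, optionally, an inverse triple is added for every edge. It is not PyKEEN's implementation. The function name index_triples and the "_inverse" suffix are illustrative assumptions.

```python
def index_triples(triples, create_inverse_triples=False):
    """Map (head, relation, tail) label triples to integer ID triples.

    Conceptual sketch only; the "_inverse" suffix is an illustrative choice.
    """
    triples = list(triples)
    if create_inverse_triples:
        # For every (h, r, t), also add (t, r_inverse, h).
        triples += [(t, r + "_inverse", h) for h, r, t in triples]

    # Assign IDs in sorted label order so the mapping is deterministic.
    entities = sorted({label for h, _, t in triples for label in (h, t)})
    relations = sorted({r for _, r, _ in triples})
    entity_to_id = {label: i for i, label in enumerate(entities)}
    relation_to_id = {label: i for i, label in enumerate(relations)}

    mapped = [
        (entity_to_id[h], relation_to_id[r], entity_to_id[t])
        for h, r, t in triples
    ]
    return mapped, entity_to_id, relation_to_id


labeled = [
    ("node1", "rel1", "node3"),
    ("node5", "rel2", "node1"),
    ("node2", "rel1", "node1"),
]
mapped, entity_to_id, relation_to_id = index_triples(
    labeled, create_inverse_triples=True
)
```

With inverse triples enabled, the number of triples and the number of relation types both double, while the set of entities stays the same.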

Now, this triples factory can be used to call a PyKEEN model:

from pykeen.models import TransE
pk_model = TransE(triples_factory=triples_factory, embedding_dim = 50, random_seed=42)
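For intuition about what TransE learns, the sketch below computes the usual TransE plausibility score, the negative distance between head + relation and tail, in plain Python. It is only an illustration of the scoring idea, not PyKEEN code. The function name transe_score and the default norm p=2 are assumptions made for this example.

```python
def transe_score(head, relation, tail, p=2):
    """Return -||head + relation - tail||_p; higher means more plausible."""
    diff = [h + r - t for h, r, t in zip(head, relation, tail)]
    norm = sum(abs(d) ** p for d in diff) ** (1.0 / p)
    return -norm


# Toy 2-dimensional embeddings (hypothetical values).
head = [1.0, 0.0]
relation = [0.0, 1.0]
good_tail = [1.0, 1.0]  # exactly head + relation
bad_tail = [4.0, 5.0]   # far from head + relation

good_score = transe_score(head, relation, good_tail)
bad_score = transe_score(head, relation, bad_tail)
```

A triple whose tail lies close to head + relation gets a score near zero, and implausible triples get increasingly negative scores.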

At this point, it is possible to continue in either the PyKEEN or the mOWL environment. In mOWL, mowl.kge.model.KGEModel wraps the construction of a pykeen.training.SLCWATrainingLoop:

from mowl.kge import KGEModel

model = KGEModel(triples_factory, pk_model, epochs = 10, batch_size = 32)
model.train()
ent_embs = model.class_embeddings_dict
rel_embs = model.object_property_embeddings_dict

Attention

PyKEEN might generate more than one embedding vector per entity. However, the mOWL wrapper class returns only the primary embedding vector.
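As an illustration of how embedding dictionaries like the ones above might be used, the sketch below ranks entities by cosine similarity to a query entity. It assumes the dictionary maps entity names to vectors. The toy dictionary and the helper names cosine and most_similar are hypothetical and not part of mOWL.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, k=2):
    """Return the k entity names closest to `query`, excluding itself."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Toy stand-in for an embeddings dictionary (hypothetical values).
toy_embs = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
}
neighbours = most_similar("A", toy_embs, k=2)
```

The same helpers should work on real embeddings as long as each value is a sequence of numbers of equal length.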

Generating embeddings using a mOWL model

Although embeddings can be generated step by step as shown above, mOWL also provides a class that performs all the steps internally:

from mowl.datasets.builtin import FamilyDataset
from mowl.models import GraphPlusPyKEENModel
from mowl.projection import DL2VecProjector
from pykeen.models import TransE
import torch as th

model = GraphPlusPyKEENModel(FamilyDataset())
model.set_projector(DL2VecProjector())
model.set_kge_method(TransE, random_seed=42)
model.optimizer = th.optim.Adam
model.lr = 0.001
model.batch_size = 32
model.train(epochs = 2)

# Get embeddings

class_embs = model.class_embeddings
role_embs = model.object_property_embeddings
ind_embs = model.individual_embeddings