Embedding with KGE methods
PyKEEN integration
Generating graphs from ontologies makes it possible to apply Knowledge Graph Embedding (KGE) methods to them. PyKEEN is a Python package for reproducible, facile knowledge graph embeddings. mOWL provides functionality to ease integration with PyKEEN models that are subclasses of pykeen.models.base.EntityRelationEmbeddingModel or pykeen.models.nbase.ERModel. After generating a graph from an ontology, the output is a list of Edge objects. This list can be transformed into a PyKEEN pykeen.triples.TriplesFactory object:
from mowl.projection.edge import Edge
from mowl.datasets.builtin import PPIYeastSlimDataset
from mowl.projection import TaxonomyProjector
ds = PPIYeastSlimDataset()
proj = TaxonomyProjector(True)
edges = proj.project(ds.ontology)
# Alternatively, the edges can be written by hand, for example:
# edges = [Edge("node1", "rel1", "node3"), Edge("node5", "rel2", "node1"), Edge("node2", "rel1", "node1")]
triples_factory = Edge.as_pykeen(edges, create_inverse_triples=True)
Note
The create_inverse_triples parameter is a parameter of the PyKEEN triples factory.
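To make the effect of create_inverse_triples more concrete, the following plain-Python sketch shows the general idea: for every triple (head, relation, tail), an inverse triple (tail, inverse relation, head) is added. The add_inverse_triples helper and the "_inverse" suffix are illustrative assumptions, not part of the PyKEEN or mOWL API; PyKEEN performs this step internally on its own ID-based representation.

```python
def add_inverse_triples(triples, suffix="_inverse"):
    """Return the original triples followed by one inverse triple each.

    Hypothetical helper for illustration only; the suffix is an assumption.
    """
    inverse = [(tail, relation + suffix, head) for head, relation, tail in triples]
    return list(triples) + inverse


# Plain tuples stand in for mOWL Edge objects here.
triples = [
    ("node1", "rel1", "node3"),
    ("node5", "rel2", "node1"),
    ("node2", "rel1", "node1"),
]

augmented = add_inverse_triples(triples)
for triple in augmented:
    print(triple)
```

With inverse triples, the number of training triples and the number of relation types both double, which is worth keeping in mind when choosing the batch size.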
Now, this triples factory can be used to instantiate a PyKEEN model:
from pykeen.models import TransE
pk_model = TransE(triples_factory=triples_factory, embedding_dim = 50, random_seed=42)
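For intuition about what the TransE model learns: it treats a relation as a translation in the embedding space, so a triple (h, r, t) is considered plausible when h + r lies close to t. The sketch below computes such a score in plain Python with hand-made two-dimensional vectors. The transe_score helper and the vectors are hypothetical, and the Euclidean norm is only one common choice (the L1 norm is also used); this is not PyKEEN's implementation, which works on batched tensors of learned embeddings.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance between (head + relation) and tail.

    Scores closer to zero indicate a more plausible triple.
    Hypothetical helper for illustration only.
    """
    squared = sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    return -math.sqrt(squared)


# Made-up embeddings: head + relation lands exactly on good_tail.
head = [1.0, 0.0]
relation = [0.0, 1.0]
good_tail = [1.0, 1.0]
bad_tail = [4.0, 5.0]

score_good = transe_score(head, relation, good_tail)
score_bad = transe_score(head, relation, bad_tail)
print(score_good > score_bad)
```

Training adjusts the entity and relation vectors so that observed triples score higher than corrupted ones.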
At this point, you can continue in either the PyKEEN or the mOWL environment. mOWL's mowl.kge.model.KGEModel wraps the construction of the pykeen.training.SLCWATrainingLoop:
from mowl.kge import KGEModel
model = KGEModel(triples_factory, pk_model, epochs = 10, batch_size = 32)
model.train()
ent_embs = model.class_embeddings_dict
rel_embs = model.object_property_embeddings_dict
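One common way to use the resulting embeddings is to rank entities by their similarity to a query entity. The sketch below assumes that the embeddings dictionary maps entity names to numeric vectors; the toy_embs dictionary, its names, and its values are made up, and the cosine_similarity and most_similar helpers are hypothetical. In practice, a dictionary such as ent_embs would take the place of toy_embs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings):
    """Rank all other entities by cosine similarity to `query`, best first."""
    query_vec = embeddings[query]
    scores = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up embeddings standing in for the trained ones.
toy_embs = {
    "class_a": [1.0, 0.0],
    "class_b": [0.9, 0.1],
    "class_c": [0.0, 1.0],
}

ranking = most_similar("class_a", toy_embs)
for name, score in ranking:
    print(name, round(score, 3))
```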
Attention
PyKEEN might generate more than one embedding vector per entity. However, the mOWL wrapper class returns only the primary embedding vector.
Generating embeddings using a mOWL model
Although the embeddings can be generated step by step, mOWL also provides a class that performs all the steps internally:
from mowl.datasets.builtin import FamilyDataset
from mowl.models import GraphPlusPyKEENModel
from mowl.projection import DL2VecProjector
from pykeen.models import TransE
import torch as th
model = GraphPlusPyKEENModel(FamilyDataset())
model.set_projector(DL2VecProjector())
model.set_kge_method(TransE, random_seed=42)
model.optimizer = th.optim.Adam
model.lr = 0.001
model.batch_size = 32
model.train(epochs = 2)
# Get embeddings
class_embs = model.class_embeddings
role_embs = model.object_property_embeddings
ind_embs = model.individual_embeddings