Embedding with Random Walks
After generating a graph from an ontology, one possible next step is to generate random walks. mOWL provides two different algorithms for random walk generation. All the implemented walkers can be found in
The algorithms in mOWL are variations of the original ones. Graphs obtained from ontologies always have labeled edges; therefore, the edge labels are included in the random walks.
Important
A random walk of size \(n\) includes \(n\) nodes together with the edges between them (the last node has no outgoing edge in the walk). Therefore, a random walk of size \(n\) contains at most \(2n-1\) tokens.
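To make the \(2n-1\) bound concrete, here is a minimal plain-Python sketch of how \(n\) visited nodes and the \(n-1\) edge labels between them interleave into walk tokens. The function walk_tokens is hypothetical and not part of mOWL. A walk can contain fewer than \(n\) nodes when it reaches a node without outgoing edges, which is why the bound reads "at most".

```python
def walk_tokens(nodes, relations):
    """Interleave visited nodes with the labels of the edges between them.

    nodes:     [v1, ..., vn]      the n visited nodes
    relations: [r1, ..., r(n-1)]  label of the edge v_i -> v_(i+1)
    (hypothetical helper, for illustration only)
    """
    if len(relations) != len(nodes) - 1:
        raise ValueError("a walk over n nodes traverses exactly n-1 edges")
    tokens = []
    for node, rel in zip(nodes, relations):
        tokens.extend([node, rel])
    tokens.append(nodes[-1])  # the last node has no outgoing edge in the walk
    return tokens


walk = walk_tokens(["node_1", "node_3", "node_4"], ["rel", "rel"])
print(" ".join(walk))  # node_1 rel node_3 rel node_4
print(len(walk))       # 5, i.e. 2*3 - 1
```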
In Generating a graph from an ontology, we saw that graphs are represented as an edge list, where each edge is an instance of the Edge class. This edge list is the input to the random walk methods.
For example, let’s take DeepWalk:
from mowl.walking import DeepWalk
walker = DeepWalk(
    10,   # num_walks
    8,    # walk_length
    0.1,  # alpha
    outfile="/tmp/walks2.txt",  # optional path to save the walks
    workers=4)
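As far as we know, the alpha argument is a restart probability, as in the original DeepWalk reference implementation: at each step the walk jumps back to its first node with probability alpha instead of moving to a random neighbor. The sketch below reproduces that rule on a plain, unlabeled adjacency dictionary. It is not mOWL's code; the function deepwalk_with_restart is hypothetical, and how mOWL records a restart inside a labeled walk is not modeled here.

```python
import random


def deepwalk_with_restart(adjacency, start, walk_length, alpha, rng):
    """Node-only walk with restarts, following the DeepWalk reference rule.

    With probability alpha the walk appends its start node again
    (a restart); otherwise it moves to a uniformly random neighbor.
    (hypothetical helper, for illustration only)
    """
    path = [start]
    while len(path) < walk_length:
        neighbors = adjacency.get(path[-1], [])
        if not neighbors:  # dead end: stop early
            break
        if rng.random() >= alpha:
            path.append(rng.choice(neighbors))
        else:
            path.append(path[0])  # restart at the first node
    return path


adjacency = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(deepwalk_with_restart(adjacency, "a", 8, 0.1, random.Random(42)))
```

With alpha=0 the walk never restarts and simply follows the graph, while larger values keep walks closer to their start node.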
Tip
Information about each method can be found at Walking models API docs.
After generating an edge list where each element is an instance of Edge, the walks can be generated by:
walker.walk(edges)
The walks will be stored in the file walker.outfile.
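The walk file is plain text with one walk per line and space-separated tokens, as the example outputs later in this section show. The following sketch writes such a file for a small edge list, assuming num_walks means walks per start node. It is a simplified stand-in, not mOWL's implementation: edges are plain (src, rel, dst) tuples instead of Edge objects, edges are treated as directed, only nodes with outgoing edges start walks, and write_walk_corpus is a hypothetical helper.

```python
import os
import random
import tempfile
from collections import defaultdict


def write_walk_corpus(edges, num_walks, walk_length, outfile, seed=0):
    """Write num_walks labeled walks per start node, one walk per line.

    (hypothetical helper, for illustration only)
    """
    adjacency = defaultdict(list)
    for src, rel, dst in edges:  # edges as plain (src, rel, dst) tuples
        adjacency[src].append((rel, dst))
    rng = random.Random(seed)
    with open(outfile, "w") as f:
        # assumption: only nodes with outgoing edges start walks
        for start in sorted(adjacency):
            for _ in range(num_walks):
                tokens, current = [start], start
                for _ in range(walk_length - 1):
                    if current not in adjacency:  # dead end: stop early
                        break
                    rel, current = rng.choice(adjacency[current])
                    tokens += [rel, current]
                f.write(" ".join(tokens) + "\n")


edges = [
    ("node_1", "rel", "node_2"),
    ("node_1", "rel", "node_3"),
    ("node_3", "rel", "node_4"),
]
outfile = os.path.join(tempfile.mkdtemp(), "walks.txt")
write_walk_corpus(edges, num_walks=2, walk_length=3, outfile=outfile)
with open(outfile) as f:
    print(f.read())
```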
Filtering random walks
New in version 0.1.0.
It is possible to pass a list of nodes (as strings) in order to generate only random walks that include at least one of these nodes of interest. The examples below use the following edge list:
from mowl.projection import Edge
edge1 = Edge("node_1", "rel", "node_2")
edge2 = Edge("node_1", "rel", "node_3")
edge3 = Edge("node_3", "rel", "node_4")
edges = [edge1, edge2, edge3]
Let’s see the difference between filtered and non-filtered random walks:
Non-filtered
from mowl.walking import DeepWalk
walker = DeepWalk(6, 3, alpha=0, outfile="no_filtered_walks", workers=4)
walker.walk(edges)
# Read the generated walks back and sort them for easier inspection
with open("no_filtered_walks", "r") as f:
    lines = f.readlines()
lines.sort()
print(lines)
The output will include the following walks:
node_1 rel node_2
node_1 rel node_3 rel node_4
node_3 rel node_4
Filtered
from mowl.walking import DeepWalk
walker2 = DeepWalk(3, 3, alpha=0, outfile="filtered_walks", workers=4)
walker2.walk(edges, nodes_of_interest=["node_1", "node_2"])
# Read the generated walks back and sort them for easier inspection
with open("filtered_walks", "r") as f:
    lines = f.readlines()
lines.sort()
print(lines)
In this case, the output will include the following walks:
node_1 rel node_2
node_1 rel node_3 rel node_4
Hint
The walk node_3 rel node_4 is not included in this case because it does not contain any of the nodes_of_interest.
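Because alpha=0 disables restarts and the graph has only three edges, every walk in these examples is one of a few possible paths, and repeated walks show up as duplicate lines in the file. The sketch below enumerates those paths for walk_length=3 and recovers exactly the three distinct walks of the non-filtered output. It assumes directed edges and that walks start only at nodes with outgoing edges; possible_walks is a hypothetical helper for illustration.

```python
from collections import defaultdict

edges = [
    ("node_1", "rel", "node_2"),
    ("node_1", "rel", "node_3"),
    ("node_3", "rel", "node_4"),
]


def possible_walks(edges, walk_length):
    """Enumerate every distinct walk a walker without restarts could produce.

    A walk extends until it has walk_length nodes or reaches a node
    without outgoing edges. (hypothetical helper, for illustration only)
    """
    adjacency = defaultdict(list)
    for src, rel, dst in edges:
        adjacency[src].append((rel, dst))

    def extend(tokens, current, remaining):
        # Stop when the length budget is used up or at a dead end
        if remaining == 0 or current not in adjacency:
            yield " ".join(tokens)
            return
        for rel, dst in adjacency[current]:
            yield from extend(tokens + [rel, dst], dst, remaining - 1)

    walks = set()
    for start in list(adjacency):  # only nodes with outgoing edges
        walks.update(extend([start], start, walk_length - 1))
    return sorted(walks)


for walk in possible_walks(edges, 3):
    print(walk)
# node_1 rel node_2
# node_1 rel node_3 rel node_4
# node_3 rel node_4
```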
Note
If any node in nodes_of_interest does not exist in the graph, a warning will be raised.
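One way to obtain the filtering behavior described above is to generate walks as usual, keep only those that visit a node of interest, and warn about nodes missing from the graph. The sketch below does exactly that; mOWL may implement filtering differently (for instance, by choosing start nodes), and the function filter_walks and its warning message are hypothetical. Only node positions in each walk are checked, so a relation label that happens to share a name with a node of interest does not count as a match.

```python
import warnings


def filter_walks(walks, nodes_of_interest, graph_nodes):
    """Keep walks that visit at least one node of interest.

    Walks are token lists alternating node, relation, node, ...; only the
    node positions (even indices) are checked.
    (hypothetical helper, for illustration only)
    """
    missing = set(nodes_of_interest) - set(graph_nodes)
    if missing:
        warnings.warn(f"Nodes not found in the graph: {sorted(missing)}")
    wanted = set(nodes_of_interest)
    return [w for w in walks if wanted.intersection(w[::2])]


walks = [
    "node_1 rel node_2".split(),
    "node_1 rel node_3 rel node_4".split(),
    "node_3 rel node_4".split(),
]
graph_nodes = {"node_1", "node_2", "node_3", "node_4"}
for w in filter_walks(walks, ["node_1", "node_2"], graph_nodes):
    print(" ".join(w))
# node_1 rel node_2
# node_1 rel node_3 rel node_4
```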
Generating embeddings
Once the walks are generated, they can be used to generate embeddings using, for example, a Word2Vec model:
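from gensim.models.word2vec import LineSentence
from gensim.models import Word2Vec
walk_corpus_file = walker.outfile
sentences = LineSentence(walk_corpus_file)  # one walk per line, whitespace-separated tokens
w2v_model = Word2Vec(sentences)  # gensim's default min_count=5 ignores tokens seen fewer than 5 times
w2v_model.save("/tmp/my_word2vec_outfile")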
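LineSentence reads the walk file line by line and splits each line on whitespace, so both node names and relation labels become words of the Word2Vec vocabulary. In skip-gram mode (gensim's Word2Vec defaults to CBOW, sg=0), training examples are (center, context) pairs taken from a window around each token. The sketch below builds those pairs for one walk with a fixed window; gensim additionally samples a smaller effective window per token and downsamples frequent words, which this sketch leaves out. skipgram_pairs is a hypothetical helper.

```python
def skipgram_pairs(sentence, window):
    """Return (center, context) pairs within a fixed window.

    (hypothetical helper, for illustration only)
    """
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


walk = "node_1 rel node_3 rel node_4".split()
for center, context in skipgram_pairs(walk, window=1):
    print(center, "->", context)
```

With window=1, nodes only see the relation tokens next to them; a wider window is needed for two nodes joined by one edge to appear in each other's context.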
Generating embeddings using a mOWL model
Although embeddings can be generated step by step, we also provide a class that performs all the steps internally:
from mowl.datasets.builtin import FamilyDataset
from mowl.models import RandomWalkPlusW2VModel
from mowl.projection import DL2VecProjector
from mowl.walking import DeepWalk
# Setup and train
model = RandomWalkPlusW2VModel(FamilyDataset())
model.set_projector(DL2VecProjector())
model.set_walker(DeepWalk(1,1))
model.set_w2v_model(min_count=1)
model.train()
# Get embeddings
class_embs = model.class_embeddings
role_embs = model.object_property_embeddings
ind_embs = model.individual_embeddings