Embedding with Random Walks

After generating a graph from an ontology, one possible next step is to generate random walks. mOWL provides two different algorithms for random walk generation. All the implemented walkers can be found in

The algorithms in mOWL are variations of the original ones: graphs obtained from ontologies always have labeled edges, so the edge labels are included in the random walks.

Important

A random walk of length \(n\) includes \(n\) nodes, each followed by the label of the edge it traverses, except for the last node. Therefore, a random walk of length \(n\) contains at most \(2n-1\) tokens.
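To make this layout concrete, here is a minimal pure-Python sketch of a uniform random walk over a labeled edge list. It is not mOWL's implementation: the helper names `build_adjacency` and `labeled_random_walk`, and the representation of edges as plain `(src, rel, dst)` tuples, are assumptions for illustration. Each step appends the edge label and then the next node, so a walk of length \(n\) has at most \(2n-1\) tokens; the sketch stops early when a node has no outgoing edges.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to its list of (relation, destination) pairs."""
    adjacency = defaultdict(list)
    for src, rel, dst in edges:
        adjacency[src].append((rel, dst))
    return adjacency


def labeled_random_walk(adjacency, start, walk_length, rng=random):
    """Hypothetical helper: a walk of at most `walk_length` nodes with edge labels interleaved."""
    walk = [start]
    current = start
    # The start node is already in the walk, so take at most walk_length - 1 steps
    for _ in range(walk_length - 1):
        neighbors = adjacency.get(current)
        if not neighbors:
            break  # no outgoing edges: the walk ends early
        rel, current = rng.choice(neighbors)
        walk.extend([rel, current])
    return walk


edges = [("node_1", "rel", "node_2"),
         ("node_1", "rel", "node_3"),
         ("node_3", "rel", "node_4")]
adjacency = build_adjacency(edges)
walk = labeled_random_walk(adjacency, "node_3", 3, random.Random(0))
print(" ".join(walk))  # prints "node_3 rel node_4"
```

Nodes always sit at even positions of the walk and edge labels at odd positions, which is why the token count is odd and bounded by \(2n-1\).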

In generating a graph from an ontology, we saw that graphs are represented as an edge list in which each edge is an instance of the Edge class. This edge list is the input of the random walk methods.

For example, let’s take DeepWalk:

from mowl.walking import DeepWalk

walker = DeepWalk(
    10,   # num_walks: number of walks per node
    8,    # walk_length: length of each walk
    0.1,  # alpha: restart probability
    outfile="/tmp/walks2.txt",  # optional path where the walks are saved
    workers=4,
)

Tip

Information about each method can be found in the Walking models API docs.

Given an edge list in which each element is an instance of Edge, the walks are generated with:

walker.walk(edges)

The walks are written to the file at walker.outfile, one walk per line.
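The shape of that output file can be pictured with the following sketch, which starts a fixed number of walks from each node and writes every walk as one line of space-separated tokens. This is an illustration under assumptions, not mOWL's code: the helper names are hypothetical, walks start only from nodes with at least one outgoing edge, and the walks are uniform with no restart (so `alpha` plays no role here).

```python
import os
import random
import tempfile
from collections import defaultdict


def build_adjacency(edges):
    adjacency = defaultdict(list)
    for src, rel, dst in edges:
        adjacency[src].append((rel, dst))
    return adjacency


def random_walk(adjacency, start, walk_length, rng):
    # Uniform walk; edge labels are interleaved between nodes
    walk, current = [start], start
    for _ in range(walk_length - 1):
        neighbors = adjacency.get(current)
        if not neighbors:
            break
        rel, current = rng.choice(neighbors)
        walk.extend([rel, current])
    return walk


def write_walk_corpus(edges, num_walks, walk_length, outfile, seed=0):
    """Hypothetical helper: write num_walks walks per start node, one walk per line."""
    rng = random.Random(seed)
    adjacency = build_adjacency(edges)
    start_nodes = sorted(adjacency)  # nodes with at least one outgoing edge
    with open(outfile, "w") as f:
        for _ in range(num_walks):
            for node in start_nodes:
                f.write(" ".join(random_walk(adjacency, node, walk_length, rng)) + "\n")


edges = [("node_1", "rel", "node_2"),
         ("node_1", "rel", "node_3"),
         ("node_3", "rel", "node_4")]
outfile = os.path.join(tempfile.mkdtemp(), "walks.txt")
write_walk_corpus(edges, num_walks=2, walk_length=3, outfile=outfile)

with open(outfile) as f:
    lines = f.read().splitlines()
print(len(lines))  # prints 4 (2 start nodes x 2 walks)
```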

Filtering random walks

New in version 0.1.0.

It is possible to pass a list of nodes (as strings) so that only random walks containing at least one of these nodes of interest are generated. Consider the following edge list:

from mowl.projection import Edge

edge1 = Edge("node_1", "rel", "node_2")
edge2 = Edge("node_1", "rel", "node_3")
edge3 = Edge("node_3", "rel", "node_4")

edges = [edge1, edge2, edge3]

Let’s compare filtered and non-filtered random walks:

  • Unfiltered

from mowl.walking import DeepWalk

walker = DeepWalk(6, 3, alpha=0, outfile="no_filtered_walks", workers=4)
walker.walk(edges)
with open("no_filtered_walks", "r") as f:
    lines = f.readlines()
    lines.sort()
    print(lines)

The output will include the following walks:

node_1 rel node_2
node_1 rel node_3 rel node_4
node_3 rel node_4

  • Filtered

from mowl.walking import DeepWalk

walker2 = DeepWalk(3, 3, alpha=0, outfile="filtered_walks", workers=4)
walker2.walk(edges, nodes_of_interest=["node_1", "node_2"])
with open("filtered_walks", "r") as f:
    lines = f.readlines()
    lines.sort()
    print(lines)

In this case, the output will include the following walks:

node_1 rel node_2
node_1 rel node_3 rel node_4

Hint

The walk node_3 rel node_4 is not included in this case because it does not contain any of the nodes_of_interest.

Note

If any of the nodes of interest does not exist in the graph, a warning will be raised.
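The filtering rule described above (keep a walk only if it contains at least one node of interest, and warn about nodes of interest that are missing from the graph) can be sketched as a post-processing step. `filter_walks` is a hypothetical helper, not part of mOWL's API, and mOWL may apply the filter during walk generation rather than afterwards; the sketch only reproduces the documented behavior on the example above.

```python
import warnings


def filter_walks(walks, nodes_of_interest, graph_nodes):
    """Hypothetical helper: keep walks containing at least one node of interest.

    `walks` is a list of token lists (nodes at even positions, edge labels
    at odd positions) and `graph_nodes` is the set of nodes in the graph.
    """
    interest = set(nodes_of_interest)
    missing = sorted(interest - set(graph_nodes))
    if missing:
        # Warn about nodes of interest that do not occur in the graph
        warnings.warn(f"Nodes not found in the graph: {missing}")
    # Only check node positions, so an edge label can never cause a match
    return [walk for walk in walks if interest.intersection(walk[::2])]


walks = [["node_1", "rel", "node_2"],
         ["node_1", "rel", "node_3", "rel", "node_4"],
         ["node_3", "rel", "node_4"]]
graph_nodes = {"node_1", "node_2", "node_3", "node_4"}

for walk in filter_walks(walks, ["node_1", "node_2"], graph_nodes):
    print(" ".join(walk))
# node_1 rel node_2
# node_1 rel node_3 rel node_4
```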

Generating embeddings

Once the walks are generated, they can be used to generate embeddings using, for example, a Word2Vec model:

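from gensim.models.word2vec import LineSentence
from gensim.models import Word2Vec

# Each line of the walk file is one "sentence" of whitespace-separated tokens
walk_corpus_file = walker.outfile
sentences = LineSentence(walk_corpus_file)

# Train a Word2Vec model on the walks and save it to disk
w2v_model = Word2Vec(sentences)
w2v_model.save("/tmp/my_word2vec_outfile")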
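LineSentence treats each line of the walk file as one sentence and splits it on whitespace. The pure-Python sketch below shows roughly what Word2Vec receives from the walk file. `iter_walk_sentences` is a hypothetical stand-in for illustration only; it ignores LineSentence options such as `max_sentence_length`.

```python
import os
import tempfile


def iter_walk_sentences(path):
    """Hypothetical stand-in for LineSentence: yield each line as a token list."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if tokens:  # skip blank lines
                yield tokens


path = os.path.join(tempfile.mkdtemp(), "walks.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("node_1 rel node_2\n")
    f.write("node_1 rel node_3 rel node_4\n")

sentences = list(iter_walk_sentences(path))
print(sentences)
# [['node_1', 'rel', 'node_2'], ['node_1', 'rel', 'node_3', 'rel', 'node_4']]
```

Because edge labels are ordinary tokens in each sentence, Word2Vec can learn vectors for the relation names as well as for the nodes.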

Generating embeddings using a mOWL model

Although embeddings can be generated step by step, mOWL also provides a class that performs all the steps internally:

from mowl.datasets.builtin import FamilyDataset
from mowl.models import RandomWalkPlusW2VModel
from mowl.projection import DL2VecProjector
from mowl.walking import DeepWalk

# Setup and train
model = RandomWalkPlusW2VModel(FamilyDataset())
model.set_projector(DL2VecProjector())
model.set_walker(DeepWalk(1,1))
model.set_w2v_model(min_count=1)
model.train()

# Get embeddings
class_embs = model.class_embeddings
role_embs = model.object_property_embeddings
ind_embs = model.individual_embeddings