Embedding with Random Walks
===========================

.. |projection| replace:: generating a graph from an ontology

.. testsetup::

   from mowl.projection import Edge
   edges = [Edge("node1", "rel1", "node2"), Edge("node2", "rel3", "node2")]

After |projection|, one possible next step is to generate random walks. mOWL provides two different algorithms for random walk generation. All the implemented projectors can be found in

The algorithms in mOWL are variations of the original ones. Graphs obtained from ontologies always have labeled edges, so the **edge labels are included** in the random walks.

.. important::

   A random walk of size :math:`n` contains :math:`n` nodes, each followed by the label of the edge it traverses (except the last node). A random walk of size :math:`n` is therefore at most :math:`2n-1` tokens long.

In |projection|, we saw that graphs are represented as an edge list in which each edge is an instance of the :class:`Edge` class. This edge list is the input of the random walk methods. For example, let's take :class:`DeepWalk`:

.. testcode::

   from mowl.walking import DeepWalk

   walker = DeepWalk(
       10,   # num_walks
       8,    # walk_length
       0.1,  # alpha
       outfile="/tmp/walks2.txt",  # optional path where the walks are saved
       workers=4)

.. tip::

   Information about each method can be found at :doc:`Walking models API docs <../../api/walking/index>`.

After generating an edge list where each element is an instance of :class:`Edge`, the walks can be generated by:

.. testcode::

   walker.walk(edges)

The walks will be stored in the file given by ``walker.outfile``.

Filtering random walks
----------------------

.. versionadded:: 0.1.0

It is possible to pass a list of nodes (strings) so that only random walks containing at least one of the nodes of interest are generated.

.. testcode:: filtered

   from mowl.projection import Edge

   edge1 = Edge("node_1", "rel", "node_2")
   edge2 = Edge("node_1", "rel", "node_3")
   edge3 = Edge("node_3", "rel", "node_4")
   edges = [edge1, edge2, edge3]

Let's see the difference between filtered and non-filtered random walks:

* Not filtered

.. testcode:: filtered

   from mowl.walking import DeepWalk

   walker = DeepWalk(6, 3, alpha=0, outfile="no_filtered_walks", workers=4)
   walker.walk(edges)

.. code:: python

   with open("no_filtered_walks", "r") as f:
       lines = f.readlines()
   lines.sort()
   print(lines)

The output will include the following walks:

.. code:: text

   node_1 rel node_2
   node_1 rel node_3 rel node_4
   node_3 rel node_4

* Filtered

.. testcode:: filtered

   from mowl.walking import DeepWalk

   walker2 = DeepWalk(3, 3, alpha=0, outfile="filtered_walks", workers=4)
   walker2.walk(edges, nodes_of_interest=["node_1", "node_2"])

.. code:: python

   with open("filtered_walks", "r") as f:
       lines = f.readlines()
   lines.sort()
   print(lines)

In this case, the output will include the following walks:

.. code:: text

   node_1 rel node_2
   node_1 rel node_3 rel node_4

.. hint::

   The walk ``node_3 rel node_4`` is not included in this case because it does not contain any of the ``nodes_of_interest``.

.. note::

   If any of the nodes of interest does not exist in the graph, a warning is raised.

Generating embeddings
---------------------

Once the walks are generated, they can be used to generate embeddings using, for example, a :class:`Word2Vec` model:

.. testcode::

   from gensim.models.word2vec import LineSentence
   from gensim.models import Word2Vec

   # Each line of the walks file is read as one sentence
   walk_corpus_file = walker.outfile
   sentences = LineSentence(walk_corpus_file)
   w2v_model = Word2Vec(sentences)
   w2v_model.save("/tmp/my_word2vec_outfile")

Generating embeddings using a mOWL model
----------------------------------------

Although the embeddings can be generated step by step, we also provide a class that performs all the steps internally:

.. testcode::

   from mowl.datasets.builtin import FamilyDataset
   from mowl.models import RandomWalkPlusW2VModel
   from mowl.projection import DL2VecProjector
   from mowl.walking import DeepWalk

   # Setup and train
   model = RandomWalkPlusW2VModel(FamilyDataset())
   model.set_projector(DL2VecProjector())
   model.set_walker(DeepWalk(1, 1))
   model.set_w2v_model(min_count=1)
   model.train()

   # Get embeddings
   class_embs = model.class_embeddings
   role_embs = model.object_property_embeddings
   ind_embs = model.individual_embeddings
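
The embeddings retrieved above can be compared with a similarity measure. The following is a minimal sketch, assuming each of these attributes behaves like a dictionary that maps entity names to numeric vectors. The names and vector values in ``toy_embs`` are made up for illustration, and ``cosine_similarity`` and ``most_similar`` are hypothetical helpers written here, not part of mOWL.

.. code:: python

   import math


   def cosine_similarity(u, v):
       """Return the cosine similarity of two equal-length numeric vectors."""
       dot = sum(a * b for a, b in zip(u, v))
       norm_u = math.sqrt(sum(a * a for a in u))
       norm_v = math.sqrt(sum(b * b for b in v))
       if norm_u == 0 or norm_v == 0:
           return 0.0  # convention chosen in this sketch for zero vectors
       return dot / (norm_u * norm_v)


   def most_similar(name, embeddings, top_k=3):
       """Rank the other entities by cosine similarity to ``name``."""
       query = embeddings[name]
       scores = [
           (other, cosine_similarity(query, vector))
           for other, vector in embeddings.items()
           if other != name
       ]
       scores.sort(key=lambda pair: pair[1], reverse=True)
       return scores[:top_k]


   # Made-up stand-in for a dictionary such as ``model.class_embeddings``.
   toy_embs = {
       "Parent": [1.0, 0.0],
       "Mother": [0.9, 0.1],
       "Male": [0.0, 1.0],
   }

   ranking = most_similar("Parent", toy_embs)
   print([other for other, _ in ranking])

With a trained model, ``toy_embs`` would be replaced by one of the embedding dictionaries. Plain Python is used here to keep the sketch dependency-free; for large vocabularies a vectorized implementation would be preferable.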