Generates a set of property-specific entity embeddings from knowledge graphs using node2vec

D2KLab/entity2vec

entity2vec

entity2vec computes vector representations of Knowledge Graph entities that preserve semantic similarities and are suitable for classification tasks. It generates a set of property-specific entity embeddings by running node2vec on property-specific subgraphs, i.e. K(p) = {(s, p, o)}, the set of triples whose predicate is a given property p. The repository includes:

  • A reimplementation of node2vec that can optionally skip the preprocessing of the transition probabilities, which reduces memory usage at the cost of slower computation.

  • entity2vec, which generates a set of entity embeddings from Knowledge Graphs, one for each property. entity2vec can work with a set of pre-downloaded dumps or download them from a SPARQL endpoint.
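
As an illustration of what a property-specific subgraph K(p) is, the following minimal sketch groups a list of (s, p, o) triples into one edge list per property. It is not the repository's code: the function name `split_by_property` and the toy triples are assumptions made for this example.

```python
from collections import defaultdict


def split_by_property(triples):
    """Group (s, p, o) triples into one (s, o) edge list per property p."""
    subgraphs = defaultdict(list)
    for s, p, o in triples:
        subgraphs[p].append((s, o))
    return dict(subgraphs)


# Toy triples; entity and property names are made up for illustration.
triples = [
    ("film:Alien", "director", "person:Ridley_Scott"),
    ("film:Blade_Runner", "director", "person:Ridley_Scott"),
    ("film:Alien", "starring", "person:Sigourney_Weaver"),
]

subgraphs = split_by_property(triples)
for prop, edges in sorted(subgraphs.items()):
    print(prop, edges)
```

Each resulting edge list could then be embedded separately, which is what yields one embedding per property.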

Requirements

  • Python 2.7 or above
  • numpy
  • gensim
  • networkx
  • pandas
  • SPARQLWrapper

If you are using pip:

    pip install -r requirements.txt

Property-specific entity embeddings

The set of properties can be defined in the configuration file config/properties.json. If no properties are defined, the software runs on every file located in datasets/your_dataset/graphs or, if a SPARQL endpoint is provided, downloads the graphs for all properties into datasets/your_dataset/graphs.

    python src/entity2vec.py --dataset dataset --config_file config_file --entities entities --sparql sparql --default_graph default_graph

Alternatively, entity2vec can be loaded as a module and used as follows:

    from entity2vec.entity2vec import Entity2Vec

    e2v = Entity2Vec(False, False, False, 1, 1, 10, 5,
                 128, 10, 8, 5, 'path/to/properties.json', False,
                 'dataset_name', 'all', False, False,
                 'path/to/feedback.edgelist')

| option | default | description |
|---|---|---|
| dataset | null | (Required) Name of the dataset. It is used to create folders and to retrieve properties from the config file. |
| config_file | config/properties.json | Path of the configuration file. |
| entities | all | A list of entities for which the embeddings have to be computed. By default, all entities are used. |
| sparql | null | Endpoint from which property-specific graphs are obtained. If not provided, the graphs are assumed to be already stored in datasets/your_dataset/graphs. |
| default_graph | null | Whether to use a default graph in the SPARQL endpoint. |
| num_walks | 500 | Number of random walks per entity. |
| feedback_file | null | Path to a DAT file that contains all the user-item pairs. If not defined, the file is assumed to be datasets/<my_dataset>/graphs/feedback.edgelist. |
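
To make the sparql and default_graph options more concrete, here is a hedged sketch of how the edges for one property could be fetched with SPARQLWrapper. It is not taken from the repository: the helper names `build_property_query` and `fetch_property_edges`, the query shape, and the example property URI are assumptions for illustration only.

```python
def build_property_query(prop_uri, default_graph=None):
    """Return a SPARQL SELECT for all (s, o) pairs linked by prop_uri."""
    # Optional FROM clause restricts the query to one named graph.
    from_clause = "FROM <%s> " % default_graph if default_graph else ""
    return "SELECT ?s ?o %sWHERE { ?s <%s> ?o }" % (from_clause, prop_uri)


def fetch_property_edges(endpoint, prop_uri, default_graph=None):
    """Query the endpoint and return a list of (subject, object) pairs."""
    # Imported lazily so the query builder works without the dependency.
    from SPARQLWrapper import SPARQLWrapper, JSON

    client = SPARQLWrapper(endpoint)
    client.setQuery(build_property_query(prop_uri, default_graph))
    client.setReturnFormat(JSON)
    bindings = client.query().convert()["results"]["bindings"]
    return [(b["s"]["value"], b["o"]["value"]) for b in bindings]


# Illustrative property URI; no network call is made here.
query = build_property_query("http://dbpedia.org/ontology/director")
print(query)
```

Calling `fetch_property_edges` with a real endpoint URL would return the (s, o) pairs that form the subgraph for that property. Public endpoints often cap the number of rows returned per query, so a complete download may need paging.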

Entity classification

Generate a single vector representation for each entity, without considering the role of semantic properties, for use in classification tasks.

  1. Create an empty directory called emb

  2. Run node2vec on the whole graph to create a single global embedding of each entity

     python src/node2vec.py --input datasets/aifb/aifb.edgelist --output emb/aifb_p1_q4.emd  --p 1 --q 4
    
  3. Obtain scores, e.g.:

     cd ml
    
     python rdf_predict.py --dataset aifb --emb ../emb/aifb_p1_q4.emd --dimension 500
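
The --p and --q flags in step 2 are node2vec's return and in-out parameters. The sketch below shows how they bias a second-order random walk when the transition weights are computed on the fly for each step rather than precomputed for every edge. It is a simplified illustration for an unweighted, undirected graph, not the repository's implementation; the function names and the toy graph are assumptions. It requires Python 3.6 or later for `random.Random.choices`.

```python
import random


def transition_weights(adj, prev, cur, p, q):
    """Unnormalized node2vec weights for leaving cur, having come from prev."""
    neighbors = sorted(adj[cur])
    weights = []
    for nxt in neighbors:
        if nxt == prev:            # step back to the previous node
            weights.append(1.0 / p)
        elif nxt in adj[prev]:     # stay at distance 1 from the previous node
            weights.append(1.0)
        else:                      # move further away from the previous node
            weights.append(1.0 / q)
    return neighbors, weights


def biased_walk(adj, start, length, p, q, rng):
    """Sample one walk, computing weights at each step (no precomputation)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not adj[cur]:
            break
        if len(walk) == 1:
            # First step has no previous node, so it is uniform.
            walk.append(rng.choice(sorted(adj[cur])))
        else:
            neighbors, weights = transition_weights(adj, walk[-2], cur, p, q)
            walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected graph as an adjacency dict of sets.
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}

neighbors, weights = transition_weights(adj, "a", "b", p=1, q=4)
print(list(zip(neighbors, weights)))

walk = biased_walk(adj, "a", 10, p=1, q=4, rng=random.Random(0))
print(walk)
```

With p = 1 and q = 4, a walk that arrived at b from a is four times less likely to move on to d, which is not connected to a, than to return to a or step to the shared neighbor c. Larger values of q therefore keep walks close to their starting neighborhood.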
    
