pykeen

🤖 A Python library for learning and evaluating knowledge graph embeddings

License: MIT

PyKEEN

PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).

Installation

The latest stable version of PyKEEN can be downloaded and installed from PyPI with:

$ pip install pykeen

The latest version of PyKEEN can be installed directly from the source on GitHub with:

$ pip install git+https://github.com/pykeen/pykeen.git

More information about installation (e.g., development mode, Windows installation, Colab, Kaggle, extras) can be found in the installation documentation.

Quickstart

This example shows how to train a model on a dataset and evaluate it on that dataset's held-out test set.

The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.

from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)

The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on using your own dataset, understanding the evaluation, and making novel link predictions.
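
As a minimal sketch of working with the returned PipelineResult, the function below trains for a few epochs, reads one metric, and saves the artifacts. It assumes a recent PyKEEN 1.x release; the function name, epoch count, random seed, and output directory are illustrative choices that do not come from the README. The import sits inside the function so the snippet can be loaded without PyKEEN installed.

```python
def run_quickstart(output_directory="nations_transe", num_epochs=5):
    """Train TransE on Nations and return the trained model and Hits@10.

    Hypothetical helper for illustration; the argument defaults are assumptions.
    """
    # Imported lazily so that defining this function does not require PyKEEN.
    from pykeen.pipeline import pipeline

    result = pipeline(
        model="TransE",
        dataset="nations",
        training_kwargs=dict(num_epochs=num_epochs),
        random_seed=42,  # fixed seed so repeated runs are comparable
    )

    # The evaluation results expose individual rank-based metrics by name.
    hits_at_10 = result.metric_results.get_metric("hits@10")

    # Persist the trained model and the result files to disk.
    result.save_to_directory(output_directory)

    return result.model, hits_at_10
```

Calling run_quickstart() downloads nothing beyond the bundled Nations dataset and should finish quickly on a CPU, although the metric value depends on the installed version and the seed.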

PyKEEN is extensible such that:

  • Each model has the same API, so anything from pykeen.models can be dropped in
  • Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
  • Triples factories can be generated by the user with pykeen.triples.TriplesFactory
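
The three extension points above can be sketched together as follows. This is an illustration under assumptions, not a recommended configuration: the helper name, the 80/20 split, the epoch count, and the default model are made up for the example, and `triples` is expected to be a list of (head, relation, tail) string tuples large enough to be split.

```python
def run_custom(triples, model="DistMult", training_loop="lcwa", num_epochs=5):
    """Train any PyKEEN model on user-supplied labeled triples.

    Hypothetical helper for illustration; defaults are assumptions.
    """
    # Imported lazily so that defining this function does not require PyKEEN.
    import numpy as np
    from pykeen.pipeline import pipeline
    from pykeen.triples import TriplesFactory

    # Build a triples factory from an (n, 3) array of string labels.
    factory = TriplesFactory.from_labeled_triples(np.asarray(triples, dtype=str))

    # Hold out 20% of the triples for evaluation.
    training, testing = factory.split([0.8, 0.2], random_state=0)

    # Any model name and either training loop can be swapped in here.
    return pipeline(
        training=training,
        testing=testing,
        model=model,
        training_loop=training_loop,
        training_kwargs=dict(num_epochs=num_epochs),
    )
```

Because every model and training loop shares the same API, switching from DistMult to another model, or from LCWA to sLCWA, only changes the string arguments.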

The full documentation can be found at https://pykeen.readthedocs.io.

Implementation

Below are the models, datasets, training modes, evaluators, and metrics implemented in PyKEEN.

Datasets (29)

The following datasets are built into PyKEEN. The citation for each dataset corresponds to either the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two is available. If you want to use a custom dataset, see the Bring Your Own Dataset tutorial. If you have a suggestion for another dataset to include in PyKEEN, please let us know.

| Name | Documentation | Citation | Entities | Relations | Triples |
| --- | --- | --- | --- | --- | --- |
| BioKG | pykeen.datasets.BioKG | Walsh et al., 2019 | 105524 | 17 | 2067997 |
| Clinical Knowledge Graph | pykeen.datasets.CKG | Santos et al., 2020 | 7617419 | 11 | 26691525 |
| CN3l Family | pykeen.datasets.CN3l | Chen et al., 2017 | 3206 | 42 | 21777 |
| CoDEx (large) | pykeen.datasets.CoDExLarge | Safavi et al., 2020 | 77951 | 69 | 612437 |
| CoDEx (medium) | pykeen.datasets.CoDExMedium | Safavi et al., 2020 | 17050 | 51 | 206205 |
| CoDEx (small) | pykeen.datasets.CoDExSmall | Safavi et al., 2020 | 2034 | 42 | 36543 |
| ConceptNet | pykeen.datasets.ConceptNet | Speer et al., 2017 | 28370083 | 50 | 34074917 |
| Countries | pykeen.datasets.Countries | Bouchard et al., 2015 | 271 | 2 | 1158 |
| Commonsense Knowledge Graph | pykeen.datasets.CSKG | Ilievski et al., 2020 | 2087833 | 58 | 4598728 |
| DB100K | pykeen.datasets.DB100K | Ding et al., 2018 | 99604 | 470 | 697479 |
| DBpedia50 | pykeen.datasets.DBpedia50 | Shi et al., 2017 | 24624 | 351 | 34421 |
| Drug Repositioning Knowledge Graph | pykeen.datasets.DRKG | gnn4dr/DRKG | 97238 | 107 | 5874257 |
| FB15k | pykeen.datasets.FB15k | Bordes et al., 2013 | 14951 | 1345 | 592213 |
| FB15k-237 | pykeen.datasets.FB15k237 | Toutanova et al., 2015 | 14505 | 237 | 310079 |
| Hetionet | pykeen.datasets.Hetionet | Himmelstein et al., 2017 | 45158 | 24 | 2250197 |
| Kinships | pykeen.datasets.Kinships | Kemp et al., 2006 | 104 | 25 | 10686 |
| Nations | pykeen.datasets.Nations | ZhenfengLei/KGDatasets | 14 | 55 | 1992 |
| OGB BioKG | pykeen.datasets.OGBBioKG | Hu et al., 2020 | 45085 | 51 | 5088433 |
| OGB WikiKG | pykeen.datasets.OGBWikiKG | Hu et al., 2020 | 2500604 | 535 | 17137181 |
| OpenBioLink | pykeen.datasets.OpenBioLink | Breit et al., 2020 | 180992 | 28 | 4563407 |
| OpenBioLink | pykeen.datasets.OpenBioLinkLQ | Breit et al., 2020 | 480876 | 32 | 27320889 |
| Unified Medical Language System | pykeen.datasets.UMLS | ZhenfengLei/KGDatasets | 135 | 46 | 6529 |
| WD50K (triples) | pykeen.datasets.WD50KT | Galkin et al., 2020 | 40107 | 473 | 232344 |
| Wikidata5M | pykeen.datasets.Wikidata5M | Wang et al., 2019 | 4594149 | 822 | 20624239 |
| WK3l-120k Family | pykeen.datasets.WK3l120k | Chen et al., 2017 | 119748 | 3109 | 1375406 |
| WK3l-15k Family | pykeen.datasets.WK3l15k | Chen et al., 2017 | 15126 | 1841 | 209041 |
| WordNet-18 | pykeen.datasets.WN18 | Bordes et al., 2014 | 40943 | 18 | 151442 |
| WordNet-18 (RR) | pykeen.datasets.WN18RR | Toutanova et al., 2015 | 40559 | 11 | 92583 |
| YAGO3-10 | pykeen.datasets.YAGO310 | Mahdisoltani et al., 2015 | 123143 | 37 | 1089000 |

Models (31)

| Name | Reference | Citation |
| --- | --- | --- |
| CompGCN | pykeen.models.CompGCN | Vashishth et al., 2020 |
| ComplEx | pykeen.models.ComplEx | Trouillon et al., 2016 |
| ComplEx Literal | pykeen.models.ComplExLiteral | Kristiadi et al., 2018 |
| ConvE | pykeen.models.ConvE | Dettmers et al., 2018 |
| ConvKB | pykeen.models.ConvKB | Nguyen et al., 2018 |
| CrossE | pykeen.models.CrossE | Zhang et al., 2019 |
| DistMA | pykeen.models.DistMA | Shi et al., 2019 |
| DistMult | pykeen.models.DistMult | Yang et al., 2014 |
| DistMult Literal | pykeen.models.DistMultLiteral | Kristiadi et al., 2018 |
| ER-MLP | pykeen.models.ERMLP | Dong et al., 2014 |
| ER-MLP (E) | pykeen.models.ERMLPE | Sharifzadeh et al., 2019 |
| HolE | pykeen.models.HolE | Nickel et al., 2016 |
| KG2E | pykeen.models.KG2E | He et al., 2015 |
| MuRE | pykeen.models.MuRE | Balažević et al., 2019 |
| NTN | pykeen.models.NTN | Socher et al., 2013 |
| PairRE | pykeen.models.PairRE | Chao et al., 2020 |
| ProjE | pykeen.models.ProjE | Shi et al., 2017 |
| QuatE | pykeen.models.QuatE | Zhang et al., 2019 |
| RESCAL | pykeen.models.RESCAL | Nickel et al., 2011 |
| R-GCN | pykeen.models.RGCN | Schlichtkrull et al., 2018 |
| RotatE | pykeen.models.RotatE | Sun et al., 2019 |
| SimplE | pykeen.models.SimplE | Kazemi et al., 2018 |
| Structured Embedding | pykeen.models.StructuredEmbedding | Bordes et al., 2011 |
| TorusE | pykeen.models.TorusE | Ebisu et al., 2018 |
| TransD | pykeen.models.TransD | Ji et al., 2015 |
| TransE | pykeen.models.TransE | Bordes et al., 2013 |
| TransF | pykeen.models.TransF | Feng et al., 2016 |
| TransH | pykeen.models.TransH | Wang et al., 2014 |
| TransR | pykeen.models.TransR | Lin et al., 2015 |
| TuckER | pykeen.models.TuckER | Balažević et al., 2019 |
| Unstructured Model | pykeen.models.UnstructuredModel | Bordes et al., 2014 |
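
Each model in the table assigns a plausibility score to a (head, relation, tail) triple. As a rough illustration, the TransE score from Bordes et al., 2013 is the negative distance between the translated head and the tail. The sketch below is plain Python rather than PyKEEN's implementation; the function name and the example vectors are made up.

```python
def transe_score(head, relation, tail, p=2):
    """Return -||head + relation - tail||_p for equally sized embedding vectors.

    Scores closer to zero indicate a more plausible triple.
    """
    # Element-wise residual of the translation head + relation ≈ tail.
    residuals = [abs(h + r - t) for h, r, t in zip(head, relation, tail)]
    # L_p norm of the residual vector.
    norm = sum(d ** p for d in residuals) ** (1.0 / p)
    return -norm


# A perfect translation scores 0; a mismatched tail scores lower.
perfect = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
mismatch = transe_score([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
```

Other models in the table differ mainly in this interaction function, which is why they can share one training and evaluation API.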

Losses (9)

| Name | Reference | Description |
| --- | --- | --- |
| Binary cross entropy (after sigmoid) | pykeen.losses.BCEAfterSigmoidLoss | A module for the numerically unstable version of explicit Sigmoid + BCE loss. |
| Binary cross entropy (with logits) | pykeen.losses.BCEWithLogitsLoss | A module for the binary cross entropy loss. |
| Cross entropy | pykeen.losses.CrossEntropyLoss | A module for the cross entropy loss that evaluates the cross entropy after softmax output. |
| Double Margin | pykeen.losses.DoubleMarginLoss | A limit-based scoring loss, with separate margins for positive and negative elements from [sun2018]. |
| Focal | pykeen.losses.FocalLoss | A module for the focal loss proposed by [lin2018]. |
| Margin ranking | pykeen.losses.MarginRankingLoss | A module for the margin ranking loss. |
| Mean square error | pykeen.losses.MSELoss | A module for the mean square error loss. |
| Self-adversarial negative sampling | pykeen.losses.NSSALoss | An implementation of the self-adversarial negative sampling loss function proposed by [sun2019]. |
| Softplus | pykeen.losses.SoftplusLoss | A module for the softplus loss. |
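
To make one of these losses concrete, here is a small sketch of the margin ranking idea in plain Python. It assumes that higher scores mean more plausible triples, that positive and negative scores are paired one to one, and that the per-pair losses are averaged; it is not PyKEEN's implementation, and the function name is illustrative.

```python
def margin_ranking_loss(positive_scores, negative_scores, margin=1.0):
    """Mean of max(0, margin + negative - positive) over paired scores."""
    pair_losses = [
        # No penalty once the positive beats the negative by at least the margin.
        max(0.0, margin + negative - positive)
        for positive, negative in zip(positive_scores, negative_scores)
    ]
    return sum(pair_losses) / len(pair_losses)


# First pair violates the margin by 0.5, second pair by 3.0.
example_loss = margin_ranking_loss([1.0, 0.0], [0.5, 2.0], margin=1.0)
```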

Regularizers (5)

| Name | Reference | Description |
| --- | --- | --- |
| combined | pykeen.regularizers.CombinedRegularizer | A convex combination of regularizers. |
| lp | pykeen.regularizers.LpRegularizer | A simple L_p norm based regularizer. |
| no | pykeen.regularizers.NoRegularizer | A regularizer which does not perform any regularization. |
| powersum | pykeen.regularizers.PowerSumRegularizer | A simple x^p based regularizer. |
| transh | pykeen.regularizers.TransHRegularizer | A regularizer for the soft constraints in TransH. |

Optimizers (6)

| Name | Reference | Description |
| --- | --- | --- |
| adadelta | torch.optim.Adadelta | Implements Adadelta algorithm. |
| adagrad | torch.optim.Adagrad | Implements Adagrad algorithm. |
| adam | torch.optim.Adam | Implements Adam algorithm. |
| adamax | torch.optim.Adamax | Implements Adamax algorithm (a variant of Adam based on infinity norm). |
| adamw | torch.optim.AdamW | Implements AdamW algorithm. |
| sgd | torch.optim.SGD | Implements stochastic gradient descent (optionally with momentum). |

Training Loops (2)

| Name | Reference | Description |
| --- | --- | --- |
| lcwa | pykeen.training.LCWATrainingLoop | A training loop that uses the local closed world assumption training approach. |
| slcwa | pykeen.training.SLCWATrainingLoop | A training loop that uses the stochastic local closed world assumption training approach. |

Negative Samplers (3)

| Name | Reference | Description |
| --- | --- | --- |
| basic | pykeen.sampling.BasicNegativeSampler | A basic negative sampler. |
| bernoulli | pykeen.sampling.BernoulliNegativeSampler | An implementation of the Bernoulli negative sampling approach proposed by [wang2014]. |
| pseudotyped | pykeen.sampling.PseudoTypedNegativeSampler | A sampler that accounts for which entities co-occur with a relation. |
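
Negative samplers supply the corrupted triples that sLCWA training contrasts against the known positives. The following plain-Python sketch shows the general idea of uniform corruption: for each positive triple, replace either the head or the tail with a different entity chosen at random. It is an illustration under assumptions (integer entity IDs, an even head/tail split, no filtering of accidental true triples) and not the code behind any sampler in the table.

```python
import random


def corrupt_triples(triples, num_entities, num_negs_per_pos=1, seed=0):
    """Return num_negs_per_pos corrupted copies of each (head, relation, tail)."""
    rng = random.Random(seed)
    negatives = []
    for head, relation, tail in triples:
        for _ in range(num_negs_per_pos):
            # Decide which side of the triple to corrupt.
            corrupt_head = rng.random() < 0.5
            original = head if corrupt_head else tail
            # Draw uniformly from all entities except the original one.
            replacement = rng.randrange(num_entities - 1)
            if replacement >= original:
                replacement += 1
            if corrupt_head:
                negatives.append((replacement, relation, tail))
            else:
                negatives.append((head, relation, replacement))
    return negatives


positives = [(0, 0, 1), (1, 1, 2)]
negatives = corrupt_triples(positives, num_entities=3, num_negs_per_pos=4)
```

The Bernoulli and pseudo-typed samplers refine this scheme by changing how the corrupted side and the replacement entity are chosen.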

Stoppers (2)

| Name | Reference | Description |
| --- | --- | --- |
| early | pykeen.stoppers.EarlyStopper | A harness for early stopping. |
| nop | pykeen.stoppers.NopStopper | A stopper that does nothing. |

Evaluators (2)

| Name | Reference | Description |
| --- | --- | --- |
| rankbased | pykeen.evaluation.RankBasedEvaluator | A rank-based evaluator for KGE models. |
| sklearn | pykeen.evaluation.SklearnEvaluator | An evaluator that uses a Scikit-learn metric. |

Metrics (16)

| Name | Description |
| --- | --- |
| AUC-ROC | The area under the ROC curve, on [0, 1]. Higher is better. |
| Adjusted Arithmetic Mean Rank (AAMR) | The mean over all chance-adjusted ranks, on (0, 2). Lower is better. |
| Adjusted Arithmetic Mean Rank Index (AAMRI) | The re-indexed adjusted mean rank (AAMR), on [-1, 1]. Higher is better. |
| Average Precision | The area under the precision-recall curve, on [0, 1]. Higher is better. |
| Geometric Mean Rank (GMR) | The geometric mean over all ranks, on [1, inf). Lower is better. |
| Harmonic Mean Rank (HMR) | The harmonic mean over all ranks, on [1, inf). Lower is better. |
| Hits @ K | The relative frequency of ranks not larger than a given k, on [0, 1]. Higher is better. |
| Inverse Arithmetic Mean Rank (IAMR) | The inverse of the arithmetic mean over all ranks, on (0, 1]. Higher is better. |
| Inverse Geometric Mean Rank (IGMR) | The inverse of the geometric mean over all ranks, on (0, 1]. Higher is better. |
| Inverse Median Rank | The inverse of the median over all ranks, on (0, 1]. Higher is better. |
| Mean Rank (MR) | The arithmetic mean over all ranks, on [1, inf). Lower is better. |
| Mean Reciprocal Rank (MRR) | The inverse of the harmonic mean over all ranks, on (0, 1]. Higher is better. |
| Median Rank | The median over all ranks, on [1, inf). Lower is better. |
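
Several of the rank-based metrics above follow directly from a list of ranks. The sketch below computes them with the Python standard library so the definitions can be checked by hand; the function name and dictionary keys are illustrative and do not correspond to PyKEEN's metric identifiers.

```python
import math
from statistics import mean, median


def rank_metrics(ranks, k=10):
    """Compute a few rank-based metrics from a list of ranks (each >= 1)."""
    ranks = [float(r) for r in ranks]
    # MRR is the mean of the reciprocal ranks.
    mrr = mean(1.0 / r for r in ranks)
    return {
        "mean_rank": mean(ranks),
        "median_rank": median(ranks),
        # Geometric mean, computed in log space for numerical stability.
        "geometric_mean_rank": math.exp(mean(math.log(r) for r in ranks)),
        # The harmonic mean rank is the inverse of the MRR.
        "harmonic_mean_rank": 1.0 / mrr,
        "mean_reciprocal_rank": mrr,
        # Fraction of ranks that are at most k.
        "hits_at_k": mean(1.0 if r <= k else 0.0 for r in ranks),
    }


metrics = rank_metrics([1, 2, 4], k=2)
```

For the ranks 1, 2, and 4, the mean rank is 7/3, the geometric mean rank is 2, the MRR is 7/12, and Hits@2 is 2/3.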

Trackers (7)

| Name | Reference | Description |
| --- | --- | --- |
| console | pykeen.trackers.ConsoleResultTracker | A class that directly prints to console. |
| csv | pykeen.trackers.CSVResultTracker | Tracking results to a CSV file. |
| json | pykeen.trackers.JSONResultTracker | Tracking results to a JSON lines file. |
| mlflow | pykeen.trackers.MLFlowResultTracker | A tracker for MLflow. |
| neptune | pykeen.trackers.NeptuneResultTracker | A tracker for Neptune.ai. |
| tensorboard | pykeen.trackers.TensorBoardResultTracker | A tracker for TensorBoard. |
| wandb | pykeen.trackers.WANDBResultTracker | A tracker for Weights and Biases. |

Hyper-parameter Optimization

Samplers (3)

| Name | Reference | Description |
| --- | --- | --- |
| grid | optuna.samplers.GridSampler | Sampler using grid search. |
| random | optuna.samplers.RandomSampler | Sampler using random sampling. |
| tpe | optuna.samplers.TPESampler | Sampler using TPE (Tree-structured Parzen Estimator) algorithm. |

Any sampler class extending optuna.samplers.BaseSampler, such as Optuna's sampler implementing the CMA-ES algorithm, can also be used.
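
A minimal sketch of selecting a sampler for hyper-parameter optimization is shown below. It assumes a recent PyKEEN 1.x release in which pykeen.hpo.hpo_pipeline accepts a sampler argument; the helper name, trial count, dataset, and model are illustrative choices. The import sits inside the function so the snippet can be loaded without PyKEEN installed.

```python
def run_hpo(n_trials=10, sampler="random"):
    """Run a short hyper-parameter search for TransE on Nations.

    Hypothetical helper for illustration; `sampler` is one of the names in
    the table above or a class extending optuna.samplers.BaseSampler.
    """
    # Imported lazily so that defining this function does not require PyKEEN.
    from pykeen.hpo import hpo_pipeline

    return hpo_pipeline(
        n_trials=n_trials,
        dataset="Nations",
        model="TransE",
        sampler=sampler,
    )
```

To try CMA-ES, one could pass optuna.samplers.CmaEsSampler as the sampler argument instead of a string, assuming the installed Optuna version provides it.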

Experimentation

Reproduction

PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:

$ pykeen experiments reproduce tucker balazevic2019 fb15k

The three arguments are the model name, the reference, and the dataset. The output directory can optionally be set with -d.

Ablation

PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:

$ pykeen experiments ablation ~/path/to/config.json

Large-scale Reproducibility and Benchmarking Study

We used PyKEEN to perform a large-scale reproducibility and benchmarking study, which is described in our article:

@article{ali2020benchmarking,
  title={Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework},
  author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
  journal={arXiv preprint arXiv:2006.13365},
  year={2020}
}

We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/benchmarking.

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

Acknowledgements

Supporters

This project has been supported by several organizations (in alphabetical order):

Funding

The development of PyKEEN has been funded by the following grants:

| Funding Body | Program | Grant |
| --- | --- | --- |
| DARPA | Automating Scientific Knowledge Extraction (ASKE) | HR00111990009 |
| German Federal Ministry of Education and Research (BMBF) | Maschinelles Lernen mit Wissensgraphen (MLWin) | 01IS18050D |
| German Federal Ministry of Education and Research (BMBF) | Munich Center for Machine Learning (MCML) | 01IS18036A |
| Innovation Fund Denmark (Innovationsfonden) | Danish Center for Big Data Analytics driven Innovation (DABAI) | Grand Solutions |

The PyKEEN logo was designed by Carina Steinborn.

Citation

If you have found PyKEEN useful in your work, please consider citing our article:

@article{ali2021pykeen,
    author = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
    journal = {Journal of Machine Learning Research},
    number = {82},
    pages = {1--6},
    title = {{PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings}},
    url = {http://jmlr.org/papers/v22/20-825.html},
    volume = {22},
    year = {2021}
}
