rnnmorph

Morphological analyzer (POS tagger) for Russian and English, based on neural networks and dictionary-lookup systems (pymorphy2, nltk).

Russian language, MorphoRuEval-2017 test dataset, accuracy

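| Domain | Full tag | PoS tag | F.t. + lemma | Sentence f.t. | Sentence f.t.l. |
|---|---|---|---|---|---|
| Lenta (news) | 96.31% | 98.01% | 92.96% | 77.93% | 52.79% |
| VK (social) | 95.20% | 98.04% | 92.06% | 74.30% | 60.56% |
| JZ (lit.) | 95.87% | 98.71% | 90.45% | 73.10% | 43.15% |
| All | 95.81% | 98.26% | N/A | 74.92% | N/A |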

English language, UD EWT test, accuracy

| Dataset | Full tag | PoS tag | F.t. + lemma | Sentence f.t. | Sentence f.t.l. |
|---|---|---|---|---|---|
| UD EWT test | 91.57% | 94.10% | 87.02% | 63.17% | 50.99% |
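
The tables report both per-token and per-sentence accuracy. The sketch below shows one common way to compute the two, assuming a sentence counts as correct only when every token in it is tagged correctly. That definition and the function name `token_and_sentence_accuracy` are assumptions made for illustration; they are not taken from the rnnmorph code base or its evaluation scripts.

```python
from typing import Sequence, Tuple


def token_and_sentence_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> Tuple[float, float]:
    """Return (token accuracy, sentence accuracy) for parallel tag sequences.

    Assumption: a sentence is correct only if all of its tags match.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    if not gold:
        raise ValueError("at least one sentence is required")

    total_tokens = 0
    correct_tokens = 0
    correct_sentences = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ")
        matches = sum(g == p for g, p in zip(gold_sent, pred_sent))
        total_tokens += len(gold_sent)
        correct_tokens += matches
        if matches == len(gold_sent):
            correct_sentences += 1

    if total_tokens == 0:
        raise ValueError("at least one token is required")
    return correct_tokens / total_tokens, correct_sentences / len(gold)


# Toy data: one wrong tag in the second sentence.
gold_tags = [["NOUN", "VERB"], ["ADJ", "NOUN"]]
pred_tags = [["NOUN", "VERB"], ["ADJ", "VERB"]]
token_acc, sentence_acc = token_and_sentence_accuracy(gold_tags, pred_tags)
print(f"token accuracy: {token_acc:.2%}, sentence accuracy: {sentence_acc:.2%}")
```

Under this definition a single wrong token fails the whole sentence, which would be consistent with the sentence-level columns being much lower than the token-level ones.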

Speed and memory consumption

Speed: 200 to 600 words per second on CPU.

Memory consumption: about 500-600 MB for single-sentence predictions.
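
To check throughput on your own hardware, a small timing harness along these lines may help. It is only a sketch: `words_per_second` is a hypothetical helper, not part of rnnmorph, and it accepts any callable that takes one tokenized sentence, so it can be tried without the package installed.

```python
import time
from typing import Callable, Sequence


def words_per_second(
    predict: Callable[[Sequence[str]], object],
    sentences: Sequence[Sequence[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run `predict` on each sentence and return processed words per second."""
    total_words = sum(len(sentence) for sentence in sentences)
    start = clock()
    for sentence in sentences:
        predict(sentence)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive; use more sentences")
    return total_words / elapsed


# Stand-in workload so the example runs without a model.
sample = [["мама", "мыла", "раму"]] * 1000
rate = words_per_second(lambda words: [w.upper() for w in words], sample)
print(f"{rate:.0f} words per second")
```

With the package installed, passing `predictor.predict` as the `predict` argument would time the tagger itself. Model loading happens when the predictor is constructed, so it is not included in the measurement.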

Install

pip install rnnmorph

Usage

Example (also available as a Colab notebook):

from rnnmorph.predictor import RNNMorphPredictor

# Load the Russian model
predictor = RNNMorphPredictor(language="ru")
# predict() takes one tokenized sentence and returns one form per word
forms = predictor.predict(["мама", "мыла", "раму"])  # "Mom washed the window frame"
print(forms[0].pos)          # NOUN
print(forms[0].tag)          # Case=Nom|Gender=Fem|Number=Sing
print(forms[0].normal_form)  # мама
print(forms[0].vector)
# [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
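
The `tag` attribute shown above is a string of `Feature=Value` pairs separated by `|`. If you need the features individually, a small parser such as the following can split it into a dictionary. `parse_tag` is a hypothetical helper written for this example, not an rnnmorph function, and treating `_` as "no features" is an assumption borrowed from the Universal Dependencies convention.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a 'Case=Nom|Gender=Fem' style string into a feature dictionary."""
    features: Dict[str, str] = {}
    # Assumption: an empty string or "_" means the word has no features.
    if not tag or tag == "_":
        return features
    for pair in tag.split("|"):
        name, separator, value = pair.partition("=")
        if separator:  # skip malformed pairs that lack "="
            features[name] = value
    return features


features = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])    # Nom
print(features["Gender"])  # Fem
```

With a real predictor, the same call would be `parse_tag(forms[0].tag)`.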

Training

A simple model training example is available as a Colab notebook.
