chakin

Simple downloader for pre-trained word vectors

License: MIT

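chakin is a downloader for pre-trained word vectors. It supports many vector sets, including fastText, GloVe, and word2vec, across several languages.

This library lets you search for and download pre-trained word vectors without tracking down and fetching the files by hand.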

Installation

To install chakin, simply:

$ pip install chakin

Usage

You can download pre-trained word vectors as follows:

$ python
>>> import chakin
>>> chakin.search(lang='English')
                   Name  Dimension                     Corpus VocabularySize  
2          fastText(en)        300                  Wikipedia           2.5M   
11         GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)           400K   
12        GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)           400K   
13        GloVe.6B.200d        200  Wikipedia+Gigaword 5 (6B)           400K   
14        GloVe.6B.300d        300  Wikipedia+Gigaword 5 (6B)           400K   
15       GloVe.42B.300d        300          Common Crawl(42B)           1.9M   
16      GloVe.840B.300d        300         Common Crawl(840B)           2.2M   
17    GloVe.Twitter.25d         25               Twitter(27B)           1.2M   
18    GloVe.Twitter.50d         50               Twitter(27B)           1.2M   
19   GloVe.Twitter.100d        100               Twitter(27B)           1.2M   
20   GloVe.Twitter.200d        200               Twitter(27B)           1.2M   
21  word2vec.GoogleNews        300          Google News(100B)           3.0M 

>>> chakin.download(number=2, save_dir='./') # select fastText(en)
Test: 100% ||               | Time: 0:00:02  60.7 MiB/s
'./wiki.en.vec'
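
Once the file is downloaded, it still has to be read into memory. chakin itself only handles the download, so the following is a minimal sketch, not part of chakin. It assumes the downloaded `.vec` file uses the common word2vec-style text layout (a header line with the vocabulary size and dimension, then one word and its values per line). The helper name `load_vec` and its `limit` parameter are hypothetical, and the sample data is made up for illustration.

```python
import io


def load_vec(lines, limit=None):
    """Parse word2vec-style text vectors into (dimension, {word: [floats]}).

    `lines` is any iterable of text lines, e.g. an open file.
    `limit` optionally caps how many words are loaded (hypothetical helper).
    """
    it = iter(lines)
    # Header line: "<vocabulary size> <dimension>"
    header = next(it).split()
    dim = int(header[1])
    vectors = {}
    for line in it:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Tiny in-memory sample in the assumed layout (made-up values).
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
dim, vectors = load_vec(sample)
first_only = load_vec(io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n"), limit=1)[1]
```

For the real file, the same function could be called as `load_vec(open('./wiki.en.vec', encoding='utf-8'), limit=10000)`. A cap is worth considering because the full fastText(en) vocabulary is listed at 2.5M words.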

Supported vectors

So far, chakin supports the following word vectors:

Name                       Dimension  Corpus                     VocabularySize  Method              Language
fastText(ar)               300        Wikipedia                  610K            fastText            Arabic
fastText(de)               300        Wikipedia                  2.3M            fastText            German
fastText(en)               300        Wikipedia                  2.5M            fastText            English
fastText(es)               300        Wikipedia                  985K            fastText            Spanish
fastText(fr)               300        Wikipedia                  1.2M            fastText            French
fastText(it)               300        Wikipedia                  871K            fastText            Italian
fastText(ja)               300        Wikipedia                  580K            fastText            Japanese
fastText(ko)               300        Wikipedia                  880K            fastText            Korean
fastText(pt)               300        Wikipedia                  592K            fastText            Portuguese
fastText(ru)               300        Wikipedia                  1.9M            fastText            Russian
fastText(zh)               300        Wikipedia                  330K            fastText            Chinese
GloVe.6B.50d               50         Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.100d              100        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.200d              200        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.6B.300d              300        Wikipedia+Gigaword 5 (6B)  400K            GloVe               English
GloVe.42B.300d             300        Common Crawl(42B)          1.9M            GloVe               English
GloVe.840B.300d            300        Common Crawl(840B)         2.2M            GloVe               English
GloVe.Twitter.25d          25         Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.50d          50         Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.100d         100        Twitter(27B)               1.2M            GloVe               English
GloVe.Twitter.200d         200        Twitter(27B)               1.2M            GloVe               English
word2vec.GoogleNews        300        Google News(100B)          3.0M            word2vec            English
word2vec.Wiki-NEologd.50d  50         Wikipedia                  335K            word2vec + NEologd  Japanese
