Stopwords for various languages in JSON format. Per Wikipedia:

> Stop words are words which are filtered out prior to, or after, processing of natural language data [...] these are some of the most common, short function words, such as the, is, at, which, and on.

To load every language at once, use stopwords-all.json, which is keyed by ISO 639-1 language code. For a single language, use the matching file listed in the table below.
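
As a rough illustration of how the combined file might be used, here is a minimal Python sketch. It assumes stopwords-all.json is a JSON object mapping each language code to an array of words, and that the file sits in the working directory. The function names `load_all_stopwords` and `remove_stopwords` are hypothetical helpers, not part of this project.

```python
import json
from pathlib import Path


def load_all_stopwords(path):
    """Load the combined file into a dict of language code -> set of words.

    Assumption: the JSON is an object such as {"en": ["a", "about", ...], ...}.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Sets give constant-time membership checks when filtering tokens.
    return {lang: set(words) for lang, words in data.items()}


def remove_stopwords(tokens, stopwords):
    """Return the tokens whose lowercase form is not in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Example usage (the file location is an assumption):
# stopwords = load_all_stopwords("stopwords-all.json")
# remove_stopwords("The cat is on the mat".split(), stopwords["en"])
```

Lowercasing before the lookup is a design choice of this sketch; whether it suits a given language or tokenizer is up to the caller.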
There are a total of 50 supported languages:

| Language | Stopword count | Filename |
|---|---|---|
| Afrikaans | 51 | af.json |
| Arabic | 162 | ar.json |
| Armenian | 45 | hy.json |
| Basque | 98 | eu.json |
| Bengali | 116 | bn.json |
| Breton | 126 | br.json |
| Bulgarian | 259 | bg.json |
| Catalan | 218 | ca.json |
| Chinese | 542 | zh.json |
| Croatian | 179 | hr.json |
| Czech | 346 | cs.json |
| Danish | 101 | da.json |
| Dutch | 275 | nl.json |
| English | 570 | en.json |
| Esperanto | 173 | eo.json |
| Estonian | 35 | et.json |
| Finnish | 772 | fi.json |
| French | 606 | fr.json |
| Galician | 160 | gl.json |
| German | 596 | de.json |
| Greek | 75 | el.json |
| Hausa | 39 | ha.json |
| Hebrew | 194 | he.json |
| Hindi | 225 | hi.json |
| Hungarian | 781 | hu.json |
| Indonesian | 355 | id.json |
| Irish | 109 | ga.json |
| Italian | 619 | it.json |
| Japanese | 109 | ja.json |
| Korean | 679 | ko.json |
| Latin | 49 | la.json |
| Latvian | 161 | lv.json |
| Marathi | 99 | mr.json |
| Norwegian | 172 | no.json |
| Persian | 332 | fa.json |
| Polish | 260 | pl.json |
| Portuguese | 408 | pt.json |
| Romanian | 282 | ro.json |
| Russian | 539 | ru.json |
| Slovak | 110 | sk.json |
| Slovenian | 446 | sl.json |
| Somali | 30 | so.json |
| Southern Sotho | 31 | st.json |
| Spanish | 577 | es.json |
| Swahili | 74 | sw.json |
| Swedish | 401 | sv.json |
| Thai | 115 | th.json |
| Turkish | 279 | tr.json |
| Yoruba | 60 | yo.json |
| Zulu | 29 | zu.json |

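
When only one language is needed, its individual file can be read directly. The sketch below assumes each per-language file is a flat JSON array of strings and that the files live in a local directory. `stopword_file` and `load_language` are hypothetical names used only for illustration.

```python
import json
from pathlib import Path


def stopword_file(lang_code, directory="."):
    """Build the path to a per-language file, e.g. 'en' -> <directory>/en.json."""
    return Path(directory) / f"{lang_code}.json"


def load_language(lang_code, directory="."):
    """Load one language's stopwords as a set.

    Assumption: the file holds a flat JSON array of strings.
    """
    with stopword_file(lang_code, directory).open(encoding="utf-8") as f:
        return set(json.load(f))


# Example usage (the directory is an assumption):
# danish = load_language("da", directory=".")
```

Passing a code that has no file in the table raises `FileNotFoundError`, which is usually the clearest signal that the language is unsupported.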
Copyright (c) 2017 Peter Graham, contributors. Released under the Apache-2.0 license.