Stopwords for various languages in JSON format. Per Wikipedia:

Stop words are words which are filtered out prior to, or after, processing of natural language data [...] these are some of the most common, short function words, such as the, is, at, which, and on.

To use every language at once, load stopwords-all.json, a single object keyed by ISO 639-1 language code. To use one language, load its individual stopword file, listed in the table below.
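The following is a minimal Python sketch of loading the combined file and filtering tokens with it. It assumes stopwords-all.json is one JSON object whose keys are ISO 639-1 codes and whose values are arrays of stopword strings. The helper names `load_all_stopwords` and `remove_stopwords` are hypothetical and are not part of this package.

```python
import json
from pathlib import Path


def load_all_stopwords(path):
    """Load stopwords-all.json into a dict mapping language code -> set of words.

    Assumption: the file is a JSON object such as {"en": [...], "de": [...]},
    where each value is an array of stopword strings.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Sets give constant-time membership checks when filtering many tokens.
    return {code: set(words) for code, words in data.items()}


def remove_stopwords(tokens, stopwords):
    """Return the tokens whose lowercased form is not in `stopwords`.

    Assumption: the stopword lists are stored in lowercase.
    """
    return [token for token in tokens if token.lower() not in stopwords]
```

For example, `remove_stopwords("The cat is on the mat".split(), load_all_stopwords("stopwords-all.json")["en"])` keeps only the tokens absent from the English list. The sketch uses `str.lower()` rather than `str.casefold()` because casefolding rewrites characters such as German "ß" to "ss", which would then fail to match list entries spelled with "ß".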

Languages

There are 50 supported languages:

Language        Stopword count  Filename
--------------  --------------  --------
Afrikaans       51              af.json
Arabic          162             ar.json
Armenian        45              hy.json
Basque          98              eu.json
Bengali         116             bn.json
Breton          126             br.json
Bulgarian       259             bg.json
Catalan         218             ca.json
Chinese         542             zh.json
Croatian        179             hr.json
Czech           346             cs.json
Danish          101             da.json
Dutch           275             nl.json
English         570             en.json
Esperanto       173             eo.json
Estonian        35              et.json
Finnish         772             fi.json
French          606             fr.json
Galician        160             gl.json
German          596             de.json
Greek           75              el.json
Hausa           39              ha.json
Hebrew          194             he.json
Hindi           225             hi.json
Hungarian       781             hu.json
Indonesian      355             id.json
Irish           109             ga.json
Italian         619             it.json
Japanese        109             ja.json
Korean          679             ko.json
Latin           49              la.json
Latvian         161             lv.json
Marathi         99              mr.json
Norwegian       172             no.json
Persian         332             fa.json
Polish          260             pl.json
Portuguese      408             pt.json
Romanian        282             ro.json
Russian         539             ru.json
Slovak          110             sk.json
Slovenian       446             sl.json
Somali          30              so.json
Southern Sotho  31              st.json
Spanish         577             es.json
Swahili         74              sw.json
Swedish         401             sv.json
Thai            115             th.json
Turkish         279             tr.json
Yoruba          60              yo.json
Zulu            29              zu.json
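Each filename in the table is the language's ISO 639-1 code plus ".json". A minimal Python sketch for loading a single language follows. It assumes each per-language file is a JSON array of strings stored in one directory; the helper `load_language_stopwords`, the constant `SUPPORTED_CODES`, and the default directory are hypothetical, not part of this package.

```python
import json
from pathlib import Path

# The 50 ISO 639-1 codes listed in the table above, one <code>.json file each.
SUPPORTED_CODES = frozenset(
    "af ar hy eu bn br bg ca zh hr cs da nl en eo et fi fr gl de "
    "el ha he hi hu id ga it ja ko la lv mr no fa pl pt ro ru sk "
    "sl so st es sw sv th tr yo zu".split()
)


def load_language_stopwords(code, directory="."):
    """Load one per-language file (e.g. en.json) as a set of stopwords.

    Assumptions: the file is named <code>.json, lives in `directory`
    (a hypothetical location; point it at your copy of the data), and
    contains a JSON array of strings.
    """
    if code not in SUPPORTED_CODES:
        raise ValueError(f"unsupported language code: {code!r}")
    path = Path(directory) / f"{code}.json"
    return set(json.loads(path.read_text(encoding="utf-8")))
```

Checking the code against the table's list fails fast on a typo such as "eng" instead of surfacing a missing-file error. Note that splitting text on whitespace before filtering does not work for Chinese, Japanese, or Thai, which do not separate words with spaces; those languages need a word segmenter first.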

Sources

License and Copyright

Copyright (c) 2017 Peter Graham, contributors. Released under the Apache-2.0 license.