subtlex-word-frequencies

A list of words from the SUBTLEX movie subtitles corpus, sorted by frequency.

by words

Version: 2.0.0. License: ISC.

List of 74,286 words sorted by frequency of use in spoken English.

The word counts are derived from SUBTLEXus, a corpus of American English subtitles of movies.

Install

npm:

npm install subtlex-word-frequencies

Use

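var words = require('subtlex-word-frequencies')

// Total number of entries
console.log(words.length)

// The three most frequent words
console.log(words.slice(0, 3))

// The first five entries whose word contains "chick"
console.log(words.filter(d => d.word.match(/chick/)).slice(0, 5))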

Yields:

74286
[
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908}
]
[
  {word: 'chicken', count: 3148},
  {word: 'chick', count: 1334},
  {word: 'chicks', count: 742},
  {word: 'chickens', count: 520},
  {word: 'chickenshit', count: 85}
]
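Because the list is a plain array sorted by frequency, looking up a single word means scanning it. For repeated lookups, one option is to index the entries once. The sketch below assumes the entry shape shown in the output above (`word` and `count`); `buildIndex` is a hypothetical helper, not part of the package, and the inline `sample` array stands in for the real export so the snippet runs on its own.

```javascript
// Sample entries in the same shape as the package's output; in real use,
// replace this with: var words = require('subtlex-word-frequencies')
var sample = [
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908}
]

// Hypothetical helper (not part of the package): index entries by word,
// storing each word's count and its 1-based rank in the sorted list.
function buildIndex(entries) {
  var index = new Map()
  entries.forEach(function (entry, i) {
    index.set(entry.word, {count: entry.count, rank: i + 1})
  })
  return index
}

var index = buildIndex(sample)
console.log(index.get('the')) // { count: 1501908, rank: 3 }
console.log(index.has('chick')) // false (not in the three-entry sample)
```

Building the index is a single pass over the array; after that, each lookup is a constant-time `Map` access.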

API

subtlexWordFrequencies

Array.<Entry>: the module's export, a list of all entries in SUBTLEXus. Each entry has the following properties:

  • word (string): Unique word (example: git)
  • count (number): Number of times the word appears in the corpus (example: 101)

A word is capitalized when it appears in the corpus more often with an initial uppercase letter than with a lowercase one (example: I).
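One consequence of the capitalization rule is that an exact, case-sensitive match can miss an entry: searching for `i` will not find `I`. A minimal sketch of a case-insensitive lookup follows. `findEntry` is a hypothetical helper, not part of the package; the entries use the `count` property shown in the example output, and the inline `sample` array stands in for the real export.

```javascript
// Sample entries in the same shape as the package's output; in real use,
// replace this with: var words = require('subtlex-word-frequencies')
var sample = [
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908}
]

// Hypothetical helper (not part of the package): return the first entry
// whose word matches the query when both are lowercased, or undefined.
function findEntry(entries, query) {
  var needle = query.toLowerCase()
  return entries.find(function (entry) {
    return entry.word.toLowerCase() === needle
  })
}

console.log(findEntry(sample, 'i')) // { word: 'I', count: 2038529 }
console.log(findEntry(sample, 'THE')) // { word: 'the', count: 1501908 }
console.log(findEntry(sample, 'chick')) // undefined
```

Since the list is sorted by frequency, `find` returns the most frequent match if several entries differ only by case.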

The entire original corpus consists of 51 million words.
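The corpus size makes it possible to turn a raw count into a relative frequency, such as occurrences per million words. The sketch below treats the 51 million figure as exact, which is an assumption, since the README gives only the rounded number; the results are therefore approximate. `CORPUS_SIZE` and `perMillion` are hypothetical names, not part of the package.

```javascript
// Assumption: the corpus holds exactly 51 million words. The README only
// states the rounded figure, so results are approximate.
var CORPUS_SIZE = 51e6

// Hypothetical helper (not part of the package): convert a raw count
// into occurrences per million words of the corpus.
function perMillion(count) {
  return (count / CORPUS_SIZE) * 1e6
}

// Counts taken from the example output in the Use section.
var youPerMillion = perMillion(2134713)
var chickenPerMillion = perMillion(3148)

console.log(youPerMillion.toFixed(1))
console.log(chickenPerMillion.toFixed(1))
```

Per-million values are easier to compare across corpora of different sizes than raw counts are.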

License

ISC © Zeke Sikelianos
