preprocess-tweets

Cleans tweets and makes them ready for training

License: MIT


Preprocessing of tweets

This module simplifies the preparation of Twitter data for training. It removes:

  • mentions
  • links
  • emojis
  • the keyword RT
  • sentences that contain only a single word
  • some special characters

A filter option controls whether links (URLs), mentions and emojis are removed.

The default filter is:

var filter = {
    "mentions": true,
    "links": true,
    "emojis": true
}

For example:

The tweet:

New @Imaginedragons song 'Whatever It Takes' and a new album 'Evolve'. I'm so #excited this song is incredible ❤️ https://t.co/PS9NM4pTBQ

Will become:

New song 'Whatever It Takes' and a new album 'Evolve'. I'm so excited this song is incredible 
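
The transformation above can be approximated with a few regular expressions. The following is only a minimal sketch, not the package's actual implementation: the function name `cleanTweet`, the `defaultFilter` constant and the exact patterns are assumptions made for illustration. It also collapses repeated whitespace, which the package may not do.

```javascript
// Hypothetical sketch: approximates the cleaning described in this README.
// It is NOT the code used by preprocess-tweets.
const defaultFilter = { mentions: true, links: true, emojis: true };

function cleanTweet(text, filter = defaultFilter) {
  let out = text;
  // Remove http/https links
  if (filter.links) out = out.replace(/https?:\/\/\S+/g, '');
  // Remove @mentions
  if (filter.mentions) out = out.replace(/@\w+/g, '');
  // Remove pictographic emojis plus variation selectors and zero-width joiners
  if (filter.emojis) {
    out = out.replace(/[\p{Extended_Pictographic}\uFE0F\u200D]/gu, '');
  }
  // Remove the standalone keyword RT
  out = out.replace(/\bRT\b/g, '');
  // Drop the '#' sign but keep the hashtag word
  out = out.replace(/#/g, '');
  // Collapse whitespace left behind by the removals (assumption of this sketch)
  return out.replace(/\s+/g, ' ').trim();
}
```

Passing a filter with `links` set to `false` leaves URLs in place while mentions and emojis are still removed. Removal of single-word sentences is not covered by this sketch.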

Install

npm install preprocess-tweets

Prerequisites

The file with the extracted tweets should be a .txt file containing one tweet per line.
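
If the tweets are held in memory rather than in a file, an input file in this format could be produced roughly as follows. This is a sketch under assumptions: the helper names `toLines` and `writeTweetsFile` are hypothetical and not part of the package. Line breaks inside a tweet are replaced with spaces so that each tweet stays on one line.

```javascript
const fs = require('fs');

// Hypothetical helper: turns an array of tweet strings into
// "one tweet per line" text.
function toLines(tweets) {
  return tweets
    // Replace line breaks inside a tweet with a single space
    .map((t) => t.replace(/\s*[\r\n]+\s*/g, ' ').trim())
    // Skip tweets that are empty after trimming
    .filter((t) => t.length > 0)
    .join('\n') + '\n';
}

// Hypothetical helper: writes the tweets to a .txt file
function writeTweetsFile(path, tweets) {
  fs.writeFileSync(path, toLines(tweets), 'utf8');
}

// Usage (not executed here):
// writeTweetsFile('./originalFile.txt', ['First tweet', 'Second tweet']);
```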

Example

In this example, URLs are not removed.

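// Load the module
var preprocessing = require('preprocess-tweets');

// Input file (one tweet per line) and output file for the cleaned tweets
var file = './originalFile.txt';
var writeFile = './modifiedFile.txt';

// Remove mentions and emojis, but keep links
var filter = {
    "mentions": true,
    "links": false,
    "emojis": true
};

// The filter is passed as a JSON string
preprocessing.clean(file, writeFile, JSON.stringify(filter));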

The result is a new file containing the modified tweets.
