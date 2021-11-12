Profanity filter, based on "Shutterstock" dictionary
// npm
npm install leo-profanity
npm install leo-profanity --no-optional # install only English bad word dictionary
// yarn
yarn add leo-profanity
yarn add leo-profanity --ignore-optional # install only English bad word dictionary
// Bower
bower install leo-profanity
// dictionary/default.json
// supported languages:
// - en
// - fr
var filter = require('leo-profanity');
// replace the current dictionary with the French one
filter.loadDictionary('fr');
// replace dictionary with the default one (same as filter.reset())
filter.loadDictionary();
// return all profanity words (an array of strings)
filter.list();
See the filter.clean examples for more cases.
// output: true
filter.check('I have boob');
// no bad word
// output: I have 2 eyes
filter.clean('I have 2 eyes');
// normal case
// output: I have ****, etc.
filter.clean('I have boob, etc.');
// case insensitive
// output: I have ****
filter.clean('I have BoOb');
// words followed by a comma or dot are still detected
// output: I have ****.
filter.clean('I have BoOb.');
// multiple occurrences
// output: I have ****,****, ***, and etc.
filter.clean('I have boob,boob, ass, and etc.');
// does not detect profanity inside unspaced words (e.g. "classic")
// output: Buy classic watches online
filter.clean('Buy classic watches online');
// clean with a custom replacement character
// output: I have ++++
filter.clean('I have boob', '+');
// keep the first N letters of the word visible (here N = 2)
// output: I have bo++
filter.clean('I have boob', '+', 2);
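The `replaceKey` and `nbLetters` arguments of `filter.clean` can be read as a per-word masking step. A minimal sketch, assuming a hypothetical helper `maskWord` (not part of leo-profanity's API) that keeps the first `nbLetters` characters and replaces the rest with `replaceKey`:

```javascript
// Hypothetical helper illustrating the masking done by clean(string, replaceKey, nbLetters).
// This is not leo-profanity's implementation.
function maskWord(word, replaceKey = '*', nbLetters = 0) {
  // Keep at most nbLetters leading characters visible.
  const visible = word.slice(0, nbLetters);
  // Mask every remaining character; never repeat a negative number of times.
  const masked = replaceKey.repeat(Math.max(word.length - nbLetters, 0));
  return visible + masked;
}

console.log(maskWord('boob'));         // ****
console.log(maskWord('boob', '+'));    // ++++
console.log(maskWord('boob', '+', 2)); // bo++
```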
// add word
filter.add('b00b');
// add an array of words
// duplicates are skipped automatically
filter.add(['b00b', 'b@@b']);
// remove word
filter.remove('b00b');
// remove an array of words
filter.remove(['b00b', 'b@@b']);
Reset the word list to the default dictionary (this also removes manually added words).
Clear all profanity words
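To make the list semantics above concrete (duplicates skipped on add, reset restoring the default dictionary and dropping manual additions, clear emptying the list), here is a toy model. The `WordList` class, its `clear()` method and the two-word `DEFAULT_WORDS` sample are hypothetical; leo-profanity's own code may differ.

```javascript
// Toy model of the documented word-list semantics; not leo-profanity's code.
// DEFAULT_WORDS is a hypothetical stand-in for the real dictionary.
const DEFAULT_WORDS = ['ass', 'boob'];

// Normalise the "word or array of words" argument.
const toArray = (wordOrArray) => (Array.isArray(wordOrArray) ? wordOrArray : [wordOrArray]);

class WordList {
  constructor() {
    this.reset();
  }

  // Append words that are not already present (duplicates are skipped).
  add(wordOrArray) {
    for (const word of toArray(wordOrArray)) {
      if (!this.words.includes(word)) this.words.push(word);
    }
  }

  // Drop every listed word.
  remove(wordOrArray) {
    const toRemove = toArray(wordOrArray);
    this.words = this.words.filter((word) => !toRemove.includes(word));
  }

  // Restore the default dictionary, discarding manually added words.
  reset() {
    this.words = [...DEFAULT_WORDS];
  }

  // Empty the list entirely, defaults included.
  clear() {
    this.words = [];
  }

  // Return a copy so callers cannot mutate the internal list.
  list() {
    return [...this.words];
  }
}

const wordList = new WordList();
wordList.add(['b00b', 'b@@b', 'b00b']); // list is now ['ass', 'boob', 'b00b', 'b@@b']
wordList.reset();                       // back to ['ass', 'boob']
wordList.clear();                       // []
```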
This project splits the problem into two parts:
1. Sanitize
2. Filter

Below are some interesting approaches for each part.
Attempt 1 (1.1): convert everything to lowercase
Advantage:
- simple
Disadvantage:
- none
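Attempt 1.1 is a single call in JavaScript. A minimal sketch with a hypothetical one-word list, showing why lowercasing lets a lowercase word list match mixed-case input:

```javascript
// Hypothetical sample word list, stored in lowercase.
const LOWER_WORDS = new Set(['boob']);

// Attempt 1.1: normalise case before looking words up.
function sanitizeCase(text) {
  return text.toLowerCase();
}

console.log(LOWER_WORDS.has('BoOb'));               // false
console.log(LOWER_WORDS.has(sanitizeCase('BoOb'))); // true
```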
Attempt 2 (1.2): convert look-alike symbols to letters
e.g. convert `@` to `a`, `5` and `$` to `s`
Advantage:
- simple, and detects some disguised words (e.g. @ss, b00b)
Disadvantage:
- false positives
e.g. joe@ssociallife.com
- limits users' creativity (users cannot play with words)
e.g. a user wants to write something funny like "a$$a$$in"
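A minimal sketch of attempt 1.2, assuming a small hypothetical substitution table (`@` to `a`, `0` to `o`, `5` and `$` to `s`). It also shows how the mapping rewrites legitimate text such as the email address and the playful spelling in the examples above:

```javascript
// Hypothetical look-alike table; a real one would be larger.
const LOOKALIKES = { '@': 'a', '0': 'o', '5': 's', '$': 's' };

// Attempt 1.2: replace each look-alike symbol with the letter it imitates.
// Inside a character class, '$' is a literal character.
function normalizeLookalikes(text) {
  return text.replace(/[@05$]/g, (ch) => LOOKALIKES[ch]);
}

console.log(normalizeLookalikes('@ss b00b'));            // ass boob
console.log(normalizeLookalikes('joe@ssociallife.com')); // joeassociallife.com
console.log(normalizeLookalikes('a$$a$$in'));            // assassin
```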
Attempt 3 (1.3): replace `.` and `,` with spaces to separate words
People often use `.` and `,` to join or end sentences without adding a space.
Advantage:
- increases the chance of finding profanity
e.g. I like a55,b00b
Disadvantage:
- none
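Attempt 1.3 amounts to one regular-expression replacement. A minimal sketch (the function name `separatePunctuation` is hypothetical):

```javascript
// Attempt 1.3: turn '.' and ',' into spaces so glued words become separate tokens.
// Inside a character class, '.' is a literal dot.
function separatePunctuation(text) {
  return text.replace(/[.,]/g, ' ');
}

console.log(separatePunctuation('I like a55,b00b')); // I like a55 b00b
```

Note that this only separates the words; `a55` and `b00b` would still need attempt 1.2 or matching list entries to be detected.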
Attempt 1 (2.1): split into an array (or use a regex)
Split the text on spaces into an array, then check each item against the profanity word list.
Advantage:
- simple
Disadvantage:
- needs a well-curated word list
- some false positives
e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)
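A minimal sketch of attempt 2.1, assuming a hypothetical three-word list and input that has already been lowercased. It reproduces both the intended behaviour and the "Great tit" false positive:

```javascript
// Hypothetical sample list; leo-profanity ships its own dictionary.
const SPLIT_WORDS = new Set(['ass', 'boob', 'tit']);

// Attempt 2.1: split on whitespace, then test each whole token against the list.
// Expects lowercase input (see attempt 1.1).
function findBadWords(text) {
  return text.split(/\s+/).filter((token) => SPLIT_WORDS.has(token));
}

console.log(findBadWords('i have boob and more')); // [ 'boob' ]
console.log(findBadWords('buy classic watches'));  // [] (words inside words are ignored)
console.log(findBadWords('great tit'));            // [ 'tit' ] (false positive: a bird)
```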
Attempt 2 (2.2): match words inside other text (with or without spaces)
Detect any run of letters that contains a profanity word (e.g. `thistextisfunnyboobsanda55`)
Advantage:
- simple
- can detect "un-spaced" profanity words
Disadvantage:
- many false positives
e.g. http://www.morewords.com/contains/ass/
e.g. the "Clbuttic" mistake (naive substring replacement turns "classic" into "clbuttic")
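A minimal sketch of attempt 2.2 with a hypothetical two-word list. The substring search finds words without spaces, but it also flags "classic", and a naive replacement on top of it produces the "clbuttic" effect:

```javascript
// Hypothetical sample list.
const INNER_WORDS = ['ass', 'boob'];

// Attempt 2.2: report every listed word that appears anywhere in the text.
function findInnerBadWords(text) {
  const lower = text.toLowerCase();
  return INNER_WORDS.filter((word) => lower.includes(word));
}

// Naive substring replacement, as a filter built on 2.2 might do it.
function replaceInner(text, from, to) {
  return text.split(from).join(to);
}

console.log(findInnerBadWords('thistextisfunnyboobsanda55')); // [ 'boob' ]
console.log(findInnerBadWords('Buy classic watches online')); // [ 'ass' ] (false positive)
console.log(replaceInner('classic', 'ass', 'butt'));          // clbuttic
```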
Summary
So, this project goes with 1.1, 1.3, and 2.1. (Note: other approaches can be found in the "Reference" section.)
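Combining the chosen steps (1.1 lowercase, 1.3 treat `.` and `,` as separators, 2.1 whole-word lookup) gives a small check/clean pipeline. This is a minimal sketch, not leo-profanity's source: the word list and the names `tokenizeText`, `checkText` and `cleanText` are hypothetical. `cleanText` splits with a capturing group so separators stay in place and only matching words are masked, which reproduces several of the `filter.clean` outputs listed earlier.

```javascript
// Hypothetical sample list (lowercase).
const PIPELINE_WORDS = new Set(['ass', 'boob']);

// 1.1 + 1.3: lowercase, treat '.' and ',' like spaces, then split into words.
function tokenizeText(text) {
  return text.toLowerCase().replace(/[.,]/g, ' ').split(/\s+/).filter(Boolean);
}

// 2.1: true if any whole word is in the list.
function checkText(text) {
  return tokenizeText(text).some((word) => PIPELINE_WORDS.has(word));
}

// Mask matching words; the capturing group keeps separators in the split result.
function cleanText(text, replaceKey = '*') {
  return text
    .split(/([\s.,]+)/)
    .map((part) => (PIPELINE_WORDS.has(part.toLowerCase()) ? replaceKey.repeat(part.length) : part))
    .join('');
}

console.log(checkText('I have boob'));                     // true
console.log(cleanText('I have boob,boob, ass, and etc.')); // I have ****,****, ***, and etc.
console.log(cleanText('Buy classic watches online'));      // Buy classic watches online
```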
add method
clean API
setDictionary function
loadDictionary + French words
getDictionary
proceed method
badWordsUsed method
git add -A to add your changes
npm run commit (don't use git commit)
git push, then create a Pull Request
$ npm install -g semantic-release-cli
$ semantic-release-cli setup
Use the commands above to set up "semantic-release".