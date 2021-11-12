Profanity filter, based on "Shutterstock" dictionary

Installation

// npm
npm install leo-profanity

// yarn
yarn add leo-profanity

// Bower
bower install leo-profanity
// dictionary/default.json

Example usage for npm

var filter = require('leo-profanity');

filter.loadDictionary('fr');
filter.loadDictionary();

filter.list();

Check out more cases on filter.clean

filter.check('I have boob');

filter.clean('I have 2 eyes');
filter.clean('I have boob, etc.');
filter.clean('I have BoOb');
filter.clean('I have BoOb.');
filter.clean('I have boob,boob, ass, and etc.');
filter.clean('Buy classic watches online');
filter.clean('I have boob', '+');
filter.clean('I have boob', '+', 2);

filter.add('b00b');
filter.add(['b00b', 'b@@b']);

filter.remove('b00b');
filter.remove(['b00b', 'b@@b']);

Reset the word list to the default dictionary (this also removes any words that were added manually)

Clear all profanity words

Algorithm

This project splits the problem into two parts, Sanitize and Filter. The attempts considered for each part are listed below.

Sanitize

Attempt 1 (1.1): convert everything to lower case
Advantage:
- simple
Disadvantage:
- none

Attempt 2 (1.2): turn look-alike symbols into letters, e.g. convert `@` to `a`, and `5` and `$` to `s`
Advantage:
- simple
- detects some trick words (e.g. b00b)
Disadvantage:
- "false positives"
- limits the user's imagination (users cannot play with words)
  e.g. joe .com
  e.g. a user wants to try something funny like "a$$a$$in"

Attempt 3 (1.3): replace `.` and `,` with a space to separate words, because people often use `.` and `,` to connect or end sentences
Advantage:
- increases the chance of finding a word, e.g. I like a55,b00b
Disadvantage:
- none
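The three sanitize attempts above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function names and the `LOOKALIKES` table are hypothetical and are not part of the leo-profanity API, and the symbol mapping covers only the characters mentioned above.

```javascript
// Hypothetical helpers for illustration; not the library's own code.

// Attempt 1.1: convert everything to lower case.
function toLower(text) {
  return text.toLowerCase();
}

// Attempt 1.2: map look-alike symbols to letters.
// Only the symbols named in the text above are mapped.
var LOOKALIKES = { '@': 'a', '5': 's', '$': 's' };

function replaceLookalikes(text) {
  return text.replace(/[@5$]/g, function (ch) {
    return LOOKALIKES[ch];
  });
}

// Attempt 1.3: treat `.` and `,` as word separators.
function separatorsToSpaces(text) {
  return text.replace(/[.,]/g, ' ');
}

// One possible combination: 1.1 followed by 1.3.
function sanitize(text) {
  return separatorsToSpaces(toLower(text));
}

console.log(sanitize('I like A55,B00b.'));
console.log(replaceLookalikes('a$$a$$in'));
```

Note how `replaceLookalikes` rewrites every `5`, `@` and `$` it sees, including ones that were never meant as letters, which is where the false positives of attempt 1.2 come from.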

Filter

Attempt 1 (2.1): split the text into an array of words on spaces (or with a regex), then check each word against the profanity word list
Advantage:
- simple
Disadvantage:
- needs a proper word list
- some "false positives", e.g. Great tit (https:

Attempt 2 (2.2): match words inside the text (with or without spaces) by detecting any run of characters that contains a "profanity word" (e.g. `thistextisfunnyboobsanda55`)
Advantage:
- simple
- can detect "un-spaced" profanity words
Disadvantage:
- many "false positives"
  e.g. http:
  e.g. Clbuttic mistake (filter mistake)
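To make the trade-off between the two filter attempts concrete, here is a minimal sketch. The two-word list and both function names are assumptions made for illustration (the real dictionary is far larger), and the input is assumed to be lower-cased already.

```javascript
// Illustrative word list only; hypothetical, not the shipped dictionary.
var WORDS = ['boob', 'ass'];

// Attempt 2.1: split on whitespace and compare whole words.
function containsWholeWord(text, words) {
  return text.split(/\s+/).some(function (token) {
    return words.indexOf(token) !== -1;
  });
}

// Attempt 2.2: look for a listed word anywhere inside the text.
function containsSubstring(text, words) {
  return words.some(function (word) {
    return text.indexOf(word) !== -1;
  });
}

var spaced = 'buy classic watches online';
var unspaced = 'thistextisfunnyboobsanda55';

console.log(containsWholeWord(spaced, WORDS));   // whole-word check
console.log(containsSubstring(spaced, WORDS));   // substring check
console.log(containsWholeWord(unspaced, WORDS));
console.log(containsSubstring(unspaced, WORDS));
```

The substring check flags "classic" because it contains "ass", the same kind of error as the Clbuttic mistake. The whole-word check avoids that, but misses the un-spaced example entirely.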

Summary

We cannot know every method of producing a profanity word (e.g. how many different ways can you write a55?)

There is no algorithm-based approach that fully solves this (yet)

People will always find a way to connect with each other (e.g. Leet)

So, this project goes with 1.1, 1.3 and 2.1. (*note - you can find the other attempts in the "Reference" section)
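As a rough illustration of how 1.1, 1.3 and 2.1 fit together when masking text, the sketch below treats whitespace, `.` and `,` as separators, lower-cases each word before the lookup, and replaces matches with a repeated key. The name `cleanSketch`, its parameters, and the two-word list are hypothetical; this is not the library's actual implementation of `filter.clean`.

```javascript
// Illustrative word list only; hypothetical, not the shipped dictionary.
var SAMPLE_WORDS = ['boob', 'ass'];

// Mask every whole word that appears in `words`.
// `replaceKey` defaults to '*' in this sketch.
function cleanSketch(text, words, replaceKey) {
  var key = replaceKey || '*';
  // A token is any run of characters that is not whitespace, `.` or `,`
  // (attempt 1.3 applied while tokenizing, attempt 2.1 for the lookup).
  return text.replace(/[^\s.,]+/g, function (token) {
    // Attempt 1.1: compare in lower case, but keep the original text
    // for tokens that are not on the list.
    var isProfane = words.indexOf(token.toLowerCase()) !== -1;
    return isProfane ? key.repeat(token.length) : token;
  });
}

console.log(cleanSketch('I have BoOb.', SAMPLE_WORDS));
console.log(cleanSketch('I have boob,boob, ass, and etc.', SAMPLE_WORDS, '+'));
console.log(cleanSketch('Buy classic watches online', SAMPLE_WORDS));
```

Because only whole tokens are compared, "classic" is left untouched, while punctuation around a masked word is preserved.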

TODO

Other languages

JavaScript on npmjs.com/package/leo-profanity

PHP on packagist.org/packages/jojoee/leo-profanity

Python on pypi.org/project/leoprofanity

Java on Maven

WordPress on wordpress.org

Contribute

1. Fork the repo
2. Install Node.js and dependencies
3. Make a branch for your change and make your changes
4. Run git add -A to add your changes
5. Run npm run commit (don't use git commit)
6. Push your changes with git push, then create a Pull Request

Contribute for owner

$ npm install -g semantic-release-cli
$ semantic-release-cli setup

Use the commands above to set up "semantic-release".

