chatbot-ner

chatbot_ner: Named Entity Recognition for chatbots.

Named Entity Recognition for chatbots

Chatbot NER is an open source framework custom built to support entity recognition in text messages. After thorough research on existing NER systems, the team at Haptik felt a strong need to build a framework tailored for Conversational AI that also supports Indian languages. Chatbot NER currently supports English, Hindi, Gujarati, Marathi, Bengali and Tamil, and their code-mixed forms. The framework currently uses common patterns along with a few NLP techniques to extract the necessary entities from languages with sparse data. The API structure of Chatbot NER is designed with usability for Conversational AI applications in mind. The team at Haptik is continuously working towards porting this framework to all Indian languages and their respective local dialects.

Installation

Detailed documentation on how to set up Chatbot NER on your system using Docker is available here.

Supported Entities

| Entity type | Code reference | Description | Example | Supported languages (ISO 639-1 code) |
|---|---|---|---|---|
| Time | TimeDetector | Detect time from given text | tomorrow morning at 5, कल सुबह ५ बजे, kal subah 5 baje | 'en', 'hi', 'gu', 'bn', 'mr', 'ta' |
| Date | DateAdvancedDetector | Detect date from given text | next monday, agle somvar, अगले सोमवार | 'en', 'hi', 'gu', 'bn', 'mr', 'ta' |
| Number | NumberDetector | Detect number and respective units in given text | 50 rs per person, ५ किलो चावल, मुझे एक लीटर ऑइल चाहिए | 'en', 'hi', 'gu', 'bn', 'mr', 'ta' |
| Phone number | PhoneDetector | Detect phone number in given text | 9833530536, +91 9833530536, ९८३३४३०५३५ | 'en', 'hi', 'gu', 'bn', 'mr', 'ta' |
| Email | EmailDetector | Detect email in text | hello@haptik.co | 'en' |
| Text | TextDetector | Detect custom entities in text string using full text search in Datastore or based on contextual model | Order me a pizza, मुंबई में मौसम कैसा है | Search supported for 'en', 'hi', 'gu', 'bn', 'mr', 'ta'; contextual model supported for 'en' only |
| PNR | PNRDetector | Detect PNR (serial) codes in given text | My flight PNR is 4SGX3E | 'en' |
| regex | RegexDetector | Detect entities using custom regex patterns | My flight PNR is 4SGX3E | NA |
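
To give a feel for what pattern-driven detection of the kind listed for the regex and PNR rows involves, here is a minimal standalone sketch. It does not use Chatbot NER's own classes: the function name `detect_with_pattern` and the pattern `PNR_LIKE` are assumptions made up for illustration, and the real detectors may apply different rules.

```python
import re

# Hypothetical pattern, not taken from Chatbot NER: six upper-case
# alphanumeric characters that contain at least one digit.
PNR_LIKE = r"\b(?=[A-Z0-9]*\d)[A-Z0-9]{6}\b"


def detect_with_pattern(text, pattern):
    """Return a list of (matched_text, start, end) for each match of pattern."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(pattern, text)]


matches = detect_with_pattern("My flight PNR is 4SGX3E", PNR_LIKE)
for value, start, end in matches:
    print(value, start, end)
```

Returning the character span along with the value lets a caller recover the original text fragment, which is usually what a chatbot needs in order to confirm the entity back to the user.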

There are other custom detectors, such as city, budget, and shopping size, which are derived from the primary detectors listed above. They are currently supported in English only and are limited to Indian users. We are in the process of restructuring them to scale across languages and geographies, and their current versions might be deprecated in the future. For applications already in production, we therefore recommend using only the primary detectors listed in the table above.

API structure

Detailed documentation of the APIs for all entity types is available here. The current API structure is built for easy access from conversational AI applications, but it can also be used for other applications.

Framework Overview

In any conversational AI application, there are several entities to be identified, and the detection logic for one entity might differ from that of another. We have organised this repository as shown below.

[Figure: entity hierarchy]

We have classified entities into four main types: numeral, pattern, temporal and textual.

  • numeral: This type contains all the entities that deal with numerals or numbers. For example, number detection, budget detection, size detection, etc.

  • pattern: This type contains all the detection logic where identification can be done using patterns or regular expressions. For example, email, phone_number, pnr, etc.

  • temporal: This type contains the detection logic for time and date.

  • textual: This type identifies entities by looking them up in a dictionary. It mainly covers detection of text (such as cuisine, dish, restaurants, etc.), names of cities, the location of a user, etc.
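
The dictionary lookup behind the textual type can be sketched in a few lines. This is only an illustration under stated assumptions: the table above says the text detector uses full text search in a Datastore, whereas this sketch substitutes a small in-memory dictionary, handles English only, and uses hypothetical names (`CUISINE_DICTIONARY`, `detect_textual`) that are not part of Chatbot NER.

```python
import re

# Hypothetical dictionary: canonical value -> variants a user might type.
CUISINE_DICTIONARY = {
    "pizza": ["pizza", "pizzas"],
    "chinese": ["chinese", "chinese food"],
}


def detect_textual(text, dictionary):
    """Return (canonical_value, matched_variant) pairs found in text."""
    lowered = text.lower()
    found = []
    for value, variants in dictionary.items():
        # Try longer variants first so "chinese food" wins over "chinese".
        for variant in sorted(variants, key=len, reverse=True):
            if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
                found.append((value, variant))
                break  # one match per canonical value is enough
    return found


print(detect_textual("Order me a pizza", CUISINE_DICTIONARY))
```

Keeping the canonical value separate from the matched variant means downstream logic sees one stable value however the user phrased it.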

Numeral, temporal and pattern entities have been moved to ner_v2 for language portability, with more flexible detection logic. In ner_v1, only the text entity currently has language support. We will be moving it to ner_v2 without any major API changes.

Contribution Guidelines

Currently, you can contribute to ner_v2 in Chatbot NER either by adding Training Data or by contributing Detection Patterns in the form of regex. We will work on removing a few architectural limitations, which will ease the process of adding ML models and New Entities in the future.

  • Adding Training Data: You can significantly improve the detection capabilities of Chatbot NER simply by adding data to csv files. For example, date detection in Hindi and Hinglish can be improved by adding data to the csv files shown in the image below. You can refer to the documentation for date, time and numbers respectively if you wish to contribute. [Figure: Date Contribution]
  • Adding Detection Pattern: You can add custom patterns for different languages by writing simple functions. An example of adding a custom pattern for detecting the number of people can be found here.
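
To illustrate why adding rows to a csv file can improve detection without code changes, here is a small data-driven sketch. The csv layout (columns `key`, `variants`, `day_offset`, with variants separated by `|`) and the function names are assumptions invented for this example; the actual files and columns used by ner_v2 are described in the documentation referenced above and may differ.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical csv content; "kal" is treated as "tomorrow" here,
# following the example in the Supported Entities table.
SAMPLE_CSV = """key,variants,day_offset
today,aaj|आज,0
tomorrow,kal|कल,1
day_after,parso|परसों,2
"""


def load_variants(csv_text):
    """Map every variant word to its day offset."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for variant in row["variants"].split("|"):
            lookup[variant.strip().lower()] = int(row["day_offset"])
    return lookup


def detect_relative_date(text, lookup, today):
    """Return the date for the first known relative-date word, else None."""
    for token in text.lower().split():
        if token in lookup:
            return today + timedelta(days=lookup[token])
    return None


lookup = load_variants(SAMPLE_CSV)
print(detect_relative_date("kal subah 5 baje", lookup, date(2024, 1, 1)))
```

In this sketch, supporting a new word or language is a matter of appending a row or a variant to the csv text; the detection function itself stays unchanged.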

Please refer to general steps of contribution, approval and coding guidelines mentioned here.
