nlplot

Visualization Module for Natural Language Processing

License: MIT License


📝 nlplot

nlplot: Analysis and visualization module for Natural Language Processing 📈

Description

nlplot makes it easier to visualize natural language processing data and speeds up exploratory text analysis.

You can draw the following graphs:

  1. N-gram bar chart
  2. N-gram treemap
  3. Histogram of the word count
  4. Word cloud
  5. Co-occurrence network
  6. Sunburst chart

(Tested in English and Japanese)
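
The n-gram charts listed above are built on counts of contiguous word sequences. The following is a minimal sketch of that counting step, not nlplot's own implementation; the function name `count_ngrams` is hypothetical, and removing stopwords before forming n-grams is an assumption made for illustration.

```python
from collections import Counter


def count_ngrams(docs, n=1, stopwords=()):
    """Count n-grams over space-delimited documents, skipping stopwords.

    Hypothetical helper for illustration; not part of nlplot.
    """
    stop = set(stopwords)
    counts = Counter()
    for doc in docs:
        # Assumption: stopwords are dropped before n-grams are formed.
        tokens = [t for t in doc.split() if t not in stop]
        # Slide a window of length n across the remaining tokens.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


docs = ["think rich look poor", "look rich think poor", "think rich"]
unigrams = count_ngrams(docs, n=1)
bigrams = count_ngrams(docs, n=2)
print(unigrams.most_common(2))
print(bigrams.most_common(1))
```

A bar chart or treemap of the top entries of such a counter is the kind of view items 1 and 2 provide.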

Requirement

Installation

pip install nlplot

A blog post (in Japanese) describes usage in more detail.

Sample code is also available in a Kaggle kernel (in English).

Quick start - Data Preparation

Each value in the column to be analyzed must be a space-delimited string or a list of tokens.

import pandas as pd

# Sample data: one space-delimited string per row
target_col = "text"
texts = [
    "Think rich look poor",
    "When you come to a roadblock, take a detour",
    "When it is dark enough, you can see the stars",
    "Never let your memories be greater than your dreams",
    "Victory is sweetest when you’ve known defeat",
]
df = pd.DataFrame({target_col: texts})
df.head()
   text
0  Think rich look poor
1  When you come to a roadblock, take a detour
2  When it is dark enough, you can see the stars
3  Never let your memories be greater than your dreams
4  Victory is sweetest when you’ve known defeat
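
Raw text usually carries punctuation and mixed case, so it can help to normalize it into a space-delimited string first. The sketch below is one possible way to do that for English text; the helper name `to_space_delimited` and the specific normalization rules are assumptions, not something nlplot requires. Japanese text would need a morphological tokenizer instead.

```python
import re


def to_space_delimited(text):
    """Lowercase the text and join its word tokens with single spaces.

    Hypothetical helper for illustration; not part of nlplot.
    """
    # Normalize the typographic apostrophe so "you’ve" stays one token.
    text = text.lower().replace("\u2019", "'")
    # Keep runs of ASCII letters, digits, and apostrophes; drop punctuation.
    tokens = re.findall(r"[a-z0-9']+", text)
    return " ".join(tokens)


raw = "When you come to a roadblock, take a detour"
print(to_space_delimited(raw))
```

With a DataFrame like the one above, this could be applied with `df[target_col] = df[target_col].map(to_space_delimited)`.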

Quick start - Python API

import nlplot

# Each value in target_col must be a list of tokens or a space-delimited string.
npt = nlplot.NLPlot(df, target_col='text')

# Stopwords can be computed from word frequencies in the corpus.
stopwords = npt.get_stopword(top_n=30, min_freq=0)

# 1. N-gram bar chart
npt.bar_ngram(title='uni-gram', ngram=1, top_n=50, stopwords=stopwords)
npt.bar_ngram(title='bi-gram', ngram=2, top_n=50, stopwords=stopwords)

# 2. N-gram treemap
npt.treemap(title='Tree of Most Common Words', ngram=1, top_n=30, stopwords=stopwords)

# 3. Histogram of the word count
npt.word_distribution(title='words distribution')

# 4. wordcloud
npt.wordcloud(stopwords=stopwords, colormap='tab20_r')

# 5. co-occurrence networks
npt.build_graph(stopwords=stopwords, min_edge_frequency=10)
# build_graph reports the number of nodes and edges that will be plotted, e.g.:
#   node_size:70, edge_size:166
# If these numbers are too large, plotting takes a long time; raise min_edge_frequency to reduce them.
npt.co_network(title='Co-occurrence network')

# 6. sunburst chart
npt.sunburst(title='sunburst chart', colorscale=True)
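
To make the `get_stopword` and `min_edge_frequency` parameters above more concrete, here is a rough sketch of frequency-based stopword selection and of pruning a co-occurrence graph by edge frequency. It is an illustration under stated assumptions, not nlplot's implementation: the function names are hypothetical, stopwords are assumed to be the `top_n` most frequent words plus words occurring at most `min_freq` times, and an edge is assumed to be kept when its count is at least `min_edge_frequency`.

```python
from collections import Counter
from itertools import combinations


def frequency_stopwords(docs, top_n=2, min_freq=0):
    """Pick stopwords by frequency (hypothetical helper, not nlplot's code)."""
    counts = Counter(tok for doc in docs for tok in doc.split())
    # Assumption: the top_n most frequent words are treated as stopwords...
    common = {word for word, _ in counts.most_common(top_n)}
    # ...together with words that occur at most min_freq times.
    rare = {word for word, c in counts.items() if c <= min_freq}
    return common | rare


def cooccurrence_edges(docs, stopwords=(), min_edge_frequency=1):
    """Count word pairs sharing a document and prune infrequent pairs."""
    stop = set(stopwords)
    edges = Counter()
    for doc in docs:
        # Each distinct non-stopword in a document is a node; every pair of
        # nodes in the same document contributes one to that edge's count.
        words = sorted(set(doc.split()) - stop)
        edges.update(combinations(words, 2))
    # Assumption: keep edges seen at least min_edge_frequency times.
    return {pair: c for pair, c in edges.items() if c >= min_edge_frequency}


docs = ["a b c", "a b d", "a c d", "a b"]
print(frequency_stopwords(docs, top_n=1))
print(cooccurrence_edges(docs, min_edge_frequency=2))
```

Raising the threshold shrinks the graph quickly, which is why tuning `min_edge_frequency` is the main lever for keeping the network plot fast and readable.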

Document

TBD

Test

cd tests
pytest

Other
