empath

analyze text with empath

Empath is a tool for analyzing text across lexical categories (similar to LIWC) and for generating new lexical categories to use in an analysis. See our paper.

Install the Python package with pip:

pip install empath

Then, in a Python shell, import it like this:

from empath import Empath
lexicon = Empath()

Analyze text over all pre-built categories:

lexicon.analyze("he hit the other person", normalize=True)
# => {'help': 0.0, 'office': 0.0, 'violence': 0.2, 'dance': 0.0, 'money': 0.0, 'wedding': 0.0, 'valuable': 0.0, 'domestic_work': 0.0, 'sleep': 0.0, 'medical_emergency': 0.0, 'cold': 0.0, 'hate': 0.0, 'cheerfulness': 0.0, 'aggression': 0.0, 'occupation': 0.0, 'envy': 0.0, 'anticipation': 0.0, 'family': 0.0, 'crime': 0.0, 'attractive': 0.0, 'masculine': 0.0, 'prison': 0.0, 'health': 0.0, 'pride': 0.0, 'dispute': 0.0, 'nervousness': 0.0, 'government': 0.0, 'weakness': 0.0, 'horror': 0.0, 'swearing_terms': 0.0, 'leisure': 0.0, 'suffering': 0.0, 'royalty': 0.0, 'wealthy': 0.0, 'white_collar_job': 0.0, 'tourism': 0.0, 'furniture': 0.0, 'school': 0.0, 'magic': 0.0, 'beach': 0.0, 'journalism': 0.0, 'morning': 0.0, 'banking': 0.0, 'social_media': 0.0, 'exercise': 0.0, 'night': 0.0, 'kill': 0.0, 'art': 0.0, 'play': 0.0, 'computer': 0.0, 'college': 0.0, 'traveling': 0.0, 'stealing': 0.0, 'real_estate': 0.0, 'home': 0.0, 'divine': 0.0, 'sexual': 0.0, 'fear': 0.0, 'monster': 0.0, 'irritability': 0.0, 'superhero': 0.0, 'business': 0.0, 'driving': 0.0, 'pet': 0.0, 'childish': 0.0, 'cooking': 0.0, 'exasperation': 0.0, 'religion': 0.0, 'hipster': 0.0, 'internet': 0.0, 'surprise': 0.0, 'reading': 0.0, 'worship': 0.0, 'leader': 0.0, 'independence': 0.0, 'movement': 0.2, 'body': 0.0, 'noise': 0.0, 'eating': 0.0, 'medieval': 0.0, 'zest': 0.0, 'confusion': 0.0, 'water': 0.0, 'sports': 0.0, 'death': 0.0, 'healing': 0.0, 'legend': 0.0, 'heroic': 0.0, 'celebration': 0.0, 'restaurant': 0.0, 'ridicule': 0.0, 'programming': 0.0, 'dominant_heirarchical': 0.0, 'military': 0.0, 'neglect': 0.0, 'swimming': 0.0, 'exotic': 0.0, 'love': 0.0, 'hiking': 0.0, 'communication': 0.0, 'hearing': 0.0, 'order': 0.0, 'sympathy': 0.0, 'hygiene': 0.0, 'weather': 0.0, 'anonymity': 0.0, 'trust': 0.0, 'ancient': 0.0, 'deception': 0.0, 'fabric': 0.0, 'air_travel': 0.0, 'fight': 0.0, 'dominant_personality': 0.0, 'music': 0.0, 'vehicle': 0.0, 'politeness': 0.0, 'toy': 0.0, 'farming': 0.0, 'meeting': 0.0, 'war': 0.0, 
# 'speaking': 0.0, 'listen': 0.0, 'urban': 0.0, 'shopping': 0.0, 'disgust': 0.0, 'fire': 0.0, 'tool': 0.0, 'phone': 0.0, 'gain': 0.0, 'sound': 0.0, 'injury': 0.0, 'sailing': 0.0, 'rage': 0.0, 'science': 0.0, 'work': 0.0, 'appearance': 0.0, 'optimism': 0.0, 'warmth': 0.0, 'youth': 0.0, 'sadness': 0.0, 'fun': 0.0, 'emotional': 0.0, 'joy': 0.0, 'affection': 0.0, 'fashion': 0.0, 'lust': 0.0, 'shame': 0.0, 'torment': 0.0, 'economics': 0.0, 'anger': 0.0, 'politics': 0.0, 'ship': 0.0, 'clothing': 0.0, 'car': 0.0, 'strength': 0.0, 'technology': 0.0, 'breaking': 0.0, 'shape_and_size': 0.0, 'power': 0.0, 'vacation': 0.0, 'animal': 0.0, 'ugliness': 0.0, 'party': 0.0, 'terrorism': 0.0, 'smell': 0.0, 'blue_collar_job': 0.0, 'poor': 0.0, 'plant': 0.0, 'pain': 0.2, 'beauty': 0.0, 'timidity': 0.0, 'philosophy': 0.0, 'negotiate': 0.0, 'negative_emotion': 0.0, 'cleaning': 0.0, 'messaging': 0.0, 'competing': 0.0, 'law': 0.0, 'friends': 0.0, 'payment': 0.0, 'achievement': 0.0, 'alcohol': 0.0, 'disappointment': 0.0, 'liquid': 0.0, 'feminine': 0.0, 'weapon': 0.0, 'children': 0.0, 'ocean': 0.0, 'giving': 0.0, 'contentment': 0.0, 'writing': 0.0, 'rural': 0.0, 'positive_emotion': 0.0, 'musical': 0.0}
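
Most of the scores in a full analysis are zero, so it often helps to keep only the categories that fired. The following is a minimal sketch, assuming the result is a plain dict mapping category names to floats as in the output shown. `nonzero_categories` is a hypothetical helper, not part of Empath, and the sample dict is a hand-copied subset of that output.

```python
def nonzero_categories(scores):
    """Return (category, score) pairs with a nonzero score, highest first."""
    hits = [(name, value) for name, value in scores.items() if value > 0]
    # Sort by descending score, then alphabetically so ties are ordered stably.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hand-copied subset of the analyze() output for "he hit the other person".
sample = {"help": 0.0, "violence": 0.2, "movement": 0.2, "pain": 0.2, "money": 0.0}
print(nonzero_categories(sample))
# [('movement', 0.2), ('pain', 0.2), ('violence', 0.2)]
```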

Or over a specific set of categories:

lexicon.analyze("he hit the other person", categories=["violence"])
# => {'violence': 1.0}

By default, Empath returns raw counts. Pass normalize=True to divide each count by the number of words in the document:

lexicon.analyze("he hit the other person", categories=["violence"], normalize=True)
# => {'violence': 0.2}
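
To make the difference between raw and normalized scores concrete, here is a toy sketch of the arithmetic. It assumes whitespace tokenization and a made-up one-category lexicon. `toy_analyze` and `TOY_LEXICON` are illustrative names, and this is not Empath's actual implementation.

```python
# Made-up word list for illustration only; Empath's real categories differ.
TOY_LEXICON = {"violence": {"hit", "punch", "kick"}}


def toy_analyze(text, lexicon=TOY_LEXICON, normalize=False):
    """Count category words in text; optionally divide by the token count."""
    tokens = text.lower().split()
    scores = {}
    for category, words in lexicon.items():
        count = float(sum(1 for token in tokens if token in words))
        if normalize and tokens:
            count /= len(tokens)  # one hit in five words gives 0.2
        scores[category] = count
    return scores


print(toy_analyze("he hit the other person"))                  # {'violence': 1.0}
print(toy_analyze("he hit the other person", normalize=True))  # {'violence': 0.2}
```

A normalized score is easier to compare across documents of different lengths, while a raw count tells you how many matching words were found.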

You can create new lexical categories for analysis using word embeddings in our vector space model (VSM):

lexicon.create_category("colors",["red","blue","green"])
# => ["blue", "green", "purple", "purple", "green", "yellow", "red", "grey", "violet", "gray", "blue", "orange", "white", "pink", "yellow", "black", "brown", "brown", "red", "aqua", "turquoise", "blue_color", "colored", "color", "same_shade", "violet", "gray", "grey", "teal", "nice_shade", "coloured", "forest_green", "colored", "different_shade", "colour", "sparkly", "reddish", "beautiful_shade", "greenish", "indigo", "darker_shade", "emerald", "lovely_shade", "tints", "crimson", "dark_purple", "pink", "emerald", "sapphire", "golden", "lighter_shade", "lime_green", "coloured", "bright", "same_color", "specks", "red", "golden_color", "different_shades", "chocolate_brown", "orange", "bluish", "green", "deep_purple", "magenta", "green_color", "dark_shade", "bright_orange", "milky", "lilac", "light_brown", "sparkling", "golden_brown", "silvery", "baby_blue", "blood_red", "pink", "teal", "blue", "yellowish", "turquoise", "same_colour", "sparkly", "aquamarine", "black_color", "white", "cerulean", "perfect_shade", "dark", "speckled", "charcoal", "greyish", "midnight_blue", "emerald_green", "deep_brown", "ocean_blue", "flecks", "amber", "pinkish", "jet_black"]
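
The list shown for this category contains repeats (for example, "purple" and "green" each appear more than once). If you want unique terms in their original order, a small order-preserving de-duplication is enough. This is plain Python, independent of Empath, and `unique_terms` is a hypothetical helper name.

```python
def unique_terms(terms):
    """Drop repeated terms while keeping first-seen order."""
    # dict preserves insertion order, so fromkeys keeps the first occurrence.
    return list(dict.fromkeys(terms))


# First few entries of the "colors" output, copied by hand.
head = ["blue", "green", "purple", "purple", "green", "yellow", "red"]
print(unique_terms(head))
# ['blue', 'green', 'purple', 'yellow', 'red']
```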

Then analyze with those categories:

lexicon.analyze("My favorite color is blue", categories=["colors"], normalize=True)
# => {'colors': 0.4}

Empath currently has three models you can use to create categories: fiction, nytimes, and reddit. (I'm working on integrating the different models.) For now, each has different strengths and weaknesses when generating categories. For example, nytimes is a better fit for a topic like the Cold War:

lexicon.create_category("cold_war", ["cold_war"], model="nytimes")
# => ["cold_war", "the_cold_war", "the_Cold_War", "war", "Soviet_threat", "the_end_of_the_cold_war", "Communism", "world_war", "Soviet_empire", "Soviet_power", "Communism", "gulf_war", "Soviet_bloc", "the_Soviet_Union", "communism", "superpowers", "nuclear_age", "nuclear_war", "Soviet_system", "evil_empire", "Soviets", "wars", "arms_race", "Indochina", "detente", "Iran-Iraq_war", "Persian_Gulf_war", "American_power", "new_world_order", "American_involvement", "wartime", "American_foreign_policy", "American_occupation", "the_Soviet_Union's", "Soviet_Communism", "nuclear_arms_race", "the_Korean_War", "military_power", "Persian_Gulf_war", "great_powers", "Marshall_Plan", "the_Second_World_War", "Communist_rule", "the_Warsaw_Pact", "Soviet_military", "Reagan_years", "Reagan_era", "Cuban_missile_crisis", "world_wars", "postwar_period", "Communist_world", "military-industrial_complex", "perestroika", "superpower", "new_war", "Desert_Storm", "space_race", "Mikhail_Gorbachev", "Communist_system", "World_War_II", "nation-building", "the_Vietnam_War", "dictatorship", "South_Vietnam", "Iron_Curtain", "diplomacy", "old_Soviet_Union", "military_buildup", "containment", "German_unification", "Balkans", "gulf_crisis", "revolution", "last_war", "Soviet_era", "dictatorships", "warfare", "glasnost", "Soviet_state", "Communist_regimes", "domestic_politics", "Khrushchev", "American_diplomacy", "postwar_era", "Soviet_economy", "peacetime", "Korean_peninsula", "Allies", "Soviet-American_relations", "cold_war_era", "space_program", "Soviet_occupation", "arms_control", "Soviet_leaders", "World_War_I", "Western_alliance", "military_strategy", "quagmire", "regime", "fascism"]

You can adjust the size of the requested category with the size parameter. Asking for a larger size does not always return a larger category, because results are still filtered on a minimum cosine similarity.

lexicon.create_category("cold_war", ["cold_war"], model="nytimes", size=300)
# => ["cold_war", "the_cold_war", "the_Cold_War", "war", "Soviet_threat", "the_end_of_the_cold_war", "Communism", "world_war", "Soviet_empire", "Soviet_power", "Communism", "gulf_war", "Soviet_bloc", "the_Soviet_Union", "communism", "superpowers", "nuclear_age", "nuclear_war", "Soviet_system", "evil_empire", "Soviets", "wars", "arms_race", "Indochina", "detente", "Iran-Iraq_war", "Persian_Gulf_war", "American_power", "new_world_order", "American_involvement", "wartime", "American_foreign_policy", "American_occupation", "the_Soviet_Union's", "Soviet_Communism", "nuclear_arms_race", "the_Korean_War", "military_power", "Persian_Gulf_war", "great_powers", "Marshall_Plan", "the_Second_World_War", "Communist_rule", "the_Warsaw_Pact", "Soviet_military", "Reagan_years", "Reagan_era", "Cuban_missile_crisis", "world_wars", "postwar_period", "Communist_world", "military-industrial_complex", "perestroika", "superpower", "new_war", "Desert_Storm", "space_race", "Mikhail_Gorbachev", "Communist_system", "World_War_II", "nation-building", "the_Vietnam_War", "dictatorship", "South_Vietnam", "Iron_Curtain", "diplomacy", "old_Soviet_Union", "military_buildup", "containment", "German_unification", "Balkans", "gulf_crisis", "revolution", "last_war", "Soviet_era", "dictatorships", "warfare", "glasnost", "Soviet_state", "Communist_regimes", "domestic_politics", "Khrushchev", "American_diplomacy", "postwar_era", "Soviet_economy", "peacetime", "Korean_peninsula", "Allies", "Soviet-American_relations", "cold_war_era", "space_program", "Soviet_occupation", "arms_control", "Soviet_leaders", "World_War_I", "Western_alliance", "military_strategy", "quagmire", "regime", "fascism", "socialism", "Vietnam", "totalitarianism", "new_Europe", "American_leadership", "long_war", "World_War_II.", "colonial_rule", "the_Persian_Gulf_war", "atom_bomb", "NATO_alliance", "world_affairs", "military_threat", "home_front", "Western_Europe", "Eastern_Europe", "German_reunification", "glasnost", "Stalin", 
# "Iraq_war", "Reagan_Presidency", "military_might", "American_policy", "colonialism", "major_war", "East-West_relations", "Soviet_history", "Soviet_rule", "Russians", "the_Gulf_War", "Atlantic_alliance", "the_Bay_of_Pigs", "democracies", "coups", "old_order", "Islamic_world", "Soviet_leadership", "unification", "Stalinism", "nuclear_threat", "Vietnam_era", "the_Afghan_war", "Gorbachev_era", "the_Vietnam_war", "American_President", "American_military_power", "Western_powers", "American_Government", "Soviet_domination", "foreign_policy", "military_establishment", "new_thinking", "Communist_regime", "Communist_era", "militarism", "isolationism", "the_Persian_Gulf", "first_gulf_war", "upheavals", "Saddam_Hussein's", "reunification", "Second_World_War", "Reagan_Administration", "Eastern_Europe's", "disintegration", "empires", "American_strategy", "civil_war", "Soviet_society", "Western_democracies", "common_enemy", "Communist_state", "Korean_Peninsula", "New_Deal", "the_Marshall_Plan", "Berlin_wall", "American_influence", "American_president", "Communist_dictatorship", "political_struggle", "the_Reagan_Administration", "American_public_opinion", "military_victory", "American_policy_makers", "Central_Europe", "modern_history"]

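The interaction between the requested size and a similarity cutoff can be illustrated with a toy nearest-neighbor search. This sketch uses made-up 2-D vectors and an arbitrary threshold of 0.8. The names `expand`, `TOY_VECTORS`, and `min_similarity` are hypothetical and do not reflect Empath's internals or its real cutoff.

```python
import math

# Made-up 2-D "embeddings" for illustration only.
TOY_VECTORS = {
    "red": (1.0, 0.0),
    "crimson": (0.9, 0.1),
    "pink": (0.8, 0.3),
    "warm": (0.5, 0.5),
    "table": (0.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def expand(seed, size, min_similarity=0.8, vectors=TOY_VECTORS):
    """Return up to `size` neighbors of `seed` that pass the similarity cutoff."""
    scored = [
        (cosine(vectors[seed], vec), word)
        for word, vec in vectors.items()
        if word != seed
    ]
    scored.sort(reverse=True)  # most similar first
    kept = [word for similarity, word in scored if similarity >= min_similarity]
    return kept[:size]


print(expand("red", size=1))    # ['crimson']
print(expand("red", size=100))  # ['crimson', 'pink']
```

Only two neighbors pass the cutoff here, so asking for 100 terms still yields two. The same kind of filter would explain why a larger requested size does not always produce a larger category.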