OpenAttack is an open-source Python-based textual adversarial attack toolkit, which handles the whole process of textual adversarial attacking, including preprocessing text, accessing the victim model, generating adversarial examples and evaluation.
OpenAttack has the following features:
OpenAttack has a wide range of uses, including:
You can install OpenAttack either with pip or by cloning this repo.
pip install OpenAttack
git clone https://github.com/thunlp/OpenAttack.git
cd OpenAttack
python setup.py install
After installation, you can try running demo.py to check whether OpenAttack works properly.
OpenAttack ships with built-in versions of some commonly used text classification models, such as LSTM and BERT, as well as datasets such as SST for sentiment analysis and SNLI for natural language inference. You can conduct adversarial attacks against the built-in victim models on these datasets with only a few lines of code.
The following code snippet shows how to use a genetic algorithm-based attack model (Alzantot et al., 2018) to attack BERT on the SST dataset:
import OpenAttack as oa

# choose a trained victim classification model
victim = oa.DataManager.load("Victim.BERT.SST")
# choose an evaluation dataset
dataset = oa.DataManager.load("Dataset.SST.sample")
# choose Genetic as the attacker and initialize it with default parameters
attacker = oa.attackers.GeneticAttacker()
# prepare for attacking
attack_eval = oa.attack_evals.DefaultAttackEval(attacker, victim)
# launch attacks and print attack results
attack_eval.eval(dataset, visualize=True)
The following code snippet shows how to use the genetic algorithm-based attack model to attack a customized sentiment analysis model (a statistical model built in NLTK) on SST.
import OpenAttack as oa
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer


# configure the access interface of the customized victim model
class MyClassifier(oa.Classifier):
    def __init__(self):
        self.model = SentimentIntensityAnalyzer()

    # return the classification probability scores for the input sentences
    def get_prob(self, input_):
        rt = []
        for sent in input_:
            rs = self.model.polarity_scores(sent)
            total = rs["neg"] + rs["pos"]
            # fall back to 0.5 when the sentence has no polar words (avoids division by zero)
            prob = rs["pos"] / total if total > 0 else 0.5
            rt.append(np.array([1 - prob, prob]))
        return np.array(rt)


# choose the customized classifier as the victim model
victim = MyClassifier()
# choose an evaluation dataset
dataset = oa.DataManager.load("Dataset.SST.sample")
# choose Genetic as the attacker and initialize it with default parameters
attacker = oa.attackers.GeneticAttacker()
# prepare for attacking
attack_eval = oa.attack_evals.DefaultAttackEval(attacker, victim)
# launch attacks and print attack results
attack_eval.eval(dataset, visualize=True)
OpenAttack incorporates many handy components that can be easily assembled into new attack models.
Here is an example of how to design a simple attack model that shuffles the tokens in the original sentence.
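As a rough, library-independent sketch of the shuffling idea: the function below permutes the tokens of a sentence and keeps the first permutation that changes the victim's prediction. It deliberately does not use OpenAttack's attacker base class, whose interface is not shown in this section. The names `shuffle_attack` and `toy_predict` are hypothetical, and the toy victim is an assumption made only so the example runs on its own.

```python
import random
from typing import Callable, Optional


def shuffle_attack(
    sentence: str,
    predict: Callable[[str], int],
    max_tries: int = 20,
    seed: int = 0,
) -> Optional[str]:
    """Return a token-shuffled sentence that changes the prediction, or None."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    tokens = sentence.split()
    original_label = predict(sentence)
    for _ in range(max_tries):
        candidate_tokens = tokens[:]
        rng.shuffle(candidate_tokens)
        candidate = " ".join(candidate_tokens)
        # the attack succeeds as soon as the predicted label differs
        if predict(candidate) != original_label:
            return candidate
    return None


def toy_predict(sentence: str) -> int:
    """Hypothetical order-sensitive victim: label 1 if the first token is 'good'."""
    tokens = sentence.split()
    return int(bool(tokens) and tokens[0] == "good")


original = "good movie with a weak ending"
adversarial = shuffle_attack(original, toy_predict)
print(adversarial)
```

A purely bag-of-words victim is immune to this attack, because shuffling never changes its input features; in that case the function returns `None`.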
OpenAttack can easily generate adversarial examples by attacking instances in the training set, which can be added to original training data set to retrain a more robust victim model, i.e., adversarial training.
Here is an example of how to conduct adversarial training with OpenAttack.
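The adversarial training loop itself can be sketched without any library: attack each training instance, keep the successful adversarial examples with their original labels, and retrain on the enlarged set. Everything below is an illustrative assumption rather than OpenAttack's API: `train` is a toy word-count classifier, `leet_attack` is a hypothetical character-level attack, and `augment_with_adversarial` is a made-up helper name.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, int]
Predictor = Callable[[str], int]


def train(examples: List[Example]) -> Predictor:
    """Toy 'training': count how often each word appears with each label."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.split())

    def predict(text: str) -> int:
        words = text.split()
        score_pos = sum(counts[1][w] for w in words)
        score_neg = sum(counts[0][w] for w in words)
        return int(score_pos > score_neg)

    return predict


def leet_attack(text: str, predict: Predictor) -> Optional[str]:
    """Hypothetical attack: replace 'o' with '0' and keep it if the label flips."""
    candidate = text.replace("o", "0")
    if candidate != text and predict(candidate) != predict(text):
        return candidate
    return None


def augment_with_adversarial(
    train_set: List[Example],
    predict: Predictor,
    attack: Callable[[str, Predictor], Optional[str]],
) -> List[Example]:
    """Return the training set plus successful adversarial examples, each with its original label."""
    augmented = list(train_set)
    for text, label in train_set:
        adversarial = attack(text, predict)
        if adversarial is not None:
            augmented.append((adversarial, label))
    return augmented


train_set = [("good film", 1), ("bad film", 0)]
victim = train(train_set)
augmented = augment_with_adversarial(train_set, victim, leet_attack)
robust_victim = train(augmented)  # retrain on original + adversarial data
```

In this toy setting the first model is fooled by "g00d film", while the retrained model classifies it correctly and the same attack no longer succeeds.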
OpenAttack supports designing a customized adversarial attack evaluation metric.
Here is an example of how to add the BLEU score as a customized evaluation metric for adversarial attacks.
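To make the metric itself concrete, the following is a minimal sketch of sentence-level BLEU between an original sentence and its adversarial counterpart, written in plain Python. It assumes whitespace tokenization, a single reference, and no smoothing, and it does not show how the metric is registered with OpenAttack's evaluation classes; the function names `ngram_counts`, `sentence_bleu`, and `average_bleu` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # clip each hypothesis n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precision_sum += math.log(clipped / total)
    # brevity penalty punishes hypotheses shorter than the reference
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def average_bleu(pairs: List[Tuple[str, str]]) -> float:
    """Average BLEU over (original, adversarial) sentence pairs."""
    return sum(sentence_bleu(o, a) for o, a in pairs) / len(pairs)


original = "the film is a charming and funny ride"
adversarial = "the film is a charming and silly ride"
score = sentence_bleu(original, adversarial)
```

A higher score means the adversarial example stays closer to the original wording, so averaging it over all successful attacks gives a rough measure of how much an attack model modifies its inputs.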
According to the level of perturbation imposed on the original input, textual adversarial attack models can be categorized into sentence-level, word-level, and character-level attack models.
According to the accessibility of the victim model, textual adversarial attack models can be categorized into gradient-based, score-based, decision-based, and blind attack models.
TAADPapers is a paper list which summarizes almost all the papers concerning textual adversarial attack and defense. You can have a look at this list to find more attack models.
Currently OpenAttack includes 13 typical attack models against text classification models that cover all attack types.
The following table compares the currently included attack models.
| Model | Accessibility | Perturbation | Main Idea |
| --- | --- | --- | --- |
| GAN | Decision | Sentence | Text generation by encoder-decoder |
| SememePSO | Score | Word | Particle Swarm Optimization-based word substitution |
| TextFooler | Score | Word | Greedy word substitution |
| PWWS | Score | Word | Greedy word substitution |
| Genetic | Score | Word | Genetic algorithm-based word substitution |
| FD | Gradient | Word | Gradient-based word substitution |
| TextBugger | Gradient, Score | Word+Char | Greedy word substitution and character manipulation |
| UAT | Gradient | Word, Char | Gradient-based word or character manipulation |
| HotFlip | Gradient | Word, Char | Gradient-based word or character substitution |
| VIPER | Blind | Char | Visually similar character substitution |
| DeepWordBug | Score | Char | Greedy character manipulation |
Considering the significant distinctions among different attack models, we leave considerable freedom for the skeleton design of attack models, and focus more on streamlining the general processing of adversarial attacking and the common components used in attack models.
OpenAttack has 7 main modules: