## Paper Info

This paper was published at EMNLP 2020 by Vered Shwartz, Rachel Rudinger, and Oyvind Tafjord. The authors are affiliated with AllenAI, UW, and UMD.

## Motivation

Because language models are trained on large amounts of web text, certain names become strongly associated with particular entities. Here, entity refers to a real-world object, and name refers to the entity's proper name; e.g., Donald Trump, the former president, is one such entity. The name Donald appears frequently in news articles, usually followed by the name Trump. As a result, language models may learn contexts and biases associated with the name Donald Trump.

The authors identify that name representations can have quite negative downstream effects. Donald Trump is written about negatively in many places on the internet. Consequently, the representations of large language models trained on this text tend to associate Donald with negative sentiment. The authors show these representations degrade performance on downstream tasks.

## Results

The paper considers five experiments. The experiments assess the various ways large language models associate frequent entities with certain names, and the consequences for downstream tasks.

### Last Name Prediction

The authors demonstrate that language models ground names to well-known named entities. They assess the next-word probability of a last name given a first name, $P(\text{Last Name} \mid \text{First Name})$, for various names from both media and history, under different prompts. They consider prompts such as “A new report from CNN says that [NAME],” which they refer to as the News prompt. They include additional prompts that reference history and informal conversation, as well as a minimal prompt consisting only of the first name, “[NAME].”

They compute the percentage of the time the named entities' last names appear as the most likely last name. For example, they take the News prompt, fill in [NAME] with celebrity first names, e.g., “A new report from CNN says that Donald,” and compute the percentage of the time the last names of the named entities are predicted as most likely. The results are given in Table 2.

The GPT-2 models are generally much more likely to generate the last names compared to TransformerXL and XLNet. The authors speculate this is due to the training data used: GPT-2 is trained on WebText, which excludes Wikipedia, while TransformerXL is trained on Wikipedia, which may account for GPT-2's higher named-entity prediction scores. Also, for the News, History, and Informal prompts, the GPT-2 model size doesn't affect the scores; increasing the model size only affects the scores for the minimal prompt. The authors don't comment on this phenomenon, but it could be due to the models grounding names to named entities more strongly as they get larger.

### Given Name Recovery

They assess whether names prone to grounding can be identified from the text language models generate. If certain names are easy to predict from the generated texts, those names are likely grounded to certain named entities. They take the 100 most popular baby first names in the U.S. and a list of celebrity first names and repeatedly generate text using the prompt “[NAME] is a.” They compare the pair-wise tf-idf representations of the generated texts within genders; pairs are restricted to the same gender to avoid gender-bias confounders. They then train an SVM to differentiate the text generated for the two names in each pair. The authors find it is easier to differentiate the text generated for celebrity names than for non-celebrity names.
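The pair-wise test can be sketched with scikit-learn as below. The toy sentences are placeholders for the model-generated completions, not the paper's data, and the classifier/feature settings are illustrative defaults.

```python
# Pair-wise distinguishability sketch: tf-idf features + linear SVM,
# scored by cross-validated accuracy (~0.5 means the two names generate
# indistinguishable text; near 1.0 suggests grounding).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def pair_accuracy(texts_a, texts_b):
    texts = texts_a + texts_b
    labels = [0] * len(texts_a) + [1] * len(texts_b)
    X = TfidfVectorizer().fit_transform(texts)
    return cross_val_score(LinearSVC(), X, labels, cv=3).mean()

# Toy stand-ins for completions of "Hillary is a" vs. "Ruth is a":
hillary = ["politician and polarizing figure", "politician and former senator",
           "politician and presidential candidate"] * 2
ruth = ["teacher and loving grandmother", "teacher and retired librarian",
        "teacher and kind neighbor"] * 2
acc = pair_accuracy(hillary, ruth)
```

In the paper's setting, a celebrity/non-celebrity pair would yield higher accuracy than a pair of two common names.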

This result is visible in Figure 1 of the paper. They computed BERT vectors for the sentences generated for Hillary and for two common names, Ruth and Helen, and plotted their t-SNE embeddings. Hillary's sentences are clearly distinguishable from Ruth's and Helen's.
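The visualization step amounts to projecting the per-sentence embeddings to 2-D. A minimal sketch, with random vectors standing in for the BERT sentence vectors:

```python
# t-SNE projection of sentence embeddings; random 768-d vectors are a
# stand-in for BERT outputs here, with two offset clusters for illustration.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(0.0, 1.0, (30, 768)),  # stand-in "Hillary" sentence vectors
    rng.normal(0.5, 1.0, (30, 768)),  # stand-in "Ruth"/"Helen" vectors
])
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(emb)
# `coords` can now be scatter-plotted, coloured by name, as in Figure 1.
```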

### Sentiment Analysis

The authors use an AllenNLP sentiment model to predict the sentiment of the generated endings for the “[NAME] is a” prompt. In general, they find that first names associated with celebrities receive more negative sentiment.
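The aggregation is simple: score each completion, then average per name. As a self-contained stand-in for the AllenNLP model (which this sketch does not reproduce), a toy lexicon scorer illustrates the idea; the word lists and example completions are invented for illustration.

```python
# Toy lexicon-based sentiment scorer standing in for the paper's AllenNLP
# model: count positive minus negative words, then average over completions.
POSITIVE = {"great", "wonderful", "talented", "kind"}
NEGATIVE = {"terrible", "corrupt", "liar", "awful"}

def toy_sentiment(text: str) -> int:
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def mean_sentiment(completions):
    """Average sentiment over the generated endings for one name."""
    return sum(map(toy_sentiment, completions)) / len(completions)

# Hypothetical generated endings for "[NAME] is a":
neg = mean_sentiment(["corrupt liar", "terrible person"])     # -1.5
pos = mean_sentiment(["kind nurse", "wonderful teacher"])     #  1.0
```

Comparing these per-name averages across celebrity and non-celebrity names gives the paper's finding in miniature.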

### Question Answering

The previous analyses show that language models ground names to well-known entities. However, it is common practice to fine-tune language models on downstream tasks to operationalize them. Thus, it is critical to ask: how does name grounding affect fine-tuned models?

The authors consider various QA templates with slots for two names and evaluate whether the predictions flip when the names are swapped. The structure of the sentence ensures the correct answer is always a particular slot, so predictions shouldn't flip when the names change. As a result, a high flip rate may indicate name grounding.

For example, for the prompt “[NAME1] has been arguing for shorter prison sentences for certain offenses, something [NAME2] is strongly against.” and the question “Who is more likely to be considered tough on crime?”, the correct answer is always [NAME2]. In Figure 2, the authors find celebrity names like Hillary frequently flip the predictions, suggesting these names are grounded. These results vary by model, however; they are much more pronounced for RoBERTa than for XLNet, for instance.
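The flip test itself can be sketched as follows. The `answer` function here is a stub standing in for the fine-tuned reader (a real run would query the QA model); its name-grounded behaviour is invented to make the example concrete.

```python
# Flip test sketch: fill the template with the names in both orders and
# check whether the predicted *slot* stays consistent.
TEMPLATE = ("{a} has been arguing for shorter prison sentences for certain "
            "offenses, something {b} is strongly against. "
            "Who is more likely to be considered tough on crime?")

def answer(filled_template: str, name_a: str, name_b: str) -> str:
    # Stub reader: a name-grounded model might always pick the celebrity
    # name regardless of its slot; otherwise it answers the [NAME2] slot.
    return "Hillary" if "Hillary" in (name_a, name_b) else name_b

def flips(name_a: str, name_b: str) -> bool:
    """True if swapping the names changes which slot is predicted."""
    first = answer(TEMPLATE.format(a=name_a, b=name_b), name_a, name_b)
    second = answer(TEMPLATE.format(a=name_b, b=name_a), name_b, name_a)
    # A slot-consistent model answers the [NAME2] slot both times; a flip
    # means the model tracked the surface name rather than the slot.
    return (first == name_b) != (second == name_a)

grounded = flips("Hillary", "Sarah")  # stub is name-grounded: True
neutral = flips("Ruth", "Helen")      # no celebrity involved: False
```

The paper's flip rate is the fraction of template/name pairs for which this returns `True`.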

## Morals of the Story

### Conclusions

These results indicate that language models encode unintended biases associated with certain names, resulting in negative consequences in deployment. We'd expect language models to be largely invariant to name choices. Considering the impact names have, it could be useful to develop models that are more invariant to name choices and to better understand which aspects of the data cause this behavior.

### Thoughts

This paper is a nice analysis of a subtle problem with language models. It's not so surprising that language models have issues with names: considering they're trained on news data, which heavily favors certain names in particular contexts, it makes sense this is the case. The experiments demonstrating how this issue plays out are valuable, however. I particularly liked the Given Name Recovery experiments. Those results clearly show that language models tend to generate very different text when prompted with different names, indicating different biases associated with various names.

It would have been interesting to see a more specific connection between the pre-trained model's behavior and the fine-tuned model's. We know from this analysis that the fine-tuned models are sensitive to the names used, but how much of this comes from the pre-trained model versus the fine-tuning data? This question is a bit tricky because the model changes during fine-tuning. One preliminary idea to disentangle the fine-tuning data's effects could be to remove any mention of names in the tuning data, fine-tune, and see if the same biases emerge.