Toxicity detection sensitive to conversational context

Authors

  • Alexandros Xenos Athens University of Economics and Business
  • John Pavlopoulos Athens University of Economics and Business https://orcid.org/0000-0001-9188-7425
  • Ion Androutsopoulos Athens University of Economics and Business
  • Lucas Dixon Google
  • Jeffrey Sorensen Google Jigsaw
  • Léo Laugier Telecom Paris, Institut Polytechnique de Paris

DOI:

https://doi.org/10.5210/fm.v27i5.12285

Keywords:

Natural Language Processing, Abusive Language Detection, Offensive Language Detection

Abstract

User posts whose perceived toxicity depends on conversational context are rare in current toxicity detection datasets. Hence, toxicity detectors trained on existing datasets will also tend to disregard context, making the detection of context-sensitive toxicity harder when it does occur. We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels: (i) labels from annotators who saw each post together with the previous post as context; and (ii) labels from annotators who saw the post with no additional context. Based on this dataset, we introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context (previous post) is also considered. We then evaluate machine learning systems on this task, showing that classifiers of practical quality can be developed and that data augmentation with knowledge distillation can improve performance further. Such systems could be used to enhance toxicity detection datasets with more context-dependent posts, or to suggest when moderators should consider parent posts, since consulting them may often be unnecessary and may otherwise introduce significant additional costs.
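The context sensitivity estimation task can be made concrete with a short Python sketch. It assumes, for illustration only, that a post's toxicity score is the fraction of annotators who labeled it toxic, that context sensitivity is the signed difference between the with-context and without-context scores, and that a post counts as context-sensitive when the absolute difference reaches a threshold. The function names, the toy annotations, and the 0.5 default threshold are hypothetical and are not taken from the released dataset or the authors' code.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical cut-off (assumption): posts whose toxicity score moves by at
# least this much when the parent post is shown are treated as context-sensitive.
DEFAULT_THRESHOLD = 0.5


def toxicity_score(labels: Sequence[int]) -> float:
    """Fraction of annotators who marked the post toxic (assumed 0/1 labels)."""
    if not labels:
        raise ValueError("at least one annotator label is required")
    return sum(labels) / len(labels)


def context_sensitivity(with_context: Sequence[int],
                        without_context: Sequence[int]) -> float:
    """Signed change in toxicity score when the previous post is shown.

    Assumed definition for illustration: positive means the context makes the
    post look more toxic, negative means it makes the post look less toxic.
    """
    return toxicity_score(with_context) - toxicity_score(without_context)


def is_context_sensitive(with_context: Sequence[int],
                         without_context: Sequence[int],
                         threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True if the size of the change reaches the (hypothetical) threshold."""
    return abs(context_sensitivity(with_context, without_context)) >= threshold


def select_context_sensitive(
        posts: Iterable[Tuple[str, Sequence[int], Sequence[int]]],
        threshold: float = DEFAULT_THRESHOLD) -> List[str]:
    """Return ids of posts whose perceived toxicity changes with context.

    Each item is (post_id, labels_with_context, labels_without_context).
    """
    return [post_id for post_id, with_ctx, without_ctx in posts
            if is_context_sensitive(with_ctx, without_ctx, threshold)]


if __name__ == "__main__":
    # Toy annotations invented for illustration: 5 annotators per condition.
    posts = [
        ("p1", [1, 1, 1, 1, 0], [1, 0, 0, 0, 0]),  # context raises toxicity
        ("p2", [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]),  # context changes nothing
        ("p3", [0, 0, 0, 0, 0], [1, 1, 1, 0, 0]),  # context lowers toxicity
    ]
    print(select_context_sensitive(posts))  # prints ['p1', 'p3']
```

Keeping the difference signed preserves the direction of the change (context raising or lowering perceived toxicity), which a regression model could try to predict directly; the threshold then turns that score into a binary label, for example to flag posts where a moderator might want to read the parent post.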


Published

2022-09-05

How to Cite

Xenos, A., Pavlopoulos, J., Androutsopoulos, I., Dixon, L., Sorensen, J., & Laugier, L. (2022). Toxicity detection sensitive to conversational context. First Monday, 27(5). https://doi.org/10.5210/fm.v27i5.12285