Categories of coherence relations in discourse annotation


  • Merel C.J. Scholman Computational Linguistics and Phonetics, Saarland University
  • Jacqueline Evers-Vermeul Utrecht Institute of Linguistics OTS, Utrecht University
  • Ted J.M. Sanders Utrecht Institute of Linguistics OTS, Utrecht University



Over the last decades, the annotation of discourse coherence relations has attracted increasing interest from the linguistics research community. Because coherence relations are complex, there is no agreement on an annotation standard, and current annotation methods often lack a systematic ordering of coherence relations. In this article, we investigate the usability for discourse annotation of the cognitive approach to coherence relations developed by Sanders et al. (1992, 1993). This theory proposes a taxonomy of coherence relations in terms of four cognitive primitives. We first develop a systematic, step-wise annotation process. The reliability of this annotation scheme is then tested in an annotation experiment with non-trained, non-expert annotators. Implicit and explicit versions of the annotation instructions were created to determine whether the type of instruction influences inter-annotator agreement. The results show that two of the four primitives, polarity and order of the segments, can be applied reliably by non-trained annotators. The other two primitives, basic operation and source of coherence, are more problematic. Participants who used the explicit instruction show higher agreement on the primitives than participants who used the implicit instruction. These results are comparable to the agreement statistics reported for other discourse corpora annotated by trained, expert annotators. Given that non-trained, non-expert annotators achieve similar levels of agreement, these results indicate that the cognitive approach to coherence relations is a promising method for annotating discourse.
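The idea of annotating each relation as a combination of four primitives, and measuring agreement separately per primitive, can be illustrated with a small sketch. It assumes Cohen's kappa as the agreement measure for a pair of annotators (the abstract does not name the statistic used), and the names `CoherenceRelation`, `cohens_kappa` and `kappa_per_primitive`, as well as the label strings, are hypothetical choices for illustration only. Following Sanders et al. (1992), order of the segments is treated as applying only to causal relations, so it is left empty for additive ones and compared only where both annotators assigned it.

```python
from collections import Counter
from dataclasses import dataclass, fields
from typing import Dict, List, Optional, Sequence


# Hypothetical representation of one annotated relation as four primitive values.
# The label strings are illustrative, not a prescribed annotation format.
@dataclass(frozen=True)
class CoherenceRelation:
    basic_operation: str          # e.g. "causal" or "additive"
    source_of_coherence: str      # e.g. "objective" or "subjective"
    polarity: str                 # e.g. "positive" or "negative"
    order: Optional[str] = None   # e.g. "basic" or "non-basic"; None for additive relations


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label proportions.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def kappa_per_primitive(ann1: List[CoherenceRelation],
                        ann2: List[CoherenceRelation]) -> Dict[str, float]:
    """Kappa for each primitive, skipping items where either value is missing."""
    scores: Dict[str, float] = {}
    for f in fields(CoherenceRelation):
        pairs = [(getattr(r1, f.name), getattr(r2, f.name))
                 for r1, r2 in zip(ann1, ann2)
                 if getattr(r1, f.name) is not None and getattr(r2, f.name) is not None]
        if pairs:
            xs, ys = zip(*pairs)
            scores[f.name] = cohens_kappa(xs, ys)
    return scores


if __name__ == "__main__":
    # Two annotators agree on 3 of 4 polarity labels: p_o = 0.75, p_e = 0.5.
    print(cohens_kappa(["positive", "negative", "positive", "positive"],
                       ["positive", "negative", "negative", "positive"]))  # 0.5
```

Computing kappa per primitive rather than over whole relation labels mirrors the step-wise design: it shows which decision (for example, polarity versus source of coherence) annotators find harder, instead of a single score for the full relation category.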