Corpus-driven Semantics of Concession: Where do Expectations Come from?

Concession is one of the trickiest semantic discourse relations appearing in natural language. Many have tried to sub-categorize Concession and to define formal criteria both for distinguishing its subtypes and for distinguishing Concession from the similar semantic relation of Contrast, but the different proposals have not reached consensus. In this paper, we focus on those approaches, e.g. Lagerwerf (1998), Winter and Rimon (1994), and Korbayova and Webber (2007), which assume that Concession features two primary interpretations, “direct” and “indirect”. We argue that this two-way classification falls short of accounting for the full range of variants identified in naturally occurring data. Our investigation of one thousand Concession tokens in the Penn Discourse Treebank (PDTB) reveals that the interpretation of concessive relations varies according to the source of expectation. Four sources of expectation are identified. Each is characterized by a different relation holding between the eventuality that raises the expectation and the eventuality describing the expectation. We report a) reliable inter-annotator agreement on the four types of sources identified in the PDTB data, b) a significant improvement in the annotation of previous Concession-Contrast disagreements in the PDTB, and c) a novel logical account of Concession using basic constructs from the logic of Hobbs (1998). Our proposal offers a uniform framework for the interpretation of Concession while accounting for the different sources of expectation by modifying a single predicate in the proposed formulae.


Introduction
Our previous work on the semantic annotation of discourse relations in the Penn Discourse Treebank confirmed the well-known problem of achieving satisfactory inter-annotator agreement at the discourse level. As we move deeper into making semantic distinctions between eventualities participating in discourse relations, it becomes increasingly difficult to achieve reliability, even for well-studied relations such as Concession. While the coarser semantic level described as COMPARISON in the PDTB enjoyed high inter-annotator agreement, the distinction between its subtypes, i.e. Contrast and Concession, proved more challenging.
A non-negligible 30% of tokens had to be adjudicated because in these instances one annotator picked Concession and the other Contrast. This was surprising given the reasonably clear semantic definition of the PDTB labels (cf. Prasad et al. (2008) or Section 2 below).
A careful analysis of the instances of disagreement as well as the examples studied so far in the literature revealed that a lot of linguistic variants could convey Concession and Contrast. Of course, such a high variability makes the distinction between the two classes, or between their subtypes, rather unclear, especially when it is carried out by non-expert annotators.
The research that we report in this paper is motivated by our belief that data-driven representations can help us develop more precise formalizations of semantic distinctions known to present challenges for annotators. Precise semantic distinctions guided by naturally occurring data will help us to build semantic representations whose reliability can be empirically tested. We summarize our research goals below.
(1) a. How can the study of discourse relations as attested in naturally occurring data improve our understanding of the semantics of discourse relations?
b. What kind of semantic representation will allow covering the rich range of variants conveying Concession and Contrast?
To address these questions, our research methodology is guided by a hybrid theoretical and empirical approach. We develop formal semantic representations of discourse relations based on an analysis of large scale empirical data. Specifically, we analyze the semantic tagging and interannotator agreement of the discourse relations marked in the Penn Discourse Treebank (Prasad et al. (2008)). The Penn Discourse Treebank 2.0 is, to date, the largest annotation effort at the discourse level, including approximately 40,000 annotations of discourse connectives and their arguments, sense labels, and speaker attribution.
In the PDTB, sense labels are grouped in four basic types of semantic relations: a) TEMPORAL, b) CONTINGENCY, c) COMPARISON, and d) EXPANSION. Each category has types and subtypes. The full hierarchy of senses used in the PDTB is illustrated in Prasad et al. (2008). As mentioned above, our focus in this paper is on the distinction between Concession and Contrast, the two subtypes of the COMPARISON relation.
As is natural when the body of literature is large and comes from different disciplines, the interpretation of concessive relations has been addressed from several viewpoints. Mann and Thompson (1988)'s influential Rhetorical Structure Theory (RST) views relations from a functional perspective. The proposed interpretation includes the speaker's intention and the effect that the relation is intended to achieve on the hearer. Grote et al. (1995) implement the theoretical insights from RST and similar proposals in a natural language generation (NLG) system designed to generate concessive sentences from formal representations of the speaker's beliefs and communicative intentions.
Following Moore and Pollack (1992), we recognize the distinction between the intentional and informational levels of interpretation and find it problematic that RST presumes a single relation between two discourse segments, thus conflating this distinction.
In this spirit, our work extends prior work in sub-classifying discourse relations and developing formal representations of the identified classes.
Our analysis revealed that concessive relations differ according to the source of expectation. Specifically, we identified four distinct sources of expectation: Causality, Implication, Correlation, and Implicature.
The reliability of the proposed categories was evaluated with a study of inter-annotator agreement. In addition to confirming the reliability of the proposed distinctions, we evaluated the merits of this proposal over existing denial-based approaches which treat all eventualities that trigger expectations uniformly. Specifically, we extracted 200 problematic PDTB tokens that had previously been marked as tokens of inter-annotator disagreement between Concession and Contrast. These tokens were re-annotated by two new annotators. The high inter-annotator agreement in this challenging task provided further evidence for the validity of our proposal.
Finally, we developed a formal account of Concession, grounded on our sub-classification, using basic semantic constructs from Davidson (1967) and Hobbs (1998). The resulting formulae are able to uniformly take into account the semantics of all variants of concessive statements identified in the literature.
The paper is organized as follows. Section 2 gives a brief overview of prior work on Concession and Contrast, comparing the two classes. Section 3 reviews logical accounts that have been proposed to model concessive interpretations, while Section 4 highlights some important questions that have remained open. Section 5 illustrates examples taken from the PDTB that convinced us to further classify Concession into four subtypes, depending on how expectations are created. Section 6 reports the results of two empirical investigations carried out on PDTB instances that seem to support our analysis. In Section 7, we briefly present the basic semantic constructs that we use from Hobbs's logic and outline in detail our semantic account for all but one source of expectation. The source of expectation that our approach does not cover is 'Implicature', which requires pragmatic reasoning and is therefore left for future work. We conclude in Section 8.

Concession and Contrast
Concession is a particular relation holding between the interpretation of one clausal argument that creates an expectation and another clausal argument which denies it. In English, typical discourse connectives conveying Concession are 'but', 'although', 'however', 'yet', and 'nevertheless'. Concessive discourse connectives are, of course, available in other languages (König (1983)), which also have specialized words or even inflections to mark concessive relations, cf. Dascal and Katriel (1977), Horn (1989), and Lagerwerf (1998). According to König (1983), this diversity in the linguistic devices used to express concession suggests that the term 'concessive' does not only express a two-term relation, but also other possible rhetorical uses of the involved clauses. In the same spirit, Grote et al. (1995) identify three rhetorical strategies that a concessive construction may serve: convincing the hearer, preventing false implicatures, and emphasizing surprising events. We investigate the interpretation of discourse connectives only, leaving aside other linguistic or non-linguistic cues that might be used to express concession. English and other languages may use the same connective to express more than one type of relation. For instance, as observed in the PDTB, 'but' is used to express both Contrast and Concession. In line with prior work (Lakoff (1971), Spooren (1989), Grote et al. (1995), and Kehler (2002), among others), the PDTB adopts the following two definitions of Contrast and Concession (see footnote 1):
(2) a. Contrast applies when the connective indicates that the two sentence-arguments share a predicate or property and a difference is highlighted with respect to the values assigned to the shared property.
For now, let us indicate the relation between A and the expectation C as a kind of 'default Implication', following Winter and Rimon (1994), and we will characterize this defeasible relation later in Section 7.2.
As pointed out above, Contrast and Concession are the two PDTB types of the higher level semantic category COMPARISON. Prior work has described further semantic classes that would fit under 'COMPARISON' according to the PDTB sense tagset. For the connective "but", specifically, there has been work analyzing its "Corrective" or "Rectification" use (Dascal and Katriel (1977), Lang (1984), Foolen (1991), von Klopp (1994), Winter and Rimon (1994), and others). 'Rectification' arises when the argument of the connective rewrites a predicate, e.g., "John is not American, but British". In the PDTB, such cases are collapsed into the type Contrast and are not our main focus (see footnote 2). In this paper, we are interested primarily in Concession. The present section addresses the problem of distinguishing between Concession and Contrast. In the next section, we will address the central research question of the paper: identifying different subtypes of Concession depending on how the expectation is created. Later in the paper, we will provide some evidence that focusing on the source of expectation could also help distinguish between Concession and Contrast.
Although the definitions in (2), as well as those used in most existing schemes of discourse relations, appear to be rather intuitive, the distinction is sometimes hard to draw because it is sensitive to the context. This has been investigated and argued in the work of Lakoff (1971), Anscombre and Ducrot (1979), Lang (1984), Blakemore (1989), Winter and Rimon (1994), and Spenader and Lobanova (2009). An example, taken from Winter and Rimon (1994), is:
(3) [John is quick], but [Bill is slow].
According to the instructions given in (2), an annotator should tag (3) as 'Contrast'. In fact, we may identify a common property 'Y is-a-quality-of X', shared by the two arguments' meaning representations, where X is either John or Bill and Y is one of the contrastive values 'quick' and 'slow'. However, as discussed by Winter and Rimon (1994), (3) may be interpreted as Concession in a context where John and Bill belong to a sports team that is known to have only quick players. In this case, the first sentence in (3) is better interpreted as an assertion A which creates the expectation C that all players, including Bill, are quick, while the second sentence in (3) explicitly asserts ¬C.
1. All discourse connectives annotated in the PDTB have two arguments. In the examples shown in (2), Argc is shown in boldface, Argd in italics, and the discourse connective is underlined.
2. For a more detailed discussion of contrastive interpretations see Izutsu (2008).
As can be seen, forming a neat distinction between Contrast and Concession is even harder in real data. Spenader and Lobanova (2009) point out that the following example, marked as 'Concession' in the RST corpus (Carlson et al. (2001)), could be tagged as 'Contrast' by an annotator who takes the brokerage operation and Kidder as parallel elements and the profits/losses as the (symmetric) values with respect to which the parallel elements contrast.
(4) [Its 1,400-member brokerage operation reported an estimated $5 million loss last year ], although [Kidder expects it to turn a profit this year].
According to Winter and Rimon (1994), the context is also responsible for the interpretation of certain concessive utterances which, in different contexts, would be odd. For instance, (5.a) sounds odd if it is not interpreted in an appropriate context as in (5.b). (Note that we mark the argument that creates the expectation and the one that denies it as Argc and Argd respectively.)
Context and other pragmatic considerations are not only involved in the identification of the "default Implication" that creates the expectation from the meaning of Argc. Once such an Implication is identified, the concessive interpretation may be triggered in different ways. Sweetser (1990) identified three possible ways: Content (or semantic) usage, Epistemic usage, and Speech Act usage. These three classes have been recognized by many authors, among them Lagerwerf (1998) and Lang (2000). Three examples of the classes, from Lagerwerf (1998), are shown in (6).
In (6.a), Argc creates the expectation via the pragmatic "default Implication" "if Connor expects little wind, then he uses Kevlar sails". Based on the observation that Theo is gasping for air, the speaker of (6.b) expects him to be exhausted. However, in (6.b), it cannot be deduced, or assumed, that if someone gasps for air, then he is exhausted. Rather, the "default Implication" has the opposite direction: if someone is exhausted, then he gasps for air. It is the fact of being exhausted that causes gasping for air, not the opposite. Lagerwerf (1998) argues that, while Content usage involves a "default Implication" which is derived deductively, Epistemic usage involves one that is derived abductively: from the observations to the causes (see footnote 3).
Finally, in (6.c) the expectation is denied by the illocution of Argd, i.e., its speech act, rather than by its locutionary meaning. It is the fact that I tell you Argd, and not Argd itself, that is inconsistent with the expectation created by the "default Implication" "if (I know that) you already know something, I do not tell it to you".
Note that it is possible to distinguish three further sub-cases of Speech Act usage: the concessive relation may involve the illocution of Argd only, that of Argc only, or both. As argued in Winter and Rimon (1994), an occurrence of Concession holding between the speech acts of both arguments is easily obtained by using imperative mood in both.
Lagerwerf (1998) was the first to try to collect all data and previous analyses, e.g., Lakoff (1971) and Sweetser (1990), and to propose a classification of contrastive relations in natural language. He distinguished three main classes: Semantic opposition, Denial of expectation, and Concessive opposition (see footnote 4). According to Lagerwerf, all classes in (8) may involve speech acts in the arguments, not only Denial of expectation (which we have seen in (6)).
Semantic opposition corresponds to the PDTB definition of Contrast, e.g., (2.a), whereas both Lagerwerf's Denial of expectation and Lagerwerf's Concessive opposition are included in the PDTB's definition of Concession. Treating these as sub-types of Concession is an approach that has also been adopted in several other works, such as Winter and Rimon (1994) and Korbayova and Webber (2007). We focus on this treatment in the next section.
For the present discussion, the critical distinction is between Lagerwerf's Semantic opposition and Concessive opposition that we saw in the examples (8.a) and (8.c), respectively.
With respect to (8.c), note how the preceding question "Shall we go to King Tsin?" sets up a context that favors the concessive interpretation. This observation has been empirically verified by Spooren (1989), who conducted an experiment in which English speakers showed a clear tendency to consider China First as the preferred restaurant.
However, as discussed in Lagerwerf (1998), an interpretation of Semantic opposition is also possible for (8.c) if a different question is used to set up the context. Indeed, Lagerwerf (1998) argues that with the wh-question in (9), Semantic opposition is the only available interpretation; two restaurants are compared with respect to their properties.
3. Abduction is a form of non-monotonic defeasible reasoning. However, note that in case of Concession the deductive counterpart is also defeasible, due to "default Implication". See also Section 7.3.
4. In Lagerwerf (1998), (8.c) is indeed termed 'Concession', not 'Concessive opposition'. In PDTB 2.0 and other related work (Korbayova and Webber (2007)), the definition of 'Concession' encompasses both Lagerwerf's 'Denial of expectation' (6.a-c) and Lagerwerf's 'Concession' (8.c). Korbayova and Webber (2007)
From the discussion above it becomes clear that, in some cases, the crucial distinction between a concessive and a contrastive interpretation is entirely context dependent. Different contexts, e.g., an introductory wh- versus yes/no question, enable different interpretations of the same statements, focusing on a symmetric versus asymmetric perspective with respect to the compared items.
Furthermore, both Concessive and Contrastive connectives compare two clausal arguments and highlight some kind of disagreement between the facts they each assert. Contrast is a symmetric relation between two opposite facts. Concession, on the other hand, features some kind of "directionality": one of the two arguments asserts a fact that triggers an expectation and the other argument overrides it later. Since context strongly influences the identification of the asymmetric roles of the arguments, in several cases the distinction could be rather subtle. Although such instances are not pervasive, in the inter-annotator study that we report in Section 6, we observed that in some cases, both a contrastive and a concessive interpretation could be built for the same token.

Concessive interpretations
In the previous section, we discussed prior work on Concession with a focus on its difference from Contrast. In this section, we review the literature on Concession focusing on the various interpretations that have been proposed. Lagerwerf (1998) identified two types of Concession that he termed Denial of expectation and Concessive opposition. A similar sub-categorization has also been outlined in prior work on the logical formalization of concessive relations, given in Abraham (1991), Winter and Rimon (1994), and Korbayova and Webber (2007). The corresponding formalizations model a direct and a less direct relation between the triggered expectation and the content of the textual span that denies it. Korbayova and Webber (2007) explain the distinction by giving the two examples in (10). For simplicity, we will refer to them as instances of 'Direct Concession' (10.a) and 'Indirect Concession' (10.b).
In (10.a), a general "default Implication" is presupposed, paraphrasable as "Beautiful women usually get married". Because of this rule, Argc directly triggers the expectation that Greta Garbo got married. This expectation is explicitly denied by Argd.
Example (10.b) is different. In this case, "not having a car" does not imply "not having a bike", i.e., no defeasible rule holds between the two arguments. In Lagerwerf (1998)'s terminology, in this case we can identify a 'Tertium Comparationis', i.e., a proposition entailed by Argc with its negation entailed by Argd. This proposition is presumably "he is mobile". Thus, two general rules can be identified from the arguments to the Tertium Comparationis: not having a car implies being less mobile and having a bike implies being mobile. As discussed in the previous section, the identification of the Tertium Comparationis strongly depends on context, e.g., it could be induced by a suitable introductory question.
How has this basic intuition on the relationship between expectation and denied expectation been formalized? Francez (1995) proposes bilogic, which uses two semantic structures, the standard and the actual world. The contrast between the two worlds gives rise to what we characterize here as Concession, and the difference between Concession and Contradiction is modeled in terms of whether the statements are evaluated in different worlds. Winter and Rimon (1994) agree with Francez (1995) on the basic intuition but propose to analyze Concession as presupposition failure. They combine presupposition failure with the possibility and necessity operators of modal logic to define the semantics of what they classify as direct ('although', 'even though', 'yet', 'nevertheless') and indirect ('but') concessive connectives (see footnote 5). Winter and Rimon (1994)'s formal account of the two cases is shown in (11). In (11), p and q are the propositions denoted by Argc and Argd respectively, and '◇' is the standard possibility operator defined in modal logic. In case of Direct Contrast (our Direct Concession), the expectation is identified by ¬q, while in case of Indirect Contrast (our Indirect Concession), a third proposition r is assumed to exist, which clearly refers to the Tertium Comparationis. It is implied by q while its negation is implied by p.
Note that '→' is not the standard predicate logic operator of Implication. While the details of its characterization are too complex to review here, '→' denotes an underspecified "cognitive reasoning" relation and must not be confused with standard first order logic entailment, which Winter and Rimon (1994) formalize via the symbol '⇒'. Several properties are asserted of '→': it is reflexive; either '→' is not transitive or '⇒' is not a special case of '→'; and, most importantly, '→' is not defeasible.
Winter and Rimon (1994) model defeasibility in terms of possible worlds, so that the consequent of '→' holds when its antecedent holds, but not all cognitive implications that may be extracted from the sentence's meaning are asserted in the current world. In order to assert that 'p → ¬q' (and 'p → ¬r') only weakly hold, the implication is asserted as possible, via the modal operator '◇', while q (and 'q → r') are asserted as true.
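Read off the prose description above, the two cases can be sketched as the schemata below. This is only a reconstruction from the surrounding description, not a quotation of Winter and Rimon's own formulae; in particular, the existential quantifier over r and the conjunctions are our assumptions about how the pieces combine.

```latex
% Direct Contrast (our Direct Concession): p weakly suggests \neg q, yet q holds.
\Diamond(p \rightarrow \neg q) \;\wedge\; q

% Indirect Contrast (our Indirect Concession): some r (the Tertium Comparationis)
% is supported by q, while p weakly suggests \neg r.
\exists r \,\bigl[\, \Diamond(p \rightarrow \neg r) \;\wedge\; (q \rightarrow r) \,\bigr]
```

Here p and q stand for the propositions denoted by Argc and Argd, '→' is the cognitive implication, and '◇' is the possibility operator that makes the first conjunct hold only weakly.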
However, as Winter and Rimon acknowledge, some cases are problematic for such an account. For instance:
(12) [John walks slowly]Argc, but [he walks.]Argd
5. For Winter and Rimon (1994), although, even though, yet, nevertheless have 'restrictive' meaning and only but is 'non-restrictive'. This one-to-one correspondence between semantic descriptions and connectives breaks quickly when we look at empirical data. In the PDTB, several connectives, including 'but' and other concessive connectives, have more than one interpretation. The connective but, for example, has been annotated with seven sense tags.
Suppose (12) is uttered in a context where John had a surgical operation. p may be taken as the eventuality "John walks slowly", q as "John walks", and ¬r as the expectation "the operation was not a success". However, "John walks slowly" is clearly a particular case of "John walks" (p ⇒ q) and from this we infer that p → r holds in the current state of information. In other words, "John walks slowly" cognitively implies "the operation was a success", which is clearly not the case.
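The inference just described can be laid out step by step. The sketch below assumes that the entailment p ⇒ q may be chained with the asserted cognitive implication q → r, which is exactly the step that the restrictions on '→' are meant to control; the symbols are those of Winter and Rimon's account as introduced above, with ◇ the possibility operator.

```latex
\begin{align*}
& \Diamond(p \rightarrow \neg r)
  && \text{weak expectation: walking slowly suggests the operation failed} \\
& q \rightarrow r
  && \text{asserted: walking suggests the operation succeeded} \\
& p \Rightarrow q
  && \text{walking slowly is a particular case of walking} \\
& \therefore\; p \rightarrow r
  && \text{chaining the last two lines (assumed admissible)}
\end{align*}
```

The conclusion says that the very proposition that raises the expectation ¬r also cognitively implies r, which is the unwanted result.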
In order to handle such inconsistencies, Winter and Rimon (1994) have to propose further restrictions on '→', in terms of possible worlds. In our account, we propose a formal solution using non-monotonic (default) reasoning, instead of possible world semantics. Such a solution, which is also advocated by Winter and Rimon (1994) as a plausible alternative to their account, does not suffer from the problem exemplified in (12).
Winter and Rimon's 'Direct Contrast' appears to be a viable formalization of Lagerwerf's 'Denial of Expectation'. However, among the three subcases of Denial of Expectation demonstrated above in (6), Winter and Rimon (1994) consider examples of Content Usage and Speech Act Usage only, while remaining silent with respect to occurrences of Epistemic Usage.
Let us now turn our attention to Lagerwerf and how he formalizes his intuitions about non-Content 'Denial of Expectation'. Lagerwerf (1998) proposes associating the sentences (6.b-c), copied here as (13.a-b) for convenience, with the formulae (14.a-b), respectively.
Let us first consider the formula in (14.b). K(a, x) means that the agent a knows x, while T(a1, a2, x) asserts that the agent a1 tells x to the agent a2. i and y are two constants referring to the speaker and the hearer respectively. Thus, T(i, y, x) means that the speaker tells x to the hearer. '>' is the defeasible implication operator defined in Asher and Morreau (1991). It corresponds to Winter and Rimon's operator '→' when it is asserted within the scope of the possibility operator '◇'. Formula (14.b) states that if someone knows something, I need not say it.
This formalization directly mirrors the intuition about Speech Act Usage, and we agree with it. Obviously, the parallel with Winter and Rimon's is obtained by assuming
On the other hand, in (14.a), Gfb and Exh are predicates denoting, respectively, the set of individuals gasping for air and the exhausted ones, while B(i, Exh(x)) is an epistemic operator asserting that the speaker believes x to be exhausted.
In the next section, we discuss the main challenges that these frameworks still need to address and we clarify our data-driven approach to meeting these challenges.

Challenges
In the previous section, we looked at basic issues in building the semantics of concession and some of the basic logical accounts that were proposed in the literature, among which Winter and Rimon (1994), Lagerwerf (1998), and Korbayova and Webber (2007).
In this section, we will look a little closer at the challenges that the concession data present to these accounts. We start with a summary of the key points of these accounts:
(15) a. These approaches focus on the distinction between 'Direct Concession' and 'Indirect Concession'. In 'Direct Concession', the expectation raised by one argument is explicitly denied by the other. In 'Indirect Concession', a 'Tertium Comparationis' must first be identified, i.e., an intermediate proposition entailed by one argument, whose negation is entailed by the other.
b. The expectation or the Tertium Comparationis is triggered by some kind of default Implication. A proper formalization of such a default Implication has been mostly neglected in literature. In some approaches, e.g., Sanders et al. (1993), it has been argued that it is a causal relation in case of 'Direct Concession' and a comparative one in case of 'Indirect Concession'.
c. As argued in Lagerwerf (1998), a logical account of Concession needs to be general enough to include several variants, featuring expectations that correspond to the Speech Acts of the arguments and/or an abductive, rather than deductive, use of the default Implication.
Drawing on our observations of the instances of Concession in the PDTB, it becomes clear that any theory that recognizes only two concessive interpretations will fall short when accounting for real data. Instead, we argue that the effort to successfully characterize how expectations are created, rather than how they are denied, is critical.
In other words, we recognize an underlying general principle similar to the ABC-scheme proposed in Grote et al. (1995), which has been the starting point of Korbayova and Webber (2007), where we simply assert that Argd is inconsistent with the raised expectation. Unlike Korbayova and Webber (2007), who further develop that principle by specifying subtypes of Concession according to how the expectation is denied, we develop a deeper analysis of how the expectation is created because, in so doing, we are able to characterize more accurately the "default Implication" mentioned in (15.b).
Of course, we are still able to distinguish between Direct or Indirect Concession, but our approach does not advocate any direct correspondence of logic formulae to Direct/Indirect Concession.
It cannot be maintained that in all cases of Direct Concession the expectation is created by a causal rule. This can be seen even in simple examples such as (16.a-b). Unless we assume an ad-hoc context, it would be odd to assert that being a penguin "causes" not flying and that the fact that John will do his report "causes" the fact that he will do it at home.
It would also be hard to identify a Tertium Comparationis in all cases of Indirect Concession. Indeed, there are cases in which a Tertium Comparationis is not there to be identified. Lagerwerf (1998) suggests that an easy way to identify a Tertium Comparationis is by presenting the utterance as an answer to an appropriate question. But what could be the appropriate questions for utterances (17.a-b)? It is rather hard to interpret (17.a) as an answer to a particular question. Perhaps we may think about a context in which someone asks "Is there some pizza left?", and the speaker replies with (17.a). In such a case, a Tertium Comparationis analysis seems possible for (17.a): Argc is interpreted by the hearer as a negative comment on the prospect of eating, and Argd as a (stronger) positive one.
In the case of (17.b), it is even harder to find a context which involves a Tertium Comparationis because the example may be uttered in any context in which someone gives instructions on how to open the computer case.
To account for all the observed instances of Concession, including (17.a-b), we need a more general definition of Indirect Concession. In our account, all cases in which Arg c is insufficient or irrelevant with respect to the satisfaction of speaker's intentions are classified under the term 'Concession-Implicature'. In (17.a), Arg c is irrelevant with respect to the satisfaction of speaker's intentions, i.e., communicating to the hearer that there is some pizza left, and could lead the latter to conclude that there is nothing for him to eat. Analogously, in (17.b), the command in Arg c is insufficient, and could lead the hearer to take wrong actions. The speaker adds then further specifications by uttering Arg d .
On the other hand, in (8.c), Arg c is interpreted by the hearer as a preference for King Tsin over China First, which does not meet the speaker's intentions. However, the fact that the latter has another preference, and so a Tertium Comparationis may be identified, in our view is simply a special instance. If the sentence was modified to "King Tsin has great mu shu pork, but I do not want to talk about that", the Tertium Comparationis would be less easy to identify.
Finally, we agree with Lagerwerf (1998) that a proper logical account of Concession must take into account Speech Acts and abductive use of the relation triggering the expectations, but we find his formalization of the Epistemic Usage problematic for two reasons. Consider again the example in (13), repeated in (18) In (18.b), Gfb and Exh are predicates denoting, respectively, the set of individuals gasping for air and the exhausted ones, while B(i, Exh(x)) is an epistemic operator asserting that the speaker believes x to be exhausted.
However, it is somehow odd to assert that the speaker believes someone to be exhausted given that he is gasping for air. The defeasible rule in (18.a) is general and therefore does not apply specifically to any particular speaker. Therefore, in the formalization, i should be most properly substituted by universal quantification over all possible believers. This is in line with Spooren (1989), Sanders (1994) and Pander Maat (1998), who identify different subjectivities that may be ascribed to the statements in a discourse. In particular, Pander Maat (1998) conducts a corpus analysis showing that three perspectives of subjectivity must be distinguished: 'objective perspective' (the statements are objective facts that are taken to be acceptable by the speaker), 'speaker perspective' (the belief of the statement is ascribed to the speaker), and 'other perspective' (the belief of the statement is ascribed to persons other than the speaker). Pander Maat (1998) proposes a revision of the hierarchy of discourse relations provided by Sanders et al. (1992) that includes a new feature specifying the perspective configuration.
Secondly, the formula in (18.b) does not adhere to Lagerwerf's intuition (Lagerwerf (1998), pp.41-42) that Epistemic Usage of Concession involves a defeasible rule which applies abductively, i.e., getting from observations to causes. In fact, the rule does not appear in his formulae, e.g., (18.b). Furthermore, the "default Implication" is also defeasible when it is used deductively, as in Content Usage. Therefore, if it should be asserted that the speaker believes the pre-conditions when he observes the effects, it should also be asserted that he believes the effects when he observes the pre-conditions.
In our view, Lagerwerf (1998)'s valid intuition must be formalized exactly as it is stated: the formula must include an explicit defeasible rule corresponding to "being exhausted causes gasping for air". Separately, the formula asserts that the rule yields the expectation abductively. In other words, in the formulae the assertion of the defeasible rule must remain orthogonal to its usage.
To sum up, in this and the previous sections we have attempted to give a comprehensive review of the literature on Concession and the challenges that any logical account of Concession will need to address. One could view these challenges as a purely theoretical exercise in semantic theory and continue to work on them on a theoretical basis. One might argue that these theoretical challenges should be mostly irrelevant to human annotators whose task is to identify and annotate Concession in naturally occurring data. While, indeed, it may not be surprising that there are theoretical challenges to be addressed in the formal treatment of concession, we were intrigued by the fact that what, for a human, seemed to be a fairly straightforward definition of Concession (Denial of expectation) yielded surprisingly high disagreement among annotators. Indeed, almost 30% of PDTB tokens annotated as Concession by one annotator were annotated as Contrast by the other. Close investigation of the data helped us realize that viewing the relation with a focus on the denial of expectation made it hard for the annotators to discern the triggers of expectation. Analyzing the sources of expectation, the different types of relation that trigger them and the inferences that they allow helped them identify the relations with much improved consistency.
Our work bridges the gap between corpus data and logic, and our methodology does so bottom-up. We started by looking at discourse connectives in the PDTB, and then built up more abstract models for deriving appropriate inferences.
In the next section, we will look at Concession in the PDTB and present a data-driven analysis of the different inferences that are triggered from the range of the sources of expectations attested in the annotations of Concession.

Concession in the PDTB: Where do expectations come from?
The PDTB corpus contains 1193 annotated instances of Concession associated with an explicit discourse connective. In order to identify the possible defeasible relations involved in concessive relations in real data, 1000 of these 1193 instances have been analyzed. Table 1 shows the distribution of all the tokens in the PDTB that were labelled 6 as Concession (or any of its subtypes 7) and Contrast (or any of its subtypes). There were a total of 1193 tokens labelled as Concession. The most common concessive connective is 'but' with 508 tokens (42% of all concessive labels), followed by 'although' with 154 tokens (13% of all concessive labels). The connective 'but' is also very common in relations marked as 'Contrast', which may have contributed to the confusion between 'Concession' and 'Contrast' that we noted earlier. It seems surprising that Concession, despite having a straightforward definition, was so frequently confused with Contrast. For all the tokens that were annotated as either Concession or Contrast by one of the PDTB annotators, there was almost 30% disagreement.
6. A full description of the sense tags used in the PDTB is given in Prasad et al. (2008).
7. The PDTB distinguishes two subtypes of Concession: "expectation" and "contra-expectation". When the clausal argument that syntactically bounds the discourse connective creates the expectation, the PDTB instance is labelled as "expectation". Otherwise, it is labelled as "contra-expectation". Note that this sub-categorization is orthogonal to the one addressed in this paper, i.e. "direct" and "indirect" Concession. The latter concerns the way the expectation is denied, regardless of the clausal argument that creates it.

[Table 1. Columns: Connective, Concession]
In most cases of disagreement, at least one annotator would choose Contrast over Concession because they would prefer to construct a contrastive interpretation between the created expectation and its denial, failing to see that the involved predicates were not symmetric.
This section reports several PDTB instances tagged as Concession, out of 1000 selected ones, and shows that it is possible to identify four types of semantic relation that give rise to the asymmetry characterizing Concession: Causality, non-monotonic Implication, Correlation, and Implicature.
The next four subsections discuss each category with corpus examples and outline their meanings. Support for the proposed classification of the identified sources of expectations is given by the results of an inter-annotator study that we conducted asking the annotators to label the data with the new categories (cf. section 6).

Causality
In Sweetser (1990), Sanders et al. (1992), Lagerwerf (1998), and also in Prasad et al. (2008), it is assumed that in all cases of Concession the expectation comes from a (defeasible) causal rule. The next subsections argue that this is true in most, but not all, cases.
An example, taken from the PDTB, in which the expectation is created by a defeasible causal rule is shown in (19).
(19) Although [they represent only 2% of the population] Argc , [they control nearly one-third of discre-
In (19), Arg c asserts that "they" represent a very low percentage of population. That creates the expectation that they control a (proportionally) low percentage of income. Where does this expectation come from? The obvious answer is that our world knowledge includes a general causal rule representing that a low percentage of population causes control of a small amount of income, which instantiates on Arg c and creates the expectation. This causal rule is, however, defeasible, i.e., its effect may be falsified or canceled, as it is done by Arg d in (19).
More cases of Concession triggered by a causal rule are shown in (20).
(20) a. Although [imports account for less than 1% of beer sales in Japan] Argc , [Asahi Breweries Ltd., which has been gaining share with its popular dry beer, plans to fend off Japanese competitors by pouring $1.06 billion into facilities to brew 50% more beer] Argd .
In (20.a), the low percentage of sales in Japan should cause Asahi Breweries Ltd. to invest somewhere else. Similarly, in (20.b), "the procedural steps triggered by the meeting" (defeasibly) causes "taking important decisions in both of these functions" and, in (20.c), the fans should blow away whatever it was that hung over it. In (20.d), the declarations of the Sanwa Bank spokesman create the expectation that Mr. Utsumi may be on the safe side. Finally, (20.e) is particularly interesting in that the expectation is created by a conjunction of two different causes. The fact that Mr. Corry was an undistinguished college student and the fact that he had the intention to work for a big company defeasibly cause the fact that he got smart professional results.

Implication
The previous subsection presented some examples of Concession where the expectations are created via abstract (defeasible) causal relations that instantiate on Arg c . And, it has been pointed out that many past proposals assume that the expectation is always triggered by a causal rule.
The data in the PDTB reveal that not all occurrences of Concession involve causality. An interesting instance is shown in (21):
(21) [The prime minister,] Argd [whose hair is thinning and gray and whose face has a perpetual pallor,] Argc nonetheless [continues to display an energy, a precision of thought and a willingness to say publicly what most other Asian leaders dare say only privately] Argd .
Arg c describes two properties of the prime minister, which do not appear to cause the negation of Arg d , or a Tertium Comparationis related to it. Rather, the description brings to mind a prototypical old and tired man, of which the prime minister would be an instantiation. The expectation stems from the fact that the prime minister inherits all typical properties of such a prototype, among them a lazy and indolent attitude. Default inheritance from a prototype is clearly a defeasible implication, i.e., its consequent may be overridden, as is done by Arg d in (21).
These considerations are well known to researchers working on Default Logics. Consider the typical example shown in (22). In (22), Arg c suggests that penguins have the property of flying, which they inherit from the prototype of 'bird'. This expectation is explicitly denied by Arg d .
In many PDTB occurrences Arg c evokes a kind of prototype of which some properties are overridden by Arg d . Some are reported in (23). In (23.a-b) it is easy to see the inheritance by default that creates the expectation. The concepts of "safe" and "pragmatic" respectively used in the sentences are not exactly the ones that are standardly assumed, i.e., the prototypes. Arg d specifies the prominent differences with respect to such a prototype, i.e., what properties are overridden.
In many cases, the prototype from which the canceled expectations are inherited is not so easy to identify. In those cases rather than thinking in terms of "inherited properties", it is more convenient to think in terms of "necessary conditions" to which the prototype must adhere.
For instance, in (23.c), working for U.S. exclusively is perceived as a necessary condition for working for U.S. intelligence. In other words, by reading Arg d in (23.c) we perceive that Mr. Noriega is arguably breaking some kind of rule required by his role. Similarly, in (23.d), it seems that, in order to claim that "something is criminal", it is necessary that "it is defined as such by the law". Finally, in the context of (23.d) it defeasibly holds that whoever can do all this must be either a reporter, or a scholar, or a researcher, etc.
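The default inheritance described in this subsection can be illustrated with a small sketch. It is only an illustration under our own assumptions, not part of the proposed formalization: the class names `Prototype` and `Individual` and the property names are hypothetical, and the data merely encode the penguin example, in which the prototype 'bird' raises the expectation of flying and the asserted content denies it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Prototype:
    """A prototype with default (defeasible) properties. Hypothetical helper."""
    name: str
    defaults: Dict[str, bool] = field(default_factory=dict)


@dataclass
class Individual:
    """An entity evoking a prototype, plus explicitly asserted properties."""
    name: str
    prototype: Prototype
    asserted: Dict[str, bool] = field(default_factory=dict)

    def expectation(self, prop: str) -> Optional[bool]:
        # What the prototype alone leads us to expect (None if it is silent).
        return self.prototype.defaults.get(prop)

    def value(self, prop: str) -> Optional[bool]:
        # Asserted information overrides the inherited default.
        if prop in self.asserted:
            return self.asserted[prop]
        return self.prototype.defaults.get(prop)

    def denied_expectations(self) -> List[str]:
        # Properties where the assertion contradicts the inherited default.
        return [p for p, v in self.asserted.items()
                if p in self.prototype.defaults and self.prototype.defaults[p] != v]


# Penguin example: the prototype raises the expectation, the assertion denies it.
bird = Prototype("bird", {"flies": True, "has_feathers": True})
penguin = Individual("penguin", bird, asserted={"flies": False})
```

In this sketch, `penguin.expectation("flies")` plays the role of the expectation raised by Arg c, while the entry in `asserted` plays the role of Arg d; properties that are not overridden, such as `has_feathers`, are still inherited.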

Correlation
The annotators of the empirical study presented below in section 6 chose 'causality' or 'defeasible entailment' for about 70% of the occurrences of Concession taken from the PDTB. The remaining cases seem to involve different relations. Consider for instance (24):
(24) [The Treasury will raise 10 billion in fresh cash by selling 30 billion of securities . . . ] Argc . But [rather than sell new 30-year bonds, the Treasury will issue 10 billion of 29 year, nine-month bonds] Argd .
In (24), it does not seem that there is a general causal rule at stake. The fact that the Treasury will raise money cannot be the cause of the way it will actually do it. Arguing for the existence of a prototype evoked by Arg c also seems hard, though more compatible than the causal interpretation (cf. next subsection).
It seems that in examples such as (24) the expectation is created on the basis of the history of the previous similar situations. In the context, it is assumed that there are two events that usually correlate. Arg c describes one of the two, and we expect the other one to co-occur based on the fact that in several similar previous situations they did so. Accordingly, the third source of expectation has been termed 'Correlation'. A variant of this pattern encompasses occurrences where Arg d describes an eventuality that sounds "surprising" together with the one described by Arg c (cf. König (1983)). (26) shows some such instances. In (26.a), it is "surprising" that Wedtech got rolling so late, given its start date. Similarly, in (26.b) and (26.c), it is surprising that Mr. Collor remains 'the favorite' and that the Journal did not mention the Reserve Fund and the creators of the money-fund concept.
Is Correlation a source of expectation that is really distinct from the others? As argued above, Concession may stem from Causality or Implication if the context includes a general causal or entailment rule that creates the expectation. It may then be observed that, in those cases, the event that triggers the expectation and the event that describes the expectation co-occur. Let us look at the following simpler example of Correlation:
(27) [John will finish his report ] Argc , but [he'll do it at home ] Argd .
From (27), we infer that John usually does not finish his reports at home, and the present occasion constitutes an exception to this general trend.
But it may be argued that there is a particular (unknown) reason why John never does his reports at home. Maybe his home is too noisy or the reports must be returned by the end of the work day. These reasons might cause the fact that John does not finish his reports at home. Similarly, in (24) we may think of a "prototypical Treasury" that always raises money in the same way, namely by selling new 30-year bonds.
Although such considerations might indicate that Causality and non-monotonic Implication often entail Correlation, in our view they should be kept distinct for two reasons. First, precisely because we do not know if there is a particular hidden reason why John does his reports at the office, we should not assert its existence, unless we believe that this is the inference that the reader draws from the text, which is clearly not the case. Secondly, it has been attested beyond doubt that there are instances of concession for which no causal rule or defeasible entailment can be construed. There are, also, examples involving a causal rule, for which it cannot be asserted that the cause co-occurs with the effect.
(28.a-b) from Winter and Rimon (1994) and Grice (1961) are cases in point. In (28.a), we cannot infer a causal rule or prototype stating that encouraging someone to take a chair "causes" or "entails" an invitation to sit on it. Maybe Correlation could best model instances of concession involved in Speech Act Usage, but we have not conducted a study for speech acts specifically to support any claims. Conversely, in (28.b) the expectation is created via a causal rule: poverty may be the cause driving people to criminal activity such as stealing. But it would be wrong to infer from that causal relation that poor people tend to be dishonest, i.e., a Correlation relation.
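The idea that Correlation derives an expectation purely from the history of similar situations, without asserting any cause, can be sketched as follows. This is a minimal illustration under our own assumptions: the function name `expected_companion`, the majority threshold `min_share`, and the invented history of John's reports are all hypothetical and are not drawn from the PDTB.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def expected_companion(history: Iterable[Tuple[str, str]],
                       trigger: str,
                       min_share: float = 0.5) -> Optional[str]:
    """Return the outcome that usually co-occurred with `trigger`, if any.

    `history` holds (trigger event, co-occurring outcome) pairs. No causal or
    entailment rule is consulted: only co-occurrence counts are used.
    """
    outcomes = Counter(o for t, o in history if t == trigger)
    total = sum(outcomes.values())
    if total == 0:
        return None  # no similar previous situations, hence no expectation
    outcome, count = outcomes.most_common(1)[0]
    return outcome if count / total > min_share else None


# Invented history: John has usually finished his reports at the office.
history = [("finish_report", "office")] * 9 + [("finish_report", "home")]

expected = expected_companion(history, "finish_report")
current = "home"  # what Arg d in (27) asserts
expectation_denied = expected is not None and current != expected
```

The sketch deliberately leaves open why the two events co-occur, in line with the argument that a hidden reason should not be asserted unless the text supports it.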

Implicature
There are cases of Concession in which the expectation is created by the pragmatics of the conversation. As mentioned earlier, while all occurrences belonging to this class express Indirect Contrast, not all of them involve a Tertium Comparationis.
For this reason, we associate this class with a broader definition. Concession is triggered via Implicature whenever Arg c is insufficient or irrelevant to the speaker's intention and could lead the hearer to draw unintended inferences. It seems that in such cases Arg c violates a Gricean Maxim (Grice (1975)). Arg d adds to Arg c the relevant information that the speaker wants to convey. The examples discussed in the Introduction are repeated in (29). In (29.a), Arg c could be interpreted by the hearer as "I (the speaker) want to rent this room", which is not the speaker's intention. In (29.b), Arg c is irrelevant with respect to the satisfaction of the speaker's intentions, i.e., communicating to the hearer, who in the context might be looking for something to eat, that there is some pizza left. Similarly, in (29.c), the command in Arg c could lead the hearer into thinking that the permission to open the computer case is unconstrained.
Below are some examples of Concession via Implicature taken from the PDTB. In (30.a), Arg c does not create any expectation that is inconsistent with Arg d . Arg d simply conveys an achievement that is worth noticing in this context.
Similarly, in (30.c) Arg c reports some data about the stock value trend of Exxon and Allied-Signal. Arg d simply stops the potential inference that their results, which are indeed independent from the stock value, were not in line with the forecasts.
The PDTB does not include enough Implicature examples for analysis, so clearly more work is needed before a satisfactory treatment of this category can be offered.

Studies of inter-annotator agreement
In this section we report two inter-annotator agreement studies that we conducted to evaluate a) the reliability of distinguishing four sources of expectations in the semantic description of Concession and b) the impact of the new analysis of concession on the, previously low, inter-annotator agreement between Contrast and Concession in the PDTB.
It must be pointed out that these annotation experiments ought to be considered only as preliminary studies of our claims, i.e., Concession is characterized more by how expectations are created than by how they are denied. On the other hand, in order to obtain reliable annotations we will need precise guidelines with linguistic examples and subsequent adjudication steps as suggested by Versley and Gastel (2013). Since annotating discourse relations is a rather difficult task, Versley and Gastel (2013) propose a set of linguistic tests that annotators should use in order to tag difficult non-archetypal cases. For such cases, annotators are required to perform paraphrases of the utterance, insertion/substitution operations of either the connective or its argument, etc. and check which aspects of the overall meaning are changed and which are not. The check should enable annotators to select the proper sense label. With respect to the ambiguity between Contrast/Concession, Versley and Gastel (2013) propose linguistic tests aiming at testing the symmetric/asymmetric role of the arguments (cf. Versley and Gastel (2013), section 4.1). Furthermore, since the quality of the annotations obviously does not only depend on the clarity of the guidelines, but also on how the annotators are able to apply them, Versley and Gastel (2013) suggest using a set of quantitative tests to subsequently inter-adjudicate the annotations. This is particularly strategic for discourse relations, for which all annotation schemes proposed so far in the literature appear to be intuitive with respect to sample cases, but not so when applied to real data, due to the strong context-sensitivity of discourse connectives (cf. (2) above).
In the same spirit, Spenader and Lobanova (2009) use χ² to check statistically significant correlations between lexical markers and their senses. This and similar methods could be used for "filtering" discourse markers that are intuitively associated with certain senses but that, empirically, are not. For instance, Spenader and Lobanova (2009) found that "however", standardly taken to be a marker of Contrast, is indeed equally used in Cause-Effect relations.
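As a rough illustration of the kind of χ² check mentioned above, the following sketch computes Pearson's χ² statistic for a marker-by-sense contingency table. It is not a reconstruction of Spenader and Lobanova's procedure: the function name `chi_square` is ours, and the counts in `table` are invented purely for illustration. The constant 3.841 is the standard critical value for one degree of freedom at the 0.05 level.

```python
from typing import List, Sequence


def chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson's chi-square statistic for a contingency table of counts."""
    row_totals: List[float] = [sum(row) for row in table]
    col_totals: List[float] = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented counts (NOT corpus data).
# Rows: tokens of a given marker vs. tokens of all other markers.
# Columns: tokens labelled with a given sense vs. any other sense.
table = [[40, 35],
         [20, 105]]

CRITICAL_VALUE_DF1_P05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
statistic = chi_square(table)
marker_associated_with_sense = statistic > CRITICAL_VALUE_DF1_P05
```

A 2x2 table has one degree of freedom; larger tables would need the critical value for (rows - 1) x (columns - 1) degrees of freedom.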
Nevertheless, the creation of such a reliable corpus is beyond the goal of the present paper, and it will deserve a new separate paper. The key point of our paper, we stress again, is to provide a logical formalization of concessive relations alternative to the ones proposed by Winter and Rimon (1994), Lagerwerf (1998), Korbayova and Webber (2007), and others. These proposals are essentially grounded on the analysis of sample sentences while our formalization is mainly guided by an empirical analysis of real data stored in the PDTB.

Annotation of expectation sources
We conducted an empirical analysis on 1000 PDTB tokens of explicit connectives annotated as 'Concession'. Two trained annotators, one of the authors and a post-doctoral researcher in linguistics, tagged each token with one of the four sources of Concession identified above. The postdoctoral researcher received a short tutorial about the different sources of expectation as explained and had the option to use 'other' if none of the suggested labels were appropriate. The option 'other' was not used by either annotator.

The kappa statistic for inter-annotator agreement yielded 0.8 agreement, indicating that the defined categories are reliable 8 . In the formula below, Pr(a) is the percentage of agreement (85% of the 1000 cases considered) while the percentage of each tag, i.e., Pr(e), is equal to 25%, as there are four possible sources of Concession.
$$\kappa = \frac{Pr(a) - Pr(e)}{1 - Pr(e)} = \frac{0.85 - 0.25}{1 - 0.25} = 0.8$$
After the computation of inter-annotator agreement, there was a brief adjudication effort that resulted in resolving any disagreements so we could compute the distribution of labels. In the cases of disagreement, we did not observe any interesting pattern to report. Table 2 shows the distribution of the four labels for the most common connectives conveying Concession, i.e., 'But' and 'Although'.
[Table 2. Columns: Source, Although]
8. Since the mid-1990s, when we saw an increased interest in producing semantic and discourse level annotations to linguistic corpora, it has been widely recognized that the highly subjective nature of semantic and pragmatic interpretations could yield unreliable annotations. When two annotators disagree, either one of the two annotators is wrong or the annotation schema, often a set of tag categories, is not capturing a reliable characterization. Semantic and discourse annotation efforts are renowned for the struggle to identify reliable categories that would minimize interannotator disagreement (among others, Carletta (1996), Di Eugenio (2000), and Poesio and Artstein (2008)). For example, in the development of the RST corpus, Carlson et al. (2001) used professional language analysts with prior experience in data annotation and they only achieved kappa 0.60 for annotating RST-style discourse relations (including concession) reaching maximum kappa 0.75 after the annotators had worked together for a week. Explaining to annotators what to do does not guarantee agreement even if they are trained. Inter-annotator studies are, therefore, crucial for the evaluation of the reliability of the suggested semantic categories.

Annotation of Concession vs Contrast
Making a reliable distinction between Contrast and Concession was the most challenging annotation task in the PDTB, exhibiting relatively low inter-annotator agreement. In a total of 4319 instances of explicit connectives that were annotated as either Concession or Contrast, there was agreement in 3057 cases, i.e., 70.8%.
For that reason, we conducted a second inter-annotator agreement study focusing only on the PDTB tokens that were annotated as either Concession or Contrast. Specifically, we extracted 200 tokens of disagreement, i.e., tokens that one annotator had labelled as Concession and the other as Contrast. We trained two annotators, not the authors, to perform the task. Both annotators were linguistics students who attended a two-hour seminar on the distinctions between the different sources of expectation. The definition of Contrast remained the same as in the original PDTB annotation. For each token, they were instructed to choose one of four annotation labels: a) Contrast, b) Concession, c) COMPARATIVE, and d) Other. They were allowed to use the label COMPARATIVE when they could not decide between Contrast and Concession and Other when they thought that the example belonged to a different semantic class.
The results are reported in Table 3. The distributions of the two annotators are almost equal. But, of course, this does not mean that we obtained almost 100% agreement. Indeed, there are only six instances that have been labelled as 'Contrast' by both annotators. For all other instances, either both annotators chose 'Concession' or they assigned different labels. Each annotator used the label 'Other' a single time, but not for the same instance. The label 'COMPARATIVE' has never been selected.
Therefore, annotators agreed on 161 tokens (80.5%), most of which (155 tokens) have been labelled by both annotators as 'Concession'. The kappa score is:
However, this kappa cannot be compared with the statistics of the original PDTB annotators, as they had more labels to choose from when they performed the annotations. But this is not critical for our purposes because we are only interested in evaluating the possible gain of analyzing Concession in terms of the four sources of expectation on the annotation of Contrast versus Concession. The strong preference towards Concession was indeed expected. We recall that the 200 instances were selected among those that were ambiguous between Concession and Contrast in the original PDTB annotation. Intuitively, it is somehow unlikely that such doubtful cases were conveying a symmetrical relation, which should be rather easy to identify.
In other words, it is possible that the PDTB annotators could not reliably identify the underlying relations that gave rise to expectations.
Focusing on the types of relations that give rise to expectations made it clearer that Contrast involves a symmetrical relation between a common shared predicate receiving different values. Concession, on the other hand, is always an asymmetrical relation, which relies on understanding an underlying relation between two events that is not mentioned explicitly. Understanding the nature of the relation that gives rise to an expectation (causality, implication, correlation, implicature) highlights the asymmetry inherent in Concession.
Therefore, in the same spirit as Versley and Gastel (2013), section 4.1, who propose linguistic tests aiming at testing the symmetric/asymmetric role of the arguments for disambiguating between Contrast and Concession, focusing on the sources of expectation could be perhaps taken as a semantic/pragmatic test for the very same task.
Looking at the instances of persisting disagreement, we observed that in most of these cases, it was possible to construct both a contrastive and a concessive interpretation. Consider, for example, token (31). In this case, both a concessive and a contrastive interpretation can be built. A contrastive interpretation can be built by juxtaposing the predicates aware but not responsible. A concessive interpretation can be built if the reader assumes that President Waldheim knew about the killings before they happened and did nothing to prevent them. In this context, asserting that he was not responsible for the killings creates the expectation that he was not aware of them. It is possible that in some cases, better understanding of the context might help in disambiguating the intention of the author. On the other hand, it is also possible that both interpretations are entertained by the reader. Since we did not give the annotators the option to annotate with a double tag Concession-Contrast, we do not know if such a tag would be used.
While further studies would be required to evaluate the impact of the proposed analysis on a bigger scale, these results offer strong support in favor of looking closer at sources of expectation when analyzing Concession. With these encouraging results, we set out to develop a logical account that would most elegantly capture the semantics of Concession while making appropriate distinctions for the identified (semantic) sources of expectation.
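The agreement figures reported in this section can be reproduced with a few lines of code. The sketch below follows the formula given for the first study, in which the chance agreement Pr(e) is taken to be uniform over the four labels; the function names `kappa` and `uniform_chance` are ours. For the second study and for the original PDTB annotation, only the raw agreement is recomputed, since no chance estimate is stated for them here.

```python
def kappa(observed_agreement: float, chance_agreement: float) -> float:
    """Chance-corrected agreement: (Pr(a) - Pr(e)) / (1 - Pr(e))."""
    if chance_agreement >= 1.0:
        raise ValueError("chance agreement must be below 1")
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)


def uniform_chance(n_labels: int) -> float:
    """Chance agreement assuming each label is equally likely (an assumption)."""
    return 1.0 / n_labels


# Study 1: 85% agreement on 1000 tokens, four possible sources of expectation.
study1_kappa = kappa(0.85, uniform_chance(4))

# Study 2: raw agreement only (161 of 200 tokens).
study2_raw_agreement = 161 / 200

# Original PDTB annotation: 3057 agreements out of 4319 instances.
pdtb_raw_agreement = 3057 / 4319

print(f"study 1 kappa: {study1_kappa:.2f}")
print(f"study 2 raw agreement: {study2_raw_agreement:.1%}")
print(f"PDTB raw agreement: {pdtb_raw_agreement:.1%}")
```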

Semantics of Concession
This section proposes a logical account for the occurrences of Concession in which the source of the expectation is either Causality, non-monotonic Implication, or Correlation. A proper formal treatment of Concession via Implicature is seen as the object of future work.
Rather than designing new ad-hoc logical constructs to handle the semantics of Concession, we make the effort to formalize our insights using an existing logical framework, if possible. The framework that allowed us to give the most elegant account has been defined in Hobbs (1998) and several other earlier publications by the same author 9 . Hobbs defines a wide-coverage logic for Natural Language semantics based on the notion of reification (Davidson (1967), Bach (1981)). It integrates a fairly large set of linguistic and semantic concepts, including sets, composite entities, scales, change, causality, time, event structure, etc., into a single first order logical formalism.
Hobbs' framework includes all ingredients needed to properly represent the concepts introduced in the previous sections, in particular the possibility of defining defeasible relations. In addition, Hobbs' modular logic can be used to study the semantics of the connectives independently of the semantics of the arguments Arg c and Arg d .
Finally, we also show that Hobbs' framework is a suitable choice for the easy integration of other insights offered in the literature, such as Lagerwerf (1998)'s, and the extension to the semantics of other discourse connectives. Interestingly, Lagerwerf (1998)'s work, as well as several other researchers' work on discourse semantics, is based on the taxonomy of coherence relations proposed by Sanders et al. (1992), which is in turn based on Hobbs' notion of "discourse coherence" (Hobbs (1991)).
The following subsection briefly describes Hobbs' logical framework, with a particular focus on the ingredients needed to handle concessive relations. Our proposal for a logical account of Concession will be illustrated in 7.2.

Hobbs' logical framework
Hobbs (1998) proposed a wide coverage logical framework for NL semantics centered on the notion of Reification. Reification allows a wide variety of complex natural language (NL) statements to be expressed in Predicate Logic. NL statements are formalized such that events, states, etc., correspond to constants or quantifiable variables of the logic. In other words, the states and events denoted by these constants as well as the variables are things in the world. Hobbs uses the term 'eventuality' to denote the reification of either a state or an event.
9. See http://www.isi.edu/~hobbs/csknowledge-references/csknowledge-references.html and http://www.isi.edu/~hobbs/csk.html.
Hobbs distinguishes two parallel sets of predicates: primed and unprimed. The unprimed predicates are standard first order predicates commonly used in logical representations. For example, (give a b c) asserts that a gives b to c in the real world. The primed predicate represents the reification of the corresponding unprimed relation. The expression (give' e a b c) says that e is a giving event by a of b to c. Eventualities may be possible or actual. In Hobbs, this distinction is represented via a unary predicate Rexist that holds for eventualities really existing in the world. To give an example cited in Hobbs, if I want to fly, my wanting really exists, but my flying does not. This is represented as:
$$(Rexist\ e) \land (want'\ e\ I\ e_1) \land (fly'\ e_1\ I)$$
Eventualities can be treated as the objects of human thoughts. Reified eventualities are inserted as parameters of such predicates as believe, think, want, etc. Reification can be applied recursively. The fact that John believes that Jack wants to eat an ice cream is represented as an eventuality e such that it holds 10 :
$$(Rexist\ e) \land (believe'\ e\ John\ e_1) \land (want'\ e_1\ Jack\ e_2) \land (eat'\ e_2\ Jack\ Ic) \land (iceCream'\ e_3\ Ic)$$
Hobbs' logic distinguishes between specific eventualities, like "Fido is barking", and general or abstract types of eventualities, like "Dogs bark". They are not treated as radically different kinds of entities. At some level, they are both eventualities that can be the content of thoughts. To this end, the logical framework includes the notion of typical element (from Hobbs (1995) and Hobbs (1998)). The typical element of a set is the reification of the universally quantified variable ranging over the elements of the set (cf. McCarthy (2002)). Typical elements are first-order individuals.
Their introduction is motivated by the need to move from the standard set-theoretic notation of Predicate Logic to a simple statement that p is true of a "typical element" of s. In Hobbs' notation, the typical element t of a set s satisfies the predicate (typelt t s).
The principal property of typical elements is that all properties asserted on them are inherited by the members of their corresponding sets.
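One way to write this inheritance property down is sketched below. It is a rendering in standard notation under our own assumptions, not the framework's own axiom: p stands for an arbitrary predicate and membership is written with the usual set-theoretic symbol.

```latex
\[
\forall t\,\forall s\;\Bigl[\,(\mathit{typelt}\ t\ s) \wedge p(t)
  \;\rightarrow\; \forall x\,\bigl(x \in s \rightarrow p(x)\bigr)\Bigr]
\]
```

Read left to right: whatever is asserted of the typical element t of s holds of every member x of s, with no exceptions.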
It is important not to confuse the concept of a typical element with the standard concept of "prototype", which allows for defeasibility, i.e., properties that are not inherited by all of the real members of the set. Asserting a predicate on a typical element of a set is logically equivalent to the multiple assertions of that predicate on all elements of the set. These considerations lead to the distinction between eventuality types and eventuality tokens. The logic defines the following concepts, for which we omit formal details:
a. Eventuality types (also known as abstract eventualities): eventualities that involve at least one typical element among their arguments or arguments of their arguments.
b. Partially instantiated eventuality types (also known as partial instances): a particular kind of eventuality type resulting from substituting the typical elements of some of its (sub-)arguments with other typical elements corresponding to proper subsets.
c. Eventuality tokens (also known as instances): a particular kind of partially instantiated eventuality type with no typical elements in the arguments or sub-arguments 11.
10. The formula expresses the de re reading of the sentence, where e_1, e_2, e_3, John, Jack, Ic are first-order constants respectively referring to the three eventualities, the two boys, and an ice cream.
11. Actually, 'instance' is a term with a broader meaning. There are instances of typical elements that are not eventualities. For simplicity, in this paper we assume 'instances' and 'eventuality tokens' to be synonymous.
In order to assert that an eventuality e is a (possibly partial) instance of another abstract eventuality e_a, Hobbs introduces the predicate (partialInstance e_a e). Another predicate, (instance e_a e), specifies that e is a total instantiation of e_a. As a consequence of universal instantiation, any property that holds of an eventuality type also holds of any (partial) instance of it. We omit here the axioms that formally assert is-a inheritance between eventuality types and their instances.
Every relation on eventualities, including logical operators, causal and temporal relations, and even tense and aspect, may be reified into another eventuality. For instance, by asserting (imply' e e_1 e_2), we reify the implication from e_1 to e_2 into an eventuality e, which is then thought of as "the state holding between e_1 and e_2 such that whenever e_1 really exists, e_2 really exists too". Similarly, negation is represented as (not' e_1 e_2): e_1 is the eventuality of e_2's not existing.
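To make reification concrete, the following minimal sketch stores primed predications as entries keyed by the eventuality they introduce, and keeps Rexist as a separate set. It is only an illustration under our own assumptions: the class World, its method reify, and the rain/wet-street eventualities are hypothetical names that do not come from Hobbs' framework.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Toy store of primed predications and of the eventualities that really exist."""
    # eventuality name -> (predicate, arguments); models (pred' e arg1 ... argn)
    predications: dict = field(default_factory=dict)
    # names of the eventualities e for which (Rexist e) holds
    rexist: set = field(default_factory=set)

    def reify(self, e, pred, *args, real=False):
        """Record (pred' e args...) and, if real is True, also assert (Rexist e)."""
        self.predications[e] = (pred, args)
        if real:
            self.rexist.add(e)
        return e


w = World()
# "I want to fly": (Rexist e) ∧ (want' e I e1) ∧ (fly' e1 I)
w.reify("e1", "fly", "I")
w.reify("e", "want", "I", "e1", real=True)

# A relation between eventualities is itself an eventuality: (imply' e4 e2 e3).
# The rain / wet-street eventualities are illustrative only.
w.reify("e2", "rain")
w.reify("e3", "wet", "street")
w.reify("e4", "imply", "e2", "e3")

# The wanting really exists, the flying does not.
print("e" in w.rexist, "e1" in w.rexist)
```

Keeping Rexist apart from the predications mirrors the point made in the text: an eventuality can be talked about, and serve as an argument of other predications, whether or not it really exists.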
The predicates imply and not are defined to model the concept of 'inconsistency'. In the next subsection, we show how this concept can be used to construct a uniform account of 'Direct' and 'Indirect' Contrast.
Two eventualities e_1 and e_2 are said to be inconsistent if and only if they (respectively) imply two other eventualities e_3 and e_4 such that e_3 is the negation of e_4. The definition is as follows 12:
(32) (forall (e_1 e_2)
       (iff (inconsistent e_1 e_2)
            (and (eventuality e_1) (eventuality e_2)
                 (exists (e_3 e_4)
                   (and (imply e_1 e_3) (imply e_2 e_4) (not' e_3 e_4))))))
The concept of reification used in Hobbs' logic is suitable for the study of the semantics of discourse connectives because it allows focusing on their meaning while leaving underspecified details about the eventualities involved. In the case of Concession, this amounts to identifying the two eventualities that respectively create and deny the expectation in Arg_c/Arg_d, and defining the semantics of concessive relations on them.
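The definition in (32) can be sketched as a small check over a hand-written knowledge base. This is an illustration under stated assumptions, not an implementation of Hobbs' logic: the function and the eventuality names are hypothetical, implication is treated as reflexive so that an eventuality may itself play the role of e_3 or e_4, and negation pairs are looked up in both orders.

```python
def inconsistent(e1, e2, implies, negations):
    """Sketch of (32): e1 and e2 are inconsistent iff they imply some e3 and e4
    such that one is the negation of the other.

    implies:   dict mapping an eventuality to the set of eventualities it implies
    negations: set of pairs (a, b) for which (not' a b) holds
    """
    # Assumption: every eventuality implies itself (covers the direct case).
    implied_by_e1 = implies.get(e1, set()) | {e1}
    implied_by_e2 = implies.get(e2, set()) | {e2}
    # Assumption: negation pairs are checked symmetrically.
    return any(
        (e3, e4) in negations or (e4, e3) in negations
        for e3 in implied_by_e1
        for e4 in implied_by_e2
    )


# Hypothetical knowledge base for an exam scenario.
negations = {("not_pass", "pass")}                  # (not' not_pass pass)
implies = {"fail_all_questions": {"not_pass"}}

print(inconsistent("pass", "not_pass", implies, negations))            # direct
print(inconsistent("pass", "fail_all_questions", implies, negations))  # indirect
print(inconsistent("pass", "study_hard", implies, negations))          # unrelated
```

The first call finds the negation between the two eventualities themselves, the second only through an implied eventuality, and the third finds none.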
(32) is an example of an 'axiom schema'. In this logic, an 'axiom schema' provides one or more different axioms for each predicate p. Axioms determine the expressivity and the computational complexity of the logic. However, the axioms defined in the current version of the logic do not guarantee that the logic is recursively enumerable or computationally tractable. In a real system, we envision handling this problem by defining ontologies for specific domains and making queries to these domains.
In what follows, we will briefly illustrate three basic concepts from Hobbs' logic that we utilize in our proposed semantics of Concession, namely Causality, Defeasible Implication and Likelihood.

CAUSALITY
Hobbs' logic adopts a defeasible account of Causality, originally proposed in Hobbs (1993). This distinguishes between the monotonic notion of 'causal complex' and the non-monotonic, defeasible notion of 'cause'. As Hobbs (1993) explains, when we flip a switch to turn on a light, we say that flipping the switch "caused" the light to turn on. But for this to happen, many other factors need to be satisfied: the bulb is good, the switch is connected to the bulb, there is power in the city, etc. The set of all the states and events that are necessary for the event e to take place as a result is called the 'causal complex' of e. In a causal complex, the majority of participating eventualities are normally true and therefore presumed to hold. In the light bulb case, it is normally true that the bulb is not burnt out, the wiring is in good condition and the power is on, so the conditions are presumed to hold. What cannot be presumed to hold is whether the switch is on or off. Eventualities that cannot be assumed to be true in normal contexts are commonly identified as causes (cf. Kayser and Nouioua (2009)).
Based on these ontological grounds, Hobbs represents Causality in terms of two predicates: (cause c e_1 e_2) and (causalComplex s e_2). The predicate cause says that c is the state holding between e_1 and e_2 such that the former is a non-presumable cause of the latter. The predicate causalComplex says that s is the set of all presumable or non-presumable eventualities that are involved in causing e_2 (including e_1) 13. In order to preserve defeasibility, the real existence of the effect e_2 does not depend on the real existence of the cause e_1 and the causal rule c. In other words, the truth of (Rexist e_1) and (Rexist c) does not imply that (Rexist e_2) is also true. (Rexist e_2) is true just in case all the eventualities in the causal complex of e_2 really exist, as asserted by the following axiom:
(forall (s e)
  (if (and (causalComplex s e)
           (forall (e_1) (if (member e_1 s) (Rexist e_1))))
      (Rexist e)))
It must be pointed out that in practice we can never specify all the eventualities in a causal complex. For instance, consider the following toy example of Concession:
(33) Although [John studied hard]_Argc, [he did not pass the exam]_Argd.
In (33), the expectation "John passed the exam" is created by a defeasible general causal rule, "studying hard causes passing exams", which is instantiated in the present context. Nevertheless, John did not pass the exam. There was a particular unknown reason why he did not, despite his hard studying. Determining all the context-dependent co-causes that had to be in place in order to properly trigger the causal rule would clearly be impossible. This amounts to saying that NL sentences may be properly interpreted even if causal complexes are unknown.
To conclude, in most cases the causal complex exists, but it is not possible to infer it.
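The interplay between causal complexes, presumable eventualities and causes described in this section can be sketched with plain sets. The sketch below is illustrative only: the function name and the members of the light-switch causal complex are our own assumptions, and a real causal complex could never be listed exhaustively, as noted above.

```python
def axiom_fires(causal_complex, rexist):
    """True when every member of the causal complex really exists, so that the
    causal-complex axiom licenses (Rexist e) for the effect e.
    False only means the axiom is silent, not that the effect is ruled out."""
    return all(member in rexist for member in causal_complex)


# Hypothetical causal complex for "the light turns on".
light_on_complex = {"switch_flipped", "bulb_good", "wiring_ok", "power_on"}
# Eventualities that are normally true and therefore presumed to hold.
presumed = {"bulb_good", "wiring_ok", "power_on"}

# What cannot be presumed is what we ordinarily call the cause.
causes = light_on_complex - presumed
print(causes)

# The cause plus all the presumed conditions really exist: the effect follows.
print(axiom_fires(light_on_complex, presumed | {"switch_flipped"}))
# The cause really exists but the bulb is burnt out: nothing follows.
print(axiom_fires(light_on_complex, {"switch_flipped", "wiring_ok", "power_on"}))
```

The last call is the configuration that concessive sentences such as (33) exploit: the cause is in place, yet some other member of the causal complex fails.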

DEFEASIBLE IMPLICATION
Defeasibility does not hold only for causal rules. Most of our everyday knowledge is non-monotonic, i.e., only approximately correct. For example, knowing that birds fly allows us to infer that if Tweety is a bird, then Tweety can fly. This conclusion will be defeated later when we learn that Tweety is actually a penguin and therefore does not fly. The example illustrates that we need to be careful about how we model knowledge of the world. Hobbs, following McCarthy (1980), models common sense implication via monotonic implication (meta-operator if), but allows for defeasibility via the introduction of the underspecified predicate etc in the antecedent of the implication.
(forall (x) (if (and (bird x) (etc)) (fly x)))
The formula says that if x is a bird and has other unspecified properties encoded as etc (e.g., x's wings are robust enough), then x can fly. In other words, the formula describes the prototype of bird with respect to the property of flying. etc is a conjunction of eventualities that are true for the prototype and allow for the property of flying. For non-flying birds, at least one of those properties does not hold. Although etc is left underspecified in the formulae, its precise definition depends on the corresponding predicate (bird, in the example above), and so needs to be indexed on it.
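The role of etc can be illustrated with a small sketch in which the implication itself is monotonic and all the defeasibility sits in the extra conjunct. The contents chosen for etc here are purely hypothetical, since in the logic the predicate is deliberately left underspecified; the function and individual names are likewise our own.

```python
def licensed_to_fly(x, facts, etc_bird):
    """(forall (x) (if (and (bird x) (etc)) (fly x))): the conclusion is licensed
    only if x is a bird and every property bundled in etc holds of x.
    A False result means the rule does not apply, not that x cannot fly."""
    properties = facts.get(x, set())
    return "bird" in properties and etc_bird <= properties


# Hypothetical spelling-out of the etc predicate indexed on 'bird'.
etc_bird = {"robust_wings", "not_penguin"}

facts = {
    "tweety": {"bird", "robust_wings", "not_penguin"},
    "pingu": {"bird", "robust_wings"},      # a penguin: one etc property fails
    "rex": {"dog"},
}

for name in ("tweety", "pingu", "rex"):
    print(name, licensed_to_fly(name, facts, etc_bird))
```

Learning that an individual is a penguin does not retract any rule. It simply falsifies one conjunct of the antecedent, so the conclusion is no longer derivable.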
In order to set up a uniform formal account of Concession, we need to introduce a new predicate that denotes non-monotonic implications. Let us call this new predicate 'nonMonotonicIf'. The predicate nonMonotonicIf must have the same syntactic structure as the predicate cause: it must relate two eventualities e_1 and e_2, and it may be reified into a new eventuality. Both e_1 and e_2 can be abstract eventualities or instances. (nonMonotonicIf e_1 e_2) is true iff e_1 defeasibly implies e_2.
The definition of nonMonotonicIf is reported in (34). An eventuality e_1 defeasibly implies e_2 if and only if for each partial instance of e_1 there is a partial instance of e_2 for which the meta-predicate if, augmented with an appropriate etc predication, holds. For the definition to cover the case in which e_1 and e_2 are themselves instances, it is necessary to assume that the predicate partialInstance denotes a reflexive relation, i.e., that every eventuality is a partial instance of itself. p_1 and p_2 are the predicates indicating the types of the eventualities e_1 and e_2 respectively. Hobbs defines a meta-predicate pred to relate an eventuality with the unique predication that describes it: (pred p e) states that p is the predicate whose reification is e.

LIKELIHOOD
Eventualities exist in a Platonic universe of possible individuals: entities, states and events. As said above, if they happen to actually occur in the real world, that is one of their properties, and we express it with the predicate Rexist. Real existence is one mode of existence but there are others, too. The eventuality could be part of someone's beliefs but not occur in the real world. It could be merely possible or likely but not real. It could also be unlikely or impossible. An especially important modality is "happening at a particular time". Possibility is one common judgment we make about eventualities in situations of uncertainty. Likelihood is another. Likelihood is intended as the common-sense counterpart of mathematical probability: mathematically defined probability is a special case of common-sense likelihood. Likelihood is a qualitative notion intended to model the vague probability judgements we make in everyday life, as when we say that it's likely to rain or that the train may be late.
Likelihoods are members of a partially ordered scale of likelihoods. For Hobbs, such a scale s satisfies the predicate (likelihoodScale s). The likelihood of an eventuality e is relative to an implicit set of constraints c defining the sample space. The set c may be represented by a single eventuality e_c that reifies the conjunction of all the constraints, such that (and' e_c e_1 ... e_n) is true, where e_1, ..., e_n are the eventuality-constraints. The likelihood of e is given in the context where the predicate (Rexist e_c) is assumed to hold.
With the formula (likelihood d e c), where d is a number, e an eventuality, and c a set of constraints, we assert that d is the likelihood of e's really existing, if the eventualities in c really exist and d belongs to the contextually relevant likelihood scale s. We say that a certain eventuality e is 'likely' when a set of eventualities c holds, iff the likelihood of e given c is a qualitative value belonging to the highest part of the contextually relevant likelihood scale.
(forall (e c)
  (iff (likely e c)
       (exists (s d s_1)
         (and (likelihood d e c) (likelihoodScale s) (belong d s_1) (high s_1 s)))))
Likelihood is connected to other modalities via additional axioms. If the likelihood of an eventuality e with constraints c is the top of the likelihood scale, then e is necessary given c, i.e., it is implied by the latter. If the likelihood of e is the bottom of the likelihood scale, then it is not possible given c.
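The definition of likely can be sketched over a small qualitative scale. Everything concrete below is an assumption made for illustration: the five scale values, the choice of which values count as the high region s_1, and the weather eventualities. The scale is also totally ordered here for simplicity, whereas the text only requires a partial order.

```python
# Hypothetical likelihood scale s, listed from bottom to top.
LIKELIHOOD_SCALE = ["impossible", "unlikely", "even", "likely", "certain"]
# Hypothetical high region s_1 of the scale, i.e. (high s_1 s).
HIGH_REGION = {"likely", "certain"}


def likely(e, c, likelihood):
    """(likely e c) holds iff some d with (likelihood d e c) lies on the scale
    and belongs to its high region."""
    d = likelihood.get((e, c))
    return d is not None and d in LIKELIHOOD_SCALE and d in HIGH_REGION


def necessary(e, c, likelihood):
    """Top of the scale: e is necessary given c."""
    return likelihood.get((e, c)) == LIKELIHOOD_SCALE[-1]


# Hypothetical likelihood judgements: (eventuality, constraints) -> value d.
likelihood = {
    ("rain", "dark_clouds"): "likely",
    ("rain", "clear_sky"): "unlikely",
    ("sunrise", "new_day"): "certain",
}

print(likely("rain", "dark_clouds", likelihood))
print(likely("rain", "clear_sky", likelihood))
print(necessary("sunrise", "new_day", likelihood))
```

The same eventuality receives different judgements under different constraints, which is why the constraint argument c cannot be dropped from the predicate.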
In the next subsection, we use the above definitions of Causality, non-monotonic Implication and Likelihood to define the semantics of Concession with respect to the different sources of expectation that we identified in our analysis in the PDTB corpus.

A refined logical account of Concession
In this section, we propose logical formulae in Hobbs' logic that represent the meaning of concessive relations using a uniform representation that is minimally adjusted to reflect the interpretation of the different sources of expectation. As discussed in Section 1, previous approaches to Concession, e.g., Winter and Rimon (1994) and Lagerwerf (1998), mostly focus on how the expectation is denied. The distinction between 'Denial of expectation' and 'Concessive opposition' is in this spirit. In 'Denial of expectation', Arg_d directly denies the expectation. In 'Concessive opposition', Arg_d entails the negation of the expectation.
In this line of work, little attention has been paid to how Arg_c creates the expectation. In most cases, it is simply assumed that the expectation is created from Arg_c via an underspecified entailment '→'. Our approach is different in that it focuses on characterizing how the expectation is created rather than how it is denied.
In formal terms, we propose to represent semantic concessive relations via the general pattern (35) in Hobbs' logic, where Φ is a generic predicate referring to the underspecified entailment that creates the expectation; it corresponds to Winter and Rimon's '→'. According to our analysis, Φ can be any of the predicates cause', nonMonotonicIf', or likely' as defined by Hobbs' account and illustrated in the previous section.
The eventuality e_e corresponds to the created expectation. It is created from the eventuality e_c, conveyed by Arg_c, via the relation Φ. The eventuality e_d is conveyed by Arg_d. The formula in (35) asserts that e_e and e_d are inconsistent, via the predicate (inconsistent e_e e_d).
According to the definition in (32), whether the inconsistency is achieved directly or indirectly remains underspecified. (inconsistent e_e e_d) comes out true iff e_e implies a third, possibly different, eventuality, and e_d its negation. It may be either the case that e_d is already the negation of e_e (Denial of expectation), or that it implies it (Concessive opposition).
The eventuality s^a_c is a general abstract (defeasible) rule. The formula asserts that the reification of Φ, i.e. the eventuality s_c, is a more specific instantiation of s^a_c. Both e_c and e_d really exist in the context, as asserted in (35) via the predicate Rexist. By contrast, s^a_c and s_c do not necessarily exist in the real world; as exemplified below in (45), they could exist only in the speaker's beliefs.
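Reading the conjuncts off the description above, the pattern combines five conditions: (Rexist e_c), (Rexist e_d), (Φ' s_c e_c e_e), (partialInstance s^a_c s_c), and (inconsistent e_e e_d). The sketch below checks these conditions against a hand-built model of example (33). It is an illustration of the prose rather than a reproduction of the formula itself: the model layout, the function name and the eventuality names are all hypothetical, and the order of the conjuncts is our own choice.

```python
def concession_holds(model, phi, e_c, e_d, e_e, s_c, s_a_c):
    """Check the five conjuncts that the text attributes to the general pattern."""
    return (
        e_c in model["rexist"]                               # (Rexist e_c)
        and e_d in model["rexist"]                           # (Rexist e_d)
        and model["reified"].get(s_c) == (phi, e_c, e_e)     # (Φ' s_c e_c e_e)
        and (s_a_c, s_c) in model["partial_instance"]        # (partialInstance s^a_c s_c)
        and (e_e, e_d) in model["inconsistent"]              # (inconsistent e_e e_d)
    )


# Toy model for (33): "Although John studied hard, he did not pass the exam."
model = {
    "rexist": {"john_studied_hard", "john_did_not_pass"},
    "reified": {"s_c": ("cause", "john_studied_hard", "john_passed")},
    "partial_instance": {("studying_hard_causes_passing", "s_c")},
    "inconsistent": {("john_passed", "john_did_not_pass")},
}

args = ("john_studied_hard", "john_did_not_pass", "john_passed",
        "s_c", "studying_hard_causes_passing")

print(concession_holds(model, "cause", *args))    # source of expectation: Causality
print(concession_holds(model, "likely", *args))   # wrong Φ for this model
```

Note that neither the expectation nor the rule s_c needs to be in the rexist set: only e_c and e_d are required to really exist. Switching the source of expectation amounts to changing the single phi argument, which is the uniformity the pattern is meant to provide.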
The general pattern in (35) is instantiated in (37), (38), and (39), respectively. Note that, with respect to the general pattern in (35), the three formulae differ only in the predicate Φ. In (37), it has been substituted by cause', in (38) by nonMonotonicIf', and in (39) by likely'. In the next subsection, we will show that (35) is also able to account for the generalizations identified above in Section 4. The last of these formulae closes with the conjunct (inconsistent e_e e_d), and its eventualities are glossed as follows:
e_c = "John will do his report"
s^a_c = "John does not usually do his reports at home"
e_e = "John will not do his report at home"
e_d = "John will do his report at home"

Abductive usage of defeasible rules
Lagerwerf (1998) discusses the two cases in (40). In (40.a) the expectation is denied by the illocution of Arg_d, i.e., its Speech Act, rather than by its locutionary meaning. It is the fact that I tell you Arg_d, and not Arg_d itself, that is inconsistent with the expectation created by the rule "If I know that you already know something, I won't say it to you again".
The formalization of occurrences of Concession involving Speech Acts is simple in Winter and Rimon (1994) and Lagerwerf (1998). The default implication is asserted on the Speech Act associated with the proposition rather than the proposition itself. Speech Acts do not raise any problems in our approach, either, and so we will not go into the details. Consistent with Hobbs' logic, the Speech Act of an eventuality is simply reified into a new eventuality, and the defeasible rules are asserted on the latter.
The interesting case is 'Denial of expectation - Epistemic', shown in (40.b). In this example, there is a causal rule "Being exhausted causes gasping for air", and the expectation is created abductively, i.e., by observing that Theo was gasping for air, it may be concluded that he was exhausted. Lagerwerf (1998) formalizes this intuition in Predicate Logic 14 by means of a formula in which Gfb and Exh are predicates denoting the set of individuals gasping for air and the set of exhausted individuals, respectively, B(y, Φ) is an epistemic operator asserting that y believes Φ, i refers to the speaker, and '>' is the defeasible implication operator defined in Asher and Morreau (1991). As discussed in Section 4, Lagerwerf's intuition is correct in our view, but the proposed formalization seems to deviate from the heart of the intuition.
As discussed above in (18.a), the defeasible rule is general and therefore does not apply specifically to any particular speaker. Therefore, in the formalization, i should more properly be replaced by a universal quantification over all possible believers. Similar remarks may be found in Pander Maat (1998), who argues that three different perspectives of subjectivity may be ascribed to the belief of the statements in a discourse: 'objective', 'speaker', and 'other' perspective. The latter holds when the belief of a statement is ascribed to people other than the speaker. Consider the examples of 'Denial of expectation' in (41) and (42), taken from the PDTB 15.
From the example in (42), we understand that the speaker believes that "watering the feet causes getting taller". In Hobbs' logic, such an abstract causal rule is reified into an eventuality s^a_c. Then, in order to ascribe it to the speaker's subjectivity only, a separate predicate like (B i s^a_c) can be conjoined to the whole formula. In other words, in all the other examples seen above, each defeasible rule has been taken as a true fact, but when the hearer does not believe it, it may be asserted as a belief of the speaker only.
In our view, Lagerwerf (1998)'s intuition must be formalized exactly as it is stated. In (40.b), the defeasible causal rule is "being exhausted causes gasping for air", and it yields the expectation abductively. Thus, the formula is the one in (43); the only difference with respect to the formulae associated above with occurrences of Concession where the expectation is created via Causality is the assertion of (cause' s_c e_e e_c) in place of (cause' s_c e_c e_e). The formula again closes with the conjunct (inconsistent e_e e_d), and its eventualities are glossed as follows:
e_c = "Theo was gasping for air"
s^a_c = "Being exhausted causes gasping for air"
e_e = "Theo was exhausted"
e_d = "Theo was not exhausted"
If the causes/effects of the causal rule or the causal rule itself are believed by the speaker or by someone else, this is separately asserted via additional conjuncts. The formulae corresponding to (41.a) and (42) are shown in (44) and (45) respectively (in the formulae, b is a constant referring to the board and i is a constant referring to the speaker). The eventualities in (45) are glossed as follows:
e_c = "I watered my feet"
s^a_c = "Watering the feet causes getting taller"
e_e = "I got taller"
e_d = "I did not get taller"

Conclusions

Sources of expectation
We presented an empirical analysis of Concession based on the annotations of Concessive connectives in the Penn Discourse Treebank.
In concessive relations, one argument gives rise to an expectation which is then denied in the second argument. In this paper, we have argued that a proper account of Concession should be grounded on how the expectation is created. Specifically, we identified four sources of expectation: Causality, Implication, Correlation, and Implicature. In Causality, the created expectation is causally related to the eventuality that creates it. In Implication, the expectation is created on the basis of a specific property associated with the eventuality that creates the expectation and in Correlation the expectation is related to the eventuality that creates it via co-occurrence. We termed "Implicature" the source of expectation that involves pragmatic inferencing. We leave a formal account of this semantically complex category for future work.
To test the reliability of these categories, we conducted an inter-annotator agreement study on one thousand Concessive tokens in the PDTB. The high kappa score confirms that the categories can be distinguished reliably.
To evaluate the practical merits of our approach over previous accounts of Concession, we conducted a second inter-annotator study. In this study, two linguistics students annotated 200 instances from the PDTB that had been annotated as Concession by one annotator and Contrast by the other. We trained the new annotators with the new definition of Concession, explaining the four sources of expectation. Agreement on this dataset exceeded 80%, which is a very significant improvement in making reliable distinctions between Contrast and Concession.

Formal treatment of Concession
Following earlier work by Lagerwerf (1998), we refined the semantics of Concession using basic constructs from the logic proposed in Hobbs (1998). Central to Hobbs' proposal is the notion of reification which allows complex natural language statements to be modelled in Predicate Logic. We propose that every type of Concession presupposes a general rule that holds in the context. We propose a logical account that models the semantics of Concession by defining the rule for each type of Concession. In the case of Causality, we infer a defeasible causal relationship between the eventuality expressed in one argument and the eventuality of the expectation. In Implication, we infer that the eventuality expressed in one argument non-monotonically entails the eventuality of the expectation. In Correlation we infer that the eventuality expressed in the argument that creates the expectation is likely to co-occur with the eventuality of the expectation. The proposed logic formulae differ by the kind of predicate describing the abstract rule (cause', nonMonotonicIf', likely').

Impact and future work
We have shown that by identifying the different sources of expectation in Concession, not only are we able to characterize more precisely the events that give rise to expectations, but we also obtain more reliable semantic distinctions between Concession and Contrast. We obtained empirical evidence for this claim from a new inter-annotator study that showed significant inter-annotator agreement for tokens that were previously confusing (tokens of annotator disagreement between Contrast and Concession).
The proposed account of Concession and its logical treatment is an improvement over some previous accounts, which are insufficient in capturing the range of variants identified in naturally occurring data. We maintain that the identified sources of expectation in Concession and their logical treatment demonstrate how the study of discourse relations as attested in naturally occurring data can help us improve our understanding of the semantics of discourse relations. An important aspect of Concession, the source of expectation, was overlooked in previous approaches but became apparent when studying the data.
Thus, addressing our second question "What kind of semantic representation will allow covering the rich range of variants conveying Concession and Contrast?", we concluded that defining predicates which take as arguments reified eventualities (and even speech acts) is critical for handling discourse level semantic relations. Hobbs's proposed semantics for natural language has proven to be especially well suited for articulating a uniform model of Concession while accounting for the range of variants in a simple and straightforward manner.
For future work, we need to delve deeper into the cases in which the created expectation involves pragmatic reasoning. Moreover, we need to further test the ambiguity between Contrast and Concession that, in the present paper, was studied only with respect to 200 tokens that were identified as problematic in the PDTB. Finally, although the identification of the four sources of Concession was empirically tested against a larger set of occurrences (1000 PDTB tokens), we advocate further experiments on corpora pertaining to different genres and in other languages, in order to assess how far our results generalize.
While we believe that our approach is in the right direction, defining an important step towards processing discourse problems automatically, the proposed semantics cannot be readily implemented in current state-of-the-art inference systems. Significantly more work would be required to integrate the proposed semantics in a real system, e.g., the TACITUS system (Hobbs (1986), Montazeri and Hobbs (2011), Ovchinnikova et al. (2011)), which implements Hobbs' logic.
On the other hand, most current systems are based on shallow features. A recent proposal along this line is that of Meyer and Popescu-Belis (2012). They trained a statistical classifier on PDTB data for disambiguating discourse connectives, including those conveying Concession and Contrast. The classifier involves a large set of syntactic and semantic features and is used to enhance the performance of a separate Statistical Machine Translation system. Classifiers based on shallow features could also benefit from our work. For instance, the overall performance of Meyer and Popescu-Belis (2012)'s classifier could perhaps be improved by including semantic features specifically aimed at identifying the sources of concessive relations.