The Use of Perspective Markers and Connectives in Expressing Subjectivity: Evidence from Collocational Analyses

This study explores how subjectivity is expressed in coherence relations, by means of a distinctive collocational analysis on two Chinese causal connectives: the specific subjective kejian ‘so’, used in subjective argument-claim relations, and the underspecified suoyi ‘so’, which can be used in both subjective argument-claim and objective cause-consequence relations. On the basis of both Horn’s pragmatic Relation and Quantity principles and the Uniform Information Density Theory, we hypothesized that the presence of other linguistic elements expressing subjectivity in a discourse segment should be related to the degree of subjectivity encoded by the connective. In line with this hypothesis, the association scores showed that suoyi is more frequently combined with perspective markers expressing epistemic stance: cognition verbs and modal verbs. Kejian, which already expresses epistemic stance, co-occurred more often with perspective markers related to attitudinal stance, such as markers of expectedness and importance. The paper also pays attention to similarities and differences in collocation patterns across contexts and genres.


Introduction
In everyday communication, speakers and writers often express their conclusions and feelings. For instance, instead of merely reporting objective causal relations between events in the real world, as in (1a), they frequently utter subjective relations, which involve someone's reasoning (Langacker 1990, Pander Maat & Sanders 2000, Verhagen 2005), as illustrated in (1b). Subjective relations are not observable in the real world; one needs to take into account another person's (e.g., the speaker's or another agent's) perspective (Sanders et al. 2009, 2012) to process the reasoning, and thus one needs to track the source of information. In other words, subjective relations concern the degree of involvement of a locutionary agent or a Subject of Consciousness (Finegan 1995, Lyons 1977, Sanders et al. 2009).
(1) a. This restaurant is decorated with several art works of Mondriaan, so it attracts lots of fans of Modern art.
    b. This restaurant is decorated with several art works of Mondriaan, so its owner must be a fan of Modern art.
In order to communicate in a coherent way, speakers choose words to express the relations between consecutive discourse segments (Sanders et al. 1993: 94, cf. also Sanders & Spooren 2007, Schilperoord & Verhagen 1998). For instance, they can use connectives such as so and therefore to provide the reader with information on the type of coherence relation to be established, in this case a causal one (Britton 1994, Graesser & McNamara 2011, Mak & Sanders 2010, van Silfhout et al. 2014, 2015). Such information facilitates the reading process. It triggers faster processing of information immediately following the connective (Cain & Nash 2011, Cozijn et al. 2011, Sanders & Noordman 2000, van Silfhout et al. 2014, 2015) compared to the processing of that same information in unmarked relations. As examples (1a) and (1b) illustrate, English so can be used in objective and subjective causal relations. It only marks the causal nature of the relation, and does not indicate the degree of subjectivity of the relation. However, certain connectives in other languages do code information about subjectivity. For example, some connectives are only used for objective relations, such as Dutch daardoor 'as a result' and Chinese yin'er 'as a result', as illustrated in the Dutch (2a) and Chinese (3a) translations of (1a), respectively. By contrast, the Dutch connectives want 'because' and dus 'so' (Degand & Pander Maat 2003, Sanders & Spooren 2015, Spooren et al. 2010, Stukker & Sanders 2008, Verhagen 2005) and Mandarin Chinese kejian 'so' prototypically express subjective coherence relations (Li et al. 2013). This is illustrated by the Dutch (2b) and Chinese (3b) counterparts of the subjective relation in (1b), respectively. Just like English so in examples (1a) and (1b), some connectives in other languages leave the subjectivity information underspecified, i.e. they can be used for both subjective and objective relations (e.g. Chinese suoyi 'so' in examples (3a) and (3b)).
The degree of subjectivity expressed by connectives is found to affect the processing of coherence relations. For instance, the Dutch subjective connective want 'because' leads to longer processing times directly after the connective compared to the Dutch objective connective omdat 'because' (Canestrelli et al. 2013). Such processing effects can be attributed to the difficulty of interpreting subjectivity: the reader needs to track the source of information to interpret subjectivity. Specific subjective connectives such as want 'because' instruct the reader at an early stage that there is a coherence relation, and that the relation is subjective, before the entire sentence is processed. In terms of information density, subjective connectives encode more information than underspecified connectives. As the choice of connectives in examples (1) to (3) and the accompanying processing results illustrate, speakers and writers continuously have to decide how informative they should be in order to provide sufficient cues for others to comprehend them. At the same time, they should also avoid being too wordy. This tension has been systematically described by Horn's framework for pragmatic inference: his Q (Quantity) Principle describes the need to 'make your contribution sufficient', while the R (Relation) Principle describes the need to 'make your contribution necessary' (Horn 1984: 13). According to Horn (1984), speakers should find a balance between the speaker-based economy (saving the speaker's production efforts) and the hearer-based economy (saving the hearer's processing efforts).
A highly similar point has been made by the Uniform Information Density Theory (UID), which is about the speakers' strategy of choosing between alternative linguistic forms at several levels of linguistic representations: phonetic, syntactic, pragmatic, etc. (Frank & Jaeger 2008, Jaeger 2010, Levy & Jaeger 2007). The UID suggests that speakers modulate their word choice according to the amount of information in the utterance: full linguistic forms are more often used at the point where the content conveyed by the form is unexpected in its context, i.e. the point with a low probability and a high information density (for details, see Frank & Jaeger 2008). For instance, connectives can be omitted if the information they convey is highly predictable given other linguistic cues in the context (Asr & Demberg 2015). Through such modulation of word choices, the density of information of the utterance is kept at a uniform level: a roughly equal amount of information at each unit of the sentence (Levy & Jaeger 2007). The UID theory echoes Horn's pragmatic theory in the sense that both theories predict a modulated process of word selection to optimize communication.
In terms of discourse relations and connectives, these theoretical discussions raise the question as to exactly which information is conveyed by connectives, and how that information may become predictable given other cues in the context. Hence, it is worthwhile to explore, as is done in the current paper, which linguistic markers also provide information on the degree of subjectivity of a relation, and would thereby allow for a division of labor between connectives and segment-internal elements (see Hoek 2018; Hoek et al. 2018). If other markers already indicate the degree of subjectivity, this will reduce the need for information on subjectivity to be expressed by the connective. This seems to be the case for expressions such as probably, surprisingly and according to Peter, which have been described as markers of stance (Biber et al. 1999, Conrad & Biber 2000), evaluation markers (Bednarek 2006, 2009, Thompson & Hunston 2000), or appraisals (Eggins & Slade 1997, Martin 2000). Conrad and Biber (2000) suggest three sub-types of stance markers (see Bednarek 2006, 2009 and Thompson & Hunston 2000 for similar classifications):
i. Epistemic stance, which indicates how certain the speaker or writer is, or where the information comes from (e.g. probably, according to the President).
ii. Attitudinal stance, which indicates feelings or judgements about what is said or written (e.g. surprisingly, unfortunately).
iii. Style stance, which indicates how something is said or written (e.g. honestly, briefly).
(Conrad & Biber 2000: 57)
Stance markers introduce the viewpoint of the speaker or other agents, and hence can be termed perspective markers (Sanders & Redeker 1996). Perspective markers expressing epistemic stance show overlap with specific subjective connectives. Both indicate subjective reasoning, either from the speaker or from a character. Canestrelli et al. (2013) and Traxler et al. (1997) found that the processing effects of connectives are influenced by epistemic stance markers: by adding volgens Peter 'according to Peter' to the first clause connected in a subjective relation, as in example (2c), the extra processing time associated with the subjective connective want 'because' disappears.
(2) c. Volgens Peter is de eigenaar van dit restaurant een fan van moderne kunst, want het restaurant is versierd met diverse kunstwerken van Mondriaan.
       'According to Peter the owner of this restaurant is a fan of Modern art, because the restaurant is decorated with several art works of Mondrian.'
In terms of Horn's pragmatic theory, the reader/hearer has obtained sufficient information about the degree of subjectivity by the introduction of epistemic perspective markers. Upon encountering the subjective connective the reader/hearer does not have to establish an entirely new subjective mental representation, but rather only has to make a link to an already established mental representation introduced by the perspective marker in the first clause. In other words, epistemic stance markers in the first clause make it clear that the first clause is a claim and thereby create the expectation that the next clause will be an argument for this claim. The empirical findings of Canestrelli et al. (2013) and Traxler et al. (1997) suggest an overlap between specific subjective connectives and perspective markers in their function of instructing readers on the degree of subjectivity of the relation. The question is whether this holds true for perspective markers in general, including all types of stance markers, or only pertains to markers of epistemic stance. Epistemic stance markers explicate the dimension of reliability/certainty and evidentiality, which directly introduces a source of information. However, attitudinal stance markers and style stance markers introduce a source in an indirect way: by indicating attitudes, feelings and styles of writing/speaking that can be attributed to a source. Although all three types of stance markers presuppose a source of information, they differ in the way in which this source of information is involved. How these perspective markers overlap with connectives marking different degrees of subjectivity may shed light on the relation between subjectivity and perspective marking.
In this paper, we investigate this issue in natural language data. Starting from the assumption that language users will tend to avoid a doubling of information in terms of marking subjectivity in discourse relations, we may expect authors/speakers to observe some pragmatic strategies (e.g. apply Horn's R principle or try to produce an information flow with a Uniform Information Density) to achieve a successful communication (both sufficient and necessary). Avoiding repetition of information in the same dimension fits the R principle as well as the UID. Therefore, in natural language data we may expect connectives marking different degrees of subjectivity to vary in their co-occurrence patterns with perspective markers.
In corpus linguistics, the method of collocational analysis (Evert 2008; Gries & Stefanowitsch 2004) provides insightful information on the context of given linguistic elements. It measures the association strengths between words or expressions, and produces a list of important collocates in attraction or repulsion with a target word. Collocational analysis can advance our knowledge about the properties of a connective on the basis of its contextual features. We therefore conducted a corpus-based study using collocational analyses to examine the use of connectives and perspective markers in discourse, aiming to answer the following research questions: 1) Do connectives of different subjectivity degrees differ in their types of collocates? 2) More specifically, do connectives differ in the types of perspective markers they co-occur with?
We focused on two Chinese causal connectives for which we could derive hypotheses from the literature. Kejian 'so' is mostly used in the epistemic domain (Li et al., 2013), indicating that the causal reasoning arises from someone's mind; it encodes the epistemic stance apart from its discourse function of causally connecting two segments. Such subjectivity information is underspecified with the generic connective suoyi 'so', which can be used in both objective and subjective relations (Li et al. 2013). On the basis of Horn's theory of speaker economy, kejian can be expected to co-occur less with perspective markers of the epistemic stance than suoyi. Since neither the specific subjective kejian nor the generic connective suoyi encode attitudinal or style stance, no differences in collocation tendencies are expected between the connectives for the other two types of perspective markers.

Method
We conducted a series of distinctive collocates analyses on the two Chinese causal connectives suoyi 'so' and kejian 'so', with the aim of investigating the contextual features of the two connectives. Regular collocational analyses allow researchers to calculate association strengths between target words and their collocates. Distinctive collocates analyses (Church et al., 1991) are a specific type of collocational analysis: they allow for a direct comparison of the contexts of two semantically similar words (a word pair), identifying collocates that prefer to appear in the context of one word over the other word from the pair. With this type of analysis, words with high association scores are not associated with the target word in a general sense, but only if they are attracted more to this target word than to a reference context (i.e., in this study the alternative connective). This type of analysis has become especially popular for lexical alternatives in specific constructions (i.e., distinctive collexeme analysis or distinctive collostructional analysis, see Gries & Stefanowitsch 2004; Stefanowitsch & Gries 2003). In the current study, we use this method to identify words that tend to 'sit' in the context of suoyi more often than in the context of kejian and vice versa, paying special attention to linguistic elements expressing subjectivity.

Sample of texts
We used a balanced modern Chinese corpus: the CCL corpus (Zhan et al. 2003), which covers a variety of written texts: fiction, newspapers, conferences, translated literature, blogs, etc. The total size of the CCL corpus is 581,794,456 characters.
We only investigated actual texts, which led to the exclusion of dictionaries. To make sure all the texts were homogeneous in terms of mode (written), we also excluded oral sources (spoken), TV sources (written to be spoken), etc. From the remainder of the corpus, we selected texts from three types of genres: narrative genres on the one hand, and informative and argumentative genres on the other. Narrative genres included literature, drama, biographies and fiction magazines; informative and argumentative genres included newspapers, legal documents, academic works of natural science and social sciences, governmental reports and other texts labeled as practical writing. The argumentative and informative texts were collapsed as the 'non-narrative genre', because of the low number of argumentative texts available in the CCL corpus.
From the afore-mentioned parts of CCL, we then generated two raw datasets: text files containing all the sentences with the words suoyi or kejian, with a search scope of 200 characters to the left and 200 to the right. This scope was much wider than the length of a sentence so that we would have enough contexts for the analysis on the intended discourse unit.
In line with the parameters of collocation (Gries 2013), we first decided to investigate words as the linguistic units of collocates. Because natural Chinese texts do not have spaces between words, we used the Chinese word segmentation tool NLPIR-ICTCLAS (Zhang et al., 2003; tag: ICT_POS_MAP_SECOND) to mark the word boundaries in the text. In this segmentation system, white spaces were added between words, and words were tagged based on their semantic types. Punctuation marks such as commas, full stops, parentheses and colons were also marked with tags. The word segmentation tool thereby generated segmented and annotated texts for later analysis.
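To make the shape of such segmented and annotated text concrete, the following minimal Python sketch reads one segmented line into (word, tag) pairs. It is an illustration only: the `word/tag` token layout, the function name `parse_tagged` and the tag labels in the usage example are assumptions, not a description of the actual NLPIR-ICTCLAS output used in this study.

```python
from typing import List, Tuple


def parse_tagged(line: str) -> List[Tuple[str, str]]:
    """Split a segmented line into (word, tag) pairs.

    Assumption: tokens are separated by white space and written as word/tag.
    A token without a slash is kept as a word with an empty tag.
    """
    pairs = []
    for token in line.split():
        # Split on the LAST slash so that a slash inside the word is preserved.
        word, sep, tag = token.rpartition("/")
        if not sep:  # no slash found: the whole token is the word
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs


# Hypothetical example line; the tags are placeholders, not the real tag set.
example = "所以/c 他/r 认为/v ,/w"
print(parse_tagged(example))
```

Keeping punctuation as ordinary tagged tokens, as in this sketch, is what makes it possible to use punctuation marks as clause boundaries in later steps.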
In terms of the distance between collocates, a collocate did not need to be directly adjacent to the connective. Any words appearing within one clause before or one clause after the connectives were considered collocates. Instead of adopting an arbitrary number of words as the context, we set the context of the target word in such a way that it was meaningful at the discourse level: discourse clauses were taken as the units for analysis.

Sample of connective fragments
From the two segmented datasets of all sentences containing suoyi or kejian, we compiled a sample of connective fragments. This step was necessary, because suoyi does not only occur as a connective, but can also be used in an inversion construction zhisuoyi 'why there is a consequence of'. For the word kejian, we can observe a clear grammaticalization process in progress (Liu & Yao 2011; Q. Zhang 2012). There are cases in which kejian is used as a verb, sometimes resulting in modified constructions such as qingxi kejian 'clearly can see', and there are cases where kejian is clearly a connective, or where the use of kejian is ambiguous. In order to exclude the clear verbal cases of kejian and all the inversion constructions zhisuoyi, we conducted a two-step screening process in our sampling. First, we restricted the sample of target items to cases preceded by a punctuation marker (namely comma, full stop, question mark, exclamation mark, semicolon, or ellipsis) in the software AntConc_3.4.4.0 (Anthony 2016). This screening process filtered out verbal uses of kejian such as qingxi kejian 'clearly can see', as well as cases of kejian which are preceded by prepositional phrases such as youci kejian 'from this can see'. After the rough automatic screening process, 67,147 sentences with suoyi and 3,902 sentences with kejian were included for further analyses.
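The logic of the automatic screening step can be illustrated with a short Python sketch. This is not the AntConc procedure itself: it assumes the text has already been split into a list of word tokens, and the punctuation set `CLAUSE_PUNCT` and the function name `connective_positions` are hypothetical choices made for the illustration.

```python
from typing import List

# Assumed punctuation set, covering the marks named in the text (comma, full stop,
# question mark, exclamation mark, semicolon, ellipsis) in full-width and ASCII form.
CLAUSE_PUNCT = {",", "。", "?", "!", ";", "……", ",", ".", "?", "!", ";", "..."}


def connective_positions(words: List[str], connective: str) -> List[int]:
    """Return the indices where `connective` directly follows a punctuation token.

    An occurrence at index 0 has no preceding token and is therefore not returned.
    """
    return [
        i
        for i in range(1, len(words))
        if words[i] == connective and words[i - 1] in CLAUSE_PUNCT
    ]


# Toy example: the first kejian follows youci 'from this' and is skipped;
# the second one follows a comma and is kept as a connective candidate.
tokens = ["由此", "可见", ",", "可见", "他", "很", "忙", "。"]
print(connective_positions(tokens, "可见"))
```

A filter of this kind only yields candidates; as described in the text, the remaining verbal uses of kejian still have to be removed by hand.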
We then manually checked the remaining sentences marked by kejian, in order to exclude all other verbal instances of kejian. The verbal status of kejian could easily be derived from the absence of the main verb in the clause headed by kejian. For example, in (4), interpreting kejian as a connective with the meaning 'so' would only leave a noun phrase as the remainder of the second clause: the status of German cars in the minds of Chinese. By contrast, interpreting kejian as a verb 'can see', results in a grammatical clause, because in Chinese, the subject can be dropped. Hence, only full sentences such as (5) were included in the analyses of the connective use of kejian.
(4) Deguo chan de dazhong, Aodi and Benchi zhanyou hen da de bili, kejian deguo chan de qiche zai zhongguoren xinmuzhong de diwei.
    Germany produce MOD Volkswagen, Audi and Benz occupy very big MOD proportion, kejian 'from this can see'/*kejian 'so' Germany produce MOD car in Chinese mind MOD status.
    'The German products Volkswagen, Audi and Benz take a big proportion (of Chinese market), from this we can see/*so the status of German cars in the mind of Chinese people.'
(5) Yi ge neng zhide yi tou niu de jiaqian, kejian nashihou shiliu zai woguo haishi xihan wu.
    One CL can worth one CL cow MOD price, kejian 'so' that-time pomegranate in our-country still-is rare thing.
    'One (pomegranate) was worth the price of a cow, so pomegranate was still very rare in our country at that time (in Ancient China).'
All in all, the automatic and manual screening process excluded 20,096 cases of suoyi and 10,900 cases of kejian. Table 1 shows the resulting distribution of suoyi and kejian in the narrative and non-narrative texts in the sample.

Three sets of distinctive collocates analyses
The actual collocate analyses were conducted using the software R (R Core Team 2015) with the R package mclm_0.1 (Speelman 2018). The method of distinctive collocates analysis was applied three times. We first applied it to a context of one clause before and one clause after the connective, irrespective of genre. To obtain a proper context containing exactly one clause before and one clause after the connective, we automatically searched the closest punctuation markers (including comma, full stop, question mark, exclamation mark, semicolon, and ellipsis) around the target (a connective preceded by a punctuation marker). With this first analysis, we obtained a general picture of the words in collocation with one connective compared to the other. Second, we explored the collocates of the two connectives in their preceding context and following context separately, so that contextual features could be located more precisely. However, the distinctive collocates of suoyi versus kejian may be different depending on the genre they appear in, because the narrative genre is supposed to be more descriptive (e.g., describing events and actions), while the non-narrative genre is expected to be more argumentative. Therefore, in the third analysis, we took genre into account, distinguishing the collocational patterns in the narrative genre on the one hand, and in the informative and argumentative genres on the other.
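The context definition described above (one clause before and one clause after the connective, delimited by the nearest punctuation marks) can be sketched in Python as follows. The sketch is an illustration under stated assumptions rather than the procedure implemented in the study: it works on a plain list of word tokens, and `CLAUSE_PUNCT` and `clause_window` are hypothetical names.

```python
from typing import List, Tuple

# Assumed punctuation set, covering the marks named in the text
# in full-width and ASCII form.
CLAUSE_PUNCT = {",", "。", "?", "!", ";", "……", ",", ".", "?", "!", ";", "..."}


def clause_window(words: List[str], idx: int) -> Tuple[List[str], List[str]]:
    """Return (preceding clause, following clause) for the connective at words[idx].

    Assumption: words[idx - 1] is the punctuation mark in front of the connective.
    """
    # Preceding clause: walk left from the token before that punctuation mark
    # until the previous punctuation mark (or the start of the text).
    start = idx - 2
    while start >= 0 and words[start] not in CLAUSE_PUNCT:
        start -= 1
    before = words[start + 1 : max(idx - 1, 0)]

    # Following clause: walk right from the token after the connective
    # until the next punctuation mark (or the end of the text).
    end = idx + 1
    while end < len(words) and words[end] not in CLAUSE_PUNCT:
        end += 1
    after = words[idx + 1 : end]
    return before, after


tokens = ["天", "很", "冷", ",", "他", "病", "了", ",", "所以", "他", "没", "来", "。", "后来"]
before, after = clause_window(tokens, 8)
print(before)  # tokens of the clause preceding suoyi
print(after)   # tokens of the clause following suoyi
```

Returning the two clauses separately means the same function serves both the combined analysis (pool `before` and `after`) and the analysis per clause (count them separately).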
The attraction and repulsion strength between a given word and the target connectives is measured by association scores (Evert 2008, Gries 2013), which are calculated on the basis of observed frequencies (O11, O12, O21, O22) and expected frequencies (E11, E12, E21, E22) in a contingency table (Table 2). For each word (target word) in the corpus that appeared at least once in the context of kejian or suoyi, the mclm R package computes O11 (i.e. target word instances in the target context), O12 (non-target words in the target context), O21 (target word instances in non-target contexts), and O22 (non-target words in non-target contexts), as well as the corresponding expected frequencies. In the current study, we selected G2 (the log-likelihood measure, G2 = 2 Σij Oij log(Oij/Eij), Evert 2008), a statistical measure that is one of the most frequently used measures in collocational analyses. It is robust to differences in sample size, and compares observed frequencies and expected frequencies for each of the words, taking into account the amount of evidence. We selected the top 100 items ranked according to G2 values (see the Appendix). Since G2 reports association strengths without indication of their direction, the top 100 collocates contain both words in strong attraction with the target word suoyi (i.e. in repulsion to kejian) and words in strong repulsion to suoyi (i.e. in attraction with the reference word kejian). The dir (direction) values provided by the R package were used to judge whether a word was attracted to suoyi (positive) or repelled by suoyi and hence attracted by kejian (negative). The Delta-P value (Delta-P = O11/(O11+O12) - O21/(O21+O22), Gries 2013) is an effect size measure: it expresses the difference between the relative frequency of the target word in one context and that in the other context.
In this study, Delta-P value was used as a secondary criterion for the collocates: the words in attraction to suoyi all needed to be above the threshold of 0, and the words in repulsion to suoyi (the collocates of kejian) needed to be below this threshold (<0). The Delta-P measure was applied because it is considered more psycholinguistically realistic, as it takes into account the directionality of the collocation: 'whether the word1 is more predictive of word2 or the other way round' (Gries 2013: 141).
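As a worked illustration of the two measures, the following Python sketch computes G2 and Delta-P from the four observed frequencies, using the formulas given above. It is not the mclm implementation: the function name `association` is hypothetical, the expected frequencies are assumed to be derived from the row and column totals in the usual way (row total × column total / grand total), and the direction is assumed here to be the sign of O11 − E11.

```python
import math
from typing import Tuple


def association(o11: int, o12: int, o21: int, o22: int) -> Tuple[float, float, int]:
    """Return (G2, Delta-P, direction) for one word.

    o11: target word in the target context     o12: other words in the target context
    o21: target word in the non-target context o22: other words in the non-target context
    Assumes both contexts are non-empty (row totals greater than zero).
    """
    n = o11 + o12 + o21 + o22
    r1, r2 = o11 + o12, o21 + o22  # row totals (contexts)
    c1, c2 = o11 + o21, o12 + o22  # column totals (target word vs. other words)

    observed = [o11, o12, o21, o22]
    # Assumption: expected frequency = row total * column total / grand total.
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]

    # G2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing.
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

    # Delta-P = O11 / (O11 + O12) - O21 / (O21 + O22)
    delta_p = o11 / r1 - o21 / r2

    # Assumed direction: +1 attraction to the target context, -1 repulsion, 0 neither.
    direction = (o11 > expected[0]) - (o11 < expected[0])
    return g2, delta_p, direction


# Made-up counts, for illustration only.
print(association(30, 70, 10, 90))
print(association(10, 90, 30, 70))
```

Swapping the two contexts, as in the second call, leaves G2 unchanged but flips the sign of Delta-P and of the direction, which is why a direction value is needed alongside the G2 ranking.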

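Table 2. Contingency table of observed and expected frequencies

                     Observed frequencies                        Expected frequencies
                     Target word   Non-target word   Totals      Target word   Non-target word
Target context       O11           O12               O11 + O12   E11           E12
Non-target context   O21           O22               O21 + O22   E21           E22
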
An efficient and common way to interpret the outcomes of a collocational study is to cluster the collocates manually and draw meaningful interpretations based on these clusters (see Gries & Stefanowitsch 2010). In the current study, we were able to identify seven clusters within the top 100 collocates: pronouns, communication verbs, cognition verbs, modal verbs, and three types of perspective markers, namely exclamatory adverbials, expressions of expectation and expressions of importance. The results section will focus on these items; a full list of collocates for each distinctive collocates analysis can be found in the Appendix.

Results
In this section, we discuss the results of the three distinctive collocates analyses. Section 3.1 illustrates the general collocation patterns of the two connectives. Section 3.2 compares the collocates of the two connectives in the clause preceding the connective and the clause following the connective. A genre-specific analysis shown in Section 3.3 reveals the collocations in different genres.

General analysis
The top 100 collocates (either attracted by suoyi or attracted by kejian) were categorized according to their semantic types. Since our goal was to find out whether language users avoid overlap in the expression of subjectivity in their utterances, we checked the top 100 for linguistic elements that can be related to subjectivity and perspective marking. Table 3 shows the collocates of suoyi that are relevant to our discussion, with their observed and expected frequencies and the G2 scores indicating the distinctiveness of particular collocates in the context of suoyi compared to the context of kejian.
From the top 100, certain types of words stood out as significant collocates of suoyi, the connective that is underspecified in terms of subjectivity. An important cluster is formed by pronouns of all types (singular and plural, 1st, 2nd and 3rd person). On the one hand, pronouns can be linked to objective relations in which actors carry out certain actions for certain reasons. On the other hand, they can be used in subjective relations in which the pronouns refer to the individuals whose perspective is presented. Therefore, we are not sure whether the higher number of occurrences of pronouns in the context of suoyi compared to the context of kejian should be attributed to a contextual feature of the objective relations that suoyi can express, or to the tendency to avoid doubling of subjectivity information in the context of kejian. This is much clearer for the other clusters that are attracted by suoyi, but repelled by kejian: communication verbs, cognition verbs and modal verbs. Both communication verbs and cognition verbs can express the epistemic stance of the speaker, to be specific, the evidentiality of the information. Modal verbs indicate the author's/character's degree of certainty towards the proposition, which is also one of the dimensions of epistemic stance. This observation can be accounted for in terms of subjectivity. With suoyi in the sentence, the subjectivity information is underspecified. If subjectivity needs to be expressed, cognition verbs (marking evidentiality) and modal verbs (marking certainty) are used to help readers/hearers track the source of information. In kejian contexts, however, cognition verbs and modal verbs would be repetitive for readers/hearers: kejian already implies that someone is making the inference (normally, the speaker), so their use would repeat information on subjectivity.
The strong association between suoyi and the communication verb shuo 'say' is indicative of the pattern we try to establish. An alternative explanation would be that this collocation is due to the high frequency of the expression suoyi shuo 'so (I) say'. This expression has been segmented as two separate words by NLPIR-ICTCLAS, but in combination, it functions as a discourse marker that expresses the epistemic stance of the speaker. However, the cases in which suoyi and shuo are not separated by any other linguistic elements only account for 6.36% of the data (782 out of 12301 instances). This leaves many instances in which the communication verbs contributed to the expression of the epistemic stance of the speaker, as in example (6). Still, our data show that communication verbs were not exclusively used in epistemic contexts; they could also be used for reporting an objective description of real-world events, as in example (7). Therefore, we cannot be sure of the reason for the collocation of communication verbs and suoyi. This collocation pattern could be due to the speaker/author's strategy to avoid repetition of subjectivity information in subjective relations, just as for the cases with cognition verbs. Alternatively, communication verbs could be a feature of the context typical of the objective relations expressed by suoyi.
Some of the words in the top 100 list were repelled by suoyi and should therefore be seen as distinctive for kejian instead of suoyi, as illustrated in Table 4. As mentioned in Section 2, we included all kinds of indications of subjectivity, irrespective of their grammatical categories. The noun jiazhi 'value' was clustered with the adjective zhongyao 'important', because jiazhi 'value' is often associated with evaluations that are made from a person's perspective.
The exclamatory adverbials, expressions of expectation and expressions of importance can be related to subjectivity: they indicate that someone's feeling or evaluation is involved, and that the hearer/reader is not merely dealing with a description of real-world facts. These collocational patterns indicate that language users do not necessarily avoid a doubling of information, as both kejian and these collocates express that subjectivity is involved. However, from this list of collocates of kejian, it can also be derived that language users do pay attention to the type of subjectivity information, in other words how the perspective of a speaker/character is involved. While the important collocates of suoyi (cognition verbs, communication verbs and modal verbs) could be related to epistemic stance marking, the important collocates of kejian (expectation markers and importance markers) can be related to attitudinal stance marking. Hence, there is no doubling of epistemic stance marking information, the crucial type of subjectivity expressed by the connective kejian.

Collocational analysis on different clauses
Given the general information on the contextual features in the analysis across clauses and genres, we obtained a basic understanding of the types of collocates that appear in the context of kejian and suoyi. However, we do not know from the overall analysis where exactly these collocates appeared: do they appear in the clause preceding the connective, or in the clause following the connective? By precisely identifying the locations of different types of collocates, we can be more informed on how language users combine different linguistic cues to express subjectivity in discourse. Moreover, for further psycholinguistic experiments, collocation distributions by clause provide insights into how linguistic stimuli should be designed to closely reflect authentic linguistic data. The current section therefore elaborates on the distribution of collocates in different clauses. Table 5 and Table 6 show the collocates of suoyi and kejian we derived from the top 100 in preceding clauses and in following clauses. Most of the general collocation patterns also held in the analysis per clause, except for the communication verbs. In both preceding and following clauses, pronouns, cognition verbs and modal verbs co-occurred with suoyi. Most of these perspective markers may serve to supplement the subjectivity information supplied by suoyi, regardless of whether they appear before or after the connective. Examples (8) and (9) illustrate the combined use of suoyi 'so' and the perspective marker renwei 'believe', which can appear in both the clause before and the clause after the connective. An important difference with the general collocation pattern is that the communication verbs appeared as important collocates of suoyi only in the clauses following this connective. This means that for the co-occurrence with such reportative verbs, no significant difference between suoyi and kejian can be found in the first clause.
A clear-cut difference between the collocates of kejian in preceding and following clauses is suggested in Table 6. Exclamatory adverbials and expressions of importance were only distinctive for kejian in the clauses following this connective. This finding may be due to the tendency to express an evaluation in the second clause in a forward causal relation: the evaluation of importance is expressed in the second clause based on the events/phenomena described in the first clause, as is illustrated in (10).
(10) Mothers who are anxious and disturbed after pregnancy are more likely to suffer dystocia and deliver abnormal infants, so paying attention to mental health is very important during pregnancy.
Expressions of expectation only appeared as important collocates of kejian in the preceding clause. These linguistic elements express an attitude of the speaker towards the situation described in the first clause, such as in example (11): the author is surprised by the fact that Wang Jian, a general, won the battles both in the south and in the north. In contrast to the general collocation pattern, some instances of communication verbs were found as important collocates of kejian instead of suoyi in the preceding clause. Most of them are formal expressions, which are more characteristic of formal contexts such as informative and argumentative texts. Comparing the findings in Tables 5 and 6, we see that communication verbs can be collocates of either suoyi or kejian, depending on the degree of formality of the specific communication verb. Formal communication verbs such as cheng 'state' and yue 'say' patterned with kejian, while informal communication verbs such as shuo 'say' patterned with suoyi. Therefore, it is not possible to identify a uniform pattern in the co-occurrence of communication verbs in relation to the degree of subjectivity expressed by the connective.
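The per-clause procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the tokenised sentences are invented toy examples in pinyin rather than corpus data, the marker list is a hypothetical stand-in for the collocate clusters, and the function names are illustrative assumptions. The sketch splits each sentence at the connective and counts marker occurrences separately for the preceding and the following clause.

```python
from collections import Counter

def split_at_connective(tokens, connective):
    """Split a token list into the parts before and after the first connective."""
    idx = tokens.index(connective)
    return tokens[:idx], tokens[idx + 1:]

def count_markers_by_clause(sentences, connective, markers):
    """Count marker tokens separately in preceding and following clauses."""
    preceding, following = Counter(), Counter()
    for tokens in sentences:
        if connective not in tokens:
            continue  # skip sentences without the target connective
        before, after = split_at_connective(tokens, connective)
        preceding.update(t for t in before if t in markers)
        following.update(t for t in after if t in markers)
    return preceding, following

# Invented toy input (NOT corpus data): pre-tokenised pinyin sentences.
toy_sentences = [
    ["ta", "jingran", "ying", "le", "kejian", "zhunbei", "hen", "zhongyao"],
    ["ta", "jingran", "lai", "le", "kejian", "ta", "hen", "renzhen"],
    ["wo", "renwei", "tai", "wan", "le", "suoyi", "wo", "zou", "le"],
]
# Hypothetical marker list standing in for the collocate clusters.
markers = {"jingran", "zhongyao", "renwei"}

kejian_pre, kejian_post = count_markers_by_clause(toy_sentences, "kejian", markers)
print(dict(kejian_pre))   # {'jingran': 2}
print(dict(kejian_post))  # {'zhongyao': 1}
```

In a real analysis, the resulting per-clause counts would feed into the association measure rather than being inspected directly.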

Collocational analysis on different genres
The results discussed so far may be the result of a confound with the genre preference of the connectives under investigation. Suoyi is a generic connective that can be used for all types of genres, while kejian is not frequent in narrative texts (cf. Table 1). Moreover, several of the collocate clusters found in Sections 3.1 and 3.2 may be a side-effect of genre preferences as well. For example, communication verbs can be expected to appear more in the narrative genre, just like pronouns. Therefore, communication verbs and pronouns may pattern with suoyi simply because they all share the preference for the narrative genre. To neutralize the influence of genre as a confounding factor, we further examined the collocation of the two connectives in different genres, namely narratives and non-narratives. The collocation distributions of these two connectives with other linguistic elements in different genres are summarized in Table 7 and Table 8.
Table 7. Important collocates of suoyi in different genres
Pronouns were observed as important collocates of suoyi in both types of genres, which indicates that this collocation pattern is not a side-effect of the genre preference of suoyi. Cognition verbs also appeared as important collocates of suoyi in both types of genres. Although the exact collocates differ per genre, they all expressed the same cognitive state of knowing and thinking. In addition, modal verbs were still distinctive collocates for suoyi in both narratives and non-narratives. Therefore, we may infer that the collocation of the generic connective suoyi with cognition verbs and modal verbs is not due to genre differences. We did find a difference with the general collocation pattern, however. Contrary to our hypothesis, communication verbs were found not to be significant collocates of suoyi in narratives, although they were still important collocates in non-narratives (of which 559 cases (8.44%) were instances of suoyi immediately followed by shuo).
Apparently, suoyi and kejian do not differ in their preference for co-occurring with communication verbs in the narrative genre, but only in the non-narrative genre. Even though the ratios of observed versus expected frequencies differ from the ones in the general analysis, the top 100 items still display similar collocation patterns for kejian. As Table 8 indicates, exclamatory adverbials and expressions of expectation stayed distinctive for kejian in both narratives and non-narratives, although there were some differences per item. These perspective markers on the attitudinal stance dimension of expectedness are more strongly associated with kejian than with suoyi across genres. Expressions of importance only appeared as important collocates of kejian in non-narrative genres.
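The per-genre comparison relies on ratios of observed versus expected frequencies. The following Python sketch shows one common way to compute such a ratio from a two-way table, with one connective as target and the other as reference level. It is an assumption that the expected frequency is derived from the pooled counts in this way; the exact association score used in the study is not restated in this section, and all counts below are invented for illustration.

```python
def observed_expected_ratio(word_with_target, target_size,
                            word_with_reference, reference_size):
    """Ratio of observed to expected frequency of a word in the target contexts.

    The expected frequency assumes the word is spread over both context sets
    in proportion to their sizes (in tokens).
    """
    word_total = word_with_target + word_with_reference
    corpus_total = target_size + reference_size
    expected = word_total * target_size / corpus_total
    return word_with_target / expected

# Invented counts (NOT from the study), per genre:
# (word freq. with target, target context size,
#  word freq. with reference, reference context size)
toy_counts = {
    "narrative": (30, 1000, 10, 1000),
    "non-narrative": (20, 2000, 20, 2000),
}

ratios = {genre: observed_expected_ratio(*c) for genre, c in toy_counts.items()}
print(ratios)  # {'narrative': 1.5, 'non-narrative': 1.0}
```

A ratio above 1 indicates that the word occurs more often in the target contexts than expected. Computing it separately per genre shows whether a pattern holds within narratives and non-narratives rather than only in the pooled data.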

General Conclusion and Discussion
The current study explored whether language users try to avoid doubling of subjectivity information in discourse, specifically in coherence relations. On the basis of distinctive collocates analyses, we examined whether the Chinese connectives kejian and suoyi, which differ in the degree of subjectivity they express, differed in their types of collocates, and especially whether they differed in the types of perspective markers they co-occurred with. In line with our predictions, the degrees of subjectivity encoded in the two connectives were related to the types of linguistic cues in their contexts. In Section 4.1, we will summarize and discuss the general patterns we found; in Section 4.2, we will discuss our main findings per clause (preceding or following the connective) and genre; and in Section 4.3, we will discuss the limitations of our study and put forward some suggestions for future research.

General collocation patterns in line with pragmatic principles and UID
In general, the underspecified connective suoyi 'so', which can express both subjective and objective relations, patterned with more occurrences of cognition verbs and modal verbs in comparison to the specific subjective connective kejian 'so'. In the context of kejian, we found more exclamatory adverbials, expressions of importance and expressions of expectation compared to the context of suoyi as a reference level.
The collocation results showed that perspective markers, as a general type of linguistic cue marking subjectivity, can be used in combination with either of the two causal connectives. However, when perspective markers were categorized into sub-types with regard to the various dimensions of subjectivity, different collocation patterns surfaced. Suoyi turned out to collocate with epistemic stance markers more often, while kejian co-occurred with attitudinal stance markers.
The collocation pattern of epistemic stance markers and suoyi is consistent with Horn's pragmatic Relation principle (reducing the speaker's production effort) and Quality principle (reducing the hearer's comprehension effort). From the perspective of the R principle, if subjectivity information on the epistemic stance (including (un)certainty and evidentiality) is already specified in the connective kejian, epistemic stance markers in the context of the connective are redundant, i.e. not efficient in terms of speaker economy. Suoyi, by contrast, does not provide sufficient information on the epistemic stance, and the use of epistemic stance markers therefore provides valuable information that compensates for the lack of subjectivity information in suoyi. In this way the Q principle is observed, which should facilitate the hearer's/reader's comprehension process.
The collocation results can also be well explained by the Uniform Information Density Theory account. With the two alternative connectives expressing discourse coherence, the presence of epistemic stance markers (e.g. cognition verbs, modal verbs) makes the content of the context highly expectable (high probability and low information), which is why it is more likely to have an underspecified connective, suoyi in this case. On the other hand, utterances with fewer occurrences of epistemic stance markers make the content conveyed by the context unexpected (low probability and high information). In this sense, the use of a specific connective is preferred. The prevalence of epistemic stance markers in the context of suoyi and their lower co-occurrence with kejian fit the need for a uniform information density throughout the sentence in terms of subjectivity. Optimal information density is realized in this way.
However, speakers/authors did not avoid overlap in the expression of subjectivity at all costs. Some attitudinal stance markers such as jingran 'surprisingly' and zhongyao 'important', which also indicate the involvement of a speaker responsible for an evaluation, occurred as important collocates in the context of kejian. Both epistemic stance markers and attitudinal stance markers express that a source of information is involved. Apparently, in their use of kejian, which also indicates that a source of information is involved, speakers and writers do not avoid overlap with that same information provided by attitudinal stance markers. However, kejian does not overlap with attitudinal stance markers in the subjectivity dimension it expresses, i.e. in expressing how the source of information is involved. The fact that kejian patterns with attitudinal but not with epistemic stance markers indicates that language users try to avoid overlap in the expression of these dimensions. The connective kejian and epistemic stance markers both indicate how certain the speaker/writer is about the information, while attitudinal stance markers express the attitude or feelings of a person towards the information. In terms of UID, the combination of attitudinal stance markers and kejian does not create high information density in the utterance. Taken together, the two observations above help to explain why attitudinal stance markers and kejian were found in collocation; they show a kind of agreement of subjectivity at the discourse level, jointly contributing to a subjective context.
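The UID reasoning in this section rests on the standard information-theoretic link between probability and information: the information carried by an element is its surprisal, the negative logarithm of its probability in context. The Python sketch below illustrates this link and one simple way to quantify how evenly information is spread over an utterance. The variance measure and all probabilities are assumptions made for illustration; they are not estimates from the corpus and do not model the two connectives directly.

```python
import math
from statistics import pvariance

def surprisal_bits(probability):
    """Information content in bits: high probability means low information."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

def density_profile(probabilities):
    """Return per-element surprisals, their total, and their variance."""
    values = [surprisal_bits(p) for p in probabilities]
    return values, sum(values), pvariance(values)

# Invented probabilities (NOT corpus estimates) for two two-element utterances.
uneven = density_profile([0.5, 0.125])  # one expected, one unexpected element
even = density_profile([0.25, 0.25])    # information spread evenly

print(uneven)  # ([1.0, 3.0], 4.0, 1.0)
print(even)    # ([2.0, 2.0], 4.0, 0.0)
```

Both toy utterances carry the same total information (4 bits), but the second distributes it evenly, which is the kind of profile that UID predicts speakers prefer.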

Collocation patterns in different genres and clauses
To test for potential genre influences on the results, we performed collocational analyses on narratives and non-narratives separately. These analyses showed that communication verbs patterned with suoyi in the non-narrative genre, but not in narratives. This asymmetry might be due to different usage patterns of communication verbs in these genres. As illustrated in Section 3.1, communication verbs can be used to express an epistemic stance, as in example (6), or as a reportative verb to introduce a description of real-world events, as in example (7). In the non-narrative genre, we would expect a higher frequency of epistemic communication verbs. The fact that communication verbs stood out as important collocates of suoyi in this genre is in line with the avoidance of doubling of information (as illustrated in Table 7): suoyi has a greater need for epistemic markers to strengthen the epistemic nature of the utterance than kejian, which encodes such information by itself. In narratives, with their abundance of descriptions of real-world events, however, we would expect a higher number of reportative communication verbs, which do not create a doubling of information with the information provided by kejian when they are used to report objective events in one of the clauses connected by kejian. This might explain why communication verbs do not stand out as collocates of suoyi in narratives, although this explanation needs to be corroborated in future corpus research on the actual usage of different types of communication verbs in narrative and non-narrative genres.
Apart from communication verbs, all other types of collocates of suoyi in the general analysis still surfaced as important in both narrative and non-narrative genres. Although the individual collocates of each cluster vary slightly per genre, the collocation of cognition verbs, modal verbs and pronouns with suoyi was robust across genres. As for kejian, exclamatory adverbials and expressions of expectation also appeared in both narratives and non-narratives as important collocates, which suggests that the perspective markers related to expectations were indeed an important contextual feature of kejian.
In order to locate the positions of each type of collocate in causal relations, we analyzed the preceding clauses and the following clauses of the connectives separately. Most of the perspective markers that are collocates of suoyi appeared in both the clauses preceding the connective and the clauses following it, except for communication verbs. The collocates of kejian, however, differed in the position in which they appeared. Expressions of expectation appeared as important collocates of kejian in the preceding clause, which makes sense because in a subjective relation with an argument-claim structure, expressions of expectation such as jingran 'surprisingly' in example (11) mark the speaker's surprise, either about the propositional content of the clause preceding the connective, or about the argument-claim relation as a whole. Both exclamatory adverbials and expressions of importance tended to appear with kejian in the clause following the connective. These perspective markers served as expressions of the speaker's attitude towards the claim presented in the second segment of the relation.
Communication verbs exhibited very different collocation patterns depending on the clause they occurred in. In the preceding clause, formal communication verbs did not surface as important collocates of suoyi, but rather patterned with kejian more often (example (11)). In the following clause, the tendency was reversed: communication verbs only surfaced as collocates of suoyi, such as in example (6). As example (12) shows, the communication verb yue 'say' co-occurred with kejian mainly in very formal texts, in which kejian was found more often. Therefore, this collocation pattern could be attributed to an effect of formality. On the basis of the current explorative study, we cannot draw a decisive conclusion on this issue. Further studies on the use of communication verbs in different contexts are needed, especially to find out whether our ideas about formality and about the objective versus the subjective use of communication verbs can be corroborated.
(12) (Master Linji, a Buddhism master) da yue: ruguo yi kou qi bu lai, zhe routi haiyou ganqing ma? Kejian qinggan buzai routi shang, er zai lingxing shang.
(Master Linji) answer say: if one CL breath NEG come, this body have emotion? CONJ emotion NEG at body, but at spirituality on.
(Master Linji) said in response: if one doesn't breathe anymore, does the body still have emotions? So emotion is not in the body, but rather in the spirit.

Future studies and conclusion
In closing, we would like to discuss some limitations of this study. First, the current study only comprises a small set of connectives, which enabled us to provide an in-depth analysis. Future studies could extend this set to include other causal connectives. Interesting candidates for further distinctive collocates analyses seem to be yushi and yin'er (both meaning 'so/therefore'); Li et al. (2013) have shown that these causal connectives differ in the area of volitionality (i.e., is the causal relation an intentional one or not?). It would be interesting to see whether collocation patterns vary with this feature as well. Second, the sentences in causal relations were retrieved directly from the corpus without any manual annotation of the relation type, which would have taken even more time and effort. Kejian is mainly used for subjective relations, while suoyi is generic (Li et al. 2013). This means that the sample of suoyi contexts contained both subjective and objective relations, while the contexts of kejian mainly consisted of subjective relations. The unbalanced distribution of relations in the contexts of the two connectives may be a confounding factor. For instance, the fact that pronouns were distinctive for suoyi may be a feature of objective relations, because the descriptions of events and acts in objective relations may involve the use of pronouns. Nonetheless, the major findings such as the fact that modal verbs and cognition verbs are important collocates of suoyi are not characteristic of objective relations at all. We would expect stronger distinctive collocation patterns of these expressions with suoyi if we had limited the scope of the investigation to subjective relations only. More fine-grained analyses are expected to shed a clearer light on this issue.
Third, collocational analyses only provide rough tendencies in word use in the context of a target word. They cannot support decisive inferences, such as the predictability of one word given another. To further investigate the relation between connectives and their collocates, one could turn to regression analyses to investigate whether the presence of certain words in the context correlates with the presence of specific connectives, or one could opt for experimental research to investigate the effects of perspective markers on the processing of connectives.
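The regression analysis suggested above could take the form of a logistic regression that predicts the connective from the presence of a perspective marker in its context. The following Python sketch fits such a model by plain gradient ascent, using only the standard library. The data are invented for illustration (they are not counts from the corpus), the coding of the outcome (1 for suoyi, 0 for kejian) is an assumption, and a real analysis would normally use an established statistics package and include further predictors such as genre and clause position.

```python
import math

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent
    on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient w.r.t. the intercept
            g1 += (y - p) * x    # gradient w.r.t. the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Invented toy data (NOT corpus counts).
# x = 1 if an epistemic stance marker is present in the context, else 0.
# y = 1 if the connective is suoyi, 0 if it is kejian (assumed coding).
xs = [1] * 10 + [0] * 10
ys = [1] * 8 + [0] * 2 + [1] * 3 + [0] * 7

intercept, slope = fit_logistic(xs, ys)
print(round(intercept, 2), round(slope, 2))  # -0.85 2.23
```

With a single binary predictor, the fitted slope equals the log odds ratio of the two-way table (here the natural logarithm of about 9.33, i.e. about 2.23). A positive value means that the marker's presence raises the odds of suoyi relative to kejian in the toy data.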
A fourth limitation is that collocational analyses cannot distinguish word forms with the same syntactic tag but with multiple meanings. A relevant case in our corpus concerns modal verbs, some of which can be used in either a deontic or an epistemic way. Although both types of modals can have a subjective/interpersonal function (Lyons, 1977), they differ in the meaning they encode: deontic modals express obligations and permissions, while epistemic modals concern beliefs (Foley & Van Valin, 1984; Johnson-Laird & Ragni, 2019; Verstraete, 2001). Our claims about modal verbs as linguistic means to express perspective would pertain to epistemic modals in particular. Examining whether the modal verbs in our corpus are used in the deontic or the epistemic way, however, would require a manual screening process. This seems to be an interesting avenue for further research.
Lastly, for practical reasons we made no distinction between argumentative and informative genres. These two genres have certain features in common in which they differ from narratives. For instance, both argumentative and informative genres have the author as the illocutionary force in most of the cases, while in narrative texts other characters are also frequently involved as the illocutionary force. However, argumentative genres also differ from informative genres in several respects: the use of communication verbs, for instance, may be different between argumentative texts and informative texts. Separate analyses of the argumentative genres and the informative genres can provide a more refined picture.
Despite these limitations, we can conclude that the explorative approach in the current study has produced a number of interesting insights into the way in which subjectivity is expressed in discourse. What is more, this study has illustrated the relation between connectives and perspective markers, demonstrating that this will be a productive area for future research to pursue.