Topics and emotions in Russian Twitter propaganda
First Monday

by Dan Taninecz Miller



Abstract
The increasing importance of social media to political communication means the study of government-sponsored social media activity deserves further exploration. In particular, text-as-data techniques like topic models and emotional lexicons provide potential for new types of content analysis of large collections of government-backed social media discourse. Applying text-as-data methods to a corpus of Russian-sponsored Twitter data generated before, during and after the 2016 U.S. presidential election shows tweets containing a diverse set of policy-related topics as well as levels of angry and fearful emotional language that peak in close association with the election. Text-as-data techniques show Russian-sponsored tweets mentioned candidate Clinton overwhelmingly negatively and referenced candidate Trump in a positive but less consistent manner. The tweets contained large minorities of apolitical topics, and also saw higher levels of conservative hashtags than progressive ones. The tweet data contain a contradictory set of topics on all “sides” of the political spectrum alongside increases in fearful and angry language in temporal association with the U.S. election. The findings of this inquiry provide evidence that the tweets were sent to heighten existing tensions through topically heterogeneous propaganda. They also caution against an overly black-and-white interpretation of Russian disinformation efforts online.

Contents

Introduction
Russian disinformation policy and social media
Research questions
Data
Methodology
Topic models, results
Topic models, interpretation
Emotional lexicon analysis results
Discussion and conclusion

 


 

Introduction

“In all chaos there is a cosmos, in all disorder a secret order.”
— Carl Jung

Social media and electoral democracy

Social media venues are increasingly recognized as central discursive elements of the electoral processes of democracies. The 2016 U.S. presidential election demonstrates this clearly, as election day 2016 saw Twitter as the largest source of breaking news in the country with 40+ million tweets generated (Isaac and Ember, 2016). Worldwide, Twitter has been adopted by politicians across a wide swath of cultural and social backgrounds as an effective way to communicate with constituents. Social media venues like Twitter are also increasingly recognized as platforms for nation states to produce and distribute disinformation, particularly during election periods.

Unsurprisingly, social media discourse during election periods has received increased study related to political participation and to the “traction” between social media support and electoral success (Adamic and Glance, 2005; Diakopoulos and Shamma, 2010; Bekafigo and McBride, 2013; Carlisle and Patton, 2013; DiGrazia, et al., 2013). Some studies have noted that social media may influence political participation and engagement in both online and off-line contexts (Dimitrova, et al., 2014; Johnson, et al., 2010; Zhang, et al., 2013). Academics have used social media to predict election outcomes with error terms close to traditional polls (Tumasjan, et al., 2011) and social media may also hold the power to influence individual choices and preferences (Aral and Walker, 2010). Importantly, researchers have also raised concerns that the popularity and effectiveness of social media as political arena may allow for the spreading of propaganda and the manipulation of public opinion (Howard, 2006; El-Khalili, 2013; Woolley and Howard, 2016; Shorey and Howard, 2016). Automated or semi-automated social media accounts are beginning to be studied for their relationship to democratic discourse, and the potential they hold for disruptive impacts on discussion in open societies (Bessi and Ferrara, 2016). Scholars have demonstrated not only the potential power of such agents in the social media system but also the need for additional methodological tools to analyze these large qualitative datasets in repeatable ways.

While other studies interrogate social media data to discover general public communication by citizens (Larsson and Moe, 2014), the extent and characteristics of bot communications and the diffusion of information between bots and humans (Bessi and Ferrara, 2016), or to analyze the “soft influence” potential of online forums for cross-national political actors (Zelenkauskaite and Balduccini, 2017), this paper concerns itself with a policy-centric study of social media data and reaches new conclusions about the topical and emotional composition of Russian Federation agents’ discussion on Twitter during the 2016 U.S. presidential election.

If, as research suggests, political actors increasingly view digital platforms as a primary form of policy communication as they provide traction for policies (Stieglitz and Dang-Xuan, 2013), it holds that policy preferences can be induced from social media messaging. The primary role of social media for campaigns and governments means this data can be utilized as an inductive lens to understand the policy objectives of state actors. This approach is particularly useful when state policy objectives are obfuscated and/or clandestine in nature, and when datasets associated with one political actor present themselves. If political actors, including nation states, are treating social media as a platform for policy dissemination, analysis of the content (here, individual tweets) produced by state actors on online platforms can be helpful in understanding what a state actor’s desired policy outcomes would be. Such efforts are facilitated by advances in computational methods capable of replicably analyzing the large-n corpora of qualitative online data generated by nation state participation in social media.

 

++++++++++

Russian disinformation policy and social media

Twitter data regarding the 2016 U.S. presidential election provides an excellent case study for this kind of state-affiliated social media analysis. The 2016 campaign was not only remarkable for the popularity of Twitter as a discursive political arena, but also because of the degree to which this arena (and indeed the election itself) was compromised by state-backed actors. Indeed, the current consensus of the U.S. intelligence community now suggests Russian “meddling” in the election unequivocally occurred [1].

Twitter and NBC news have also publicly, and in testimony to Congress, tied Russian interventions on the platform to the Russian Federation. Specifically, Twitter has presented research tying tweets on the platform to the Russia-based and state-affiliated Internet Research Agency or Агентство интернет-исследований (Edgett, 2017; Popken, 2018). The Internet Research Agency (IRA) has been assessed by the U.S. intelligence community (here, the Federal Bureau of Investigation, Central Intelligence Agency, and National Security Agency) as an organization of “professional trolls located in Saint Petersburg” (U.S. National Intelligence Council, 2017). The overall assessment of the U.S. intelligence community is that the IRA is tied to the Russian president through its financier who is “a close Putin ally with ties to Russian intelligence” (U.S. National Intelligence Council, 2017). A Special Counsel indictment notes “The organization (the IRA) sought, in part, to conduct what it called ‘information warfare against the United States of America’ through fictitious U.S. personas on social media platforms and other Internet-based media” [emphasis mine] (U.S. Department of Justice, 2018a). U.S. intelligence community reports note the Russian influence campaign was ordered directly by the president of the Russian Federation, and was also in part focused on undermining public faith in the democratic process (U.S. National Intelligence Council, 2017). For their part, former IRA employees who have spoken on the record note their confidence that the organization is “absolutely” connected to the Russian state (Popken and Cobiella, 2017). The March 2018 Twitter/NBC News corpus of over 200,000 tweets analyzed here demonstrates one element of this cohesive program on the part of the Russian Federation to interfere in the U.S. electoral debate.

This paper presents a data inquiry that resulted in new conclusions about the topical and emotional composition of Russian Federation agents’ discussion on Twitter during the 2016 U.S. presidential election. The inquiry described here uses topic models to inductively view latent topics based on the co-occurrence patterns of words within documents. This allows the researcher to “discover topics from the data, rather than assume them” (Roberts, et al., 2014). Topic modeling to understand underlying concepts also “both improves efficiency (new representation takes up less space) and eliminates noise (transformation into topics can be viewed as noise reduction)” (Řehůřek and Sojka, 2010). This feature of the method is demonstrated in various literatures to date, including analysis of climate change media coverage (Jiang, et al., 2017), as a predictor of country level conflict (Mueller and Rauh, 2017), for the categorization of speeches in the U.S. Congress (Quinn, et al., 2010), and to quantify discussions in the central bank committee of the Bank of England (Hansen, et al., 2018).

Applying topic model methods to the Twitter corpus indicates Russian-backed Twitter intervention from 2014–2017 had relatively high proportions of anti-Clinton/pro-Trump topics, though it should be noted that topics related to Clinton can be interpreted as more consistently negative than those regarding Trump are positive. Complicating matters, the corpus is also concerned with discussing both conservative and progressive policy objectives, though hashtag use is overwhelmingly conservative-focused in proportional terms. Topic models indicate a high degree of topical heterogeneity and suggest Russian efforts sought to simultaneously provide support for discussion of a wide variety of contradictory political topics while also prioritizing goals around the candidates in the race. In other words, the Russian-backed Twitter corpus is not only topically diverse but also displays concrete foci or goals relating to language around candidates Clinton and Trump.

This inquiry also includes the application of emotional lexicons to the corpus to allow for comparison of the emotional value of word choice within tweets against a time period coinciding with the 2016 presidential election. This paper uses the English Regressive Imagery Dictionary (RID) designed by Martindale (1990, 1975), which consists of roughly 3,150 words and roots “assigned to 29 categories of primary process cognition, 7 categories of secondary process cognition, and 7 categories of emotions” (Martindale, 1975). These emotions include “positive affect”, “anxiety”, “sadness”, “affection”, “aggression”, “expressive behavior,” and “glory”. Evidence supporting the Martindale Lexicon can be found in Martindale (1990, 1975); Martindale and Dailey (1996); Reynes, et al. (1984); Martindale and Fischer (1977); West, et al. (1983); West, et al. (1985); and, West and Martindale (1988). This paper also utilizes the National Research Council Canada (NRC) Emotion Lexicon. The NRC lexicon is a list of English words and their associations with the eight emotions of anger, fear, anticipation, trust, surprise, sadness, joy, and disgust (Mohammad and Turney, 2013). The NRC Emotion Lexicon annotations were manually coded via crowdsourcing through Amazon’s Mechanical Turk. This coding process was developed to help reduce variance between coders for word-emotion associations. Further usage of the lexicon can be found in Mohammad and Yang (2011), Mohammad (2011), and Mohammad and Turney (2010).
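To make the word-to-emotion structure of these lexicons concrete, the sketch below builds a toy dictionary object of the same shape using the quanteda R package employed later in this paper. The category names mirror those described above, but the word entries are illustrative examples only, not the actual RID or NRC contents, which must be obtained from their authors under their own terms.

```r
# Toy dictionary in the shape of a word-emotion lexicon (illustrative entries only;
# not the actual Regressive Imagery Dictionary or NRC Emotion Lexicon)
library(quanteda)

toy_emotion_dict <- dictionary(list(
  anger = c("hate", "attack", "furious"),
  fear  = c("threat", "panic", "terror"),
  joy   = c("celebrate", "delight"),
  trust = c("honest", "reliable")
))
```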

What follows is a brief overview of our research questions, data, and methods. After establishing research questions, this paper presents two topic model output results for the Twitter corpus. Following this, the results of the model are interpreted. Next, several emotional lexicons are used to provide further information on the data. Finally, the paper concludes with a discussion of possible explanations for the topic model and emotional lexicon outputs.

 

++++++++++

Research questions

  • What topics make up the largest proportions of the Russian-backed Twitter corpus? What policies, individuals, and social movements were most discussed in the tweets?

  • What, if any, emotional and/or semantic trends are evident in the Russian-backed Twitter corpus over time?

  • Taken together, what do the topics and emotional/semantic themes of IRA/Russian tweets say about the policy goals of the actor(s) that supported their creation?

To address these research questions an inductive analysis of the tweets within this corpus has been conducted here for a simple reason: the aim of this paper is to provide insight into the topical and emotional content and characteristics of Russian-backed Twitter data, content which was unknown at the initiation of this inquiry. As such, this paper forgoes formal hypotheses in order to a: better represent a naive starting point with the data; and, b: avoid post hoc theorizing. In the interest of research transparency it should be noted there were no formal expectations of specific topics or emotional content patterns when this analysis began.

 

++++++++++

Data

Data collection for this inquiry consisted of obtaining the publicly available Twitter/NBC News corpus of IRA/Russian Federation associated tweets from July 2014 to September 2017. The Russian-backed IRA tweets, and not the account handles themselves, are the primary level of analysis in this inquiry. This data was made publicly available by NBC News in collaboration with three unnamed sources familiar with Twitter’s development system in March 2018, and consists of some 200,000 tweets from 3,814 Twitter accounts associated by Twitter with the Russia-based Internet Research Agency [2]. The same underlying Twitter accounts, originating from several third parties as well as Twitter’s internal analysis of Russia-linked accounts, were also presented in Twitter testimony to the U.S. Congress in October and November 2017 (U.S. Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism, 2017).

The third parties that originally alerted Twitter to the IRA linkage are not named specifically in the Congressional testimony, but Twitter’s counsel notes “The IRA tips we got were from news organizations in 2015 and then also a third-party company we used to do deep Web monitoring to give us threat information” (U.S. House of Representatives, Permanent Select Committee on Intelligence, 2017). Twitter’s counsel also notes “[...] we’ve been fighting these types of issues for a while. We saw, in 2015, IRA activity and took large scale action against those accounts and shared that information with other companies at the time.”

Twitter does note its investigators have traced Russia-linked accounts in the past through detection of “unusual activity” including patterns of tweets, “likes”, and “follows” (U.S. Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism, 2017). Twitter also notes they employ internal, manual reviews conducted by employees. This proprietary process also incorporates user reports to “calibrate” detection tools to identify spam. Twitter notes specifically that they “rely on objective, measurable signals” like the timing of tweets to classify a given action as automated (U.S. Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism, 2017).

As no single characteristic can reliably determine the geographic origin of a tweet, Twitter utilized several criteria, including origin of account creation, user registration tying the account to a Russian phone carrier or a Russian e-mail address, user display names showing Cyrillic characters, frequency of user tweets in the Russian language, and whether the user has ever logged in from any Russian IP address (U.S. Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism, 2017). As such, Twitter assumed an account to be Russian-linked if the account had a Russian e-mail address/mobile number/credit card/login IP; or, the Russian Federation was the declared country on the account; or, Russian language or Cyrillic characters appeared in the account information or name. The 3,814 user accounts whose tweets are examined here were linked to the IRA through this information, through the aforementioned deep Web monitoring, through press/journalist tips to Twitter, and through analyzing specific purchases of promoted tweets during the election period (U.S. Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism, 2017; U.S. House of Representatives, Permanent Select Committee on Intelligence, 2017).

The author is not aware of any speculation that the data used here is unreliable. However, prior to engaging in this inquiry it is important to address whether the Twitter data represents IRA-linked content in order to bolster concept validity. Indeed, no dataset is perfect, a thorough understanding of underlying information is key to any research project, and there are several caveats to be made with the NBC/Twitter corpus. In particular, it should be noted that Twitter lacks complete transparency in releasing full datasets related to controversial issues like Russian interference. Twitter did not itself release the tweet content related to IRA activity (only a list of user names), as the firm has a policy of deleting controversial posts entirely. As such, the Twitter/NBC News corpus, like many other Twitter corpora, relies on third parties to access the information and/or re-attach Twitter IDs to actual tweets. Twitter’s terms of service (TOS) does not allow full datasets of tweets to be given to a third party, but the TOS does allow datasets of tweet IDs to be shared. From these tweet IDs, full tweet content can be re-attached to the appropriate account. Twitter also does not provide full code or detailed metrics for their identification process, nor do they appear to disclose the names of their subcontractors.

In this context researchers must unfortunately also do without additional corroborating evidence that may be either classified by the U.S. government, kept private by Twitter itself, and/or protected by NBC News (who are also understandably keen to protect sources). For example, the individuals who connected the Twitter user names to actual tweet data are anonymous, and the practice of connecting user names to user data is a common enough practice in journalism and academic research to make further identification impossible. Ultimately, the NBC/Twitter dataset used in this inquiry represents not only Twitter’s public efforts to disclose Russian meddling on the site but also the work of others to connect user names to underlying tweets.

With the above data sourcing realities in mind, there are several important logical corroborators for the dataset. First, the underlying user names in this corpus were presented in testimony to the U.S. Congress by a legal professional. Second, organizations and researchers ranging from the Atlantic Council to academic researchers at Clemson University have utilized this data or related datasets for analysis. The Digital Forensics Lab at the Atlantic Council notes Twitter “maintains high confidence [the data are] associated with the Russian Internet Research Agency [...]” (Digital Forensic Research Lab, 2018). Third, follow-up coverage of this dataset indicates very few of the accounts in question were real/human U.S. accounts misidentified as Russian-backed. Analysis by researchers suggests perhaps only 20 accounts out of the data were actually real U.S. individuals [3].

Finally, concept validity is also supported by Twitter’s Congressional testimony, stating nine percent of tweets from the original 2,752 accounts were “election-related”, with roughly 47 percent being automated (U.S. Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism, 2017). As such, we can also be confident the tweet data examined here was not simply selected by Twitter because of content relationships or matches to the U.S. election itself.

 

++++++++++

Methodology

To utilize these tweets for topic model and emotional lexicon analysis several standard steps had to be undertaken in R. Data cleaning was an initial requirement as this textual data (and Twitter data in particular) was characterized by high proportions of non-word content and typos. As noted elsewhere, data of this type typically needs to undergo extensive preparation before it can be fed into a topic model algorithm (Schmiedel, et al., 2018).

This said, one of the major advantages of performing text analysis in R is the ability to combine different analytics packages and streamline data cleaning. Additionally, the analysis conducted is inherently replicable. This paper utilizes the Quanteda and STM packages, along with several emotional lexicons, to produce analysis of the data. To prepare the tweet data, it first must be imported from the .csv file into R (R natively supports reading flat text files such as .csv). After confirming the data was properly encoded, the tweets were converted from the .csv file into a Quanteda corpus using the Quanteda package. This allowed for metadata to be read by R as separate variables and is also one opportunity for standard “preprocessing” measures to be taken (Welbers, et al., 2017). This corpus was then converted into a document-feature matrix (DFM), Quanteda’s version of the conventional document-term matrix. This step allows for data analysis with matrix and vector algebra and converts the character text into numeric values (Welbers, et al., 2017).

A DTM/DFM was created from a Quanteda corpus because this allowed the associated metadata and document-level variables (including tweet-specific year-month-day/hour:minute:second time values) to be preserved in the DFM. Subsequently, R packages could determine how often a given term is used in each document, date/time values for each document, etc. Common stopwords were also removed during the DFM conversion process. The resulting DFM also removed punctuation and converted all letters to a lower-case form. Symbols and hyperlink-related syntax were also removed for clarity. Tokens (originally words) were stemmed using the SnowballC package (Bastin and Bouchet-Valat, 2014) through Quanteda. Additionally, topic model analysis using only the STM package (forgoing Quanteda) was conducted to utilize further estimators for k (topic number) values.
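The original preparation scripts are not reproduced in this paper, but a minimal sketch of the pipeline just described might look as follows in R (the file name and text column are assumptions about the NBC/Twitter .csv):

```r
library(quanteda)

# Read the flat .csv of tweets (file and column names are assumed)
tweets_raw <- read.csv("ira_tweets.csv", stringsAsFactors = FALSE)

# Build a quanteda corpus; remaining columns become document-level variables
tweet_corpus <- corpus(tweets_raw, text_field = "text")

# Tokenize and clean: drop punctuation, symbols, and URLs; lower-case;
# remove common stopwords; stem (quanteda wraps SnowballC for stemming)
tweet_tokens <- tokens(tweet_corpus,
                       remove_punct   = TRUE,
                       remove_symbols = TRUE,
                       remove_url     = TRUE) |>
  tokens_tolower() |>
  tokens_remove(stopwords("en")) |>
  tokens_wordstem()

# Document-feature matrix; docvars (including tweet timestamps) are preserved
tweet_dfm <- dfm(tweet_tokens)
```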

This analysis then diverges into two data approaches that build on Boumans and Trilling’s (2016) characterization establishing “counting and dictionary”, “supervised machine learning”, and “unsupervised machine learning” as the three current methodological schemes for text analysis. Their study orders these three approaches from most deductive to most inductive, respectively. They convincingly argue dictionary/lexicon analysis employs the use of a priori coding schemes (words to emotions) while unsupervised learning tools use algorithms to induce meaningful patterns from text data. Here, topic models are used to draw patterns from the co-occurrence of words in the Twitter data [4], while emotional lexicons are deployed as pre-established patterns to sit on top of the Twitter data [5].

Topic model-based research generally refers to the number of topics within the analysis as a value of k . This k value is the only element of the unsupervised method set by the user, and represents the total number of topics requested by the user to summarize the corpus. This inquiry began by using a series of k values (in a variety of initializations and labeling outputs), examining k=5, k=10, k=20, k=50, k=56, k=60, and k=100 models in the exploratory stage of research. These k levels were chosen to reflect current topic number ranges listed in relevant literature to assess the overall topical content at different topic number levels (Schmiedel, et al., 2018; Roberts, et al., 2018) [6]. Producing multiple k value models was also done to assist in reading and establishing the topical characteristics of the corpus for the researcher. After initial modeling and comparison, the author chose to exclusively use the “spectral” initialization recommended by the STM package vignette as it (in general) outperforms both Latent Dirichlet Allocation via collapsed Gibbs sampling as well as random initialization (Roberts, et al., 2018).
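A simplified sketch of this exploratory stage, continuing from the DFM built above and converting it to STM's input format, could look like the following (the original runs also varied initializations and labeling outputs):

```r
library(stm)

# Convert the quanteda DFM to STM's input format, keeping document metadata
stm_input <- convert(tweet_dfm, to = "stm")

# Fit one model per candidate k value used in the exploratory stage
k_grid <- c(5, 10, 20, 50, 56, 60, 100)
exploratory_fits <- lapply(k_grid, function(k) {
  stm(documents = stm_input$documents,
      vocab     = stm_input$vocab,
      data      = stm_input$meta,
      K         = k,
      init.type = "Spectral",
      verbose   = FALSE)
})
```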

Obviously, including summaries of all the topic models listed above in this paper would produce a final report of both great length and high repetition. As such, the k=56 model was chosen to summarise this corpus for a variety of reasons explained briefly below. First, the k=56 model fits well within the already referenced best practices in topic modeling literature for a corpus of this size. Second, this model was informed by STM’s “k=0” functionality that uses Lee and Mimno’s (2014) t-SNE/anchor words algorithm, along with the spectral initialization, to construct plausible k values for a given data space size (note however the STM package stresses this is not a “correct” k initialization, merely one that is well-fitted to the data space). This initialization process is not deterministic, and thus will not necessarily produce the same number of topics/content within topics in each run. As such, the author ran multiple k=0/spectral models, noting the resulting range of models to be centered roughly around the k value chosen here. In other words, 56 topics was a value that fell within a range of multiple similar estimates made by the package to fit the data well. Third, a k=56 model also corresponds well to estimates for this dataset resulting from metrics developed by both Arun, et al. (2010) and Griffiths and Steyvers (2004). For these reasons utilizing the k=56 model is well-supported in this application.
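In code, the k=0/spectral runs and the final model might be sketched as below (the number of repeated runs is an illustrative assumption):

```r
# Repeated K = 0 spectral runs: STM proposes a plausible k via Lee and Mimno (2014)
k0_fits <- lapply(1:5, function(i) {
  stm(documents = stm_input$documents,
      vocab     = stm_input$vocab,
      data      = stm_input$meta,
      K         = 0,
      init.type = "Spectral",
      verbose   = FALSE)
})
sapply(k0_fits, function(m) ncol(m$theta))   # range of proposed topic numbers

# Final model at the chosen value
fit_56 <- stm(documents = stm_input$documents,
              vocab     = stm_input$vocab,
              data      = stm_input$meta,
              K         = 56,
              init.type = "Spectral",
              verbose   = FALSE)
```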

This paper presents the k=56 model using the “highest probability” and “score” labeling algorithms. The highest probability labeling algorithm is inferred directly from the topic-word distribution parameter, β. The score labeling algorithm is a matrix containing the log probabilities of seeing word v conditional on topic k. Highest probability and score labeling algorithms were used here as they routinely generated only non-junk topics for all topics in the model. In other words, these configurations were least prone to constructing topics with non-word data remaining in the corpus despite preprocessing. This inquiry structures analysis of both labeling outputs in two ways. First, we present the results of the models. Second, we provide subject matter expert interpretation of the models in keeping with the literature of topic models (Schmiedel, et al., 2018) [7]. Both results and interpretation sections also include description and interpretation of words and hashtags included in the topic labels generated by the models.
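Extracting these two label sets from a fitted STM model is straightforward; a sketch continuing from the fit_56 object above:

```r
# Highest-probability and score word labels for each of the 56 topics
labels_56 <- labelTopics(fit_56, n = 10)
labels_56$prob    # highest-probability labels (drawn from the beta parameter)
labels_56$score   # score labels (log-probability weighted)
```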

The second element of this analysis focuses on the use of emotional lexicons with the tweet data to discern potential emotional and semantic trends over the July 2014 to September 2017 time period. This analysis concerns the emotional and semantic values of the language within the IRA tweets as measured by the Martindale and NRC lexicons (Martindale, 1990, 1975; Mohammad and Turney, 2013). Emotional lexicon analysis used the same data as the topic model analysis and was conducted by first running the tweets through Quanteda’s document-feature matrix function with one of the lexicons set as a dictionary. This step coded the tweets to the respective emotions and sentiments contained in each of the lexicons. This DFM was then converted to a conventional data frame, and a string value variable indicating tweet creation time was converted into a “date” format. Proportions of emotions and sentiments could then be measured over time. Tweets remain the primary level of analysis in the emotion lexicon section as well.
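A hedged sketch of this lexicon step is given below; nrc_dict stands in for a quanteda dictionary built from one of the lexicons (as with the toy dictionary sketched earlier), and the timestamp column name and format are assumptions about the underlying .csv.

```r
library(quanteda)
library(dplyr)
library(lubridate)

# Dictionary pass on unstemmed, lower-cased tokens (lexicon entries are full words)
lex_tokens <- tokens(tweet_corpus, remove_punct = TRUE, remove_url = TRUE) |>
  tokens_tolower()
emotion_dfm <- dfm(lex_tokens) |>
  dfm_lookup(dictionary = nrc_dict)

# Convert to a data frame and attach tweet creation times (field name assumed)
emotion_df <- convert(emotion_dfm, to = "data.frame")
emotion_df$created_at <- ymd_hms(docvars(emotion_dfm, "created_str"))

# Monthly proportions of anger- and fear-associated words among matched words
emotion_cols <- featnames(emotion_dfm)
emotion_monthly <- emotion_df |>
  mutate(month         = floor_date(created_at, "month"),
         matched_total = rowSums(across(all_of(emotion_cols)))) |>
  group_by(month) |>
  summarise(anger_prop = sum(anger) / sum(matched_total),
            fear_prop  = sum(fear)  / sum(matched_total))
```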

 

++++++++++

Topic models, results

What are the most important topics within the Russian-backed Twitter corpus coinciding with the 2016 U.S. presidential election? To help answer this question several visualizations of the k=56 model are useful (see Figures 1 and 2). Figure 1 shows the model in the “highest probability” labeling algorithm format (inferred directly from the topic-word distribution parameter, β), while Figure 2 shows the model in the “score” format (a k by v matrix containing the log probabilities of seeing word v conditional on topic k).
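Summaries like Figures 1 and 2 can be drawn directly from the fitted model; a minimal sketch (the published figures were styled separately):

```r
# Expected topic proportions with each labeling algorithm, as in Figures 1 and 2
plot(fit_56, type = "summary", labeltype = "prob",  n = 5,
     main = "Highest probability labels")
plot(fit_56, type = "summary", labeltype = "score", n = 5,
     main = "Score labels")
```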

Topic models of the data reveal several general observations. On the whole this is a corpus with a majority of topics referencing political concepts, with the highest probability labeling algorithm having 75 percent and the score labeling outputs having 73.2 percent of topics containing political language as judged by the author [8]. Using the spectral initialization at k=56 the highest probability labeling algorithm shows 42 of 56 topics (75 percent) contain political language (Figure 1 and Appendix 1). At the k=56 level the score labeling algorithm shows 41 out of 56 topics (73.2 percent) contain political language (see Figure 2 and Appendix 2) [9].

Three coding assistants unfamiliar with the corpus contents, assessing the same political/apolitical distinction, actually rated the corpus as significantly less political in content, with the highest probability label seeing a mean of 31 topics (55.4 percent) and the score label seeing a mean of 33 topics (58.9 percent) judged to contain political wording [10]. More discussion of this disparity can be found in the Topic models, interpretation section that follows.

 

Figure 1: Expected Topic Proportions (highest probability labeling algorithm) for Internet Research Agency sponsored tweets, k=56.

 

 

Figure 2: Expected Topic Proportions (score labeling algorithm) for Internet Research Agency sponsored tweets, k=56.

 

Several characteristics of the corpus are relevant in relation to the research questions of this paper. The k=56/highest probability model contains 10 topics (17 percent of the topics) referencing candidate Trump, of which four were identified by the author as being supportive of Trump/the campaign via topic content [11]. The role of the Clinton campaign and Hillary Clinton herself in the corpus in the k=56/highest probability topic model is both prominent and negative. This model contains seven topics (12.5 percent of the topics) referencing candidate Clinton, of which all seven can be identified as being anti-Clinton via topic content.

In the highest probability model, six topics (10.7 percent of the topics) were judged by the author to contain language referring to refugees and border issues, and two of six had negative context. In contrast to the Clinton topics, of the five topics (8.9 percent) referencing President Obama, only one topic was identified as being overtly negative in topic content. Three topics (5.3 percent of the corpus) involve the Black Lives Matter movement and/or Black Power movements, two of which appeared to have positive language context. Seven topics (12.5 percent of the corpus) contain Christian language, with two of seven having positive context. The highest probability model also has two topics (3.5 percent of the corpus) featuring Russia and/or President Putin, though no positive/negative context was clear in topics featuring Russia and/or Putin.

Finally, 14 (25 percent) of the topics in the highest probability output were judged by the author to be apolitical and contain no overtly political language. As noted previously, coders unfamiliar with the data scored this output as 44.6 percent apolitical. This is a pattern closely mirrored in the score output.

The k=56/score configuration shows similar topical distributions, with several subtle but important differences. The k=56/score model contains nine topics (16.07 percent of the topics) judged by the author to refer to candidate Trump, of which four can be identified as being explicitly pro-Trump via topic content. The Clinton campaign/Hillary Clinton in the k=56/score model are also referenced in nine topics (16.07 percent), of which seven were identified as being anti-Clinton via topic content (one of the two topics identifying Clinton that is not explicitly anti-Clinton or pro-Trump appears to reference the Clinton Foundation, a popular negative talking point among some conservative media outlets during the 2016 campaign). In the score output there are an equal number of topics containing language opposing Clinton as there are containing language supporting Trump. In the score configuration candidate Clinton and candidate Trump were also featured in equal numbers of topics proportionally. Notably, the single most prominent topic in each output both contain language focused on the Clinton campaign and/or Hillary Clinton, and both of the most proportionally prominent topics contain context/language that is negative towards the Clinton campaign.

The k=56/score output shows four topics (7.14 percent of the topics) featuring language referencing refugees and border issues, with two of four identified by the author as negative in content. In contrast to the Clinton topics, of the three topics (5.3 percent of the corpus) referencing President Obama, only one topic was identified as being negative towards President Obama in topic content. Again, three topics (5.3 percent) involve the Black Lives Matter (BLM) movement and/or African American political movements, two of which were coded by the author as positive references to the BLM movement. The score model shows four topics (7 percent) contain Christian language, all without positive or negative context. Two topics (3.5 percent) concern Russia and/or President Putin, though no context could be discerned.

As with the previous output, a large proportion of the score model contains apolitical topics. In the score output, 15 topics (26.7 percent) appeared to the author to be apolitical or contain no overtly political language. Observers unfamiliar with the corpus judged it to be composed of 41.1 percent apolitical topics.

Hashtags within the topics show the tweets were more likely to feature conservative and/or Republican content. Conservative hashtags like “#maga”, “#trump”, “#pjnet” [12], “#tcot” [13], “#trumpforpresident”, “#ccot” [14], “#lnyhbt” [15], “#trumptrain”, and “#neverhillary” appear at least 37 times in the highest probability output and 52 times in the score output. In contrast, the highest probability output had at least eight progressive-associated hashtags, while the score output had at least nine such hashtags. Coding for hashtags was conducted only by the author, as accurate hashtag identification requires subject matter expertise and research.
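The hashtag counts reported above were coded from the topic label outputs; as a complementary (and purely illustrative) check, hashtag frequencies can also be tallied directly in the document-feature matrix:

```r
# Keep only hashtag features and list the most frequent ones
hashtag_dfm <- dfm_select(tweet_dfm, pattern = "#*")
topfeatures(hashtag_dfm, n = 25)
```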

As a segue into the next portion of this paper it should also be noted that potentially violent language, including terms like “kill”, “hate”, “attack”, and “protest”, is present in several topics in both topic model outputs. Several racial slurs targeting African Americans are also included in various topics generated from the data.

 

++++++++++

Topic models, interpretation

One of the values of topic models is that they allow subject matter expert review of replicable and transparent building blocks (here, the topic outputs). This section concerns itself with an SME review of the topic model results described above.

First, it is notable that both model outputs contain many topics with no discernible references to political phenomena. SME review of the outputs finds 25 percent and 26.7 percent of topics to be apolitical for the highest probability and score outputs respectively. Review from coders unfamiliar with the corpus shows even higher levels of apolitical topical content, with ratings of 44.6 and 41.1 percent apolitical for the highest probability and score outputs respectively. Readers should note the obscure hashtags, slang, and individuals referenced in the corpus may help explain this difference. For example, this corpus contains references to Trump supporter and pundit Sheriff David Clarke, as well as to the hashtag “#BB4SP” (Barracuda Brigade for Sarah Palin), representing an obscure political personality and a difficult-to-interpret hashtag, respectively. Such complex language may help explain the relatively weak Krippendorff’s alpha coefficients and simple agreement percentages between the coders. As obscure abbreviated references and hashtags are contained in many of the topics in both outputs, low inter-rater reliability (IRR) may indicate that topic models benefit from subject matter expert review and interpretation (Schmiedel, et al., 2018). These IRR scores may also indicate more extensive pre-coding training would be advisable for future similar studies. The esoteric language within the topics, as well as the relatively low IRR scores, suggests that variance between a subject matter expert and interpreters unfamiliar with the data is not necessarily surprising. It is nonetheless interesting that these coders judged the corpus to be, if anything, less political in language content than the author’s assessment did.

The finding that 25–26.7 percent of topics (per SME review) are non-political may be partially explained by the fact that managers of artificial networks of Twitter profiles are likely interested in not appearing to be artificial. Such an interest in appearing human may help partially explain the absence of a binary and unified narrative from the IRA-associated accounts containing only political information. The apolitical content and topics within the IRA Twitter data provide a fruitful further avenue of research and a modulation of our understanding of Russian-linked online interference. As we discuss in the final section of this paper, the IRA’s role as something of a trusted contractor for the Russian state may also explain some of this content (the IRA may have been experimenting with multiple strategies to win government approval and funding, and/or may have had multiple contracts or projects underway at a given time).

In terms of political topics, topic models generate several key findings. While intelligence community investigators of Russian disinformation are likely unsurprised to find President Trump and his campaign featured heavily in this data, it remains noteworthy a corpus of this provenance appears to feature then-candidate Trump as one of the most prominent topics. Interestingly, Trump does not appear in the majority of topics here, indicating the common perspective that Russian meddling was purely or primarily pro-Trump is oversimplified. The topic model analysis included here both replicably corroborates intelligence community accounts and anecdotal social media opinions claiming Russian support for the Trump campaign while at the same time adding nuance and complexity.

Indeed, in the score label output, Donald Trump and the Trump campaign do not exclusively hold the most prominent topical position in the Russian Twitter data. In this output the numbers of topics featuring Clinton and Trump are equal. If we combine frequency of candidate name/hashtag inclusion into a topic with frequency of negative context of a topic, the role of the Clinton campaign and candidate Clinton stands on its own as a feature of the data. It is notable that while there is seemingly a more ambiguous treatment of the Trump campaign and Donald Trump himself in the data, the topics generated by Russian agents regarding Clinton are overwhelmingly negative in word content. Upon SME qualitative review, in both labeling algorithm outputs Russian-backed Twitter topics about Hillary Clinton were more likely to be negative than Russian-backed Twitter topics regarding Donald Trump were to be positive. When judging by the language context of each topic, the score output appears to show a corpus at least as focused on negative language in association with the Clinton campaign as on positive language in association with the Trump campaign.

The findings of this inquiry also complicate analysis of Russian-sponsored online propaganda by indicating the tweet corpus contains a host of both conservative and progressive political topics. An SME qualitative summarization of the top five most proportional topics in each label output helps point this out. The top five most prominent topics by proportion of the total corpus in the highest probability output shows a diverse mix of language content. An SME labeling of these topics based on their content might be stated as: Topic #1 (Topic 13) “Corrupt Clinton and Never Hillary”, Topic #2 (Topic 55) “The Obama Administration and International Relations”, Topic #3 (Topic 20) “Vote Trump, MAGA, fraud and Hillary for Prison”, Topic #4 (Topic 39) “Trump/Clinton, Immigration and Russia”, Topic #5 (Topic 22) “Change the World, ‘Donald’, and Refugees”. The top five topics in this output demonstrate this corpus has multiple overarching policy goals and does not focus exclusively on supporting the Trump campaign.

An SME qualitative labeling of the score topics shows quite a similar picture to the highest probability output. Based on content, SME labeling might be stated as: Topic #1 (Topic 13) “Corrupt Clinton and Never Hillary”, Topic #2 (Topic 55) “The Obama Administration, Congress and International Relations”, Topic #3 (Topic 20) “Vote Trump, Get out the Vote, and Hillary for Prison”, Topic #4 (Topic 39) “Trump/Clinton, Rallies, Immigration and Russia”, and Topic #5 (Topic 22) “Change the World, ‘Donald’, Anti-Islam and Refugees”.

Topics in both the highest probability and score labeling outputs contain language referencing a variety of contradictory political positions. Immigration, refugees, and border issues are all included in topics in both outputs. At the same time, race politics/identity movements, in particular the Black Lives Matter movement, play a prominent policy role in the corpus. Beyond the two “primary” candidate-centered topics, topic models indicate Russian-backed Twitter intervention also supported a topical heterogeneity that must be recognized. Contextually, the host of differing topics in the models makes positing a monolithic policy preference within the data difficult. Both the highest probability and score topic model outputs demonstrate this disinformation effort was both complex and nuanced. For example, while then-candidate Trump is one of the central topics in the Russian tweet corpus, the data also demonstrates negative conversation regarding the Clinton campaign was in some ways as important a topic to the creators of the tweets as positive conversation regarding the Trump campaign.

An important caveat to this assessment is that hashtag usage in the tweets appears to indicate topics in the Russian Twitter data include much higher levels of conservative hashtag use than liberal/progressive hashtag use. Further exploration of variance in policy topics and hashtag utilization is one interesting avenue for future research opened up by this data.

Discussion of increases in angry and fearful language in this corpus will make up the next section of this paper.

 

++++++++++

Emotional lexicon analysis results

The IRA-linked tweets also contain trends in terms of emotional language content and sentiment analysis. The Twitter data contains a year-month-day/hour:minute:second (“Y-M-D/H:M:S”) format time value for each document (tweet), making temporal analysis possible. Using several emotional lexicons on the tweets shows the IRA used increasingly aggressive language on social media during and immediately preceding the 2016 election. Cross referencing all the IRA-tied tweets with a peer-reviewed dictionary containing lists of words associated with particular emotions makes it clear that during September, October, and November of 2016 the overall number of aggressive words in the corpus increased in quantity. Using the Martindale lexicon it is apparent the number of anger-associated words in the corpus was highly concentrated around the temporal window of the election (Martindale 1990, 1975) (see Figure 3).

The Martindale findings are also corroborated by the more recently created, crowdsourced NRC dictionary developed by Mohammad and Turney (2013). Using the NRC lexicon, measures of fear and anger see proportional increases somewhat earlier than the Martindale configuration (see Figures 4 and 5), as well as smaller increases in this type of language coinciding with the election period. Both of the increases in proportion of fearful and angry words occur in the 2016 calendar year. Additionally, both increases in emotional word usage fall within the election cycle and come well after the IRA was confirmed to be closely monitoring and discussing interference in the 2016 election.
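Continuing the lexicon sketch from the Methodology section, monthly series like those in Figures 3 through 5 can be plotted from the aggregated proportions (ggplot2 is assumed; the published figures use tweet-level timestamps rather than monthly bins):

```r
library(ggplot2)

# Monthly proportion of anger-associated words among dictionary-matched words
ggplot(emotion_monthly, aes(x = month, y = anger_prop)) +
  geom_line() +
  labs(x = "Month",
       y = "Proportion of anger-associated words",
       title = "Anger-associated language in IRA-linked tweets over time")
```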

 

Figure 3: Number of aggression-associated words over time (Martindale, 1975). X axis from the Y-M-D/H:M:S value of each tweet document.

 

 

Figure 4: Proportion of anger-associated language in tweets over time (Mohammad and Turney, 2013). X axis from the Y-M-D/H:M:S value of each tweet document.

 

 

Figure 5: Proportion of fear-associated language in tweets over time (Mohammad and Turney, 2013). X axis from the Y-M-D/H:M:S value of each tweet document.

 

 

++++++++++

Discussion and conclusion

What topics make up the largest proportions of the Russian-backed Twitter corpus? What policies, individuals and social movements were most discussed in the tweets? What, if any, emotional and/or semantic trends are evident in the Russian-backed Twitter corpus over time?

The findings detailed here necessitate a more nuanced conceptualization of Russian Twitter interference than as a singularly pro-Trump project. First, topic models indicate many of the prominent topics in the data may be better articulated as “anti-Hillary” than “pro-Trump”. Second, topic models show us the data is topically heterogeneous and contradictory, addresses a variety of political positions and issues, and has a relatively high proportion of apolitical topic content. Anecdotal Twitter-user and U.S. intelligence community information lends credibility to the claims of pro-Trump Russian interference, and indeed the findings of this inquiry support such a conclusion. However, topic model analysis modifies such a perspective in important ways.

Topics generated by Russia-backed accounts on Twitter before and during the 2016 election do feature then-candidate Trump heavily. Hashtag use in Russian-backed discourse is also overwhelmingly focused on politically conservative issues, talking points, organizations and beliefs. However, topic models paint a more complex picture that shows the Clinton campaign as in some ways an equal target of Russian efforts online. Contextually speaking, Russian Twitter propaganda was by some measures more negative towards Clinton than it was positive towards Trump, and in the score label output Clinton’s campaign was an equally important topic in the data. This corpus also contains too high a degree of topical heterogeneity and contradiction to categorize it as simply “pro-Trump” (or for that matter “anti-Clinton”).

It is noteworthy that in conjunction with this topical complexity there are also distinct emotional word usage patterns in the Twitter corpus. This is important because it is entirely reasonable to expect a priori that a corpus allegedly supportive of conservative U.S. politics (attacking Democrats and supporting Republicans being held equal for the moment) would not display spikes of fearful and angry emotional content during the election period. Indeed, because this is a corpus at least somewhat concerned with candidate Trump winning, it is interesting one method used to achieve this goal would include peaked levels of angry/fearful emotional wording. A corpus heavily featuring conservative political hashtags and topics, while also being increasingly fearful and angry, raises further questions. Additional research should be conducted to interrogate possible associations between conservative/progressive topics and particular emotions and sentiments in Russian-backed online messaging.

The Twitter data also suggests an IRA/Russian policy aimed at negatively affecting U.S. political consensus. Such a policy is displayed here through the angry/fearful, topically diverse, and topically contradictory nature of the topics within the data. The Russian/IRA [16] efforts seem to prioritize several favored outcomes while pushing all sides of a debate and encouraging multiple different political viewpoints. The Russian tweets also appear designed to mimic genuine, unaffiliated human users, rather than obviously automated or semi-automated accounts, in order to facilitate this attack on consensus. Such acting may be an important element of the “bot” and/or “troll” Twitter farm operational model in that it allows for covert infiltration of real human networks (Ferrara, et al., 2016). This is further supported by assertions elsewhere claiming accounts presented by Twitter to the U.S. Senate were likely actually “cyborg” accounts at least partially operated by human users (Chu, et al., 2012). These factors help explain the prevalence of apolitical and “small-talk” topics in the corpus; the topics could have been promoted to garner trust from human Twitter users and obfuscate the true origin of the tweets.

While topic models demonstrate a diversity of topics within the corpus and thus suggest a contradictory flow of information, policy and public perception management are of course not just about topics but also about emotions and sentiments. Russian policy frameworks set on confusion and heightened “political intensity” (U.S. Department of Justice, 2018a) are supported and expounded upon through observations of increased angry and fearful wording in the tweets. While this data is limited in several important ways (can a reference distribution of human Twitter users be obtained during this time period and compared to the Russian accounts? Would this distribution show statistically significant differences in such emotions from the Russian distribution? Can we ever be sure a reference distribution is not corrupted by Russian troll/“cyborg” accounts?), it is still important to observe Russian-sponsored tweets containing angry and fearful words increasing in frequency in close proximity to a U.S. election. Previous work has demonstrated political parties exploit online campaigning in a “stop-start” structure centered on election cycles (Gibson, 2004; Larsson and Moe, 2014). In this sense increases in overall Twitter activity should not necessarily surprise us. However, this inquiry A: makes clear the Russia-backed tweets examined here show a similar “stop-start” characteristic within the temporal context of a domestic U.S. election; B: demonstrates such election-centric cycles of Russia-backed messaging may be characterized by increased levels of anger, fear, etc.; and, C: suggests increases in certain types of emotional word usage may be a component of online Russian propaganda activity more broadly.

Taken together, what do the topics and emotional/semantic themes of IRA/Russian tweets say about the policy goals of the actor(s) that supported their creation?

The topically diverse, politically conflicted, and fearful/angry nature of the data also necessitates a more nuanced theoretical explanation than simply “Russian Twitter misinformation was pro-Trump”. The IRA/Russian efforts on Twitter in 2015–2017 are in many ways best explained as part of a broader policy of Russian Federation propaganda emphasizing demoralization, division, and distraction while targeting democratic polities using opportunistic, fragmented, and contradictory messaging. Similar articulations have been convincingly offered, without the social media data context, by scholars like Paul and Matthews (2016). Others have noted Russian Federation policy operates through a series of semi-formal organizing strategies to guide disinformation campaigns, and include the “Firehose of Falsehood” (Paul and Matthews, 2016), “Gerasimov” (Galeotti, 2018), and “Aleksandr Dugin” (Dunlop, 2004) models of intelligence operations.

Aleksandr Dugin, a Russian academic and political advisor whose work The foundations of geopolitics: The geopolitical future of Russia (1997) [17] is seen as a “textbook” for the Academy of the General Staff of the Russian military, believes the Russian State should encourage separatism and unrest in the United States, in particular along racial lines (Dunlop, 2004). Similarly, a “hybrid war” and/or a “Gerasimov Doctrine” approach to conflict aims to “destabilise societies and create ambiguity to hinder decision-making” (European Commission, 2016). These approaches often emphasize social media as a powerful medium of distribution that plays a central role in disinformation operations. Moreover, digital information and communication issues also appear to be key risks mentioned in Russia’s military doctrine (Embassy of the Russian Federation to the United Kingdom of Great Britain and Northern Ireland, 2015).

Internal memos from the IRA itself note that the organization should “use any opportunity to criticize Hillary and the rest (except Sanders and Trump — we support them)” and that “it is imperative to intensify criticizing Hillary Clinton” (U.S. Department of Justice, 2018a). This analysis suggests anti-Clinton topical emphasis was proportionally one of the more dominant elements of the Twitter campaign. One of the topic models demonstrates the Russian efforts were more anti-Clinton than they were Pro-Trump, and both models show negative discourse around the Clinton campaign as one of the two most prominent topics. This presents an important difference from current scholarly and popular assessments of Russian intervention in U.S. social media.

Numerous topics connected to candidate Trump, along with President Obama and a host of domestic and international policy issues, appear in both topic model outputs from all sides of the political spectrum. Meanwhile, the IRA itself noting a bias towards both Sanders and Trump suggests a deeply pragmatic policy focused on disinformation that does not easily conform to domestic U.S. “left/right” paradigms and supports viewing such social media efforts from a Russian policy/Russian geopolitical interests’ perspective. Stepping back from specific candidate support or immigration policy debates, the Russian effort here has candidate preferences but is also highly concerned with discursive chaos. The corpus examined here lends replicable, social media centered support to the idea that the Russian disinformation approach may not necessarily be most concerned with pushing a particular policy outcome, but instead with the promotion of a host of contradictory outcomes.

The findings of this inquiry are also in dialogue with a growing body of literature in the computational social sciences related to online propaganda, manipulation, and discussion setting. The findings of this paper help establish that similar types of conversation manipulation outlined in previous scholarship are not limited to so-called “young democracies” and can be found even in highly developed representative systems (Zelenkauskaite and Balduccini, 2017). The findings of this inquiry are also important in light of recent scholarship demonstrating automated efforts on Twitter are likely larger and more pervasive manipulations of discourse than previously identified (Bessi and Ferrara, 2016). Woolley and Howard (2016) identify a fundamental need to understand how the data of the individual is used for “political applications”, and the author hopes this study helps add to such an understanding through an exploration of nested Russian policy objectives in social media data.

Further research is certainly needed to attempt to find a reference distribution with which to compare the data used here. Perhaps further research could incorporate causality tests of emotional interactions between the Russian accounts and human users. Interesting work has also been done to use topic models alongside regression analysis for predictive purposes (Mueller and Rauh, 2017), an avenue of research this author is currently pursuing as well. All the same, the spike in IRA-sponsored angry/fearful language, along with the topic content shown here, provides a specific discursive preference within Russian-state associated tweets.

A pervasive electoral communications platform being utilized by a foreign state for its own policy objectives is worthy of a deep, inductive, and repeatable analysis. The research conducted here was not concerned with measuring the impact of Russian interference on American citizens, but rather with what the form of the interference itself may indicate about underlying Russian policy objectives. The IRA has a stated goal of “spread[ing] distrust towards the candidates and the political system in general” (U.S. Department of Justice, 2018a), and the underlying Twitter data examined here affirms this goal while also adding to it in important ways. This type of biased, contradictory, angry, and fearful discourse, as articulated through topic models and emotional lexicons, helps to define Russian social media disinformation and propaganda campaigns.

 

About the author

Dan Taninecz Miller is currently a Doctoral Candidate at the Jackson School of International Studies (JSIS) at the University of Washington. He also holds a Master’s degree from JSIS and a B.A. from Guilford College. Dan’s research interests lie in applying computational social science tools to large corpora of mixed qualitative-quantitative data. His primary area of expertise is China’s outbound investment policy as well as the Chinese political economy more broadly. Outside of his academic work he is a financial investigations professional working primarily on East Asian due diligence research questions.
E-mail: taninecz [at] uw [dot] edu

 

Data availability

The Twitter data set is publicly available. It can be accessed at: https://www.nbcnews.com/tech/social-media/now-available-more-200-000-deleted-russian-troll-tweets-n844731.

 

Software information

R was used for the analysis conducted in this paper. The “stm” and “quanteda” packages were particularly relied upon.
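
The author’s exact scripts are not reproduced in this paper; the following is a minimal, illustrative sketch of how such a pipeline could be assembled with these packages. The file name, column name, and preprocessing choices shown are assumptions for demonstration, not the author’s actual workflow.

# Illustrative sketch only; file and column names are hypothetical.
library(quanteda)
library(stm)

tweets <- read.csv("ira_tweets.csv", stringsAsFactors = FALSE)

# Tokenize, remove punctuation/numbers/stopwords, and stem with quanteda.
toks <- tokens(tweets$text, remove_punct = TRUE, remove_numbers = TRUE)
dfm_tweets <- dfm(toks)
dfm_tweets <- dfm_remove(dfm_tweets, stopwords("english"))
dfm_tweets <- dfm_wordstem(dfm_tweets)

# Convert to stm's input format and fit a k = 56 structural topic model.
stm_input <- convert(dfm_tweets, to = "stm")
model <- stm(documents = stm_input$documents,
             vocab = stm_input$vocab,
             K = 56,
             init.type = "Spectral")

# Inspect highest probability and score word lists (cf. Appendices 1 and 2).
labelTopics(model, n = 20)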

 

Notes

1. U.S. Department of Justice, 2018b. “Grand jury indicts thirteen Russian individuals and three Russian companies for scheme to interfere in the United States political system” (16 February), U.S. Department of Justice, press release 18–198, at https://www.justice.gov/opa/pr/grand-jury-indicts-thirteen-russian-individuals-and-three-russian-companies-scheme-interfere, accessed 10 April 2019.

2. To clarify, the list presented to Congress by Twitter originally included 2,752 Twitter accounts linked by Twitter to the IRA. Twitter later amended this figure to the 3,814 accounts used in this inquiry. More information can be found in a Twitter blog post (Twitter, 2018).

3. Drs. Darren Linvill and Patrick Warren cite this count in several Wired.com articles covering their research on Russian disinformation campaigns. Their claims are in reference to their research on the Twitter data, but the underlying peer-reviewed articles containing these claims were not publicly available at the time of writing. References to the 20-person figure found by Linvill and Warren can be found in Paris Martineau, “Twitter’s dated data dump doesn’t tell us about future meddling,” at https://www.wired.com/story/twitters-dated-data-dump-doesnt-tell-us-about-future-meddling/ and in Alex Calderwood, Erin Riglin, and Shreya Vaidyanathan, “How Americans wound up on Twitter’s list of Russian bots,” at https://www.wired.com/story/how-americans-wound-up-on-twitters-list-of-russian-bots/.

4. More accurately, when using topic models the user assumes that observed words (in a “bag of words” model that ignores word order) are generated by a joint probability of two mixtures: documents are a mixture of topics, and topics are a mixture of words. Each topic is a discrete distribution over words, creating a word-topic matrix. This matrix provides a conditional probability for every word (row) given each latent topic (column). These probability distributions allow an observer to order and rank words by topic and thus to determine the words most commonly used when referring to each topic. For more details, see Ryan Wesslen, “Computer-assisted text analysis for social science: Topic models and beyond,” at https://arxiv.org/abs/1803.11045.
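
Expressed compactly (notation added here purely for illustration), the two mixtures described above imply that the probability of observing word w in document d is

p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k) \, p(z = k \mid d),

where the terms p(w | z = k) are the entries of the word-topic matrix and p(z = k | d) are the document’s topic proportions.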

5. These patterns are invariably subject to critiques of intercoder reliability. However, it should be noted that: (1) human sentiment coding agreement is never 100 percent, and at least one study shows sentiment agreement between human coders at around 82 percent (Wilson, et al., 2005), work that also shows lexicon analysis can match or nearly match this kind of reliability; and (2) one of the emotional lexicons used here is crowdsourced, and thus inherently attempts to address intercoder reliability, while the other has been deployed in multiple studies and has a long peer-reviewed track record.
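
As a sketch of how a crowd-sourced lexicon of this kind can be applied in R, the lines below use the NRC implementation in the “syuzhet” package; the package choice and the ‘tweets’ object are illustrative assumptions, not the author’s exact procedure.

library(syuzhet)
# Count words per NRC emotion category for each tweet text.
nrc_scores <- get_nrc_sentiment(tweets$text)
# Aggregate angry and fearful language across the corpus.
colSums(nrc_scores[, c("anger", "fear")])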

6. One aggregation of topic model studies notes 10–50 topics (k=35), while the STM package documentation references k=60–100 as a “good starting point” for larger corpora. The overall takeaway is that there is no perfect or “correct” k value, and that the resulting models are replicable tools to assist in research, not immutable, “pure”, or otherwise indefectible summarizations of the data.
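
For readers wishing to explore the choice of k themselves, the stm package’s searchK() diagnostics offer one replicable way to compare candidate values; the call below is a sketch that assumes ‘stm_input’ is the converted corpus shown in the software note above.

library(stm)
# Compare held-out likelihood, residuals, semantic coherence, and exclusivity
# across several candidate numbers of topics.
k_search <- searchK(documents = stm_input$documents,
                    vocab = stm_input$vocab,
                    K = c(20, 35, 56, 80, 100))
plot(k_search)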

7. For one conceptual question regarding the interpretation of these models, coders unfamiliar with the corpus were also utilized to rate topics.

8. The author defined political language simply as words associated with prominent politicians, social movements, political campaigns, government policies, political hashtags, nation states, religions, foreign leaders, and political concepts like patriotism. This definition also included U.S. political slang like “snowflake” or “woke”. Such a definition was also presented to the coders unfamiliar with the corpus prior to their assistance.

9. Full topic label lists are available in Appendices 1 and 2 for the highest probability and score models, respectively. Readers concerned with the interpretation given here can refer to the full model outputs.

10. Krippendorff’s alpha (Kα) reliability coefficient values, as well as simple agreement percentages, were calculated for the coders’ political/apolitical ratings. For the highest probability output the Kα coefficient was .734, while the score output value was .587. The simple agreement percentages were 80.4 and 69.6 for the highest probability and score outputs, respectively. Neither Kα value is particularly strong, indicating a relative lack of agreement between coders. Given the inductive structure of this inquiry (and thus the limited coaching/instructions for coders), as well as the highly esoteric nature of the Twitter data (hashtags, slang, etc.), such low Kα scores are not necessarily surprising.
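
A minimal sketch of this calculation, using the “irr” package and a hypothetical ratings matrix in place of the actual coder data, is shown below.

library(irr)
# Rows are coders, columns are topics; 1 = "political", 0 = "apolitical".
ratings <- rbind(coder_a = c(1, 1, 0, 1, 0, 1),
                 coder_b = c(1, 0, 0, 1, 0, 1))
kripp.alpha(ratings, method = "nominal")   # Krippendorff's alpha (Ka)
mean(ratings[1, ] == ratings[2, ]) * 100   # simple percentage agreement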

11. Review from coders unfamiliar with the data was not conducted for the more specific topics present in this inquiry. The author chose to forgo this review because topic content for these topics was self-evident (as in the case of the “Clinton/Trump” coding). Topics either mention Obama, Trump, Black Lives Matter, etc., or they do not. Whether a topic is “political” or “apolitical” is a subjective assessment judged to be improved by repeat coder validation. As always, readers are encouraged to review the topics themselves in Appendices 1 and 2.

12. PJNET, or the Patriot Journalist Network, is a conservative information network with associated Twitter accounts (some potentially bots) that are currently banned from Twitter. PJNET has been tied to disinformation campaigns related to education politics. See Aaron Mak, “Twitter is shutting down a conservative group’s automated tweets,” at http://www.slate.com/blogs/future_tense/2017/10/17/twitter_has_labeled_a_conservative_group_s_automated_tweets_as_spam.html.

13. “TCOT” is an abbreviation for “Top Conservatives on Twitter”.

14. “CCOT” is an abbreviation for “Christian Conservatives on Twitter”.

15. “Let not your heart be troubled” (“LNYHBT”) is a phrase associated with conservative television personality Sean Hannity.

16. Logistically, major operations within the Russian Federation are sometimes coordinated through the presidential administration, but many are carried out by an array of “political entrepreneurs” hoping that their success will “win them the Kremlin’s favor” (Galeotti, 2018). This may also help explain the host of topics within this corpus. Of course, it also provides plausible deniability for the Kremlin as the IRA is not officially a structural element of the state and is more of a favored sub-contractor receiving state funding.

17. Aleksandr Dugin, 1997. Основы геополитики (геополитическое будущее России) (Osnovy geopolitiki: Geopoliticheskoe budushchee Rossii). Moskva: Arktogeja; a brief description can be found at https://en.wikipedia.org/wiki/Foundations_of_Geopolitics.

 

References

Lada A. Adamic and Natalie Glance, 2005. “The political blogosphere and the 2004 U.S. election: Divided they blog,” LinkKDD ’05: Proceedings of the Third International Workshop on Link Discovery, pp. 36–43.
doi: http://dx.doi.org/10.1145/1134271.1134277, accessed 11 April 2019.

R. Arun, V. Suresh, C.E. Veni Madhavan, and M.N. Narasimha Murthy, 2010. “On finding the natural number of topics with Latent Dirichlet Allocation: Some observations,” In: Mohammed J. Zaki, Jeffrey Xu Yu, Balaraman Ravindran, and Vikram Pudi (editors). Advances in knowledge discovery and data mining. Lecture Notes in Computer Science, volume 6118. Berlin: Springer, pp. 391–402.
doi: http://doi.org/10.1007/978-3-642-13657-3_43, accessed 11 April 2019.

Gilles Bastin and Milan Bouchet-Valat, 2014. “Media corpora, text mining, and the sociological imagination — A free software text mining approach to the framing of Julian Assange by three news agencies using R.TeMiS,” Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, volume 122, number 1, pp. 5–25.
doi: https://doi.org/10.1177/0759106314521968, accessed 11 April 2019.

Marija A. Bekafigo and Allan McBride, 2013. “Who tweets about politics? Political participation of Twitter users during the 2011 gubernatorial elections,” Social Science Computer Review, volume 31, number 5, pp. 625–643.
doi: https://doi.org/10.1177/0894439313490405, accessed 11 April 2019.

Alessandro Bessi and Emilio Ferrara, 2016. “Social bots distort the 2016 U.S. presidential election online discussion,” First Monday, volume 21, number 11, at https://firstmonday.org/article/view/7090/5653, accessed 11 April 2019.
doi: http://dx.doi.org/10.5210/fm.v21i11.7090, accessed 11 April 2019.

Jelle W. Boumans and Damian Trilling, 2016. “Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars,” Digital Journalism, volume 4, number 1, pp. 8–23.
doi: http://dx.doi.org/10.1080/21670811.2015.1096598, accessed 11 April 2019.

Alex Calderwood, Erin Riglin, and Shreya Vaidyanathan, 2018. “How Americans wound up on Twitter’s list of Russian bots,” Wired, at https://www.wired.com/story/how-americans-wound-up-on-twitters-list-of-russian-bots/, accessed 11 April 2019.

Juliet E. Carlisle and Robert C. Patton, 2013. “Is social media changing how we understand political engagement? An analysis of Facebook and the 2008 presidential election,” Political Research Quarterly, volume 66, number 4, pp. 883–895.
doi: https://doi.org/10.1177/1065912913482758, accessed 11 April 2019.

Zi Chu, Steven Gianvecchio, Haining Wang, and Sushil Jajodia, 2012. “Detecting automation of Twitter accounts: Are you a human, bot, or cyborg?” IEEE Transactions on Dependable and Secure Computing, volume 9, number 6, pp. 811–824.
doi: https://doi.org/10.1109/TDSC.2012.75, accessed 11 April 2019.

Nicholas Diakopoulos and David Shamma, 2010. “Characterizing debate performance via aggregated Twitter sentiment,” CHI ’10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1,195–1,198.
doi: https://doi.org/10.1145/1753326.1753504, accessed 11 April 2019.

Digital Forensic Research Lab, 2018. “#TrollTracker: Twitter troll farm archives” (17 October), at https://medium.com/dfrlab/trolltracker-twitter-troll-farm-archives-8d5dd61c486b, accessed 11 April 2019.

Joseph DiGrazia, Karissa McKelvey, Johan Bollen, and Fabio Rojas, 2013. “More tweets, more votes: Social media as a quantitative indicator of political behavior,” PLoS ONE, volume 8, number 11 (27 November), e79449.
doi: https://doi.org/10.1371/journal.pone.0079449, accessed 11 April 2019.

Daniela V. Dimitrova, Adam Shehata, Jesper Strömbäck, and Lars W. Nord, 2014. “The effects of digital media on political knowledge and participation in election campaigns: Evidence from panel data,” Communication Research, volume 41, number 1, pp. 95–118.
doi: https://doi.org/10.1177/0093650211426004, accessed 11 April 2019.

John Dunlop, 2004. “Aleksandr Dugin’s Foundations of Geopolitics,” Demokratizatsiya, volume 12, number 1, at http://demokratizatsiya.pub/archives/Geopolitics.pdf, accessed 11 April 2019.

Sara El-Khalili, 2013. “Social media as a government propaganda tool in post–revolutionary Egypt,” First Monday, volume 18, number 3, at https://firstmonday.org/article/view/4620/3423, accessed 11 April 2019.
doi: https://doi.org/10.5210/fm.v18i3.4620, accessed 11 April 2019.

Embassy of the Russian Federation to the United Kingdom of Great Britain and Northern Ireland, 2015. “The military doctrine of the Russian Federation” (29 June), at https://rusemb.org.uk/press/2029, accessed 11 April 2019.

European Commission, 2016. “Security: EU strengthens response to hybrid threats,” press release (6 April), at http://europa.eu/rapid/press-release_IP-16-1227_en.htm, accessed 11 April 2019.

Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, and Alessandro Flammini, 2016. “The rise of social bots,” Communications of the ACM, volume 59, number 7, pp. 96–104.
doi: https://doi.org/10.1145/2818717, accessed 11 April 2019.

Mark Galeotti, 2018. “I’m sorry for creating the ‘Gerasimov Doctrine’,” Foreign Policy (5 March), at https://foreignpolicy.com/2018/03/05/im-sorry-for-creating-the-gerasimov-doctrine/, accessed 11 April 2019.

Rachel Gibson, 2004. “Web campaigning from a global perspective,” Asia-Pacific Review, volume 11, number 1, pp. 95–126.
doi: https://doi.org/10.1080/13439000410001687779, accessed 11 April 2019.

Thomas L. Griffiths and Mark Steyvers, 2004. “Finding scientific topics,” Proceedings of the National Academy of Sciences, volume 101, supplement 1 (6 April), pp. 5,228–5,235.
doi: https://doi.org/10.1073/pnas.0307752101, accessed 11 April 2019.

Stephen Hansen, Michael McMahon, and Andrea Prat, 2018. “Transparency and deliberation within the FOMC: A computational linguistics approach,” Quarterly Journal of Economics, volume 133, number 2, pp. 801–870.
doi: https://doi.org/10.1093/qje/qjx045, accessed 11 April 2019.

Philip N. Howard, 2006. New media campaigns and the managed citizen. New York: Cambridge University Press.

Mike Isaac and Sydney Ember, 2016. “For election day influence, Twitter ruled social media,” New York Times (8 November), at https://www.nytimes.com/2016/11/09/technology/for-election-day-chatter-twitter-ruled-social-media.html, accessed 11 April 2019.

Ye Jiang, Xingyi Song, Jackie Harrison, Shaun Quegan, and Diana Maynard, 2017. “Comparing attitudes to climate change in the media using sentiment analysis based on Latent Dirichlet Allocation,” Proceedings of the 2017 EMNLP Workshop on Natural Language Processing meets Journalism, pp. 25–30, and at https://aclweb.org/anthology/W17-4205, accessed 11 April 2019.

Thomas J. Johnson, Weiwu Zhang, Shannon L. Bichard, and Trent Seltzer, 2010. “United we stand? Online social network sites and civic engagement,” In: Zizi Papacharissi (editor). The networked self: Identity, community, and culture on social network sites. New York: Routledge, pp. 185–207.

Anders Olof Larsson and Hallvard Moe, 2014. “Twitter in politics and elections: Insights from Scandinavia,” In: Katrin Weller, Axel Bruns, Jean Burgess, Merja Mahrt, and Cornelius Puschmann (editors), 2014. Twitter and society. New York: Peter Lang, pp. 319–330.

Moontae Lee and David Mimno, 2014. “Low-dimensional embeddings for interpretable anchor-based topic inference,” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1,319–1,328, and at https://www.aclweb.org/anthology/D14-1138, accessed 11 April 2019.

Aaron Mak, 2017. “Twitter is shutting down a conservative group’s automated tweets,” Slate, at http://www.slate.com/blogs/future_tense/2017/10/17/twitter_has_labeled_a_conservative_group_s_automated_tweets_as_spam.html, accessed 11 April 2019.

Colin Martindale, 1990. The clockwork muse: The predictability of artistic change. New York: Basic Books.

Colin Martindale, 1975. Romantic progression: The psychology of literary history. Washington, D.C.: Hemisphere.

Colin Martindale and Audrey Dailey, 1996. “Creativity, primary process cognition and personality,” Personality and Individual Differences, volume 20, number 4, pp. 409–414.
doi: https://doi.org/10.1016/0191-8869(95)00202-2, accessed 11 April 2019.

Colin Martindale and Roland Fischer, 1977. “The effects of psilocybin on primary process content in language,” Confinia Psychiatrica, volume 20, number 4, pp. 195–202.

Paris Martineau, 2018. “Twitter’s dated data dump doesn’t tell us about future meddling,” Wired, at https://www.wired.com/story/twitters-dated-data-dump-doesnt-tell-us-about-future-meddling/, accessed 11 April 2019.

Saif Mohammad, 2011. “From once upon a time to happily ever after: Tracking emotions in novels and fairy tales,” LaTeCH ’11: Proceedings of the Fifth ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 105–114, and at https://www.aclweb.org/anthology/W11-1514, accessed 11 April 2019.

Saif Mohammad and Peter D. Turney, 2013. “Crowdsourcing a word-emotion association lexicon,” Computational Intelligence, volume 29, number 3, pp. 436–465.
doi: https://doi.org/10.1111/j.1467-8640.2012.00460.x, accessed 11 April 2019.

Saif Mohammad and Peter D. Turney, 2011. “Emotions evoked by common words and phrases: using mechanical turk to create an emotion lexicon,” CAAGET ’10: Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pp. 26–34, and at https://www.aclweb.org/anthology/W10-0204, accessed 11 April 2019.

Saif Mohammad and Tony (Wenda) Yang, 2011. “Tracking sentiment in mail: How genders differ on emotional axes,” WASSA ’11: Proceedings of the Second Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pp. 70–79, and at https://www.aclweb.org/anthology/W11-1709, accessed 11 April 2019.

Hannes Mueller and Christopher Rauh, 2017. “Reading between the lines: Prediction of political violence using newspaper text,” Barcelona Graduate School of Economics, Working Paper Series, number 990, at http://www.iae.csic.es/investigatorsMaterial/a17219152631sp75273.pdf, accessed 11 April 2019.

Christopher Paul and Miriam Matthews, 2016. “The Russian ‘Firehose of Falsehood’ propaganda model: Why it might work and options to counter it,” RAND Perspective, PE-198-OSD, at https://www.rand.org/content/dam/rand/pubs/perspectives/PE100/PE198/RAND_PE198.pdf, accessed 11 April 2019.

Ben Popken, 2018. “Twitter deleted 200,000 Russian troll tweets. Read them here,” NBC News (14 February), at https://www.nbcnews.com/tech/social-media/now-available-more-200-000-deleted-russian-troll-tweets-n844731, accessed 11 April 2019.

Ben Popken and Kelly Cobiella, 2017. “Russian troll describes work in the infamous misinformation factory,” NBC News (16 November), at https://www.nbcnews.com/news/all/russian-troll-describes-work-infamous-misinformation-factory-n821486, accessed 11 April 2019.

Kevin M. Quinn, Burt L. Monroe, Michael Colaresi, Michael H. Crespin, and Dragomir R. Radev, 2010. “How to analyze political attention with minimal assumptions and costs,” American Journal of Political Science, volume 54, number 1, pp. 209–228.
doi: https://doi.org/10.1111/j.1540-5907.2009.00427.x, accessed 11 April 2019.

Radim Řehůřek and Petr Sojka, 2010. “Software framework for topic modelling with large corpora,” Natural Language Processing Laboratory, Masaryk University (Brno), at https://radimrehurek.com/gensim/lrec2010_final.pdf, accessed 11 April 2019.

Robert Reynes, Colin Martindale, and Hartvig Dahl, 1984. “Lexical differences between working and resistance sessions in psychoanalysis,” Journal of Clinical Psychology, volume 40, number 3, pp. 733–737.

Margaret E. Roberts, Brandon M. Stewart, Dustin Tingley, and Kenneth Benoit, 2018. “Package ‘stm’” (28 January), at https://cran.r-project.org/web/packages/stm/stm.pdf, accessed 11 April 2019.

Margaret E. Roberts, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder–Luis, Shana Kushner Gadarian, Bethany Albertson, and David G. Rand, 2014. “Structural topic models for open-ended survey responses,” American Journal of Political Science, volume 58, number 4, pp. 1,064–1,082.
doi: https://doi.org/10.1111/ajps.12103, accessed 11 April 2019.

Theresa Schmiedel, Oliver Müller, and Jan vom Brocke, 2018. “Topic modeling as a strategy of inquiry in organizational research: A tutorial with an application example on organizational culture,” Organizational Research Methods (6 May).
doi: https://doi.org/10.1177/1094428118773858, accessed 11 April 2019.

Samantha Shorey and Philip N. Howard, 2016. “Automation, big data and politics: A research review,” International Journal of Communication, volume 10, at https://ijoc.org/index.php/ijoc/article/view/6233, accessed 11 April 2019.

Stefan Stieglitz and Linh Dang-Xuan, 2013. “Social media and political communication: A social media analytics framework,” Social Network Analysis and Mining, volume 3, number 4, pp. 1,277–1,291.
doi: https://doi.org/10.1007/s13278-012-0079-3, accessed 11 April 2019.

Twitter, 2018. “Update on Twitter’s review of the 2016 US election,” Twitter Public Policy (19 January), at https://blog.twitter.com/en_us/topics/company/2018/2016-election-update.html, accessed 11 April 2019.

U.S. Department of Justice, 2018a. “United States of America v. Internet Research Agency LLC,” Case 1:18-cr-00032-DLF, Document 1, filed 16 February, at https://www.justice.gov/file/1035477/download, accessed 11 April 2019.

U.S. Department of Justice, 2018b. “Grand jury indicts thirteen Russian individuals and three Russian companies for scheme to interfere in the United States political system” (16 February), U.S. Department of Justice, press release 18–198, at https://www.justice.gov/opa/pr/grand-jury-indicts-thirteen-russian-individuals-and-three-russian-companies-scheme-interfere, accessed 10 April 2019.

U.S. House of Representatives, Permanent Select Committee on Intelligence, 2017. “Russia Investigative Task Force hearing with social media companies” (1 November), at https://docs.house.gov/meetings/IG/IG00/20171101/106558/HHRG-115-IG00-Transcript-20171101.pdf, accessed 11 April 2019.

U.S. National Intelligence Council, 2017. “Assessing Russian activities and intentions in recent US elections,” Office of the Director of National Intelligence, National Intelligence Council, Intelligence Community Assessment, ICA 2017-01 D, at https://www.dni.gov/files/documents/ICA_2017_01.pdf, accessed 11 April 2019.

U.S. Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism, 2017. “Testimony of Sean J. Edgett, Acting General Counsel, Twitter, Inc.” (31 October), at https://www.judiciary.senate.gov/imo/media/doc/10-31-17%20Edgett%20Testimony.pdf, accessed 11 April 2019.

Kasper Welbers, Wouter Van Atteveldt, and Kenneth Benoit, 2017. “Text analysis in R,” Communication Methods and Measures, volume 11, number 4, pp. 245–265.
doi: https://doi.org/10.1080/19312458.2017.1387238, accessed 11 April 2019.

Ryan Wesslen, 2018. “Computer-assisted text analysis for social science: Topic models and beyond,” arXiv (3 April), at https://arxiv.org/abs/1803.11045, accessed 11 April 2019.

Alan N. West and Colin Martindale, 1988. “Primary process content in paranoid schizophrenic speech,” Journal of Genetic Psychology, volume 149, number 4, pp. 547–553.
doi: https://doi.org/10.1080/00221325.1988.10532180, accessed 11 April 2019.

Alan N. West, Colin Martindale, and Brian Sutton-Smith, 1985. “Age trends in the content of children’s spontaneous fantasy narratives,” Genetic, Social, and General Psychology Monographs, volume 111, number 4, pp. 389–405.

Alan N. West, Colin Martindale, Dwight Hines, and Walton T. Roth, 1983. “Marijuana-induced primary process content in the TAT,” Journal of Personality Assessment, volume 47, number 5, pp. 466–467.
doi: https://doi.org/10.1207/s15327752jpa4705_3, accessed 11 April 2019.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann, 2005. “Recognizing contextual polarity in phrase-level sentiment analysis,” HLT ’05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354.
doi: https://doi.org/10.3115/1220575.1220619, accessed 11 April 2019.

Samuel C. Woolley and Philip N. Howard, 2016. “Political communication, computational propaganda, and autonomous agents — Introduction,” International Journal of Communication, volume 10, pp. 4,882–4,890, and at https://ijoc.org/index.php/ijoc/article/view/6298, accessed 11 April 2019.

Asta Zelenkauskaite and Marcello Balduccini, 2017. “‘Information warfare’ and online news commenting: Analyzing forces of social influence through location-based commenting user typology,” Social Media + Society (17 July).
doi: https://doi.org/10.1177/2056305117718468, accessed 11 April 2019.

Weiwu Zhang, Trent Seltzer, and Shannon L. Bichard, 2013. “Two sides of the coin: Assessing the influence of social network site use during the 2012 U.S. presidential campaign,” Social Science Computer Review, volume 31, number 5, pp. 542–551.
doi: https://doi.org/10.1177/0894439313489962, accessed 11 April 2019.

 

Appendix 1: Topic label content (highest probability labeling algorithm) for Internet Research Agency sponsored tweets, k=56.

Topic 1: girl, mother, king, blue, green, street, @gamiliel, dude, #fishtv, beer, catch, #drunkband, ball, dark, robert, #redneckamovi, door, brother, harri, #dogsong

Topic 2: #tcot, muslim, #pjnet, islam, attack, terrorist, terror, christian, #teaparti, refuge, radic, kill, obama, peac, @guntrust, must, america, europ, #isi, legaci

Topic 3: play, issu, @nine_oh, @rapstationradio, emx9jgtv3v, #nowplay, @1063atl, featur, radio, magazin, artist, feat, ichlzwqg0i, #music, #listen2, #rap, #hiphop, digit, print, @indieradioplay

Topic 4: interview, mess, #toavoidworki, guest, @a5kem, #busi, channel, matthew, #vulnerabilityin5word, @ospatriot, steve, #podernfamili, @emenogu_phil, askem, segment, furious, scream, subscrib, jimmi, @wyzechef

Topic 5: #merkelmussbleiben, merkel, #merkel, frau, #girlstalkselfi, nicht, deutschland, wird, unser, sich, dass, amtszeit, immer, schaffen, kein, kanzlerin, angela, viel, bleibt, auto

Topic 6: host, hill, film, scene, #addabandtoatvshow, warren, chuck, behind, chao, knock, elizabeth, schumer, till, @blicqer, introduc, premier, effect, transform, flash, #artyoucanhear

Topic 7: listen, beat, #nowplay, track, @boogsmalon, soldier, produc, tune, dirti, music, avail, hook, grand, rapper, live, london, @nine_oh, dope, gang, prod

Topic 8: media, racist, support, protest, liber, polit, #mustbeban, call, #trumpsfavoriteheadlin, racism, video, outsid, report, govern, #blacklivesmatt, realiti, violenc, nazi, educ, white

Topic 9: pleas, latest, daili, dear, bless, david, mark, thank, share, sheriff, clark, @kattfunni, @tlcprincess, lewi, barri, chicken, @trivagod, duke, @jstines3, castro

Topic 10: life, happi, everyon, littl, make, wish, fear, deserv, nice, @breitbartnew, celebr, @deray, beauti, birthday, @tycashh, monday, thank, friday, other, gotta

Topic 11: trump, trumpâ, #thingsyoucantignor, order, just, rock, video, #obamaswishlist, front, @blicqer, presid, step, take, execut, call, ignor, woman, name, talk, inaugur

Topic 12: histori, folk, struggl, @shaunk, @blicqer, cultur, #blacktwitt, africa, upset, @khankne, @em2wic, hidden, hypocrit, buri, @jojokejohn, cycl, histor, @trueblackpow, contribut, inform

Topic 13: clinton, corrupt, peopl, american, #debat, hillari, media, #neverhillari, clintonâ, call, sick, @hillaryclinton, crimin, said, just, question, trump, prison, #debatenight, support

Topic 14: black, peopl, white, kill, polic, woman, matter, children, power, hous, #blacklivesmatt, live, student, amaz, arrest, murder, young, crime, @blicqer, shot

Topic 15: @midnight, start, promot, detail, @jayceodpromot, head, @chicagoildaili, career, click, quick, just, #chicago, #mondaymotiv, drag, #break, #addcartoonstohistori, woke, lieben, mein, @independ

Topic 16: fight, stand, join, patriot, freedom, armi, enlist, america, @abninfvet, veteran, dare, today, click, rrzgbccxbo, grow, sign, fellow, american, speech, socialist

Topic 17: show, true, video, stori, penc, mike, sexual, victim, view, assault, silenc, abort, plane, @realalexjon, includ, onlin, #dnc, toler, watch, spot

Topic 18: come, blame, @elena07617349, suspect, known, enjoy, polic, theyâ, #idontneedacostumebecaus, repeat, hope, dalla, favor, anti, revolut, pure, spell, snowflak, band, forgotten

Topic 19: everi, tweet, free, anyth, #secondhandgift, phone, minut, half, futur, kick, gift, cool, card, album, kiss, babi, player, gave, tree, appl

Topic 20: vote, trump, elect, democrat, voter, #trumpforpresid, anyon, today, #trumppence16, #maga, support, #trump, fraud, parti, #electionday, #trump2016, senat, earli, #hillaryforprison2016, candid

Topic 21: #maga, #trump, #pjnet, #trump2016, #cruzcrew, #hillari, #trumptrain, #prayers4california, #neverhillari, #wakeupamerica, cartoon, #obama, #uniteblu, #tedcruz, exact, @christichat, @amrightnow, agre, #cosproject, #lnyhbt

Topic 22: want, just, peopl, chang, world, someth, work, someon, mani, feel, hear, like, make, donâ, place, stop, refuge, think, perfect, countri

Topic 23: know, thing, noth, need, mean, happen, never, #thingspeopleontwitterlik, #gopdeb, everyth, #vegasgopdeb, y’all, @jadedbypolit, gold, @lupash7, cours, research, peopl, well, #liberallog

Topic 24: #thingsinventedwhilehigh, #terror, #lgbt, @ohhsocialmedia, chancen, @agendaofevil, sigh, real, @drc_19, bruh, @amerpatriot1, @justicewillett, hahahah, para, @az_valentin, @quinnqueen, @bleuishbleu, @questlov, @trusselis, franã

Topic 25: like, back, look, without, bring, wall, around, build, border, just, #makemehateyouinonephras, take, #2017survivaltip, avoid, come, street, hold, keep, season, talk

Topic 26: year, still, better, look, fuck, wait, russian, word, done, lost, death, alreadi, 2017, shit, welcom, hour, damn, well, drink, long

Topic 27: love, anoth, leav, heart, hell, creat, singl, just, dream, expect, wife, #valentinesdayin3word, alon, short, soon, husband, much, seat, publish, pizza

Topic 28: part, @youtub, bodi, except, establish, #sport, @activistpost, tour, video, main, @onpiratesat, #myolympicsportwouldb, process, pick, playlist, tale, robot, anniversari, @nyc_everyday, measur

Topic 29: time, york, sound, late, crazi, sing, wast, @shutupamanda, excit, stuff, @jarmadillo, cold, like, butt, just, @charissemsrd, legend, @chris_1791, mobil, awar

Topic 30: live, fall, american, link, intern, danger, learn, stream, plant, baltimor, injur, #phosphorusdisast, apart, everyday, #thingsdonebymistak, octob, water, updat, market, protest

Topic 31: #alternativeacronyminterpret, shut, readi, ladi, bitch, direct, idiot, organ, drive, fool, attent, pari, definit, dumb, self, studi, conspiraci, anim, femal, terribl

Topic 32: presid, even, ever, next, becom, first, thought, think, #idrunforpresidentif, trump, might, unit, never, #igetdepressedwhen, worst, obama, potus, elect, make, though

Topic 33: #pjnet, cruz, control, constitut, lord, govt, conserv, amend, defeat, militari, liberti, american, #rednationris, jesus, republ, @tedcruz, reagan, disgrac, @ggeett37aaa, govern

Topic 34: post, news, twitter, follow, @conservatexian, hate, open, fact, check, parti, move, clear, account, facebook, joke, fake, choic, #2016electionin3word, #thingsthatshouldbecensor, make

Topic 35: said, #thingsmoretrustedthanhillari, drug, room, bank, pull, #nocybercensorship, hollywood, agenda, info, tear, high, safe, lack, homeless, cheat, west, cancer, internet, bulli

Topic 36: best, problem, moment, land, industri, altern, integr, modern, gegen, afghanistan, #brexit, sich, jetzt, sind, #cdu, fundament, mehr, rent, mein, nicht

Topic 37: realli, like, #ruinadinnerinonephras, #survivalguidetothanksgiv, food, guess, huge, pray, tonight, sorri, awesom, hair, wanna, make, invit, marri, dress, just, finish, sister

Topic 38: miss, hand, case, chanc, strong, sell, small, song, univers, enter, origin, doubt, gonna, found, opportun, music, blood, easi, just, contest

Topic 39: donald, #polit, #new, clinton, support, ralli, speech, immigr, republican, plan, debat, call, russia, comment, meet, campaign, endors, putin, accus, presid

Topic 40: dead, journalist, reach, #local, 2015, juli, @spiegelonlin, truck, offens, sweden, wind, leagu, @phoenixnewsaz, 2013, slow, @camboviet, effort, region, villag, provid

Topic 41: hillari, campaign, email, bill, investig, foundat, wikileak, releas, leak, health, @wikileak, reveal, hack, camp, donor, scandal, sander, birther, comey, privat

Topic 42: @realdonaldtrump, great, thank, america, @hillaryclinton, @potus, @foxnew, @cnn, make, #maga, proud, honor, american, need, #makeamericagreatagain, @seanhann, @kellyannepol, countri, @jenn_abram, #trumpbecaus

Topic 43: state, trump, poll, nation, lead, point, secur, break, #2016in4word, news, report, ahead, race, show, depart, secretari, #new, north, among, dept

Topic 44: #ccot, #wakeupamerica, #gop, #usa, #teapartynew, #america, michael, @gop, kelli, #conserv, #nra, #polit, #theteaparti, sweet, @dmashak, simpli, bigger, @thedemocrat, #wethepeopl, #tgdn

Topic 45: million, china, worth, dollar, @zerohedg, googl, explos, complain, smith, #oscarssowhit, #oscarhasnocolor, capit, normal, imag, @screamymonkey, affair, @welt, chart, loan, label

Topic 46: right, women, left, protect, human, wrong, seem, march, #ihavearighttoknow, @talibkw, yeah, session, @feministajon, civil, respect, equal, defend, #guns4ni, anti–trump, marriag

Topic 47: money, famili, christma, full, #christmasaftermath, #todolistbeforechristma, whole, spend, realiz, complet, #my2017resolut, away, take, broke, propos, present, taken, georg, santa, credit

Topic 48: 2016, photo, save, john, list, servic, energi, @hashtagroundup, rose, sunday, #makemusicreligi, secret, @blicqer, spoke, decemb, show, februari, april, week, make

Topic 49: good, read, night, friend, morn, enough, idea, last, pretti, #myfarewellwordswouldb, ain’t, make, feel, actor, #whatiwouldtella15yearoldm, @politweec, #myemmynominationwouldb, ever, @keshatedd, intent

Topic 50: @hillaryclinton, #rejecteddebatetop, discuss, star, @cmdorsey, @realjameswood, prefer, @berniesand, wear, crap, wave, wood, behavior, toward, @jamesokeefeiii, regard, #imnotwithh, less, sock, brief

Topic 51: block, system, global, massiv, union, germani, german, websit, ring, island, allen, sport, migrant, #athleticstvshow, partner, auch, pool, sind, haben, heut

Topic 52: million, fund, donat, went, #clinton, spent, launch, #election2016, grant, budget, chariti, percent, pretend, billion, foundat, paid, 2014, #clintonfound, @blaviti, firm

Topic 53: watch, game, #betteralternativetodeb, hashtag, @giselleevn, #thingsnottaughtatschool, movi, sleep, special, receiv, rape, mass, light, flag, #reallifemagicspel, grab, #tofeelbetteri, @worldofhashtag, danc, clean

Topic 54: truth, book, #giftideasforpolitician, #islamkil, target, water, sens, shoot, common, #prayforbrussel, #brussel, smoke, limit, #isi, ticket, soul, increas, brain, term, account

Topic 55: obama, final, hous, deal, offic, iran, congress, presid, barack, michell, american, administr, court, refuge, syria, israel, obamaâ, climat, #new, syrian

Topic 56: just, make, take, trump, think, call, keep, peopl, real, need, stop, give, never, tell, show, talk, come, help, much, must

 

Appendix 2: Topic label content (score labeling algorithm) for Internet Research Agency sponsored tweets, k=56.

Topic 1: girl, #fishtv, #drunkband, #dogsong, #redneckamovi, @gamiliel, #sexysport, #summeramovi, blue, mother, #maketvshowscanadian, #dickflick, dude, #addamovieruinamovi, king, beer, #maketvsexi, #onewordoffbook, green, #stonedcomicbook

Topic 2: #tcot, muslim, #pjnet, islam, terrorist, #teaparti, terror, attack, christian, @guntrust, radic, #renewus, #isi, refuge, @petefrt, @fallenangelmovi, #islam, legaci, #ccot, kill

Topic 3: play, @rapstationradio, @nine_oh, emx9jgtv3v, @1063atl, #nowplay, issu, feat, ichlzwqg0i, #listen2, #music, magazin, @indieradioplay, #rapstationradio, @stopbeefinradio, #rap, #hiphop, radio, featur, @dagr8fm

Topic 4: @a5kem, askem, #toavoidworki, mess, interview, guest, #vulnerabilityin5word, #busi, matthew, @emenogu_phil, @ospatriot, channel, @slobodarskasrbi, segment, #podernfamili, subscrib, @wyzechef, furious, @sarahkendzior, @coasttocoastam

Topic 5: #merkelmussbleiben, merkel, #merkel, frau, #girlstalkselfi, nicht, deutschland, wird, amtszeit, unser, schaffen, sich, dass, immer, kein, kanzlerin, bleibt, viel, tagebuch, ganz

Topic 6: scene, host, chuck, #addabandtoatvshow, #artyoucanhear, hill, premier, chao, schumer, warren, elizabeth, flash, knock, film, mama, davi, introduc, #art, till, daddi

Topic 7: #nowplay, listen, beat, @boogsmalon, tune, produc, track, soldier, prod, dirti, 1js42r66si, @nine_oh, feat, hook, rapper, @dagr8fm, grand, #dagr8fm, london, t–shirt

Topic 8: media, racist, #mustbeban, protest, #trumpsfavoriteheadlin, support, racism, nazi, liber, outsid, #blacklivesmatt, #rncincl, mainstream, polit, nigga, realiti, @danageezus, correct, ivanka, educ

Topic 9: pleas, david, dear, daili, bless, latest, mark, sheriff, clark, @tlcprincess, thank, chicken, @kattfunni, lewi, @trivagod, @rappersiq, barri, @jstines3, castro, #famouscreatur

Topic 10: happi, everyon, life, birthday, littl, wish, @tycashh, deserv, fear, @deray, @annogalact, gotta, @breitbartnew, monday, #god, friday, nice, smile, #happybirthdayharrytruman, #supremesacrificeday

Topic 11: trump, trumpâ, #thingsyoucantignor, #obamaswishlist, rock, front, order, execut, presid, breath, @gloed_up, inaugur, step, pipelin, @tgjones_62, repli, just, loud, @bizpacreview, ignor

Topic 12: histori, @shaunk, #blacktwitt, @em2wic, @trueblackpow, folk, struggl, @moorbey, @khankne, @fresh_flames1, africa, @angelaw676, @blackmoses2015, cultur, @3rdeyeplug, @jojokejohn, belov, @historyhero, cycl, contribut

Topic 13: clinton, corrupt, clintonâ, #debat, hillari, #neverhillari, #debatenight, crook, sick, @hillaryclinton, #demdeb, #birther, prison, #hillaryshealth, crimin, american, #benghazi, peopl, media, @cernovich

Topic 14: black, white, peopl, kill, polic, matter, children, woman, #blacklivesmatt, amaz, power, student, hous, color, male, murder, arrest, young, chicago, live

Topic 15: @midnight, start, promot, @jayceodpromot, lieben, detail, #mondaymotiv, click, mein, @chicagoildaili, drag, nacht, euch, #addcartoonstohistori, #chicago, nsche, woke, #morgen, career, guten

Topic 16: join, fight, patriot, stand, enlist, freedom, @abninfvet, armi, rrzgbccxbo, rrzgbcu8tm, dare, usfa, click, @usfreedomarmi, #usfa, #bb4sp, veteran, fellow, america, central

Topic 17: true, mike, penc, show, sexual, view, assault, stori, silenc, victim, video, plane, abort, @realalexjon, toler, #dnc, #ilove__butihate__, interrupt, writer, loui

Topic 18: @elena07617349, blame, come, dalla, suspect, known, #idontneedacostumebecaus, theyâ, band, enjoy, revolut, #media, pure, anti, snowflak, spell, favor, @ashleywarrior, @italians4trump, repeat

Topic 19: everi, tweet, #secondhandgift, phone, free, anyth, album, minut, gift, kick, card, player, half, cool, tree, kiss, appl, shirt, bone, futur

Topic 20: vote, trump, #trumpforpresid, voter, elect, democrat, anyon, #electionday, #trumppence16, #hillaryforprison2016, #maga, fraud, elector, #trump2016, earli, #lostin3word, #trumptrain, #trump, ballot, popular

Topic 21: #maga, #trump, #pjnet, #cruzcrew, #trump2016, #trumptrain, @amrightnow, #tedcruz, #uniteblu, @dbargen, #wakeupamerica, #lnyhbt, #hillari, #cosproject, #prayers4california, #realdonaldtrump, cartoon, #veteran, @christichat, @robhoey

Topic 22: want, someth, someon, chang, donâ, just, hear, peopl, world, #potuslasttweet, work, refuge, #islamkil, feel, #sometimesitsokto, nobodi, place, perfect, mani, @abusedtaxpay

Topic 23: know, thing, noth, mean, need, @lupash7, #vegasgopdeb, @jadedbypolit, happen, #gopdeb, @mikerz, #thingspeopleontwitterlik, @jhwalz32, @sarcatstyx, @kcarslin, never, @geraldyak420, #liberallog, @shariromin, gold

Topic 24: #thingsinventedwhilehigh, chancen, #terror, #lgbt, @ohhsocialmedia, @quinnqueen, @drc_19, #justiceforbenghazi4, @agendaofevil, hahahah, @bleuishbleu, frauen, #whytepanth, @whytepantherrn, @chuca_85, @diosmisalva, @lauranestor4, @mariaguenzani, @silviarn19, @vivaciousstar2

Topic 25: like, back, wall, without, build, around, bring, look, border, #makemehateyouinonephras, #2017survivaltip, avoid, season, street, #istartcryingwhen, goal, #unlikelythingsheardatwalmart, @blackgirlnerd, walk, hold

Topic 26: year, better, fuck, wait, still, russian, done, 2017, lost, alreadi, death, look, welcom, drink, word, damn, shit, music, hour, berni

Topic 27: love, anoth, leav, heart, singl, #valentinesdayin3word, dream, expect, hell, alon, wife, creat, short, husband, pizza, seat, dinner, publish, chocol, ugli

Topic 28: part, @youtub, @onpiratesat, except, #sport, bodi, establish, tale, tour, main, @activistpost, playlist, #snrtg, #myolympicsportwouldb, anniversari, #mrrobot, robot, #tvseri, pursu, #usanetwork

Topic 29: time, sound, late, sing, york, wast, @jarmadillo, @shutupamanda, excit, @charissemsrd, crazi, butt, mobil, legend, stuff, cold, curs, #twittercanbeabit, freez, technolog

Topic 30: live, fall, link, stream, intern, #phosphorusdisast, american, injur, phosphorus, apart, plant, #hamburg, halt, octob, baltimor, everyday, #thingsdonebymistak, asleep, @todaypittsburgh, danger

Topic 31: #alternativeacronyminterpret, ladi, shut, bitch, direct, idiot, pari, fool, readi, drive, organ, self, attent, conspiraci, dumb, terribl, anim, driver, dump, design

Topic 32: presid, even, becom, thought, ever, next, #idrunforpresidentif, #igetdepressedwhen, unit, first, might, worst, though, prayer, #probabletrumpstweet, think, obama, trump, potus, vice

Topic 33: #pjnet, cruz, constitut, lord, control, amend, #rednationris, @ggeett37aaa, liberti, govt, defeat, #trust, republ, #syria, jesus, @peddoc63, @tedcruz, disgrac, #wakeupamerica, #climatechang

Topic 34: twitter, post, @conservatexian, news, follow, hate, open, #2016electionin3word, fact, account, facebook, clear, #thingsthatshouldbecensor, joke, move, check, choic, troll, parti, suspend

Topic 35: #thingsmoretrustedthanhillari, drug, room, #nocybercensorship, bank, agenda, info, said, tear, lack, pull, homeless, hollywood, cancer, cheat, pressur, taco, bulli, chines, jone

Topic 36: best, moment, land, problem, industri, #cdu, altern, gegen, sich, jetzt, integr, sind, mehr, #brexit, deutschland, afghanistan, modern, mein, sehr, kein

Topic 37: realli, #ruinadinnerinonephras, food, pray, #survivalguidetothanksgiv, awesom, huge, marri, guess, sorri, invit, hair, cook, dress, wanna, sister, @johnfplan, like, finish, suppos

Topic 38: miss, hand, chanc, case, strong, enter, small, opportun, @musicjunkypush, sell, song, @powermusicteam, indi, doubt, univers, @braziliangirl32, contest, @cherimus, blood, @yaboymiko

Topic 39: donald, #polit, clinton, #new, ralli, support, immigr, speech, republican, campaign, debat, comment, ohio, russia, endors, convent, putin, accus, melania, @zaibatsunew

Topic 40: @spiegelonlin, dead, juli, journalist, #local, reach, 2015, truck, leben, leagu, erdogan, @phoenixnewsaz, wieder, sweden, offens, region, 2013, wind, @camboviet, @millcitynew

Topic 41: hillari, email, campaign, bill, foundat, investig, wikileak, leak, @wikileak, donor, comey, releas, birther, server, camp, scandal, health, probe, chelsea, sander

Topic 42: @realdonaldtrump, great, thank, @hillaryclinton, @potus, @foxnew, america, @cnn, @seanhann, #maga, #makeamericagreatagain, @kellyannepol, proud, #trumpbecaus, @loudobb, honor, @jenn_abram, #fakenew, @carminezozzora, #americafirst

Topic 43: state, poll, lead, point, nation, trump, #2016in4word, secur, depart, secretari, dept, ahead, break, among, north, carolina, unit, sourc, swing, news

Topic 44: #ccot, #wakeupamerica, #teapartynew, #conserv, #gop, #usa, #america, #theteaparti, @gop, #nra, kelli, @dmashak, michael, sweet, #wethepeopl, #tgdn, #makedclisten, @thedemocrat, #patriot, #republican

Topic 45: china, million, dollar, worth, googl, #oscarssowhit, @zerohedg, complain, capit, explos, smith, @welt, #oscarhasnocolor, @tagesschau, @screamymonkey, hier, normal, warum, affair, #flã

Topic 46: right, women, left, human, @talibkw, yeah, march, protect, seem, #ihavearighttoknow, @feministajon, session, wrong, civil, equal, #guns4ni, marriag, wing, #womensmarch, jeff

Topic 47: #christmasaftermath, money, christma, full, #todolistbeforechristma, famili, whole, realiz, #my2017resolut, santa, complet, broke, spend, propos, present, @batshake1, credit, stick, thereâ, wrap

Topic 48: photo, save, list, 2016, john, servic, @hashtagroundup, rose, #makemusicreligi, energi, 3qdisjyhb9, sunday, @zenrand, decemb, @500px, februari, spoke, @thehashtaggam, @hashtagzoo, upbuild

Topic 49: good, morn, night, read, friend, idea, enough, #myfarewellwordswouldb, pretti, ain't, @politweec, #whatiwouldtella15yearoldm, last, @keshatedd, luck, #myemmynominationwouldb, felt, advic, actor, @ilovemywife0007

Topic 50: #rejecteddebatetop, @hillaryclinton, @cmdorsey, discuss, star, @realjameswood, @berniesand, @jamesokeefeiii, wave, prefer, whisper, hail, wood, sock, @flotus, behavior, @timkain, #imnotwithh, crap, @billclinton

Topic 51: block, system, global, germani, german, massiv, websit, union, auch, sind, haben, heut, #athleticstvshow, schã, einen, allen, nach, ring, morgen, island

Topic 52: donat, fund, million, went, #clinton, #election2016, spent, budget, chariti, launch, grant, percent, foundat, firm, #clintonfound, @blaviti, 2014, 2011, #hurricanematthew, #fbi

Topic 53: game, watch, #betteralternativetodeb, @giselleevn, hashtag, sleep, #thingsnottaughtatschool, movi, receiv, @worldofhashtag, danc, rape, special, mass, light, clown, #tofeelbetteri, #reallifemagicspel, shoe, weather

Topic 54: #giftideasforpolitician, truth, #islamkil, #iceisi, #opiceisi, #target, @ctrlsec, book, #prayforbrussel, target, sens, #brussel, soul, smoke, common, #isi, water, ticket, limit, increas

Topic 55: obama, final, iran, barack, michell, administr, congress, hous, deal, presid, bb4sp, syria, approv, syrian, offic, climat, sentenc, admin, @gerfingerpoken, court

Topic 56: just, #electâ, trump, @romaacorn, make, #imwithchickenavocadosubwaydressedwithsweetonionanddoublehoneymustard, #imwithhumusmarshmallowspeanutbuttersteakribey, take, think, call, peopl, keep, real, give, never, stop, need, tell, show, much

 


Editorial history

Received 31 January 2019; revised 24 February 2019; revised 15 March 2019; accepted 18 March 2019.


Copyright © 2019, Dan Taninecz Miller. All Rights Reserved.

Topics and emotions in Russian Twitter propaganda
by Dan Taninecz Miller.
First Monday, Volume 24, Number 5 - 6 May 2019
https://journals.uic.edu/ojs/index.php/fm/article/view/9638/7785
doi: http://dx.doi.org/10.5210/fm.v24i5.9638




