National Research University Higher School of Economics, Russian Federation

Olessia Koltsova, Sergey Koltcov, Sergei Nikolenko, Svetlana Alexeeva, Oleg Nagorny


The ability of social media to rapidly disseminate judgements on ethnicity to wide publics, and to influence offline ethnic relations, creates a demand for methods that automatically monitor ethnicity-related online content (Burnap & Williams 2015). In this study we seek to measure the overall volume of ethnicity-related discussion in Russian-language social media and to develop an approach that automatically detects various aspects of judgements on ethnicity. We develop a comprehensive list of ethnonyms covering 100 post-Soviet ethnic groups. Using this list, we collect all messages containing at least one of these ethnonyms that were posted on all Russian-language social media over a two-year period (N = 2,850,947 texts). We find meaningful regional variation in the volume of attention paid to different ethnicities. We hand-code 7,181 messages in which rare ethnicities are over-represented and train a number of logistic regression classifiers to recognize different text features. We achieve good quality in detecting the presence of intergroup conflict, positive intergroup contact, and overall negative and positive sentiment, and fair quality in predicting the general attitude toward an ethnic group. Relevance to the topic of ethnicity is predicted least well. Some aspects, such as calls for violence against an ethnic group, are too rare in the data to be predicted. Unlike previous studies (Bessudnov 2016), we find that various Central Asian groups, rather than Caucasians, lead in negative representation. Caucasians lead in producing their own discourse, which most likely shifts their scores upward. Finally, Ukrainians are among the most negatively represented groups because of the recent military conflict.
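The collection step described above (keeping every message that contains at least one ethnonym) can be illustrated with a minimal keyword filter. This is a sketch under stated assumptions, not the authors' pipeline. The three stems and the `ETHNONYM_STEMS`, `groups_mentioned`, and `filter_messages` names are hypothetical. Matching word-initial stems is one assumed way to catch inflected Russian forms, and the abstract does not say how the authors handled inflection. Note that this kind of prefix matching also over-captures related words (e.g. "Узбекистан"), so a real list would need exclusions.

```python
import re

# Hypothetical, tiny stand-in for the study's ethnonym list (which covers
# 100 post-Soviet ethnic groups and is not reproduced here).
# Each group maps to word-initial stems, so one entry catches several
# inflected Russian forms (e.g. "узбеки", "узбеков").
ETHNONYM_STEMS = {
    "uzbek": ["узбек"],
    "tajik": ["таджик"],
    "chechen": ["чечен"],
}


def build_pattern(stems):
    """Compile a case-insensitive regex matching any stem at a word start.

    In Python 3, \\b and \\w are Unicode-aware for str patterns,
    so Cyrillic words are handled correctly.
    """
    alternatives = "|".join(re.escape(s) for s in stems)
    return re.compile(r"\b(?:" + alternatives + r")\w*", re.IGNORECASE)


PATTERNS = {group: build_pattern(stems) for group, stems in ETHNONYM_STEMS.items()}


def groups_mentioned(text):
    """Return the sorted list of groups whose ethnonyms occur in text."""
    return sorted(group for group, pattern in PATTERNS.items() if pattern.search(text))


def filter_messages(messages):
    """Keep only messages that mention at least one ethnonym, with their groups."""
    kept = []
    for message in messages:
        groups = groups_mentioned(message)
        if groups:
            kept.append((message, groups))
    return kept


if __name__ == "__main__":
    sample = ["Узбеки работают на стройке", "Погода сегодня хорошая", "чеченцы и таджики"]
    for message, groups in filter_messages(sample):
        print(groups, message)
```

In the study, the retained messages were subsequently hand-coded in part and used to train logistic regression classifiers. That step is not sketched here.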


social media, ethnicity, monitoring, big data, attitudes
