Giorgio Comai

Researcher, data analyst

EDJNet

I am a senior researcher and data analyst at OBCT / CCI.

I have been working on a research project dedicated to text as data (and data in the text): find out more on tadadit.xyz.

For a long time, I have been conducting research on post-Soviet affairs, focusing in particular on de facto states. I have been visiting Russia and other post-Soviet countries since 2000, and I speak Russian and Romanian fluently. I am a member of the board of directors of Asiac, the Italian Association for the study of Central Asia and the Caucasus. I have been a lecturer at the University of Trento and at Dublin City University.

In recent years, I have increasingly been working on structured approaches for analysing online sources in conflict studies and international relations, data collection methods, and related ethical issues. I have also been working as a data analyst and consultant, crunching data at the European Data Journalism Network (EDJNet), writing code and developing packages in the R programming language, and working on data visualisation and geographic data analysis.

I occasionally post on my data visualisation blog: the codebase.

Find me micro-blogging @giocomai@mastodon.social.

Interests

data crunching, data visualisation and geocomputation
peace and conflict studies
data collection methods and ethics in research
structured analysis of web contents
Post-Soviet affairs and de facto states
digital humanities
R programming

Education

PhD, Law and Government, 2018
Dublin City University

MA, area studies, 2006
MIREES - Interdisciplinary Research and Studies on Eastern Europe

BA, political science, 2004
University of Bologna

Articles, features, and analyses

The “Russian Hawks” and the Kremlin

Previously marginal revanchist intellectuals have been producing grandiose ideological theorisations that are now at the core of state-sanctioned narratives legitimising Russia’s invasion of Ukraine. There is, however, little evidence that the “hawks” can effectively constrain the Kremlin’s policy choices.

2025-09-19 10 min read Articles

Why Kyiv cannot possibly fix the 'root causes' of the conflict

No amount of unilateral Ukrainian concessions would address what Russia considers the ‘root causes’ of the conflict.

2025-05-05 6 min read Articles

Armed conflict of the Dniester, thirty years later

A newly published book explores the circumstances around the violence that accompanied Transnistria’s de facto secession from Moldova. Three decades later, finding new answers to old conundrums is key to preventing ongoing tensions from escalating.

2024-02-07 7 min read Articles

See all posts

Academic publications

Where is Russian higher education? The spatial imaginaries and realities of internationalization policy

The article considers how and at which recent historical and political junctures documents, experts and academics frame the …

Giorgio Comai


zavtra.ru_ru - Full text corpus based on the website of Russian weekly newspaper 'Zavtra' (in Russian, 1996-2024)

Full text of all articles published on zavtra.ru with metadata, in tabular format.

Giorgio Comai


Russian state institutions full-text datasets

A collection of corpora based on contents extracted from the websites of Russian state institutions

Giorgio Comai


kremlin_en - A textual dataset based on the contents published on the English-language version of the Kremlin’s website

A corpus in tabular format with all posts published on the official website of the president of the Russian Federation between 31 …

Giorgio Comai


Responding to Alleged Russian Interference by Focusing on the Vulnerabilities That Make It Possible

Based on a media analysis of mainstream Western media, this chapter defines ‘Russian meddling’ as a distinct phenomenon that emerged at …

Giorgio Comai


Russian Meddling in Democratic Processes in Europe and the US

In recent years, the issues of Russian meddling and Russian interference have prominently entered the public and political debate in …

Giorgio Comai


See all publications

Data notes

Last updated on 2023-03-01 28 min read Data notes

The messy data sources behind "Mapping diversity"

We have been toying with the idea of doing large-scale open-ended analysis of street names across Europe for quite some time. Back in 2019, I quickly built an R package that facilitated retrieving the names of streets used in a municipality, tried to guess which streets were named after people and their gender (simplistically operationalised as “sex at birth”), and provided a basic interface to check the data (here’s a walkthrough of the process).

2021-08-02 6 min read Data notes

The data you need to win the Olympics if you go NUTS

When everybody’s moved by the contagious joy of two athletes making history by agreeing to share an Olympic gold medal, the data analyst thinks: “two gold medals for the same competition? is this going to break my dashboard?”

"Can we have two gold? 🥇" "Let's make history, man" #GOLD #Tokyo2020 (Giuseppe Famà, @FamaNelMondo, August 1, 2021)

In my case, I was worried it would break my parsing script.

See all posts

R packages and data projects

Check out a more comprehensive curated list on GitHub

castarter - content analysis starter toolkit for R

castarter is a more modern, fully-featured, and consistent iteration of castarter - Content Analysis Starter Toolkit for the R programming language (a previous iteration is still available as castarter.legacy). It facilitates text mining and web scraping by taking care of many of the most common file management issues, keeps track of download progress in a local database, facilitates extraction through dedicated convenience functions, and allows for basic exploration of textual corpora through a Shiny interface.

tidywikidatar - Interact with Wikidata and get tidy data frames in response

The goal of tidywikidatar is to facilitate interaction with Wikidata: all responses are transformed into data frames or simple character vectors, and it is easy to enable efficient caching in a local SQLite database (integration with other databases is also available). If you want to benefit from the wealth of information stored by Wikidata, but you do not like SPARQL queries and nested lists, then you may find tidywikidatar useful.

tifkremlinen

tifkremlinen is a package providing a single dataset - kremlin_en - including all contents published on the English-language version of kremlin.ru from 31 December 1999 to 31 December 2020. Yearly updates will likely be made available.
Link to repo on GitHub
Link to official version of dataset with all details

castarter.legacy - content analysis starter toolkit for R

castarter (now castarter.legacy) is designed to make it easy, even for relatively inexperienced users, to create a textual dataset from a website, or a section of a website, keep it up-to-date, and explore it through word frequency graphs or a web interface that makes it possible to tag items. Documentation is available on castarter’s website.

EDJNet’s Quote Finder

EDJNet’s Quote Finder facilitates finding different takes on European affairs. It provides an interactive interface to explore and filter tweets by all members of the European Parliament who are on Twitter, and to visualise word frequencies as wordclouds. It is possible to filter contents based on keywords, hashtags, political affiliation and language of the tweet. A different interface allows for interactively exploring textual contents published by EU institutions, such as press releases. In this case, available visualisations include time series that highlight changes in the relative prominence of certain issues within the official EU discourse.

ganttrify - Create beautiful gantt charts with ggplot2

ganttrify facilitates the creation of nice-looking Gantt charts, commonly used in project proposals and project management. Documentation is available on GitHub.

genderedstreetnames - Find and plot on a map gendered street names

genderedstreetnames automatically finds the gender of street names, facilitates manually fixing what the automatic part got wrong, and plots the results. It gets information from OpenStreetMap and Wikidata. There is currently a vignette showing examples of what can be done with this package on the package’s website. Documentation is available on GitHub.

networkedwebsitesdetector

networkedwebsitesdetector offers a structured approach for finding websites which have clear signs of common ownership or are otherwise related. There is currently a vignette showing examples of what can be done with this package on the package’s website. Documentation is available on GitHub.

zoteror - Access the Zotero API in R

zoteror introduces basic functionalities to access the Zotero API. It allows creating new Zotero items, and taking a csv file (or data frame) and importing it into Zotero, as long as data are properly mapped. zoteror has functions that help give tabular data a structure that can properly be read into Zotero. It facilitates resizing the storage space used, by ordering items by attachment size, and by allowing items to be added to a collection if certain criteria are met.

Recent & Upcoming Talks

Who said it first? Investigating the diffusion of the Kremlin’s buzzwords before they entered the mainstream

Paper presented at the European ASN in Cluj-Napoca

2023-07-06 00:00 Cluj-Napoca, Romania


Wikidata for journalists

Session for Dataharvest 2022 (Mechelen, Belgium).

2022-05-22 09:30 Mechelen, Belgium


Wikidata for data journalism (with R)

An online event, part of the Wikidata Data Reuse Days 2022.

2022-02-16 18:00 Online


Nagorno Karabakh: sarà (nuovamente) guerra? [Nagorno Karabakh: will it be war (again)?]

An online event on the Nagorno Karabakh conflict organised by ISPI [in Italian]

2020-09-30 18:00 ISPI

Russia hacked: problematic sources for insights on conflicts in Ukraine and the South Caucasus

In recent years, high-level leaks and hacks have featured prominently in media reporting. Russia has been repeatedly blamed for carrying out cyber-attacks against a variety of actors in Western countries, including the US Democratic party and then-presidential candidate Emmanuel Macron in France. However, Russian government actors have themselves been repeatedly hacked in recent years, including by alleged Ukrainian hacker groups and others (e.g. #SurkovLeaks). People associated with the de facto authorities in the Donbas region have also been hacked.

2019-06-11 09:30 School of International Studies, University of Trento

Responding to alleged Russian interference by focusing on the vulnerabilities that make it possible

In recent years, the legitimacy of electoral processes in Western democracies has been repeatedly put into question due to alleged …

2019-05-06 Sant'Anna School of Advanced Studies, Pisa

Victims of double standards: double victimhood and changing narratives in Azerbaijan’s public rhetoric

Here is a brief summary of key points in the form of a Twitter thread:

On my way back from ASIAC's latest conference in Gorizia, where I presented a joint work co-authored with @sofiebedford: "Victims of double standards: double victimhood and changing narratives in Azerbaijan’s public rhetoric" (Giorgio Comai, @giocomai, December 7, 2018)

2018-12-05 15:30 University of Trieste, Gorizia Campus

Giorgio Comai, Sofie Bedford

Should the EU talk more or less about conflict?

2018-11-12 09:30 Oxford University, St Antony's College


See all talks

Research notes

Russia-affiliated scholars publishing about internationalisation of higher education (based on Web of Science and OpenAlex)

Russia-affiliated scholars publishing about internationalisation of higher education (based on OpenAlex)

Russia-affiliated scholars publishing about internationalisation of higher education (based on Scopus)

Call for papers - De facto states since 2020: new realities and theory updates

On humiliation in international conflicts and the Treaty of Versailles

See all posts

Blog posts

How a small difference between two beach volley console games from the early 2000s holds an important insight for debates on "AI"

"Water has no border", even along the Inguri - A documentary (brief review)

The tech model and the communal model

See all posts

Microblog posts

Russia's invasion of Ukraine and historical trends on the use of force

On the use of 'ethnic cleansing' in the case of Nagorno Karabakh

On Ukrainians and Russians as 'One People'

See all posts

Projects

Context and crisis scenario analysis in Moldova and Transnistria

Imagining Transnistria without Russia’s gas subsidy

Text as data & data in the text

Studying conflicts in post-Soviet spaces through structured analysis of textual contents available on-line

Digital archives and local history

Digitising local history

R packages

R packages for research and data journalism

ESVEI

“Exploring systemic vulnerabilities for external influence in Italy”

De facto states

Non-recognition is the symptom, not the cause

Journey to Armenia – A film documentary project

“Journey to Armenia” is a film documentary project I’ve been working on together with Andrea Rossini and other colleagues at OBCT between 2009 and 2013. It develops around the journey of Osip and Nadezhda Mandelstam in the Caucasus in 1930 (at the basis of Osip’s “Journey to Armenia”) and more in general around the life of the Mandelstams. At the same time, it is also a journey from Abkhazia to Nagorno Karabakh across today’s Caucasus.

Youth, patriotism and politics in the Northern Caucasus

Youth in the Northern Caucasus: associationism, identity, and patriotism in a complex, multi-ethnic context.
