Giorgio Comai

Researcher, data analyst



I am a researcher and data analyst at OBCT/CCI.

I am currently working on a research project dedicated to text as data (and data in the text): find out more on tadadit.xyz.

For a long time, I have been conducting research on post-Soviet affairs, focusing in particular on de facto states. I have been visiting Russia and other post-Soviet countries since 2000, and I speak Russian and Romanian fluently. I am a member of the board of directors of ASIAC, the Italian Association for the study of Central Asia and the Caucasus.

In recent years, I have increasingly been working on structured approaches for analysing online sources in conflict studies and international relations, data collection methods, and related ethical issues.

I have also been working as a data analyst and consultant, crunching data at the European Data Journalism Network (EDJNet), writing code and developing packages in the R programming language, and working on data visualisation and geographic data analysis.

I occasionally post on my data visualisation blog: the codebase.

Find me micro-blogging in the Fediverse @giocomai@mastodon.social.


  • data crunching, data visualisation and geocomputation
  • data collection methods and ethics in research
  • structured analysis of web contents
  • post-Soviet affairs and de facto states
  • R programming
  • digital humanities


  • PhD, Law and Government, 2018

    Dublin City University

  • MA, area studies, 2006

    MIREES - Interdisciplinary Research and Studies on Eastern Europe

  • BA, political science, 2004

    University of Bologna

Articles, features, and analyses

Armed conflict of the Dniester, thirty years later

A newly published book explores the circumstances around the violence that accompanied Transnistria’s de facto secession from Moldova. Three decades later, finding new answers to old conundrums is key to preventing ongoing tensions from escalating.

Has Transnistria just entered its last year with Russia’s gas subsidy?

A large share of Transnistria’s economy, including most of its budget, depends on a structural subsidy it receives from Russia in the form of free gas. As Ukraine has promised to stop all Russian pipelines going through its territory by the end of 2024, how will Transnistria cope?

Conflict between Armenia and Azerbaijan and war in Nagorno Karabakh [in Italian]

Analyses and podcasts published in Italian on the conflict between Armenia and Azerbaijan and the war in Nagorno Karabakh

Academic publications

kremlin_en - A textual dataset based on the contents published on the English-language version of the Kremlin’s website

A corpus in tabular format with all posts published on the official website of the president of the Russian Federation between 31 …

Responding to Alleged Russian Interference by Focusing on the Vulnerabilities That Make It Possible

Based on a media analysis of mainstream Western media, this chapter defines ‘Russian meddling’ as a distinct phenomenon that emerged at …

Russian Meddling in Democratic Processes in Europe and the US

In recent years, the issue of Russian meddling and Russian interference have prominently entered the public and political debate in …

Official external support to de facto states in the South Caucasus [published in Italian]

De facto states in the South Caucasus are supported by a patron: Russia in the case of Abkhazia and South Ossetia, Armenia in the case …

Developing a New Research Agenda on Post-Soviet De Facto States

The scholarship on post-Soviet de facto states has structurally focused on issues related to their contested status, and has long …

Conceptualising Post-Soviet de facto States as Small Dependent Jurisdictions

De facto states, according to the most established elaborations of the concept, by definition strive towards full-fledged, …

Data notes

The messy data sources behind "Mapping diversity"

We have been toying with the idea of doing large-scale open-ended analysis of street names across Europe for quite some time. Back in 2019, I quickly built an R package that facilitated retrieving the names of streets used in a municipality, tried to guess which were named after people and what their gender was (simplistically operationalised as “sex at birth”), and provided a basic interface to check the data (here’s a walkthrough of the process).

The data you need to win the Olympics if you go NUTS

When everybody’s moved by the contagious joy of two athletes making history by agreeing to share an Olympic gold medal, the data analyst thinks: “two gold medals for the same competition? is this going to break my dashboard?”

"Can we have two gold? 🥇" "Let's make history, man" #GOLD #Tokyo2020 pic.twitter.com/y2PATi92Jq (Giuseppe Famà, @FamaNelMondo, August 1, 2021)

In my case, I was worried it would break my parsing script.

R packages and data projects

Check out a more comprehensive curated list on GitHub

castarter - content analysis starter toolkit for R

castarter is a more modern, fully featured, and consistent iteration of the Content Analysis Starter Toolkit for the R programming language (a previous iteration is still available as castarter.legacy). It facilitates text mining and web scraping by taking care of many of the most common file management issues, keeps track of download progress in a local database, facilitates extraction through dedicated convenience functions, and allows for basic exploration of textual corpora through a Shiny interface.

tidywikidatar - Interact with Wikidata and get tidy data frames in response

The goal of tidywikidatar is to facilitate interaction with Wikidata:

  • all responses are transformed into data frames or simple character vectors
  • it is easy to enable efficient caching in a local SQLite database (integration with other databases is also available)

If you want to benefit from the wealth of information stored by Wikidata, but you do not like SPARQL queries and nested lists, then you may find tidywikidatar useful.


tifkremlinen is a package providing a single dataset, kremlin_en, which includes all contents published on the English-language version of kremlin.ru from 31 December 1999 to 31 December 2020. Yearly updates will likely be made available. Links: repo on GitHub; official version of the dataset with all details.

castarter.legacy - content analysis starter toolkit for R

castarter (now castarter.legacy) is designed to make it easy, even for relatively inexperienced users, to create a textual dataset from a website or a section of a website, keep it up to date, and explore it through word frequency graphs or a web interface that makes it possible to tag items. Documentation is available on castarter’s website.

EDJNet’s Quote Finder

EDJNet’s Quote Finder facilitates finding different takes on European affairs. It provides an interactive interface to explore and filter tweets by all members of the European Parliament who are on Twitter, and to visualise word frequencies as wordclouds. It is possible to filter contents based on keywords, hashtags, political affiliation and language of the tweet. A different interface allows for interactively exploring textual contents published by EU institutions, such as press releases. In this case, available visualisations include time series that highlight changes in the relative prominence of certain issues within the official EU discourse.

ganttrify - Create beautiful gantt charts with ggplot2

ganttrify facilitates the creation of nice-looking Gantt charts, commonly used in project proposals and project management. Documentation is available on GitHub.

genderedstreetnames - Find and plot on a map gendered street names

genderedstreetnames automatically finds the gender of street names, facilitates manually fixing what the automatic part got wrong, and plots the results. It gets information from OpenStreetMap and Wikidata. There is currently a vignette showing examples of what can be done with this package on the package’s website. Documentation is available on GitHub.


networkedwebsitesdetector offers a structured approach for finding websites which have clear signs of common ownership or are otherwise related. There is currently a vignette showing examples of what can be done with this package on the package’s website. Documentation is available on GitHub.

zoteror - Access the Zotero API in R

zoteror introduces basic functionalities to access the Zotero API. It makes it possible to create new Zotero items, and to take a csv file (or data frame) and import it into Zotero, as long as data are properly mapped. zoteror has functions that help give tabular data a structure that can properly be read into Zotero. It also helps manage the storage space used, by ordering items by attachment size, and makes it possible to add items to a collection if certain criteria are met.

Recent & Upcoming Talks

Who said it first? Investigating the diffusion of the Kremlin’s buzzwords before they entered the mainstream

Paper presented at the European ASN in Cluj-Napoca

Wikidata for journalists

Session for Dataharvest 2022 (Mechelen, Belgium).

Wikidata for data journalism (with R)

An online event part of the Wikidata Data Reuse Days 2022.

Nagorno Karabakh: will there be war (again)?

An online event on the Nagorno Karabakh conflict organised by ISPI [in Italian]

Russia hacked: problematic sources for insights on conflicts in Ukraine and the South Caucasus

In recent years, high-level leaks and hacks have featured prominently in media reporting. Russia has been repeatedly blamed for carrying out cyber-attacks against a variety of actors in Western countries, including the US Democratic party and then-presidential candidate Emmanuel Macron in France. However, Russian government actors have themselves been repeatedly hacked in recent years, including by alleged Ukrainian hacker groups and others (e.g. #SurkovLeaks). People associated with the de facto authorities in the Donbas region have also been hacked.

Responding to alleged Russian interference by focusing on the vulnerabilities that make it possible

In recent years, the legitimacy of electoral processes in Western democracies has been repeatedly put into question due to alleged …

Victims of double standards: double victimhood and changing narratives in Azerbaijan’s public rhetoric

Here is a brief summary of key points in the form of a Twitter thread:

On my way back from ASIAC's latest conference in Gorizia, where I presented a joint work co-authored with @sofiebedford: "Victims of double standards: double victimhood and changing narratives in Azerbaijan’s public rhetoric" pic.twitter.com/9amsIIdMfp (Giorgio Comai, @giocomai, December 7, 2018)

Should the EU talk more or less about conflict?


Context and crisis scenario analysis in Moldova and Transnistria

Imagining Transnistria without Russia’s gas subsidy

Text as data & data in the text

Studying conflicts in post-Soviet spaces through structured analysis of textual contents available on-line

Digital archives and local history

Digitising local history

R packages

R packages for research and data journalism


“Exploring systemic vulnerabilities for external influence in Italy”

De facto states

Non-recognition is the symptom, not the cause

Journey to Armenia – A film documentary project

“Journey to Armenia” is a film documentary project I worked on together with Andrea Rossini and other colleagues at OBCT between 2009 and 2013. It develops around the journey of Osip and Nadezhda Mandelstam in the Caucasus in 1930 (the basis for Osip’s “Journey to Armenia”) and, more generally, around the life of the Mandelstams. At the same time, it is also a journey from Abkhazia to Nagorno Karabakh across today’s Caucasus.

Youth, patriotism and politics in the Northern Caucasus

Youth in the Northern Caucasus: associationism, identity, and patriotism in a complex, multi-ethnic context.


To get in touch, email is best: g@giorgiocomai.eu.