I am a senior researcher and data analyst at OBCT/CCI, and a lecturer at the University of Trento (2024/2025).
I am currently working on a research project dedicated to text as data (and data in the text): find out more on tadadit.xyz.
For a long time, I have been conducting research on post-Soviet affairs, focusing in particular on de facto states. I have been visiting Russia and other post-Soviet countries since 2000, and I speak Russian and Romanian fluently. I am a member of the board of directors of Asiac, the Italian Association for the study of Central Asia and the Caucasus.
In recent years, I have increasingly been working on structured approaches for analysing online sources in conflict studies and international relations, data collection methods, and related ethical issues.
I have also been working as a data analyst and consultant, crunching data at the European Data Journalism Network (EDJNet), writing code and developing packages in the R programming language, and working on data visualisation and geographic data analysis.
I occasionally post on my data visualisation blog: the codebase.
Find me micro-blogging in the Fediverse @giocomai@mastodon.social.
PhD, Law and Government, 2018
Dublin City University
MA, area studies, 2006
MIREES - Interdisciplinary Research and Studies on Eastern Europe
BA, political science, 2004
University of Bologna
We have been toying with the idea of doing large-scale, open-ended analysis of street names across Europe for quite some time. Back in 2019, I quickly built an R package that facilitated retrieving the names of streets in a municipality, tried to guess which were named after people and what their gender was (simplistically operationalised as “sex at birth”), and provided a basic interface to check the data (here’s a walkthrough of the process).
When everybody’s moved by the contagious joy of two athletes making history by agreeing to share an Olympic gold medal, the data analyst thinks: “two gold medals for the same competition? is this going to break my dashboard?”
"Can we have two gold? 🥇" "Let's make history, man" #GOLD #Tokyo2020 pic.twitter.com/y2PATi92Jq — Giuseppe Famà (@FamaNelMondo) August 1, 2021
In my case, I was worried it would break my parsing script.
Check out a more comprehensive curated list on GitHub
castarter is a more modern, fully-featured, and consistent iteration of castarter - Content Analysis Starter Toolkit for the R programming language (a previous iteration is still available as castarter.legacy). It facilitates text mining and web scraping by taking care of many of the most common file management issues, keeps track of download progress in a local database, facilitates extraction through dedicated convenience functions, and allows for basic exploration of textual corpora through a Shiny interface.
The goal of tidywikidatar is to facilitate interaction with Wikidata: all responses are transformed into data frames or simple character vectors, and it is easy to enable efficient caching in a local SQLite database (integration with other databases is also available). If you want to benefit from the wealth of information stored by Wikidata, but you do not like SPARQL queries and nested lists, then you may find tidywikidatar useful.
tifkremlinen is a package providing a single dataset - kremlin_en - including all contents published on the English-language version of kremlin.ru from 31 December 1999 to 31 December 2020. Yearly updates will likely be made available.
Link to repo on GitHub
Link to official version of dataset with all details
castarter (now castarter.legacy) is designed to make it easy, even for relatively inexperienced users, to create a textual dataset from a website, or a section of a website, keep it up-to-date, and explore it through word frequency graphs or a web interface that makes it possible to tag items. Documentation is available on castarter’s website.
EDJNet’s Quote Finder facilitates finding different takes on European affairs. It provides an interactive interface to explore and filter tweets by all members of the European Parliament who are on Twitter, and to visualise word frequencies as wordclouds. It is possible to filter contents based on keywords, hashtags, political affiliation and language of the tweet. A different interface allows for interactively exploring textual contents published by EU institutions, such as press releases. In this case, available visualisations include time series that highlight changes in the relative prominence of certain issues within the official EU discourse.
ganttrify facilitates the creation of nice-looking Gantt charts, commonly used in project proposals and project management. Documentation is available on GitHub.
genderedstreetnames automatically finds the gender of street names, facilitates manually fixing what the automatic part got wrong, and plots the results. It gets information from OpenStreetMap and Wikidata. There is currently a vignette showing examples of what can be done with this package on the package’s website. Documentation is available on GitHub.
networkedwebsitesdetector offers a structured approach for finding websites which have clear signs of common ownership or are otherwise related. There is currently a vignette showing examples of what can be done with this package on the package’s website. Documentation is available on GitHub.
zoteror introduces basic functionalities to access the Zotero API. It allows creating new Zotero items, and taking a csv file (or data frame) and importing it into Zotero, as long as data are properly mapped. zoteror has functions that facilitate giving tabular data a structure that can properly be read into Zotero. It facilitates resizing the storage space used, by ordering items by attachment size, and by making it possible to add items to a collection if certain criteria are met.
In recent years, the legitimacy of electoral processes in Western democracies has been repeatedly put into question due to alleged …
Imagining Transnistria without Russia’s gas subsidy
Studying conflicts in post-Soviet spaces through structured analysis of textual contents available on-line
Digitising local history
R packages for research and data journalism
“Exploring systemic vulnerabilities for external influence in Italy”
Non-recognition is the symptom, not the cause
“Journey to Armenia” is a documentary film project I worked on together with Andrea Rossini and other colleagues at OBCT between 2009 and 2013. It revolves around the journey of Osip and Nadezhda Mandelstam through the Caucasus in 1930 (the basis of Osip’s “Journey to Armenia”) and, more generally, around the life of the Mandelstams. At the same time, it is also a journey from Abkhazia to Nagorno Karabakh across today’s Caucasus.
Youth in the Northern Caucasus: associationism, identity, and patriotism in a complex, multi-ethnic context.