content analysis

castarter - content analysis starter toolkit for R

castarter is a more modern, fully-featured, and consistent iteration of castarter - Content Analysis Starter Toolkit for the R programming language (a previous iteration is still available as castarter.legacy). It facilitates text mining and web scraping by taking care of many of the most common file-management issues, keeps track of download progress in a local database, simplifies text extraction through dedicated convenience functions, and allows for basic exploration of textual corpora through a Shiny interface.
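
Tracking download progress in a local database lets an interrupted scraping session resume where it stopped instead of fetching every page again. The following is a minimal Python sketch of that idea using the standard-library sqlite3 module. It is not castarter's implementation (castarter is an R package), and the table layout and the function names open_tracker, add_urls, pending_urls and mark_downloaded are hypothetical.

```python
import sqlite3


def open_tracker(path=":memory:"):
    """Open (or create) a local SQLite database recording download status per URL.

    Hypothetical helper: the schema below is an illustrative assumption.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "url TEXT PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "file_path TEXT)"
    )
    return conn


def add_urls(conn, urls):
    """Register URLs to download; URLs already in the table keep their status."""
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR IGNORE INTO downloads (url) VALUES (?)",
            [(u,) for u in urls],
        )


def pending_urls(conn):
    """Return URLs not yet downloaded, in the order they were first registered."""
    rows = conn.execute(
        "SELECT url FROM downloads WHERE status = 'pending' ORDER BY rowid"
    )
    return [row[0] for row in rows]


def mark_downloaded(conn, url, file_path):
    """Record that a URL has been saved to a local file."""
    with conn:
        conn.execute(
            "UPDATE downloads SET status = 'done', file_path = ? WHERE url = ?",
            (file_path, url),
        )


if __name__ == "__main__":
    conn = open_tracker()
    add_urls(conn, ["https://example.com/a", "https://example.com/b"])
    mark_downloaded(conn, "https://example.com/a", "html/a.html")
    add_urls(conn, ["https://example.com/a"])  # re-adding does not reset status
    print(pending_urls(conn))
```

Because add_urls uses INSERT OR IGNORE, re-registering a URL that has already been downloaded leaves its status untouched, so the same list of links can be fed in repeatedly and only the missing pages are returned by pending_urls.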

tifkremlinen

tifkremlinen is a package providing a single dataset, kremlin_en, which includes all contents published on the English-language version of kremlin.ru from 31 December 1999 to 31 December 2020. Yearly updates will likely be made available. The package is available on GitHub, and an official version of the dataset with all details is also available.

castarter.legacy - content analysis starter toolkit for R

castarter (now castarter.legacy) is designed to make it easy, even for relatively inexperienced users, to create a textual dataset from a website or a section of a website, keep it up to date, and explore it through word-frequency graphs or a web interface that makes it possible to tag items. Documentation is available on castarter’s website.

Quantitative Analysis of Web Content in Support of Qualitative Research. Examples from the Study of Post-Soviet De Facto States

In recent years, the internet has been increasingly adopted as a key means of communication by local authorities, organisations and …

Word frequency of Ukraine, Crimea, DNR/LNR and Novorossiya on 1tv.ru

The data included in this post were prepared for publication in the online journal Ukraine-Analysen 182 (http://www.laender-analysen.de/ukraine/pdf/UkraineAnalysen182.pdf). This is a quick update to the data presented in a previous post, “Word frequency of ‘Ukraine’, ‘Crimea’, and ‘Syria’ on Russia’s First Channel”, published on this blog in November 2015. The dataset was created by extracting the textual contents of each news item published on Pervy Kanal’s website between the start of Putin’s third presidential term on 7 May 2012 and 1 March 2017 (115.
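
Counting how often selected terms appear in each news item and grouping the counts by month is the basic operation behind word-frequency graphs like the ones in this post. A minimal Python sketch follows, assuming the corpus is already available as (date, text) pairs. The function monthly_term_counts and the regular expressions in the example are illustrative assumptions, not the code used to produce the published figures.

```python
import re
from collections import Counter, defaultdict


def monthly_term_counts(items, patterns):
    """Count regex matches for each pattern, grouped by month ("YYYY-MM").

    items: iterable of (iso_date, text) pairs, e.g. ("2014-03-01", "...").
    patterns: dict mapping a label to a regular expression (assumed input format).
    """
    compiled = {label: re.compile(rx, re.IGNORECASE) for label, rx in patterns.items()}
    counts = defaultdict(Counter)
    for iso_date, text in items:
        month = iso_date[:7]  # "2014-03-01" -> "2014-03"
        for label, rx in compiled.items():
            # findall returns every non-overlapping match in this item
            counts[month][label] += len(rx.findall(text))
    # Sort months chronologically; ISO dates sort correctly as strings
    return {month: dict(c) for month, c in sorted(counts.items())}


if __name__ == "__main__":
    items = [
        ("2014-03-01", "Crimea and Ukraine. Crimea again."),
        ("2014-03-15", "Ukraine talks"),
        ("2015-10-02", "Syria strikes; Ukraine mentioned."),
    ]
    patterns = {"Ukraine": r"\bukrain\w*", "Crimea": r"\bcrimea\w*", "Syria": r"\bsyria\w*"}
    print(monthly_term_counts(items, patterns))
```

Matching on a stem such as `\bukrain\w*` also catches derived forms like "Ukrainian"; whether that is desirable depends on the research question, so the patterns are left to the caller.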

Russian media: more Trump than Putin

Recently, Scan Interfax revealed that in January 2017, for the first time since 2011, Vladimir Putin was not the most frequently mentioned individual in Russian media: Donald Trump was. The news quickly made the rounds in American media (e.g. the Washington Post, Newsweek, CNN, and others). According to an article by Konstantin von Eggert published by Deutsche Welle, on 15 February Russian state-owned media of the VGTRK group were instructed to stop talking about Trump so much.

Word frequency of ‘Ukraine’, ‘Crimea’, and ‘Syria’ on Russia’s First Channel

Just over a month has passed since Russia started its military intervention in Syria. There has been much discussion of the motivations behind the Kremlin’s decision to take an active military role in the Middle East, and a number of competing explanations have been proposed. However, one frequently cited element is the Kremlin’s desire to shift the attention of Russian media away from Ukraine, while keeping the focus on foreign affairs rather than domestic issues.

Aliyev: more and more ‘double standards’

At the latest ASN conference in New York, I talked with Sofie Bedford about the rhetoric of ‘double standards’ in Azerbaijan and elsewhere. The conversation prompted a very basic research question: “Is it true that Azerbaijan’s president has been increasingly using the rhetoric of ‘double standards’?” Since this is exactly the kind of straightforward question that can easily be approached with a tool I have recently been working on, which simplifies quantitative content analysis of textual materials available online, I gave it a spin… and here are the results.
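
A question about "more and more" calls for relative rather than absolute frequencies: if more speeches are published in a given year, raw counts can rise even when the rhetoric itself does not. The Python sketch below computes occurrences of a phrase per 10,000 words for each year. The function name yearly_rate, the tokenisation by `\w+`, and the per-10,000 scale are assumptions for illustration, not the method behind the results in the post.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")  # crude tokeniser: runs of letters, digits, underscore


def yearly_rate(items, phrase, per=10_000):
    """Occurrences of `phrase` per `per` words, by year.

    items: iterable of (iso_date, text) pairs (assumed input format).
    """
    # Whole-phrase, case-insensitive match; re.escape guards special characters
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = defaultdict(int)
    words = defaultdict(int)
    for iso_date, text in items:
        year = iso_date[:4]
        hits[year] += len(pattern.findall(text))
        words[year] += len(WORD.findall(text))
    # Normalise by the total number of words published that year
    return {y: hits[y] / words[y] * per for y in sorted(words) if words[y] > 0}


if __name__ == "__main__":
    items = [
        ("2010-05-01", "we reject double standards in politics"),
        ("2015-05-01", "double standards, double standards everywhere now"),
    ]
    print(yearly_rate(items, "double standards"))
```

Both sample texts contain six words, so the 2015 rate is twice the 2010 rate; with real data the denominators differ from year to year, which is exactly what the normalisation accounts for.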