In recent years, the internet has been increasingly adopted as a key means of communication by local authorities, organisations and news media throughout the post-Soviet region. This has led to the creation and online publication of content that is routinely consulted and quoted by scholars of area studies, who, however, tend to approach the web as an unstructured mass of content to be explored superficially through search engines and meaningful keywords. Structured analysis of content is still uncommon in area studies for a few reasons: it is considered time consuming and difficult to learn and, fundamentally, relevant datasets are usually not readily available. This paper outlines how to overcome these obstacles by introducing an open source package developed by the author that facilitates the creation of structured textual datasets from web content and allows for basic word frequency analysis through a straightforward web interface. It argues in favour of a wider use of quantitative methods based on word frequency analysis of textual datasets extracted from the internet as a starting point for in-depth research with established qualitative methods. The examples presented in this paper relate to the study of post-Soviet de facto states.
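The core workflow the paper advocates, turning web content into a textual dataset and counting word frequencies, can be sketched minimally as follows. This is an illustrative assumption, not the package's actual API: the HTML snippet stands in for downloaded content, and a real pipeline would fetch pages over the network and use a proper HTML parser.

```python
# Hypothetical sketch: extract text from (stand-in) web content and
# compute word frequencies, as in the quantitative approach the paper describes.
import re
from collections import Counter

# Stand-in for a downloaded page; in practice this would be fetched
# from the web and parsed with a dedicated HTML parsing library.
html = """<html><body>
<p>The de facto state held elections. The elections were contested.</p>
</body></html>"""

# Strip tags naively to obtain plain text.
text = re.sub(r"<[^>]+>", " ", html)

# Lowercase and tokenise on word characters.
tokens = re.findall(r"\w+", text.lower())

# Count token frequencies across the (toy) dataset.
freq = Counter(tokens)
print(freq.most_common(3))
```

Frequencies computed this way across many pages can then flag terms or periods worth closer reading, which is the quantitative-to-qualitative handoff the paper argues for.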