Scholars working on the post-Soviet space frequently refer to web content at different stages of their research process. However, they (we) usually approach the internet as an unstructured mass of content that can only be explored superficially, through search engines and meaningful keywords. This is partly due to a lack of technical skills, and partly to the unavailability of relevant, pre-existing datasets. Neither obstacle is insurmountable. A research question involving a well-defined territory, institution or community may benefit from a structured analysis of the textual contents of a specific website, a section of a website, or a limited number of websites. Such analysis may make it possible both “to find the needle and to describe the haystack”, so that researchers can proceed with fieldwork and established qualitative research methods with more confidence. This talk presents the author’s experience with creating textual datasets related to Georgia and post-Soviet de facto states, and his attempts at answering questions such as: When exactly did it become common to refer to Abkhazia and South Ossetia as “occupied territories”? Why “occupation” and not “annexation”? What do authorities in post-Soviet de facto states talk about? Is there anything unusual about it?