Session co-hosted by Giorgio Comai and Friedrich Lindenberg
Wikipedia is an incredible source of information, but much of the detail it stores is unstructured and difficult to retrieve. Its lesser-known sister project, Wikidata, however, has many of the features that data and investigative journalists love: a huge treasure trove of connected data with nicely deduplicated unique identifiers, unencumbered by licensing restrictions. Even if it is often not as complete as we wish it were, Wikidata is a great point of reference for works small and big.
Wikidata’s data structure and default querying mechanisms may be intimidating at first, but things start to make sense quickly once you gain some familiarity with its core logic and key concepts. Tools commonly used by data journalists, such as OpenRefine and the R programming language (via the tidywikidatar package), can make this complexity much more manageable.
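As a hedged sketch of what that looks like in practice, the snippet below follows the basic lookup pattern from the tidywikidatar documentation, using real Wikidata identifiers (Q42 is Douglas Adams, P19 is the “place of birth” property); exact outputs depend on the current state of Wikidata.

```r
# A minimal sketch of a tidywikidatar lookup, based on the package's
# documented interface; results reflect the live state of Wikidata.
library(tidywikidatar)

tw_enable_cache() # cache responses locally to avoid repeated API calls

# Find the unique Wikidata identifier (QID) for an entity by its label
tw_search(search = "Douglas Adams") # a tibble with id, label, description

# Retrieve a single property as a tidy data frame;
# P19 is Wikidata's "place of birth" property
tw_get_property(id = "Q42", p = "P19")

# Turn a QID returned by the previous call back into a readable label
tw_get_label(id = "Q350") # "Cambridge"
```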
The session covers use cases, pitfalls, and practical examples of including Wikidata in your data journalism workflows, from exploring sanctioned entities and persons to analysing street names. You will learn how Wikidata stores data, and how that information can be retrieved systematically with different tools and put to use in data and investigative journalism.
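To illustrate the kind of systematic retrieval involved, here is a hedged sketch of a bulk query with tidywikidatar’s tw_query(), following the example in the package documentation: it retrieves all Wikidata items matching a set of property/value pairs (Q1930187 is “journalist”, Q6581072 is “female”). The same pattern could be pointed at other properties, for instance “named after” (P138) when analysing street names.

```r
# A sketch of systematic retrieval: find every Wikidata item that
# matches all of the given property/value pairs.
library(tidywikidatar)

tw_query(query = list(
  c(p = "P106", q = "Q1930187"), # occupation: journalist
  c(p = "P21",  q = "Q6581072")  # sex or gender: female
))
# returns a tibble of matching items with id, label, and description
```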
Slides are available following this link.
The package documentation includes a more extensive example of a workflow that takes Wikipedia as a starting point.