University of Leicester | School of Geography, Geology and the Environment
Data Census: Charting the diverse geographies of User-Generated Content
The past ten years have seen a momentous change in the way data is produced and used, driven by “web 2.0” and social media platforms (such as Wikipedia, Twitter and Instagram), and intertwined with the widespread adoption of smartphones and the open data movement. These phenomena have led to a deluge of digital data, including geospatial data, and a subsequent boom in big data analytics, data science, and machine learning (Kitchin, 2013). The use of these information sources is now commonplace; they are considered an enabler of knowledge production (Smith, 2018) and a vital component of recent advances in artificial intelligence.
However, the relationships between the geographies of user-generated content – commonly referred to as volunteered geographic information (VGI) in geographic information science – and the underlying socio-demographics at global (Graham et al., 2015), national (Bright et al., 2015) and urban (Ballatore and De Sabbata, 2018; Shelton et al., 2015) scales are complex and still largely unexplored. These studies indicate that user-generated content is highly concentrated in a few locations, typically urban areas with high socio-economic profiles, but that socio-demographic variables account for only about half of the variation in content, leaving the rest unexplained. Salient questions need to be asked about which parts of our cities and regions are represented and how, and what impact these uneven geographies have on the algorithms and social science research that rely on those data.
The project aims to move forward from isolated analyses of single platforms to a contextualised, integrated understanding of the complex representations created on multiple platforms, and of how user-generated content varies across time and space.
The project will:
- identify specific ontological dimensions of user-generated content (Kitchin and McArdle, 2016), including its geographic representativeness and relevance;
- conduct an analysis of user-generated content across multiple platforms and its relationship to socio-demographic data;
- establish the core of a novel data census, i.e. an open spatial dataset that charts the representativeness and salience of user-generated content, which will become a point of reference for analysts and developers on the advantages, biases and limitations of popular sources of big data.