Personal data metaphors and imagery

I am currently completing my new book, with the working title of Data Selves, to be published by Polity. Here is an excerpt from a chapter that looks at personal data materialisations.

We have to work hard to find figures of speech and ways of thinking to encapsulate the ontology of digital data. The concept of digital data, at first glance, appears to describe a wholly immaterial phenomenon that does not engage the senses: there seems to be nothing to look at, touch, hear, smell or taste. The metaphors and other figures of language employed to describe digital data are attempts to conceptualise and make sense of these novel forms of information and their ontologies. Even as digital technologies continue to generate and process detailed information about people’s bodies, social relationships, emotions, practices and preferences, prevailing discourses on these data tend to de-personalise and de-humanise them. The use of the term ‘data’ to describe these details signals a way of viewing and treating them, presenting these aspects as raw materials, ripe for processing and exploitation to make them give up their meaning (Räsänen and Nyce 2013; Gitelman and Jackson 2013). Once they have become defined and labelled as ‘data’, these details about people’s lives tend to be imagined as impersonal, scientific and neutral. They have been extracted from their embodied, sensory and affective contexts, rendered into digitised formats and viewed as material for research, management or commercial purposes.

The term ‘data’ is closely associated with ‘information’. Information as a term is subject to a wide range of (often debated) definitions in the academic literature. It usually involves the assumption that there are structures, correlations and patterns involved in the organisation and communication of meaning. Information tends to be imbued with the pragmatic meanings of rational thought-processes and material that can contribute to acquiring and using knowledge. It has use and value based on these attributes (Buckland 1991). Digital data, as forms of information that have been collected and processed using digital technologies, are often portrayed as more accurate and insightful than many other information sources (Lupton 2015; Kitchin 2014). Many references to big data represent it as anonymised massive collections of details that are valuable commodities, open to profitable exploitation. The World Economic Forum’s report (2011) describing big data as ‘the new oil’, ‘a valuable resource of the 21st century’ and a ‘new asset class’ is an influential example of this metaphor.

Metaphors of fluidities also tend to be employed when describing digital data. Digital data are popularly imagined to stream, flow and circulate in the ephemeral digital data economy, emitting imperceptibly from digital devices, flying through the air to lodge in and move between computing clouds as if comprised of vaporised water. Many metaphors of digital data use words and phrases that denote overwhelming power and mobilities, again often referring to large bodies of uncontrollable water: the data ‘deluge’, ‘flood’, ‘ocean’ and even ‘tsunami’ that constantly appear in popular accounts of big data in particular. These figures of speech are used to denote feelings of being overwhelmed by large, powerful masses of data (‘big data’) that appear to be difficult to control or make sense of in their volume. Still other metaphors represent data as ‘exhaust’, ‘trails’ or ‘breadcrumbs’, denoting the by-products of other interactions on digital networks. These metaphors suggest a tangible, perceivable form of digital data, albeit tiny, that requires effort to discern and to make it give up its value (Lupton 2015).

The terms ‘clean’ and ‘dirty’ have long been used in descriptions of data, however these data are generated. These terms refer to the degree to which the data can be used for analysis: clean data are ready for use, dirty data sets require further processing because they are incomplete, outdated, incorrect or obsolete. Portrayals of the affordances of digital data on the body/self, in their emphasis on objectivity and neutrality – or what might be described as their ‘cleanliness’ – denote a view of information about oneself that privileges such ‘clean’ data over what might be contrasted as the ‘dirty’ data that the body produces from sensual experience. Human cognition, memory, perception and sensation are ‘weak’ because they are ‘unscientific’. They are born of fallible fleshly embodiment rather than the neutral, objective data that are generated by computer software and hardware.

Data have also been referred to as ‘raw’, suggesting that they are materials that are untouched by culture. It is assumed that by working on ‘raw’ data, data scientists transform these materials into useable commodities. Part of this transformation may involve ‘cleaning’ ‘dirty data’. Boellstorff (2013) uses the term ‘rotted data’ to describe the ways in which the materiality of data can degrade (for example, damaged hard drives that store data), but also how data can be transformed in unplanned or accidental ways that do not follow algorithmic prescriptions. Here again, these metaphors of ‘raw’, ‘cooked’ and ‘rotted’ draw attention to the materiality of data and the processing, deterioration and recuperation that are part of human-data assemblages.

In her essay on digital data, Melissa Gregg (2015) employs a number of other metaphors that she devised to encapsulate the meanings of data. Data ‘agents’ suggests the capacities of data to work with algorithms to generate connections: matches, suggestions and relationships between social phenomena that otherwise would not be created. Gregg gives the examples of recommendation sites and online dating services, which connect strangers and their experiences with each other in ways that were previously unimaginable. She goes on to suggest that ‘In these instances, data acts [sic] rather more like our appendage, our publicist, even our shadow’ (Gregg 2015). Gregg also employs the metaphor of data ‘sweat’ (another liquid metaphor) in the attempt to emphasise the embodied nature of data, emerging or leaking from within the body to the outside in an uncontrolled manner to convey information about that body, including how hard it is working or how productive it is. Data ‘sweat’, therefore, can be viewed as a materialisation of labour. She then suggests the concept of data ‘trash’ (similar to the ‘exhaust’ metaphor mentioned above). Data ‘trash’ is data that is in some way useless or potentially polluting or hazardous: Gregg links this metaphor with the environmental effects generated by creating, storing and processing data in data centres. Both the metaphors of data ‘sweat’ and ‘trash’ suggest the materiality of digitised information as well as its ambivalent and dynamic status as it moves between ascriptions of high value and useless or even disgusting by-product.

An analysis of images used to represent big data in online editions of The New York Times and The Washington Post (Pentzold et al. 2018) found that they tended to fall into several categories in the attempt to visually represent big data: using large-scale numbers, interpretive abstract renditions, showing numbers or graphs on smartphone or computer screens, images of data warehouses and devices that generate data, robots, datafied individuals and meteorological imagery such as clouds. A dominant visual image involved photographic images of people working in the big data industry, such as data scientists, ‘nerds’ and ‘geeks’ (overwhelmingly male) and logos of internet companies. These images served as visual surrogates to represent the immateriality of big data. The researchers compared these images with those found on a general Google image search for ‘big data’ and also on Wikipedia and the image platforms Fotolia, Flickr and Pinterest. They noted that the images they found on these platforms were very homogeneous, featuring the colour blue, the words ‘big data’ written large, binary numbers, network structures and surveillant human eyes. These kinds of descriptions suggest that big datasets (including those drawn from people’s lives and experiences) are natural resources that are unproblematically open to exploration, mining and processing for profit. The personal details about people contained within these massive datasets are reimagined as commodities or research material. It is telling that the human elements of these images largely include men working in data analytics rather than the range of people who generate data or who may make use of their own data as part of their everyday lives.

In these types of portrayals, the status of personal data as human, or at least partly human, entities is submerged in the excitement about how best to exploit these details as material commodities. Their liveliness is represented in ways that suggest their economic potential as nonhuman organic materials (streams, flows, oil, clouds, breadcrumbs). Yet conversely, another dominant discourse about personal data, which is particularly promulgated by the data profiling industry and civil society privacy advocates, is that these details are all-too-human or even excessively human: intensely intimate and revealing of people’s uniquely human characteristics. Proponents of the ‘Internet of Me’ make claims such as:

Now imagine tech working in your body at the biological level. Your body could express itself on its own, without you having to be in charge, to deliver more happiness, better health, whatever you truly need and want.

These sociotechnical imaginaries position devices and data as working together with human bodies in ways that devolve agency to the device. ‘You’ no longer have to be ‘in charge’ – instead, the device takes over. Other imaginaries around the Internet of Me configure the idea of personal cloud computing, in which all people’s personal data go to a centralised cloud computing repository where they will be able to access all their data.

When I performed my own Google image search using the term ‘personal data’, the images that were returned by the search again featured the colour blue, male figures and binary numbers. Notably, several images showed a pen and a paper form with the words ‘personal information’ at the top, perhaps as an attempt to respond to the immateriality of digitised information by rendering it in analogue forms with which many people would be familiar. Images using locks and keys as metaphors were also dominant, suggesting the value of personal data but also how closed they are to people who may want to make use of them. When I used the search term ‘personal data privacy’, new images were introduced in addition to those appearing under ‘personal data’. These included images of spy-like or Big Brother surveillance figures and also images showing human hands protectively attempting to cover computer keyboards or screens, as if to elude the gaze of these spying figures as people used their devices.

One online article on the Internet of Me features an image in which a human body is comprised of many different social media and other internet platform icons as well as coloured dots representing other data sources. Instead of an assemblage of flesh-bone-blood, the body is completely datafied and networked. The interesting thing is that this body is represented as an autonomous agent. The networks that generate data and keep the body vibrant and functioning are internal, not externalised to networks outside this socially alienated body. Data flows are contained within elements of the body rather than leaking outside it to other bodies. This suggests an imaginary in which the Internet of Me is neatly contained within the envelope of the body/self and thus able to control ingress and egress. This is an orderly closed system, one that confounds both utopian and dystopian imaginaries concerning the possibilities and risks of one’s body/self being sited as just one node in a vast and complex networked digital system.

In contrast, a series of 2018 British advertisements for the BBH London & Experian data analytics company used the ‘data self’ concept in an attempt to humanise data profiling and emphasise the similarities of these profiles to the people from whom they are generated. Six versions of this ad featured photographs of comedian Marcus Brigstocke and his ‘data self’, a person who looked exactly like him. As one of the ads, headlined ‘Meet your Data Self’, claimed: ‘Your Data Self is the version of you that companies see when you apply for things like credit cards, loans and mortgages. You two should get acquainted’. One of the ads, headlined ‘What shape is your Data Self in?’, showed the comedian looking at his doppelganger lifting a heavy barbell. The copy read ‘If your Data Self looks good to lenders, you’re more likely to be approved for credit. That’s a weight off. Get to know your Data Self at Experian.com.uk.’ Another ad asked ‘Is your Data Self making the right impression?’, depicting the comedian, dressed in casual clothes, shaking hands with his more formally dressed (in suit and tie) data self. Notably, this person and his ‘data self’ was a white, youngish man, excluding representatives from other social groups.

The ontological status of personal data, therefore, constantly shifts in popular representations between human and nonhuman, valuable commodity and waste matter, nature and culture, productive and dangerous. In both modes of representation, the vibrancies of digital data – their ceaseless production, movements, leakages – are considered to be exciting and full of potential but also dangerous and risky. Personal data assemblages are difficult to control or exploit by virtue of their liveliness.

References

Boellstorff, T. (2013). Making big data, in theory. First Monday, 18(10).

Buckland, M. K. (1991). Information as thing. Journal of the American Society for Information Science, 42(5), 351.

Gitelman, L., & Jackson, V. (2013). Introduction. In L. Gitelman (Ed.), Raw Data is an Oxymoron (pp. 1-14). Cambridge, MA: MIT Press.

Gregg, M. (2015). The gift that is not given. In T. Boellstorff, & B. Maurer (Eds.), Data, Now Bigger and Better! (pp. 47-66). Chicago: Prickly Paradigm.

Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London: Sage.

Lupton, D. (2015). Digital Sociology. London: Routledge.

Pentzold, C., Brantner, C., & Fölsche, L. (2018). Imagining big data: Illustrations of “big data” in US news articles, 2010–2016. New Media & Society, online first.

Räsänen, M., & Nyce, J. M. (2013). The raw is cooked: data in intelligence practice. Science, Technology & Human Values, 38(5), 655-677.

World Economic Forum (2011). Personal Data: The Emergence of a New Asset Class. World Economic Forum.