Personal data metaphors and imagery

I am currently completing my new book, with the working title of Data Selves, to be published by Polity. Here is an excerpt from a chapter that looks at personal data materialisations.

We have to work hard to find figures of speech and ways of thinking to encapsulate the ontology of digital data. The concept of digital data, at first glance, appears to describe a wholly immaterial phenomenon that does not engage the senses: there seems to be nothing to look at, touch, hear, smell or taste. The metaphors and other figures of language employed to describe digital data are attempts to conceptualise and make sense of these novel forms of information and their ontologies. Even as digital technologies continue to generate and process detailed information about people’s bodies, social relationships, emotions, practices and preferences, prevailing discourses on these data tend to de-personalise and de-humanise them. The use of the term ‘data’ to describe these details signals a way of viewing and treating them, presenting them as raw materials, ripe for processing and exploitation to make them give up their meaning (Räsänen and Nyce 2013; Gitelman and Jackson 2013). Once they have been defined and labelled as ‘data’, these details about people’s lives tend to be imagined as impersonal, scientific and neutral: extracted from their embodied, sensory and affective contexts, rendered into digitised formats and viewed as material for research, management or commercial purposes.

The term ‘data’ is closely associated with ‘information’. Information as a term is subject to a wide range of (often debated) definitions in the academic literature. It usually involves the assumption that there are structures, correlations and patterns involved in the organisation and communication of meaning. Information tends to be imbued with the pragmatic meanings of rational thought-processes and of material that can contribute to acquiring and using knowledge, and it has use and value based on these attributes (Buckland 1991). Digital data, as forms of information that have been collected and processed using digital technologies, are often portrayed as more accurate and insightful than many other information sources (Lupton 2015; Kitchin 2014). Many references to big data portray these massive, anonymised collections of details as valuable commodities, open to profitable exploitation. The World Economic Forum’s (2011) report, which described personal data as ‘the new oil’, ‘a valuable resource of the 21st century’ and a ‘new asset class’, is an influential example of this metaphor.

Metaphors of fluidity also tend to be employed when describing digital data. Digital data are popularly imagined to stream, flow and circulate in the ephemeral digital data economy, emanating imperceptibly from digital devices and flying through the air to lodge in and move between computing clouds as if comprised of vaporised water. Many metaphors of digital data use words and phrases that denote overwhelming power and mobility, again often referring to large bodies of uncontrollable water: the data ‘deluge’, ‘flood’, ‘ocean’ and even ‘tsunami’ that constantly appear in popular accounts of big data in particular. These figures of speech denote feelings of being overwhelmed by large, powerful masses of data (‘big data’) that appear difficult to control or make sense of in their volume. Still other metaphors represent data as ‘exhaust’, ‘trails’ or ‘breadcrumbs’, denoting the by-products of other interactions on digital networks. These metaphors suggest a tangible, perceivable form of digital data, albeit tiny, that requires effort to discern and to give up its value (Lupton 2015).

The terms ‘clean’ and ‘dirty’ have long been used in descriptions of data, however those data are generated. These terms refer to the degree to which the data can be used for analysis: clean data are ready for use, while dirty data require further processing because they are incomplete, incorrect or outdated. Portrayals of the affordances of digital data on the body/self, in their emphasis on objectivity and neutrality – or what might be described as their ‘cleanliness’ – denote a view of information about oneself that privileges such ‘clean’ data over what might be contrasted as the ‘dirty’ data that the body produces from sensual experience. Human cognition, memory, perception and sensation are positioned as ‘weak’ because they are ‘unscientific’: borne of fallible fleshly embodiment rather than of the neutral, objective processes of computer software and hardware.

Data have also been referred to as ‘raw’, suggesting that they are materials untouched by culture. It is assumed that by working on ‘raw’ data, data scientists transform these materials into useable commodities, rendering them ‘cooked’. Part of this transformation may involve ‘cleaning’ ‘dirty’ data. Boellstorff (2013) uses the term ‘rotted data’ to describe the ways in which the materiality of data can degrade (for example, when the hard drives that store data are damaged), but also how data can be transformed in unplanned or accidental ways that do not follow algorithmic prescriptions. Here again, the metaphors of ‘raw’, ‘cooked’ and ‘rotted’ data draw attention to the materiality of data and to the processing, deterioration and recuperation that are part of human-data assemblages.

In her essay on digital data, Melissa Gregg (2015) devises a number of other metaphors to encapsulate the meanings of data. The metaphor of data ‘agents’ suggests the capacity of data to work with algorithms to generate connections: matches, suggestions and relationships between social phenomena that otherwise would not be created. Gregg gives the examples of recommendation sites and online dating services, which connect strangers and their experiences with each other in ways that were previously unimaginable. She goes on to suggest that ‘In these instances, data acts [sic] rather more like our appendage, our publicist, even our shadow’ (Gregg 2015). Gregg also employs the metaphor of data ‘sweat’ (another liquid metaphor) to emphasise the embodied nature of data, emerging or leaking from within the body to the outside in an uncontrolled manner to convey information about that body, including how hard it is working or how productive it is. Data ‘sweat’, therefore, can be viewed as a materialisation of labour. She then suggests the concept of data ‘trash’ (similar to the ‘exhaust’ metaphor mentioned above): data that are in some way useless, polluting or hazardous. Gregg links this metaphor with the environmental effects generated by creating, storing and processing data in data centres. Both the metaphors of data ‘sweat’ and data ‘trash’ suggest the materiality of digitised information as well as its ambivalent and dynamic status as it moves between ascription as highly valuable and as useless or even disgusting by-product.

An analysis of images used to represent big data in online editions of The New York Times and The Washington Post (Pentzold et al. 2018) found that they tended to fall into several categories: large-scale numbers, interpretive abstract renditions, numbers or graphs shown on smartphone or computer screens, images of data warehouses and of the devices that generate data, robots, datafied individuals, and meteorological imagery such as clouds. A dominant visual trope involved photographs of people working in the big data industry, such as data scientists, ‘nerds’ and ‘geeks’ (overwhelmingly male), together with the logos of internet companies. These images served as visual surrogates for the immateriality of big data. The researchers compared these images with those found via a general Google image search for ‘big data’ and also on Wikipedia and the image platforms Fotolia, Flickr and Pinterest. They noted that the images on these platforms were very homogeneous, featuring the colour blue, the words ‘big data’ written large, binary numbers, network structures and surveillant human eyes. These kinds of images suggest that big datasets (including those drawn from people’s lives and experiences) are natural resources that are unproblematically open to exploration, mining and processing for profit. The personal details about people contained within these massive datasets are reimagined as commodities or research material. It is telling that the human elements of these images largely depict men working in data analytics rather than the range of people who generate data or who may make use of their own data as part of their everyday lives.

In these types of portrayals, the status of personal data as human, or at least partly human, entities is submerged in the excitement about how best to exploit these details as material commodities. Their liveliness is represented in ways that suggest their economic potential as nonhuman organic materials (streams, flows, oil, clouds, breadcrumbs). Yet conversely, another dominant discourse about personal data, promulgated in particular by the data profiling industry and civil society privacy advocates, is that these details are all-too-human or even excessively human: intensely intimate and revealing of people’s uniquely human characteristics. Proponents of the ‘Internet of Me’ make claims such as:

Now imagine tech working in your body at the biological level. Your body could express itself on its own, without you having to be in charge, to deliver more happiness, better health, whatever you truly need and want.

These sociotechnical imaginaries position devices and data as working together with human bodies in ways that devolve agency to the device. ‘You’ no longer have to be ‘in charge’ – instead, the device takes over. Other imaginaries around the Internet of Me configure the idea of personal cloud computing, in which all of a person’s data are sent to a centralised cloud repository from which that person can access them in one place.

When I performed my own Google image search using the term ‘personal data’, the images that were returned by the search again featured the colour blue, male figures and binary numbers. Notably, several images showed a pen and a paper form with the words ‘personal information’ at the top, perhaps as an attempt to respond to the immateriality of digitised information by rendering it in analogue forms with which many people would be familiar. Images using locks and keys as metaphors were also dominant, suggesting the value of personal data but also how closed they are to people who may want to make use of them. When I used the search term ‘personal data privacy’, new images were introduced in addition to those appearing under ‘personal data’. These included images of spy-like or Big Brother surveillance figures and also images showing human hands protectively attempting to cover computer keyboards or screens, as if to elude the gaze of these spying figures as people used their devices.

One online article on the Internet of Me features an image in which a human body is comprised of many different social media and other internet platform icons as well as coloured dots representing other data sources. Instead of an assemblage of flesh-bone-blood, the body is completely datafied and networked. The interesting thing is that this body is represented as an autonomous agent. The networks that generate data and keep the body vibrant and functioning are internal, not externalised to networks outside this socially alienated body. Data flows are contained within elements of the body rather than leaking outside it to other bodies. This suggests an imaginary in which the Internet of Me is neatly contained within the envelope of the body/self and thus able to control ingress and egress. This is an orderly closed system, one that confounds both utopian and dystopian imaginaries concerning the possibilities and risks of one’s body/self being sited as just one node in a vast and complex networked digital system.

In contrast, a series of 2018 British advertisements created by the agency BBH London for the data analytics company Experian used the ‘data self’ concept in an attempt to humanise data profiling and emphasise the similarities of these profiles to the people from whom they are generated. Six versions of the ad featured photographs of comedian Marcus Brigstocke and his ‘data self’, a person who looked exactly like him. As one of the ads, headlined ‘Meet your Data Self’, claimed: ‘Your Data Self is the version of you that companies see when you apply for things like credit cards, loans and mortgages. You two should get acquainted’. Another, headlined ‘What shape is your Data Self in?’, showed the comedian looking at his doppelganger lifting a heavy barbell. The copy read: ‘If your Data Self looks good to lenders, you’re more likely to be approved for credit. That’s a weight off. Get to know your Data Self at’ A third ad asked ‘Is your Data Self making the right impression?’, depicting the comedian, dressed in casual clothes, shaking hands with his more formally dressed (in suit and tie) data self. Notably, both the man and his ‘data self’ were white, youngish and male, excluding representatives of other social groups.

The ontological status of personal data, therefore, constantly shifts in popular representations between human and nonhuman, valuable commodity and waste matter, nature and culture, productive and dangerous. In both modes of representation, the vibrancies of digital data – their ceaseless production, movements, leakages – are considered to be both exciting and full of potential but also as dangerous and risky. Personal data assemblages are difficult to control or exploit by virtue of their liveliness.


Boellstorff, T. (2013). Making big data, in theory. First Monday, 18(10).

Buckland, M. K. (1991). Information as thing. Journal of the American Society for Information Science, 42(5), 351-360.

Gitelman, L., & Jackson, V. (2013). Introduction. In L. Gitelman (Ed.), Raw Data is an Oxymoron (pp. 1-14). Cambridge, MA: MIT Press.

Gregg, M. (2015). The gift that is not given. In T. Boellstorff, & B. Maurer (Eds.), Data, Now Bigger and Better! (pp. 47-66). Chicago: Prickly Paradigm.

Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London: Sage.

Lupton, D. (2015). Digital Sociology. London: Routledge.

Pentzold, C., Brantner, C., & Fölsche, L. (2018). Imagining big data: Illustrations of “big data” in US news articles, 2010–2016. New Media & Society, online first.

Räsänen, M., & Nyce, J. M. (2013). The raw is cooked: data in intelligence practice. Science, Technology & Human Values, 38(5), 655-677.

World Economic Forum (2011). Personal Data: The Emergence of a New Asset Class. World Economic Forum.


What do Australian women think of My Health Record?

The Australian government has met with difficulties in persuading Australians to register with its national electronic health record system, My Health Record: just one in five Australians has a My Health Record. I have just submitted an article for peer review reporting findings from the Australian Women and Digital Health Project, in which participants discussed their attitudes to and experiences with My Health Record in interviews and focus groups. As the Australian Digital Health Agency moves towards an opt-out process to register as many Australians as possible, the findings from this study offer important insights into what Australian women think of My Health Record.

The full preprint version of the article can be accessed here: Article – My Health Record preprint.

Here are the major findings:

  • Despite their generally highly engaged use of online health and medical sources, the participants’ awareness and use of My Health Record were quite low. When asked if they had signed up to My Health Record, only just over a third (24 of the 66 participants) answered that they had definitely enrolled. Nine women said they weren’t sure or couldn’t remember if they had registered, while the remaining 33 women responded either that they had not heard of My Health Record or that they had decided not to sign up.
  • The women who had registered for My Health Record said that they had done so because they could see the benefits of a digital health record that could be shared across providers. None of them made any reference to the opportunity to view their health records themselves or to add to them. As this suggests, there was little awareness among the participants that My Health Record had initially been designed as a patient engagement tool as well as a platform for storing their medical information and sharing it with their healthcare professionals.
  • Technical difficulties were major barriers to enrolling and using the system successfully. The problem was not just My Health Record itself, but the MyGov platform on which it was hosted. Several women made reference to other services on MyGov being difficult to access and use.
  • No participants had yet found any benefit or use for My Health Record. It was viewed more as a repository for the use of healthcare professionals than for women’s own active use as contributors and users of their data.
  • Several participants said that they regularly had to remind their doctors that they had a My Health Record, only to find that the doctors were not using the system or uploading information, and even discouraging patients from using it.
  • Recent publicity about the Australian government’s misuse, or inadequate protection, of citizens’ personal data has led the participants to express low levels of faith in the government’s capability to manage My Health Record adequately. Many participants also referred to their distrust of the Australian government to protect their medical information. Government agencies were represented as incompetent rather than malicious: lacking the knowledge and skills to establish and maintain a national EHR system that was secure and effective enough to give the participants the confidence or motivation to register and use it.
  • In summary, these findings suggest that the Australian government needs to provide adequate and appropriate information to the Australian public about My Health Record, particularly about the opt-out process and negotiating consent to data sharing. In so doing, it will have to address the wider problem of the Australian public’s lack of trust in the ways in which government agencies collect, share, protect or exploit their personal data.



Who owns your personal health and medical data?

Presenting my talk on day 1 of the two-day international Healthcare and Social Media Summit at the Brisbane Convention & Exhibition Centre, 1 September 2015. Mayo Clinic partnered with the Australian Private Hospitals Association (APHA), a Mayo Clinic Social Media Health Network member, to bring this first-of-its-kind summit to Queensland. (Photo by Jason Pratt / Mayo Clinic)

Tomorrow I am speaking on a panel at the Mayo Clinic Healthcare and Social Media Summit on the topic of ‘Who owns your big data?’. I am the only academic among the panel members, who comprise a former president of the Australian Medical Association, the CEO of the Consumers Health Forum, the Executive Director of a private hospital organisation and the Chief Executive of the Medical Technology Association of Australia. The Summit itself is directed at healthcare providers, seeking to demonstrate how they may use social media to publicise their organisations and promote health among their clients.

As a sociologist, my perspective on the use of social media in healthcare is inevitably directed at troubling the taken-for-granted assumptions that underpin the jargon of ‘disruption’, ‘catalysing’, ‘leveraging’ and ‘acceleration’ that tends to recur in digital health discourses and practices. When I discuss the big data phenomenon, I evoke the ‘13 Ps of big data’, which recognise the social and cultural assumptions underpinning big data and their uses.

When I speak at the Summit, I will note that the first issue to consider is for whom and by whom personal health and medical data are collected. Who decides whether personal digital data should be generated and collected? Who has control over these decisions? What are the power relations and differentials that are involved? This often very intimate information is generated in many different ways: via routine online transactions (e.g. Googling medical symptoms, purchasing products on websites), more deliberately as part of people’s contributions to social media platforms (such as PatientsLikeMe or Facebook patient support pages), or as part of self-tracking or patient self-care endeavours or workplace wellness programs. The extent to which the generation of such information is voluntary, pushed, coerced or exploited – or indeed covert, conducted without the individual’s knowledge or consent – varies in each case. Many self-trackers collect biometric data on themselves for their private purposes. In contrast, patients who are sent home with self-care regimes may undertake such monitoring reluctantly. In some situations, very little choice is offered to people: school students may be told to wear self-tracking devices during physical education lessons, while employees may work in a culture in which monitoring their health and fitness is expected of them, or may be confronted with financial penalties if they refuse.

Then we need to think about what happens to personal digital data once they are generated. Jotting down details of one’s health in a paper journal, or sharing information with a doctor that is kept in a folder in a filing cabinet in the doctor’s surgery, can remain private and secure. In this era of using digital tools to generate and archive such information, this privacy and security can no longer be guaranteed. Once any kind of personal data are collected and transmitted to the computing cloud, the person who generated the data loses control of them. These details become big data, part of the digital data economy and available to any number of second or third parties for repurposing: data mining companies, marketers, health insurance, healthcare and medical device companies, hackers, researchers, the internet empires themselves and even national security agencies, as Edward Snowden’s revelations demonstrated.

Even the large institutions that patients trust for reliable and credible health and medical information online (such as the Mayo Clinic itself, which ranks among the most popular health websites, with an estimated 30 million unique monthly visitors) may inadvertently supply the personal details of those who use their websites to third parties. One recent study found that nine out of ten visits to health or medical websites result in data being leaked to third parties – including companies such as Google and Facebook, online advertisers and data brokers – because the websites use third-party analytics tools that automatically send the developers information about which pages people visit. This information can then be used to construct risk profiles of users that may shut them out of insurance, credit or job opportunities. Data security breaches are common in healthcare organisations, and cyber criminals are very interested in stealing personal medical details from such organisations’ archives. This information is valuable because it can be sold for profit or used to create fake IDs for purchasing medical equipment or drugs, or for making fraudulent health insurance claims.

In short, the answer to the question ‘Who owns your personal health and medical data?’ is generally no longer individuals themselves.

My research, and that of others investigating people’s responses to big data and to the scandals that have erupted around data security and privacy, is finding that concepts of privacy and data ownership are beginning to change in response. People are becoming aware of how their personal data may be accessed, legally or illegally, by a plethora of actors and agencies and exploited for commercial profit. Major digital entrepreneurs, such as Apple CEO Tim Cook, are in turn responding to the public’s concerns about the privacy and security of their personal information. Healthcare organisations and medical providers need to recognise these concerns and manage their data collection initiatives ethically, openly and responsibly.