Swimming or drowning in the data ocean? Thoughts on the metaphors of big data

Tsunami hazard sign (Photo credit: Wikipedia)

There is currently a clear fascination in both popular culture and academic research with big data – the vast quantities of data that are generated from people’s interactions with digital technologies. The term ‘big data’ is appearing with ever-greater frequency in the popular media, government reports, blogs and academic journals and conferences.

The ways in which big digital data are described rhetorically reveal much about their contemporary social and cultural meanings. As Sue Thomas writes in her book Technobiophilia: Nature and Cyberspace, organic metaphors drawn from the natural world have been continually used to describe computer technologies since their emergence. Such natural terms as the web, the cloud, bug, virus, root, mouse and spider have all been employed in attempting to conceptualise and describe these technologies. These metaphors work to render digital technologies more ‘natural’, and therefore less threatening and alienating. However, nature is not always benign: it may sometimes be wild, chaotic and threatening, and these meanings of nature may also be bestowed upon digital technologies.

By far the most commonly employed metaphors to discuss big data are those related to water or liquidity: streams, flows, leaks, rivers, oceans, seas, waves and so on. Both academic and popular cultural descriptions of big data have frequently referred to the ‘fire hose’ of data issuing from a social media site such as Twitter and to the data ‘deluge’, ‘flood’ or ‘tsunami’ to which we as internet users contribute and which threatens to ‘swamp’ or ‘drown’ us. These rather vivid descriptions of data as a fluid, uncontrollable entity possessing great physical power emphasise the sheer volume and speed of digital data movements, as well as their unpredictability and the difficulty of controlling and containing them. They suggest an economy of digital data and surveillance in which data are collected constantly and move from site to site in ways that cannot themselves easily be monitored, measured or regulated.

Other metaphors sometimes employed to describe the by-product data that are generated include data ‘trails’, ‘breadcrumbs’, ‘exhausts’, ‘smoke signals’ and ‘shadows’. All these tend to suggest the notion of data as objects that are left behind as tiny elements of another activity or entity (‘trails’, ‘breadcrumbs’, ‘exhausts’), or as less material derivatives of the phenomena from which they are viewed to originate (‘smoke signals’, ‘shadows’).

Digital data are also often referred to as living things, as having a kind of vitality in their ability to move from site to site and morph into different forms. The rhizome metaphor is sometimes employed to describe how digital data flow from place to place, or from node to node, suggesting that they are part of a living organism such as a plant. This also suggests a high level of complexity and a network of interconnected tubes and nodes.

The focus on liquidity, ceaseless movement, flux and vitality, while accurately articulating the networked nature of contemporary societies and the speed and ease with which information travels across the networks, also tends to obscure certain dimensions of digitisation. The blockages and resistances, the solidities that may impede the fluid circulation of data, tend to be left out of such discussions. The rhetoric of free streams of flowing communication tends to obscure the politics and power relations behind digital and other information technologies. The continuing social disadvantage and lack of access to economic resources (including the latest digital devices and data download facilities) that many people experience belie the discourse of free-flowing digital data and universal, globalised access to and sharing of these data.

Liquidity metaphors evoke the notion of an overwhelming volume of data that must somehow be dealt with, managed and turned to good use. Instead of ‘surfing the net’, a term that was once frequently used to denote moving from website to website easily and playfully, we now must cope with huge waves of information or data that threaten to engulf us. When we think of digital data as ‘breadcrumbs’ or ‘shadows’, they are less overtly threatening, but are also depicted as subtle means of tracking and tracing our movements and activities. As we grow increasingly aware of the use of digital data for surveillance and espionage purposes, these metaphors may take on a more malign meaning, suggesting that we are being monitored constantly whether we agree or not. Digital data surveillance systems are beginning to know more about us than we ourselves do in their capacity to silently watch and record our actions. When we conceptualise digital data and the systems that produce them as complex living organisms, they appear more benign, part of ‘good nature’, but also again as potentially wild and uncontained, growing out of our control.

What the rhetoric of big data tends to suggest is that we harbour both attraction towards and fear about this phenomenon. Big data may offer many benefits, but they also generate anxieties due to their volume, power, ceaseless movement, complexity, mystery and ability to generate knowledge about us that we may not want others to see.

For more on the social and cultural aspects of big data see my Bundlr ‘The Social Life of Big Data and Algorithms’.