Privacy and Big Data
===

Overlapping the movement paths of different people can reveal important information for contact tracing, helping to identify who is in contact with whom. However, that information can be highly sensitive, depending on the context in which people meet and their concerns about privacy. There is a very real danger of stumbling into a long-lasting Orwellian surveillance state through nothing but goodwill and the best intentions. Think of all the domains beyond COVID-19 in which granular location data could provide a tempting trove of information; for instance, other health outcomes (cancer, heart disease) could be traced to the precise geographic locations where environmental or societal factors exert significant causal influences. Such big data could prove an immensely useful tool for public health, but the result might be a breakdown of privacy and greater state control of how people live their lives. If you can pinpoint a block or neighborhood with a large number of COVID-19 cases, does a different set of rules apply to the behavior of people living there? If you can identify an area where environmental toxins are leading to disease later in life, are people forced or encouraged to relocate? And who makes the decisions – is it some well-positioned technocrat with access to the big data, or is it the affected communities themselves?
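To make the idea of overlapping movement paths concrete, here is a minimal sketch of how space-time co-location might be detected from two GPS trajectories. The point format, the 10-meter distance threshold, and the 5-minute time window are illustrative assumptions, not parameters from any actual contact-tracing system.

```python
# Minimal sketch of space-time contact detection from GPS trajectories.
# Assumes each trajectory is a list of (timestamp_seconds, lat, lon) tuples;
# the thresholds are illustrative, not drawn from any published protocol.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def find_contacts(traj_a, traj_b, max_meters=10, max_seconds=300):
    """Return point pairs where two people were close in both space and time."""
    contacts = []
    for t1, lat1, lon1 in traj_a:
        for t2, lat2, lon2 in traj_b:
            close_in_time = abs(t1 - t2) <= max_seconds
            close_in_space = haversine_m(lat1, lon1, lat2, lon2) <= max_meters
            if close_in_time and close_in_space:
                contacts.append((t1, t2, lat1, lon1))
    return contacts

# Example: two short trajectories that overlap at one point.
alice = [(0, 40.7128, -74.0060), (600, 40.7138, -74.0065)]
bob = [(30, 40.7128, -74.0061), (900, 40.7200, -74.0100)]
print(find_contacts(alice, bob))
```

Even this toy example shows why the data are sensitive: the same pairwise comparison that flags a contact also records exactly where and when two specific people met.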

The risks posed by big data suggest that this field of research should be approached with caution. Reverse geo-coding can reveal individual identities given sufficient data, and while there are trainings and protocols in place for protecting confidentiality, they sometimes fall short of the mark. When they do, specific home and work locations can be revealed to the public or to malevolent actors. Big data also sometimes lack social context, and when these data deal with the sensitive information of already vulnerable populations, they can actually increase vulnerability through processes like stigmatization. Mei-Po Kwan and others are working on the kinds of data infrastructure that could keep such data secure, developing the Geospatial Virtual Data Enclave, which would require researchers to go through credentialing before accessing data. This project has some major upsides: it geo-masks the sensitive elements of GPS data and limits what researchers can do with the data by preventing upload, download, and printing. If big location data are going to be used, this is a good way to protect them, but there is not universal agreement that these data should be used in the first place.
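For readers unfamiliar with geo-masking, the sketch below shows one common approach, sometimes called donut masking, in which each point is displaced by a random distance and bearing. It illustrates the general idea under assumed displacement radii; it is not the actual method used in the Geospatial Virtual Data Enclave.

```python
# Minimal sketch of "donut" geo-masking: each point is displaced by a random
# bearing and a random distance between a minimum and maximum radius, so the
# true location can no longer be pinpointed by reverse geo-coding.
# Illustrative only; not the masking method of the Geospatial Virtual Data Enclave.
import math
import random

EARTH_RADIUS_M = 6371000

def donut_mask(lat, lon, min_m=100, max_m=1000, rng=random):
    """Return a masked (lat, lon) displaced 100-1000 m in a random direction."""
    bearing = rng.uniform(0, 2 * math.pi)
    distance = rng.uniform(min_m, max_m)
    d_lat = (distance * math.cos(bearing)) / EARTH_RADIUS_M
    d_lon = (distance * math.sin(bearing)) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(d_lat), lon + math.degrees(d_lon)

masked = donut_mask(40.7128, -74.0060)
print(masked)  # e.g. (40.7171, -74.0019): shifted, but still in the same neighborhood
```

The trade-off is typical of such protections: the masked point still supports neighborhood-level analysis, but the displacement limits how precisely any individual's locations can be reconstructed.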

During a pandemic, it is important to understand human space-time behavior. However, this need might be better addressed through interviews or a similar process that involves a more voluntary, ethical provision of data than the passive, automatic collection of GPS data from a device. We might also question the process of accumulating a massive amount of data for the eyes of researchers, when there is already such a wealth of local knowledge regarding human behavior across space and over time. Can there not be a more radically bottom-up approach to disease prevention and mitigation, involving the people themselves rather than merely tracing them and aggregating their GPS data? Perhaps the answer comes down to ideology: do we believe that communities can be self-regulating, or do they need to be regulated by a more knowing authority?
