Leader: Remember this - but not that
We need a way to archive the web for the future while ensuring that people are not lost in 'data shadows' or digital doppelgängers.
One of the most frequent requests to this publication is to remove a story from our online archive, which goes back to 1994. It seems that some people who google their names find a story about themselves from years ago that they would rather a future employer did not see; others feel they have moved on and would prefer to forget their previous antics.
But the internet does not forget. Nothing is erased completely, argues Viktor Mayer-Schönberger in Delete: The Virtue of Forgetting in the Digital Age. A "data shadow" is left behind, one that can cast a pall over someone's life and sometimes obscure the truth.
When a positive review of Lois S. Bibbings' Telling Tales about Men: Conceptions of Conscientious Objectors to Military Service during the Great War appeared online, Dr Bibbings, senior lecturer at the University of Bristol Law School, was delighted. But her joy soured somewhat when she spotted that a "u" had been added to her first name. She had, as she says indignantly in our cover story, "been inadvertently gender-reassigned in the textual world". That the mistake should occur in the journal Gender and History only made the situation more surreal as "Louis" started to "sow his seed on the information superhighway. The error was proliferating online as various sites began citing the review of a book he had apparently sired."
Although Dr Bibbings had the error corrected online, the damage had been done: Louis remains out there on the web, claiming credit for her work.
This problem is not unusual: it plagues citation counts. But capturing a person's activities online correctly and archiving their digital footprint raise myriad issues. The challenge, say researchers Eric Meyer et al in Web Archives: The Future(s), is how to build tools that allow individuals to "manually specify how to automatically collect" this footprint. Such tools must be able not only to remember but also to forget, by allowing people to delete information, an ability that some scholars (including Professor Mayer-Schönberger, an expert on internet governance) consider a basic right. There are many privacy and legal problems that will need to be addressed, the report's authors say.
Academics, however, are not doing enough to engage with this topic. This failure means, the report says, that in the worst case "the vast amount of information being created globally today may just as well have been written on scraps of paper stored in a billion shoeboxes, for all the good it will do towards understanding developments in the world as reflected by the content on the internet".
Archiving web content is important. Given that we now live so much of our lives online, it is of concern that a great deal of material on the web is disappearing. Web pages change all the time - surviving unaltered for only 100 days on average. Such content must be preserved, but it must be done in a way that will be useful to researchers rather than sitting in archives that risk gathering "digital dust".
A combination of uncoordinated, piecemeal approaches to collection, technical challenges, permissions and concerns from commercial publishers means that, while libraries are busily digitising documents from the 19th century, information widely accessed through websites today may, ironically, not be available to tomorrow's scholars.