“The Internet is forever.” So goes a saying regarding the impossibility of permanently removing material, such as stolen photographs, from the Web. Yet paradoxically the vast and growing digital sphere faces enormous losses. Google has been criticized for failing to ensure access to its archive of Usenet newsgroup postings, which stretch back to the early 1980s. And now Internet pioneer Vint Cerf has warned of a “digital dark age” that would result if decades of data (emails, photographs, website postings) became lost or unreadable.
Millions of paper records more than 500 years old exist today. But your entire family photo collection could be lost forever with just a single hard drive failure. Stone tablets, parchment, paper and printed photographs have all lasted through the centuries. But some of our data may not. What do we do about preserving the digital deluge?
Cost Versus Value
Technical solutions already exist, but they’re not well known and they are relatively expensive. How much are we prepared to pay to ensure that the digital material of today is usable in the future? Because if there’s a cost involved, we inevitably have to think about what has enough value to be worth keeping.
How can we calculate that value? As an example, the holdings of the U.K. Data Archive include machine-readable versions of all of the General Household Surveys (GHS) carried out between 1971 and 2011. The GHS was a continuous national survey of people living in private households, conducted on an annual basis. The cost of the GHS in 2001 was reported as 1.43 million pounds (about $2.2 million), making the value of the survey and its data at least that. As it was the 30th year of this survey, the value could be said to be higher because it was part of a series, so we could say the survey was worth more than it cost.
The Office for National Statistics transferred the 2001 data to the U.K. Data Archive in 2002, where we prepared them for preservation and access and published them. To date these survey data have been downloaded by 426 people working in government departments, 759 staff working in education, 1,331 students, and 109 others for various uses. So benefits accrue from making the data available even after their creators have exhausted their primary value: re-use is a significant benefit of preserving data and adds value.
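As a rough illustration of the arithmetic behind these figures, the short Python sketch below adds up the download counts quoted above and divides the reported 2001 survey cost by the total. The "cost per downloader" figure is purely an illustrative assumption and not a measure used by the U.K. Data Archive: it ignores preservation costs, repeat downloads, and the value of the survey to its original creators, and the names in the code are hypothetical.

```python
# Figures quoted in the article for the 2001 General Household Survey.
SURVEY_COST_GBP = 1_430_000  # reported cost of running the 2001 survey

# People who have downloaded the 2001 data, by group.
DOWNLOADERS_BY_GROUP = {
    "government departments": 426,
    "education staff": 759,
    "students": 1_331,
    "others": 109,
}


def total_downloaders(groups):
    """Return the total number of people who downloaded the data."""
    return sum(groups.values())


def cost_per_downloader(cost, groups):
    """Crude illustrative ratio: survey cost spread over every downloader."""
    return cost / total_downloaders(groups)


if __name__ == "__main__":
    total = total_downloaders(DOWNLOADERS_BY_GROUP)
    ratio = cost_per_downloader(SURVEY_COST_GBP, DOWNLOADERS_BY_GROUP)
    print(f"Total downloaders: {total}")
    print(f"Survey cost per downloader: about {ratio:.2f} pounds")
```

On these numbers, each additional re-use lowers the cost per downloader, which is one simple way to picture how preserved data keeps adding value after its primary purpose is served.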
Making Digital as Long-Lasting as Paper
How can we improve the chances of something being preserved? Professor Michael Clanchy, writing in his seminal “From Memory to Written Record,” discusses how the concept of records developed. Owing to the media available to them, scribes in the Middle Ages made conscious choices between creating an ephemeral document (on a wax tablet) and a permanent record (on parchment). Today digital media proliferates mainly because it provides the easiest means to transmit a work, and so that distinction has to a point disappeared.
Documents and records are now both digital, but the question remains as to what should be kept for posterity and why. These are hard questions which lead to hard choices, because by their nature digital materials can be much more expensive to preserve than their analogue counterparts. You can’t just put them in a box and walk away. The effort and tools required to read a 100-year-old letter are considerably less than those required to read a 30-year-old file created in LocoScript, a word processor popular on Amstrad computers in the 1980s and 1990s.
The Future Starts Now
Organizations can do a better job than individuals, but they require a business model and a mandate to do so. Asking someone to pay for something a long time before its value can be realized (if at all) is not an attractive business proposition. What we can do, at a minimum, is try to convince people that it is possible.
Of course neither creator nor archivist can fully understand how future users may approach digital information preserved over time. Social and cultural historians have, by necessity, used records for purposes for which they were not created and often in inventive and interesting ways. Historians are often helped by context, and the digital material we’re creating today needs the same contextual information to ensure its usefulness.
Matthew Woollard is a professor and director of the U.K. Data Archive and the U.K. Data Service at the University of Essex. This article was previously published on The Conversation.com.