UC Davis Magazine Online
Volume 18
Number 3
Spring 2001


Disappearing Data

By Barbara Anderson

Will today's important records still be available tomorrow, or will they be trapped in obsolete computer formats, readable only by software and hardware that no longer exist?

What happens to the things we write? For millennia, the thoughts of humans have been migrating from stone to paper to Web pages to e-mail—and to poetry and prose composed, not on paper, but on a computer screen.

What becomes of these electronic documents? What happens when the technology races ahead, trapping your novel in a no-longer-supported version of Word? And what about the things that now exist in paper format? Should they be digitized to preserve their content? Who will be able to vouch for their authenticity, assure that there have been no changes to text or image?

Why do we save things, anyway?

These questions are the subject of intense scrutiny by librarians, archivists, historians and scholars, as the electronic information age sweeps through academia, business, government and even our personal lives. How will we save these things? And, in a CD era when we can save it all, who will decide what's worth saving? And, once it's saved, will it be readable in 10, 30, 100 years? Will data disappear?

*

THE TECHNOLOGY, SHE'S A-CHANGIN'

Dick Walters saves old computers. He has a sort of "museum" of them, though there aren't any guided tours, and the collection is housed, not in a facility designed for it, but in a storage room in Engineering II. It includes microcomputers using 8-inch floppy disks, a MITS Altair (the machine Bill Gates got his start on), Apple IIs, Atari game systems and early portables like the Osborne. They're not used much, but, says Walters, about a half-dozen times a year he gets a request from someone who needs to be able to read a disk or a file that was created with software and a machine that are no longer readily available. Sometimes Walters can help; sometimes, he can't.

Walters, a professor emeritus of computer science, believes most of what's being produced electronically will be accessible in the future. But he worries about the rapidity with which new technologies overtake the old, and about the possibility that soon no one will remember how to run these old machines. Data preserved on 7-track (let alone 9-track) tape may still be viable and readable, but only if a 7- or 9-track tape drive can be found to read it and an operator is around who knows the hardware and software well enough to make it work.

Walters started his museum collection around 1976 with the advent of the microcomputer, adding to it as the term changed to personal computer, or PC, and Apple brought out the Macintosh. And as computers and their operating systems have become increasingly complex, Walters says, there is a "vanishing number" of people who remember how to run those older machines.

"There's very little interest today in looking at the hardware of yesterday," he says. "And yesterday is creeping up on us—the half-life of people retaining an interest in some of these old machines is getting smaller rather than larger with time."

It's a problem that can have consequences both expensive and embarrassing. When the Census Bureau compiled the information from the 1960 census, it retained records in what it believed was "permanent" storage. But in 1976, when the National Archives identified some of that data as having long-term historical value, it was discovered that a significant portion of those records was on tapes readable only with a UNIVAC type II-A tape drive, which by that time was long obsolete. And while the Census Bureau was successful in rescuing nearly all the data, the incident caused warning bells to ring in high places; in a 1985 report, the Committee on the Records of Government said, "The United States is in danger of losing its memory." Even if information doesn't have the import of census records, culturally significant events can get lost if those in charge don't make sure there's a way to keep them viable. Now-ubiquitous e-mail began in 1964 with one message; trouble is, nobody knows if it was sent from the Massachusetts Institute of Technology, the Carnegie Institute of Technology or Cambridge University. The message no longer exists, and with its demise came the demise of the trail to its origin.

*

SAVING OUR E-SELVES

Saving paper was relatively easy—you put it in a file folder, put the folder in a drawer. But paper is bulky, and after a while, you'd weed that file. Electronic files, on the other hand, aren't bulky at all; you can save hundreds, make that thousands, of documents on a CD, making it easy to save everything. But then what? For most of us, it might not matter whether anyone has the time or perseverance to go through each of those disks to see what we so assiduously saved over a lifetime of e-correspondence and e-business and e-jottings. In the case of someone in the public eye, though—a politician, for instance, a university administrator, a scientist, a writer—those documents contain threads that tie together a lifetime of work. Untangling those threads—which include both gems of insight and the detritus of joke e-mails—can provoke a major headache for archivists when they set about determining just what's there and what should be saved.

"We can save it all," says John Skarstad, with a smile. "And so we do." Skarstad, university archivist in the Department of Special Collections in Shields Library, talks a lot about appraisal, the system of evaluation that determines whether something, be it a manuscript, a photograph, or a hard drive full of e-mail, has value—intellectual value, evidential value, monetary value. That appraisal becomes more difficult, and more time-consuming, when what's being examined hasn't been distilled. He offers the example of someone in the paper universe—the analog world—who, when clearing out a file cabinet, would toss the copies of the 85 thank-you notes sent to everyone who attended a conference, saving only the master copy and perhaps the list of who came to the event (which, Skarstad says, "turns out to be a little piece of paper"), whereas getting those documents—and a whole lot of others—on a CD is akin to getting somebody's entire file cabinet. "What you want is there," he says, "and what you don't want is also there."

Along with the necessity of opening every file on a floppy disk or CD comes the challenge presented by people who don't separate their personal files from their professional ones, their confidential files from public documents. Skarstad hopes for the day when computer users click a box indicating whether an e-mail is personal or business-related. "I might still want to see the personal e-mails," Skarstad says, noting that if a person has regular correspondence with a significant historical character, those personal e-mails would be helpful. "But not," he says, "if it's your e-mail to your child reminding her to clean her room before you get home."

And there's the issue of staff to handle the workload. Skarstad says it's inconceivable to someone who uses a file cabinet that a foot-high stack of floppies or, worse yet, that tidy foot-long box of CDs, is the equivalent of an appraiser's life's work.

We're still trying to fit things into the paper model, Skarstad says, a model that doesn't quite work when dealing with e-documents. But devising new models is a challenge for those of us grounded in the paper model. We're in a middle phase, he says, "and I'm not sure where in the middle we are. But we aren't at the end yet."

A big part of that middle phase is concerned with access. The paper universe had a thing called the card catalog, Skarstad says, and the 3x5 card contained all the information about the book; in the case of a manuscript collection, the card indicated that the collection existed in so many boxes, and sometimes, "if we were lucky," Skarstad says, time and money would have been spent in creating a finding aid, a paper index, to that collection. "So," he says, "you could look at the paper index and, through it, get access to this box of stuff."

Some years ago, though, card catalogs began migrating to a new format, called MARC (for machine-readable cataloging). And though MARC was designed to carry data from the card to the electronic universe, it didn't allow all of the data on the cards to make the trip. Now there's another term for cataloging—metadata—("essentially cataloging," Skarstad says, "but it's cataloging with a vengeance") that has spawned large international projects attempting to figure out, given that we're all universally linked by the Internet, how to describe things in universally consistent ways so they are universally findable and accessible.

*

THE DIGITIZED LIBRARY

Paper is nice, but it has limitations. Most things printed since the 1850s were printed on paper with a high acid content and are degrading rapidly. And paper, being tangible, can exist in only one place at one time. Given that the technology exists to make it so, libraries have begun digitizing their collections, converting traditional materials like books, maps and manuscripts into digital form that can be accessed via computer.

But even, or especially, in the digital world, there's no such thing as a free lunch. Whether it be dyes, glues or plastics, impermanent materials make for impermanent solutions. Digitizing analog material can make it accessible to many users simultaneously, but CDs, like tape and floppies before them, degrade, sometimes in only five years' time. The issue then becomes one of ensuring that digitally coded information stays fresh: Should it be transferred from CD to CD? Stored on a hard drive? How big a hard drive? And who will do the maintenance?
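One common way to keep a file "fresh" when it moves from an aging disc to a new one is to compare checksums before and after the copy. What follows is only a minimal sketch in Python, not a description of any library's actual procedure; the function names `sha256_of` and `refresh_copy` and the sample file names are hypothetical, chosen for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, destination: Path) -> str:
    """Copy a file to new storage and confirm the copy matches the original.

    Hypothetical helper: raises OSError if the checksums differ,
    otherwise returns the checksum so it can be recorded for later audits.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"copy of {source.name} does not match the original")
    return after


if __name__ == "__main__":
    # Stand-ins for an old disc and its replacement (assumed names).
    with tempfile.TemporaryDirectory() as workdir:
        old_disc = Path(workdir) / "old_disc.txt"
        new_disc = Path(workdir) / "new_disc.txt"
        old_disc.write_bytes(b"Records worth keeping\n")
        print("verified checksum:", refresh_copy(old_disc, new_disc))
```

A matching checksum shows only that the bits survived the move. It says nothing about whether software will still exist to open the file, which is the harder half of the problem.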

Equally relevant is that digitally encoding data changes that data. In a February 1999 report titled "Why Digitize?" prepared for the Council on Library and Information Resources, CLIR's Director of Programs Abby Smith writes that "Analog information can range from the subtle tones and gradations . . . in a Berenice Abbott photograph . . . to the changes in volume, tone and pitch recorded on a tape. . . . But when such information is fed into a computer, broken up into 0s and 1s and put together in a binary code, its character is changed in quite precise ways." Further, "digital information is not eye-legible: It is dependent on a machine to decode and re-present the bit streams in images on a computer screen. Without that machine, and without active human intervention, those data will not last."

*

THE 404 NOT FOUND PROBLEM

We may manage to save it, maybe even make it readable, but what happens when what used to be there goes somewhere else? Think of that Web page you bookmarked some months ago. Can you still get to it? If not, where did that information go? How will scholarly publications cite material that has appeared only on the Web, in e-zines or on a professor's own Web site? How can they know that those sources will remain available and accessible to future users? The Internet Archive (www.archive.org), whose aim is to build an Internet library of snapshots of publicly accessible Internet sites, uses Web-crawling robots—software that automatically collects Web pages from Web servers—to "prevent the Internet . . . from disappearing into the past." As of March 2000, the archive had collected 1 billion Web pages—13.8 terabytes (one terabyte equals a million megabytes)—that it makes available to researchers, historians and scholars.

*

READING BETWEEN THE LINES

"Last Thursday, 27 sheets of faded graph paper filled with James Joyce's handwriting sold at auction in New York for $1,546,000. . . . The text on these papers is an ant track of ink, with deletions and insertions and lists of words" (The New York Times, Dec. 20, 2000).

Computers have given us the ability to write quickly and to edit even more quickly. But if Joyce had written Ulysses on a laptop, what, if anything, would be lost to us now? What about those lined-out words, those notes in the margin, even that coffee stain at the bottom—what do they say about the creator and the creative process that isn't said when all you have is a document displayed on a computer monitor or a printout of that image? What's the intangible quality that a tangible work has?

Anthony Hunt, professor of English at the University of Puerto Rico-Mayaguez and a Gary Snyder scholar, recalls researching Snyder's poem "The Blue Sky." "I could see seven pieces of paper with several variations of a poem, sometimes just a few lines, sometimes an entire draft" that culminated in the final version. "If Gary had been using a computer," Hunt says, "there would probably be only one last version. How do I know? I write poetry myself and faced the issue years ago. Do I let my ego go wild and keep every backup copy of every draft of a poem that takes me two years to write? Or do I just wind up with one final poem, no earlier drafts on computer disks? The latter is more likely to happen."

Hunt also remembers another experience, this one with the work of another poet, T.S. Eliot. In 1972 or thereabouts, Hunt was one of the last people to see the actual manuscript of Eliot's The Waste Land, which was at the Berg Collection at the New York Public Library. "I wheedled them into letting me see it because I had come all the way from Puerto Rico," Hunt says. "I sat on one side of a desk and an archivist sat on the other side. I kept my hands in my lap while she turned the pages for me to see them; I wasn't allowed to take notes until after she returned the manuscript to the vault and I left the special room. The pages have since been published separately as a facsimile edition of the poem. But my moment was a magical moment, one that won't take place when everything is on hard disks."

*

INTIMATIONS OF IMMORTALITY

Ever since the first marks were scratched onto a piece of stone, data have been disappearing. Things get lost, damaged, destroyed, forgotten. So, in the long view, does it matter, really, that some of what's been produced in electronic format may not make it into the future? "Sophocles wrote lots of plays," says Skarstad. "We have little more than a handful of them." The rest? We can only imagine what they were, he says. But, "we have a handful." And what we have has contributed to our humanity, to our sense of who we are and where we came from. Future generations, if they are able to read what we've saved, will know us in the same way we know the ancient Greeks—by the fragments that remain behind.

*

