Data Dump

Humans sometimes have the strangest squirrel mentality; god forbid you should forget where you put those winter nuts last fall.

Scientists who collaborate via email, Google, YouTube, Flickr and Facebook are leaving fewer paper trails, while the information technologies that do document their accomplishments can be incomprehensible to other researchers and historians trying to read them. Computer-intensive experiments and the software used to analyze their output generate millions of gigabytes of data that are stored or retrieved by electronic systems that quickly become obsolete.

"It would be tragic if there were no record of lives that were so influential," Dr. John says.
Yes, entirely tragic if all we could discover about them were the vast number of published papers, articles, books, lectures, and presentations mirrored throughout the web. That won't even begin to answer what the consistency of their ca ca was like on 090828.

The growing scale of new science projects, however, has university data custodians worried. "We are swimming in data these days, and people are overwhelmed," says digital curator Sayeed Choudhury at Johns Hopkins University, the principal investigator for a national consortium of data preservationists called the Data Conservancy.

"Our ability to collect data now outstrips our ability to maintain it for the long run," says William Michener at the University of New Mexico, who leads a data-preservation network called DataONE. "We lose an awful lot of data that is collected with public funds."

"The problem is to actually capture the way scientists interact with the data," Dr. Szalay says. "Today's graduate students are starting to use instant messaging in their scientific work. We have to figure out how to capture these."

"Digital information lasts forever -- or five years," says RAND Corp. computer analyst Jeff Rothenberg, "whichever comes first."

There's no question that new tools mean new discoveries, new ways of working, and new relationships to power and history; how science changes as a result of networked computers is a valid question. What's silly is this breathless, stuck-in-the-past conservationist mentality that privileges the recording of every little cough and shift (kinda reminds me of Kenneth Goldsmith's book Fidget) over the synthesis of new ideas. Why are we so constantly worried that certain things will simply pass out of existence -- species or languages or some geek's IM messages? Why are we so concerned that it is getting more and more difficult to write the "definitive history" of a scientific discovery or an artistic idea or a given year because so many things collaborated in its creation? Why do we spend more time worrying about keeping our data in one repository to end all repositories than we do in figuring out how to have our data make more data? We still think like puny humans.

So what if we lose a bit here and there -- we have already arranged for everything of importance to go viral.

