The geeks who saved Usenet
Google's restoration of digital history relied on a few heroes' packrat mentality and a mountain of decaying mag tapes.
By Katharine Mieszkowski
Salon.com
January 7, 2002
On May 11, 1981, one Mark Horton, then a graduate student at the University of California at Berkeley, using the e-mail address "ucbvax^mark," posted this message to the Usenet newsgroup Net.general:
Rusty is right (or is that "Rusty is Wright"?)
- we have ALL in our .ngfile so I tend to forget
this. ALL.ALL may or may not work, but
ALL certainly does. Mark
Then, the ancient Internet scribe added this ominous postscript:
I plan to make the change on Tuesday
unless something horrible happens.
Horton's message was a response to a previous post, the intact original of which is now lost to history, from one "sdcarl!rusty," aka Rusty Wright. With this incomplete fragment of a cryptic exchange, the history of Usenet, as we have it today, begins.
The message is the oldest Usenet posting [ http://groups.google.com/groups?selm=anews.Aucbarpa.111 ] in the 20-year archive, now searchable on Google [ http://www.google.com/googlegroups/archive_announce_20.html ]. It's the first of some 700 million posts that provide a record spanning the early history to the present of Usenet -- the sprawling public bulletin board, composed of a vast hierarchy of newsgroups, that grew up alongside the Internet itself.
Granted, this message doesn't exactly have the ever-quotable and historic ring of Alexander Graham Bell braying on the first telephone call, "Mr. Watson. Come here. I need you." But it's not the first Usenet message ever -- it's just the first one captured in this vast, yet still incomplete, archive of Usenet's 35,000 topic categories. It's an ordinary exchange between two of the first few hundred denizens of Usenet posting back in 1981.
Still, if you squint, you can see glimmers of what's to follow in this poignant gem of a fragment. What are these geeks talking about, anyway? It's a meta-post about the system itself, of course! It's part of a technical discussion of how Usenet should be administered. And catch that corny play on words, goofing off Rusty's last name: "or is that 'Rusty is Wright'?"
Geeks talking amongst themselves on Usenet about how Usenet should best be run, while having fun with homonyms: Almost 20 years later, has anything really changed?
In mid-December 2001, Google unveiled its improved Usenet archives, which now go more than a decade deeper into the Net's past than did the millions of posts that the company salvaged from DejaNews [ http://news.cnet.com/news/0-1005-200-4794014.html ]. Now on a browser near you: a glimpse of the prehistory of the Net culture we all take for granted today. The first "me too" post! [ http://groups.google.com/groups?selm=bnews.csu-cs.1997 ] The first "Make-Money-Fast" post! [ http://groups.google.com/groups?selm=2193%40cucstud.UUCP ] It's enough to make even a relative newbie nostalgic for a past she never experienced firsthand.
The debut of the archive touched off a flurry of chatter among the geeks on Slashdot [ http://slashdot.org/article.pl?sid=01/12/11/0727218&mode=thread ], some of whom had been there back in the day. There were some grumbles. Imagine what it's like to see your flames from 15 years ago, when Usenet still had the population of a small town, now searchable by anyone on the Web.
"Glad I've changed my e-mail address since those long, (best) forgotten days. It wasn't me, I swear," joked one poster to Slashdot. Another one griped: "It's like having naked baby pictures of yourself stapled to your forehead when you walk around." (Google vows that at the author's request, they'll delete old posts; so if you want to be the Internet equivalent of a rare-book burner, go right ahead.)
Google gets the credit for making these relics of the early Net accessible to anyone on the Web, bringing the early history of Usenet to all. Michael Schmidt, 29, a Google software engineer, spent the last year and a half playing detective, trying to track down the Internet's lost history: "It was a long and painful investigative process. I was searching on the Web, calling people. There were a lot of dead ends."
But it was the geeky pragmatism and historical foresight of Usenet old-timers themselves that actually saved the early history of the newsgroups so that we can all poke around in it today. These "archive donors," whom Google thanks here [ http://www.google.com/googlegroups/archive_announce_20.html ], gave their copies of the millions of messages they'd saved back to the Net.
The tale of how early Usenet was saved begins with one of the Net's great old-timers: Henry Spencer [ http://www.lysator.liu.se/c/henry/index.html ]. "Henry Spencer is the real hero, because his contributions are what makes this historic," says Schmidt. "Back in the Stone Age of the Internet, he was already archiving this stuff, and he was the only one doing it."
Spencer, a legendary Unix hacker -- a species not exactly known for humility -- is pleasantly understated about his role as Usenet's great early archivist. He's the first to point out that he wasn't really the only one saving those early messages. But the copies he kept of Usenet postings from 1981 to 1991 appear to be the only ones that still exist. "There were several other people who were archiving stuff, but all of them gave up before we did, and as far as I know none of their archiving survived," he says. For instance, legend has it that two guys at Bell Labs kept back-ups as well, but their stores of these ultra-rare posts are nowhere to be found.
"I'm very glad the stuff is finally out there, and I can stop worrying about how the only copy might get lost," Spencer says, now that Google has assured the preservation of the more than 2 million old messages he saved. "I'm just glad that this particular great mass of data is no longer my worry."
One of the early adopters of the computer language C, Spencer is known for his Ten Commandments for C Programmers [ http://www.lysator.liu.se/c/ten-commandments.html ], as well as for being the coauthor of C News, one of the early programs for transferring and reading Usenet messages.
Now 46 years old, he works as an independent consultant, but back in 1981 he ran the computer facility at the University of Toronto's zoology department. While the geeks over in the university's computer science department were busy with the Arpanet, the Department of Defense's system was too expensive for the zoologists.
"The zoology department may sound like a funny place for pioneering networking work," says Spencer. "But the computer science department wasn't very interested in this inferior networking. It was very low-tech by their standards. But it worked and theirs didn't. Their opinion changed fast when we started providing e-mail."
That's how, in the spring of 1981, with a 300-baud modem, the zoology department at the University of Toronto became a central distribution point for Usenet, when the network was just 2 years old.
Traffic was almost unimaginably lighter in those days. Only about 200 people had access to Usenet: "In the first few years, it was at least plausible to come in in the morning and read all the Usenet traffic that had come in, and 15 minutes later be off doing something useful," remembers Spencer. But even that low level of traffic was too much for the storage requirements of the day. "Pretty soon, it was necessary to think about expiring old stuff," he says.
It wasn't a sense of historical importance that initially led Spencer to think about creating an archive. His motivation was much more pragmatic than that: Most of the conversations on Usenet at the time were very technical, and he was reluctant to see the information in them disappear, because it might be useful to the university's geeks: "A lot of the early traffic was about things like Unix systems bugs, and it seemed unwise to just throw it out."
So the archiving began with 40 megabytes filling up a new mag tape -- each reel one-half inch thick and 10 inches in diameter -- every few months. In this era, messages from the outside world came in at the tortoise rate of 300 baud. ("When we got a 1,200 baud auto-dialing modem, that was just wonderful. Twelve-hundred baud was just total luxury," Spencer recalls.) As Usenet grew, this meant that Spencer and his system administrators had to be selective about which newsgroups they received and archived, keeping technical conversations but throwing away some of the more general discussions that generated a lot of traffic.
"We started dumping stuff that we thought was obviously of no future use, groups that specialized in a lot of talk and no substance, so to speak. For example, fairly early on there was a newsgroup about abortion which specialized in violent arguments."
That's why not only the very earliest Usenet posts, from before Spencer started archiving in 1981 (Usenet began in 1979), but even some of the posts from the 1980s are still lost. It's too bad; today, wouldn't more of us rather see what was being said about abortion in 1984 than sift through the arcana of bug fixes in systems that have probably been long since retired? "It was perfectly reasonable from the viewpoint of stuff that we might want to use again, but a little sad from today's viewpoint," Spencer admits.
For 10 years, the nine-track mag tapes piled up, hanging in a huge rack at the zoology department's computer facility. Finally, in the early '90s, with the growth of Usenet outpacing the zoology department's budget for $15-a-pop tapes, the general archiving project ended.
In the spring of 1991, Bruce Jones [ http://www.shikan.org/ ], then a grad student in the communications department at the University of California at San Diego, flew to Ontario at his own expense. He was writing his Ph.D. dissertation on the history of Usenet and was eager to get his hands on Spencer's tapes.
The 141 tapes, most of which held 120 megabytes of posts, now lived at the University of Western Ontario, thanks to a road trip in the middle of the Canadian winter that David Wiseman, the university's network administrator, had taken earlier that year to unburden the University of Toronto's zoology department of them.
Jones would spend the next two weeks rescuing the data off them. Not only was the tape technology rapidly becoming obsolete -- just try to find a working tape-reader today -- but the tapes themselves do not have anything like a 10-year shelf life.
By now the historical import of the tapes was already apparent. But spending two weeks running tapes through a tape-cleaning machine and dumping them on disks was the prerequisite to even looking at them. "Spencer had written a program for removing data from tapes when the tapes went bad," Jones explains. "I was just the first person who was willing to invest my time and money -- a lot of people wanted to see what was on them." In two weeks, Jones got through the first 105 tapes.
"Usenet has always been about arguing about itself," Jones says of the posts that were unearthed. "And the arguments that you see today are the same arguments that go way back into the early '80s, and I'm sure that those arguments will continue well into the future."
Case in point: the fact that the older parts of the archive are now available on Google has given Usenet denizens something new to argue about. "I've already gotten three letters from people accusing me of trying to make money off these archives," Jones observes wryly. All the "archive donors" gave the posts to Google for posterity.
Over the next 10 years, Wiseman got through the remaining three dozen or so tapes by wangling the time and energies of "bored graduate students." But by 1995, constrained by university budgets, the archiving project was running out of disk space.
So, Brewster Kahle, the creator of the Web's other major archiving project, the Internet Archive Wayback Machine [ http://www.salon.com/tech/feature/2001/11/02/wayback/index.html ], chipped in, donating a then-humongous nine-gigabyte hard drive to the cause.
In the end, they pulled more than 2,056,000 posts off the 141 tapes. "It took us 10 years. I got so busy and everybody else got less interested," says Wiseman, almost sheepishly. More than 2 million posts: It doesn't sound like a lot compared to the 700 million total in Google's archive, but they're the oldest remnants.
Apparently someone is still interested. Wiseman used FTP to hand off the files to Google. And just after Google announced the availability of the archive, some rogue used FTP to grab the whole archive off the University of Western Ontario's FTP server -- all three gigs of it transferring in one night. "I have no idea what they plan on using it for, since if it's spam e-mail the addresses are all wrong," says Wiseman. Now, anyone who wants a full copy will have to ask politely first -- it's no longer on the server.
Google filled in the more recent posts not covered by the old DejaNews archive thanks to Jürgen Christoffel of the German National Research Center for Information Technology, who'd kept his own archives in the '90s, and Kent Landfield, a network security developer and the maintainer of FAQs.org [ http://www.faqs.org/ ].
Landfield started archiving with entrepreneurial motives. In 1992 and 1993, while at Sterling Software in Omaha, Neb., Landfield had a side project that sold CDs of the Usenet archive. For $349.95 a year, every month you could get a CD burned with the content of Usenet. It was an attempt to cater to the user with a slower modem who still wanted access to every newsgroup.
"I realized that there was definitely a valuable historical aspect to the CDs themselves," says Landfield. "The reality is, everybody thought that. We're all just a bunch of packrats. We all knew there was a value to it, and it was a matter of how and when it would be used."
Thanks to these packrats, Google now estimates that 95 percent of the posts ever made to Usenet are searchable from the site. But Spencer, for one, can't help thinking of all that's still been lost -- not just of the other 5 percent of Usenet, but also of the other early history of online communication.
Think of the Arpanet mailing lists that were the precursors to Usenet. Spencer points out that while most of the mailing lists kept archives, a significant number of them have been lost over time. "The first flame war, things like that, most certainly dates before Usenet," he says. "And I would bet that a lot of that material is gone, because at some point, nobody thought it was worth saving."
About the writer
Katharine Mieszkowski [ km@salon.com ] is a senior writer for Salon Technology.
Copyright 2002