Those of us with large data storage needs today are spoiled: single hard drives now exceed 20 terabytes, and remote cloud storage offers virtually unlimited capacity if you are willing to pay. Step back to 1998, and things were quite different. According to one source, a 12-gigabyte hard drive cost $349 (the equivalent of more than $650 in 2024).
While 12 gigabytes of storage was large for the time, hard drives were not the only solution for data storage. Magnetic tape storage has been around nearly since the start of computing. It comes in a variety of sizes, with varying tape widths and lengths. Modern Linear Tape-Open, or LTO, formats support 18 terabytes of uncompressed data per cartridge, or up to 45 terabytes if the data compresses well.
Magnetic tape had a few advantages over media like hard drives. For one, tapes often stored more data than other contemporary formats. Digital Data Storage 2, or DDS-2, which launched in 1993, could store 4 gigabytes of uncompressed data, a huge amount at a time when CD-ROM adoption had just started to take off. The tape media itself was often cheaper than other forms of storage, but the drives to read and write the media were much more expensive, costing thousands of dollars. Still, for big companies handling large amounts of data, tape was the way to go.
While tape was often too slow to use as direct storage, it was a tested and reliable form of backup storage. Backups could be scheduled to run at a specific time, with automated robotic tape libraries switching out tapes so that a backup could span multiple tapes for even larger storage pools, all without human intervention. Tape cartridges are also more robust than hard drives, which have comparatively fragile data platters, making tapes a good candidate to store remotely off site for additional data security.
It should be no surprise then that many video game developers used, and continue to use, tape backup as one form of storage. But recovering data from old tape backups isn’t a sure thing. As the technology advanced, the security technology did too, and with encrypted backups it is nearly impossible for anyone without the key to recover any data at all.
Even when no encryption was used, it can still be difficult to recover data from tape backups. The tape drives themselves, once necessary (and expensive) devices, were often used until they couldn’t function anymore, so finding hardware that can be trusted today can be difficult. Older tapes can also suffer from several condition issues, including sticky-shed syndrome, where the tape deteriorates and leaves residue behind on the now-precious hardware.
Even if both the hardware and storage media are in good condition, there is no guarantee that the data will be accessible. A wide range of software has been created to aid in the creation of backups, but many of the resulting formats are largely undocumented. Some can only be read with dated legacy software, which may require licenses and activation services that are no longer supported or no longer exist.
Finding out what software was used to initially create the backup can be difficult. Using Linux and the command-line program “dd,” a raw copy of the tape’s contents can often be made, though there may be no way to interpret what was saved. With any luck, the dd backup will reveal what program was used, as some programs would leave a marker at the beginning of a tape that indicated the software used.
And I’ve gotten lucky in the past, with remnants of “NT Backup,” a Windows utility, or “Backup Exec,” a popular backup program, left in the header, enough for me to properly decode the tapes using both vintage and newly created software. In other cases, I have had to dig deeper to find the solution.
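The header check described above can be sketched in a few lines of Python. This is only an illustration, assuming the tape has already been copied to an image file with dd. The marker list is hypothetical: it simply uses product names mentioned in this article as plain text, not verified on-tape signatures, and the function names are made up for this example.

```python
# Hypothetical marker list: plain product names taken from this article,
# not verified on-tape signatures for any of these formats.
KNOWN_MARKERS = {
    "NT Backup": [b"NT Backup", b"NTBackup"],
    "Backup Exec": [b"Backup Exec"],
    "Arcserve": [b"ARCserve", b"arcagent"],
}


def identify_backup_software(header: bytes) -> list[str]:
    """Return the names whose marker text appears in the header bytes."""
    haystack = header.lower()
    # Some Windows tools store text as UTF-16LE, so also test a copy with
    # NUL bytes removed (a crude way to catch 16-bit ASCII text).
    squeezed = haystack.replace(b"\x00", b"")
    found = []
    for name, markers in KNOWN_MARKERS.items():
        for marker in markers:
            needle = marker.lower()
            if needle in haystack or needle in squeezed:
                found.append(name)
                break  # one hit per product is enough
    return found


def identify_image(path: str, header_size: int = 64 * 1024) -> list[str]:
    """Read only the start of a dd image and look for known markers."""
    with open(path, "rb") as image:
        return identify_backup_software(image.read(header_size))
```

Calling `identify_image("tape01.img")` (a placeholder file name) returns a list of candidate programs, or an empty list when nothing recognizable sits in the first 64 KiB. A hit is a lead rather than proof, since a product name can also appear inside ordinary backed-up files.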
Often, there are no immediately obvious clues. But luck is a funny thing. One email, readable when viewing a dd backup in a hex editor, came from a systems administrator at the game developer. It not only pointed me to a specific person who might know more, but also contained instructions telling employees to exclude certain files from tape backup using an “arcagent.log” file. A quick Google search turned up just a handful of results, making it clear that “Arcserve” was the likely software used, and with some old software I have been able to confirm that this is the case.
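Scrolling through a hex editor works, but the same hunt for readable text can be automated. The sketch below is a rough Python stand-in for the Unix “strings” tool, assuming the clue is stored as plain ASCII; the function names and the six-character minimum are arbitrary choices made for this example.

```python
import re
from typing import Iterator


def printable_runs(data: bytes, min_length: int = 6) -> Iterator[tuple[int, str]]:
    """Yield (offset, text) for each run of printable ASCII in data."""
    # Bytes 0x20 through 0x7e cover space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


def search_image(data: bytes, keyword: str, min_length: int = 6) -> list[tuple[int, str]]:
    """Return the printable runs that contain keyword, ignoring case."""
    wanted = keyword.lower()
    return [
        (offset, text)
        for offset, text in printable_runs(data, min_length)
        if wanted in text.lower()
    ]
```

Given a chunk of a tape image read into memory, `search_image(chunk, "arcagent.log")` returns each matching line of text with its byte offset, which makes it easy to jump to the same spot in a hex editor. A multi-gigabyte image would need to be processed in pieces rather than loaded whole.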
Additional confirmation is always welcome, and digging through the cataloged files provided the ultimate proof: the tapes contained a backed-up copy of Arcserve 6.
One of the almost-magical things about data tape recovery is that you really don’t know what might be on the tapes! While game developers would often burn CDs or other optical discs containing completed projects, tape backups can capture much of a studio’s day-to-day: email conversations, a developer’s desktop on a given day, and in-progress ideas or concepts that appear only on these tapes, as they may never have made it into, or been relevant to, the final project.
The lack of documentation for many of these formats can create a huge, almost insurmountable, roadblock for digital preservation professionals. Data recovery companies do exist, but I have yet to find one willing to share details about how they work, which is not surprising given it is their business! It can also be difficult to discern what their capabilities are, or the final cost of the recovery.
Data recovery is just one step of many in the preservation process. Accessing, analyzing, and storing the resulting data adds further complexity, but at the end of the day, it’s the closest thing to traveling in a time machine. So what exactly is on these tapes that I have been recovering? Stay tuned…