Dennis Donovan

Tue Apr 1, 2025, 02:31 PM

IEEE Spectrum: How Digital Archivists Are Saving Public Information from the Memory Hole

Through clever usage of APIs, the Library Innovation Lab at Harvard Law School has created an archive of Data.gov, home to 311,000 public datasets

Harry Goldstein

In the three decades since Brewster Kahle spun up the nonprofit Internet Archive’s Wayback Machine, it has scaled up to include government websites and datasets—many of which are essential to the engineering and scientific communities. U.S. government agencies like the National Science Foundation, Department of Energy, and NASA are critical sources of research data, technical specifications, and standards documentation in pretty much every area where IEEE Spectrum’s audience works—AI & computer science, biomedical devices, power and energy, semiconductors, telecommunications…the list goes on.

Access to that governmental data directly affects the reproducibility of experiments, the validation of models, and the integrity of the scholarly record.

So what happens if an entire dataset vanishes? Among other things, it can invalidate years of research built upon that foundation.

Until recently, wholesale deletion of data had been rare. In the United States, presidential transitions typically involve some changes to government websites to reflect new policy priorities. And after 9/11, the George W. Bush administration removed "millions of bytes" of information from government sites for security reasons, as well as hundreds of Department of Defense documents and "tens of thousands" of Federal Energy Regulatory Commission files.

The Obama and Biden administrations likewise made changes to government websites but didn’t engage in large-scale removal of Web pages or datasets. Obama, in fact, expanded public access to government data in 2009 by launching Data.gov, whose stated mission is in part “to unleash the power of government open data to inform decisions by the public and policymakers.”

/snip