


Archivists Work To Identify and Save the Thousands of Datasets Disappearing From Data.gov (404media.co) 70
An anonymous reader quotes a report from 404 Media: Datasets aggregated on data.gov, the largest repository of U.S. government open data on the internet, are being deleted, according to the website's own information. Since Donald Trump was inaugurated as president, more than 2,000 datasets have disappeared from the database. As people in the Data Hoarding and archiving communities have pointed out, on January 21, there were 307,854 datasets on data.gov. As of Thursday, there are 305,564 datasets. Many of the deletions happened immediately after Trump was inaugurated, according to snapshots of the website saved on the Internet Archive's Wayback Machine. Harvard University researcher Jack Cushman has been taking snapshots of Data.gov's datasets both before and after the inauguration, and has worked to create a full archive of the data.
"Some of [the entries link to] actual data," Cushman told 404 Media. "And some of them link to a landing page [where the data is hosted]. And the question is -- when things are disappearing, is it the data it points to that is gone? Or is it just the index to it that's gone?" For example, "National Coral Reef Monitoring Program: Water Temperature Data from Subsurface Temperature Recorders (STRs) deployed at coral reef sites in the Hawaiian Archipelago from 2005 to 2019," a NOAA dataset, can no longer be found on data.gov but can be found on one of NOAA's websites by Googling the title. "Stetson Flower Garden Banks Benthic_Covage Monitoring 1993-2018 -- OBIS Event," another NOAA dataset, can no longer be found on data.gov and also appears to have been deleted from the internet. "Three Dimensional Thermal Model of Newberry Volcano, Oregon," a Department of Energy resource, is no longer available via the Department of Energy but can be found backed up on third-party websites. [...]
Data.gov serves as an aggregator of datasets and research across the entire government, meaning it isn't a single database. This makes it slightly harder to archive than any individual database, according to Mark Phillips, a University of North Texas researcher who works on the End of Term Web Archive, a project that archives as much as possible from government websites before a new administration takes over. "Some of this falls into the 'We don't know what we don't know,'" Phillips told 404 Media. "It is very challenging to know exactly what, where, how often it changes, and what is new, gone, or going to move. Saving content from an aggregator like data.gov is a bit more challenging for the End of Term work because often the data is only identified and registered as a metadata record with data.gov but the actual data could live on another website, a state .gov, a university website, cloud provider like Amazon or Microsoft or any other location. This makes the crawling even more difficult."
Phillips said that, for this round of archiving (which the team does every administration change), the project has been crawling government websites since January 2024, and that they have been doing "large-scale crawls with help from our partners at the Internet Archive, Common Crawl, and the University of North Texas. We've worked to collect 100s of terabytes of web content, which includes datasets from domains like data.gov." [...] It is absolutely true that the Trump administration is deleting government data and research and is making it harder to access. But determining what is gone, where it went, whether it's been preserved somewhere, and why it was taken down is a process that is time intensive and going to take a while. "One thing that is clear to me about datasets coming down from data.gov is that when we rely on one place for collecting, hosting, and making available these datasets, we will always have an issue with data disappearing," Phillips said. "Historically the federal government would distribute information to libraries across the country to provide greater access and also a safeguard against loss. That isn't done in the same way for this government data."
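For readers who want to check the numbers themselves, here is a minimal Python sketch of the arithmetic behind the two counts quoted in the summary, plus a set difference between two snapshots. The function names and the sample identifiers are hypothetical and chosen only for illustration; nothing here reflects how Cushman or the End of Term project actually compare snapshots. The point it illustrates is that a net count hides churn: if datasets were also added in the same window, the number removed is larger than the net drop, and only a comparison of identifiers shows which entries went away.

```python
# Counts reported in the summary above.
COUNT_JAN_21 = 307_854
COUNT_LATER = 305_564


def net_change(before_count, after_count):
    """Return (net drop, drop as a percentage of the earlier count)."""
    drop = before_count - after_count
    return drop, 100.0 * drop / before_count


def diff_snapshots(before_ids, after_ids):
    """Compare two iterables of dataset identifiers (hypothetical format)."""
    before, after = set(before_ids), set(after_ids)
    return {
        "removed": sorted(before - after),  # in the earlier snapshot only
        "added": sorted(after - before),    # in the later snapshot only
    }


if __name__ == "__main__":
    drop, pct = net_change(COUNT_JAN_21, COUNT_LATER)
    print(f"net drop: {drop} datasets ({pct:.2f}% of the earlier count)")
    # Made-up identifiers, purely for illustration.
    print(diff_snapshots(["a", "b", "c"], ["b", "c", "d"]))
```

A per-identifier diff like this only says that an index entry is gone from the catalog; whether the underlying data still exists at the agency, which is Cushman's question, needs a separate check of each entry's landing page.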
Don't worry, we have "alternative data" (Score:5, Funny)
Any political side that fears good science data (Score:5, Insightful)
How can people of good conscience support this garbage, cretinous move?
Seriously what's wrong with these cave-dwellers?
This is absolute stupidity in pure, distilled form.
Witness it, and tell your incredulous kids about it, if we ever return to sane times.
Re: (Score:1)
How can people of good conscience support this garbage, cretinous move?
Because they have rent to pay, and if they don't the administration will find someone who will.
Re: (Score:1)
I think your hope is misplaced. His "party" has shown themselves to be a bunch of whiny eunuchs who will do anything he tells them to, including giving up their responsibilities as parts of Congress. With the Supreme Court and the 5th Circuit kissing his ass, we cannot rely on the courts either.
Re: (Score:1)
The Great Data maybe isn't destroyed, but it has been Disappeared by the Shallow Broligarchy and the Public who Paid for it has Lost Access, Making it More Difficult to form a Valid Opinion based on the Great American Scientific Facts.
Make Access to Data Easy Again!
Translated to trump English for you
Re:Any political side that fears good science data (Score:4, Insightful)
Re:Any political side that fears good science data (Score:5, Insightful)
This is how it happened last time. Mass deportations, attacks on LGBTQ people and their healthcare, burning all the heretical books. We might have websites instead of printed text now, but it's the same thing.
I just hope he's not stupid enough to invade Greenland or Panama.
Re: (Score:3)
This is how it happened last time. Mass deportations, attacks on LGBTQ people and their healthcare, burning all the heretical books. We might have websites instead of printed text now, but it's the same thing.
I just hope he's not stupid enough to invade Greenland or Panama.
He's stupid enough, but I'm not sure the folks in charge within the military will take those commands and run with them without trying to run it through congress or the courts... oh, never mind.
Re: (Score:2)
We have always been at war with Greenland and Panama.
Re: (Score:2)
Re: (Score:2)
If all that isn't stopped and reversed, we'll
Re: Any political side that fears good science dat (Score:2)
"How can people of good conscience support this garbage, cretinous move?"
That's not happening.
Re: (Score:2)
The Internet is most certainly NOT forever. (Score:1, Flamebait)
Re:The Internet is most certainly NOT forever. (Score:5, Interesting)
All that matters is a false equivalency is established to convince everyone that it's hopeless, just go home, masturbate to porn and leave Trump and his billionaire buddies to do their thing.
How exactly the US is any different from the USSR in the 70s and 80s is beyond me. The oligarchs in the US don't call each other "comrade", but other than that they will hide the facts, bury the evidence, and manipulate what people see so that they have no idea of the effect of government policy.
When Federal troops march into California to suppress a manufactured insurrection, doubtless involving "illegal immigrants" and incompetent or rebellious state officials, and install a military government, which will create a state legislature that votes for unlimited presidential terms, you will nod and agree, comrade, that that was the right thing to do.
Re: (Score:3, Interesting)
Re: (Score:2)
Re: (Score:2)
Oh, dear, yer askin for it. We got to give movie tickets away to the one that is truth. Until then, most people will cling to the fiction that there are two major political parties in the US.
Re: (Score:3, Insightful)
He's talking about seizing the territory of foreign states. He's an imperialist. What's more, he's a loudmouthed idiot who has religious fanatics and oligarchs whispering in his ears. He's not giving money to US citizens, he's redirecting it into the hands of billionaires.
Are you blind?
Re: (Score:3)
Yes, make America an outcast from the rest of the world. So far he's pissed off a good deal of Central America, Mexico, just about all of S. America, Canada, Europe, Africa, and the Afghans we convinced to work for us but who are now denied emigration to the U.S. Nice way to treat what used to be friends.
Last time around he pissed off S. Korea and Japan. This time he's starting with Taiwan and will soon get around to re-pissing off S. Korea and Japan.
China already has projects in S. America and even the Bahamas and
Re:The Internet is most certainly NOT forever. (Score:4, Funny)
How exactly the US is any different from the USSR in the 70s and 80s is beyond me.
Brezhnev had more hair than Trump, for one.
not a new site... (Score:5, Informative)
Re: (Score:1)
Re:not a new site... (Score:5, Insightful)
NASA was spending $6 million a year to store spacecraft data from the 1960s in the early 2000s. The Bush Madministration ordered the Mariner spacecraft data to be deleted, NASA did so, but only after management had given a copy of the data to the Planetary Society. The bureaucrats were outraged, and later ordered NASA to destroy the Pioneer data according to government records destruction policies, immediately upon receipt of the order. At risk of their jobs NASA admins also forwarded the Pioneer data to the Planetary Society before carrying out the order.
The Society found a tape drive that could read the data (in a computer museum, literally), refurbished the drive, and posted the data on their web site. We have a solution to the Pioneer Anomaly because of it.
Re:not a new site... (Score:5, Informative)
I think that was all data for the '60s, but the article wasn't clear. Climate controlled nitrogen-filled warehouse was mentioned, which can't be cheap.
we were spending $6M/year to save data on computer tapes that we no longer had the hardware to read?
Yep, Congress won't give them the money to change the format, and none of the presidents will ask for it anyway. The joys of allowing a herd of technophobic lawyers to run a science and engineering program.
Re:not a new site... (Score:4, Interesting)
Re: (Score:2)
I think I love you.
Re: (Score:3)
Lest this be taken as a comment about the current admin, they started back in January of 2024
Killjoy asshole. I was getting my hate on for teh ebil trumpanzee and his latest democracy destroying nazi book burning operation, and then you come along and louse it up! It was a perfectly cromulent rageline!
But really, it's as likely to be nervous bureaucrats sweeping inconvenient truths under the rug as it is any directive from POTUS. One could certainly forgive them for being paranoid: the DEI take down EO was one of the finest pieces of governance that has ever occurred in the western world: requ
Re: (Score:3)
DEI take down was just code for being as racist as you like without any repercussions from his alleged "administration" and its alleged Justice Department.
Have you always been that blind or is this something new you are trying out?
Re: (Score:2)
2008? Ah, OMG...
You are correct, of course. Just not a popular assessment. Good luck.
ps - I agree with you.
Suspicious minds might conclude that (Score:2)
Inconvenient facts (Score:5, Informative)
Inconvenient facts will be deleted. Only the Truth* shall remain.
*as supported by all available data.
Re: (Score:2)
Keep in mind this wouldn't be deleted by Trump's people, it is an effort that began before he had any people in place and likely he still doesn't in relevant areas of government.
This is likely some effort being driven by TDS, or normal activity being raised like it is a big deal. Similar claims were made in 2017, and all kinds of now-debunked research regarding government interference on social media was obviously cooked/selective.
The national academy of sciences (Score:3, Insightful)
I suspect anyone in favor of the current administration will avoid this thread and any information like it. Because they couldn't stay in favor of the current administration very long if they didn't...
But they can only ignore reality for so long. Sooner or later it's going to catch up with them. Some of the really old ones might just barely die before it hits them really hard. But a lot of them are under 60.
I don't know how but somehow they will blame Hunter Biden's laptop
Re: (Score:1)
And the reality is what, exactly? That data.gov deleted a link to data stored on government servers?
The horror!
Re:The national academy of sciences (Score:4, Informative)
Got a link? Nevermind, I found it - https://www.nationalacademies.... [nationalacademies.org]
Researchers and scholars have long been targeted in connection with their professional work. In recent years, such attacks have taken on new dimensions, fueled in part by increased use of social media and other digital means of communication. Recognizing that targeting comes in many forms and from a variety of actors, the Committee on Human Rights of the U.S. National Academy of Sciences, National Academy of Engineering, and National Academy of Medicine has identified an array of resources meant to support researchers and scholars in preventing and responding to targeted attacks.
Guess what words aren't on that page? "U.S. Government." It's about helping scientists who come under attack on social media, not specifically from the government.
Re: (Score:2)
The fact that you're too dense to read between those lines is why they needed to post those resources in the first place. What you're doing is classic it-can't-happen-here bullshit.
It absolutely boggles my mind that the right wing will scream and cry about government all day long u
Re:Are you serious? (Score:5, Interesting)
Re: (Score:2)
"If the obscured data points to an organized campaign to suppress truth or people, then we have a problem."
Sure but this began before Trump even had control and he likely still doesn't have control of this area of government so the campaign might be by his 'administration' in the sense that all of it is technically under him now but not by people loyal to the people or the President we've elected.
Have you considered (Score:2)
interesting (Score:2)
Right after inauguration, before Trump had any time to replace anyone? Weird.
Re: (Score:1)
And you say that with a straight face even knowing all the last minute efforts by Europe and whoever was running the Biden admin to undermine that goal? If they hadn't thrown tens of billions in last minute funding at Ukraine and escalated in major ways it would have been over by now.
Re: (Score:2)
They didn't throw tens of billions in the last minute.
They made sure what congress had already approved to go to Ukraine actually went to Ukraine.
This money was ALREADY APPROVED by congress to go there, the administration just decided not to slow walk it.
Re: (Score:2)
Just because your mom gives you $20 to go to the mall doesn't mean you are obligated to spend it all at the mall. Congress controls the purse strings; they have, and are entitled to, no control after they open the purse, and any bill they've passed saying otherwise violates the separation of powers.
Don't forget, we've got a supreme court that more or less actually follows the Constitution EVEN WHERE the court failed to do so in the past. That means birthright citizenship will likely go back to how we interpret
Re: (Score:2)
"They made sure what congress had already approved to go to Ukraine actually went to Ukraine."
Thus intentionally undermining both the new President and democratic process by giving the United States less leverage. Technically not treason since we are neutral in the conflict, no 'enemy', but definitely an intentional act to hurt the United States. Between the money and permission to escalate Ukraine's offensive they likely have the blood of another hundred thousand or so people on their hands but they don't
Why scrambling? (Score:3)
We had identical stories to this posted in early 2017. You'd think people who cared about this data would have been prepared this time around and started archiving stuff back in Nov (realistically, based on the campaign the Democrats ran, I would have started back in June, but that's just me).
What? (Score:2)
Datasets aggregated on data.gov, the largest repository of U.S. government open data on the internet, are being deleted, according to the website's own information. Since Donald Trump was inaugurated as president, more than 2,000 datasets have disappeared from the database.
What did Donald get butt-hurt about this time?
Yes and no (Score:2)
"It is absolutely true that the Trump administration is deleting government data and research and is making it harder to access."
Technically the government instantly all became the Trump administration but in reality the bureaucrats opposed to his agenda still control most of government and the archive. This would almost certainly be an effort by the same, likely having convinced themselves that this is needed to prevent Trump's agenda in some way.
Or, given the anti-trump history of such efforts and the sma
Re: (Score:2)
less than 1% of the datasets have been removed (Score:2)
Less than 1% of the datasets have been removed. Why is this a big deal? How do you know they aren't crap or outdated?
Is there some law that everything must be retained forever?