Data Storage Government The Internet

Archivists Work To Identify and Save the Thousands of Datasets Disappearing From Data.gov (404media.co) 70

An anonymous reader quotes a report from 404 Media: Datasets aggregated on data.gov, the largest repository of U.S. government open data on the internet, are being deleted, according to the website's own information. Since Donald Trump was inaugurated as president, more than 2,000 datasets have disappeared from the database. As people in the Data Hoarding and archiving communities have pointed out, on January 21, there were 307,854 datasets on data.gov. As of Thursday, there are 305,564 datasets. Many of the deletions happened immediately after Trump was inaugurated, according to snapshots of the website saved on the Internet Archive's Wayback Machine. Harvard University researcher Jack Cushman has been taking snapshots of Data.gov's datasets both before and after the inauguration, and has worked to create a full archive of the data.
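As a quick check on the figures quoted above, here is a minimal sketch of the arithmetic. It uses only the two counts from the report; the variable names are illustrative, and the result is a net change, so if datasets were also added during the same period the number of removals could be higher.

```python
# Dataset counts quoted in the summary above.
count_jan_21 = 307_854    # listed on data.gov on January 21
count_thursday = 305_564  # listed "as of Thursday"

# Net drop in listed datasets, and that drop as a share of the earlier total.
removed = count_jan_21 - count_thursday
share_removed = removed / count_jan_21

print(f"{removed} datasets removed ({share_removed:.2%} of the January 21 total)")
```

That net drop is consistent with the "more than 2,000" figure in the report.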

"Some of [the entries link to] actual data," Cushman told 404 Media. "And some of them link to a landing page [where the data is hosted]. And the question is -- when things are disappearing, is it the data it points to that is gone? Or is it just the index to it that's gone?" For example, "National Coral Reef Monitoring Program: Water Temperature Data from Subsurface Temperature Recorders (STRs) deployed at coral reef sites in the Hawaiian Archipelago from 2005 to 2019," a NOAA dataset, can no longer be found on data.gov but can be found on one of NOAA's websites by Googling the title. "Stetson Flower Garden Banks Benthic_Covage Monitoring 1993-2018 -- OBIS Event," another NOAA dataset, can no longer be found on data.gov and also appears to have been deleted from the internet. "Three Dimensional Thermal Model of Newberry Volcano, Oregon," a Department of Energy resource, is no longer available via the Department of Energy but can be found backed up on third-party websites. [...]

Data.gov serves as an aggregator of datasets and research across the entire government, meaning it isn't a single database. This makes it slightly harder to archive than any individual database, according to Mark Phillips, a University of North Texas researcher who works on the End of Term Web Archive, a project that archives as much as possible from government websites before a new administration takes over. "Some of this falls into the 'We don't know what we don't know,'" Phillips told 404 Media. "It is very challenging to know exactly what, where, how often it changes, and what is new, gone, or going to move. Saving content from an aggregator like data.gov is a bit more challenging for the End of Term work because often the data is only identified and registered as a metadata record with data.gov but the actual data could live on another website, a state .gov, a university website, cloud provider like Amazon or Microsoft or any other location. This makes the crawling even more difficult."
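The distinction between an index record vanishing and the underlying data vanishing can be sketched roughly as follows. This is an illustration only, not the archivists' actual tooling: the dataset IDs, URLs, and the `still_reachable` set are all hypothetical, and a real check would have to fetch each landing page instead of consulting a prepared set.

```python
def diff_snapshots(before, after):
    """Return the IDs listed in `before` that are missing from `after`."""
    return sorted(set(before) - set(after))


def classify_removed(before, after, still_reachable):
    """Split delisted entries by whether their hosted data still resolves.

    `before` and `after` map dataset ID -> landing URL for two catalog
    snapshots. `still_reachable` is a set of URLs assumed to still respond;
    it stands in for real HTTP checks.
    """
    report = {"index_only": [], "data_gone": []}
    for dataset_id in diff_snapshots(before, after):
        url = before[dataset_id]
        key = "index_only" if url in still_reachable else "data_gone"
        report[key].append(dataset_id)
    return report


# Hypothetical snapshots: two of three entries were delisted.
before = {
    "coral-temps": "https://example.org/coral",
    "benthic-cover": "https://example.org/benthic",
    "tide-gauges": "https://example.org/tides",
}
after = {"tide-gauges": "https://example.org/tides"}
still_reachable = {"https://example.org/coral", "https://example.org/tides"}

report = classify_removed(before, after, still_reachable)
print(report)
```

In this toy data, one delisted entry still has its data online at the host and the other does not, mirroring the two NOAA examples in the summary.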

Phillips said that, for this round of archiving (which the team does every administration change), the project has been crawling government websites since January 2024, and that they have been doing "large-scale crawls with help from our partners at the Internet Archive, Common Crawl, and the University of North Texas. We've worked to collect 100s of terabytes of web content, which includes datasets from domains like data.gov." [...] It is absolutely true that the Trump administration is deleting government data and research and is making it harder to access. But determining what is gone, where it went, whether it's been preserved somewhere, and why it was taken down is a process that is time intensive and going to take a while. "One thing that is clear to me about datasets coming down from data.gov is that when we rely on one place for collecting, hosting, and making available these datasets, we will always have an issue with data disappearing," Phillips said. "Historically the federal government would distribute information to libraries across the country to provide greater access and also a safeguard against loss. That isn't done in the same way for this government data."

This discussion has been archived. No new comments can be posted.

  • by locater16 ( 2326718 ) on Thursday January 30, 2025 @06:49PM (#65131415)
    New, better data will be hallucinated by whatever AI company delivers the best bribes, making data great again!
    • by presidenteloco ( 659168 ) on Thursday January 30, 2025 @07:32PM (#65131463)
      and seeks to have it destroyed or made inaccessible is despicable.

      How can people of good conscience support this garbage, cretinous move?

      Seriously what's wrong with these cave-dwellers?

      This is absolute stupidity in pure, distilled form.
      Witness it, and tell your incredulous kids about it, if we ever return to sane times.
      • by SeaFox ( 739806 )

        How can people of good conscience support this garbage, cretinous move?

        Because they have rent to pay, and if they don't the administration will find someone who will.

      • by arglebargle_xiv ( 2212710 ) on Friday January 31, 2025 @01:55AM (#65131847)
        Day by day and almost minute by minute the past was brought up to date. In this way every prediction made by the Par.. uh, Leader, could be shown by documentary evidence to have been correct; nor was any item of news, or any expression of opinion, which conflicted with the needs of the moment, ever allowed to remain on record. All history was a palimpsest, scraped clean and reinscribed exactly as often as was necessary.
      • by AmiMoJo ( 196126 ) on Friday January 31, 2025 @06:19AM (#65132129) Homepage Journal

        This is how it happened last time. Mass deportations, attacks on LGBTQ people and their healthcare, burning all the heretical books. We might have websites instead of printed text now, but it's the same thing.

        I just hope he's not stupid enough to invade Greenland or Panama.

        • This is how it happened last time. Mass deportations, attacks on LGBTQ people and their healthcare, burning all the heretical books. We might have websites instead of printed text now, but it's the same thing.

          I just hope he's not stupid enough to invade Greenland or Panama.

          He's stupid enough, but I'm not sure the folks in charge within the military will take those commands and run with them without trying to run it through congress or the courts... oh, never mind.

        • by necro81 ( 917438 )

          I just hope he's not stupid enough to invade Greenland or Panama.

          We have always been at war with Greenland and Panama.

        • Don't overlook something that's happening quietly compared to all the ruckus being caused: destroying the Department of Education, and expanding implementation of so-called 'school vouchers', which is just a way to do an end-run around the Separation of Church and State, allowing people to use taxpayer money to enroll their (white nationalist) children into (whites only) religious schools (that no doubt will indoctrinate them into 'white nationalist' ideology).
          If all that isn't stopped and reversed, we'll
      • "How can people of good conscience support this garbage, cretinous move?"

        That's not happening.

    • by vlad30 ( 44644 )
      While your comment is funny because of the truth it pokes fun at, the fact that this happens in every transition is interesting, and I would like to know why for each piece of information. New people are not immediately in the office, so some of the deletions would have to be by outgoing staff, especially on the first day. I remember when all the "W" keys were taken off the keyboards to inconvenience the incoming staff. So are the leaving people deleting information as well as the incoming people? I would have thought incoming wo
  • As soon as someone stops paying for hosting *poof* it's definitely gonna be gone. Now, when it comes to the government, they will trash anything the other side wouldn't want them to trash. For right wingers it'd be climate change data. For lefties it might be the number of sex-change operations they've subsidized or crime stats for immigrants. Who knows. Just remember that whatever extreme bullshit tools & tactics you create while "your guy" is there will be used on you the absolute second the "other
  • not a new site... (Score:5, Informative)

    by eleuthero ( 812560 ) on Thursday January 30, 2025 @06:56PM (#65131429)
    Lest this be taken as a comment about the current admin, they started back in January of 2024 with the current pre-transition archive work, and they have done this for each cycle back to 2008—apparently it's the norm for new administrations to erase massive amounts of data (and given that the government has to pay for server space just like everyone else, it makes a bit of sense to have a garage cleanup at least once every four years—why not every spring like the rest of us?). I do like that someone wants to archive things though.
    • by Anonymous Coward
      If you want to worry about costs, worry about the costs of collecting the (now deleted) data in the first place, which becomes wasted money when the collected data is nuked. The costs of storage are trivial.
      • by cusco ( 717999 ) <brian.bixby@nOSPAM.gmail.com> on Thursday January 30, 2025 @07:43PM (#65131473)

        NASA was spending $6 million a year to store spacecraft data from the 1960s in the early 2000s. The Bush Madministration ordered the Mariner spacecraft data to be deleted, NASA did so, but only after management had given a copy of the data to the Planetary Society. The bureaucrats were outraged, and later ordered NASA to destroy the Pioneer data according to government records destruction policies, immediately upon receipt of the order. At risk of their jobs NASA admins also forwarded the Pioneer data to the Planetary Society before carrying out the order.

        The Society found a tape drive that could read the data (in a computer museum, literally), refurbished the drive, and posted the data on their web site. We have a solution to the Pioneer Anomaly because of it.

        • Re:not a new site... (Score:4, Interesting)

          by dargaud ( 518470 ) <slashdot2.gdargaud@net> on Friday January 31, 2025 @04:01AM (#65131963) Homepage
          In 1986 one of my very first jobs as a summer student at NASA was to write a program (in Fortran) to read a 'wall' of tapes from the Pioneer Venus mission, verify them, and save them to hard drives. Took me a few weeks (the tech was ancient). They then paid a mentally disabled person to follow the procedure, rev the tape in, start the program, wait 2 hours and change the tape. And call someone if the program showed an error. I wish I had a copy of that data, but USB keys were too small back then and didn't work on VAXes... ;-P
    • by Tailhook ( 98486 )

      Lest this be taken as a comment about the current admin, they started back in January of 2024

      Killjoy asshole. I was getting my hate on for teh ebil trumpanzee and his latest democracy destroying nazi book burning operation, and then you come along and louse it up! It was a perfectly cromulent rageline!

      But really, it's as likely to be nervous bureaucrats sweeping inconvenient truths under the rug as it is any directive from POTUS. One could certainly forgive them for being paranoid: the DEI take down EO was one of the finest pieces of governance that has ever occurred in the western world: requ

      • by gtall ( 79522 )

        DEI take down was just code for being as racist as you like without any repercussions from his alleged "administration" and its alleged Justice Department.

        Have you always been that blind or is this something new you are trying out?
         

    • 2008? Ah, OMG...

      You are correct, of course. Just not a popular assessment. Good luck.

      ps - I agree with you.

  • Disappearing data is actually deleting evidence... of all sorts of stuff
  • Inconvenient facts (Score:5, Informative)

    by Local ID10T ( 790134 ) <ID10T.L.USER@gmail.com> on Thursday January 30, 2025 @06:58PM (#65131435) Homepage

    Inconvenient facts will be deleted. Only the Truth* shall remain.

    *as supported by all available data.

    • by Shaitan ( 22585 )

      Keep in mind this wouldn't be deleted by Trump's people, it is an effort that began before he had any people in place and likely he still doesn't in relevant areas of government.

      This is likely some effort being driven by TDS, or normal activity being raised like it is a big deal. Similar claims were made in 2017, and all kinds of now-debunked research regarding government interference on social media was obviously cooked/selective.

  • by rsilvergun ( 571051 ) on Thursday January 30, 2025 @07:33PM (#65131465)
    Has a web page dedicated to resources for scientists who are under attack by the US federal government. It was published shortly after inauguration day.

    I suspect anyone in favor of the current administration will avoid this thread and any information like it. Because they couldn't stay in favor of the current administration very long if they didn't...

    But they can only ignore reality for so long. Sooner or later it's going to catch up with them. Some of the really old ones might just barely die before it hits them really hard. But a lot of them are under 60.

    I don't know how but somehow they will blame Hunter Biden's laptop
    • by kenh ( 9056 )

      And the reality is what, exactly? That data.gov deleted a link to data stored on government servers?

      The horror!

    • by kenh ( 9056 ) on Thursday January 30, 2025 @08:12PM (#65131521) Homepage Journal

      Got a link? Nevermind, I found it - https://www.nationalacademies.... [nationalacademies.org]

      Researchers and scholars have long been targeted in connection with their professional work. In recent years, such attacks have taken on new dimensions, fueled in part by increased use of social media and other digital means of communication. Recognizing that targeting comes in many forms and from a variety of actors, the Committee on Human Rights of the U.S. National Academy of Sciences, National Academy of Engineering, and National Academy of Medicine has identified an array of resources meant to support researchers and scholars in preventing and responding to targeted attacks.

      Guess what words aren't on that page? "U.S. Government". It's about helping scientists who come under attack on social media, not specifically the government.

      • With the headline under threat in the United States, what the hell do you think it means? It doesn't mention social media any more than it mentions government, because a 15-year-old could figure out what they're talking about.

        The fact that you're too dense to read between those lines is why they needed to post those resources in the first place. What you're doing is classic "it can't happen here" bullshit.

        It absolutely boggles my mind that the right wing will scream and cry about government all day long u
  • Buttery males?
  • Right after inauguration, before Trump had any time to replace anyone? Weird.

  • by flink ( 18449 ) on Friday January 31, 2025 @12:29AM (#65131787)

    We had identical stories to this posted in early 2017. You'd think people who cared about this data would be prepared this time around and have started archiving stuff back in Nov (realistically, based on the campaign the Democrats ran, I would have started back in June, but that's just me).

  • Datasets aggregated on data.gov, the largest repository of U.S. government open data on the internet, are being deleted, according to the website's own information. Since Donald Trump was inaugurated as president, more than 2,000 datasets have disappeared from the database.

    What did Donald get butt-hurt about this time?

  • "It is absolutely true that the Trump administration is deleting government data and research and is making it harder to access."

    Technically the government instantly all became the Trump administration but in reality the bureaucrats opposed to his agenda still control most of government and the archive. This would almost certainly be an effort by the same, likely having convinced themselves that this is needed to prevent Trump's agenda in some way.

    Or, given the anti-trump history of such efforts and the sma

  • Less than 1% of the datasets have been removed. Why is this a big deal? How do you know they aren't crap or outdated?
    Is there some law that everything must be retained forever?
