Data Storage The Internet The Media

New York Times Wipes Journalist's Online Corpus 94

Posted by timothy
from the check-the-unpersonals dept.
thefickler writes "Reading about Peter Wayner and his problems with book piracy reminded me of another writer, Thomas Crampton, who has the opposite problem — a lot of his work has been wiped from the Internet. Thomas Crampton has worked for the New York Times (NYT) and the International Herald Tribune (IHT) for about a decade, but when the websites of the two newspapers were merged two months ago, a lot of Crampton's work disappeared into the ether. Links to the old stories are simply hitting generic pages. Crampton wrote a letter to Arthur Sulzberger, the publisher of the NYT, pleading for his work to be put back online. The hilarious part: according to one analysis, the NYT is throwing away at least $100,000 for every month that the links remain broken."
This discussion has been archived. No new comments can be posted.

  • broken links? (Score:2, Interesting)

    by mcfatboy93 (1363705)

    the NYT is throwing away at least $100,000 for every month that the links remain broken."

    now how much would it cost to fix all those links...

    no wonder newspapers are not doing well

    • Re: (Score:2, Informative)

      by mysidia (191772)

      according to one analysis, the NYT is throwing away at least $100,000 for every month that the links remain broken."

      Also according to one analysis: the world is flat.

      Apparently the NYT may have a different opinion.

      Either that or they're so large $100,000 a month is so insignificant to them it's not the most viable cost-saving/revenue-improving project for them to start at this time.

      • Re:broken links? (Score:4, Interesting)

        by pbhj (607776) on Friday May 15, 2009 @08:31AM (#27965205) Homepage Journal

        Personally I think that analysis is way out.

        I'm seeing 396 results on Google for: "thomas Crampton" site:nytimes.com, out of 1130 results from the NYT on-site search engine.

        5 of those google links are dated in the last week, which I assume are related to this story.

        The $100,000 per month estimated loss is presumably advertising revenue on page hits from links to those stories. Earnings of $5 CPM (i.e. $5 for every 1,000 visitors) would mean 20 million visitors a month are clicking through to his stories specifically and can't be assuaged with any other content.

        This would only be a loss if a similar / 404 / search landing page had a lower earnings rate.

        Seems unlikely to me - I think this is just [very clever] linkbaiting from someone who, it appears, was sacked from the NYT and is trying to make a living elsewise.
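The arithmetic in pbhj's comment above can be checked with a quick sketch. The $5 CPM is the commenter's assumed rate, not a published NYT figure, and the function name is made up for illustration.

```python
def visitors_needed(monthly_revenue_usd: float, cpm_usd: float) -> float:
    """Visitors per month needed to earn monthly_revenue_usd at a given CPM.

    CPM is revenue in dollars per 1,000 visitors.
    """
    return monthly_revenue_usd / cpm_usd * 1000


# $100,000 a month at the commenter's assumed $5 CPM
print(f"{visitors_needed(100_000, 5.0):,.0f} visitors/month")
```

Raising the assumed CPM lowers the visitor count proportionally, so the conclusion depends heavily on that one guess.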

        • Re: (Score:2, Interesting)

          by RealGrouchy (943109)

          $100 000 per month estimated loss presumably is advertising revenue on page hits from links for those stories.

          Forgive me, father, for I have RTFA:

          According to Compete.com, IHT.com was getting over 1.5 million visitors/month before it shut down. If a third of those visitors were from search and direct old links, 500,000 visitors a month are hitting the dead end in the image above, instead of the page they were looking for. To buy that traffic from Google at $.20/click, you'd have to pay $100,000 a month.

          So essentially, the "one analysis" says that if they wanted to buy the very-roughly-estimated traffic they hypoth

        • Re: (Score:3, Interesting)

          by cream wobbly (1102689)

          Personally I think that analysis is way out.

          I'm seeing 396 results on Google for: "thomas Crampton" site:nytimes.com, out of 1130 results from the NYT on-site search engine.

          That's why you're not an investigative journalist. The $100k/mo. estimate is for all IHT articles which were erased in the merger; not just the one author.

          Do try to keep up.

  • Wayback machine (Score:5, Informative)

    by wjousts (1529427) on Friday May 15, 2009 @07:25AM (#27964539)
    Groovy baby [archive.org].
  • by narfspoon (1376395) on Friday May 15, 2009 @07:27AM (#27964563)
    CNN's website doesn't have as many broken links.
    Articles over a decade old still work!
    Whoever designed theirs deserves a lot of credit.
  • This sucks (Score:4, Insightful)

    by ILongForDarkness (1134931) on Friday May 15, 2009 @07:30AM (#27964581)
    We've come to rely on being able to find things on the internet; it is sad to think that information might go away and cease to exist. That said, I guess it depends on the writer's contract whether he has a right to have his body of work preserved or not. I mean, if a company pays for your work it is theirs and not yours unless your contract entitles you to it. Once you've sold your work to somebody, they can keep anyone from ever reading it and use it to line hamster cages for all they care.
    • Re: (Score:2, Interesting)

      by noidentity (188756)
      Yet another reason why locking up content is wrong. Let it be freely copied, and then ANYONE who finds the work valuable can potentially become a caretaker of the work and keep it accessible online. Then the only way a work would disappear is if nobody has interest or time to preserve it.
      • Yep.

        Remember this story from a few hours ago?

        http://it.slashdot.org/article.pl?sid=09/05/15/0138204 [slashdot.org]

        (~)
        Locking up content is fun!! Then you can sue Pirates(TM) when someone copies a plane design. But if the site admin "never actually kept an offline copy" then years of data is gone!!
        (/~)

        (Speaking of which, that story sounds totally bogus. They coded live on production servers?? Sounds like they played dead.)

      • I guess the problem then becomes that sometimes information isn't interesting enough to keep until something comes up that you need it for. For example, say someone writes an article for the editorial section of a newspaper. Not really interesting. Now 20 years later they are running for political office. Those letters are now interesting, but they didn't have apparent worth at the time.

        I like the idea in a way though that the consumers decide whether the content is useful enough to hang on to. In a

        • by wisty (1335733)

          Just imagine what anthropologists in 1000 years time will think. They will mark this century as the turning point in human history (just like the last one ... and the one before that), and the only evidence of human culture is a few old tape back-ups from slashdot and 4chan.

    • This reminds me of a scifi story i read probably 35 years ago about a library that had one file drawer filled with all of civilization's information/knowledge. Someone came up with the idea that it needed to be indexed hence another file drawer came into being holding the index. Then a cross index created a need for another drawer and on and on until the world ran out of room. They kept shrinking the size of the drawers until they became "subatomic" in size. They had to hold all the "drawers" on a separate
  • This is so unfortunate. IHT was great before the merge, which was touted as a "new" version of IHT. Instead, they just canned it and attempted to transfer its content to the existing NYT site. And did a dreadful job, it seems.

    I understand the logic - newspapers need to cut costs because they can't figure out the internet and it is killing them. But they lost a dedicated reader in me with this move.

  • And it's got unlimited space. Strangely enough, some people are adamant about keeping their works out of this library. And I say they have the right to ensure the internet forgets about them when they die. This poor soul seems to understand what's going on.
    • by mc1138 (718275)
      An interesting point, and perhaps a bigger one points to the eventual shift away from a pay format in terms of a lot of this information. Already we've seen a dramatic rise in piracy or people going after free content. Taking this a step further the market place will eventually push out the pay per use model on a lot of this information be it WSJ or music, or TV, allowing ads for consumer based merchandise to fuel, at least for now the demands on the infrastructure. My question is though how long can this l
    • And it's got unlimited space.

      The internet is actually nearly full, I hope there is eno

    • Sadly, data on the internet is currently a lot more volatile than the library of Alexandria, and the internet's contents are likely to survive for much less time. Digital media doesn't have the lifespan of ancient media, even papyrus. :-(

  • The problem IMHO is not so much the broken links, but instead the desire (or lack of...) from the corporate overlord to retain "obsolete" content. Priority was given to the merger of both titles, without considering what makes a newspaper what it is: content.
    • I still find it strange that it seems to be only old-world, *major* corporations that have this problem so badly.

      Every random kid's blog and webcomic has archives dating back to the day the thing started and easily accessible.

  • I was interested in reading the analysis that led to the $100,000 loss for every month the guy's work was offline. So, doing what you do, I clicked on the link and found it grandly hilarious to receive a 500 error stating: "Error establishing a database connection". Oh, the irony.

    • by djmurdoch (306849)

      They're slashdotted, losing lots of traffic: so yes, it's ironic. But you can read the article if you want:

      Paste the link "http://www.globaltechproducts.com/blog/1734/how-not-to-redesign-your-website-a-marketing-lesson-from-nytimescom/" into Google, you'll find the article in the Google cache.

      The (to me questionable) basis for the calculation is that all old International Herald Tribune links are broken. It used to get X million hits per month, which are by a hokey calculation worth $100k.

      • Posting that link into google also provides a google search link to _this_ thread.

        this comment is missing from google cache, however...
      • They assumed that a third of the 1.5m monthly hits are paid click-throughs from Google that are worth 20 cents each, hence the $100K. Pretty bogus. But even better, that article acknowledges that the Times are in the process of migrating the old stories over, so eventually the links will work again anyway.
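The estimate discussed above boils down to one multiplication, sketched here with the figures quoted in the thread (500,000 lost visitors a month, $0.20 per click). The function name is illustrative, and the two lower per-click prices are arbitrary values chosen only to show how sensitive the total is to that assumption.

```python
def replacement_cost(lost_clicks_per_month: int, cost_per_click_usd: float) -> float:
    """Monthly cost of buying back traffic that now lands on broken links."""
    return lost_clicks_per_month * cost_per_click_usd


# One third of 1.5 million visitors, as assumed in the quoted analysis.
LOST_CLICKS = 500_000

for price in (0.20, 0.01, 0.001):
    cost = replacement_cost(LOST_CLICKS, price)
    print(f"${price:.3f}/click -> ${cost:,.0f} per month")
```

The result scales linearly with the per-click price, which is why commenters who doubt the 20-cent figure also doubt the $100,000 total.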
  • by code65536 (302481) on Friday May 15, 2009 @07:49AM (#27964749) Homepage Journal

    Whenever I redesign my site, I try hard to avoid changing any URLs. But if I do have to change a URL, I always make sure that there is a redirect (preferably an HTTP 301 permanent redirect) that points from the old URL to the new URL. Updating links is not enough, because you will always have links that come from external sites that you don't control, user bookmarks, links found in "Hey, check this article out" e-mails, etc.

    This is one of those basic principles of the web that the W3C (and for those who don't pay attention to them, you can substitute that with "plain old common sense" here) strongly recommends.

    It means that users can always find and view content. It means that you still retain your ad revenue. It means that you still keep your PageRank for external sites that link. It means less bitrot and a more useful web...
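A minimal sketch of the redirect approach described in the comment above, using only Python's standard library. The paths in `REDIRECTS` are invented for illustration and are not real IHT or NYT URLs; a production site would more likely do this in its web server or CMS configuration.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from retired paths to their new locations.
REDIRECTS = {
    "/articles/2007/old-story.php": "/2007/world/old-story.html",
}


def resolve(path: str):
    """Return (status, location) for a request path.

    Known old paths get a 301 pointing at the new location. Anything
    else returns (None, None) so the caller can serve it or send a 404.
    """
    new_path = REDIRECTS.get(path)
    if new_path is None:
        return None, None
    return 301, new_path


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers GET requests for old URLs with a permanent redirect."""

    def do_GET(self):
        status, location = resolve(self.path)
        if status is None:
            self.send_error(404, "Not Found")
            return
        self.send_response(status)
        self.send_header("Location", location)
        self.end_headers()
```

To try it locally, pass `RedirectHandler` to `http.server.HTTPServer` and call `serve_forever()`. Keeping the lookup in a plain function makes the mapping easy to test without starting a server.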

    • Well.. we have a huge majority of "designers" out there who design to Microsoft dogma and can't be bothered to even check their web page using Firefox on their own machine right now. They couldn't care less about any type of good practice, let alone trying to conform to (or even reading in the first place) the ideas that W3C has put out.

      None of this is surprising to me in the least... just sad.

    • by WillAdams (45638)

      The problem there is this only works if one controls the _entire_ URL.

      I had pages on AOL's FTP/webspace since its inception through AOL's ``sunsetting'' those services --- unfortunately, I published a number of papers which had links to http://members.aol.com/willadams [aol.com] so all the printed copies are out of date since there's no way to update them to http://mysite.verizon.net/william_franklin_adams/ [verizon.net]

      It's this sort of thing which makes the MLA's decision to omit hard-coded URLs from their references....

      http://w [insidehighered.com]

      • You complain about how all of your AOL-hosted links ceased to work and how you're unable to update all the places they were used to point to your (currently) Verizon-hosted content. Do you see the problem with this?

        The solution to this is to get your own domain, so you retain the ability to move it at will. I started out with my primary domain (http://www.fencepost.net/ [fencepost.net]) because I wanted a reliable email address after two successive ISPs were bought out. I would never use a carrier-provided email address as

        • by WillAdams (45638)

          Acknowledged. For my part, I've quit putting my homepage URL in papers and instead will just upload stuff to CTAN and point to that.

          I looked into registering a domain name, but couldn't find one I liked (not that I like william_franklin_adams) --- ::grrr:: squatters.

          William

        • The solution might be to place a GUID and keywords in anything you post online, and specify its location by saying "Google GUID1239872129412 Joe Schmoe Lemur behavior paper"

          Then if you move hosts, it'll eventually get picked up by search engines and people will be able to find it, even if the URL itself has changed. (Hell, it might even find a copy someone made and posted at their own site.)
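The GUID idea above could be sketched as follows. This is only an illustration of the commenter's suggestion: the `GUID` prefix and the layout of the tag line are assumptions, and nothing guarantees that a search engine will index such a string.

```python
import uuid


def make_tag(author: str, keywords: list[str]) -> str:
    """Build a searchable tag line: a random GUID plus author and keywords."""
    guid = uuid.uuid4().hex  # 32 random hexadecimal characters
    return f"GUID{guid} {author} {' '.join(keywords)}"


# Embed the returned line in the paper and in every copy you post.
print(make_tag("Joe Schmoe", ["Lemur", "behavior", "paper"]))
```

The tag is random, so it has to be generated once and stored with the document; regenerating it on each upload would defeat the purpose.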

  • by yogibaer (757010) on Friday May 15, 2009 @07:50AM (#27964753)
    I feel for the guy and his lost articles, but I am wondering why he did not keep backups of everything. The stories seem to be gone forever, or else his letter would be about re-publishing his stories on his own website.... That is a rather bad case of negligence on the publisher's side, but more so on the part of Mr. Crampton. For comparison: I work with a professional photojournalist who has been working for 50 years now and has archived everything (more than 1.5 million pictures) like a mad squirrel. If you ask him about an article he wrote in 1961, it takes him about five minutes to find a copy of the article and the raw materials. Everything analog, but nonetheless... That makes you wonder whether, while embracing digital media and the blogosphere, many journalists have failed to bring with them the necessary tools to manage and archive their digital assets.
    • It's hard to tell from the linked article (yeah, I read it), but it doesn't seem that Crampton has no copies of the articles (surely he would keep copies of his own stuff); they're just not accessible on the Internet. All the links that should point to them from the NYT and the IHT went kablammo when the two sites merged.

      There's no way a back up on his end could fix this problem.

    • Re: (Score:3, Insightful)

      by pbhj (607776)

      I feel for the guy and his lost articles, [...]

      I feel for him too. Of course the articles aren't his, they are his employer's (unless he has a contract that says otherwise) - which is probably why he's bothered. If they were _his_ articles then he could wholesale upload them to his own site and reap the rewards (whatsoever they may be).

  • by hacker (14635) <hacker@gnu-designs.com> on Friday May 15, 2009 @07:56AM (#27964819)

    In the digital age, wiping out thousands of volumes of material takes mere seconds. Permanently. Gone. Poof.

    We have books, printed books, which go back hundreds and hundreds of years (well, written material; the printing press is a fairly recent invention).

    We don't even have a record of some newspaper articles that came out 5 years ago. We're LOSING our history, not retaining it, because we lack sufficient "printing" to always keep a copy in circulation. Witness the Avsim.com [slashdot.org] debacle and hundreds of other cases where this has happened.

    Until we can have a hard-copy of digital media which can NOT be changed, edited, altered or redacted... we're lost.

    When we all have "Kindle DX2" devices in the classroom for digital copies of our textbooks... what is stopping them from "gently changing" some of the wording over time, over a few years, to permanently alter the way our youth views the history of times they never lived through?

    How can you compare one version of a website today, with the one that was there last week? Was anything changed? Was article content "censored" in any subtle way?

    We're heading down a very slippery slope, when digital information can't remain static enough to hold through the years, and be validated and verified to be unchanged, with sufficient copies in enough hands, to ensure survivability. The Internet is not the place to "store" things you want to keep for years and decades.
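One way to answer the "was anything changed?" question above, assuming you saved your own copy of the page last week: compare a hash of each copy, then list the lines that differ. This is a sketch using Python's standard `hashlib` and `difflib`; the two article snippets are invented for illustration.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 digest of the text; any edit changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changes(old: str, new: str) -> list[str]:
    """Lines removed ('- ') or added ('+ ') between two saved copies."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Made-up example text standing in for two snapshots of the same page.
last_week = "The senator voted against the bill.\nHe declined to comment."
today = "The senator voted for the bill.\nHe declined to comment."

if fingerprint(last_week) != fingerprint(today):
    for line in changes(last_week, today):
        print(line)
```

A matching digest only proves the two copies are identical to each other. It says nothing about whether either matches what was first published, which is why the comment's point about many independent copies still stands.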

    • When we all have "Kindle DX2" devices in the classroom for digital copies of our textbooks... what is stopping them from "gently changing" some of the wording over time, over a few years, to permanently alter the way our youth views the history of times they never lived through?

      What makes you believe this isn't already occurring with paper textbooks? I can't speak for the current crop (as new editions are pushed on schools practically every year) but when I was in middle / high school our social studies and

    • by PhxBlue (562201)

      Until we can have a hard-copy of digital media which can NOT be changed, edited, altered or redacted... we're lost.

      You mean like these [wikipedia.org]?

      • by hacker (14635)

        Yes, except the shelf life of standard, single-use, recordable CDs is 5-8 years, max.

        What do you envision happening when those CDs "expire" at that point? Copying the data down to a hard drive and re-burning every decade? Not feasible either.

        • by PhxBlue (562201)

          Yes, except the shelf life of standard, single-use, recordable CDs is 5-8 years, max.

          I'm going to guess that varies based on the quality of the disc, because I have recordable CDs past that age that still work without a hitch. You're right that paper's the best way to go, but it's not the only way.

          • by hacker (14635)
            The dye-based CDs have significantly LESS shelf life than the etched, commercial versions. Less protective layers, cheaper discs, poor-quality dyes, etc.
  • Much of what we know about past days is from written material. With the move towards net-everything and the decline in print as the internet changes (and I do not mean just the web; email, gopher, irc, usenet, ftp archives, et al are all prone to this problem), much of our history will be lost to generations to come purely through attrition.

    Then we have the problem of changing file formats, media which decays rapidly when compared to paper and decent inks, obsolescence of technology (try finding a laptop with a

  • by Bill Dimm (463823) on Friday May 15, 2009 @08:07AM (#27964941) Homepage

    My company links to articles on a lot of magazine websites, and I'm just amazed at how often the links become broken. Sites get redesigned and they don't bother redirecting the old URLs to the corresponding new locations. Or, even worse, they just discard all of the old articles, or random articles disappear or come up blank or mangled. Does it not occur to them that websites, search engines, and blogs are left with broken links? Do they not realize that people bookmark the articles?

    • by pjt33 (739471)

      Rather than bookmark an article I save a copy to disk. It's the surest way of being able to read it later. Even if the site's admins are competent enough to keep the URL pointing at the right place, there's no guarantee that the article won't disappear behind a paywall.

    • This is why Google Notebook is (was) so nice - it makes it very easy to copy (with most formatting retained), while keeping the link to where it came from.

      I've dabbled with some of the free replacements (like Zotero) but none have been able to match the features and ease of use of Google's service.

  • I clicked on the two links listed at the bottom of the open letter to Arthur Sulzberger (both are IHT links), and both now are redirected to the correct articles on the www.nytimes.com domain. Has the NYT fixed the problem and no one has just bothered to mention that?

    • by pbhj (607776)

      Work for me too. Perhaps the web dudes at NYT were in cahoots to help him get this linkbait up.

  • I see over 1000 articles (with photos) by this guy on the Times website. And I can access all of them.
    • Read TFA more closely. He has reported for both the Times and the IHT. It's his IHT work that has disappeared, while the Times stuff is still there.

  • any good /. er could go on and on about the problems of the times website. I actually had to tell them that they needed a button so people could go back or forward one day at a time (any std site for a journal has this feature - look at say amer chem soc journals, there is a button that goes forward or back one issue)

    I have repeatedly told them their comments suck and they should have slashcode and wikipedia - can you imagine how much traffic the times website would generate if each of their great articles

  • Moving websites is a good time for purging embarrassing stuff, especially the comments section. One wonders what else is missing especially from the archive. Ah, I just read this bit; the archives were erased in the move [blorge.com]. It takes willful action to lose your own archive. At least they didn't go back into the archive and replace the negative bits with adverts, like some other online newspapers do. Job well done I guess :)
  • Peter Wayner - author of a famous and well known book on compression algorithms, which managed to survive the Big Howl of Internet due to its relative popularity at the time it was written. It was recovered thanks to thousands of fragments found in hundreds of hard drives all over the world.

    Thomas Crampton - A supposed journalist for the once famous New York Times. His personality is quite obscure and nothing is known about him, except for a short reference in the once famous Slashdot forum on Internet.

  • When you read the article, you find one of the main reasons he wants the articles back up is because he himself doesn't have copies of the articles. TFA and Slashdot are full of angst towards the megacorp, but nobody seems to have noted this point.

  • Interesting. I got quite upset with the IHT-NYT change a while ago for exactly this reason: many bookmarks and links to news articles that I had made throughout the years evaporated overnight, making me regret not printing or saving the text of those articles when I had the chance. But apparently the NYT has fixed it now. Crampton links to two articles of a scoop he had a few years ago, and they resolve to a new page. And a bookmark that I have on the computer I'm working on now has the same thing, suggesti

    • by fondacio (835785)

      Apologies for the reply to self, but I tried a few more links which did not resolve, but the current IHT landing page [nytimes.com] says it all: "The most recent IHT articles can now be found by searching NYTimes.com. We are in the process of moving IHT articles dating back to 1991 over to NYTimes.com. Thanks for your patience as we complete this transition."

  • The hilarious part: according to one analysis, the NYT is throwing away at least $100,000 for every month that the links remain broken."

    Analyses are a dime-a-dozen, and as we know from past experience, analysts are often biased, stupid, or insane.

    So does it really matter that one analyst came up with a number that, if true, would make NYT look foolish?

  • I don't know anything about this gentleman, but, maybe, his writings simply go against the current Illiberal pro-Democrat bias of the paper? They weren't always this way — most famously, NYT used to be against government-mandated minimum wage [ncpa.org] until 1999.

    Perhaps, they are trying to score some favors from the current government in the hopes of getting substantial financial help (a bailout [washingtontimes.com], that was, no doubt, already promised to them) and certain writers are no longer welcome?

    One does not need to be

  • "They took my work and erased it! Please mommy help me!" - That's one solution. The other solution is for this journalist to get off his fat ass, buy a personal website, and publish all his back work for everyone to see.

    You know, when I left Lockheed ten years ago most of my work ended-up in the dumpster too. That's life. If I felt it was important enough to publish, I'd simply copy it to my c: drive and later my personal website. It's a much simpler solution than whining to my ex-boss. It's MY job t

  • Welcome to the Web (Score:4, Insightful)

    by sjvn (11568) <sjvn@vna1.cLIONom minus cat> on Friday May 15, 2009 @09:46AM (#27966541) Homepage

    One of the greatest delusions that people have about the Web is that almost all information can be found on it somewhere. What total nonsense.

    Stories rot from the Web faster than newspaper print ever has or ever will. All that we're left with is the most recent version or revision, which may have *nothing* to do with what was first written.

    If you don't keep copies of your work that appears on the Web, you might as well have thrown them into a fireplace. And, as for everyone else, if you assume for even a moment that what you read on the Web about what happened in technology news even five years ago reflects what people really wrote and thought at the time, you're a fool.

    It's thanks to delusions like this that, for example, people can argue sincerely that Windows is popular because it's good; and not because Microsoft forced a monopoly on hardware vendors. Almost all the reports of DoJ vs. Microsoft from the time are long gone now. The proof that Microsoft's products are only popular because Microsoft made damn sure that no one else would have a chance to compete against them has vaporized.

    The only thing newsworthy about what's happened here is that people think that stories disappearing like this is in any way what-so-ever noteworthy. It happens every day.

    Steven

  • So where did the value of $100,000 come from?

    "To buy that traffic from Google at $.20/click, you'd have to pay $100,000 a month"

    So Google says it's worth 20 cents a click. What if I say it's only worth a cent a click? Then it's worth $5,000, or perhaps at 0.1 cents a click it's worth $500.

    All make believe. Don't tell me "an expert told you so" because I think a bunch of "experts" called "bankers" just got discredited a few months ago for overvaluing other virtual sales... ;-)

    Except I guess this is America so the
