Data Storage Science

Data Mining Goes 3D

Roland Piquepaille writes "At Sandia National Laboratories (SNL), a data mining and visualization software suite developed in the last two years is now able to extract information from many sources of data and to return 3D images as results. In "Sandia's intelligence lab converts business data into 3-D images," the New Mexico Business Weekly reports that Sandia's Information Visualization Lab is able to search structured documents, such as scientific journals, or unstructured ones, such as the Web or an intranet. Since the lab was established five months ago, this software has already been used to determine the potential of several partnerships with SNL. Other firms, such as Lockheed Martin, also are starting to use the lab. Let's hope that SNL releases this software as open source. It should be fun to use it. For more details and pictures, please read this overview."

  • 3D images? (Score:2, Funny)

    by Dorothy 86 ( 677356 )
    I think they should report it as music!! (if you don't get the reference, it's from Dirk Gently's Holistic Detective Agency by Douglas Adams)
  • by lxt ( 724570 ) on Sunday July 11, 2004 @12:14PM (#9667206) Journal
    ...Excel and PowerPoint! The nightmare has been unleashed!

    "In Sandia's intelligence lab converts business data into 3-D images," ...ie, really dodgy pie charts and bar graphs!
  • But maybe not as fun as DOOM [unm.edu] for a system admin job. Still, it seems it may have the same effect of making a rather boring job more interesting.
  • 3-d data mining.... (Score:5, Informative)

    by drfrog ( 145882 ) on Sunday July 11, 2004 @12:15PM (#9667212) Homepage
    is over 5 years old already

    google search [google.ca]

    people have been doing real-time data mining in VRML since the VRML 2.0 plugins came out back in 97

    • MacSpin (Score:3, Informative)

      by Anonymous Coward
      MacSpin was a 3-d data mining tool that is over 16 years old now.
  • HollywoodOS (Score:5, Funny)

    by Xpilot ( 117961 ) on Sunday July 11, 2004 @12:16PM (#9667218) Homepage
    Just think, if Hollywood bothers to at least try and get some technical stuff even remotely realistic (and look cool), they could incorporate such things into movies. But no... we get a fusion reaction which you can control with metal tentacles (just push the little flames back in!).

  • How much? (Score:5, Insightful)

    by DAldredge ( 2353 ) <SlashdotEmail@GMail.Com> on Sunday July 11, 2004 @12:19PM (#9667238) Journal
    How much will they license this for? I know the taxpayers paid for it, but it always seams like it gets exclusivly licensed to some company for next to nothing then that company charges the people that paid for it in the first place a lot of money to use it.

    • Just because Sandia is primarily funded by the United States government does not imply that this particular project is. The private sector also has a significant interest in Sandia, especially Lockheed-Martin.

      So save what "you know" for what you really know.
      • Of course they have interest. They get top of the line R&D for next to no risk and next to no cost. These labs cost a lot of money to run and the amount they get from private business doesn't come near the amount required to keep them open.

        It's the same thing in regards to medical research, FEDGOV spends a lot on research and then licenses the results for cents on the dollar to a private company who then profits at the expense of the tax payer.

        If these companies want to do R&D and have full ri
        • I actually do R&D work for LMCO (not related to this particular project) but my work is completely internal R&D. My understanding of the IP ownership depends on the development details:

          If done at contractor site, under contractor funding, for possible contract win, contractor retains IP rights

          If done at contractor site, under government funding, IP may or may not be owned by contractor depending on contract details

          If done at government site by contractor, under government funding or joint govern

    • Re:How much? (Score:5, Insightful)

      by orthogonal ( 588627 ) on Sunday July 11, 2004 @12:58PM (#9667489) Journal
      Sandia's intelligence lab converts business data into 3-D images

      I know the taxpayers paid for it, but it always seams like it gets exclusivly [sic] licensed to some company for next to nothing then that company charges the people that paid for it in the first place a lot of money to use it.

      You're a wisely cynical man.

      In the light of the 9/11 Commission's report of the multiple failures of the CIA and FBI [thenation.com] that allowed the terrorists to attack us in 2001, in the light of Sibel Edmonds's allegations [antiwar.com] that the FBI intentionally destroyed translations of intercepted terrorist conversations, in light of the Senate Intelligence Committee's report about systemic CIA failures to provide accurate intelligence about WMDs in Iraq [pbs.org], why am I less than thrilled to discover that Sandia National Laboratories' businesses?

      When I further learn that "Sandia officials say tech firms or venture capitalists can use the lab on a per-request basis," I begin to understand that Sandia's Corporate Business Development and Partnerships aren't using my tax dollars to protect me, they're providing corporate welfare by doing the Research and Development that business wants but doesn't want to pay for.

      Remember, these are the same businesses that vociferously object [nfib.com] to government programs that might compete with them, whether that's sponsorship of Open Source Software or rural electric cooperatives or IRS software that might be efficient enough to cost H&R Block. These are the same corporations that got a provision added to the Medicare Prescription Drug Bill to prevent the government from getting discounts by buying those drugs in bulk, but which profit from research funded by the National Institutes of Health.

      These are the same corporations that want Ashcroft's Department of Justice to stop worrying so much about fixing the FBI's failures, so it can spend government time -- and your money -- prosecuting civil -- civil, not criminal -- suits against file traders [yahoo.com] under the PIRATE Act on behalf of those corporations. If you need to sue a corporation, you're on your own; maybe you'll get some coupons out of a class action suit. But if the corporation wants to sue you, they get the assistance of top government lawyers and FBI agents packing guns and warrants.

      And this just after the U.S. House passed the biggest corporate tax cuts in twenty years [washingtonpost.com], because existing direct subsidies -- or less politely, corporate welfare -- will no longer be permitted under World Trade Organization rules. Even House Republicans admit this tax cut "is riddled with special-interest provisions that would further complicate the tax code, send jobs overseas and worsen a federal deficit already at record highs."

      Does anyone really expect Sandia's going to release the source code to the data mining software to us, the citizens who have to pay for it?

      Be proud, Americans, of how fat your labor makes your corporate masters! What a joy it is to serve them! It is your privilege to work long hours and pay high taxes so your masters can buy their yachts -- and buy the laws that enslave you.

      America, Of the People, By the People, for the Pe^H^H Corporations
      • You know, occasionally, just occasionally I feel there should be a +6 that's only attainable from some obscenely large score past +5, just to memorably mark such insightful posts...

        Anyway, thanks for the post orthogonal, it is truly deserving of the moniker "insightful".

        -VolVE
      • You really need to stop getting all your news from Nova, 60 Minutes, Dateline, and Michael Moore's diary... Contrary to popular belief: 1. "The Man" doesn't really exist and if he did, he probably wouldn't be hell-bent on "keeping you down". 2. Satan does not fund big-businesses. 3. Big-businesses do not fund Satan. 4. Intelligence did not fail. Please, please, pleeaase do your homework on the 9/11 commission and the actual "failures" of the intelligence community before you make sweeping judgments li
  • by DeepDarkSky ( 111382 ) on Sunday July 11, 2004 @12:26PM (#9667292)
    Is having the knowledge, experience, and creative talent to know how to use the capability to design meaningful and easy to understand data visualization. Anybody can be an Excel monkey and drag and drop charts and graphs, but it doesn't mean they'd make sense. Leaping to 3D is not a panacea for data mining visualization, but the potential is certainly there.
  • by LostCluster ( 625375 ) * on Sunday July 11, 2004 @12:29PM (#9667309)
    Come on.... Let's hope that SNL releases this software as open source.

    Wouldn't the work of a government-funded national lab be public domain if it ever were to be released?

    As great as OSS is, the only truly free license with absolutely no restrictions is public domain, and that's what works of the government usually become.
    • Wouldn't the work of a government-funded national lab be public domain if it ever were to be released?

      As far as I know the Department of Energy labs, which include the Sandia labs, Lawrence Livermore, Los Alamos, are all managed by contractors. The contractor does work for the government, but frequently maintains co-ownership with the government for the work performed.

      I have worked with commercial contractors that worked under similar arrangements. The customer paid the contractor for software deve

  • by TheQuestion ( 124286 ) on Sunday July 11, 2004 @12:30PM (#9667323) Homepage
    I wish this story went into more details into the algorithms used. Saying stuff like "we take tons of data and out comes a 3D image" is great, but what does the 3D image actually represent? What are the dimensions being graphed?

    My company manages a very large portfolio of auto loans. I'd like to know more details as to what they are actually doing so that I can judge whether we can use this technology or one like it to predict trends in our consumer base, or to develop better scoring models.
    • by Coryoth ( 254751 ) on Sunday July 11, 2004 @12:59PM (#9667496) Homepage Journal
      I wish this story went into more details into the algorithms used. Saying stuff like "we take tons of data and out comes a 3D image" is great, but what does the 3D image actually represent? What are the dimensions being graphed?

      If I had to guess I would guess that they are doing 3D Self Organizing Maps, or something very similar.

      The principle is: create a huge feature space for the documents in question (something like word counts for each document for each word in the corpus, with appropriate fixes: drop the most and least common words, do stemming, etc.). You can now "visualize" the documents in a massive 20,000-dimensional space. However, what you can do is try to create a projection from 20,000 dimensions down to 2 or 3 dimensions in a way that best preserves distances in the 20,000-dimensional space. This automatically creates a clustering of the documents as well, and you now have something that you can visualize practically. If you start doing things like labelling clusters and subclusters by the words unique to/defining that cluster you can start to make some sense of the visualisation.

      Effectively this is just a means of doing clustering on a large document space in such a way that the final output can be visualized (instead of the sort of results you get from k-means, or hierarchical clustering, which are a lot harder to visualize in a meaningful way for laymen). The benefit of being able to visualize it in that sense is that you can "see" patterns of other document attributes by adding that to the visualization (via colors, labels, etc.) and see a global overview of those attributes across the entire document space.
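
      To make the pipeline above concrete, here is a rough sketch in Python. It is only an illustration of the general idea, not anything the article attributes to Sandia. Assumptions: it swaps the Self Organizing Map for plain PCA (computed by power iteration on the document Gram matrix), which is the simplest linear projection that tries to preserve distances; stemming is left out; and the names word_count_vectors and project, along with their parameters, are made up for this example.

```python
import math
import random
from collections import Counter


def word_count_vectors(docs, min_df=2, max_df_ratio=0.9):
    """Build word-count vectors, dropping the rarest and most common words."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vocab = sorted(
        word for word, n in doc_freq.items()
        if n >= min_df and n / len(docs) <= max_df_ratio
    )
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def project(vectors, dims=3, iters=200, seed=0):
    """Project each row to `dims` coordinates with PCA (hypothetical helper)."""
    n = len(vectors)
    width = len(vectors[0])
    means = [sum(row[j] for row in vectors) / n for j in range(width)]
    centered = [[row[j] - means[j] for j in range(width)] for row in vectors]
    # Gram matrix of the centered documents: n x n, so it stays small
    # even when the vocabulary has tens of thousands of words.
    gram = [[sum(a * b for a, b in zip(x, y)) for y in centered] for x in centered]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration: repeated multiplication converges on the
        # eigenvector with the largest remaining eigenvalue.
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        eigval = 0.0
        for _ in range(iters):
            nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm < 1e-12:  # no variance left to explain
                eigval = 0.0
                break
            vec = [v / norm for v in nxt]
            eigval = norm
        for i in range(n):
            coords[i][k] = vec[i] * math.sqrt(eigval)
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= eigval * vec[i] * vec[j]
    return coords
```

      Feed it a list of document strings, pass the vectors to project, and plot the resulting (x, y, z) triples. Documents that share vocabulary should land near each other, and colouring the points by some other attribute gives the kind of overview described above. A real SOM or MDS implementation would replace project.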

      Just to reiterate: I do not know that this is what is being done, and they don't say a lot in the article, but I do have some experience in this field, and what I gleaned from the article would tend to imply an approach like this.

      Jedidiah.
    • Hi

      I should declare my hand up front and let you know I'm a co-founder of Purple Insight, the company that is referred to in another comment about this article:

      http://science.slashdot.org/comments.pl?sid=114124&threshold=1&commentsort=0&tid=134&tid=137&tid=198&mode=thread&pid=9668022#9671027

      I'll avoid making this a commercial and talk about the techniques you ask about at a generic level. It might not surprise you to know that our product MineSet provides these techniques a
  • by Doc Ruby ( 173196 ) on Sunday July 11, 2004 @12:41PM (#9667387) Homepage Journal
    The technology is called "ClearForest", in homage to the continents of forests cleared for paper printouts of these 3D reports that PHBs will have shredded once they've "read" them.
    • paper? (Score:2, Funny)

      by cyklo ( 795952 )
      for 3D, they're going to have to carve them out of entire trunks. imagine the shredders you'd have to use...
  • Wow almost every story from Roland Piquepaille is selected into slashdot.
  • by Coryoth ( 254751 ) on Sunday July 11, 2004 @01:03PM (#9667524) Homepage Journal
    Anyone interested in doing powerful 3D data visualization should make a mandatory stop here [kitware.com]. It's an open source visualization toolkit written in C++, but with bindings for Java and Python as well. This is a very powerful and very impressive system, and ought to be rated as one of the great open source projects. It doesn't seem to get much attention - I'm not sure why.

    Have a look, and look at what it is actually capable of doing. If you want to do any sort of 3D visualization, it really is worth your time to learn a bit about VTK.

    Jedidiah.
    • Opendx (http://www.opendx.org) is another open source data visualization tool that is well worth looking at. It uses a dataflow kind of programming language with a large number of primitives and while the learning curve to do advanced things is pretty steep, doing easy stuff is, well, easy.

      It was originally an IBM product but is now open source. Thanks to IBM are due again.

      • Yes. I've spent time with OpenDX. It is good but (1) the interface is on the archaic side, and (2) yeah, there is a lot of learning involved. (3) The quality of the resulting visualisations, in terms of their interactivity was a little limited (though perhaps I simply failed to learn how to do that part).

        The idea is very nice - you simply connect together a bunch of boxes with inputs and outputs and construct a visualisation that way. It means you can do so in an entirely graphical manner, and get a goo
  • So now data-mining will look like a cross between a game of "You Don't Know Jack!" and "Lawnmower Man". Hollywood just may be right.
  • SGI had a product called "MineSet" which did this kind of stuff, only a long long time ago. Originally it was inspired by the 3D filemanager SGI did for Jurassic Park. Cool idea, but old hat :).

    --ralpht
  • I know this.
  • by wintermute42 ( 710554 ) on Sunday July 11, 2004 @02:28PM (#9668146) Homepage

    Other firms, such as Lockheed Martin, also are starting to use the lab.

    I don't find it surprising that Lockheed Martin is one of the firms "starting to use the lab". Lockheed Martin runs Sandia as a contractor for the Department of Energy. Lockheed has a built-in bias to show how applicable the work at Sandia is.

  • Didn't we see this article before?
  • Neat toy, but who is the audience? How do you begin to meaningfully interpret 3-D data? One of the fundamentals of effective communication is to know your audience - I imagine that most people able to pay for 3D data mining (i.e. business executives) aren't going to be able to make heads or tails of this sort of presentation. Visualizing spatial relations requires creative abilities, something I don't see much in the typical business manager.
    • I know that on /. everyone at work who is not a programmer is a PHB, and therefore stupid, dull, unimaginative and uninteresting, but this sort of comment is simply ridiculous.

      Presenting information in a 3-D format can really help busy business people see the wood (forest for Americans) from the trees.

      You need to spend far less time and thought (and creativity, if you like) in understanding a 3-D chart than a bunch of spreadsheets showing the raw data. This is a fact of life in the real business world.

      • "You need to spend far less time and thought (and creativity, if you like) in understanding a 3-D chart than a bunch of spreadsheets showing the raw data. This is a fact of life in the real business world." I disagree.
  • This is so 90s (Score:2, Informative)

    by Don Tobin ( 320926 )
    I feel like I'm playing Civilization and my agent is reporting that another civilization has just invented something my people have had for the last hour.

    Seriously, I was doing this at the Census Bureau years ago with VRML and enhanced it with those dodgy Performance Copilot (SGI) type tools. Since then products such as, oh, I don't know, Cognos and Crystal Reports (4+) have implemented 3D data set controls and reports in spades (Tivoli Business Decision Manager anyone?).

    Open source tends to lack the robu
  • And turning to the 3D graph, we see an unmistakable cone of ignorance.
  • I have been tinkering with this since I came across it last year sometime. But it too is nothing new; first release was in 1998

    http://www.opendx.org
  • Oooh, so we can "now" show datamining results in 3d. Wait, we've been doing this with Cubes and Caves for years now (have you recently seen hydrologists getting a full idea from a 2d map? Nope, they pump the data through a viz tool [matlab would be one] and can throw it up on the walls of a "cave", or a "C-6", as we call our implementation here). This is not news.
