Data Mining Goes 3D
Roland Piquepaille writes "At Sandia National Laboratories (SNL), a data mining and visualization software suite developed in the last two years is now able to extract information from many sources of data and to return 3D images as results. In 'Sandia's intelligence lab converts business data into 3-D images,' the New Mexico Business Weekly reports that Sandia's Information Visualization Lab is able to search structured documents, such as scientific journals, or unstructured ones, such as the Web or an intranet. Since the lab was established five months ago, this software has already been used to determine the potential of several partnerships with SNL. Other firms, such as Lockheed Martin, are also starting to use the lab. Let's hope that SNL releases this software as open source. It should be fun to use. For more details and pictures, please read this overview."
3D images? (Score:2, Funny)
Re:Jokes that have to be explained... (Score:1)
Oh god! They've discovered... (Score:4, Funny)
"In Sandia's intelligence lab converts business data into 3-D images,"
Re:Oh god! They've discovered... (Score:2)
Re:Oh god! They've discovered... (Score:2)
Re:Oh god! They've discovered... (Score:2)
PowerPoint doesn't, but Excel certainly has true 3D, at least as the term is normally used in computer usage. The Sandia stuff seems to be different in that it does higher dimensional representation.
Re:Oh god! They've discovered... (Score:2)
Re:Oh god! They've discovered... (Score:1, Informative)
I am in the data mining field, so I really don't see anything here as "new" tech.
Re:Oh god! They've discovered... (Score:3, Funny)
Sounds fun (Score:1)
3-d data mining.... (Score:5, Informative)
google search [google.ca]
people have been doing real time data mining in VRML since the vrml2.0 plugins came out back in 97
MacSpin (Score:3, Informative)
Re:MacSpin (Score:1)
do the posters & moderators on here even check their facts before posting?
HollywoodOS (Score:5, Funny)
How much? (Score:5, Insightful)
Re:How much? (Score:1)
So save what "you know" for what you really know.
Re:How much? (Score:1)
It's the same thing with medical research: FEDGOV spends a lot on research and then licenses the results for cents on the dollar to a private company, which then profits at the expense of the taxpayer.
If these companies want to do R&D and have full ri
Re:How much? (Score:2)
If done at contractor site, under contractor funding, for possible contract win, contractor retains IP rights
If done at contractor site, under government funding, IP may or may not be owned by contractor depending on contract details
If done at government site by contractor, under government funding or joint govern
Re:How much? (Score:5, Insightful)
I know the taxpayers paid for it, but it always seams like it gets exclusivly [sic] licensed to some company for next to nothing then that company charges the people that paid for it in the first place a lot of money to use it.
You're a wisely cynical man.
In the light of the 9/11 Commission's report of the multiple failures of the CIA and FBI [thenation.com] that allowed the terrorists to attack us in 2001, in the light of Sibel Edmonds's allegations [antiwar.com] that the FBI intentionally destroyed translations of intercepted terrorist conversations, in light of the Senate Intelligence Committee's report about systemic CIA failures to provide accurate intelligence about WMDs in Iraq [pbs.org], why am I less than thrilled to discover that Sandia National Laboratories' businesses?
When I further learn that "Sandia officials say tech firms or venture capitalists can use the lab on a per-request basis," I begin to understand that Sandia's Corporate Business Development and Partnerships aren't using my tax dollars to protect me, they're providing corporate welfare by doing the Research and Development that business wants but doesn't want to pay for.
Remember, these are the same businesses that vociferously object [nfib.com] to government programs that might compete with them, whether that's sponsorship of Open Source Software or rural electric cooperatives or IRS software that might be efficient enough to cost H&R Block. These are the same corporations that got a provision added to the Medicare Prescription Drug Bill to prevent the government from getting discounts by buying those drugs in bulk, but which profit from research funded by the National Institutes of Health.
These are the same corporations that want Ashcroft's Department of Justice to stop worrying so much about fixing the FBI's failures, so it can spend government time -- and your money -- prosecuting civil -- civil, not criminal -- suits against file traders [yahoo.com] under the PIRATE Act on behalf of those corporations. If you need to sue a corporation, you're on your own; maybe you'll get some coupons out of a class action suit. But if the corporation wants to sue you, they get the assistance of top government lawyers and FBI agents packing guns and warrants.
And this just after the U.S. House passed the biggest corporate tax cuts in twenty years [washingtonpost.com], because existing direct subsidies -- or less politely, corporate welfare -- will no longer be permitted under World Trade Organization rules. Even House Republicans admit this tax cut "is riddled with special-interest provisions that would further complicate the tax code, send jobs overseas and worsen a federal deficit already at record highs."
Does anyone really expect Sandia's going to release the source code to the data mining software to us, the citizens who have to pay for it?
Be proud, Americans, of how fat your labor makes your corporate masters! What a joy it is to serve them! It is your privilege to work long hours and pay high taxes so your masters can buy their yachts -- and buy the laws that enslave you.
America, Of the People, By the People, for the Pe^H^H Corporations
Re:How much? (Score:1)
Anyway, thanks for the post orthogonal, it is truly deserving of the moniker "insightful".
-VolVE
Holy cow... (Score:1)
Paranoia is not an attractive trait (Score:1)
Paranoia is really not becoming of anyone and it's dangerous to your health, as the constant looking over your shoulder can cause whiplash. Take a deep breath, calm down, and put that brain to work. Proverbially speaking, money corrupts. Does that mean that everyone with an extra penny is a little bit more likely to kick you in the teeth for spite? To me, it means that the wealthy philanthropists are less attractive to the media than the wealthy misanthropes.
Regarding intellige
More important than the capability... (Score:4, Insightful)
OSS can't be everything... (Score:4, Interesting)
Wouldn't the work of a government-funded national lab be public domain if it ever were to be released?
As great as OSS is, the only truly free license with absolutely no restrictions is public domain, and that's what works of the government usually become.
Re:OSS can't be everything... (Score:1, Insightful)
Yes, he should have that right. And so should you, and so should every other citizen.
Re:OSS can't be everything... (Score:3, Informative)
Wouldn't the work of a government-funded national lab be public domain if it ever were to be released?
As far as I know the Department of Energy labs, which include the Sandia labs, Lawrence Livermore, Los Alamos, are all managed by contractors. The contractor does work for the government, but frequently maintains co-ownership with the government for the work performed.
I have worked with commercial contractors that worked under similar arrangements. The customer paid the contractor for software deve
Light on details... (Score:5, Insightful)
My company manages a very large portfolio of auto loans. I'd like to know more details as to what they are actually doing so that I can judge whether we can use this technology or one like it to predict trends in our consumer base, or to develop better scoring models.
Re:Light on details... (Score:5, Informative)
If I had to guess I would guess that they are doing 3D Self Organizing Maps, or something very similar.
The principle is: create a huge feature space for the documents in question (something like word counts for each document for each word in the corpus, with appropriate fixes: drop the most and least common words, do stemming, etc.). You can now "visualize" the documents in a massive 20,000-dimensional space. Nobody can actually look at that, however, so what you can do is try to create a projection from 20,000 dimensions down to 2 or 3 dimensions in a way that best preserves distances in the 20,000-dimensional space. This automatically creates a clustering of the documents as well, and you now have something that you can visualize practically. If you start doing things like labelling clusters and subclusters by the words unique to/defining that cluster, you can start to make some sense of the visualisation.
Effectively this is just a means of doing clustering on a large document space in such a way that the final output can be visualized (instead of the sort of results you get from k-means or hierarchical clustering, which are a lot harder to visualize in a meaningful way for laymen). The benefit of being able to visualize it in that sense is that you can "see" patterns in other document attributes by adding them to the visualization (via colors, labels, etc.) and get a global overview of those attributes across the entire document space.
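For the curious, here is a rough sketch of that pipeline in plain Python. To be clear about the assumptions: this is not Sandia's code and it is not a Self Organizing Map. It swaps in classical multidimensional scaling (power iteration on the centred Gram matrix) as a simpler distance-preserving projection, it skips stemming, and the toy documents, the function names `word_counts` and `project`, and the `min_df`/`max_df_ratio` thresholds are all made up for illustration.

```python
import math
from collections import Counter

def word_counts(docs, min_df=2, max_df_ratio=0.9):
    """Build per-document word-count vectors, dropping very rare and very common words."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vocab = sorted(w for w, df in doc_freq.items()
                   if df >= min_df and df / len(docs) <= max_df_ratio)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors

def project(vectors, dims=2, iterations=200):
    """Classical MDS: keep the top `dims` eigen-directions of the centred Gram matrix."""
    n, width = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(width)]
    centred = [[v[j] - means[j] for j in range(width)] for v in vectors]
    gram = [[sum(a * b for a, b in zip(centred[i], centred[k])) for k in range(n)]
            for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Deterministic, generic start vector, normalised to unit length.
        vec = [math.sin(i + 1 + d) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iterations):
            nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # nothing left to explain in this direction
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        gv = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        eigenvalue = max(sum(a * b for a, b in zip(vec, gv)), 0.0)
        scale = math.sqrt(eigenvalue)
        for i in range(n):
            coords[i][d] = vec[i] * scale
        # Deflate so the next pass finds the next-largest direction.
        for i in range(n):
            for k in range(n):
                gram[i][k] -= eigenvalue * vec[i] * vec[k]
    return coords

def distance(a, b):
    """Euclidean distance between two projected points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy corpus (hypothetical): three finance documents, three security documents.
docs = [
    "loan payment default credit score",
    "credit loan default payment risk",
    "loan credit score payment history",
    "network intrusion packet firewall attack",
    "firewall packet attack network scan",
    "intrusion network attack packet log",
]
vocab, vectors = word_counts(docs)
coords = project(vectors, dims=2)
for doc, point in zip(docs, coords):
    print([round(c, 2) for c in point], doc)
```

On a toy corpus like this the two topics should land in two separate clumps on the 2D plot, which is the "clustering for free" effect. A real system would need sparse vectors and a proper eigen solver (or an actual SOM) to cope with 20,000 dimensions and thousands of documents.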
Just to reiterate: I do not know that this is what is being done, and they don't say a lot in the article, but I do have some experience in this field, and what I gleaned from the article would tend to imply an approach like this.
Jedidiah.
Re:Light on details... (Score:1)
I should declare my hand up front and let you know I'm a co-founder of Purple Insight, the company that is referred to in another comment about this article:
http://science.slashdot.org/comments.pl?sid=114124&threshold=1&commentsort=0&tid=134&tid=137&tid=198&mode=thread&pid=9668022#9671027
I'll avoid making this a commercial and talk about the techniques you ask about at a generic level. It might not surprise you to know that our product MineSet provides these techniques a
for the trees (Score:5, Funny)
paper? (Score:2, Funny)
ok let's move to piquepaille blog (Score:3, Insightful)
3D data visualization (Score:4, Informative)
Have a look, and look at what it is actually capable of doing. If you want to do any sort of 3D visualization, it really is worth your time to learn a bit about VTK.
Jedidiah.
Re:3D data visualization (Score:2)
It was originally an IBM product but is now open source. Thanks to IBM are due again.
Re:3D data visualization (Score:2)
The idea is very nice - you simply connect together a bunch of boxes with inputs and outputs and construct a visualisation that way. It means you can do so in an entirely graphical manner, and get a goo
Like the movie Hax0rs.... (Score:1)
Silicon Graphics MineSet (Score:2, Informative)
--ralpht
Data Mining *WENT* 3D... (Score:2, Interesting)
Some nice screen shots there too
Mineset detail [purpleinsight.com]
Network Analysis [purpleinsight.com]
Intrusion Detection [purpleinsight.com]
Fraud Detection [purpleinsight.com]
This is UNIX (Score:1)
No surprise that Lockheed uses Sandia work (Score:3, Informative)
Other firms, such as Lockheed Martin, also are starting to use the lab.
I don't find it surprising that Lockheed Martin is one of the firms "starting to use the lab". Lockheed Martin runs Sandia as a contractor for the Department of Energy. Lockheed has a built-in bias toward showing how applicable the work at Sandia is.
Re: (Score:1)
Didn't Roland the Plogger post this already? (Score:2)
Who will use this? (Score:1)
Re:Who will use this? (Score:1)
Presenting information in a 3-D format can really help busy business people see the wood (forest for Americans) from the trees.
You need to spend far less time and thought (and creativity, if you like) in understanding a 3-D chart than a bunch of spreadsheets showing the raw data. This is a fact of life in the real business world.
Re:Who will use this? (Score:1)
This is so 90s (Score:2, Informative)
Seriously, I was doing this at the Census Bureau years ago with VRML and enhanced it with those dodgy Performance Copilot (SGI) type tools. Since then products such as, oh, I don't know, Cognos and Crystal Reports (4+) have implemented 3D data set controls and reports in spades (Tivoli Business Decision Manager anyone?).
Open source tends to lack the robu
ObSimpsons (Score:2, Funny)
Open Source Data Visualisation based on IBM code (Score:2, Informative)
http://www.opendx.org
*yawn* (Score:2)