Too Much Data? Then 'Good Enough' Is Good Enough 56

Posted by timothy on Thursday June 02, 2011 @07:06PM from the ready-when-it-ships dept.

ChelleChelle writes "While classic systems could offer crisp answers due to the relatively small amount of data they contained, today's systems hold humongous amounts of data, and as a result the data's quality and meaning are often fuzzy. In this article, Microsoft's Pat Helland examines the ways in which today's answers differ from what we used to expect, before moving on to state the criteria for a new theory and taxonomy of data."





And there was much rejoicing... "yay." (Score:5, Interesting)

by VortexCortex ( 1117377 ) writes: <VortexCortexNO@S ... t-retrograde.com> on Thursday June 02, 2011 @07:33PM (#36326440)

A bunch of rambling self-evident or speculative statements, followed by this conclusion:
Conclusion
NoSQL systems are emerging because the world of data is changing. The size and heterogeneity of data means that the old guarantees simply cannot be met. Fortunately, we are learning how to meet the needs of business in ways outside of the old and classic database.
That was already apparent to everyone, and it misses the real point: we have lots of data, and we're too impatient to wait for it to be aggregated, synchronized and processed. There goes 10 minutes of my life I'll never get back.
Here's a hint: People working on the solutions to this problem work in the financial sector and in quantum physics.
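The complaint above is that waiting for full aggregation is the bottleneck. One common way to get a "good enough" answer sooner is to aggregate a random sample and report an error estimate next to it. This is a swapped-in illustration, not something the article or the commenter prescribes: it assumes uniform random sampling over an in-memory list, and `estimate_total` and its parameters are hypothetical.

```python
import math
import random
import statistics


def estimate_total(values, sample_fraction=0.01, seed=0):
    """Estimate sum(values) from a uniform random sample.

    Hypothetical helper for illustration. Returns (estimate, standard_error).
    """
    n = len(values)
    if n < 2:
        # Too small to bother sampling; just compute the exact answer.
        return float(sum(values)), 0.0
    # Sample size: the requested fraction, but at least 2 (stdev needs two
    # points) and never more than the whole population.
    k = min(n, max(2, int(n * sample_fraction)))
    rng = random.Random(seed)
    sample = rng.sample(values, k)  # sampling without replacement
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    # Standard error of the scaled-up total, with the finite-population
    # correction (1 - k/n) because we sample without replacement.
    se = n * sd / math.sqrt(k) * math.sqrt(1 - k / n)
    return n * mean, se


if __name__ == "__main__":
    data = list(range(1_000_000))
    est, se = estimate_total(data, sample_fraction=0.001)
    # 1.96 * se is the usual normal-approximation 95% margin.
    print(f"estimate={est:.0f} +/- {1.96 * se:.0f}, exact={sum(data)}")
```

The caller decides whether the margin is small enough; if not, it can raise `sample_fraction` and trade latency back for precision.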

Too Long; Do not Read (Score:5, Interesting)

by Comrade Ogilvy ( 1719488 ) writes: on Thursday June 02, 2011 @07:35PM (#36326456)

The researcher is just throwing together a bunch of problems that have existed, in some fashion, for a very long time, and concludes with open questions rather than even vague proposals for solutions. So I would say this article is both too detailed, and not detailed enough.

Confused and incomplete (Score:4, Interesting)

by lucm ( 889690 ) writes: on Thursday June 02, 2011 @08:28PM (#36326838)

This article is confusing because most of the verbiage is made up by the author (such as "inside" or "locked" data). It is also misleading because it seems to suggest that structured and unstructured data are used the same way. Well, they're not: a very large proportion of unstructured data is blog posts and emails, but outside of a few major companies (such as Google) very little search and aggregation is performed on this type of information, which makes this usage a niche rather than a trendsetter.
The reality is that there are three categories of data that are relevant for databases: numbers, text and spatial. Everything else, which falls under the umbrella of "binary", is very unlikely to benefit from a database engine; only the metadata can be manipulated, and this metadata falls under one of the other categories and is a very good target for ETL. And so far nobody has come up with a reliable way to search binary data, such as video or audio, without relying on heavy indexing, metadata, or some kind of transformation that takes binary data and makes it text.
If a piece of data cannot be searched or aggregated, it does not belong in a database; it belongs on a filesystem. Anything can be done with BLOB columns, but performance is usually not very good because the database engine's cache is not designed for large objects. NoSQL or not.
Also, there is so much happening with storage infrastructure (such as sub-volume tiering or block-level replication) that any analysis of data that does not take storage into account is flawed.
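A minimal sketch of the pattern described above: the binary payload goes on the filesystem, and only searchable text metadata goes into the database. It assumes SQLite via Python's standard `sqlite3` module; the `media` table, its columns, and the helper functions are hypothetical names for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def init_db(conn):
    # Hypothetical schema: only metadata lives in the database.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS media (
               id INTEGER PRIMARY KEY,
               path TEXT NOT NULL UNIQUE,
               content_type TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               description TEXT
           )"""
    )


def store_binary(conn, root, name, data, content_type, description):
    # Write the raw bytes to the filesystem, not to a BLOB column.
    path = Path(root) / name
    path.write_bytes(data)
    # Record where the file is plus the text metadata we can search on.
    conn.execute(
        "INSERT INTO media (path, content_type, size_bytes, description) "
        "VALUES (?, ?, ?, ?)",
        (str(path), content_type, len(data), description),
    )
    conn.commit()
    return path


def find_by_description(conn, keyword):
    # The search touches only text metadata; binaries are read on demand.
    rows = conn.execute(
        "SELECT path FROM media WHERE description LIKE ? ORDER BY path",
        (f"%{keyword}%",),
    ).fetchall()
    return [Path(row[0]) for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        conn = sqlite3.connect(":memory:")
        init_db(conn)
        store_binary(conn, root, "keynote.mp4", b"\x00\x01", "video/mp4",
                     "conference keynote recording")
        print(find_by_description(conn, "keynote"))
        conn.close()
```

The database engine never caches the video bytes here; it only indexes a path and a few text fields, which is where it performs well.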
