Visualizing System Latency
ChelleChelle writes "Latency has a direct impact on performance — thus, in order to identify performance issues it is absolutely essential to understand latency. With the introduction of DTrace it is now possible to measure latency at arbitrary points; the problem, however, is how to visually present this data in an effective manner. Toward this end, heat maps can be a powerful tool. When I/O latency is presented as a visual heat map, some intriguing and beautiful patterns can emerge. These patterns provide insight into how a system is actually performing and what kinds of latency end-user applications experience."
another solution to an already solved problem... (Score:1, Insightful)
Re:pretty graphs (Score:5, Insightful)
These visualizations are used to condense the information gathered at one-second intervals from running systems. Any graph of sufficiently advanced material is going to require explanation until you understand what is being measured, how it is being graphed, and how this information translates into real-world performance.
Of course a casual reader from the net needs to read text to understand what is going on. These aren't sales figure pie-charts and shouldn't necessarily be accessible for uninformed parties.
On another note: do you think casual readers would have any more success interpreting the raw data files? Anyhow, I am interested in the technique, as it is not one I am currently using. With a little practice this may be a good at-a-glance technique.
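For anyone wondering what "condensing" one-second intervals might look like in practice, here is a minimal Python sketch. It assumes the raw data is a list of (second, latency in microseconds) pairs; the function name, the 100 µs bucket width, and the sample values are all made up for illustration and are not taken from the article or from DTrace output.

```python
from collections import Counter

def heat_map_columns(samples, bucket_us=100):
    """Group (second, latency_us) samples into per-second latency histograms.

    Returns {second: Counter({bucket_start_us: count})}. Each Counter is one
    column of the heat map; the count decides how dark a cell is drawn.
    """
    columns = {}
    for second, latency_us in samples:
        # Round the latency down to the start of its bucket.
        bucket = (latency_us // bucket_us) * bucket_us
        columns.setdefault(second, Counter())[bucket] += 1
    return columns

# Hypothetical raw samples: (second, latency in microseconds).
samples = [(0, 120), (0, 180), (0, 950), (1, 110), (1, 130), (1, 140)]
columns = heat_map_columns(samples)

for second in sorted(columns):
    print(second, dict(sorted(columns[second].items())))
```

Each printed row is one time slice: second 0 has two fast I/Os and one slow one in a separate bucket, which is the kind of detail a single per-second average would blur together.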
Re:pretty graphs (Score:1, Insightful)
The article presented plenty of information related to its topic. The topic was that using a heat map to describe latency is more useful than simple averages and maximums displayed as line graphs. The article then analyzed certain interesting cases where a heat map had information that would not have existed in a line graph. What you are griping about is that the topic itself is simple and that the article is full of individual analyses that provide support for the topic.
Re:another solution to an already solved problem.. (Score:3, Insightful)
The data presented in the article are actually quite a bit more subtle and interesting than the summary data you've got there. It would probably be impossible to notice the effects of the "icy lake" phenomenon they describe with average summary data like that, or to appreciate the effect of shouting. (Most I/Os happen relatively quickly during the shouting, so the average doesn't skew up very high. What's remarkable about the shouting is the sudden burst of outliers indicating a few accesses with terrible performance.)
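To put rough numbers on the point about averages, here is a small Python sketch. The latencies are invented for illustration (they are not measurements from the article), and the 50 ms outlier threshold is an arbitrary assumption.

```python
from statistics import mean

def summarize(latencies_ms, outlier_ms=50):
    """Return the mean, the max, and the number of I/Os at or above outlier_ms."""
    return {
        "mean": mean(latencies_ms),
        "max": max(latencies_ms),
        "outliers": sum(1 for x in latencies_ms if x >= outlier_ms),
    }

# Hypothetical one-second intervals with 1000 I/Os each.
quiet = [1.0] * 1000                     # every I/O takes 1 ms
shouting = [1.0] * 995 + [200.0] * 5     # five I/Os stall for 200 ms

quiet_summary = summarize(quiet)
shouting_summary = summarize(shouting)
print(quiet_summary)
print(shouting_summary)
```

With these made-up values the mean only moves from 1 ms to about 2 ms, which would be easy to miss on an averages graph, while the outlier count goes from zero to five. A max line would catch the spike but would not show how many I/Os were affected; a heat map shows both.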
Re:pretty graphs (Score:2, Insightful)
That's the point: a good engineer's (or scientist's) response to new data that they can't fully explain is generally unmitigated glee, because it means they've found something new. My takeaway from the article is, "try this new technique/tool, you'll see new data".
On another note, I've done some very basic analysis of disk performance at work, and this approach would have allowed me to be much more confident in my results. As it was, basically all I could do when comparing disks and filesystems was use iozone to characterize the "knee points" the article keeps mentioning, and try to map changes in aggregate numbers to saturation of various interfaces and/or devices. This method for actually getting sampling data for latency, and potentially from real workloads even, would have been extremely helpful.