

Graphics Hardware IT

Visualizing System Latency 68

ChelleChelle writes "Latency has a direct impact on performance — thus, in order to identify performance issues it is absolutely essential to understand latency. With the introduction of DTrace it is now possible to measure latency at arbitrary points; the problem, however, is how to visually present this data in an effective manner. Toward this end, heat maps can be a powerful tool. When I/O latency is presented as a visual heat map, some intriguing and beautiful patterns can emerge. These patterns provide insight into how a system is actually performing and what kinds of latency end-user applications experience."
This discussion has been archived. No new comments can be posted.


  • mapping latency in a system using colored maps representing throughput has been a tool of db and network sysadmins for many many MANY years.
    • Re: (Score:1, Informative)

      by Anonymous Coward

      How bout you RTFA before you make you're smartass comments, since yours is almost a direct fucking quote from it. However, this isn't about measuring network latency, it's about disk latency, something that until recently was extraordinarily hard to measure.

      • by Jorl17 ( 1716772 )
        You're = your in this case. Other than that, nice comment.
      • something that until recently was extraordinarily hard to measure.


        Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
        hda 0.00 0.00 114.85 0.00 0.45 0.00 8.00 0.73 6.28 6.34 72.87

        Where await and svctm are average wait (milliseconds) for the disk & queue and service time for the disk.

        Or do you mean something else?


        • Re: (Score:3, Insightful)

          by forkazoo ( 138186 )


          Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
          hda 0.00 0.00 114.85 0.00 0.45 0.00 8.00 0.73 6.28 6.34 72.87

          Where await and svctm are average wait (milliseconds) for the disk & queue and service time for the disk.

          Or do you mean something else?

          The data presented in the article are actually quite a bit more subtle and interesting than the summary data you've got there. It would probably be impossible to notice the effects of the "icy lake" phenomenon they describe with av

          • by Xeleema ( 453073 )

            the summary data you've got there

            Funny. I recall the command syntax for that one lets you set up intervals per second. That would be the "black foot" that gets you out of the "icy lake" phenomenon you describe.
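The disagreement above is about whether averaged figures such as await can show the latency structure the article describes. A minimal sketch of the underlying arithmetic follows. The sample values and the `summarize` helper are hypothetical and do not come from iostat or from the article. It only illustrates that an average over an interval, however short, collapses two distinct latency levels into one number, while a bucketed count keeps them apart.

```python
from statistics import mean

def summarize(latencies_ms, bucket_ms=2.0):
    """Return (average, histogram) for a list of latencies in milliseconds.

    The histogram maps the lower edge of each bucket to a count.
    """
    hist = {}
    for lat in latencies_ms:
        bucket = int(lat // bucket_ms) * bucket_ms
        hist[bucket] = hist.get(bucket, 0) + 1
    return mean(latencies_ms), dict(sorted(hist.items()))

# Hypothetical interval: half the I/Os are fast, half are slow.
fast = [1.0] * 50
slow = [11.0] * 50
avg, hist = summarize(fast + slow)

print(avg)   # a single value that matches neither group
print(hist)  # two separate buckets, one per latency level
```

Shortening the reporting interval gives more averages, but each one still hides how the I/Os within that interval were distributed.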

  • I guess shouting at systems to make them start working has the opposite effect. Who knew a server was so emotional.
  • by PPalmgren ( 1009823 ) on Wednesday June 02, 2010 @05:08PM (#32437332)

    Informative article, all on one page, not chock full of ads. Now excuse me while I stock my bunker.

    • Re: (Score:3, Informative)

      by aicrules ( 819392 )
      All truly informative articles follow this paradigm. You only need the multi-page, multi-ad to pay for content that very few people will read because it's not that informative or interesting.
    • I really like ACM Queue, which regularly prints articles for practitioners about things which both we and our more academic colleagues care about.

      I recommend it, and on rare occasions, contribute [slashdot.org].


  • by bzdang ( 819783 ) on Wednesday June 02, 2010 @05:37PM (#32437684)
    Back in the day, working at an instrumentation company as a mechanical guy, I stopped to watch the senior electronic design engineer who was doing something that looked interesting. He had an old persistence-type storage oscilloscope hooked up to the rack-mount computer for a new instrument system and was watching the scope display, which was producing some fascinating patterns. Knowing f'all about this stuff but intrigued, I asked him to explain what was happening. He explained (and I'll butcher the explanation with layman's terms) that he was using d/a converters on the high and low bytes of the program address? to drive the x and y axes of the scope, and watching to see where, in the software, the processor was spending much of its time. He pointed to a hot spot on the scope display and said that this was where he would concentrate on optimizing his code. Fwiw, I thought that was pretty cool.
    • by harrkev ( 623093 )

      This must have been a long time ago, back when you had easy access to the address lines.

      Now, that same job would be VERY difficult! Most data accesses occur to data in the cache, which is not brought out to pins outside of the processor. And when memory accesses do happen, they happen over dedicated DDR address lines, which are very high speed (hard to probe), and the address lines are used to access both rows and columns, so some external circuitry is needed in order to determine what the real address is

      • Re: (Score:1, Troll)

        by pipatron ( 966506 )
        OMG! Thanks for telling us this, I bet no one that knows what a computer program is knew this!
      • by tuomoks ( 246421 )

        Yep, probably very long time ago, not that it was easy even in mainframes but very useful, no overhead to measure, as you maybe know - a mainframe is happy when 101% busy - the measurement overhead is very often a bad thing! It was fun, really, but reading the results wasn't always easy - is anything? Later on 80's / 90's simulating, estimating, measuring, etc file / disk / network systems the heat maps created with our hardware people on channels, controllers, disks, caches, DMA, etc timings / sizes / rate
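The oscilloscope trick described at the top of this thread amounts to a two-dimensional histogram of program addresses. A rough software sketch of the same idea is below. It assumes 16-bit addresses, assumes the high byte drives X and the low byte drives Y (the comment does not say which was which), and uses made-up sample addresses. `pc_hotspots` is a hypothetical name.

```python
from collections import Counter

def pc_hotspots(pc_samples):
    """Bin 16-bit program-counter samples by (high byte, low byte).

    Each cell plays the role of one spot on the scope screen; the count
    stands in for how bright the persistence display would glow there.
    """
    grid = Counter()
    for pc in pc_samples:
        x = (pc >> 8) & 0xFF  # high byte -> X axis (assumed mapping)
        y = pc & 0xFF         # low byte  -> Y axis (assumed mapping)
        grid[(x, y)] += 1
    return grid

# Hypothetical samples: the processor spends most of its time near 0x1234.
samples = [0x1234] * 5 + [0x1235] * 3 + [0xA000]
grid = pc_hotspots(samples)
hottest, hits = grid.most_common(1)[0]

print(hottest, hits)  # the brightest spot and how often it was hit
```

On the scope the accumulation came for free from the phosphor. In software the samples would have to come from a profiler or similar source, which adds the measurement overhead mentioned above.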

  • After reading the article, this idea of a "heat map" or frequency distribution mapping (of sorts) can (sort of) be summed up in:

    A particular advantage of heat-map visualization is the ability to see outliers.

    I find this particularly interesting as this graphically now allows a way to "filter" the real outlier out from a sea of data. Also,

    Instead of a random distribution, latency is grouped together at various levels that rise and fall over time, producing lines in a pattern that became known as the icy lake. This was unexpected, especially considering the simplicity of the workload.

    And concluding the section on what they dub as the "icy lake"...

    To summarize what we know about the icy lake: lines come from single disks, and disk pairs cause increasing and decreasing latency. The actual reason for the latency difference over time that seeds this pattern has not been pinpointed; what causes the rate of increase/decrease to change (change in slope seen in figure 5) is also unknown; and, the higher latency line seen in the single-disk pool (figure 4) is also not yet understood. Visualizing latency in this way clearly poses more questions than it provides answers.

    Without actually seeing the data or knowing the specifics of latency, from a pure mathematical standpoint I wonder what would result if one treated the set of numbers (from each disk) as a r
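To make the quoted point about outliers concrete, here is a small sketch of how a latency heat map can be built: time goes on one axis, latency on the other, and each cell counts the I/Os that fall into it. The bucket sizes, the event list, and the function names are all hypothetical. This is not the tooling used in the article, only an illustration of the binning.

```python
from collections import Counter

def latency_heatmap(events, time_step_s=1.0, latency_step_ms=10.0):
    """Count events per (time column, latency row) cell.

    events is an iterable of (timestamp_seconds, latency_ms) pairs.
    """
    cells = Counter()
    for t, lat in events:
        col = int(t // time_step_s)
        row = int(lat // latency_step_ms)
        cells[(col, row)] += 1
    return cells

def render(cells):
    """Draw the map as text, highest latency row at the top."""
    cols = max(c for c, _ in cells) + 1
    rows = max(r for _, r in cells) + 1
    shades = " .:*#"  # blank = no I/O, '#' = busiest cell
    peak = max(cells.values())
    lines = []
    for r in range(rows - 1, -1, -1):
        line = ""
        for c in range(cols):
            n = cells.get((c, r), 0)
            idx = 0 if n == 0 else 1 + (n - 1) * (len(shades) - 2) // max(peak - 1, 1)
            line += shades[idx]
        lines.append(line)
    return "\n".join(lines)

# Hypothetical events: mostly fast I/O, plus one 95 ms outlier.
events = [(0.1, 2.0), (0.5, 3.0), (0.9, 4.0),
          (1.2, 2.5), (1.7, 3.5), (1.8, 95.0)]
cells = latency_heatmap(events)
print(render(cells))
```

The single slow I/O lands in its own cell far above the rest, so it stays visible even though it barely moves the average.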

  • ...it's a shame that instrumentation of things such as EMC's PowerPath is a little painful. I guess there will always be gaps where vendor meets vendor and closed source meets open source, but it remains rather complex to analyse what's happening in Solaris with PowerPath and some Storage Foundation stirred in for good measure. Impossible? No...but maybe we'd all benefit from a little more interoperability?

    It's a great article though - Brendan's DTrace authority is impressive.

  • easy. (Score:3, Funny)

    by jd2112 ( 1535857 ) on Wednesday June 02, 2010 @08:56PM (#32439634)
    Take, for example, AT&T Network performance:
    Current: Snail
    Expected, after customers leave in droves over data plan changes: Snail on meth (see yesterday's /. article)
    Expected, once AT&T upgrades equipment: Sloth on Valium
  • Heat? (Score:1, Funny)

    Heat kinda makes me slow too...
