Privacy Hardware Technology

High-Tech Microphone Picks Voices From a Crowd 221

Posted by Soulskill
from the watch-your-mouth dept.
JerryQ writes with news of an impressive audio detection system from a company called Squarehead that was demonstrated during a professional basketball game. According to Wired, "325 microphones sit in a carbon-fiber disk above the stadium, and a wide-angle camera looks down on the scene from the center of this disk. All the operator has to do is pinpoint a spot on the court or field using the screen, and the Audioscope works out how far that spot is from each of the mics, corrects for delay and then synchronizes the audio from all 315 of them. The result is a microphone that can pick out the pop of a bubblegum bubble in the middle of a basketball game..."
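A rough sketch of the delay-and-sum idea the Wired quote describes: work out each microphone's distance to the chosen spot, turn the differences into sample delays, shift each channel accordingly, and average. This is not Squarehead's implementation. The function names, the 343 m/s speed of sound, the whole-sample rounding, and the toy geometry are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def steering_delays(mic_positions, target, sample_rate):
    """Extra travel time to each mic, in whole samples, relative to the nearest mic."""
    distances = [math.dist(mic, target) for mic in mic_positions]
    nearest = min(distances)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay so sound from the target lines up, then average."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]


# Toy setup (hypothetical): three mics 10 m up, one click from a spot on the floor.
mics = [(-1.0, 0.0, 10.0), (0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
spot = (3.0, 0.0, 0.0)
delays = steering_delays(mics, spot, sample_rate=48_000)

# Simulate the click reaching each mic at sample 100 plus that mic's delay.
channels = [[0.0] * 300 for _ in mics]
for ch, d in zip(channels, delays):
    ch[100 + d] = 1.0

focused = delay_and_sum(channels, delays)
print(focused.index(max(focused)))  # the click lines up at a single sample index
```

Sound from anywhere other than the chosen spot reaches the microphones with different relative delays, so after the shift it does not line up and tends to average down rather than add.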
This discussion has been archived. No new comments can be posted.

  • by cencithomas (721581) on Monday October 11, 2010 @03:15PM (#33862032)
    ...is it 315 or 325? Sheesh.
    • by GungaDan (195739) on Monday October 11, 2010 @03:17PM (#33862066) Homepage

      10 microphones were harmed during the posting of this story.

    • ...is it 315 or 325? Sheesh.

      Fancy slashdot web2.0 math tells us there is no difference between those numbers.

    • "can pick out the pop of a bubblegum bubble in the middle of a basketball game"

      whatever that means. I think it means the author is more interested in sounding clever than making sense. Don't you just hate that?

      Do the players really chew while playing? And why would anyone want to hear it?

      • Watch the video in TFA.
      • Re: (Score:3, Insightful)

        by Black Cardinal (19996)
        Did you actually read the article and watch the example video? This was an example shown in the video, where bubblegum being popped by someone sitting next to the coach (who was being focused upon by the system) was clearly audible above the crowd noise during a heated moment. It wasn't so much desirable as a concrete example of its effectiveness.
      • "can pick out the pop of a bubblegum bubble in the middle of a basketball game"
        whatever that means.

        If you'd STFVideo in the article you'd know what it means. You should, the video illustrates the filtering effect well. Assuming that it's not been "improved" in any way it's really quite impressive.

        • by Dthief (1700318)
          Of course it's been *improved*, that's the point...isolating sounds that are normally inaudible because of too many other sounds around them.
          • Of course it's been *improved*, that's the point...isolating sounds that are normally inaudible because of too many other sounds around them.

            He means "improved" in post-production. How do you not get that?

    • by Anonymous Coward on Monday October 11, 2010 @04:07PM (#33862608)

      The quick breakdown of responses on Slashdot:

      The last remaining nerds on Slashdot who actually like technology: "Sweet! That's an impressive display of audio recording techniques!"

      The paranoia crowd: "ZOOOOOOOMG that means THEY(tm) can listen in on you! Then they're already stealing your identity to impersonate you! MY PRIVACY IS AT RISK OHNOEZ START REBELLION NOW PLZ KTHX"

      The audiophiles: "Pfft. Everyone knows you need at least 560 microphones and analog pickups, else you'll clearly lose so much quality as to be unlistenable by any but the most primitive and underdeveloped of eardrums. Plebs."

      cencithomas: "WHERE DID THE TEN MICROPHONES GO?!?"

      • Re: (Score:3, Funny)

        by RenHoek (101570)

        Monster Cable spambot: "You know you need gold plated cables for it to work, right? I've gotta link here somewhere with some good ones.."

      • Meme-mongers: Imagine a Beowulf cluster of that array!

        Meta-commentators: (Present company excluded, well not really) Timothy!

        MAFIAACS: Oh great, they just copyrighted my gum-popping sounds.

        Insightful curmudgeons: Given sufficient sensitivity, this could be done with a tetrahedral array--50 years. Now, get off my lawn!
        • by ScrewMaster (602015) * on Monday October 11, 2010 @05:22PM (#33863312)

          Meme-mongers: Imagine a Beowulf cluster of that array! Meta-commentators: (Present company excluded, well not really) Timothy! MAFIAACS: Oh great, they just copyrighted my gum-popping sounds. Insightful curmudgeons: Given sufficient sensitivity, this could be done with a tetrahedral array--50 years. Now, get off my lawn!

          Yeah, but does it run Linux?

  • I suppose this could be used to record an entire game and then go back and track what each player was saying during the game based on their positions on the court. I'd be interested to see if this could be used in a football stadium (domed or not) with all the extra noise and people.
    • I'd be interested to see if this could be used in a football stadium (domed or not) with all the extra noise and people.

      Because of course professional basketball games are so dull, sparsely attended, and quiet that it makes a perfect test bed...

    • "I'd be interested to see if this could be used in a football stadium (domed or not)"

      You'd design the spacing of the microphone and the logic to match it based on the size and shape of the area. This array is not what you'd use. But you could certainly make an array or set of arrays which would work for a football field.

    • I suppose this could be used to record an entire game and then go back and track what each player was saying during the game based on their positions on the court.

      It can. FTFA:

      Audio from all microphones is stored in separate channels, so you can even go back and listen in on any sounds later.

      I don't know how they record and store 325 (or 315, 345 - whatever) channels of audio, but their equipment can process stored audio as well as the live feed.

    • Re: (Score:3, Informative)

      by arivanov (12034)

      This is the classic phased-array antenna approach from radar tech applied to sound. Cool application though.

      In fact it is easier for sound because the amount of data per element is much smaller than in, say, a radar.

    • by HTH NE1 (675604)

      How about the trading floor of a stock exchange?

      Welcome to the Panaudion: everything you say can and will be used against you.

    • I'd be interested to see if this could be used in a football stadium (domed or not) with all the extra noise and people.

      The sound in a basketball stadium can be just as loud as in a football stadium.

      Falcon

  • It occurs to me that if you store all 325 audio streams with accurate time-codes and the relative positions of the microphones you would be able to do this at any time later on the stored sound as well. You could probably get away with much fewer than 325 microphones at some cost in quality.

    • by Sarten-X (1102295) on Monday October 11, 2010 @03:21PM (#33862116) Homepage
      In fact, that's exactly what TFA says.
    • by Animats (122034) on Monday October 11, 2010 @03:26PM (#33862190) Homepage

      It occurs to me that if you store all 325 audio streams with accurate time-codes and the relative positions of the microphones you would be able to do this at any time later on the stored sound as well. You could probably get away with much fewer than 325 microphones at some cost in quality.

      Yes. And that's already part of the system.

    • I wonder if you could do this with mobile phones ... do these provide low level access to GPS signals? (For timing and differential location correction.) You might be able to crowd source a distributed recording for reconstruction.

      • by vlm (69642)

        I wonder if you could do this with mobile phones ... do these provide low level access to GPS signals? (For timing and differential location correction.) You might be able to crowd source a distributed recording for reconstruction.

        Unless you're using 1980s era analog phones, the voice compression is going to destroy the phase relationships you need, and mask out the low level signals that you'd be adding up.

        Also the mics are usually vaguely noise canceling; otherwise, think of those dorks who have cellphone conversations in the bathroom at work: the folks on the other side would hear all kinds of flushing and ... stuff. Or maybe they do hear it but just don't care? Always wondered about that.

        • I wouldn't suggest phoning it in ... it would be recorded together with the GPS data and sent over IP with only lossless coding.

      • by AndrewNeo (979708)

        GPS likely isn't accurate enough to handle the delay calculations, the accuracy being anywhere around 5-15 meters.

        • Depending on how low level the access to the GPS data is you will be able to get much better differential accuracy (especially with some temporal averaging).

          • by vlm (69642)

            Depending on how low level the access to the GPS data is you will be able to get much better differential accuracy (especially with some temporal averaging).

            So a wavelength of sound in air around 3 kHz is about five inches (rounded up). To get a couple decimal points of phase accuracy, you're going to need a similar couple decimal points relative to 5 inches. So at each data sample you need the coordinates accurate to a "carpenter's level of accuracy". Not as harsh as a machinist's level of accuracy but still pretty tough to achieve.

            Also you need to sync your times. I'm thinking you'll need much better than 1/3000th of a second accuracy for your sample timest

            • Once you have a ballpark differential delay you might be able to just use autocorrelation to find the needed delay for the dominant sound source in a relatively small timeframe/volume (the positional uncertainty at the microphone when translated to the source, assuming for a moment we know the exact differential delay, will actually become smaller AFAICS).
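The delay search suggested above can be sketched as a brute-force correlation between two channels. The comment says autocorrelation; comparing two different recordings is more usually called cross-correlation, and that is what this does. It is a minimal sketch, assuming whole-sample lags and a single dominant source. `best_lag` and the toy signals are made up for illustration.

```python
def best_lag(ref, other, max_lag):
    """Return the lag, in samples, at which `other` best matches `ref`."""

    def score(lag):
        # Sum of products over the overlapping part of the two signals.
        return sum(
            ref[i] * other[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(other)
        )

    return max(range(-max_lag, max_lag + 1), key=score)


# Toy signals (hypothetical): the same short pulse, arriving 7 samples later
# at the second microphone.
pulse = [1.0, 2.0, 3.0, 2.0, 1.0]
ref = [0.0] * 50
other = [0.0] * 50
ref[10:15] = pulse
other[17:22] = pulse

lag = best_lag(ref, other, max_lag=20)
print(lag)
```

A ballpark delay from GPS or known geometry would only be used to keep `max_lag` small. With real audio the peak is much less clean than in this toy case, so the result should be treated as an estimate.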

    • by internewt (640704) on Monday October 11, 2010 @03:39PM (#33862318) Journal

      This system might also be hackable, such that people can preserve their privacy and not be listened in on from hundreds of feet away.

      You simply have a microphone near your mouth, sample it, and repeat the sound out of a speaker with slight echoes with randomised delays. There must be something that could interfere with the process they use to "zoom in" on a particular sound source. Maybe if you can measure the distance to the listening device, it would be possible to manipulate the frequency of sounds you are making so as to create a standing wave or something that would cause the microphones to be overloaded or to hear nothing..... shit, maybe the tech that drives noise cancelling headphones could be used here? Who you are speaking to gets an earpiece with unedited sound piped to them, and speakers on your lapels kick out anti-sound so eavesdroppers hear nothing.

      So now in public, you just need to have strings of randomised flashing IR LEDs illuminating your face, so CCTV has a hard time capturing your image, and now something to mess with your voice so that The Man cannot listen in too! If you are thinking "paranoid fucker", I am thinking what the fuck business is it of people to listen in on me? And that's a rhetorical question: I don't need to be told to think of the children, etc..

      • by dgatwood (11270)

        No, you just need to calculate the distance between you and someone on the opposite side of the stadium, then put a speaker over there with a phase inverter and a fixed digital delay to make the times match. Won't be perfect, but should be good enough since neither your voice nor the inverted copy will carry to the opposite side of the stadium.

      • by vlm (69642)

        Maybe if you can measure the distance to the listening device, it would be possible to manipulate the frequency of sounds you are making so as to create a standing wave or something that would cause the microphones to be overloaded or to hear nothing

        Well, almost obviously, if you knew your location, and the exact location of each mic in the array, you could figure out the distance to each mic. Assuming constant speed of sound you know the time to each mic. So you make 200 or so clicks or pops each timed to saturate all the mics simultaneously. Then it doesn't matter where they're listening, you'll overload them. Works better if you have, say, thousands of click generators. Would probably make the venue sound like a field of crickets unless you hap

      • ...and now something to mess with your voice so that The Man cannot listen in too!

        Hate to break it to you man, but they've probably done that to you for a long time already. Parabolic mikes are getting better, if not very inconspicuous :)

        If you see someone carrying a 1m parabolic reflector aimed at you from a distance of 50m, better hope you didn't give them too much info already.

      • by Dthief (1700318)
        Well since you give up all rights by entering a stadium (TV peoples can use your image, for example, in all of their broadcasts) this is being used in an arena (pun intended) where there is no expectation of privacy.

        I do agree it's worrisome once this gets used in public places.

  • FTFY (Score:3, Insightful)

    by sheriff_p (138609) on Monday October 11, 2010 @03:18PM (#33862084)

    Surely that would be better written as "terrifying" rather than "impressive"

    • Re:FTFY (Score:5, Insightful)

      by amicusNYCL (1538833) on Monday October 11, 2010 @03:36PM (#33862288)

      How come you get terrified by an array of microphones with an impressive spatial detection capability? The thing is technically impressive, whether or not it "terrifies" a certain person is about perspective, and that person's tendency towards becoming terrified by mundane objects.

      • Re:FTFY (Score:4, Insightful)

        by wowbagger (69688) on Monday October 11, 2010 @03:45PM (#33862376) Homepage Journal

        "The thing is technically impressive, whether or not it "terrifies" a certain person is about perspective, and that person's tendency towards becoming terrified by mundane objects."

        It is not the object that is terrifying, but rather what you get from the existence of the object, plus the current trends in behavior by our Fearless(fearful) Leaders, plus a modicum of ability to put 2 and 2 together: these devices being everywhere, able to monitor all conversations in the world.

        • Re: (Score:3, Insightful)

          by amicusNYCL (1538833)

          able to monitor all conversations in the world.

          C'mon. The main reason this works so well in a basketball stadium is because everyone is sitting in their seats. When people are moving around it's going to take significantly more work to capture a single conversation, especially if you don't know their direction and speed. It's also only going to pick anything up past a certain volume level, and it's also limited by line of sight (or sound). If the person walks behind something, or turns their head away from the mic array, they lose the audio.

          plus the current trends in behavior by our Fearless(fearful) Leaders

          Seems to

          • by Sabriel (134364)

            Seems to me that all of those technical problems can be solved with a sufficient density of mic arrays and accompanying compute power. And given the rate of technological advancement, I doubt such would be all that high a bar, if it is even now.

      • > How come you get terrified by an array of microphones with an impressive spatial detection capability? The thing is technically impressive, whether or not it "terrifies" a certain person is about perspective, and that person's tendency towards becoming terrified by mundane objects.

        Well, why would anyone alive during the cold war get terrified about thousands of nukes that would effectively destroy the world? The thing is technically impressive; whether or not it "terrifies" a certain person is about

        • Sorry, did you just compare an array of microphones in a basketball arena to the combined nuclear stockpiles of the world's two most powerful countries, capable of destroying the world?

      • "obtusery" is not a word.
      • Re: (Score:3, Insightful)

        by syousef (465911)

        How come you get terrified by an array of microphones with an impressive spatial detection capability? The thing is technically impressive, whether or not it "terrifies" a certain person is about perspective, and that person's tendency towards becoming terrified by mundane objects.

        Pffftt! I had teachers with ears that could do this when I was in primary school in the 80s! Every time I talked trash about them I'd end up in detention! ;-)

    • by hoggoth (414195)

      So now I will invent a little gadget that garbles your speech and a matching headphone that ungarbles it. Just sync your 'keys' before you head out and only the paired devices will be able to understand each other. I will sell the idea to Halliburton and they will implement it using ROT-13 encoding.

      I'm not sure if it should look like the 'Cone Of Silence' from Get Smart, or like the speaking device used by Guild Navigators in Dune.

      • by dhall (1252)

        How about a simpler solution.

        A piece of paper and a pencil.

        Heck, I'm sure there's an app for that...

        • by nschubach (922175)

          Well of course, there's always video vector recognition. They will be able to tell what direction your pencil travels on page and decipher what you are writing. The same with typing on a phone keyboard. The only way to truly get around that would be a keyboard that has moving keys!

    • The two aren't mutually exclusive. Look at sharks: terrifying and impressive. And that's not all, frickin Lasers on frickin sharks: terrifying, impressive, and funny, all at the same time.

      That said, I don't really see terrifying. I assume anything I can say out loud in public can be heard by someone. If I were often in the position of having to go to basketball games to discuss things so that the government bugs can't overhear me this might be terrifying to me, but I'm not so it's not. And it's not the

    • Not really, it's hard to imagine anyone thinking of a conversation in a stadium, or indeed any crowd, as private (and if you do, your stupidity is what's really terrifying).

      The only time you have any reasonable expectation of privacy in conversation is if you're alone with someone, and the technology to listen in on that has been around for decades. If you think of this particular innovation as terrifying, it's probably because you're a social outcast and are afraid of having your face show up on TV at a
  • Now they can enforce the no cell phone usage while driving! Even those pesky key presses won't be able to hide!
  • Nothing to hide (Score:2, Interesting)

    by Anonymous Coward

    Wow... why limit it to just stadiums? You could have arrays of these things lining every street and every mall! Just imagine how many terrorists you could catch by processing all the millions/billions of conversations going on in public places. All that data would be handy for collecting evidence against criminals too, you just go back through your chatlogs (all indexed per-person with voice/facial recognition) and dig up every conversation they've ever had outside.

    • by vlm (69642)

      Wow... why limit it to just stadiums?

      Gunshot detection works pretty well. Been unclassified for about 20 years. Been installed in Wash DC for a couple years. The problem is even big brother can't handle merely the volume of gunshots, making these systems thoroughly useless. If there's a cop who is too close, in other words under fire, it doesn't tell them anything they don't know. If there is a cop close enough to make an arrest, they are already close enough to hear and need to be looking outside for the shooter, not in the car at a display.

      • by nschubach (922175)

        I think OP was referring to having an array in a mall, along with the video from surveillance. When a criminal walks into a store and robs the place, the cops could then get a recorded audio history of that person from the moment they walk in the door.

        I'm not sure what you plan on capturing as far as evidence (accomplices?) but I'd bet storing 300 streams of audio for analysis later might be better than extracting that audio and tying it to an individual record.

  • Re: (Score:3, Funny)

    by scruffy-tech (1821510) on Monday October 11, 2010 @03:28PM (#33862214)
    I read the article. It went from 325 to 315 to 300. They may have gotten it down to a single mic had they kept writing.
  • by DoofusOfDeath (636671) on Monday October 11, 2010 @03:33PM (#33862262)

    This sounds like beamforming [wikipedia.org]. Submarines do this. Works great.

    • Re: (Score:3, Interesting)

      by Obfuscant (592200)
      This sounds like beamforming. Submarines do this. Works great.

      So THAT'S what that large, grey cylindrical object hanging over the heads of the crowd at the last professional basketball game I went to was. I always wondered...

      I wonder if they heard me saying "I wonder what that large grey cylindrical object hanging over our heads is", or maybe "I hope those ropes don't break."

    • by john83 (923470)
      Beamforming is only possible where you have as many microphones as sources. This is more probably some sort of blind source separation [wikipedia.org] algorithm - calculating the pseudo-inverse of a mixing matrix based on assumptions about speech.
      • Beamforming is only possible where you have as many microphones as sources.

        As many mikes as actual sources, or as many mikes as there are sources of interest?

        I think when you have very many sources of non-interest, approaches like this rely on the uninteresting sources partially cancelling each other out into background noise.
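One way to put a number on the "cancelling each other out" intuition above: if, after alignment, the unwanted sound at each microphone behaves like independent noise (an idealization, and an assumption here rather than anything the article states), averaging N channels leaves the target unchanged and cuts the noise power by a factor of N. The helper names below are hypothetical.

```python
import math
import random


def array_gain_db(n_mics):
    """Ideal SNR improvement from averaging n_mics channels of uncorrelated noise."""
    return 10.0 * math.log10(n_mics)


def averaged_noise_power(n_mics, n_samples, seed=0):
    """Empirical power of unit-variance Gaussian noise after averaging n_mics channels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        avg = sum(rng.gauss(0.0, 1.0) for _ in range(n_mics)) / n_mics
        total += avg * avg
    return total / n_samples


print(round(array_gain_db(325), 2))   # ideal gain for 325 mics, in dB
print(round(array_gain_db(315), 2))   # ideal gain for 315 mics, in dB
print(averaged_noise_power(25, 4000))  # should land near 1/25
```

Under this idealization 325 microphones give roughly 25 dB, and the 315-versus-325 question is worth about 0.14 dB. Real crowd noise is partly correlated between microphones, so actual gains would be lower.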

  • Just make a disc fifty feet wide, using optimal Golomb ruler [wikipedia.org] placed microphones in a full hemispherical phased array [wikipedia.org] of around 10,000 microphones, hang it from a tethered helium balloon, and now you can pick out any conversation in an entire city-sized area.

    Nope, nothing to be afraid of here...
    • I suspect Brownian noise starts becoming a bit bothersome at those distances.

      • by dgatwood (11270)

        Not to mention that the signal levels at the required altitude would be way below the white noise floor for pretty much everything from the microphones to the amplifier stages.

        Say, wasn't there a South Park... no, never mind....
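The point about signal levels at altitude can be checked with back-of-the-envelope spreading loss. This sketch assumes free-field spherical spreading (6 dB lost per doubling of distance) and ignores atmospheric absorption, wind, and any array gain. The 60 dB SPL figure for conversational speech at 1 m is a ballpark assumption, not a number from the thread.

```python
import math


def level_at_distance(level_db_ref, ref_distance_m, distance_m):
    """Sound level after spherical spreading from ref_distance_m out to distance_m."""
    return level_db_ref - 20.0 * math.log10(distance_m / ref_distance_m)


# Assumed: conversational speech at about 60 dB SPL measured 1 m from the talker.
for distance_m in (10.0, 100.0, 1000.0):
    print(distance_m, round(level_at_distance(60.0, 1.0, distance_m), 1))
```

With these assumptions, speech heard from a kilometre away is already down around the nominal threshold of hearing before any microphone or amplifier noise is counted.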

  • The classic Coppola movie, The Conversation [imdb.com].

  • by bytestorm (1296659) on Monday October 11, 2010 @03:49PM (#33862420)
    This is a cool application of a well used technique. http://en.wikipedia.org/wiki/Phased_array [wikipedia.org]
  • Prior Art (Score:4, Informative)

    by Anonymous Coward on Monday October 11, 2010 @03:51PM (#33862454)

    My father would tell me stories when I was growing up about helping design a surveillance tool for eavesdropping on restaurant conversations that used the same principle. They had a map of the table layouts and you would place a pointer over the table you wanted to listen to. Mics hidden around the edge of the restaurant would capture the sound. This was back during the early '60s, so they used a mechanical delay mechanism. He said it worked as well as if you had planted the mic at the table, plus you didn't have to worry about where they sat. Like many things, this is more powerful and versatile but hardly anything new.

  • The result is a microphone that can pick out the pop of a bubblegum bubble in the middle of a basketball game...

    ...is if that person brought enough gum for everyone.

  • ... add these recorded, decoded, demultiplexed sounds to the street view. Would be cool. Or Evil.
  • Coming soon... (Score:4, Insightful)

    by frank_adrian314159 (469671) on Monday October 11, 2010 @04:02PM (#33862566) Homepage

    ... to a political rally near you. You probably don't need particularly accurate microphone placement and, in fact, if you had precise position and velocity coordinates of each of the mikes at any given time, they could even be moving.

    • by choongiri (840652)
      On the other hand, I do wonder how difficult it would be to create this using a distributed network of cell phones, to literally crowd-source listening in on - say - what politicians are saying to each other apparently out of ear-shot of the crowd. I'd think the challenge would be sufficiently precise location awareness.
    • Political rally? Bah. I expect them to be attached to quadrocopters, hovering around every large city after the next terrorist attack. It will be like Half Life 2. Pick up the can.

    • by pclminion (145572)

      ... to a political rally near you. You probably don't need particularly accurate microphone placement and, in fact, if you had precise position and velocity coordinates of each of the mikes at any given time, they could even be moving.

      Provided you know that such a system is in operation, I'm sure there are some rather simple countermeasures that can be taken. The system's abilities seem frightening, but how well does it perform when it's being deliberately attacked? I'm going to hold off worrying about th

  • a 4-mic tetrahedral array [wikipedia.org] can do the same thing.
  • Turbo super cool (Score:4, Insightful)

    by Swarley (1795754) on Monday October 11, 2010 @04:06PM (#33862604)

    Just in case anybody is confused, that is cool as shit. That's all.

  • ..description. But it sounds like it is in their business plan.
  • So I guess your local terrorist cells won't be having their strategy meetings under the cover of all that stadium noise any more. Of course it also means that Randy Dandy might wanna think twice before sitting in the stands with his best bud and confessing that he's been cheating on his wife.

    • by Culture20 (968837)

      So I guess your local terrorist cells won't be having their strategy meetings under the cover of all that stadium noise any more. Of course it also means that Randy Dandy might wanna think twice before sitting in the stands with his best bud and confessing that he's been cheating on his wife.

      A several-hour long event, with thousands of people... I doubt they're going to have humans listening to every conversation (that would be a lot of paid man-hours). So they'll be using it for after-the-fact evidence of a crime (just like most security cameras), and _maybe_ they'll filter the realtime audio through a few computers to listen for some keywords.

      • by macraig (621737)

        Yeah, but who will be able to "rent" the use of this thing once installed? Only the government, or anyone with enough cash to make them salivate? If the latter, then one can imagine a scenario where some private investigator with a wealthy client buys the use of the thing for some period hoping to catch his target saying something naughty.

  • This could be a boon for speech recognition systems, especially for use in areas with lots of environmental noise, or even just a little.

    Maybe even the effort in clearing out the environmental noise will lead to the ability to clean out the "noise" (accents, minor physical fluctuations) from a person's speech, perhaps to such a point that the complexity of the software speech recognition problem is reduced.

  • Deployed at public gatherings, the super-mics could be zoomed in to eavesdrop on conversations between suspicious persons, or pretty much anyone the cops want to listen in on. Are you scared yet?

    Are you afraid yet? Better not say something listening politicians don't like.

    Falcon

  • by Dammital (220641) on Tuesday October 12, 2010 @10:08AM (#33869210)
    I do some community theater work as a hobby - amateur stuff - and wonder if something like this could be used to track multiple actors on stage? Might be better than fitting them all with transmitters and lavaliers. Targeting would become the next problem, I guess.

"Consistency requires you to be as ignorant today as you were a year ago." -- Bernard Berenson
