Slashdot is powered by your submissions, so send in your scoop


RockBox + Refurbished MP3 Players = Crowdsourced Audio Capture

An anonymous reader writes "Looking for an inexpensive means to capture audio from a dynamically moving crowd, I sampled many MP3 players' recording capabilities. Ultimately the best bang-for-the-buck was refurbished SanDisk Sansa Clip+ devices ($26/ea) loaded with (open source) RockBox firmware. The most massively multi-track event was a thorium conference in Chicago where many attendees wore a Clip+. Volunteers worked the room with cameras, and audio capture was decoupled from video capture. It looked like this. Despite having (higher quality) ZOOM H1n and wireless mics, I've continued to use the RockBox-ified Clip+ devices ... even if the H1n is running, the Clip+ serves as backup. There's no worry about interference or staying within wireless mic range. The devices have 4GB capacity, and RockBox allows WAV capture. They'll run at least 5 hours before the battery is depleted (with lots of storage left over). I would suggest sticking with 44kHz (mono) capture, as 48kHz is unreliable. To get an idea of their sound quality, here is a 10-person dinner conversation (about thorium molten salt nuclear reactors) in a very busy restaurant. I don't know how else I could have isolated everyone's dialog for so little money. (And I would NOT recommend Clip+ with factory firmware... they only support 22kHz and levels are too high for clipping on people's collars.)" This video incorporating much of that captured audio is worth watching for its content as well as the interesting repurposing.
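The storage claim in the summary is easy to sanity-check. The sketch assumes 16-bit PCM samples (the post doesn't state the bit depth) at 44.1 kHz mono, treats "4GB" as 4 × 10^9 bytes, and ignores the small WAV header; these are assumptions, not details from the post.

```python
# Rough storage budget for uncompressed PCM WAV recording.
# Assumptions (not from the post): 16-bit samples, "4GB" = 4 * 10**9 bytes,
# and the 44-byte WAV header is ignored.


def wav_bytes(seconds: float, sample_rate: int = 44_100,
              channels: int = 1, bits_per_sample: int = 16) -> int:
    """Approximate size in bytes of the PCM data in a WAV recording."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return int(seconds * bytes_per_second)


def hours_that_fit(capacity_bytes: int, **fmt) -> float:
    """Hours of audio that fit in capacity_bytes for the given format."""
    return capacity_bytes / wav_bytes(3600, **fmt)


if __name__ == "__main__":
    print(f"5 h, mono, 44.1 kHz, 16-bit: {wav_bytes(5 * 3600) / 1e9:.2f} GB")
    print(f"Hours that fit in 4 GB: {hours_that_fit(4 * 10**9):.1f}")
```

Under those assumptions five hours comes to about 1.59 GB, so a 4GB Clip+ would hold roughly 12.6 hours, which fits the "lots of storage left over" remark. Recording in stereo would double the figures.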

  • huh? (Score:5, Insightful)

    by swell ( 195815 ) on Monday October 01, 2012 @02:15AM (#41510869)

    Sorry, I have no idea what TFA is about. Please help.

    • tl;dr: Recipe for recording the audio of multiple individuals in a large crowd.


      SanDisk Sansa Clip+ MP3 Player
      Rockbox


      Install Rockbox (open source firmware for MP3 players) on the Sansa Clip+. Configure it to record from the Sansa Clip+ microphone in .wav format. Give a Sansa Clip+ to every person whose audio you want to record. Have every person start recording at roughly the same time, then leave them running (5 hours or so).

      Gather all the Sansa Clip+s at the end of the session and extract the .wav files. 10 participants = 10-track equivalent audio recording of the session.

      Mix and fade between the tracks to isolate the audio of single conversations between participants.

      He has basically created a relatively inexpensive and reliable way to get this audio, much like how using multiple GoPro cameras to record the action at sports events beats out using professional equipment (and in some ways has become professional equipment). He's arguing that the Sansa Clip+ together with the Rockbox open source firmware is a better solution than using professional radio mics and then having recording equipment receive those signals and store them on disk for editing later.

      I've no idea how "crowdsourced" fits into this though, nor how this is anything more than an advert even though the solution is a little interesting. It's useful enough, and potentially cheap enough, that you might imagine giving everyone at a TED conference one of these, as the conversations caught off-record might be even more valuable than the sessions.
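The "mix and fade between the tracks" step can be sketched in a few lines. This assumes the tracks are already aligned and loaded as lists of floats in the range -1.0 to 1.0; `mix` and `crossfade` are hypothetical helper names for illustration, not functions from any editing tool.

```python
from typing import List, Sequence


def mix(tracks: Sequence[Sequence[float]], gains: Sequence[float]) -> List[float]:
    """Weighted sum of already-aligned tracks, clipped to [-1.0, 1.0]."""
    if len(tracks) != len(gains):
        raise ValueError("need exactly one gain per track")
    length = min(len(t) for t in tracks)
    out = []
    for i in range(length):
        s = sum(g * t[i] for t, g in zip(tracks, gains))
        out.append(max(-1.0, min(1.0, s)))  # hard clip to keep the sum in range
    return out


def crossfade(a: Sequence[float], b: Sequence[float],
              start: int, fade_len: int) -> List[float]:
    """Play track a, then fade linearly to track b over fade_len samples from start."""
    length = min(len(a), len(b))
    out = []
    for i in range(length):
        if i < start:
            w = 0.0                      # all a
        elif i >= start + fade_len:
            w = 1.0                      # all b
        else:
            w = (i - start) / fade_len   # ramp from a to b
        out.append((1.0 - w) * a[i] + w * b[i])
    return out
```

A linear fade is the simplest choice; equal-power fades are a common alternative when switching between two different voices, but either works for a sketch.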

      • by Lumpy ( 12016 )

        Nobody uses GoPro cameras. They use the Go Pro Hero HD2 pro version of the camera. Huge difference between the low end crappy GoPro and the Professional model in video quality.

        Although audio still sucks horribly on all Go Pro cameras.

      • I don't think the article meant that the approach to audio/video capture they took was "better" than using professional body-pack mics and professional recording gear. I think the point was how such a setup could be accomplished when funds aren't available for the professional gear...

        After having watched a bit of the video they linked, I'd say it did rather well.

      • I've no idea how "crowdsourced" fits into this though, nor how this is anything more than an advert even though the solution is a little interesting.

        RockBox is an open source firmware replacement for the Sansas. Also, he's (sort of) getting his audio from crowd members, instead of a room mic.

      • The real tl;dr, in line with the anniversary mo(o)d... ;-)

        Great idea BTW. Now just think of the kind of footage (including audio) we'll get when everyone is wearing/wielding their Google Glasses (or Sights for that matter ;-)) in just a few years (actually, everyone minus the millions who'll get jailed for accidentally looking at or listening to anything copyrighted for more than 30 milliseconds while on).
  • Lots of work? (Score:4, Interesting)

    by mpoulton ( 689851 ) on Monday October 01, 2012 @02:20AM (#41510887)
    Maybe I'm misunderstanding the process here, but this seems like it would create a HUGE amount of editing work. Are you manually switching which recorder's audio is used as different people speak? In other words, editing the video using as many simultaneous audio tracks as there are recorders, syncing them, and using the best one at any given instant during the video? That seems like it would add huge amounts of editing time.
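One way to automate the "best track at any given instant" choice is to pick, for each short window, whichever recorder is loudest. This is a hypothetical heuristic for illustration, not what the submitter describes doing; it assumes aligned tracks as lists of floats, and the window length is an arbitrary parameter.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest_track_per_window(tracks: Sequence[Sequence[float]], window: int) -> List[int]:
    """For each window of `window` samples, the index of the track with the highest RMS."""
    length = min(len(t) for t in tracks)
    choices = []
    for start in range(0, length, window):
        levels = [rms(t[start:start + window]) for t in tracks]
        choices.append(max(range(len(tracks)), key=levels.__getitem__))
    return choices


def switch_mix(tracks: Sequence[Sequence[float]], window: int) -> List[float]:
    """One output track built by copying each window from its loudest source."""
    length = min(len(t) for t in tracks)
    out: List[float] = []
    for w, idx in enumerate(loudest_track_per_window(tracks, window)):
        start = w * window
        out.extend(tracks[idx][start:min(start + window, length)])
    return out
```

Hard cuts at window boundaries can click, so in practice you would crossfade at each switch, and a loud neighbour or a bump against a collar mic can easily fool a pure RMS rule.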
    • Re:Lots of work? (Score:5, Insightful)

      by Anonymous Coward on Monday October 01, 2012 @02:31AM (#41510913)

      That depends. There are some applications out there that can align audio automatically (PluralEyes, for example), so then all you would need to do is name each track after the person it relates to and alter the levels as needed. All video creation requires a "huge" amount of editing work.
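This is not PluralEyes' actual method, just the textbook technique behind automatic alignment: slide one track against a reference and keep the offset where they agree best (cross-correlation). A minimal sketch, assuming both tracks share a sample rate, overlap in content, and are lists of floats.

```python
from typing import Sequence


def best_lag(reference: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) at which `other` best matches `reference`.

    A positive result means a sound appears `lag` samples later in `other`
    than in `reference`, so dropping the first `lag` samples of `other` lines
    the tracks up. This brute-force search is O(len(reference) * max_lag);
    for hours of audio an FFT-based correlation would be the practical choice.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best
```

Once the lag is known for each track, trimming or padding the start of each file puts everything on a common timeline.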

      • Re:Lots of work? (Score:5, Insightful)

        by Anonymous Coward on Monday October 01, 2012 @02:49AM (#41510975)

        All video creation requires a "huge" amount of editing work.

        Exactly. Having dedicated audio sources for all speakers is great to have, and some increased editing time is worth it if your product is going to be higher quality.

        It sucks to have to struggle to hear what's going on in a video and live events can be terribly chaotic. Having well planned audio capture is critical to reducing your stress. This is a clever use of cheap tech, and I may have to give it a shot with my old 2GB Clip floating around in my tech bins. If only there was a way to pipe a proper lav into it...

      • Re:Lots of work? (Score:4, Insightful)

        by bertok ( 226922 ) on Monday October 01, 2012 @03:42AM (#41511145)

        Just altering the levels provides a lot of isolation (as seen in the video clips), but I have to wonder if there's an audio equivalent of "image stacking" or Photosynth, that would correlate all of the audio streams, build a "model" of the audio-scape, and allow noise to be cancelled out. Or more accurately, allow a voice to be extracted with a higher specificity than just 100% of one source.

        I'm sensing that we're on the cusp of affordable setups where instead of just a few microphones, rooms could be set up with hundreds of microphones recording in parallel, with analysis done to track and extract individual sound sources moving in 3D. I suspect that a modern GPU already has the computing power, or will soon. This would allow individual speakers to be isolated even if they weren't set up with little clip-on recorders ahead of time.

        • Re: (Score:3, Informative)

          by Anonymous Coward
          I hate to post links to commercial products in a technical discussion, but 3D capture of sounds (as in "you can focus in real-time at any point of a room and listen to whatever happens there") already exists:


          See also "microphone arrays" on Google. There's been plenty of research in the past decades, with more to come.
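The classic technique behind "focusing" a microphone array on a point is delay-and-sum beamforming. This is a generic illustration, not how the linked product works; it assumes known microphone positions in metres, a speed of sound of 343 m/s (air at about 20 °C), and delays rounded to whole samples.

```python
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, roughly air at 20 °C (assumption)

Point = Tuple[float, float, float]


def steering_delays(mics: Sequence[Point], focus: Point, sample_rate: int) -> List[int]:
    """Whole-sample delays that line up a sound emitted at `focus` across all mics.

    The farthest mic hears the source last; every other mic's signal is
    delayed so all copies coincide with it.
    """
    travel = [math.dist(m, focus) / SPEED_OF_SOUND for m in mics]
    latest = max(travel)
    return [round((latest - t) * sample_rate) for t in travel]


def delay_and_sum(signals: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Shift each signal later by its delay (zero-padded) and average them."""
    length = max(len(s) + d for s, d in zip(signals, delays))
    out = [0.0] * length
    for s, d in zip(signals, delays):
        for i, x in enumerate(s):
            out[i + d] += x / len(signals)
    return out
```

Sound from the focus point adds up in phase, while sources elsewhere arrive misaligned and partially cancel; with only a handful of mics that rejection is modest.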
          • Re:Lots of work? (Score:5, Informative)

            by bertok ( 226922 ) on Monday October 01, 2012 @05:32AM (#41511465)

            I've seen this MIT project before, but just like that product you linked, they all seem to be about "regular" arrays or arrangements.

            I'm thinking more along the lines of ad-hoc arrangements of microphones, which is more like what Photosynth does -- it arranges arbitrary photos together to make a 3D scene, instead of taking specific, precisely aligned photos.

            One interesting bit about the MIT project is that they have 1,020 microphones -- a world record -- generating 50MB/sec of data. A quick back-of-the-envelope calculation verifies that this represents 44.1kHz at 8 bits per sample. If you think about it, this amount of data is peanuts to a modern PC. Just one high-end GPU might have 200GB/sec of memory bandwidth and over 2 teraflops of processing power! This translates to about 38,000 operations per sound sample, in real time, at 32-bit precision. That should be enough to track moving sound sources, figure out what's an echo and what isn't, correlate sounds across multiple microphones, perform Doppler-shift analysis, etc...

            Going to higher numbers of microphones ought to be easy, and could allow some fantastic applications, as well as some scary ones. There would be enough redundancy in the data to build a 3D scene with tracking of both moving sound sources and moving microphones. It may even be possible to determine room geometry, and the movement of large objects could be tracked based on their interaction with the sound field.

            One application I can think of would be for capturing sound during movie filming. Often, studios have to discard the recorded sound and re-dub everything because of background noises, but this kind of technology would allow the director to perform arbitrary filtering after the fact, comparable to the light-field cameras that allow "refocusing" after an image has been captured. An actor's voice could be picked out and made louder, everything with a source "behind the camera" could be edited out, and surround sound effects could be generated from any scene setup.
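The back-of-the-envelope arithmetic in this comment can be reproduced directly. The inputs are the commenter's figures (1,020 microphones, 44.1 kHz, 8-bit samples, about 2 TFLOPS); nothing here is measured.

```python
# Commenter's figures; nothing here is measured.
MICS = 1020
SAMPLE_RATE = 44_100      # samples per second per microphone
BYTES_PER_SAMPLE = 1      # 8-bit samples
GPU_FLOPS = 2e12          # "over 2 teraflops"


def array_data_rate(mics: int = MICS, rate: int = SAMPLE_RATE,
                    width: int = BYTES_PER_SAMPLE) -> int:
    """Bytes per second produced by the whole array."""
    return mics * rate * width


def ops_per_sample(flops: float = GPU_FLOPS, mics: int = MICS,
                   rate: int = SAMPLE_RATE) -> float:
    """Operations available per incoming sample if processing keeps up in real time."""
    return flops / (mics * rate)


if __name__ == "__main__":
    print(f"data rate: {array_data_rate() / 1e6:.1f} MB/s")
    print(f"ops per sample: {ops_per_sample():,.0f}")
```

1,020 × 44,100 × 1 byte is about 45 MB/s, in the same ballpark as the quoted 50MB/sec. Dividing 2 TFLOPS by that exact sample count gives roughly 44,000 operations per sample; the ~38,000 figure matches dividing by a rounded 50 MB/s taken as 50 × 2^20 bytes.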

      • Re: (Score:3, Informative)

        by gordm ( 562752 )
        I've used PluralEyes, but I find it's not much harder to sync manually. Make 3 loud clapping sounds once the recorders are all running, then manually sync to that in the timeline. The vast majority of the audio can't be put in sync manually because the audio is so different for each perspective (for 5 hours) compared to the 3 seconds where identical clapping can be heard. Ideally the devices are all activated & running (then you clap 3x) before the event starts, and deployed as needed. As opposed to starting them as they a
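The clap routine lends itself to a small helper: find the first sharp transient in each track and trim the tracks so those transients line up. A minimal sketch, assuming tracks as lists of floats and that the first clap is the loudest sound within the searched region; `search_len` and the 0.5 threshold are illustrative choices, not values from the comment.

```python
from typing import List, Optional, Sequence


def first_transient(samples: Sequence[float], threshold: float = 0.5,
                    search_len: Optional[int] = None) -> Optional[int]:
    """Index of the first sample reaching `threshold` times the peak level.

    Only the first `search_len` samples are examined (all of them if None),
    so a loud moment hours later isn't mistaken for the slate clap.
    """
    region = samples[:search_len]
    if not region:
        return None
    peak = max(abs(s) for s in region)
    if peak == 0:
        return None
    for i, s in enumerate(region):
        if abs(s) >= threshold * peak:
            return i
    return None


def clap_offsets(tracks: Sequence[Sequence[float]], threshold: float = 0.5,
                 search_len: Optional[int] = None) -> List[int]:
    """Samples to trim from the start of each track so the first claps line up."""
    hits = [first_transient(t, threshold, search_len) for t in tracks]
    if any(h is None for h in hits):
        raise ValueError("no transient found in at least one track")
    earliest = min(hits)
    return [h - earliest for h in hits]
```

Only the first clap is used here; the second and third claps could serve as a cross-check that the detected offsets agree.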
    • Computers are good for that sort of stuff, and something very similar has been done using them for around forty years. Seismic surveys consist of a large number of devices that strongly resemble a moving coil microphone, and recordings from those devices are stacked (added together) so that random or location specific noise is reduced and the common signal is amplified. There's a bit more than that in seismic surveys due to wide spacing between receivers, but that's not relevant with microphones just abou
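Stacking is simple to demonstrate: average several aligned, noisy copies of the same signal and uncorrelated noise shrinks roughly as 1/√N. The sketch assumes perfectly aligned traces and independent Gaussian noise per receiver, which real microphones in one room won't fully satisfy (shared room noise is correlated between mics and does not average out).

```python
import math
import random
import statistics
from typing import List, Sequence


def stack(traces: Sequence[Sequence[float]]) -> List[float]:
    """Average already-aligned traces sample by sample."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]


def residual_noise(num_traces: int, noise_std: float = 1.0,
                   length: int = 4000, seed: int = 0) -> float:
    """Std. dev. of what's left after stacking noisy copies of a known signal."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * i / 50) for i in range(length)]
    traces = [[s + rng.gauss(0.0, noise_std) for s in signal]
              for _ in range(num_traces)]
    stacked = stack(traces)
    return statistics.pstdev([x - s for x, s in zip(stacked, signal)])


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, round(residual_noise(n), 3))
```

With 16 traces the residual noise comes out near 0.25, a quarter of the single-trace level.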
  • The most massively multi-track event was a thorium conference

    About a specific isotope, or was it more generic?

    • Probably Th-232.
      • Probably Th-232.

        More likely it's Th-231, its half-life is way shorter ( 1 month) and it beta-decays.

        • I understood it was about Thorium reactors, and a quick wikipedia told me they work on 232.
  • Clock Drift (Score:2, Insightful)

    by Anonymous Coward

    Interesting idea, but it sounds like a pain in the ass to deal with in post production. Each recorder is running off its own crystal for timing, with each crystal being ever so slightly different. This is why the professional approach is to route all mic signals to one recorder or, if you need more channel capacity, to sync the recorders to the same master clock.

    It's a neat hack, with some usefulness if you cherry pick recordings and edit the best parts together without mixing/overlapping sources together.
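The clock drift problem can be put into numbers. The 20 ppm mismatch used below is a hypothetical value for illustration (the comment gives no figure). The correction step assumes you have two sync points, such as a clap at the start and another at the end, from which to measure the stretch ratio; linear interpolation is a crude resampler used only to keep the sketch short.

```python
from typing import List, Sequence


def drift_samples(ppm: float, hours: float, sample_rate: int = 44_100) -> float:
    """Samples by which two recorders diverge if their clocks differ by `ppm` parts per million."""
    return ppm * 1e-6 * hours * 3600 * sample_rate


def correct_drift(samples: Sequence[float], ratio: float) -> List[float]:
    """Resample by linear interpolation so len(output) is about len(samples) * ratio.

    ratio = reference_interval / this_track_interval, where each interval is
    the sample count between the same two sync points (assumed to exist).
    """
    n_out = int(len(samples) * ratio)
    out = []
    for k in range(n_out):
        pos = k / ratio          # position in the original track
        i = int(pos)
        frac = pos - i
        if i + 1 < len(samples):
            out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        else:
            out.append(samples[-1])  # hold the last sample at the very end
    return out


if __name__ == "__main__":
    d = drift_samples(20, 5)
    print(f"20 ppm over 5 h: {d:.0f} samples ({d / 44_100:.2f} s)")
```

At 20 ppm two recorders drift apart by about 15,900 samples (0.36 s) over five hours, so aligning on one clap at the start does not keep the tracks aligned by the end; a production workflow would use a band-limited resampler rather than linear interpolation.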

  • by StealthSock ( 634668 ) on Monday October 01, 2012 @03:57AM (#41511187)
    My ears got plugged up while swimming and I could barely hear the next day. Rockbox's recorder function outputs the microphone to headphones even when it is not recording. That $30 Clip+ worked reasonably well as a makeshift hearing aid, as long as I was facing the person I was trying to hear.
  • Audio sync in post will be a NIGHTMARE. Been there, done that.

    and isolating people at a dinner party is not hard, 11 people? 11 wireless microphones into a field mixer and then into the camera. OR do it old Skool. Camera guy + audio guy with a boom and a shotgun microphone on it, Two would be better (two audio guys on mic booms) A pair of ME55's in a dead cat are magical.

    • by Urza9814 ( 883915 ) on Monday October 01, 2012 @08:16AM (#41511957)

      and isolating people at a dinner party is not hard, 11 people? 11 wireless microphones into a field mixer and then into the camera. OR do it old Skool. Camera guy + audio guy with a boom and a shotgun microphone on it, Two would be better (two audio guys on mic booms) A pair of ME55's in a dead cat are magical.

      ...I think you just proved the utility of this. First, hundreds or even thousands of dollars of professional equipment and techs vs. a couple of $25 devices. Not to mention needing to clear a couple of feet around the table for the people carrying your boom mics, plus all the wires to your equipment and all of that set up somewhere...

      Sure, in most cases your professionals are still going to be using their professional quality equipment, because the techs and equipment are already paid for and probably cheaper than the editors anyway, and the space constraints aren't there in a studio. But there are CERTAINLY plenty of situations where repurposing a handful of cheap MP3 players will come out ahead.

      • by Lumpy ( 12016 )

        I'm an editor: make the lazy crew do something in the field instead of me spending 60 hours editing it in the suite at $250 an hour for me and the suite. Also, the little recorders have sync issues: they will have sections of skipped audio or hiccups that make you resync the audio tracks over and over and over and over.

        This is great for the college student that has $1.50 for his budget and all their editing time on a macbook is free. They are trading the proper gear for extensive free labor.

        • This is great for the college student that has $1.50 for his budget and all their editing time on a macbook is free. They are trading the proper gear for extensive free labor.

          Exactly; that, or any organization that has such free labor. I can't tell you how many times I've gone to conferences where they attempted to record sessions by having two or three people run around and set up a fixed camera/microphone in each of 6-10 sessions during the break between. Hell, I've been one of the guys doing it a couple of times. In such a situation, getting a couple people to volunteer a few hours over the next several months (usually it takes a couple months to get the videos up anyway...) to g

  • It's been a while since I checked the list of supported hardware, but IIRC, the newer Sansas are locked down and can't run Rockbox. Plus, they don't seem to age well; after the second Clip that just stopped responding, I switched to an iRiver, though I've never done any recording.
  • In Illinois, the law, under strict interpretation, requires the consent of all parties before you can electronically record or intercept any conversation; otherwise it could be pursued as a felony offense... although current opinion is that this only applies to recording conversations that you could not otherwise naturally hear with your own ears.

    Anyhow, check and know the recording laws in your area beforehand.

  • This might make smartphone videos worth a toss. The audio's pretty terrible on those. Demux the video, mux it with the audio, and you'd be good. Not perfect, but good enough for YouTube.

    BTW, if anyone wants to experiment with this, Newegg's selling some refurbed Clip+ players for $26.
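The demux/remux step in the comment above maps naturally onto ffmpeg. The helper only builds the command line; the file names are hypothetical, and any offset has to be found separately (a clap works). It keeps the phone's video stream untouched (`-c:v copy`) and encodes the external WAV to AAC.

```python
import shlex
from typing import List


def mux_command(video: str, audio: str, output: str,
                audio_offset: float = 0.0) -> List[str]:
    """ffmpeg arguments that keep the phone's video and swap in an external audio track.

    audio_offset (seconds) delays the external audio; -itsoffset applies to
    the input that follows it on the command line.
    """
    cmd = ["ffmpeg", "-i", video]
    if audio_offset:
        cmd += ["-itsoffset", str(audio_offset)]
    cmd += [
        "-i", audio,
        "-map", "0:v:0",   # video from the first input (the phone clip)
        "-map", "1:a:0",   # audio from the second input (the Clip+ WAV)
        "-c:v", "copy",    # don't re-encode the video
        "-c:a", "aac",     # encode the WAV to AAC for the MP4 container
        "-shortest",       # stop at the end of the shorter stream
        output,
    ]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names.
    print(shlex.join(mux_command("phone.mp4", "clip_plus.wav", "out.mp4", 1.25)))
```

To actually run it, pass the list to `subprocess.run(..., check=True)` on a machine with ffmpeg installed.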

  • WavPack (.wv) is fully supported on the Fuze (with Rockbox, obviously), so I figure it would also be useful on the Clip. It's a royalty-free lossless compression format, and beats the shit out of .WAV.
