RockBox + Refurbished MP3 Players = Crowdsourced Audio Capture
An anonymous reader writes "Looking for an inexpensive means to capture audio from a dynamically moving crowd, I sampled many MP3 players' recording capabilities. Ultimately the best bang-for-the-buck was refurbished SanDisk Sansa Clip+ devices ($26/ea) loaded with (open source) RockBox firmware. The most massively multi-track event was a thorium conference in Chicago where many attendees wore a Clip+. Volunteers worked the room with cameras, and audio capture was decoupled from video capture. It looked like this. Despite having (higher quality) ZOOM H1n and wireless mics, I've continued to use the RockBox-ified Clip+ devices ... even if the H1n is running, the Clip+ serves as backup. There's no worry about interference or staying within wireless mic range. The devices have 4GB capacity, and RockBox allows WAV capture. They'll run at least 5 hours before the battery is depleted (with lots of storage left over). I would suggest sticking with 44.1kHz (mono) capture, as 48kHz is unreliable. To get an idea of their sound quality, here is a 10-person dinner conversation (about thorium molten salt nuclear reactors) in a very busy restaurant. I don't know how else I could have isolated everyone's dialog for so little money. (And I would NOT recommend the Clip+ with factory firmware... it only supports 22kHz, and its levels are too high for a player clipped onto someone's collar.)" This video incorporating much of that captured audio is worth watching for its content as well as the interesting repurposing.
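To check the "lots of storage left over" claim, here is a back-of-the-envelope sketch. It assumes 16-bit PCM WAV (the summary gives 44.1kHz mono but not the bit depth) and treats "4GB" as 4 decimal gigabytes; real usable space on the player will be somewhat lower. Under those assumptions, five hours of recording takes about 1.59 GB, and the full 4 GB would hold roughly 12.6 hours.

```python
# Rough WAV storage estimate for the settings in the summary.
# Assumption: uncompressed 16-bit PCM; the WAV header is ignored.

def wav_bytes_per_second(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample // 8


def hours_of_capacity(capacity_bytes: int, bytes_per_second: int) -> float:
    """How many hours of audio fit in the given number of bytes."""
    return capacity_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    rate = wav_bytes_per_second(44_100, channels=1, bits_per_sample=16)
    used_5h = rate * 5 * 3600
    capacity = 4 * 10**9  # "4GB" taken as decimal gigabytes (assumption)
    print(f"{rate} bytes/s, {used_5h / 1e9:.2f} GB for 5 hours")
    print(f"{hours_of_capacity(capacity, rate):.1f} hours fit in 4 GB")
```

Stereo or 48kHz capture would scale the data rate proportionally, so battery life, not storage, stays the limiting factor at these settings.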
huh? (Score:5, Insightful)
Sorry, I have no idea what TFA is about. Please help.
Re:huh? (Score:0, Insightful)
The gist of the problem is how to recreate the human ability to focus on individual conversations in a room full of noise. It involves shit like to the FFT, to the DFT, on a rhyming spree, a straight-G. The problem was described on Slashdot shortly before, using all kinds of advanced shit which is too powerful for your feeble mind to comprehend.
Look to the final link of the summary. What happened is that, when you look to youtube for classic videogame speedruns, most of the links resolve to that annoying "let's play..." asshole whose voice is so goddamn annoying that you want to smack him in the mouth with a rolled-up newspaper and look for another speedrun.
What that asshole did was find a way to slow his voice down, so it sounds like he's on Vaaaaaaaliummmmmm, and in doing so nobody knows who he is and so they don't get pissed off listening to such a fuckhead.
-- Ethanol-fueled
Re:Lots of work? (Score:5, Insightful)
That depends. There are applications that can align audio automatically (for example, PluralEyes: http://www.singularsoftware.com/pluraleyes.html), so all you would need to do is name each track after the person it belongs to and adjust the levels as needed. All video creation requires a "huge" amount of editing work.
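How PluralEyes itself works isn't described here, but a common generic way to line up two recordings of the same event is cross-correlation: slide one track against the other and pick the lag where they match best. The sketch below is a minimal illustration of that technique using NumPy, tested on synthetic data (a noise burst that the second "recorder" hears 250 samples late); `estimate_offset` is a hypothetical helper, not part of any product.

```python
import numpy as np


def estimate_offset(reference: np.ndarray, other: np.ndarray) -> int:
    """Lag in samples by which `other` trails `reference` (negative if it leads)."""
    # Full cross-correlation: output index i corresponds to lag i - (len(reference) - 1).
    corr = np.correlate(other, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)


# Synthetic check: the second recorder hears the same sound 250 samples later,
# plus a little noise of its own.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)
delay = 250
other = np.concatenate([np.zeros(delay), ref])[: len(ref)]
other = other + 0.1 * rng.standard_normal(len(ref))

lag = estimate_offset(ref, other)
print("offset in samples:", lag)

# Shift `other` so both tracks start at the same moment.
if lag >= 0:
    aligned = other[lag:]
else:
    aligned = np.concatenate([np.zeros(-lag), other])
```

`np.correlate` is O(N·M), which is fine for a few seconds of audio but slow for hour-long tracks; an FFT-based correlation such as `scipy.signal.correlate(..., method="fft")` over a short excerpt around a clap or other sharp sound is the more practical choice there.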
Re:Lots of work? (Score:5, Insightful)
All video creation requires a "huge" amount of editing work.
Exactly. Having dedicated audio sources for all speakers is great to have, and some increased editing time is worth it if your product is going to be higher quality.
It sucks to have to struggle to hear what's going on in a video, and live events can be terribly chaotic. Having well-planned audio capture is critical to reducing your stress. This is a clever use of cheap tech, and I may have to give it a shot with my old 2GB Clip floating around in my tech bins. If only there were a way to pipe a proper lav into it...
Clock Drift (Score:2, Insightful)
Interesting idea, but it sounds like a pain in the ass to deal with in post-production. Each recorder runs off its own crystal for timing, and each crystal is ever so slightly different. This is why the professional approach is to route all mic signals to one recorder or, if you need more channel capacity, to sync the recorders to the same master clock.
It's a neat hack, with some usefulness if you cherry pick recordings and edit the best parts together without mixing/overlapping sources together.
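To put a number on the drift: with an assumed 50 ppm mismatch between two recorders' crystals (an illustrative figure, not a measured Clip+ spec), five hours of recording drifts apart by 50e-6 × 18000 s = 0.9 s, enough to wreck lip sync and to smear any attempt to mix overlapping tracks. One hedged workaround is to align a sharp sound near the start and another near the end, derive the rate ratio, and stretch the track onto the reference clock. The sketch below uses plain linear interpolation (`np.interp`) for simplicity; a proper band-limited resampler would sound better.

```python
import numpy as np


def drift_seconds(ppm_difference: float, duration_s: float) -> float:
    """Accumulated misalignment between two free-running clocks."""
    return ppm_difference * 1e-6 * duration_s


def correct_drift(samples: np.ndarray, ratio: float) -> np.ndarray:
    """Stretch (ratio > 1) or squeeze (ratio < 1) a track onto the reference clock.

    `ratio` = reference samples / this track's samples for the same span of real
    time, e.g. measured from two claps aligned near the start and end.
    Linear interpolation is a crude stand-in for a real resampler.
    """
    n_out = int(round(len(samples) * ratio))
    # Position in the original track corresponding to each output sample.
    src_positions = np.arange(n_out) / ratio
    # np.interp holds the last value for positions past the end of the track.
    return np.interp(src_positions, np.arange(len(samples)), samples)


if __name__ == "__main__":
    print(f"{drift_seconds(50, 5 * 3600):.2f} s of drift after 5 hours at 50 ppm")
```

Because the correction is a single linear ratio, it assumes the crystal's error stays constant over the session; temperature changes can make it wander, which is one more reason pros lock everything to one clock.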
Re:Lots of work? (Score:4, Insightful)
Just altering the levels provides a lot of isolation (as seen in the video clips), but I have to wonder if there's an audio equivalent of "image stacking" or Photosynth, that would correlate all of the audio streams, build a "model" of the audio-scape, and allow noise to be cancelled out. Or more accurately, allow a voice to be extracted with a higher specificity than just 100% of one source.
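The idealized version of "image stacking" for audio is easy to demonstrate: if several mics pick up the same source, and the tracks are already time-aligned and level-matched with noise that is independent from mic to mic, averaging them keeps the source while the noise partly cancels, improving SNR by about 10·log10(N) dB. This is the core of delay-and-sum beamforming. The sketch below uses a synthetic 220 Hz tone as a stand-in for a voice; real rooms violate the independent-noise assumption (other talkers reach every mic), so actual gains would be smaller.

```python
from typing import Sequence

import numpy as np


def stack(aligned_tracks: Sequence[np.ndarray]) -> np.ndarray:
    """Average tracks that are already time-aligned and level-matched."""
    return np.mean(np.stack(aligned_tracks), axis=0)


def snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Signal-to-noise ratio of `noisy` relative to the known `clean` signal."""
    noise = noisy - clean
    return float(10 * np.log10(np.sum(clean**2) / np.sum(noise**2)))


rng = np.random.default_rng(1)
fs = 8000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 220 * t)  # stand-in for the target talker
tracks = [voice + 0.5 * rng.standard_normal(fs) for _ in range(16)]

single = snr_db(voice, tracks[0])
stacked = snr_db(voice, stack(tracks))
print(f"one mic: {single:.1f} dB, 16 mics stacked: {stacked:.1f} dB")
```

With 16 tracks the expected improvement is about 12 dB (from roughly 3 dB to roughly 15 dB here); doubling the mic count buys only another 3 dB.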
I'm sensing that we're on the cusp of affordable setups where instead of just a few microphones, rooms could be set up with hundreds of microphones recording in parallel, with analysis done to track and extract individual sound sources moving in 3D. I suspect that a modern GPU already has the computing power, or will soon. This would allow individual speakers to be isolated even if they weren't set up with little clip-on recorders ahead of time.
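Tracking sources in space typically starts from time differences of arrival (TDOA): the same sound reaches nearby mics at slightly different times, and that delay times the speed of sound gives a path-length difference. For a single pair of mics and a distant source, the path difference maps to an angle via asin(c·Δt / d). The sketch below shows only that pair-wise step, assuming a far-field source and c ≈ 343 m/s; a single pair cannot tell front from back, so locating a talker in 3D would need several pairs combined.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate, dry air at about 20 °C (assumption)


def arrival_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field arrival angle (degrees from broadside) for a two-mic pair.

    `delay_s` is how much later the sound reaches mic B than mic A,
    e.g. as estimated by cross-correlating the two recordings.
    """
    path_difference_m = SPEED_OF_SOUND_M_S * delay_s
    ratio = path_difference_m / mic_spacing_m
    if abs(ratio) > 1:
        raise ValueError("delay is too large for this mic spacing")
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # Mics 0.5 m apart, sound arrives 0.5 ms later at mic B.
    print(f"{arrival_angle_deg(0.0005, 0.5):.1f} degrees")
```

At 44.1kHz one sample is about 7.8 mm of sound travel, which is why closely spaced mics need sub-sample delay estimates to resolve angles finely.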
Re:Nuice but causes problems. (Score:4, Insightful)
and isolating people at a dinner party is not hard, 11 people? 11 wireless microphones into a field mixer and then into the camera. OR do it old Skool. Camera guy + audio guy with a boom and a shotgun microphone on it, Two would be better (two audio guys on mic booms) A pair of ME55's in a dead cat are magical.
...I think you just proved the utility in this. First, hundreds or even thousands of dollars of professional equipment and techs vs. a couple of $25 devices. Not to mention needing to clear a couple of feet around the table for the people carrying your boom mics, plus all the wires to your equipment and all of that set up somewhere...
Sure, in most cases your professionals are still going to be using their professional-quality equipment, because the techs and equipment are already paid for and probably cheaper than the editors anyway, and the space constraints aren't there in a studio. But there are CERTAINLY plenty of situations where repurposing a handful of cheap MP3 players will come out ahead.