
CMU Video Conference System Gets 3D From Cheap Webcams

Hesham writes "Carnegie Mellon University's HCI Institute just released details on its 'why didn't I think of that'-style 3D video conferencing application. Considering how stale development has been in this field, this research seems like a nice solid step towards immersive telepresence. I was really disappointed with the 'state-of-the-art' systems demoed at CES this year — they are all still just a flat, square video stream. Hardly anything new. What is really cool about this project is that the researchers avoided building custom hardware no one is ever going to buy, and explored what could be done with just the generic webcams everyone already has. The result is a software-only solution, meaning all the big players (AIM, Skype, MSN, etc.) can release this as a simple software update. 'Enable 3D' checkbox, anyone? YouTube video here. Behind the scenes, it relies on a clever illusory trick (motion parallax) and head-tracking (a la Johnny Lee's Wiimote stuff — same lab, HCII). It was just presented at the IEEE International Symposium on Multimedia in December."
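The summary's description boils down to two stock computer-vision pieces: track the viewer's head with the webcam, and shift a matted foreground layer over a static background in proportion to head motion. A minimal sketch of that idea in Python, assuming OpenCV's bundled Haar face detector and placeholder layer images — this is an illustration of the parallax trick, not CMU's actual code:

```python
# Hypothetical sketch of head-tracked motion parallax: shift a foreground
# layer over a static background according to the viewer's head position.
# File names and the 0.1 gain are placeholders, not from the paper.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

background = cv2.imread("background.png")   # static scene behind the speaker
foreground = cv2.imread("person.png")       # matted image of the speaker
mask = cv2.imread("person_mask.png", cv2.IMREAD_GRAYSCALE)  # 255 where speaker is

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    if len(faces):
        x, y, w, h = faces[0]
        # Map the viewer's horizontal head position to a small layer offset.
        offset = int((x + w / 2 - frame.shape[1] / 2) * 0.1)
        M = np.float32([[1, 0, offset], [0, 1, 0]])
        h_img, w_img = foreground.shape[:2]
        shifted = cv2.warpAffine(foreground, M, (w_img, h_img))
        shifted_mask = cv2.warpAffine(mask, M, (w_img, h_img))
        # Composite: background everywhere, shifted speaker where the mask says.
        composite = background.copy()
        composite[shifted_mask > 0] = shifted[shifted_mask > 0]
        cv2.imshow("parallax", composite)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```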
  • 2.5D, not 3D (Score:5, Insightful)

    by adam ( 1231 ) * on Thursday January 29, 2009 @03:51PM (#26657505)
The post title/summary is misleading -- this is actually 2.5D, not 3D at all. (It works on the premise that the background is static: it grabs a matte of the background, uses subtraction to dynamically key/mask the participant out of the image, and adds the user back as a second foreground layer; on the viewer's side, head-tracking gently shifts the foreground layer to reveal background hidden behind it.)
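    That keying step amounts to something like the following minimal sketch (OpenCV, with an illustrative threshold; a real matte has to cope with lighting drift and shadows, and this is not TFA's actual pipeline):

```python
# Rough background-subtraction keying: grab one frame of the empty scene,
# then mask out the participant by differencing against it.
import cv2

cap = cv2.VideoCapture(0)
_, background = cap.read()            # scene with nobody in it
bg_gray = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, bg_gray)
    # Pixels that changed "enough" are assumed to be the participant.
    _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    mask = cv2.medianBlur(mask, 5)    # knock out speckle noise
    person = cv2.bitwise_and(frame, frame, mask=mask)
    cv2.imshow("foreground layer", person)
    if cv2.waitKey(1) == 27:
        break
cap.release()
cv2.destroyAllWindows()
```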

    For what it's worth, I really don't care for this effect at all. I am not denigrating its inventors in the slightest; this is a novel (read: low cost) approach, and I am sure some people would enjoy having this in their iChat/AIM/Skype. To me, it's the equivalent of Apple's Photo Booth filters (fisheye, inverted colors, etc.) -- a cheap parlor trick that seems nifty for about 5 seconds and then becomes seriously distracting. True 3D has its own issues with distraction and visual anomalies (leading to headaches, etc.). Even the best 3D cinematographers around have to be very careful to avoid these issues: Vince Pace, for instance, who shoots 3D for James Cameron (Titanic, Terminator, etc.), has plenty of headache-inducing scenes in his demo reel, and this is a guy with state-of-the-art facilities who knows as much as anyone about stereoscopic cinematography. Frankly, I think video conferencing is best left 2D, and any effort toward improving it should go into increasing framerate/resolution (and reducing lag and dropped frames).
  • by Quarters ( 18322 ) on Thursday January 29, 2009 @03:54PM (#26657531)
    ...but that sample conversation at the end of the video may as well have been between two drunken epilepsy sufferers on boats in the North Atlantic. Who moves around like that while they're talking?
  • Re:2.5D, not 3D (Score:3, Insightful)

    by 0100010001010011 ( 652467 ) on Thursday January 29, 2009 @04:02PM (#26657661)

    Why not just use two webcams with red/blue filters, and a camera on the other end?

    It'll be slightly annoying wearing the glasses, but it'll be much more 'real' than what this appears to be. Set the cameras an eye-width apart for realism, or farther apart to make the effect more pronounced.
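    The combination step for that scheme really is cheap; here's a hedged sketch (the camera indices and the simple channel swap are assumptions, not a tested rig):

```python
# Naive red/cyan anaglyph from two webcams roughly eye-width apart,
# for viewing with red/blue glasses on the far end.
import cv2

left = cv2.VideoCapture(0)    # left-eye camera
right = cv2.VideoCapture(1)   # right-eye camera

while True:
    ok_l, l = left.read()
    ok_r, r = right.read()
    if not (ok_l and ok_r):
        break
    anaglyph = r.copy()
    # Red channel from the left eye, blue/green from the right (BGR order).
    anaglyph[:, :, 2] = l[:, :, 2]
    cv2.imshow("red/cyan anaglyph", anaglyph)
    if cv2.waitKey(1) == 27:
        break
left.release()
right.release()
cv2.destroyAllWindows()
```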

  • Re:2.5D, not 3D (Score:5, Insightful)

    by JustinOpinion ( 1246824 ) on Thursday January 29, 2009 @04:22PM (#26657881)

    I agree with you: having this kind of 2.5D experience is neat but not particularly useful.

    But I wonder if this software could be adapted to do something else... One of the things that most people dislike about webcam-conferencing is that the other person is never looking "at" you. They are looking on their screen at an image of you, so they are not looking directly at their camera, and so on your end they seem to be looking away from you. (And they see you looking away from them, too.)

    While this may seem trivial, it is actually a significant roadblock to person-to-person tele-communication. People rely on body language and eye contact to read each other's moods, to really "connect". Webcam-conferencing forces us to violate social conventions (like looking into people's eyes), which can be anywhere from subconsciously bothersome to outright distracting, or even come across as insulting.

    So what I would like is a multi-camera system that uses similar kinds of interpolation to rebuild the image of the person so that they are looking directly at the camera. So if I put one webcam on either side of my screen, they can combine their images to create a shifted image where I am looking directly at the viewer on the other end.

    Though it is a rather small and subtle addition to tele-conferencing, I believe it would have a bigger impact than what TFA seems to be showing. I think it would make the interaction "more real."
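    A deliberately naive sketch of that idea follows -- cameras on either side of the screen, blended toward a pseudo-frontal view. Real view synthesis needs per-pixel correspondence; simply aligning the detected faces and averaging, as below, is only a crude first approximation, and every name in it is hypothetical:

```python
# Naive two-camera "gaze correction": shift each side view so the detected
# face centers coincide, then average. A placeholder for real interpolation.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_center(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    if not len(faces):
        return None
    x, y, w, h = faces[0]
    return (x + w // 2, y + h // 2)

def naive_frontal(left_img, right_img):
    cl, cr = face_center(left_img), face_center(right_img)
    if cl is None or cr is None:
        return left_img                      # fall back to one view
    # Shift each view halfway so the two face centers meet in the middle.
    dx, dy = (cr[0] - cl[0]) / 2, (cr[1] - cl[1]) / 2
    h, w = left_img.shape[:2]
    Ml = np.float32([[1, 0, dx], [0, 1, dy]])
    Mr = np.float32([[1, 0, -dx], [0, 1, -dy]])
    l = cv2.warpAffine(left_img, Ml, (w, h))
    r = cv2.warpAffine(right_img, Mr, (w, h))
    return cv2.addWeighted(l, 0.5, r, 0.5, 0)
```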

  • Re:2.5D, not 3D (Score:3, Insightful)

    by dinther ( 738910 ) on Thursday January 29, 2009 @06:05PM (#26659369) Homepage

    Very cool, and I like the fact that they give the webcam a double function. The 2.5D effect against a static background is indeed only a novelty, but I see there is some confusion here between 3D and stereo vision.

    I agree with most writers that stereo vision induces headaches. These are simply due to the fact that each eye sees a different image, which gives your brain a depth cue, yet your eyes' focal point (on your screen) conflicts with that depth cue, resulting in a headache. It is unavoidable with normal screens.
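    To put rough numbers on that conflict (the figures below are illustrative assumptions, not measurements):

```python
# The eyes converge where the stereo disparity says the object is, but
# accommodate (focus) on the screen; the mismatch is what causes eye strain.
import math

ipd = 0.065          # interpupillary distance in metres (typical adult)
screen = 0.60        # viewing distance to the monitor, metres
virtual = 0.30       # depth the stereo pair simulates, metres

def vergence_deg(distance):
    # Full convergence angle between the two eyes' lines of sight.
    return math.degrees(2 * math.atan((ipd / 2) / distance))

print(f"accommodation cue: focus at {screen} m ({vergence_deg(screen):.1f} deg)")
print(f"vergence cue:      converge at {virtual} m ({vergence_deg(virtual):.1f} deg)")
```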

    However, if it were possible to determine the depth of each pixel in the webcam image, then each pixel could be placed in 3D space and a virtual camera could be used to rotate around the subject.

    For this you would need two webcams and some really clever software to figure out the depth of each pixel. The result would be a 3D video model of a real-life subject shown on a 2D screen, as is done in every 3D game.
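    Off-the-shelf block matching at least approximates that two-camera depth estimation; a sketch (it assumes a rectified stereo pair, which real rigs have to calibrate for, and the parameters are illustrative):

```python
# Per-pixel disparity from a rectified stereo pair with OpenCV's StereoBM.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)   # larger disparity = nearer pixel

# Depth is inversely proportional to disparity: z = f * baseline / disparity,
# given the focal length f and camera separation from calibration.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imshow("disparity", vis)
cv2.waitKey(0)
```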

    But I suppose such software is difficult to write, as the DARPA Urban Challenge with autonomous cars showed. Those teams used extensive sensor arrays to build a 3D model of the road in front of the car.

  • Re:2.5D, not 3D (Score:2, Insightful)

    by Anonymous Coward on Thursday January 29, 2009 @06:16PM (#26659501)

    I did a study of this gaze problem and possible algorithmic solutions for a videoconferencing specialist about a year ago.
    My conclusion:
    no algorithm was/is/will be suitable to combine any point of view with any other point of view. Consider an object that is occluded in both actual points of view but not occluded in your virtual view (the combination of the two actual views): there is no solution but to guess, the areas to be guessed can be very wide, and the situation comes up very frequently (whenever two objects are near each other, like your hand and your body). This creates unacceptable artifacts that are far more annoying than the gaze effect.
    And to answer the question you are about to ask -- why can our brain do it? It can't. Try it, honestly.
    Moreover, cameras placed on either side of the screen would be too far apart for a stereoscopic system.
    What the paper also does not say is that no algorithm can segment arbitrary video (or, put differently, any segmentation algorithm can be fooled by specific situations), so this pseudo-3D will produce artifacts equivalent to the combination artifacts: the problem is sometimes undecidable. That is unacceptable for professional systems. It's a toy.
