

Google Rolls Out New Gemini Model That Can Run On Robots Locally
Google DeepMind has launched Gemini Robotics On-Device, a vision-language-action model that enables robots to perform complex tasks locally, without internet connectivity. TechCrunch reports: Building on the Gemini Robotics model the company released in March, Gemini Robotics On-Device can control a robot's movements, and developers can fine-tune the model to suit various needs using natural language prompts. In benchmarks, Google claims the model performs at a level close to the cloud-based Gemini Robotics model, and that it outperforms other on-device models on general benchmarks, though it didn't name those models.
In a demo, the company showed robots running this local model doing things like unzipping bags and folding clothes. Google says that while the model was trained for ALOHA robots, it later adapted it to work on the bi-arm Franka FR3 robot and Apptronik's Apollo humanoid robot. Google claims the bi-arm Franka FR3 successfully tackled scenarios and objects it hadn't "seen" before, like doing assembly on an industrial belt. Google DeepMind is also releasing a Gemini Robotics SDK; the company says developers can train robots on new tasks by showing them 50 to 100 demonstrations, using these models in the MuJoCo physics simulator.
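For the curious, MuJoCo is an open-source physics engine with Python bindings. Below is a rough, hypothetical sketch of what recording one scripted "demonstration" trajectory in it could look like; the toy arm model and controller are invented for illustration and don't reflect the actual Gemini Robotics SDK, which the summary doesn't detail.

    # Hypothetical sketch, not the real SDK workflow: step a toy two-joint arm
    # in MuJoCo and log (state, action) pairs as one scripted "demonstration".
    import mujoco
    import numpy as np

    # Minimal made-up model; real ALOHA/FR3 setups use far richer MJCF files.
    XML = """
    <mujoco>
      <worldbody>
        <body>
          <joint name="shoulder" type="hinge" axis="0 0 1"/>
          <geom type="capsule" fromto="0 0 0 0.3 0 0" size="0.02"/>
          <body pos="0.3 0 0">
            <joint name="elbow" type="hinge" axis="0 0 1"/>
            <geom type="capsule" fromto="0 0 0 0.25 0 0" size="0.02"/>
          </body>
        </body>
      </worldbody>
      <actuator>
        <motor joint="shoulder"/>
        <motor joint="elbow"/>
      </actuator>
    </mujoco>
    """

    model = mujoco.MjModel.from_xml_string(XML)
    data = mujoco.MjData(model)

    # One "demonstration": a scripted controller plus a (state, action) log.
    # Per the summary, you'd collect 50 to 100 of these per new task.
    trajectory = []
    for step in range(500):
        data.ctrl[:] = np.sin(step * 0.01) * np.array([0.5, -0.3])
        mujoco.mj_step(model, data)
        trajectory.append((data.qpos.copy(), data.ctrl.copy()))

    print(f"logged {len(trajectory)} (state, action) samples")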
ALOHA Robots? (Score:2)
What do they do, dress in drag and do the hula?
Re: (Score:2)
I'm ready for my Bender quote now.
Sorry to have gone so far off-script with a Lion King quote...
ALOHA Humans (Score:2)
In addition to murdering us? Not the way I imagined we'd go out.
Re: (Score:2)
That's two thousand and 400 fucking netflix streams.
Re: (Score:3)
Being completely ignorant on the topic of robots, thus having no fucking idea what ALOHA is... how in the fucking 9 hells can your robot require 60Gbps of bandwidth for articulation and some cameras? That's two thousand and 400 fucking netflix streams.
Cameras with high resolution use a crapton of bandwidth, and compression adds considerable latency, which is probably undesirable. 1080p60 uncompressed is roughly 3 gigabits per second per camera, which is a whole USB 3 channel. Then again, with a Pi you'd probably want to use MIPI instead.
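If anyone wants to check that math, it's a one-liner (plain Python, numbers from above):

    # Back-of-the-envelope: raw bit rate of an uncompressed video stream.
    width, height = 1920, 1080   # 1080p
    bits_per_pixel = 24          # 8 bits each for R, G, B
    fps = 60

    bps = width * height * bits_per_pixel * fps
    print(f"uncompressed 1080p60: {bps / 1e9:.2f} Gbps per camera")
    # -> ~2.99 Gbps, i.e. most of a 5 Gbps USB 3 link once protocol overhead is added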
*shrugs*
But yeah, I can't imagine the motor control parts needing USB 2.0 speeds, much less 3.0. :-)
Re:ALOHA Robots? (Score:4, Interesting)
As for compression..... I have little experience with USB camera modules, but I know that MJPEG is a normal feature on them, which would get you 1080p60 (24bpp) for about 80Mbps per stream. The quality can be very high with huge bandwidth reduction. Still high compared to a temporally aware codec like H.264, but since it's per-frame, it's very cheap to implement in silicon. We use it for our security monitoring at work.
Also, USB3.2 Gen 1 is 5Gbps, but Gen 2 is 10Gbps.
It's true that a Pi is only 5Gbps for its onboard ports, but with the PCIe, we can slap USB3.2 on it pretty easily. That's a little bit custom, but then again we were talking about slapping 6 XHCI controllers on something.
But anyway- ya, uncompressed video answers the question more or less.
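Rough sanity check on those figures (plain Python; the implied compression ratio is just whatever makes the 80Mbps number work out, not something I've measured):

    uncompressed_bps = 1920 * 1080 * 24 * 60  # ~2.99 Gbps, from upthread
    mjpeg_bps = 80e6                          # the ~80 Mbps per-stream figure

    print(f"implied MJPEG compression: {uncompressed_bps / mjpeg_bps:.0f}:1")

    usb3_gen1_bps = 5e9                       # USB 3.2 Gen 1 signaling rate
    print(f"streams per 5 Gbps link (ignoring USB overhead): "
          f"{usb3_gen1_bps / mjpeg_bps:.0f}")
    # -> ~37:1 compression, and ~62 streams per link on paper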
Re: (Score:2)
As for compression..... I have little experience with USB camera modules, but I know that MJPEG is a normal feature on them, which would get you 1080p60 (24bpp) for about 80Mbps per stream.
It's true that a Pi is only 5Gbps for its onboard ports, but with the PCIe, we can slap USB3.2 on it pretty easily. That's a little bit custom, but then again we were talking about slapping 6 XHCI controllers on something.
It's only a single lane of PCIe, so the ~32 gigabit total throughput is probably not realistically enough for 6 links at 5 gigabits each, but then again the Pi's CPU probably can't handle the traffic from six saturated USB 3 links, either. :-D
Re: (Score:2)
It's only a single lane of PCIe, so the ~32 gigabit total throughput is probably not realistically enough for 6 links at 5 gigabits each
Should be close enough. PCIe overhead is ~0.6% for the framing, and it's 128b/130b coding.
The much larger overhead is in the USB framing.
But ya- ultimately, I doubt that CPU could handle it.
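To put numbers on "close enough" (plain Python, taking this thread's figures of ~32 Gbps line rate and ~0.6% framing overhead at face value):

    line_rate_bps = 32e9                      # claimed x1 line rate
    after_coding = line_rate_bps * 128 / 130  # 128b/130b line coding
    usable_bps = after_coding * (1 - 0.006)   # ~0.6% framing overhead
    needed_bps = 6 * 5e9                      # six saturated USB 3 Gen 1 links

    print(f"usable PCIe throughput: {usable_bps / 1e9:.1f} Gbps")
    print(f"needed for 6x USB 3:    {needed_bps / 1e9:.1f} Gbps")
    # ~31.3 Gbps vs 30 Gbps -- tight, but it fits on paper (CPU permitting)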
Re: (Score:2)
It's only a single lane of PCIe, so the ~32 gigabit total throughput is probably not realistically enough for 6 links at 5 gigabits each
Should be close enough. PCIe overhead is ~0.6% for the framing, and it's 128b/130b coding.
I was factoring IRQ latency and the CPU overhead of interrupt handling into my estimate, and assuming that some non-trivial fraction of the time would not be spent actively pulling data off the bus. So if you really want to saturate six downstream links, the upstream bus would likely have to markedly exceed their total throughput, even with hardware DMA pushing data from the device side. But I guess it depends on what the devices on the bus are, whether the traffic is isochronous or just async, etc.
Re: (Score:2)
You obviously have not been watching Murderbot. The bandwidth is so the bot can watch The Rise and Fall of Sanctuary Moon after it hacks its control unit.
Re: (Score:2)
Were it so easy.
Thanks for the laugh - I wish I had mod points.
TermOS 0.0001 (Score:4, Funny)
In the future this will be known as Terminator OS 0.0001
amazingly accurate (Score:3)
If you actually look at the demonstration videos, you will see they are very impressive. A couple of bot arms can respond to voice commands and perform complex operations on objects on a table. I'd like to experiment with their SDK, but the hardware would be expensive.
Re: (Score:2)
If so (I haven't watched the videos), then that is the first use of Gemini that is actually accurate. The search results are shit every time.
Re: (Score:2)
>> not watching the videos
Willfully uninformed.
Re: (Score:2)
Willfully uninformed.
Willfully not consuming low-effort AI bullshit
Re: (Score:2)
You don't know anything at all about it, but nevertheless have plenty of opinions.
are they sure it can run locally (Score:2)