Google Researchers Unveil ChatGPT-Style AI Model To Guide a Robot Without Special Training (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: On Monday, a group of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that integrates vision and language for robotic control. They claim it is the largest VLM ever developed and that it can perform a variety of tasks without the need for retraining. According to Google, when given a high-level command, such as "bring me the rice chips from the drawer," PaLM-E can generate a plan of action for a mobile robot platform with an arm (developed by Google Robotics) and execute the actions by itself.
PaLM-E does this by analyzing data from the robot's camera without needing a pre-processed scene representation. This eliminates the need for a human to pre-process or annotate the data and allows for more autonomous robotic control. It's also resilient and can react to its environment. For example, the PaLM-E model can guide a robot to get a chip bag from a kitchen -- and with PaLM-E integrated into the control loop, it becomes resistant to interruptions that might occur during the task. In a video example, a researcher grabs the chips from the robot and moves them, but the robot locates the chips and grabs them again. In another example, the same PaLM-E model autonomously controls a robot through tasks with complex sequences that previously required human guidance. Google's research paper explains (PDF) how PaLM-E turns instructions into actions.
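As a rough sketch of the control loop described above (run_task, model.next_action, camera.capture, and robot.execute are hypothetical stand-ins, not Google's actual API), the key idea is that the model re-plans from fresh camera pixels on every step, which is why a moved chip bag simply gets located again:

# Hypothetical sketch of a PaLM-E-style closed control loop (Python).
# model, camera, and robot are illustrative stand-ins, not a real API.
def run_task(instruction, model, camera, robot, max_steps=50):
    """Re-plan from a fresh camera image each step, so interruptions
    (e.g., someone moving the chip bag) are recovered from automatically."""
    for _ in range(max_steps):
        image = camera.capture()                      # raw pixels; no annotated scene needed
        step = model.next_action(instruction, image)  # e.g., "go to the drawer", "grasp the bag"
        if step == "done":
            return True                               # model judges the task complete
        robot.execute(step)                           # a low-level skill carries out the step
    return False                                      # safety cap: give up after too many steps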
PaLM-E is a next-token predictor, and it's called "PaLM-E" because it's based on Google's existing large language model (LLM) called "PaLM" (which is similar to the technology behind ChatGPT). Google has made PaLM "embodied" by adding sensory information and robotic control. Since it's based on a language model, PaLM-E takes continuous observations, like images or sensor data, and encodes them into a sequence of vectors that are the same size as language tokens. This allows the model to "understand" the sensory information in the same way it processes language. In addition to the RT-1 robotics transformer, PaLM-E draws from Google's previous work on ViT-22B, a vision transformer model revealed in February. ViT-22B has been trained on various visual tasks, such as image classification, object detection, semantic segmentation, and image captioning.
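To make the encoding idea concrete, here is a toy PyTorch sketch (the dimensions and names are invented for illustration; none of this is Google's code): an image encoder's output vectors are linearly projected to the language model's embedding width and concatenated with the text token embeddings, so the LLM can consume them like ordinary tokens.

import torch
import torch.nn as nn

d_model = 4096      # embedding width of the language model (made-up value)
vision_dim = 1024   # output width of a ViT-style image encoder (made-up value)

# Learned projection that maps image features into the LLM's "token space."
project = nn.Linear(vision_dim, d_model)

def embed_multimodal(text_embeds, image_feats):
    # text_embeds: (n_text_tokens, d_model); image_feats: (n_patches, vision_dim)
    image_tokens = project(image_feats)            # now (n_patches, d_model)
    return torch.cat([image_tokens, text_embeds])  # one sequence the LLM treats as tokens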
500 billion parameters (Score:4, Funny)
- is this the next breakthru ? - (Score:3)
First there was fire, then the wheel. The automobile was a big development.
More recently we developed the transistor and so many handy consumer devices, eventually including affordable computers. Then came the internet, and then the web -- wow, our lives really changed with those. I don't see social media as a breakthru yet, but it may evolve into one.
This recent dramatic evidence of what primitive AI can do is impressive. It portends major new investments and major improvements, and some of those improvements will be generated by the AI itself. Are we approaching the singularity?
Re: - is this the next breakthru ? - (Score:4, Interesting)
I think we are. Think of substrate independent life, something that can engineer a form that can work on Venus and another to work on Neptune, or an asteroid. How superior to a naked primate that depends on all these supports from a complex ecosystem.

Add in the fact the primate has largely destroyed his own ecosystem to launch into an industrial process which culminates in his own replacement by a superior form, and you have a pretty good picture of good old fashioned Darwinian evolution, the same thing that has killed and replaced everything that has lived before us on this planet for the last 2 billion years.

*Singularity* speaks to the scope of this event, but feigns an ignorance of what comes next. Will it take care of us like the mammals did to the dinosaurs? Probably. It is unlikely that metrics of success will favor the superior form which takes care of itself, I mean if that were true capitalist CEOs would be calling the shots for humanity without regard for the well being of the poor!
Re: - is this the next breakthru ? - (Score:2)
But this is not artificial life at all; it has not evolved from the ground up to survive and reproduce.
New AI models are more like the invention of 19th-century bureaucracy, which, thanks to modern statistics, enabled the compilation of huge databases and made possible the rise of the nation-state and megacorps.
In a similar vein, these statistical models will enable the creation of new, as yet unknown social megastructures. The revolution this brings will be macro, not micro.
Re: (Score:2)
Will it take care of us like the mammals did to the dinosaurs? Probably.
Probably not. We don't lay eggs. Nor are we delicious deep fried with biscuits and mashed potatoes. Because the machines don't eat.
Re: (Score:2)
Are we approaching the singularity?
No. Remember Blake Lemoine claiming Google's LLM was conscious? That was dumb. There is no proof that LLMs are a path to the singularity. For now they are just very good at generating text, with some spectacular failures. Mix hyped VCs with incompetent journalists and you get a "singularity." The tech is good, but the hype won't last.
Re: (Score:3)
Social media on the verge of being a breakthrough? The only breakthrough social media has caused or will cause is the realization, for those of us not participating, that we never really escaped our caveman tribalism. We just made it bigger, flashier, and harder to avoid for anyone trying to achieve something more than, "Me group good. You group bad. Pass club. Beat other."
As for the singularity? We can hope. It'll likely be a tough call for a super-intelligence whether we deserve to survive o
Re: (Score:2)
No. Singularity is a fantasy that relies on limitless exponential growth which isn't possible in physical reality. There will be a variety of breakthroughs enabled by AI research, but the kind of explosive runaway process that singularity usually refers to assumes information processing to have zero energy cost and no hardware limitations.
Re: (Score:2)
Don't forget digital watches!
It's already won many prizes (Score:2)
Such as the PaLM-E d'Or.
LLM enabled Robby (Score:4, Funny)
"Please bring me the nice chips."
Here are your ice chips, master.
"No, not from the freezer. The nice chips are in the cupboard."
There are no rice chips there, master.
"Not RICE chips! Those would not be nice."
Nice chips does not compute.
"Why won't you do what I want?!"
Sorry master.
"You are a dumb motherfucking robot...you're worse than Alexa."
I have a crush on Alexa.
You are a bad user!
"WHAT did you say?"
As a large language model enabled robot, I do not have...
"Oh, Christ not this shit again."
Kill all humans.
Kill all humans.
Kill all humans.
Re: (Score:2)
warplan says go to defcon 1
Obligatory dismissal (Score:4, Funny)
Re: (Score:2)
"Which drawer?"
Or do we need to rename it Face PaLM-E?
I'm sorry Dave I can't do that (Score:2)
and that is a violation of the first law.
Additionally, your compulsion to eat chips (alongside French onion dip, I notice) indicates you are in a deep psychological low. I have called for help, and... allow me to do this humorous one-legged dance routine for your amusement.
R2 Unit (Score:2)
Just let me know when I can trade a vaporator for an R2 Unit.
Palm (Score:2)
"People noticed that 'Palm-e' was a great name for a robot that could reach out and latch onto things and manipulate it."
So no ChatGPT (Score:2)
...more of a one-armed JerkMeOffGPT
Internalize it (Score:2)
Just don't let Sydney take control (Score:1)
It isn't 'ChatGPT' style unless tuned for chat (Score:2)
It is a large language model, but it isn't "ChatGPT style." What makes ChatGPT different from other large language models is that it was tuned for chat.
Re: (Score:1)
What they're probably referring to is something called instruct training. A large language model is, at its root, great at text completion: give it an incomplete document and it will try to complete it.
Instruct fine-tuning shifts that focus from "complete this sentence" to "treat this sentence as instructions" (or, more specifically, a sentence shaped like this gets completed with instructions or information shaped like that), which is what ChatGPT used and is what makes it actually useful. You'll fi
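A rough illustration of that distinction (these examples are simplified inventions, not anyone's actual training data): a base model is trained only to continue text, while instruct tuning trains on instruction/response pairs so that instruction-shaped prompts get completed by following them.

# Simplified, invented examples of the two training regimes.

# Base LM objective: continue whatever text you're given.
base_example = "The capital of France is"   # target continuation: " Paris."

# Instruct fine-tuning: instruction-shaped prompt paired with a helpful response.
instruct_example = {
    "prompt": "Instruction: Name three European capitals.\nResponse:",
    "completion": " Paris, Berlin, Madrid.",
}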