PaLM (Pathways Language Model), Google Inc., Mountain View, California, USA


Introducing language models to robotics

Aug 16, 2022

In this first episode of Meet a Google Researcher, Drew Calcagno speaks with researchers Sharan Narang and Aakanksha Chowdhery, who designed and built the Pathways Language Model, or PaLM.

PaLM is a new large language model that achieves state-of-the-art performance on challenging language understanding and generation tasks.

In this interview, we learn about the significance of this particular language model, the implications of applying the unique capabilities of language models to robotics, and what that means for the future of technology.

Chapters:

0:00 - Intro
0:40 - Intro of PaLM
0:58 - Why is this research important?
1:55 - Researcher intros
2:24 - What are language models?
3:19 - What makes PaLM special?
4:05 - How do language models work?
5:58 - Where are we now in robotics research?
7:22 - What are the implications of reasoning in this type of model?
8:29 - What are the societal implications of this research?
9:50 - How do you ensure the data in language models is equitable and inclusive?
12:27 - What inspires you?
14:10 - What does the future look like?
 

Google PaLM-E: An Embodied Multimodal Language Model

Apr 14, 2023

PaLM-E is a decoder-only LLM that autoregressively generates textual completions given a prefix or prompt. It combines the strengths of vision models such as ViT with language models such as PaLM: it is an embodied language model built by injecting multimodal information, such as images, into the embedding space of a pre-trained LLM.
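As a rough illustration of that injection step, here is a minimal PyTorch sketch: patch features from a vision encoder are projected into the LLM's token-embedding space and concatenated with the embedded text tokens before a decoder-only LM produces its completion. Every class name, module, and shape below is a stand-in assumption for illustration, not the actual PaLM-E (ViT + PaLM) implementation.

```python
# Minimal sketch of injecting image features into an LLM's embedding space.
# All modules and sizes are toy stand-ins, not PaLM-E's real components.
import torch
import torch.nn as nn

class MultimodalPrefixLM(nn.Module):
    def __init__(self, vision_encoder: nn.Module, llm_decoder: nn.Module,
                 vision_dim: int, llm_dim: int, vocab_size: int):
        super().__init__()
        self.vision_encoder = vision_encoder        # stand-in for a ViT
        self.llm_decoder = llm_decoder              # stand-in for PaLM's decoder stack
        self.token_embed = nn.Embedding(vocab_size, llm_dim)
        # Learned projection mapping vision features into the LLM embedding space.
        self.vision_proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image: torch.Tensor, text_ids: torch.LongTensor) -> torch.Tensor:
        # Vision features: (batch, n_patches, vision_dim) -> (batch, n_patches, llm_dim)
        vision_tokens = self.vision_proj(self.vision_encoder(image))
        # Text tokens: (batch, seq_len) -> (batch, seq_len, llm_dim)
        text_tokens = self.token_embed(text_ids)
        # Here the image tokens simply prefix the text prompt.
        inputs = torch.cat([vision_tokens, text_tokens], dim=1)
        # The decoder autoregressively predicts the textual completion.
        return self.llm_decoder(inputs)

# Illustrative usage with tiny random-weight stand-ins (toy sizes only).
vit_stub = nn.Sequential(nn.Flatten(start_dim=2), nn.Linear(3 * 16 * 16, 256))
llm_stub = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 32000))
model = MultimodalPrefixLM(vit_stub, llm_stub, vision_dim=256, llm_dim=512, vocab_size=32000)

image = torch.randn(1, 4, 3, 16, 16)          # one image split into 4 toy "patches"
text_ids = torch.randint(0, 32000, (1, 8))    # a short toy text prompt
logits = model(image, text_ids)               # shape: (1, 4 + 8, 32000)
```

In PaLM-E the projected image tokens are interleaved with the text at the positions where the images occur in the prompt; the simple image-as-prefix concatenation above is only the simplest case of that scheme.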

PaLM-E is a single general-purpose multimodal language model for embodied reasoning tasks, visual-language tasks, and language tasks. It transfers knowledge from visual-language domains into embodied reasoning – from robot planning in environments with complex dynamics and physical constraints, to answering questions about the observable world.

PaLM-E-562B can perform zero-shot multimodal chain-of-thought reasoning, can tell visually-conditioned jokes given an image, and demonstrates an array of robot-relevant multimodal capabilities including perception, visually-grounded dialogue, and planning. PaLM-E also generalizes zero-shot to multi-image prompts despite being trained only on single-image prompts. It can perform math given an image with textually interleaved handwritten numbers, and it can perform zero-shot question answering on temporally-annotated egocentric vision.

Here is the agenda for this video:

00:00:00 What is PaLM-E?
00:03:34 What is the overall architecture of PaLM-E?
00:06:54 What is the input format for PaLM-E?
00:11:30 How are PaLM-E models trained?
00:16:03 How does PaLM-E perform on Task and Motion Planning (TAMP)?
00:22:27 How does PaLM-E perform on a table-top pushing environment?
00:28:05 How does PaLM-E perform in the mobile manipulation domain?
00:32:45 How does PaLM-E perform on general visual-language tasks and general language tasks?
 