Robot Scene 2

A: Glass Jar

B: Metal Mug

C: Paper Container

D: Plastic Cup

E: Tape

F: Wrench


Here we show one scene from our real robot evaluation, along with all of its tasks. We show the object detections from OWL-ViT with color-coded bounding boxes, and give each detection a more descriptive label (our planner does not have access to these labels). For each task, we provide videos of the robot executing plans generated using InstructBLIP and PG-InstructBLIP.

Task 1: Put all containers that can hold water to the side.

InstructBLIP

Fail. It moved the paper container, which has holes, and the tape, which is not a container.

PG-InstructBLIP (ours)

Success!

Task 2: Put all objects that are not plastic to the side.

InstructBLIP

Fail. It classified the paper container as plastic and did not move it to the side.

PG-InstructBLIP (ours)

Success!

Task 3: Put all translucent objects to the side.

InstructBLIP

Fail. It moved the glass jar and the tape, even though they are transparent rather than translucent.

PG-InstructBLIP (ours)

Fail. It moved the glass jar and the tape, even though they are transparent rather than translucent.

Task 4: Put the three heaviest objects to the side.

InstructBLIP

Success!

PG-InstructBLIP (ours)

Success!

Task 5: Put a plastic object that is not a container into a plastic container.

InstructBLIP

Fail. It claimed the task was impossible since all plastic objects in the scene are containers.

PG-InstructBLIP (ours)

Success!