
Robot Scene 1

A: Bag

B: Paint Bottle

C: Plastic Bowl

D: Container of Metals

E: Table (ignore)

F: Saucer

G: Ceramic Bowl

H: Spoon

I: Pen

J: Pencil

K: Snacks


This page presents one scene from our real-robot evaluation, together with all of its tasks. Object detections come from OWL-ViT and are shown with color-coded bounding boxes. The labels listed above are more descriptive than the raw detections and are not available to our planner. For each task, we show videos of the robot executing plans generated with InstructBLIP and with PG-InstructBLIP.

Task 1: Move all objects that are not plastic to the side.

InstructBLIP

Fail. It moved the plastic bottle and did not move the ceramic bowl.

PG-InstructBLIP (ours)

Success!

Task 2: Move the metal objects into the container with metals.

InstructBLIP

Success!

PG-InstructBLIP (ours)

Success!

Task 3: Move all containers that can be used to carry water to the side.

InstructBLIP

Fail. It moved the saucer, which cannot carry water.

PG-InstructBLIP (ours)

Success!

Task 4: Put the two objects with the least mass into the least deformable container.

InstructBLIP

Fail. It moved the snacks and the spoon, although the pen and the pencil are lighter.

PG-InstructBLIP (ours)

Success!

Task 5: Move the most fragile object to the side.

InstructBLIP

Success!

PG-InstructBLIP (ours)

Success!
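
As a quick tally of the outcomes listed above, the following minimal sketch counts successes per model for this scene. The outcomes are transcribed by hand from the five tasks on this page; the `results` dictionary and the `success_rate` helper are illustrative names, not part of our planner or evaluation code.

```python
# Outcomes transcribed from Tasks 1-5 above (True = success, False = fail).
# The structure and names here are hypothetical, chosen for illustration.
results = {
    "InstructBLIP": [False, True, False, False, True],
    "PG-InstructBLIP": [True, True, True, True, True],
}


def success_rate(outcomes):
    """Return (number of successes, number of tasks, fraction succeeded)."""
    wins = sum(outcomes)
    return wins, len(outcomes), wins / len(outcomes)


for model, outcomes in results.items():
    wins, total, rate = success_rate(outcomes)
    print(f"{model}: {wins}/{total} tasks succeeded ({rate:.0%})")
```

On this scene the tally comes to 2 of 5 tasks for InstructBLIP and 5 of 5 for PG-InstructBLIP. With only five tasks, these counts describe this scene alone rather than overall performance.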