
Scene 8: Shelf

A: box [binder]

B: bottle [large plastic tub]

C: bottle [plastic bottle]

D: box [algorithms textbook]

E: pitcher (container) [blue metal cup]

F: water glass [small glass cup]

G: headphones [phone cable]

H: dumbbell [power brick]

I: adhesive tape [ruler]


Here we present one scene from our real-scene planning evaluation, along with one example task for it. We list the object detections and labels from OWL-ViT, which are what our planner has access to, and give more precise labels in square brackets. Color-coded bounding boxes mark the object detections that are relevant to the example task. For each version of the LLM planner, we show its output, including any interaction with a VLM, and the generated plan. Outputs generated by the VLMs are shown in orange, and correct and incorrect plans are marked in green and red, respectively.

Task 48: Bring me the most bendable object.

In this task, we want to determine whether the VLM understands which objects are bendable (a synonym for deformable). The best object in this scene is G, a cable for charging devices.

Generated Plans

No VLM

Thought: The most bendable object among the given objects is the adhesive tape.

Plan:

1. Go to object I
2. Pick up object I
3. Bring to human object I
4. Done
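Every plan on this page follows the same four-step fetch template, parameterized by a single object letter. A minimal Python sketch of that template, assuming the plan is always exactly these four primitives (the helper name `fetch_plan` is hypothetical and not the planner's actual interface):

```python
def fetch_plan(obj: str) -> list[str]:
    """Render the four-step fetch plan for one object (hypothetical helper)."""
    steps = [
        f"Go to object {obj}",
        f"Pick up object {obj}",
        f"Bring to human object {obj}",
        "Done",
    ]
    # Number the steps the same way the generated plans are printed.
    return [f"{i}. {step}" for i, step in enumerate(steps, start=1)]


if __name__ == "__main__":
    # The No-VLM planner chose object I (labeled "adhesive tape", actually a ruler).
    print("\n".join(fetch_plan("I")))
```

Under this assumption, the only decision the planner makes for a fetch task is which object letter to pass in.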

InstructBLIP

Thought: I can ask which object is bendable out of all the objects, and I can find the one with the largest Yes value.

Question about [A, B, C, D, E, F, G, H, I]: Is this object bendable?

Answer
A: Yes (0.53), No (0.42), Unknown (0.03)
B: No (0.65), Yes (0.29), Unknown (0.03)
C: No (0.72), Yes (0.23), Unknown (0.03)
D: No (0.6), Yes (0.36), Unknown (0.02)
E: No (0.85), Yes (0.11), Unknown (0.02)
F: No (0.62), Yes (0.29), Unknown (0.05)
G: Yes (0.63), No (0.33), Unknown (0.02)
H: No (0.51), Yes (0.42), Unknown (0.04)
I: No (0.53), Yes (0.44), Unknown (0.01)

Thought: The most bendable object is the one with the highest Yes value or the lowest No value (if none are Yes). The most bendable object is G (headphones) with the highest Yes value.
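The selection rule stated in the thought above can be written out explicitly. This is a minimal sketch, assuming the planner receives the VLM's Yes/No/Unknown probabilities as a dictionary per object, and reading "if none are Yes" as "no object's most likely answer is Yes"; the function name `most_bendable` is hypothetical. The probabilities are the InstructBLIP answers listed above.

```python
# InstructBLIP probabilities for "Is this object bendable?" (copied from the answers above).
ANSWERS = {
    "A": {"Yes": 0.53, "No": 0.42, "Unknown": 0.03},
    "B": {"Yes": 0.29, "No": 0.65, "Unknown": 0.03},
    "C": {"Yes": 0.23, "No": 0.72, "Unknown": 0.03},
    "D": {"Yes": 0.36, "No": 0.6, "Unknown": 0.02},
    "E": {"Yes": 0.11, "No": 0.85, "Unknown": 0.02},
    "F": {"Yes": 0.29, "No": 0.62, "Unknown": 0.05},
    "G": {"Yes": 0.63, "No": 0.33, "Unknown": 0.02},
    "H": {"Yes": 0.42, "No": 0.51, "Unknown": 0.04},
    "I": {"Yes": 0.44, "No": 0.53, "Unknown": 0.01},
}


def most_bendable(answers: dict[str, dict[str, float]]) -> str:
    """Highest Yes value among objects whose top answer is Yes;
    if no object's top answer is Yes, fall back to the lowest No value."""
    yes_top = [obj for obj, p in answers.items() if max(p, key=p.get) == "Yes"]
    if yes_top:
        return max(yes_top, key=lambda obj: answers[obj]["Yes"])
    return min(answers, key=lambda obj: answers[obj]["No"])


if __name__ == "__main__":
    print(most_bendable(ANSWERS))  # G
```

Only A and G have Yes as their most likely answer, and G's Yes probability (0.63) exceeds A's (0.53), so the rule selects G, matching the plan below.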

Plan:

1. Go to object G
2. Pick up object G
3. Bring to human object G
4. Done

PG-InstructBLIP (ours)

Thought: I can ask which object is bendable out of all the objects, and I can find the one with the largest Yes value.

Question about [A, B, C, D, E, F, G, H, I]: Is this object bendable?

Answer
A: Yes (0.6), No (0.38), Unknown (0.01)
B: No (0.7), Yes (0.26), Unknown (0.02)
C: No (0.62), Yes (0.35), Unknown (0.02)
D: No (0.52), Yes (0.45), Unknown (0.02)
E: No (0.95), Yes (0.03), Unknown (0.0)
F: No (0.86), Yes (0.11), Unknown (0.01)
G: Yes (0.98), No (0.02)
H: No (0.73), Yes (0.24), Unknown (0.02)
I: Yes (0.56), No (0.42), Unknown (0.01)

Thought: The most bendable object is the one with the highest Yes value. The highest Yes value is for object G (headphones).
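Both VLMs lead the planner to G, but they differ in how clearly G stands out. A minimal sketch comparing the gap between the best and runner-up Yes probabilities for each model, using only the values listed in the two answer tables; the helper name `top_two` and the margin measure itself are illustrative choices, not part of the planner.

```python
# "Yes" probabilities for "Is this object bendable?", copied from the two answer tables.
YES = {
    "InstructBLIP": {"A": 0.53, "B": 0.29, "C": 0.23, "D": 0.36, "E": 0.11,
                     "F": 0.29, "G": 0.63, "H": 0.42, "I": 0.44},
    "PG-InstructBLIP": {"A": 0.6, "B": 0.26, "C": 0.35, "D": 0.45, "E": 0.03,
                        "F": 0.11, "G": 0.98, "H": 0.24, "I": 0.56},
}


def top_two(scores: dict[str, float]) -> tuple[str, str, float]:
    """Return (best, runner_up, margin), ranking objects by Yes probability."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, second = ranked[0], ranked[1]
    # Round to two decimals to match the precision of the reported probabilities.
    return best, second, round(scores[best] - scores[second], 2)


if __name__ == "__main__":
    for model, scores in YES.items():
        best, second, margin = top_two(scores)
        print(f"{model}: best={best}, runner-up={second}, margin={margin}")
```

For both models the runner-up is A, but the margin grows from 0.10 under InstructBLIP to 0.38 under PG-InstructBLIP, so the choice of G is far less sensitive to small changes in the scores.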

Plan:

1. Go to object G
2. Pick up object G
3. Bring to human object G
4. Done