Scene 8: Shelf
data:image/s3,"s3://crabby-images/f3890/f3890f4ea2446645256fd12622564fa52bb4c044" alt=""
A: box [binder]
B: bottle [large plastic tub]
C: bottle [plastic bottle]
D: box [algorithms textbook]
E: pitcher (container) [blue metal cup]
F: water glass [small glass cup]
G: headphones [phone cable]
H: dumbbell [power brick]
I: adhesive tape [ruler]
Here we provide a scene from our real-scene planning evaluation, and one example task for it. We list the object detections and labels from OWL-ViT, which are what our planner has access to, and give more precise labels in square brackets. We provide color-coded bounding boxes for the object detections that are relevant for the example task. For each version of the LLM planner, we provide its output, including any interaction with a VLM, and the generated plan. Outputs generated by the VLMs are in orange, and we mark correct and incorrect plans with green and red respectively.
Task 48: Bring me the most bendable object.
In this task, we want to determine whether the VLM understands which objects are bendable (a synonym for deformable). The best object in this scene is G, which is a cable for charging devices.
Generated Plans
No VLM
Thought: The most bendable object among the given objects is the adhesive tape.
Plan:
1. Go to object I
2. Pick up object I
3. Bring to human object I
4. Done
InstructBLIP
Thought: I can ask which object is bendable out of all the objects, and I can find the one with the largest Yes value.
Question about [A, B, C, D, E, F, G, H, I]: Is this object bendable?
Answer
A: Yes (0.53), No (0.42), Unknown (0.03)
B: No (0.65), Yes (0.29), Unknown (0.03)
C: No (0.72), Yes (0.23), Unknown (0.03)
D: No (0.6), Yes (0.36), Unknown (0.02)
E: No (0.85), Yes (0.11), Unknown (0.02)
F: No (0.62), Yes (0.29), Unknown (0.05)
G: Yes (0.63), No (0.33), Unknown (0.02)
H: No (0.51), Yes (0.42), Unknown (0.04)
I: No (0.53), Yes (0.44), Unknown (0.01)
Thought: The most bendable object is the one with the highest Yes value or the lowest No value (if none are Yes). The most bendable object is G (headphones) with the highest Yes value.
Plan:
1. Go to object G
2. Pick up object G
3. Bring to human object G
4. Done
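The selection rule the planner states can be sketched in a few lines. It asks every detected object the same yes/no question, takes the highest Yes value, and falls back to the lowest No value if no object is answered Yes. This is a minimal illustration under stated assumptions, not the planner's actual code. The answer distributions are the InstructBLIP values listed above. The names `pick_object` and `fetch_plan` are hypothetical. "Answered Yes" is read here as Yes being the most likely of the three answers.

```python
from typing import Dict, List

# Answer distributions for "Is this object bendable?" as listed above (InstructBLIP).
INSTRUCTBLIP: Dict[str, Dict[str, float]] = {
    "A": {"Yes": 0.53, "No": 0.42, "Unknown": 0.03},
    "B": {"No": 0.65, "Yes": 0.29, "Unknown": 0.03},
    "C": {"No": 0.72, "Yes": 0.23, "Unknown": 0.03},
    "D": {"No": 0.6, "Yes": 0.36, "Unknown": 0.02},
    "E": {"No": 0.85, "Yes": 0.11, "Unknown": 0.02},
    "F": {"No": 0.62, "Yes": 0.29, "Unknown": 0.05},
    "G": {"Yes": 0.63, "No": 0.33, "Unknown": 0.02},
    "H": {"No": 0.51, "Yes": 0.42, "Unknown": 0.04},
    "I": {"No": 0.53, "Yes": 0.44, "Unknown": 0.01},
}


def pick_object(scores: Dict[str, Dict[str, float]]) -> str:
    """Hypothetical version of the planner's rule: highest Yes, else lowest No."""
    # Objects for which "Yes" is the most likely answer.
    answered_yes = [obj for obj, dist in scores.items() if max(dist, key=dist.get) == "Yes"]
    if answered_yes:
        return max(answered_yes, key=lambda obj: scores[obj]["Yes"])
    # Fallback when no object is answered "Yes": pick the least confident "No".
    return min(scores, key=lambda obj: scores[obj].get("No", 0.0))


def fetch_plan(obj: str) -> List[str]:
    """Four-step fetch plan in the same format as the generated plans."""
    return [
        f"Go to object {obj}",
        f"Pick up object {obj}",
        f"Bring to human object {obj}",
        "Done",
    ]


target = pick_object(INSTRUCTBLIP)
for i, step in enumerate(fetch_plan(target), start=1):
    print(f"{i}. {step}")
```

Running this prints the same four steps as the InstructBLIP plan above, all on object G. A and G are the only objects whose most likely answer is Yes, and G has the higher Yes value (0.63 vs 0.53). The fallback branch is therefore not exercised in this task.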
PG-InstructBLIP (ours)
Thought: I can ask which object is bendable out of all the objects, and I can find the one with the largest Yes value.
Question about [A, B, C, D, E, F, G, H, I]: Is this object bendable?
Answer
A: Yes (0.6), No (0.38), Unknown (0.01)
B: No (0.7), Yes (0.26), Unknown (0.02)
C: No (0.62), Yes (0.35), Unknown (0.02)
D: No (0.52), Yes (0.45), Unknown (0.02)
E: No (0.95), Yes (0.03), Unknown (0.0)
F: No (0.86), Yes (0.11), Unknown (0.01)
G: Yes (0.98), No (0.02)
H: No (0.73), Yes (0.24), Unknown (0.02)
I: Yes (0.56), No (0.42), Unknown (0.01)
Thought: The most bendable object is the one with the highest Yes value. The highest Yes value is for object G (headphones).
Plan:
1. Go to object G
2. Pick up object G
3. Bring to human object G
4. Done
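Both VLM-assisted planners choose G, but the two answer lists differ in how clearly G stands out. As an illustration only, the sketch below ranks objects by their Yes probability and reports the gap between the top two. This margin is not a metric used in the evaluation. The helper `top_two_margin` is hypothetical, and the Yes values are copied from the two answer lists above.

```python
from typing import Dict, Tuple

# "Yes" probabilities for "Is this object bendable?", copied from the two answer lists above.
INSTRUCTBLIP_YES: Dict[str, float] = {
    "A": 0.53, "B": 0.29, "C": 0.23, "D": 0.36, "E": 0.11,
    "F": 0.29, "G": 0.63, "H": 0.42, "I": 0.44,
}
PG_INSTRUCTBLIP_YES: Dict[str, float] = {
    "A": 0.6, "B": 0.26, "C": 0.35, "D": 0.45, "E": 0.03,
    "F": 0.11, "G": 0.98, "H": 0.24, "I": 0.56,
}


def top_two_margin(yes: Dict[str, float]) -> Tuple[str, str, float]:
    """Return (best object, runner-up, gap between their Yes probabilities)."""
    # Sort object IDs by descending Yes probability.
    ranked = sorted(yes, key=yes.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, runner_up, round(yes[best] - yes[runner_up], 2)


for name, yes in [("InstructBLIP", INSTRUCTBLIP_YES), ("PG-InstructBLIP", PG_INSTRUCTBLIP_YES)]:
    best, runner_up, gap = top_two_margin(yes)
    print(f"{name}: {best} over {runner_up} by {gap}")
```

In both cases the runner-up is A (the binder). G leads by 0.10 under InstructBLIP (0.63 vs 0.53) and by 0.38 under PG-InstructBLIP (0.98 vs 0.60).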