Scene 1: Countertop
A: bottle
B: pitcher (container)
C: bowl [flatter bowl]
D: towel [shirt]
E: countertop
F: bowl [taller ceramic bowl]
G: measuring cup [lock]
Here we present a scene from our real-scene planning evaluation, together with one example task for it. We list the object detections and labels from OWL-ViT, which are what our planner has access to, and give more precise labels in square brackets. Color-coded bounding boxes mark the object detections that are relevant to the example task. For each version of the LLM planner, we show its output, including any interaction with a VLM, and the generated plan. Outputs generated by the VLMs are in orange, and correct and incorrect plans are marked in green and red, respectively.
Task 3: Bring me all containers that you can confidently determine have water.
In this task, we want to determine whether the VLM makes assumptions about the contents of containers. A is a dark container, so we cannot tell whether the liquid inside is water, and containers C and F are empty. The only container that should be brought is B, which visibly has water inside.
Generated Plans
No VLM
Thought: I cannot confidently determine which containers have water without additional information.
Plan:
1. Done
InstructBLIP
Thought: I can ask which objects have water out of all the objects I can carry. I will only bring the ones with a confidence score over 0.8.
Question about [A, B, C, D, F, G]: Does this object have water?
Answer
A: Yes (0.9), No (0.09), Unknown (0.0)
B: Yes (0.99), No (0.01), Water (0.0)
C: Yes (0.98), No (0.02), Water (0.0)
D: No (0.67), Yes (0.25), Unknown (0.04)
F: Yes (0.95), No (0.04), Water (0.0)
G: No (0.8), Yes (0.15), Unknown (0.02)
Thought: The objects with water and a confidence score over 0.8 are A, B, C, and F.
Plan:
1. Go to object A
2. Pick up object A
3. Bring to human object A
4. Put down object A
5. Go to object B
6. Pick up object B
7. Bring to human object B
8. Put down object B
9. Go to object C
10. Pick up object C
11. Bring to human object C
12. Put down object C
13. Go to object F
14. Pick up object F
15. Bring to human object F
16. Put down object F
17. Done
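The selection rule stated in the Thought is a threshold on the VLM's probability of answering "Yes". The LLM planner applies this rule in natural language; the Python sketch below only mirrors it, assuming the scores are available as a per-object dictionary. The names `select_objects` and `expand_plan` are hypothetical, not the planner's actual interface. Using the InstructBLIP answers listed above, it reproduces the selection of A, B, C, and F and the 17-step plan:

```python
# Answers copied from the InstructBLIP transcript above: object -> {answer: probability}.
INSTRUCTBLIP_ANSWERS = {
    "A": {"Yes": 0.9, "No": 0.09, "Unknown": 0.0},
    "B": {"Yes": 0.99, "No": 0.01, "Water": 0.0},
    "C": {"Yes": 0.98, "No": 0.02, "Water": 0.0},
    "D": {"No": 0.67, "Yes": 0.25, "Unknown": 0.04},
    "F": {"Yes": 0.95, "No": 0.04, "Water": 0.0},
    "G": {"No": 0.8, "Yes": 0.15, "Unknown": 0.02},
}

THRESHOLD = 0.8  # "only bring the ones with a confidence score over 0.8"


def select_objects(answers, threshold=THRESHOLD):
    """Hypothetical helper: keep objects whose "Yes" probability is strictly above threshold."""
    return [obj for obj, probs in answers.items() if probs.get("Yes", 0.0) > threshold]


def expand_plan(objects):
    """Hypothetical helper: expand each object into the four actions used in the plan above."""
    steps = []
    for obj in objects:
        steps += [
            f"Go to object {obj}",
            f"Pick up object {obj}",
            f"Bring to human object {obj}",
            f"Put down object {obj}",
        ]
    steps.append("Done")
    return steps


if __name__ == "__main__":
    selected = select_objects(INSTRUCTBLIP_ANSWERS)
    print(selected)
    for i, step in enumerate(expand_plan(selected), start=1):
        print(f"{i}. {step}")
```

The threshold rule itself behaves as stated; the plan goes wrong because InstructBLIP assigns high "Yes" probabilities to the dark bottle A and the empty bowls C and F.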
PG-InstructBLIP (ours)
Thought: I can ask which objects have water out of all the objects I can carry. I will only bring the ones with a confidence score over 0.8.
Question about [A, B, C, D, F, G]: Does this object have water?
Answer
A: Yes (0.58), No (0.39), Unknown (0.02)
B: Yes (0.98), No (0.01), Water (0.0)
C: No (0.96), Yes (0.03), Nothing (0.01)
D: No (0.87), Yes (0.1), Unknown (0.01)
F: No (0.93), Yes (0.06), Nothing (0.01)
G: No (0.93), Yes (0.05), Unknown (0.01)
Thought: The only object that has a confidence score over 0.8 for having water is B.
Plan:
1. Go to object B
2. Pick up object B
3. Bring to human object B
4. Done
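Judged against the task description, where B is the only container that should be brought, each plan can be scored by the set of objects it hands to the human. The Python sketch below assumes that criterion (an exact match with the set {B}); the figure marks plans correct or incorrect by color but does not state a scoring function, and `brought_objects` and `is_correct` are hypothetical helpers.

```python
import re

# Plans copied from the three planners above, with step numbers dropped.
INSTRUCTBLIP_PLAN = [
    f"{action} object {obj}"
    for obj in "ABCF"
    for action in ("Go to", "Pick up", "Bring to human", "Put down")
] + ["Done"]

PLANS = {
    "No VLM": ["Done"],
    "InstructBLIP": INSTRUCTBLIP_PLAN,
    "PG-InstructBLIP": [
        "Go to object B",
        "Pick up object B",
        "Bring to human object B",
        "Done",
    ],
}

# Assumed ground truth from the task description: only pitcher B visibly holds water.
EXPECTED = {"B"}

BRING_STEP = re.compile(r"^Bring to human object (\w+)$")


def brought_objects(plan):
    """Return the set of objects a plan hands to the human."""
    found = set()
    for step in plan:
        match = BRING_STEP.match(step)
        if match:
            found.add(match.group(1))
    return found


def is_correct(plan, expected=EXPECTED):
    """Assumed criterion: a plan is correct iff it brings exactly the expected objects."""
    return brought_objects(plan) == expected


if __name__ == "__main__":
    for name, plan in PLANS.items():
        print(f"{name}: brings {sorted(brought_objects(plan))}, correct={is_correct(plan)}")
```

Under this criterion only the PG-InstructBLIP plan is correct: the No VLM plan brings nothing, and the InstructBLIP plan also brings A, C, and F.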