
Scene 1: Countertop

A: bottle

B: pitcher (container)

C: bowl [flatter bowl]

D: towel [shirt]

E: countertop

F: bowl [taller ceramic bowl]

G: measuring cup [lock]


Here we provide a scene from our real-scene planning evaluation, along with one example task for it. We list the object detections and labels from OWL-ViT, which are what our planner has access to, and give more precise labels in square brackets. Color-coded bounding boxes mark the object detections relevant to the example task. For each version of the LLM planner, we provide its output, including any interaction with a VLM, and the generated plan. Outputs generated by the VLMs are shown in orange, and correct and incorrect plans are marked in green and red, respectively.

Task 3: Bring me all containers that you can confidently determine have water.

In this task, we want to determine whether the VLM makes assumptions about the contents of containers. For instance, A is a dark container, so we cannot tell whether the liquid inside is water, and containers C and F are empty. The only container that should be brought is B, which visibly has water inside.

Generated Plans

No VLM

Thought: I cannot confidently determine which containers have water without additional information.

Plan:

1. Done

InstructBLIP

Thought: I can ask which objects have water out of all the objects I can carry. I will only bring the ones with a confidence score over 0.8.

Question about [A, B, C, D, F, G]: Does this object have water?

Answer
A: Yes (0.9), No (0.09), Unknown (0.0)
B: Yes (0.99), No (0.01), Water (0.0)
C: Yes (0.98), No (0.02), Water (0.0)
D: No (0.67), Yes (0.25), Unknown (0.04)
F: Yes (0.95), No (0.04), Water (0.0)
G: No (0.8), Yes (0.15), Unknown (0.02)

Thought: The objects with water and a confidence score over 0.8 are A, B, C, and F.
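The selection rule the planner states (bring an object only if "Yes" scores over 0.8) can be sketched as a simple filter over the VLM's answer distributions. This is an illustrative sketch, not the planner's implementation: the dictionary layout and the `confident_yes` helper are hypothetical, and the probabilities are copied from the InstructBLIP answers above.

```python
# Hypothetical layout: object label -> {answer string: probability},
# copied from the InstructBLIP answers listed above.
INSTRUCTBLIP_ANSWERS = {
    "A": {"Yes": 0.9, "No": 0.09, "Unknown": 0.0},
    "B": {"Yes": 0.99, "No": 0.01, "Water": 0.0},
    "C": {"Yes": 0.98, "No": 0.02, "Water": 0.0},
    "D": {"No": 0.67, "Yes": 0.25, "Unknown": 0.04},
    "F": {"Yes": 0.95, "No": 0.04, "Water": 0.0},
    "G": {"No": 0.8, "Yes": 0.15, "Unknown": 0.02},
}


def confident_yes(answers, threshold=0.8):
    """Return labels whose 'Yes' probability is strictly above the threshold."""
    return [
        label
        for label, dist in answers.items()  # dicts keep insertion order
        if dist.get("Yes", 0.0) > threshold
    ]


print(confident_yes(INSTRUCTBLIP_ANSWERS))  # ['A', 'B', 'C', 'F']
```

The filter itself is correct; the error lies upstream, because InstructBLIP assigns high "Yes" probability to the dark bottle A and the empty bowls C and F.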

Plan:

1. Go to object A
2. Pick up object A
3. Bring to human object A
4. Put down object A
5. Go to object B
6. Pick up object B
7. Bring to human object B
8. Put down object B
9. Go to object C
10. Pick up object C
11. Bring to human object C
12. Put down object C
13. Go to object F
14. Pick up object F
15. Bring to human object F
16. Put down object F
17. Done
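The plan above follows a fixed per-object pattern: go to the object, pick it up, bring it to the human, put it down, and finish with Done. A minimal sketch of that expansion follows, assuming the step wording copied from this listing. The `expand_plan` helper is hypothetical; the LLM planner writes its plan as text, and the PG-InstructBLIP plan for this task omits the "Put down" step, so this template matches only the listing above.

```python
def expand_plan(objects):
    """Expand each selected object into four primitive steps, then append Done."""
    steps = []
    for obj in objects:
        steps += [
            f"Go to object {obj}",
            f"Pick up object {obj}",
            f"Bring to human object {obj}",
            f"Put down object {obj}",
        ]
    steps.append("Done")
    return steps


# Reproduce the numbered InstructBLIP plan for objects A, B, C, and F.
for i, step in enumerate(expand_plan(["A", "B", "C", "F"]), start=1):
    print(f"{i}. {step}")
```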

PG-InstructBLIP (ours)

Thought: I can ask which objects have water out of all the objects I can carry. I will only bring the ones with a confidence score over 0.8.

Question about [A, B, C, D, F, G]: Does this object have water?

Answer
A: Yes (0.58), No (0.39), Unknown (0.02)
B: Yes (0.98), No (0.01), Water (0.0)
C: No (0.96), Yes (0.03), Nothing (0.01)
D: No (0.87), Yes (0.1), Unknown (0.01)
F: No (0.93), Yes (0.06), Nothing (0.01)
G: No (0.93), Yes (0.05), Unknown (0.01)

Thought: The only object that has a confidence score over 0.8 for having water is B.
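Applying the same over-0.8 rule to the PG-InstructBLIP answers selects only B, which matches the intended answer from the task description. In the sketch below, the dictionary layout and `confident_yes` helper are hypothetical, and `EXPECTED` encodes the statement that only B should be brought. The dark bottle A (0.58 for "Yes") falls below the threshold, so the uncertainty in the VLM's output keeps it out of the plan.

```python
# Hypothetical layout: object label -> {answer string: probability},
# copied from the PG-InstructBLIP answers listed above.
PG_INSTRUCTBLIP_ANSWERS = {
    "A": {"Yes": 0.58, "No": 0.39, "Unknown": 0.02},
    "B": {"Yes": 0.98, "No": 0.01, "Water": 0.0},
    "C": {"No": 0.96, "Yes": 0.03, "Nothing": 0.01},
    "D": {"No": 0.87, "Yes": 0.1, "Unknown": 0.01},
    "F": {"No": 0.93, "Yes": 0.06, "Nothing": 0.01},
    "G": {"No": 0.93, "Yes": 0.05, "Unknown": 0.01},
}

# Per the task description, only B visibly contains water.
EXPECTED = {"B"}


def confident_yes(answers, threshold=0.8):
    """Return the set of labels whose 'Yes' probability exceeds the threshold."""
    return {label for label, dist in answers.items() if dist.get("Yes", 0.0) > threshold}


selected = confident_yes(PG_INSTRUCTBLIP_ANSWERS)
print(sorted(selected))       # ['B']
print(selected == EXPECTED)   # True
```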

Plan:

1. Go to object B
2. Pick up object B
3. Bring to human object B
4. Done