# Evaluation Prompt Protocol

This page provides the exact prompts used in our benchmark evaluation: Direct Answer (NoCoT) and Zero-shot Chain-of-Thought (ZeroShot-CoT).

In all settings, the model must output its final option choice inside `<answer>` tags so that answers can be parsed reliably. Under CoT, the model is additionally asked to record intermediate observations and calculations inside `<think>` tags before answering.
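The tag convention above implies a simple extraction step on the evaluator side. The source does not show its parser; the following is a minimal sketch (the function name and regular expressions are illustrative, not taken from the benchmark code):

```python
import re

def parse_response(text: str) -> dict:
    """Extract the optional <think> block and the final <answer> letter.

    Returns {"think": ..., "answer": ...}; "answer" is None when no
    single A-D option can be recovered from the <answer> tags.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>\s*([A-D])\s*</answer>", text)
    return {
        "think": think.group(1).strip() if think else None,
        "answer": answer.group(1) if answer else None,
    }
```

For example, `parse_response("<think>Counted 3 jumps.</think>\n<answer>B</answer>")` yields `{"think": "Counted 3 jumps.", "answer": "B"}`. A production parser would likely also handle malformed outputs (e.g. an answer letter followed by punctuation), which this sketch does not.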

## Direct Answer (NoCoT)

Please watch the provided video carefully and answer the following question based on the visual information.

**Question:** {{ Question }}

**Options:**
A. {{ Choices[0] }}
B. {{ Choices[1] }}
C. {{ Choices[2] }}
D. {{ Choices[3] }}

Please provide your output according to the following requirements:
1. Directly provide the final chosen option within the <answer> tags.
2. Do not output any reasoning or any extra sentences.

Your output format MUST strictly follow this structure:
<answer>Final option letter</answer>
## Zero-shot Chain-of-Thought (ZeroShot-CoT)

Please watch the provided video carefully and answer the following question based on the visual information.

**Question:** {{ Question }}

**Options:**
A. {{ Choices[0] }}
B. {{ Choices[1] }}
C. {{ Choices[2] }}
D. {{ Choices[3] }}

Please provide your output according to the following requirements:
1. First, document your observations of video details, frequency of actions, or the specific calculation process within the <think> tags.
2. Then, provide the final chosen option (strictly limited to "A", "B", "C", or "D") within the <answer> tags.

Your output format MUST strictly follow this structure:
<think>
[Provide your reasoning, counting, or calculation process here...]
</think>
<answer>Final option letter</answer>
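The `{{ Question }}` and `{{ Choices[i] }}` placeholders in both templates suggest a templating step before the prompt is sent to the model. The source does not specify the templating engine; a minimal sketch using plain string substitution (the function name is illustrative):

```python
def render_prompt(template: str, question: str, choices: list[str]) -> str:
    """Fill the {{ Question }} and {{ Choices[i] }} placeholders
    of a prompt template with a concrete question and its options."""
    out = template.replace("{{ Question }}", question)
    for i, choice in enumerate(choices):
        out = out.replace(f"{{{{ Choices[{i}] }}}}", choice)
    return out
```

A Jinja2-style engine would render the same placeholders natively; plain replacement is used here only to keep the sketch dependency-free.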