Abstract: While text-to-video (T2V) generative models produce exceptionally realistic videos, they lack a comprehensive evaluation across the temporal dimension, with a limited focus on basic dynamics ...