Better exploiting motion for better action recognition

CVPR 2013
Mihir Jain, Hervé Jegou and Patrick Bouthemy
INRIA - Rennes


In our paper, we present the following contributions for action recognition:
  • Motion compensation: We decompose visual motion into dominant and residual motions.
    • We estimate the dominant motion by an affine motion model, which is a good trade-off between accuracy and efficiency.
    • The residual motion field, which we call w-flow, is obtained by canceling the dominant motion (predominantly camera motion) and is more closely related to the actions.
    • The w-flow is used both to extract space-time trajectories and to compute descriptors.
    • Here is an example of trajectories obtained by using optical flow, affine flow and w-flow:

    [Figure, three panels: trajectories from optical flow | trajectories from affine flow | trajectories from w-flow]

  • Kinematic features: We propose a motion descriptor based on differential scalar quantities of the motion field: divergence, curl and shear. We call it the DCS (divergence-curl-shear) descriptor.

  • VLAD in actions: The VLAD coding technique, originally proposed for image retrieval, provides a substantial improvement for action recognition.
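
To make the motion compensation step above concrete, here is a minimal sketch of how a residual field can be obtained by subtracting an affine flow from a dense optical flow field. It assumes the six affine parameters are already known; how they are estimated is not covered here. The names `affine_flow`, `w_flow` and the parameter layout `(c1, a1, a2, c2, a3, a4)` are illustrative choices, not taken from the paper's code.

```python
def affine_flow(params, x, y):
    """Dominant (affine) motion at pixel (x, y).

    Assumed parameter layout (hypothetical, for illustration only):
        u = c1 + a1*x + a2*y
        v = c2 + a3*x + a4*y
    """
    c1, a1, a2, c2, a3, a4 = params
    return (c1 + a1 * x + a2 * y, c2 + a3 * x + a4 * y)


def w_flow(flow, params):
    """Subtract the affine flow from a dense flow field.

    flow[y][x] is a (u, v) tuple; the result has the same layout.
    """
    residual = []
    for y, row in enumerate(flow):
        out_row = []
        for x, (u, v) in enumerate(row):
            ua, va = affine_flow(params, x, y)
            out_row.append((u - ua, v - va))
        residual.append(out_row)
    return residual


# Synthetic example: a global pan-and-zoom motion over a 4x3 grid,
# with one pixel that additionally moves on its own.
params = (1.0, 0.5, 0.0, -2.0, 0.0, 0.5)
flow = [[affine_flow(params, x, y) for x in range(4)] for y in range(3)]
u, v = flow[1][2]
flow[1][2] = (u + 3.0, v - 1.0)  # local motion on top of the global motion

residual = w_flow(flow, params)
```

In this toy example the residual is zero everywhere except at the independently moving pixel, which is the intuition behind using the compensated field to focus on the action rather than on the camera.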

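The kinematic quantities named in the DCS contribution above can be illustrated per pixel with the standard first-order definitions: divergence and curl from the flow derivatives, and shear as the magnitude of the two hyperbolic terms. The sketch below is an assumption-laden illustration: it uses central finite differences on a small grid, the function name `kinematic_features` is hypothetical, and it does not show how the values are aggregated into the final descriptor.

```python
import math


def kinematic_features(flow, x, y):
    """Divergence, curl and shear of a flow field at an interior pixel (x, y).

    flow[y][x] is a (u, v) tuple. Derivatives use central differences,
    so (x, y) must not lie on the border of the grid.
    """
    du_dx = (flow[y][x + 1][0] - flow[y][x - 1][0]) / 2.0
    du_dy = (flow[y + 1][x][0] - flow[y - 1][x][0]) / 2.0
    dv_dx = (flow[y][x + 1][1] - flow[y][x - 1][1]) / 2.0
    dv_dy = (flow[y + 1][x][1] - flow[y - 1][x][1]) / 2.0

    div = du_dx + dv_dy            # local expansion or contraction
    curl = dv_dx - du_dy           # local rotation
    hyp1 = du_dx - dv_dy           # first hyperbolic term
    hyp2 = du_dy + dv_dx           # second hyperbolic term
    shear = math.sqrt(hyp1 ** 2 + hyp2 ** 2)
    return div, curl, shear


# Linear test field: u = 2x + 3y, v = -x + 4y, so all derivatives are constant.
flow = [[(2.0 * x + 3.0 * y, -1.0 * x + 4.0 * y) for x in range(3)]
        for y in range(3)]
div, curl, shear = kinematic_features(flow, 1, 1)
```

For this linear field the central differences are exact, giving a divergence of 6, a curl of -4 and a shear of sqrt(8).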

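For readers unfamiliar with VLAD, mentioned in the last contribution above, the following is a minimal sketch of the basic encoding: each local descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated sums are L2-normalized. The codebook here is a hypothetical two-centroid toy (in practice it would typically be learned, for example with k-means), and any further normalization steps are omitted.

```python
import math


def vlad(descriptors, centroids):
    """Encode a set of local descriptors as one VLAD vector."""
    dim = len(centroids[0])
    acc = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared L2 distance).
        k = min(range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(d, centroids[i])))
        # Accumulate the residual d - c_k for that centroid.
        for j in range(dim):
            acc[k][j] += d[j] - centroids[k][j]
    # Concatenate the per-centroid sums and L2-normalize the result.
    v = [x for row in acc for x in row]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


# Toy codebook and descriptors (illustrative values only).
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (2.0, 0.0), (10.0, 14.0)]
code = vlad(descriptors, centroids)  # approximately [0.6, 0.0, 0.0, 0.8]
```

The output length is the number of centroids times the descriptor dimension, so a video with any number of local descriptors maps to a fixed-size vector.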

Hollywood 2 (mAP)
  Ullah et al. (BMVC'10)        55.7%
  Wang et al. (CVPR'11)         58.3%
  Vig et al. (ECCV'12)          60.0%
  Jiang et al. (ECCV'12)        59.5%
  Ours                          62.5%

HMDB51 (avg. accuracy)
  Kuehne et al. (ICCV'11)       22.8%
  Sadanand et al. (CVPR'12)     26.9%
  Orit et al. (ECCV'12)         29.2%
  Jiang et al. (ECCV'12)        40.7%
  Ours                          52.1%

Olympic Sports (mAP)
  Niebles et al. (ECCV'10)      72.1%
  Liu et al. (CVPR'11)          74.4%
  Jiang et al. (ECCV'12)        80.6%
  Ours                          83.2%


Inquiries should be addressed to mihir DOT jain AT inria DOT fr