This paper presents an effective local spatio-temporal descriptor for action recognition from depth video sequences. The distinctive property of our descriptor is that it accounts for both shape discrimination and action speed variations, aiming to distinguish different pose shapes and to recognize actions performed at different speeds within a single framework. The algorithm proceeds in three stages. In the first stage, a depth sequence is divided into temporally overlapping depth segments, each of which is used to generate three depth motion maps (DMMs) that capture shape and motion cues. To cope with speed variations across actions, segments of multiple frame lengths are used, yielding a multi-temporal DMM representation. In the second stage, each DMM is partitioned into dense patches, and the local binary patterns (LBP) descriptor is applied to characterize rotation-invariant local texture within those patches. In the third stage, the Fisher kernel is employed to encode the patch descriptors into a compact feature representation, which is fed into a kernel-based extreme learning machine (ELM) classifier. Extensive experiments on the public MSRAction3D, MSRGesture3D and DHA datasets show that the proposed method outperforms state-of-the-art approaches to depth-based action recognition.
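The first stage described above can be illustrated with a minimal sketch. The code below computes a DMM for a depth segment by accumulating absolute inter-frame differences (here only for the front projection; the paper uses three projection views), and builds the multi-temporal representation by sliding windows of several frame lengths over the sequence. The segment lengths, stride, and difference threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def compute_dmm(frames, threshold=0.0):
    """Depth motion map for one segment: accumulate absolute
    inter-frame differences, keeping only changes above a small
    threshold (threshold value is an assumption)."""
    dmm = np.zeros_like(frames[0], dtype=np.float64)
    for prev, curr in zip(frames, frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        dmm += np.where(diff > threshold, diff, 0.0)
    return dmm

def multi_temporal_dmms(frames, lengths=(8, 16, 32), stride=4):
    """Slide temporally overlapping windows of several frame lengths
    over the sequence and compute one DMM per segment; the set of
    lengths and the stride are hypothetical choices."""
    dmms = []
    n = len(frames)
    for seg_len in lengths:
        for start in range(0, max(n - seg_len + 1, 1), stride):
            segment = frames[start:start + seg_len]
            if len(segment) >= 2:  # need at least one frame difference
                dmms.append(compute_dmm(segment))
    return dmms

# Usage with synthetic depth frames standing in for a real depth sequence.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 255, (48, 64)).astype(np.uint8) for _ in range(40)]
dmms = multi_temporal_dmms(frames)
print(len(dmms), dmms[0].shape)
```

Each resulting DMM would then be partitioned into dense patches and described with LBP before Fisher-kernel encoding, as outlined in the second and third stages.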
Published in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, 2016.