
Sum-Product Networks for Modeling Activities with Stochastic Structure (CVPR 2012)

Posted April 17, 2012

This paper addresses recognition of human activities with stochastic structure, characterized by variable spacetime arrangements of primitive actions and conducted by a variable number of actors. We demonstrate that modeling aggregate counts of visual words is surprisingly expressive for such a challenging recognition task. An activity is represented by a sum-product network (SPN). An SPN is a mixture of bags-of-words (BoWs) with exponentially many mixture components, where subcomponents are reused by larger ones. An SPN consists of terminal nodes representing BoWs, and product and sum nodes organized in a number of layers. The products encode particular configurations of primitive actions, and the sums capture their alternative configurations. The connectivity of the SPN and the parameters of the BoW distributions are learned under weak supervision using the EM algorithm. SPN inference amounts to parsing the SPN graph, which yields the most probable explanation (MPE) of the video in terms of activity detection and localization. Under fairly general conditions, SPN inference has complexity linear in the number of nodes, enabling fast and scalable recognition. A new Volleyball dataset is compiled and annotated for evaluation. Our classification accuracy and localization precision and recall are superior to those of the state of the art on the benchmark and our Volleyball datasets.

Paper | Poster | Code | Dataset
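The two inference ideas in the abstract (bottom-up evaluation of product and sum nodes over BoW terminals, and MPE inference by parsing the graph) can be illustrated with a minimal sketch. This is not the authors' released code. The names `Leaf`, `Product`, `Sum`, `evaluate` and `mpe_parse` are hypothetical, and so is the assumption that each terminal is a multinomial over the visual-word vocabulary for one spacetime region. Regions are reduced to integer indices into a list of word-count histograms, and learning the structure and parameters with EM is omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union

Hist = Sequence[int]  # counts of each visual word in one spacetime region


@dataclass(eq=False)
class Leaf:
    # Hypothetical BoW terminal: a multinomial over visual words for one region.
    region: int
    word_probs: List[float]


@dataclass(eq=False)
class Product:
    # One configuration of primitive actions: children cover disjoint regions.
    children: list


@dataclass(eq=False)
class Sum:
    # Alternative configurations, mixed with normalized weights.
    children: list
    weights: List[float]


Node = Union[Leaf, Product, Sum]


def leaf_logp(leaf: Leaf, hists: List[Hist]) -> float:
    # Multinomial log-likelihood, dropping the count-only coefficient.
    h = hists[leaf.region]
    return sum(c * math.log(p) for c, p in zip(h, leaf.word_probs) if c > 0)


def evaluate(node: Node, hists: List[Hist], use_max: bool,
             cache: Dict[int, float]) -> float:
    """Bottom-up pass. Memoizing by node identity means a subcomponent shared
    by several parents is computed once, so cost is linear in graph size.
    Pass a fresh cache for each pass (sum pass vs. max pass)."""
    key = id(node)
    if key in cache:
        return cache[key]
    if isinstance(node, Leaf):
        val = leaf_logp(node, hists)
    elif isinstance(node, Product):
        val = sum(evaluate(c, hists, use_max, cache) for c in node.children)
    else:
        terms = [math.log(w) + evaluate(c, hists, use_max, cache)
                 for w, c in zip(node.weights, node.children)]
        m = max(terms)
        # Max-product for MPE, log-sum-exp for the full mixture likelihood.
        val = m if use_max else m + math.log(sum(math.exp(t - m) for t in terms))
    cache[key] = val
    return val


def mpe_parse(node: Node, cache: Dict[int, float]) -> List[Leaf]:
    """Top-down backtracking after a max pass: keep all children of a product,
    and only the best-scoring child of a sum."""
    if isinstance(node, Leaf):
        return [node]
    if isinstance(node, Product):
        return [leaf for c in node.children for leaf in mpe_parse(c, cache)]
    _, best_child = max(zip(node.weights, node.children),
                        key=lambda wc: math.log(wc[0]) + cache[id(wc[1])])
    return mpe_parse(best_child, cache)


if __name__ == "__main__":
    # Toy SPN: two regions, two visual words, two alternative configurations.
    a0, b0 = Leaf(0, [0.9, 0.1]), Leaf(0, [0.1, 0.9])
    a1, b1 = Leaf(1, [0.9, 0.1]), Leaf(1, [0.1, 0.9])
    root = Sum([Product([a0, b1]), Product([b0, a1])], [0.5, 0.5])
    hists = [[5, 0], [0, 5]]
    max_cache: Dict[int, float] = {}
    score = evaluate(root, hists, use_max=True, cache=max_cache)
    parse = mpe_parse(root, max_cache)
    print(score, [(leaf.region, leaf.word_probs) for leaf in parse])
```

Replacing log-sum-exp with a max at the sum nodes gives the MPE score. The top-down pass then recovers which configuration, and therefore which terminals and regions, explains the input. In the toy example, the parse selects the product whose leaf distributions match the observed histograms.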
