Archive for Dissertation

This dissertation addresses the problem of recognizing human activities in videos. Our focus is on activities with stochastic structure, which are characterized by variable space-time arrangements of actions and are conducted by a variable number of actors. These activities occur frequently in sports and surveillance videos. They may appear jointly in multiple instances, at different spatial and temporal scales, under occlusion, and amidst background clutter. These challenges have never been addressed in the literature. Our hypothesis is that they can be successfully addressed using expressive, hierarchical models that explicitly encode activity parts and their spatio-temporal relations. We formalize this hypothesis using two novel paradigms. The first specifies a new, constrained hierarchical model of activities that allows efficient activity recognition. Specifically, we formulate Sum-Product Networks (SPNs) for modeling activities and develop two new learning algorithms based on variational learning. The second paradigm considers a more expressive (unconstrained) hierarchical model, And-Or Graphs (AOGs), which requires cost-efficient algorithms for activity recognition. In particular, we develop a new inference algorithm for AOGs based on Monte Carlo Tree Search. Our theoretical and empirical studies demonstrate the advantages of each paradigm over the state of the art.
