Author Archive
Cost-Sensitive Top-down/Bottom-up Inference for Multiscale Activity Recognition (ECCV 2012)
Posted by: amerm | August 8, 2012 | No Comment | This paper addresses a new problem, that of multiscale activity recognition. Our goal is to detect and localize a wide range of activities, including individual actions and group activities, which may simultaneously co-occur in high resolution video. The video resolution allows for digital zoom-in (or zoom-out) for examining fine details (or coarser scales), as needed [...]
Sum-Product Networks for Modeling Activities with Stochastic Structure (CVPR 2012)
Posted by: amerm | April 17, 2012 | No Comment | This paper addresses recognition of human activities with stochastic structure, characterized by variable spacetime arrangements of primitive actions, and conducted by a variable number of actors. We demonstrate that modeling aggregate counts of visual words is surprisingly expressive enough for such a challenging recognition task. An activity is represented by a sum-product network (SPN). SPN is a mixture of bags-of-words (BoWs) with [...]
Fine-grained Categorization of Fish Motion Patterns in Underwater Videos (ICCV 2011)
Posted by: amerm | December 10, 2011 | 1 Comment | Marine biologists commonly use underwater videos for their research on studying the behaviors of sea organisms. Their video analysis, however, is typically based on visual inspection. This incurs prohibitively large user costs, and severely limits the scope of biological studies. There is a need for developing vision algorithms that can address specific needs of marine biologists, such as fine-grained categorization of fish [...]
PEL-CNF: Probabilistic Event Logic Conjunctive Normal Form for Video Interpretation (ICCV 2011)
Posted by: amerm | December 10, 2011 | 1 Comment | This is a theoretical paper that proves that probabilistic event logic (PEL) is MAP-equivalent to its conjunctive normal form (PEL-CNF). This allows us to address the NP-hard MAP inference for PEL in a principled manner. We first map the confidence-weighted formulas from a PEL knowledge base to PEL-CNF, and then conduct MAP inference for PEL-CNF using stochastic local search. Our [...]
A Chains Model for Localizing Participants of Group Activities in Videos (ICCV 2011)
Posted by: amerm | August 10, 2011 | No Comment | Given a video, we would like to recognize group activities, localize video parts where these activities occur, and detect actors involved in them. This advances prior work that typically focuses only on video classification. We make a number of contributions. First, we specify a new, midlevel, video feature aimed at summarizing local visual cues into bags of the right detections (BORDs). [...]
Multiobject Tracking as Maximum-Weight Independent Set (CVPR 2011)
Posted by: amerm | March 24, 2011 | Comments Off | This paper addresses the problem of simultaneous tracking of multiple targets representing occurrences of distinct object classes in complex scenes. We apply object detectors to every frame, and build a graph of tracklets, defined as pairs of detection responses from every two consecutive frames. The graph helps transitively link the best matching detections that do [...]
The 2.1D sketch is a layered representation of occluding and occluded surfaces of the scene. Extracting the 2.1D sketch from a single image is a difficult and important problem arising in many applications. We present a fast and robust algorithm that uses boundaries of image regions and T-junctions, as important visual cues about the scene structure, to estimate the scene [...]
