<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mohamed Amer -- Graduate Student at OSU</title>
	<atom:link href="http://blogs.oregonstate.edu/amer/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.oregonstate.edu/amer</link>
	<description></description>
	<lastBuildDate>Mon, 21 Jan 2013 23:47:24 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Cost-Sensitive Top-down/Bottom-up Inference for Multiscale Activity Recognition (ECCV 2012)</title>
		<link>http://blogs.oregonstate.edu/amer/2012/08/08/cost-sensitive-top-downbottom-up-inference-for-multiscale-activity-recognition-eccv-2012/</link>
		<comments>http://blogs.oregonstate.edu/amer/2012/08/08/cost-sensitive-top-downbottom-up-inference-for-multiscale-activity-recognition-eccv-2012/#comments</comments>
		<pubDate>Wed, 08 Aug 2012 23:00:11 +0000</pubDate>
		<dc:creator>amerm</dc:creator>
				<category><![CDATA[Publications]]></category>

		<guid isPermaLink="false">http://blogs.oregonstate.edu/amer/?p=68</guid>
		<description><![CDATA[This paper addresses a new problem, that of multiscale activity recognition. Our goal is to detect and localize a wide range of activities, including individual actions and group activities, which may simultaneously co-occur in high resolution video. The video resolution allows for digital zoom-in (or zoom-out) for examining fine details (or coarser scales), as needed [...]]]></description>
				<content:encoded><![CDATA[<p>This paper addresses a new problem, that of multiscale activity recognition. Our goal is to detect and localize a wide range of activities, including individual actions and group activities, which may simultaneously co-occur in high resolution video. The video resolution allows for digital zoom-in (or zoom-out) for examining fine details (or coarser scales), as needed for recognition. The key challenge is how to avoid running a multitude of detectors at all spatiotemporal scales, and yet arrive at a holistically consistent video interpretation. To this end, we use a three-layered AND-OR graph to jointly model group activities, individual actions, and participating objects. The AND-OR graph allows a principled formulation of efficient, cost-sensitive inference via an explore-exploit strategy. Our inference optimally schedules the following computational processes: 1) direct application of activity detectors – called α process; 2) bottom-up inference based on detecting activity parts – called β process; and 3) top-down inference based on detecting activity context – called γ process. The scheduling iteratively maximizes the log-posteriors of the resulting parse graphs. For evaluation, we have compiled and benchmarked a new dataset of high-resolution videos of group and individual activities co-occurring in a courtyard of the UCLA campus. <a href="http://web.engr.oregonstate.edu/~amerm/Website/eccv12_multiscale_activities.pdf">Paper</a> Presentation <a href="http://blogs.oregonstate.edu/amer/code/">Code</a> <a href="http://vcla.stat.ucla.edu/Projects/Multiscale_Activity_Recognition/">Dataset</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.oregonstate.edu/amer/2012/08/08/cost-sensitive-top-downbottom-up-inference-for-multiscale-activity-recognition-eccv-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sum-Product Networks for Modeling Activities with Stochastic Structure (CVPR 2012)</title>
		<link>http://blogs.oregonstate.edu/amer/2012/04/17/sum-product-networks-for-modeling-activities-with-stochastic-structure/</link>
		<comments>http://blogs.oregonstate.edu/amer/2012/04/17/sum-product-networks-for-modeling-activities-with-stochastic-structure/#comments</comments>
		<pubDate>Tue, 17 Apr 2012 02:40:44 +0000</pubDate>
		<dc:creator>amerm</dc:creator>
				<category><![CDATA[Publications]]></category>

		<guid isPermaLink="false">http://blogs.oregonstate.edu/amer/?p=61</guid>
		<description><![CDATA[This paper addresses recognition of human activities with stochastic structure, characterized by variable space-time arrangements of primitive actions, and conducted by a variable number of actors. We demonstrate that modeling aggregate counts of visual words is surprisingly expressive enough for such a challenging recognition task. An activity is represented by a sum-product network (SPN). SPN is a mixture of bags-of-words (BoWs) with [...]]]></description>
				<content:encoded><![CDATA[<p>This paper addresses recognition of human activities with stochastic structure, characterized by variable space-time arrangements of primitive actions, and conducted by a variable number of actors. We demonstrate that modeling aggregate counts of visual words is surprisingly expressive enough for such a challenging recognition task. An activity is represented by a sum-product network (SPN). SPN is a mixture of bags-of-words (BoWs) with exponentially many mixture components, where subcomponents are reused by larger ones. SPN consists of terminal nodes representing BoWs, and product and sum nodes organized in a number of layers. The products are aimed at encoding particular configurations of primitive actions, and the sums serve to capture their alternative configurations. The connectivity of SPN and parameters of BoW distributions are learned under weak supervision using the EM algorithm. SPN inference amounts to parsing the SPN graph, which yields the most probable explanation (MPE) of the video in terms of activity detection and localization. SPN inference has linear complexity in the number of nodes, under fairly general conditions, enabling fast and scalable recognition. A new Volleyball dataset is compiled and annotated for evaluation. Our classification accuracy and localization precision and recall are superior to those of the state-of-the-art on the benchmark and our Volleyball datasets. <a href="http://web.engr.oregonstate.edu/~amerm/Website/cvpr12_SPNGrid.pdf">Paper</a> <a href="http://web.engr.oregonstate.edu/~amerm/Website/CVPR12Poster.pdf">Poster</a> <a href="http://blogs.oregonstate.edu/amer/code/">Code</a> <a href="https://docs.google.com/spreadsheet/viewform?formkey=dFh3MTZZMmpHMkFZVTFZamI5LVVtR0E6MA">Dataset</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.oregonstate.edu/amer/2012/04/17/sum-product-networks-for-modeling-activities-with-stochastic-structure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fine-grained Categorization of Fish Motion Patterns in Underwater Videos (ICCV 2011)</title>
		<link>http://blogs.oregonstate.edu/amer/2011/12/10/fine-grained-categorization-of-fish-motion-patterns-in-underwater-videos/</link>
		<comments>http://blogs.oregonstate.edu/amer/2011/12/10/fine-grained-categorization-of-fish-motion-patterns-in-underwater-videos/#comments</comments>
		<pubDate>Sat, 10 Dec 2011 02:41:13 +0000</pubDate>
		<dc:creator>amerm</dc:creator>
				<category><![CDATA[Publications]]></category>

		<guid isPermaLink="false">http://blogs.oregonstate.edu/amer/?p=52</guid>
		<description><![CDATA[Marine biologists commonly use underwater videos for their research on studying the behaviors of sea organisms. Their video analysis, however, is typically based on visual inspection. This incurs prohibitively large user costs, and severely limits the scope of biological studies. There is a need for developing vision algorithms that can address specific needs of marine biologists, such as fine-grained categorization of fish [...]]]></description>
				<content:encoded><![CDATA[<p>Marine biologists commonly use underwater videos for their research on studying the behaviors of sea organisms. Their video analysis, however, is typically based on visual inspection. This incurs prohibitively large user costs, and severely limits the scope of biological studies. There is a need for developing vision algorithms that can address specific needs of marine biologists, such as fine-grained categorization of fish motion patterns. This is a difficult problem, because of very small inter-class and large intra-class differences between fish motion patterns. Our approach consists of three steps. First, we apply our new fish detector to identify and localize fish occurrences in each frame, under partial occlusion, and amidst dynamic texture patterns formed by whirls of sand on the sea bed. Then, we conduct tracking-by-detection. Given the similarity between fish detections, defined in terms of fish appearance and motion properties, we formulate fish tracking as transitively linking similar detections between every two consecutive frames, so as to maintain their unique track IDs. Finally, we extract histograms of fish displacements along the estimated tracks. The histograms are classified by the Random Forest technique to recognize distinct classes of fish motion patterns. Evaluation on challenging underwater videos demonstrates that our approach outperforms the state of the art. <a href="http://web.engr.oregonstate.edu/~amerm/Website/ICCV11WorkshopVectar.pdf">Paper</a> <a href="http://web.engr.oregonstate.edu/~amerm/Website/iccv11_fish_poster.pdf">Poster</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.oregonstate.edu/amer/2011/12/10/fine-grained-categorization-of-fish-motion-patterns-in-underwater-videos/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>PEL-CNF: Probabilistic Event Logic Conjunctive Normal Form for Video Interpretation (ICCV 2011)</title>
		<link>http://blogs.oregonstate.edu/amer/2011/12/10/pel-cnf-probabilistic-event-logic-conjunctive-normal-form-for-video-interpretation/</link>
		<comments>http://blogs.oregonstate.edu/amer/2011/12/10/pel-cnf-probabilistic-event-logic-conjunctive-normal-form-for-video-interpretation/#comments</comments>
		<pubDate>Sat, 10 Dec 2011 02:40:15 +0000</pubDate>
		<dc:creator>amerm</dc:creator>
				<category><![CDATA[Publications]]></category>

		<guid isPermaLink="false">http://blogs.oregonstate.edu/amer/?p=47</guid>
		<description><![CDATA[This is a theoretical paper that proves that probabilistic event logic (PEL) is MAP-equivalent to its conjunctive normal form (PEL-CNF). This allows us to address the NP-hard MAP inference for PEL in a principled manner. We first map the confidence-weighted formulas from a PEL knowledge base to PEL-CNF, and then conduct MAP inference for PEL-CNF using stochastic local search. Our [...]]]></description>
				<content:encoded><![CDATA[<p>This is a theoretical paper that proves that probabilistic event logic (PEL) is MAP-equivalent to its conjunctive normal form (PEL-CNF). This allows us to address the NP-hard MAP inference for PEL in a principled manner. We first map the confidence-weighted formulas from a PEL knowledge base to PEL-CNF, and then conduct MAP inference for PEL-CNF using stochastic local search. Our MAP inference leverages the spanning-interval data structure for compactly representing and manipulating entire sets of time intervals without enumerating them. For experimental evaluation, we use the specific domain of volleyball videos. Our experiments demonstrate that the MAP inference for PEL-CNF successfully detects and localizes volleyball events in the face of different types of synthetic noise introduced in the ground-truth video annotations. <a href="http://web.engr.oregonstate.edu/~amerm/Website/sig11_PEL.pdf">Paper</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.oregonstate.edu/amer/2011/12/10/pel-cnf-probabilistic-event-logic-conjunctive-normal-form-for-video-interpretation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Chains Model for Localizing Participants of Group Activities in Videos (ICCV 2011)</title>
		<link>http://blogs.oregonstate.edu/amer/2011/08/10/a-chains-model-for-localizing-participants-of-group-activities-in-videos-iccv11/</link>
		<comments>http://blogs.oregonstate.edu/amer/2011/08/10/a-chains-model-for-localizing-participants-of-group-activities-in-videos-iccv11/#comments</comments>
		<pubDate>Wed, 10 Aug 2011 06:30:26 +0000</pubDate>
		<dc:creator>amerm</dc:creator>
				<category><![CDATA[Publications]]></category>

		<guid isPermaLink="false">http://blogs.oregonstate.edu/amer/?p=42</guid>
		<description><![CDATA[Given a video, we would like to recognize group activities, localize video parts where these activities occur, and detect actors involved in them. This advances prior work that typically focuses only on video classification. We make a number of contributions. First, we specify a new, mid-level, video feature aimed at summarizing local visual cues into bags of the right detections (BORDs). [...]]]></description>
				<content:encoded><![CDATA[<p>Given a video, we would like to recognize group activities, localize video parts where these activities occur, and detect actors involved in them. This advances prior work that typically focuses only on video classification. We make a number of contributions. First, we specify a new, mid-level, video feature aimed at summarizing local visual cues into bags of the right detections (BORDs). BORDs seek to identify the right people who participate in a target group activity among many noisy people detections. Second, we formulate a new, generative, chains model of group activities. Inference of the chains model identifies a subset of BORDs in the video that belong to occurrences of the activity, and organizes them in an ensemble of temporal chains. The chains extend over, and thus localize, the time intervals occupied by the activity. We formulate a new MAP inference algorithm that iterates two steps: i) Warps the chains of BORDs in space and time to their expected locations, so the transformed BORDs can better summarize local visual cues; and ii) Maximizes the posterior probability of the chains. We outperform the state of the art on benchmark UT-Human Interaction and Collective Activities datasets, under reasonable running times. <a href="http://web.engr.oregonstate.edu/~amerm/Website/iccv11_chain.pdf">Paper</a> <a href="http://web.engr.oregonstate.edu/~amerm/Website/iccv11_chain_poster.pdf">Poster</a> <a href="http://blogs.oregonstate.edu/amer/code/">Code</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.oregonstate.edu/amer/2011/08/10/a-chains-model-for-localizing-participants-of-group-activities-in-videos-iccv11/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multiobject Tracking as Maximum-Weight Independent Set (CVPR 2011)</title>
		<link>http://blogs.oregonstate.edu/amer/2011/03/24/multiobject-tracking-as-maximum-weight-independent-set-cvpr-2011/</link>
		<comments>http://blogs.oregonstate.edu/amer/2011/03/24/multiobject-tracking-as-maximum-weight-independent-set-cvpr-2011/#comments</comments>
		<pubDate>Thu, 24 Mar 2011 02:23:21 +0000</pubDate>
		<dc:creator>amerm</dc:creator>
				<category><![CDATA[Publications]]></category>

		<guid isPermaLink="false">http://blogs.oregonstate.edu/amer/?p=11</guid>
		<description><![CDATA[This paper addresses the problem of simultaneous tracking of multiple targets representing occurrences of distinct object classes in complex scenes. We apply object detectors to every frame, and build a graph of tracklets, defined as pairs of detection responses from every two consecutive frames. The graph helps transitively link the best matching detections that do [...]]]></description>
				<content:encoded><![CDATA[<p>This paper addresses the problem of simultaneous tracking of multiple targets representing occurrences of distinct object classes in complex scenes. We apply object detectors to every frame, and build a graph of tracklets, defined as pairs of detection responses from every two consecutive frames. The graph helps transitively link the best matching detections that do not violate hard and soft contextual constraints between the resulting tracks. We prove that this data association problem can be formulated as finding the heaviest subset of non-adjacent tracklets in the graph, called the maximum-weight independent set (MWIS). We present a new, polynomial-time MWIS algorithm, and prove that it converges to an optimum. Similarity between object detections, and the contextual constraints between the tracks, used for data association, are learned online from object appearance and motion properties. Long-term occlusions are addressed by iteratively repeating MWIS to hierarchically merge smaller tracks into longer ones. We outperform the state of the art on the benchmark datasets, and show the advantages of simultaneously accounting for soft and hard constraints in multitarget tracking. <a href="http://web.engr.oregonstate.edu/~amerm/Website/0346.pdf">Paper</a> <a href="http://web.engr.oregonstate.edu/~amerm/Website/cvpr11_MWIS_presentation.pdf">Presentation</a> <a href="http://blogs.oregonstate.edu/amer/code/">Code</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.oregonstate.edu/amer/2011/03/24/multiobject-tracking-as-maximum-weight-independent-set-cvpr-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Monocular Estimation of 2.1D Sketch (ICIP 2010)</title>
		<link>http://blogs.oregonstate.edu/amer/2011/03/24/monocular-estimation-of-2-1d-sketch/</link>
		<comments>http://blogs.oregonstate.edu/amer/2011/03/24/monocular-estimation-of-2-1d-sketch/#comments</comments>
		<pubDate>Thu, 24 Mar 2011 02:14:25 +0000</pubDate>
		<dc:creator>amerm</dc:creator>
				<category><![CDATA[Publications]]></category>

		<guid isPermaLink="false">http://blogs.oregonstate.edu/amer/?p=4</guid>
		<description><![CDATA[The 2.1D sketch is a layered representation of occluding and occluded surfaces of the scene. Extracting the 2.1D sketch from a single image is a difficult and important problem arising in many applications. We present a fast and robust algorithm that uses boundaries of image regions and T-junctions, as important visual cues about the scene structure, to estimate the scene [...]]]></description>
				<content:encoded><![CDATA[<div id="_mcePaste">The 2.1D sketch is a layered representation of occluding and occluded surfaces of the scene. Extracting the 2.1D sketch from a single image is a difficult and important problem arising in many applications. We present a fast and robust algorithm that uses boundaries of image regions and T-junctions, as important visual cues about the scene structure, to estimate the scene layers. The estimation is a quadratic optimization with hinge-loss based constraints, so the 2.1D sketch is smooth in all image areas except on image contours, and image regions forming “stems” of the T-junctions correspond to occluded surfaces in the scene. Quantitative and qualitative results on challenging, real-world images, namely the Stanford depthmap and Berkeley segmentation datasets, demonstrate high accuracy, efficiency, and robustness of our approach. <a href="http://web.engr.oregonstate.edu/~amerm/Website/icip10.pdf">Paper</a> <a href="http://web.engr.oregonstate.edu/~amerm/Website/icip10_Poster.pdf">Poster</a> <a href="http://blogs.oregonstate.edu/amer/code/">Code</a></div>
]]></content:encoded>
			<wfw:commentRss>http://blogs.oregonstate.edu/amer/2011/03/24/monocular-estimation-of-2-1d-sketch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
