Brandyn White

I'm a PhD student at UMD College Park in Computer Science and one half of Dapper Vision, Inc. My research areas are Computer Vision, Artificial Intelligence, and Distributed Systems. My background is in FPGAs, MapReduce, Graphical Models, Computer Vision, and Mobile Devices. My goal is to make computer vision accessible to developers.

Research

A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords

Abstract
The multimedia information retrieval community has dedicated extensive research effort to the problem of content-based image retrieval (CBIR). However, these systems find their main limitation in the difficulty of creating pictorial queries. As a result, few systems offer the option of querying by visual examples, and most instead rely on automatic concept detection and tagging techniques to support searching visual content with textual queries.

This paper proposes and studies a practical multimodal web search scenario, where CBIR fits intuitively to improve the retrieval of rich information queries. Many online articles contain useful know-how knowledge about computer applications. These articles tend to be richly illustrated by screenshots. We present a system to search for such software know-how articles that leverages the visual correspondences between screenshots. Users can naturally create pictorial queries simply by taking a screenshot of the application to retrieve a list of articles containing a matching screenshot.

We build a prototype comprising 150k articles that are classified into walkthrough, book, gallery, and general categories, and provide a comprehensive evaluation of this system, focusing on technical (accuracy of CBIR techniques) and usability (perceived system usefulness) aspects. We also study the added-value features of such visually supported search, including the ability to perform cross-lingual queries. We find that the system is able to retrieve matching screenshots for a wide variety of programs, across language boundaries, and provide subjectively more useful results than keyword-based web and image search engines.

Documents
Camera Ready
Preceding Tech Report

Bibtex
@inproceedings{yeh11www,
author = {Tom Yeh and Brandyn White and Jose San Pedro and Boris Katz and Larry Davis},
title = {A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords},
booktitle = {WWW},
year = {2011},
pages = {775--784}}

VizWiz: nearly real-time answers to visual questions

Abstract
The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative to answering visual questions in nearly real-time - asking multiple people on the web. To support answering questions quickly, we introduce a general approach for intelligently recruiting human workers in advance called quikTurkit so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.

Documents
Camera Ready

Bibtex
@inproceedings{bigham10uist,
author = {Bigham, Jeffrey P. and Jayant, Chandrika and Ji, Hanjie and Little, Greg and Miller, Andrew and Miller, Robert C. and Miller, Robin and Tatarowicz, Aubrey and White, Brandyn and White, Samuel and Yeh, Tom},
title = {VizWiz: nearly real-time answers to visual questions},
booktitle = {UIST},
year = {2010},
pages = {333--342}}

Web-Scale Computer Vision using MapReduce for Multimedia Data Mining

Abstract
This work explores computer vision applications of the MapReduce framework that are relevant to the data mining community. An overview of MapReduce and common design patterns is provided for those with limited MapReduce background. We discuss both the high-level theory and the low-level implementation for several computer vision algorithms: classifier training, sliding windows, clustering, bag-of-features, background subtraction, and image registration. Experiments with the k-means clustering and single Gaussian background subtraction algorithms are performed on a 410-node Hadoop cluster.
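
As an illustration of how k-means clustering can be phrased in MapReduce terms, the following is a minimal single-machine sketch in plain Python. It is not the paper's Hadoop implementation; the function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical and chosen only for this example. The mapper assigns each point to its nearest centroid, and the reducer averages the points assigned to each cluster.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Mapper: emit (cluster_id, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, centroids):
    """Reducer: sum the points of each cluster and divide by their count."""
    sums = {}
    counts = defaultdict(int)
    for cluster_id, (point, n) in pairs:
        if cluster_id not in sums:
            sums[cluster_id] = list(point)
        else:
            sums[cluster_id] = [s + p for s, p in zip(sums[cluster_id], point)]
        counts[cluster_id] += n
    # Clusters that received no points keep their previous centroid.
    updated = list(centroids)
    for cluster_id, total in sums.items():
        updated[cluster_id] = tuple(s / counts[cluster_id] for s in total)
    return updated


def kmeans_iteration(points, centroids):
    """Run one map/reduce round and return the updated centroids."""
    return reduce_phase(map_phase(points, centroids), centroids)
```

Calling `kmeans_iteration` repeatedly until the centroids stop changing gives the usual k-means loop; in a real MapReduce job each round would be a separate job, with the current centroids shipped to every mapper.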

Documents
Camera Ready
Presentation

Code
Examples

Bibtex
@inproceedings{white10kdd,
author = {Brandyn White and Tom Yeh and Jimmy Lin and Larry Davis},
booktitle = {MDMKDD},
title = {Web-Scale Computer Vision using MapReduce for Multimedia Data Mining},
year = {2010}}

Automatic Analysis of Embodied Team Actions

Abstract
We describe a system which, building on previous team action recognition systems, performs a more in-depth analysis of an ongoing team action executed by a group of embodied agents. The system relies on team action states with human understandable semantics, estimates the current state and is able to make predictions or identify fringe cases such as incomplete or incorrectly executed team actions. The representation of the team action relies on a dynamic Bayesian network (DBN). We perform reasoning over the DBN using a sampling-importance-resampling particle filter. As a methodological illustration, we describe the process of model building for the bounding overwatch team action. We experimentally test our approach using data acquired from video recordings, and measure the system's ability to recognize a team action and to estimate the current state.
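
For readers unfamiliar with sampling-importance-resampling (SIR), one filter step can be sketched generically as below. This is not the paper's DBN model: the `transition` and `likelihood` callables, and the name `sir_step`, are placeholders that a caller would supply, introduced here purely for illustration.

```python
import random


def sir_step(particles, transition, likelihood, observation, rng):
    """One sampling-importance-resampling step over a list of state hypotheses.

    transition(state, rng) -> next state (hypothetical motion/state model)
    likelihood(observation, state) -> non-negative weight (hypothetical sensor model)
    """
    # 1. Sampling: propagate every particle through the transition model.
    proposed = [transition(p, rng) for p in particles]
    # 2. Importance: weight each proposal by how well it explains the observation.
    weights = [likelihood(observation, p) for p in proposed]
    if sum(weights) == 0:
        # No particle explains the observation; keep the proposals unweighted.
        return proposed
    # 3. Resampling: draw a new population with probability proportional to weight.
    return rng.choices(proposed, weights=weights, k=len(proposed))
```

The fraction of particles in each state after a step serves as the estimate of the current state distribution.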

Documents
Camera Ready

Bibtex
@inproceedings{white09pair,
author = {Brandyn White and Ladislau Boloni},
booktitle = {Workshop on Plan, Activity, and Intent Recognition at IJCAI},
title = {Automatic Analysis of Embodied Team Actions},
year = {2009}}

Analyzing Team Actions with Cascading HMM

Abstract
While team action recognition has a relatively extended literature, less attention has been given to the detailed real-time analysis of the internal structure of the team actions. This includes recognizing the current state of the action, predicting the next state, recognizing deviations from the standard action model, and handling ambiguous cases. The underlying probabilistic reasoning model has a major impact on the type of data it can extract, its accuracy, and the computational cost of the reasoning process. In this paper, we use Cascading Hidden Markov Models (CHMM) to analyze Bounding Overwatch, an important team action in military tactics. The team action is represented in the CHMM as a plan tree. Starting from real-world recorded data, we identify the subteams through clustering and extract team-oriented discrete features. In an experimental study, we investigate whether the better scalability and the more structured information provided by the CHMM comes with an unacceptable cost in accuracy. We find that a properly parametrized CHMM estimating the current goal chain of the Bounding Overwatch plan tree comes very close to a flat HMM estimating only the overall Bounding Overwatch state (a subset of the goal chain) at a respective overall state accuracy of 95% vs 98%, making the CHMM a good candidate for deployed systems.
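
To make "estimating the current state" concrete for the flat HMM baseline, here is a minimal forward-filtering sketch. It is a textbook recursion rather than the paper's code, and it does not implement the cascading structure. The function name `forward_filter` and the convention of applying the transition model before each observation are assumptions of this example.

```python
def forward_filter(prior, transition, emission, observations):
    """Return P(current state | observations so far) for a discrete HMM.

    prior[i]         : probability of state i before any observation
    transition[i][j] : probability of moving from state i to state j
    emission[j][o]   : probability of observing symbol o in state j
    """
    belief = list(prior)
    n = len(belief)
    for obs in observations:
        # Predict: push the belief through the transition model.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Update: reweight by the likelihood of the observed symbol.
        updated = [predicted[j] * emission[j][obs] for j in range(n)]
        total = sum(updated)
        if total == 0:
            raise ValueError("observation has zero probability under the model")
        belief = [u / total for u in updated]
    return belief
```

The index of the largest entry in the returned belief is the most likely current state.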

Documents
Camera Ready

Bibtex
@inproceedings{white09flairs,
author = {Brandyn White and Nate Blaylock and Ladislau Boloni},
booktitle = {FLAIRS},
title = {Analyzing Team Actions with Cascading HMM},
year = {2009}}

Using FPGAs to Perform Embedded Image Registration

Abstract
Image registration is the process of relating the intensity values of one image to another image using their pixel content alone. An example use of this technique is to create panoramas from individual images taken from a rotating camera. A class of image registration algorithms, known as direct registration methods, uses intensity derivatives to iteratively estimate the parameters modeling the transformation between the images. Direct methods are known for their sub-pixel accurate results; however, their execution is computationally expensive, often preventing use in an embedded capacity like those encountered in small unmanned aerial vehicle or mobile phone applications. In this work, a high-performance FPGA-based direct affine image registration core is presented. The proposed method combines two features: a fully pipelined architecture to compute the linear system of equations, and a Gaussian elimination module, implemented as a finite state machine, to solve the resulting linear system. The design is implemented on a Xilinx ML506 development board featuring a Virtex-5 SX50 FPGA, zero bus turn-around (ZBT) RAM, and VGA input. Experimentation is performed on both real and synthetic data. The registration core performs in excess of 80 frames per second on 640x480 images using one registration iteration.
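
The solve step can be illustrated in software with a small Gaussian elimination routine. This is only a reference sketch of the numerical method, not the FPGA finite state machine from the thesis; the function name `solve_linear_system` and the use of partial pivoting are assumptions of this example, not details taken from the thesis.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows of n numbers, b is a list of n numbers.
    """
    n = len(b)
    # Build the augmented matrix [a | b] as floats so the inputs are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest magnitude in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the column entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x
```

For an affine model with six parameters, `a` would be the 6x6 matrix accumulated over the image and `x` the parameter update for one registration iteration.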

Documents
Camera Ready

Code
Full Source

Bibtex
@phdthesis{white09himthesis,
author = {Brandyn White},
type = {Undergraduate thesis},
title = {Using FPGAs to Perform Embedded Image Registration},
school = {University of Central Florida},
year = {2009}}

Automatically tuning background subtraction parameters using particle swarm optimization

Abstract
A common trait of background subtraction algorithms is that they have learning rates, thresholds, and initial values that are hand-tuned for a scenario in order to produce the desired subtraction result; however, the need to tune these parameters makes it difficult to use state-of-the-art methods, fuse multiple methods, and choose an algorithm based on the current application as it requires the end-user to become proficient in tuning a new parameter set. The proposed solution is to automate this task by using a Particle Swarm Optimization (PSO) algorithm to maximize a fitness function compared to provided ground-truth images. The fitness function used is the F-measure, which is the harmonic mean of recall and precision. This method reduces the total pixel error of the Mixture of Gaussians background subtraction algorithm by more than 50% on the diverse Wallflower dataset.
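
The fitness function itself is simple to state in code. The sketch below computes the F-measure of a predicted foreground mask against a ground-truth mask, both given as flat sequences of 0/1 values; the function name `f_measure` and the flat-list mask representation are assumptions of this example. The PSO search that would call it is not shown.

```python
def f_measure(predicted, truth):
    """Harmonic mean of precision and recall for two binary foreground masks."""
    pairs = list(zip(predicted, truth))
    true_pos = sum(1 for p, t in pairs if p and t)
    false_pos = sum(1 for p, t in pairs if p and not t)
    false_neg = sum(1 for p, t in pairs if not p and t)
    if true_pos == 0:
        # No correctly detected foreground pixel: define the score as zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

An optimizer would evaluate this score for the masks produced under each candidate parameter set and keep the parameters that maximize it.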

Documents
Camera Ready

Bibtex
@inproceedings{white07icme,
author = {Brandyn White and Mubarak Shah},
booktitle = {ICME},
title = {Automatically tuning background subtraction parameters using particle swarm optimization},
year = {2007}}