For a coarse gaze estimation system to be useful, many people must be tracked simultaneously, in real time, and in the presence of frequent occlusions and other distractions such as animals or vehicles. Two tracking systems were developed, both based on two important image measurements. The first measurement is the output of a head detector trained using Dalal and Triggs's HOG (Histogram of Oriented Gradients) detection algorithm, which has become a standard approach to pedestrian detection. Although HOG detection is generally slow, efficient GPU implementations have made it suitable for real-time use. The second measurement comes from sparse KLT (Kanade-Lucas-Tomasi) tracking. Although it has been around for a long time, KLT corner tracking still provides an impressive amount of information for very little processing time.
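To make the KLT measurement more concrete, here is a minimal NumPy sketch of its two ingredients: the Shi-Tomasi "good features" score used to choose trackable corners, and a single Lucas-Kanade step that estimates how far a small patch moved between two frames. It assumes small motion, one pyramid level and no iteration; practical trackers (for example OpenCV's `calcOpticalFlowPyrLK`) iterate over an image pyramid. The function names are hypothetical and are not taken from the trackers described here.

```python
import numpy as np


def structure_tensor(gx, gy):
    """2x2 matrix of gradient products summed over a patch."""
    return np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                     [np.sum(gx * gy), np.sum(gy * gy)]])


def shi_tomasi_score(patch):
    """Smaller eigenvalue of the structure tensor (the 'good feature' score).

    Both eigenvalues must be large for a patch's motion to be well defined;
    along a straight edge the smaller eigenvalue is close to zero.
    """
    gy, gx = np.gradient(np.asarray(patch, dtype=float))  # axis 0 is y, axis 1 is x
    return np.linalg.eigvalsh(structure_tensor(gx, gy))[0]  # eigenvalues ascend


def lucas_kanade_step(prev_patch, next_patch):
    """One Lucas-Kanade step estimating the (dx, dy) shift of a patch.

    Linearises brightness constancy, next(p) ~ prev(p) - grad(prev) . d,
    and solves the least-squares normal equations G d = -b.
    """
    prev_patch = np.asarray(prev_patch, dtype=float)
    next_patch = np.asarray(next_patch, dtype=float)
    gy, gx = np.gradient(prev_patch)
    it = next_patch - prev_patch  # temporal derivative
    b = np.array([np.sum(gx * it), np.sum(gy * it)])
    return np.linalg.solve(structure_tensor(gx, gy), -b)  # array([dx, dy])


# Usage: a smooth blob shifted by (+0.4, -0.25) pixels between two frames.
ys, xs = np.mgrid[0:41, 0:41]


def blob(cx, cy, sigma=4.0):
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


print(lucas_kanade_step(blob(20, 20), blob(20.4, 19.75)))  # close to [0.4, -0.25]
```

The whole computation is a few sums and a 2x2 solve per feature, which is why sparse KLT tracking is so cheap compared with dense detection.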
The first tracking system was based on a Kalman filter; however, it proved susceptible to data association errors when the HOG detector failed. The second, more recent approach uses Markov Chain Monte Carlo Data Association (MCMCDA) with an accurate error model. MCMCDA allows ambiguities to be resolved more efficiently and also allows the tracking system to cope with temporary occlusions.
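The Kalman filter approach can be illustrated with a generic constant-velocity filter over a head's image position, combined with the greedy, frame-by-frame nearest-neighbour association that makes such trackers fragile when a detection is missed. This is a hypothetical sketch: the state vector, the noise levels and the `greedy_associate` helper are assumptions, not the design of the tracker described above.

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter with state [x, y, vx, vy] and position-only measurements.

    Hypothetical parameters: the motion model and noise levels are assumptions.
    """

    def __init__(self, x, y, dt=1.0, process_var=1.0, meas_var=4.0):
        self.x = np.array([x, y, 0.0, 0.0])                   # state estimate
        self.P = np.diag([meas_var, meas_var, 100.0, 100.0])  # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)        # constant velocity
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)        # observe position only
        self.Q = process_var * np.eye(4)                      # process noise
        self.R = meas_var * np.eye(2)                         # measurement noise (px^2)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def gate_distance(self, z):
        """Squared Mahalanobis distance of detection z from the prediction."""
        S = self.H @ self.P @ self.H.T + self.R
        r = np.asarray(z, dtype=float) - self.H @ self.x
        return float(r @ np.linalg.solve(S, r))

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain
        r = np.asarray(z, dtype=float) - self.H @ self.x      # innovation
        self.x = self.x + K @ r
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]


def greedy_associate(tracks, detections, gate=9.21):
    """Give each track its nearest unused detection inside a chi-square gate.

    9.21 is roughly the 99% point of chi-square with 2 degrees of freedom.
    Decisions are made one frame at a time and never revisited.
    """
    assignment, used = {}, set()
    for i, trk in enumerate(tracks):
        best, best_d = None, gate
        for j, z in enumerate(detections):
            if j in used:
                continue
            d = trk.gate_distance(z)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            assignment[i] = best
            used.add(best)
    return assignment
```

Because each frame's assignments are fixed greedily, a missed HOG detection can let one track claim a neighbour's detection. Roughly speaking, MCMCDA avoids committing early by sampling alternative partitions of detections into tracks over a window of frames.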
The 'Town Centre' dataset was used to test tracking performance in both the CVPR 2011 and the BMVC 2009 papers.
TownCentreXVID.avi (342MB) - The video file
TownCentre-calibration.ci (<1K) - The camera calibration data. This is in a human-readable format. The ground plane is at z=0 in the world coordinates.
TownCentre-groundtruth.top (5.3MB) - The hand-labelled ground truth data. See below for a description of the '.top' file format. Note that the full body regions were estimated from the head regions using the camera calibration and approximate human dimensions, so they may be inaccurate.
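Because the ground plane is at z=0, points on the ground map to image pixels through a 3x3 homography, which is what makes image-to-ground reasoning (such as locating a person's feet on the floor) straightforward. The sketch below assumes the calibration has already been read into a pinhole intrinsic matrix `K`, rotation `R` and translation `t` with x_cam = R X_world + t, and it ignores lens distortion. The layout of the '.ci' file is not described here, so no parser is given; the function names and the example camera are hypothetical.

```python
import numpy as np


def ground_homography(K, R, t):
    """Homography from ground-plane coordinates (X, Y, Z=0) to image pixels.

    With Z = 0 the third column of R drops out: H = K [r1 r2 t].
    """
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).ravel()
    return K @ np.column_stack([R[:, 0], R[:, 1], t])


def ground_to_image(H, X, Y):
    """Project a ground-plane point to pixel coordinates (u, v)."""
    u, v, w = H @ np.array([X, Y, 1.0])
    return u / w, v / w


def image_to_ground(H, u, v):
    """Back-project a pixel onto the ground plane (e.g. a body box's bottom-centre)."""
    X, Y, W = np.linalg.solve(H, np.array([u, v, 1.0]))
    return X / W, Y / W


# Hypothetical camera 10 units above the ground, looking straight down.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])
t = np.array([0.0, 0.0, 10.0])
H = ground_homography(K, R, t)
print(ground_to_image(H, 2.0, 3.0))  # pixel of ground point (2, 3)
```

A real surveillance camera is oblique rather than looking straight down, but the same three functions apply once `K`, `R` and `t` come from the calibration file.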
Tracker Output file format
The ground truth and tracking output are provided in the '.top' file format, which consists of rows in comma-separated values (CSV) format:
personNumber, frameNumber, headValid, bodyValid, headLeft, headTop, headRight, headBottom, bodyLeft, bodyTop, bodyRight, bodyBottom
personNumber - A unique identifier for the individual person
frameNumber - The frame number (counted from 0)
headValid - 1 if the head region is valid, 0 otherwise
bodyValid - 1 if the body region is valid, 0 otherwise
headLeft, headTop, headRight, headBottom - The head bounding box in pixels
bodyLeft, bodyTop, bodyRight, bodyBottom - The body bounding box in pixels
For the purposes of tracking evaluation, full body regions were considered to be matched if they overlap by at least 50%, and head regions were required to overlap by at least 25%.
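A minimal Python reader for the '.top' format, together with one way to apply the matching thresholds. The page does not say how "overlap" is measured; the sketch assumes intersection-over-union (intersection area divided by union area), the common PASCAL VOC convention, and treats box edges as continuous coordinates. `read_top`, `iou` and `matches` are hypothetical names.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


@dataclass
class TopRow:
    person: int
    frame: int
    head_valid: bool
    body_valid: bool
    head: Box
    body: Box


def read_top(lines: Iterable[str]) -> List[TopRow]:
    """Parse '.top' rows in the 12-column order listed above."""
    rows = []
    for rec in csv.reader(lines):
        if not rec or not rec[0].strip():
            continue  # skip blank lines
        v = [float(x) for x in rec]
        rows.append(TopRow(int(v[0]), int(v[1]), v[2] == 1, v[3] == 1,
                           tuple(v[4:8]), tuple(v[8:12])))
    return rows


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    inter = w * h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def matches(gt: TopRow, hyp: TopRow, region: str = "body") -> bool:
    """True if both regions are valid and overlap enough (50% body, 25% head)."""
    if region == "body":
        ok, a, b, thresh = gt.body_valid and hyp.body_valid, gt.body, hyp.body, 0.5
    elif region == "head":
        ok, a, b, thresh = gt.head_valid and hyp.head_valid, gt.head, hyp.head, 0.25
    else:
        raise ValueError(f"unknown region: {region!r}")
    return ok and iou(a, b) >= thresh
```

For example, `with open("TownCentre-groundtruth.top", newline="") as f: rows = read_top(f)` loads the ground truth; grouping `rows` by `frame` then gives the per-frame sets of boxes needed for evaluation.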
- Colour Invariant Head Pose Classification in Low Resolution Video
- Guiding Visual Surveillance by Tracking Human Attention
- Stable Multi-Target Tracking in Real-Time Surveillance Video
No license is specified; the work may be protected by copyright.