Project: Video-based 3D human motion recognition
Video-based 3D human motion recognition
New techniques for 3D tracking of people and understanding poses
A new way to interact with a computer is simply to use the player's poses, which are tracked and interpreted via cameras. We develop new techniques for accurately tracking people and recognizing their poses, even when they occlude each other. No special markers on the bodies are needed to track the people.
Although current computer vision research has achieved promising results in interpreting the pose and gestures of a single person from multiple video sequences, interpreting the poses of multiple people in a relatively dense group is still an open problem. The key difficulties are inter-person occlusion and limb ambiguities, which hamper the interpretation. This project studies and develops new techniques for video-based human pose and gesture recognition. We are developing an efficient and robust platform for multiple-people tracking, body model construction, pose recognition and gesture understanding, all in 3D. We aim to utilize this platform in human-computer interaction applications such as pose-driven games or gesture-driven presentations.
Using multiple cameras, we reconstruct the 3D volume data of moving people in a target scene. The reconstruction is automatic and runs in real time. The volume data contains the information about the moving people and their poses. The goal is to track the movement and to recognize the poses. To track the movements, the location of each individual in the 3D world first has to be determined; the location in the next frame is then tracked based on the current frame. The basic ideas are (1) to use the appearance of the target person in the 2D images, (2) to estimate the 2D location of the person in all views, (3) to backproject the 2D locations from all views into the 3D world, where their intersection represents the location of the target person, and (4) to employ the locations from previous frames to improve the robustness of the estimation. To overcome the inter-person occlusion problem, a technique based on the best visibility of the views is introduced. The visibility ranking is computed from run-time measurements of person-person and person-view relative positions. By fusing the information from the two views with the best visibility, more robust tracking of people under severe occlusion can be achieved. Having tracked the location of each person, we can then segment the volume data per person by fitting a 3D skeleton model to the 3D data. Finally, the pose of each person can be identified.
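Step (3) above, backprojecting 2D locations into a 3D intersection, can be sketched as finding the point of closest approach between viewing rays. The sketch below is an illustration only, not the project's implementation: the camera centres and ray directions are assumed to come from prior calibration and from the per-view 2D location estimates, and in practice the two rays fed in would be those of the two best-visibility views.

```python
import numpy as np

def closest_point_between_rays(c1, d1, c2, d2):
    """Least-squares 'intersection' of two 3D rays.

    c1, c2 : camera centres (3-vectors)
    d1, d2 : viewing-ray directions through the estimated 2D locations

    Solves for the parameters t1, t2 minimizing
    |(c1 + t1*d1) - (c2 + t2*d2)| and returns the midpoint of the
    closest-approach segment as the 3D location estimate.
    """
    d1 = np.asarray(d1, float) / np.linalg.norm(d1)
    d2 = np.asarray(d2, float) / np.linalg.norm(d2)
    b = np.asarray(c2, float) - np.asarray(c1, float)
    # Normal equations of the two-ray least-squares problem.
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    rhs = np.array([b @ d1, b @ d2])
    t1, t2 = np.linalg.solve(A, rhs)
    p1 = c1 + t1 * d1  # closest point on ray 1
    p2 = c2 + t2 * d2  # closest point on ray 2
    return (p1 + p2) / 2.0
```

With noisy detections the two rays rarely intersect exactly, which is why the midpoint of their closest approach is used rather than a true intersection; the same least-squares formulation extends to more than two rays if additional views are fused.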
We will update the tracking methods to a more flexible framework, to enable automatic initialization and to handle people who enter or leave the scene. We will further improve our prototype for multiple-people pose estimation by integrating more cues, including the appearance of persons and motion prediction. We will also extend multi-person pose recognition by using joint locations as features to classify different poses. While multiple-people pose recognition is elaborated further, interpretation of multi-person interaction will be a next step. For the evaluation of our methods, we will develop demonstrators such as a gaming environment with video input and a gesture-driven slide show presentation framework.
3.2 Detecting, interpreting and affecting user behavior
Delft University of Technology
Luo et al. (2010). Human Pose Estimation for Multiple Persons Based on Volume Reconstruction. 20th International Conference on Pattern Recognition (ICPR).
Remco Veltkamp, Utrecht University
People detection, tracking and pose recognition
Fast and accurate modeling of human upper-body pose
A well-known video-based application is man-machine interaction, in which people can use their facial expressions, gestures and poses to control, for example, virtual actors or (serious) games. We are developing new methods that allow players to get rid of controllers and play games using intuitive body movements and poses.
Although there has been a significant number of investigations into human motion capture, most of them are marker-based: people need to wear special suits with markers on them to track the movement of different body parts, which is not convenient for real applications. To solve this problem, a markerless human motion capture system is desired. Compared with the single-person situation, multi-person tracking and pose estimation poses additional challenges, such as dealing with occlusion between persons and self-occlusion. The objective of the project is to develop new algorithms which can detect, track and model a small group of people in an indoor environment.
We propose a real-time system which can detect and track people and recognize their poses. In this system, body parts such as the torso and the hands are segmented from the whole body and tracked over time. The 2D coordinates of these body parts are used as input to a pose recognition system. By transforming the distances and angles between the torso center and the hands into a classifier feature space, simple classifiers, such as the nearest mean classifier, are sufficient for recognizing predefined key poses. The single-person detection and tracking is extended to a multiple-person scenario. We developed a combined probability estimation approach to detect and track multiple persons for pose estimation at the same time. It can deal with partial and total occlusion between persons by adding torso appearance to the tracker. Moreover, the upper body of each individual is further segmented into head, torso, upper arm and lower arm in a hierarchical way. The joint locations and angles are obtained from the pose estimation and can be used for pose recognition.
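The feature-space classification step can be illustrated as follows. This is a minimal sketch, not the project's code: `pose_features` and `NearestMeanPoseClassifier` are hypothetical names, and the features are simply the distances and angles between the torso center and each hand, as described above.

```python
import numpy as np

def pose_features(torso, left_hand, right_hand):
    """Distance and angle of each hand relative to the torso center (2D).

    Illustrative feature vector: [d_left, angle_left, d_right, angle_right].
    """
    feats = []
    for hand in (left_hand, right_hand):
        v = np.asarray(hand, float) - np.asarray(torso, float)
        feats += [np.linalg.norm(v), np.arctan2(v[1], v[0])]
    return np.array(feats)

class NearestMeanPoseClassifier:
    """Nearest mean classifier: assign a feature vector to the key pose
    whose training-set mean is closest in feature space."""

    def fit(self, X, y):
        self.labels_ = sorted(set(y))
        self.means_ = {c: np.mean([x for x, l in zip(X, y) if l == c], axis=0)
                       for c in self.labels_}
        return self

    def predict(self, x):
        return min(self.labels_,
                   key=lambda c: np.linalg.norm(x - self.means_[c]))
```

For example, a "T-pose" (hands out to the sides) and a "hands-up" pose produce clearly separated (distance, angle) vectors, so the nearest mean rule suffices; note that the raw angle feature wraps around at ±π, which a production system would have to handle.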
We will further extend multi-person pose estimation into pose recognition. The goal is to use joint locations as features to classify different poses. We will also investigate pose detectors that reject non-pose examples based on the proposed features. The approach is to first separate poses from non-poses, and then to clearly distinguish the different poses from each other. We will focus on improving the accuracy and robustness of the existing system. The emphasis will be on the use of multiple cameras and information fusion. A vision-based human pose detection system makes controller-free games possible.
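One simple way to realize such a pose/non-pose separation, offered here only as an illustrative sketch rather than the project's method, is to reject a feature vector whose distance to even the nearest class mean is too large; the threshold `tau` is a hypothetical tuning parameter that would be set on validation data.

```python
import numpy as np

def classify_or_reject(x, class_means, tau):
    """Assign x to the nearest class mean, or return None ('non-pose')
    when even the nearest mean is farther away than threshold tau.

    class_means : dict mapping pose label -> mean feature vector
    tau         : rejection threshold (assumed tuned on validation data)
    """
    label, dist = min(
        ((c, np.linalg.norm(np.asarray(x, float) - np.asarray(m, float)))
         for c, m in class_means.items()),
        key=lambda t: t[1])
    return label if dist <= tau else None
```

This two-stage scheme first filters out non-poses and then lets the nearest mean rule distinguish the remaining key poses from each other.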
3.2 Detecting, interpreting and affecting user behavior
Delft University of Technology
F. Huo et al. (2009). Markerless human motion capture and pose recognition. International Workshop on Image Analysis for Multimedia Interactive Services, pp. 13-16.
Emile Hendriks, Delft University of Technology