Summary: | Detecting and tracking objects in videos and images is a rapidly growing field of
research. Identifying, recognising, detecting and tracking objects such as humans, cars,
obstacles, etc. has many applications. Many methods exist for performing these tasks,
and they vary in performance, in the quality and type of their results, in the type of
raw data they accept, and so on.
This project aims to detect and track objects exclusively from depth video. Depth
video is a video sequence, here captured by a Kinect camera, in which the value of
each pixel in a frame is the distance from the camera to the real-world point that
pixel represents. Detection is performed using a morphological segmentation technique
called the watershed transform. The detection parameters chosen are derived from the objects of
interest in the test dataset. Two methods, edge-based detection and region-based
detection, are used for pre-processing and the result with the largest detected area is
selected. The two methods complement each other in many cases, making the use of
both necessary.
The objects of interest in the dataset are human beings. The efficiency of the algorithm
has therefore been tested in a variety of situations: crowded and non-crowded areas,
single and multiple objects, and occluded and non-occluded objects.
Challenges arise when multiple objects move across a scene. This problem
has been addressed with a method for tracking multiple objects with, theoretically, no
limit to the maximum number of objects. However, occlusion degrades the algorithm's
performance. Test results are presented and compared with existing methods. Although the
accuracy and efficiency of the proposed system are only moderately high, its implementation
is simple, which greatly reduces processing time.
|
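
The selection step mentioned in the summary (run both pre-processing paths and keep the result with the largest detected area) can be illustrated with a minimal sketch. This is not the project's implementation: the names `mask_area` and `select_detection`, the representation of a detection as a binary mask, the tie-break rule, and the toy 3x4 masks are all assumptions made for illustration. The watershed segmentation that would produce the two masks is not shown.

```python
from typing import List

# Assumed representation: a binary mask where 1 marks a detected object pixel.
Mask = List[List[int]]


def mask_area(mask: Mask) -> int:
    """Count the foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def select_detection(edge_mask: Mask, region_mask: Mask) -> Mask:
    """Return whichever candidate mask covers the larger area.

    Ties go to the edge-based result; this tie-break is an arbitrary
    assumption, not something stated in the summary.
    """
    if mask_area(region_mask) > mask_area(edge_mask):
        return region_mask
    return edge_mask


# Hypothetical outputs of the two pre-processing paths on one depth frame.
edge_result = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
region_result = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]

chosen = select_detection(edge_result, region_result)
print(mask_area(chosen))
```

In this toy case the region-based mask covers more pixels, so it is the one kept. On another frame the edge-based mask could win, which is the sense in which the two methods complement each other.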