Abstract
Object detection and tracking based on background subtraction suffers from the fragmentation problem: a single object splits into several blobs because parts of it are similar in color to the reference image. In this paper, we build a visual tracking framework that uses background subtraction for object detection, and we address the difficulty of associating blobs with objects caused by fragmentation in two steps. First, we cluster the blobs according to their boundary distances, estimated with an approximation method proposed in this paper; blobs clustered into the same blob-set are considered to come from the same object. Second, because the blobs of a severely fragmented object may be clustered into different blob-sets, we treat blob-sets that exhibit coherent motion as possibly coming from the same object. A background-matching method is proposed to determine whether two blob-sets exhibiting coherent motion truly come from the same object or from different objects. We test the proposed methods on several real-world video sequences. Quantitative and qualitative experimental results show that the proposed methods handle the problems caused by fragmentation effectively.
Notes
Available at http://www.cvg.rdg.ac.uk/PETS2009/data.html.
The values of the parameters can be learned from previous data, which are available in common applications. Given previous data, one can directly estimate the values of \(\varrho \) and \(\phi \), while different values of \(\rho \) and \(\beta \) need to be explored to determine their optimal settings, as described in Sect. 5.
i-Lids dataset for AVSS 2007, available at http://www.eecs.qmul.ac.uk/~andrea/avss2007_d.html.
For an object that existed in the scene for \(N_1\) time steps and was correctly labeled for \(N_2\) time steps by a labeling method, the correct labeling rate of the method for that object is \(N_2/N_1\).
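The correct labeling rate above can be sketched as a small helper; a minimal illustration, where `labels` is an assumed per-time-step list of labels assigned to one object and `truth` is its ground-truth label (both names are hypothetical, not from the paper):

```python
def correct_labeling_rate(labels, truth):
    """Fraction N2/N1 of time steps for which the object was labeled correctly.

    `labels` and `truth` are illustrative names, not from the paper.
    """
    n1 = len(labels)                            # N1: time steps the object existed
    n2 = sum(1 for l in labels if l == truth)   # N2: correctly labeled time steps
    return n2 / n1 if n1 else 0.0

# Example: object visible for 5 time steps, labeled correctly in 4 of them.
print(correct_labeling_rate([3, 3, 3, 7, 3], 3))  # → 0.8
```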
Abbreviations
- \(N_g\): The number of particles used for global sampling
- \(b_{j}\): The \(j\)th pixel
- \(B_i\): The boundary of the \(i\)th blob
- \(M_i\): The number of pixels in \(B_i\)
- \(D(i,k)\): The boundary distance between \(B_i\) and \(B_k\)
- \(\hat{b}_{ik}\): The boundary pixel of \(B_i\) closest to the \(k\)th blob
- \(\alpha \): A constant deciding the sampling density
- \(T\): A constant equal to 300
- \(I_i\): The \(i\)th image
- \(rt_i\): The time cost of computing the boundary distances of the blobs
- \(\tilde{D}(j,k)\): The true boundary distance between \(B_j\) and \(B_k\)
- \(\upsilon _{j,k}\): The accuracy of our method
- \(err_{j,k}\): The estimation error
- \(N\): The number of obtained blobs
- \(S_b\): The obtained blobs
- \(\Xi (i,j)\): The centroid distance between the \(i\)th blob and the \(j\)th blob
- \(\rho \): A predefined threshold
- \(\beta \): A predefined threshold
- \(\varrho \): The diagonal axis length of the largest object that ever appeared in the scene
- \(s(B_i, B_j)\): The similarity between two blobs
- \(K_{x}(\cdot )\): The flat kernel
- \(\phi \): A predefined threshold
- \({\Lambda _j}\): The \(j\)th blob-set
- \(t\): The index of the time step
- \(\mathbf x _{i,t}\): The state of the \(i\)th target at time \(t\)
- \((w,h)\): The width and the height
- \(id\): The identity of the target
- \((\dot{x},\dot{y})\): The velocities
- \(l\): The target label
- \(\mathbf Q \): The white Gaussian noise
- \(N_p\): The number of particles used by each tracker
- \(g_{i,j}^{(t)}\): The motion coherence at time \(t\) between two targets \(\mathbf{x}_{i}\) and \(\mathbf{x}_{j}\)
- \(d_{i,j}^{(t)}\): The distance between two targets
- \(\chi \): A predefined threshold
- \(\triangle _{0}\): The velocity difference of targets
- \(\triangle _{v}\): The true average velocity difference vector of correlated targets
- \(\sigma ^2_{0}\): A small variance
- \(\sigma ^2_{1}\): A large variance
- \(q_{i,j}^{(t)}\): The correlation at time \(t\) between two targets \(\mathbf{x}_{i}\) and \(\mathbf{x}_{j}\)
- \(C\): The color histogram of the foreground gap-pixels
- \(C_b\): The color histogram of the background gap-pixels
- \(\xi {(C,C_b)}\): The Bhattacharyya similarity coefficient
- \(N_h\): A constant equal to 20
- \(N_s\): A constant equal to 20
- \(\lambda \): A constant equal to 20
- \(G\): The correlation graph
- \(V\): The vertices corresponding to the targets in \(G\)
- \(E\): The edges in \(G\)
- \(e_{i,j}\): The edge between \(\mathbf{x}_{i}\) and \(\mathbf{x}_{j}\)
- \(w_{i,j}\): The weight of the edge \(e_{i,j}\)
- \(S_C\): A correlation-set
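For concreteness, the boundary distance \(D(i,k)\) listed above is the minimum distance between the boundary pixel sets \(B_i\) and \(B_k\). The paper proposes an approximation method to avoid the \(O(M_i M_k)\) cost; the brute-force sketch below only illustrates the exact quantity being approximated. Blob boundaries are assumed here to be lists of `(x, y)` pixel coordinates (an illustrative representation, not the paper's).

```python
import math

def boundary_distance(boundary_i, boundary_k):
    """Exact boundary distance D(i,k): minimum Euclidean distance over all
    pairs of boundary pixels. Brute force, O(M_i * M_k); the paper's
    contribution is an approximation that avoids this cost."""
    return min(
        math.hypot(xi - xk, yi - yk)
        for (xi, yi) in boundary_i
        for (xk, yk) in boundary_k
    )

# Two small square boundaries whose closest pixels are 2 apart:
bi = [(0, 0), (0, 1), (1, 0), (1, 1)]
bk = [(3, 0), (3, 1), (4, 0), (4, 1)]
print(boundary_distance(bi, bk))  # → 2.0
```

If the two nearest pixels are \(\hat{b}_{ik}\) and \(\hat{b}_{ki}\), the value returned is their Euclidean distance, matching the definition in the list above.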
Acknowledgments
This work was partly supported by grants from the National Natural Science Foundation of China (Nos. 60772063 and 61175096) and by the International S&T Cooperation Program of Beijing Institute of Technology.
Cite this article
Xu, W., Zhao, Q. & Gu, D. Fragmentation handling for visual tracking. SIViP 8, 1639–1649 (2014). https://doi.org/10.1007/s11760-012-0406-1