Abstract
Many state-of-the-art approaches for automatic abnormal behavior detection in crowded scenes are based on complex models that require high processing time and the adjustment of several parameters. This paper presents a simple new approach that uses a background subtraction algorithm and optical flow to encode the normal behavior pattern through a Gaussian Mixture Model (GMM). Abnormal behavior is detected by comparing new samples against the mixture model. Experimental results on standard anomaly detection and localization benchmarks are presented and compared with other algorithms in terms of detection rate and processing time.
1 Introduction
Abnormal behavior analysis in crowded scenes is an important and growing research field. Video cameras, given their ease of installation and low cost, are widely used for monitoring indoor and outdoor areas such as buildings, parks and stadiums. Algorithms for pose detection and action recognition for single persons or, in some cases, very low density groups of people have been studied extensively in the pattern recognition community. Nevertheless, many of these algorithms need to segment each person, which is impractical in crowded scenes due to high levels of occlusion.
Abnormal behavior is always tied to the scene context: a behavior considered normal in one scene may be considered abnormal in another. This context dependence makes automatic analysis more difficult and requires specific modeling of abnormal behavior for each particular scene.
Many algorithms have been proposed to build such models. In [7], optical flow is used to compute interaction forces between adjacent pixels, and a bag-of-words approach is used to classify frames as normal or abnormal. In [6], dynamic textures (DT) are used to model the appearance and dynamics of normal behavior; samples with low probability under the model are labeled as abnormal. In [8], entropy and energy concepts are used as features to model the probability of finding abnormal behavior in the scene. In [11], natural language processing is used as a classification algorithm for recognizing features based on viscous fluid field concepts.
Many algorithms employ machine learning techniques for classification. A Support Vector Machine (SVM) is used in [9, 12] to classify histograms of optical flow orientation. A Multilayer Perceptron neural network is used in [14]. k-Nearest Neighbors is used in [1] to classify outlier trajectories as abnormal behavior. Finally, Fuzzy C-Means is used in [3, 4] to derive an unsupervised model of crowd trajectory patterns.
Most recent algorithms employ supervised or unsupervised machine learning techniques. Supervised techniques are applicable when all possible abnormal situations are well known and a sufficient number of video samples with both normal and abnormal situations is available. In most cases supervised techniques yield very limited results, since it is difficult to anticipate all possible abnormal situations and the number of samples showing this type of behavior is usually very low. Detecting abnormal behavior with unsupervised algorithms can be seen as an outlier detection problem: a model is constructed using only samples of normal behavior, and in the test phase each sample that does not fit the model is labeled as abnormal.
In general, the feature vectors used in many of the algorithms described above depend on a set of parameters that must be correctly defined in order to achieve the performance reported by the authors. Some state-of-the-art methods are based on complex probabilistic models, which leads to high processing time. Although the processing time per frame is reported in very few papers, it is generally high. For example, in [6] the authors reported a test time of 25 s per frame for 160 \(\times \) 240 pixel images, and in [13] the reported test time per frame is 5 s for videos with 320 \(\times \) 240 pixel resolution.
The main contribution of this paper is a simple but efficient method for abnormal crowd behavior detection that reduces the processing time per frame, allowing practical use.
The rest of this paper is organized as follows. Section 2 describes the proposed approach. Section 3 presents the experimental results. Section 4 presents the conclusions.
2 Proposed Method
2.1 Pre-processing
The dense optical flow for each input frame is obtained using the algorithm presented in [2]. The optical flow information \(F(x,y) = (v_x(x,y), v_y(x,y))\) can be expressed as a vector of horizontal \(v_x(x,y)\) and vertical \(v_y(x,y)\) velocity components for each image pixel (x, y). The magnitude and direction of each optical flow vector are obtained using Eqs. 1 and 2 respectively.
\[m(x,y) = \sqrt{v_x(x,y)^2 + v_y(x,y)^2} \qquad (1)\]
\[\alpha (x,y) = \arctan \left( \frac{v_y(x,y)}{v_x(x,y)}\right) \qquad (2)\]
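The conversion from velocity components to magnitude and direction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `flow_to_polar` is hypothetical, and mapping the direction to degrees between 0 and 360 with `atan2` is an assumption suggested by the angle values quoted later for Fig. 1.

```python
import math

def flow_to_polar(vx, vy):
    """Return (magnitude, direction in degrees) for one optical flow vector."""
    magnitude = math.hypot(vx, vy)
    # atan2 resolves the quadrant; the modulo maps negative angles
    # into the 0 to 360 degree range (an assumption, see lead-in).
    direction = math.degrees(math.atan2(vy, vx)) % 360.0
    return magnitude, direction

# Example with made-up flow components (pixels per frame).
m, alpha = flow_to_polar(3.0, 4.0)
print(m, alpha)
```

Using `atan2` rather than a plain arctangent of the ratio avoids a division by zero when the horizontal component vanishes.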
In parallel with the optical flow computation, the background segmentation algorithm presented in [10] is used to obtain a foreground binary mask \(I_f\). The foreground information will be used to reduce the number of optical flow vectors processed as shown in the next section.
2.2 Normal Behavior Model
The normal behavior model is obtained from a sequence of N frames containing only normal behavior samples. First, each input frame is divided into smaller regions \(R_i\) of fixed size \(\widehat{w} \times \widehat{h}\). The regions overlap in the x and y directions by \(\widehat{w}/2\) and \(\widehat{h}/2\) respectively. Consequently, a total of T regions is obtained, where T is computed using
where w and h are the width and the height of the input frame respectively.
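As an illustration of the half-overlapping layout, the sketch below enumerates region corners and counts them. It assumes that the region size divides the frame evenly and that regions start at the top-left corner; both assumptions, the function name `region_grid` and the frame and region sizes are illustrative and not taken from the paper.

```python
def region_grid(w, h, rw, rh):
    """Top-left corners of rw x rh regions overlapping by half in x and y."""
    step_x, step_y = rw // 2, rh // 2
    xs = range(0, w - rw + 1, step_x)
    ys = range(0, h - rh + 1, step_y)
    return [(x, y) for y in ys for x in xs]

# Hypothetical 320 x 240 frame with 80 x 80 regions.
corners = region_grid(320, 240, 80, 80)
print(len(corners))  # number of regions T under these assumptions
```

Under these assumptions the count equals (2w/rw - 1)(2h/rh - 1), so a 320 x 240 frame with 80 x 80 regions gives 7 x 5 regions.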
A matrix \(\varvec{\mathcal {X}}_i\), shown in Eq. 4, is used to store the magnitude and direction values of the optical flow vectors for all the training frames within the region \(R_i\).
where the column vector \(\varvec{\alpha }_i^j\) will contain the direction values and the column vector \(\varvec{m}_i^j\) their corresponding magnitude within the i-th region of the j-th frame according to Eqs. 5 and 6 respectively.
where \(I_f(x,y)\) is the foreground image obtained as described in the previous section.
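A minimal sketch of how the training samples for one region could be gathered is given below. It assumes that only pixels marked as foreground in \(I_f\) contribute a (direction, magnitude) pair, which is how the foreground mask is described in Sect. 2.1; the function name, the nested-list image layout indexed as `[y][x]` and the region tuple are hypothetical.

```python
def collect_region_samples(frames, region):
    """Gather (direction, magnitude) pairs of foreground pixels in one region.

    frames: list of (alpha, mag, fg) tuples, each a 2D list indexed [y][x].
    region: (x0, y0, rw, rh) giving the top-left corner and the size.
    """
    x0, y0, rw, rh = region
    samples = []
    for alpha, mag, fg in frames:
        for y in range(y0, y0 + rh):
            for x in range(x0, x0 + rw):
                if fg[y][x]:  # skip background pixels
                    samples.append((alpha[y][x], mag[y][x]))
    return samples

# Tiny made-up 2 x 2 frame: two of the four pixels are foreground.
alpha = [[10.0, 20.0], [30.0, 40.0]]
mag = [[1.0, 2.0], [3.0, 4.0]]
fg = [[1, 0], [0, 1]]
print(collect_region_samples([(alpha, mag, fg)], (0, 0, 2, 2)))
```

The returned list plays the role of the matrix \(\varvec{\mathcal {X}}_i\): one row per retained optical flow vector, accumulated over all training frames.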
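Assuming that all the magnitude and direction values of a specific (x, y) point through the training video can be modeled as a mixture of M 2D Gaussian distributions, the probability of a particular pair \(\varvec{x}=(\alpha (x,y),m(x,y))\) being part of the Gaussian Mixture distribution is given by
\[p(\varvec{x}\mid {\varvec{\Theta }_\mathbf{i}}) = \sum _{k=1}^{M} \lambda _k^i \, \mathcal {N}(\varvec{x}\mid \varvec{\mu }_k^i,\varvec{\varSigma }_k^i) \qquad (7)\]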
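where \({\varvec{\Theta }_\mathbf{i}} = \{\lambda _1^i,\ldots ,\lambda _M^i,\varvec{\mu }_1^i,\ldots ,\varvec{\mu }_M^i, \varvec{\varSigma }_1^i,\ldots ,\varvec{\varSigma }_M^i\}\) are the model parameter vectors, \(\lambda _k^i\) are the mixing coefficients, \(\varvec{\mu }_k^i\) and \(\varvec{\varSigma }_k^i\) are the mean vector and the covariance matrix of the k-th component fitted to \(\varvec{\mathcal {X}}_i\) respectively and \(\mathcal {N}\) is the Multivariate Gaussian distribution, which for 2D data is given by
\[\mathcal {N}(\varvec{x}\mid \varvec{\mu }_k^i,\varvec{\varSigma }_k^i) = \frac{1}{2\pi \,|\varvec{\varSigma }_k^i|^{1/2}} \exp \left( -\frac{1}{2}(\varvec{x}-\varvec{\mu }_k^i)^{T}(\varvec{\varSigma }_k^i)^{-1}(\varvec{x}-\varvec{\mu }_k^i)\right) \qquad (8)\]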
The mixture parameter vectors \({\varvec{\Theta }_\mathbf{i}}\) are obtained through the Expectation-Maximization (EM) algorithm.
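Once the parameters are available, evaluating the mixture density for a new (direction, magnitude) pair is straightforward. The sketch below assumes the parameters have already been estimated (the EM step itself is not shown) and uses the closed-form inverse of a 2 x 2 covariance matrix; the function names and all numeric values are made up for illustration.

```python
import math

def gaussian2d(x, mean, cov):
    """Density of a 2D Gaussian; cov is ((sxx, sxy), (syx, syy))."""
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^-1 (x - mu) via the 2x2 inverse.
    quad = (syy * dx * dx - (sxy + syx) * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def mixture_density(x, weights, means, covs):
    """Weighted sum of component densities; weights should sum to 1."""
    return sum(w * gaussian2d(x, mu, c)
               for w, mu, c in zip(weights, means, covs))

# Made-up two-component model over (direction in degrees, magnitude).
weights = [0.6, 0.4]
means = [(10.0, 0.8), (180.0, 0.8)]
covs = [((25.0, 0.0), (0.0, 0.04)), ((25.0, 0.0), (0.0, 0.04))]
p_typical = mixture_density((12.0, 0.85), weights, means, covs)
p_unusual = mixture_density((90.0, 3.0), weights, means, covs)
print(p_typical > p_unusual)
```

A sample close to one of the component means receives a much higher density than a sample far from all of them, which is what the detection step relies on.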
Figure 1 shows an example region (in red), the \(\varvec{\mathcal {X}}_i\) matrix formation and the contour map of the Gaussian Distribution obtained from \(\varvec{\mathcal {X}}_i\), with \(i = 22\). From the Gaussian Distribution it is observed that, in the example region, the main displacement directions are approximately \(10^{\circ }\), \(180^{\circ }\) and \(360^{\circ }\) and the main magnitude value is approximately 0.8.
2.3 Abnormal Detection
In the test phase a binary image \(I_b\) is obtained for each input frame. The \(I_b\) image has the same size as the input frames and its values are computed as shown in Eq. 9.
\[I_b(x,y) = \begin{cases} 1 & \text{if } p(\varvec{x}\mid {\varvec{\Theta }_\mathbf{i}}) < T_h \\ 0 & \text{otherwise} \end{cases} \qquad (9)\]
where \(\varvec{x} = (\alpha (x,y), m(x,y))\) is the optical flow direction and magnitude at point (x, y), \({\varvec{\Theta }_\mathbf{i}}\) is the parameter vector for the Gaussian Mixture Model at the region \(R_i\) and \(T_h\) is a threshold that specifies the probability limit below which the pixel (x, y) is marked as abnormal (1) rather than normal (0).
To improve the algorithm's performance, a FIFO list of fixed size S is defined and filled with the latest S binary images \(I_b(x,y)\). A connected component analysis is performed on each new image \(I_{b}(x,y)\). If any pixel within a blob appears as abnormal in at least W images within the list, where \(W<S\), then the whole blob is marked as abnormal. The list size S and the number W are user-controlled parameters that can be used for sensitivity adjustment; a higher value of W means a longer alarm delay.
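The temporal filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions: blobs are labeled with 4-connectivity (the paper does not state the connectivity), binary images are nested lists indexed as `[y][x]`, and the names `label_blobs` and `TemporalFilter` are hypothetical.

```python
from collections import deque

def label_blobs(mask):
    """Return the 4-connected components of a binary image as pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                stack, blob = [(x, y)], []
                while stack:  # depth-first flood fill
                    cx, cy = stack.pop()
                    blob.append((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if (0 <= nx < w and 0 <= ny < h
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((nx, ny))
                blobs.append(blob)
    return blobs

class TemporalFilter:
    def __init__(self, S=3, W=2):
        # deque with maxlen acts as the FIFO: the oldest image drops out.
        self.history = deque(maxlen=S)
        self.W = W

    def update(self, mask):
        """Store the newest binary image and return the confirmed blobs."""
        self.history.append(mask)
        confirmed = []
        for blob in label_blobs(mask):
            # A blob is confirmed if any of its pixels was abnormal
            # in at least W of the stored images.
            if any(sum(img[y][x] for img in self.history) >= self.W
                   for x, y in blob):
                confirmed.append(blob)
        return confirmed

# Two made-up consecutive binary images sharing one abnormal pixel.
f = TemporalFilter(S=3, W=2)
first = f.update([[0, 0, 0], [0, 1, 1], [0, 0, 0]])
second = f.update([[0, 0, 0], [0, 0, 1], [0, 0, 1]])
print(len(first), len(second))
```

With these inputs the first image confirms nothing, while the second confirms its single blob because one of its pixels was already abnormal in the previous image.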
3 Results and Comparisons
The proposed algorithm was implemented in Qt/C++ using OpenCV 3.0 on a 2.7 GHz Intel Core i7 PC with 16 GB of RAM. The model parameters were fixed to \(M = 5\) (number of Gaussians in the mixture), \(S = 3\) and \(W = 2\). All frames, regardless of the dataset, were divided into \(T=35\) regions, which means that the \(\widehat{w}\) and \(\widehat{h}\) values depend on the input frame size. The presented method was tested on two publicly available anomaly detection datasets: UMN (see Note 1) and UCSD (see Note 2). Figure 2 shows a normal frame for each scenario in the UMN dataset, the abnormality detected by the proposed approach and the performance comparison against the ground truth.
Figure 3 shows six example frames with abnormal behavior detections from the UCSD dataset.
The proposed method was compared with similar state-of-the-art algorithms including Mixture Dynamic Texture (MDT) [6], Mixture of Optical Flow (MPPCA) [5], Social Force [7], Social Force with MPPCA [5] and the Hierarchical Activity Approach [13]. Figure 4 shows the Receiver Operating Characteristic (ROC) curves for the proposed method and the comparative algorithms, taken from [13]. Table 1 shows the Area Under the ROC Curve (AUC) for the five comparative methods and the proposed one. Finally, Fig. 5 shows the processing time per frame for some state-of-the-art algorithms and for the one proposed in this paper.
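For readers unfamiliar with the AUC metric, the sketch below computes the area under a ROC curve with the trapezoidal rule. The function name is hypothetical and the ROC points are made up; they are not results from this paper.

```python
def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve; points must be sorted by increasing fpr."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area

# Made-up ROC points (false positive rate, true positive rate).
fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]
print(auc_trapezoid(fpr, tpr))
```

A perfect detector reaches an area of 1, while a detector no better than chance lies on the diagonal and gives 0.5.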
4 Conclusions
This paper presents a new method for abnormal behavior detection based on optical flow and a Gaussian Mixture Model. The experimental results show that the proposed method performs better in both detection rate and processing time per frame than other state-of-the-art algorithms.
Notes
1. http://mha.cs.umn.edu/proj_events.shtml
2. http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm
References
1. Alvar, M., Torsello, A., Sanchez-Miralles, A., Armingol, J.M.: Abnormal behavior detection using dominant sets. Mach. Vis. Appl. 25(5), 1351–1368 (2014)
2. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., et al. (eds.) ECCV 2004, vol. 3024, pp. 25–36. Springer, Heidelberg (2004)
3. Chen, Z., Tian, Y., Wei Zeng, T.H.: Detecting abnormal behaviors in surveillance videos based on fuzzy clustering and multiple auto-encoders. In: IEEE International Conference on Multimedia and Expo, pp. 1–6 (2015)
4. Cui, J., Liu, W., Xing, W.: Crowd behaviors analysis and abnormal detection based on surveillance data (2014)
5. Kim, J., Grauman, K.: Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
6. Mahadevan, V., Li, W., Bhalodia, V., Vasconcelos, N.: Anomaly detection in crowded scenes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1975–1981 (2010)
7. Mehran, R., Oyama, A., Shah, M.: Abnormal crowd behavior detection using social force model. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 935–942 (2009)
8. Ren, W.Y., Li, G.H., Chen, J., Liang, H.Z.: Abnormal crowd behavior detection using behavior entropy model. In: International Conference on Wavelet Analysis and Pattern Recognition, pp. 212–221 (2012)
9. Snoussi, H., Wang, T.: Detection of abnormal visual events via global optical flow orientation histogram. IEEE Trans. Inf. Forensics Secur. 9(6), 988–998 (2014)
10. Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking. In: Proceedings in IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2 (1999)
11. Su, H., Yang, H., Zheng, S., Fan, Y., Wei, S.: Crowd event perception based on spatio-temporal viscous fluid field. In: IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance, pp. 458–463 (2012)
12. Wang, T., Snoussi, H.: Histograms of optical flow orientation for visual abnormal events detection. In: IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS), pp. 13–18 (2012)
13. Xu, D., Song, R., Wu, X., Li, N., Feng, W., Qian, H.: Video anomaly detection based on a hierarchical activity discovery within spatio-temporal contexts. Neurocomputing 143, 144–152 (2014)
14. Zhang, D., Peng, H., Haibin, Y., Lu, Y.: Crowd abnormal behavior detection based on machine learning. Inf. Technol. J. 12, 1199–1205 (2013)
Acknowledgments
The authors wish to thank Conselho Nacional de Desenvolvimento Científico (CNPq), Brazilian Research Support Foundations, for sponsoring this work.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Rojas, O.E., Tozzi, C.L. (2016). Abnormal Crowd Behavior Detection Based on Gaussian Mixture Model. In: Hua, G., Jégou, H. (eds) Computer Vision – ECCV 2016 Workshops. ECCV 2016. Lecture Notes in Computer Science, vol 9914. Springer, Cham. https://doi.org/10.1007/978-3-319-48881-3_47
DOI: https://doi.org/10.1007/978-3-319-48881-3_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48880-6
Online ISBN: 978-3-319-48881-3
eBook Packages: Computer Science (R0)