Reduced Analytic Dependency Modeling: Robust Fusion for Visual Recognition

International Journal of Computer Vision

Abstract

This paper addresses the robustness of information fusion for visual recognition. Analyzing the limitations of existing fusion methods, we identify two key factors affecting the performance and robustness of a fusion model under different data distributions, namely (1) data dependency and (2) the fusion assumption on the posterior distribution. Considering these two factors, we develop a new framework that models dependency based on probabilistic properties of posteriors without any assumption on the data distribution. Making use of the range characteristics of posteriors, the fusion model is formulated as an analytic function multiplied by a constant with respect to the class label. With the analytic fusion model, we give a condition equivalent to the independence assumption and derive the dependency model from the marginal distribution property. Since the number of terms in the dependency model increases exponentially, the Reduced Analytic Dependency Model (RADM) is proposed based on the convergence property of analytic functions. Finally, the optimal coefficients in the RADM are learned by incorporating label information from training data to minimize the empirical classification error under a regularized least-squares criterion, which ensures discriminative power. Experimental results from robust non-parametric statistical tests show that the proposed RADM method statistically significantly outperforms eight state-of-the-art score-level fusion methods on eight image/video datasets for different tasks of digit, flower, face, human action, object, and consumer video recognition.

Notes

  1. http://archive.ics.uci.edu/ml/datasets/Multiple+Features.

  2. http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html.

  3. http://lear.inrialpes.fr/pubs/2010/GVS10/.

  4. http://www.ee.columbia.edu/ln/dvmm/CCV/.

  5. http://www.comp.hkbu.edu.hk/~jhma/.

  6. http://www.ele.uri.edu/faculty/he/.

  7. Note that significance in this paper refers to statistical significance, not the degree of improvement. In statistics, a result is called statistically significant if the observed difference is unlikely to have arisen by chance alone and is likely to reflect a genuine experimental effect (Sheskin 2011).

References

  • Ahonen, T., Hadid, A., & Pietikäinen, M. (2004). Face recognition with local binary patterns. European Conference on Computer Vision, Lecture Notes in Computer Science, 3021, 469–481.

  • Awais, M., Yan, F., Mikolajczyk, K., & Kittler, J. (2011). Augmented kernel matrix vs classifier fusion for object recognition. British Machine Vision Conference, 60.1–60.11.

  • Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.

  • Breukelen, M., Duin, R., Tax, D., & Hartog, J. (1998). Handwritten digit recognition by combined classifiers. Kybernetika, 34(4), 381–386.

  • Canu, S., Grandvalet, Y., Guigue, V., & Rakotomamonjy, A. (2005). SVM and kernel methods matlab toolbox. Rouen: Perception Systèmes et Information, INSA de Rouen.

  • Chen, H., & Meer, P. (2005). Robust fusion of uncertain information. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 35(3), 578–586.

  • Comaniciu, D. (2003). Robust information fusion using variable-bandwidth density estimation. International Conference of Information Fusion, 2, 1303–1309.

  • Cover, T. M., & Thomas, J. A. (2006). Elements of information theory. New York: Wiley.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. IEEE Conference on Computer Vision and Pattern Recognition, 1, 886–893.

  • Dass, S. C., Nandakumar, K., & Jain, A. K. (2005). A principled approach to score level fusion in multimodal biometric systems. In International conference on audio- and video-based biometric person authentication (pp. 1049–1058).

  • Demiriz, A., Bennett, K. P., & Shawe-Taylor, J. (2002). Linear programming boosting via column generation. Machine Learning, 46(1–3), 225–254.

  • Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.

  • Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52–64.

  • Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC 2007) results.

  • Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Feller, W. (1968). An introduction to probability theory and its applications, volume I. New York: Wiley.

  • Fernando, B., Fromont, E., Muselet, D., & Sebban, M. (2012). Discriminative feature fusion for image classification. In IEEE conference on computer vision pattern recognition (pp. 3434–3441).

  • Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675–701.

  • Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In IEEE international conference on computer vision (pp. 221–228).

  • Gorelick, L., Blank, M., Shechtman, E., Irani, M., & Basri, R. (2007). Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2247–2253.

  • Guillaumin, M., Verbeek, J., & Schmid, C. (2010). Multimodal semi-supervised learning for image classification. In IEEE conference on computer vision and pattern recognition (pp. 902–909).

  • He, H., & Cao, Y. (2012). SSC: A classifier combination method based on signal strength. IEEE Transactions on Neural Networks and Learning Systems, 23(7), 1100–1117.

  • He, M., Horng, S.-J., Fan, P., Run, R.-S., Chen, R.-J., Lai, J.-L., et al. (2010). Performance evaluation of score level fusion in multimodal biometric systems. Pattern Recognition, 43(5), 1789–1800.

  • He, X., Yan, S., Hu, Y., Niyogi, P., & Zhang, H. J. (2005). Face recognition using laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 328–340.

  • Huber, P. J., & Ronchetti, E. M. (2009). Robust Statistics (2nd ed.). New York: Wiley.

  • Jain, A., Nandakumar, K., & Ross, A. (2005). Score normalization in multimodal biometric systems. Pattern Recognition, 38(12), 2270–2285.

  • Jiang, Y.-G., Ye, G., Chang, S.-F., Ellis, D., & Loui, A. C. (2011). Consumer video understanding: A benchmark database and an evaluation of human and machine performance. ACM International Conference on Multimedia Retrieval, 29:1–29:8.

  • Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239.

  • Krantz, S. G., & Parks, H. R. (2002). A primer of real analytic functions. Basel: Birkhäuser.

  • Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. New York: Wiley.

  • Lan, X., Yuen, P. C., & Ma, A. J. (2014). Multi-cue visual tracking using robust feature-level fusion based on joint sparse representation. In IEEE conference on computer vision and pattern recognition.

  • Liu, D., Lai, K.-T., Ye, G., Chen, M.-S., & Chang, S.-F. (2013). Sample specific late fusion for visual category recognition. In IEEE conference on computer vision and pattern recognition (pp. 803–810).

  • Liu, J., McCloskey, S., & Liu, Y. (2012). Local expert forest of score fusion for video event classification. European Conference on Computer Vision, Lecture Notes in Computer Science, 7576, 397–410.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Luenberger, D. G., & Ye, Y. (2008). Linear and nonlinear programming (3rd ed.). Berlin: Springer.

  • Ma, A. J., & Yuen, P. C. (2012). Reduced analytical dependency modeling for classifier fusion. European Conference on Computer Vision, Lecture Notes in Computer Science, 7574, 792–805.

  • Ma, A. J., Yuen, P. C., & Lai, J.-H. (2013a). Linear dependency modeling for classifier fusion and feature combination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(5), 1135–1148.

  • Ma, A. J., Yuen, P. C., Zou, W. W., & Lai, J.-H. (2013b). Supervised spatio-temporal neighborhood topology learning for action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 23(8), 1447–1460.

  • Mikolajczyk, K., & Schmid, C. (2004). Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1), 63–86.

  • Mittal, A., Zisserman, A., & Torr, P. (2011). Hand detection using multiple proposals. British Machine Vision Conference, 75.1–75.11.

  • Nandakumar, K., Chen, Y., Dass, S. C., & Jain, A. K. (2008). Likelihood ratio based biometric score fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 342–347.

  • Natarajan, P., Wu, S., Vitaladevuni, S., Zhuang, X., Tsakalidis, S., Park, U., et al. (2012). Multimodal feature fusion for robust event detection in web videos. In IEEE conference on computer vision pattern recognition (pp. 1298–1305).

  • Nilsback, M.-E., & Zisserman, A. (2006). A visual vocabulary for flower classification. IEEE Conference on Computer Vision and Pattern Recognition, 2, 1447–1454.

  • Nilsback, M.-E., & Zisserman, A. (2008). Automated flower classification over a large number of classes. In IEEE Indian conference on computer vision, graphics and image processing (pp. 722–729).

  • Oh, S., McCloskey, S., Kim, I., Vahdat, A., Cannons, K., Hajimirsadeghi, H., et al. (2014). Multimedia event detection with multimodal feature fusion and temporal concept localization. Machine Vision and Applications, 25(1), 49–69.

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  • Phillips, P. J., Moon, H., Rizvi, S. A., & Rauss, P. J. (2000). The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10), 1090–1104.

  • Prabhakar, S., & Jain, A. K. (2002). Decision-level fusion in fingerprint verification. Pattern Recognition, 35(4), 861–874.

  • Ross, A., Nandakumar, K., & Jain, A. K. (2006). Handbook of multibiometrics. Berlin: Springer.

  • Rudin, W. (1976). Principles of mathematical analysis. New York: McGraw-Hill.

  • Scheirer, W., Rocha, A., Micheals, R., & Boult, T. (2010). Robust fusion: Extreme value theory for recognition score normalization. European Conference on Computer Vision, Lecture Notes in Computer Science, 6313, 481–495.

  • Schuldt, C., Laptev, I., & Caputo, B. (2004). Recognizing human actions: A local SVM approach. IEEE International Conference on Pattern Recognition, 3, 32–36.

  • Sheskin, D. J. (2011). Handbook of parametric and nonparametric statistical procedures (5th ed.). London: Chapman and Hall/CRC.

  • Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.

  • Sim, T., Baker, S., & Bsat, M. (2003). The CMU pose, illumination, and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12), 1615–1618.

  • Tang, K., Yao, B., Fei-Fei, L., & Koller, D. (2013). Combining the right features for complex event recognition. In IEEE international conference on computer vision.

  • Terrades, O. R., Valveny, E., & Tabbone, S. (2009). Optimal classifier fusion in a non-bayesian probabilistic framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9), 1630–1644.

  • Toh, K.-A., Tran, Q.-L., & Srinivasan, D. (2004a). Benchmarking a reduced multivariate polynomial pattern classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 740–755.

  • Toh, K.-A., Yau, W.-Y., & Jiang, X. (2004b). A reduced multivariate polynomial model for multimodal biometrics and classifiers fusion. IEEE Transactions on Circuits and Systems for Video Technology, 14(2), 224–233.

  • Ueda, N. (2000). Optimal linear combination of neural networks for improving classification performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(2), 207–215.

  • Wang, H., Nie, F., & Huang, H. (2013). Heterogeneous visual features fusion via sparse multimodal machine. In IEEE conference on computer vision and pattern recognition (pp. 3097–3102).

  • Wang, J., Kwon, S., & Shim, B. (2012). Generalized orthogonal matching pursuit. IEEE Transactions on Signal Processing, 60(12), 6202–6216.

  • Ye, G., Liu, D., Jhuo, I.-H., & Chang, S.-F. (2012). Robust late fusion with rank minimization. In IEEE conference on computer vision pattern recognition (pp. 3021–3028).

  • Yuan, X.-T., Liu, X., & Yan, S. (2012). Visual classification with multitask joint sparse representation. IEEE Transactions on Image Processing, 21(10), 4349–4360.

  • Zhang, J., Marszałek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 213–238.

Acknowledgments

This project was partially supported by the Science Faculty Research Grant of Hong Kong Baptist University, Hong Kong Research Grants Council General Research Fund 212313, and National Science Foundation of China Research Grant 61172136. The authors would like to thank the editor and reviewers for their helpful comments, which improved the quality of this paper.

Author information

Corresponding author

Correspondence to Pong C. Yuen.

Additional information

Communicated by K. Ikeuchi.

Appendices

Appendix A: Proof of Proposition 1

We first show that conditional independence implies that the solution to equation system (16) is trivial, i.e. \({\varvec{a}}_{lm0} = \mathbf 0 , {\varvec{a}}_{lm2} = \mathbf 0 , {\varvec{a}}_{lm3} = \mathbf 0 , \ldots \) is a trivial solution to equation system (16) for \(m = 1, \ldots , M\). If the feature representations are independent of each other given class label \(\omega _l\), the analytic function \(h_l({\varvec{s}}_l; {\varvec{a}}_l)\) becomes Eq. (3). Rewriting the analytic function in (3) according to the order of \(s_{lm}\), we get

$$\begin{aligned} h_l({\varvec{s}}_l; {\varvec{a}}_l) = g_{lm1}(\tilde{{\varvec{s}}}_{lm}; {\varvec{a}}_{lm1}) s_{lm} \end{aligned}$$
(36)

where \(g_{lm1}(\tilde{{\varvec{s}}}_{lm}; {\varvec{a}}_{lm1}) = p_l^{1-M} \prod _{m' \ne m} s_{lm'}\). Eq. (36) means that \(g_{lmn}(\tilde{{\varvec{s}}}_{lm}; {\varvec{a}}_{lmn}) \equiv 0\), or equivalently \({\varvec{a}}_{lmn} = \mathbf 0 \), for \(n \ne 1\), i.e. the solution to equation system (16) is trivial.

Conversely, given that the solution to equation system (16) is trivial, we need to show that the analytic function \(h_l({\varvec{s}}_l; {\varvec{a}}_l)\) is equal to Eq. (3). If \({\varvec{a}}_{lmn} = \mathbf 0 \) for \(n \ne 1\), then the analytic function \(h_l({\varvec{s}}_l; {\varvec{a}}_l)\) can be rewritten as Eq. (36) for \(m = 1, \ldots , M\). This implies that each term in the power series \(h_l\) contains all variables \(s_{l1}, \ldots , s_{lM}\) and that the order of each \(s_{lm}\) cannot be larger than one. In this case, there is only one non-zero term \(\prod _{m=1}^{M} s_{lm}\) in the analytic function \(h_l\). In addition, according to the normalization equation (15), the non-zero term \(\prod _{m=1}^{M} s_{lm}\) is normalized by the prior, so the analytic function becomes Eq. (3). This completes the proof of the proposition.
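As a rough illustration of the independent case discussed in this proof, the sketch below evaluates \(p_l^{1-M} \prod _{m=1}^{M} s_{lm}\) for each class and normalizes over classes. Eq. (3) itself is not reproduced in this appendix, so its form here is inferred from Eq. (36) and the expression for \(g_{lm1}\). The function name `independent_fusion`, the variables `scores` and `priors`, and the toy numbers are hypothetical and not taken from the paper.

```python
from math import prod


def independent_fusion(scores, priors):
    """Fuse per-feature posteriors under the conditional independence assumption.

    scores[l][m] is the posterior s_lm of class l given feature m (assumed layout).
    priors[l] is the class prior p_l.
    Returns the fused scores normalized to sum to one over classes.
    """
    num_features = len(scores[0])
    # h_l = p_l^(1 - M) * prod_m s_lm, the single non-zero term in the proof.
    fused = [
        priors[l] ** (1 - num_features) * prod(scores[l])
        for l in range(len(priors))
    ]
    total = sum(fused)
    if total == 0:
        raise ValueError("all fused scores are zero; cannot normalize")
    return [value / total for value in fused]


# Toy example: two classes, two features, equal priors (hypothetical values).
scores = [[0.6, 0.7], [0.4, 0.3]]
priors = [0.5, 0.5]
posterior = independent_fusion(scores, priors)
predicted = max(range(len(posterior)), key=posterior.__getitem__)
```

The normalization step reflects the statement in the abstract that the fusion model is an analytic function multiplied by a constant with respect to the class label; only the relative values across classes matter for the decision.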

Appendix B: Derivation for \(E_{\mathrm{Dis}}({\varvec{a}}, {\varvec{q}})\)

$$\begin{aligned}&E_\mathrm{Dis }({\varvec{a}}, {\varvec{q}}) \\&\quad = - \theta \sum _{l=1}^L \sum _{l' \ne l} \sum _{{y_j} = \omega _l } q_{jl'} \\&\qquad +\,\, \frac{\theta }{2} \sum _{l=1}^L \sum _{l' \ne l} \sum _{{y_j} = \omega _l } (({\varvec{a}}_l^T {\varvec{z}}_{jl} - {\varvec{a}}_{l'}^T {\varvec{z}}_{jl'}) - q_{jl'})^2 \\&\quad = - \theta \sum _{l=1}^L \sum _{l' \ne l} {\varvec{q}}_{ll'}^T \mathbf 1 + \frac{\theta }{2} \sum _{l=1}^L \sum _{l' \ne l} \Vert (Z_{ll}^T {\varvec{a}}_l - Z_{ll'}^T {\varvec{a}}_{l'}) - {\varvec{q}}_{ll'}\Vert ^2 \\&\quad = - \theta \sum _{l=1}^L {\varvec{q}}_l^T \mathbf 1 + \frac{\theta }{2} \sum _{l=1}^L ({\varvec{a}}^T Z_l - {\varvec{q}}_l^T) (Z_l^T {\varvec{a}} - {\varvec{q}}_l) \\&\quad = \frac{1}{2} {\varvec{a}}^T H_\mathrm{Dis } {\varvec{a}} + \theta \sum _{l=1}^L (\frac{1}{2} {\varvec{q}}_l^T {\varvec{q}}_l - {\varvec{a}}^T Z_l {\varvec{q}}_l - {\varvec{q}}_l^T \mathbf 1 ) \end{aligned}$$
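The last equality above follows from expanding the quadratic form. The intermediate step can be written out as below; the expression for \(H_\mathrm{Dis }\) is read off from this expansion and is an inference here, not a definition quoted from the main text.

$$\begin{aligned}&\frac{\theta }{2} \sum _{l=1}^L ({\varvec{a}}^T Z_l - {\varvec{q}}_l^T) (Z_l^T {\varvec{a}} - {\varvec{q}}_l) \\&\quad = \frac{\theta }{2} \sum _{l=1}^L ({\varvec{a}}^T Z_l Z_l^T {\varvec{a}} - 2 {\varvec{a}}^T Z_l {\varvec{q}}_l + {\varvec{q}}_l^T {\varvec{q}}_l) \\&\quad = \frac{1}{2} {\varvec{a}}^T \Big (\theta \sum _{l=1}^L Z_l Z_l^T\Big ) {\varvec{a}} + \theta \sum _{l=1}^L (\frac{1}{2} {\varvec{q}}_l^T {\varvec{q}}_l - {\varvec{a}}^T Z_l {\varvec{q}}_l) \end{aligned}$$

Matching this with the final line of the derivation is consistent with \(H_\mathrm{Dis } = \theta \sum _{l=1}^L Z_l Z_l^T\).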

Appendix C: Derivation of the Matrix Formulation for \(E({\varvec{a}})\)

$$\begin{aligned}&E({\varvec{a}}) \\&\quad = \frac{\sum _{l=1}^{L} \sum _{m=1}^M \Vert {\varvec{a}}_l^T ({\varvec{c}}_{lm0}, \ldots , {\varvec{c}}_{lmN}) - (b_0, \ldots , b_N)\Vert ^2}{2LM(N+1)} \\&\quad = \frac{\sum _{l=1}^{L} \sum _{m=1}^M ({\varvec{a}}_l^T C_{lm} - {\varvec{b}}^T) (C_{lm}^T {\varvec{a}}_l - {\varvec{b}})}{2LM(N+1)} \\&\quad = \frac{\sum _{l=1}^{L} [{\varvec{a}}_l^T (\sum _{m=1}^M C_{lm} C_{lm}^T) {\varvec{a}}_l - 2 {\varvec{a}}_l^T \sum _{m=1}^M C_{lm} {\varvec{b}} + {\varvec{b}}^T {\varvec{b}}]}{2LM(N+1)} \\&\quad = \frac{1}{2} \sum _{l=1}^{L} {\varvec{a}}_l^T H_l {\varvec{a}}_l - \sum _{l=1}^{L} {\varvec{a}}_l^T {\varvec{f}}_l + \frac{1}{2LM(N+1)} \sum _{l=1}^{L} {\varvec{b}}^T {\varvec{b}}\\&\quad = \frac{1}{2} {\varvec{a}}^T H {\varvec{a}} - {\varvec{a}}^T {\varvec{f}} + \frac{1}{2LM(N+1)} \sum _{l=1}^{L} {\varvec{b}}^T {\varvec{b}} \end{aligned}$$
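Matching the third and fourth lines of the derivation above term by term suggests the following reading of the symbols, with \(C_{lm} = ({\varvec{c}}_{lm0}, \ldots , {\varvec{c}}_{lmN})\) and \({\varvec{b}} = (b_0, \ldots , b_N)^T\) as in the first two lines. The main text presumably defines these quantities, so the block-diagonal form of \(H\) and the stacking of \({\varvec{a}}\) and \({\varvec{f}}\) should be treated as assumptions made here for illustration.

$$\begin{aligned} H_l&= \frac{1}{LM(N+1)} \sum _{m=1}^M C_{lm} C_{lm}^T, \quad {\varvec{f}}_l = \frac{1}{LM(N+1)} \sum _{m=1}^M C_{lm} {\varvec{b}}, \\ {\varvec{a}}&= ({\varvec{a}}_1^T, \ldots , {\varvec{a}}_L^T)^T, \quad H = \mathrm{diag}(H_1, \ldots , H_L), \quad {\varvec{f}} = ({\varvec{f}}_1^T, \ldots , {\varvec{f}}_L^T)^T \end{aligned}$$

Since \({\varvec{b}}\) does not depend on \(l\), the constant term simplifies to \({\varvec{b}}^T {\varvec{b}} / (2M(N+1))\); it does not affect the minimizer over \({\varvec{a}}\).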

Cite this article

Ma, A.J., Yuen, P.C. Reduced Analytic Dependency Modeling: Robust Fusion for Visual Recognition. Int J Comput Vis 109, 233–251 (2014). https://doi.org/10.1007/s11263-014-0723-7

  • DOI: https://doi.org/10.1007/s11263-014-0723-7
