Reduced Analytic Dependency Modeling: Robust Fusion for Visual Recognition

Ma, Andy J.; Yuen, Pong C.

doi:10.1007/s11263-014-0723-7


Published: 19 April 2014

Volume 109, pages 233–251 (2014)

International Journal of Computer Vision

Andy J. Ma¹ & Pong C. Yuen¹,²


Abstract

This paper addresses the robustness of information fusion for visual recognition. By analyzing the limitations of existing fusion methods, we identify two key factors that affect the performance and robustness of a fusion model under different data distributions: (1) data dependency and (2) the fusion assumption on the posterior distribution. Considering these two factors, we develop a new framework that models dependency based on probabilistic properties of posteriors, without any assumption on the data distribution. Making use of the range characteristics of posteriors, the fusion model is formulated as an analytic function multiplied by a constant with respect to the class label. With this analytic fusion model, we give a condition equivalent to the independence assumption and derive the dependency model from the marginal distribution property. Since the number of terms in the dependency model increases exponentially, we propose the Reduced Analytic Dependency Model (RADM) based on the convergence property of analytic functions. Finally, the optimal coefficients of the RADM are learned by incorporating label information from training data to minimize the empirical classification error under a regularized least squares criterion, which ensures discriminative power. Experimental results from robust non-parametric statistical tests show that the proposed RADM method statistically significantly outperforms eight state-of-the-art score-level fusion methods on eight image/video datasets for digit, flower, face, human action, object, and consumer video recognition.
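As a rough illustration of the ideas summarized above, the sketch below expands each class's posterior vector $(s_{l1}, \ldots, s_{lM})$ from $M$ classifiers into all monomials up to a chosen total degree (a truncated power series) and fits the coefficients by ridge regression onto class indicators. This is a simplified stand-in under stated assumptions, not the authors' RADM: the degree cutoff, the per-class ridge objective, and the names `monomial_features`, `fit_truncated_fusion`, and `predict_truncated_fusion` are hypothetical, and the paper's discriminative criterion (see Appendix B) differs.

```python
import itertools
import numpy as np

def monomial_features(S, degree):
    """All monomials prod_m s_m^{k_m} with total degree <= `degree`.

    S has shape (n, M): one class's posteriors from M classifiers.
    Column 0 is the constant (degree-0) term.
    """
    n, M = S.shape
    cols = [np.ones(n)]
    for d in range(1, degree + 1):
        # Each multiset of classifier indices is one monomial, e.g. (0, 0, 1) -> s_0^2 s_1.
        for combo in itertools.combinations_with_replacement(range(M), d):
            cols.append(np.prod(S[:, list(combo)], axis=1))
    return np.stack(cols, axis=1)

def fit_truncated_fusion(P, y, degree=2, lam=1e-2):
    """Hypothetical per-class ridge regression of 1[y == l] on monomial features.

    P has shape (n, L, M): P[j, l, m] is classifier m's posterior for class l.
    """
    L = P.shape[1]
    coefs = []
    for l in range(L):
        Z = monomial_features(P[:, l, :], degree)
        t = (y == l).astype(float)
        A = Z.T @ Z + lam * np.eye(Z.shape[1])   # regularized normal equations
        coefs.append(np.linalg.solve(A, Z.T @ t))
    return coefs

def predict_truncated_fusion(P, coefs, degree=2):
    # Fused score per class, then pick the class with the largest score.
    scores = np.stack([monomial_features(P[:, l, :], degree) @ a
                       for l, a in enumerate(coefs)], axis=1)
    return scores.argmax(axis=1)
```

With $M$ classifiers and total degree $d$, the expansion has $\binom{M+d}{d}$ terms, which grows quickly as either $M$ or $d$ increases; this is why some reduction of the full dependency model is needed before fitting.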


Notes

http://archive.ics.uci.edu/ml/datasets/Multiple+Features.
http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html.
http://lear.inrialpes.fr/pubs/2010/GVS10/.
http://www.ee.columbia.edu/ln/dvmm/CCV/.
http://www.comp.hkbu.edu.hk/~jhma/.
http://www.ele.uri.edu/faculty/he/.
Note that "significance" in this paper refers to statistical significance, not to the magnitude of improvement. In statistics, a result is called statistically significant if the observed difference is unlikely to have arisen by chance alone and is therefore likely to reflect a genuine experimental effect (Sheskin 2011).
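The distinction can be made concrete with the Friedman test (Friedman 1937), the rank-based test recommended by Demšar (2006) for comparing several classifiers over multiple datasets. The accuracy table below is invented for illustration and is not taken from this paper's experiments, and this section does not state exactly which procedure the authors ran. The sketch computes the statistic from average ranks (assuming no ties) and cross-checks it against `scipy.stats.friedmanchisquare`.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical accuracies: rows are datasets, columns are fusion methods.
acc = np.array([
    [0.91, 0.88, 0.85],
    [0.78, 0.80, 0.74],
    [0.95, 0.93, 0.90],
    [0.66, 0.61, 0.63],
    [0.83, 0.79, 0.77],
])

def friedman_statistic(scores):
    """Friedman chi-square for an (N datasets x k methods) table without ties."""
    n, k = scores.shape
    # Rank methods within each dataset: rank 1 = highest score.
    ranks = (-scores).argsort(axis=1).argsort(axis=1) + 1
    avg_ranks = ranks.mean(axis=0)
    stat = 12 * n / (k * (k + 1)) * (np.sum(avg_ranks ** 2) - k * (k + 1) ** 2 / 4)
    return stat, avg_ranks

stat, avg_ranks = friedman_statistic(acc)       # avg_ranks = [1.2, 2.0, 2.8], stat ≈ 6.4
ref_stat, p_value = friedmanchisquare(*acc.T)   # one argument per method; p ≈ 0.041
```

A p-value below 0.05 says the rank differences are unlikely under the null hypothesis that all methods perform equally; it says nothing about how large the accuracy gaps are, which is the point of the note. A post-hoc procedure such as Dunn (1961) would then be used to locate which pairs of methods differ.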

References

Ahonen, T., Hadid, A., & Pietikäinen, M. (2004). Face recognition with local binary patterns. European Conference on Computer Vision, Lecture Notes in Computer Science, 3021, 469–481.
Awais, M., Yan, F., Mikolajczyk, K., & Kittler, J. (2011). Augmented kernel matrix vs classifier fusion for object recognition. In British Machine Vision Conference (pp. 60.1–60.11).
Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.
Breukelen, M., Duin, R., Tax, D., & Hartog, J. (1998). Handwritten digit recognition by combined classifiers. Kybernetika, 34(4), 381–386.
Canu, S., Grandvalet, Y., Guigue, V., & Rakotomamonjy, A. (2005). SVM and kernel methods matlab toolbox. Rouen: Perception Systèmes et Information, INSA de Rouen.
Chen, H., & Meer, P. (2005). Robust fusion of uncertain information. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 35(3), 578–586.
Comaniciu, D. (2003). Robust information fusion using variable-bandwidth density estimation. International Conference on Information Fusion, 2, 1303–1309.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory. New York: Wiley.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. IEEE Conference on Computer Vision and Pattern Recognition, 1, 886–893.
Dass, S. C., Nandakumar, K., & Jain, A. K. (2005). A principled approach to score level fusion in multimodal biometric systems. In International conference on audio- and video-based biometric person authentication (pp. 1049–1058).
Demiriz, A., Bennett, K. P., & Shawe-Taylor, J. (2002). Linear programming boosting via column generation. Machine Learning, 46(1–3), 225–254.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52–64.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC 2007) results.
Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Feller, W. (1968). An introduction to probability theory and its applications, volume I. New York: Wiley.
Fernando, B., Fromont, E., Muselet, D., & Sebban, M. (2012). Discriminative feature fusion for image classification. In IEEE conference on computer vision and pattern recognition (pp. 3434–3441).
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675–701.
Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In IEEE international conference on computer vision (pp. 221–228).
Gorelick, L., Blank, M., Shechtman, E., Irani, M., & Basri, R. (2007). Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2247–2253.
Guillaumin, M., Verbeek, J., & Schmid, C. (2010). Multimodal semi-supervised learning for image classification. In IEEE conference on computer vision and pattern recognition (pp. 902–909).
He, H., & Cao, Y. (2012). SSC: A classifier combination method based on signal strength. IEEE Transactions on Neural Networks and Learning Systems, 23(7), 1100–1117.
He, M., Horng, S.-J., Fan, P., Run, R.-S., Chen, R.-J., Lai, J.-L., et al. (2010). Performance evaluation of score level fusion in multimodal biometric systems. Pattern Recognition, 43(5), 1789–1800.
He, X., Yan, S., Hu, Y., Niyogi, P., & Zhang, H. J. (2005). Face recognition using laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 328–340.
Huber, P. J., & Ronchetti, E. M. (2009). Robust Statistics (2nd ed.). New York: Wiley.
Jain, A., Nandakumar, K., & Ross, A. (2005). Score normalization in multimodal biometric systems. Pattern Recognition, 38(12), 2270–2285.
Jiang, Y.-G., Ye, G., Chang, S.-F., Ellis, D., & Loui, A. C. (2011). Consumer video understanding: A benchmark database and an evaluation of human and machine performance. In ACM International Conference on Multimedia Retrieval (pp. 29:1–29:8).
Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239.
Krantz, S. G., & Parks, H. R. (2002). A primer of real analytic functions. Basel: Birkhäuser.
Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. New York: Wiley.
Lan, X., Yuen, P. C., & Ma, A. J. (2014). Multi-cue visual tracking using robust feature-level fusion based on joint sparse representation. In IEEE conference on computer vision and pattern recognition.
Liu, D., Lai, K.-T., Ye, G., Chen, M.-S., & Chang, S.-F. (2013). Sample specific late fusion for visual category recognition. In IEEE conference on computer vision and pattern recognition (pp. 803–810).
Liu, J., McCloskey, S., & Liu, Y. (2012). Local expert forest of score fusion for video event classification. European Conference on Computer Vision, Lecture Notes in Computer Science, 7576, 397–410.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Luenberger, D. G., & Ye, Y. (2008). Linear and nonlinear programming (3rd ed.). Berlin: Springer.
Ma, A. J., & Yuen, P. C. (2012). Reduced analytical dependency modeling for classifier fusion. European Conference on Computer Vision, Lecture Notes in Computer Science, 7574, 792–805.
Ma, A. J., Yuen, P. C., & Lai, J.-H. (2013a). Linear dependency modeling for classifier fusion and feature combination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(5), 1135–1148.
Ma, A. J., Yuen, P. C., Zou, W. W., & Lai, J.-H. (2013b). Supervised spatio-temporal neighborhood topology learning for action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 23(8), 1447–1460.
Mikolajczyk, K., & Schmid, C. (2004). Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1), 63–86.
Mittal, A., Zisserman, A., & Torr, P. (2011). Hand detection using multiple proposals. In British Machine Vision Conference (pp. 75.1–75.11).
Nandakumar, K., Chen, Y., Dass, S. C., & Jain, A. K. (2008). Likelihood ratio based biometric score fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 342–347.
Natarajan, P., Wu, S., Vitaladevuni, S., Zhuang, X., Tsakalidis, S., Park, U., et al. (2012). Multimodal feature fusion for robust event detection in web videos. In IEEE conference on computer vision and pattern recognition (pp. 1298–1305).
Nilsback, M.-E., & Zisserman, A. (2006). A visual vocabulary for flower classification. IEEE Conference on Computer Vision and Pattern Recognition, 2, 1447–1454.
Nilsback, M.-E., & Zisserman, A. (2008). Automated flower classification over a large number of classes. In IEEE Indian conference on computer vision, graphics and image processing (pp. 722–729).
Oh, S., McCloskey, S., Kim, I., Vahdat, A., Cannons, K., Hajimirsadeghi, H., et al. (2014). Multimedia event detection with multimodal feature fusion and temporal concept localization. Machine Vision and Applications, 25(1), 49–69.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Phillips, P. J., Moon, H., Rizvi, S. A., & Rauss, P. J. (2000). The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10), 1090–1104.
Prabhakar, S., & Jain, A. K. (2002). Decision-level fusion in fingerprint verification. Pattern Recognition, 35(4), 861–874.
Ross, A., Nandakumar, K., & Jain, A. K. (2006). Handbook of multibiometrics. Berlin: Springer.
Rudin, W. (1976). Principles of mathematical analysis. New York: McGraw-Hill.
Scheirer, W., Rocha, A., Micheals, R., & Boult, T. (2010). Robust fusion: Extreme value theory for recognition score normalization. European Conference on Computer Vision, Lecture Notes in Computer Science, 6313, 481–495.
Schuldt, C., Laptev, I., & Caputo, B. (2004). Recognizing human actions: A local SVM approach. IEEE International Conference on Pattern Recognition, 3, 32–36.
Sheskin, D. J. (2011). Handbook of parametric and nonparametric statistical procedures (5th ed.). London: Chapman and Hall/CRC.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.
Sim, T., Baker, S., & Bsat, M. (2003). The CMU pose, illumination, and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12), 1615–1618.
Tang, K., Yao, B., Fei-Fei, L., & Koller, D. (2013). Combining the right features for complex event recognition. In IEEE international conference on computer vision.
Terrades, O. R., Valveny, E., & Tabbone, S. (2009). Optimal classifier fusion in a non-Bayesian probabilistic framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9), 1630–1644.
Toh, K.-A., Tran, Q.-L., & Srinivasan, D. (2004a). Benchmarking a reduced multivariate polynomial pattern classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 740–755.
Toh, K.-A., Yau, W.-Y., & Jiang, X. (2004b). A reduced multivariate polynomial model for multimodal biometrics and classifiers fusion. IEEE Transactions on Circuits and Systems for Video Technology, 14(2), 224–233.
Ueda, N. (2000). Optimal linear combination of neural networks for improving classification performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(2), 207–215.
Wang, H., Nie, F., & Huang, H. (2013). Heterogeneous visual features fusion via sparse multimodal machine. In IEEE conference on computer vision and pattern recognition (pp. 3097–3102).
Wang, J., Kwon, S., & Shim, B. (2012). Generalized orthogonal matching pursuit. IEEE Transactions on Signal Processing, 60(12), 6202–6216.
Ye, G., Liu, D., Jhuo, I.-H., & Chang, S.-F. (2012). Robust late fusion with rank minimization. In IEEE conference on computer vision and pattern recognition (pp. 3021–3028).
Yuan, X.-T., Liu, X., & Yan, S. (2012). Visual classification with multitask joint sparse representation. IEEE Transactions on Image Processing, 21(10), 4349–4360.
Zhang, J., Marszałek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 213–238.


Acknowledgments

This project was partially supported by the Science Faculty Research Grant of Hong Kong Baptist University, Hong Kong Research Grants Council General Research Fund 212313, and National Science Foundation of China Research Grant 61172136. The authors would like to thank the editor and reviewers for their helpful comments, which improved the quality of this paper.

Author information

Authors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong
Andy J. Ma & Pong C. Yuen
BNU-HKBU United International College, Zhuhai, China
Pong C. Yuen


Corresponding author

Correspondence to Pong C. Yuen.

Additional information

Communicated by K. Ikeuchi.

Appendices

Appendix A: Proof of Proposition 1

We first show that the conditional independence condition implies that the solution to equation system (16) is trivial, i.e., ${\varvec{a}}_{lm0} = \mathbf 0 , {\varvec{a}}_{lm2} = \mathbf 0 , {\varvec{a}}_{lm3} = \mathbf 0 , \ldots $ for $m = 1, \ldots , M$. If the feature representations are mutually independent given the class label $\omega _l$, the analytic function $h_l({\varvec{s}}_l; {\varvec{a}}_l)$ reduces to Eq. (3). Collecting the terms of the analytic function in (3) by powers of $s_{lm}$, we get

$$\begin{aligned} h_l({\varvec{s}}_l; {\varvec{a}}_l) = g_{lm1}(\tilde{{\varvec{s}}}_{lm}; {\varvec{a}}_{lm1}) s_{lm} \end{aligned} \tag{36}$$

where $g_{lm1}(\tilde{{\varvec{s}}}_{lm}; {\varvec{a}}_{lm1}) = p_l^{1-M} \prod _{m' \ne m} s_{lm'}$. Eq. (36) shows that $g_{lmn}(\tilde{{\varvec{s}}}_{lm}; {\varvec{a}}_{lmn}) \equiv 0$, or equivalently ${\varvec{a}}_{lmn} = \mathbf 0 $, for $n \ne 1$; that is, the solution to equation system (16) is trivial.

Conversely, suppose the solution to equation system (16) is trivial; we need to show that the analytic function $h_l({\varvec{s}}_l; {\varvec{a}}_l)$ equals Eq. (3). If ${\varvec{a}}_{lmn} = \mathbf 0 $ for $n \ne 1$, then $h_l({\varvec{s}}_l; {\varvec{a}}_l)$ can be written in the form of Eq. (36) for every $m = 1, \ldots , M$. Hence every term of the power series $h_l$ contains each of the variables $s_{l1}, \ldots , s_{lM}$ with degree exactly one, so $h_l$ has only one non-zero term, $\prod _{m=1}^{M} s_{lm}$. Finally, by the normalization equation (15), this term is normalized by the prior, and the analytic function becomes Eq. (3). This completes the proof.
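A quick numerical check of the forward direction of Proposition 1, assuming (as Eq. (36) implies) that Eq. (3) has the form $h_l = p_l^{1-M} \prod_{m=1}^{M} s_{lm}$: on a toy discrete model whose features are conditionally independent given the class, normalizing $h_l$ over the classes recovers the exact joint posterior, whereas duplicating one feature (a maximally dependent pair) breaks this product rule. The distributions are random and hypothetical, and the function names are illustrative only.

```python
import itertools
import numpy as np

def product_rule_error_independent(seed=0, L=3, M=3, V=4):
    """Max |fused - exact| posterior gap when features are conditionally independent."""
    rng = np.random.default_rng(seed)
    prior = rng.dirichlet(np.ones(L))                            # p_l = P(omega_l)
    # cond[m][l, v] = P(x_m = v | omega_l), drawn separately for each feature
    cond = [rng.dirichlet(np.ones(V), size=L) for _ in range(M)]
    max_err = 0.0
    for x in itertools.product(range(V), repeat=M):
        lik = np.prod([cond[m][:, x[m]] for m in range(M)], axis=0)
        exact = prior * lik / np.sum(prior * lik)                # P(omega_l | x_1, ..., x_M)
        # Single-feature posteriors s_lm = P(omega_l | x_m), shape (L, M)
        s = np.stack([prior * cond[m][:, x[m]] / np.sum(prior * cond[m][:, x[m]])
                      for m in range(M)], axis=1)
        h = prior ** (1 - M) * s.prod(axis=1)                    # p_l^{1-M} prod_m s_lm
        max_err = max(max_err, np.abs(h / h.sum() - exact).max())
    return max_err

def product_rule_error_duplicated(seed=0, L=3, V=4):
    """Same check when the second feature is an exact copy of the first (M = 2)."""
    rng = np.random.default_rng(seed)
    prior = rng.dirichlet(np.ones(L))
    cond = rng.dirichlet(np.ones(V), size=L)
    max_err = 0.0
    for v in range(V):
        exact = prior * cond[:, v] / np.sum(prior * cond[:, v])  # the copy adds no information
        s = exact                                                # s_l1 = s_l2
        h = prior ** (1 - 2) * s * s
        max_err = max(max_err, np.abs(h / h.sum() - exact).max())
    return max_err
```

The first error is at floating-point level; the second is not, which is the kind of gap the dependency terms ${\varvec{a}}_{lmn}$, $n \ne 1$, are meant to absorb.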

Appendix B: Derivation for $E_{\mathrm{Dis}}({\varvec{a}}, {\varvec{q}})$

$$\begin{aligned}&E_\mathrm{Dis}({\varvec{a}}, {\varvec{q}}) \\&\quad = - \theta \sum _{l=1}^L \sum _{l' \ne l} \sum _{{y_j} = \omega _l } q_{jl'} \\&\qquad +\,\, \frac{\theta }{2} \sum _{l=1}^L \sum _{l' \ne l} \sum _{{y_j} = \omega _l } (({\varvec{a}}_l^T {\varvec{z}}_{jl} - {\varvec{a}}_{l'}^T {\varvec{z}}_{jl'}) - q_{jl'})^2 \\&\quad = - \theta \sum _{l=1}^L \sum _{l' \ne l} {\varvec{q}}_{ll'}^T \mathbf 1 + \frac{\theta }{2} \sum _{l=1}^L \sum _{l' \ne l} \Vert (Z_{ll}^T {\varvec{a}}_l - Z_{ll'}^T {\varvec{a}}_{l'}) - {\varvec{q}}_{ll'}\Vert ^2 \\&\quad = - \theta \sum _{l=1}^L {\varvec{q}}_l^T \mathbf 1 + \frac{\theta }{2} \sum _{l=1}^L ({\varvec{a}}^T Z_l - {\varvec{q}}_l^T) (Z_l^T {\varvec{a}} - {\varvec{q}}_l) \\&\quad = \frac{1}{2} {\varvec{a}}^T H_\mathrm{Dis} {\varvec{a}} + \theta \sum _{l=1}^L \left(\frac{1}{2} {\varvec{q}}_l^T {\varvec{q}}_l - {\varvec{a}}^T Z_l {\varvec{q}}_l - {\varvec{q}}_l^T \mathbf 1 \right) \end{aligned}$$
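The last step of this derivation can be checked numerically. Matching the quadratic terms in ${\varvec{a}}$ suggests $H_\mathrm{Dis} = \theta \sum_{l=1}^L Z_l Z_l^T$; this is inferred from the algebra here, since $H_\mathrm{Dis}$ is defined in the main text. The sketch compares the third and fourth lines on random matrices of arbitrary, hypothetical sizes.

```python
import numpy as np

def e_dis_third_line(a, Zs, qs, theta):
    # -theta * sum_l q_l^T 1 + theta/2 * sum_l ||Z_l^T a - q_l||^2
    return sum(-theta * q.sum() + 0.5 * theta * np.sum((Z.T @ a - q) ** 2)
               for Z, q in zip(Zs, qs))

def e_dis_fourth_line(a, Zs, qs, theta):
    # Assumed H_Dis = theta * sum_l Z_l Z_l^T (inferred, not quoted from the paper)
    H_dis = theta * sum(Z @ Z.T for Z in Zs)
    rest = sum(0.5 * q @ q - a @ Z @ q - q.sum() for Z, q in zip(Zs, qs))
    return 0.5 * a @ H_dis @ a + theta * rest

rng = np.random.default_rng(1)
dim = 6                                                      # length of the stacked vector a
Zs = [rng.normal(size=(dim, n_l)) for n_l in (4, 5, 3)]      # one Z_l per class
qs = [rng.normal(size=Z.shape[1]) for Z in Zs]
a = rng.normal(size=dim)
lhs = e_dis_third_line(a, Zs, qs, theta=0.7)
rhs = e_dis_fourth_line(a, Zs, qs, theta=0.7)
```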

Appendix C: Derivation of the Matrix Formulation for $E({\varvec{a}})$

$$\begin{aligned}&E({\varvec{a}}) \\&\quad = \frac{\sum _{l=1}^{L} \sum _{m=1}^M \Vert {\varvec{a}}_l^T ({\varvec{c}}_{lm0}, \ldots , {\varvec{c}}_{lmN}) - (b_0, \ldots , b_N)\Vert ^2}{2LM(N+1)} \\&\quad = \frac{\sum _{l=1}^{L} \sum _{m=1}^M ({\varvec{a}}_l^T C_{lm} - {\varvec{b}}^T) (C_{lm}^T {\varvec{a}}_l - {\varvec{b}})}{2LM(N+1)} \\&\quad = \frac{\sum _{l=1}^{L} [{\varvec{a}}_l^T (\sum _{m=1}^M C_{lm} C_{lm}^T) {\varvec{a}}_l - 2 {\varvec{a}}_l^T \sum _{m=1}^M C_{lm} {\varvec{b}} + M {\varvec{b}}^T {\varvec{b}}]}{2LM(N+1)} \\&\quad = \frac{1}{2} \sum _{l=1}^{L} {\varvec{a}}_l^T H_l {\varvec{a}}_l - \sum _{l=1}^{L} {\varvec{a}}_l^T {\varvec{f}}_l + \frac{M}{2LM(N+1)} \sum _{l=1}^{L} {\varvec{b}}^T {\varvec{b}}\\&\quad = \frac{1}{2} {\varvec{a}}^T H {\varvec{a}} - {\varvec{a}}^T {\varvec{f}} + \frac{{\varvec{b}}^T {\varvec{b}}}{2(N+1)} \end{aligned}$$
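A similar numerical check for this derivation, assuming ${\varvec{a}}$ stacks the ${\varvec{a}}_l$, $H$ is block diagonal with blocks $H_l = \sum_m C_{lm} C_{lm}^T / (LM(N+1))$, and ${\varvec{f}}_l = \sum_m C_{lm} {\varvec{b}} / (LM(N+1))$. These definitions are inferred from the third and fourth lines rather than quoted from the main text. Because the ${\varvec{b}}^T {\varvec{b}}$ term appears once for every pair $(l, m)$, the constant evaluates to ${\varvec{b}}^T {\varvec{b}} / (2(N+1))$; it does not affect the minimizer over ${\varvec{a}}$. Sizes below are hypothetical.

```python
import numpy as np

def e_summed(As, Cs, b):
    """First line: sum_{l,m} ||a_l^T C_lm - b^T||^2 / (2LM(N+1))."""
    L, M, n_cols = len(Cs), len(Cs[0]), b.shape[0]        # n_cols = N + 1
    total = sum(np.sum((As[l] @ Cs[l][m] - b) ** 2)
                for l in range(L) for m in range(M))
    return total / (2 * L * M * n_cols)

def e_matrix(As, Cs, b):
    """Last line: a^T H a / 2 - a^T f + b^T b / (2(N+1)), with H block diagonal."""
    L, M, n_cols = len(Cs), len(Cs[0]), b.shape[0]
    denom = L * M * n_cols
    H_blocks = [sum(C @ C.T for C in Cs[l]) / denom for l in range(L)]
    f = np.concatenate([sum(C @ b for C in Cs[l]) / denom for l in range(L)])
    dim = sum(Hl.shape[0] for Hl in H_blocks)
    H = np.zeros((dim, dim))
    start = 0
    for Hl in H_blocks:                                   # place each H_l on the diagonal
        d = Hl.shape[0]
        H[start:start + d, start:start + d] = Hl
        start += d
    a = np.concatenate(As)
    return 0.5 * a @ H @ a - a @ f + b @ b / (2 * n_cols)

rng = np.random.default_rng(2)
N, M, dims = 3, 4, (5, 6)                                 # L = len(dims) classes
Cs = [[rng.normal(size=(d, N + 1)) for _ in range(M)] for d in dims]
As = [rng.normal(size=d) for d in dims]
b = rng.normal(size=N + 1)
```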


About this article

Cite this article

Ma, A.J., Yuen, P.C. Reduced Analytic Dependency Modeling: Robust Fusion for Visual Recognition. Int J Comput Vis 109, 233–251 (2014). https://doi.org/10.1007/s11263-014-0723-7


Received: 11 June 2013
Accepted: 02 April 2014
Published: 19 April 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s11263-014-0723-7
