Towards Transparent Systems: Semantic Characterization of Failure Modes

Bansal, Aayush; Farhadi, Ali; Parikh, Devi

doi:10.1007/978-3-319-10599-4_24


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8694)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Today’s computer vision systems are not perfect. They fail frequently. Even worse, they fail abruptly and seemingly inexplicably. We argue that making our systems more transparent via an explicit, human-understandable characterization of their failure modes is desirable. We propose characterizing the failure modes of a vision system using semantic attributes. For example, a face recognition system may say “If the test image is blurry, or the face is not frontal, or the person to be recognized is a young white woman with heavy makeup, I am likely to fail.” Researchers can use this information at training time to design better features and models or to collect more focused training data. A downstream machine or human user can also use it at test time to know when to ignore the output of the system, which in turn makes the overall pipeline more reliable. To generate such a “specification sheet”, we discriminatively cluster incorrectly classified images in the semantic attribute space using L1-regularized weighted logistic regression. We show that our specification sheets can predict oncoming failures for face and animal species recognition better than several strong baselines. We also show that lay people can easily follow our specification sheets.
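As a rough illustration of the idea in the abstract, the sketch below fits a single L1-regularized weighted logistic regression that separates failures from successes in a toy binary attribute space, then reads the positive weights back as a one-line specification sheet. It is a minimal sketch under stated assumptions, not the authors' implementation: the attribute names, the toy failure rule, the class-balancing sample weights, the proximal gradient solver and the values of `lam`, `lr` and `iters` are all hypothetical choices made here, and the discriminative clustering into several failure modes that the abstract mentions is not reproduced.

```python
import math

# Hypothetical binary attributes; the names are illustrative, not from the paper.
ATTRIBUTES = ["blurry", "non_frontal", "smiling", "wearing_hat"]


def make_toy_data():
    """Enumerate every attribute combination. The toy system 'fails' (label 1)
    whenever the image is blurry or the face is not frontal (assumed rule)."""
    rows, labels = [], []
    for code in range(2 ** len(ATTRIBUTES)):
        x = [(code >> j) & 1 for j in range(len(ATTRIBUTES))]
        rows.append(x)
        labels.append(1 if (x[0] or x[1]) else 0)
    return rows, labels


def balanced_weights(labels):
    """Assumed weighting: each class contributes the same total weight."""
    n, n_pos = len(labels), sum(labels)
    return [n / (2.0 * n_pos) if y else n / (2.0 * (n - n_pos)) for y in labels]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_weighted_logreg(rows, labels, sample_w, lam=0.05, lr=0.5, iters=2000):
    """Proximal gradient descent on the weighted logistic loss plus
    lam * ||w||_1. The bias term is left unpenalized."""
    d = len(rows[0])
    w, b = [0.0] * d, 0.0
    total = sum(sample_w)
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y, s in zip(rows, labels, sample_w):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            err = s * (p - y)
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / total
        w = [soft_threshold(w[j] - lr * grad_w[j] / total, lr * lam)
             for j in range(d)]
    return w, b


def spec_sheet(w, names, tol=1e-6):
    """List the attributes whose weight pushes the prediction toward failure."""
    risky = [name for name, wj in zip(names, w) if wj > tol]
    return "I am likely to fail if the image is: " + " or ".join(risky)


rows, labels = make_toy_data()
weights, bias = fit_l1_weighted_logreg(rows, labels, balanced_weights(labels))
weight_by_attr = dict(zip(ATTRIBUTES, weights))
print(weight_by_attr)
print(spec_sheet(weights, ATTRIBUTES))
```

The L1 penalty is what keeps the resulting sheet short: attributes that do not help separate failures from successes are shrunk toward zero, so only a few attributes remain for a human reader to check. In practice the attribute vectors would come from attribute predictors or annotations on the images that the base system misclassified, rather than from an enumerated toy table.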



Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, USA
Aayush Bansal
University of Washington, Seattle, USA
Ali Farhadi
Virginia Tech, Blacksburg, USA
Devi Parikh


Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars



Cite this paper

Bansal, A., Farhadi, A., Parikh, D. (2014). Towards Transparent Systems: Semantic Characterization of Failure Modes. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_24


DOI: https://doi.org/10.1007/978-3-319-10599-4_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science, Computer Science (R0)
