Observation Influence Diagnostic of a Data Assimilation System

Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. II)

Abstract

The influence matrix is used in ordinary least-squares applications for monitoring statistical multiple-regression analyses. Concepts related to the influence matrix provide diagnostics on the influence of individual data on the analysis, the analysis change that would occur by leaving one observation out, and the effective information content (degrees of freedom for signal) in any subset of the analysed data. In this paper, the corresponding concepts are derived in the context of linear statistical data assimilation in Numerical Weather Prediction. An approximate method to compute the diagonal elements of the influence matrix (the self-sensitivities) has been developed for a large-dimension variational data assimilation system (the 4D-Var system of the European Centre for Medium-Range Weather Forecasts). Results show that, in the ECMWF operational system, 18 % of the global influence is due to the assimilated observations, and the complementary 82 % is the influence of the prior (background) information, a short-range forecast containing information from earlier assimilated observations. About 20 % of the observational information is currently provided by surface-based observing systems, and 80 % by satellite systems.

A toy model is developed to illustrate how the observation influence depends on the data assimilation covariance matrices. In particular, the roles of highly correlated observation errors and highly correlated background errors, relative to uncorrelated ones, are presented. Low-influence data points usually occur in data-rich areas, while high-influence data points are in data-sparse areas or in dynamically active regions. Background error correlations also play an important role: high correlation diminishes the observation influence and amplifies the importance of the surrounding real and pseudo observations (prior information in observation space). To increase the observation influence in the presence of highly correlated background errors, it is necessary to introduce observation error correlations, and the observation and background error variances must also be of similar size. Incorrect specifications of background and observation error covariance matrices can be identified, interpreted and better understood by the use of influence matrix diagnostics for the variety of observation types and observed variables used in the data assimilation system.
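The dependence of the observation influence on the background error correlation can be illustrated with a very small case. The following is a minimal sketch and not the chapter's toy model: it assumes a two-point state observed directly at both points (H = I), uncorrelated observation errors with variance `sigma_o**2`, and background errors with variance `sigma_b**2` and correlation `rho`. The function name `self_sensitivity` and all parameter names are hypothetical.

```python
def self_sensitivity(rho, sigma_b=1.0, sigma_o=1.0):
    """Diagonal element S_ii of the influence matrix for two direct
    observations (H = I) of a two-point state.

    Assumptions (illustrative only): B = sigma_b^2 * [[1, rho], [rho, 1]]
    and R = sigma_o^2 * I.
    """
    vb, vo = sigma_b ** 2, sigma_o ** 2
    # Entries of B + R: diagonal a, off-diagonal b.
    a, b = vb + vo, vb * rho
    det = a * a - b * b
    # With H = I the influence matrix is S = (B + R)^{-1} B.
    # Its (0, 0) element is (a * vb - b * vb * rho) / det.
    return (a * vb - b * vb * rho) / det


if __name__ == "__main__":
    # Equal error variances: the influence shrinks as rho grows.
    for rho in (0.0, 0.5, 0.9):
        print(f"rho = {rho:.1f}  S_ii = {self_sensitivity(rho):.3f}")
```

With equal variances and no correlation the observation and the background share the influence equally (S_ii = 0.5); raising `rho` lowers S_ii in this setting, in line with the statement that high background error correlation diminishes the observation influence.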

References

  • Bauer P, Buizza R, Cardinali C, Thépaut J-N (2011) Impact of singular vector based satellite data thinning on NWP. Q J R Meteorol Soc 137:286–302

  • Bormann N, Saarinen S, Kelly G, Thépaut J-N (2003) The spatial structure of observation errors in atmospheric motion vectors from geostationary satellite data. Mon Weather Rev 131:706–718

  • Cardinali C (2009) Monitoring the forecast impact on the short-range forecast. Q J R Meteorol Soc 135:239–250

  • Cardinali C, Prates F (2011) Performance measurement with advanced diagnostic tools of all-sky microwave imager radiances in 4D-Var. Q J R Meteorol Soc 137(Issue 661, Part B):2038–2046

  • Cardinali C, Pezzulli S, Andersson E (2004) Influence matrix diagnostics of a data assimilation system. Q J R Meteorol Soc 130:2767–2786

  • Craven P, Wahba G (1979) Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31:377–403

  • Geer AJ, Bauer P (2011) Observation errors in all-sky data assimilation. Q J R Meteorol Soc 137(Issue 661, Part B):2024–2037

  • Healy SB, Thépaut J-N (2006) Assimilation experiments with CHAMP GPS radio occultation measurements. Q J R Meteorol Soc 132:605–623

  • Hoaglin DC, Welsch RE (1978) The hat matrix in regression and ANOVA. Am Stat 32:17–22, and Corrigenda 32:146

  • Hoaglin DC, Mosteller F, Tukey JW (1982) Understanding robust and exploratory data analysis. Wiley Series in Probability and Statistics. Wiley, New York

  • Junjie L, Kalnay E, Miyoshi T, Cardinali C (2009) Analysis sensitivity calculation within an ensemble Kalman filter. Q J R Meteorol Soc 135:1842–1851

  • Langland R, Baker NL (2004) Estimation of observation impact using the NRL atmospheric variational data assimilation adjoint system. Tellus 56A:189–201

  • Lorenc A (1986) Analysis methods for numerical weather prediction. Q J R Meteorol Soc 112:1177–1194

  • Rabier F, Järvinen H, Klinker E, Mahfouf JF, Simmons A (2000) The ECMWF operational implementation of four-dimensional variational assimilation. Part I: experimental results with simplified physics. Q J R Meteorol Soc 126:1143–1170

  • Rabier F, Fourrié N, Chafaï D, Prunet P (2002) Channel selection methods for infrared atmospheric sounding interferometer radiances. Q J R Meteorol Soc 128:1011–1027

  • Radnoti G, Bauer P, McNally A, Cardinali C, Healy S, de Rosnay P (2010) ECMWF study on the impact of future developments of the space-based observing system on Numerical Weather Prediction. ECMWF Tech Memo 638

  • Shen X, Huang H, Cressie N (2002) Nonparametric hypothesis testing for a spatial signal. J Am Stat Ass 97:1122–1140

  • Talagrand O (1997) Assimilation of observations, an introduction. J Meteorol Soc Japan 75(1B):191–209

  • Thépaut J-N, Hoffman RN, Courtier P (1993) Interactions of dynamics and observations in a four-dimensional variational assimilation. Mon Weather Rev 121:3393–3414

  • Thépaut J-N, Courtier P, Belaud G, Lemaître G (1996) Dynamical structure functions in four-dimensional variational assimilation: a case study. Q J R Meteorol Soc 122:535–561

  • Tukey JW (1972) Data analysis, computational and mathematics. Q Appl Math 30:51–65

  • Velleman PF, Welsch RE (1981) Efficient computing of regression diagnostics. Am Stat 35:234–242

  • Wahba G (1990) Spline models for observational data. SIAM, CBMS-NSF. Regional conference series in applied mathematics, vol 59. Society for Industrial and Applied Mathematics, Philadelphia, p 165

  • Wahba G, Johnson DR, Gao F, Gong J (1995) Adaptive tuning of numerical weather prediction models: randomized GCV in three- and four-dimensional data assimilation. Mon Weather Rev 123:3358–3369

  • Ye J (1998) On measuring and correcting the effect of data mining and model selection. J Am Stat Ass 93:120–131

  • Zhu Y, Gelaro R (2008) Observation sensitivity calculations using the adjoint of the Gridpoint Statistical Interpolation (GSI) analysis system. Mon Weather Rev 136:335–351

Acknowledgements

The author thanks Olivier Talagrand and Sergio Pezzulli for the fruitful discussions on the subject. Many thanks are given to Mohamed Dahoui and Anne Fouilloux for their precious technical support.

Author information

Corresponding author

Correspondence to Carla Cardinali.

Appendix

Influence Matrix Calculation in Weighted Regression Data Assimilation Scheme

Under the frequentist approach, the regression equations for observation

$$\mathbf{y} = \mathbf{H}\boldsymbol{\theta } +\boldsymbol{ \epsilon }_{o}$$

and for background

$$\mathbf{x}_{b} =\boldsymbol{\theta } +\boldsymbol{\epsilon }_{b}$$

are assumed to have uncorrelated error vectors \(\boldsymbol{\epsilon }_{o}\) and \(\boldsymbol{\epsilon }_{b}\), with zero mean vectors and covariance matrices \(\mathbf{R}\) and \(\mathbf{B}\), respectively. The parameter \(\boldsymbol{\theta }\) is the unknown system state (x) of dimension n. These regression equations can be combined into a single weighted regression

$$\mathbf{z} = \mathbf{X}\boldsymbol{\theta } +\boldsymbol{ \epsilon }$$

where \(\mathbf{z} ={ \left [{\mathbf{y}}^{T}\ \mathbf{x}_{b}^{T}\right ]}^{T}\) is (m + n) × 1, \(\mathbf{X} ={ \left [{\mathbf{H}}^{T}\ \mathbf{I}_{n}\right ]}^{T}\) is (m + n) × n, and \(\boldsymbol{\epsilon } ={ \left [\boldsymbol{\epsilon }_{o}^{T}\ \boldsymbol{\epsilon }_{b}^{T}\right ]}^{T}\) is (m + n) × 1 with zero mean and covariance matrix

$$\boldsymbol{\Omega } = \left (\begin{array}{cc} \mathbf{R}& 0\\ 0 &\mathbf{B}\\ \end{array} \right )$$

The generalized LS solution for \(\boldsymbol{\theta }\) is BLUE and is given by

$$\boldsymbol{\hat{\theta }}= {({\mathbf{X}}^{T}{\boldsymbol{\Omega }}^{-1}\mathbf{X})}^{-1}{\mathbf{X}}^{T}{\boldsymbol{\Omega }}^{-1}\mathbf{z} \tag{4.34}$$

see Talagrand (1997). After some algebra this equation equals (4.11). Thus

$$\mathbf{\hat{z}} = \mathbf{X}\boldsymbol{\hat{\theta }} ={ \left [{(\mathbf{H}\mathbf{x}_{a})}^{T}\ \mathbf{x}_{a}^{T}\right ]}^{T} = \mathbf{X}{({\mathbf{X}}^{T}{\boldsymbol{\Omega }}^{-1}\mathbf{X})}^{-1}{\mathbf{X}}^{T}{\boldsymbol{\Omega }}^{-1}\mathbf{z}$$

and by (4.5) the influence matrix becomes

$$\mathbf{S}_{\mathit{zz}} = \frac{\partial \mathbf{\hat{z}}} {\partial \mathbf{z}} = \frac{\partial (\mathbf{X}\boldsymbol{\hat{\theta }})} {\partial \mathbf{z}} = \left (\begin{array}{cc} \mathbf{S}_{\mathit{yy}} & \mathbf{S}_{\mathit{yb}} \\ \mathbf{S}_{\mathit{by}} & \mathbf{S}_{\mathit{bb}}\\ \end{array} \right ) = \left (\begin{array}{cc} {\mathbf{R}}^{-1}{\mathbf{HAH}}^{T}&{\mathbf{R}}^{-1}\mathbf{HA} \\ {\mathbf{B}}^{-1}{\mathbf{AH}}^{T} & {\mathbf{B}}^{-1}\mathbf{A}\\ \end{array} \right )$$

where \(\mathbf{S}_{\mathit{yy}} = \frac{\partial \mathbf{Hx}_{a}} {\partial \mathbf{y}}\); \(\mathbf{S}_{\mathit{yb}} = \frac{\partial \mathbf{x}_{a}} {\partial \mathbf{y}}\); \(\mathbf{S}_{\mathit{by}} = \frac{\partial \mathbf{Hx}_{a}} {\partial \mathbf{x}_{b}}\); \(\mathbf{S}_{\mathit{bb}} = \frac{\partial \mathbf{x}_{a}} {\partial \mathbf{x}_{b}}\). Note that \(\mathbf{S}_{\mathit{yy}} = \mathbf{S}\) as defined in (4.4).
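The block expression can be verified by expanding the weighted-regression products. The sketch below rests on two assumptions about notation that this appendix does not state explicitly: that \(\mathbf{A}\) denotes \({({\mathbf{B}}^{-1} +{ \mathbf{H}}^{T}{\mathbf{R}}^{-1}\mathbf{H})}^{-1}\), and that the influence matrix is written as the transpose of the Jacobian, which is the convention the block entries imply.

$${\mathbf{X}}^{T}{\boldsymbol{\Omega }}^{-1}\mathbf{X} ={ \mathbf{H}}^{T}{\mathbf{R}}^{-1}\mathbf{H} +{ \mathbf{B}}^{-1} ={ \mathbf{A}}^{-1},\qquad {\mathbf{X}}^{T}{\boldsymbol{\Omega }}^{-1}\mathbf{z} ={ \mathbf{H}}^{T}{\mathbf{R}}^{-1}\mathbf{y} +{ \mathbf{B}}^{-1}\mathbf{x}_{b}$$

$$\boldsymbol{\hat{\theta }} = \mathbf{x}_{a} = \mathbf{A}\left ({\mathbf{H}}^{T}{\mathbf{R}}^{-1}\mathbf{y} +{ \mathbf{B}}^{-1}\mathbf{x}_{b}\right )$$

$$\mathbf{S}_{\mathit{zz}} ={ \boldsymbol{\Omega }}^{-1}\mathbf{X}\mathbf{A}{\mathbf{X}}^{T} = \left (\begin{array}{c} {\mathbf{R}}^{-1}\mathbf{H} \\ {\mathbf{B}}^{-1}\\ \end{array} \right )\mathbf{A}\left (\begin{array}{cc} {\mathbf{H}}^{T}&\mathbf{I}_{n}\\ \end{array} \right )$$

Multiplying out the last product reproduces the four blocks. For instance, the Jacobian of \(\mathbf{x}_{a}\) with respect to \(\mathbf{x}_{b}\) is \(\mathbf{A}{\mathbf{B}}^{-1}\), whose transpose is \({\mathbf{B}}^{-1}\mathbf{A} = \mathbf{S}_{\mathit{bb}}\).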

Generalized LS regression differs from ordinary LS in that the influence matrix is no longer symmetric. It remains idempotent, however: using (4.33) it is easy to show that \(\mathbf{S}_{\mathit{zz}}\mathbf{S}_{\mathit{zz}} = \mathbf{S}_{\mathit{zz}}.\) Finally,

$$\mathbf{S}_{\mathit{bb}} ={ \mathbf{B}}^{-1}\mathbf{A} = \mathbf{I}_{ n} -{\mathbf{H}}^{T}{\mathbf{R}}^{-1}\mathbf{HA}$$

hence,

$$\mathit{tr}(\mathbf{S}_{\mathit{bb}}) = n -\mathit{tr}({\mathbf{H}}^{T}{\mathbf{R}}^{-1}\mathbf{HA}) = n -\mathit{tr}(\mathbf{S}_{\mathit{ yy}})$$

it follows that

$$\mathit{tr}(\mathbf{S}_{\mathit{zz}}) = \mathit{tr}(\mathbf{S}_{\mathit{yy}}) + \mathit{tr}(\mathbf{S}_{\mathit{bb}}) = n$$

The trace of the influence matrix is still equal to the parameter’s dimension.
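These properties can be checked numerically on a small case. The following is a minimal sketch in plain Python with m = n = 2; the values of `H`, `R` and `B` are hypothetical and chosen only for illustration, `A` is assumed to be the inverse of B⁻¹ + HᵀR⁻¹H, and the helper names (`matmul`, `inv2`, `influence_blocks`, and so on) are not from the chapter.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def inv2(M):
    """Inverse of a 2 x 2 matrix (enough for this m = n = 2 sketch)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def influence_blocks(H, R, B):
    """Return S_yy, S_yb, S_by, S_bb as written in the appendix."""
    Rinv, Binv, Ht = inv2(R), inv2(B), transpose(H)
    # Assumed definition: A = (B^-1 + H^T R^-1 H)^-1
    A = inv2(add(Binv, matmul(matmul(Ht, Rinv), H)))
    RinvHA = matmul(matmul(Rinv, H), A)
    BinvA = matmul(Binv, A)
    return matmul(RinvHA, Ht), RinvHA, matmul(BinvA, Ht), BinvA


def assemble(S_yy, S_yb, S_by, S_bb):
    """Stack the four blocks into the (m + n) x (m + n) matrix S_zz."""
    top = [ry + rb for ry, rb in zip(S_yy, S_yb)]
    bottom = [ry + rb for ry, rb in zip(S_by, S_bb)]
    return top + bottom


# Hypothetical toy values (not from the chapter).
H = [[1.0, 0.0], [0.5, 0.5]]
R = [[1.0, 0.3], [0.3, 2.0]]
B = [[1.0, 0.6], [0.6, 1.5]]

S_yy, S_yb, S_by, S_bb = influence_blocks(H, R, B)
S_zz = assemble(S_yy, S_yb, S_by, S_bb)
S_zz_squared = matmul(S_zz, S_zz)

if __name__ == "__main__":
    print("tr(S_yy) =", trace(S_yy))   # observation share of the trace
    print("tr(S_bb) =", trace(S_bb))   # background share of the trace
    print("tr(S_zz) =", trace(S_zz))   # should equal n = 2 up to rounding
```

In this example tr(S_yy) and tr(S_bb) sum to n = 2, `S_zz_squared` agrees with `S_zz` to rounding error, and `S_zz` differs from its transpose, which illustrates the loss of symmetry noted for generalized LS.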

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

Cite this chapter

Cardinali, C. (2013). Observation Influence Diagnostic of a Data Assimilation System. In: Park, S., Xu, L. (eds) Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. II). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35088-7_4
