Abstract
The influence matrix is used in ordinary least-squares applications for monitoring statistical multiple-regression analyses. Concepts related to the influence matrix provide diagnostics on the influence of individual data on the analysis, the analysis change that would occur by leaving one observation out, and the effective information content (degrees of freedom for signal) in any subset of the analysed data. In this paper, the corresponding concepts are derived in the context of linear statistical data assimilation in Numerical Weather Prediction. An approximate method to compute the diagonal elements of the influence matrix (the self-sensitivities) has been developed for a large-dimension variational data assimilation system (the 4D-Var system of the European Centre for Medium-Range Weather Forecasts, ECMWF). Results show that, in the ECMWF operational system, 18 % of the global influence is due to the assimilated observations, and the complementary 82 % is the influence of the prior (background) information, a short-range forecast containing information from earlier assimilated observations. About 20 % of the observational information is currently provided by surface-based observing systems, and 80 % by satellite systems. A toy model is developed to illustrate how the observation influence depends on the data assimilation covariance matrices. In particular, the role of highly correlated observation errors and highly correlated background errors, compared with uncorrelated ones, is presented. Low-influence data points usually occur in data-rich areas, while high-influence data points are in data-sparse areas or in dynamically active regions. Background error correlations also play an important role: high correlation diminishes the observation influence and amplifies the importance of the surrounding real and pseudo observations (prior information in observation space).
To increase the observation influence in the presence of highly correlated background errors, it is necessary to introduce observation error correlation; in addition, the observation and background error variances must be of similar size. Incorrect specifications of the background and observation error covariance matrices can be identified, interpreted and better understood by the use of influence matrix diagnostics for the variety of observation types and observed variables used in the data assimilation system.
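The dependence of the self-sensitivities on the covariance matrices can be illustrated with a minimal sketch. It is not the toy model of the chapter: it assumes a standard linear analysis with gain K = B H^T (H B H^T + R)^{-1}, two directly observed grid points (H = I), uncorrelated observation errors, and hypothetical names (`self_sensitivities`, `toy_influence`, `rho_b`, `sigma_b`, `sigma_o`) chosen only for illustration.

```python
import numpy as np

def self_sensitivities(B, R, H):
    """Diagonal of the influence matrix, i.e. diag(H K) = diag(K^T H^T).

    Assumes the standard linear gain K = B H^T (H B H^T + R)^{-1}.
    """
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return np.diag(H @ K)

def toy_influence(rho_b, sigma_b=1.0, sigma_o=1.0):
    """Two directly observed grid points (illustrative setup, an assumption).

    rho_b   : background error correlation between the two points
    sigma_b : background error standard deviation
    sigma_o : observation error standard deviation (errors uncorrelated)
    """
    H = np.eye(2)
    B = sigma_b**2 * np.array([[1.0, rho_b], [rho_b, 1.0]])
    R = sigma_o**2 * np.eye(2)
    return self_sensitivities(B, R, H)

# Self-sensitivity of each observation for increasing background correlation.
for rho in (0.0, 0.5, 0.9):
    print(rho, toy_influence(rho))
```

In this setup, with equal variances, the self-sensitivity of each observation decreases as the background error correlation grows, and with uncorrelated errors it reduces to the scalar ratio sigma_b^2 / (sigma_b^2 + sigma_o^2).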
References
Bauer P, Buizza R, Cardinali C, Thépaut J-N (2011) Impact of singular vector based satellite data thinning on NWP. Q J R Meteorol Soc 137:286–302
Bormann N, Saarinen S, Kelly G, Thépaut J-N (2003) The spatial structure of observation errors in atmospheric motion vectors from geostationary satellite data. Mon Weather Rev 131:706–718
Cardinali C (2009) Monitoring the observation impact on the short-range forecast. Q J R Meteorol Soc 135:239–250
Cardinali C, Prates F (2011) Performance measurement with advanced diagnostic tools of all-sky microwave imager radiances in 4D-Var. Q J R Meteorol Soc 137(Issue 661, Part B):2038–2046
Cardinali C, Pezzulli S, Andersson E (2004) Influence matrix diagnostics of a data assimilation system. Q J R Meteorol Soc 130:2767–2786
Craven P, Wahba G (1979) Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31:377–403
Geer AJ, Bauer P (2011) Observation errors in all-sky data assimilation. Q J R Meteorol Soc 137(Issue 661, Part B):2024–2037
Healy SB, Thépaut J-N (2006) Assimilation experiments with CHAMP GPS radio occultation measurements. Q J R Meteorol Soc 132:605–623
Hoaglin DC, Welsch RE (1978) The hat matrix in regression and ANOVA. Am Stat 32:17–22, and Corrigenda 32:146
Hoaglin DC, Mosteller F, Tukey JW (1982) Understanding robust and exploratory data analysis. Wiley Series in Probability and Statistics. Wiley, New York
Liu J, Kalnay E, Miyoshi T, Cardinali C (2009) Analysis sensitivity calculation within an ensemble Kalman filter. Q J R Meteorol Soc 135:1842–1851
Langland R, Baker NL (2004) Estimation of observation impact using the NRL atmospheric variational data assimilation adjoint system. Tellus 56A:189–201
Lorenc A (1986) Analysis methods for numerical weather prediction. Q J R Meteorol Soc 112:1177–1194
Rabier F, Järvinen H, Klinker E, Mahfouf JF, Simmons A (2000) The ECMWF operational implementation of four-dimensional variational assimilation. Part I: experimental results with simplified physics. Q J R Meteorol Soc 126:1143–1170
Rabier F, Fourrié N, Chafaï D, Prunet P (2002) Channel selection methods for infrared atmospheric sounding interferometer radiances. Q J R Meteorol Soc 128:1011–1027
Radnoti G, Bauer P, McNally A, Cardinali C, Healy S, de Rosnay P (2010) ECMWF study on the impact of future developments of the space-based observing system on Numerical Weather Prediction. ECMWF Tech. Memo 638
Shen X, Huang H, Cressie N (2002) Nonparametric hypothesis testing for a spatial signal. J Am Stat Ass 97:1122–1140
Talagrand O (1997) Assimilation of observations, an introduction. J Meteorol Soc Japan 75(1B):191–209
Thépaut J-N, Hoffman RN, Courtier P (1993) Interactions of dynamics and observations in a four-dimensional variational assimilation. Mon Weather Rev 121:3393–3414
Thépaut J-N, Courtier P, Belaud G, Lemaître G (1996) Dynamical structure functions in four-dimensional variational assimilation: a case study. Q J R Meteorol Soc 122:535–561
Tukey JW (1972) Data analysis, computation and mathematics. Q Appl Math 30:51–65
Velleman PF, Welsch RE (1981) Efficient computing of regression diagnostics. Am Stat 35:234–242
Wahba G (1990) Spline models for observational data. SIAM, CBMS-NSF. Regional conference series in applied mathematics, vol 59. Society for Industrial and Applied Mathematics, Philadelphia, p 165
Wahba G, Johnson DR, Gao F, Gong J (1995) Adaptive tuning of numerical weather prediction models: randomized GCV in three- and four-dimensional data assimilation. Mon Weather Rev 123:3358–3369
Ye J (1998) On measuring and correcting the effect of data mining and model selection. J Am Stat Ass 93:120–131
Zhu Y, Gelaro R (2008) Observation sensitivity calculations using the adjoint of the Gridpoint Statistical Interpolation (GSI) analysis system. Mon Weather Rev 136:335–351
Acknowledgements
The author thanks Olivier Talagrand and Sergio Pezzulli for the fruitful discussions on the subject. Many thanks are given to Mohamed Dahoui and Anne Fouilloux for their precious technical support.
Appendix
Influence Matrix Calculation in Weighted Regression Data Assimilation Scheme
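Under the frequentist approach, the regression equations for the observations
\[\mathbf{y} = \mathbf{H}\boldsymbol{\theta } + \boldsymbol{\epsilon }_{o}\]
and for the background
\[\mathbf{x}_{b} = \boldsymbol{\theta } + \boldsymbol{\epsilon }_{b}\]
are assumed to have uncorrelated error vectors \(\boldsymbol{\epsilon }_{o}\) and \(\boldsymbol{\epsilon }_{b}\), with zero mean vectors and variance matrices \(\mathbf{R}\) and \(\mathbf{B}\), respectively. The parameter \(\boldsymbol{\theta }\) is the unknown system state (\(\mathbf{x}\)) of dimension n. These regression equations are summarized as a weighted regression
\[\mathbf{z} = \mathbf{X}\boldsymbol{\theta } + \boldsymbol{\epsilon }\]
where \(\mathbf{z} ={ \left [{\mathbf{y}}^{T}\;\mathbf{x}_{b}^{T}\right ]}^{T}\) is (m + n) × 1; \(\mathbf{X} ={ \left [{\mathbf{H}}^{T}\;\mathbf{I}_{n}\right ]}^{T}\) is (m + n) × n; and \(\boldsymbol{\epsilon } ={ \left [\boldsymbol{\epsilon }_{o}^{T}\;\boldsymbol{\epsilon }_{b}^{T}\right ]}^{T}\) is (m + n) × 1, with zero mean and variance matrix
\[\boldsymbol{\Omega } = \left [\begin{array}{cc} \mathbf{R} & \mathbf{0}\\ \mathbf{0} & \mathbf{B} \end{array} \right ]\]
The generalized LS solution for \(\boldsymbol{\theta }\) is BLUE and is given by
\[\hat{\boldsymbol{\theta }} ={ \left ({\mathbf{X}}^{T}{\boldsymbol{\Omega }}^{-1}\mathbf{X}\right )}^{-1}{\mathbf{X}}^{T}{\boldsymbol{\Omega }}^{-1}\mathbf{z}\]
see Talagrand (1997). After some algebra this equation equals (4.11), that is, \(\hat{\boldsymbol{\theta }} = \mathbf{x}_{a}\). Thus
\[\hat{\mathbf{z}} = \mathbf{X}\hat{\boldsymbol{\theta }} = \left [\begin{array}{c} \mathbf{Hx}_{a}\\ \mathbf{x}_{a} \end{array} \right ]\]
and by (4.5) the influence matrix becomes
\[\mathbf{S}_{\mathrm{zz}} = \frac{\partial \hat{\mathbf{z}}} {\partial \mathbf{z}} = \left [\begin{array}{cc} \mathbf{S}_{yy} & \mathbf{S}_{yb}\\ \mathbf{S}_{by} & \mathbf{S}_{bb} \end{array} \right ]\]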
where \(\mathbf{S}_{yy} = \frac{\partial \mathbf{Hx}_{a}} {\partial \mathbf{y}}\); \(\mathbf{S}_{yb} = \frac{\partial \mathbf{x}_{a}} {\partial \mathbf{y}}\); \(\mathbf{S}_{by} = \frac{\partial \mathbf{Hx}_{a}} {\partial \mathbf{x}_{b}}\); \(\mathbf{S}_{bb} = \frac{\partial \mathbf{x}_{a}} {\partial \mathbf{x}_{b}}\). Note that \(\mathbf{S}_{yy} = \mathbf{S}\) as defined in (4.4).
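The equivalence between the generalized LS solution and the analysis equation can be checked numerically. The sketch below is illustrative only: the sizes, random matrices and variable names are assumptions; (4.11) is assumed to be the usual analysis equation x_a = x_b + K(y − H x_b) with K = B H^T (H B H^T + R)^{-1}; and the influence matrix is taken as the transpose of the projection matrix, which assumes the derivative convention ∂(Ay)/∂y = A^T.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4                                  # illustrative sizes (assumption)

H = rng.standard_normal((m, n))              # linear observation operator
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)                  # symmetric positive definite B
R = np.diag(rng.uniform(0.5, 2.0, m))        # diagonal R for simplicity
y = rng.standard_normal(m)
x_b = rng.standard_normal(n)

# Weighted regression z = X theta + eps with variance matrix Omega.
z = np.concatenate([y, x_b])
X = np.vstack([H, np.eye(n)])
Omega = np.block([[R, np.zeros((m, n))], [np.zeros((n, m)), B]])
Omega_inv = np.linalg.inv(Omega)

# Generalized LS (BLUE) estimate.
theta_hat = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ z)

# Analysis equation with the gain matrix K (assumed form of (4.11)).
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_a = x_b + K @ (y - H @ x_b)
print(np.allclose(theta_hat, x_a))

# Projection z -> z_hat and its transpose, taken here as S_zz.
P = X @ np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv)
S_zz = P.T
S_yy, S_yb = S_zz[:m, :m], S_zz[:m, m:]
S_by, S_bb = S_zz[m:, :m], S_zz[m:, m:]
print(np.allclose(S_yy, K.T @ H.T), np.allclose(S_yb, K.T))
```

Explicit inverses are used for readability; for anything beyond a toy problem a factorization-based solve would be preferable.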
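Generalized LS regression differs from ordinary LS in that the influence matrix is no longer symmetric. For idempotence, using (4.33) it is easy to show that \(\mathbf{S}_{\mathrm{zz}}\mathbf{S}_{\mathrm{zz}} = \mathbf{S}_{\mathrm{zz}}.\) Finally, with \(\mathbf{K}\) the gain matrix,
\[\mathbf{S}_{bb} = \frac{\partial \mathbf{x}_{a}} {\partial \mathbf{x}_{b}} = \mathbf{I}_{n} -{\mathbf{H}}^{T}{\mathbf{K}}^{T}\]
hence,
\[\mathrm{tr}(\mathbf{S}_{bb}) = n -\mathrm{tr}({\mathbf{H}}^{T}{\mathbf{K}}^{T}) = n -\mathrm{tr}(\mathbf{S}_{yy})\]
it follows that
\[\mathrm{tr}(\mathbf{S}_{\mathrm{zz}}) = \mathrm{tr}(\mathbf{S}_{yy}) + \mathrm{tr}(\mathbf{S}_{bb}) = n\]
The trace of the influence matrix is still equal to the parameter’s dimension.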
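These properties (idempotence, loss of symmetry, trace equal to n) can be verified on a small random example. This is a sketch under stated assumptions, not the chapter's computation: `influence_matrix` is a hypothetical helper, the matrices are random, and the influence matrix is again taken as the transpose of the generalized LS projection.

```python
import numpy as np

def influence_matrix(H, B, R):
    """Return S_zz for the weighted regression z = X theta + eps.

    Hypothetical helper: builds the generalized LS projection
    P = X (X^T Omega^-1 X)^-1 X^T Omega^-1 and returns its transpose,
    assuming the derivative convention d(Ay)/dy = A^T.
    """
    m, n = H.shape
    X = np.vstack([H, np.eye(n)])
    Omega = np.block([[R, np.zeros((m, n))], [np.zeros((n, m)), B]])
    Omega_inv = np.linalg.inv(Omega)
    P = X @ np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv)
    return P.T

rng = np.random.default_rng(1)
m, n = 5, 3                                  # illustrative sizes (assumption)
H = rng.standard_normal((m, n))
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)                  # symmetric positive definite B
R = np.diag(rng.uniform(0.5, 2.0, m))        # diagonal R for simplicity

S_zz = influence_matrix(H, B, R)
S_yy, S_bb = S_zz[:m, :m], S_zz[m:, m:]

idempotent = np.allclose(S_zz @ S_zz, S_zz)  # S_zz S_zz = S_zz
symmetric = np.allclose(S_zz, S_zz.T)        # not expected in general
trace_total = np.trace(S_zz)                 # expected to equal n
trace_split = np.trace(S_yy) + np.trace(S_bb)

print(idempotent, symmetric, trace_total, trace_split)
```

The split of the trace shows how the n degrees of freedom are shared between the observations, tr(S_yy), and the background, tr(S_bb).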
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cardinali, C. (2013). Observation Influence Diagnostic of a Data Assimilation System. In: Park, S., Xu, L. (eds) Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. II). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35088-7_4
DOI: https://doi.org/10.1007/978-3-642-35088-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35087-0
Online ISBN: 978-3-642-35088-7
eBook Packages: Earth and Environmental Science, Earth and Environmental Science (R0)