Introduction

Precipitation is one of the main components of the hydrological cycle and is the primary forcing input for rainfall-runoff models (Kar et al. 2015; Zhu et al. 2016; Wu et al. 2018). The accuracy of hydrological and water resources studies depends significantly on the spatiotemporal resolution and quality of long-term precipitation time series (Sorooshian et al. 2000; Zeng et al. 2018; Hosseini-Moghari et al. 2018). In many regions of the world, particularly in the least developed nations, ground-based measurement networks (rain gauges and weather radar systems) are sparse and inadequate for capturing the spatial and temporal variability of precipitation (Miao et al. 2015; Dinku et al. 2018; Musie et al. 2019). The limitations of rain gauges and weather radar systems highlight the importance of satellite-based and reanalysis global precipitation data sources (Sorooshian et al. 2011; Miao et al. 2015). Moreover, the complex and pronounced topographical variation in the highland part of Ethiopia worsens the problem of scarce gauge data. Accurate precipitation estimates are therefore essential for a wide range of applications, including hydrologic prediction, flood analysis, and water resources management planning. There is a growing demand to improve the spatial and temporal resolution and the accuracy of global-scale precipitation estimates for use in the investigation of climate, hydrology, and various environmental processes (Sorooshian et al. 2000). However, inhomogeneity in the global precipitation estimates has been observed due to coarse spatial and temporal resolutions and variations in satellite inputs (Dinku et al. 2018). Therefore, it is crucial to analyze the accuracy and applicability of global climate data so that the information can be used effectively for decision support in different regions of the world.

The Awash River basin is one of the most utilized and densely populated river basins of Ethiopia, with limited surface water resources (Chekol 2006; Yilma et al. 2016; Shawul et al. 2019), and has significant economic, social, and ecological importance for the country. Therefore, proper management of the basin's water resources based on reliable hydrologic studies is vital to satisfy the current and future water demands in the region. In addition to gauged climate data, satellite-based and reanalysis global climate data can be valuable sources of input for hydrological simulations and watershed management studies. The growing availability of high-resolution rainfall products can help hydrologists to obtain more accurate precipitation data, particularly in developing countries and remote locations where weather radars are non-existent (Worqlul et al. 2014; Wang et al. 2017). Based on the mean areal precipitation values from four different reanalysis datasets over Ethiopia, Tesfaye et al. (2017) demonstrated heavy rainfall patterns in the higher elevations, which are also densely populated, whereas the lower elevation regions, which are mainly located in the eastern and south-eastern parts of the country, receive less precipitation.

Global reanalysis weather data such as the National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR) have been used for various hydrological applications (Fuka et al. 2014; Dile and Srinivasan 2014; Hu et al. 2017; Tolera et al. 2018). Saha et al. (2010) concluded that CFSR is considerably more accurate than the previous global reanalysis made at NCEP in the 1990s because it includes analyses of both the ocean and sea ice, and it has higher resolution in space and time. The CFSR data provide a new opportunity for modeling ungauged river basins globally (Fuka et al. 2014). The Tropical Rainfall Measuring Mission (TRMM), launched in 1997, is a joint project of the Japan Aerospace Exploration Agency (JAXA) and the U.S. National Aeronautics and Space Administration (NASA) (Kummerow et al. 1998). The evaluation of current rainfall products indicates that the availability of high-quality TRMM rainfall data significantly improves the ability to meet the need for continuous, high-resolution precipitation data (Sorooshian et al. 2000). In mountainous regions, large variations in the annual precipitation amount occur between areas located a few kilometers apart (Anders et al. 2006; Andermann et al. 2011). Satellite-based precipitation data sets such as TRMM have the potential to overcome data scarcity even in mountainous areas (Anders et al. 2006; Andermann et al. 2011; Deus et al. 2013; Worqlul et al. 2014; Bai and Liu 2018), and TRMM precipitation has been applied widely in hydrologic modeling. The National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC) global unified gauge-based analysis of daily precipitation (hereafter CPC-NOAA) is a data set that is part of the product suite from the CPC unified precipitation project. The primary goal of the project is to create a suite of unified precipitation products with consistent quantity and improved quality by combining all the information sources available at the CPC and by taking advantage of state-of-the-art quality control systems and the optimal interpolation (OI) objective analysis technique (Silva et al. 2007; Chen et al. 2008a; Higgins and Kousky 2013). The CPC-NOAA daily gauge observations have long been utilized for various applications in climate research and operations (Chen et al. 2008b). The data set provides continuous gridded precipitation and temperature data globally.

The hydrological simulation was performed using the Soil and Water Assessment Tool (SWAT) model. The application of SWAT in predicting streamflow and evaluating the impact of land use and climate change on the hydrology of different watersheds in Ethiopia has been documented by various studies (Chekol et al. 2007; Setegn et al. 2009, 2010; Wondie et al. 2011; Shawul and Chakma 2018; Shawul et al. 2016, 2019). Meteorological data are the main driving factor of hydrological modeling in SWAT, in addition to spatial data such as soil, topographic, and land use data. However, obtaining quality input data, mainly climatic data, has been a persistent challenge. Thus, this study aims to statistically evaluate the suitability of the CFSR-NCEP, CPC-NOAA, and TRMM precipitation data for hydrologic simulation using the SWAT model in the Upper Awash basin. The specific objectives are (1) to statistically evaluate the accuracy of rainfall data derived from CFSR-NCEP, CPC-NOAA, and TRMM against observed rain gauge data in the Upper Basin (UB) and Lower Basin (LB) areas of the Upper Awash basin, and (2) to evaluate the performance of CFSR-NCEP, CPC-NOAA, and TRMM precipitation in terms of daily and monthly hydrologic simulation efficiency at four main gauged watersheds, namely the Akaki, Mojo, Melka Kuntre, and Hombole watersheds in the Upper Awash basin, Ethiopia. The comparison of gridded values from the global climate data sources with rain gauge precipitation data was performed based on mean areal values.

Materials and methods

Study area

The Upper Awash river basin lies between latitudes 8° 16′ and 9° 18′ and longitudes 37° 57′ and 39° 17′, and its total drainage area is estimated to be 11,720 km² at the outlet on the downstream side of Koka Dam. The elevation ranges from 1551 to 3571 m, a large altitudinal difference within the basin, as shown in Fig. 1. Rainfall occurs in distinctly different seasons during the year. There are three main seasons in Ethiopia, namely ‘Bega’ (October–January), the drier season; ‘Belg’ (February–May), the short rainy season; and ‘Kiremt’ (June–September), the long rainy season, which comprises the wettest months of the year. The inter-annual variability of rainfall in the Belg and Kiremt seasons occasionally leads to large-scale droughts and floods in the Awash River basin (NMSA 2001; Korecha and Sorteberg 2013). This study was undertaken in four main watersheds of the Upper Awash basin, namely the Akaki, Mojo, Melka Kuntre, and Hombole watersheds. To evaluate the performance of the global precipitation data against observed values, the Upper Awash basin was divided into two regions, the Upper Basin (UB) and Lower Basin (LB) areas, as shown in Fig. 1.

Fig. 1 Location map of climate data grid points, rain gauge stations, digital elevation model (DEM), Upper Basin (UB) and Lower Basin (LB) areas, and river reaches in the main sub-basins of Upper Awash basin, Ethiopia

Data requirements and data sources

The input data required for the SWAT hydrologic model include climate data, which were obtained from gauge observations as well as from the three global precipitation data sets briefly described in the following subsections. The spatial distribution of all sources of weather inputs, the river reaches, the DEM, and the sub-basins in the Upper Awash basin is shown in Fig. 1. Moreover, the soil physicochemical properties, land use/land cover (LULC), and DEM of the study basin are important spatial inputs to the SWAT model.

Meteorological data

Gauged climate data

The daily meteorological data, which include precipitation, maximum and minimum temperature, solar radiation, wind speed, and relative humidity, were obtained mainly from the National Meteorological Agency of Ethiopia (NMA) (https://www.ethiomet.gov.et/) for stations located within and near the Upper Awash basin. Gaps in the rainfall data series were filled using the inverse distance weighted (IDW) interpolation method. Rainfall data were obtained from 30 stations, and the majority of the stations revealed similar periodic patterns.

CPC-NOAA data

The NCEP Climate Prediction Center (CPC) product is a gauge-based analysis of daily precipitation constructed over the global land areas (Chen et al. 2008a). The CPC global gridded daily precipitation and minimum and maximum temperature data were collected from the National Oceanic and Atmospheric Administration (NOAA). The data are available on a daily time scale with a spatial resolution of 0.5° × 0.5° from 1979/01/01 to present (https://www.esrl.noaa.gov/psd/data/gridded/data.cpc.globalprecip.html).

CFSR-NCEP data

The National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR) data are available at 0.5° × 0.5° spatial resolution, on a daily basis with climate variables including precipitation, minimum and maximum temperature, wind speed, solar radiation, and relative humidity data. The CFSR-NCEP data were accessed from SWAT global weather database website https://globalweather.tamu.edu/ for daily climate variables for the grids that represent the Upper Awash basin, as shown in Fig. 1.

TRMM data

The Tropical Rainfall Measuring Mission product (TRMM 3B42) relies primarily on passive microwave (PMW) precipitation estimates from the special sensor microwave imager and sounder (SSMIS), the TRMM microwave imager (TMI), the advanced microwave sounding unit (AMSU), the microwave humidity sounder (MHS), and the advanced microwave scanning radiometer for the earth observing system (AMSR‐E) (Blacutt et al. 2015). TRMM precipitation estimates have a 3-hourly temporal resolution and a 0.25° × 0.25° spatial resolution. The daily accumulated TRMM data for the period 1998–2010 were obtained from https://trmm.gsfc.nasa.gov/.

Hydrologic and spatial input data

Hydrologic data

The daily discharge data for the main watersheds of the Upper Awash basin were obtained from the Ministry of Water and Energy of Ethiopia (MoWE), https://mowie.gov.et. The observed daily discharge data at the Hombole, Melka Kuntre, Mojo, and Akaki gauging stations revealed a mono-modal monthly pattern with peak discharge in August. The hydrologic performance of the global precipitation data was evaluated at the four watersheds in the Upper Awash basin, with approximate areas of 7948.8 km² (Hombole), 4975.8 km² (Melka Kuntre), 1767.5 km² (Mojo), and 1467.9 km² (Akaki). The daily and monthly discharge periods with relatively consistent records were used for evaluating model performance.

Soil properties

Basic physicochemical properties of the major soil types were mainly obtained from the following sources: the soil database and digital soil map from the MoWE; the Harmonized World Soil Database; and the Major Soils of the World CD-ROM (FAO 2002; Nachtergaele et al. 2012). Eutric vertisols are the dominant soil type, with nearly 52% area coverage, followed by vertic cambisols (11%), humic nitosols (10.4%), and chromic luvisols (9.7%). The remaining area of the Upper Awash basin is covered by eutric cambisols, eutric fluvisols, lithic leptosols, luvic phaeozems, haplic luvisols, mollic andosols, vitric andosols, and water bodies.

Digital elevation model (DEM)

For this study, a DEM with 30 m spatial resolution was used, obtained from the Shuttle Radar Topography Mission (SRTM) through the Earth Explorer website https://earthexplorer.usgs.gov/. The DEM was used to define the topography and to analyze the drainage patterns of the land surface terrain. In addition, sub-basin parameters such as slope, slope length, and the stream network were derived from the DEM.

Land use/land cover (LULC)

The LULC data for the hydrologic simulations were prepared from Landsat 8 (OLI) images for the year 2014, which were obtained from the United States Geological Survey (USGS) Earth Explorer, https://earthexplorer.usgs.gov/. Supervised image classification was carried out using ERDAS Imagine software, and post-classification analysis was carried out using the ArcGIS toolboxes. The LULC was classified into six main classes, namely Cropland, Forest, Shrubland, Pasture, Urban, and Water. The detailed LULC classification and change detection for the last half-century in the Upper Awash basin is presented in Shawul and Chakma (2019).

Comparison of the conventional gauged data and global climate data

The following statistical indices were used to assess the accuracy of precipitation data derived from CFSR-NCEP, CPC-NOAA, and TRMM relative to the gauged precipitation data (Ebert et al. 2007; Zhu et al. 2016; Gao et al. 2018): the correlation coefficient (CC), root mean squared error (RMSE), mean absolute error (MAE), percent bias (PBIAS), and regression coefficient (R²). The CC reflects the degree of linear correlation and is bounded between −1.0 and 1.0; the data sets are best correlated when the value is equal to 1. MAE reflects the average difference between the precipitation estimates and the respective gauge observations. It indicates the relative degree of the systematic error of the precipitation estimates; a value of zero is a perfect result, indicating that the observed and estimated values are the same. The PBIAS describes the relative deviation of the estimated precipitation from the observed value; the perfect value for PBIAS is zero. Moreover, RMSE reflects the average error between precipitation estimates and gauge observations; it gives greater weight to larger errors, and its perfect value is zero. The values of the indices are calculated using Eqs. 1–4; the value of R² was calculated based on Eq. 6:

$${\text{CC}} = \frac{{\left( {\sum {\left[ {E_{i} - E_{{{\text{av}}}} } \right]} \left[ {O_{i} - O_{{{\text{av}}}} } \right]} \right)}}{{\sqrt {\sum {\left[ {E_{i} - E_{{{\text{av}}}} } \right]}^{2} } \sqrt {\sum {\left[ {O_{i} - O_{{{\text{av}}}} } \right]}^{2} } }},$$
(1)
$${\text{MAE}} = \frac{1}{N}\sum {\left( {E_{i} - O_{i} } \right)} ,$$
(2)
$${\text{PBIAS}} = 100\left( {\frac{{\sum {E_{i} - \sum {O_{i} } } }}{{\sum {O_{i} } }}} \right),$$
(3)
$${\text{RMSE}} = \sqrt {\frac{1}{N}\sum {\left( {E_{i} - O_{i} } \right)^{2} } } ,$$
(4)

where Oi is the observed precipitation and Oav is the average observed precipitation from rain gauges; Ei is the estimated precipitation and Eav is the average estimated precipitation from CFSR-NCEP, CPC-NOAA, and TRMM 3B42; N is the length of the estimated and observed data series.
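As a minimal sketch, Eqs. 1–4 can be written in a few lines of Python. The function name and the sample values below are illustrative assumptions, not data from this study. Eq. 2 is implemented as written, without absolute values, so the result keeps its sign.

```python
import math


def precipitation_indices(estimated, observed):
    """Return CC, MAE, PBIAS and RMSE as written in Eqs. 1-4.

    `estimated` and `observed` are equal-length sequences of precipitation
    values (E_i and O_i). The observed series must not be constant or sum
    to zero, otherwise CC or PBIAS is undefined.
    """
    if len(estimated) != len(observed) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    e_av = sum(estimated) / n
    o_av = sum(observed) / n
    pairs = list(zip(estimated, observed))

    # Eq. 1: correlation coefficient
    cov = sum((e - e_av) * (o - o_av) for e, o in pairs)
    ss_e = sum((e - e_av) ** 2 for e in estimated)
    ss_o = sum((o - o_av) ** 2 for o in observed)
    cc = cov / (math.sqrt(ss_e) * math.sqrt(ss_o))

    # Eq. 2: mean difference, signed as in the equation
    mae = sum(e - o for e, o in pairs) / n

    # Eq. 3: percent bias
    pbias = 100.0 * (sum(estimated) - sum(observed)) / sum(observed)

    # Eq. 4: root mean squared error
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in pairs) / n)

    return {"CC": cc, "MAE": mae, "PBIAS": pbias, "RMSE": rmse}


# Illustrative (made-up) monthly totals in mm
result = precipitation_indices([12.0, 18.0, 33.0], [10.0, 20.0, 30.0])
print(result["MAE"], result["PBIAS"])  # 1.0 5.0
```

With this sign convention, positive MAE and PBIAS values indicate overestimation of the gauge series and negative values indicate underestimation.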

The performance of the global precipitation data relative to the observed precipitation series was evaluated based on the mean areal values calculated for two different regions in the Upper Awash basin. The first is the Upper Basin (UB) area, which is located in the northern and western part of the basin and is characterized by larger elevation differences. The second is the Lower Basin (LB) area, which is located at the lower end of the basin in the southern and eastern part, with relatively flat topography and smaller elevation differences.
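The text does not state how the mean areal values were computed. The sketch below assumes a simple arithmetic mean over the stations (or grid points) of a region on each day, skipping missing values. The function name and the station labels "A" and "B" are hypothetical.

```python
def mean_areal_precipitation(station_series):
    """Average daily precipitation across stations, day by day.

    `station_series` maps a station name to a list of daily values (mm),
    with None marking a missing record. Assumes an arithmetic mean; other
    weighting schemes (e.g. Thiessen polygons) would give different values.
    """
    lengths = {len(series) for series in station_series.values()}
    if len(lengths) != 1:
        raise ValueError("all station series must cover the same days")
    n_days = lengths.pop()

    areal = []
    for day in range(n_days):
        values = [s[day] for s in station_series.values() if s[day] is not None]
        # A day with no valid record at any station stays missing
        areal.append(sum(values) / len(values) if values else None)
    return areal


# Two hypothetical stations in one region, three days of data
ub_mean = mean_areal_precipitation({"A": [10.0, 0.0, None], "B": [20.0, 4.0, 6.0]})
print(ub_mean)  # [15.0, 2.0, 6.0]
```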

Hydrologic simulation with the SWAT model

The SWAT model is physically based, computationally efficient, and capable of continuous simulation over long periods (Arnold et al. 1998; Gassman et al. 2007). The SWAT simulates the hydrological cycle based on the water balance equation, as shown in Eq. 5 (Neitsch et al. 2011):

$${\text{SW}}_{t} = {\text{SW}}_{o} + \sum\limits_{i = 1}^{t} {\left( {R_{{{\text{day}}}} - Q_{{{\text{surf}}}} - E_{a} - W_{{{\text{seep}}}} - Q_{{{\text{gw}}}} } \right)} ,$$
(5)

where SWt is the final soil water content (mm), SWo is the initial soil water content on day i (mm), t is the time (days), Rday is the amount of precipitation on day i (mm), Qsurf is the amount of surface runoff on day i (mm), Ea is the amount of evapotranspiration on day i (mm), Wseep is the amount of water entering the vadose zone from the soil profile on day i (mm), and Qgw is the amount of return flow on day i (mm).
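To illustrate the bookkeeping in Eq. 5 only, the following sketch accumulates the daily fluxes into a soil water series. It is not SWAT code: the function name and the flux values are made up, and SWAT computes each flux internally rather than reading it as input.

```python
def soil_water_balance(sw_initial, daily_fluxes):
    """Accumulate Eq. 5 day by day and return the soil water series (mm).

    `daily_fluxes` is a sequence of tuples
    (R_day, Q_surf, E_a, W_seep, Q_gw), all in mm per day.
    """
    sw = sw_initial
    series = []
    for r_day, q_surf, e_a, w_seep, q_gw in daily_fluxes:
        # Precipitation adds water; the other four terms remove it
        sw += r_day - q_surf - e_a - w_seep - q_gw
        series.append(sw)
    return series


# Illustrative values: one wet day followed by one dry day
fluxes = [(10.0, 2.0, 3.0, 1.0, 0.5), (0.0, 0.0, 3.0, 0.5, 0.5)]
print(soil_water_balance(100.0, fluxes))  # [103.5, 99.5]
```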

The hydrologic response unit (HRU) analysis tool loads the land use, soil, and slope maps into the project. The HRU analysis in SWAT divides HRUs by slope class in addition to land use and soil. The LULC, soil, and slope data in the SWAT database were used for the delineation of HRUs, which were defined using threshold values of 10% for land use, 20% for soil, and 20% for slope. Based on the global sensitivity analysis method in the SWAT Calibration and Uncertainty Program (SWAT-CUP) (Abbaspour 2007), the 12 most sensitive hydrologic parameters, shown in Table 1, were selected for the model calibration/evaluation processes. The parameter values were varied iteratively within the allowable ranges until a satisfactory agreement between measured and simulated streamflow was obtained. A detailed description of the hydrologic parameters and their theoretical background is given in the SWAT model user’s manual (Neitsch et al. 2011).

Table 1 Sensitive hydrologic parameters, iteration methods, and maximum and minimum ranges of values which were applied in the Hombole, Melka Kuntre, Mojo and Akaki watersheds for gauged climate data

Evaluation of the performance of the SWAT model for hydrologic simulation

SWAT-CUP was used to evaluate the efficiency of the gauged data and global precipitation estimates for hydrologic prediction based on observed discharge data at the four main sub-basins of the Upper Awash basin. SWAT-CUP provides a decision-making framework that incorporates the semi-automated Sequential Uncertainty Fitting version 2 (SUFI2) approach, combining automated calibration with sensitivity and uncertainty analysis (Abbaspour 2007; Arnold et al. 2012). The regression coefficient (R²) is the square of the Pearson product-moment correlation coefficient and describes the proportion of the total variance in the observed data that can be explained by the model. The closer the value of R² is to 1, the higher the agreement between the simulated and measured flows; it was calculated based on Eq. 6. The Nash and Sutcliffe simulation efficiency (NS) indicates the degree of fit between the observed and simulated data and is given by Eq. 7 (Nash and Sutcliffe 1970). The value of NS ranges from 1.0 (best) to negative infinity, and it indicates how well the plot of observed versus simulated values fits the 1:1 line. The percent bias (PBIAS) measures the average tendency of the simulated data to be larger or smaller than the observed values for a given quantity over a specified period, as shown in Eq. 3. Higher values of PBIAS are acceptable if the accuracy with which the observed data were gathered is relatively poor. The RMSE-observations standard deviation ratio (RSR) standardizes the RMSE using the standard deviation of the observations. RSR incorporates the benefits of error index statistics and includes a scaling/normalization factor so that the resulting statistic and reported values can apply to various constituents. The lower the RSR, the lower the RMSE, and the better the model simulation performance (Moriasi et al. 2007). RSR is calculated as the ratio of the RMSE to the standard deviation of the measured data, as shown in Eq. 8:

$$R^{2} = \frac{{\left( {\sum {\left[ {X_{i} - X_{{{\text{av}}}} } \right]} \left[ {Y_{i} - Y_{{{\text{av}}}} } \right]} \right)^{2} }}{{\sum {\left[ {X_{i} - X_{{{\text{av}}}} } \right]}^{2} \sum {\left[ {Y_{i} - Y_{{{\text{av}}}} } \right]}^{2} }},$$
(6)
$${\text{NS}} = 1 - \frac{{\sum {\left( {X_{i} - Y_{i} } \right)^{2} } }}{{\sum {\left( {X_{i} - X_{{{\text{av}}}} } \right)^{2} } }},$$
(7)
$${\text{RSR}} = \frac{{{\text{RMSE}}}}{{{\text{STDevX}}_{i} }} = \frac{{\left[ {\sqrt {\sum {\left( {X_{i} - Y_{i} } \right)^{2} } } } \right]}}{{\left[ {\sqrt {\sum {\left( {X_{i} - X_{{{\text{av}}}} } \right)^{2} } } } \right]}},$$
(8)

where Xi is a measured value, Xav is an average measured value, Yi is a simulated value, Yav is average simulated value, RMSE is the root-mean-square error, and STDev is the standard deviation.
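Eqs. 6–8 can be sketched in Python as follows. The function name and the flow values are illustrative assumptions; in the study these statistics are reported by SWAT-CUP rather than computed by hand.

```python
import math


def streamflow_metrics(measured, simulated):
    """Return R2, NS and RSR following Eqs. 6-8.

    `measured` (X_i) and `simulated` (Y_i) are equal-length sequences.
    Both series must vary, otherwise the denominators are zero.
    """
    if len(measured) != len(simulated) or len(measured) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(measured)
    x_av = sum(measured) / n
    y_av = sum(simulated) / n

    s_xy = sum((x - x_av) * (y - y_av) for x, y in zip(measured, simulated))
    s_xx = sum((x - x_av) ** 2 for x in measured)
    s_yy = sum((y - y_av) ** 2 for y in simulated)
    sq_err = sum((x - y) ** 2 for x, y in zip(measured, simulated))

    r2 = s_xy ** 2 / (s_xx * s_yy)               # Eq. 6
    ns = 1.0 - sq_err / s_xx                     # Eq. 7
    rsr = math.sqrt(sq_err) / math.sqrt(s_xx)    # Eq. 8
    return {"R2": r2, "NS": ns, "RSR": rsr}


# Illustrative (made-up) flows in m3/s
metrics = streamflow_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.5, 4.5])
print(round(metrics["NS"], 3), round(metrics["RSR"], 3))  # 0.8 0.447
```

Because Eqs. 7 and 8 share the same numerator and denominator sums, RSR² = 1 − NS, so the two statistics rank simulations identically; R² can still differ because it ignores systematic bias.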

Results and discussions

Comparison of precipitation estimates

The gridded precipitation estimates derived from CFSR-NCEP, CPC-NOAA, and TRMM were compared with the observed annual, monthly, and daily data. The areal mean precipitation of all climate stations located in the Upper Basin (UB) and Lower Basin (LB) areas of the Upper Awash basin was considered for comparison. Due to the high topographic variability in the study area, the amount of precipitation at different stations differed considerably between the higher-altitude and lower-altitude areas. River basins in highland areas experience highly diversified rainfall-runoff phenomena due to significant spatial and temporal variability in the rainfall characteristics within the basin (Kar et al. 2014, 2015). Thus, the comparison of areal precipitation data was performed for two regions that each exhibited a relatively similar monthly pattern, namely the UB in the upstream area and the LB in the downstream area. The evaluation covered the years 1980 to 2013 for the CFSR-NCEP and CPC-NOAA estimates and 1998 to 2009 for the TRMM estimates. The annual and monthly precipitation values shown in Fig. 2 indicate that CFSR-NCEP overestimated the observed annual values in the UB area, whereas CPC-NOAA and TRMM slightly underestimated most of the observed areal precipitation values. Similarly, the monthly plot shows that CFSR-NCEP strongly overestimated the gauged values, mainly in the wet months (June, July, and August). Unlike the CFSR-NCEP data, the CPC-NOAA and TRMM data showed annual and monthly series consistent with the observed data in the UB part of the Upper Awash basin. However, the CPC-NOAA data abruptly and strongly underestimated the observed annual values between 2000 and 2005 in both the UB and LB parts of the basin, as shown in Fig. 2. In contrast to the UB area, CFSR-NCEP underestimated the observed values in the LB area on both monthly and annual bases. The relative difference between the mean annual precipitation estimates and the observed precipitation shows that, in the UB area, CFSR-NCEP overestimated the observed values by 34.8%, whereas CPC-NOAA and TRMM underestimated them by −16.4% and −24.4%, respectively. Dile and Srinivasan (2014) also found that the average annual rainfall obtained from CFSR-NCEP over the sub-basins of the northern highlands in Ethiopia was higher than the annual rainfall from the conventional weather databases. In the LB area, all the global precipitation data underestimated the mean annual observed data, by −17.8%, −38.8%, and −27.6% for CPC-NOAA, CFSR-NCEP, and TRMM, respectively. Moreover, the precipitation estimates varied on a monthly scale, with the TRMM and CPC-NOAA monthly data underestimating the observed values. The performance of the mean areal global precipitation data relative to the gauged data differed distinctly between the two parts of the study basin. In the low-lying LB area, where the topography is flat and the elevation difference is relatively small, all three global data sets underestimated the observed values; in particular, the CFSR-NCEP data strongly underestimated the mean annual values. Conversely, in the UB area, which has larger elevation differences at the upstream end of the basin, CPC-NOAA and TRMM performed better, whereas CFSR-NCEP strongly overestimated the mean annual and monthly observed precipitation.

Fig. 2 Mean annual rainfall series and mean monthly rainfall (mm) for Upper Basin (UB) and Lower Basin (LB) areas for rain gauge data (Obs) and global precipitation estimates (CFSR-NCEP, CPC-NOAA, and TRMM data) within the Upper Awash basin

The scatterplots of monthly precipitation between the observed data and the CFSR-NCEP, CPC-NOAA, and TRMM data revealed good agreement, as shown in Fig. 3. The regression coefficient (R²) of the fitted line is 0.79, 0.70, and 0.66 for CFSR-NCEP, CPC-NOAA, and TRMM data, respectively, in the UB part, and 0.64, 0.69, and 0.65, respectively, in the LB part of the Upper Awash basin. Based on the scatterplots, CFSR-NCEP performed better in the UB area, while CPC-NOAA performed better in the LB area of the basin. The scatterplots also show that the CFSR-NCEP data overestimated most of the peak observed precipitation, whereas the CPC-NOAA and TRMM data underestimated the peak monthly precipitation. Moreover, the scatterplots suggest that the discrepancy between the observed and global precipitation estimates increases for monthly peak precipitation values (mm/month).

Fig. 3 Scatterplot of monthly precipitation (mm) for Upper Basin (UB) and Lower Basin (LB) areas between rain gauge data (OBS) and global precipitation estimates (CFSR-NCEP, CPC-NOAA and TRMM data) within the Upper Awash basin

The descriptive statistics of annual and daily precipitation for both the gauged and global data clearly show that the UB area receives larger amounts of precipitation than the LB area, as shown in Table 2. In the UB area, the CFSR-NCEP data overestimated the observed precipitation, while the CPC-NOAA and TRMM data underestimated the observed annual and daily precipitation values. In the LB area, all the global precipitation estimates underestimated the observed values in both the mean annual and daily series. However, in terms of annual maximum and daily maximum values, CFSR-NCEP, CPC-NOAA, and TRMM overestimated the observed data in both the UB and LB areas. The global precipitation estimates also show higher annual and daily variability than the observed values in both areas, with higher daily variances and standard deviations.

Table 2 Descriptive statistics of observed areal precipitation over Upper Basin (UB) and Lower Basin (LB) area in the Upper Awash basin

The statistical indices used to evaluate the monthly precipitation series gave correlation coefficients (CC) with the gauge observations of 0.89, 0.84, and 0.81 for the CFSR-NCEP, CPC-NOAA, and TRMM estimates, respectively, in the UB area and 0.80, 0.80, and 0.83, respectively, in the LB area. The monthly R² value for all precipitation data was found to be > 0.65 in both the UB and LB areas, as shown in Table 3. Negative and positive values of MAE and PBIAS indicate underestimation and overestimation of the observed precipitation data, respectively. Accordingly, the TRMM and CPC-NOAA precipitation data underestimated the mean areal gauged values in both the UB and LB areas of the Upper Awash basin, whereas the CFSR-NCEP precipitation data overestimated the gauged values in the UB area and underestimated them in the LB area. Relatively higher monthly MAE, PBIAS, and RMSE values were obtained for the CFSR-NCEP data, which indicates a poorer fit with the observed monthly precipitation data. Zhu et al. (2016) also found that TRMM precipitation underestimated the observed values, while CFSR-NCEP overestimated them. However, the CFSR-NCEP database, which was obtained from the official SWAT model website, provides extra benefits for hydrologic modeling, because daily temperature, wind speed, solar radiation, and relative humidity data can also be obtained in addition to the precipitation data; these are vital sources of input data in data-scarce remote regions.

Table 3 Performance of global precipitation product on the monthly and daily scale in the Upper Basin (UB) and Lower Basin (LB) area of Upper Awash basin

The comparison of the global precipitation estimates with the observed precipitation on a daily time scale showed low to satisfactory performance. Relatively better daily statistical efficiency was obtained by the CFSR-NCEP data, with a higher R² value of 0.28 and a CC of 0.53, whereas, based on the MAE and PBIAS efficiency measures, CPC-NOAA gave a better daily fit with the observed data in both the UB and LB areas, as shown in Table 3. The worst daily performance was obtained by the TRMM precipitation estimate, with an R² value of 0.16 and a CC value of 0.41 in the UB area. The daily and monthly statistical indices of the precipitation estimates against gauge precipitation reveal that CFSR-NCEP performed better than the CPC-NOAA and TRMM precipitation estimates. The lower daily performance of the CPC-NOAA and TRMM data sets might be due to the high spatial variation of precipitation over the Upper Awash basin caused by large topographical changes. The statistical and graphical analysis of the annual, monthly, and daily time series of the precipitation estimates against the observed precipitation data demonstrated good to satisfactory performance. However, these data sources should be applied with caution at the micro-watershed scale on a daily basis. Overall, the global sources of gridded precipitation data can be used for different hydro-climatic studies in the Awash river basin and in data-scarce regions of Ethiopia.

Spatial distribution of annual precipitation estimates

The inverse distance weighted (IDW) spatial interpolation technique was used to create the spatial surfaces of interpolated annual precipitation. IDW determines cell values using a linearly weighted combination of a set of sample points; the points were taken from the location of rain gauge stations and the center of grid points for the gridded global precipitation estimates. The spatial distributions of mean annual precipitation derived from observation rain gauges data, CFSR-NCEP, CPC-NOAA, and TRMM over the Upper Awash basin are shown in Fig. 4. The highest annual average precipitation was obtained by CFSR-NCEP as 1829.2 mm in the north-western part of the Upper Awash basin, as shown in Fig. 4b and also shown in Table 2. The spatial map of measured rain gauge data shown in Fig. 4a indicates the larger area of the basin is covered with mean precipitation in the range of 1000–1200 mm. The map of mean annual precipitation derived from CPC-NOAA shows an underestimation of the mean gauged areal values with larger areas of the basin found to have annual precipitation amounts of less than 900 mm, as shown in Fig. 4c. The mean annual precipitation derived by TRMM offered a better resemblance with the spatial map of gauged data, as shown in Fig. 4d. Better spatial correlation was obtained by TRMM data due to higher spatial resolution than the CFSR-NCEP and CPC-NOAA data. Generally, the annual precipitation estimates derived from global data sets demonstrated comparable spatial patterns with observed annual precipitation. The gauge locations which are located in the higher altitude demonstrated higher annual precipitation compared to the stations located in lowland areas. It can be seen that a higher annual rainfall amount occurred mostly in the higher altitude regions within the watershed. 
Higher annual precipitation was obtained in the north-western part of the Upper Awash basin, and it decreases towards the southern and south-eastern parts and the outlet of the basin. Similar to the current result, the map of the spatial distribution of average annual rainfall over the Upper Blue Nile basin in the northern highlands of Ethiopia shows high spatial variation (Abtew et al. 2009). Moreover, all three global precipitation products underestimated the observed gauged values in the LB area, which has relatively flat topography. In particular, the CFSR-NCEP precipitation data exhibited higher spatial variation within the Upper Awash basin: it largely underestimated the gauged values at lower altitudes and overestimated the observed values in the higher elevation regions, as shown in Fig. 4b. Musie et al. (2019) also evaluated the performance of four prominent gridded precipitation datasets for hydrologic prediction in the Lake Ziway watershed in the Rift Valley region of Ethiopia and obtained more reasonable performance in relatively flat terrain than in the mountainous watershed. Satellite precipitation products face challenges over mountainous and coastal regions due to varying spatial structures (Dinku et al. 2018). Overall, the performance of the precipitation products over the Upper Awash basin in the central part of Ethiopia can be seen as promising.
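The IDW interpolation used for these maps can be illustrated with a short sketch. The function name and the distance exponent (power = 2) are assumptions made here for illustration; the text does not report the exponent or the search neighbourhood that was used, and a GIS implementation would typically restrict the weighting to a limited number of neighbouring points.

```python
import numpy as np

def idw_interpolate(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighted estimate at each query location.

    xy_known : (n, 2) coordinates of gauges or grid-cell centers
    values   : (n,) mean annual precipitation at those points (mm)
    xy_query : (m, 2) coordinates of the output cells
    power    : distance exponent (assumed; not given in the source)
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    # Pairwise distances, shape (m, n)
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    result = np.empty(len(xy_query))
    for i, row in enumerate(dist):
        exact = row == 0.0
        if exact.any():
            # Query cell coincides with a sample point: use its value directly
            result[i] = values[exact][0]
        else:
            weights = 1.0 / row ** power
            result[i] = np.sum(weights * values) / np.sum(weights)
    return result
```

Because the estimate is a weighted average of the sample values, an IDW surface never exceeds the largest or falls below the smallest input value, which is worth keeping in mind when comparing interpolated maps built from a few coarse grid points with maps built from many gauges.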

Fig. 4

Map of spatial distribution of mean annual rainfall (RF) over Upper Awash basin using inverse distance weighting (IDW) spatial interpolation technique for (a) rain gauge data, (b) CFSR-NCEP data, (c) CPC-NOAA data and (d) TRMM data

The efficiency of global climate data for monthly and daily hydrologic simulations

The sensitive hydrologic parameters were ranked based on the t-stat and p-values, which were computed using the SWAT calibration and uncertainty program (SWAT-CUP). The sequential uncertainty fitting (SUFI2) algorithm in SWAT-CUP (Abbaspour 2007) was applied to perform the global sensitivity analysis. The model calibration and uncertainty analysis were carried out at the Hombole, Melka Kuntre, Mojo, and Akaki watersheds in the Upper Awash basin. The most sensitive parameter values were varied iteratively within the allowable ranges until a satisfactory agreement between measured and simulated flow was obtained. A detailed description of the hydrologic parameters is given in the SWAT model user’s manual (Neitsch et al. 2011; Abbaspour 2007, 2013). The surface runoff parameters, such as the SCS runoff curve number for moisture condition II (a_CN2.mgt), the available water capacity of the soil layer (a_SOL_AWC.sol), and the saturated hydraulic conductivity (a_SOL_K.sol), were more sensitive. In addition, the baseflow parameters, such as the baseflow alpha factor (v_ALPHA_BF.gw), the threshold depth of water in the shallow aquifer required for return flow to occur (mm) (v_GWQMN.gw), the deep aquifer percolation fraction (v_RCHRG_DP.gw), and the groundwater “revap” coefficient (v_GW_REVAP.gw), were found to be the more sensitive parameters for groundwater flow in the subbasins of the Upper Awash basin, as shown in Table 1.
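The idea behind ranking parameters by t-stat can be sketched with an ordinary multiple regression of the objective function on the sampled parameter values. This is only an approximation of the concept under stated assumptions, not the SWAT-CUP implementation: the function name, the input layout, and the use of plain least squares are all hypothetical. Since every parameter shares the same degrees of freedom, ranking by the absolute t-stat gives the same order as ranking by ascending p-value.

```python
import numpy as np

def rank_sensitivity(param_samples, objective, names):
    """Rank parameters by |t-stat| from a linear regression.

    param_samples : (n_runs, n_params) sampled parameter values
    objective     : (n_runs,) objective function value of each run
    names         : parameter labels (hypothetical, for reporting only)
    """
    X = np.asarray(param_samples, dtype=float)
    y = np.asarray(objective, dtype=float)
    n_runs, n_params = X.shape
    A = np.column_stack([np.ones(n_runs), X])        # intercept + parameters
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n_runs - n_params - 1
    sigma2 = (resid @ resid) / dof                   # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)            # coefficient covariance
    t_stat = coef[1:] / np.sqrt(np.diag(cov)[1:])    # skip the intercept
    order = np.argsort(-np.abs(t_stat))              # most sensitive first
    return [(names[i], float(t_stat[i])) for i in order]
```

A larger absolute t-stat means that changes in the parameter explain more of the variation in the objective function relative to the estimation uncertainty, which is the sense in which a parameter is called "more sensitive" here.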

The model calibration was performed by considering the range of relatively consistent series of daily and monthly discharge at the outlets of the main watersheds. For each model run, the first three years were used as a model warm-up period. The standard statistical measures of model performance met the criteria of NS > 0.5, R2 > 0.6, RSR < 0.6 and PBIAS < 15% at the monthly time scale. The statistical efficiency measures and the hydrographs of simulated versus observed monthly discharge show comparable results for the global precipitation estimates and the observed precipitation data, as shown in Figs. 5, 6, 7 and 8 for the Hombole, Melka Kuntre, Mojo and Akaki watersheds, respectively. The uncertainty bounds, which contain the 95 percent prediction uncertainty (95PPU) corresponding to the behavioral parameter sets, were plotted for the ranges of the best simulations. The monthly hydrologic simulation and uncertainty analysis, in which simulations driven by observed rain gauge data and by global precipitation estimates were compared with observed monthly discharge, indicated that the SWAT model performed well with all precipitation data sources. The 95PPU uncertainty bounds revealed that the CPC-NOAA precipitation data outperformed the other precipitation data sources in the Hombole and Melka Kuntre watersheds, with better values of r-factor and p-factor, as shown in Table 4. The global precipitation estimates achieved better prediction efficiency in the relatively larger watersheds, particularly Hombole and Melka Kuntre, than in the relatively smaller Mojo and Akaki watersheds. Moreover, the monthly hydrographs in Figs. 7b and 8b reveal that the CFSR-NCEP data underestimated most of the peak flows and lagged behind the observed discharge series. By contrast, the CFSR-NCEP climate data performed better in capturing the peak monthly discharge in the Hombole and Melka Kuntre watersheds.
However, the streamflow simulations based on both the global precipitation estimates and the observed precipitation data underestimated the peak flows. In agreement with the current finding, Musie et al. (2019) concluded that the gridded global precipitation data performed well for simulations of the observed monthly discharges; however, the peak streamflow was not simulated well because of the uneven representation of the spatial distribution of precipitation. The precipitation derived from CPC-NOAA performed better for simulation of the monthly and daily discharge over the Upper Awash basin, with model efficiency results comparable to those obtained with observed climate data. Overall, Figs. 5, 6, 7 and 8 illustrate that all model simulations forced by global estimates captured the observed monthly hydrographs very well at the Hombole, Melka Kuntre, Mojo, and Akaki watersheds, respectively, in the Upper Awash basin.
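The monthly acceptance criteria quoted above (NS > 0.5, R2 > 0.6, RSR < 0.6 and PBIAS < 15%) can be checked with a few lines of code. The sketch below uses the commonly applied definitions of these measures; the function names are hypothetical, and the PBIAS threshold is interpreted here as a limit on the absolute value, which is an assumption because the text gives only "PBIAS < 15%".

```python
import numpy as np

def flow_scores(observed, simulated):
    """Return NS, R2, RSR and PBIAS for paired discharge series."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    resid = obs - sim
    ss_res = np.sum(resid ** 2)                      # squared simulation error
    ss_tot = np.sum((obs - obs.mean()) ** 2)         # variability of observations
    ns = 1.0 - ss_res / ss_tot                       # Nash-Sutcliffe efficiency
    rsr = np.sqrt(ss_res) / np.sqrt(ss_tot)          # RMSE / std of observations
    pbias = 100.0 * np.sum(resid) / np.sum(obs)      # positive = underestimation
    r2 = np.corrcoef(obs, sim)[0, 1] ** 2
    return {"NS": float(ns), "R2": float(r2),
            "RSR": float(rsr), "PBIAS": float(pbias)}

def meets_monthly_criteria(scores):
    """Check the thresholds quoted in the text (|PBIAS| is an assumption)."""
    return bool(scores["NS"] > 0.5 and scores["R2"] > 0.6
                and scores["RSR"] < 0.6 and abs(scores["PBIAS"]) < 15.0)
```

With these definitions RSR equals the square root of (1 − NS), so the two measures carry the same information about the squared error; PBIAS and R2 add the volume error and the linear association, respectively.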

Fig. 5

Hydrograph of monthly observed and simulated discharge using (a) rain gauge precipitation data, (b) CFSR-NCEP data, (c) CPC-NOAA data and (d) TRMM data at Hombole watershed

Fig. 6

Hydrograph of monthly observed and simulated discharge using (a) rain gauge precipitation data, (b) CFSR-NCEP data, (c) CPC-NOAA data and (d) TRMM data at Melka Kuntre watershed

Fig. 7

Hydrograph of monthly observed and simulated discharge using (a) rain gauge data, (b) CFSR-NCEP data, (c) CPC-NOAA data and (d) TRMM data at Mojo watershed

Fig. 8

Hydrograph of monthly observed and simulated discharge using (a) observed climate data, (b) CFSR-NCEP reanalysis data and (c) CPC-NOAA precipitation data at Akaki watershed

Table 4 Statistical efficiency measures for monthly and daily model simulation of runoff at four different main watersheds of Upper Awash basin (a) Hombole, (b) Melka Kuntre, (c) Mojo and (d) Akaki watersheds

The R2, NS, PBIAS, RSR, Mean_obs (mean value of observed discharge), Mean_sim (mean value of simulated discharge), p-factor, and r-factor were used to assess the efficiency and uncertainty of the runoff predictions of the model and the input data. As shown in Table 4a, the model performance measures for the rain gauge data were higher, with R2, NS, PBIAS, and RSR of 0.88, 0.87, −4.0 and 0.35, respectively, at the Hombole watershed on the monthly time scale. The CPC-NOAA precipitation showed excellent performance in the Melka Kuntre watershed for both monthly and daily simulations, with R2 and NS values ≥ 0.8. However, the global precipitation estimates performed poorly for daily hydrologic simulation in the Mojo watershed. Better spatial and temporal resolution of precipitation datasets is needed to achieve good performance in hydrological simulations and to better characterize streamflow (Terink et al. 2018; Gao et al. 2018). In the relatively smaller watersheds, the suitability of the global precipitation data was found to be lower due to the spatial scale of the precipitation data sources. In agreement with the current finding, Tolera et al. (2018) revealed that the CFSR weather data performs better for streamflow simulation in relatively larger watersheds. For smaller watersheds, fewer precipitation data points are available for hydrologic simulation because of the coarser resolution of the global precipitation data. Thus, the ability of the model to capture hydrologic variability improves as the number of rain gauges increases up to a certain threshold, and smaller basins require a denser rain gauge network (Terink et al. 2018; Zeng et al. 2018). Similarly, the SWAT model performs better at the Melka Kuntre and Hombole watersheds because more rain gauges and precipitation grid points are available, which better capture the topographic variability of the watersheds.
The hydrographs and the statistical model efficiency measures demonstrated that the CPC-NOAA precipitation data achieved superior performance for daily and monthly hydrologic simulation in the Upper Awash basin, as shown in Table 4a–d. The worst daily performance for all precipitation inputs was obtained at the Mojo watershed, which could be due to the higher variability of the observed daily discharge, as shown in Fig. 9. The scatter plot of daily simulated and observed discharge at Hombole station in Fig. 9 indicates that the simulations with rain gauge data and with global precipitation data underpredict the daily peak events and overpredict the low flows.
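The p-factor and r-factor reported with the 95PPU band can be computed from an ensemble of simulations as sketched below. The definitions follow the usual SUFI2 description (p-factor: fraction of observations bracketed by the band; r-factor: mean band width divided by the standard deviation of the observations). The function name, the array layout, and the use of the sample standard deviation are assumptions made for this illustration.

```python
import numpy as np

def ppu_factors(observed, ensemble):
    """Return (p_factor, r_factor) for a 95PPU band.

    observed : (n_times,) observed discharge
    ensemble : (n_sims, n_times) simulated discharge, one row per
               behavioral parameter set (layout is hypothetical)
    """
    obs = np.asarray(observed, dtype=float)
    ens = np.asarray(ensemble, dtype=float)
    lower = np.percentile(ens, 2.5, axis=0)     # lower 95PPU limit per time step
    upper = np.percentile(ens, 97.5, axis=0)    # upper 95PPU limit per time step
    inside = (obs >= lower) & (obs <= upper)
    p_factor = float(np.mean(inside))           # share of observations bracketed
    r_factor = float(np.mean(upper - lower) / np.std(obs, ddof=1))
    return p_factor, r_factor
```

A p-factor close to 1 combined with a small r-factor indicates a narrow band that still brackets most observations, which is the sense in which one precipitation input can be said to give "better" uncertainty bounds than another.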

Fig. 9

Scatter plot of daily best simulated (cms) and observed daily discharge (cms) based on (a) rain gauge, (b) CFSR-NCEP, (c) CPC-NOAA and (d) TRMM precipitation data at Hombole, Melka Kuntre, Mojo and Akaki watersheds in the Upper Awash basin

Moreover, the comparison of the daily observed discharge with the discharge simulated using the gauge climate data and the CFSR-NCEP, CPC-NOAA, and TRMM precipitation estimates shows that the model simulations underestimate most of the peak discharges. The mass curves of daily discharge at the four main watersheds in the Upper Awash basin, shown in Fig. 10a–d, indicated that the simulation forced with the CPC-NOAA precipitation data consistently underestimated the observed discharge. The extreme maximum daily discharge estimated for a 100-year return period at the Hombole watershed was found to be 412.8, 333.2, 326.5, 356.3 and 244.8 cms for the observed discharge and the gauged-climate, CFSR-NCEP, CPC-NOAA, and TRMM precipitation inputs, respectively. In particular, the simulation based on the TRMM data underestimated the extreme daily discharge at the Hombole and Melka Kuntre watersheds. The 100-year extreme daily discharges at the Melka Kuntre watershed are estimated to be 180.8, 186.8, 179.7, 190.6 and 169.7 cms for the observed discharge and the gauged-climate, CFSR-NCEP, CPC-NOAA, and TRMM precipitation inputs, respectively. Similarly, the simulation of maximum daily discharge at the Akaki watershed underestimated the observed discharge; the 100-year maximum discharge was found to be 70.7, 41.2, 46.9, and 41.1 cms for the observed discharge and the gauged-climate, CFSR-NCEP, and CPC-NOAA data, respectively. In contrast, the hydrologic simulation in the Mojo watershed showed that the TRMM data significantly overestimated the observed discharge, whereas TRMM underestimated the daily peak values at the Hombole and Melka Kuntre watersheds. The 100-year maximum daily discharge simulated at the Mojo watershed was found to be 85.8, 48.9, 31.6, and 175.1 cms for the observed discharge and the gauged-climate, CPC-NOAA, and TRMM inputs, respectively. Furthermore, the mass curves of daily discharge illustrated that the variation in simulated discharge among the precipitation data sources is higher for the extreme flow events.
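The relationship between discharge, probability of exceedance and return period can be sketched as follows. The text does not state which plotting position or frequency distribution was used for the 100-year estimates, so both choices below are assumptions made only for illustration: the Weibull plotting position for the empirical exceedance curve and a Gumbel (extreme value type I) distribution fitted by the method of moments to annual maximum daily discharge. The function names are hypothetical.

```python
import math
import numpy as np

def exceedance_curve(daily_flows):
    """Return flows sorted in descending order and their exceedance probability.

    Uses the Weibull plotting position p = rank / (n + 1) (assumed).
    """
    flows = np.sort(np.asarray(daily_flows, dtype=float))[::-1]
    ranks = np.arange(1, len(flows) + 1)
    prob = ranks / (len(flows) + 1.0)
    return flows, prob

def gumbel_quantile(annual_maxima, return_period_years):
    """Discharge for a given return period from a Gumbel fit (assumed model).

    Parameters are estimated by the method of moments from the sample
    mean and standard deviation of the annual maximum series.
    """
    x = np.asarray(annual_maxima, dtype=float)
    beta = np.std(x, ddof=1) * math.sqrt(6.0) / math.pi     # scale parameter
    mu = np.mean(x) - 0.5772156649 * beta                   # location parameter
    # Gumbel reduced variate for non-exceedance probability 1 - 1/T
    reduced = -math.log(-math.log(1.0 - 1.0 / return_period_years))
    return float(mu + beta * reduced)
```

Because a 100-year estimate extrapolates well beyond the length of a typical discharge record, small differences in the upper tail of the simulated series translate into large differences in the estimated extreme, which is consistent with the wider spread among data sources noted above for the extreme flow events.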

Fig. 10

Flow duration curve for daily observed discharges, simulated discharge based on gauged climate, CFSR-NCEP, CPC-NOAA, and TRMM precipitation data at different return periods (year) and the probability of exceedance for (a) Hombole watershed, (b) Melka Kuntre watershed, (c) Akaki watershed, and (d) Mojo watershed

Implications of different precipitation data sources on the annual hydrologic components

In addition to the daily and monthly streamflow hydrographs, the mean annual water balance components were analyzed to examine the effects of different precipitation input sources on hydrological processes. The change in soil water content is the amount of water that is added to or removed from the storage of the system or river basin per unit time. Different sources of precipitation can accurately reproduce the observed streamflow hydrographs through parameter calibration of the hydrological model, but the differences in precipitation inputs are inevitably reflected in the simulations of other hydrological variables such as evaporation and soil water storage (Bai and Liu 2018). The water balance components presented in Table 5 illustrate that the precipitation estimates not only influence the performance of the hydrologic model simulation but also affect the variability of the mean annual water balance components. A positive value of the change in soil water content indicates a surplus, and a negative value indicates a deficit in soil water content. The annual basin values show significant variations among the gauge-climate, CFSR-NCEP, CPC-NOAA, and TRMM precipitation inputs. The change in soil water storage for the entire basin indicated a surplus for all climate data sources, with values of 6.1, 16.3, 7.3 and 2.7 mm for the gauge-climate, CFSR-NCEP, CPC-NOAA, and TRMM precipitation data, respectively. The annual surface runoff, lateral flow, and groundwater flow simulated using the CPC-NOAA and TRMM precipitation were considerably lower than those of the water balance simulation based on observed climate data. In addition, the annual actual evapotranspiration (ET) simulated with the CFSR-NCEP data source was significantly lower than with the other data sources; as a result, a higher change in soil water storage was obtained.
A maximum sediment load of 117.2 t/ha was obtained from the simulation with observed climate data, which is directly related to the amount of surface runoff and precipitation.
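The storage-change bookkeeping discussed in this section can be expressed as a simple residual of the annual water balance. The sketch below is a simplified illustration under stated assumptions: the list of outflow components and the function names are hypothetical, and the components actually tabulated in Table 5 may differ.

```python
def soil_water_change(precip, actual_et, surface_runoff, lateral_flow,
                      groundwater_flow, deep_recharge=0.0):
    """Annual change in stored water (mm) as inflow minus outflows.

    All arguments are mean annual depths in mm. The set of outflow
    terms is an assumed, simplified form of the basin water balance.
    """
    outflow = (actual_et + surface_runoff + lateral_flow
               + groundwater_flow + deep_recharge)
    return precip - outflow

def classify_change(change_mm):
    """Label the sign of the storage change as described in the text."""
    if change_mm > 0:
        return "surplus"
    if change_mm < 0:
        return "deficit"
    return "balanced"
```

For example, with purely illustrative values of 1000 mm precipitation, 600 mm actual ET, 150 mm surface runoff, 50 mm lateral flow, 180 mm groundwater flow and 10 mm deep recharge, the residual is 10 mm, which would be classed as a surplus. The same arithmetic shows why a lower simulated ET, with the other terms unchanged, leads directly to a larger storage change.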

Table 5 Comparison of the mean annual water balance components (mm) of simulations driven by gauge, CFSR-NCEP, CPC-NOAA, and TRMM data over the Upper Awash basin

Conclusions

Accurate precipitation data with good temporal quality and areal coverage of the basin are important for the efficient prediction of the hydrologic variability and water balance components of the basin. Hence, in addition to rain gauge measuring networks, satellite precipitation products and climate model reanalysis data can be used to improve the availability of precipitation data. In this study, the performance of global precipitation datasets, namely CFSR-NCEP, CPC-NOAA, and TRMM, was evaluated against gauged precipitation data using basin-scale areal values. The Upper Awash basin was divided into two areas, the Upper Basin (UB) and the Lower Basin (LB), based on the elevation difference, for evaluation of the global precipitation data in relation to the gauged precipitation data. All three global data sources underestimated the observed values in the LB area, and the CFSR-NCEP precipitation data highly overestimated the observed values in the UB area, the higher elevation part of the basin. Furthermore, the ability of these data sources to simulate the streamflow variability on daily and monthly time scales was evaluated at four main watersheds within the Upper Awash basin, viz. Hombole, Melka Kuntre, Mojo and Akaki, using the SWAT model.

The spatial pattern of the precipitation estimates consistently resembles that of the gauged data: higher annual precipitation was obtained in the north-western, upstream part of the Upper Awash basin, and it decreases towards the downstream area at the outlet of the basin. The spatial distribution of the precipitation amounts is highly correlated with the elevation change. The mean annual and monthly time series plots of precipitation indicate that the CPC-NOAA and TRMM precipitation data underestimated the observed values, whereas the CFSR-NCEP precipitation overestimated the observed gauge data, mainly in the wet months. Moreover, the statistical efficiency measures indicated reasonable correlations between the observed data and the precipitation estimates at the monthly level. The CFSR-NCEP data, which is compiled in the SWAT global weather database, proved to be a more helpful source of climate data because it provides comprehensive estimates of all the climate parameters that are useful for simulating streamflow characteristics in data-scarce regions.

The hydrologic simulation efficiency of the global precipitation products revealed outstanding performance, with NS values > 0.7, at the Hombole and Melka Kuntre watersheds and satisfactory performance at the Mojo and Akaki watersheds. However, the monthly hydrographs and flow duration curves of all the precipitation data sources, including the rain gauge data, underestimated most of the monthly and daily peak discharges. The model performs better in the relatively larger Melka Kuntre and Hombole watersheds, whereas the efficiency of the global precipitation estimates for daily hydrologic simulation in the Mojo and Akaki watersheds was lower. The model efficiency improves when a significant number of rain gauges or more precipitation grid points are available. Global climate data such as CPC-NOAA, CFSR-NCEP, and TRMM can be useful data sources for hydrologic modeling in data-scarce tropical regions. Advances in satellite meteorology and climate models unlock new opportunities to address the challenges of watershed modeling for sustainable water resources management. Future research should consider grid-based or micro-watershed-level analysis of global precipitation datasets to highlight the relative effectiveness of these data for watershed modeling and for planning sustainable water management interventions.