Introduction

The ongoing pandemic of novel coronavirus disease 2019 (COVID-19) is still spreading. All continents have been affected, with new cases recorded daily. The causative agent of COVID-19 is a positive-sense single-stranded RNA virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [6].

Past experiences with disease outbreaks have shown a low level of preparedness in Africa, for instance during the devastating Ebola virus epidemic of 2014 to 2016 in West Africa [23]. For the COVID-19 outbreak, the continent has adopted different schemes to combat the continuous spread of the virus. These schemes include (1) screening for the disease on arrival at airports and some seaports, with training on the detection of COVID-19 at points of entry carried out across Africa by designated authorities [24]; and (2) alongside the provision of resources, educating at-risk countries such as Egypt on infection prevention and control, community participation and case management at health centres with respect to COVID-19 [9, 13, 19]. The first confirmed COVID-19 case in Africa was detected in Egypt on 14th February, 2020. As at the date of this write-up (7th May, 2020), Africa's total confirmed cases have risen to 52,821 with 2018 deaths [25].

SARS-CoV-2 infection originated in Asia, specifically in China, in December, 2019, and cases in Asia have since risen to 605,270 in total, with more than 2205 new cases on 6th May, 2020 [25]. The China Health Authority notified WHO on 31st December, 2019 of numerous cases of a pneumonia-like disease now known as COVID-19 [14]. Symptoms such as livedo or necrosis have also been attributed to COVID-19 [4]. Phylogenetic studies revealed that the SARS-CoV-2 genome sequence is closely related to bat-SL-CoVZC45 and bat-SL-CoVZXC21, coronaviruses obtained from bats in eastern China in 2018 (88% identity), whereas SARS-CoV-2 is genetically distinct from MERS-CoV and SARS-CoV, with 79% identity [16]. As of 7th May, 2020, North America has reported a total of 1,378,143 COVID-19 cases, of which the United States of America accounts for 91.66% [25]. Data from April, 2020 in the United States of America showed that the age group below 18 years was the least affected, with 2572 cases out of a total of 149,082 cases [5]. The continent has since reported 82,650 total deaths [25].

Of all the continents, Europe is presently the worst hit by the COVID-19 outbreak. It has the highest number of total confirmed cases (1,536,901), deaths (146,751) and new cases (more than 11,997) [25]. The continent recorded its index case in France on 25th January, 2020, and has 790,748 active cases to date [21, 25]. Europe recorded as many as 40,800 new confirmed cases on 4th April, 2020. In increasing order, France, Russia, the UK, Italy and Spain have the highest numbers of COVID-19 cases on the continent. On 6th May, 2020, Russia recorded more than 88 new deaths out of about 120 new deaths on the continent [25]. Meanwhile, Oceania has been the least affected by the COVID-19 outbreak, with no new deaths as at 6th May, 2020. New Caledonia and Papua New Guinea had 18 and 4 total cases respectively, all of which have recovered [25].

Mathematical models have informed decision-making around diverse non-pharmaceutical interventions for containing the spread of COVID-19 globally. Modeling is vital in studying the spread of COVID-19, with each model addressing a specific problem [20]. Zhang et al. [26] carried out a descriptive and modeling study of the transmission dynamics of COVID-19 outside Hubei province, China, and established demographic trends of cases at the provincial level using a Bayesian procedure. Hirk et al. [8] formulated several scenarios using data on the number of hospitalized COVID-19 patients, the number in intensive care units and the number of deaths among those who tested positive in Iceland and Austria, using both frequentist and Bayesian approaches.

Elsewhere, Benvenuto et al. [3] predicted the prevalence and incidence of COVID-19 cases using an autoregressive integrated moving average (ARIMA) model on the Johns Hopkins epidemiological data. Khan et al. [11] conducted a case series study to assess the impact of SARS-CoV-2 infection on adverse pregnancy outcomes. The study included both health-care workers and women from the general population. Liang [15] fitted a nonlinear model to the growth rate of COVID-19. Makinde et al. [18] employed generalized estimating equations to analyze the daily COVID-19 mortality rates in African countries, and found a weak positive linear relationship between the daily reported COVID-19 cases and the population of African countries, although the magnitude of the observed association was small. There were also significant monotone trends in the daily COVID-19 incidence and mortality counts of many countries in Africa.

In this paper, the impact of the mode of transmission of COVID-19 on incidence and mortality, as well as the drivers of total new deaths over WHO regions, are considered. The relationships among the total COVID-19 confirmed cases, total confirmed new cases, total deaths and total new deaths in WHO regions are presented. A negative binomial regression model is formulated for total new deaths, accounting for the mode of transmission of COVID-19 over WHO regions.

Material and methods

Data

The World Health Organization (WHO) situation report 187 data, available at https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports, present information on total confirmed cases, total confirmed new cases, total deaths, total new deaths and mode of transmission of COVID-19 in countries, regions and territories, collectively referred to here as WHO regions, up to July 26, 2020.

Figure 1 presents the plot of COVID-19 total confirmed cases (in '000) and total deaths in some countries and territories of WHO regions. The figure shows the 30 countries with the highest total confirmed cases and the 30 countries with the highest total deaths in the WHO regions. The United States of America recorded the highest COVID-19 total confirmed cases and total deaths. Figure 2 presents a pie-of-pie chart of the WHO regions' total number of COVID-19 deaths. It is observed from the figure that 12% of the WHO regions are yet to record COVID-19 deaths, while 9% of the regions recorded 1–5 deaths. As at July 26, 2020, 43% of the 215 WHO regions had recorded more than 100 deaths.

Fig. 1 Plot of 30 countries with the highest total number of confirmed cases and 30 countries with the highest total number of mortality cases

Fig. 2 Pie-of-pie chart of total number of COVID-19 deaths in all countries and territories defined by WHO

Figure 3 presents the frequency distribution of the mode of transmission of COVID-19 in the WHO regions. The mode of transmission in 36.74% of the WHO regions is clusters of cases, while in 42.79% it is community transmission. In addition, 14.88% of the WHO regions have sporadic cases as their mode of transmission, the mode of transmission in 0.93% of the WHO regions is pending, and 4.65% of the WHO regions are yet to record a COVID-19 case. This implies that the most common mode of transmission in all countries, territories and regions defined by WHO is community transmission.

Fig. 3 Frequency distribution of mode of transmission of COVID-19 based on total confirmed cases in all WHO regions

The negative binomial regression model

Suppose a random variable \(Y\) follows the negative binomial distribution [7]. Its probability mass function is

$$P\left(y|\mu , \theta \right)=\frac{\Gamma \left(y+{\theta }^{-1}\right)}{\Gamma \left(y+1\right)\Gamma \left({\theta }^{-1}\right)}{\left(\frac{1}{1+\theta \mu }\right)}^{{\theta }^{-1}}{\left(\frac{\theta \mu }{1+\theta \mu }\right)}^{y},\quad \text{for}\,\, y=0,1,2,\ldots$$

where \(\mu\) is the mean incidence rate of \(y\) and \(\theta\) is the over-dispersion parameter. Given predictor variables \({X}_{1}, {X}_{2}, \ldots, {X}_{p},\) the conditional expected value of \(Y\) is \(E\left(Y|{X}_{1}, {X}_{2},\ldots, {X}_{p}\right)=\mu\) and its variance is \(var\left(Y|{X}_{1}, {X}_{2},\ldots, {X}_{p}\right)=\mu +\theta {\mu }^{2}.\) Negative binomial regression is used to model over-dispersed count data. It expresses \(\mu\) in terms of \({X}_{1}, {X}_{2}, \ldots, {X}_{p}\) through a log link, \({\log}_{e}\left(\mu \right)=\sum _{j=1}^{p}{\beta }_{j}{X}_{j}.\)
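The probability mass function above can be checked numerically. The following is a minimal sketch, assuming illustrative values of \(\mu\) and \(\theta\) that are not estimates from this study; the function names `nb_pmf` and `nb_moments` are introduced here only for illustration. It evaluates the mass function on the log scale and confirms that the probabilities sum to one, with mean \(\mu\) and variance \(\mu +\theta {\mu }^{2}.\)

```python
import math


def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial P(y | mu, theta) with variance mu + theta * mu**2."""
    inv_theta = 1.0 / theta
    # Work on the log scale so the gamma functions do not overflow for large y.
    log_p = (
        math.lgamma(y + inv_theta)
        - math.lgamma(y + 1)
        - math.lgamma(inv_theta)
        + inv_theta * math.log(1.0 / (1.0 + theta * mu))
        + y * math.log(theta * mu / (1.0 + theta * mu))
    )
    return math.exp(log_p)


def nb_moments(mu: float, theta: float, y_max: int = 5000):
    """Return (total probability, mean, variance) by summing the pmf up to y_max."""
    probs = [nb_pmf(y, mu, theta) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


# Illustrative (hypothetical) parameter values, not estimates from the paper.
total, mean, var = nb_moments(mu=4.0, theta=0.5)
print(total, mean, var, 4.0 + 0.5 * 4.0 ** 2)
```

The truncation point `y_max` is an assumption of the sketch; it only needs to be large enough for the remaining tail probability to be negligible.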

The total number of new deaths in each country, region or territory may be over-dispersed, that is, its variance may be much greater than its mean. Indeed, the mean and variance of the total number of new deaths in all countries and territories are 25.8837 and 15,829.75 respectively, so the variance far exceeds the mean. In this study, a negative binomial (NB) regression model [7, 22] of the form:

$${\log}_{e}\left(E\left({Y}_{ND}\right)\right)={\beta }_{1}{\log}_{e}\left({N}_{TC}\right)+{\beta }_{2}{N}_{LC}+{\beta }_{3}{N}_{TD}+{\beta }_{4}{D}_{CT}+{\beta }_{5}{D}_{P}+{\beta }_{6}{D}_{SC}+{\beta }_{7}{D}_{CC}+{\beta }_{8}{D}_{NC}+\epsilon \tag{1}$$

is formulated for the total number of new deaths, where \({N}_{TC},\) \({N}_{LC},\) \({N}_{TD}\) and \({Y}_{ND}\) are the total number of confirmed cases, total number of confirmed new cases, total number of deaths and total number of new deaths in WHO regions, respectively, and \({\beta }_{1}, {\beta }_{2},\ldots, {\beta }_{8}\) are the coefficients of the model. The dummy variables \({D}_{CT},\) \({D}_{P},\) \({D}_{SC},\) \({D}_{CC}\) and \({D}_{NC}\) denote the COVID-19 modes of transmission. Each dummy variable equals 1 for its own mode and 0 otherwise: \({D}_{CT}=1\) if the mode of transmission is community transmission; \({D}_{P}=1\) if the mode of transmission is pending; \({D}_{SC}=1\) if the mode of transmission is sporadic cases; \({D}_{CC}=1\) if the mode of transmission is clusters of cases; and \({D}_{NC}=1\) if no cases have been recorded.
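The coding of the dummy variables and the evaluation of the right-hand side of Eq. (1) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the error term is ignored, and the coefficient values are made up to exercise the code rather than taken from the fitted model.

```python
import math

# Order of the dummy variables D_CT, D_P, D_SC, D_CC, D_NC in Eq. (1).
MODES = (
    "community transmission",
    "pending",
    "sporadic cases",
    "clusters of cases",
    "no cases",
)


def encode_mode(mode: str) -> list:
    """Return the five 0/1 dummies for one mode of transmission."""
    if mode not in MODES:
        raise ValueError(f"unknown mode of transmission: {mode!r}")
    return [1 if mode == m else 0 for m in MODES]


def expected_new_deaths(n_tc, n_lc, n_td, mode, beta):
    """Evaluate exp(linear predictor) of Eq. (1), ignoring the error term.

    beta holds beta_1..beta_8 in the order of Eq. (1); n_tc must be positive
    because its natural logarithm is taken.
    """
    if len(beta) != 8:
        raise ValueError("beta must contain eight coefficients")
    predictors = [math.log(n_tc), n_lc, n_td, *encode_mode(mode)]
    eta = sum(b * x for b, x in zip(beta, predictors))
    return math.exp(eta)


# Hypothetical coefficients chosen only to exercise the function.
beta_demo = [0.9, 0.0, 0.0, -1.0, 0.0, -2.0, -1.5, 0.0]
print(expected_new_deaths(5000, 120, 300, "clusters of cases", beta_demo))
```

Exactly one dummy equals 1 for any region, so the five dummies together play the role of mode-specific intercepts in this sketch.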

A stepwise model selection approach based on the Akaike information criterion (AIC) [2] is used to discard candidate models whose AIC values are higher than alternatives to the fitted NB model in (1). The selection is carried out over the variables \({\log}_{e}\left({N}_{TC}\right),\) \({N}_{LC}\) and \({N}_{TD}\) in order to identify the variables of least importance, which are omitted from further analysis [17].
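A backward version of such a stepwise search can be sketched as below. This is a minimal illustration, not the procedure actually run in this study: `fit_aic` is a hypothetical callable that returns the AIC of the model containing a given set of variables, and the AIC values in the lookup table are invented for the example.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_select(
    candidates: Iterable[str],
    fit_aic: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Drop one variable at a time while doing so lowers the AIC."""
    current = frozenset(candidates)
    current_aic = fit_aic(current)
    while current:
        # AIC of every model obtained by removing a single variable.
        trials = [(fit_aic(current - {v}), v) for v in sorted(current)]
        best_aic, dropped = min(trials)
        if best_aic >= current_aic:
            break  # no single removal improves the criterion
        current, current_aic = current - {dropped}, best_aic
    return current, current_aic


# Hypothetical AIC values for every subset of the three candidate variables.
toy_aic = {
    frozenset({"log_N_TC", "N_LC", "N_TD"}): 110.0,
    frozenset({"log_N_TC", "N_LC"}): 108.5,
    frozenset({"log_N_TC", "N_TD"}): 109.0,
    frozenset({"N_LC", "N_TD"}): 150.0,
    frozenset({"log_N_TC"}): 107.0,
    frozenset({"N_LC"}): 160.0,
    frozenset({"N_TD"}): 155.0,
    frozenset(): 200.0,
}

selected, selected_aic = backward_select(["log_N_TC", "N_LC", "N_TD"], toy_aic.__getitem__)
print(sorted(selected), selected_aic)
```

In practice `fit_aic` would refit the NB model for each variable subset; the lookup table stands in for that step so that the search logic can be read on its own.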

The variable \({N}_{TC}\) is log-transformed to reduce its variability, since it contains some outlying observations, as shown in the first boxplot in Fig. 4. The variance of \({N}_{TC}\) is 112,325,761,815, whereas the variance of the natural logarithm of \({N}_{TC}\) is 8.48, as shown in the second boxplot in Fig. 4.

Fig. 4 Boxplot of (a) the total number of incidence cases and (b) the natural logarithm of the total number of incidence cases in all WHO regions
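The effect of the natural log transformation on the variance can be illustrated with a few lines of code. The counts below are hypothetical right-skewed values chosen for illustration and are not the WHO data.

```python
import math
import statistics

# Hypothetical right-skewed case counts, not the WHO data.
cases = [12, 150, 980, 4300, 25000, 310000, 4200000]
log_cases = [math.log(c) for c in cases]

# Sample variances before and after the natural log transformation.
raw_variance = statistics.variance(cases)
log_variance = statistics.variance(log_cases)
print(raw_variance, log_variance)
```

Because the logarithm compresses large values far more than small ones, a handful of very large counts no longer dominates the spread of the transformed variable.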

Results and discussion

Linear relationship among total confirmed cases, total confirmed new cases, total deaths and total new deaths

Table 1 presents the estimates of Kendall's rank correlation coefficients [10] among the variables (\({N}_{TC},\) \({N}_{LC},\) \({N}_{TD}\) and \({Y}_{ND}\)), with the p value of the t test of significance of each correlation estimate in parentheses. The total confirmed cases and total confirmed new cases are significantly, positively and strongly related. This implies that as the total number of confirmed cases increases, the total number of confirmed new cases also increases. The same holds for every other pair of variables. In particular, the total number of deaths recorded increases with the total number of confirmed cases in the countries, territories and areas under WHO regions.

Table 1 Correlation among variables, with p value of t test of significance of correlation estimates in parentheses
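For reference, a rank correlation of this kind can be computed directly from concordant and discordant pairs. The sketch below assumes the tie-adjusted variant (tau-b), which may or may not be the variant used for Table 1, and it does not compute the p values; the function name and the small data vectors are illustrative only.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant, discordant and tied pairs."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both denominators
        elif dx == 0:
            ties_x += 1  # tied on x only
        elif dy == 0:
            ties_y += 1  # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical values; a perfectly monotone increasing pair gives tau = 1.
print(kendall_tau_b([1, 2, 3, 4], [10, 20, 30, 40]))
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))
```

The coefficient depends only on the ordering of the observations, so it is unaffected by monotone transformations such as the natural logarithm applied to \({N}_{TC}.\)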

Modes of transmission

The Kruskal–Wallis test [12] is applied to test whether the modes of transmission (clusters of cases, community transmission, pending and sporadic cases) in all the countries are equivalent in terms of each of \({N}_{TC},\) \({N}_{LC},\) \({N}_{TD}\) and \({Y}_{ND}.\) The results of the test are shown in Table 2 as p values. The COVID-19 modes of transmission in all the WHO regions are significantly different at the 5% level of significance in terms of \({N}_{TC},\) \({N}_{LC},\) \({N}_{TD}\) and \({Y}_{ND}.\)

Table 2 Kruskal–Wallis test of equivalence of COVID-19 modes of transmission in WHO regions
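The Kruskal–Wallis statistic compares the rank sums of the groups after pooling all observations. The following is a minimal sketch, assuming no correction for ties beyond average ranks and using hypothetical group values; the function names are introduced here for illustration. The resulting statistic would be compared with a chi-square distribution with (number of groups minus 1) degrees of freedom to obtain a p value.

```python
def average_ranks(values):
    """Rank the values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without the tie-correction factor)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical new-death counts for three modes of transmission.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat)
```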

Predictive model based on negative binomial model for the total number of COVID-19 new deaths

The mean and variance of the total number of COVID-19 new deaths reported as at July 11, 2020 across countries, territories or areas are 29.163 and 19,596.35 respectively. Since the variance greatly exceeds the mean, a negative binomial model is formulated for the total number of COVID-19 new deaths.
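The size of this over-dispersion can be summarised from the two reported moments alone. The sketch below is an illustration, not part of the fitted model: it computes the variance-to-mean ratio and a method-of-moments value of \(\theta\) obtained by solving \(var=\mu +\theta {\mu }^{2}\) for \(\theta .\) This moment-based value is an assumption of the sketch and is not the maximum likelihood estimate reported later; the function name is hypothetical.

```python
def moment_dispersion(mean: float, variance: float) -> dict:
    """Summarise over-dispersion from a sample mean and variance."""
    return {
        # Equals 1 for a Poisson variable; values far above 1 signal over-dispersion.
        "variance_to_mean": variance / mean,
        # Solves variance = mean + theta * mean**2 for theta.
        "theta_moments": (variance - mean) / mean ** 2,
    }


# Mean and variance of total new deaths as reported in the text.
summary = moment_dispersion(29.163, 19596.35)
print(summary)
```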

A negative binomial model was formulated for the total number of COVID-19 new deaths as a function of the total number of COVID-19 confirmed cases, total number of new confirmed cases, total number of deaths and mode of transmission. A stepwise count model selection was performed on Eq. (1). The AIC value for the model (Eq. 1) is 832.19. The model with the least AIC (AIC = 829.12) excludes the total number of deaths and the total number of new confirmed cases in the WHO regions. The coefficients of the natural logarithm of \({N}_{TC}\) and of two modes of transmission (community transmission and clusters of cases), which predict \({Y}_{ND},\) are statistically significant at the 5% level of significance. Table 3 presents the estimates of the coefficients of the negative binomial regression model for the total number of new deaths in the WHO regions. The total number of new deaths in the WHO regions increases by a factor of 2.863 for a 1-unit increase in the log of the total number of confirmed cases when other variables are held constant. The total number of new deaths in the WHO regions is multiplied by a factor of 0.0001 and 0.0002, that is, it decreases, when the mode of transmission is clusters of cases and community transmission respectively, with other variables held constant. It can be inferred from the table that the total number of new deaths increases as the total number of confirmed cases increases.

Table 3 Estimates of coefficients of NB model for total number of COVID-19 deaths
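Under a log link, a multiplicative factor of this kind is the exponential of the corresponding coefficient. The sketch below illustrates the relationship using the factor 2.863 quoted in the text; the coefficient is back-calculated from that factor rather than read from Table 3, and the function name is hypothetical. It also shows, as an illustration, the factor implied by a doubling of \({N}_{TC},\) which raises \({\log}_{e}\left({N}_{TC}\right)\) by \({\log}_{e}\left(2\right)\) rather than by one unit.

```python
import math


def rate_ratio(beta: float, delta: float = 1.0) -> float:
    """Multiplicative change in the expected count when a predictor rises by delta."""
    return math.exp(beta * delta)


# Coefficient implied by the factor 2.863 quoted in the text (back-calculated).
beta_log_cases = math.log(2.863)

# A 1-unit increase in log_e(N_TC) multiplies N_TC itself by e.
per_unit_log = rate_ratio(beta_log_cases)

# Doubling N_TC raises log_e(N_TC) by log_e(2).
per_doubling = rate_ratio(beta_log_cases, delta=math.log(2))
print(per_unit_log, per_doubling)
```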

The dispersion parameter (\(\theta =0.6598\)) in Table 3 is finite, which implies that the mean is less than the variance. The closer \(\theta\) is to 0, the further the variance is from the mean [1]. This reading corresponds to the parameterization \(var\left(Y\right)=\mu +{\mu }^{2}/\theta ,\) in which the reported \(\theta\) is the reciprocal of the over-dispersion parameter in the variance expression given with the model. Consequently, the negative binomial model is expected to fit the data well. In addition, the Vuong test is used to check whether a null model fits the total number of new deaths as well as the fitted NB model does. The test rejects the null hypothesis at the 0.05 level of significance (p value < 2.22e–16) and suggests that the NB model fits the data better than the null model. Figure 5 presents the comparison between observed data and fitted data based on the negative binomial model for some countries, territories and areas in WHO regions. It can be observed from the figure that the model fits the data well.

Fig. 5 Comparison between observed data and fitted data based on negative binomial model for some countries, territories and areas in WHO regions
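A model comparison of the Vuong type is built from the pointwise log-likelihood differences between two competing models. The following is a minimal sketch of the uncorrected statistic, assuming that the log-likelihood contribution of each observation under each model is available. The function name and the numerical values are hypothetical, and any correction for the number of parameters that may have been applied in this study is not reproduced.

```python
import math
from statistics import NormalDist, mean, pstdev


def vuong_statistic(loglik_a, loglik_b):
    """Uncorrected Vuong z statistic and two-sided p value.

    loglik_a and loglik_b hold the log-likelihood of each observation under
    models A and B; a large positive z favours model A.
    """
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both models must be evaluated on the same observations")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    z = math.sqrt(n) * mean(diffs) / pstdev(diffs)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical pointwise log-likelihoods for eight observations.
ll_nb = [-2.1, -3.0, -1.8, -4.2, -2.7, -3.3, -2.0, -5.1]
ll_null = [-2.6, -3.3, -2.6, -4.6, -3.3, -4.0, -2.2, -6.0]
z_stat, p_value = vuong_statistic(ll_nb, ll_null)
print(z_stat, p_value)
```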

Conclusion

This study provides insight into whether the mode of transmission in a WHO region affects the number of COVID-19 new deaths over a time period. In particular, a negative binomial regression model was formulated for predicting total new deaths in all WHO regions. The findings showed that the total number of COVID-19 new deaths is significantly affected by the total number of confirmed cases and the modes of transmission of the disease in the WHO regions. The significance of community transmission and clusters of cases, among other modes of transmission, for mortality was highlighted. This study is limited by the dearth of information on population in all the WHO regions.