The COVID-19 pandemic has been accompanied worldwide by an explosion of information about the virus. In its present form, COVID-19 seems to have two very challenging characteristics [1]: it is highly infectious and, despite having a benign course in the vast majority of patients, it requires hospital admission and even intensive care for a far from negligible proportion of those infected. This has generated many publications at a rapid pace, investigating different aspects of the epidemic [see, e.g., 2,3,4,5,6,7] and, sadly, also an increase in misinformation, which is of course not all malevolent, although its impact can be devastating [8, 9].

Within the vast literature on the possible determinants of a reduction in infection, fatality, and mortality rates, our attention has been attracted by the work of Ilie et al. on the role of vitamin D in the prevention of COVID-19 infection and mortality [10]. Vitamin D is known to play an important role in bone metabolism through regulation of calcium and phosphate homeostasis, and may also play an important role in immune system regulation. Vitamin D is produced by the body during exposure to sunlight, but is also found in oily fish, eggs, and fortified food products. In addition to causing rickets, vitamin D deficiency has been linked to respiratory infections [10, 11]. Some studies have suggested that vitamin D supplementation can decrease the frequency and severity of respiratory infections; however, further research is needed before specific recommendations can be made [12,13,14]. Given these general premises, the work of Ilie et al. may play a fundamental role in limiting the spread of COVID-19 and reducing the number of deaths [15]. The authors clearly state that the crude association observed in their study may be explained by the role of vitamin D in the prevention of COVID-19 infection or, more probably, by a potential protection offered by vitamin D against the more negative consequences of the infection. This is a rather neat and important statement, which may contribute strongly to handling the pandemic outcomes properly.

Unfortunately, however, we have some doubts about the statistical methods on which this statement is based. The statement rests on a correlation test, yet it implicitly suggests a cause–effect relationship. Correlation does not imply causation, which can instead be investigated, e.g., in a regression setting. We reproduced the analysis published in the work of Ilie et al. [15] and noticed the following:

  • The correlation between mean vitamin D levels and the number of deaths caused by COVID-19/1 M population in each country is correctly reported (\(\rho = -0.4378\)). Nevertheless, the p value reported in the paper is not correct. The correct p value is 0.05353, above the nominal 5% significance level.

  • The correlation between mean vitamin D levels and the number of cases of COVID-19/1 M population in each country is correctly reported (\(\rho = -0.4435\)). Nevertheless, the correct p value is slightly above the one reported in the paper. The correct p value is 0.05014, slightly above the nominal 5% significance level.

In neither case is the significance threshold of p value \(<0.05\) reached.
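The two p values above can be checked from the correlation coefficients alone. The following is a minimal sketch, not the code used for the reanalysis: it assumes a two-sided t test of \(\rho = 0\) with \(n - 2\) degrees of freedom and a sample of n = 20 countries. The sample size is an assumption, since it is not stated in this text, although it is consistent with the p values quoted above. The function names `t_pdf` and `two_sided_p` are illustrative only.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(r, n, steps=10_000):
    """Two-sided p value for H0: rho = 0, given Pearson's r and sample size n.

    Uses the statistic t = r * sqrt((n - 2) / (1 - r^2)), which follows a
    t distribution with n - 2 degrees of freedom under H0 and normality.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the density integral over [0, |t|].
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # The density is symmetric, so both tails together have mass 1 - 2 * area.
    return 1 - 2 * area


# n = 20 is an assumption (not stated in the text).
p_deaths = two_sided_p(-0.4378, 20)
p_cases = two_sided_p(-0.4435, 20)
```

Under these assumptions, `p_deaths` and `p_cases` come out close to the values 0.05353 and 0.05014 quoted above, and both exceed 0.05. Small differences in the last digits are expected because the coefficients are rounded to four decimals.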

As a general matter, correlations must be interpreted with great care [16, 17]. Pearson himself [18, 19] recommended being very wary of correlating ratios and, if forced to do so, adopting as the point of no connection not 0, but some value such as 0.4. Furthermore, to properly apply hypothesis testing to the correlation coefficient, some assumptions must be fulfilled. To mention just a few violations that can potentially affect the results:

  • the data do not follow a Gaussian distribution;

  • there is heteroscedasticity, i.e., the variance is smaller for a particular range of values and larger for another range of values;

  • outliers are present, which can significantly skew the correlation coefficient and make it inaccurate.
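Pearson's warning about correlating ratios can be illustrated with a small simulation. This is a sketch on purely synthetic numbers, unrelated to the vitamin D data: three independent Gaussian variables with the same coefficient of variation are generated, and the correlation between two of them is compared with the correlation between their ratios to the third. The distribution, its parameters, and the function names are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_ratio_correlation(n=10_000, seed=0):
    """Return (corr(x, y), corr(x/z, y/z)) for independent x, y, z."""
    rng = random.Random(seed)
    # Synthetic data: independent draws, mean 10 and standard deviation 1.
    x = [rng.gauss(10, 1) for _ in range(n)]
    y = [rng.gauss(10, 1) for _ in range(n)]
    z = [rng.gauss(10, 1) for _ in range(n)]
    raw = pearson_r(x, y)
    # Dividing both variables by the same z induces a shared component.
    ratios = pearson_r([a / c for a, c in zip(x, z)],
                       [b / c for b, c in zip(y, z)])
    return raw, ratios


raw_corr, ratio_corr = simulate_ratio_correlation()
```

In this setting, `raw_corr` is close to 0 while `ratio_corr` is roughly 0.5, although the underlying variables are independent. This shows why 0 may be a misleading reference point when both quantities are ratios with a common denominator.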

The data analysis provided by Ilie et al. [15] has a few more statistical weaknesses, some of them correctly acknowledged by the authors. We mention just a few, with the aim of providing a basis for revising the data analysis on more solid statistical ground. First, as also discussed by the authors, the data are heterogeneous, i.e., the sample is not drawn from a single population but rather from a mixture. This could be due to country-specific data collection processes, e.g., for COVID-19 deaths: in Italy, recorded deaths are with COVID-19, while in Germany, recorded deaths are by COVID-19. Moreover, as the epidemic started at different dates in different countries, the time at which the data are collected may also play a role, increasing unobserved heterogeneity. If heterogeneity is not taken into account, the results obtained may be unreliable, whatever statistical approach is considered. Second, if a linear relationship/model is considered, fitted values may well be negative for the data at hand (for Slovakia, the fitted value is -16.04316 deaths/1 M population, which is meaningless), and this is not possible because the data cannot be negative. This issue can be easily solved by considering a generalized linear model for counts, with an offset. Lastly, as a further point for discussion, we would remark that there is a large repertoire of methods based on different paradigms of inference that provide ample options for supplementing and enhancing simple hypothesis testing [20]. Thus, even if Ilie et al. lay the basis for a novel and potentially very useful strand of research [15], there is still a lot to do on statistical grounds before one can be confident of any significant relationship between vitamin D and the COVID-19 epidemic.
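One possible form of the count model with an offset mentioned above is sketched below. The notation is ours and the Poisson distribution is an assumption (a negative binomial model would be a natural alternative in the presence of overdispersion): \(Y_i\) denotes the number of COVID-19 deaths in country \(i\), \(N_i\) its population, and \(x_i\) its mean vitamin D level.

\[
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i = \log N_i + \beta_0 + \beta_1 x_i .
\]

Here \(\log N_i\) is the offset, i.e., a term entering the linear predictor with its coefficient fixed at 1. The fitted death rate is \(\mu_i / N_i = \exp(\beta_0 + \beta_1 x_i)\), which is positive for any values of \(\beta_0\), \(\beta_1\), and \(x_i\), so that negative fitted values cannot occur.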

The main conclusion of Ilie et al. [15], namely that there are significant crude relationships between vitamin D levels and the number of COVID-19 cases and especially the mortality caused by this infection, is currently not supported by the data analysis. Of course, the relationship between vitamin D and COVID-19 deserves dedicated studies, as correctly discussed by Ilie et al. [15], as it may reveal interesting insights into the COVID-19 outbreak. This is without doubt a very interesting field of research, and further studies should be planned. Currently, however, there is no evidence of any effect of vitamin D in reducing the impact of COVID-19, either on the number of cases or on deaths.