Modeling the Daily Number of Reported Cases of Infection from the COVID-19 Pandemic in Nigeria: A Stochastic Approach

This paper presents a stochastic model that captures the random behavior of the number of reported daily infections due to the coronavirus disease (COVID-19) in Nigeria. The model, expressed in the form of a distribution function, has five parameters. It was fitted to the logarithm of the reported daily number of infection cases for the period March 18th to June 11th, 2020. The results established the adequacy of the model in fitting and explaining the random behavior of the number of reported daily infections. The model was also used to study the event of the number of infections exceeding certain thresholds. The procedure for determining these thresholds was established, and a number of them were estimated for some given return periods.


Introduction
Infectious disease outbreaks, such as those resulting from an epidemic or a pandemic, have a very long history, with the most elaborate documentation starting in the Middle Ages. It began with the Black Death plague in the 14th century, when some 25 million deaths were recorded in Europe alone in a population of about 100 million. Towns and villages were completely ravaged by the plague. A similar report was obtained from China, where up to two-thirds of many local populations were said to have died in widely scattered settlements [1]. Closer to our time is the Spanish Flu pandemic, which surfaced in the twilight of World War I in 1919, with global mortality reaching some 20 million in twelve months. Since this last pandemic, several infectious disease outbreaks have plagued our world, coming with their attendant health, economic and social challenges. To name a few, the world has had to endure infectious diseases like Rabies, Lassa Fever, Smallpox, Infectious Cancers, Genital Herpes, HIV/AIDS and Ebola Virus, and sandwiched between all of these are parasite-induced diseases like Malaria, all taking their toll on the well-being of humans.
The current COVID-19 pandemic has brought the entire world to a standstill. With no known vaccine for the disease yet, national governments and the World Health Organization (WHO) have put in place various measures to curb the spread of the viral infection. An understanding of the stochastic nature of the infection trajectory of the disease will help a great deal in strengthening the existing control framework, while also offering useful clues about the long-term trend or behavior of the disease. Several mathematical models have been put forward recently to understand the dynamics of the virus (see [2][3][4][5][6]). The focus of this paper is to develop a stochastic model in the form of a distribution function to capture the random behavior of the daily number of infections from COVID-19 in Nigeria.
The rest of the paper is organized as follows. In Section 2 preliminary arguments for the development of a stochastic model are presented. The model and its estimation procedure are presented in Section 3. In Section 4 the application of the model to the reported daily number of COVID-19 infection cases in Nigeria is carried out. The paper closes in Section 5 with a conclusion.

Preliminary Arguments for a Stochastic Model
The central focus of a stochastic model is usually a random variable $X$ representing a quantity whose outcome is uncertain. This quantity can be the number of infected persons, the number of deaths or the number of recoveries from a disease per unit time in a population. Suppose $F$ is the true model which generates all the outcomes of $X$ in the population. In this regard, $F$ is taken to be a distribution function for calculating the probability that $X$ will be less than or equal to some $x$. The model $F$ is usually unknown. Suppose there exist sample data $X_1, \ldots, X_n$ of size $n$ which are independent realizations of the random variable $X$. The most plausible thing to do would be to use these independent realizations to find an approximation for $F$, say $\hat{F}_n$. Observe that the approximation $\hat{F}_n$ will be a model-free estimate of $F$ since it is based entirely on the realized sample. In fact, if $X_{(1)}, \ldots, X_{(n)}$ denote the ordered sample of independent observations from $F$ such that $X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$, then for any one of the $X_{(i)}$, exactly $i$ of the $n$ observations will have a value less than or equal to $X_{(i)}$. So, a sample-based estimate of the probability of an observation being less than or equal to $X_{(i)}$ is $\hat{F}_n(X_{(i)}) = i/n$. To avoid $\hat{F}_n(X_{(n)}) = 1$, a slight adjustment is made, and the adjusted estimate in (1) is the data-based approximation of $F$. It follows that the sample-based quantile $X_{(i)}$ can be realized from (1) using the relation in (2), where $\hat{F}_n^{-1}$ is the inverse of $\hat{F}_n$.
Given that $\hat{F}_n$, the sample-based estimate of $F$, has no parameters, essential statistical properties of $F$ such as shape, moments and asymptotic behavior will be difficult to ascertain from $\hat{F}_n$. The usual practice is to develop a parameterized model $\hat{F}$ and its inverse $\hat{F}^{-1}$ that best approximate $F$ and $F^{-1}$ respectively and that, for any given application to a sample, closely align with $\hat{F}_n$ and $\hat{F}_n^{-1}$ respectively. Several methods for generating families of parameterized distribution functions have appeared in the literature within the last three decades (see [7][8][9][10][11][12]). In the following section, a parametric distribution function is developed to model the number of reported daily infections from COVID-19 in Nigeria.
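The sample-based estimate described above can be sketched in a few lines of Python. This is only an illustration: the adjustment used here, $i/(n+1)$, is an assumed and commonly used choice and may differ from the one in (1), and the daily counts are hypothetical values, not the data in Table 1.

```python
import math


def empirical_cdf(sample, adjust=True):
    """Return the ordered sample and its empirical cumulative probabilities.

    With adjust=False the i-th order statistic receives i/n.
    With adjust=True it receives i/(n + 1), an ASSUMED adjustment that
    keeps the largest observation from being assigned probability 1.
    """
    xs = sorted(sample)
    n = len(xs)
    denom = n + 1 if adjust else n
    probs = [i / denom for i in range(1, n + 1)]
    return xs, probs


def empirical_quantile(sample, p, adjust=True):
    """Smallest order statistic whose cumulative probability is at least p."""
    xs, probs = empirical_cdf(sample, adjust)
    for x, q in zip(xs, probs):
        if q >= p:
            return x
    # p lies above every assigned probability: fall back to the maximum.
    return xs[-1]


# Hypothetical daily counts, for illustration only.
counts = [12, 5, 30, 8, 51, 20, 17]
log_counts = [math.log(c) for c in counts]

ordered_logs, cum_probs = empirical_cdf(log_counts)
median_log = empirical_quantile(log_counts, 0.5)
```

Plotting `ordered_logs` against `cum_probs` gives an empirical cumulative probability plot, and swapping the axes gives the corresponding empirical quantile plot.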

Model and Estimation
Here the new stochastic model for the reported daily infections from COVID-19 in Nigeria is presented and defined in the form of a distribution function. The estimation of the model using the maximum likelihood method is also presented.

The model
Suppose $X$ is a random variable representing the number of infections from COVID-19 on a given day. Let $X_1, \ldots, X_n$ be independent observations of the number of infections from day 1 to day $n$. In many cases the natural logarithm of these observations is taken and modeled, for a number of reasons. Random variables of this nature tend to be highly skewed, with the possibility of outliers. If the random variable is skewed to the right such that its distribution is approximately lognormal, the distribution of its natural logarithm will be normal, and the readily available analysis and inference for the normal distribution can be applied. However, there is hardly a data set in reality, whether original or transformed, with a perfectly normal distribution. What obtains in reality are distributions which exhibit some form of asymmetry and, in many cases, different tail and shape behaviors, ranging from heavy-tailed, light-tailed, left-skewed and right-skewed to bimodal shapes. It is difficult to know in advance which of these characteristics a given data set arising from a random process will assume, so it is helpful to develop a model which possesses all of them. Models of this nature are classified as flexible because they can accommodate the behavior of almost all data types. In addition, modeling the logarithm of the data set rather than the data set itself is useful because the logarithmic transformation can reduce the variability in the data.
Thus, our goal is to develop an appropriate distribution function $\hat{F}$ with a corresponding density function $\hat{f}$, where $\hat{f}$ is the first derivative of $\hat{F}$, that can be used to efficiently model the natural logarithm of the number of reported daily infections from COVID-19 in Nigeria and that, through a simple anti-logarithm transformation, leads to results for the original sample. We believe that convoluting two or more distribution functions will be very useful in arriving at such a flexible model in the form of a distribution function.
Suppose $T$ is a random variable following an extreme value distribution, and let $R$ be a Weibull random variable, each with its own distribution and density functions. Let $Y$ be a standard logistic random variable with quantile function $\log\{p/(1-p)\}$ for $0 < p < 1$. The distribution function in (3) is a valid convoluted distribution function. If $\beta$ is a positive real number, the distribution function in (4) is more flexible than the one in (3), and for $\beta = 1$ it reduces to the distribution function in (3). Using (4), we define a new model for the logarithm of the reported daily number of infections from COVID-19 in Nigeria, expressed in terms of the distribution function $\hat{F}$ in (5) and the density function $\hat{f}$ in (6).
Observe that the model in (5) has 5 parameters, with the parameters $\alpha$, $\beta$, $c$ and $k$ playing the role of shape parameters. These parameters allow the model to be flexible enough to accommodate various shape and tail behaviors arising from the data it is applied to. The parameter $\lambda$ is a scale parameter. In Figures 1, 2 and 3, various shapes of the density in (6) are shown for different combinations of the parameters. The figures clearly show that the density can be skewed to the right, skewed to the left and also bimodal, which points to its flexibility.
Consequently, the daily infection quantile $N(p)$ at some probability $p$ is obtained from the relation in (7). A surface plot of the daily infection quantiles $N(p)$ for some range of values of $\alpha$ and $\beta$ with $p = 0.4$, $k = 2$, $c = 0.5$ and $\lambda = 1$ is shown in Figure 4. The same plot for some range of values of $k$ and $c$ is shown in Figure 5 for $p = 0.3$, $\alpha = 4$, $\beta = 5$ and $\lambda = 3$.

Model for infection threshold
Suppose it is of interest to determine the possibility of a given day's number of infections equaling or exceeding a certain threshold $X(t)$. The quantity $X(t)$ is interpreted as the number of infections which is exceeded at least once in $t > 1$ days with probability $p_t$. In fact, $X(t)$ is expected to be exceeded on average once in $1/p_t$ days. Determining $X(t)$ is important for planning by the authorities, given that the bed spaces for admitting infected individuals are limited and have a finite carrying capacity. It can also come in handy when the interest is to know when the available isolation facilities will no longer have space for admitting newly infected individuals. Suppose that the event $X \ge X(t)$ has just occurred. That is, a given day's number of infections has just reached or exceeded the threshold $X(t)$. The number of days it will take for this to happen again is the "recurrence interval", and the expected recurrence interval is the return period $t$ of the event $X \ge X(t)$.
Thus $t$ is the average number of days that it will take for the event $X \ge X(t)$ to happen again. It follows that the probability $p_t$ of the occurrence of the event $X \ge X(t)$ is related to $t$ through the relation $p_t = 1/t$, which is relation (8). Consequently, using the proposed model, $X(t)$ for a given $t$ can be obtained by solving for $X(t)$ in equation (9). Solving (9) for $X(t)$ gives (10), which is used to estimate the infection threshold for any return period $t$.
Observe that large values of t will result in large values of X(t) and vice versa.
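The return-period calculation can be illustrated with a short Python sketch. It assumes that the threshold on the log scale is the $(1 - 1/t)$ quantile of the fitted distribution and that $X(t)$ is recovered by taking the anti-logarithm. Because the five-parameter model is not reproduced here, a normal distribution with the log-scale mean and standard deviation reported in the application section (4.3655 and 1.5337) is used purely as a stand-in, so the numbers it produces are not those of Table 3.

```python
import math
from statistics import NormalDist


def exceedance_probability(t):
    """Probability p_t = 1/t that a day's count reaches the t-day threshold."""
    if t <= 1:
        raise ValueError("return period t must be greater than 1 day")
    return 1.0 / t


def infection_threshold(t, log_quantile):
    """Threshold X(t) on the original scale for return period t.

    log_quantile is the quantile function of the distribution fitted to
    the LOG of the daily counts; the threshold is its (1 - 1/t) quantile,
    transformed back with the exponential function.
    """
    p_t = exceedance_probability(t)
    return math.exp(log_quantile(1.0 - p_t))


# Stand-in for the fitted model: NOT the paper's five-parameter distribution.
stand_in = NormalDist(mu=4.3655, sigma=1.5337)

thresholds = {t: infection_threshold(t, stand_in.inv_cdf) for t in (2, 7, 30, 100)}
```

Any fitted quantile function can be passed as `log_quantile`, so the same helper would apply to the proposed model once its quantile relation is coded.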

Model parameters estimation
For a random independent sample $x_1, \ldots, x_n$ of size $n$ representing the logarithm of the daily number of infections, the parameters $\alpha$, $\beta$, $c$, $k$ and $\lambda$ of the model in (5) are estimated by the maximum likelihood method, that is, by maximizing the log-likelihood function $L$ with respect to these parameters. Let $\Theta = (\alpha, \beta, c, k, \lambda)^T$ be the unknown parameter vector. The associated score function is the vector of first partial derivatives of $L$ with respect to the elements of $\Theta$. For interval estimation of the parameters, the Fisher Information Matrix (FIM) is required; it is the $5 \times 5$ symmetric matrix with elements $I_{ij}(\Theta) = -E\left[\partial^2 L / \partial\Theta_i \partial\Theta_j\right]$, $i, j = 1, \ldots, 5$. The total FIM, $I(\Theta)$, can be approximated by the observed information matrix $J(\hat{\Theta})$. For real data, $J(\hat{\Theta})$ is obtained once the maximum likelihood estimate of $\Theta$ has been found at the point of convergence of the iterative numerical scheme used in the estimation process.
Suppose $\hat{\Theta}$ is the maximum likelihood estimate of $\Theta$. Under the usual regularity conditions, and provided that the parameters lie in the interior of the parameter space and not on its boundary, $\hat{\Theta}$ is asymptotically normally distributed about $\Theta$, where $I^{-1}(\Theta)$, the inverse of the expected FIM, is also the variance-covariance matrix of the parameter estimates. The asymptotic behavior remains valid if $I^{-1}(\Theta)$ is replaced by the inverse of the observed information matrix evaluated at $\hat{\Theta}$, that is, the inverse of $J(\hat{\Theta})$. The multivariate normal distribution with mean vector $\mathbf{0} = (0, 0, 0, 0, 0)^T$ and variance-covariance matrix $I^{-1}(\Theta)$ can be used to construct approximate $100(1 - \theta)\%$ two-sided confidence intervals for the parameters $\alpha$, $\beta$, $c$, $k$ and $\lambda$, where $\theta$ here denotes the significance level.
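The steps from log-likelihood to confidence intervals can be sketched as follows. The sketch is illustrative only: it uses a two-parameter normal log-likelihood as a stand-in for the five-parameter model, approximates the observed information matrix by a central finite-difference Hessian of the negative log-likelihood, and builds Wald-type intervals from its inverse. The data values and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def neg_log_lik(theta, data):
    """Negative log-likelihood of a normal model (stand-in for the real one)."""
    mu, sigma = theta
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return n * math.log(sigma) + 0.5 * n * math.log(2 * math.pi) + ss / (2 * sigma ** 2)


def observed_information(f, theta, data, h=1e-4):
    """Central finite-difference Hessian of f at theta."""
    d = len(theta)
    hess = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            def shifted(si, sj):
                t = list(theta)
                t[i] += si * h
                t[j] += sj * h
                return f(t, data)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return hess


def invert_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def wald_intervals(estimates, cov, level=0.95):
    """Estimate plus or minus a normal quantile times the standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return [(e - z * math.sqrt(cov[i][i]), e + z * math.sqrt(cov[i][i]))
            for i, e in enumerate(estimates)]


# Hypothetical log-counts, for illustration only.
data = [2.1, 3.4, 4.0, 4.4, 5.2, 5.9, 6.1]

# For the normal stand-in the maximum likelihood estimates are closed-form.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))
theta_hat = [mu_hat, sigma_hat]

J = observed_information(neg_log_lik, theta_hat, data)
cov = invert_2x2(J)
intervals = wald_intervals(theta_hat, cov)
```

For the five-parameter model the estimates would come from a numerical optimizer rather than closed-form expressions, and the inverse would be that of a 5 × 5 matrix, but the sequence of steps is the same.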

Model Application
Application of the model to the reported daily number of infections from COVID-19 in Nigeria is carried out in this section.

Analysis and results
The data set, which is contained in Table 1, covers the period March 18th to June 11th, 2020. The mean, standard deviation, skewness and excess kurtosis of the original data set are 170.4405, 159.0805, 0.9139 and 0.5417 respectively. The positive skewness implies that the data set is skewed to the right, and the positive excess kurtosis implies that the distribution is peaked and possesses thick tails. A logarithmic transformation of the data set yields a mean of 4.3655, a standard deviation of 1.5337, a skewness of -0.5780 and an excess kurtosis of -1.0859. Observe that the log-transformed data set possesses less variability, as is evident from its standard deviation in comparison with that of the original data set. The log-transformed data set is skewed to the left, and its negative excess kurtosis implies that its distribution is flat with thin tails. Scatter plots and boxplots of the original and the log-transformed data are shown in Figure 6(a-d). The histograms and kernel densities of both the original and the log-transformed data sets are shown in Figure 7(a-d). The empirical cumulative probability and empirical quantile plots using (1) and (2) for both the original and the log-transformed data are shown in Figure 8(a-d). The fit of the model in (5) to the log-transformed data is shown in Figure 9(a-d). The inverse of the observed information matrix, which also corresponds to the variance-covariance matrix of the parameter estimates, is given in (12). In Table 2, 95% confidence intervals for the parameter estimates are presented. Some daily infection thresholds for some given return periods are computed and presented in Table 3.
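The summary statistics quoted above can be computed with a short Python sketch. The moment-based definitions of skewness and excess kurtosis used here are an assumption, since the exact estimators behind the reported values are not stated, and the sample is hypothetical rather than the data in Table 1.

```python
import math


def describe(sample):
    """Mean, sample standard deviation, skewness and excess kurtosis.

    Skewness and excess kurtosis use central moments divided by n
    (an ASSUMED convention); the standard deviation divides by n - 1.
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical daily counts, for illustration only.
counts = [12, 5, 30, 8, 51, 20, 17, 44, 90, 61]

original_summary = describe(counts)
log_summary = describe([math.log(c) for c in counts])
```

Comparing `original_summary` with `log_summary` shows the effect of the logarithmic transformation on spread and asymmetry for whatever sample is supplied.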

Discussion
Results from the analysis support the adequacy of the model in fitting the logarithm of the daily number of reported infections. This is evident from the p-value of the K-S statistic, 0.5566, which is well above the nominal 0.05 level. Thus, at the 5% significance level, there is no evidence that the logarithm of the number of infections follows a process different from the proposed model. Put differently, we can use (5) and (6) to determine the cumulative probabilities and densities, respectively, associated with the logarithm of a certain number of infections on a given day. This will be very helpful for planning by the Task Force Team saddled with the responsibility of managing the pandemic, and indeed for all health practitioners. To further support the adequacy of the model in fitting the data set, it can be observed in panel (a) of Figure 9 that the empirical cumulative probabilities based on the data set closely align with the theoretical ones based on the model. In panel (b) of Figure 9, the empirical quantiles based on the data lie almost on the 45° line against the theoretical ones generated by the model. The same holds true in panel (d) of Figure 9, where the empirical probabilities based on the data set closely align with the theoretical ones based on the model. In panel (c) of Figure 9, the fitted density based on the model adequately approximates the histogram of the data set while also closely aligning with the kernel density of the data in panel (d) of Figure 7. In Table 2, it can be observed that the maximum likelihood method provided very good estimates of the model's parameters. This is shown by the narrow width of the confidence interval for each of the parameter estimates. In Table 3, some daily infection thresholds X(t) are computed and shown for some given return periods t. For example, for t = 7, the infection threshold is 346.
This means that, on average, 346 or more reported cases will be recorded once in every 7 days. We also observed that as the return period t increases, the daily infection threshold also increases, but it increases more sharply at lower return periods than at higher ones. For t in the neighborhood of 100 days, the daily infection threshold does not exceed 800. This agrees with the reality of the pandemic in Nigeria since the first case was recorded, with the highest reported daily number of infections not exceeding 800.
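The kind of K-S comparison referred to in this discussion can be sketched in Python as follows. The sketch computes the K-S distance between the empirical step function and a candidate distribution function, together with the asymptotic Kolmogorov p-value. A normal distribution is used as a stand-in for the fitted model and the sample is hypothetical. Note also that when the parameters are estimated from the same data, this asymptotic p-value is only approximate and tends to be conservative.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical step function and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    x = math.sqrt(n) * d
    if x < 0.2:
        # The limiting probability is indistinguishable from 1 in this range.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical log-counts, for illustration only.
logs = [2.1, 2.9, 3.4, 3.8, 4.0, 4.4, 4.9, 5.2, 5.6, 5.9, 6.1, 6.3]

# Stand-in for the fitted model: NOT the paper's five-parameter distribution.
stand_in = NormalDist(mu=fmean(logs), sigma=pstdev(logs))

d_stat = ks_statistic(logs, stand_in.cdf)
p_value = kolmogorov_pvalue(d_stat, len(logs))
```

A p-value above the chosen significance level means the test does not reject the candidate distribution; it does not by itself prove that the distribution is the true one.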

Conclusion
Stochastic quantification of several real-life problems has been very helpful in understanding the random nature of their incidence or occurrence. This has further helped in finding solutions to the problems arising from them, either by minimizing their undesirable effects or by maximizing their rewards. A pandemic is usually a situation of untold inconvenience and, as such, all hands must be on deck to help bring it to a halt. Finding appropriate models that can help explain one or more of its effects will help in the provision of solutions. In this paper, this has been carried out using Nigeria as a case study. This article is by no means an exhaustive exposition, but it is an analytical and numerical contribution to a better understanding of the probabilistic trajectory of the pandemic in Nigeria.