A New Hybrid Family of Probability Distributions for Fitting Heavy-tailed Data with Application to Finance
Abstract
The classical continuous univariate probability distributions, which have one or two parameters, often fit poorly when a data set has a complex structure, for example when outliers occur alongside observations concentrated around the mean. When a data set is heterogeneous or consists of several components, so that no single probability distribution can capture its distinct parts, a composite distribution becomes a plausible alternative. This has led to the formulation of various hybrid or composite models in which each component handles the part of the data set for which it is best suited. The method used to construct a hybrid model also plays a vital role in determining how meaningful the resulting fit is. Several construction methods have appeared in the literature, each with its own strengths and weaknesses. In this paper we present a general two-component hybrid model for fitting heterogeneous data sets with a heavy right tail. The functional form of the two-component hybrid family is specified by its probability density function (pdf), cumulative distribution function (cdf) and quantile function. Three members of the family, each using a different distribution for the right tail, are presented. A formal method based on maximum likelihood for estimating the parameters of the models in the family is also presented. A Monte Carlo simulation study is carried out to assess the efficiency of the estimation method, and the models are applied to a real data set in finance.
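The abstract does not state the functional form of the family, so the following is only a minimal sketch of one common way to splice a two-component model, not the construction used in the paper. It assumes a normal body truncated at a threshold `u`, a generalized Pareto right tail starting at `u`, and a mixing weight `w` giving the probability mass of the body. The class name and all parameter names (`mu`, `sigma`, `u`, `xi`, `beta`, `w`) are hypothetical choices made for illustration.

```python
from statistics import NormalDist


class HybridNormalGPD:
    """Illustrative two-component spliced model (an assumption, not the paper's).

    Body: normal(mu, sigma) truncated to (-inf, u], carrying mass w.
    Tail: generalized Pareto (shape xi > 0, scale beta) on (u, inf), mass 1 - w.
    """

    def __init__(self, mu, sigma, u, xi, beta, w):
        if not (0.0 < w < 1.0 and xi > 0.0 and beta > 0.0):
            raise ValueError("need 0 < w < 1, xi > 0 and beta > 0")
        self.body = NormalDist(mu, sigma)
        self.u, self.xi, self.beta, self.w = u, xi, beta, w
        # Mass of the untruncated normal below u; rescales the truncated body.
        self.body_mass = self.body.cdf(u)

    def pdf(self, x):
        if x <= self.u:
            return self.w * self.body.pdf(x) / self.body_mass
        z = (x - self.u) / self.beta
        return (1.0 - self.w) / self.beta * (1.0 + self.xi * z) ** (-1.0 / self.xi - 1.0)

    def cdf(self, x):
        if x <= self.u:
            return self.w * self.body.cdf(x) / self.body_mass
        z = (x - self.u) / self.beta
        tail_cdf = 1.0 - (1.0 + self.xi * z) ** (-1.0 / self.xi)
        return self.w + (1.0 - self.w) * tail_cdf

    def quantile(self, p):
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie strictly between 0 and 1")
        if p <= self.w:
            # Invert the truncated-normal branch of the cdf.
            return self.body.inv_cdf(p * self.body_mass / self.w)
        # Invert the generalized Pareto branch in closed form.
        q = (p - self.w) / (1.0 - self.w)
        return self.u + self.beta / self.xi * ((1.0 - q) ** (-self.xi) - 1.0)


# Example usage with arbitrary parameter values.
model = HybridNormalGPD(mu=0.0, sigma=1.0, u=1.5, xi=0.3, beta=1.0, w=0.9)
median = model.quantile(0.5)
upper_tail_point = model.quantile(0.99)
```

In this sketch the density may jump at `u`, because no continuity or smoothness condition is imposed at the threshold. Composite models often add such conditions, which reduces the number of free parameters; whether and how the paper's family does so is not stated in the abstract.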
This work is licensed under a Creative Commons Attribution 4.0 International License.