Article details |
Article title | Clusterwise linear regression modeling with soft scale constraints
Publication year | 2017
Length of English article | 19 pages
Cost | The English article is free to download.
Database | Elsevier
Type of article | Research article
Base article | This is not a base article.
Index | Scopus – Master Journals – JCR
Article category | ISI
English article format |
Impact factor (IF) | 1.766 (2017)
H-index | 81 (2017)
SJR | 0.866 (2017)
Related fields | Statistics
Related subfields | Mathematical statistics
Presentation type | Journal
Journal / Conference | International Journal of Approximate Reasoning
University | Department of Economics and Business – University of Catania – Italy
Keywords | Clusterwise linear regression, Adaptive constraints, Regression equivariance, Plausible bounds, Soft estimators, Constrained EM algorithm
DOI | http://dx.doi.org/10.1016/j.ijar.2017.09.006
Product code | E9546
Translation status | A prepared translation of this article is not available; it can be ordered via the button below.
Free article download | Download the English article for free
Order a translation | Order a translation of this article
Table of contents:
Abstract
1 Introduction
2 Constrained approaches for ML estimation
3 The proposed methodology
4 Numerical studies
5 Four real data applications
6 Conclusions
References
Excerpt from the article:
Abstract

Constrained approaches to maximum likelihood estimation in the context of finite mixtures of normals have been presented in the literature. A fully data-dependent soft constrained method for maximum likelihood estimation of clusterwise linear regression is proposed, which extends previous work on equivariant, data-driven estimation of finite mixtures of normals. The method imposes soft scale bounds based on the homoscedastic variance and a cross-validated tuning parameter c. In our simulation studies and real data examples we show that the selected c produces an output model, with clusterwise linear regressions and clustering, that is a most-suited-to-the-data solution in between the homoscedastic and the heteroscedastic models.

Introduction

Let $\{(y_i, \mathbf{x}_i)\}_{i=1}^{n} = \{(y_1, \mathbf{x}_1), \ldots, (y_n, \mathbf{x}_n)\}$ be a sample of $n$ independent units, where $y_i$ is the outcome variable and $\mathbf{x}_i$ is the vector of $J$ covariates. A clusterwise linear regression model assumes that the density of $y_i \mid \mathbf{x}_i$ is given by

$$ f(y_i \mid \mathbf{x}_i; \boldsymbol\psi) = \sum_{g=1}^{G} p_g\, f_g(y_i \mid \mathbf{x}_i; \sigma_g^2, \boldsymbol\beta_g) = \sum_{g=1}^{G} p_g\, \frac{1}{\sqrt{2\pi\sigma_g^2}} \exp\left\{ -\frac{(y_i - \mathbf{x}_i'\boldsymbol\beta_g)^2}{2\sigma_g^2} \right\}, \qquad (1) $$

where $G$ is the number of clusters,

$$ \boldsymbol\psi = \left\{ (p_1, \ldots, p_G; \boldsymbol\beta_1, \ldots, \boldsymbol\beta_G; \sigma_1^2, \ldots, \sigma_G^2) \in \mathbb{R}^{G(J+2)} : p_1 + \cdots + p_G = 1,\ p_g \ge 0,\ \sigma_g^2 > 0,\ g = 1, \ldots, G \right\} $$

is the set of model parameters, and $p_g$, $\boldsymbol\beta_g$, and $\sigma_g^2$ are, respectively, the mixing proportion, the vector of $J$ regression coefficients, and the variance term for the $g$-th cluster. The model in Equation (1) is also known as a finite mixture of linear regression models, or a switching regression model [21,22,15]. The parameters of finite mixtures of linear regression models are identified if some mild regularity conditions are met [10].

The clusterwise linear regression model of Equation (1) can naturally serve as a classification model. Based on the model, one computes the posterior membership probability of each observation for each cluster,

$$ \tau_{ig} = \frac{p_g\, f_g(y_i \mid \mathbf{x}_i; \sigma_g^2, \boldsymbol\beta_g)}{\sum_{h=1}^{G} p_h\, f_h(y_i \mid \mathbf{x}_i; \sigma_h^2, \boldsymbol\beta_h)}, \qquad (2) $$

and then classifies each observation according, for instance, to fuzzy or crisp classification rules.
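As an illustration, the posterior membership probabilities described above can be computed in a few lines of NumPy. This is a minimal sketch; the function and variable names are my own choices, not from the paper.

```python
import numpy as np

def posterior_memberships(y, X, p, beta, sigma2):
    """Posterior probability that observation i belongs to cluster g
    under the clusterwise linear regression model.
    y: (n,) outcomes; X: (n, J) covariates;
    p: (G,) mixing proportions; beta: (G, J) coefficients; sigma2: (G,) variances."""
    resid = y[:, None] - X @ beta.T                        # (n, G) residual under each cluster
    dens = np.exp(-resid**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    weighted = p * dens                                    # prior-weighted normal densities
    return weighted / weighted.sum(axis=1, keepdims=True)  # rows sum to 1
```

A crisp classification then assigns each observation to the cluster with the largest row entry, while the raw rows themselves give the fuzzy partition.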
The problem of clustering sample points grouped around linear structures has received a lot of attention in the statistical literature because of its important applications (see, for instance, [16] and references therein; for the robust literature see, among others, [6,7]). In order to estimate $\boldsymbol\psi$, one maximizes the sample likelihood function

$$ L(\boldsymbol\psi; \mathbf{y}) = \prod_{i=1}^{n} \sum_{g=1}^{G} p_g\, \frac{1}{\sqrt{2\pi\sigma_g^2}} \exp\left\{ -\frac{(y_i - \mathbf{x}_i'\boldsymbol\beta_g)^2}{2\sigma_g^2} \right\}, \qquad (3) $$

which can be done using iterative procedures like the EM algorithm [5], whose clustering can be interpreted as a fuzzy partition [9].

Unfortunately, maximum likelihood (ML) estimation of univariate unconditional or conditional normals suffers from the well-known issue of unboundedness of the likelihood function: whenever a sample point coincides with the group's centroid and the corresponding variance approaches zero, the likelihood function increases without bound ([14]; the multivariate case suffers from the same issue, see [4]). Hence a global maximum cannot be found. Yet ML estimation does not fail: Kiefer [15] showed that there is a sequence of consistent, asymptotically efficient and normally distributed estimators for switching regressions with different group-specific variances (heteroscedastic switching regressions). These estimators correspond, with probability approaching one, to local maxima in the interior of the parameter space. Nonetheless, although there is a local maximum which is also a consistent root, there is no tool for choosing it among the local maxima. Day [4] showed, for multivariate mixtures of normals, that potentially each sample point – or any pair of sample points sufficiently close together, or co-planar [24] – can generate a singularity in the likelihood function of a mixture with heteroscedastic components. This gives rise, in both univariate and multivariate contexts, to a number of spurious maximizers [18].
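The EM iteration for this likelihood can be sketched as follows. This is a generic, illustrative implementation, not the paper's method: the hard `var_floor` lower bound on the cluster variances is a crude placeholder used only to sidestep the degeneracies just discussed, whereas the paper proposes data-dependent soft scale bounds. All names here are my own.

```python
import numpy as np

def em_clusterwise(y, X, G, n_iter=200, var_floor=1e-6, seed=None):
    """Illustrative EM for a finite mixture of linear regressions.
    Alternates posterior membership computation (E-step) with
    weighted least squares per cluster (M-step)."""
    rng = np.random.default_rng(seed)
    n, J = X.shape
    p = np.full(G, 1.0 / G)                    # mixing proportions
    beta = rng.normal(size=(G, J))             # random coefficient init
    sigma2 = np.full(G, y.var())               # common variance init
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau (n, G)
        resid = y[:, None] - X @ beta.T
        dens = np.exp(-resid**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        tau = p * dens + 1e-300                # guard against all-zero rows
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: update proportions, coefficients, variances
        p = tau.mean(axis=0)
        for g in range(G):
            w = tau[:, g]
            Xw = X * w[:, None]
            beta[g] = np.linalg.solve(X.T @ Xw, Xw.T @ y)   # weighted LS
            r = y - X @ beta[g]
            # hard variance floor: placeholder for a proper scale constraint
            sigma2[g] = max((w * r**2).sum() / w.sum(), var_floor)
    return p, beta, sigma2, tau
```

Without some constraint on the `sigma2[g]`, an iterate that drives one cluster's variance toward zero around a single point can make the likelihood (and the fit) degenerate, which is exactly the motivation for the constrained approaches discussed in the article.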