Free English article on clusterwise linear regression modeling with soft scale constraints – Elsevier 2017


 

Article details
Article title: Clusterwise linear regression modeling with soft scale constraints
Year of publication: 2017
Length: 19 pages (English article)
Download cost: the English article is free to download.
Publisher database: Elsevier
Type of article: Research article
Base article: not a base article
Indexing: Scopus – Master Journals – JCR
Article category: ISI
Format of the English article: PDF
Impact Factor (IF): 1.766 (2017)
H-index: 81 (2017)
SJR: 0.866 (2017)
Related fields: Statistics
Related specialties: Mathematical statistics
Presentation type: Journal
Journal: International Journal of Approximate Reasoning
Affiliation: Department of Economics and Business – University of Catania – Italy
Keywords: Clusterwise linear regression, Adaptive constraints, Regression equivariance, Plausible bounds, Soft estimators, Constrained EM algorithm
DOI: http://dx.doi.org/10.1016/j.ijar.2017.09.006
Product code: E9546

 

Table of contents:
Abstract
1 Introduction
2 Constrained approaches for ML estimation
3 The proposed methodology
4 Numerical studies
5 Four real data applications
6 Conclusions
References

Excerpt from the article:

Abstract

Constrained approaches to maximum likelihood estimation in the context of finite mixtures of normals have been presented in the literature. A fully data-dependent soft constrained method for maximum likelihood estimation of clusterwise linear regression is proposed, which extends previous work in equivariant data-driven estimation of finite mixtures of normals. The method imposes soft scale bounds based on the homoscedastic variance and a cross-validated tuning parameter c. In our simulation studies and real data examples we show that the selected c will produce an output model with clusterwise linear regressions and clustering as a most-suited-to-the-data solution in between the homoscedastic and the heteroscedastic models.

Introduction

Let $\{(y_i, x_i)\}_{i=1}^{n} = \{(y_1, x_1), \dots, (y_n, x_n)\}$ be a sample of $n$ independent units, where $y_i$ is the outcome variable and $x_i$ is the vector of $J$ covariates. A clusterwise linear regression model assumes that the density of $y_i \mid x_i$ is given by

$$f(y_i \mid x_i; \psi) = \sum_{g=1}^{G} p_g\, f_g(y_i \mid x_i; \sigma^2_g, \beta_g) = \sum_{g=1}^{G} p_g \frac{1}{\sqrt{2\pi\sigma^2_g}} \exp\!\left(-\frac{(y_i - x_i' \beta_g)^2}{2\sigma^2_g}\right), \tag{1}$$

where $G$ is the number of clusters,

$$\psi = \left\{(p_1, \dots, p_G; \beta_1, \dots, \beta_G; \sigma^2_1, \dots, \sigma^2_G) \in \mathbb{R}^{G(J+2)} : p_1 + \dots + p_G = 1,\; p_g \ge 0,\; \sigma^2_g > 0,\; g = 1, \dots, G\right\}$$

is the set of model parameters, and $p_g$, $\beta_g$, and $\sigma^2_g$ are, respectively, the mixing proportion, the vector of $J$ regression coefficients, and the variance term for the $g$-th cluster. The model in Equation (1) is also known as a finite mixture of linear regression models, or a switching regression model [21,22,15]. The parameters of finite mixtures of linear regression models are identified if some mild regularity conditions are met [10]. The clusterwise linear regression model of Equation (1) can naturally serve as a classification model. Based on the model, one computes the posterior membership probabilities for each observation as

$$\pi_{ig} = \frac{p_g\, f_g(y_i \mid x_i; \sigma^2_g, \beta_g)}{\sum_{h=1}^{G} p_h\, f_h(y_i \mid x_i; \sigma^2_h, \beta_h)}, \tag{2}$$

and then classifies each observation according, for instance, to fuzzy or crisp classification rules. The problem of clustering sample points grouped around linear structures has received a lot of attention in the statistical literature because of its important applications (see, for instance, [16] and references therein; for the robust literature see, among others, [6,7]). In order to estimate $\psi$, one maximizes the sample likelihood function

$$L(\psi; y) = \prod_{i=1}^{n} \sum_{g=1}^{G} p_g \frac{1}{\sqrt{2\pi\sigma^2_g}} \exp\!\left(-\frac{(y_i - x_i' \beta_g)^2}{2\sigma^2_g}\right), \tag{3}$$

which can be done using iterative procedures such as the EM algorithm [5], whose clustering can be interpreted as a fuzzy partition [9].
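To make the estimation procedure concrete, here is a minimal NumPy sketch of an unconstrained EM iteration for this model: the E-step computes the posterior membership probabilities of Eq. (2), and the M-step updates each cluster's coefficients by weighted least squares. The initialisation, the small ridge term, and the variance floor are our own implementation choices, not the paper's.

```python
import numpy as np

def em_clusterwise_lr(y, X, G, n_iter=100, seed=0):
    """Plain EM for a G-component mixture of linear regressions (Eq. (1)).

    y : (n,) outcomes; X : (n, J) covariates (include a ones column for
    an intercept). Returns mixing proportions, coefficients, variances,
    and the (n, G) posterior membership matrix of Eq. (2).
    """
    rng = np.random.default_rng(seed)
    n, J = X.shape
    R = rng.dirichlet(np.ones(G), size=n)      # random initial responsibilities
    beta = np.zeros((G, J))
    sigma2 = np.ones(G)
    p = np.full(G, 1.0 / G)
    for _ in range(n_iter):
        # M-step: weighted least squares per cluster (tiny ridge for stability)
        for g in range(G):
            w = R[:, g]
            Xw = X * w[:, None]
            beta[g] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(J), Xw.T @ y)
            resid = y - X @ beta[g]
            sigma2[g] = max((w * resid**2).sum() / max(w.sum(), 1e-12), 1e-10)
            p[g] = w.mean()
        # E-step: posterior membership probabilities, Eq. (2)
        dens = np.empty((n, G))
        for g in range(G):
            resid = y - X @ beta[g]
            dens[:, g] = p[g] * np.exp(-0.5 * resid**2 / sigma2[g]) \
                / np.sqrt(2.0 * np.pi * sigma2[g])
        R = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
    return p, beta, sigma2, R
```

A crisp classification then assigns observation $i$ to the cluster maximising row $i$ of the returned posterior matrix.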
Unfortunately, maximum likelihood (ML) estimation of univariate unconditional or conditional normals suffers from the well-known issue of unboundedness of the likelihood function: whenever a sample point coincides with the group's centroid and the corresponding variance approaches zero, the likelihood function increases without bound ([14]; the multivariate case suffers from the same issue, see [4]). Hence a global maximum cannot be found. Yet, ML estimation does not fail: Kiefer [15] showed that there is a sequence of consistent, asymptotically efficient and normally distributed estimators for switching regressions with different group-specific variances (heteroscedastic switching regressions). These estimators correspond, with probability approaching one, to local maxima in the interior of the parameter space. Nonetheless, although there is a local maximum which is also a consistent root, there is no tool for choosing it among the local maxima. Day [4] showed, for multivariate mixtures of normals, that potentially each sample point – or any pair of sample points being sufficiently close together, or co-planar [24] – can generate a singularity in the likelihood function of a mixture with heteroscedastic components. This gives rise, both in univariate and multivariate contexts, to a number of spurious maximizers [18].
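A standard way to rule out such singularities is to constrain the cluster variances during the M-step. As a rough illustration only (the paper's soft, data-dependent estimators differ in detail), a hard scale bound in the constrained-EM spirit keeps each $\sigma^2_g$ within a factor $c$ of the pooled homoscedastic variance:

```python
import numpy as np

def clamp_scales(sigma2, sigma2_hom, c):
    """Hard scale bounds: restrict each cluster variance to the interval
    [sigma2_hom / c, sigma2_hom * c], with c >= 1.

    c = 1 forces the homoscedastic model; letting c grow recovers the
    unconstrained heteroscedastic model, whose likelihood is unbounded.
    (Illustrative sketch only, not the authors' soft estimator.)
    """
    return np.clip(sigma2, sigma2_hom / c, sigma2_hom * c)
```

Because no variance can collapse to zero, the constrained likelihood is bounded, and cross-validating over $c$ lets the data choose a solution between the two extremes, as the abstract describes.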
