Free English article on improved logistic regression for diabetes prediction by integrating PCA and K-means techniques - Elsevier 2019

Article details
Article title	Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques
Published	2019
Length of English article	7 pages
Cost	The English article is free to download.
Database	Elsevier
Article category	Research Article
Base article	This is a base article
Indexing	Scopus – DOAJ
Article type	ISI
Format of English article	PDF
Impact factor (IF)	2.108 (2018)
H-index	9 (2019)
SJR	0.295 (2018)
ISSN	2352-9148
Quartile	Q3 (2018)
Conceptual model	Yes
Questionnaire	No
Variables	Yes
References	Yes
Related fields	Computer Engineering, Mathematics
Related specializations	Artificial Intelligence, Software Engineering, Applied Mathematics
Publication type	Journal
Journal	Informatics in Medicine Unlocked
University	School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, China
Keywords	PCA, K-means, Diabetes, Data mining, Logistic regression
DOI	https://doi.org/10.1016/j.imu.2019.100179
Product code	E12734

Table of contents:

Abstract

1- Introduction

2- Related study

3- Methodology

4- Experimental result

5- Discussion

6- Conclusion and future work

References

Excerpt from the article:

Abstract

Diabetes causes a large number of deaths each year, and many people living with the disease do not realize their health condition early enough. In this study, we propose a data mining based model for early diagnosis and prediction of diabetes using the Pima Indians Diabetes dataset. Although K-means is simple and can be used for a wide variety of data types, it is quite sensitive to the initial positions of the cluster centers, which determine the final clustering result. That result either provides a sufficient and efficiently clustered dataset for the logistic regression model, or yields a smaller amount of data because the original dataset was clustered incorrectly, thereby limiting the performance of the logistic regression model. Our main goal was to determine ways of improving the accuracy of k-means clustering and logistic regression. Our model comprises PCA (principal component analysis), k-means, and a logistic regression algorithm. Experimental results show that PCA improved the accuracy of both the k-means clustering algorithm and the logistic regression classifier compared with other published studies: k-means correctly classified 25 more data points, and logistic regression accuracy was 1.98% higher. As such, the model is shown to be useful for automatically predicting diabetes from patient electronic health record data. A further experiment with a new dataset showed the applicability of our model to the prediction of diabetes.
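
The sensitivity of k-means to its initial cluster centers can be seen on a toy example. The following is a minimal sketch in plain Python, not the authors' implementation: the four points, the two starting configurations, and the helper names `sq_dist` and `kmeans` are illustrative assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, iters=20):
    """Lloyd's algorithm; returns the final centers and the sum of squared errors."""
    centers = [tuple(float(x) for x in c) for c in centers]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical toy data: two tight pairs of points, far apart on the x axis.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

# Same data, same k, two different starting positions.
good_centers, good_sse = kmeans(points, [(0, 0), (10, 0)])
bad_centers, bad_sse = kmeans(points, [(5, 0), (5, 1)])

print("good start:", good_centers, "SSE =", good_sse)
print("bad start: ", bad_centers, "SSE =", bad_sse)
```

With the first start, each center settles on one pair and the squared error is 1.0. With the second start, the centers never move: each one keeps a point from the left pair and a point from the right pair, and the squared error stays at 100.0.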

Introduction

Diabetes was among the top 10 causes of death in 2016. It killed 1.6 million people that year, up from fewer than 1 million in 2000, and replaced HIV/AIDS as the seventh leading cause of death [1]. The number of people with diabetes rose from 108 million in 1980 to 422 million in 2014, and the global prevalence of diabetes among adults over 18 years of age rose from 4.7% in 1980 to 8.5% in 2014 [2]. By 2040, 642 million adults (1 in 10 adults) are expected to have diabetes. Also, 46.5% of those with diabetes have not been diagnosed [3].

Because a large number of deaths in diabetic patients are due to late diagnosis, reducing the number of deaths attributable to diabetes requires methods and techniques that aid early diagnosis. Developing such techniques calls for advanced information technology, and data mining is a suitable field for this. Data mining offers the ability to extract and discover previously unknown, hidden, but interesting patterns from a large database repository. These patterns can aid medical diagnosis and decision-making. Various techniques and algorithms have been designed for extracting knowledge and information about the diagnosis and treatment of disease from medical databases.

PCA is a simple, nonparametric method for extracting relevant information from confusing data sets [4]. When a large dataset is to be clustered into a user-specified number of clusters (k), which are represented by their centroids, k-means clusters the data by minimizing the squared error function [5]. It often misclassifies some data because of outliers, and the time complexity is also greater. To overcome these problems, principal component analysis (PCA) can be used to reduce the dataset to a lower dimension while losing as little information as possible and providing a better centroid point for clustering. K-means clustering partitions a dataset into different groups of similar objects. Clusters that are highly dissimilar from the others are regarded as outliers and discarded. Logistic regression is an efficient predictive regression analysis algorithm. It is well suited to datasets whose dependent variable is dichotomous (binary).
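
The three stages described above (PCA, then k-means, then logistic regression) can be sketched end to end. This is a hedged illustration in plain Python, not the authors' code. It assumes synthetic two-class data in place of the Pima Indians dataset, a single principal component found by power iteration, and one reading of the abstract in which samples whose cluster disagrees with their label are dropped before the classifier is fitted. All function names are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def first_component(rows, iters=100):
    """Leading principal axis of mean-centered rows, via power iteration."""
    d = len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def kmeans_1d(values, centers, iters=50):
    """k-means on scalar values; returns the cluster index of each value."""
    centers = list(centers)
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [min(range(len(centers)), key=lambda c: (x - centers[c]) ** 2)
                  for x in values]
        for c in range(len(centers)):
            members = [x for x, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign


def fit_logistic(zs, ys, lr=0.1, epochs=500):
    """One-feature logistic regression fitted by batch gradient descent."""
    w, b, n = 0.0, 0.0, len(zs)
    for _ in range(epochs):
        errs = [sigmoid(w * zi + b) - yi for zi, yi in zip(zs, ys)]
        w -= lr * sum(e * zi for e, zi in zip(errs, zs)) / n
        b -= lr * sum(errs) / n
    return w, b


# Synthetic stand-in data (an assumption, NOT the Pima dataset): 3 features, 2 classes.
random.seed(0)
X, y = [], []
for label, mean in ((0, 0.0), (1, 4.0)):
    for _ in range(100):
        X.append([random.gauss(mean, 1.0), random.gauss(mean, 1.0), random.gauss(0.0, 1.0)])
        y.append(label)

# Step 1: PCA. Center the data and project it onto the first principal axis.
dims = len(X[0])
means = [sum(r[j] for r in X) / len(X) for j in range(dims)]
centered = [[r[j] - means[j] for j in range(dims)] for r in X]
axis = first_component(centered)
z = [sum(r[j] * axis[j] for j in range(dims)) for r in centered]

# Step 2: k-means with k = 2, then keep samples whose cluster matches their label.
assign = kmeans_1d(z, [min(z), max(z)])
cluster_label = {}
for c in (0, 1):
    votes = [t for t, a in zip(y, assign) if a == c]
    cluster_label[c] = 1 if 2 * sum(votes) > len(votes) else 0  # majority vote
kept = [(zi, yi) for zi, yi, a in zip(z, y, assign) if cluster_label[a] == yi]

# Step 3: logistic regression on the retained samples (training accuracy only).
kept_z = [zi for zi, _ in kept]
kept_y = [yi for _, yi in kept]
w, b = fit_logistic(kept_z, kept_y)
correct = sum((sigmoid(w * zi + b) >= 0.5) == (yi == 1) for zi, yi in kept)
accuracy = correct / len(kept)
print("kept", len(kept), "of", len(X), "samples; training accuracy", round(accuracy, 3))
```

The accuracy printed here is measured on the same samples used for fitting, so it only shows that the stages connect. A real evaluation would hold out test data or use cross-validation.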
