Free English article on improving classification performance using a feature selection algorithm for data mining – Elsevier 2018

 

Article details
Article title (translated from Persian): Improving classification performance using the random subset feature selection algorithm for data mining
English article title: Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining
Publication year: 2018
Number of pages (English article): 32
Download cost: the English article is free.
Publisher database: Elsevier
Article type: Research Article
Base paper: this article is not a base paper
Index: Scopus – Master Journal List
Paper category: ISI
English article format: PDF
Impact factor (IF): 7.184 in 2017
H-index: 12 in 2019
SJR: 0.757 in 2017
ISSN: 2214-5796
Quartile: Q1 in 2017
Related fields: Computer engineering
Related specializations: Algorithms and computation engineering
Publication type: Journal
Journal: Big Data Research
University: Department of IT, Anurag Group of Institutions, Hyderabad, India
Keywords: Random forest, Subset feature selection, Dimensionality reduction, Scientific data, Stability
DOI: https://doi.org/10.1016/j.bdr.2018.02.007
Product code: E11092

 

Table of contents:
Abstract

1- Introduction

2- Dimensionality reduction techniques related work

3- About the existing RSFS algorithm

4- Proposed RSFS algorithm

5- Experiments and results

6- Conclusion

References

Excerpt from the article:

Abstract

This study focuses on feature subset selection from high-dimensional databases and presents a modification to the existing Random Subset Feature Selection (RSFS) algorithm for the random selection of feature subsets and for improving stability. A standard k-nearest-neighbor (kNN) classifier is used for classification. The RSFS algorithm is used for reducing the dimensionality of a data set by selecting useful novel features. It is based on the random forest algorithm. The current implementation suffers from poor dimensionality reduction and low stability when the database is very large. In this study, an attempt is made to improve the existing algorithm’s performance for dimensionality reduction and increase its stability. The proposed algorithm was applied to scientific data to test its performance. With 10-fold cross-validation and the modified algorithm, classification accuracy is improved. The applications of the improved algorithm are presented and discussed in detail. The data sets are selected from a public repository; they are scientific in nature and mostly used in cancer detection. From the results it is concluded that the improved algorithm is superior in reducing the dimensionality and improving the classification accuracy when used with a simple kNN classifier, and that it is highly recommended for dimensionality reduction while extracting relevant data from scientific datasets.
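As a rough illustration of the general idea described in the abstract (scoring features through randomly drawn subsets evaluated with a kNN classifier under 10-fold cross-validation), the following Python sketch may help. It is not the existing or the proposed RSFS algorithm from the paper: the relevance rule (mean accuracy of the subsets containing a feature, minus the overall mean), the unstratified fold assignment, the synthetic data, and all function names are assumptions made purely for illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training rows, using only `features`."""
    neighbours = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, features, k=3, n_folds=10):
    """Plain (unstratified) n-fold cross-validated accuracy of kNN on `features`."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            knn_predict(train_X, train_y, X[i], features, k) == y[i] for i in test
        )
    return correct / len(X)


def random_subset_relevance(X, y, n_iter=200, subset_size=2, seed=0):
    """Score each feature by how much the subsets containing it beat the average.

    Assumed scoring rule for illustration only, not the paper's update rule.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    totals = [0.0] * n_features
    counts = [0] * n_features
    accuracies = []
    for _ in range(n_iter):
        features = rng.sample(range(n_features), subset_size)
        acc = cv_accuracy(X, y, features)
        accuracies.append(acc)
        for f in features:
            totals[f] += acc
            counts[f] += 1
    baseline = sum(accuracies) / len(accuracies)
    return [
        totals[f] / counts[f] - baseline if counts[f] else 0.0
        for f in range(n_features)
    ]


def make_toy_data(n_samples=40, n_features=5, seed=1):
    """Synthetic data (hypothetical): feature 0 separates the classes, the rest is noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [
        [10 * label + rng.random()] + [rng.random() for _ in range(n_features - 1)]
        for label in y
    ]
    return X, y


X, y = make_toy_data()
scores = random_subset_relevance(X, y)
best = max(range(len(scores)), key=lambda f: scores[f])
print("most relevant feature index:", best)
```

Features with a clearly positive score would be the candidates to keep. A real implementation would also need a principled stopping rule and selection threshold, which this sketch leaves out.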

Introduction

Data mining, the extraction of useful hidden features from large databases, is an effective innovation with considerable potential to help organizations focus on developing business strategies. The tools developed for mining data anticipate future patterns and practices, permitting organizations to make proactive, learning-driven choices. Many data mining tools can address business challenges more effectively than traditional query or report-based tools can. The performance of traditional tools is very poor because of the large quantities of data involved. However, large quantities of data can sometimes result in poor performance in data analytics applications as well. Most data mining algorithms are implemented column-wise, which makes them slower as the number of features increases. When the quantity of collected data is very large, mining for relevant data is a challenge. This is known as the “curse of dimensionality” [1, 2, 3, 4]. Hence, there is a need for reducing the dimensionality of data without compromising its intrinsic geometric properties. Several methods have been developed, as shown in Figure (1), to address the challenge. Especially in fields such as biomedical engineering, drug testing, and cancer research, the data quantities involved are huge, and collecting them is very expensive. The data generated from experiments in the above-mentioned fields are popularly known as scientific data. Such scientific data tend to be noisy and sparse in nature [5, 6]. Because of this, standard data mining tools often do not perform efficiently when applied to scientific data. In this paper, an attempt is made to improve the existing random subset feature selection (RSFS) algorithm for better dimensionality reduction when applied to scientific data. Scientific data sets result from extensive research in fields such as cancer research, bioinformatics, medical diagnosis, genetic engineering and weather studies. These data sets are sparse in nature.
For example, cancer, also called malignancy, is an abnormal growth of cells. For cancer treatment, chemotherapy, radiation, and/or surgery may be required according to the severity of the disease. In this study, we attempted to reduce the number of features to aid in the detection of cancer, leading to time savings and saved lives. In this paper, we propose a dimensionality reduction on features when applied to cancer data sets. We describe and evaluate our approach in 4 phases: (1) improvement of the random subset feature selection (RSFS) algorithm, (2) the two-sample t-test to ascertain whether the difference between the existing and proposed algorithms is significant, (3) a box plot comparing the proposed algorithm’s performance with that of the existing algorithm when datasets are of the two-class or the multi-class labeled type, and (4) stability enhancement for a stable feature subset. This paper is organized into 6 sections: 1. Introduction (this section), 2. Dimensionality Reduction Techniques Related Work, 3. About the existing RSFS, 4. Proposed RSFS Algorithm (present work), 5. Experiments and Results, and 6. Conclusion.
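Phases (2) and (4) above can be made concrete with a small Python sketch. The excerpt says only "two-sample t-test" and "stability", so the specific choices here are assumptions: Welch's unequal-variance form of the t statistic, and the mean pairwise Jaccard index between feature subsets selected in repeated runs as the stability measure. The function names and input values are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def jaccard_stability(subsets):
    """Mean pairwise Jaccard index of the selected feature subsets (1.0 = identical)."""
    pairs = list(combinations(subsets, 2))
    total = 0.0
    for a, b in pairs:
        union = a | b
        # Two empty subsets are treated as identical.
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


# Placeholder per-fold scores for two algorithms (illustrative numbers only).
existing_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
proposed_scores = [2.0, 3.0, 4.0, 5.0, 6.0]
t, df = welch_t(existing_scores, proposed_scores)
print("t =", t, "df =", df)

# Placeholder feature subsets selected in three repeated runs.
runs = [{1, 2}, {2, 3}, {1, 2}]
print("stability =", jaccard_stability(runs))
```

Turning the t statistic into a p-value requires the t distribution's CDF, which the standard library does not provide; a statistics package would normally be used for that step.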
