Free English article on noise filtering in Big Data classification – Elsevier 2019

 

Article details
Article title: Enabling Smart Data: Noise filtering in Big Data classification
Publication year: 2019
Length of the English article: 18 pages
Download cost: the English article is free
Publisher database: Elsevier
Article type: Research Article
Base article: this article is not a base article
Indexing: Scopus – Master Journal List – JCR
Article category: ISI
English article format: PDF
Impact factor (IF): 6.774 in 2018
H-index: 154 in 2019
SJR: 1.620 in 2018
ISSN: 0020-0255
Quartile: Q1 in 2017
Related fields: Computer Engineering
Related specializations: Software Engineering, Algorithms and Computation Engineering
Publication type: Journal
Journal: Information Sciences
University: Department of Computer Science and Artificial Intelligence, University of Granada, Granada, 18071, Spain
Keywords: Big Data, Smart Data, Classification, Class noise, Label noise
DOI: https://doi.org/10.1016/j.ins.2018.12.002
Product code: E11242

 

Table of contents:
Abstract

1- Introduction

2- Related work

3- Towards Smart Data: Noise filtering for Big Data

4- Experimental results

5- Conclusions

References

Excerpt from the article:

Abstract

In any knowledge discovery process, the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by the massive growth in the scale of data observed in recent years, follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances and is known to be a very disruptive feature of data. However, in this Big Data era, the massive growth in the scale of the data poses a challenge to traditional proposals created to tackle noise, as they have difficulties coping with such a large amount of data. New algorithms need to be proposed to treat the noise in Big Data problems, providing high-quality, clean data, also known as Smart Data. In this paper, two Big Data preprocessing approaches to remove noisy examples are proposed: a homogeneous ensemble filter and a heterogeneous ensemble filter, with special emphasis on their scalability and performance traits. The obtained results show that these proposals enable the practitioner to efficiently obtain a Smart Dataset from any Big Data classification problem.
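
To make the idea of an ensemble noise filter concrete, the following is a minimal single-machine sketch in Python. It is not the authors' implementation and says nothing about their Big Data framework or choice of learners. It assumes a generic voting scheme: the data is split into folds, several different classifiers are trained on the remaining folds, and a held-out instance is removed when a majority of them disagree with its label. The function names (`knn_predict`, `centroid_predict`, `gaussian_nb_predict`, `ensemble_filter`), the three toy classifiers, and the small two-cluster dataset are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

def knn_predict(train, x, k=3):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def group_by_label(train):
    """Map each label to the list of feature vectors carrying it."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    return groups

def centroid_predict(train, x):
    """Label whose class mean is closest to x."""
    centroids = {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in group_by_label(train).items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def gaussian_nb_predict(train, x):
    """Gaussian naive Bayes with a small variance floor (illustrative)."""
    best_label, best_score = None, -math.inf
    for label, rows in group_by_label(train).items():
        score = math.log(len(rows) / len(train))
        for col, value in zip(zip(*rows), x):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            score -= 0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def ensemble_filter(data, classifiers, n_folds=4, seed=0):
    """Drop instances that a majority of classifiers mislabel when held out."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    removed = set()
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        for i in fold:
            features, label = data[i]
            wrong = sum(clf(train, features) != label for clf in classifiers)
            if wrong > len(classifiers) / 2:  # majority vote
                removed.add(i)
    clean = [row for i, row in enumerate(data) if i not in removed]
    return clean, sorted(removed)

# Toy data: two well-separated clusters with one flipped label in each.
base = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.4, 0.6)]
data = [(p, "a") for p in base] + [((x + 10, y + 10), "b") for x, y in base]
data[3] = (data[3][0], "b")    # injected label noise
data[12] = (data[12][0], "a")  # injected label noise

CLASSIFIERS = [knn_predict, centroid_predict, gaussian_nb_predict]
clean, removed = ensemble_filter(data, CLASSIFIERS)
print(removed)  # [3, 12]
```

In this sketch a homogeneous variant would pass several instances of the same learner (for example, the same classifier trained on different partitions) instead of three different ones. Majority voting is only one possible rule; a consensus rule that requires all classifiers to disagree with the label would remove fewer instances.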

Introduction

Vast amounts of information surround us today. Technologies such as the Internet generate data at an exponential rate thanks to the affordability and great development of storage and network resources. It is predicted that by 2020, the digital universe will be 10 times as big as it was in 2013, totaling an astonishing 44 zettabytes. The current volume of data has exceeded the processing capabilities of classical data mining systems [47] and has created a need for new frameworks for storing and processing this data. It is widely accepted that we have entered the Big Data era. Big Data is the set of technologies that make processing such large amounts of data possible [8], while most classic knowledge extraction methods cannot work in a Big Data environment because they were not conceived for it. Big Data as a concept is defined around five aspects: data volume, data velocity, data variety, data veracity and data value. While the volume, variety and velocity aspects refer to the data generation process and how to capture and store the data, the veracity and value aspects deal with the quality and the usefulness of the data. These last two aspects become crucial in any Big Data process, where the extraction of useful and valuable knowledge is strongly influenced by the quality of the data used.
