Free English article on a simplified distributed classifier training model for Big Data – Elsevier 2019

 

Article details
Article title (translated): Label-Aware Distributed Ensemble Learning: a simplified distributed classifier training model for Big Data
English article title: Label-Aware Distributed Ensemble Learning: A Simplified Distributed Classifier Training Model for Big Data
Publication year: 2019
Length of English article: 12 pages
Download cost: the English article is free
Journal database: Elsevier
Article type: Research Article
Base article: this is not a base article
Indexed in: Scopus – Master Journals List – JCR
Paper type: ISI
English article format: PDF
Impact Factor (IF): 3.643 (2018)
H-index: 16 (2019)
SJR: 0.984 (2018)
ISSN: 2214-5796
Quartile: Q1 (2018)
Conceptual model: none
Questionnaire: none
Variables: none
References: included
Related fields: Computer Engineering, Information Technology Engineering
Related specializations: Artificial Intelligence, Algorithms and Computation Engineering, Cloud Computing
Presentation type: journal
Journal: Big Data Research
Affiliation: School of Computing, Queen's University, Kingston, ON, Canada
Keywords: Big Data, analytics, distributed, machine learning, classification
English keywords: Big Data, Analytics, Distributed, Machine learning, Classification
DOI: https://doi.org/10.1016/j.bdr.2018.11.001
Product code: E11523

 

Article table of contents:
Abstract

1- Introduction

2- Distributed classifier training: benefits and pitfalls

3- The Label-Aware Distributed Ensemble Learning (LADEL) model

4- Evaluation

5- Conclusions and future work

References

 

Excerpt from the article:

Abstract

Label-Aware Distributed Ensemble Learning (LADEL) is a programming model and an associated implementation for distributing any classifier training to handle Big Data. It only requires users to specify the training data source, the classification algorithm and the desired parallelization level. First, a distributed stratified sampling algorithm is proposed to generate stratified samples from large, pre-partitioned datasets in a shared-nothing architecture. It executes in a single pass over the data and minimizes inter-machine communication. Second, the specified classification algorithm training is parallelized and executed on any number of heterogeneous machines. Finally, the trained classifiers are aggregated to produce the final classifier. Data miners can use LADEL to run any classification algorithm on any distributed framework, without any experience in parallel and distributed systems. The proposed LADEL model can be implemented on any distributed framework (Drill, Spark, Hadoop, etc.) to speed up the development of its data mining capabilities. It is also generic and can be used to distribute the training of any classification algorithm of any sequential single-node data mining library (Weka, R, scikit-learn, etc.). Distributed frameworks can implement LADEL to distribute the execution of existing data mining libraries without rewriting the algorithms to run in parallel. As a proof-of-concept, the LADEL model is implemented on Apache Drill to distribute the training execution of Weka's classification algorithms. Our empirical studies show that LADEL classifiers achieve accuracy similar to, and sometimes even better than, single-node classifiers, with significantly faster training and scoring times.
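The pipeline the abstract describes — per-machine stratified sampling in a single local pass, independent training on each sample, and aggregation of the trained classifiers — can be sketched as follows. This is a toy single-process simulation in Python, not the paper's Weka/Drill implementation: the nearest-centroid learner, shard sizes, and labels are illustrative stand-ins.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(partition, fraction, rng):
    # One pass over a local shard: group rows by label, then keep
    # `fraction` of each label's rows. Each machine samples its own
    # shard, so no inter-machine communication is needed.
    by_label = defaultdict(list)
    for row in partition:          # row = (features, label)
        by_label[row[1]].append(row)
    sample = []
    for rows in by_label.values():
        k = max(1, round(fraction * len(rows)))
        sample.extend(rng.sample(rows, k))
    return sample

def train_centroid(sample):
    # Stand-in for "any classification algorithm": a nearest-centroid
    # classifier trained locally on one stratified sample.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x0, x1), label in sample:
        s = sums[label]
        s[0] += x0; s[1] += x1; s[2] += 1
    centroids = {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}
    def classify(x):
        return min(centroids, key=lambda lab:
                   (x[0] - centroids[lab][0]) ** 2 + (x[1] - centroids[lab][1]) ** 2)
    return classify

def majority_vote(classifiers, x):
    # Final aggregation step: each per-partition classifier votes.
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]

# Demo: three "machines", each holding a pre-partitioned shard of a
# synthetic dataset where the label depends on the sign of x0 + x1.
rng = random.Random(0)
def make_shard(n):
    rows = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        rows.append((x, 'pos' if x[0] + x[1] > 0 else 'neg'))
    return rows

shards = [make_shard(200) for _ in range(3)]
ensemble = [train_centroid(stratified_sample(s, 0.2, rng)) for s in shards]
print(majority_vote(ensemble, (0.5, 0.5)))    # → pos
print(majority_vote(ensemble, (-0.5, -0.5)))  # → neg
```

Because each classifier only ever sees its own stratified sample, training is embarrassingly parallel; only the small trained models (here, two centroids each) would need to travel between machines.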

Introduction

Data mining is the process of discovering hidden patterns in data and using these patterns to predict the likelihood of future events. Several problems can be addressed using data mining:

• Classification: predict the category (discrete) of a new data point.
• Regression: predict the value (continuous) of a new data point.
• Clustering: split data points into categories.
• Association rules: find relationships between attributes.

In this work, we focus on the classification problem and ways of making it Big Data ready. Classification is a supervised learning approach consisting of two phases: (1) Training: a classifier is built using historical labeled data (i.e., data with known categories) and (2) Scoring: the trained classifier is used to predict the category of new data points (i.e., with unknown categories). With the large volume of Big Data, classifier training time and memory requirements are a real challenge. Scalable distributed data mining libraries like Apache Mahout [1], Cloudera Oryx [2], Oxdata H2O [3], MLlib [4] [5] and Deeplearning4j [6] implement distributed versions of the classification algorithms to run on Hadoop [7] and Spark [8]. Distributing classifier training significantly reduces the training time and enables digesting of Big Data. However, the approach used by scalable libraries requires rewriting the classification algorithms to execute in parallel. The rewriting process is complex and time-consuming, and the quality of the modified algorithm depends entirely on the contributors' expertise. Thus, scalable libraries fail to support as many algorithms as sequential single-node libraries like R [9], Weka [10], scikit-learn [11] and RapidMiner [12].
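The two phases above can be illustrated with a toy 1-nearest-neighbour classifier in Python (the data points and labels are hypothetical, chosen only to show the training/scoring split):

```python
# Training phase: "building" a 1-nearest-neighbour classifier simply
# means storing the historical labeled data.
train = [((0.0, 0.0), 'A'), ((0.1, 0.2), 'A'),
         ((1.0, 1.0), 'B'), ((0.9, 1.1), 'B')]

def score(x):
    # Scoring phase: predict the category of a new, unlabeled point
    # as the label of its nearest training point.
    nearest = min(train, key=lambda r:
                  (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2)
    return nearest[1]

print(score((0.05, 0.1)))  # → A
print(score((0.95, 1.0)))  # → B
```

For a lazy learner like nearest-neighbour, training is trivial and scoring is expensive; for most algorithms the balance is reversed, which is why training time dominates the Big Data challenge discussed here.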
