مقاله انگلیسی رایگان در مورد کاربرد روش SVM موازی ارتقاء یافته مبتنی بر هستی شناسی – الزویر 2013

 

مشخصات مقاله
ترجمه عنوان مقاله کاربرد روش SVM موازی ارتقاء یافته مبتنی بر هستی شناسی در آموزش فیلترینگ (فیلتر کردن) مقیاس پذیر هرزنامه (اسپم)
عنوان انگلیسی مقاله An ontology enhanced parallel SVM for scalable spam filter training
انتشار مقاله سال 2013
تعداد صفحات مقاله انگلیسی  13 صفحه
هزینه دانلود مقاله انگلیسی رایگان میباشد.
پایگاه داده نشریه الزویر
نوع نگارش مقاله
مقاله پژوهشی (Research article)
مقاله بیس این مقاله بیس نمیباشد
نمایه (index) scopus – master journals – JCR
نوع مقاله ISI
فرمت مقاله انگلیسی  PDF
ایمپکت فاکتور(IF)
3.779 در سال 2017
شاخص H_index 100 در سال 2019
شاخص SJR 1.073 در سال 2017
شناسه ISSN 0925-2312
شاخص Quartile (چارک) Q1 در سال 2017
رشته های مرتبط مهندسی کامپیوتر
گرایش های مرتبط مهندسی نرم افزار – معماری کامپیوتر
نوع ارائه مقاله
ژورنال
مجله / کنفرانس Neurocomputing
دانشگاه School of Engineering and Design, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
کلمات کلیدی فیلترینگ (فیلتر کردن) هرزنامه، ماشین بردار پشتیبان (SVM)، رایانش (محاسبات) موازی، طبقه بندی، نگاشت کاهش
کلمات کلیدی انگلیسی Spam filtering, Support vector machine, Parallel computing, Classification, MapReduce
شناسه دیجیتال – doi
https://doi.org/10.1016/j.neucom.2012.12.001
کد محصول E11912
وضعیت ترجمه مقاله  ترجمه آماده این مقاله موجود نمیباشد. میتوانید از طریق دکمه پایین سفارش دهید.
دانلود رایگان مقاله دانلود رایگان مقاله انگلیسی
سفارش ترجمه این مقاله سفارش ترجمه این مقاله

 

فهرست مطالب مقاله:
Outline
Abstract
Keywords
1. Introduction
2. Related work
3. Parallelizing SVM with MapReduce
4. Ontology augmentation
5. Experimental results
6. Conclusions and future work
References

بخشی از متن مقاله:

Abstract

Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart.

Introduction

The overall email ecosystem has become the most ubiquitous modern day communication tool. Its popularity as well as widespread availability have also created an opportunity for a lucrative business model based on unsolicited bulk email, or rather spam. The proliferation of spam has reached widespread proportions. Spam damages enabling communication infrastructures as well as lumbers consumers and service providers in its trail. Such impact and subsequently the importance to identify better ways to control and mitigate its consequences are reflected in the attention that spam continuously gets from various perspectives. Machine learning techniques such as Support Vector Machines (SVMs) have been applied in spam filtering [1–5]. SVM training is a computationally intensive process primarily due to its convex quadratic programming challenges associated with the dense Hessian Matrix involved during optimization. Numerous SVM formulations, solvers and architectures for improving SVM performance have been explored and proposed [6,7] including the Message Passing Interface (MPI) based parallel approaches [8–10]. SVM decomposition is another technique for improving SVM training performance [11–13]. Less conventional approaches include the utilization of specialized processing hardware such as Graphics Processing Units (GPUs) [14,15]. A widespread practice is to split the training data and use a number of SVMs to process the individual data chunks. This approach typically splits the training data set into a number of smaller fragments. This in turn reduces the individual training set and the overall training time. Most forms of decomposition which are based on a data splitting strategy approach tend to suffer from issues including convergence and accuracy. Challenges related to chunk aliasing, outlier accumulation and data imbalance/distribution tend to intensify problems in a parallel SVM context. This in turn impacts generalization, accuracy and convergence. Techniques such as random sampling have also been shown to exhibit similar accuracy challenges because of the induced probability distribution variance [16]. On the other hand, selective sampling techniques applied to SVMs try to sample the training data intelligently to maximize the performance of SVMs. However these normally require many scans of the entire data set which incurs high overhead in computation.

دیدگاهتان را بنویسید

نشانی ایمیل شما منتشر نخواهد شد. بخش‌های موردنیاز علامت‌گذاری شده‌اند *

دکمه بازگشت به بالا