Free English paper on an improved support vector machine algorithm for big data classification – IEEE 2018
Paper details | |
Translated title | Research on an improved support vector machine (SVM) algorithm for big data classification |
English title | Research on SVM improved algorithm for large data classification |
Published | 2018 |
Length of English paper | 5 pages |
Cost | The English paper is free to download. |
Database | IEEE |
Base paper | This is not a base paper |
Related fields | Computer engineering, information technology |
Related specializations | Algorithms and computation, artificial intelligence, information systems management |
Presentation type | Conference |
Journal / conference | IEEE 3rd International Conference on Big Data Analysis |
University | Liaoning University of Science and Technology – Anshan, Liaoning |
Keywords | support vector machine (SVM); big data; multi-class classification; Euclidean distance; radial integral kernel function |
English keywords | support vector machine (SVM); large data; multiclassification; Euclidean distance; radial integral kernel function |
DOI | https://doi.org/10.1109/ICBDA.2018.8367673 |
Product code | E10335 |
Table of contents: |
Abstract
I INTRODUCTION
II THE WEIGHTED EUCLIDEAN DISTANCE AND THE RADIAL PRODUCT KERNEL FUNCTION SVM
III CONCLUSION
References |
Excerpt from the paper: |
Abstract
The SVM algorithm has two problems when processing large data: a standard SVM cannot directly handle multi-class classification, and building the model takes a long time. To address them, the paper proposes an SVM for large data packet classification that combines a weighted Euclidean distance, a radial integral kernel function, and a dimensionality reduction algorithm. The improved algorithm reconstructs the data feature space, makes the boundaries between different data samples clearer, shortens the modeling time, and improves classification accuracy. Experiments verify the feasibility and effectiveness of the proposed method. The experimental results show that the improved algorithm achieves better results in multi-class classification when the data contain many duplicated samples and the data volume is large.
INTRODUCTION
The rapid development of network technology generates a huge amount of data every day. Classifying the vast amounts of collected data quickly and accurately is necessary to extract comprehensible knowledge. According to a forecast by the market research firm IDC, global data will exceed 40 ZB by 2020[1]. Many industries have deployed storage systems with capacities ranging from tens of gigabytes to hundreds of terabytes, or even petabytes. But nearly 60% of the data is repeated, which not only increases data storage and processing time but also drives up the cost of data analysis and classification. Efficient and accurate classification algorithms are therefore one of the hot issues in current industry research. Common classification algorithms include K-Nearest Neighbor, Naive Bayes, Neural Net, Support Vector Machine, and Linear Least Squares Fit[2]. The support vector machine (SVM) algorithm is a machine learning method based on the VC dimension theory of statistical learning theory and on the structural risk minimization principle. It has excellent data classification and regression ability[3]. 
The support vector method was first proposed by Vapnik to solve pattern recognition problems. It selects a characteristic subset of the training samples such that classifying this subset is equivalent to partitioning the whole dataset. The members of this subset are called support vectors (SVs). Owing to its excellent learning ability, SVM is applied very widely, for example in intrusion detection, facial expression classification, time series prediction, speech recognition, signal processing, gene detection, text classification, font recognition, fault diagnosis, chemical analysis, image recognition, and other fields. The SVM algorithm has some obvious advantages in solving classification problems. Its prediction time is short, and the globally optimal solution can guarantee the accuracy of the target detection classifier. But it also has disadvantages: building the detection model takes a long time, and time complexity and space complexity increase linearly with the amount of data when processing large-scale data. In the emerging fields of data mining, document classification, and multimedia indexing, the data objects are often large datasets. The number of attributes and the number of records are both very large, which results in poor performance of the processing algorithm[4]. An SVM classifier is determined only by its support vectors. Its complexity does not depend on the number of training samples, only on the number of support vectors[5]. This paper proposes a support vector machine method that combines a weighted Euclidean distance, a radial product kernel function, and dimension-reducing packet processing. The method reduces the data dimension and removes redundant feature attributes and duplicate data. A classification model with better generalization ability is obtained by using fewer support vectors. This reduces the storage and processing resources that the data require, shortens the time needed to build the classification model, and addresses big data classification problems. |
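The excerpt names a weighted Euclidean distance and a radial kernel but does not give their formulas, so the following is only a minimal sketch under stated assumptions. It assumes the common definition of a weighted Euclidean distance, sqrt(sum_i w_i (x_i - y_i)^2), and substitutes the standard Gaussian radial basis function for the paper's "radial product" kernel, whose exact form is not shown here. The function names, the `gamma` parameter, and the feature weights are hypothetical and do not come from the paper. The last function illustrates the point that the decision value depends only on the support vectors, not on the full training set.

```python
import math
from typing import Sequence


def weighted_euclidean(x: Sequence[float], y: Sequence[float],
                       w: Sequence[float]) -> float:
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2).

    A weight of 0 removes a feature from the distance entirely, which is
    one simple way to discount redundant attributes (an assumption here).
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    if any(wi < 0 for wi in w):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def weighted_rbf_kernel(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian RBF kernel on the weighted distance: exp(-gamma * d^2).

    This is the standard Gaussian form, used as a stand-in; it is not
    claimed to be the kernel defined in the paper.
    """
    d = weighted_euclidean(x, y, w)
    return math.exp(-gamma * d * d)


def decision_value(x: Sequence[float],
                   support_vectors: Sequence[Sequence[float]],
                   dual_coefs: Sequence[float],
                   bias: float,
                   w: Sequence[float],
                   gamma: float = 1.0) -> float:
    """f(x) = b + sum_k c_k * K(sv_k, x), summed over support vectors only.

    dual_coefs holds the signed coefficients (alpha_k * y_k) that a trained
    SVM would supply; here they are passed in by hand for illustration.
    """
    return bias + sum(c * weighted_rbf_kernel(sv, x, w, gamma)
                      for sv, c in zip(support_vectors, dual_coefs))
```

Calling `decision_value` costs one kernel evaluation per support vector, so a model with fewer support vectors is cheaper to evaluate regardless of how many training samples were used to fit it. The sign of the returned value gives the predicted class in a two-class setting.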