مشخصات مقاله | |
ترجمه عنوان مقاله | : یک روش سریع کاوش مجموعه اقلام پرتکرار مبتنی بر GPU برای داده های در مقیاس بزرگ: GMiner |
عنوان انگلیسی مقاله | A fast GPU-based frequent itemset mining method for large-scale data: GMiner |
انتشار | مقاله سال 2018 |
تعداد صفحات مقاله انگلیسی | 45 صفحه |
هزینه | دانلود مقاله انگلیسی رایگان میباشد. |
پایگاه داده | نشریه الزویر |
نوع نگارش مقاله |
مقاله پژوهشی (Research Article) |
مقاله بیس | این مقاله بیس میباشد |
نمایه (index) | Scopus – Master Journal List – JCR |
نوع مقاله | ISI |
فرمت مقاله انگلیسی | |
ایمپکت فاکتور(IF) |
5.080 در سال 2017 |
شاخص H_index | 142 در سال 2019 |
شاخص SJR | 1.635 در سال 2017 |
شناسه ISSN | 0020-0255 |
شاخص Quartile (چارک) | Q1 در سال 2017 |
رشته های مرتبط | مهندسی کامپیوتر |
گرایش های مرتبط | مهندسی نرم افزار، مهندسی الگوریتم ها و محاسبات |
نوع ارائه مقاله |
ژورنال |
مجله | مجله علوم اطلاعات – Information Sciences |
دانشگاه | DGIST (Daegu Gyeongbuk Institute of Science and Technology), Daegu, Republic of Korea |
کلمات کلیدی | کاوش مجموعه اقلام پرتکرار، واحد پردازش گرافیکی، الگوریتم موازی، ریزش کاری |
کلمات کلیدی انگلیسی | frequent itemset mining، graphics processing unit، parallel algorithm، workload skewness |
شناسه دیجیتال – doi |
https://doi.org/10.1016/j.ins.2018.01.046 |
کد محصول | E10893 |
وضعیت ترجمه مقاله | ترجمه آماده این مقاله موجود نمیباشد. میتوانید از طریق دکمه پایین سفارش دهید. |
دانلود رایگان مقاله | دانلود رایگان مقاله انگلیسی |
سفارش ترجمه این مقاله | سفارش ترجمه این مقاله |
فهرست مطالب مقاله: |
Abstract
1- Introduction 2- Related work 3- TFL strategy 4- HIL Strategy 5- Multiple GPUs and cost model 6- Performance evaluation 7- Summary References |
بخشی از متن مقاله: |
Abstract Frequent itemset mining is widely used as a fundamental data mining technique. However, as the data size increases, the relatively slow performances of the existing methods hinder its applicability. Although many sequential frequent itemset mining methods have been proposed, there is a clear limit to the performance that can be achieved using a single thread. To overcome this limitation, various parallel methods using multi-core CPU, multiple machine, or many-core graphic processing unit (GPU) approaches have been proposed. However, these methods still have drawbacks, including relatively slow performance, data size limitations, and poor scalability due to workload skewness. In this paper, we propose a fast GPU-based frequent itemset mining method called GMiner for large-scale data. GMiner achieves very fast performance by fully exploiting the computational power of GPUs and is suitable for large-scale data. The method performs mining tasks in a counterintuitive way: it mines the patterns from the first level of the enumeration tree rather than storing and utilizing the patterns at the intermediate levels of the tree. This approach is quite effective in terms of both performance and memory use in the GPU architecture. In addition, GMiner solves the workload skewness problem from which the existing parallel methods suffer; as a result, its performance increases almost linearly as the number of GPUs increases. Through extensive experiments, we demonstrate that GMiner significantly outperforms other representative sequential and parallel methods in most cases, by orders of magnitude on the tested benchmarks. Introduction As a fundamental data mining technique, frequent itemset mining is widely used in a wide range of disciplines such as market basket analysis, web usage mining, social network analysis, intrusion detection, bioinformatics, and recommendation systems. However, the deluge of data generated by automated systems for diagnostic or analysis purposes makes it difficult or even impossible to apply mining techniques in many realworld applications. The existing methods often fail to find frequent itemsets in such big data within a reasonable amount of time. Thus, in terms of computational time, itemset mining is still a challenging problem that has not yet been completely solved. Many sequential frequent itemset mining methods such as Apriori [2], Eclat [35], FP-Growth [14], and LCM [30] use a single CPU thread. However, these singlethreaded applications all have a fundamental mining performance limit because CPU clock speed is generally no longer increasing. To overcome the single-thread performance limit, multiple parallel frequent itemset mining methods have been proposed. These methods can be categorized into three main groups: (1) (CPU-based) multithreaded methods, (2) distributed methods, and (3) graphic processing unit (GPU)- based methods. We omit the term ”multi-thread” from the GPU-based methods because they are obviously multi-threaded. The first group focuses on accelerating the performance of the single-threaded methods by exploiting multi-core CPUs [20, 24, 26, 27, 29], while the second group tries to accelerate the performance by exploiting multiple machines [12, 17, 19]. Details about these methods are available in recent survey studies. |