Free English Paper: Variations on the Clustering Algorithm BIRCH – Elsevier 2018

 

Article details
Translated title: Variations on the Clustering Algorithm BIRCH
English title: Variations on the Clustering Algorithm BIRCH
Year of publication: 2018
Length of English paper: 10 pages
Download cost: The English paper is free to download.
Publisher database: Elsevier
Article type: Research Article
Baseline status: This paper is suitable as a baseline (base) article.
Index: Scopus – Master Journal List
Classification: ISI
Format of English paper: PDF
Impact Factor (IF): 7.184 (2017)
H-index: 12 (2019)
SJR: 0.757 (2017)
ISSN: 2214-5796
Quartile: Q1 (2017)
Related field: Computer Engineering
Related specialties: Algorithms and Computation, Artificial Intelligence
Presentation type: Journal
Journal: Big Data Research
Affiliation: Service-centric Networking, Telekom Innovation Laboratories, Technische Universität Berlin, Germany
DOI: https://doi.org/10.1016/j.bdr.2017.09.002
Product code: E11090
Translation status: A ready translation of this paper is not available; a translation can be ordered.

 

Table of contents:
Abstract

1- Introduction

2- Related work

3- BIRCH

4- Concept

5- Automatic estimation of the tree-BIRCH threshold for clusters of same element count

6- Automatic estimation of the tree-BIRCH threshold for clusters of different element count

7- Supercluster splitting

8- Multiple branch descent

9- Evaluation

10- Future work

11- Conclusion

References

Excerpt from the paper:

Abstract

Clustering algorithms are recently regaining attention with the availability of large datasets and the rise of parallelized computing architectures. However, most clustering algorithms suffer from two drawbacks: they do not scale well with increasing dataset sizes and often require proper parametrization which is usually difficult to provide. A very important example is the cluster count, a parameter that in many situations is next to impossible to assess. In this paper we present A-BIRCH, an approach for automatic threshold estimation for the BIRCH clustering algorithm. This approach computes the optimal threshold parameter of BIRCH from the data, such that BIRCH does proper clustering even without the global clustering phase that is usually the final step of BIRCH. This is possible if the data satisfies certain constraints. If those constraints are not satisfied, A-BIRCH will issue a pertinent warning before presenting the results. This approach renders the final global clustering step of BIRCH unnecessary in many situations, which results in two advantages. First, we do not need to know the expected number of clusters beforehand. Second, without the computationally expensive final clustering, the fast BIRCH algorithm will become even faster. For very large data sets, we introduce another variation of BIRCH, which we call MBD-BIRCH, which is of particular advantage in conjunction with A-BIRCH but is independent from it and also of general benefit.
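The abstract's central idea — running BIRCH without its final global clustering phase — can be tried directly with scikit-learn's `Birch` estimator, where passing `n_clusters=None` skips the global step so that each leaf subcluster of the CF tree becomes a final cluster. A minimal sketch (the dataset and the threshold value here are illustrative assumptions, not from the paper):

```python
# Sketch: BIRCH without the global clustering phase, as in the paper's
# "tree-BIRCH". With n_clusters=None, scikit-learn's Birch skips the
# final global clustering, so the threshold alone determines how many
# clusters emerge from the CF tree.
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic well-separated blobs (illustrative data, not from the paper).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# n_clusters=None disables the global clustering step; no cluster count
# needs to be provided, but the threshold must be chosen appropriately.
model = Birch(threshold=1.5, n_clusters=None)
labels = model.fit_predict(X)
n_found = len(set(labels))  # number of leaf subclusters found
```

With a poorly chosen threshold, tree-BIRCH splits or merges clusters — which is exactly the failure mode A-BIRCH addresses by estimating the threshold from the data.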

Introduction

Clustering is an unsupervised learning method that groups a set of given data points into well separated subsets. Two prominent examples of clustering algorithms are k-means, see MacQueen [10], and the expectation maximization (EM) algorithm, see Dempster et al. [6]. This paper addresses two issues with clustering: (1) clustering algorithms usually do not scale well and (2) most algorithms require the number of clusters (cluster count) as input. The first issue is becoming more and more important. For applications that need to cluster, for example, millions of documents, huge image or video databases, or terabytes of sensor data produced by the Internet of Things, scalability is essential. The second issue severely reduces the applicability of clustering in situations where the cluster count is very difficult to predict, such as in data exploration, feature engineering, and document clustering.

An important clustering method is balanced iterative reducing and clustering using hierarchies, or BIRCH, which was introduced by Zhang et al. [19] and is one of the fastest clustering algorithms available. It outperforms most other clustering algorithms by up to two orders of magnitude. Thus, BIRCH already solves the first issue mentioned above. However, to achieve sufficient clustering quality, BIRCH requires the cluster count as input, and therefore fails to solve the second issue.

This paper describes a method to use BIRCH without having to provide the cluster count, while preserving cluster quality and speed. This is achieved as follows. We first remove the global clustering step that is performed at the end of BIRCH, since it is slow and often requires extra parameters such as the cluster count. Then, by analyzing the remaining part of the BIRCH algorithm, which we call tree-BIRCH, we identify three ways in which tree-BIRCH can go wrong: cluster splitting, cluster combining, and supercluster splitting.

This knowledge enables us to improve tree-BIRCH and to compute an optimal threshold parameter from the data. With the resulting algorithm, the user has to provide neither any parameters for a final clustering step, such as the cluster count, since there is no final clustering step anymore, nor the threshold parameter, since it is computed automatically. The threshold parameter is computed from two attributes of the data: the maximum cluster radius Rmax and the minimum distance Dmin between clusters.
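The excerpt states that the threshold is derived from the maximum cluster radius Rmax and the minimum inter-cluster distance Dmin, without giving the estimation formula. The sketch below only illustrates the idea: for tree-BIRCH to keep each cluster in one subcluster without merging neighbors, the threshold should lie above Rmax but below half of Dmin. The midpoint choice and the `estimate_threshold` helper are hypothetical placeholders, not the formula from the paper:

```python
# Illustrative sketch of A-BIRCH's idea: derive the BIRCH threshold
# from the data attributes R_max (maximum cluster radius) and D_min
# (minimum distance between clusters). The midpoint rule below is a
# placeholder heuristic, NOT the estimation formula from the paper.
from sklearn.cluster import Birch


def estimate_threshold(r_max: float, d_min: float) -> float:
    """Pick a threshold T with r_max < T < d_min / 2, so a subcluster can
    absorb a whole cluster without reaching into a neighboring one."""
    if r_max >= d_min / 2:
        raise ValueError("clusters overlap too much for tree-BIRCH alone")
    return 0.5 * (r_max + d_min / 2)


# Assumed data attributes, for illustration only.
threshold = estimate_threshold(r_max=1.0, d_min=6.0)

# Tree-BIRCH: no global clustering step, no cluster count needed.
model = Birch(threshold=threshold, n_clusters=None)
```

If the separation constraint `r_max < d_min / 2` fails, the data does not satisfy the kind of constraint the abstract mentions, and (per the paper) A-BIRCH would issue a warning rather than silently mis-cluster.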
