Paper details
Publication year | 2018
English paper page count | 13 pages
Cost | The English paper is free to download.
Published in | Springer
Paper type | ISI
English title | Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding
Related disciplines | Computer engineering and information technology
Related specializations | Algorithms and computation, computer networks
Journal | Soft Computing
University | Zhejiang Sci-Tech University, Hangzhou, China
Keywords | Data clustering, Quasi-cluster centers clustering, Potential entropy, Optimal parameter, t-distributed stochastic neighbor embedding
Product code | E7611
Excerpt from the paper:
1 Introduction
The purpose of clustering is to divide objects into different clusters or classes according to the similarity of the sample data. Clustering technology has been widely used in many fields: pattern recognition (Horn and Gottlieb 2002), image processing (Liew and Yan 2003; Li and Shen 2010), and machine learning (Wu 2014). Traditional clustering methods can be roughly grouped into five categories: hierarchical clustering, partition-based clustering, density-based clustering, grid-based clustering, and model-based clustering (Omran et al. 2007; Xu and Tian 2015).

The basic idea of hierarchical clustering is to establish a hierarchical relationship among all data points based on a hierarchical tree structure. There are two ways to realize it: bottom-up and top-down. The former supposes that each object stands for an individual cluster at the beginning; the two most similar clusters are then repeatedly merged into a new cluster until only one is left (a minimal sketch of this merge loop is given below). The latter is the opposite process. BIRCH (Zhang et al. 1996, 1997; Madan and Dana 2015), ROCK (Guha et al. 1999; Dutta et al. 2005), and Chameleon (Karypis et al. 1999) are representatives of this sort of method. Hierarchical clustering does not require the number of clusters to be specified in advance and handles isolated and noisy data well, but its time and space complexity is too high for it to be suitable for large datasets.

Partition-based clustering regards the center of the data points as the center of the corresponding cluster, and the quality of the clustering is gradually improved by attempting to move data objects from one cluster to another using an iterative relocation technique (see the K-means sketch below). K-means (Macqueen 1967) and K-medoids (Park and Jun 2009) are the two most famous algorithms of this kind. Partition-based clustering generally has relatively low time complexity and high computing efficiency. However, it is not suitable for non-convex datasets, it is sensitive to outliers, and the number of clusters needs to be preset.

The core idea of density-based clustering is that data points lying in a high-density region of the data space are considered to belong to the same cluster. Representatives include DBSCAN (Ester et al. 1996; Kumar and Reddy 2016), OPTICS (Ankerst et al. 1999), Mean-shift (Comaniciu and Meer 2002), and DP (Rodriguez and Laio 2014; Du et al. 2016; Mehmood et al. 2016). Density-based clustering can correctly cluster datasets of non-spherical shape, but it tends to produce low-quality results when the density of the data space is uneven (a DBSCAN-style sketch is given below).

Grid-based clustering is based on the idea that the object space is quantized into a finite number of cells, thereby forming a grid structure, so that all clustering operations are carried out within this grid structure.
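To make the bottom-up (agglomerative) scheme concrete, here is a minimal Python sketch of the merge loop. The single-linkage distance and the function names are illustrative assumptions, not the method of the paper or of any specific algorithm cited above.

```python
# Minimal sketch of bottom-up (agglomerative) hierarchical clustering with
# single-linkage distance. Illustrative assumption, not the paper's method.
import numpy as np

def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each point starts as its own cluster
    while len(clusters) > n_clusters:
        best_pair, best_d = (0, 1), np.inf
        # scan all cluster pairs for the smallest single-linkage distance
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_d, best_pair = d, (a, b)
        a, b = best_pair
        clusters[a].extend(clusters[b])  # merge the two most similar clusters
        del clusters[b]
    return clusters

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(3, 0.3, (5, 2))])
print(agglomerative(X, 2))  # two groups of point indices, one per blob
```

The quadratic pair scan in each merge step is what makes the time complexity prohibitive for large datasets, as noted above.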
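The iterative relocation technique behind partition-based methods can likewise be sketched as a plain K-means loop: assign each point to its nearest center, then move every center to the mean of its cluster. The initialization scheme and parameter defaults below are illustrative assumptions, not taken from the paper.

```python
# Minimal K-means sketch of iterative relocation. Initialization and
# parameter values are illustrative assumptions.
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: index of the nearest center for each point
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: relocate each center to its cluster mean
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged: no center moved
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
labels, centers = kmeans(X, k=2)
print(centers)  # roughly (0, 0) and (3, 3)
```

Note that k must be chosen in advance and the mean-based centers are pulled toward outliers, which is exactly the pair of weaknesses of partition-based clustering mentioned above.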
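Finally, a compact DBSCAN-style sketch illustrates the density-based idea: core points (those with at least min_pts neighbors within radius eps) seed clusters that grow through their density-reachable neighbors. The eps and min_pts values and the simplified border-point handling are assumptions for illustration only.

```python
# Compact DBSCAN-style sketch of density-based clustering. eps and min_pts
# are illustrative values, not parameters prescribed by the paper.
import numpy as np

def dbscan(points, eps=0.5, min_pts=4):
    n = len(points)
    labels = np.full(n, -1)  # -1 marks unassigned/noise
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        neighbors = list(np.where(dist[i] <= eps)[0])
        if len(neighbors) < min_pts:
            continue  # not a core point; stays noise unless a cluster reaches it
        labels[i] = cluster
        queue = neighbors
        while queue:  # grow the cluster through density-reachable points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                nbrs = np.where(dist[j] <= eps)[0]
                if len(nbrs) >= min_pts:  # j is a core point too: keep expanding
                    queue.extend(nbrs)
        cluster += 1
    return labels

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])
print(dbscan(X))  # two dense blobs -> labels 0 and 1
```

A single global eps cannot fit regions of very different density at once, which is the uneven-density weakness of density-based clustering noted above.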