Article specifications | |
Translated article title | Estimating the distance threshold for greedy subspace clustering
English article title | Estimating distance threshold for greedy subspace clustering
Publication | 2019 article
English article page count | 18 pages
Cost | The English article can be downloaded for free.
Database | Elsevier
Writing type | Research Article
Base article | This is not a base article
Index | Scopus – Master Journals List – JCR
Article type | ISI
English article format | |
Impact Factor (IF) | 5.891 in 2018
H_index | 162 in 2019
SJR | 1.190 in 2018
ISSN | 0957-4174
Quartile | Q1 in 2018
Conceptual model | None
Questionnaire | None
Variables | None
References | Included
Related fields | Computer Engineering
Related specializations | Algorithms and Computation Engineering
Presentation type | Journal
Journal / conference | Expert Systems with Applications
University | Department of Computer Science & Engineering, Sanjay Ghodawat University, Atigre, Kolhapur, Maharashtra 416118, India
Keywords | Greedy subspace clustering, parameter estimation, single-linkage clustering
English keywords | Greedy subspace clustering, Parameter estimation, Single linkage clustering
DOI | https://doi.org/10.1016/j.eswa.2019.06.011
Product code | E13566
Translation status | A prepared translation of this article is not available. You can order one via the button below.
Free article download | Free download of the English article
Order a translation of this article | Order a translation of this article
Article table of contents:
Abstract
1. Introduction
2. Review of approaches for subspace clustering
3. Parameter-free subspace clustering
4. Proposed algorithm
5. Experimental results
6. Conclusion and future scope
Funding
Declaration of Competing Interest
CRediT authorship contribution statement
References
Excerpt from the article text:
Abstract
Many approaches have been proposed to recognize clusters in subspaces. However, their performance is highly sensitive to the input parameter values, and the purpose and expected ranges of these parameters may not be available to a non-expert user. The parameter setting that produces optimal results can only be found by executing the clustering process repeatedly, each time with a different setting, which is very time consuming. Most of the existing algorithms also show high runtimes due to excessive data scans. In this work, we propose a subspace clustering technique that estimates the distance threshold parameter automatically from the data for each attribute and works on the basis of single-linkage clustering in a bottom-up, greedy fashion. The experimental results show that the algorithm produces optimal results without accepting any input from the user, achieving up to 10 times better runtime and improved accuracy in a single run, without requiring any tuning of parameter values.

Introduction

Many recent applications capture huge volumes of data that are big in both directions, i.e., in terms of both objects and attributes. Comprehending such huge datasets is now beyond human capability and requires powerful, automated data mining tools. This wealth of data is of no use unless the knowledge embedded within it is uncovered by applying appropriate data mining algorithms. Clustering is an unsupervised data-mining task: no class labels are present in the data, except in test data, where labels can be used to verify the quality of the clustering. The attributes describing the objects are also called dimensions or variables, and the objects represent points in the multi-dimensional space described by these attributes. In the literature, datasets having more than ten dimensions are called high-dimensional data (Han, Kamber, & Pei, 2012).
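The abstract's core idea, estimating a distance threshold from the data and then merging points bottom-up in single-linkage fashion, can be sketched in one dimension. The paper's actual estimator is not reproduced here; the gap-based heuristic below (mean plus one standard deviation of consecutive gaps) is purely an illustrative assumption.

```python
# Hypothetical sketch of threshold-driven single-linkage clustering on one
# attribute. The threshold heuristic is an assumption for illustration,
# not the estimator proposed in the paper.

def estimate_threshold(values):
    """Estimate a merge threshold from the gaps between sorted values."""
    s = sorted(values)
    gaps = [b - a for a, b in zip(s, s[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return mean + var ** 0.5  # gaps larger than this split clusters

def single_linkage_1d(values, threshold):
    """Greedy bottom-up pass: sort, then start a new cluster at each large gap."""
    s = sorted(values)
    clusters = [[s[0]]]
    for prev, cur in zip(s, s[1:]):
        if cur - prev <= threshold:
            clusters[-1].append(cur)  # within threshold: extend current cluster
        else:
            clusters.append([cur])    # gap too large: open a new cluster
    return clusters

data = [1.0, 1.2, 1.1, 5.0, 5.3, 9.8, 10.1]
t = estimate_threshold(data)
print(single_linkage_1d(data, t))  # → [[1.0, 1.1, 1.2], [5.0, 5.3], [9.8, 10.1]]
```

The appeal of this scheme, as the abstract claims, is that no parameter has to be supplied or tuned by the user: the threshold is derived from the data in a single pass.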
Conventional clustering algorithms show unusual and incorrect behavior on such datasets because they compute inter-cluster similarity over the full attribute space. Such distance calculations yield dissimilarity values that are nearly equal for all object pairs, so all data points appear equally similar or dissimilar, an effect called the "Curse of Dimensionality" (Bellman, 1961) observed in high-dimensional data spaces. The underlying clustering algorithm therefore cannot find clusters or natural groupings of the objects based on these similarity values and fails to mine meaningful clusters in high-dimensional data spaces.
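The distance-concentration effect described above is easy to observe empirically: for random points, the relative contrast between the nearest and farthest neighbour shrinks as dimensionality grows. The following self-contained sketch (not from the paper) measures that contrast on uniform random data.

```python
# Demonstration of distance concentration: as dimensionality grows, the gap
# between a point's nearest and farthest neighbour shrinks relative to the
# distances themselves, so "near" and "far" lose their meaning.
import math
import random

random.seed(0)

def contrast(dim, n=200):
    """Return (d_max - d_min) / d_min for n random points in [0, 1]^dim."""
    pts = [[random.random() for _ in range(dim)] for _ in range(n)]
    ref = pts[0]
    dists = [math.dist(ref, p) for p in pts[1:]]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

# Contrast drops by orders of magnitude from 2 to 1000 dimensions.
for dim in (2, 10, 100, 1000):
    print(dim, round(contrast(dim), 3))
```

In low dimensions the farthest point is many times farther than the nearest; by a thousand dimensions all pairwise distances are nearly identical, which is exactly why full-space distance comparisons stop discriminating between clusters.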