Article specifications | |
Title | Performance evaluation of clustering algorithms for varying cardinality and dimensionality of data sets |
Publication year | 2020 |
Length of English article | 7 pages |
Cost | The English article is free to download. |
Database | Elsevier |
Writing type | Research Article |
Base article | No |
Article type | ISI |
English article format | |
Impact factor (IF) | 0.967 (2019) |
H-index | 18 (2020) |
SJR | 0.299 (2019) |
ISSN | 2214-7853 |
Conceptual model | None |
Questionnaire | None |
Variables | None |
References | Yes |
Related fields | Computer engineering, Information technology engineering |
Related specializations | Algorithm and computation engineering, Internet and wide-area networks |
Presentation type | Journal |
Journal | Materials Today: Proceedings |
دانشگاه | Department of Computer Applications, Cochin University of Science and Technology, Kochi, Kerala 682022, India |
Keywords | Clustering algorithms, Clustering quality, Clustering performance, Social media, Turnaround time |
DOI | https://doi.org/10.1016/j.matpr.2020.01.110 |
Product code | E14657 |
Table of contents | Abstract, 1. Introduction, 2. Antecedents, 3. Related works, 4. Methodology, 5. Empirical research, 6. Conclusion, CRediT authorship contribution statement, Declaration of Competing Interest, References |
Excerpt from the article:
Abstract
Clustering is the most widely used unsupervised machine learning technique and has extensive applications in statistical analysis. Many clustering algorithms exist in theory, and many more implementations exist in practice. Much of the literature focuses on the quality of clustering algorithms, assessed with various internal and external evaluation techniques. The motivation for this work is the scarcity of literature on the performance of clustering algorithms in terms of turnaround time. This paper summarizes an experimental analysis of the performance of multiple clustering algorithms as a function of cardinality and dimensionality. The analysis is performed in R, a free and open-source programming language used mainly for statistical computing. This work evaluates nine key algorithms from the partitioning, hierarchical, density-based, and model-based clustering approaches, using different social media data sets. We captured the turnaround-time trends of these algorithms by varying the cardinality and dimensionality of the data sets. In our experiments, the CLARA, CLARANS, and k-means algorithms performed best as cardinality varied. We also observed that changes in dimensionality do not affect the hierarchical clustering approaches, whereas higher dimensionality increases the execution time of the partitioning, density-based, and model-based approaches.
Introduction
Data mining [1] is the process of extracting meaningful information from raw data, revealing underlying patterns and relationships. These revelations form useful knowledge that can be applied in various scientific, educational, and industrial scenarios. Depending on the type of patterns to be processed, an appropriate data mining strategy can be adopted; such strategies include, but are not limited to, classification, clustering, association, and regression.
Clustering is a machine learning technique for creating logical groups of similar entities from a data set. The aim of the clustering process is to form distinct groups such that entities in the same group have similar properties, whereas entities in different groups have dissimilar properties. It is an unsupervised learning technique that is widely used for the statistical analysis of data. Because the volume of data being processed grows daily, clustering is applied extensively in almost all industrial segments. This work presents an empirical analysis of the performance of nine different clustering algorithms [2]. We measured the average processing time of each algorithm while varying the number of records (cardinality) with a constant number of attributes (dimensionality), and while varying the number of attributes with a constant number of records. The experiments were conducted using two distinct social media data sets.
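The authors ran their measurements in R on real social media data and covered nine algorithms. The following Python sketch is not their code. It is a minimal illustration of the measurement procedure described above, and it rests on several assumptions. It uses plain Lloyd's k-means, one of the algorithms named in the abstract, which groups similar entities by assigning each record to its nearest centroid. It also substitutes randomly generated numeric records for the data sets. The helper names `kmeans`, `synthetic_data`, and `turnaround_time`, along with the chosen sizes, are hypothetical. The script averages wall-clock time over a few runs, first varying cardinality `n` at fixed dimensionality `d`, then varying `d` at fixed `n`.

```python
import random
import time
from statistics import fmean


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [fmean(col) for col in zip(*members)]
    return labels, centroids


def synthetic_data(n, d, seed=0):
    """Hypothetical stand-in for a real data set: n records, d numeric attributes."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


def turnaround_time(n, d, k=3, repeats=3):
    """Average wall-clock seconds of one k-means run on an n x d data set."""
    data = synthetic_data(n, d)
    times = []
    for r in range(repeats):
        start = time.perf_counter()
        kmeans(data, k, seed=r)
        times.append(time.perf_counter() - start)
    return fmean(times)


if __name__ == "__main__":
    # Vary cardinality (n) while dimensionality stays fixed at d = 4.
    for n in (200, 400, 800):
        print(f"n={n:4d}, d=4  -> {turnaround_time(n, 4):.4f} s")
    # Vary dimensionality (d) while cardinality stays fixed at n = 400.
    for d in (2, 8, 32):
        print(f"n=400, d={d:2d} -> {turnaround_time(400, d):.4f} s")
```

Absolute timings depend on the machine. In this particular implementation, each assignment step does work roughly proportional to n·k·d, so its running time should grow with both cardinality and dimensionality. This sketch says nothing about how the other eight algorithms in the study behave.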