Article details | |
Publication year | 2018 |
Length of the English article | 10 pages |
Publisher | Elsevier |
Article type | ISI |
Title | Empirical Analysis of Data Clustering Algorithms |
Related fields | Computer engineering, information technology |
Related specializations | Algorithms and computation |
Journal | Procedia Computer Science |
Affiliation | Dept. of Computer Engineering & IT, VJTI, Mumbai, India |
Keywords | Clustering algorithms; Community structure; Unsupervised learning |
Product code | E7602 |
Excerpt from the article text: |
1. Introduction
As the digital transformation of society gathers pace, technologies that simplify the efficient recording of data are proliferating. Low-cost sensors, RFID tags, and Internet-enabled point-of-sale terminals are examples of data-capturing devices that have entered everyday life. The easy availability of such devices, and the simplification of operations they bring, has generated repositories of data that previously did not exist. Today, voluminous amounts of data are generated, processed, and stored every second in areas such as social networks, sensor networks, and cloud storage. This has boosted the fields of machine [...]. Even though such a volume provides huge opportunities to academia and industry, it also poses problems for efficient analysis and retrieval [1].
To mitigate the exponential time and space needed for such operations, data is compacted into meaningful summaries through Exploratory Data Analysis (E.D.A.), which eliminates the need for storing the data. In the unsupervised learning literature, such summaries are equivalent to "clusters". E.D.A. helps in visualization and promotes better understanding of the data. It uses methods at the intersection of machine learning, pattern recognition, and information retrieval, and cluster analysis is its main task. A cluster in a data set is defined objectively using dissimilarity measures such as edit distance, density in a Euclidean or non-Euclidean data space, distance calculated using Minkowski measures, proximity measures, or probability distributions. All measures concur that a threshold value should be set for grouping objects into a cluster: objects that exceed the threshold are dissimilar and should be separated from the cluster. Clustering gives a better representation of the data, since all objects within a cluster have less variability in their attributes and can therefore be summarized efficiently.
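The threshold idea described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's method: the function names `minkowski` and `threshold_clusters`, the sample points, and the threshold of 1.0 are all assumptions, and comparing each point to the first member ("leader") of every cluster is just one simple grouping rule among many.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def threshold_clusters(points: Sequence[Point], threshold: float,
                       p: float = 2.0) -> List[List[Point]]:
    """Hypothetical leader-style grouping.

    A point joins the first cluster whose leader (its first member) lies
    within `threshold`; otherwise it starts a new cluster of its own.
    """
    clusters: List[List[Point]] = []
    for pt in points:
        for cluster in clusters:
            if minkowski(pt, cluster[0], p) <= threshold:
                cluster.append(pt)
                break
        else:
            # No existing leader is close enough: the point is "dissimilar".
            clusters.append([pt])
    return clusters


# Made-up sample data: two visually separate groups.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.0, 5.5), (0.0, 0.4)]
groups = threshold_clusters(points, threshold=1.0)
for i, g in enumerate(groups):
    print(i, g)
```

With these sample points the three points near the origin end up in one group and the two points near (5, 5) in another. The result depends on both the threshold and the order in which points are visited, which is one reason such a rule is a heuristic rather than a definition of the "correct" clustering.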
Clustering has also found applications in tasks such as estimating missing values in data and identifying outliers. It is thus a meta-learning approach for gaining insight into data in diverse domains such as market research, e-commerce, social network analysis, and aggregation of search results, amongst others. Multiple algorithms exist for organizing data into clusters; however, there is no universal solution to all problems. No consensus exists on the "best" algorithm, as each is designed with certain assumptions and has its own biases. These algorithms can be grouped into methodologies such as partitioning-based, hierarchical, density-based, grid-based, message-passing-based, neural-network-based, probabilistic, and generative-model-based. In terms of complexity, however, clustering is an NP-hard grouping problem, so existing algorithms rely on approximation techniques or heuristics to reduce the search space in the search for a near-optimal solution. There are no universally agreed objective criteria for correctness or clustering validity, and each of these algorithms has its own drawbacks and successes in solving the challenging problem of unsupervised clustering [3] [4].
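To illustrate why heuristic methods come with no guarantee of optimality, the sketch below implements Lloyd's iteration for k-means, a well-known partitioning-based heuristic. It is chosen here purely as an example and is not claimed to be one of the algorithms evaluated in the paper. The helper names (`sq_dist`, `mean`, `lloyd_kmeans`, `sse`), the four sample points, and both sets of starting centroids are assumptions made for the demonstration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def lloyd_kmeans(points: Sequence[Point], centroids: Sequence[Point],
                 max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's heuristic: alternate assignment and centroid update."""
    centroids = list(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the heuristic has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return centroids, labels


def sse(points: Sequence[Point], centroids: Sequence[Point],
        labels: Sequence[int]) -> float:
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Four corners of a wide rectangle (made-up data).
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_c, good_l = lloyd_kmeans(data, [(0.0, 0.0), (4.0, 1.0)])
bad_c, bad_l = lloyd_kmeans(data, [(2.0, 0.0), (2.0, 1.0)])

print("good start:", good_l, sse(data, good_c, good_l))
print("bad start: ", bad_l, sse(data, bad_c, bad_l))
```

Both runs converge, but the second start settles on a split into top and bottom rows whose squared error is much larger than the left/right split found from the first start. The heuristic narrows the search space at the cost of possibly stopping at a local optimum, which is consistent with the point above that no single algorithm is universally best.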