Free English Article on Hadoop MapReduce Performance on SSDs for Analyzing Social Networks – Elsevier 2018

 

Article Specifications
Translated article title: Hadoop MapReduce Performance on SSDs for Analyzing Social Networks
English article title: Hadoop MapReduce Performance on SSDs for Analyzing Social Networks
Year of publication: 2018
Number of pages (English article): 26 pages
Download cost: the English article is free to download.
Journal database: Elsevier
Article type: Research Article
Base article: this article is not a base article
Indexed in: Scopus – Master Journal List
Publication type: ISI
English article format: PDF
Impact Factor (IF): 7.184 (2017)
H-index: 12 (2019)
SJR: 0.757 (2017)
ISSN: 2214-5796
Quartile: Q1 (2017)
Related fields: Information Technology Engineering, Computer Engineering
Related specializations: Internet and Wide Area Networks, Software Engineering, Computer Networks, Computer Systems Architecture
Presentation type: Journal
Journal: Big Data Research
University: University of Thessaly, Department of Electrical & Computer Engineering, Volos, Greece
Keywords: MapReduce, Hadoop, Solid state disks, Magnetic disks, Social networks, Big data
English keywords: MapReduce, Hadoop, Solid state disks, Magnetic disks, Social networks, Big data
DOI: https://doi.org/10.1016/j.bdr.2017.06.001
Product code: E11089
Translation status: no ready-made translation of this article is available; it can be ordered via the button below.
Free article download: download the free English article
Order a translation of this article

 

Article table of contents:
Abstract

1- Introduction

2- Related work

3- Hadoop structure

4- Investigated algorithms

5- Experimental environment and results

6- Conclusions

References

Excerpt from the article:

Abstract

The advent of Solid State Drives (SSDs) stimulated a lot of research to investigate and exploit to the extent possible the potentials of the new drive. The focus of this work is on the investigation of the relative performance and benefits of SSDs versus hard disk drives (HDDs) when they are used as underlying storage for Hadoop’s MapReduce. In particular, we depart from all earlier relevant works in that we do not use their workloads, but examine MapReduce tasks and data suitable for performing analysis of complex networks which present different execution patterns. Despite the plethora of algorithms and implementations for complex network analysis, we carefully selected our “benchmarking methods” so that they include methods that perform both local and network-wide operations in a complex network, and also they are generic enough in the sense that they can be used as primitives for more sophisticated network processing applications. We evaluated the performance of SSDs and HDDs by executing these algorithms on real social network data and excluding the effects of network bandwidth which can severely bias the results. The obtained results confirmed in part earlier studies which showed that SSDs are beneficial to Hadoop. However, we also provided solid evidence that the processing pattern of the running application has a significant role, and thus future studies must not blindly add SSDs to Hadoop, but they should build components for assessing the type of processing pattern of the application and then direct the data to the appropriate storage medium.
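The abstract's final recommendation, assessing an application's processing pattern and then directing its data to the appropriate storage medium, can be prototyped on top of HDFS heterogeneous storage. The Java sketch below is not taken from the paper: it assumes a Hadoop 2.8+ cluster whose DataNode data directories are tagged with [SSD] and [DISK] storage types, and the StoragePolicyRouter class, the policyFor helper, and the command-line arguments are hypothetical. The ALL_SSD and HOT policy names are standard HDFS storage policies.

// Minimal sketch (assumption: Hadoop 2.8+ with heterogeneous storage configured).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoragePolicyRouter {

    // Hypothetical helper: map a coarse processing-pattern label to a storage policy.
    static String policyFor(String accessPattern) {
        // Random-access workloads benefit most from SSDs; large sequential
        // scans remain well served by HDDs ("HOT" keeps all replicas on DISK).
        return "random".equals(accessPattern) ? "ALL_SSD" : "HOT";
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path input = new Path(args[0]);   // e.g. an HDFS directory holding the graph data
        String pattern = args[1];         // "random" or "sequential"

        // New blocks follow the policy immediately; existing blocks move only
        // after the HDFS mover tool is run.
        fs.setStoragePolicy(input, policyFor(pattern));
        System.out.println("Applied policy " + policyFor(pattern) + " to " + input);
    }
}

An administrator could apply the same policy with the hdfs storagepolicies command-line tool; the programmatic form only illustrates where a pattern-aware routing component could hook in.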

Introduction

A complex network is a graph with topological features such as scale-free properties, the existence of communities, hubs, and so on, that is used to model real systems, for example technological networks (Web, Internet, power grid, online social networks), biological networks (gene, protein), and social networks [23]. The analysis of online social networks (OSNs) such as Facebook, Twitter, and Instagram has received significant attention because all these networks store and process colossal volumes of data, mainly in the form of pair-wise interactions, thus giving birth to networks, i.e., graphs which record persons’ interactions and whose analysis and mining offer both operational and business advantages to the OSN owner. Modern OSNs are comprised of millions of nodes and even billions of edges; therefore any algorithm for their analysis that relies on a single machine (centralized) – exploiting solely the machine’s main memory and/or its disk – is eventually doomed to fail due to lack of resources. Thus, the digitization of the aforementioned relationships produces a vast amount of collected data, i.e., big data [9], requiring extreme processing power that only distributed computing can offer. However, developing a distributed solution is a challenging task because it must sometimes deal with sequential processes. Distributed analysis algorithms that can run only on a small cluster of machines are still insufficient, since modern OSNs are maintained by Internet giants such as Google, LinkedIn and Facebook, who own huge datacenters and operate clusters of several thousand machines. These clusters are usually programmed with data-parallel frameworks of the MapReduce type [4], a big data analytics platform. The Hadoop [29] middleware was designed to solve problems where the “same, repeated processing” had to be applied to peta-scale volumes of data. Hadoop’s initial design was based on the magnetic disk’s characteristics, enforcing sequential read and write operations and introducing its own distributed file system (HDFS – Hadoop Distributed File System) with blocks of large size. Recently, with the advent of faster Solid State Drives (SSDs), research is emerging to test and possibly exploit the potential of the new, technologically advanced drive [11, 12, 21, 33]. The lack of seek overhead gives SSDs a significant advantage over Hard Disk Drives (HDDs) for workloads whose processing requires random access instead of sequential access.
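To make the kind of MapReduce workload discussed above concrete, the following Java sketch computes vertex degrees from a whitespace-separated edge list stored in HDFS. It is an illustrative, generic graph primitive in the spirit of the paper’s benchmarking methods, not one of the paper’s actual implementations; the DegreeCount class name and the input/output paths are hypothetical.

// Minimal sketch: degree counting over an edge list ("u v" per line) with Hadoop MapReduce.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DegreeCount {

    public static class EdgeMapper extends Mapper<Object, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text vertex = new Text();

        @Override
        protected void map(Object key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] endpoints = line.toString().trim().split("\\s+");
            if (endpoints.length < 2) {
                return; // skip blank or malformed lines
            }
            // Each edge "u v" contributes one degree to both of its endpoints.
            vertex.set(endpoints[0]);
            ctx.write(vertex, ONE);
            vertex.set(endpoints[1]);
            ctx.write(vertex, ONE);
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text vertex, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long degree = 0;
            for (LongWritable c : counts) {
                degree += c.get();
            }
            ctx.write(vertex, new LongWritable(degree));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "degree-count");
        job.setJarByClass(DegreeCount.class);
        job.setMapperClass(EdgeMapper.class);
        job.setCombinerClass(SumReducer.class); // combiner reduces shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /data/social-graph/edges
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /data/social-graph/degrees
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Even a primitive this simple reads its input blocks from, spills intermediate map output to, and writes its results back through the nodes’ local drives, which is why the choice between HDDs and SSDs matters for the workloads the paper studies.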
