Free English article: A Tool for Statistical Analysis on Network Big Data

Article details
Title	A Tool for Statistical Analysis on Network Big Data
Published	2017
Number of pages (English article)	5 pages
Cost	The English article is free to download.
Database	IEEE
Base article	This article is not a base article.
English article format	PDF
Related fields	Information Technology Engineering
Related specializations	Information Systems Management
Journal / conference	International Workshop on Database and Expert Systems Applications
Affiliation	C. Ordonez (current affiliation): University of Houston, USA
Digital identifier (DOI)	https://doi.org/10.1109/DEXA.2017.23
Product code	E10391

Table of contents:

Abstract
I INTRODUCTION
II RELATED WORK
III SYSTEM DESCRIPTION
IV STATISTICAL ANALYSIS ON NETWORK BIG DATA
V CONCLUSIONS
REFERENCES

Excerpt from the article:

Abstract

Due to advances in parallel file systems for big data (e.g., HDFS) and larger-capacity hardware (multicore CPUs, large RAM), it is now feasible to manage and query network data in a parallel DBMS supporting SQL, but performing statistical analysis remains a challenge. On the statistics side, the R language is popular, but it has important limitations: R is limited by main memory, R works in a different address space from query processing, R cannot analyze large disk-resident data sets efficiently, and R has no data management capabilities. Moreover, some R libraries allow R to work in parallel, but still without data management capabilities. Considering these challenges and limitations, we present a system that combines SQL queries and R functions in a seamless manner. We argue that a parallel DBMS and the R runtime are two different systems that benefit from low-level integration. Our parallel DBMS is built on top of HDFS and programmed in Java and C++, with a flexible scale-out architecture, whereas R is programmed purely in C. The user or developer can make calls in both directions: (1) R calling SQL, to evaluate analytic queries or retrieve data from materialized views (transferring result tables in RAM in a streaming fashion and analyzing them in R), and, vice versa, (2) SQL calling R, allowing SQL to convert relational tables to matrices or vectors and perform complex computations on them. We give a summary of network monitoring tasks at AT&T and present specific programming examples showing language calls in both directions (i.e., R calls SQL and SQL calls R).

INTRODUCTION

Big data is characterized by the 3 Vs: volume, variety and velocity of data, where analyzing data is a central goal. It is fair to say that managing and analyzing network data is more difficult than other big data problems due to its streaming velocity, higher volume and format variety; that is, each of its three Vs is more demanding. Big data analytics is notoriously difficult, and the problem becomes orders of magnitude harder with network big data because of its higher volume, streaming behavior and formats that vary over time. In this paper, we study how to perform statistical processing on a network database [4], integrating diverse data streams (not packet-level data, but network data summaries over time).

Computer science “systems” research has proposed systems with optimized storage [10] for specialized processing based on rows, columns, and arrays [9]. The most common targets include transactions, queries, detecting patterns and computing mathematical models. In our work we focus on the last one. Streams represent a further challenge, where processing is pushed to main memory, with algorithms working in one pass. On the data mining side, there is a large body of research proposing algorithms for large data sets, but working mostly on flat files, outside a DBMS.

However, integrating statistical systems, like R, with a database system is still a challenge. R is one of the most popular open-source systems for statistical analysis due to its simple but powerful functional language, extensive mathematical library, and interpreted runtime. Unfortunately, as noted in the literature, even though every vendor offers some integration between R and the DBMS, R remains difficult to use and slow to analyze high-velocity streams. From a practical perspective, SQL remains the standard query language for database systems, but it is difficult to predict which language will become the standard for big data analytics, although R has a proven track record.

With that motivation in mind, we introduce STAR, a system to analyze network data that integrates the R runtime with a parallel DBMS for big data supporting standard SQL queries and materialized views. Unlike other R tools and prototypes, STAR can directly process relational tables, truly performing “in-database” analytics. We emphasize that STAR enables analytics in both directions, closing the analytic loop: (a) an R program can call SQL queries; (b) an SQL query can call R functions.
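
The two call directions described above can be illustrated in miniature. The excerpt does not show STAR's actual API, so the following is only an analogy under stated assumptions: Python stands in for the R runtime, an in-memory SQLite database stands in for the parallel DBMS, and the table `link_load`, the function `streaming_mean_var`, and the aggregate `py_median` are hypothetical names chosen for illustration. Direction (a) consumes query results row by row with a one-pass algorithm (Welford's method), and direction (b) registers a statistical function that a SQL query can call.

```python
import sqlite3
import statistics

# Assumption: SQLite plays the role of the DBMS, Python the role of R.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link_load (router TEXT, minute INTEGER, mbps REAL)")
conn.executemany(
    "INSERT INTO link_load VALUES (?, ?, ?)",
    [("r1", 0, 10.0), ("r1", 1, 14.0), ("r1", 2, 12.0),
     ("r2", 0, 100.0), ("r2", 1, 300.0), ("r2", 2, 200.0)],
)


def streaming_mean_var(rows):
    """One-pass (Welford) mean and sample variance over single-column rows."""
    n, mean, m2 = 0, 0.0, 0.0
    for (x,) in rows:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance


# Direction (a): the analysis program calls SQL and consumes the result
# as a stream; the cursor yields rows one at a time, nothing is buffered here.
cursor = conn.execute("SELECT mbps FROM link_load WHERE router = 'r1'")
r1_mean, r1_var = streaming_mean_var(cursor)


class Median:
    """Hypothetical aggregate: collects a column into a vector, then reduces it."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.median(self.values) if self.values else None


# Direction (b): SQL calls the analysis function once it is registered.
conn.create_aggregate("py_median", 1, Median)
medians = dict(
    conn.execute("SELECT router, py_median(mbps) FROM link_load GROUP BY router")
)

print(r1_mean, r1_var, medians)
```

In this sketch the aggregate gathers each group's values into a list before computing the median, which loosely mirrors the idea of converting a relational column into a vector for the statistical runtime; a real integration would need to address memory limits and parallelism, which this toy example ignores.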
