Free English article on SemLinker: big data automation for users – Springer 2018


Article details
Publication year: 2018
Length of the English article: 26 pages
Download cost: the English article is free.
Published by: Springer
Article type: ISI
English title: SemLinker: automating big data integration for casual users
Persian title: SemLinker: اتوماسیون یکپارچگی کلان داده ها برای کاربران تصادفی
Format of the English article: PDF
Related fields: Computer Science, Information Technology
Related specializations: Data mining
Journal: Journal of Big Data
University: School of Computer Science and Informatics – Cardiff University – UK
Keywords: Data integration, Big data, Data lake, Modeling, Schema evolution, Schema mapping, Metadata management
Product code: E6445
Translation status: no ready-made translation of this article is available; one can be ordered.

 

Excerpt from the article:
Introduction

Big data is growing rapidly from an increasing plurality of sources, ranging from machine-generated content such as purchase transactions and sensor streams, to human-generated content such as social media and product reviews. Although much of these data are accessible online, their integration is inherently a complex task, and, in most cases, is not performed fully automatically but through manual interactions [1, 2]. Typically, data must go through a process called ETL (Extract, Transform, Load) [3] where they are extracted from their sources, cleaned, transformed, and mapped to a common data model before they are loaded into a central repository, integrated with other data, and made available for analysis. Recently the concept of a data lake [4], a flat repository framework that holds a vast amount of raw data in their native formats including structured, semi-structured, and unstructured data, has emerged in the data management field. Compared with the monolithic view of a single data model emphasized by the ETL process, a data lake is a more dynamic environment that relaxes data capturing constraints and defers data modeling and integration requirements to a later stage in the data lifecycle, resulting in an almost unlimited potential for ingesting and storing various types of data despite their sources and frequently changing schemas, which are often not known in advance [5]. In one of our earlier papers [6], we propose personal data lake (PDL), an exemplar of this flexible and agile storage solution. PDL ingests raw personal data scattered across a multitude of remote data sources and stores them in a unified repository regardless of their formats and structures. Although a data lake like PDL, to some extent, contributes towards solving the big data variety challenge, data integration remains an open problem.
PDL allows its users to ingest raw data instances directly from the data sources, but the data extraction and integration workflow, without predefined schemas or machine-readable semantics to describe the data, is not straightforward. Often the user has to study the documentation of each data source to enable suitable integration [7]. An enterprise data lake system built with Hadoop [8] would rely on professionals and experts playing active roles in the data integration workflow. PDL, however, is designed for ordinary people, and has no highly trained and skilled IT personnel to physically manage its contents. To this end, equipping PDL with an efficient and easy-to-use data integration solution is essential for casual users and allows them to process, query, and analyze their data, and to gain insights for supporting their decision-making [9].
