Article Specifications
Article Title | Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study
Publication Year | 2018
Number of Pages (English) | 19 pages
Cost | The English article is free to download.
Database | Elsevier
Article Type | Research Article
Base Paper | This is a base paper.
Index | Scopus – Master Journal List
Paper Type | ISI
English Article Format |
Impact Factor (IF) | 7.184 (2017)
H-index | 12 (2019)
SJR | 0.757 (2017)
ISSN | 2214-5796
Quartile | Q1 (2017)
Related Fields | Management, Information Technology Engineering
Related Specializations | Information Technology Management, Information Systems Management, Knowledge Management
Presentation Type | Journal
Journal | Big Data Research
Affiliation | School of Computing, Newcastle University, Newcastle upon Tyne, UK
Keywords | Re-computation, Knowledge decay, Big data analysis, Genomics
DOI | https://doi.org/10.1016/j.bdr.2018.06.001
Product Code | E11083
Translation Status | A ready-made translation of this article is not available. You can order one via the button below.
Free Download | Download the English article for free
Order Translation | Order a translation of this article
Table of Contents:
Abstract
1- Introduction
2- A generic meta-process for selective re-computation
3- Related work
4- Experimental setting and blind re-computation baseline
5- Data differences
6- Differential execution
7- Partial re-execution
8- Identifying the scope of change
9- A blueprint for a generic and automated re-computation framework – challenges
10- Conclusions and future work
References
Excerpt from the Article:
Abstract

The value of knowledge assets generated by analytics processes using Data Science techniques tends to decay over time, as a consequence of changes in the elements the process depends on: external data sources, libraries, and system dependencies. For large-scale problems, refreshing those outcomes through greedy re-computation is both expensive and inefficient, as some changes have limited impact. In this paper we address the problem of refreshing past process outcomes selectively, that is, by trying to identify the subset of outcomes that will have been affected by a change, and by only re-executing fragments of the original process. We propose a technical approach to the selective re-computation problem that combines multiple techniques, and present an extensive experimental study in Genomics, namely variant calling and its clinical interpretation, to show its effectiveness. In this case study, we are able to decrease the number of required re-computations on a cohort of individuals from 495 (blind) down to 71, and to reduce runtime by at least 60% relative to the naïve blind approach, and in some cases by 90%. Starting from this experience, we then propose a blueprint for a generic re-computation meta-process that makes use of process history metadata to make informed decisions about selective re-computation in reaction to a variety of changes in the data.

Introduction

In Data Science applications, the insights generated by resource-intensive data analytics processes may become outdated as a consequence of changes in any of the elements involved in the process. Changes that cause such instability include updates to reference data sources and software libraries, changes to system dependencies, and changes to the structure of the process itself. We address the problem of efficiently restoring the currency of analytics outcomes in the presence of instability. This involves a trade-off between the recurring cost of process update and re-execution in the presence of changes, on one side, and the diminishing value of its obsolete outcomes, on the other. Addressing the problem therefore requires knowledge of the impact of a change, that is, of the extent to which the change invalidates the analysis, as well as of the cost involved in upgrading the process and running the analysis again. Additionally, it may be possible to optimise the re-analysis given prior outcomes and detailed knowledge of, and control over, the analysis process.
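To make the selection step concrete, the following minimal Python sketch is our own illustration of the idea, not the authors' code: the names `PastRun`, `diff`, and `select_for_recomputation`, and the toy variant-annotation database, are all hypothetical. It shows how diffing two versions of a reference data source can narrow blind re-computation of a whole cohort down to only the runs whose inputs actually overlap with the change.

```python
# A minimal sketch of selective re-computation, assuming a key-value
# reference database and per-run records of which entries were consumed.
# All names and data below are hypothetical illustrations.

from dataclasses import dataclass


@dataclass
class PastRun:
    run_id: str
    records_used: set  # reference-DB entries this analysis consumed


def diff(old_db: dict, new_db: dict) -> set:
    """Keys added, removed, or revised between two DB versions."""
    keys = old_db.keys() | new_db.keys()
    return {k for k in keys if old_db.get(k) != new_db.get(k)}


def select_for_recomputation(history, old_db, new_db):
    """Blind re-computation would re-run every past analysis; instead,
    keep only the runs whose consumed records intersect the change set."""
    changed = diff(old_db, new_db)
    return [run for run in history if run.records_used & changed]


if __name__ == "__main__":
    old_db = {"varA": "benign", "varB": "pathogenic", "varC": "VUS"}
    new_db = {"varA": "benign", "varB": "benign", "varC": "VUS"}  # varB revised

    history = [
        PastRun("patient-1", {"varA"}),          # unaffected: skip
        PastRun("patient-2", {"varB", "varC"}),  # affected: re-execute
    ]

    for run in select_for_recomputation(history, old_db, new_db):
        print(f"re-execute {run.run_id}")  # -> re-execute patient-2
```

In the paper's terms, the diff stands in for an estimate of a change's impact, and the intersection test is the kind of filter that lets 495 blind re-runs shrink to the 71 actually affected ones.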