Article Specifications
Article Title | Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study
Publication Year | 2018
Number of Pages (English) | 19 pages
Cost | The English article is free to download.
Database | Elsevier
Article Type | Research Article
Base Paper | This is a base paper.
Index | Scopus – Master Journal List
Paper Type | ISI
English Article Format |
Impact Factor (IF) | 7.184 (2017)
H-index | 12 (2019)
SJR | 0.757 (2017)
ISSN | 2214-5796
Quartile | Q1 (2017)
Related Fields | Management, Information Technology Engineering
Related Specializations | Information Technology Management, Information Systems Management, Knowledge Management
Presentation Type | Journal
Journal | Big Data Research
Affiliation | School of Computing, Newcastle University, Newcastle upon Tyne, UK
Keywords | Re-computation, Knowledge decay, Big data analysis, Genomics
DOI | https://doi.org/10.1016/j.bdr.2018.06.001
Product Code | E11083
Translation Status | A ready-made translation of this article is not available. You can order one via the button below.
Free Download | Download the English article for free
Order Translation | Order a translation of this article
Table of Contents:
Abstract
1- Introduction
2- A generic meta-process for selective re-computation
3- Related work
4- Experimental setting and blind re-computation baseline
5- Data differences
6- Differential execution
7- Partial re-execution
8- Identifying the scope of change
9- A blueprint for a generic and automated re-computation framework – challenges
10- Conclusions and future work
References
Excerpt from the Article:
Abstract

The value of knowledge assets generated by analytics processes using Data Science techniques tends to decay over time, as a consequence of changes in the elements the process depends on: external data sources, libraries, and system dependencies. For large-scale problems, refreshing those outcomes through greedy re-computation is both expensive and inefficient, as some changes have limited impact. In this paper we address the problem of refreshing past process outcomes selectively, that is, by trying to identify the subset of outcomes that will have been affected by a change, and by only re-executing fragments of the original process. We propose a technical approach to the selective re-computation problem that combines multiple techniques, and present an extensive experimental study in Genomics, namely variant calling and its clinical interpretation, to show its effectiveness. In this case study, we are able to decrease the number of required re-computations on a cohort of individuals from 495 (blind) down to 71, and to reduce runtime by at least 60% relative to the naïve blind approach, and in some cases by 90%. Starting from this experience, we then propose a blueprint for a generic re-computation meta-process that makes use of process history metadata to make informed decisions about selective re-computation in reaction to a variety of changes in the data.

Introduction

In Data Science applications, the insights generated by resource-intensive data analytics processes may become outdated as a consequence of changes in any of the elements involved in the process. Changes that cause such instability include updates to reference data sources and software libraries, changes to system dependencies, and changes to the structure of the process itself. We address the problem of efficiently restoring the currency of analytics outcomes in the presence of instability. This involves a trade-off between the recurring cost of process update and re-execution in the presence of changes, on one side, and the diminishing value of its obsolete outcomes, on the other. Addressing the problem therefore requires knowledge of the impact of a change, that is, of the extent to which the change invalidates the analysis, as well as of the cost involved in upgrading the process and running the analysis again. Additionally, it may be possible to optimise the re-analysis given prior outcomes and detailed knowledge of, and control over, the analysis process.
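To make the selection step concrete, the following minimal Python sketch is our own illustration of the idea, not the authors' code: the names `PastRun`, `diff`, and `select_for_recomputation`, and the toy variant-annotation database, are all hypothetical. It shows how diffing two versions of a reference data source can narrow blind re-computation of a whole cohort down to only the runs whose inputs actually overlap with the change.

```python
# A minimal sketch of selective re-computation, assuming a key-value
# reference database and per-run records of which entries were consumed.
# All names and data below are hypothetical illustrations.

from dataclasses import dataclass


@dataclass
class PastRun:
    run_id: str
    records_used: set  # reference-DB entries this analysis consumed


def diff(old_db: dict, new_db: dict) -> set:
    """Keys added, removed, or revised between two DB versions."""
    keys = old_db.keys() | new_db.keys()
    return {k for k in keys if old_db.get(k) != new_db.get(k)}


def select_for_recomputation(history, old_db, new_db):
    """Blind re-computation would re-run every past analysis; instead,
    keep only the runs whose consumed records intersect the change set."""
    changed = diff(old_db, new_db)
    return [run for run in history if run.records_used & changed]


if __name__ == "__main__":
    old_db = {"varA": "benign", "varB": "pathogenic", "varC": "VUS"}
    new_db = {"varA": "benign", "varB": "benign", "varC": "VUS"}  # varB revised

    history = [
        PastRun("patient-1", {"varA"}),          # unaffected: skip
        PastRun("patient-2", {"varB", "varC"}),  # affected: re-execute
    ]

    for run in select_for_recomputation(history, old_db, new_db):
        print(f"re-execute {run.run_id}")  # -> re-execute patient-2
```

In the paper's terms, the diff stands in for an estimate of a change's impact, and the intersection test is the kind of filter that lets 495 blind re-runs shrink to the 71 actually affected ones.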