Free English article on metadata management for scientific databases – Elsevier 2019

 

Article details
Article title: Metadata management for scientific databases
Publication year: 2019
Length of the English article: 20 pages
Download cost of the English article: Free
Publisher database: Elsevier
Article type: Research Article
Base article: No
Indexing: Scopus – Master Journals List – JCR
Article classification: ISI
Format of the English article: PDF
Impact factor (IF): 3.176 (2018)
H-index: 76 (2019)
SJR: 0.779 (2018)
ISSN: 0306-4379
Quartile: Q1 (2018)
Conceptual model: None
Questionnaire: None
Variables: None
References: Included
Related fields: Information technology engineering, computer engineering
Related specializations: Information systems management
Publication type: Journal
Journal: Information Systems
University: Politecnico di Milano, Italy
Keywords: Metadata management, Scientific databases, Query optimization
DOI: https://doi.org/10.1016/j.is.2018.10.002
Product code: E13234
Translation status: No ready-made translation of this article is available.

 

Table of contents:
Abstract

1- Introduction and motivation

2- Scientific data model

3- Scientific query language

4- Optimization of ScQL queries

5- Applicability of the approach

6- Related work

7- Conclusions

References

 

Excerpt from the article:

Abstract

Most scientific databases consist of datasets (or sources) which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata describing the process that leads to building them (e.g., the experimental conditions used during sample generation). Metadata are typically used in scientific computations just for the initial data selection; at most, metadata about query results are recovered after executing the query and associated with its results by post-processing. In this way, a large body of information that could be relevant for interpreting query results goes unused during query processing.
In this paper, we present ScQL, a new algebraic relational language whose operations apply to objects consisting of data–metadata pairs, preserving this one-to-one correspondence throughout the computation. We formally define each operation and we describe an optimization, called meta-first, that may significantly reduce the query processing overhead by anticipating the use of metadata for selectively loading into the execution environment only those input samples that contribute to the result samples.
In ScQL, metadata have the same relevance as data, and contribute to building query results; in this way, the resulting samples are systematically associated with metadata about either the specific input samples involved or about query processing, thereby yielding a new form of metadata provenance. We present many examples of use of ScQL, relative to several application domains, and we demonstrate the effectiveness of the meta-first optimization.
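The idea behind meta-first, as the abstract describes it, can be illustrated with a minimal Python sketch. This is not ScQL and not the authors' implementation: `Sample`, `make_sample`, `select_data_first`, `select_meta_first`, and the `assay` metadata attribute are hypothetical names chosen for illustration. The sketch assumes that the metadata of a sample is cheap to read, while its data part is expensive to load.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration only: none of these names come from ScQL.
load_log: List[str] = []  # records which samples had their data part loaded


@dataclass
class Sample:
    sample_id: str
    metadata: Dict[str, str]            # small, cheap to inspect
    load_data: Callable[[], List[int]]  # deferred, expensive read of the data part


def make_sample(sample_id: str, metadata: Dict[str, str], values: List[int]) -> Sample:
    def load_data() -> List[int]:
        load_log.append(sample_id)      # simulate the cost of loading the data
        return list(values)
    return Sample(sample_id, metadata, load_data)


Result = Tuple[str, Dict[str, str], List[int]]


def select_data_first(samples: List[Sample], pred) -> List[Result]:
    """Baseline: load every sample, then filter on metadata."""
    loaded = [(s, s.load_data()) for s in samples]
    return [(s.sample_id, s.metadata, d) for s, d in loaded if pred(s.metadata)]


def select_meta_first(samples: List[Sample], pred) -> List[Result]:
    """Evaluate the metadata predicate first; load data only for matches."""
    return [(s.sample_id, s.metadata, s.load_data())
            for s in samples if pred(s.metadata)]


samples = [
    make_sample("s1", {"assay": "ChIP-seq"}, [1, 2, 3]),
    make_sample("s2", {"assay": "RNA-seq"}, [4, 5]),
    make_sample("s3", {"assay": "ChIP-seq"}, [6]),
]


def is_chip(metadata: Dict[str, str]) -> bool:
    return metadata["assay"] == "ChIP-seq"
```

Both functions return the same data–metadata pairs, but the second one never loads the data of a sample whose metadata already exclude it from the result. The paper's optimization covers full ScQL queries; this sketch covers only a single metadata selection.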

Introduction and motivation

Scientific databases are organized in very different ways. In many scientific fields, such as biology and astronomy, big consortia produce large, well-organized data repositories for public use. In other contexts, such as public administrations, data are open but much less organized and much more dispersed. Other big data players, such as Internet companies or mobile phone operators, produce information mostly for internal use, but often support third parties in research studies (e.g., about consumers’ interests) by providing them with services for data retrieval.
We abstract a scientific data source as a container of several datasets, each of which in turn consists of thousands of samples, one for each experimental condition, often stored as files and not within a database; typically, samples are described by metadata, i.e., descriptive information about the content and production process of each sample. In meteorology, typical metadata describe “the WDM station, the sources of meteorological data, and the period of record for which the data is available”; the samples then describe millions of records registered at the station. In genomics, typical metadata describe “the technology used for DNA sequencing, the process of DNA preparation, the genotype and phenotype of the donor”; the samples then describe millions of genomic regions collected during the experiment. Metadata support the selection of the relevant experimental data by means of user interfaces (e.g., see genomic repositories such as ENCODE (the Encyclopedia of DNA Elements, [1]) or TCGA (The Cancer Genome Atlas, [2])). When a source exposes APIs or web interfaces, metadata associated with each sample (such as Twitter’s hashtags or timestamps) support data retrieval.
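The source, dataset, and sample abstraction described above can be sketched as a small set of Python types. This is only one possible reading of the description, not the paper's formal data model: the class names, the `schema` tuple, and the example attribute values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Sample:
    metadata: Dict[str, str]  # descriptive attribute-value pairs
    records: List[Tuple]      # data part; each record follows the dataset schema


@dataclass
class Dataset:
    name: str
    schema: Tuple[str, ...]   # structure shared by every sample in the dataset
    samples: List[Sample] = field(default_factory=list)

    def add(self, sample: Sample) -> None:
        # Enforce the "identical structure" property in a simplified way:
        # every record must have one value per schema attribute.
        for record in sample.records:
            if len(record) != len(self.schema):
                raise ValueError(f"record {record!r} does not match schema {self.schema!r}")
        self.samples.append(sample)


@dataclass
class Source:
    name: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)


# Hypothetical genomic example: names and values are made up for illustration.
peaks = Dataset("peaks", schema=("chrom", "start", "stop"))
peaks.add(Sample(
    metadata={"technology": "ChIP-seq", "donor_phenotype": "healthy"},
    records=[("chr1", 100, 200), ("chr2", 50, 75)],
))
repository = Source("example_repository", datasets={"peaks": peaks})
```

Keeping the metadata as a separate, small dictionary next to the records is what allows a selection on metadata to run without touching the data part of a sample.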
