Free English article on big data: Some statistical issues – Elsevier 2018
Article details | |
Publication year | 2018 |
Number of pages (English article) | 11 pages |
Cost | The English article is free to download. |
Published in | Elsevier journal |
Article type | ISI |
English title | Big data: Some statistical issues |
Persian title | کلان داده: برخی از مسائل آماری |
Related fields | Statistics |
Journal | Statistics and Probability Letters |
University | Medical Research Council Population Health Research Unit – University of Oxford – UK |
DOI | https://doi.org/10.1016/j.spl.2018.02.015 |
Product code | E8254 |
Excerpt from the article text: |
1 Introduction
Over the last 125 years computational techniques have evolved from slide rules and log tables, through hand-operated machines like the Brunsviga, to electric desk-top machines, and from them to modern computers, at first complex to use and limited in scope and then to the ever-expanding, ubiquitous modern version. The development of statistical technique and theory over that time has mirrored and been strongly influenced by that growth in computer power and availability.
Big data have been around a long time, for example in population censuses. In an engineering context, paper traces recorded such properties as the stress at various points in an aircraft wing during flight. In a manufacturing context, the mass per unit length of textile yarn was recorded. These examples produced very large amounts of data for visual inspection, but in the past these were suitable for quantitative analysis at most on a sampling basis.
Three questions that characterize today's big data are largely absent from these earlier contexts. In outline the questions are: Are the data relevant for the purpose of the investigation? Is the data quality adequate for its intended purpose? Is the detailed statistical analysis appropriate, and in particular is the assessment of the precision of the conclusions seriously overoptimistic? Sometimes the first two aspects may be inverted: the data are available; for what are they useful? We comment on these issues largely, but not entirely, from an epidemiological perspective.
In an epidemiological context, large data sets with many individuals arise from routinely collected medical records, from cohorts assembled with a defined objective, and from registries of patients with specific conditions. Some large population-based studies are of mixed type, in that they are cohorts with a purpose-built baseline data set augmented by linkage to routinely collected records or registries.
Many aspects of study design and analysis are common to large and not-so-large sets of data, but the achievement of high quality in large sets of data may be a particular challenge. A study has a number of conceptual aspects, all of which may have statistical implications. These are: question formulation; choice of study population; study design; metrology; data collection; monitoring and quality control; data analysis; presentation of conclusions; interpretation. When big data are involved, all of these may raise special features. Here we concentrate largely but not entirely on the aspects prior to data analysis. |