Free English article on knowledge-based collection selection for distributed information retrieval (Elsevier)

Article details
Published	2018
Length of the English article	13 pages
Cost	The English article is free to download.
Published in	Elsevier journal
Article type	ISI
English title	Knowledge based collection selection for distributed information retrieval
Persian title	دانش انتخاب مجموعه بر اساس بازیابی اطلاعات توزیع شده
English article format	PDF
Related fields	Computer engineering
Related specializations	Software engineering
Journal	Information Processing & Management
University	College of Computer Science and Technology – Zhejiang University – China
Keywords	collection selection, distributed information retrieval, knowledge base, query expansion
Product code	E5645
Translation status	A ready-made translation of this article is not available.

Excerpt from the article:

1. Introduction

Distributed Information Retrieval (DIR), also known as Federated Search (FS) or Federated IR (FIR), is concerned with aggregating multiple searchable sources of information under a single interface (Crestani & Markov, 2013). DIR consists of four main phases: collection (server/resource) description, collection selection, results merging, and results presentation. Given a query and a set of collection descriptions, collection selection ranks the available collections by their computed scores and then determines which collections to search (Callan, 2002). In a given search, users are often interested in the top-ranked results. However, not all collections contain the information that users need. If a search engine searches only a small number of collections and still achieves results comparable to searching all of them, the efficiency of the retrieval system improves significantly. Collection selection therefore plays an important role in reducing computational overhead and improving retrieval efficiency. Recent years have seen a great deal of work on collection selection, which can be divided according to the mechanism used to describe a collection: dictionary-based methods (Aly, Hiemstra, & Demeester, 2013; Callan, Lu, & Croft, 1995; Gravano & Garcia-Molina, 1995; Xu & Croft, 1999; Yuwono & Lee, 1997) and sampling-based methods (Baillie, Carman, & Crestani, 2011; Kulkarni, Tigelaar, Hiemstra, & Callan, 2012; Mendoza, Marín, Gil-Costa, & Ferrarotti, 2016; Paltoglou, Salampasis, & Satratzemi, 2011; Shokouhi, 2007; Shokouhi, Zobel, Tahaghoghi, & Scholer, 2007; Si & Callan, 2003; Thomas & Shokouhi, 2009; Wauer, Schuster, & Schill, 2011).

Dictionary-based methods use the word statistics of all documents as the collection description, and then apply a scoring function that reflects the similarity between a collection and a query. However, it is infeasible to acquire the word statistics of all collections in an uncooperative distributed information retrieval environment.
Another problem is that a scoring function based on word statistics loses a large amount of semantic information when calculating the collection score, e.g., synonymy, polysemy, and word order. These methods are also less effective when collection sizes are skewed.
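
The dictionary-based approach described above can be illustrated with a minimal sketch. It is not the method of the paper or of any cited work: the function names (`describe`, `score`, `select_collections`), the toy collections, and the relative-frequency scoring function are all assumptions chosen for illustration. Each collection is described by its term counts, collections are ranked by a score computed from those counts, and only the top k are selected for searching.

```python
from collections import Counter


def describe(documents):
    """Build a dictionary-based description: term counts over all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def score(description, query):
    """Hypothetical scoring function: sum of the relative frequencies
    of the query terms in the collection description."""
    total = sum(description.values())
    if total == 0:
        return 0.0
    return sum(description[term] / total for term in query.lower().split())


def select_collections(descriptions, query, k):
    """Rank collections by score and return the names of the top k."""
    ranked = sorted(
        descriptions,
        key=lambda name: score(descriptions[name], query),
        reverse=True,
    )
    return ranked[:k]


# Toy collections (invented for this example).
collections = {
    "autos": ["automobile engine repair", "automobile tyre pressure"],
    "cooking": ["pasta sauce recipe", "bread oven recipe"],
    "travel": ["car rental prices", "train ticket prices"],
}
descriptions = {name: describe(docs) for name, docs in collections.items()}

print(select_collections(descriptions, "car", k=1))
print(score(descriptions["autos"], "car"))
```

In this toy setup the "autos" collection scores zero for the query "car" because its documents use the word "automobile", which is one way a purely word-statistics score can miss synonyms.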
