Article details | |
Published | 2018 article |
Length of the English article | 13 pages |
Published by | Elsevier |
Article type | ISI |
Title | Knowledge based collection selection for distributed information retrieval |
Related fields | Computer engineering |
Related specializations | Software engineering |
Journal | Information Processing & Management |
University | College of Computer Science and Technology, Zhejiang University, China |
Keywords | collection selection, distributed information retrieval, knowledge base, query expansion |
Product code | E5645 |
Excerpt from the article: |
1. Introduction
Distributed Information Retrieval (DIR), also known as Federated Search (FS) or Federated IR (FIR), is concerned with aggregating multiple searchable sources of information under a single interface (Crestani & Markov, 2013). DIR consists of four main phases: collection (server/resource) description, collection selection, results merging, and results presentation. Given a query and a set of collection descriptions, collection selection ranks the available collections by their computed scores and then determines which collections to search (Callan, 2002). In a given search scenario, users are often interested in the top-ranked results, yet not all collections contain the information that users need. If a search engine searches only a small number of collections and achieves effectiveness similar to searching all of them, the efficiency of the retrieval system improves significantly. Collection selection therefore plays an important role in reducing computational overhead and improving retrieval efficiency. Recent years have seen a great deal of work on collection selection, which can be divided according to the mechanism used to describe a collection: dictionary-based methods (Aly, Hiemstra, & Demeester, 2013; Callan, Lu, & Croft, 1995; Gravano & Garcia-Molina, 1995; Xu & Croft, 1999; Yuwono & Lee, 1997) and sampling-based methods (Baillie, Carman, & Crestani, 2011; Kulkarni, Tigelaar, Hiemstra, & Callan, 2012; Mendoza, Marín, Gil-Costa, & Ferrarotti, 2016; Paltoglou, Salampasis, & Satratzemi, 2011; Shokouhi, 2007; Shokouhi, Zobel, Tahaghoghi, & Scholer, 2007; Si & Callan, 2003; Thomas & Shokouhi, 2009; Wauer, Schuster, & Schill, 2011).
Dictionary-based methods use the word statistics of all documents as the collection description, and then apply a scoring function that reflects the similarity between a collection and a query. However, it is infeasible to acquire the word statistics of all collections in an uncooperative distributed information retrieval environment.
Another problem is that a scoring function based on word statistics loses a large amount of semantic information when calculating collection scores, e.g., synonymy, polysemy, and word order. These methods are also less effective when collection sizes are skewed. |
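To make the dictionary-based idea concrete, the following is a minimal sketch and not the method of any cited paper. It assumes a toy setting in which each collection is described by its raw term counts and is scored by the add-one smoothed log-likelihood of the query terms; the function names (`describe`, `score`, `select_collections`) and the sample collections are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def describe(documents):
    """Build a collection description: term -> total count over all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def score(query, description):
    """Log-likelihood of the query under an add-one smoothed unigram model.

    This scoring function is an illustrative assumption, not a published one.
    """
    total = sum(description.values())
    vocab = len(description) + 1  # one extra slot for unseen terms
    return sum(
        math.log((description[term] + 1) / (total + vocab))
        for term in query.lower().split()
    )


def select_collections(query, descriptions, k):
    """Rank collections by score and return the names of the top k."""
    ranked = sorted(
        descriptions,
        key=lambda name: score(query, descriptions[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy collections, used only to exercise the functions above.
collections = {
    "medicine": ["heart disease treatment", "treatment of heart failure"],
    "sports": ["football match results", "tennis match schedule"],
    "cooking": ["pasta recipe with tomato", "bread recipe"],
}
descriptions = {name: describe(docs) for name, docs in collections.items()}

selected = select_collections("heart treatment", descriptions, k=1)
synonym_selected = select_collections("cardiac therapy", descriptions, k=1)
print(selected, synonym_selected)
```

In this sketch the query "heart treatment" selects the `medicine` collection because its terms appear in that description. The synonym query "cardiac therapy" shares no terms with any description, so its scores reflect only the smoothing and the relevant collection is no longer preferred, which is the kind of semantic loss described above.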