Free English article on mining bilingual Indian web documents – Elsevier 2016

 

Article details
Publication year: 2016
Length of the English article: 7 pages
Download cost: the English article is free to download.
Publisher: Elsevier
Article type: ISI
Title: Mining of Bilingual Indian Web Documents
Format of the English article: PDF
Related fields: Computer Engineering
Related specializations: Information Technology Management, Software
Journal: Procedia Computer Science
University: Chirala Engineering College, Chirala, India
Keywords: Attribute; Bilingual; Classification; Content Extraction; Mining; Pixel-based Approach; Voxel
Product code: E7072
Translation status: no ready-made translation of this article is available.

 

Excerpt from the article:
1. Introduction

Web and mobile communication are becoming the two main channels of present-day social and cultural information exchange and dissemination. While the web and the internet are major sources of data and information, cellular communication through voice, SMS and other media is opening a new dimension: because language, dialect and regional flavor are the main forms used, it leads to complex web and mobile data. In the Indian context this is becoming a significant tool, particularly in education, where online courses and distance education are gaining popularity. In this scenario, Indian web documents are complex and varied and pose an interesting problem for mining and content extraction. Bilingual, and in some cases multilingual, communication plays a major role, as present-day teachers often mix regional dialect with English words. This results in websites and web documents for which a DOM parser may not be helpful for data mining or content extraction. The concept of content extraction has its origin and key role in NLP, where its main use is recognizing entities such as person names and company information in news magazines and websites. Data on the web nowadays includes structured and unstructured documents and homogeneous, heterogeneous and hybrid forms of media data, and modern websites present more challenges and complexities than conventional ones.
At the first level, variation in text across different Indian languages is a starting point for presenting the complexity: Fig. 1 shows the word ‘physics’ translated into four different languages. Web pages are even more involved: Fig. 2 shows the web page of an educational institution in Tamil Nadu, which integrates multilingual text and different images. While English dominates, there is regional-language content in Tamil in either translated or transliterated form, such as ‘ANNAMALAI’, a Tamil word written in English script.
The present paper focuses on such web pages containing bilingual web documents in the Indian context. Even among Indian languages, some scripts are similar, as in Telugu and Kannada; but a general Indian web page may show a lot of variation, as many scripts are derived from Arabic, Urdu, Hindi and other Indian regional languages. Arabic and Urdu are written from right to left, whereas text in all other Indian regional languages is written from left to right. In Chinese, text is written top to bottom. With so many variations in text, complexities arise when only natural language processing tools are used for content extraction and hidden knowledge discovery. For that reason, a generic approach is needed to give better results. In media mining, translation and transliteration do not make as much difference as they do in NLP, since the input is treated in terms of pixel-map variations.
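
To make the script and direction variation described above concrete, here is a minimal Python sketch that uses only the standard library's `unicodedata` module. It is not the pixel-based approach the paper argues for; it only illustrates the kind of text-level check an NLP-style pipeline might start with. The function names `script_profile` and `has_rtl` are hypothetical, and treating the first word of a character's Unicode name as its script is a rough heuristic assumed here for illustration.

```python
import unicodedata
from collections import Counter


def script_profile(text):
    """Count letters per script, using the first word of each
    character's Unicode name as a rough script label (assumption)."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation, combining signs
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split(" ")[0]] += 1
    return counts


def has_rtl(text):
    """Return True if any character has a right-to-left bidi class
    ('R' for Hebrew-like, 'AL' for Arabic-script letters)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


if __name__ == "__main__":
    # A transliterated Tamil name in Latin script next to Tamil-script text.
    sample = "ANNAMALAI \u0b85\u0bae\u0bcd\u0bae\u0bbe"
    print(script_profile(sample))
    print(has_rtl(sample))
```

Note that the transliterated word ‘ANNAMALAI’ is counted as Latin script, so a check of this kind cannot tell it apart from an English word. That limitation is one reason a text-only view struggles with the bilingual pages discussed here.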
