Article details | |
Translated title | A convolutional neural network approach for polyadenylation site prediction |
English title | DeepPolyA: a convolutional neural network approach for polyadenylation site prediction |
Published | 2018 |
Number of pages (English article) | 10 pages |
Cost | The English article is free to download. |
Database | IEEE |
Base paper | This is not a base paper |
Index | Scopus – Master Journals – JCR – DOAJ |
Article type | ISI |
Impact factor (IF) | 3.557 in 2017 |
H-index | 36 in 2018 |
SJR | 0.548 in 2018 |
Related fields | Computer engineering, information technology |
Related specializations | Algorithms and computation, artificial intelligence, computer networks |
Presentation type | Journal |
Journal / conference | IEEE Access |
University | Department of Computer Science – New Jersey Institute of Technology – USA |
Keywords | Polyadenylation prediction, deep learning, multi-layer neural network, motif discovery, genomics and machine learning algorithms |
DOI | https://doi.org/10.1109/ACCESS.2018.2825996 |
Product code | E10381 |
Table of contents: |
Abstract; I. Introduction; II. Methods; III. Experiments and Results; IV. Conclusion and Future Works; References |
Excerpt from the article: |
ABSTRACT
Polyadenylation (Poly(A)) plays crucial roles in gene regulation, especially in messenger RNA metabolism, protein diversification and protein localization. Accurate prediction of polyadenylation sites and identification of the motifs that control polyadenylation are fundamental for interpreting patterns of gene expression, improving the accuracy of genome annotation and understanding the mechanisms that govern gene regulation. Despite considerable advances in applying machine learning techniques to this problem, their effectiveness is still limited by the lack of experience and domain knowledge needed to carefully design and generate useful features, especially for plants. With the increasing availability of extensive genomic datasets and leading computational techniques, deep learning methods, especially convolutional neural networks, have been applied to automatically identify and understand gene regulation directly from gene sequences and to predict unknown sequence profiles. Here, we present DeepPolyA, a new deep convolutional neural network-based approach for predicting polyadenylation sites from gene sequences of the plant Arabidopsis thaliana. We investigate various deep neural network architectures and evaluate their performance against classical machine learning algorithms and several popular deep learning models. Experimental results demonstrate that DeepPolyA is substantially better than competing methods across various performance metrics. We further visualize the motifs learned by DeepPolyA to provide insight into our model and the learned polyadenylation signals.
INTRODUCTION
Polyadenylation is a vital process that occurs after gene transcription and produces mature messenger RNA (mRNA) for translation by synthesizing the polyadenylation tail at the RNA’s 3’-end [1]. Recent discoveries have revealed that the 3’-end of most protein-coding and long-noncoding RNAs (lncRNAs; noncoding transcripts of 200 nucleotides or longer) is cleaved and polyadenylated [2].
In addition, alternative polyadenylation (APA) is prevalent in all eukaryotic species and plays critical roles in gene regulation, especially in processes such as mRNA metabolism, protein diversification and protein localization [3]. Specifically, in addition to contributing to the complexity of the transcriptome by producing isoforms with distinct properties, it can regulate the translation efficiency, function, stability and localization of target RNAs [2], [4]. The polyadenylation site, also called the poly(A) site, is defined by surrounding RNA segments and is conserved across metazoans with some minor variations in mammals [1]–[3]. Accurate prediction of poly(A) sites and identification of the motifs that control them are fundamental for interpreting patterns of gene expression, improving the accuracy of genome annotation and understanding the mechanisms that govern gene regulation [5], [6].
However, precisely identifying poly(A) signals and predicting poly(A) sites remains a challenging problem, especially for plants. Unlike animals, plants possess far less conserved signal sequences in such regions [7]. For example, the upstream element signal “AAUAAA” (or “AATAAA” in DNA sequence), which has been identified as the best signal in plants, can be found in only approximately 10% of Arabidopsis genes [8], [9]. In contrast, the same signal is utilized by 50% of human genes [10]. The variable structures composed of functional motifs [11], [12] also increase the difficulty of identifying poly(A) sites. In addition, because of the widespread presence of alternative polyadenylation in introns and coding sequences (CDS), poly(A) sites may be located in genomic regions other than the 3’ untranslated region (3’-UTR). Thus, an ideal predictive model should be powerful and robust enough to overcome all of the barriers mentioned above and achieve good performance. |
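To make the point about weakly conserved signals concrete, the following is a minimal Python sketch of a rule-based check for the canonical hexamer upstream of a candidate poly(A) site. It is not part of DeepPolyA: the function names, the 40-base window and the toy sequences are illustrative assumptions, and only the "AATAAA"/"AAUAAA" signal itself comes from the text.

```python
# Hypothetical helpers for illustration only; not taken from the DeepPolyA paper.
CANONICAL_SIGNAL = "AATAAA"  # DNA form of the AAUAAA signal named in the text


def has_upstream_signal(sequence: str, site: int, window: int = 40,
                        signal: str = CANONICAL_SIGNAL) -> bool:
    """Return True if `signal` occurs within `window` bases upstream of `site`.

    `site` is the 0-based index of the candidate poly(A) site in `sequence`.
    The window size is an assumed value, not one given in the source.
    """
    seq = sequence.upper().replace("U", "T")  # accept RNA or DNA letters
    start = max(0, site - window)
    return signal in seq[start:site]


def signal_fraction(examples, window: int = 40) -> float:
    """Fraction of (sequence, site) pairs whose upstream window holds the signal."""
    examples = list(examples)
    if not examples:
        return 0.0
    hits = sum(has_upstream_signal(seq, site, window) for seq, site in examples)
    return hits / len(examples)


if __name__ == "__main__":
    # Toy sequences invented for this sketch.
    toy = [
        ("GGCCAATAAAGGCTTTGCATCA", 20),  # canonical signal upstream of the site
        ("GGCCTTTGTAGGCTTTGCATCA", 20),  # no canonical signal
    ]
    print(signal_fraction(toy))
```

A fixed rule of this kind would miss every site that lacks the hexamer, which according to the text is the large majority in Arabidopsis. That limitation is the motivation for learning sequence features directly rather than hand-coding them.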