Article details | |
Title | Code authorship identification using convolutional neural networks |
Publication year | 2019 |
Number of pages (English article) | 12 |
Publisher | Elsevier |
Writing type | Research Article |
Base paper | This is not a base paper |
Index | Scopus, Master Journals List, JCR |
Article type | ISI |
Impact factor (IF) | 7.007 (2018) |
H-index | 93 (2019) |
SJR | 0.835 (2018) |
ISSN | 0167-739X |
Quartile | Q1 (2018) |
Conceptual model | None |
Questionnaire | None |
Variables | None |
References | Yes |
Related fields | Computer engineering |
Related specializations | Software engineering, computer programming, information security, artificial intelligence |
Presentation type | Journal |
Journal | Future Generation Computer Systems |
University | Computer Engineering Department, INHA University, Incheon, South Korea |
Keywords | Code authorship identification, Program features privacy, Convolutional neural network, Deep learning identification, Software forensics and security |
DOI | https://doi.org/10.1016/j.future.2018.12.038 |
Table of contents: |
Abstract
1- Introduction
2- Related work
3- Theoretical background
4- CNN-based code authorship identification systems
5- Experiment and evaluation
6- Limitations
7- Conclusion
References
Excerpt from the article: |
Abstract
Although source code authorship identification poses a privacy threat to many open source contributors, it is an important topic in the forensics field and enables many successful forensic applications, including ghostwriting detection, copyright dispute settlement, and other code analysis applications. This work proposes a convolutional neural network (CNN) based code authorship identification system. The proposed system uses term frequency-inverse document frequency (TF-IDF), word embedding modeling, and feature learning techniques to represent code. This representation is then fed into a CNN-based code authorship identification model that identifies the code's author. Evaluation on data from Google Code Jam shows an identification accuracy of up to 99.4% with 150 candidate programmers and 96.2% with 1,600 programmers. The approach also achieves high accuracy when identifying programmers from real-world code samples drawn from 1,987 public repositories on GitHub: 95% for 745 C programmers and 97% for C++ programmers. These results indicate that the proposed approaches are not tied to a specific language and can identify programmers across different programming languages.
Introduction
Recently, code authorship identification has gained increased attention in the research community [1] because of its importance in software forensics. Code authorship identification is the process of identifying programmers based on their distinctive programming styles. A programmer's style depends on various factors, such as their preferred way of writing code, how they name variables, their programming proficiency and experience, and the thought process they follow to solve a programming task. Together, these factors make it possible to extract specific features from a given piece of code and to carry out authorship identification by attributing each piece to the programmer who wrote it.
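The abstract lists term frequency-inverse document frequency (TF-IDF) among the techniques used to represent code. The sketch below is a minimal illustration of that weighting applied to source code tokens, assuming a simple regex tokenizer and the plain idf = log(N / df) variant. The paper's actual tokenizer, weighting variant, and the way TF-IDF is combined with word embeddings are not given in this excerpt, and the names `tokenize` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the paper's): identifiers and
# keywords, integer literals, and single punctuation/operator characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(code: str) -> list[str]:
    """Split a source code string into lexical tokens."""
    return TOKEN_RE.findall(code)


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (token -> weight) per code sample.

    tf  = occurrences of the token in the sample / total tokens in the sample
    idf = log(N / number of samples that contain the token)
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many samples each distinct token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append(
            {tok: (c / total) * math.log(n / df[tok]) for tok, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    samples = [
        "for (int i = 0; i < n; i++) { sum += a[i]; }",
        "int idx = 0; while (idx < n) { total += arr[idx]; idx++; }",
    ]
    for vec in tfidf(samples):
        # Show the three highest-weighted tokens of each sample.
        top = sorted(vec.items(), key=lambda kv: -kv[1])[:3]
        print(top)
```

Tokens that occur in every sample (here `int`, `<`, or `n`) receive zero weight, so the weight concentrates on tokens that separate one sample from another, such as `for` and `i` versus `while` and `idx`.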
Advances in this field could therefore assist several areas of software forensics, such as software authorship disputes [2], code integrity investigations [3], code plagiarism detection [4], and copyright infringement cases [5]. Code authorship identification can also be used to identify the programmers of malicious code. Its success depends on an effective feature extraction process that captures the distinctive characteristics of each programmer's coding style. This process is challenging, because a programmer's “coding style” can change when working in different environments or when following certain software engineering paradigms [6]. Extracting such features would enable accurate code authorship identification, attributing each input source code sample to its programmer. This work investigates the capabilities of a convolutional neural network (CNN) to solve the code authorship identification problem.
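To illustrate what a CNN computes when applied to code, here is a minimal forward pass of a generic one-dimensional text CNN over token ids: embedding lookup, convolution with ReLU, global max pooling, and a softmax over candidate authors. This is a sketch under stated assumptions. The layer sizes, the single convolution width, the random untrained weights, and all function names (`predict_author_probs`, `random_params`, and the helpers) are hypothetical and are not the architecture reported in the paper. Training is omitted.

```python
import math
import random

# Generic text-CNN forward pass in plain Python (hypothetical architecture,
# NOT the model from the paper). Weights are random and untrained.


def embed(ids, table):
    """Look up an embedding vector for each token id."""
    return [table[i] for i in ids]


def conv1d_relu(x, filters, bias, width):
    """Valid 1D convolution along the sequence axis, followed by ReLU.

    x: list of L vectors of size d; filters: F filters, each a list of
    `width` vectors of size d. Returns L - width + 1 rows of F features.
    """
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for f, w in enumerate(filters):
            s = bias[f]
            for k in range(width):
                s += sum(a * b for a, b in zip(x[t + k], w[k]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_max_pool(h):
    """Max over the sequence axis: one value per filter."""
    return [max(col) for col in zip(*h)]


def dense(v, weights, bias):
    """Fully connected layer: one score per candidate author."""
    return [sum(vi * wi for vi, wi in zip(v, w)) + b0 for w, b0 in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_author_probs(ids, params):
    """Map a token-id sequence to a probability per candidate author."""
    x = embed(ids, params["emb"])
    h = conv1d_relu(x, params["filters"], params["conv_b"], params["width"])
    pooled = global_max_pool(h)
    return softmax(dense(pooled, params["out_w"], params["out_b"]))


def random_params(vocab, dim, n_filters, width, n_authors, seed=0):
    """Create small random (untrained) weights for the sketch."""
    rng = random.Random(seed)

    def r():
        return rng.uniform(-0.1, 0.1)

    return {
        "emb": [[r() for _ in range(dim)] for _ in range(vocab)],
        "filters": [[[r() for _ in range(dim)] for _ in range(width)]
                    for _ in range(n_filters)],
        "conv_b": [0.0] * n_filters,
        "width": width,
        "out_w": [[r() for _ in range(n_filters)] for _ in range(n_authors)],
        "out_b": [0.0] * n_authors,
    }


if __name__ == "__main__":
    params = random_params(vocab=50, dim=8, n_filters=4, width=3, n_authors=5)
    probs = predict_author_probs([3, 17, 5, 42, 8, 1], params)
    print(len(probs), round(sum(probs), 6))
```

Global max pooling keeps each filter's strongest response anywhere in the sample, which is one common way to make the classifier's input size independent of code length. A trained model would learn the embeddings, filters, and output weights from labeled code samples.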