Free English article on reducing the training time of deep learning models – IEEE 2022

 

Article details
Translated title: Reducing the training time of deep learning models using synchronous SGD and large batch size
English title: Reducing the training time of deep learning models using synchronous SGD and large batch size
Publisher: IEEE
Year of publication: 2022
Number of pages (English article): 3
Download cost: The English article is free to download.
Base article: No, this is not a base article.
Article type: ISI
English article format: PDF
ISSN: 2768-0754
Hypothesis: None
Conceptual model: None
Questionnaire: None
Variables: Yes
References: Yes
Related field: Computer Engineering
Related specialization: Artificial Intelligence
Presentation type: Conference
Journal / conference: International Conference on Intelligent Systems and Computer Vision
University: Hassan First University of Settat
Keywords: deep learning, distributed training, machine learning, convolutional neural network
English keywords: deep learning, distributed training, machine learning, convolutional neural network
DOI: https://doi.org/10.1109/ISCV54655.2022.9806117
Source page: https://ieeexplore.ieee.org/document/9806117
Product code: e17194
Translation status: No ready-made translation of this article is available; a translation can be ordered.

 

Article table of contents:
Abstract
I. INTRODUCTION
II. DISTRIBUTED TRAINING ALGORITHMS
III. EXPERIMENT
IV. RESULTS AND DISCUSSION
V. CONCLUSION
REFERENCES

 

Excerpt from the article:

Abstract

     Recently, deep learning research has demonstrated that being able to train big models improves performance substantially. In this work, we consider the problem of training a deep neural network with millions of parameters using multiple CPU cores. On a single machine with a modern CPU platform, training on the Dogs vs Cats benchmark dataset can take several hours; however, distributing training across numerous machines has been shown to dramatically reduce this time. This study presents the current state of the art in modern distributed training frameworks, covering the main methods and strategies used to distribute training. We concentrate on synchronous versions of distributed Stochastic Gradient Descent, different All-Reduce gradient aggregation algorithms, and best practices for achieving higher throughput and reduced latency, such as gradient compression and large batch sizes. We show that, using the same approaches, we can train a smaller deep network for an image classification problem in a shorter time. Although we focus on and report the effectiveness of these approaches when used to train convolutional neural networks, the underlying methods may be used to train any gradient-based machine-learning algorithm.
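As a rough illustration of the synchronous data-parallel setup the abstract describes, the sketch below uses PyTorch's DistributedDataParallel, which all-reduces gradients across workers after every backward pass. The tiny CNN, the random two-class data, and the choice of the gloo (CPU) backend are illustrative assumptions, not the paper's actual model, dataset, or framework.

```python
# Minimal sketch of synchronous data-parallel SGD with PyTorch's
# DistributedDataParallel (DDP). Assumes the script is launched with
# `torchrun --nproc_per_node=<N> train_ddp.py`; the small CNN and the
# random data below are placeholders, not the paper's exact setup.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # The "gloo" backend works on CPU-only machines, matching a multi-core CPU setting.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(16, 2))          # two classes, e.g. dogs vs cats
    ddp_model = DDP(model)                           # gradients are all-reduced automatically
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(10):                           # toy loop on random data
        x = torch.randn(32, 3, 64, 64)               # each rank sees its own shard
        y = torch.randint(0, 2, (32,))
        opt.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()                              # DDP averages gradients across ranks here
        opt.step()                                   # every rank applies the same update
        if rank == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launching this with, e.g., `torchrun --nproc_per_node=4 train_ddp.py` runs four synchronized workers on one multi-core machine; because every worker applies the identical averaged gradient, all model replicas stay in lockstep.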

Introduction

     Recently, in a wide range of applications, including speech recognition, computer vision, text processing, and natural language processing, deep learning has outperformed classical machine learning models in building models that address complicated problems. Despite significant progress in customizing neural network designs, there is still one major drawback: training big NNs is memory- and time-intensive. Training NNs in a distributed way is one answer to this problem. The purpose of distributed deep learning systems (DDLS) is to scale out the training of big models by combining the resources of several separate computers. As a result, several of the DDLS presented in the literature use various ways to implement distributed model training [1]. Training times have increased substantially as models and datasets have become more sophisticated, sometimes taking weeks or even months on a single GPU. To address this issue, two techniques proposed by many researchers for scaling out big deep learning workloads are model and data parallelism. Model parallelism seeks to transfer model execution stages onto cluster hardware, whereas data-parallel methods treat collaborative model training as a concurrency/synchronization challenge [1]. The main idea behind data parallelism is to enhance the overall sample throughput rate by duplicating the model over several computers and performing backpropagation in parallel to acquire more information about the loss function more quickly. This is achieved in the following way. Each cluster node begins by downloading the current model. Then, each node executes backpropagation on its assigned shard of the data. Finally, the various results are combined and merged to create a new model [2].
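To make the three data-parallel stages above concrete (same model on every node, local backpropagation, merging of the results), here is a minimal sketch of a single training step written with raw torch.distributed primitives. It is an illustrative reconstruction of the procedure described in the text, not code from the paper.

```python
# One synchronous data-parallel training step, mirroring the three stages:
# (1) every worker starts from the same model, (2) each worker runs
# backpropagation on its own data shard, (3) gradients are averaged and the
# same update is applied on every worker. Assumes the process group has
# already been initialized (e.g. via torchrun + dist.init_process_group).
import torch
import torch.distributed as dist

def train_step(model, opt, x, y, loss_fn):
    # (1) keep workers in sync: broadcast parameters from rank 0
    for p in model.parameters():
        dist.broadcast(p.data, src=0)

    # (2) local backpropagation on this worker's shard (x, y)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # (3) average gradients across all workers, then take the same SGD step
    world_size = dist.get_world_size()
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size
    opt.step()
    return loss.item()
```

Because every worker applies the same averaged gradient, the broadcast in step (1) is strictly needed only once at start-up; it is repeated here only to mirror the "download the current model" step in the description.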

Conclusion

     Data parallelism techniques using asynchronous algorithms have been widely employed to expedite the training of deep learning models. To enhance data throughput while ensuring computing efficiency in each worker, scale-up techniques rely on tight hardware integration. Increasing the batch size, on the other hand, may result in a loss of test accuracy, which may be mitigated by a number of recent concepts, such as increasing the learning rate throughout the training process and using a learning-rate warm-up technique.
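The large-batch remedies mentioned here (scaling the learning rate with the batch size and warming it up at the start of training) can be summarized in a short sketch. The constants below (base learning rate 0.1 at batch size 256, five warm-up epochs) are common illustrative choices, not values reported in the paper.

```python
# Hedged sketch of a linear-scaling rule with a linear learning-rate warm-up:
# the target LR grows with the batch size and is reached gradually over the
# first few epochs instead of being applied from step one.
def warmup_lr(epoch: int, batch_size: int,
              base_lr: float = 0.1, base_batch: int = 256,
              warmup_epochs: int = 5) -> float:
    target_lr = base_lr * batch_size / base_batch     # linear scaling rule
    if epoch < warmup_epochs:
        # ramp linearly from a fraction of the target up to the full target LR
        return target_lr * (epoch + 1) / warmup_epochs
    return target_lr

# Example: with batch size 2048 the target LR is 0.8, reached after 5 epochs.
for e in range(7):
    print(e, warmup_lr(e, batch_size=2048))
```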
