Article Specifications
Article title | Reducing the training time of deep learning models using synchronous SGD and large batch size
Publisher | IEEE
Year of publication | 2022
Number of pages (English article) | 3 pages
Cost | The English article is free to download.
Base article | This is not a base article.
Article type | ISI
English article format |
ISSN | 2768-0754
Hypothesis | None
Conceptual model | None
Questionnaire | None
Variables | Yes
References | Yes
Related disciplines | Computer Engineering
Related specializations | Artificial Intelligence
Presentation type | Conference
Journal / Conference | International Conference on Intelligent Systems and Computer Vision
University | Hassan First University of Settat
Keywords | deep learning, distributed training, machine learning, convolutional neural network
DOI | https://doi.org/10.1109/ISCV54655.2022.9806117
Link to source page | https://ieeexplore.ieee.org/document/9806117
Product code | e17194
Translation status | A ready translation of this article is not available; a translation can be ordered.
Article table of contents:
Abstract
I. INTRODUCTION
II. DISTRIBUTED TRAINING ALGORITHMS
III. EXPERIMENT
IV. RESULTS AND DISCUSSION
V. CONCLUSION
REFERENCES
Excerpt from the article:
Abstract

Recently, deep learning research has demonstrated that being able to train big models improves performance substantially. In this work, we consider the problem of training a deep neural network with millions of parameters using multiple CPU cores. On a single machine with a modern CPU platform, training on a benchmark dataset such as Dogs vs Cats can take hours; however, distributing training across numerous machines has been seen to dramatically reduce this time. The current state of the art for a modern distributed training framework is presented in this study, which covers the many methods and strategies utilized to distribute training. We concentrate on synchronous versions of distributed Stochastic Gradient Descent, different All-Reduce gradient aggregation algorithms, and best practices for achieving higher throughput and reduced latency, such as gradient compression and large batch sizes. We show that, using the same approaches, we can train a smaller deep network for an image classification problem in a shorter time. Although we focus on and report the effectiveness of these approaches when used to train convolutional neural networks, the underlying methods may be used to train any gradient-based machine-learning algorithm.

Introduction

Recently, in a wide range of applications, including speech recognition, computer vision, text processing, and natural language processing, deep learning has outperformed classical machine learning models in creating models to address complicated problems. Despite significant progress in customizing neural network designs, there is still one major drawback: training big NNs is memory- and time-intensive. Training NNs in a distributed way is one answer to this problem. The purpose of distributed deep learning systems (DDLS) is to scale out the training of big models by combining the resources of several separate computers. As a result, several of the DDLS presented in the literature use various ways to implement distributed model training [1].

Training times have increased substantially as models and datasets have become more sophisticated, sometimes reaching weeks or even months on a single GPU. To address this issue, two techniques proposed by many researchers for scaling out big deep learning workloads are model and data parallelism. Model parallelism seeks to transfer model execution stages onto cluster hardware, whereas data-parallel methods treat collaborative model training as a concurrency/synchronization challenge [1]. The main idea behind data parallelism is to enhance the overall sample throughput rate by duplicating the model over several computers and performing backpropagation in parallel to acquire more information about the loss function more quickly. It is achieved in the following way. Each cluster node begins by downloading the current model. Then, utilizing its parallel data assignment, each node executes backpropagation. Finally, the various results are combined and merged to create a new model [2].
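The synchronous data-parallel loop just described can be illustrated with a short, self-contained sketch. The single-process simulation below is not taken from the paper: the linear model, the dataset, the worker count, and the learning rate are all assumed for demonstration. Each simulated worker holds a replica of the same parameters and runs backpropagation on its own data shard; the local gradients are then averaged, standing in for an All-Reduce step, before one synchronous SGD update is applied to every replica.

```python
import numpy as np

# Illustrative single-process simulation of synchronous data-parallel SGD.
# The model (linear regression), dataset, worker count, and learning rate are
# assumptions for demonstration; a real setup would run one process per device
# and use a collective communication library for the All-Reduce.

rng = np.random.default_rng(0)
n_workers, n_samples, n_features = 4, 512, 8
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.01 * rng.normal(size=n_samples)

w = np.zeros(n_features)                                   # shared model replica
lr = 0.1
shards = np.array_split(np.arange(n_samples), n_workers)   # data-parallel split

for step in range(100):
    # 1) each worker runs backpropagation on its own shard
    local_grads = []
    for shard in shards:
        Xs, ys = X[shard], y[shard]
        grad = 2.0 / len(shard) * Xs.T @ (Xs @ w - ys)      # dMSE/dw on the shard
        local_grads.append(grad)

    # 2) All-Reduce stand-in: sum the local gradients, divide by the worker count
    global_grad = np.sum(local_grads, axis=0) / n_workers

    # 3) synchronous update: every replica applies the same averaged gradient
    w -= lr * global_grad

print("parameter error:", np.linalg.norm(w - true_w))
```

Because every replica applies the identical averaged gradient, all copies of the model stay in lockstep after each step, which is what distinguishes the synchronous scheme from asynchronous parameter-server updates. In a real cluster, the averaging would be performed by an All-Reduce collective (for example, a ring All-Reduce) rather than by a Python loop.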
CONCLUSION

Data parallelism techniques using asynchronous algorithms have been widely employed to expedite the training of deep learning models. To enhance data throughput while ensuring computing efficiency in each worker, scale-up techniques rely on tight hardware integration. Increasing the batch size, on the other hand, may result in a loss of test accuracy, which can be mitigated by a number of recent ideas, such as increasing the learning rate throughout the training process and using a learning-rate warm-up technique.
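As a rough illustration of the mitigations mentioned in the conclusion, the following sketch combines the linear learning-rate scaling rule for large batches with a linear warm-up phase. The base learning rate, reference batch size, enlarged batch size, and warm-up length are assumed values chosen for the example, not figures from the paper.

```python
def scaled_lr_with_warmup(step: int,
                          base_lr: float = 0.1,     # assumed reference learning rate
                          base_batch: int = 256,    # assumed reference batch size
                          batch_size: int = 4096,   # assumed enlarged batch size
                          warmup_steps: int = 500) -> float:
    """Linear-scaling rule with linear warm-up (illustrative values).

    The target learning rate is base_lr * (batch_size / base_batch); during the
    first `warmup_steps` updates it is ramped linearly from base_lr up to that
    target to avoid instability early in large-batch training.
    """
    target_lr = base_lr * batch_size / base_batch
    if step < warmup_steps:
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    return target_lr

# Example: the learning rate grows from 0.1 toward 1.6 during warm-up.
for s in (0, 250, 500, 1000):
    print(s, round(scaled_lr_with_warmup(s), 3))
```

Ramping the learning rate up over the first few hundred updates avoids the early-training instability that a large batch combined with an already-scaled learning rate can otherwise cause.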