Free English article on optimal execution with a reinforcement learning approach – Taylor & Francis 2022


Article details

Article title: A reinforcement learning approach to optimal execution
Publication year: 2022
Length of the English article: 19 pages
Cost: the English article is free to download.
Database / publisher: Taylor & Francis
Type of article: Research article
Base article: yes, this is a base article
Indexed in: JCR – Master Journal List – Scopus
Article classification: ISI
Format of the English article: PDF
Impact factor (IF): 2.132 (2020)
H-index: 73 (2022)
SJR: 0.865 (2020)
ISSN: 1469-7696
Quartile: Q1 (2020)
Hypothesis: none
Conceptual model: yes
Questionnaire: none
Variables: yes
References: yes
Related disciplines: Computer engineering
Related specializations: Software engineering
Presentation venue: Journal
Journal / conference: Quantitative Finance
Affiliation: Graduate School of Business, Columbia University, USA
Keywords: Optimal execution – Optimal stopping – Reinforcement learning – Temporal difference learning
DOI: https://doi.org/10.1080/14697688.2022.2039403
Product code: e16645
Translation status: no prepared translation of this article is available; a translation can be ordered.
Download: the English article can be downloaded free of charge.

 

Table of contents:

Abstract

1. Introduction

2. Limit order book and optimal stopping formulation

3. Supervised learning approach

4. Reinforcement learning approach

5. Numerical experiment: setup

6. Numerical experiment: results

Disclosure statement

References

Appendices

 

Excerpt from the article:

Abstract

     We consider the problem of execution timing in optimal execution. Specifically, we formulate the optimal execution problem of an infinitesimal order as an optimal stopping problem. Using a novel neural network architecture, we develop two data-driven approaches to this problem, one based on supervised learning and the other based on reinforcement learning. Temporal difference learning can be applied to extend these two methods into many variants. Through numerical experiments on historical market data, we demonstrate the significant cost reduction achieved by these methods. Insights from the numerical experiments reveal various tradeoffs in the use of temporal difference learning, including convergence rates, data efficiency, and a tradeoff between bias and variance.
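As a point of reference (the notation here is the standard textbook form, not necessarily the paper's own), an optimal stopping formulation of execution timing leads to a finite-horizon dynamic programming recursion of the form:

```latex
V_t(s) = \max\Big( g_t(s),\; \mathbb{E}\big[\, V_{t+1}(S_{t+1}) \mid S_t = s \,\big] \Big),
\qquad V_T(s) = g_T(s),
```

where $g_t(s)$ is the payoff from executing immediately in market state $s$, and the expectation is the continuation value of waiting one more step. The data-driven methods described in the abstract can be viewed as approximating this value function from market data rather than solving the recursion exactly.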

Introduction

     Optimal execution is a classic problem in finance that aims to optimize trading while balancing various tradeoffs. When trading a large order of stock, one of the most common tradeoffs is between market impact and price uncertainty. More specifically, if a large order is submitted as a single execution, the market would typically move in the adverse direction, worsening the average execution price. This phenomenon is commonly referred to as the ‘market impact’. In order to minimize the market impact, the trader has an incentive to divide the large order into smaller child orders and execute them gradually over time. However, this strategy inevitably prolongs the execution horizon, exposing the trader to a greater degree of price uncertainty. Optimal execution problems seek to obtain an optimal trading schedule while balancing a specific tradeoff such as this.

     We will refer to the execution problem mentioned above as the parent-order problem, where an important issue is to divide a large parent order into smaller child orders to mitigate market impact. In this paper, we focus on the optimal execution of the child orders, that is, after the parent order is divided, the problem of executing each one of the child orders. The child orders are quite different in nature from the parent order. They are typically much smaller in size, and their prescribed execution horizons are typically much shorter. In practice, a parent order is typically completed within hours or days, while child orders are typically completed within seconds or minutes. Because any further division of an order can be viewed as another parent-order problem, we will only consider the child-order problem at the most atomic level. At this level, the child orders will not be further divided. In other words, each child order will be fulfilled in a single execution.

 Numerical experiment: results

     This section presents the results of the numerical experiments and discusses the interpretation of these results.

Best performances

     TD learning is applied to both the SL and RL methods, with various update steps m (see section 3.4.1). These algorithms, SL-TD(m-step) and RL-TD(m-step), are trained on the training data, tuned on the validation data, and evaluated on the testing data. The neural network architecture, learning rate, update step m, and other hyperparameters are tuned to maximize performance. The best performances using SL and RL are reported in table 1. These figures are price gains per episode, averaged over all 50 stocks; the price gain is reported as a percentage of the half-spread. Detailed per-stock performance can be found in appendix 5 (see table A6).
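The m-step TD update underlying SL-TD(m-step) and RL-TD(m-step) can be illustrated with a minimal tabular sketch. This is not the authors' code: the paper uses a neural-network value function on limit-order-book data, while the function name, toy states, and rewards below are illustrative assumptions.

```python
# Minimal m-step TD value update on a toy episodic trajectory.
# Tabular stand-in for a learned value function: V[s] is moved toward
# the m-step return (sum of m discounted rewards plus a bootstrapped
# value at the state reached after m steps).

def m_step_td_update(V, states, rewards, t, m, alpha=0.1, gamma=1.0):
    """Move V[states[t]] toward the m-step return.

    states  : state indices visited in one episode (len(rewards) + 1 entries)
    rewards : rewards[k] is earned on the transition out of states[k]
    t       : index of the state being updated
    m       : lookahead length; m = len(rewards) recovers a Monte Carlo target
    """
    horizon = min(m, len(rewards) - t)            # truncate at episode end
    g = sum(gamma ** k * rewards[t + k] for k in range(horizon))
    if t + horizon < len(rewards):                # not terminal: bootstrap
        g += gamma ** horizon * V[states[t + horizon]]
    V[states[t]] += alpha * (g - V[states[t]])
    return V

# Example: one episode with three transitions and a 2-step lookahead.
V = [0.0, 0.0, 0.0, 0.0]
states = [0, 1, 2, 3]
rewards = [1.0, 2.0, 1.0]
V = m_step_td_update(V, states, rewards, t=0, m=2, alpha=1.0)
# Target for state 0: 1.0 + 2.0 + V[2] = 3.0, so V[0] becomes 3.0.
```

Varying m trades bias against variance, as the results section discusses: small m bootstraps more (lower variance, more bias from the current value estimate), while large m relies on more observed rewards.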

     Given sufficient data and time, the RL method outperforms the SL method. This is true under both the stock-specific regime and the universal regime. The models trained under the universal regime generally outperform the models trained under the stock-specific regime as well.
