Article details
Translated article title | A reinforcement learning approach to optimal execution
English article title | A reinforcement learning approach to optimal execution
Publication year | 2022
Length of English article | 19 pages
Cost | The English article is free to download.
Database / publisher | Taylor & Francis
Writing type | Research article
Base article | Yes, this is a base article
Index | JCR – Master Journal List – Scopus
Article category | ISI
Format of English article |
Impact factor (IF) | 2.132 in 2020
H-index | 73 in 2022
SJR | 0.865 in 2020
ISSN | 1469-7696
Quartile | Q1 in 2020
Hypothesis | None
Conceptual model | Yes
Questionnaire | None
Variables | Yes
References | Yes
Related disciplines | Computer engineering
Related specializations | Software engineering
Presentation type | Journal
Journal / conference | Quantitative Finance
University | Graduate School of Business, Columbia University, USA
Keywords | Optimal execution – Optimal stopping – Reinforcement learning – Temporal difference learning
DOI | https://doi.org/10.1080/14697688.2022.2039403
Product code | e16645
Translation status | A prepared Persian translation of this article is not available; it can be ordered.
Article table of contents:
Abstract
1. Introduction
2. Limit order book and optimal stopping formulation
3. Supervised learning approach
4. Reinforcement learning approach
5. Numerical experiment: setup
6. Numerical experiment: results
Disclosure statement
References
Appendices
Excerpt from the article:
Abstract

We consider the problem of execution timing in optimal execution. Specifically, we formulate the optimal execution problem of an infinitesimal order as an optimal stopping problem. Using a novel neural network architecture, we develop two data-driven approaches to this problem, one based on supervised learning and the other based on reinforcement learning. Temporal difference learning can be applied to extend these two methods to many variants. Through numerical experiments on historical market data, we demonstrate that these methods achieve significant cost reductions. Insights from the numerical experiments reveal various tradeoffs in the use of temporal difference learning, including convergence rates, data efficiency, and the tradeoff between bias and variance.

Introduction

Optimal execution is a classic problem in finance that aims to optimize trading while balancing various tradeoffs. When trading a large order of stock, one of the most common tradeoffs is between market impact and price uncertainty. More specifically, if a large order is submitted as a single execution, the market typically moves in the adverse direction, worsening the average execution price. This phenomenon is commonly referred to as 'market impact'. To minimize market impact, the trader has an incentive to divide the large order into smaller child orders and execute them gradually over time. However, this strategy inevitably prolongs the execution horizon, exposing the trader to a greater degree of price uncertainty. Optimal execution problems seek an optimal trading schedule that balances a specific tradeoff such as this one.

We refer to the execution problem described above as the parent-order problem, in which the key issue is dividing a large parent order into smaller child orders to mitigate market impact. In this paper, we focus on the optimal execution of the child orders, that is, the problem of executing each child order after the parent order has been divided. Child orders are quite different in nature from the parent order: they are typically much smaller in size, and their prescribed execution horizons are much shorter. In practice, a parent order is typically completed within hours or days, while a child order is typically completed within seconds or minutes. Because any further division of an order can be viewed as another parent-order problem, we consider only the child-order problem at the most atomic level. At this level, child orders are not divided further; in other words, each child order is fulfilled in a single execution.

Numerical experiment: results

This section presents the results of the numerical experiments and discusses their interpretation.

Best performances

Given sufficient data and time, the RL method outperforms the SL method. This holds under both the stock-specific regime and the universal regime. Moreover, models trained under the universal regime generally outperform those trained under the stock-specific regime.
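To illustrate the temporal-difference idea mentioned in the abstract, the sketch below applies a TD(0)-style update to a linear continuation-value estimate for a simplified, synthetic stopping problem. This is only a minimal sketch under assumed settings: the feature map, the simulated price paths, the stopping payoff, and all parameter values are illustrative assumptions and are not the paper's neural network architecture, data, or method.

import numpy as np

# Minimal TD(0)-style sketch for an optimal-stopping view of execution timing.
# At each step the agent either executes now (stops) or waits; a learned
# continuation value approximates the expected payoff of waiting.
# All quantities below are illustrative assumptions.

rng = np.random.default_rng(0)

T = 60                    # decision steps within the child-order horizon
n_paths = 2000            # synthetic price paths (stand-in for market data)
paths = np.cumsum(rng.normal(0.0, 0.01, size=(n_paths, T + 1)), axis=1)

def features(t, x):
    # Simple polynomial features of (normalized time, cumulative price change).
    tau = t / T
    return np.array([1.0, tau, x, tau * x, x * x])

w = np.zeros(5)           # linear continuation-value weights
alpha = 0.01              # learning rate

for path in paths:
    for t in range(T):
        x, x_next = path[t], path[t + 1]
        phi = features(t, x)
        v = phi @ w                                   # value of waiting at t
        # Value of the next state: stop there for the realized price change,
        # or keep waiting; bootstrap with the current estimate unless the
        # horizon ends at t + 1.
        v_next = 0.0 if t + 1 == T else features(t + 1, x_next) @ w
        target = max(x_next, v_next)
        w += alpha * (target - v) * phi               # TD(0) update

def execute_time(path):
    # Greedy rule: execute as soon as the current price change
    # is at least the learned continuation value.
    for t in range(T):
        if path[t] >= features(t, path[t]) @ w:
            return t
    return T

print("average execution time:", np.mean([execute_time(p) for p in paths[:200]]))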