Free English article on optimal execution with a reinforcement learning approach – Taylor & Francis 2022


Article details

Article title: A reinforcement learning approach to optimal execution
Publication year: 2022
Length of the English article: 19 pages
Cost: the English article is free to download.
Database / publisher: Taylor & Francis
Type of article: Research article
Base article: yes, this is a base article
Indexed in: JCR – Master Journal List – Scopus
Article classification: ISI
Format of the English article: PDF
Impact factor (IF): 2.132 (2020)
H-index: 73 (2022)
SJR: 0.865 (2020)
ISSN: 1469-7696
Quartile: Q1 (2020)
Hypothesis: none
Conceptual model: yes
Questionnaire: none
Variables: yes
References: yes
Related disciplines: Computer engineering
Related specializations: Software engineering
Presentation venue: Journal
Journal / conference: Quantitative Finance
Affiliation: Graduate School of Business, Columbia University, USA
Keywords: Optimal execution – Optimal stopping – Reinforcement learning – Temporal difference learning
DOI: https://doi.org/10.1080/14697688.2022.2039403
Product code: e16645
Translation status: no prepared translation of this article is available; a translation can be ordered.
Download: the English article can be downloaded free of charge.

 

Table of contents:

Abstract

1. Introduction

2. Limit order book and optimal stopping formulation

3. Supervised learning approach

4. Reinforcement learning approach

5. Numerical experiment: setup

6. Numerical experiment: results

Disclosure statement

References

Appendices

 

Excerpt from the article:

Abstract

     We consider the problem of execution timing in optimal execution. Specifically, we formulate the optimal execution problem of an infinitesimal order as an optimal stopping problem. Using a novel neural network architecture, we develop two data-driven approaches to this problem, one based on supervised learning and the other based on reinforcement learning. Temporal difference learning can be applied to extend these two methods into many variants. Through numerical experiments on historical market data, we demonstrate the significant cost reduction achieved by these methods. Insights from the numerical experiments reveal various tradeoffs in the use of temporal difference learning, including convergence rates, data efficiency, and a tradeoff between bias and variance.
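As a point of reference (the notation here is the standard textbook form, not necessarily the paper's own), an optimal stopping formulation of execution timing leads to a finite-horizon dynamic programming recursion of the form:

```latex
V_t(s) = \max\Big( g_t(s),\; \mathbb{E}\big[\, V_{t+1}(S_{t+1}) \mid S_t = s \,\big] \Big),
\qquad V_T(s) = g_T(s),
```

where $g_t(s)$ is the payoff from executing immediately in market state $s$, and the expectation is the continuation value of waiting one more step. The data-driven methods described in the abstract can be viewed as approximating this value function from market data rather than solving the recursion exactly.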

Introduction

     Optimal execution is a classic problem in finance that aims to optimize trading while balancing various tradeoffs. When trading a large order of stock, one of the most common tradeoffs is between market impact and price uncertainty. More specifically, if a large order is submitted as a single execution, the market would typically move in the adverse direction, worsening the average execution price. This phenomenon is commonly referred to as the ‘market impact’. In order to minimize the market impact, the trader has an incentive to divide the large order into smaller child orders and execute them gradually over time. However, this strategy inevitably prolongs the execution horizon, exposing the trader to a greater degree of price uncertainty. Optimal execution problems seek to obtain an optimal trading schedule while balancing a specific tradeoff such as this.

     We will refer to the execution problem mentioned above as the parent-order problem, where an important issue is to divide a large parent order into smaller child orders to mitigate market impact. In this paper, we focus on the optimal execution of the child orders, that is, after the parent order is divided, the problem of executing each one of the child orders. The child orders are quite different in nature from the parent order. They are typically much smaller in size, and their prescribed execution horizons are typically much shorter. In practice, a parent order is typically completed within hours or days, while child orders are typically completed within seconds or minutes. Because any further division of an order can be viewed as another parent-order problem, we will only consider the child-order problem at the most atomic level. At this level, the child orders will not be further divided. In other words, each child order will be fulfilled in a single execution.

 Numerical experiment: results

     This section presents the results of the numerical experiments and discusses the interpretation of these results.

Best performances

     TD learning is applied to both the SL and RL methods, with various update steps m (see section 3.4.1). These algorithms, SL-TD(m-step) and RL-TD(m-step), are trained on the training data, tuned on the validation data, and evaluated on the testing data. The neural network architecture, learning rate, update step m, and other hyperparameters are tuned to maximize performance. The best performances using SL and RL are reported in table 1. These figures are price gains per episode, averaged over all 50 stocks; the price gain is reported as a percentage of the half-spread. Detailed per-stock performance can be found in appendix 5 (see table A6).
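The m-step TD update underlying SL-TD(m-step) and RL-TD(m-step) can be illustrated with a minimal tabular sketch. This is not the authors' code: the paper uses a neural-network value function on limit-order-book data, while the function name, toy states, and rewards below are illustrative assumptions.

```python
# Minimal m-step TD value update on a toy episodic trajectory.
# Tabular stand-in for a learned value function: V[s] is moved toward
# the m-step return (sum of m discounted rewards plus a bootstrapped
# value at the state reached after m steps).

def m_step_td_update(V, states, rewards, t, m, alpha=0.1, gamma=1.0):
    """Move V[states[t]] toward the m-step return.

    states  : state indices visited in one episode (len(rewards) + 1 entries)
    rewards : rewards[k] is earned on the transition out of states[k]
    t       : index of the state being updated
    m       : lookahead length; m = len(rewards) recovers a Monte Carlo target
    """
    horizon = min(m, len(rewards) - t)            # truncate at episode end
    g = sum(gamma ** k * rewards[t + k] for k in range(horizon))
    if t + horizon < len(rewards):                # not terminal: bootstrap
        g += gamma ** horizon * V[states[t + horizon]]
    V[states[t]] += alpha * (g - V[states[t]])
    return V

# Example: one episode with three transitions and a 2-step lookahead.
V = [0.0, 0.0, 0.0, 0.0]
states = [0, 1, 2, 3]
rewards = [1.0, 2.0, 1.0]
V = m_step_td_update(V, states, rewards, t=0, m=2, alpha=1.0)
# Target for state 0: 1.0 + 2.0 + V[2] = 3.0, so V[0] becomes 3.0.
```

Varying m trades bias against variance, as the results section discusses: small m bootstraps more (lower variance, more bias from the current value estimate), while large m relies on more observed rewards.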

     Given sufficient data and time, the RL method outperforms the SL method. This is true under both the stock-specific regime and the universal regime. The models trained under the universal regime generally outperform the models trained under the stock-specific regime as well.
