Free English-language article on expected scalarised returns dominance – Springer 2022

 

Article details
Article title: Expected scalarised returns dominance: a new solution concept for multi-objective decision making
Publisher: Springer
Year of publication: 2022
Length of the English article: 21 pages
Download cost: the English article is free to download.
Article type: Research article (مقاله پژوهشی)
Base article (suitable as a thesis base): No
Indexed in: JCR – Master Journal List – Scopus – ISC
Article category: ISI
English article format: PDF
Impact factor (IF): 5.599 (2020)
H-index: 94 (2022)
SJR: 1.072 (2020)
ISSN: 1433-3058
Quartile: Q1 (2020)
Hypotheses: none
Conceptual model: none
Questionnaire: none
Variables: yes
References: yes
Related disciplines: Management – Computer Engineering
Related specialisations: Executive Management – Artificial Intelligence
Presentation type: Journal
Journal / conference: Neural Computing and Applications
University: National University of Ireland Galway, Ireland
Keywords: Multi-objective – Decision making – Distributional – Reinforcement learning – Stochastic dominance
DOI: https://doi.org/10.1007/s00521-022-07334-x
Product code: e17134
Translation status: a ready translation of this article is not available; one can be ordered.

 

Article table of contents:
Abstract
1 Introduction
2 Background
3 Expected scalarised returns
4 Stochastic dominance for ESR
5 Solution sets for ESR
6 Multi-objective tabular distributional reinforcement learning
7 Experiments
8 Related work
9 Conclusion and future work
Appendix
Declaration
References

 

Excerpt from the article:

Abstract

     In many real-world scenarios, the utility of a user is derived from a single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user’s preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal policies must be learned. However, settings where the expected utility must be maximised have been largely overlooked by the multi-objective reinforcement learning community and, as a consequence, a set of optimal solutions has yet to be defined. In this work, we propose first-order stochastic dominance as a criterion to build solution sets to maximise expected utility. We also define a new dominance criterion, known as expected scalarised returns (ESR) dominance, that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice. Additionally, we define a new solution concept called the ESR set, which is a set of policies that are ESR dominant. Finally, we present a new multi-objective tabular distributional reinforcement learning (MOTDRL) algorithm to learn the ESR set in multi-objective multi-armed bandit settings.
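The dominance relations named in the abstract build on first-order stochastic dominance. As a brief sketch in generic notation (the paper's own formalism extends this to the multivariate return distributions of multi-objective policies):

```latex
% Weak first-order stochastic dominance (FSD) between policies \pi and \pi'
% with scalar return distributions Z^{\pi} and Z^{\pi'}:
\begin{align*}
  \pi \succeq_{\mathrm{FSD}} \pi'
    &\iff F_{Z^{\pi}}(x) \le F_{Z^{\pi'}}(x) \quad \text{for all } x \\
    &\iff \mathbb{E}\big[u(Z^{\pi})\big] \ge \mathbb{E}\big[u(Z^{\pi'})\big]
          \quad \text{for every non-decreasing utility } u.
\end{align*}
```

ESR dominance extends this idea to vector-valued returns, and the ESR set collects exactly those policies that no other policy ESR-dominates.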

Introduction

     When making decisions in the real world, decision makers must make trade-offs between multiple, often conflicting, objectives [44]. In many real-world settings, a policy is only executed once. For example, consider a municipality that receives the majority of its electricity from local solar farms. To deal with the intermittency of the solar farms, the municipality wants to build a new electricity generation facility. The municipality are considering two choices: building a natural gas facility or adding a lithium-ion battery storage facility to the solar farms. Moreover, the municipality want to minimise CO₂ emissions while ensuring energy demand can continuously be met. Given a new energy generation facility will only be constructed once, a full distribution over each potential outcome for capacity to meet electricity demand and CO₂ emissions must be considered to make an optimal decision. The current state-of-the-art multi-objective reinforcement learning (MORL) literature focuses almost exclusively on learning policies that are optimal over multiple executions. Given such problems are salient, to fully utilise MORL in the real world, we must develop algorithms to compute a policy, or set of policies, that are optimal given the single-execution nature of the problem.
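The distinction at work here is between the ESR criterion, which optimises the expected utility of the returns of a single execution, E[u(R)], and the more commonly studied SER criterion, which applies the utility to the expected returns, u(E[R]). The following minimal numerical sketch of the municipality example shows why the two can disagree; all distributions, parameters, and the utility function are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy version of the municipality example. Each option yields a stochastic
# two-objective return: (capacity margin, negated CO2 emissions). All numbers
# below are illustrative assumptions, not taken from the paper.
gas     = rng.normal(loc=[0.9, -0.8], scale=[0.05, 0.10], size=(10_000, 2))
battery = rng.normal(loc=[0.7, -0.1], scale=[0.30, 0.05], size=(10_000, 2))

def utility(returns: np.ndarray) -> np.ndarray:
    """Nonlinear utility: failing to meet demand is heavily penalised."""
    capacity, neg_emissions = returns[:, 0], returns[:, 1]
    return np.where(capacity < 0.5, -10.0, capacity + neg_emissions)

for name, r in [("gas", gas), ("battery", battery)]:
    esr = utility(r).mean()                                  # ESR: E[u(R)]
    ser = float(utility(r.mean(axis=0, keepdims=True))[0])   # SER: u(E[R])
    print(f"{name:8s} ESR = {esr:+.3f}   SER = {ser:+.3f}")
```

With these made-up numbers the battery option looks better under SER, because its mean return comfortably meets demand, but worse under ESR, because a sizeable fraction of single executions fall short of demand and incur the utility penalty. That risk, visible only in the full return distribution, is exactly what a single-execution decision must account for.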

Conclusion and future work

     MORL has been highlighted as one of several key challenges that need to be addressed in order for RL to be commonly deployed in real-world systems [12]. In order to apply RL to the real world, the MORL community must consider the ESR criterion. However, the ESR criterion has largely been ignored by the MORL community, with the exception of the works of Roijers et al. [33, 36], Hayes et al. [15, 16] and Vamplew et al. [43]. The works of Hayes et al. [15, 16] and Roijers et al. [33] present single-policy algorithms that are suitable to learn policies under the ESR criterion; however, prior to this work, the necessary requirements to compute policies under the ESR criterion had not been formally defined. In Sect. 3, we outline, through examples and definitions, the necessary requirements to optimise under the ESR criterion. The formal definitions outlined in Sect. 3 ensure that an optimal policy can be learned under the ESR criterion when the utility function of the user is known. However, in the real world, a user's preferences over objectives (or utility function) may be unknown at the time of learning [36].

     Prior to this paper, a suitable solution set for the unknown utility function scenario under the ESR criterion had not been defined. This long-standing research gap has restricted the applicability of MORL in real-world scenarios under the ESR criterion. In Sects. 4 and 5, we define the necessary solution sets required for multi-policy algorithms to learn a set of optimal policies under the ESR criterion when the utility function of a user is unknown. In Sect. 6, we present a novel multi-policy algorithm, known as multi-objective tabular distributional reinforcement learning (MOTDRL), that can learn the ESR set in a MOMAB setting when the utility function of a user is unknown at the time of learning. In Sect. 7, we evaluate MOTDRL in two MOMAB settings and show that MOTDRL can learn the ESR set in MOMAB settings. This work aims to answer some of the existing research questions regarding the ESR criterion. Moreover, we aim to highlight the importance of the ESR criterion when applying MORL to real-world scenarios. In order to successfully apply MORL to the real world, we must implement new single-policy and multi-policy algorithms that can learn solutions for nonlinear utility functions in various scenarios.
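The paper's MOTDRL algorithm is specified in Sect. 6; as a rough illustration of the kind of pruning a multi-policy MOMAB algorithm can perform, the sketch below runs an empirical first-order-dominance test between arms' multi-objective return samples and keeps the non-dominated arms. The function names and the grid-based CDF comparison are assumptions made for illustration, not the paper's implementation:

```python
import numpy as np

def esr_dominates(a: np.ndarray, b: np.ndarray, grid: np.ndarray) -> bool:
    """Empirical first-order-dominance test between two arms' return samples.

    a, b: (n, d) arrays of d-objective return samples for two arms.
    grid: (m, d) array of evaluation points. Returns True if a's empirical
    multivariate CDF lies at or below b's at every grid point and strictly
    below at some point. A sketch of a dominance check in the spirit of
    the paper, not the MOTDRL implementation itself.
    """
    def cdf(s: np.ndarray) -> np.ndarray:
        # Fraction of samples componentwise <= each grid point (empirical CDF).
        return np.mean(np.all(s[None, :, :] <= grid[:, None, :], axis=2), axis=1)

    fa, fb = cdf(a), cdf(b)
    return bool(np.all(fa <= fb) and np.any(fa < fb))

def esr_set(samples: dict[str, np.ndarray], grid: np.ndarray) -> set[str]:
    """Keep every arm that no other arm dominates."""
    return {k for k in samples
            if not any(esr_dominates(samples[j], samples[k], grid)
                       for j in samples if j != k)}
```

In a bandit run, return samples accumulate per arm as it is pulled, and dominated arms are pruned as the empirical distributions sharpen; the surviving arms approximate the ESR set.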
