Abstract

Systems of systems (SoS) often include multiple agents that interact in both cooperative and competitive modes, and they draw on multiple shared resources, including energy, information, and bandwidth. When these resources are limited, agents must decide how to share them cooperatively to reach the system-level goal while autonomously performing their assigned tasks. This paper takes a step toward addressing these challenges by proposing a dynamic two-tier learning framework, based on deep reinforcement learning, that enables dynamic resource allocation while acknowledging the autonomy of the system constituents. By decoupling the learning process of the SoS constituents from that of the resource manager, the framework ensures that the constituents' autonomy and learning are not compromised by the resource manager's interventions. We apply the proposed two-tier learning framework to a customized OpenAI Gym environment and compare its results against baseline resource-allocation methods, demonstrating the superior performance of the two-tier scheme across a range of key SoS parameters. We then apply our heuristic inference method to the results of this experiment to interpret the resource manager's decisions over a range of environment and agent parameters.
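The two-tier decoupling described above can be sketched in miniature. The following is a hypothetical toy illustration, not the paper's implementation: each constituent agent keeps its own independent learner (here a small epsilon-greedy bandit), while a top-tier resource manager reallocates a fixed budget using only the observed rewards, never touching the agents' internal learning state. All class and variable names are illustrative assumptions.

```python
# Hypothetical sketch of a two-tier scheme: autonomous bottom-tier learners
# plus a top-tier resource manager. Illustrative only; not the paper's code.
import random


class ConstituentAgent:
    """Bottom tier: learns its own task policy, untouched by the manager."""

    def __init__(self, n_actions, rng):
        self.q = [0.0] * n_actions      # per-action value estimates
        self.counts = [0] * n_actions   # per-action update counts
        self.rng = rng

    def act(self, eps=0.1):
        # Epsilon-greedy choice over the agent's own action set.
        if self.rng.random() < eps:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, action, reward):
        # Incremental sample-average update of the chosen action's value.
        self.counts[action] += 1
        self.q[action] += (reward - self.q[action]) / self.counts[action]


class ResourceManager:
    """Top tier: splits a fixed budget from observed rewards only."""

    def allocate(self, budget, recent_rewards):
        # Share the budget roughly in proportion to recent performance;
        # the small constant keeps every agent's share strictly positive.
        weights = [max(r, 0.0) + 1e-3 for r in recent_rewards]
        total = sum(weights)
        return [budget * w / total for w in weights]


rng = random.Random(0)
agents = [ConstituentAgent(3, rng) for _ in range(2)]
manager = ResourceManager()
budget = 1.0
last_rewards = [0.0, 0.0]

for _ in range(200):
    alloc = manager.allocate(budget, last_rewards)
    for i, agent in enumerate(agents):
        action = agent.act()
        # Reward couples the agent's own action quality with its share,
        # so allocation shapes incentives without overriding autonomy.
        reward = alloc[i] * (action + 1) / 3
        agent.update(action, reward)
        last_rewards[i] = reward
```

The key structural point mirrors the framework's motivation: the manager sees only rewards and emits only allocations, so swapping in a different allocation policy requires no change to any constituent's learner.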
