Abstract

The Stewart platform is a fully parallel robot, mechanically distinct from typical serial manipulators, with applications ranging from flight and driving simulators to structural test platforms. This work focuses on learning to control a complex model of the Stewart platform using state-of-the-art deep reinforcement learning (DRL) algorithms. To improve the reliability of learning and to provide a test bed that faithfully reproduces the behavior of the system, we present a carefully designed simulation environment. We first design a parametric representation of the Stewart platform's kinematics in Gazebo and the robot operating system (ROS) and integrate it with a Python class that conveniently generates the structure in simulation description format (SDF). To control the system, we then employ three DRL algorithms, the asynchronous advantage actor-critic (A3C), the deep deterministic policy gradient (DDPG), and proximal policy optimization (PPO), to learn the gains of a proportional-integral-derivative (PID) controller for a given reaching task. These algorithms were chosen because the Stewart platform has continuous state and action spaces, which makes them well suited to our problem, where precise controller tuning is crucial. The simulation results show that the DRL algorithms successfully learn the controller gains, yielding satisfactory control performance.
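
To make the gain-tuning formulation concrete, the sketch below casts it as a gym-style environment whose action is the (Kp, Ki, Kd) triple and whose reward is the negative accumulated tracking error over one reaching episode. It is a minimal illustration only: the second-order plant, the class name StewartPIDTuningEnv, and all numeric constants are assumptions standing in for the Gazebo/ROS model of the platform, and the random search at the bottom merely exercises the interface where A3C, DDPG, or PPO would act.

# Minimal sketch of DRL-based PID gain tuning for a reaching task.
# The agent's action is a set of PID gains; the reward is the negative
# squared tracking error accumulated over one episode. A toy second-order
# plant stands in for the Gazebo model of a single Stewart-platform leg;
# all names and constants here are illustrative assumptions.
import numpy as np


class StewartPIDTuningEnv:
    """Gym-style environment whose action is a (Kp, Ki, Kd) triple."""

    def __init__(self, setpoint=0.1, dt=0.01, horizon=500):
        self.setpoint = setpoint   # desired leg extension [m] for the reaching task
        self.dt = dt               # integration step [s]
        self.horizon = horizon     # simulation steps per episode
        self.state = None

    def reset(self):
        # Observation: current position error and velocity of the leg.
        self.state = np.array([self.setpoint, 0.0])
        return self.state.copy()

    def step(self, action):
        """Run one full reaching episode with the proposed PID gains."""
        kp, ki, kd = np.clip(action, 0.0, None)   # gains must be non-negative
        pos, vel, integral, prev_err = 0.0, 0.0, 0.0, self.setpoint
        cost = 0.0
        for _ in range(self.horizon):
            err = self.setpoint - pos
            integral += err * self.dt
            deriv = (err - prev_err) / self.dt
            force = kp * err + ki * integral + kd * deriv
            # Toy mass-spring-damper leg dynamics standing in for Gazebo.
            acc = (force - 5.0 * vel - 20.0 * pos) / 2.0
            vel += acc * self.dt
            pos += vel * self.dt
            cost += err ** 2 * self.dt
            prev_err = err
        reward = -cost                            # lower accumulated error => higher reward
        self.state = np.array([self.setpoint - pos, vel])
        return self.state.copy(), reward, True, {}   # one action = one episode


if __name__ == "__main__":
    # Random-search stand-in for the DRL agent, just to exercise the interface.
    env = StewartPIDTuningEnv()
    rng = np.random.default_rng(0)
    best_gains, best_reward = None, -np.inf
    for _ in range(50):
        gains = rng.uniform([0.0, 0.0, 0.0], [200.0, 50.0, 20.0])
        env.reset()
        _, reward, _, _ = env.step(gains)
        if reward > best_reward:
            best_gains, best_reward = gains, reward
    print("best gains (Kp, Ki, Kd):", best_gains, "reward:", best_reward)

In the paper's setting, the episode rollout inside step would be replaced by commanding the Gazebo/ROS simulation of the full platform, while the environment interface itself stays the same.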
