Reinforcement learning-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training

Bibliographic Details
Main Author:	Ahmed Abo Mosali, Najm Addin Mohammed
Format:	Thesis
Language:	English
Published:	2022
Subjects:	TK5101-6720 Telecommunication. Including telegraphy, telephone, radio, radar, television
Online Access:	http://eprints.uthm.edu.my/8452/1/24p%20NAJM%20ADDIN%20MOHAMMED%20AHMED%20ABO%20MOSALI.pdf http://eprints.uthm.edu.my/8452/2/NAJM%20ADDIN%20MOHAMMED%20AHMED%20ABO%20MOSALI%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/8452/3/NAJM%20ADDIN%20MOHAMMED%20AHMED%20ABO%20MOSALI%20WATERMARK.pdf

collection	UTHM
description	Target tracking with an unmanned aerial vehicle (UAV) is a challenging robotics problem, because it requires handling a high degree of nonlinearity and dynamic behaviour. The aim is to achieve accurate UAV target tracking that responds to the dynamics generated by the target, such as sudden trajectory changes, by using reinforcement learning (RL), which has been shown to learn dynamics effectively. In this thesis, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, a recent composite RL architecture, is explored as the tracking agent for UAV-based target tracking, with several improvements to the original TD3. First, a proportional-derivative (PD) controller was used to boost TD3's exploration during training. Second, a novel reward formulation for UAV-based target tracking was proposed that carefully combines the various dynamic variables in the reward function: two exponential functions limit the influence of velocity and acceleration, preventing distortion of the policy's function approximation. Third, multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Fourth, the reward function was enhanced with a piecewise decomposition, which gives the policy more stable learning behaviour and replaces the linear reward with an achievement-based formulation. Fifth, a novel agent-selection algorithm was developed to select the best agent while avoiding under-fitting and over-fitting. To evaluate the control system, flight tests were conducted with three types of target trajectory (fixed, square, and blinking), both in simulation and in real-world experiments.
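The abstract does not give the thesis's equations, so the following is only a minimal Python sketch, under stated assumptions, of how three of the listed ideas could look: PD-guided exploration during TD3 training, exponential and piecewise (achievement) reward terms, and a multistage reward schedule. Every function name, gain, coefficient, distance band, and the stage ordering (position, then velocity, then acceleration) is hypothetical and is not taken from the thesis.

```python
import math
import random

# --- PD-guided exploration (assumed scheme, one control axis) ---------------

def pd_action(error, d_error, kp=0.8, kd=0.2):
    """PD control law u = Kp * e + Kd * de/dt; the gains are placeholders."""
    return kp * error + kd * d_error


def exploration_action(actor_action, error, d_error,
                       pd_prob=0.3, noise_std=0.1, rng=random):
    """With probability pd_prob execute the PD action instead of the actor's;
    otherwise use the usual TD3 exploration: actor action + Gaussian noise."""
    if rng.random() < pd_prob:
        return pd_action(error, d_error)
    return actor_action + rng.gauss(0.0, noise_std)


# --- Reward shaping (hypothetical forms and coefficients) --------------------

def exponential_reward(dist, speed, accel, k_v=0.5, k_a=0.5):
    """Distance penalty plus velocity/acceleration penalties squashed by
    1 - exp(-k * |x|), so each of those two terms is bounded by 1."""
    vel_term = 1.0 - math.exp(-k_v * abs(speed))
    acc_term = 1.0 - math.exp(-k_a * abs(accel))
    return -dist - vel_term - acc_term


def achievement_reward(dist, bands=((0.1, 1.0), (0.5, 0.5), (1.0, 0.0)),
                       miss=-1.0):
    """Piecewise reward: the bonus of the tightest distance band reached
    (band limits in metres are illustrative), or `miss` outside all bands."""
    for limit, bonus in bands:
        if dist <= limit:
            return bonus
    return miss


# --- Multistage schedule (illustrative ordering of the dynamic variables) ---

STAGES = (
    {"k_v": 0.0, "k_a": 0.0},  # stage 1: position error only
    {"k_v": 0.5, "k_a": 0.0},  # stage 2: add the velocity term
    {"k_v": 0.5, "k_a": 0.5},  # stage 3: add the acceleration term
)


def staged_reward(stage, dist, speed, accel):
    """Reward used in a given stage; training would resume from the previous
    stage's weights instead of training once on the full reward."""
    return exponential_reward(dist, speed, accel, **STAGES[stage])


if __name__ == "__main__":
    rng = random.Random(0)
    print(exploration_action(0.2, error=1.0, d_error=-0.5, rng=rng))
    print(exponential_reward(0.3, speed=2.0, accel=10.0))
    print(achievement_reward(0.3))            # 0.5
    print(staged_reward(0, 0.3, 2.0, 10.0))   # -0.3
```

Squashing velocity and acceleration through bounded exponential terms keeps them from overwhelming the distance term, which is one plausible reading of "limit the effect of velocity and acceleration"; a one-stage combinatory variant would simply train from scratch with the last entry of `STAGES`.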
The results showed that the best performance was achieved by multistage training with both exponential and achievement rewarding for a fixed-trained agent tracking fixed and square moving targets, and by a combinatorial agent with both exponential and achievement rewarding for a fixed-trained agent in the case of a blinking target. Compared with a traditional proportional-derivative (PD) controller, the maximum error reduction was 86%. The proposed achievement rewarding and multistage training open the door to various applications of RL in target tracking.
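The abstract reports a maximum error reduction of 86% relative to the PD controller but does not state which error metric or formula is used. A common definition, assumed here, is the relative reduction of a tracking-error metric against the baseline, taken as the maximum over test scenarios. The function names and the numbers in the usage lines are hypothetical.

```python
def error_reduction_rate(baseline_error, agent_error):
    """Percentage by which agent_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - agent_error) / baseline_error


def max_error_reduction(results):
    """results: iterable of (baseline_error, agent_error) pairs, one per
    scenario (e.g. fixed, square, blinking target); returns the largest rate."""
    return max(error_reduction_rate(b, a) for b, a in results)


if __name__ == "__main__":
    # Invented errors, for illustration only (not results from the thesis).
    scenarios = [(0.50, 0.10), (0.80, 0.40), (1.20, 0.90)]
    print(round(max_error_reduction(scenarios), 1))  # 80.0
```

A negative rate would mean the agent tracked worse than the baseline in that scenario.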
institution	Universiti Tun Hussein Onn Malaysia
citation	Ahmed Abo Mosali, Najm Addin Mohammed (2022-08). Reinforcement learning-based target tracking for unmanned aerial vehicle with achievement rewarding and multistage training. Doctoral thesis (not peer reviewed), Universiti Tun Hussein Onn Malaysia.
