High-Precision Trajectory Tracking for UAV Aerial Refueling Based on Deep Reinforcement Learning and Adaptive Model Predictive Control

Yanning Gong

doi:10.23977/jeeem.2026.090104

Author(s)

Yanning Gong ¹

Affiliation(s)

¹ Jiangxi University of Water Resources and Electric Power, Nanchang, Jiangxi, 330099, China

Corresponding Author

Yanning Gong

ABSTRACT

The docking phase of unmanned aerial vehicle (UAV) aerial refueling is subject to strong wake vortex aerodynamic disturbances, highly nonlinear hose-drogue dynamics, and strict constraints. To address these challenges, a cascaded trajectory tracking and autonomous decision-making framework is proposed that integrates Soft Actor-Critic (SAC) deep reinforcement learning (DRL) with Adaptive Model Predictive Control (AMPC). First, a six-degree-of-freedom (6-DOF) nonlinear dynamic model of the UAV is established that incorporates interference from the Burnham-Hallock wake vortex velocity field, alongside a lumped-mass catenary dynamic model of the hose-drogue system. Second, at the decision-making layer, an autonomous rendezvous agent based on the SAC algorithm is designed. The rendezvous task is formulated as a Markov Decision Process (MDP) with a continuous state-action space and a composite reward function, which enables safe obstacle avoidance and optimal trajectory planning for the UAV in complex airflow environments. Third, at the control layer, which targets the precise docking phase, an AMPC is synthesized to enforce hard physical constraints on the actuators (i.e., deflection magnitude and rate limits). In addition, a first-order low-pass incremental state observer is introduced to estimate and compensate in real time for the time-varying wind field disturbances induced by the wake vortex. Finally, the framework is verified through numerical evaluations and dynamic simulations against traditional fuzzy PID and Nonlinear Dynamic Inversion (NDI) control. Relative to the fuzzy PID baseline, the proposed SAC-AMPC cascaded framework reduces the Root Mean Square Error (RMSE) along the X, Y, and Z axes by 61.9%, 61.1%, and 64.4%, respectively. The maximum transient overshoot is bounded within 0.55 m, the control energy consumption is reduced by 28.8%, and actuator saturation is eliminated. The docking success rate across 100 Monte Carlo runs reaches 98.0%. This research provides a theoretical foundation and reliable technical support for the autonomous aerial refueling of UAVs in complex, disturbance-rich environments.
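Two of the building blocks named in the abstract can be illustrated with a minimal Python sketch. The first function evaluates the standard Burnham-Hallock tangential velocity profile, V(r) = Γ/(2π) · r/(r² + r_c²), for a single vortex. The second is a generic discrete first-order low-pass filter (backward-Euler form) of the kind that could smooth a disturbance estimate. The abstract does not give the paper's observer equations, so the filter form, the function names, and all parameter names here are illustrative assumptions rather than the author's implementation.

```python
import math


def burnham_hallock_tangential_velocity(r, circulation, core_radius):
    """Tangential velocity (m/s) at radial distance r (m) from a vortex core.

    Standard Burnham-Hallock profile:
        V(r) = circulation / (2*pi) * r / (r**2 + core_radius**2)
    circulation is in m^2/s and core_radius in m. The values passed in
    are up to the caller; none are taken from the paper.
    """
    return circulation / (2.0 * math.pi) * r / (r * r + core_radius * core_radius)


def low_pass_update(estimate, measurement, dt, time_constant):
    """One step of a discrete first-order low-pass filter (backward Euler).

    Assumed generic form, not the paper's observer:
        alpha    = dt / (time_constant + dt)
        estimate = estimate + alpha * (measurement - estimate)
    """
    alpha = dt / (time_constant + dt)
    return estimate + alpha * (measurement - estimate)
```

Under this profile the induced velocity is zero at the vortex centre, peaks at r = r_c with the value Γ/(4π r_c), and decays roughly as 1/r far from the core. Calling `low_pass_update` repeatedly with a constant measurement drives the estimate toward that measurement at a rate set by `time_constant`.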

KEYWORDS

Aerial refueling; Trajectory tracking; Model Predictive Control; Deep reinforcement learning; Wake vortex disturbance; Soft Actor-Critic (SAC)

CITE THIS PAPER

Yanning Gong. High-Precision Trajectory Tracking for UAV Aerial Refueling Based on Deep Reinforcement Learning and Adaptive Model Predictive Control. Journal of Electrotechnology, Electrical Engineering and Management (2026). Vol. 9, No.1, 33-40. DOI: http://dx.doi.org/10.23977/jeeem.2026.090104.
