Education, Science, Technology, Innovation and Life
Open Access

Enhance Matching Model Performance by Deep Reinforcement Learning in FAQ-Oriented Dialogue System


DOI: 10.23977/ISTAE2021028

Author(s)

Xiaotong Pan, Zuopeng Liu

Corresponding Author

Xiaotong Pan

ABSTRACT

In a task-oriented dialogue system (TODS), the dialogue state is extracted from the spoken language understanding results, and policy learning then computes an action distribution and selects one action as the reply. In this setting it is straightforward to use user feedback to adjust the dialogue policy: the policy learning algorithm is viewed as an agent, the slot-filling result serves as the dialogue state, task completion serves as the reward, and a deep reinforcement learning (DRL) model is trained to determine which action yields more reward at prediction time. In contrast, in an FAQ-oriented dialogue system (FODS) it is intractable to optimize the question-answering policy from user feedback, because slots are hard to define and all information is organized as frequently asked questions (FAQs). The action selection of TODS policy learning therefore becomes, in an FODS, matching between the user query and the FAQs, and it is difficult to adjust the matching algorithm directly from user feedback. To tackle this issue, we propose a DRL-based algorithm that improves matching performance through online user feedback, in contrast to previous matching approaches that rely only on semantic similarity. With this method we increase the interception ratio, decrease the human-transfer ratio, and shorten the average number of user interaction turns on diverse business data in Xiaomi Intelligent Customer Service (XICS).
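For intuition, the following is a minimal sketch, not the authors' implementation, of how FAQ matching can be cast as a DRL problem driven by online user feedback: the state is the user query paired with candidate FAQs, the action is the FAQ returned as the answer, and the reward is derived from feedback such as answer acceptance versus transfer to a human agent. The embeddings, dimensions, feedback signal, and policy-gradient update below are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of FAQ matching as a reinforcement learning agent.
# State  : user query embedding paired with candidate FAQ embeddings.
# Action : which FAQ candidate to return as the answer.
# Reward : online user feedback (assumed: +1 if the answer is accepted,
#          -1 if the session is transferred to a human agent).

rng = np.random.default_rng(0)
EMB_DIM = 16           # embedding size of query / FAQ vectors (assumed)
LEARNING_RATE = 0.05

# Linear scoring weights over concatenated [query_emb ; faq_emb] features.
weights = rng.normal(scale=0.1, size=2 * EMB_DIM)

def score(query_emb, faq_embs):
    """Compute policy logits for every FAQ candidate against the query."""
    feats = np.hstack([np.tile(query_emb, (len(faq_embs), 1)), faq_embs])
    return feats @ weights, feats

def select_action(query_emb, faq_embs):
    """Sample an FAQ according to a softmax policy over matching scores."""
    logits, feats = score(query_emb, faq_embs)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    action = rng.choice(len(faq_embs), p=probs)
    return action, probs, feats

def policy_gradient_update(action, probs, feats, reward):
    """REINFORCE-style update driven by the user-feedback reward."""
    global weights
    # Gradient of log softmax: feats[action] - expected feature under the policy.
    grad_log_pi = feats[action] - probs @ feats
    weights += LEARNING_RATE * reward * grad_log_pi

# Toy interaction loop with simulated queries, FAQ candidates, and feedback.
for step in range(100):
    query_emb = rng.normal(size=EMB_DIM)
    faq_embs = rng.normal(size=(5, EMB_DIM))        # 5 candidate FAQs
    action, probs, feats = select_action(query_emb, faq_embs)
    # Simulated feedback: reward the candidate most similar to the query.
    reward = 1.0 if action == int(np.argmax(faq_embs @ query_emb)) else -1.0
    policy_gradient_update(action, probs, feats, reward)
```

In a production dialogue system the simulated reward above would be replaced by real user signals (answer acceptance, repeated queries, or transfer to a human agent), and the linear scorer by the matching model itself.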

KEYWORDS

Dialogue system, Deep reinforcement learning

All published work is licensed under a Creative Commons Attribution 4.0 International License.
