CN111027215B - Character training system and method for virtual person - Google Patents

Character training system and method for virtual person

Info

Publication number
CN111027215B
CN111027215B (application CN201911267237.5A)
Authority
CN
China
Prior art keywords
emotion
user
event
module
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911267237.5A
Other languages
Chinese (zh)
Other versions
CN111027215A (en)
Inventor
王艺敏
苏洋
徐智勇
周华
沈荟萍
郑吉林
赵继勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Engineering University of PLA filed Critical Army Engineering University of PLA
Priority to CN201911267237.5A priority Critical patent/CN111027215B/en
Publication of CN111027215A publication Critical patent/CN111027215A/en
Application granted granted Critical
Publication of CN111027215B publication Critical patent/CN111027215B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A character training system and method for a virtual person comprises the following steps: step 1: establish a parameterized model of emotion-stimulus fusion; step 2: establish an event-stimulus reasoning mechanism; step 3: customize the intent fusion rules; step 4: construct a psychological response model based on a hidden Markov model; step 5: train the model; step 6: execute a training mechanism for sustainable learning. The method effectively overcomes the defects of the prior art: emotion stimulation can only come from single-modality input events, the emotion stimulus at the current moment cannot be output comprehensively, the emotional response of the agent cannot be adjusted according to the user's expectations, and data-driven emotion recognition models depend on large amounts of finely labeled training data and cannot improve their performance while interacting with the user.

Description

Character training system and method for virtual person
Technical Field
The invention relates to the technical field of virtual persons within artificial intelligence and to the technical field of character training; in particular, it relates to a character training system and method for virtual persons, and especially to a virtual-person character training system and method based on policy-gradient reinforcement learning.
Background
Artificial intelligence has gained increasing attention in the computer field and is applied to robots, economic and political decision-making, control systems, and simulation systems. In recent years, with the development of artificial intelligence technology, virtual-person technology with human-machine interaction capability, together with personalization schemes for it, has seen growing demand. It has become common knowledge in the industry that a virtual person should exhibit a certain degree of emotion during intelligent human-machine interaction. The emotion of a virtual person is triggered by external stimuli and determined by the virtual person's "character". The character of a virtual person may be given by hard coding or in rule form, for example as "if stimulus, then emotion" rules. Ideally, the character of a virtual person should be learned, as a human's is, while interacting with the user, so that a personalized character is formed for each specific user.
The existing emotion stimulation approaches for virtual persons have the following defects:
1. only single-modality input events can provide emotion stimulation, and the emotion stimulus at the current moment cannot be output comprehensively;
2. the emotional response of the agent cannot be adjusted according to the user's expectations;
3. existing data-driven emotion recognition models must rely on a large amount of finely labeled training data and cannot improve their performance while interacting with the user.
Disclosure of Invention
To solve these problems, the invention provides a character training system and method for virtual persons that effectively avoids the defects of the prior art: emotion stimulation restricted to single-modality input events, inability to output the emotion stimulus at the current moment comprehensively, inability to adjust the agent's emotional response according to the user's expectations, and data-driven emotion recognition models that depend on large amounts of finely labeled training data and cannot improve during interaction with the user.
To overcome the defects of the prior art, the invention provides the following solution for a character training system and method for virtual persons:
a planning method for a character training system of a virtual person comprises the following steps:
step 1: establishing a parameterized model of emotion stimulation fusion;
step 2: establishing an event stimulus reasoning mechanism;
step 3: customizing the intention fusion rule;
step 4: constructing a psychological response model based on a hidden Markov model;
step 5: training a model;
step 6: a training mechanism for sustainable learning is performed.
The emotional stimuli include: the user's expression, the emotion of the sentence text spoken by the user, the emotion contained in the user's voice intonation, and the emotion implied by the user's own features and actions;
The user's expression, the emotion of the sentence text spoken by the user, and the emotion contained in the user's voice intonation are given by the vision module, the semantic module, and the speech module respectively; these three emotion values are fused into the emotion stimulus by weighting;
The emotion implied by the user's own features and actions is given in the form of Boolean variables, i.e., whether a feature is present or an action occurs; the emotion implied by each feature or action is described by a parameter vector, which is fused into the emotion stimulus by weighting.
The establishment of the event-stimulus reasoning mechanism includes judging the user's expression, the emotion of the sentence text spoken by the user, and the user's voice intonation. The speech module converts the user's voice into sentence text and inputs it to the semantic module; using natural-language-processing word segmentation, the semantic module then makes, on the input sentence text, a judgment of the emotion subject, a judgment of subject/object polarity, a judgment of event/behavior polarity, a judgment of event state, and a judgment of self-reaction. The way the user's expression is judged as a visual event comprises:
(1) A judgment of an emotion subject, comprising:
The emotion subject is centered on the agent and is divided into self and other, where self refers to the agent itself and other refers to the recognized person whose expression is in the field of view;
(2) A determination of a host/guest polarity comprising:
Subject/object polarity is divided into five classes: positive, non-positive, negative, non-negative, and non-existent; the different values of four features of the recognized person (expression, age, facial attractiveness, and sex) are mapped into these five polarity classes;
(3) A determination of event/behavior polarity comprising:
Event/behavior polarity is divided into five classes: expected, unexpected, praising, criticizing, and non-existent; the actions of the recognized person are classified into these five classes;
(4) A determination of event status, comprising:
Event states fall into six classes: determined to have occurred in the past, determined not to have occurred in the past, determined to occur in the future, determined not to occur in the future, present, and unknown;
(5) A determination of self-reaction comprising:
Self-reaction is classified into four classes: liking, non-liking, disliking, and non-disliking.
The customization of the intent fusion rules fuses intents input from the semantic module and the visual module, on the basis of time synchronization, using the following rules:
The time synchronization includes: when a text-intent input exists, the text intent is fused with the nearby visual input, ensuring that the two are synchronized in time; the fusion rules follow the following two points:
(1) Complementary intent, including:
If the intention in one channel is difficult to judge and its uncertainty is high, the intention of that high-uncertainty channel is abandoned and the other channel serves as a supplement, its intention being taken as the determined intention;
(2) Semantic dominance, comprising:
If the confidence of both channels is high and the values of the same intent slot contradict each other, the semantic input is taken as primary and the visual input as auxiliary.
The method for constructing the psychological response model based on the hidden Markov model comprises the following steps:
First, a task description diagram for character training is given: the trainer is the user, and the trainee is the robot acting as the virtual person. The user controls two parts of the task: first, the expression input, i.e., the series of expressions the user makes to the robot; second, the desired output, i.e., the emotional-state response the user expects the robot to produce for that series of expressions;
Then, expression recognition software recognizes the expression input sequence and converts it into a sequence of six-dimensional emotion-stimulus vectors, the two sequences being equal in length; a hidden Markov model is then adopted, taking the six-dimensional emotion stimulus as observation input and the user's desired emotional-state response of the robot as decoded output;
In the hidden Markov model, the probability of remaining in a given state is λ, the probability of jumping out of that state to all other states is 1-λ, and the probability of remaining in the state for t time units is λ^t. The coupling effect between emotional states is corrected by applying a penalty term to the emotional-state transition matrix; this correction can be implemented during Viterbi-algorithm decoding.
Training the model comprises: the user performs a user-expression sequence for the robot and labels, in order, the emotions the robot is expected to produce, as the desired emotion sequence;
After several pairs of user-expression sequences and desired emotion sequences are generated, the parameters of the model are trained by a stimulus-emotion hidden-Markov-model training method based on the expectation-maximization algorithm, the parameters comprising: an observation matrix describing the relationship between external stimuli and emotional states, and a transition matrix between emotional states.
The training mechanism for sustainable learning adopts a reinforcement learning mechanism that maximizes the user score; the maximized user score is the objective function shown in formula (1):
$$E_{s\sim p(s;x,z),\,x\sim p(x)}\left[f(x,s)\right] \tag{1}$$
For this objective function, the policy-gradient method shown in formula (2) is adopted:
$$
\begin{aligned}
\nabla_z E_{s\sim p(s;x,z),\,x\sim p(x)}\left[f(x,s)\right] &= \nabla_z \sum_x p(x) \sum_s p(s;x,z)\, f(x,s) \\
&= \sum_x p(x) \sum_s p(s;x,z)\, \nabla_z \log p(s;x,z)\, f(x,s) \\
&= E_{x\sim p(x)}\, E_{s\sim p(s;x,z)}\left[\nabla_z \log p(s;x,z)\, f(x,s)\right] \\
&\approx \sum_j \sum_i \nabla_z \log p(s_{ij};x_j,z)\, f(x_j,s_{ij})
\end{aligned}
\tag{2}
$$
where s_{ij} is sampled from p(s;x_j,z), i.e., given the event sample input x_j and the current model parameters z, a random s_{ij} is generated according to the model's probability distribution p(s_{ij};x_j,z); x_j is a simulated event input sampled from p(x).
The character training system for the virtual person comprises an establishing module, a reasoning module, a fusion module, a construction module, a training module, and an execution module;
the establishing module is used for establishing a parameterized model of emotion stimulus fusion;
the reasoning module is used for establishing an event stimulation reasoning mechanism;
the fusion module is used for customizing the intention fusion rule;
the construction module is used for constructing a psychological response model based on the hidden Markov model;
the training module is used for training the model;
the execution module is used for executing a training mechanism for sustainable learning.
The beneficial effects of the invention are as follows:
the virtual human-based training system and the method based on strategy gradient reinforcement learning, which are disclosed by the invention, comprise a quantitative modeling of multi-modal input event emotion stimulus and a model training method based on strategy gradient reinforcement learning, and can be used for multi-modal emotion recognition and emotion calculation: the invention is not limited to the listed events, specific measurement modes of emotion and the like, and compared with single-mode event input, the invention fuses and processes a plurality of event inputs of a plurality of modes and comprehensively outputs emotion stimulus at the current moment; compared with an expert system, the model provided by the invention has adjustable parameters, and can adjust the emotional response of the intelligent body according to the user's expectations through technologies such as quantization, parameterization, optimizing estimation and the like; compared with a classical emotion recognition model based on data driving, the method does not depend on a large amount of detailed labeled training data, and can improve performance in the process of interaction with a user.
Drawings
Fig. 1 is a schematic diagram of a planning method for a character training system for a virtual person according to the present invention.
FIG. 2 is an exemplary diagram of a parameterized model of emotion stimulus fusion of the present invention.
FIG. 3 is an exemplary diagram of a task description graph of character training of the present invention.
FIG. 4 is a schematic diagram of the transition relationship of the present invention.
FIG. 5 is a schematic diagram of the stimulus-emotion hidden-Markov-model training method based on the expectation-maximization algorithm of the present invention.
Detailed Description
The invention considers the following virtual-person training scenario:
The user makes a series of emotional stimuli and text stimuli to the robot acting as the virtual person, hoping that the robot in turn produces specific emotional states. To model the real scene effectively, sequential stimuli that in turn produce sequential emotional states are considered. This is because a user's expressions generally start from calm and return to calm, and it is difficult to delimit strictly which segment is calm and which segment is happy or angry; likewise, the robot needs a process to recognize the user's expression, react to it, and gradually let the reaction fade.
In addition, the user's stimuli to the robot are not necessarily expressions or speech; they can be other quantifiable events. At present, however, expression stimuli and text stimuli have the better quantification models; in particular, expression stimuli have already been vectorized into six dimensions. Here the "boundaries" of the robot's application scenario must be taken into account.
To realize the training model in this scene, a quantified "stimulus to emotional state" relationship and an "emotional state to emotional state" transition relationship must be established; this is exactly the kind of problem addressed by models such as probabilistic finite state machines, hidden Markov models, and Bayesian networks. The invention therefore adopts a hidden Markov model and makes the following assumptions: the external stimulus is discretized into an M-dimensional vector (e.g., M=6 when only expression stimuli are involved); the robot's 7 basic emotional states (including calm) are the hidden states of the hidden Markov model.
The invention will be further described with reference to the drawings and examples.
As shown in fig. 1 to 5, the planning method for the character training system of the virtual person comprises the following steps:
step 1: establishing a parameterized model of emotion stimulation fusion;
external stimuli, i.e. the emotional stimuli coming from the outside, are of the following types, i.e. the emotional stimuli considered to be external include: the expression of the user, the emotion of the sentence text spoken by the user, the emotion contained in the voice intonation of the user and the emotion implied in the characteristics and actions of the user; the user's own features are for example: whether acquaintance, sex, age, color value, etc.; the actions of the user are for example: nodding, waving, appearing, disappearing, speaking, staring at, etc.
The user's expression, the emotion of the sentence text spoken by the user, and the emotion contained in the user's voice intonation are given by the vision module, the semantic module, and the speech module respectively. The vision module may acquire the user's expression through a camera, the semantic module may extract emotional coloring by analyzing the user's sentence text, and the speech module may analyze the emotion contained in the intonation. These three emotion values are fused into the emotion stimulus by weighting; the weights are unknown parameters to be adjusted during training, that is, they are configurable, adjustable parameters of the parameterized model;
The emotion implied by the user's own features and actions is given in the form of Boolean variables, that is, whether a feature is present or an action occurs. The emotion implied by each feature or action is described by a parameter vector; after weighting, these emotion vectors are fused into the emotion stimulus. These parameter vectors and weights are likewise configurable and adjustable.
Step 2: establishing an event stimulus reasoning mechanism;
The OCC basic emotion model and its five rules for generating 23 emotion labels are adopted; in the OCC emotion model, the five generation rules for the 23 emotion labels are shown in Table 1:
TABLE 1
To make an emotion-stimulus judgment on an input event, the five rules can be checked one by one. The establishment of the event-stimulus reasoning mechanism includes judging the user's expression, the emotion of the sentence text spoken by the user, and the user's voice intonation. The speech module can convert the user's voice into sentence text and input it to the semantic module, and the semantic module can judge the input sentence text by means such as natural-language-processing word segmentation; the judgments comprise a judgment of the emotion subject, a judgment of subject/object polarity, a judgment of event/behavior polarity, a judgment of event state, and a judgment of self-reaction. Only the way the user's expression is judged as a visual event is given here:
(1) A judgment of an emotion subject, comprising:
The emotion subject is centered on the agent and is divided into self and other, where self refers to the agent itself and other refers to the recognized person whose expression is in the field of view;
(2) A determination of a host/guest polarity comprising:
Subject/object polarity is divided into five classes: positive, non-positive, negative, non-negative, and non-existent; the different values of four features of the recognized person (expression, age, facial attractiveness, and sex) are mapped into these five polarity classes. Note that the polarities of the values of these four feature types may contradict one another, and some rules may be required to fuse them first; the fusion rules may be given by a configuration file.
(3) A determination of event/behavior polarity comprising:
Event/behavior polarity is divided into five classes: expected, unexpected, praising, criticizing, and non-existent; the actions of the recognized person (simple actions such as nodding, blinking, and head-shaking) are classified into these five classes. Note that the classifications of multiple actions may contradict one another and must first be fused; the fusion rules are likewise given by the configuration file.
(4) A determination of event status, comprising:
Event states fall into six classes: determined to have occurred in the past, determined not to have occurred in the past, determined to occur in the future, determined not to occur in the future, present, and unknown. For visual events, the temporal state is generally assumed to be "determined to have occurred".
(5) A determination of self-reaction comprising:
Self-reaction is classified into four classes: liking, non-liking, disliking, and non-disliking. For visual-modality input, this rule may be temporarily skipped.
In the above five rules, a visual event itself may be contradictory in the value of a certain feature; for example, high facial attractiveness may be positive while low facial attractiveness is negative. To decide how to fuse them, the various combinations may need to be traversed and trainable parameters defined for the decision. This scheme of establishing the event-stimulus reasoning mechanism leaves the task of describing the external event to the structuring modules of the event (i.e., the vision module and the semantic module), while the described process classifies any input according to the five kinds of features in the emotion rules. An intensity can be obtained for each OCC emotion label by replacing the hard rules with finite state machines.
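The five judgments for a visual event can be sketched as a simple rule table (in Python). The attribute names, the majority-vote fusion of the four polarity features, and the action mapping below are illustrative assumptions; as noted above, the concrete fusion rules are left to a configuration file:

```python
# Hedged sketch of the five visual-event judgments of Step 2.

def judge_visual_event(person):
    """person: dict of recognized attributes of the user in view, or None."""
    if person is None:  # nobody in view: the 'non-existent' cases
        return {"subject": "self", "host_guest": "non-existent",
                "event": "non-existent", "state": "unknown", "self_reaction": None}
    # (1) emotion subject: a recognized person in view counts as 'other'
    subject = "other"
    # (2) subject/object polarity: fuse the four features by majority vote (assumed rule)
    votes = [person.get("expression_polarity", "non-positive"),
             person.get("age_polarity", "non-negative"),
             person.get("looks_polarity", "non-positive"),
             person.get("sex_polarity", "non-negative")]
    host_guest = max(set(votes), key=votes.count)
    # (3) event/behavior polarity from simple actions (assumed mapping)
    action_map = {"nod": "praising", "shake_head": "criticizing", "blink": "expected"}
    event = action_map.get(person.get("action"), "non-existent")
    # (4) event state: visual events are assumed 'determined to have occurred'
    state = "determined-occurred"
    # (5) self-reaction: temporarily skipped for visual input, per the text
    return {"subject": subject, "host_guest": host_guest,
            "event": event, "state": state, "self_reaction": None}

print(judge_visual_event({"expression_polarity": "positive", "action": "nod"}))
```

Replacing the hard classifications above with trainable, probabilistic mappings is what yields intensities for the OCC labels.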
Step 3: customizing the intention fusion rule;
Besides the sentences the user speaks, which express the user's intent, the user's gestures and actions sometimes carry intent as well. Therefore, the customization of the intent fusion rules fuses intents input from the semantic module and the visual module, on the basis of time synchronization, using the following rules (see the sketch after these rules):
The time synchronization includes: visual inputs caused by changes in the user's expression or by motion detection are generally more frequent and far more numerous than entries from the user's voice interaction. Thus, most timestamps have only visual input and no text-semantic input; in that case the visual intent of the single channel is the fused output intent. When a text-intent input exists, the text intent is fused with the nearby visual input, ensuring that the two are synchronized in time; the fusion rules follow the following two points:
(1) Complementary intent, including:
If the intention in one channel is difficult to judge and its uncertainty is high, the intention of that high-uncertainty channel is abandoned and the other channel serves as a supplement, its intention being taken as the determined intention. This rule is implemented concretely by confidence judgment and slot filling.
(2) Semantic dominance, comprising:
If the confidence of both channels is high and the values of the same intent slot contradict each other, the semantic input is taken as primary and the visual input as auxiliary.
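A compact sketch of these two fusion rules (in Python), under the assumption that each channel outputs a confidence score and a set of intent slots; the 0.6 threshold and the intent structure are illustrative, not prescribed by the invention:

```python
# Sketch of the Step 3 fusion rules.

CONF_THRESHOLD = 0.6  # assumed confidence cut-off for "high uncertainty"

def fuse_intents(semantic, visual):
    """Fuse time-aligned semantic and visual intents per the two rules."""
    if semantic is None:                 # most timestamps: visual input only
        return visual
    # Rule (1) complementary intent: abandon the low-confidence channel
    if semantic["confidence"] < CONF_THRESHOLD <= visual["confidence"]:
        return visual
    if visual["confidence"] < CONF_THRESHOLD <= semantic["confidence"]:
        return semantic
    # Rule (2) semantic dominance: both confident, semantic wins slot conflicts
    fused_slots = dict(visual["slots"])
    fused_slots.update(semantic["slots"])    # semantic overwrites shared slots
    return {"confidence": max(semantic["confidence"], visual["confidence"]),
            "slots": fused_slots}

sem = {"confidence": 0.9, "slots": {"greeting": "hello"}}
vis = {"confidence": 0.8, "slots": {"greeting": "wave", "gaze": "at_robot"}}
print(fuse_intents(sem, vis))  # semantic fills 'greeting'; vision adds 'gaze'
```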
Step 4: constructing a psychological response model based on a hidden Markov model;
the method for constructing the psychological response model based on the hidden Markov model comprises the following steps:
First, a task description diagram for character training is given, as shown in Fig. 3: the trainer is the user, and the trainee is the robot acting as the virtual person. The user controls two parts of the task: first, the expression input, i.e., the series of expressions the user makes to the robot; second, the desired output, i.e., the emotional-state response the user expects the robot to produce for that series of expressions. These two parts appear at the top and bottom of Fig. 3, respectively;
To combine the two parts, expression recognition software is then used to recognize the expression input sequence and convert it into a sequence of six-dimensional emotion-stimulus vectors, for example in one-hot form: each emotion stimulus corresponds to one element of the vector, a 1 (or 0) representing the presence (or absence) of that stimulus, so that for example x_t = [0,1,0,0,0,0]^T indicates that the 2nd emotion stimulus is received at time t. Then a hidden Markov model is adopted, taking the six-dimensional emotion stimulus as observation input and the user's desired emotional-state response of the robot as decoded output. These two sequences are unequal in length; generally, the robot's emotion changes less often than the user's. In Fig. 3 the two parts are marked with shading of different darkness, indicating invisibility.
The hidden Markov model focuses on establishing the relationship between the six-dimensional emotion-stimulus sequence and the emotional-state response the user desires of the robot. Fig. 4 is a schematic diagram of a discrete-density hidden Markov model describing, for this task, the quantitative "stimulus to emotional state" relationship and the "emotional state to emotional state" transition relationship; both can be described quantitatively. Note that the model of Fig. 4 does not conflict with the rule-based hard-coded model; it is a natural continuation of the latter. The hard coding actually characterizes the logical relationship between stimulus events and emotional states, i.e., the observation matrix in Fig. 4 is a 0-1 matrix and the probability relationship Pr(x_1|s_1) is a Bernoulli distribution.
In the hidden Markov model, the probability of remaining in a given state is λ, the probability of jumping out of that state to all other states is 1-λ, and the probability of remaining in the state for t time units is λ^t. The coupling effect between emotional states is corrected by applying a penalty term to the emotional-state transition matrix; this correction can be implemented during Viterbi-algorithm decoding, in effect the same way a language model is applied during speech-recognition decoding.
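A minimal sketch of this decoding step (in Python), assuming M=6 one-hot stimuli and N=7 emotional states including calm; the random observation matrix, the numeric values, and the particular off-diagonal penalty factor are illustrative assumptions standing in for one possible form of the coupling correction:

```python
import numpy as np

M, N = 6, 7                                 # stimulus dimension, emotion states
rng = np.random.default_rng(0)
E = rng.dirichlet(np.ones(M), size=N).T     # E[i, j] = P(x_i | s_j); columns sum to 1
lam = 0.8                                   # retention probability lambda
T = np.full((N, N), (1 - lam) / (N - 1))    # jump out with total probability 1 - lambda
np.fill_diagonal(T, lam)                    # stay t steps with probability lam**t
PENALTY = 0.5                               # assumed penalty factor on state jumps

def viterbi(obs_idx):
    """obs_idx: list of stimulus indices (argmax of each one-hot vector)."""
    logT = np.log(T) + np.log(PENALTY) * (1 - np.eye(N))  # penalize transitions
    logE = np.log(E)
    delta = np.log(np.full(N, 1.0 / N)) + logE[obs_idx[0]]
    back = []
    for o in obs_idx[1:]:
        scores = delta[:, None] + logT      # scores[from_state, to_state]
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + logE[o]
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]                       # decoded emotional-state indices

print(viterbi([1, 1, 1, 3, 3, 0]))
```

The penalty enters only the decoding scores, mirroring how a language model reweights paths in speech recognition without changing the acoustic model.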
Step 5: training a model;
The user performs a user-expression sequence, i.e., a continuous series of actions, for the robot, and labels in order the emotions the robot is expected to produce, as the desired emotion sequence. For example, the user performs "no expression → glaring → frowning → no expression" and labels the desired response as "calm → fear → calm" or as "calm → anger → calm" (note that the two labelings correspond to robots of two different characters: the former timid, the latter hot-tempered);
After several pairs of user-expression sequences and desired emotion sequences have been generated, the parameters of the model can be trained using the stimulus-emotion hidden-Markov-model training method of Fig. 5, based on the expectation-maximization algorithm. The parameters include: the observation probability E_{ij} = P(x_i | s_j) describing the relationship between the i-th external stimulus x_i and the j-th emotional state s_j, and the transition probability T_{jk} = P(s_k | s_j) between emotional states, where i indexes all external stimuli, j and k index all emotional states, and i, j, k are positive integers. In matrix form, the observation probabilities and transition probabilities are the observation matrix E and the state transition matrix T respectively. The training method is illustrated by the following example:
Take the user-expression sequence as the input {x_1, x_2, …, x_t}, where x_t denotes the t-th user expression, and treat the desired emotion sequence as the states {q_1, q_2, …, q_t}, where q_t denotes the t-th desired emotion and t is a positive integer; each q_t takes an element of the set of six emotional states {s_1, s_2, s_3, s_4, s_5, s_6}. The following steps are adopted:
a) Estimate each q_t using the Viterbi algorithm; as shown in Fig. 5, q_t takes "calm" several times, then "happy" several times, then "calm" several times;
b) On the basis of these values, update the parameters of the observation matrix E and the state transition matrix T using maximum-likelihood estimation;
c) Repeat steps a) and b) several (at least two) times.
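A sketch of this alternating procedure (hard-EM, also called Viterbi training), continuing the previous sketch and reusing its E, T, M, N, and viterbi. For brevity it omits the constraint that the decoded states should follow the user-labeled desired-emotion order, and the add-one smoothing is an assumption:

```python
import numpy as np  # continues the previous sketch: reuses E, T, M, N, viterbi

def train(obs_seqs, iters=5):
    """obs_seqs: list of stimulus-index sequences (one per labeled training pair)."""
    global E, T
    for _ in range(iters):                      # c) repeat a) and b) several times
        E_cnt = np.ones((M, N))                 # add-one smoothing (assumed)
        T_cnt = np.ones((N, N))
        for obs in obs_seqs:
            states = viterbi(obs)               # a) estimate each q_t by Viterbi
            for o, s in zip(obs, states):
                E_cnt[o, s] += 1
            for s0, s1 in zip(states, states[1:]):
                T_cnt[s0, s1] += 1
        E = E_cnt / E_cnt.sum(axis=0, keepdims=True)  # b) maximum-likelihood update
        T = T_cnt / T_cnt.sum(axis=1, keepdims=True)

train([[1, 1, 2, 2, 0], [3, 3, 3, 0, 0]])
```

Because viterbi() reads the module-level E and T at call time, each iteration decodes with the freshly updated parameters.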
Step 6: executing a training mechanism for sustainable learning;
the goal of the sustainable learning training mechanism as machine learning training is to make the model more accurate and richer. More accurate means that the model parameters can more accurately characterize the stimulus and character, richer means that the model can accommodate more external events through self-learning. To make the parameters more accurate, reinforcement learning mechanisms are employed, including maximizing the user score, which is an objective function as shown in equation (1):
$$E_{s\sim p(s;x,z),\,x\sim p(x)}\left[f(x,s)\right] \tag{1}$$
where f(x,s) is the user's evaluation of the response when the virtual person, having received external stimulus x, produces the emotional response s; a positive value indicates that the user approves of the virtual person's emotional response and reinforces it, while a negative value indicates that the user disapproves of the response and suppresses it. E denotes the mathematical expectation, s∼p(s;x,z) denotes drawing s according to the probability p(s;x,z), x∼p(x) denotes drawing x according to the probability p(x), and z denotes the parameters to be optimized in the reinforcement learning model. For this objective function, the parameter z is updated iteratively by the policy-gradient method shown in formula (2):
$$
\begin{aligned}
\nabla_z E_{s\sim p(s;x,z),\,x\sim p(x)}\left[f(x,s)\right] &= \nabla_z \sum_x p(x) \sum_s p(s;x,z)\, f(x,s) \\
&= \sum_x p(x) \sum_s p(s;x,z)\, \nabla_z \log p(s;x,z)\, f(x,s) \\
&= E_{x\sim p(x)}\, E_{s\sim p(s;x,z)}\left[\nabla_z \log p(s;x,z)\, f(x,s)\right] \\
&\approx \sum_j \sum_i \nabla_z \log p(s_{ij};x_j,z)\, f(x_j,s_{ij})
\end{aligned}
\tag{2}
$$
where s_{ij} is sampled from p(s;x_j,z), i.e., given the event sample input x_j and the current model parameters z, a random s_{ij} is generated according to the model's probability distribution p(s_{ij};x_j,z); x_j is a simulated event input sampled from p(x). The policy gradient ∇_z E_{s∼p(s;x,z),x∼p(x)}[f(x,s)] can thus be computed, and the model parameters z can be optimized along the gradient direction; here i and j are positive integers.
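A REINFORCE-style sketch of the update in formula (2), in Python. The softmax policy p(s; x, z) with a parameter matrix z and the toy score f below are illustrative assumptions standing in for the invention's hidden-Markov parameterization:

```python
import numpy as np

rng = np.random.default_rng(1)
N_EVENTS, N_STATES = 6, 7
z = np.zeros((N_EVENTS, N_STATES))      # parameters to optimize

def policy(x):
    logits = z[x]
    p = np.exp(logits - logits.max())
    return p / p.sum()                  # p(s; x, z), a softmax over states

def reinforce_step(xs, f, n_samples=8, lr=0.1):
    grad = np.zeros_like(z)
    for x in xs:                        # x_j ~ p(x): simulated event inputs
        p = policy(x)
        for _ in range(n_samples):      # s_ij ~ p(s; x_j, z)
            s = rng.choice(N_STATES, p=p)
            g = -p                      # grad of log p(s) w.r.t. logits: one-hot - p
            g[s] += 1.0
            grad[x] += g * f(x, s)      # weight by the user score f(x_j, s_ij)
    return z + lr * grad / (len(xs) * n_samples)

# Toy score: the user rewards state 0 ("calm") for every stimulus.
f = lambda x, s: 1.0 if s == 0 else -0.2
for _ in range(100):
    z = reinforce_step([0, 1, 2], f)
print(policy(0).round(2))               # probability mass shifts toward state 0
```

Each step estimates the expectation in formula (2) by Monte Carlo sampling and moves z along the estimated gradient, so responses the user scores positively become more probable.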
To enable the model to handle more external events, work is needed on the structured representation of visual and semantic events, unifying them under a single representation framework and then obtaining the emotion stimulus through derivation rules and the parameterized model.
The character training system for the virtual person comprises an establishing module, a reasoning module, a fusion module, a construction module, a training module, and an execution module; these modules can run on the robot acting as the virtual person.
The establishing module is used for establishing a parameterized model of emotion stimulus fusion;
the reasoning module is used for establishing an event stimulation reasoning mechanism;
the fusion module is used for customizing the intention fusion rule;
the construction module is used for constructing a psychological response model based on the hidden Markov model;
the training module is used for training the model;
the execution module is used for executing a training mechanism for sustainable learning.
While the invention has been described by way of examples, those skilled in the art will understand that the present disclosure is not limited to the examples described above, and that various changes, modifications, and substitutions may be made without departing from the scope of the invention.

Claims (6)

1. A method of a character training system for a virtual person, comprising the following steps:
step 1: establishing a parameterized model of emotion stimulation fusion;
step 2: establishing an event stimulus reasoning mechanism;
step 3: customizing the intention fusion rule;
step 4: constructing a psychological response model based on a hidden Markov model;
step 5: training a psychological response model based on a hidden Markov model;
step 6: executing a training mechanism for sustainable learning;
the establishment of the event stimulus reasoning mechanism comprises judgment of the expression of a user, the emotion of a sentence text spoken by the user and the voice intonation of the user, the voice module converts the voice intonation of the user into the sentence text and inputs the sentence text into a semantic module, the semantic module makes judgment of an emotion subject, judgment of a subject/object polarity, judgment of an event/action polarity, judgment of an event state and judgment of self response to the input sentence text in a natural language processing word segmentation mode, and the judgment mode of the expression of the user serving as a visual event comprises the following steps:
(1) A judgment of an emotion subject, comprising:
The emotion subject is centered on the agent and is divided into self and other, where self refers to the agent itself and other refers to the recognized person whose expression is in the field of view;
(2) A determination of a host/guest polarity comprising:
Subject/object polarity is divided into five classes: positive, non-positive, negative, non-negative, and non-existent; the different values of four features of the recognized person (expression, age, facial attractiveness, and sex) are mapped into these five polarity classes;
(3) A determination of event/behavior polarity comprising:
Event/behavior polarity is divided into five classes: expected, unexpected, praising, criticizing, and non-existent; the actions of the recognized person are classified into these five classes;
(4) A determination of event status, comprising:
Event states fall into six classes: determined to have occurred in the past, determined not to have occurred in the past, determined to occur in the future, determined not to occur in the future, present, and unknown;
(5) A determination of self-reaction comprising:
Self-reaction is classified into four classes: liking, non-liking, disliking, and non-disliking;
the method for constructing the psychological response model based on the hidden Markov model comprises the following steps:
First, a task description diagram for character training is given: the trainer is the user, and the trainee is the robot acting as the virtual person; the user controls two parts of the task: first, the expression input, i.e., the series of expressions the user makes to the robot; second, the desired output, i.e., the emotional-state response the user expects the robot to produce for that series of expressions;
Then, expression recognition software recognizes the expression input sequence and converts it into a sequence of six-dimensional emotion-stimulus vectors, the two sequences being equal in length; a hidden Markov model is then adopted, taking the six-dimensional emotion stimulus as observation input and the user's desired emotional-state response of the robot as decoded output;
In the hidden Markov model, the probability of remaining in a given state is λ, the probability of jumping out of that state to all other states is 1-λ, and the probability of remaining in the state for t time units is λ^t; the coupling effect between emotional states is corrected by applying a penalty term to the emotional-state transition matrix, and the correction can be implemented during Viterbi-algorithm decoding.
2. The method of the character training system for a virtual person according to claim 1, wherein the emotional stimuli comprise: the user's expression, the emotion of the sentence text spoken by the user, the emotion contained in the user's voice intonation, and the emotion implied by the user's own features and actions;
The user's expression, the emotion of the sentence text spoken by the user, and the emotion contained in the user's voice intonation are given by the vision module, the semantic module, and the speech module respectively; these three emotion values are fused into the emotion stimulus by weighting;
The emotion implied by the user's own features and actions is given in the form of Boolean variables, i.e., whether a feature is present or an action occurs; the emotion implied by each feature or action is described by a parameter vector, which is fused into the emotion stimulus by weighting.
3. The method of the character training system for a virtual person according to claim 1, wherein the customization of the intent fusion rules fuses intents input from the semantic module and the visual module, on the basis of time synchronization, using the following rules:
The time synchronization includes: when a text-intent input exists, the text intent is fused with the nearby visual input, ensuring that the two are synchronized in time; the fusion rules follow the following two points:
(1) Complementary intent, including:
If the intention in one channel is difficult to judge and its uncertainty is high, the intention of that high-uncertainty channel is abandoned by using the other channel as a supplement, and the intention of the more confident channel is adopted;
(2) Semantic dominance, comprising:
If the confidence of both channels is high and the values of the same intent slot contradict each other, the semantic input is taken as primary and the visual input as auxiliary.
4. The method according to claim 1, wherein the training of the model comprises: the user performs a user-expression sequence for the robot and labels, in order, the emotions the robot is expected to produce, as the desired emotion sequence;
After several pairs of user-expression sequences and desired emotion sequences are generated, the parameters of the model are trained by a stimulus-emotion hidden-Markov-model training method based on the expectation-maximization algorithm, the parameters comprising: an observation matrix describing the relationship between external stimuli and emotional states, and a transition matrix between emotional states.
5. The method according to claim 1, wherein the training mechanism for sustainable learning adopts a reinforcement learning mechanism that maximizes the user score, the maximized user score being the objective function shown in formula (1):
$$E_{s\sim p(s;x,z),\,x\sim p(x)}\left[f(x,s)\right] \tag{1}$$
For this objective function, the policy-gradient method shown in formula (2) is adopted:
$$\nabla_z E_{s\sim p(s;x,z),\,x\sim p(x)}\left[f(x,s)\right] \approx \sum_j \sum_i \nabla_z \log p(s_{ij};x_j,z)\, f(x_j,s_{ij}) \tag{2}$$
where s_{ij} is sampled from p(s;x_j,z), i.e., given the event sample input x_j and the current parameters z of the psychological response model based on the hidden Markov model, a random s_{ij} is generated according to the probability distribution p(s_{ij};x_j,z) of the parameterized model of emotion-stimulus fusion; x_j is a simulated event input sampled from p(x); f(x,s) is the user's evaluation of the response when the virtual person receives external stimulus x and produces emotional response s, a positive value indicating that the user approves of the virtual person's emotional response and reinforces it; E denotes the mathematical expectation, s∼p(s;x,z) denotes drawing s according to the probability p(s;x,z), x∼p(x) denotes drawing x according to the probability p(x), and z denotes the parameters to be optimized in the reinforcement learning model; i and j are positive integers.
6. A character training system for a virtual person, characterized by comprising an establishing module, a reasoning module, a fusion module, a construction module, a training module, and an execution module;
the establishing module is used for establishing a parameterized model of emotion stimulus fusion;
the reasoning module is used for establishing an event stimulation reasoning mechanism;
the fusion module is used for customizing the intention fusion rule;
the construction module is used for constructing a psychological response model based on the hidden Markov model;
the training module is used for training a psychological response model based on a hidden Markov model;
the execution module is used for executing a training mechanism for sustainable learning;
The establishment of the event-stimulus reasoning mechanism comprises judging the user's expression, the emotion of the sentence text spoken by the user, and the user's voice intonation; the speech module converts the user's voice into sentence text and inputs it to the semantic module; using natural-language-processing word segmentation, the semantic module makes, on the input sentence text, a judgment of the emotion subject, a judgment of subject/object polarity, a judgment of event/behavior polarity, a judgment of event state, and a judgment of self-reaction; the way the user's expression is judged as a visual event comprises:
(1) A judgment of an emotion subject, comprising:
The emotion subject is centered on the agent and is divided into self and other, where self refers to the agent itself and other refers to the recognized person whose expression is in the field of view;
(2) A determination of a host/guest polarity comprising:
Subject/object polarity is divided into five classes: positive, non-positive, negative, non-negative, and non-existent; the different values of four features of the recognized person (expression, age, facial attractiveness, and sex) are mapped into these five polarity classes;
(3) A determination of event/behavior polarity comprising:
Event/behavior polarity is divided into five classes: expected, unexpected, praising, criticizing, and non-existent; the actions of the recognized person are classified into these five classes;
(4) A determination of event status, comprising:
Event states fall into six classes: determined to have occurred in the past, determined not to have occurred in the past, determined to occur in the future, determined not to occur in the future, present, and unknown;
(5) A determination of self-reaction comprising:
Self-reaction is classified into four classes: liking, non-liking, disliking, and non-disliking;
the method for constructing the psychological response model based on the hidden Markov model comprises the following steps:
First, a task description diagram for character training is given: the trainer is the user, and the trainee is the robot acting as the virtual person; the user controls two parts of the task: first, the expression input, i.e., the series of expressions the user makes to the robot; second, the desired output, i.e., the emotional-state response the user expects the robot to produce for that series of expressions;
Then, expression recognition software recognizes the expression input sequence and converts it into a sequence of six-dimensional emotion-stimulus vectors, the two sequences being equal in length; a hidden Markov model is then adopted, taking the six-dimensional emotion stimulus as observation input and the user's desired emotional-state response of the robot as decoded output;
In the hidden Markov model, the probability of remaining in a given state is λ, the probability of jumping out of that state to all other states is 1-λ, and the probability of remaining in the state for t time units is λ^t; the coupling effect between emotional states is corrected by applying a penalty term to the emotional-state transition matrix, and the correction can be implemented during Viterbi-algorithm decoding.
CN201911267237.5A 2019-12-11 2019-12-11 Character training system and method for virtual person Active CN111027215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911267237.5A CN111027215B (en) 2019-12-11 2019-12-11 Character training system and method for virtual person

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911267237.5A CN111027215B (en) 2019-12-11 2019-12-11 Character training system and method for virtual person

Publications (2)

Publication Number Publication Date
CN111027215A CN111027215A (en) 2020-04-17
CN111027215B true CN111027215B (en) 2024-02-20

Family

ID=70208793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911267237.5A Active CN111027215B (en) 2019-12-11 2019-12-11 Character training system and method for virtual person

Country Status (1)

Country Link
CN (1) CN111027215B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781273A (en) * 2021-08-19 2021-12-10 北京艺旗网络科技有限公司 Online teaching interaction method
CN113822228B (en) * 2021-10-27 2024-03-22 南京大学 User expression recognition method and system based on continuous learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413113A (en) * 2013-01-15 2013-11-27 上海大学 Intelligent emotional interaction method for service robot
CN106919251A (en) * 2013-01-15 2017-07-04 重庆邮电大学 A natural interaction method for collaborative virtual learning environments based on multimodal emotion recognition
CN108009573A (en) * 2017-11-24 2018-05-08 北京物灵智能科技有限公司 A robot emotion model generation method, emotion model, and interaction method
CN110516696A (en) * 2019-07-12 2019-11-29 东南大学 An adaptive-weight bimodal fusion emotion recognition method based on speech and expression

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413113A (en) * 2013-01-15 2013-11-27 上海大学 Intelligent emotional interaction method for service robot
CN106919251A (en) * 2017-01-09 2017-07-04 重庆邮电大学 A natural interaction method for collaborative virtual learning environments based on multimodal emotion recognition
CN108009573A (en) * 2017-11-24 2018-05-08 北京物灵智能科技有限公司 A robot emotion model generation method, emotion model, and interaction method
CN110516696A (en) * 2019-07-12 2019-11-29 东南大学 An adaptive-weight bimodal fusion emotion recognition method based on speech and expression

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Meng Xiuyan; Wang Zhiliang; Li Na; Liu Yaofeng. Research on the emotion model of emotional robots. Computer Science. 2008, (06), pp. 158-162. *
Zhu Sasa; Wang Wei. Implementation of an emotion model for a neutral-character virtual human. Science & Technology Information. 2010, (12), full text. *
Yang Hongwei; Pan Zhigeng; Liu Gengdai. A comprehensive computable emotion modeling method. Journal of Computer Research and Development. 2008, (04), full text. *

Also Published As

Publication number Publication date
CN111027215A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN110609891B (en) Visual dialog generation method based on context awareness graph neural network
CN110569795B (en) Image identification method and device and related equipment
Sancaktar et al. End-to-end pixel-based deep active inference for body perception and action
Rázuri et al. Automatic emotion recognition through facial expression analysis in merged images based on an artificial neural network
KR20200035499A (en) Structure learning in convolutional neural networks
Ohata et al. Investigation of the sense of agency in social cognition, based on frameworks of predictive coding and active inference: A simulation study on multimodal imitative interaction
CN111027215B (en) Character training system and method for virtual person
Woo et al. System integration for cognitive model of a robot partner
CN113704419A (en) Conversation processing method and device
Lv et al. Cognitive robotics on 5G networks
Nie Research on facial expression recognition of robot based on CNN convolution neural network
Zhong et al. Toward abstraction from multi-modal data: empirical studies on multiple time-scale recurrent models
Lang et al. HMMCF: A human-computer collaboration algorithm based on multimodal intention of reverse active fusion
Li et al. Multimodal information-based broad and deep learning model for emotion understanding
Kantharia et al. Facial behavior recognition using soft computing techniques: A survey
CN111079928B (en) Method for predicting human body movement by using circulating neural network based on countermeasure learning
CN110555401B (en) Self-adaptive emotion expression system and method based on expression recognition
Tuyen et al. Forecasting nonverbal social signals during dyadic interactions with generative adversarial neural networks
Almana et al. Real-time Arabic Sign Language Recognition using CNN and OpenCV
CN110414515B (en) Chinese character image recognition method, device and storage medium based on information fusion processing
Liliana et al. The fuzzy emotion recognition framework using semantic-linguistic facial features
Garg Converting American sign language to voice using RBFNN
Norden et al. A Comparative Analysis of Machine Learning Algorithms in Binary Facial Expression Recognition
Wu et al. Sensorimotor in space and time: Audition
Arias et al. Convolutional neural network applied to the gesticulation control of an interactive social robot with humanoid aspect

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant