CN112201277B - Voice response method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112201277B
Authority
CN
China
Prior art keywords
voice
intonation
user
type
response
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011052933.7A
Other languages
Chinese (zh)
Other versions
CN112201277A (en)
Inventor
申亚坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN202011052933.7A
Publication of CN112201277A
Application granted
Publication of CN112201277B


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/08: Speech classification or search

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application provides a voice response method, device, equipment, and computer readable storage medium. The method comprises: acquiring a user voice; determining the intonation type corresponding to the user voice according to the voice features and voice content of the user voice; generating a response voice corresponding to the user voice based on the intonation type and the voice content; and finally broadcasting the response voice. Because the broadcast response voice is derived from both the intonation type and the voice content of the user voice, the response differs whenever the intonation type of the user voice differs. This realizes personalized responses to the user voice and thereby improves the user experience. In addition, because the intonation type corresponding to the user voice is determined from two dimensions, the voice features and the voice content, it has higher accuracy, which in turn improves the accuracy of the broadcast response voice.

Description

Voice response method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of speech processing, and in particular to a voice response method and apparatus, an electronic device, and a computer readable storage medium.
Background
In many service scenarios, intelligent voice response devices are provided for voice interaction with users. At present, however, the response mode of many intelligent voice response devices is relatively uniform: for example, they respond with a single, fixed intonation, cannot respond in a personalized way to different user voices, and therefore cannot improve the user's service experience.
Disclosure of Invention
The application provides a voice response method and device, an electronic device, and a computer readable storage medium, aiming to solve the problem of how a voice response device can respond in a personalized way according to the user voice.
In order to achieve the above object, the present application provides the following technical solutions:
a method of voice response, comprising:
acquiring user voice;
determining the intonation type corresponding to the user voice according to the voice characteristics and the voice content of the user voice;
generating response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and broadcasting the response voice.
In the above method, optionally, the intonation types include at least two designated intonation types, each of which is preset according to the voice features and voice content of historical user voices;
the voice features include at least pitch features and amplitude features.
In the above method, optionally, determining the intonation type corresponding to the user voice according to the voice features and voice content of the user voice includes:
inputting the user voice into a pre-trained Bayesian classification model, so that the Bayesian classification model determines the intonation type corresponding to the user voice according to the voice features of the user voice;
recognizing the voice content corresponding to the user voice;
inputting the voice content of the user voice into a pre-trained voice classification model, so that the voice classification model determines the intonation type corresponding to the user voice according to the voice content of the user voice;
respectively acquiring the intonation types output for the user voice by the Bayesian classification model and the voice classification model;
and if the intonation type output by the Bayesian classification model and the intonation type output by the voice classification model are the same, taking that intonation type as the intonation type corresponding to the user voice.
The above method, optionally, further comprises:
if the intonation type output by the Bayesian classification model and the intonation type output by the voice classification model are different, determining the intonation type corresponding to the user voice to be a preset default intonation type.
In the above method, optionally, the Bayesian classification model is trained on voice training samples that carry the voice features;
the Bayesian classification model determines the intonation type corresponding to the user voice as follows: according to the voice features of the user voice, it calculates the probability that the user voice belongs to each intonation type, and takes the intonation type with the largest probability value as the intonation type corresponding to the user voice.
In the above method, optionally, the voice classification model is a GA-BP neural network model, obtained by optimizing an initial BP neural network model;
the number of input-layer nodes of the initial BP neural network model is determined by the voice content length of the voice training samples, the number of output-layer nodes is determined by the number of intonation types, and the number of hidden-layer nodes is determined by trial and error;
the initial BP neural network model is optimized as follows: according to preset sample data and a genetic algorithm, the initial weights and thresholds of the input layer, hidden layer, and output layer of the initial BP neural network model are trained, and the optimal initial weights and thresholds of each layer are determined, yielding the optimized BP neural network model.
In the above method, optionally, generating the response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content includes:
determining the response voice content based on the voice content;
generating a response voice whose voice content is the response voice content and whose intonation type is the intonation type corresponding to the user voice.
An apparatus for voice response, comprising:
an acquisition unit, configured to acquire the user voice;
a determining unit, configured to determine the intonation type corresponding to the user voice according to the voice features and voice content of the user voice;
a generating unit, configured to generate a response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and a broadcasting unit, configured to broadcast the response voice.
A voice response apparatus comprising: a processor and a memory for storing a program; the processor is configured to run the program to implement the method of voice response described above.
A computer readable storage medium having instructions stored therein which, when executed on a computer, cause the computer to perform the method of voice response described above.
The method and device disclosed by the application acquire a user voice, determine the intonation type corresponding to the user voice according to the voice features and voice content of the user voice, generate a response voice corresponding to the user voice based on the intonation type and the voice content, and finally broadcast the response voice. Because the broadcast response voice is derived from both the intonation type and the voice content of the user voice, the response differs whenever the intonation type of the user voice differs. This realizes personalized responses to the user voice and thereby improves the user experience.
In addition, because the intonation type corresponding to the user voice is determined from two dimensions, the voice features and the voice content, it has higher accuracy, which in turn improves the accuracy of the broadcast response voice.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a voice response method provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method for determining the intonation type corresponding to a user voice provided by an example of the present application;
FIG. 3 is a schematic structural diagram of a voice response device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a voice response apparatus provided by an embodiment of the present application.
Detailed Description
On many occasions, intelligent voice broadcasting devices are used for voice interaction with users. At present, however, many intelligent voice response devices attend only to the content of the user voice and not to its intonation, so they generally respond with a single, uniform intonation; they cannot respond in a personalized way to different user voices and therefore cannot improve the user's service experience.
Therefore, an embodiment of the present application provides a voice response method that responds to the user by combining the intonation of the user voice with its voice content, so as to realize personalized responses to different user voices.
In this application, the voice content of a user voice refers to the text content corresponding to that voice.
In order to make the above objects, features, and advantages of the present application more apparent, specific embodiments are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without inventive effort fall within the scope of the present disclosure.
The execution subject of this embodiment is an intelligent voice broadcasting device with a voice processing function, such as an intelligent voice robot.
Fig. 1 shows a voice response method provided by an embodiment of the present application, which may include the following steps:
s101, acquiring user voice.
The user voice is the voice uttered by a user; in the running state, the intelligent voice broadcasting device acquires any user voice within its voice acquisition range.
S102, determining the intonation type corresponding to the user voice according to the voice characteristics and the voice content of the user voice.
In this embodiment, the voice features are information that describes the mood and emotional attitude of the user voice, and include pitch features, amplitude features, timbre features, and the like.
The intonation types include at least two designated intonation types, each preset according to the voice features and voice content of historical user voices; that is, a designated intonation type is defined by the voice information conveying the mood and emotional attitude of historical user voices together with their voice content. The designated intonation types may include, for example, a fun-like interactive intonation type and a gentle formal interactive intonation type. The fun-like interactive intonation type may be one in which the pitch or amplitude of the voice varies greatly and the voice content is only weakly correlated with service inquiry questions; the gentle formal interactive intonation type may be one in which the pitch or amplitude of the voice varies little and the voice content is strongly correlated with service inquiry questions.
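To make the two designated intonation types concrete, the following is a minimal Python sketch of how pitch and amplitude variability might be measured from a raw mono waveform. The threshold values, the function names, and the use of the zero-crossing rate as a crude pitch proxy are all illustrative assumptions, not the patent's method:

```python
import numpy as np

# Hypothetical labels for the two designated intonation types described above.
FUN_TYPE = "fun_interactive"
FORMAL_TYPE = "gentle_formal"

def frame_signal(samples: np.ndarray, frame_len: int = 512) -> np.ndarray:
    """Split a mono waveform (values in [-1, 1]) into non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return samples[: n_frames * frame_len].reshape(n_frames, frame_len)

def variability_features(samples: np.ndarray) -> tuple[float, float]:
    """Return (amplitude variability, pitch-proxy variability) for an utterance."""
    frames = frame_signal(samples)
    rms = np.sqrt((frames ** 2).mean(axis=1))  # per-frame amplitude
    # Zero-crossing rate as a crude, assumption-laden stand-in for pitch.
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return float(np.std(rms)), float(np.std(zcr))

def rough_intonation_guess(samples: np.ndarray,
                           amp_thresh: float = 0.05,
                           pitch_thresh: float = 0.02) -> str:
    """Large variation suggests the fun-like type; small, the gentle formal type."""
    amp_var, pitch_var = variability_features(samples)
    if amp_var > amp_thresh or pitch_var > pitch_thresh:
        return FUN_TYPE
    return FORMAL_TYPE
```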
For a specific embodiment of this step, reference may be made to the flowchart shown in fig. 2.
S103, generating response voice corresponding to the user voice based on the intonation type and the voice content corresponding to the user voice.
A specific implementation of this step comprises steps A1 and A2:
A1, determining the response voice content based on the voice content of the user voice.
The response voice content corresponding to the voice content of the user voice is determined from that voice content; for example, it may be determined from keywords included in the voice content.
Of course, in this step the response voice content may also be determined based on both the voice content and the intonation type of the user voice. That is, the content of the response voice is related not only to the voice content of the user voice but also to its intonation type: for the same voice content, the response content may differ under different intonation types, which gives the response better personalization.
A2, generating a response voice whose voice content is the response voice content and whose intonation type is the intonation type corresponding to the user voice.
Making the intonation type of the response voice the same as that of the user voice enhances the personalized effect of the response voice.
S104, broadcasting the response voice.
For example, the intelligent voice broadcasting device invokes a preset voice broadcasting module to broadcast the response voice.
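The following is a minimal sketch of the overall S101-S104 flow. The speech recognizer, intonation classifier, synthesizer, and player are injected as placeholder callables, and the keyword-to-response table is a hypothetical example of step A1; none of this is the patent's implementation:

```python
from dataclasses import dataclass

@dataclass
class ResponsePlan:
    text: str
    intonation_type: str

# Hypothetical keyword -> response-content table (step A1).
RESPONSE_TABLE = {
    "balance": "Your account balance inquiry is being processed.",
    "hello": "Hello! How can I help you today?",
}

def determine_response_content(user_text: str) -> str:
    """Pick response content from keywords in the recognized voice content."""
    for keyword, reply in RESPONSE_TABLE.items():
        if keyword in user_text.lower():
            return reply
    return "Could you please repeat that?"

def voice_response(audio, recognize, classify_intonation, synthesize, play):
    """S101-S104: acquire, classify, generate, broadcast."""
    user_text = recognize(audio)                         # S102: voice content
    intonation = classify_intonation(audio, user_text)   # S102: intonation type
    plan = ResponsePlan(determine_response_content(user_text), intonation)  # S103
    play(synthesize(plan.text, intonation=plan.intonation_type))            # S104
```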
The method provided by this embodiment acquires a user voice, determines the intonation type corresponding to the user voice according to the voice features and voice content of the user voice, generates a response voice corresponding to the user voice based on the intonation type and the voice content, and finally broadcasts the response voice. Because the broadcast response voice is derived from both the intonation type and the voice content of the user voice, the response differs whenever the intonation type of the user voice differs. This realizes personalized responses to the user voice and thereby improves the user experience.
In addition, because the intonation type corresponding to the user voice is determined from two dimensions, the voice features and the voice content, it has higher accuracy, which in turn improves the accuracy of the broadcast response voice.
Fig. 2 shows a specific implementation of S102 in the foregoing embodiment, namely determining the intonation type corresponding to the user voice according to the voice features and voice content of the user voice; it may include the following steps:
s201, inputting the user voice into a pre-trained Bayesian classification model, so that the Bayesian classification model determines the intonation type corresponding to the user voice according to the voice characteristics of the user voice.
In this step, the Bayesian classification model is trained on voice training samples, each of which carries a plurality of voice features; for the method of training the model on these samples, reference may be made to the prior art.
The pre-trained Bayesian classification model extracts the voice features of the user voice and determines the intonation type corresponding to the user voice based on those features.
Specifically, the Bayesian classification model calculates, according to the voice features of the user voice, the probability that the user voice belongs to each designated intonation type, and takes the designated intonation type with the largest probability value as the intonation type corresponding to the user voice.
For example, let X denote the feature set of all voice features of the user voice, and let Y1 denote the first intonation type. The probability that the user voice belongs to the first intonation type is obtained by substituting the voice features of the user voice into the following probability formula:

$$P(Y_1 \mid X) = \frac{P(Y_1)\,\prod_{i=1}^{n} P(A_i \mid Y_1)}{P(X)}$$

where $P(Y_1 \mid X)$ is the probability that the user voice belongs to the first intonation type $Y_1$ given its feature set $X$; $A_i$ is the i-th feature in the feature set $X$ corresponding to the user voice; $n$ is the number of features in $X$; $P(Y_1)$ is the probability that any voice belongs to the first intonation type $Y_1$; $P(A_i \mid Y_1)$ is the probability that a voice of intonation type $Y_1$ exhibits feature $A_i$; and $P(X)$ is the probability of the feature set $X$ occurring over all designated intonation types, with $P(A_i)$ the probability that any voice exhibits feature $A_i$.

Here $P(Y_1)$, $P(A_i \mid Y_1)$, and $P(A_i)$ are estimated in advance from a number of feature sets $X$ whose intonation types have been determined. The larger the number of such feature sets and the more accurate their intonation labels, the more accurate the estimated $P(Y_1)$, $P(A_i \mid Y_1)$, and $P(A_i)$.
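As an illustration of the formula above, the following is a minimal naive Bayes sketch over discretized voice features. Since P(X) is the same for every intonation type, the sketch compares only the numerators (in log space); the training-data format, the add-one smoothing, and the discretization of features are assumptions:

```python
import math
from collections import defaultdict

def train(samples):
    """samples: list of (feature_tuple, intonation_type) pairs."""
    type_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))  # type -> (i, value) -> count
    for features, y in samples:
        type_counts[y] += 1
        for i, a in enumerate(features):
            feat_counts[y][(i, a)] += 1
    return type_counts, feat_counts

def classify(features, type_counts, feat_counts):
    """Return the intonation type with the largest posterior probability."""
    total = sum(type_counts.values())
    best_type, best_score = None, -math.inf
    for y, count in type_counts.items():
        # log P(Y) + sum_i log P(A_i | Y); P(X) is constant across types.
        score = math.log(count / total)
        for i, a in enumerate(features):
            # Add-one smoothing so unseen feature values don't zero the product.
            score += math.log((feat_counts[y][(i, a)] + 1) / (count + 2))
        if score > best_score:
            best_type, best_score = y, score
    return best_type
```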
S202, recognizing the voice content corresponding to the user voice.
In this step, an existing voice recognition method may be used to obtain the voice content of the user voice.
S203, inputting the voice content of the user voice into a pre-trained voice classification model, so that the voice classification model determines the intonation type corresponding to the user voice according to the voice content of the user voice.
The Bayesian classification model determines the intonation type corresponding to the user voice based on the voice features of the voice, whereas the voice classification model determines it according to the voice content of the user voice.
Optionally, the voice classification model is a GA-BP neural network model, obtained by optimizing an initial BP neural network model. Once trained, the voice classification model outputs the intonation type corresponding to the input voice content.
The number of input-layer nodes of the initial BP neural network model is determined by the voice content length of the voice training samples, the number of output-layer nodes is determined by the number of intonation types, and the number of hidden-layer nodes is determined by trial and error. The voice training samples are the voice contents of historical user voices labeled with intonation types.
The initial BP neural network model is optimized as follows: according to preset sample data and a genetic algorithm, the initial weights and thresholds of the input layer, hidden layer, and output layer are trained, and the optimal initial weights and thresholds of each layer are determined, yielding the optimized BP neural network model. For the specific optimization procedure, reference may be made to the prior art.
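The following is a minimal sketch of the genetic-algorithm step: a population of flattened weight-and-threshold vectors for a small three-layer network is evolved by selection, one-point crossover, and mutation, and the fittest vector is returned as the initial weights for ordinary BP training. The network shape, fitness function, and GA hyperparameters are assumptions, not the patent's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def unpack(vec, n_in, n_hid, n_out):
    """Split a flat chromosome into (W1, b1, W2, b2): weights and thresholds."""
    i = 0
    W1 = vec[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = vec[i:i + n_hid]; i += n_hid
    W2 = vec[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    b2 = vec[i:i + n_out]
    return W1, b1, W2, b2

def fitness(vec, X, Y, n_in, n_hid, n_out):
    """Negative mean squared error of the untrained forward pass."""
    W1, b1, W2, b2 = unpack(vec, n_in, n_hid, n_out)
    H = np.tanh(X @ W1 + b1)
    P = np.tanh(H @ W2 + b2)
    return -np.mean((P - Y) ** 2)

def ga_init_weights(X, Y, n_in, n_hid, n_out, pop=30, gens=50):
    dim = n_in * n_hid + n_hid + n_hid * n_out + n_out
    population = rng.normal(0, 1, (pop, dim))
    for _ in range(gens):
        scores = np.array([fitness(v, X, Y, n_in, n_hid, n_out) for v in population])
        parents = population[np.argsort(scores)[::-1][: pop // 2]]  # selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, dim)
            child = np.concatenate([a[:cut], b[cut:]])               # crossover
            child += rng.normal(0, 0.1, dim) * (rng.random(dim) < 0.05)  # mutation
            children.append(child)
        population = np.vstack([parents, children])
    scores = [fitness(v, X, Y, n_in, n_hid, n_out) for v in population]
    best = population[int(np.argmax(scores))]
    return unpack(best, n_in, n_hid, n_out)  # feed into ordinary BP training
```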
S204, respectively acquiring the intonation types output for the user voice by the Bayesian classification model and the voice classification model.
S205, judging whether the two intonation types are the same. If they are the same, execute S206; if not, execute S207.
S206, taking the shared intonation type as the intonation type corresponding to the user voice.
If the intonation types output by the Bayesian classification model and the voice classification model are the same, the probability that this shared intonation type is the correct intonation type of the user voice is high.
S207, determining the intonation type corresponding to the user voice to be a preset default intonation type.
For example, the default intonation type may be preset to the gentle formal interactive intonation type, so that whenever the two models output different intonation types for the user voice, the intonation type corresponding to the user voice is determined to be the gentle formal interactive intonation type.
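S204-S207 amount to a simple decision-fusion rule, sketched below; the type names and the function signature are assumptions:

```python
# Hypothetical default, per the example above.
DEFAULT_TYPE = "gentle_formal"

def fuse_intonation(bayes_label: str, content_label: str,
                    default: str = DEFAULT_TYPE) -> str:
    """Return the agreed intonation type, or the preset default on disagreement."""
    return bayes_label if bayes_label == content_label else default
```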
In the method provided by this embodiment, the Bayesian classification model determines the intonation type corresponding to the user voice based on the voice features of the voice, while the voice classification model determines it according to the voice content of the user voice; this amounts to determining the intonation type from different dimensions. Jointly determining the intonation type of the user voice with the trained Bayesian classification model and voice classification model therefore improves the accuracy of the resulting intonation type.
Fig. 3 is a schematic structural diagram of a voice response device provided by an embodiment of the present application, comprising a processor 301 and a memory 302; the memory is used to store a program, and the processor is used to run the program to implement the voice response method provided herein.
Intelligent voice response devices may be placed at various service points to provide automatic voice response services to users. For example, an intelligent voice response device may be used at a service outlet where business is transacted, improving the user's service experience by offering both fun-like cheerful interaction and business-transaction interaction.
For example, when the user voice is of a cheerful intonation type, the user likely wishes to have an informal, fun-like interaction with the intelligent voice device; when the user voice is of a flat intonation type, the user likely wishes to conduct a formal business interaction with it.
Correspondingly, the intonation type of the user voice is designated in advance as either a cheerful intonation type or a gentle intonation type, and the intelligent voice response device is preconfigured to respond with a cheerful, fun-like intonation when the user voice is determined to be of the cheerful intonation type, and with a gentle, formal intonation when it is determined to be of the gentle intonation type. By providing these two different interaction modes, the intelligent voice response device improves the user's service experience.
Fig. 4 is a schematic structural diagram of a voice response apparatus provided by an embodiment of the present application, comprising:
an acquisition unit 401 for acquiring a user voice;
a determining unit 402, configured to determine a intonation type corresponding to the user voice according to the voice feature and the voice content of the user voice;
a generating unit 403, configured to generate a response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and the broadcasting unit 404 is used for broadcasting the response voice.
The intonation types include at least two designated intonation types, each of which is preset according to the voice features and voice content of historical user voices; the voice features include at least pitch features and amplitude features.
The determining unit 402 determines the intonation type corresponding to the user voice according to the voice features and voice content of the user voice as follows:
inputting the user voice into a pre-trained Bayesian classification model, so that the Bayesian classification model determines the intonation type corresponding to the user voice according to the voice features of the user voice;
recognizing the voice content corresponding to the user voice;
inputting the voice content of the user voice into a pre-trained voice classification model, so that the voice classification model determines the intonation type corresponding to the user voice according to the voice content of the user voice;
respectively acquiring the intonation types output for the user voice by the Bayesian classification model and the voice classification model;
if the two models output the same intonation type, taking that intonation type as the intonation type corresponding to the user voice;
and if the two models output different intonation types, determining the intonation type corresponding to the user voice to be a preset default intonation type.
Optionally, the Bayesian classification model is trained on voice training samples that carry the voice features. The Bayesian classification model determines the intonation type corresponding to the user voice as follows: according to the voice features of the user voice, it calculates the probability that the user voice belongs to each intonation type, and takes the intonation type with the largest probability value as the intonation type corresponding to the user voice.
Optionally, the voice classification model is a GA-BP neural network model, obtained by optimizing an initial BP neural network model;
the number of input-layer nodes of the initial BP neural network model is determined by the voice content length of the voice training samples, the number of output-layer nodes is determined by the number of intonation types, and the number of hidden-layer nodes is determined by trial and error;
the initial BP neural network model is optimized as follows: according to preset sample data and a genetic algorithm, the initial weights and thresholds of the input layer, hidden layer, and output layer are trained, and the optimal initial weights and thresholds of each layer are determined, yielding the optimized BP neural network model.
Optionally, the generating unit 403 generates the response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content as follows:
determining the response voice content based on the voice content;
generating a response voice whose voice content is the response voice content and whose intonation type is the intonation type corresponding to the user voice.
The device provided by the embodiment of the application acquires a user voice, determines the intonation type corresponding to the user voice according to the voice features and voice content of the user voice, generates a response voice corresponding to the user voice based on the intonation type and the voice content, and finally broadcasts the response voice. Because the broadcast response voice is derived from both the intonation type and the voice content of the user voice, the response differs whenever the intonation type of the user voice differs. This realizes personalized responses to the user voice and thereby improves the user experience.
In addition, because the intonation type corresponding to the user voice is determined from two dimensions, the voice features and the voice content, it has higher accuracy, which in turn improves the accuracy of the broadcast response voice.
The present application also provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of voice response of the present application, namely to perform the steps of:
acquiring user voice;
determining the intonation type corresponding to the user voice according to the voice characteristics and the voice content of the user voice;
generating response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and broadcasting the response voice.
The functions described in the methods of the present application, if implemented in the form of software functional units and sold or used as a standalone product, may be stored in a computing-device-readable storage medium. Based on this understanding, the portion of the embodiments of the present application that contributes over the prior art, or a portion of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

1. A method of voice response comprising:
acquiring user voice;
determining the intonation type corresponding to the user voice according to the voice characteristics and the voice content of the user voice;
generating response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
broadcasting the response voice;
the intonation types include at least two designated intonation types, each of which is preset according to the voice features and voice content of historical user voices, that is, according to the voice information conveying the mood and emotional attitude of the historical user voices together with their voice content; a designated intonation type is a fun-like interactive intonation type or a gentle formal interactive intonation type; the fun-like interactive intonation type is a type in which the pitch or amplitude of the voice varies greatly and the voice content is weakly correlated with service inquiry questions; the gentle formal interactive intonation type is a type in which the pitch or amplitude of the voice varies little and the voice content is strongly correlated with service inquiry questions;
the voice features include at least pitch features and amplitude features;
wherein, the determining the intonation type corresponding to the user voice according to the voice characteristics of the user voice and the voice content includes:
inputting the user voice into a pre-trained Bayesian classification model, and enabling the Bayesian classification model to determine the intonation type corresponding to the user voice according to the voice characteristics of the user voice;
recognizing and obtaining the voice content corresponding to the user voice;
inputting the voice content of the user voice into a pre-trained voice classification model; the voice classification model determines the intonation type corresponding to the user voice according to the voice content of the user voice;
respectively acquiring the intonation types corresponding to the user voice output by the Bayesian classification model and the voice classification model;
if the intonation type output by the Bayesian classification model and the intonation type output by the voice classification model are the same intonation type, the same intonation type is used as the intonation type corresponding to the user voice;
wherein the method further comprises:
if the intonation type output by the Bayesian classification model and the intonation type output by the voice classification model are different intonation types, determining the intonation type corresponding to the user voice as a preset default intonation type;
the Bayesian classification model is obtained by training according to a voice training sample, wherein the voice training sample carries the voice characteristics;
the Bayesian classification model determines the intonation type corresponding to the user voice as follows: according to the voice features of the user voice, the Bayesian classification model calculates the probability value that the user voice belongs to each intonation type, and takes the intonation type with the largest probability value as the intonation type corresponding to the user voice;
the voice classification model is a GA-BP neural network model, and the GA-BP neural network model is obtained by optimizing an initial BP neural network model;
the number of input layer nodes of the initial BP neural network model is determined according to the voice content length of a voice training sample, the number of output layer nodes is determined according to the intonation type, and the number of hidden layer nodes is determined based on a trial-and-error method;
the optimizing of the initial BP neural network model is as follows: training and learning the initial weight and the threshold value of each layer in the input layer, the hidden layer and the output layer of the initial BP neural network model according to preset sample data and a genetic algorithm, and determining the optimal initial weight and the threshold value of each layer to obtain an optimized BP neural network model;
wherein the generating, based on the intonation type corresponding to the user voice and the voice content, a response voice corresponding to the user voice includes:
determining responsive voice content based on the voice content;
and generating a response voice whose voice content is the response voice content and whose intonation type is the intonation type corresponding to the user voice, wherein the intonation type of the response voice is the same as the intonation type of the user voice so as to enhance the personalized effect of the response voice.
2. A voice response apparatus, comprising:
the acquisition unit is used for acquiring the voice of the user;
the determining unit is used for determining the intonation type corresponding to the user voice according to the voice characteristics and the voice content of the user voice;
a generating unit, configured to generate a response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
the broadcasting unit is used for broadcasting the response voice;
the intonation types include at least two designated intonation types, each of which is preset according to the voice features and voice content of historical user voices, that is, according to the voice information conveying the mood and emotional attitude of the historical user voices together with their voice content; a designated intonation type is a fun-like interactive intonation type or a gentle formal interactive intonation type; the fun-like interactive intonation type is a type in which the pitch or amplitude of the voice varies greatly and the voice content is weakly correlated with service inquiry questions; the gentle formal interactive intonation type is a type in which the pitch or amplitude of the voice varies little and the voice content is strongly correlated with service inquiry questions;
the voice features include at least pitch features and amplitude features;
wherein, the determining the intonation type corresponding to the user voice according to the voice characteristics of the user voice and the voice content includes:
inputting the user voice into a pre-trained Bayesian classification model, and enabling the Bayesian classification model to determine the intonation type corresponding to the user voice according to the voice characteristics of the user voice;
recognizing and obtaining the voice content corresponding to the user voice;
inputting the voice content of the user voice into a pre-trained voice classification model; the voice classification model determines the intonation type corresponding to the user voice according to the voice content of the user voice;
respectively acquiring the intonation types corresponding to the user voice output by the Bayesian classification model and the voice classification model;
if the intonation type output by the Bayesian classification model and the intonation type output by the voice classification model are the same intonation type, the same intonation type is used as the intonation type corresponding to the user voice;
wherein the determining further comprises:
if the intonation type output by the Bayesian classification model and the intonation type output by the voice classification model are different intonation types, determining the intonation type corresponding to the user voice as a preset default intonation type;
the Bayesian classification model is obtained by training according to a voice training sample, wherein the voice training sample carries the voice characteristics;
the Bayesian classification model determines the intonation type corresponding to the user voice as follows: according to the voice features of the user voice, the Bayesian classification model calculates the probability value that the user voice belongs to each intonation type, and takes the intonation type with the largest probability value as the intonation type corresponding to the user voice;
the voice classification model is a GA-BP neural network model, and the GA-BP neural network model is obtained by optimizing an initial BP neural network model;
the number of input layer nodes of the initial BP neural network model is determined according to the voice content length of a voice training sample, the number of output layer nodes is determined according to the intonation type, and the number of hidden layer nodes is determined based on a trial-and-error method;
the optimizing of the initial BP neural network model is as follows: training and learning the initial weight and the threshold value of each layer in the input layer, the hidden layer and the output layer of the initial BP neural network model according to preset sample data and a genetic algorithm, and determining the optimal initial weight and the threshold value of each layer to obtain an optimized BP neural network model;
wherein the generating, based on the intonation type corresponding to the user voice and the voice content, a response voice corresponding to the user voice includes:
determining responsive voice content based on the voice content;
and generating a response voice whose voice content is the response voice content and whose intonation type is the intonation type corresponding to the user voice, wherein the intonation type of the response voice is the same as the intonation type of the user voice so as to enhance the personalized effect of the response voice.
3. A voice response apparatus, comprising: a processor and a memory for storing a program; the processor is configured to run the program to implement the method of voice response of claim 1.
4. A computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of voice response of claim 1.
CN202011052933.7A 2020-09-29 2020-09-29 Voice response method, device, equipment and computer readable storage medium Active CN112201277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011052933.7A CN112201277B (en) 2020-09-29 2020-09-29 Voice response method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011052933.7A CN112201277B (en) 2020-09-29 2020-09-29 Voice response method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112201277A CN112201277A (en) 2021-01-08
CN112201277B true CN112201277B (en) 2024-03-22

Family

ID=74008030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011052933.7A Active CN112201277B (en) 2020-09-29 2020-09-29 Voice response method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112201277B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334583A (en) * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Affective interaction method and device, computer readable storage medium, computer equipment
CN109447354A (en) * 2018-10-31 2019-03-08 中国银行股份有限公司 A kind of intelligent bank note distribution method and device based on GA-BP neural network
KR20190088126A (en) * 2018-01-05 2019-07-26 서울대학교산학협력단 Artificial intelligence speech synthesis method and apparatus in foreign language
CN110110169A (en) * 2018-01-26 2019-08-09 上海智臻智能网络科技股份有限公司 Man-machine interaction method and human-computer interaction device
CN110379445A (en) * 2019-06-20 2019-10-25 深圳壹账通智能科技有限公司 Method for processing business, device, equipment and storage medium based on mood analysis
CN111368538A (en) * 2020-02-29 2020-07-03 平安科技(深圳)有限公司 Voice interaction method, system, terminal and computer readable storage medium
CN111414754A (en) * 2020-03-19 2020-07-14 中国建设银行股份有限公司 Emotion analysis method and device of event, server and storage medium


Also Published As

Publication number Publication date
CN112201277A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
US10332507B2 (en) Method and device for waking up via speech based on artificial intelligence
CN110110062B (en) Machine intelligent question and answer method and device and electronic equipment
US9190055B1 (en) Named entity recognition with personalized models
CN111428010B (en) Man-machine intelligent question-answering method and device
US9582757B1 (en) Scalable curation system
WO2018033030A1 (en) Natural language library generation method and device
EP3779972A1 (en) Voice wake-up method and apparatus
CN111081220B (en) Vehicle-mounted voice interaction method, full-duplex dialogue system, server and storage medium
CN105632486A (en) Voice wake-up method and device of intelligent hardware
CN110019742B (en) Method and device for processing information
CN111931513A (en) Text intention identification method and device
CN103365833A (en) Context scene based candidate word input prompt method and system for implementing same
CN111191450A (en) Corpus cleaning method, corpus entry device and computer-readable storage medium
CN113392640B (en) Title determination method, device, equipment and storage medium
CN111709223B (en) Sentence vector generation method and device based on bert and electronic equipment
CN110717027B (en) Multi-round intelligent question-answering method, system, controller and medium
CN111357051B (en) Speech emotion recognition method, intelligent device and computer readable storage medium
CN111312222A (en) Awakening and voice recognition model training method and device
CN110457454A (en) A kind of dialogue method, server, conversational system and storage medium
CN111858854A (en) Question-answer matching method based on historical dialogue information and related device
CN116401354A (en) Text processing method, device, storage medium and equipment
CN115640398A (en) Comment generation model training method, comment generation device and storage medium
CN116150306A (en) Training method of question-answering robot, question-answering method and device
CN113889091A (en) Voice recognition method and device, computer readable storage medium and electronic equipment
CN116913278B (en) Voice processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant