CN112581964B - Multi-domain oriented intelligent voice interaction method - Google Patents


Info

Publication number: CN112581964B
Authority: CN (China)
Prior art keywords: text information, question, answer, voice, preset
Legal status: Active (assumed; not a legal conclusion)
Application number: CN202011413880.7A
Other languages: Chinese (zh)
Other versions: CN112581964A (en)
Inventors: 吴靖, 罗少杰, 樊立波, 徐树良, 郑伟彦, 刘宏伟, 严性平, 朱家庆, 顾建炜, 边巧燕
Current and original assignee: Zhejiang Dayou Industrial Co ltd Hangzhou Science And Technology Development Branch (listed assignee may be inaccurate)
Priority: CN202011413880.7A


Classifications

    • G10L15/26 Speech recognition: speech-to-text systems
    • G06F16/3329 Querying: natural language query formulation or dialogue systems
    • G06F16/367 Creation of semantic tools: ontology
    • G06F40/30 Handling natural language data: semantic analysis
    • G10L13/08 Speech synthesis: text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation or stress or intonation determination
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of intelligent voice interaction, and in particular to a multi-domain-oriented intelligent voice interaction method comprising the following steps: acquiring a voice signal; performing voice recognition on the voice signal to obtain text information; extracting keywords from the text information and matching them to the corresponding power grid field; scoring and adjusting the acquired text information; generating an answer text from the adjusted text information; and performing voice synthesis on the answer text and outputting it. The invention can effectively improve the ability and efficiency of power grid personnel in handling business, and has good economic benefit and practical value.

Description

Multi-field-oriented intelligent voice interaction method
Technical Field
The invention relates to the field of intelligent voice interaction, in particular to an intelligent voice interaction method oriented to multiple fields.
Background
With the development of artificial intelligence, speech recognition technology has made remarkable progress and is gradually moving from the laboratory to the market. In the power industry, grid personnel face large volumes of business data and complex business knowledge every day. When complex and variable problems arise, the traditional approaches of web search or consulting experienced senior staff are slow and inefficient, and cross-domain problems may have no findable solution at all. Introducing intelligent voice interaction technology can solve such problems quickly, replace purely manual problem handling, and make business work efficient and standardized.
Disclosure of Invention
In order to solve the problems, the invention provides an intelligent voice interaction method for multiple fields.
A multi-domain oriented intelligent voice interaction method comprises the following steps:
acquiring a voice signal;
carrying out voice recognition on a voice signal to obtain text information;
extracting keywords in the text information, and matching the keywords to the corresponding power grid field;
grading and adjusting the acquired text information;
generating an answer text according to the adjusted text information;
and performing voice synthesis on the answer text and outputting the answer text.
Preferably, the acquiring the voice signal includes:
the microphone array collects signals through a plurality of microphones to serve as input of voice signal processing, and signals received by an ith microphone in the microphone array are as follows:
y_i(x) = α_i r(x − τ_i) + n_id(x) + n_ie(x), i = 1, 2, …, N (1)
in the formula: y_i(x) is the signal received by the ith microphone in the microphone array, i = 1, 2, …, N; r(x) is the sound source signal; α_i is the attenuation factor of acoustic wave propagation; τ_i is the propagation delay of the sound wave to the ith microphone; n_i(x) = n_id(x) + n_ie(x) is the noise received by the ith microphone, with r(x) and n_i(x) mutually uncorrelated; n_id(x) is the multipath reflection noise received by the ith microphone; n_ie(x) is the ambient noise received by the ith microphone;
the corresponding vector form is:
y(x) = m(x) * r(x) + n_l(x) (2)
in the formula: the symbol * is the convolution operator and n_l(x) is the interference component.
Preferably, the performing voice recognition on the voice signal to obtain the text information includes:
and realizing preliminary speech recognition by a method of combining the ASR and the hot word technology to obtain text information.
Preferably, the extracting keywords from the text information and matching the keywords to the corresponding power grid field includes:
extracting keywords in the text information;
matching the question posed by the user to the corresponding power grid field according to the keywords, and performing intention recognition within that field in combination with power grid hot words, wherein the intention recognition model is shown in equation (3):
[Equation (3), rendered only as an image in the original: the intention recognition model giving P(Y | X).]
in the formula: P is the probability that the text belongs to field Y given that the power grid hot word is X, h_ij are high-dimensional features, Y denotes the different power fields, and X is a power grid hot word;
and according to the intention recognition model, obtaining the probability that the text information of the preliminary voice recognition belongs to each field, thereby determining the power grid field corresponding to the text information.
Preferably, the scoring and adjusting the acquired text information includes:
the evaluation score model is shown in equation (4):
[Equation (4), rendered only as an image in the original: the evaluation score SES as a function of N, V, and A.]
in the formula: SES is the evaluation score of a sentence, and N, V, and A are respectively the numbers of homophone-ambiguous words among the nouns, verbs, and adjectives of the sentence's initial text information;
and (4) locking sentences with the scores lower than 100% by combining a scoring model, and replacing the vocabulary with the homophone multi-word problem in the sentences with the vocabulary in the domain hot word stock until the score reaches 100%.
Preferably, the replacing the vocabulary with the homophone multi-word problem in the sentence with the vocabulary in the domain hot thesaurus includes:
if only one word in the hot-word library is homophonous with the corresponding word in the sentence, that word in the hot-word library directly replaces it;
if several words in the hot-word library are homophonous with the corresponding word in the sentence, semantic analysis must be combined with an examination of the contextual relevance to decide which word to substitute.
Preferably, the generating of the answer text according to the adjusted text information includes:
inputting the obtained adjusted text information into a question-answering engine in the field, wherein the question-answering engine determines a preset question;
calculating the semantic similarity between the first input information and each preset question in a preset file, and determining the preset questions whose semantic similarity to the first input information falls within a preset range, wherein the first input information is the voice information first input by the user, and the preset file is a configuration file containing the correspondence between all supported questions and fields;
in the question-answering process, outputting a question about an unknown condition of the determined preset question, acquiring the condition from the user's response, and judging, according to the preset correspondence between condition combinations and answers together with the domain knowledge graph, whether the acquired condition combination has a corresponding answer; if so, outputting the answer corresponding to the acquired condition combination; if not, performing the next round of question answering.
Preferably, the voice synthesizing and outputting the answer text includes:
speech synthesis is achieved using LMA models, which are as follows:
[Equation (5), rendered only as an image in the original: the LMA model relating the speech signal x, its cepstral coefficients C, and the adjustment coefficient L.]
in the formula: x is the speech signal, C is the cepstral coefficient of the speech signal, and L is the adjustment coefficient.
The invention has the beneficial effects that: acquiring a voice signal; carrying out voice recognition on a voice signal to obtain text information; extracting keywords in the text information, and matching the keywords to the corresponding power grid field; grading and adjusting the acquired text information; generating an answer text according to the adjusted text information; carrying out voice synthesis on the answer text and outputting; the invention can effectively improve the capability and efficiency of processing the service by the power grid personnel and has good economic benefit and practical value.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a schematic flow chart of a multi-domain oriented intelligent voice interaction method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of speech recognition in a multi-domain oriented intelligent speech interaction method according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of answer generation in a multi-domain-oriented intelligent voice interaction method according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be further described below with reference to the accompanying drawings, but the present invention is not limited to these embodiments.
The method hears a question through a microphone array, converts the sound signal into initial text information by automatic speech recognition combined with power grid hot words, extracts the power grid keywords in the text, locks the power grid field corresponding to the text information through an intention recognition system, evaluates and adjusts the initial text information according to that field's knowledge graph, generates the answer corresponding to the text with a domain question-answering engine, and finally speaks the answer through text-to-speech technology, so that a digital employee can hear, speak, understand, and think like a person. The method can effectively improve the ability and efficiency of business handling and has good economic benefit and practical value.
Based on the above thought, the embodiment of the present invention provides a multi-domain oriented intelligent voice interaction method, as shown in fig. 1, including the following steps:
s1: a speech signal is acquired.
The microphone array collects signals through a plurality of microphones as input for speech signal processing, and the actual speech signal contains both environmental noise and multipath reflection noise. The signal received by the ith microphone in the microphone array is:
y_i(x) = α_i r(x − τ_i) + n_id(x) + n_ie(x), i = 1, 2, …, N (1)
in the formula: y_i(x) is the signal received by the ith microphone in the microphone array, i = 1, 2, …, N; r(x) is the sound source signal; α_i is the attenuation factor of acoustic wave propagation; τ_i is the propagation delay of the sound wave to the ith microphone; n_i(x) = n_id(x) + n_ie(x) is the noise received by the ith microphone, with r(x) and n_i(x) mutually uncorrelated; n_id(x) is the multipath reflection noise received by the ith microphone; n_ie(x) is the ambient noise received by the ith microphone.
The corresponding vector form is:
y(x) = m(x) * r(x) + n_l(x) (2)
in the formula: the symbol * is the convolution operator and n_l(x) is the interference component (including ambient noise interference and room multipath reflection noise).
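This signal model can be sketched numerically as follows; the attenuation factors, delays, and noise level are illustrative values (not from the patent), and the multipath and ambient noise terms are lumped into one Gaussian component for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)

def mic_signal(r, alpha, tau, noise_std=0.01):
    """Simulate y_i(x): an attenuated, delayed copy of the source
    r(x) plus additive noise at the ith microphone."""
    y = alpha * np.roll(r, tau)   # alpha_i * r(x - tau_i)
    y[:tau] = 0.0                 # silence before the wavefront arrives
    return y + rng.normal(0.0, noise_std, r.shape)

# Source signal: one second of a 100 Hz tone sampled at 16 kHz
# (16 kHz matches the sampling rate used later in the method).
fs = 16_000
t = np.arange(fs) / fs
r = np.sin(2 * np.pi * 100 * t)

# Two microphones with different attenuations and delays (in samples).
y1 = mic_signal(r, alpha=0.9, tau=3)
y2 = mic_signal(r, alpha=0.7, tau=7)
```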
S2: and carrying out voice recognition on the voice signals to obtain text information.
Based on the audio signals collected by the microphone array, the primary speech recognition is realized by a method of combining the ASR and the hotword technology. The speech recognition flow chart based on deep learning is shown in fig. 2:
After the speech signal is collected by the microphone array, audio features (mainly phonemes) are extracted from it; that is, the continuous-time audio signal is discretized. In theory, the higher the sampling rate the better the effect, and a sampling rate of 16 kHz is generally adopted. The extracted audio feature vectors serve as the input layer of a neural network, a recognition model is constructed, and the recognition result is finally output.
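The framing and feature extraction at a 16 kHz sampling rate can be sketched roughly as below; this log-spectral front end is a minimal stand-in for the patent's feature extraction, whose exact form is not given in the text:

```python
import numpy as np

def frame_features(signal, fs=16_000, frame_ms=25, hop_ms=10, n_bins=40):
    """Cut the signal into overlapping windowed frames and take the
    log magnitude spectrum of each frame as a feature vector, which
    would feed the input layer of the recognition network."""
    frame_len = int(fs * frame_ms / 1000)   # 400 samples at 16 kHz
    hop = int(fs * hop_ms / 1000)           # 160 samples at 16 kHz
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    feats = []
    for k in range(n_frames):
        frame = signal[k * hop: k * hop + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))[:n_bins]
        feats.append(np.log(spectrum + 1e-8))   # log compression
    return np.array(feats)                      # shape (n_frames, n_bins)

x = np.sin(2 * np.pi * 440 * np.arange(16_000) / 16_000)  # 1 s, 440 Hz tone
F = frame_features(x)
```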
S3: and extracting keywords in the text information, and matching the keywords to the corresponding power grid field.
In the field of power grid services, commonly used vocabulary is collected to establish a hot-word library; combined with the preliminarily recognized text information, the keywords in the text can be automatically extracted and stored in memory.
In power grid service, different professional fields use different technical terms and hot words. The professional fields cover multiple domain knowledge graphs, for example the finance, operation inspection, and dispatching domains. The question posed by the user is matched to the corresponding power grid field according to the keywords extracted by preliminary speech recognition, and intention recognition is performed on the preliminarily converted text within that field in combination with power grid hot words. The intention recognition model is shown in equation (3):
[Equation (3), rendered only as an image in the original: the intention recognition model giving P(Y | X).]
in the formula: P is the probability that the text belongs to field Y given that the power grid hot word is X, h_ij are high-dimensional features, Y denotes the different fields (finance, operation inspection, dispatching, marketing, etc.), and X is a power grid hot word.
S4: and scoring and adjusting the acquired text information.
According to the intention recognition model, the probability that the preliminarily recognized text information belongs to each field is obtained, thereby determining the power grid field corresponding to the text information. Within the determined power grid field, the initial text information is scored in combination with the field's hot words. The scoring model splits each sentence of the initial text information into nouns, verbs, and adjectives, then compares each part one by one against the nouns, verbs, and adjectives among the field's hot words to check whether homophone-ambiguous words appear. The evaluation score model for each sentence of the initial text information is shown in equation (4):
[Equation (4), rendered only as an image in the original: the evaluation score SES as a function of N, V, and A.]
in the formula: SES (Sentence Evaluation Score) is the evaluation score of a sentence (the higher the score, the more reasonably the sentence's initial text information was converted), and N, V, and A are respectively the numbers of homophone-ambiguous words among the nouns, verbs, and adjectives of the sentence's initial text information.
Determining the corresponding power grid field from the preliminarily recognized text information improves the accuracy and efficiency of the voice interaction.
In combination with the scoring model, sentences scoring below 100% are locked, and words with homophone ambiguity in those sentences are replaced with words from the domain hot-word library according to the following principles:
1. if only one word in the hot-word library is homophonous with the corresponding word in the sentence, that word in the hot-word library directly replaces it;
2. if several words in the hot-word library are homophonous with the corresponding word in the sentence, semantic analysis must be combined with an examination of the contextual relevance to decide which word to substitute.
After the sentence is adjusted, scoring is carried out again until the score reaches 100%, and more accurate text is obtained.
This method of combining intention recognition with hot words to make the text more accurate works well when sentences contain homophones.
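The score-and-adjust loop above can be sketched as follows; because equation (4) appears only as an image, the scoring function here is a hypothetical stand-in that reaches 100% exactly when no homophone-ambiguous word remains, and the tokens are invented examples:

```python
def ses(tokens, ambiguous):
    """Hypothetical sentence evaluation score: 100% when no token is
    flagged as homophone-ambiguous, lower otherwise."""
    bad = sum(1 for t in tokens if t in ambiguous)
    return 100.0 * (1 - bad / len(tokens))

def adjust(tokens, replacements):
    """Replace each ambiguous token by its unique hot-word
    counterpart, rescoring until the sentence reaches 100%."""
    out = list(tokens)
    while ses(out, replacements) < 100.0:
        out = [replacements.get(t, t) for t in out]
    return out

# "voltige" stands in for a mis-recognised homophone of a domain hot word.
sentence = ["check", "the", "busbar", "voltige"]
fixes = {"voltige": "voltage"}          # unique match in the hot-word library
adjusted = adjust(sentence, fixes)
```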
S5: and generating an answer text according to the adjusted text information.
The multi-domain knowledge graph covers defect record retrieval for power equipment, power grid company customer service, automatic generation of secondary safety measures for intelligent substations, a full-service unified data center, equipment fault diagnosis and management, and so on, forming a power grid semantic network knowledge base. The multi-domain knowledge graph is equivalent to learned knowledge, and the domain question-answering engine actually solves problems by means of this knowledge. The domain question-answering engine contains multiple groups of condition combinations and their corresponding answers. The answer-generation flow of the intelligent voice interaction method is shown in fig. 3.
The process inputs the acquired high-precision text information into the domain question-answering engine, and the engine determines a predetermined question, where a predetermined question is one that corresponds to different answers under different condition combinations, each condition combination comprising one or more conditions. The predetermined questions whose semantic similarity falls within a preset range are determined by calculating the semantic similarity between the first input information and each predetermined question in a preset file, where the first input information is the voice information first input by the user, and the preset file is a configuration file containing the correspondence between all supported questions and fields.
In the question-answering process, a question about an unknown condition of the determined predetermined question is output, the condition is acquired from the user's response, and whether the acquired condition combination has a corresponding answer is judged according to the preset correspondence between condition combinations and answers together with the domain knowledge graph. If so, the answer corresponding to the acquired condition combination is output; if not, the next round of question answering is performed: after the receiving module receives the condition answered by the user, either the answer text corresponding to the acquired condition combination of the predetermined question is output, or a question text about another unknown condition of the predetermined question is output.
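A minimal sketch of such a condition-combination lookup follows; the question, conditions, and answers are invented for illustration and are not taken from the patent's knowledge base:

```python
# Each predetermined question maps condition combinations to answers.
QA_TABLE = {
    "transformer overheating": {
        frozenset({"oil_temp_high", "load_normal"}): "Check the cooling system.",
        frozenset({"oil_temp_high", "load_high"}): "Reduce the load, then inspect.",
    },
}

def answer(question, known_conditions):
    """Return (answer, None) when some condition combination is fully
    covered by the known conditions; otherwise return (None, one
    unknown condition to ask the user about in the next round)."""
    table = QA_TABLE[question]
    for combo, ans in table.items():
        if combo <= known_conditions:          # all conditions satisfied
            return ans, None
    unknown = next(c for combo in table for c in combo
                   if c not in known_conditions)
    return None, unknown

ans1, ask = answer("transformer overheating", {"oil_temp_high"})
ans2, _ = answer("transformer overheating", {"oil_temp_high", "load_high"})
```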
S6: and performing voice synthesis on the answer text and outputting the answer text.
In the present embodiment, speech synthesis is realized using an LMA (logarithmic amplitude approximation) model. The LMA formula is as follows:
[Equation (5), rendered only as an image in the original: the LMA model relating the speech signal x, its cepstral coefficients C, and the adjustment coefficient L.]
in the formula: x is the speech signal, C is the cepstral coefficient of the speech signal, and L is the adjustment coefficient (L = 3).
The generated answer text or question text is fed into the LMA model, and phenomena such as neutral-tone syllables and coarticulation are handled, so that high-quality speech synthesis can be realized.
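Because the LMA formula itself is reproduced only as an image, the core idea can be shown in simplified form: the log spectral envelope is a cosine series in the cepstral coefficients C, and exponentiating it recovers the magnitude envelope. This is a sketch of that idea, not the patent's exact model:

```python
import numpy as np

def lma_envelope(cepstrum, n_fft=512):
    """Log-magnitude-approximation idea in its simplest form: the log
    spectral envelope is a cosine series in the cepstral coefficients,
    so the magnitude envelope is its exponential."""
    C = np.asarray(cepstrum, dtype=float)
    omega = np.linspace(0.0, np.pi, n_fft // 2 + 1)
    log_mag = C[0] + 2.0 * sum(C[m] * np.cos(m * omega)
                               for m in range(1, len(C)))
    return np.exp(log_mag)   # strictly positive magnitude envelope

env = lma_envelope([0.1, 0.5, -0.2, 0.05])
```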
Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (3)

1. A multi-domain oriented intelligent voice interaction method is characterized by comprising the following steps:
acquiring a voice signal;
carrying out voice recognition on a voice signal to obtain text information;
extracting keywords in the text information, and matching the keywords to the corresponding power grid field;
grading and adjusting the acquired text information;
generating an answer text according to the adjusted text information; the generating of the answer text according to the adjusted text information includes:
inputting the obtained adjusted text information into a question-answering engine in the field, wherein the question-answering engine determines a preset question;
calculating the semantic similarity between the first input information and each preset question in a preset file, and determining the preset questions whose semantic similarity to the first input information falls within a preset range, wherein the first input information is the voice information first input by the user, and the preset file is a configuration file containing the correspondence between all supported questions and fields;
in the question-answering process, outputting a question about an unknown condition of the determined preset question, acquiring the condition from the user's response, and judging, according to the preset correspondence between condition combinations and answers together with the domain knowledge graph, whether the acquired condition combination has a corresponding answer; if so, outputting the answer corresponding to the acquired condition combination; if not, performing the next round of question answering;
and performing voice synthesis on the answer text and outputting it, the voice synthesis being realized using an LMA (logarithmic amplitude approximation) model, wherein the LMA model is as follows:
[LMA model equation, rendered only as an image in the original: it relates the speech signal x, its cepstral coefficients C, and the adjustment coefficient L.]
in the formula: x is the speech signal, C is the cepstral coefficient of the speech signal, and L is the adjustment coefficient.
2. The multi-domain-oriented intelligent voice interaction method according to claim 1, wherein the acquiring the voice signal comprises:
the microphone array collects signals through a plurality of microphones to serve as input of voice signal processing, and signals received by an ith microphone in the microphone array are as follows:
y_i(x) = α_i r(x − τ_i) + n_id(x) + n_ie(x), i = 1, 2, …, N (1)
in the formula: y_i(x) is the signal received by the ith microphone in the microphone array, i = 1, 2, …, N; r(x) is the sound source signal; α_i is the attenuation factor of acoustic wave propagation; τ_i is the propagation delay of the sound wave to the ith microphone; n_i(x) = n_id(x) + n_ie(x) is the noise received by the ith microphone, with r(x) and n_i(x) mutually uncorrelated; n_id(x) is the multipath reflection noise received by the ith microphone; n_ie(x) is the ambient noise received by the ith microphone;
the corresponding vector form is:
y(x) = m(x) * r(x) + n_l(x) (2)
in the formula: the symbol * is the convolution operator, n_l(x) is the interference component, m(x) is the attenuation coefficient, and r(x) is the sound source signal.
3. The method of claim 1, wherein the performing speech recognition on the speech signal to obtain text information comprises:
and realizing preliminary speech recognition by combining the ASR and hotword technology to obtain text information.
CN202011413880.7A 2020-12-04 2020-12-04 Multi-domain oriented intelligent voice interaction method Active CN112581964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011413880.7A CN112581964B (en) 2020-12-04 2020-12-04 Multi-domain oriented intelligent voice interaction method


Publications (2)

Publication Number Publication Date
CN112581964A CN112581964A (en) 2021-03-30
CN112581964B true CN112581964B (en) 2023-03-24

Family

ID: 75127489
Family application: CN202011413880.7A, CN112581964B (Active)
Country status: CN (1), CN112581964B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112992144B (en) * 2021-04-21 2021-07-27 国网浙江省电力有限公司金华供电公司 Intelligent voice regulation and control method applied to electric power field
CN113160822B (en) * 2021-04-30 2023-05-30 北京百度网讯科技有限公司 Speech recognition processing method, device, electronic equipment and storage medium
CN114333807B (en) * 2021-12-24 2023-04-25 北京百度网讯科技有限公司 Power scheduling method, device, apparatus, storage medium, and program
CN116259308B (en) * 2023-05-16 2023-07-21 四川大学 Context-aware blank pipe voice recognition method and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247750A (en) * 2017-05-26 2017-10-13 深圳千尘计算机技术有限公司 Artificial intelligence exchange method and system
CN107329843B (en) * 2017-06-30 2021-06-01 百度在线网络技术(北京)有限公司 Application program voice control method, device, equipment and storage medium
CN108831479A (en) * 2018-06-27 2018-11-16 努比亚技术有限公司 A kind of audio recognition method, terminal and computer readable storage medium
CN109040481A (en) * 2018-08-09 2018-12-18 武汉优品楚鼎科技有限公司 The automatic error-correcting smart phone inquiry method, system and device of field of securities
CN111696545B (en) * 2019-03-15 2023-11-03 北京汇钧科技有限公司 Speech recognition error correction method, device and storage medium
CN110838288B (en) * 2019-11-26 2022-05-06 杭州博拉哲科技有限公司 Voice interaction method and system and dialogue equipment

Also Published As

Publication number Publication date
CN112581964A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112581964B (en) Multi-domain oriented intelligent voice interaction method
Yamagishi et al. Thousands of voices for HMM-based speech synthesis–Analysis and application of TTS systems built on various ASR corpora
EP1901283A2 Automatic generation of statistical language models for interactive voice response applications
Khassanov et al. A crowdsourced open-source Kazakh speech corpus and initial speech recognition baseline
CN112397054B (en) Power dispatching voice recognition method
CN106875936A (en) Voice recognition method and device
CN115019776A (en) Voice recognition model, training method thereof, voice recognition method and device
CN114818649A (en) Service consultation processing method and device based on intelligent voice interaction technology
CN116631412A (en) Method for judging voice robot through voiceprint matching
CN111798846A (en) Voice command word recognition method and device, conference terminal and conference terminal system
Desot et al. End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
CN111090726A (en) NLP-based electric power industry character customer service interaction method
CN115168563B (en) Airport service guiding method, system and device based on intention recognition
KR102407055B1 (en) Apparatus and method for measuring dialogue quality index through natural language processing after speech recognition
CN115985320A (en) Intelligent device control method and device, electronic device and storage medium
CN115691500A (en) Power customer service voice recognition method and device based on time delay neural network
Jackson Automatic speech recognition: Human computer interface for kinyarwanda language
Xu et al. Agricultural price information acquisition using noise-robust Mandarin auto speech recognition
Kaur et al. Speech based retrieval system for Punjabi language
Oyucu Comparing The Fine-Tuning and Performance of Whisper Pre-Trained Models for Turkish Speech Recognition Task
Oyucu Development of test corpus with large vocabulary for Turkish speech recognition system and a new test procedure
Hlaing et al. Word Representations for Neural Network Based Myanmar Text-to-Speech S.
Luque et al. GEOVAQA: A voice activated geographical question answering system
CN102034474B (en) Method for identifying all languages by voice and inputting individual characters by voice
Qaroush et al. Automatic spoken customer query identification for Arabic language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant