CN111583939A - Method and device for specific target wake-up by voice recognition - Google Patents


Publication number
CN111583939A
Authority
CN
China
Prior art keywords
target
module
voice
detected
acoustic model
Prior art date
Legal status
Withdrawn
Application number
CN201910124945.7A
Other languages
Chinese (zh)
Inventor
李政
吴国扬
陈心章
Current Assignee
Foxlink Electronics Dongguan Co Ltd
Cheng Uei Precision Industry Co Ltd
Original Assignee
Foxlink Electronics Dongguan Co Ltd
Cheng Uei Precision Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Foxlink Electronics Dongguan Co Ltd, Cheng Uei Precision Industry Co Ltd filed Critical Foxlink Electronics Dongguan Co Ltd
Priority to CN201910124945.7A
Publication of CN111583939A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04: Training, enrolment or model building
    • G10L17/22: Interactive procedures; Man-machine interfaces
    • G10L17/24: Interactive procedures; Man-machine interfaces, the user being prompted to utter a password or a predefined phrase
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a method and a device for waking up a specific target by voice recognition. The method comprises the following steps: receiving a voice message of a specific target and extracting the voice features in the voice message; using the voice features of the specific target as input data of a discriminatively trained hidden vector state (HVS) model, training to obtain a specific-target acoustic model, and storing the specific-target acoustic model; receiving a voice message of a target to be detected and extracting the voice features in the voice message; using the voice features of the target to be detected as input data of a discriminatively trained hidden vector state model, and training to obtain an acoustic model of the target to be detected; and comparing the acoustic model of the target to be detected with the specific-target acoustic model, and if the two models are related, performing language decoding on the voice features of the target to be detected with a language model and deciding whether to wake up according to the language decoding result. By using a discriminatively trained HVS model as the acoustic model, the invention can judge the target accurately and quickly and thereby achieve the wake-up function.

Description

Method and device for specific target wake-up by voice recognition
Technical Field
The present invention relates to the field of speech recognition, and in particular to a method and an apparatus for waking up a specific target by speech recognition.
Background
In recent years, smart speakers have gradually changed the way people live. Acting as a voice assistant, a smart speaker can help users carry out everyday tasks such as hailing a car, shopping, setting reminders, and recording information.
A conventional smart speaker usually employs a voice wake-up scheme that automatically detects a number of pre-registered voice commands (wake-up words) in continuous speech to wake the speaker for subsequent tasks. Traditionally, Hidden Markov Model (HMM) techniques compare the features of phonemes and syllables to find the most probable word; later, a Gaussian Mixture Model (GMM) was combined with the HMM to form the classic GMM-HMM model. The conventional GMM-HMM model is usually trained by Maximum Likelihood estimation, but under certain conditions this method can make the probability of a competing answer greater than that of the correct answer, which lowers the recognition accuracy, so there is still room for improvement.
Disclosure of Invention
The invention aims to provide a method for waking up a specific target by voice recognition that overcomes the defects and shortcomings of the prior art. The method monitors the identity of a specific target by combining the wake-up words of the specific target with a discriminatively trained Hidden Vector State model (HVS model for short), thereby achieving voice wake-up by the specific target only.
In order to achieve the above object, one aspect of the embodiments of the present invention provides a method for waking up a specific target by voice recognition, comprising the following steps:
s1: receiving a voice message of a specific target, preprocessing the voice message of the specific target, and extracting a voice feature of the specific target;
s2: taking the voice features of the specific target as input data of a discriminatively trained Hidden Vector State Model (HVS Model), training to obtain a specific-target acoustic model, and storing the specific-target acoustic model;
s3: receiving a voice message of a target to be detected, preprocessing the voice message of the target to be detected, and extracting a voice feature of the target to be detected;
s4: taking the voice features of the target to be detected as input data of a discriminatively trained hidden vector state model, and training to obtain an acoustic model of the target to be detected;
s5: and comparing the relevance between the acoustic model of the target to be detected and the specific-target acoustic model; if the two models are related, performing language decoding on the voice features of the target to be detected by using at least one language model, and deciding whether to wake up according to the language decoding result.
Specifically, the voice message of the specific target and the voice message of the target to be detected each include at least one wake-up word.
Specifically, the preprocessing comprises: performing noise suppression and echo cancellation on the voice message.
In particular, the speech features are obtained by means of mel-frequency cepstral coefficients (MFCCs).
Specifically, the discriminative training is performed using the Maximum Mutual Information (MMI) criterion.
Specifically, the language model includes a lexicon model or a grammar model or a combination thereof.
Specifically, the step of deciding whether to wake up according to the language decoding result comprises: performing language decoding on the voice features of the target to be detected; judging whether the voice message of the target to be detected contains the wake-up word; if it contains the wake-up word, voice-recognition wake-up is started, and if not, voice-recognition wake-up is not started.
Another aspect of an embodiment of the present invention provides a device for waking up a specific target by using voice recognition, including:
an acquisition module, comprising a plurality of microphone arrays, for receiving voice messages of a specific target and of a target to be detected, the voice messages comprising a wake-up word;
the extracting module is connected with the collecting module and is used for extracting MFCC voice characteristics in the voice messages of the specific target and the target to be detected;
the training module is connected with the extracting module and used for taking MFCC voice characteristics in the voice messages of the specific target and the target to be detected as input data of a hidden vector state model trained by a maximum mutual information method and acquiring an acoustic model of the trained specific target and an acoustic model of the target to be detected;
the storage module is connected with the training module and used for storing the trained acoustic model of the specific target;
the decoding module is connected with the extracting module and is used for carrying out language decoding on the voice message of the target to be detected; and
the processor module is connected with the training module, the storage module and the decoding module, and is used for comparing the acoustic model of the specific target in the storage module with the acoustic model of the target to be detected, deciding according to the comparison result whether to start the decoding module to perform language decoding on the voice message of the target to be detected, and confirming whether the language-decoded voice message of the target to be detected contains the wake-up word so as to wake up the device.
Specifically, the device further comprises a registration module connected with the acquisition module and the storage module, the registration module being used to initiate storing the acoustic model of the specific target in the storage module.
Specifically, the device further comprises a wireless communication module, wherein the wireless communication module is used for external communication connection.
Compared with the prior art, the method and the device for waking up a specific target by voice recognition adopt a discriminatively trained hidden vector state model as the acoustic model. Discriminative training maximizes the probability of the correct answer, reduces the probability of competing answers, and increases the discrimination between them, so the device can quickly and accurately judge whether the target to be detected is the specific target and thus achieve the wake-up function.
Drawings
Fig. 1 is a flowchart illustrating a method for waking up a specific target by speech recognition according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an apparatus for waking up a specific target by speech recognition according to an embodiment of the present invention.
The reference numerals in the figures are explained below:
100 speech recognition device 11 acquisition module
12 extraction module and 13 training module
14 storage module 15 decoding module
16 processor module 17 registration module
18 wireless communication module
S101 to S105.
Detailed Description
To explain the technical content, structural features, and achieved objects and effects of the present invention in detail, the following embodiments are exemplified and the detailed description is given in conjunction with the drawings.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for waking up a specific target by speech recognition according to an embodiment of the present invention, including the following steps:
step S101: receiving a voice message of a specific target, preprocessing the voice message of the specific target, and extracting a voice feature of the specific target;
specifically, the specific target in this step refers to a registered user who achieves the awakening condition in the voice recognition, the voice message is a text prepared in advance, the text content includes a preset awakening word, and the specific target reads the text content first and collects the voice message of the specific target through an acquisition module 11 of the voice recognition apparatus 100 according to an embodiment of the present invention.
Specifically, the voice message collected in this step is an analog voice signal, and the subsequent voice recognition processing can be performed only by converting the analog voice signal into a digital voice signal. In addition, other environmental noises may be included in the voice message, so that it is also necessary to perform pre-processing on the voice message, including noise suppression processing and echo cancellation processing on the digital voice signal, to filter out unwanted environmental noises and obtain a valid voice signal.
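The patent names noise suppression as part of the preprocessing but does not specify the algorithm. One common choice is spectral subtraction over per-frame magnitude spectra; the sketch below is only an illustration of that idea (the function names and the floor parameter `beta` are assumptions, not taken from the patent):

```python
def spectral_subtraction(frame_mag, noise_mag, beta=0.02):
    """Subtract a noise magnitude estimate from a frame's magnitude
    spectrum, clamping each bin to a small spectral floor so that
    magnitudes never go negative."""
    return [max(s - n, beta * s) for s, n in zip(frame_mag, noise_mag)]

# noise_mag would normally be averaged over known speech-free frames
noise_estimate = [0.5, 0.5, 0.5, 0.5]
speech_frame = [2.0, 0.4, 3.0, 0.6]
cleaned = spectral_subtraction(speech_frame, noise_estimate)
```

Bins well above the noise estimate keep most of their energy, while bins at or below it collapse to the floor, which approximates the "valid voice signal" the text describes.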
Specifically, in the embodiment of the present invention, Mel-Frequency Cepstral Coefficients (MFCC) are used to extract the voice features of the specific target: the preprocessed voice signal is cut into a plurality of frames (frame blocking), pre-emphasis is applied to the parts of the voice signal that need to be emphasized, a window function is applied (windowing), and so on, so as to obtain a clearer and more distinct set of voice features.
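The three front-end steps named above (pre-emphasis, frame blocking, windowing) can be sketched as below; the later MFCC stages (FFT, Mel filter bank, DCT) are omitted. The coefficient `alpha=0.97` and the Hamming window are conventional choices, not values fixed by the patent:

```python
import math

def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: boosts high frequencies before framing
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_blocking(signal, frame_len, hop):
    # split the signal into overlapping frames; a trailing partial frame is dropped
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming_window(frame):
    # taper the frame edges to reduce spectral leakage in the later FFT step
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]
```

A typical pipeline applies `pre_emphasis` to the whole signal, then `frame_blocking` with, say, 25 ms frames and a 10 ms hop, then `hamming_window` on each frame.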
Step S102: taking the voice features of the specific target as input data of a Hidden Vector State Model (HVS Model for short) trained in an identification mode, training to obtain a specific target acoustic Model, and storing the specific target acoustic Model;
specifically, in this step, the speech features of the specific target are used as input data to train the acoustic model, in the embodiment of the present invention, the latent vector state model is adopted and the discriminant training mode is used for training, the discriminant training does not aim at maximizing the similarity of the trained acoustic corpus but aims at minimizing the classification (or identification) errors, and the identification rate is improved.
The discriminative training is based on the Maximum Mutual Information (MMI) criterion, which maximizes the probability of the correct answer, effectively reduces the probability of competing answers, and increases the discrimination between the correct answer and its competitors.
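The MMI criterion for one utterance can be written as the joint log-score of the reference hypothesis minus the log-sum of the joint scores of all hypotheses. The sketch below evaluates only this objective on toy scores; the actual training machinery (lattice generation, gradient updates of the HVS model parameters) is far beyond a sketch and is not described in the patent:

```python
import math

def mmi_objective(acoustic_logp, lm_logp, correct_idx):
    """Maximum Mutual Information criterion for one utterance:
    log P(O|W_ref)P(W_ref) - log sum_W P(O|W)P(W).
    Maximizing it pushes the correct hypothesis up and competitors down."""
    joint = [a + l for a, l in zip(acoustic_logp, lm_logp)]
    m = max(joint)                                    # stabilized log-sum-exp
    denom = m + math.log(sum(math.exp(j - m) for j in joint))
    return joint[correct_idx] - denom                 # always <= 0

# toy example: hypothesis 0 is correct and scores well above its competitors
score = mmi_objective([-10.0, -25.0, -30.0], [-2.0, -2.0, -2.0], 0)
```

When the correct hypothesis dominates, the objective approaches 0; when a competitor outscores it, the objective drops sharply, which is exactly the separation the text attributes to MMI.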
Specifically, the step of storing the specific target acoustic model refers to storing the specific target acoustic model in a storage module 14 of the speech recognition apparatus 100 according to the embodiment of the invention.
Step S103: receiving a voice message of a target to be detected, preprocessing the voice message of the target to be detected, and extracting a voice feature of the target to be detected;
specifically, the target to be detected in this step refers to a user who wants to perform voice recognition and comparison, and the target to be detected outputs a section of voice message, and the voice message of the target to be detected is collected by an acquisition module 11 of the voice recognition apparatus 100 according to the embodiment of the present invention.
Specifically, the step of preprocessing the voice message of the target to be detected and extracting the voice feature of the target to be detected is the same as the above-mentioned process of preprocessing the voice message of the specific target and extracting the voice feature of the specific target.
Step S104: taking the voice characteristics of the target to be detected as input data of a hidden vector state model trained in an identification mode, and training to obtain an acoustic model of the target to be detected;
specifically, in this step, the speech characteristics of the target to be measured are used as input data to train the acoustic model, in the embodiment of the present invention, a hidden vector state model is adopted and a discriminant training mode is used for training, and the discriminant training is performed on the basis of Maximum Mutual Information (MMI).
Step S105: and comparing the relevance between the acoustic model of the target to be detected and the acoustic model of the specific target, if the acoustic model of the target to be detected and the acoustic model of the specific target are related, performing language decoding on the voice characteristics of the target to be detected by using at least one language model, and judging whether to awaken or not according to a language decoding result.
Specifically, in this step language decoding is performed when the acoustic model of the target to be detected matches the acoustic model of the specific target; if the two models do not match, no action is taken. The language decoding takes the voice features of the target to be detected as input data to the language models.
When the acoustic model of the target to be detected is judged to match the acoustic model of the specific target, the target to be detected is the specific target, so language decoding is performed to confirm whether its voice message contains the wake-up word. The voice features of the target to be detected are decoded with a lexicon model and a grammar model to obtain the content of the voice message, and it is then judged whether that content contains a wake-up word. If it does, voice-recognition wake-up is started; if not, it is not started.
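The final decision of step S105 reduces to a containment check on the decoded text. A minimal sketch, assuming the decoder returns plain text and the registered wake-up words are known (the phrase below is purely hypothetical; the patent leaves the actual wake-up word to the registrant):

```python
def should_wake(decoded_text, wake_words):
    """Wake only if the language-decoded message of an already-verified
    speaker contains at least one registered wake-up word."""
    text = decoded_text.lower()
    return any(w.lower() in text for w in wake_words)

# hypothetical wake-up phrase for illustration only
WAKE_WORDS = ("hello speaker",)
```

Note that this check runs only after the acoustic-model comparison has passed, so a non-registered speaker uttering the wake-up word never reaches it.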
Referring to fig. 2, an embodiment of the present invention provides a device for voice recognition for specific target wake-up. A speech recognition device 100 includes an acquisition module 11, an extraction module 12, a training module 13, a storage module 14, a decoding module 15, a processor module 16, a registration module 17, and a wireless communication module 18.
The acquisition module 11 is connected with the extraction module 12 and the registration module 17. The acquisition module 11 is provided with a plurality of microphone arrays for receiving the voice messages of the specific target and the target to be detected. The collected voice messages are analog voice signals that need to be converted into digital voice signals; noise suppression and echo cancellation are applied to the digital voice signals, and the processed digital voice messages are then transmitted to the extraction module 12.
Herein, a specific target is defined as the object for which specific-target wake-up by voice recognition is intended, and a target to be detected is defined as the object currently being recognized by the voice recognition apparatus 100.
The voice message of the specific target comprises a preset awakening word.
The extraction module 12 is connected with the acquisition module 11, the training module 13 and the decoding module 15, and the extraction module 12 is used for receiving the voice message processed by the acquisition module 11, extracting the voice characteristics of a specific target and a target to be detected, and transmitting the voice characteristics to the training module 13 for acoustic model training or transmitting the voice characteristics to the decoding module 15 for decoding.
The voice features of the specific target and the target to be detected are extracted from the voice messages by means of Mel-Frequency Cepstral Coefficients (MFCC for short).
The training module 13 is connected with the extraction module 12, the storage module 14 and the processor module 16. The training module 13 is configured to receive the speech features of the specific target and the target to be tested extracted by the extraction module 12, use the speech features of the specific target and the target to be tested as input data of a hidden vector state model trained by using a maximum mutual information method, finally obtain an acoustic model after training, and perform different steps according to the specific target and the target to be tested. If the target is the specific target, the acoustic model of the specific target is transmitted to the storage module 14, and if the target is the object to be measured, the acoustic model of the object to be measured is transmitted to the processor module 16.
The storage module 14 is connected with the training module 13, the processor module 16 and the registration module 17. The storage module 14 is configured to store the acoustic model of the specific target trained by the training module 13. In the embodiment of the present invention, when the specific target performs the operation of the registration module 17, the acoustic model of the specific target trained by the training module 13 is transmitted to the storage module 14 for storage. In addition, when the processor module 16 performs comparison between the object to be measured and the acoustic model of the specific object, the storage module 14 transmits the stored acoustic model of the specific object to the processor module 16.
The decoding module 15 is connected with the extraction module 12 and the processor module 16. The decoding module 15 is used to perform language decoding on the voice message of the target to be detected; more specifically, it takes the voice features of the target to be detected extracted by the extraction module 12 as input data of the lexicon model and the grammar model, and transmits the result to the processor module 16.
The processor module 16 is connected with the training module 13, the storage module 14, the decoding module 15 and the wireless communication module 18. The processor module 16 is used to compare the acoustic model of the specific target with the acoustic model of the target to be detected, and to decide according to the comparison result whether to start the decoding module 15 for language decoding. More specifically, when the training module 13 transmits the acoustic model of the target to be detected, the processor module 16 simultaneously obtains the acoustic model of the specific target from the storage module 14 and compares the two acoustic models.
When the acoustic model of the target to be detected is judged to be related to the acoustic model of the specific target, i.e., the target to be detected is the specific target, the processor module 16 starts the decoding module 15 to perform language decoding on the voice message of the target to be detected, so as to determine whether it contains the wake-up word.
The decoding module 15 obtains the voice features of the target to be detected from the extraction module 12 and returns the language decoding result to the processor module 16; from the acoustic model of the target to be detected and the decoding result, the processor module 16 can judge whether the voice message of the target to be detected contains the wake-up word.
When the processor module 16 finds that the voice message of the target to be detected contains the wake-up word, the voice recognition device 100 is woken up; otherwise, nothing is executed.
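The patent says the processor module compares the two acoustic models for "relevance" but never specifies the similarity measure or the acceptance threshold. As a stand-in for illustration only, one could compare flattened model parameter vectors with cosine similarity against a tuned threshold; every name and value below is an assumption:

```python
import math

def cosine_similarity(a, b):
    # cosine of the angle between two parameter vectors, in [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def models_related(params_enrolled, params_test, threshold=0.9):
    # threshold is illustrative; a real system would tune it on enrollment
    # data to trade off false accepts against false rejects
    return cosine_similarity(params_enrolled, params_test) >= threshold
```

Whatever measure is used, the decision gate is the same: only when `models_related` passes does the processor module start the decoding module for language decoding.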
The registration module 17 is connected with the acquisition module 11 and the storage module 14. The registration module 17 allows a specific target to register with the speech recognition device 100, and includes a start element and a display element. When the specific target touches the start element, the storage module 14 is simultaneously activated, indicating that the acoustic model trained by the training module 13 from the voice message currently collected by the acquisition module 11 is to be stored in the storage module 14. In addition, when the specific target touches the start element, the display element is activated so that the specific target can confirm that the device is currently in the registration stage.
In an embodiment of the present invention, the activating element is a button, and the display element is a light emitting diode.
The wireless communication module 18 is connected with the processor module 16 and is configured to communicate with external devices after the processor module 16 determines that the voice recognition device 100 has been successfully woken up.
In the embodiment of the present invention, the wireless communication module 18 includes a Wi-Fi module or a bluetooth module.
As described above, the method and apparatus for waking up a specific target by speech recognition of the present invention use a discriminatively trained hidden vector state model as the acoustic model. Discriminative training with the maximum mutual information criterion not only maximizes the probability of the correct answer but also reduces the probability of competing answers and increases the discrimination between them, so the system can quickly and accurately determine whether the target to be detected is the specific target, thereby achieving the wake-up function.

Claims (10)

1. A method for target-specific wake up by speech recognition, comprising the steps of:
s1: receiving a voice message of a specific target, preprocessing the voice message of the specific target, and extracting a voice feature of the specific target;
s2: taking the voice features of the specific target as input data of a discriminatively trained Hidden Vector State Model (HVS Model), training to obtain a specific-target acoustic model, and storing the specific-target acoustic model;
s3: receiving a voice message of a target to be detected, preprocessing the voice message of the target to be detected, and extracting a voice feature of the target to be detected;
s4: taking the voice features of the target to be detected as input data of a discriminatively trained hidden vector state model, and training to obtain an acoustic model of the target to be detected;
s5: and comparing the relevance between the acoustic model of the target to be detected and the specific-target acoustic model; if the two models are related, performing language decoding on the voice features of the target to be detected by using at least one language model, and deciding whether to wake up according to the language decoding result.
2. The method of claim 1, wherein the voice message of the specific target and the voice message of the target to be detected each include at least one wake-up word.
3. The method of claim 1, wherein the pre-processing comprises: the voice message is processed with noise suppression and echo cancellation.
4. Method for target-specific wake up by speech recognition according to claim 1, characterized in that the speech features are derived by means of Mel Frequency Cepstral Coefficients (MFCCs).
5. The method of claim 1, wherein the discriminant training is performed using a Maximum Mutual Information (MMI) method.
6. The method of claim 1, wherein the language model comprises a lexicon model or a grammar model or a combination thereof.
7. The method of claim 2, wherein the step of deciding whether to wake up according to the language decoding result comprises:
performing language decoding on the voice characteristics of the target to be detected;
judging whether the voice message of the target to be detected contains the wake-up word;
if the wake-up word is contained, voice-recognition wake-up is started, and if not, voice-recognition wake-up is not started.
8. An apparatus for target-specific wake up by speech recognition, the apparatus comprising:
an acquisition module, comprising a plurality of microphone arrays, for receiving voice messages of a specific target and of a target to be detected, the voice messages comprising a wake-up word;
the extracting module is connected with the collecting module and is used for extracting MFCC voice characteristics in the voice messages of the specific target and the target to be detected;
the training module is connected with the extracting module and used for taking MFCC voice characteristics in the voice messages of the specific target and the target to be detected as input data of a hidden vector state model trained by a maximum mutual information method and acquiring an acoustic model of the trained specific target and an acoustic model of the target to be detected;
the storage module is connected with the training module and used for storing the trained acoustic model of the specific target;
the decoding module is connected with the extracting module and is used for carrying out language decoding on the voice message of the target to be detected; and
the processor module is connected with the training module, the storage module and the decoding module, and is used for comparing the acoustic model of the specific target in the storage module with the acoustic model of the target to be detected, deciding according to the comparison result whether to start the decoding module to perform language decoding on the voice message of the target to be detected, and confirming whether the language-decoded voice message of the target to be detected contains the wake-up word so as to wake up the device.
9. The apparatus for specific-target wake-up by voice recognition according to claim 8, further comprising a registration module, wherein the registration module is connected to the acquisition module and the storage module and is configured to initiate storage of the acoustic model of the specific target in the storage module.
10. The apparatus for specific-target wake-up by voice recognition according to claim 8, further comprising a wireless communication module, wherein the wireless communication module is configured to establish an external communication link.
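The pipeline recited in claim 8 (compare the stored specific-target acoustic model against the acoustic model of the target to be detected, and only then check the decoded utterance for the wake-up word) can be illustrated with a minimal sketch. The patent itself trains hidden vector state models with a maximum mutual information criterion; the cosine-similarity comparison of mean MFCC vectors, the threshold value, and the example wake-up word below are simplified stand-ins for illustration, not the claimed method.

```python
import numpy as np

WAKE_WORD = "hello device"   # hypothetical wake-up word
MATCH_THRESHOLD = 0.85       # hypothetical similarity threshold

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_specific_target(enrolled_model: np.ndarray,
                       test_features: np.ndarray,
                       threshold: float = MATCH_THRESHOLD) -> bool:
    # Stand-in for comparing the stored specific-target acoustic model
    # with the acoustic model of the target to be detected: here the
    # "model" is just the mean MFCC vector over the utterance frames.
    return cosine_similarity(enrolled_model, test_features.mean(axis=0)) >= threshold

def try_wake(enrolled_model: np.ndarray,
             test_features: np.ndarray,
             decoded_text: str) -> bool:
    # Language decoding is only consulted when the speaker matches
    # (mirroring the processor module's gating of the decoding module);
    # the device wakes only when the wake-up word is present.
    if not is_specific_target(enrolled_model, test_features):
        return False
    return WAKE_WORD in decoded_text.lower()
```

In this sketch the speaker check gates the text check, so an utterance containing the wake-up word from a non-enrolled speaker still does not wake the device.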
CN201910124945.7A 2019-02-19 2019-02-19 Method and device for specific target wake-up by voice recognition Withdrawn CN111583939A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910124945.7A CN111583939A (en) 2019-02-19 2019-02-19 Method and device for specific target wake-up by voice recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910124945.7A CN111583939A (en) 2019-02-19 2019-02-19 Method and device for specific target wake-up by voice recognition

Publications (1)

Publication Number Publication Date
CN111583939A true CN111583939A (en) 2020-08-25

Family

ID=72122523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910124945.7A Withdrawn CN111583939A (en) 2019-02-19 2019-02-19 Method and device for specific target wake-up by voice recognition

Country Status (1)

Country Link
CN (1) CN111583939A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971678A (en) * 2013-01-29 2014-08-06 腾讯科技(深圳)有限公司 Method and device for detecting keywords
CN106611597A (en) * 2016-12-02 2017-05-03 百度在线网络技术(北京)有限公司 Voice wakeup method and voice wakeup device based on artificial intelligence
CN107123417A (en) * 2017-05-16 2017-09-01 上海交通大学 Optimization method and system are waken up based on the customized voice that distinctive is trained
CN108281137A (en) * 2017-01-03 2018-07-13 中国科学院声学研究所 A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
CN109155132A (en) * 2016-03-21 2019-01-04 亚马逊技术公司 Speaker verification method and system
CN109243446A (en) * 2018-10-01 2019-01-18 厦门快商通信息技术有限公司 A kind of voice awakening method based on RNN network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deyu Zhou and Yulan He, "Discriminative Training of the Hidden Vector State Model for Semantic Parsing", IEEE Transactions on Knowledge and Data Engineering *

Similar Documents

Publication Publication Date Title
CN110428810B (en) Voice wake-up recognition method and device and electronic equipment
EP3210205B1 (en) Sound sample verification for generating sound detection model
US9646610B2 (en) Method and apparatus for activating a particular wireless communication device to accept speech and/or voice commands using identification data consisting of speech, voice, image recognition
CN110047481B (en) Method and apparatus for speech recognition
WO2017071182A1 (en) Voice wakeup method, apparatus and system
WO2016150001A1 (en) Speech recognition method, device and computer storage medium
US6618702B1 (en) Method of and device for phone-based speaker recognition
US20120209609A1 (en) User-specific confidence thresholds for speech recognition
CN110570873B (en) Voiceprint wake-up method and device, computer equipment and storage medium
US20130197911A1 (en) Method and System For Endpoint Automatic Detection of Audio Record
CN109272991B (en) Voice interaction method, device, equipment and computer-readable storage medium
CN112102850B (en) Emotion recognition processing method and device, medium and electronic equipment
US20160111090A1 (en) Hybridized automatic speech recognition
CN111462756B (en) Voiceprint recognition method and device, electronic equipment and storage medium
US11626104B2 (en) User speech profile management
CN111145763A (en) GRU-based voice recognition method and system in audio
JP2003330485A (en) Voice recognition device, voice recognition system, and method for voice recognition
US20230206924A1 (en) Voice wakeup method and voice wakeup device
CN109074809B (en) Information processing apparatus, information processing method, and computer-readable storage medium
CN109065026B (en) Recording control method and device
CN117198338B (en) Interphone voiceprint recognition method and system based on artificial intelligence
TW202029181A (en) Method and apparatus for specific user to wake up by speech recognition
CN110808050B (en) Speech recognition method and intelligent device
CN111048068B (en) Voice wake-up method, device and system and electronic equipment
CN115691478A (en) Voice wake-up method and device, man-machine interaction equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200825