CN110503944A - Training and use methods and devices for a voice wake-up model - Google Patents
- Publication number
- CN110503944A (application CN201910806848.6A)
- Authority
- CN
- China
- Prior art keywords
- word
- word speed
- speed
- voice data
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The present invention discloses training and use methods and devices for a voice wake-up model. A training method for a voice wake-up model comprises: obtaining training voice data for the voice wake-up model; inputting the training voice data into a keyword detection system and a speech rate detection system respectively; obtaining a first output result from the keyword detection system indicating whether the training voice data contains a specified wake word, and obtaining a second output result from the speech rate detection system indicating the speech rate of the training voice data; and training the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference. By taking into account the effect of speech rate on the wake-up result, the method and device of the present application add speech rate detection and use sliding windows of different lengths for speech at different rates, which can greatly reduce the influence of speech rate on the wake-up result.
Description
Technical field
The invention belongs to the technical field of voice wake-up, and in particular relates to training and use methods and devices for a voice wake-up model.
Background technique
In the related art, there are keyword spotting techniques based on deep learning, also known as voice wake-up techniques. Take a voice interaction system as an example: when the user speaks an instruction, the system judges whether it is the wake word. If it is, the interaction system is woken up; if not, the interaction system is not woken up.
A voice wake-up model requires a given wake word and is obtained through prior training.
In the course of implementing the present application, the inventors found that deep-learning-based voice wake-up performs well at a user's normal speech rate but poorly at a fast speech rate. For the same wake word, the wake-up rate is 90% at a normal speech rate but can drop to 70% at a fast speech rate.
Summary of the invention
Embodiments of the present invention provide training and use methods and devices for a voice wake-up model, to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a training method for a voice wake-up model, comprising: obtaining training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether the data contains a specified wake word and the second attribute being the speech rate (fast or slow); inputting the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake word and the speech rate detection system is used to detect the speech rate of voice data; obtaining a first output result from the keyword detection system indicating whether the training voice data contains the specified wake word, and obtaining a second output result from the speech rate detection system indicating the speech rate of the training voice data; and training the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference.
In a second aspect, an embodiment of the present invention provides a use method for a voice wake-up model, comprising: obtaining a user's voice data to be detected; inputting the voice data to be detected into the speech rate detection system trained by the method of the first aspect; obtaining the speech rate result of the speech rate detection system; determining, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system; inputting the voice data to be detected into the keyword detection system that has been trained by the method of the first aspect and that uses a sliding window of the corresponding sliding window length; and obtaining the output of the keyword detection system and giving a wake-up result based on the output.
In a third aspect, an embodiment of the present invention provides a training device for a voice wake-up model, comprising: a training acquisition module, configured to obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether the data contains a specified wake word and the second attribute being the speech rate (fast or slow); an input module, configured to input the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake word and the speech rate detection system is used to detect the speech rate of voice data; an output acquisition module, configured to obtain a first output result from the keyword detection system indicating whether the training voice data contains the specified wake word, and to obtain a second output result from the speech rate detection system indicating the speech rate of the training voice data; and a training module, configured to train the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference.
In a fourth aspect, an embodiment of the present invention provides a use device for a voice wake-up model, comprising: a detection acquisition module, configured to obtain a user's voice data to be detected; a speech rate detection module, configured to input the voice data to be detected into the speech rate detection system trained by the method of the first aspect; a speech rate acquisition module, configured to obtain the speech rate result of the speech rate detection system; a sliding window length determination module, configured to determine, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system; a keyword detection module, configured to input the voice data to be detected into the keyword detection system that has been trained by the method of the first aspect and that uses a sliding window of the corresponding sliding window length; and a wake-up result output module, configured to obtain the output of the keyword detection system and give a wake-up result based on the output.
In a fifth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the training and use methods for a voice wake-up model of any embodiment of the present invention.
In a sixth aspect, an embodiment of the present invention also provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the steps of the training and use methods for a voice wake-up model of any embodiment of the present invention.
The scheme provided by the method and device of the present application trains two systems together, a speech rate detection system and a keyword detection system. It takes into account the effect of speech rate on the wake-up result by adding speech rate detection and using sliding windows of different lengths for speech at different rates, which can greatly reduce the influence of speech rate on the wake-up result and improve the wake-up rate.
Detailed description of the invention
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a training method for a voice wake-up model provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another training method for a voice wake-up model provided by an embodiment of the present invention;
Fig. 3 is a flow chart of a use method for a voice wake-up model provided by an embodiment of the present invention;
Fig. 4 is a block diagram of a specific example of the training and use methods for a voice wake-up model provided by an embodiment of the present invention;
Fig. 5 is a block diagram of a training device for a voice wake-up model provided by an embodiment of the present invention;
Fig. 6 is a block diagram of a use device for a voice wake-up model provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, it shows a flow chart of one embodiment of the training method for a voice wake-up model of the present application. The training method of this embodiment is applicable to terminals with an intelligent voice dialogue wake-up function, such as smart voice TVs, smart speakers, intelligent dialogue toys and other existing smart terminals that support voice wake-up.
As shown in Fig. 1, in step 101, training voice data for the voice wake-up model is obtained;
In step 102, the training voice data is input into the keyword detection system and the speech rate detection system respectively;
In step 103, a first output result from the keyword detection system, indicating whether the training voice data contains the specified wake word, is obtained, and a second output result from the speech rate detection system, indicating the speech rate of the training voice data, is obtained;
In step 104, the keyword detection system and the speech rate detection system are trained using at least the first attribute and the second attribute of the training voice data as the reference.
In this embodiment, for step 101, the training device for the voice wake-up model first obtains training voice data. The training voice data has a known first attribute and a known second attribute: the first attribute is whether the data contains the specified wake word, and the second attribute is the speech rate (fast or slow). In other words, both whether the training voice data contains the specified wake word and how fast it is spoken are known. Then, for step 102, the training device inputs the training voice data into the keyword detection system and the speech rate detection system respectively, where the keyword detection system detects whether voice data contains the specified wake word and the speech rate detection system detects the speech rate of voice data. The keyword detection system may be an existing keyword detection system or a better-performing one developed in the future; the present application places no limit on this. Through training, the keyword detection system obtains a model that recognizes the preset keyword, i.e. the wake word, so that it can detect whether speech contains the wake word. The speech rate detection system sets one or more thresholds that divide speech rate into several grades or ranges, so that for input speech it can determine which grade or range the speech rate falls in. The speech rate detection system may likewise be another existing speech rate detection system or a new system for detecting speech rate developed in the future; the present application places no limit on this.
Next, for step 103, the training device for the voice wake-up model obtains the first output result from the keyword detection system, indicating whether the training voice data contains the specified wake word, and the second output result from the speech rate detection system, indicating the speech rate of the training voice data. Then, for step 104, the training device trains the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference. Training the keyword detection model and the speech rate detection model against known attributes as the reference or target makes both models more accurate.
The training method for a voice wake-up model provided in this embodiment inputs training voice data with known attributes into the keyword detection system and the speech rate detection system, so that the two systems can be optimized by repeatedly adjusting their parameters during training. This improves their detection performance and accuracy, so that they work better in subsequent wake-up recognition.
In some optional embodiments, the speech rate detection system is a binary classifier with a speech rate threshold: when the speech rate is greater than or equal to the threshold, it outputs "fast"; when the speech rate is less than the threshold, it outputs "slow". Because speech rate is divided only into fast and slow, the binary classifier is simple and quick to train.
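The threshold rule above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `classify_speech_rate`, the unit of the rate (for example syllables per second) and any concrete threshold value are assumptions, since the text does not specify them.

```python
from typing import Literal

RateLabel = Literal["fast", "slow"]


def classify_speech_rate(rate: float, threshold: float) -> RateLabel:
    """Binary speech rate decision as described in the text.

    `rate` and `threshold` share one unit (assumed here to be something
    like syllables per second; the source does not name a unit).
    A rate greater than or equal to the threshold counts as fast.
    """
    return "fast" if rate >= threshold else "slow"
```

Note that a rate exactly equal to the threshold is classified as fast, matching the "greater than or equal to" wording of the embodiment.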
With further reference to Fig. 2, it shows a flow chart of another embodiment of the training method for a voice wake-up model of the present application. The flow chart of this embodiment mainly shows steps that further define step 104 of the flow chart in Fig. 1.
In step 201, the parameters of the speech rate detection system are adjusted so that the second output result of the speech rate detection system is substantially equal to the second attribute;
In step 202, for each of the different speech rates detected by the speech rate detection system, the parameters of the keyword detection system are adjusted during training so that the first output result of the keyword detection system is substantially equal to the first attribute.
In this embodiment, for step 201, the training device for the voice wake-up model adjusts the parameters of the speech rate detection system so that its second output result is substantially equal to the second attribute, i.e. the trained result comes closer and closer to the true result. Then, for step 202, for each of the different speech rates detected by the speech rate detection system, the parameters of the keyword detection system are adjusted during training so that its first output result is substantially equal to the first attribute, i.e. its judgment of whether the keyword is present comes closer to the true answer.
By adjusting the model parameters during training, the method of this embodiment makes the training output substantially equal to the true result, so that each trained model is more accurate and detects better.
In some optional embodiments, the parameters of the keyword detection system include the sliding window length. For different speech rates, adjusting the sliding window length can therefore make the keyword detection model more accurate.
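One simple way to picture tuning the sliding window length per speech rate is a grid search: for each speech rate class, try a few candidate window lengths and keep the one whose detections agree best with the known first attribute. The sketch below is an assumption for illustration only. The source does not say how the window length is adjusted, and `select_window_lengths`, the `Sample` layout and the `detect` callable are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# (features, contains_wake_word, rate_label); a hypothetical layout in which
# the last two fields are the known first and second attributes.
Sample = Tuple[Sequence[float], bool, str]


def select_window_lengths(
    samples: Iterable[Sample],
    candidates: Sequence[int],
    detect: Callable[[Sequence[float], int], bool],
) -> Dict[str, int]:
    """Pick, per speech rate class, the candidate window length with the
    highest agreement between `detect` and the known wake-word label."""
    by_rate: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_rate[sample[2]].append(sample)

    best: Dict[str, int] = {}
    for rate, group in by_rate.items():

        def accuracy(window: int) -> float:
            hits = sum(detect(feats, window) == label for feats, label, _ in group)
            return hits / len(group)

        # On ties, max() keeps the first candidate in the given order.
        best[rate] = max(candidates, key=accuracy)
    return best
```

Here `detect` stands in for the keyword detection system run with a given window length; any real system would replace it with its own scoring.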
Referring to Fig. 3, it shows a flow chart of a use method for a voice wake-up model provided by an embodiment of the present application. The use method of this embodiment is applicable to terminals with an intelligent voice dialogue wake-up function, such as smart voice TVs, smart speakers, intelligent dialogue toys and other existing smart terminals that support voice wake-up.
As shown in Fig. 3, in step 301, the user's voice data to be detected is obtained;
In step 302, the voice data to be detected is input into the speech rate detection system trained by the method of the above embodiments;
In step 303, the speech rate result of the speech rate detection system is obtained;
In step 304, the corresponding sliding window length of the sliding window used in the keyword detection system is determined based on the speech rate result;
In step 305, the voice data to be detected is input into the keyword detection system that has been trained by the method of the above embodiments and that uses a sliding window of the corresponding sliding window length;
In step 306, the output of the keyword detection system is obtained, and a wake-up result is given based on the output.
In this embodiment, for step 301, the use device for the voice wake-up model obtains the user's voice data to be detected. Then, for step 302, the use device inputs the voice data to be detected into the speech rate detection system trained by the methods of Fig. 1, Fig. 2 and the related embodiments. For step 303, the speech rate result output by the speech rate detection system is obtained. Then, for step 304, the corresponding sliding window length needed in the keyword detection system is determined based on the speech rate result just obtained. In general, when the speech is fast the corresponding sliding window is shorter, and when the speech is slow it is longer. Because the sliding window length varies with the speech rate, the keyword detection system is much less affected by speech rate and can better detect whether the wake word is present.
After obtaining the voice data to be detected, the method of this embodiment first detects the speech rate and then adjusts the parameters of the keyword detection system accordingly. This reduces the effect of speech rate on the detection accuracy of the keyword detection system as far as possible, so that the system detects the keyword better and wake-up performance also improves.
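The use flow of steps 301 to 306 can be sketched as a small function that chains the two trained systems. This is a minimal sketch under stated assumptions: `wake_up`, `detect_rate`, `detect_keyword` and `window_for_rate` are hypothetical names, and the two callables stand in for the trained speech rate detection system and keyword detection system, whose internals the text leaves open.

```python
from typing import Callable, Dict, Sequence


def wake_up(
    audio: Sequence[float],
    detect_rate: Callable[[Sequence[float]], str],
    detect_keyword: Callable[[Sequence[float], int], bool],
    window_for_rate: Dict[str, int],
) -> bool:
    """Run speech rate detection first, then keyword detection with the
    sliding window length chosen for the detected rate."""
    rate = detect_rate(audio)              # steps 302-303: speech rate result
    window = window_for_rate[rate]         # step 304: corresponding window length
    return detect_keyword(audio, window)   # steps 305-306: wake-up result
```

Passing the two systems in as callables keeps the flow independent of how either system is built, which mirrors the text's statement that existing or future detection systems may be used.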
In some optional embodiments, determining the corresponding sliding window length of the sliding window used in the keyword detection system based on the speech rate result comprises: when the speech rate result is fast, reducing the sliding window length of the keyword detection system by a preset length to suit fast speech; when the speech rate result is slow, increasing the sliding window length of the keyword detection system by a preset length to suit slow speech. This embodiment proposes one way of determining the sliding window length, by dynamic adjustment: depending on whether the detected speech rate is faster or slower than a reference speech rate, the sliding window length is reduced or increased so that it suits the faster or slower speech, and the wake word is recognized better.
In some optional embodiments, determining the corresponding sliding window length of the sliding window used in the keyword detection system based on the speech rate result comprises: when the speech rate result is fast, the corresponding sliding window length is L1; when the speech rate result is slow, the corresponding sliding window length is L2, where L1 < L2. This embodiment proposes another way of determining the sliding window length, by a one-to-one mapping between speech rate and sliding window length: based on the detected speech rate, the system switches to the corresponding sliding window length, and the wake word is recognized better.
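The two ways of determining the sliding window length described in these optional embodiments can be sketched side by side. The function names, the lower bound of one frame and the choice to leave the window unchanged for an unrecognized rate label are assumptions added for illustration; the text specifies only the preset-length adjustment and the L1 < L2 mapping.

```python
def adjust_window(base_length: int, rate: str, step: int) -> int:
    """Relative strategy: shrink the window by a preset length for fast
    speech and grow it by a preset length for slow speech."""
    if rate == "fast":
        # Assumed guard: never let the window drop below one frame.
        return max(1, base_length - step)
    if rate == "slow":
        return base_length + step
    return base_length  # assumed: any other label keeps the base length


def lookup_window(rate: str, fast_length: int, slow_length: int) -> int:
    """Fixed strategy: L1 for fast speech, L2 for slow speech, L1 < L2."""
    if not fast_length < slow_length:
        raise ValueError("fast_length (L1) must be smaller than slow_length (L2)")
    return fast_length if rate == "fast" else slow_length
```

The relative strategy needs a base window and a step, while the fixed strategy needs only the two lengths; either can supply the window length that the keyword detection system then uses.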
The following describes some problems the inventors encountered in implementing the present invention and a specific embodiment of the final scheme, so that those skilled in the art can better understand the scheme of the present application.
After careful study of the prior art, the inventors found that its defect is mainly due to the following cause:
After prior training, the parameters of a voice wake-up model are fixed; however, the model parameters need to be adjusted for different speech rates. A fixed-parameter model is not suited to handling changes in speech rate.
Those skilled in the art might use the following approach to address the above defect:
To handle changes in speech rate, the parameters of speech feature extraction are usually changed to suit fast speech; however, parameters tuned for fast speech reduce the wake-up rate at normal speech rates.
Because voice wake-up models are generally used on very small smart devices such as smart bracelets and smartphones, the memory available to them is small, whereas variable-parameter or multi-parameter models require more memory. With the development of smart device hardware, however, the usable memory has increased markedly, which provides a basis for implementing variable-parameter or multi-parameter models.
The scheme of the present application proposes a training and use device for a voice wake-up model:
A speech rate detection system is added on top of the keyword detection system currently in use. The speech rate detection system detects how fast the speech is, and different prediction parameters are used for different speech rates. Working together with the keyword detection model, the speech rate detection system can achieve a good wake-up effect.
Referring to Fig. 4, it shows a flow chart of a specific embodiment of the scheme of the present application. It should be noted that although the following embodiment mentions some specific examples, they are not intended to limit the scheme of the present application.
As shown in Fig. 4, in the model training stage, two systems are trained: the keyword detection system (the voice wake-up model in Fig. 4) and the speech rate detection system (the regression model in Fig. 4). The input of the keyword detection system is training data, i.e. a large number of recordings that do or do not contain the wake word, and its output is whether a recording contains the wake word. The input of the speech rate detection system is likewise recording data, and its output is whether the speech in the recording is fast or slow; it is essentially a binary classifier.
In the test stage, the test recording is fed into both the speech rate detection system and the keyword detection system. The speech rate detection system detects whether the speech is fast or slow: if it is fast, the keyword detection system uses a shorter sliding window; if it is slow, it uses a longer sliding window. Finally the wake-up result is given, i.e. whether the keyword is present.
The speech rate detection system in the training stage can be replaced by a regression model. In the training process for speech rate determination, the input is the label sequence of a speech segment; the model computes the duration of each word in the segment and outputs the speech rate. This is a regression model, i.e. its output is continuous, and a linear regression model can be chosen.
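To make the relation between a frame-level label sequence, per-word durations and speech rate concrete, the sketch below computes them directly by run-length counting. It is not the regression model the text mentions, only an illustration of the quantities involved. The 10 ms frame shift, the `"sil"` silence label and the function names are assumptions, and consecutive repetitions of the same word would be merged into one run by this simple approach.

```python
from itertools import groupby
from typing import List, Sequence


def word_durations(
    frame_labels: Sequence[str],
    frame_shift_s: float = 0.01,  # assumed 10 ms frame shift
    silence: str = "sil",         # assumed silence label
) -> List[float]:
    """Duration in seconds of each run of identical non-silence labels."""
    return [
        sum(1 for _ in run) * frame_shift_s
        for label, run in groupby(frame_labels)
        if label != silence
    ]


def speech_rate(
    frame_labels: Sequence[str],
    frame_shift_s: float = 0.01,
    silence: str = "sil",
) -> float:
    """Words per second of speech, ignoring silence frames."""
    durations = word_durations(frame_labels, frame_shift_s, silence)
    total = sum(durations)
    return len(durations) / total if total > 0 else 0.0
```

For example, a sequence with 20 frames of one word followed by 30 frames of another yields durations of about 0.2 s and 0.3 s. A continuous value of this kind is what a regression model would be trained to output.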
In the test process for speech rate determination, the input is the label sequence of the speech and the output is the speech rate. Speech rates are divided into three classes: slow, normal and fast. Each class corresponds to a different window length: the faster the speech, the smaller the window length. Here, the label sequence is the annotation sequence corresponding to the speech segment, namely the frame-level annotation sequence obtained from the original pinyin transcription by an HMM acoustic model.
In the voice wake-up model, the model input in the training stage is speech features and the output is the posterior probability of the corresponding label, i.e. the probability of being judged as a given label. In the test process, the model outputs posterior scores, which are then scored by a scoring system. This scoring system depends on the window length, and the final output is the wake-up result (wake up or do not wake up).
In the scoring system, an overall score is computed from the posterior probabilities produced by the wake-up model according to the scoring rule, and wake-up occurs when the score exceeds a threshold. During scoring, the window length directly affects the computation of the maximum posterior probability of a given label within a speech segment: when the speech is fast, a shorter window gives a more accurate maximum; when the speech is slow, a longer window is more accurate.
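The dependence of the score on the window length can be illustrated with a small scoring sketch. The text does not give the scoring rule, so the rule used here is an assumption: within each window position, take the maximum posterior of every keyword label and combine the maxima by their geometric mean, then keep the best window. The names `window_score` and `is_wake_up` are hypothetical.

```python
import math
from typing import Sequence


def window_score(
    posteriors: Sequence[Sequence[float]],  # posteriors[t][k]: label k at frame t
    keyword_labels: Sequence[int],          # label indices that make up the wake word
    window: int,                            # sliding window length in frames
) -> float:
    """Best score over all window positions (assumed scoring rule)."""
    if not posteriors or not keyword_labels:
        return 0.0
    best = 0.0
    for start in range(max(1, len(posteriors) - window + 1)):
        frames = posteriors[start:start + window]
        # Maximum posterior of each keyword label inside this window.
        peaks = [max(frame[k] for frame in frames) for k in keyword_labels]
        score = math.prod(peaks) ** (1.0 / len(peaks))
        best = max(best, score)
    return best


def is_wake_up(posteriors, keyword_labels, window: int, threshold: float) -> bool:
    """Wake up when the overall score exceeds the threshold."""
    return window_score(posteriors, keyword_labels, window) > threshold
```

With this rule, a window shorter than the spoken wake word never sees the peaks of all keyword labels at once, so the score stays low for slow speech. A window much longer than the wake word can pick up unrelated peaks, which is one way to read the text's remark that a shorter window is more accurate for fast speech.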
The above embodiment can achieve at least the following technical effect:
By taking into account the effect of speech rate on the wake-up result, the scheme provided in this embodiment adds speech rate detection and uses sliding windows of different lengths for speech at different rates, which can greatly reduce the influence of speech rate on the wake-up result.
Referring to FIG. 5, a block diagram of a training device for a voice wake-up model according to an embodiment of the present invention is shown.
As shown in FIG. 5, a training device 500 for a voice wake-up model includes a training acquisition module 510, an input module 520, an output acquisition module 530, and a training module 540.
The training acquisition module 510 is configured to obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether a specified wake-up word is contained and the second attribute being the speech rate (fast or slow). The input module 520 is configured to input the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech rate detection system is used to detect the speech rate of voice data. The output acquisition module 530 is configured to obtain a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and a second output result, output by the speech rate detection system, indicating the speech rate of the training voice data. The training module 540 is configured to train the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference.
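One possible reading of this training procedure is sketched below in Python. It assumes, purely for illustration, that the speech rate is a single number (for example syllables per second), that the speech rate detection system is a threshold on that number, and that the only keyword detection parameter being tuned is the sliding window length. The functions `fit_rate_threshold` and `fit_window_lengths`, the candidate grids, and the `detect` callable are all hypothetical.

```python
def fit_rate_threshold(rates, is_fast_labels, candidates):
    """Pick the threshold whose fast/slow decisions best match the
    known second attribute (speech rate label) of the training data."""
    def errors(threshold):
        return sum((r >= threshold) != y for r, y in zip(rates, is_fast_labels))
    return min(candidates, key=errors)


def fit_window_lengths(samples, detect, rate_threshold, candidates):
    """For each detected rate class, pick the window length whose keyword
    decisions best match the known first attribute (wake-word label).

    samples: list of (rate, has_wake_word, clip) tuples.
    detect:  callable (clip, window_len) -> bool, standing in for the
             keyword detection system (an assumption of this sketch).
    """
    best = {}
    for name, fast in (("fast", True), ("slow", False)):
        group = [s for s in samples if (s[0] >= rate_threshold) == fast]

        def correct(window_len, group=group):
            return sum(detect(clip, window_len) == label for _, label, clip in group)

        best[name] = max(candidates, key=correct)
    return best
```

A simple grid search is used here only because it keeps the example short; the source does not say which optimisation procedure is used.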
With further reference to FIG. 6, a block diagram of a device for using a voice wake-up model according to an embodiment of the present invention is shown.
As shown in FIG. 6, a device 600 for using a voice wake-up model includes a detection acquisition module 610, a speech rate detection module 620, a speech rate acquisition module 630, a sliding window length determination module 640, a keyword detection module 650, and a wake-up result output module 660.
The detection acquisition module 610 is configured to obtain the user's voice data to be detected. The speech rate detection module 620 is configured to input the voice data to be detected into the speech rate detection system trained by the method of the flow shown in FIG. 1. The speech rate acquisition module 630 is configured to obtain the speech rate result of the speech rate detection system. The sliding window length determination module 640 is configured to determine, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system. The keyword detection module 650 is configured to input the voice data to be detected into the keyword detection system that has been trained by the method of the flow shown in FIG. 1 and that uses a sliding window of the corresponding sliding window length. The wake-up result output module 660 is configured to obtain the output of the keyword detection system and provide the wake-up result based on that output.
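The data flow through modules 610 to 660 can be sketched as a small Python class. This is an illustration under stated assumptions, not the patented implementation: the class name `WakeUpPipeline`, the two callables standing in for the trained detection systems, the window lengths (in frames), and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class WakeUpPipeline:
    # Stand-in for the trained speech rate detection system: True means fast.
    rate_detector: Callable[[Sequence[float]], bool]
    # Stand-in for the trained keyword detection system: returns a score
    # for the given audio and sliding window length.
    keyword_detector: Callable[[Sequence[float], int], float]
    fast_window: int = 30   # assumed window length for fast speech
    slow_window: int = 60   # assumed window length for slow speech
    threshold: float = 0.5  # assumed wake-up threshold

    def window_length(self, is_fast: bool) -> int:
        """Module 640: map the speech rate result to a window length."""
        return self.fast_window if is_fast else self.slow_window

    def run(self, audio: Sequence[float]) -> bool:
        is_fast = self.rate_detector(audio)            # modules 620 and 630
        window = self.window_length(is_fast)           # module 640
        score = self.keyword_detector(audio, window)   # module 650
        return score > self.threshold                  # module 660
```

Passing the two detection systems in as callables keeps the sketch independent of how either system is actually built.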
It should be understood that the modules described in FIG. 5 and FIG. 6 correspond to the steps of the methods described with reference to FIG. 1, FIG. 2, and FIG. 3. Accordingly, the operations and features described above for the methods, and the corresponding technical effects, apply equally to the modules in FIG. 5 and FIG. 6 and are not repeated here.
It is worth noting that the modules in the embodiments of this application are not intended to limit the scheme of this application; for example, the training acquisition module may be described as a module that obtains training voice data for the voice wake-up model. In addition, the related functional modules may also be implemented by a hardware processor; for example, the training acquisition module may also be implemented by a processor, which is not described further here.
In other embodiments, an embodiment of the present invention further provides a non-volatile computer storage medium storing computer-executable instructions that can execute the training and use methods for the voice wake-up model in any of the above method embodiments.
As one implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether a specified wake-up word is contained and the second attribute being the speech rate (fast or slow);
input the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech rate detection system is used to detect the speech rate of voice data;
obtain a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and obtain a second output result, output by the speech rate detection system, indicating the speech rate of the training voice data;
train the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference.
As another implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
obtain the user's voice data to be detected;
input the voice data to be detected into the speech rate detection system trained according to any one of claims 1-4;
obtain the speech rate result of the speech rate detection system;
determine, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system;
input the voice data to be detected into the keyword detection system that has been trained according to any one of claims 1-4 and that uses a sliding window of the corresponding sliding window length;
obtain the output of the keyword detection system and provide the wake-up result based on that output.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the training and use device for the voice wake-up model, and the like. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory may be connected over a network to the training and use device for the voice wake-up model. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to execute any of the above training and use methods for the voice wake-up model.
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 7, the device includes one or more processors 710 and a memory 720; one processor 710 is taken as an example in FIG. 7. The device for the training and use methods of the voice wake-up model may further include an input device 730 and an output device 740. The processor 710, the memory 720, the input device 730, and the output device 740 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 7. The memory 720 is the non-volatile computer-readable storage medium described above. The processor 710 runs the non-volatile software programs, instructions, and modules stored in the memory 720, thereby executing the various functional applications and data processing of the server, that is, implementing the training and use methods for the voice wake-up model of the above method embodiments. The input device 730 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the training and use device for the voice wake-up model. The output device 740 may include a display device such as a display screen.
The above product can execute the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present invention.
As one implementation, the above electronic device is applied in a training device for a voice wake-up model and includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether a specified wake-up word is contained and the second attribute being the speech rate (fast or slow);
input the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech rate detection system is used to detect the speech rate of voice data;
obtain a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and obtain a second output result, output by the speech rate detection system, indicating the speech rate of the training voice data;
train the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference.
As another implementation, the above electronic device is applied in a device for using a voice wake-up model and includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
obtain the user's voice data to be detected;
input the voice data to be detected into the speech rate detection system trained according to any one of claims 1-4;
obtain the speech rate result of the speech rate detection system;
determine, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system;
input the voice data to be detected into the keyword detection system that has been trained according to any one of claims 1-4 and that uses a sliding window of the corresponding sliding window length;
obtain the output of the keyword detection system and provide the wake-up result based on that output.
The electronic device of the embodiments of this application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability, with voice and data communication as their main purpose. Such terminals include smartphones (for example, the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capabilities, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (for example, the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on; its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capability, stability, reliability, security, scalability, and manageability are higher.
(5) Other electronic devices with data interaction capability.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments or of certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A training method for a voice wake-up model, comprising:
obtaining training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether a specified wake-up word is contained and the second attribute being the speech rate (fast or slow);
inputting the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech rate detection system is used to detect the speech rate of voice data;
obtaining a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and obtaining a second output result, output by the speech rate detection system, indicating the speech rate of the training voice data;
training the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference.
2. The method according to claim 1, wherein the speech rate detection system is a binary classifier and a speech rate threshold is set in the speech rate detection system, wherein:
when the speech rate is greater than or equal to the speech rate threshold, the output is fast speech rate;
when the speech rate is less than the speech rate threshold, the output is slow speech rate.
3. The method according to claim 1, wherein training the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference comprises:
adjusting the parameters of the speech rate detection system so that the second output result of the speech rate detection system is substantially equal to the second attribute;
for the different speech rates detected by the speech rate detection system, adjusting the parameters of the keyword detection system during training so that the first output result of the keyword detection system is substantially equal to the first attribute.
4. The method according to claim 3, wherein the parameters of the keyword detection system include the sliding window length.
5. A method for using a voice wake-up model, comprising:
obtaining the user's voice data to be detected;
inputting the voice data to be detected into the speech rate detection system trained according to any one of claims 1-4;
obtaining the speech rate result of the speech rate detection system;
determining, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system;
inputting the voice data to be detected into the keyword detection system that has been trained according to any one of claims 1-4 and that uses a sliding window of the corresponding sliding window length;
obtaining the output of the keyword detection system and providing the wake-up result based on that output.
6. The method according to claim 5, wherein determining, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system comprises:
when the speech rate result is fast speech rate, reducing the sliding window length of the sliding window of the keyword detection system by a preset length so as to correspond to the fast speech rate;
when the speech rate result is slow speech rate, increasing the sliding window length of the sliding window of the keyword detection system by a preset length so as to correspond to the slow speech rate.
7. The method according to claim 5, wherein determining, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system comprises:
when the speech rate result is fast speech rate, the corresponding sliding window length is L1;
when the speech rate result is slow speech rate, the corresponding sliding window length is L2, wherein L1 < L2.
8. A training device for a voice wake-up model, comprising:
a training acquisition module configured to obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether a specified wake-up word is contained and the second attribute being the speech rate (fast or slow);
an input module configured to input the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech rate detection system is used to detect the speech rate of voice data;
an output acquisition module configured to obtain a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and to obtain a second output result, output by the speech rate detection system, indicating the speech rate of the training voice data;
a training module configured to train the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as the reference.
9. A device for using a voice wake-up model, comprising:
a detection acquisition module configured to obtain the user's voice data to be detected;
a speech rate detection module configured to input the voice data to be detected into the speech rate detection system trained according to any one of claims 1-4;
a speech rate acquisition module configured to obtain the speech rate result of the speech rate detection system;
a sliding window length determination module configured to determine, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system;
a keyword detection module configured to input the voice data to be detected into the keyword detection system that has been trained according to any one of claims 1-4 and that uses a sliding window of the corresponding sliding window length;
a wake-up result output module configured to obtain the output of the keyword detection system and provide the wake-up result based on that output.
10. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1 to 7.
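The threshold-based classification of claim 2 and the window adjustment of claim 6 can be illustrated with a short Python sketch. The function names, the numeric representation of speech rate, the base window length, and the lower bound of one frame are assumptions made for illustration only and are not part of the claims.

```python
def classify_rate(rate: float, rate_threshold: float) -> str:
    """Binary decision as in claim 2: at or above the threshold is fast."""
    return "fast" if rate >= rate_threshold else "slow"


def adjust_window(base_len: int, rate_class: str, preset: int) -> int:
    """Window adjustment as in claim 6: shorten the window by a preset
    length for fast speech, lengthen it by a preset length for slow speech."""
    if rate_class == "fast":
        # Keeping the window at one frame or more is an assumption of this sketch.
        return max(1, base_len - preset)
    if rate_class == "slow":
        return base_len + preset
    raise ValueError(f"unknown rate class: {rate_class!r}")
```

With a base length of 40 frames and a preset length of 10, fast speech would use a 30-frame window and slow speech a 50-frame window, which also satisfies the L1 < L2 relation of claim 7.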
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910806848.6A CN110503944B (en) | 2019-08-29 | 2019-08-29 | Method and device for training and using voice awakening model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110503944A true CN110503944A (en) | 2019-11-26 |
CN110503944B CN110503944B (en) | 2021-09-24 |
Family
ID=68590309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910806848.6A Active CN110503944B (en) | 2019-08-29 | 2019-08-29 | Method and device for training and using voice awakening model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110503944B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002358094A (en) * | 2001-03-29 | 2002-12-13 | Ricoh Co Ltd | Voice recognition system |
DE102004012209A1 (en) * | 2004-03-12 | 2005-10-06 | Siemens Ag | Noise reducing method for speech recognition system in e.g. mobile telephone, involves selecting noise models based on vehicle parameters for noise reduction, where parameters are obtained from signal that does not represent sound |
CN108701452A (en) * | 2016-02-02 | 2018-10-23 | 日本电信电话株式会社 | Audio model learning method, audio recognition method, audio model learning device, speech recognition equipment, audio model learning program and speech recognition program |
CN109671433A (en) * | 2019-01-10 | 2019-04-23 | 腾讯科技(深圳)有限公司 | A kind of detection method and relevant apparatus of keyword |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110910885A (en) * | 2019-12-12 | 2020-03-24 | 苏州思必驰信息科技有限公司 | Voice awakening method and device based on decoding network |
WO2021134549A1 (en) * | 2019-12-31 | 2021-07-08 | 李庆远 | Human merging and training of multiple artificial intelligence outputs |
CN112466332A (en) * | 2020-11-13 | 2021-03-09 | 阳光保险集团股份有限公司 | Method and device for scoring speed, electronic equipment and storage medium |
CN112466332B (en) * | 2020-11-13 | 2024-05-28 | 阳光保险集团股份有限公司 | Method and device for scoring speech rate, electronic equipment and storage medium |
CN113782014A (en) * | 2021-09-26 | 2021-12-10 | 联想(北京)有限公司 | Voice recognition method and device |
CN113782014B (en) * | 2021-09-26 | 2024-03-26 | 联想(北京)有限公司 | Speech recognition method and device |
CN115223553A (en) * | 2022-03-11 | 2022-10-21 | 广州汽车集团股份有限公司 | Voice recognition method and driving assistance system |
CN115223553B (en) * | 2022-03-11 | 2023-11-17 | 广州汽车集团股份有限公司 | Speech recognition method and driving assistance system |
Also Published As
Publication number | Publication date |
---|---|
CN110503944B (en) | 2021-09-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province. Applicant after: Sipic Technology Co.,Ltd. Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province. Applicant before: AI SPEECH Ltd. |
GR01 | Patent grant | ||