CN110503944A - Training and application methods and devices for a voice wake-up model - Google Patents

Training and application methods and devices for a voice wake-up model

Info

Publication number
CN110503944A
Authority
CN
China
Prior art keywords
word
word speed
speed
voice data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910806848.6A
Other languages
Chinese (zh)
Other versions
CN110503944B (en
Inventor
王蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd
Priority to CN201910806848.6A
Publication of CN110503944A
Application granted
Publication of CN110503944B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention discloses training and application methods and devices for a voice wake-up model. A training method for a voice wake-up model comprises: obtaining training voice data for the voice wake-up model; inputting the training voice data into a keyword detection system and a speech rate detection system respectively; obtaining a first output result, output by the keyword detection system, indicating whether the training voice data contains a specified wake-up word, and obtaining a second output result, output by the speech rate detection system, indicating how fast or slow the speech rate of the training voice data is; and training the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as references. By taking into account the influence of speech rate on the wake-up result, the scheme provided by the methods and devices of the present application adds speech rate detection and uses sliding windows of different lengths for speech at different rates, which can greatly reduce the influence of speech rate on the wake-up result.

Description

Training and application methods and devices for a voice wake-up model
Technical field
The invention belongs to the technical field of voice wake-up, and in particular relates to training and application methods and devices for a voice wake-up model.
Background
In the related art there is keyword spotting technology based on deep learning, also known as voice wake-up technology. For example, in a voice interaction system, when the user speaks an instruction, the system judges whether it is the wake-up word; if so, the interaction system is woken up, and if not, it is not woken up.
A voice wake-up model requires a given wake-up word, and the wake-up model is obtained through prior training.
In the course of implementing the present application, the inventor found that voice wake-up technology based on deep learning performs well at the user's normal speech rate and poorly at a fast speech rate: for the same wake-up word, the wake-up rate is 90% at a normal speech rate but can drop to 70% at a fast speech rate.
Summary of the invention
Embodiments of the present invention provide training and application methods and devices for a voice wake-up model, to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a training method for a voice wake-up model, comprising: obtaining training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether the data contains a specified wake-up word and the second attribute being how fast or slow the speech rate is; inputting the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech rate detection system is used to detect how fast or slow the speech rate of voice data is; obtaining a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and obtaining a second output result, output by the speech rate detection system, indicating the speech rate of the training voice data; and training the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as references.
In a second aspect, an embodiment of the present invention provides an application method for a voice wake-up model, comprising: obtaining voice data to be detected from a user; inputting the voice data to be detected into the speech rate detection system trained by the method of the first aspect; obtaining the speech rate result of the speech rate detection system; determining, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system; inputting the voice data to be detected into the keyword detection system that has been trained by the method of the first aspect and that uses a sliding window of the corresponding sliding window length; and obtaining the output of the keyword detection system and giving a wake-up result based on the output.
In a third aspect, an embodiment of the present invention provides a training device for a voice wake-up model, comprising: a training acquisition module configured to obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether the data contains a specified wake-up word and the second attribute being how fast or slow the speech rate is; an input module configured to input the training voice data into a keyword detection system and a speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech rate detection system is used to detect how fast or slow the speech rate of voice data is; an output acquisition module configured to obtain a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and to obtain a second output result, output by the speech rate detection system, indicating the speech rate of the training voice data; and a training module configured to train the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as references.
In a fourth aspect, an embodiment of the present invention provides an application device for a voice wake-up model, comprising: a detection acquisition module configured to obtain voice data to be detected from a user; a speech rate detection module configured to input the voice data to be detected into the speech rate detection system trained by the method of the first aspect; a speech rate acquisition module configured to obtain the speech rate result of the speech rate detection system; a sliding window length determination module configured to determine, based on the speech rate result, the corresponding sliding window length of the sliding window used in the keyword detection system; a keyword detection module configured to input the voice data to be detected into the keyword detection system that has been trained by the method of the first aspect and that uses a sliding window of the corresponding sliding window length; and a wake-up result output module configured to obtain the output of the keyword detection system and give a wake-up result based on the output.
In a fifth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the training and application methods for a voice wake-up model of any embodiment of the present invention.
In a sixth aspect, an embodiment of the present invention further provides a computer program product, the computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the steps of the training and application methods for a voice wake-up model of any embodiment of the present invention.
The scheme provided by the methods and devices of the present application trains two systems together, a speech rate detection system and a keyword detection system. By taking into account the influence of speech rate on the wake-up result, it adds speech rate detection and uses sliding windows of different lengths for speech at different rates, which can greatly reduce the influence of speech rate on the wake-up result and improve the wake-up rate.
Brief description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a training method for a voice wake-up model provided by an embodiment of the present invention;
Fig. 2 is a flowchart of another training method for a voice wake-up model provided by an embodiment of the present invention;
Fig. 3 is a flowchart of an application method for a voice wake-up model provided by an embodiment of the present invention;
Fig. 4 is a block diagram of a specific example of the training and application methods for a voice wake-up model provided by an embodiment of the present invention;
Fig. 5 is a block diagram of a training device for a voice wake-up model provided by an embodiment of the present invention;
Fig. 6 is a block diagram of an application device for a voice wake-up model provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, it shows a flowchart of one embodiment of the training method for a voice wake-up model of the present application. The training method of this embodiment is applicable to terminals with an intelligent voice dialogue wake-up function, such as smart voice TVs, smart speakers, intelligent conversational toys and other existing intelligent terminals that support voice wake-up.
As shown in Fig. 1, in step 101, training voice data for the voice wake-up model is obtained;
In step 102, the training voice data is input into the keyword detection system and the speech rate detection system respectively;
In step 103, a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word is obtained, and a second output result, output by the speech rate detection system, indicating the speech rate of the training voice data is obtained;
In step 104, the keyword detection system and the speech rate detection system are trained using at least the first attribute and the second attribute of the training voice data as references.
In this embodiment, for step 101, the training device for the voice wake-up model first obtains the training voice data, which has a known first attribute and a known second attribute. The first attribute is whether the data contains the specified wake-up word and the second attribute is how fast or slow the speech rate is; that is, both whether the training voice data contains the specified wake-up word and its speech rate are known. Then, for step 102, the training device inputs the training voice data into the keyword detection system and the speech rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech rate detection system is used to detect the speech rate of voice data. The keyword detection system may be an existing keyword detection system or a better-performing one developed in the future; the present application places no limitation on this. The keyword detection system trains a model that can recognize a preset keyword, that is, the wake-up word, so that it can detect whether speech contains the wake-up word. The speech rate detection system sets one or more thresholds that divide speech rate into several grades or intervals, so that for input speech it can determine which grade or interval the speech rate falls in. The speech rate detection system may likewise be another existing speech rate detection system or a new system for detecting speech rate developed in the future; the present application places no limitation on this.
Then, for step 103, the training device obtains the first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and obtains the second output result, output by the speech rate detection system, indicating the speech rate of the training voice data. For step 104, the training device trains the keyword detection system and the speech rate detection system using at least the first attribute and the second attribute of the training voice data as references. Training the keyword detection model and the speech rate detection model against known attributes as the reference or target makes both models more accurate.
In the training method of this embodiment, training voice data with known attributes is input into the keyword detection system and the speech rate detection system, and the parameters of both systems are adjusted repeatedly during training. The two systems are thereby continuously optimized, so that their detection performance and accuracy improve and they serve better in subsequent wake-up recognition.
In some optional embodiments, the speech rate detection system is a binary classifier in which one speech rate threshold is set: when the speech rate is greater than or equal to the threshold, the output is fast; when the speech rate is less than the threshold, the output is slow. Because speech rate is divided only into fast and slow, the binary classifier is simple and quick to train.
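The threshold rule of this optional embodiment can be illustrated with a minimal sketch. The function name, the label strings and the unit of the rate (for example syllables per second) are assumptions made for illustration; the text specifies only the comparison against a single threshold.

```python
def classify_speech_rate(rate: float, threshold: float) -> str:
    """Binary speech rate decision (hypothetical helper).

    A rate greater than or equal to the threshold is reported as fast,
    anything below it as slow, as described in the text.
    """
    return "fast" if rate >= threshold else "slow"
```

For example, with a threshold of 5.0, a rate of exactly 5.0 is classified as fast because the comparison is inclusive.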
With further reference to Fig. 2, it shows a flowchart of another embodiment of the training method for a voice wake-up model of the present application. The flowchart of this embodiment mainly shows steps that further define step 104 of the flowchart in Fig. 1.
In step 201, the parameters of the speech rate detection system are adjusted so that the second output result of the speech rate detection system is substantially equal to the second attribute;
In step 202, for the different speech rates detected by the speech rate detection system, the parameters of the keyword detection system are adjusted during training so that the first output result of the keyword detection system is substantially equal to the first attribute.
In this embodiment, for step 201, the training device for the voice wake-up model adjusts the parameters of the speech rate detection system so that its second output result is substantially equal to the second attribute, that is, so that the trained result comes ever closer to the true result. Then, for step 202, for the different speech rates detected by the speech rate detection system, the parameters of the keyword detection system are adjusted during training so that its first output result is substantially equal to the first attribute, that is, so that the judgment of whether a keyword is present comes ever closer to the true answer.
In the method of this embodiment, the model parameters are adjusted during training so that the training output is substantially equal to the true result, which makes each trained model more accurate and improves its detection performance.
In some optional embodiments, the parameters of the keyword detection system include the sliding window length. For different speech rates, the keyword detection model can therefore be made more accurate by adjusting the sliding window length.
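One way to picture tuning the sliding window length per speech rate is a simple search over candidate lengths on labelled training data. The sketch below is an assumption, not the procedure of the patent, which does not say how the window length is adjusted; `select_window_per_rate`, the sample layout and the `predict` callable are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One training sample: (features, speech rate label, contains wake-up word)
Sample = Tuple[object, str, bool]

def select_window_per_rate(
    samples: List[Sample],
    candidate_windows: Sequence[int],
    predict: Callable[[object, int], bool],
) -> Dict[str, int]:
    """Pick, for each speech rate label, the candidate window length
    whose predictions agree most often with the known first attribute."""
    best: Dict[str, int] = {}
    for rate in sorted({r for _, r, _ in samples}):
        subset = [(x, y) for x, r, y in samples if r == rate]

        def accuracy(window: int) -> float:
            hits = sum(predict(x, window) == y for x, y in subset)
            return hits / len(subset)

        # max() keeps the first candidate on ties
        best[rate] = max(candidate_windows, key=accuracy)
    return best
```

Here `predict(features, window)` stands for running the keyword detection system with a given window length; any real system would supply its own implementation.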
Referring to Fig. 3, it shows a flowchart of an application method for a voice wake-up model provided by an embodiment of the present application. The application method of this embodiment is applicable to terminals with an intelligent voice dialogue wake-up function, such as smart voice TVs, smart speakers, intelligent conversational toys and other existing intelligent terminals that support voice wake-up.
As shown in Fig. 3, in step 301, voice data to be detected is obtained from the user;
In step 302, the voice data to be detected is input into the speech rate detection system trained by the method of the above embodiments;
In step 303, the speech rate result of the speech rate detection system is obtained;
In step 304, the corresponding sliding window length of the sliding window used in the keyword detection system is determined based on the speech rate result;
In step 305, the voice data to be detected is input into the keyword detection system that has been trained by the method of the above embodiments and that uses a sliding window of the corresponding sliding window length;
In step 306, the output of the keyword detection system is obtained and a wake-up result is given based on the output.
In this embodiment, for step 301, the application device for the voice wake-up model obtains the voice data to be detected from the user. For step 302, the application device inputs the voice data to be detected into the speech rate detection system trained by the methods of Fig. 1, Fig. 2 and the related embodiments. For step 303, the speech rate result output by the speech rate detection system is obtained. For step 304, the corresponding sliding window length to be used in the keyword detection system is determined based on the speech rate result obtained in the preceding step. In general, when the speech rate is fast the corresponding sliding window is shorter, and when the speech rate is slow it is longer. Because the sliding window length follows the speech rate, the influence of speech rate on the keyword detection system is greatly reduced, so that the presence of the wake-up word can be detected more reliably.
In the method of this embodiment, after the voice data to be detected is obtained, the speech rate is detected first and the parameters of the keyword detection system are then adjusted according to that rate. The influence of speech rate on the detection accuracy of the keyword detection system is thereby reduced as far as possible, so that the keyword detection system detects the keyword better and wake-up performance improves.
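The flow of steps 301 to 306 can be sketched as a small function that chains the two trained systems. This is a minimal illustration under assumptions: `wake_up` and its three parameters are hypothetical names, and the two detectors are passed in as callables because the text does not fix their interfaces.

```python
from typing import Callable, Mapping, Sequence

def wake_up(
    audio: Sequence[float],
    detect_rate: Callable[[Sequence[float]], str],
    window_for_rate: Mapping[str, int],
    detect_keyword: Callable[[Sequence[float], int], bool],
) -> bool:
    """Return the wake-up result for one utterance (step 301 supplies audio)."""
    rate_label = detect_rate(audio)            # steps 302-303: speech rate result
    window_len = window_for_rate[rate_label]   # step 304: matching window length
    return detect_keyword(audio, window_len)   # steps 305-306: keyword decision
```

Keeping the rate-to-window mapping as a separate argument means the keyword detector itself does not need to know how speech rate was measured.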
In some optional embodiments, determining the corresponding sliding window length of the sliding window used in the keyword detection system based on the speech rate result comprises: when the speech rate result is a fast speech rate, reducing the sliding window length of the keyword detection system by a preset length to match the fast speech rate; and when the speech rate result is a slow speech rate, increasing the sliding window length of the keyword detection system by a preset length to match the slow speech rate. This embodiment proposes one way of determining the sliding window length, in which the window length is adjusted dynamically with the speech rate: depending on whether the detected speech rate is faster or slower than a reference speech rate, the sliding window length is decreased or increased so that it suits the faster or slower speech, and the wake-up word is recognized better.
In some optional embodiments, determining the corresponding sliding window length of the sliding window used in the keyword detection system based on the speech rate result comprises: when the speech rate result is a fast speech rate, the corresponding sliding window length is L1; and when the speech rate result is a slow speech rate, the corresponding sliding window length is L2, wherein L1 < L2. This embodiment proposes another way of determining the sliding window length, in which speech rates and sliding window lengths correspond one to one: the window is switched to the length that corresponds to the detected speech rate, so that the wake-up word is recognized better.
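The two ways of determining the sliding window length described above can be contrasted in a short sketch. The function names, the frame-count unit and the lower bound of one frame are assumptions for illustration; the text gives only the direction of the adjustment and the constraint L1 < L2.

```python
def window_by_offset(base_len: int, rate_label: str, preset_delta: int) -> int:
    """Variant 1: shrink or grow a base window by a preset length."""
    if rate_label == "fast":
        # never let the window shrink below one frame (an added safeguard)
        return max(1, base_len - preset_delta)
    if rate_label == "slow":
        return base_len + preset_delta
    raise ValueError(f"unknown speech rate label: {rate_label!r}")

def window_by_table(rate_label: str, l1: int, l2: int) -> int:
    """Variant 2: one fixed length per speech rate, with L1 < L2."""
    if not l1 < l2:
        raise ValueError("L1 (fast) must be smaller than L2 (slow)")
    return {"fast": l1, "slow": l2}[rate_label]
```

The first variant needs only one tuned base length plus an offset, while the second allows each speech rate to have an independently tuned length.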
Below, the problems the inventor encountered while implementing the invention and a specific embodiment of the finally determined scheme are described, so that those skilled in the art can better understand the scheme of the present application.
After careful study of the prior art, the inventor found that its defects are mainly due to the following reasons:
Once a voice wake-up model has been trained, its parameters are fixed; however, different speech rates call for different model parameters. A model with fixed parameters is therefore not suited to handling changes in speech rate.
To solve the above defects, those skilled in the art might adopt the following scheme:
To handle changes in speech rate, the parameters of speech feature extraction are usually changed to suit a fast speech rate; however, parameters suited to a fast speech rate reduce the wake-up rate at a normal speech rate.
Because voice wake-up models are generally used on very small smart devices, such as smart bracelets and smartphones, the memory available to them is small, whereas a variable-parameter or multi-parameter model requires more memory. With the development of smart device hardware, however, the available memory has increased significantly, which makes variable-parameter or multi-parameter models feasible.
The scheme of the present application proposes training and application devices for a voice wake-up model:
A speech rate detection system is added to the keyword detection system in use today. The speech rate detection system detects how fast or slow the speech is, and different prediction parameters are used for different speech rates. Working together with the keyword detection model, the speech rate detection system can achieve a better wake-up effect.
Referring to Fig. 4, it shows a flowchart of a specific embodiment of the scheme of the present application. It should be noted that although the following embodiment mentions some specific examples, they are not intended to limit the scheme of the present application.
As shown in Fig. 4, two systems are trained in the model training stage: the keyword detection system (the voice wake-up model in Fig. 4) and the speech rate detection system (the regression model in Fig. 4). The input of the keyword detection system is the training data, that is, a large number of recordings that do or do not contain the wake-up word, and its output is whether a recording contains the wake-up word. The input of the speech rate detection system is likewise the recording data, and its output is how fast or slow the speech in the recording is; it is essentially a binary classifier.
In the test stage, a test recording is fed into both the speech rate detection system and the keyword detection system. The speech rate detection system detects how fast or slow the speech is: if the speech rate is fast, the keyword detection system uses a shorter sliding window; if it is slow, a longer sliding window is used. Finally the wake-up result is given, that is, whether the keyword is present.
The speech rate detection system of the training stage can be replaced with a regression model. In the training process for speech rate determination, the input is the label sequence of a speech segment. From this sequence the model can calculate the duration of each word in the segment, and its output is the speech rate. This is a regression model, that is, its output value is continuous, and a linear regression model may be chosen.
In the test process for speech rate determination, the label sequence of the speech is input and the output is the speech rate. Speech rates are divided into three classes, slow, normal and fast, and each class corresponds to a different window length: the faster the speech rate, the shorter the window. The label sequence is the annotation sequence corresponding to a speech segment; it is a frame-level annotation sequence obtained from the original pinyin annotation by means of an HMM acoustic model.
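The step from a frame-level label sequence to a speech rate class can be sketched as follows. This is a direct computation used as a stand-in, not the regression model mentioned in the text; the 10 ms frame shift, the `sil` silence label, the two class boundaries and the window lengths in frames are all assumptions chosen for illustration.

```python
from itertools import groupby
from typing import Sequence

def speech_rate(frame_labels: Sequence[str],
                frame_shift_s: float = 0.01,
                silence: str = "sil") -> float:
    """Words per second estimated from a frame-level label sequence.

    Consecutive identical labels are treated as one word, so the same
    word repeated back to back without silence counts once.
    """
    runs = [(label, sum(1 for _ in run)) for label, run in groupby(frame_labels)]
    word_frames = [n for label, n in runs if label != silence]
    total_s = sum(word_frames) * frame_shift_s
    return len(word_frames) / total_s if total_s > 0 else 0.0

def rate_class(rate: float, slow_below: float, fast_from: float) -> str:
    """Map a continuous rate onto the three classes named in the text."""
    if rate < slow_below:
        return "slow"
    if rate >= fast_from:
        return "fast"
    return "normal"

# Hypothetical window lengths in frames: the faster the speech, the shorter the window.
WINDOW_FRAMES = {"slow": 80, "normal": 60, "fast": 40}
```

With these assumptions, two words lasting 20 and 30 frames give a rate of 2 words in 0.5 seconds, which a boundary pair of 3.0 and 5.0 would place in the normal class.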
In the voice wake-up model, the model input in the training stage is speech features and the output is the posterior probability of each label, that is, the probability that the input is judged to be that label. In the test process, the model outputs posterior scores, which are then scored by a scoring system. The scoring system depends on the window length value, and the final output is the wake-up result (wake up or do not wake up).
In the scoring system, an overall score is calculated according to the scoring rule from the posterior probabilities produced by the wake-up model, and the device is woken up when this score exceeds a threshold. During scoring, the window length directly affects the calculation of the maximum posterior probability of a given label within a speech segment: when the speech rate is fast, a shorter window gives a more accurate maximum; when the speech rate is slow, a longer window is more accurate.
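The dependence of the score on the window length can be illustrated with a sketch of one common scoring rule for keyword spotting: within each sliding window, take the maximum posterior of every wake-up word label and combine the maxima with a geometric mean. The text does not give its scoring formula, so this rule, the function names and the data layout are assumptions.

```python
import math
from typing import List

def wake_score(posteriors: List[List[float]], window: int) -> float:
    """Best windowed score over an utterance.

    posteriors[t][k] is the posterior of wake-up word label k at frame t.
    """
    n_labels = len(posteriors[0])
    best = 0.0
    for end in range(1, len(posteriors) + 1):
        start = max(0, end - window)
        # per-label maximum posterior inside the current window
        peaks = [max(frame[k] for frame in posteriors[start:end])
                 for k in range(n_labels)]
        score = math.prod(peaks) ** (1.0 / n_labels)  # geometric mean
        best = max(best, score)
    return best

def is_awake(posteriors: List[List[float]], window: int, threshold: float) -> bool:
    """Wake up when the overall score exceeds the threshold."""
    return wake_score(posteriors, window) > threshold
```

If the labels of a slowly spoken wake-up word peak several frames apart, a window that is too short never covers all peaks at once and the score stays low; a window that is too long for fast speech admits unrelated frames, which is the trade-off the text describes.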
The above embodiment can achieve at least the following technical effect:
By taking into account the influence of speech rate on the wake-up result, the scheme provided by this embodiment adds speech rate detection and uses sliding windows of different lengths for speech at different rates, which can greatly reduce the influence of speech rate on the wake-up result.
Referring to FIG. 5, it illustrates the block diagrams that the voice that one embodiment of the invention provides wakes up the training device of model.
As shown in figure 5, a kind of voice wakes up the training device 500 of model, including training obtains module 510, input module 520, output obtains module 530 and training module 540.
Wherein, training obtains module 510, is configured to obtain the training voice data for waking up model for voice, wherein institute Trained voice data is stated with known first attribute and known second attribute, first attribute is whether to call out comprising specified Awake word, second attribute are word speed speed;Input module 520 is configured to the trained voice data being separately input into pass Key word detection system and word speed detection system, wherein the keyword search system for detect in voice data whether include Specified to wake up word, the word speed detection system is used to detect the word speed speed of voice data;Output obtains module 530, is configured to The trained voice data for obtaining keyword search system output whether include specified the first output for waking up word as a result, Obtain the second output result of the speed of the trained voice data of the word speed detection system output;And training module 540, it is configured at least using first attribute of the trained voice data and second attribute as benchmark to the pass Key word detection system and the word speed detection system are trained.
With further reference to FIG. 6, a block diagram of a use device for a voice wake-up model provided by an embodiment of the present invention is shown.
As shown in FIG. 6, a use device 600 for a voice wake-up model includes a detection acquisition module 610, a speech-rate detection module 620, a speech-rate acquisition module 630, a sliding-window length determination module 640, a keyword detection module 650, and a wake-up result output module 660.
The detection acquisition module 610 is configured to obtain the user's voice data to be detected. The speech-rate detection module 620 is configured to input the voice data to be detected into the speech-rate detection system trained by the method of the flow shown in FIG. 1. The speech-rate acquisition module 630 is configured to obtain the fast-or-slow speech-rate result of the speech-rate detection system. The sliding-window length determination module 640 is configured to determine, based on the speech-rate result, the corresponding sliding-window length of the sliding window used in the keyword detection system. The keyword detection module 650 is configured to input the voice data to be detected into the keyword detection system that has been trained by the method of the flow shown in FIG. 1 and that uses a sliding window of the corresponding sliding-window length. The wake-up result output module 660 is configured to obtain the output of the keyword detection system and provide a wake-up result based on that output.
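The data flow through modules 610 to 660 can be sketched as a single function. This is only an illustrative wiring under stated assumptions: the two detectors are passed in as callables, the rate result is one of the strings "fast" or "slow", and the stub detectors and window lengths below are placeholders, not anything specified in the text.

```python
from typing import Callable, Dict, Sequence


def wake_up(
    audio: Sequence[float],
    detect_rate: Callable[[Sequence[float]], str],
    window_for_rate: Dict[str, int],
    detect_keyword: Callable[[Sequence[float], int], bool],
) -> bool:
    """Run the use-time flow: rate detection, window choice, keyword detection."""
    rate = detect_rate(audio)                 # modules 620 and 630
    window_len = window_for_rate[rate]        # module 640
    return detect_keyword(audio, window_len)  # modules 650 and 660


# Placeholder detectors standing in for the trained systems (hypothetical).
def stub_detect_rate(audio: Sequence[float]) -> str:
    return "fast" if len(audio) < 100 else "slow"


def stub_detect_keyword(audio: Sequence[float], window_len: int) -> bool:
    return len(audio) >= window_len


woke = wake_up([0.0] * 50, stub_detect_rate, {"fast": 40, "slow": 80}, stub_detect_keyword)
```

Passing the detectors as parameters keeps the sketch independent of any particular model; the only behavior it fixes is the order of operations described above.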
It should be understood that the modules described in FIG. 5 and FIG. 6 correspond to the steps of the methods described with reference to FIG. 1, FIG. 2, and FIG. 3. Accordingly, the operations and features described above for the methods, and the corresponding technical effects, apply equally to the modules in FIG. 5 and FIG. 6 and are not repeated here.
It is worth noting that the modules in the embodiments of the present application are not intended to limit the scheme of the application. For example, the training acquisition module may be described as a module that obtains training voice data for the voice wake-up model. In addition, the related functional modules may be implemented by a hardware processor; for example, the training acquisition module may also be implemented by a processor, which is not described further here.
In other embodiments, an embodiment of the present invention further provides a non-volatile computer storage medium storing computer-executable instructions that can perform the training and use methods of the voice wake-up model in any of the above method embodiments.
As one implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether the data contains a specified wake-up word and the second attribute being whether the speech rate is fast or slow;
input the training voice data into a keyword detection system and a speech-rate detection system respectively, wherein the keyword detection system detects whether voice data contains the specified wake-up word and the speech-rate detection system detects whether the speech rate of voice data is fast or slow;
obtain a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and obtain a second output result, output by the speech-rate detection system, indicating whether the speech rate of the training voice data is fast or slow;
train the keyword detection system and the speech-rate detection system using at least the first attribute and the second attribute of the training voice data as a reference.
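The training steps above can be sketched as two small fitting routines, following the reading given in claims 3 and 4: first fit the speech-rate detector to the second attribute, then, for each detected rate class, pick keyword-detector parameters (here only the sliding-window length) that reproduce the first attribute. A plain grid search stands in for whatever optimizer a real implementation would use, and the sample fields `rate`, `audio`, and `has_wake_word` are hypothetical names.

```python
def fit_rate_threshold(rates, is_fast_labels, candidates):
    """Pick the candidate threshold whose fast/slow output best matches the labels."""
    def matches(threshold):
        return sum((r >= threshold) == y for r, y in zip(rates, is_fast_labels))
    return max(candidates, key=matches)


def fit_window_lengths(samples, threshold, window_candidates, detect_keyword):
    """For each rate class, pick the window length that best reproduces the labels."""
    best = {}
    for is_fast in (True, False):
        # Samples that the fitted rate detector assigns to this class.
        group = [s for s in samples if (s["rate"] >= threshold) == is_fast]

        def matches(window_len):
            return sum(
                detect_keyword(s["audio"], window_len) == s["has_wake_word"]
                for s in group
            )

        best["fast" if is_fast else "slow"] = max(window_candidates, key=matches)
    return best
```

`detect_keyword` is passed in so the sketch does not assume any particular keyword model; it only needs to accept audio and a window length and return a yes/no decision.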
As another implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
obtain the user's voice data to be detected;
input the voice data to be detected into the speech-rate detection system trained according to claims 1-4;
obtain the fast-or-slow speech-rate result of the speech-rate detection system;
determine, based on the speech-rate result, the corresponding sliding-window length of the sliding window used in the keyword detection system;
input the voice data to be detected into the keyword detection system that has been trained according to claims 1-4 and that uses a sliding window of the corresponding sliding-window length;
obtain the output of the keyword detection system and provide a wake-up result based on that output.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area. The program storage area can store the operating system and the application program required for at least one function; the data storage area can store data created through use of the training and use devices for the voice wake-up model. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory can be connected over a network to the training and use devices for the voice wake-up model. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to perform any of the above training and use methods for a voice wake-up model.
FIG. 7 is a structural schematic diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 7, the device includes one or more processors 710 and a memory 720; FIG. 7 takes one processor 710 as an example. The device for the training and use methods of the voice wake-up model may further include an input device 730 and an output device 740. The processor 710, memory 720, input device 730, and output device 740 may be connected by a bus or in other ways; FIG. 7 takes a bus connection as an example. The memory 720 is the non-volatile computer-readable storage medium described above. The processor 710 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 720, thereby implementing the training and use methods of the voice wake-up model in the above method embodiments. The input device 730 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the training and use devices for the voice wake-up model. The output device 740 may include a display device such as a display screen.
The above product can execute the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present invention.
As one implementation, the above electronic device is applied in the training device for the voice wake-up model and comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether the data contains a specified wake-up word and the second attribute being whether the speech rate is fast or slow;
input the training voice data into a keyword detection system and a speech-rate detection system respectively, wherein the keyword detection system detects whether voice data contains the specified wake-up word and the speech-rate detection system detects whether the speech rate of voice data is fast or slow;
obtain a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and obtain a second output result, output by the speech-rate detection system, indicating whether the speech rate of the training voice data is fast or slow;
train the keyword detection system and the speech-rate detection system using at least the first attribute and the second attribute of the training voice data as a reference.
As another implementation, the above electronic device is applied in the use device for the voice wake-up model and comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
obtain the user's voice data to be detected;
input the voice data to be detected into the speech-rate detection system trained according to claims 1-4;
obtain the fast-or-slow speech-rate result of the speech-rate detection system;
determine, based on the speech-rate result, the corresponding sliding-window length of the sliding window used in the keyword detection system;
input the voice data to be detected into the keyword detection system that has been trained according to claims 1-4 and that uses a sliding window of the corresponding sliding-window length;
obtain the output of the keyword detection system and provide a wake-up result based on that output.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions, with voice and data communication as their main goal. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g., iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the description of the embodiments above, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the essence of the above technical solution, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments or of certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for training a voice wake-up model, comprising:
obtaining training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether the data contains a specified wake-up word and the second attribute being whether the speech rate is fast or slow;
inputting the training voice data into a keyword detection system and a speech-rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech-rate detection system is used to detect whether the speech rate of voice data is fast or slow;
obtaining a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and obtaining a second output result, output by the speech-rate detection system, indicating whether the speech rate of the training voice data is fast or slow;
training the keyword detection system and the speech-rate detection system using at least the first attribute and the second attribute of the training voice data as a reference.
2. The method according to claim 1, wherein the speech-rate detection system is a binary classifier and a speech-rate threshold is set in the speech-rate detection system, wherein:
when the speech rate is greater than or equal to the speech-rate threshold, the output is that the speech rate is fast;
when the speech rate is less than the speech-rate threshold, the output is that the speech rate is slow.
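The threshold rule of claim 2 can be illustrated with a short sketch. The claim does not say how the speech rate is measured or what the threshold value is; the syllables-per-second measure, the function names, and the value 5.0 below are assumptions made only for the example.

```python
def speech_rate(num_syllables: int, duration_s: float) -> float:
    """Assumed rate measure: syllables per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return num_syllables / duration_s


def classify_speech_rate(rate: float, threshold: float) -> str:
    """Binary decision: at or above the threshold is fast, below it is slow."""
    return "fast" if rate >= threshold else "slow"


RATE_THRESHOLD = 5.0  # hypothetical value, not taken from the patent

quick = classify_speech_rate(speech_rate(4, 0.5), RATE_THRESHOLD)    # 8.0 per second
relaxed = classify_speech_rate(speech_rate(4, 2.0), RATE_THRESHOLD)  # 2.0 per second
```

Note that a rate exactly equal to the threshold is classified as fast, matching the "greater than or equal to" wording of the claim.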
3. The method according to claim 1, wherein training the keyword detection system and the speech-rate detection system using at least the first attribute and the second attribute of the training voice data as a reference comprises:
adjusting parameters of the speech-rate detection system so that the second output result of the speech-rate detection system is substantially equal to the second attribute;
for each of the different speech rates detected by the speech-rate detection system, adjusting parameters of the keyword detection system during training so that the first output result of the keyword detection system is substantially equal to the first attribute.
4. The method according to claim 3, wherein the parameters of the keyword detection system include a sliding-window length.
5. A method for using a voice wake-up model, comprising:
obtaining a user's voice data to be detected;
inputting the voice data to be detected into the speech-rate detection system trained according to claims 1-4;
obtaining the fast-or-slow speech-rate result of the speech-rate detection system;
determining, based on the speech-rate result, the corresponding sliding-window length of the sliding window used in the keyword detection system;
inputting the voice data to be detected into the keyword detection system that has been trained according to claims 1-4 and that uses a sliding window of the corresponding sliding-window length;
obtaining the output of the keyword detection system and providing a wake-up result based on that output.
6. The method according to claim 5, wherein determining, based on the speech-rate result, the corresponding sliding-window length of the sliding window used in the keyword detection system comprises:
when the speech-rate result is fast, reducing the sliding-window length of the sliding window of the keyword detection system by a preset length so that it corresponds to the fast speech rate;
when the speech-rate result is slow, increasing the sliding-window length of the sliding window of the keyword detection system by a preset length so that it corresponds to the slow speech rate.
7. The method according to claim 5, wherein determining, based on the speech-rate result, the corresponding sliding-window length of the sliding window used in the keyword detection system comprises:
when the speech-rate result is fast, the corresponding sliding-window length is L1;
when the speech-rate result is slow, the corresponding sliding-window length is L2, wherein L1 < L2.
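The two window-selection variants in claims 6 and 7 can be sketched side by side. Frame counts are used as the unit of length and the function names are hypothetical; the claims fix neither, only the direction of the adjustment and the ordering L1 < L2.

```python
def adjust_window(base_len: int, rate: str, preset_delta: int) -> int:
    """Claim 6 variant: shrink or grow a base window by a preset length."""
    if rate == "fast":
        new_len = base_len - preset_delta
    elif rate == "slow":
        new_len = base_len + preset_delta
    else:
        raise ValueError("rate must be 'fast' or 'slow'")
    if new_len <= 0:
        raise ValueError("preset_delta is too large for base_len")
    return new_len


def lookup_window(rate: str, l1: int, l2: int) -> int:
    """Claim 7 variant: fixed lengths L1 (fast) and L2 (slow), with L1 < L2."""
    if not l1 < l2:
        raise ValueError("L1 must be smaller than L2")
    if rate not in ("fast", "slow"):
        raise ValueError("rate must be 'fast' or 'slow'")
    return l1 if rate == "fast" else l2
```

With a base window of 60 frames and a preset length of 20, the first variant yields 40 frames for fast speech and 80 for slow speech, which is the same mapping the second variant gives with L1 = 40 and L2 = 80.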
8. A device for training a voice wake-up model, comprising:
a training acquisition module configured to obtain training voice data for the voice wake-up model, wherein the training voice data has a known first attribute and a known second attribute, the first attribute being whether the data contains a specified wake-up word and the second attribute being whether the speech rate is fast or slow;
an input module configured to input the training voice data into a keyword detection system and a speech-rate detection system respectively, wherein the keyword detection system is used to detect whether voice data contains the specified wake-up word and the speech-rate detection system is used to detect whether the speech rate of voice data is fast or slow;
an output acquisition module configured to obtain a first output result, output by the keyword detection system, indicating whether the training voice data contains the specified wake-up word, and to obtain a second output result, output by the speech-rate detection system, indicating whether the speech rate of the training voice data is fast or slow;
a training module configured to train the keyword detection system and the speech-rate detection system using at least the first attribute and the second attribute of the training voice data as a reference.
9. A device for using a voice wake-up model, comprising:
a detection acquisition module configured to obtain a user's voice data to be detected;
a speech-rate detection module configured to input the voice data to be detected into the speech-rate detection system trained according to claims 1-4;
a speech-rate acquisition module configured to obtain the fast-or-slow speech-rate result of the speech-rate detection system;
a sliding-window length determination module configured to determine, based on the speech-rate result, the corresponding sliding-window length of the sliding window used in the keyword detection system;
a keyword detection module configured to input the voice data to be detected into the keyword detection system that has been trained according to claims 1-4 and that uses a sliding window of the corresponding sliding-window length;
a wake-up result output module configured to obtain the output of the keyword detection system and provide a wake-up result based on that output.
10. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can perform the steps of the method of any one of claims 1 to 7.
CN201910806848.6A 2019-08-29 2019-08-29 Method and device for training and using voice awakening model Active CN110503944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910806848.6A CN110503944B (en) 2019-08-29 2019-08-29 Method and device for training and using voice awakening model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910806848.6A CN110503944B (en) 2019-08-29 2019-08-29 Method and device for training and using voice awakening model

Publications (2)

Publication Number Publication Date
CN110503944A true CN110503944A (en) 2019-11-26
CN110503944B CN110503944B (en) 2021-09-24

Family

ID=68590309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910806848.6A Active CN110503944B (en) 2019-08-29 2019-08-29 Method and device for training and using voice awakening model

Country Status (1)

Country Link
CN (1) CN110503944B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002358094A (en) * 2001-03-29 2002-12-13 Ricoh Co Ltd Voice recognition system
DE102004012209A1 (en) * 2004-03-12 2005-10-06 Siemens Ag Noise reducing method for speech recognition system in e.g. mobile telephone, involves selecting noise models based on vehicle parameters for noise reduction, where parameters are obtained from signal that does not represent sound
CN108701452A (en) * 2016-02-02 2018-10-23 日本电信电话株式会社 Audio model learning method, audio recognition method, audio model learning device, speech recognition equipment, audio model learning program and speech recognition program
CN109671433A (en) * 2019-01-10 2019-04-23 腾讯科技(深圳)有限公司 A kind of detection method and relevant apparatus of keyword

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910885A (en) * 2019-12-12 2020-03-24 苏州思必驰信息科技有限公司 Voice awakening method and device based on decoding network
WO2021134549A1 (en) * 2019-12-31 2021-07-08 李庆远 Human merging and training of multiple artificial intelligence outputs
CN112466332A (en) * 2020-11-13 2021-03-09 阳光保险集团股份有限公司 Method and device for scoring speed, electronic equipment and storage medium
CN112466332B (en) * 2020-11-13 2024-05-28 阳光保险集团股份有限公司 Method and device for scoring speech rate, electronic equipment and storage medium
CN113782014A (en) * 2021-09-26 2021-12-10 联想(北京)有限公司 Voice recognition method and device
CN113782014B (en) * 2021-09-26 2024-03-26 联想(北京)有限公司 Speech recognition method and device
CN115223553A (en) * 2022-03-11 2022-10-21 广州汽车集团股份有限公司 Voice recognition method and driving assistance system
CN115223553B (en) * 2022-03-11 2023-11-17 广州汽车集团股份有限公司 Speech recognition method and driving assistance system

Also Published As

Publication number Publication date
CN110503944B (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN110503944A (en) The training of voice wake-up model and application method and device
US11127416B2 (en) Method and apparatus for voice activity detection
CN110838286B (en) Model training method, language identification method, device and equipment
CN108877778B (en) Sound end detecting method and equipment
CN110268469B (en) Server side hotword
US10269346B2 (en) Multiple speech locale-specific hotword classifiers for selection of a speech locale
US20190115011A1 (en) Detecting keywords in audio using a spiking neural network
US9837068B2 (en) Sound sample verification for generating sound detection model
CN105185373B (en) The generation of prosody hierarchy forecast model and prosody hierarchy Forecasting Methodology and device
CN108198548A (en) A kind of voice awakening method and its system
CN108564941A (en) Audio recognition method, device, equipment and storage medium
US9418662B2 (en) Method, apparatus and computer program product for providing compound models for speech recognition adaptation
CN110517670A (en) Promote the method and apparatus for waking up performance
CN105426404A (en) Music information recommendation method and apparatus, and terminal
CN110222649B (en) Video classification method and device, electronic equipment and storage medium
CN102280106A (en) VWS method and apparatus used for mobile communication terminal
CN102982811A (en) Voice endpoint detection method based on real-time decoding
CN110473539A (en) Promote the method and apparatus that voice wakes up performance
CN112581938B (en) Speech breakpoint detection method, device and equipment based on artificial intelligence
CN111179915A (en) Age identification method and device based on voice
CN103903633A (en) Method and apparatus for detecting voice signal
CN111128134A (en) Acoustic model training method, voice awakening method, device and electronic equipment
CN101510423A (en) Pronunciation detection method and apparatus
CN104700831B (en) The method and apparatus for analyzing the phonetic feature of audio file
CN105609114A (en) Method and device for detecting pronunciation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Ltd.

GR01 Patent grant