CN110473552A - Speech recognition authentication method and system - Google Patents
- Publication number: CN110473552A
- Application number: CN201910832042.4A
- Authority
- CN
- China
- Prior art keywords
- audio
- speaker
- frequency information
- feature
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Abstract
An embodiment of the present invention provides a speech recognition authentication method, comprising: obtaining audio information; preprocessing the audio information to obtain voice information from it according to the short-time energy and spectral centroid of the audio information; performing speech feature extraction on the voice information; processing the speech features to obtain target speech features that more closely represent the speaker; matching the target speech features against the speaker speech features stored in a database; and, according to the matching result, outputting the identity information of the speaker corresponding to the matched speaker speech features, thereby identifying the speaker corresponding to the voice information. Embodiments of the present invention also provide a speech recognition authentication system, a computer device, and a readable storage medium. Through the embodiments of the present invention, the accuracy of speech recognition technology can be improved and the user experience greatly enhanced.
Description
Technical field
Embodiments of the present invention relate to the field of speech recognition, and in particular to a speech recognition authentication method, a speech recognition authentication system, a computer device, and a readable storage medium.
Background technique
As speech recognition technology matures, it is used very widely in daily life. For example, a household intelligent voice robot executes received spoken instructions by recognizing the voices of family members, and a meeting-minutes system records each participant's remarks by recognizing the participants' voices. However, most existing speech recognition systems suffer from problems such as unclear recognition and speaker misidentification: for example, the sound of typing on a keyboard may be treated as a valid person's speech, causing the system to give an invalid response, or speaker A's remarks may be recorded as those of speaker B. The present invention aims to solve such low-accuracy problems as unclear speech recognition and speaker misidentification.
Summary of the invention
In view of this, it is necessary to provide a speech recognition authentication method, a speech recognition authentication system, a computer device, and a readable storage medium that can improve the accuracy of speech recognition technology and greatly enhance the user experience.
To achieve the above object, an embodiment of the present invention provides a speech recognition authentication method, the method comprising:
obtaining audio information;
preprocessing the audio information to obtain voice information from the audio information according to the short-time energy and spectral centroid of the audio information;
performing speech feature extraction on the voice information;
processing the speech features to obtain target speech features that more closely represent the speaker;
matching the target speech features against the speaker speech features stored in a database; and
according to the matching result, outputting the identity information of the speaker corresponding to the matched speaker speech features, thereby identifying the speaker corresponding to the voice information.
Preferably, the step of preprocessing the audio information to obtain voice information from the audio information according to its short-time energy and spectral centroid comprises:
extracting multiple frames of short-time signals from the audio information according to a preset rule, wherein the preset rule includes a preset signal-extraction time interval;
calculating the short-time energy of the multiple frames of short-time signals according to a silence detection algorithm;
calculating the spectral centroid from the multiple frames of short-time signals;
comparing the short-time energy with a first preset value stored in a database;
comparing the spectral centroid with a second preset value stored in the database;
when the short-time energy is higher than the first preset value and the spectral centroid is higher than the second preset value, determining that the audio information is voice information; and
obtaining the voice information.
Preferably, the short-time energy is calculated as:
E = Σ_{n=1}^{N} s(n)²
where E denotes the short-time energy, N denotes the number of frames of short-time signals (N ≥ 2, an integer), and s(n) denotes the signal amplitude of the n-th frame of the short-time signal in the time domain.
Preferably, the step of calculating the spectral centroid from the multiple frames of short-time signals comprises:
obtaining the frequencies corresponding to the multiple frames of short-time signals; and
calculating the spectral centroid of the audio information from the frequencies and the multiple frames of short-time signals according to the silence detection algorithm, wherein the spectral centroid is calculated as:
C = ( Σ_{k=1}^{K} k · S(k) ) / ( Σ_{k=1}^{K} S(k) )
where C denotes the spectral centroid, K denotes the number of frequencies corresponding to the N frames of s(n) (K ≥ 2, an integer), and S(k) denotes the spectral energy distribution over the frequency domain obtained by the discrete Fourier transform of s(n).
Preferably, after the step of comparing the spectral centroid with the second preset value stored in the database, the method further comprises:
when the short-time energy is lower than the first preset value and/or the spectral centroid is lower than the second preset value, determining that the audio information is invalid audio information, wherein the invalid audio information includes at least silence, environmental noise, and non-environmental noise; and
deleting the audio information.
Preferably, the step of processing the speech features to obtain target speech features that more closely represent the speaker comprises:
normalizing the speech features using the Z-score standardization method so as to unify the speech features, wherein the normalization formula is:
x* = (x − μ) / σ
where μ is the mean of multiple pieces of voice information, σ is the standard deviation of the multiple pieces of voice information, x is the multiple single-frame voice data, and x* is the normalized speech feature;
splicing the normalized feature results to form long, overlapping splice frames; and
inputting the splice frames into a neural network for training, so as to obtain the target speech features.
Preferably, after the step of processing the speech features to obtain target speech features that more closely represent the speaker, the method further comprises:
inputting the target speech features into a pre-trained speaker verification model and intrusive-noise model;
verifying, according to the output result, whether the voice information is the voice of one of the multiple preset speakers saved in the speaker verification model; and
when the voice information is the voice of a preset speaker, obtaining the voice information.
To achieve the above object, an embodiment of the present invention also provides a speech recognition authentication system, comprising:
an acquisition module for obtaining audio information;
a preprocessing module for preprocessing the audio information to obtain voice information from it according to the short-time energy and spectral centroid of the audio information;
a feature extraction module for performing speech feature extraction on the voice information;
a processing module for processing the speech features to obtain target speech features that more closely represent the speaker;
a matching module for matching the target speech features against the speaker speech features stored in a database; and
an output module for outputting, according to the matching result, the identity information of the speaker corresponding to the matched speaker speech features, thereby identifying the speaker corresponding to the voice information.
To achieve the above object, an embodiment of the present invention also provides a computer device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the computer program, when executed by the processor, implements the steps of the speech recognition authentication method described above.
To achieve the above object, an embodiment of the present invention also provides a computer-readable storage medium storing a computer program executable by at least one processor, so as to cause the at least one processor to execute the steps of the speech recognition authentication method described above.
With the speech recognition authentication method, speech recognition authentication system, computer device, and readable storage medium provided by the embodiments of the present invention, the acquired audio information is preprocessed so that voice information is obtained from it according to its short-time energy and spectral centroid; speech features are extracted from the voice information and processed to obtain target speech features that more closely represent the speaker; the target speech features are matched against the speaker speech features stored in a database; and, according to the matching result, the matched speaker's identity information is output, thereby identifying the speaker corresponding to the voice information. Through the embodiments of the present invention, the accuracy of speech recognition technology can be improved and the user experience greatly enhanced.
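The method summarized above can be sketched end to end as follows. This is a minimal illustration only: every function name and every stage implementation (the toy energy threshold, the mean-spectrum "feature", the inner-product match) is an assumed stand-in, not the patented algorithm.

```python
# The method steps, wired together as a minimal data flow.
# All stage implementations are trivial stand-ins for illustration.
import numpy as np

def preprocess(audio, frame_len=160):
    """Step 2 stand-in: keep frames whose energy suggests speech (toy threshold)."""
    frames = audio[: len(audio) // frame_len * frame_len].reshape(-1, frame_len)
    return frames[np.sum(frames ** 2, axis=1) > 0.01]

def extract_features(frames):
    """Step 3 stand-in: feature = mean spectrum magnitude across voiced frames."""
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def refine(feature):
    """Step 4 stand-in for the neural-network refinement: unit-normalise."""
    return feature / (np.linalg.norm(feature) + 1e-12)

def match(feature, database):
    """Steps 5-6 stand-in: return the identity with the highest inner product."""
    return max(database, key=lambda name: float(np.dot(database[name], feature)))

def authenticate(audio, database):
    voiced = preprocess(audio)
    if len(voiced) == 0:
        return None                      # no valid speech found
    return match(refine(extract_features(voiced)), database)
```

A database here is simply a mapping from identity information to a stored speaker feature vector; `authenticate` returns the matched identity, or `None` when the audio contains no valid speech.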
Brief description of the drawings
Fig. 1 is a flowchart of the steps of the speech recognition authentication method of Embodiment 1 of the present invention.
Fig. 2 is a diagram of the spliced audio features after normalization in Embodiment 1 of the present invention.
Fig. 3 is a diagram of the specific splicing method of Embodiment 1 of the present invention.
Fig. 4 is a schematic diagram of the hardware architecture of the computer device of Embodiment 2 of the present invention.
Fig. 5 is a schematic diagram of the program modules of the speech recognition authentication system of Embodiment 3 of the present invention.
Reference numerals:
Computer device | 2 |
Memory | 21 |
Processor | 22 |
Network interface | 23 |
Speech recognition authentication system | 20 |
Acquisition module | 201 |
Preprocessing module | 202 |
Feature extraction module | 203 |
Processing module | 204 |
Matching module | 205 |
Output module | 206 |
Normalization module | 207 |
Splicing module | 208 |
Training module | 209 |
Speech verification module | 210 |
The realization of the objects, functional characteristics, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only on the basis that they can be realized by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that such a combination does not exist and is not within the protection scope claimed by the present invention.
Embodiment one
Referring to Fig. 1, a flowchart of the steps of the speech recognition authentication method of Embodiment 1 of the present invention is shown. It should be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are executed. It should be noted that the present embodiment is described by way of example with the computer device 2 as the executing subject. The details are as follows:
Step S100: obtain audio information.
In a preferred embodiment, while meeting minutes are being taken, the environment contains the speech of the speakers as well as silence, environmental noise, and non-environmental noise; the speech recognition authentication system acquires these sounds, namely the audio information.
It should be noted that non-environmental noise and a speaker's speech have different short-time energies and spectral centroids.
Step S102: preprocess the audio information to obtain voice information from the audio information according to its short-time energy and spectral centroid.
Illustratively, after the audio information is obtained, because it contains the speaker's voice information together with silence, environmental noise, and non-environmental noise, the audio information needs to be processed to extract the voice information from it. The silent parts are those in which no sound is produced: for example, a speaker may pause to think or to breathe while talking, and makes no sound while thinking or breathing. Environmental noise includes, but is not limited to, sounds such as the opening and closing of doors and windows or the collision of objects. Non-environmental noise includes, but is not limited to, sounds such as coughing, clicking a mouse, or typing on a keyboard. Short-time energy and spectral centroid are two important indicators of audio information in silence detection technology: the short-time energy reflects the strength of the signal energy and can distinguish silence and environmental noise within a segment of audio, while the spectral centroid can distinguish the non-environmental-noise parts. By combining the short-time energy and the spectral centroid, effective audio, namely the voice information, is filtered out of the audio information.
In a preferred embodiment, when the audio information is preprocessed to obtain voice information from it according to its short-time energy and spectral centroid, multiple frames of short-time signals are first extracted from the audio information according to a preset rule, wherein the preset rule includes a preset signal-extraction time interval. Then, the short-time energy and the spectral centroid of the multiple frames of short-time signals are calculated according to a silence detection algorithm. Next, the short-time energy is compared with a first preset value stored in a database, and the spectral centroid is compared with a second preset value stored in the database. When the short-time energy is higher than the first preset value and the spectral centroid is higher than the second preset value, the audio information is determined to be voice information, and the voice information is obtained.
The short-time energy is calculated as E = Σ_{n=1}^{N} s(n)², where E denotes the short-time energy, N denotes the number of frames of short-time signals, and s(n) denotes the signal amplitude of the n-th frame of the short-time signal in the time domain.
Illustratively, multiple frames of short-time signals s(1), s(2), s(3), s(4), ..., s(N) are extracted from the audio information at a preset time interval (for example, 0.2 ms), and the short-time energy of the extracted frames is then calculated to determine the strength of the energy of the audio information.
It should be noted that the short-time energy is the sum of the squares of each frame signal and reflects the strength of the signal energy; when the signal energy is too weak, the signal is determined to be silence or environmental noise.
In a further preferred embodiment, when the spectral centroid is calculated from the multiple frames of short-time signals, the frequencies corresponding to the multiple frames of short-time signals are also obtained, and the spectral centroid of the audio information is calculated from the frequencies and the multiple frames of short-time signals according to the silence detection algorithm. The spectral centroid is calculated as C = ( Σ_{k=1}^{K} k · S(k) ) / ( Σ_{k=1}^{K} S(k) ), where C denotes the spectral centroid, K denotes the number of frequencies corresponding to the N frames of s(n) (K ≥ 2, an integer), and S(k) denotes the spectral energy distribution over the frequency domain obtained by the discrete Fourier transform of s(n).
It should be noted that the spectral centroid is also called the first-order moment of the spectrum; the smaller the value of the spectral centroid, the more the spectral energy is concentrated in the low-frequency range. Using the spectral centroid, the non-environmental-noise parts, such as coughing, clicking a mouse, or typing on a keyboard, can be removed.
It should be noted that when the short-time energy and spectral centroid indicators are both above the set thresholds, the audio information is effective audio, namely the voice information of the speaker; most environmental and non-environmental noise is removed, so that the remaining voice information is purer and of higher quality, which removes a large number of disturbing factors from the speech recognition process. In the embodiment of the present invention, high-quality voice information is obtained by setting the first preset value and the second preset value relatively high.
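The silence-detection step described above can be sketched as follows, computing per frame the short-time energy E = Σ s(n)² and the spectral centroid C = Σ k·S(k) / Σ S(k), then keeping only frames that exceed both thresholds. The frame length and both threshold values are illustrative assumptions, not taken from the patent.

```python
# Sketch of the silence-detection (VAD) step: frame the audio, compute
# short-time energy and spectral centroid, and keep frames above both
# thresholds. Frame length and thresholds are invented for illustration.
import numpy as np

def frame_signal(audio, frame_len=160, hop=160):
    n = (len(audio) - frame_len) // hop + 1
    return np.stack([audio[i * hop : i * hop + frame_len] for i in range(n)])

def short_time_energy(frames):
    # E = sum over samples of s(n)^2, per frame
    return np.sum(frames ** 2, axis=1)

def spectral_centroid(frames):
    spec = np.abs(np.fft.rfft(frames, axis=1))      # S(k) from the DFT
    k = np.arange(1, spec.shape[1] + 1)
    # C = sum(k * S(k)) / sum(S(k)); small epsilon guards silent frames
    return (spec @ k) / (np.sum(spec, axis=1) + 1e-12)

def select_voice(audio, e_thresh, c_thresh, frame_len=160):
    frames = frame_signal(audio, frame_len)
    mask = (short_time_energy(frames) > e_thresh) & \
           (spectral_centroid(frames) > c_thresh)
    return frames[mask]
```

Frames failing either test correspond to the invalid audio (silence, environmental noise, non-environmental noise) that the method discards.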
In a further preferred embodiment, after the spectral centroid is compared with the second preset value stored in the database, when the short-time energy is lower than the first preset value and/or the spectral centroid is lower than the second preset value, the audio information is determined to be invalid audio information and is deleted. The invalid audio information includes at least silence, environmental noise, and non-environmental noise.
Illustratively, if the short-time energy is lower than the first preset value, the environment is quiet and the audio information is silence or environmental noise. If the spectral centroid is lower than the second preset value, the environment is non-quiet and the audio information is non-environmental noise.
Step S104: perform speech feature extraction on the voice information.
In a preferred embodiment, a Hamming window with a window length of 10 frames (100 milliseconds) and a hop of 3 frames (30 milliseconds) is applied to the voice information, and the corresponding speech features are then extracted.
It should be noted that the speech features include, but are not limited to, spectral features, sound-quality features, and voiceprint features. Spectral features distinguish voice data, such as target speech and interfering speech, according to their acoustic vibration frequencies. Sound-quality and voiceprint features identify the speaker corresponding to the voice data under test according to the timbre characteristics of the voiceprint and the sound. Since speech differentiation is used to distinguish target speech from interfering speech within voice data, only the spectral features of the voice information need to be obtained to accomplish it. Here, "spectrum" is short for spectral density, and a spectral feature is a parameter reflecting the spectral density.
In a preferred embodiment, the voice information includes multiple frames of single-frame voice data. When speech feature extraction is performed on the voice information, a fast Fourier transform (FFT) is first applied to the single-frame voice data to obtain the power spectrum of the voice information; the power spectrum is then reduced in dimension using a mel filter bank to obtain a mel spectrum; finally, cepstral analysis is performed on the mel spectrum to obtain the speech features.
Illustratively, since the human auditory perception system behaves like a complex nonlinear system, the power spectrum obtained above cannot represent the nonlinear behavior of the voice data well; the spectrum is therefore further reduced in dimension with the mel filter bank so that the resulting spectrum of the voice data under test is closer to the frequencies of auditory perception. The mel filter bank is composed of multiple overlapping triangular band-pass filters, each with three characteristic frequencies: a lower cutoff frequency, an upper cutoff frequency, and a center frequency. The center frequencies of these triangular band-pass filters are equally spaced on the mel scale, which grows linearly below 1000 Hz and logarithmically above 1000 Hz. The cepstrum is the inverse Fourier transform of the logarithm of the Fourier spectrum of a signal; since the general Fourier spectrum is a complex spectrum, the cepstrum is also called the complex cepstrum.
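The spectrum → mel filter bank → cepstrum chain described above is essentially an MFCC-style front end; a rough sketch follows. The specific parameters (26 triangular filters, 13 cepstral coefficients, a DCT-based cepstrum) are common conventions assumed here, since the patent does not fix them.

```python
# Rough sketch of the power spectrum -> mel filter bank -> cepstrum chain.
# Filter count, coefficient count, and DCT cepstrum are assumed conventions.
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)       # Hz -> mel
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)    # mel -> Hz
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def cepstral_features(frame, sr=16000, n_filters=26, n_ceps=13):
    power = np.abs(np.fft.rfft(frame)) ** 2                  # power spectrum
    mel_energy = mel_filterbank(n_filters, len(frame), sr) @ power
    log_mel = np.log(mel_energy + 1e-10)
    n = np.arange(n_filters)
    # DCT-II implemented directly: cepstrum of the log mel spectrum
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1))
                   / (2 * n_filters))
    return basis @ log_mel
```

The overlapping triangular filters implement the mel-scale dimension reduction described above; taking the DCT of the log filter-bank energies is one standard way of performing the cepstral analysis.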
Step S106: process the speech features to obtain target speech features that more closely represent the speaker.
In a preferred embodiment, the step of processing the speech features to obtain target speech features that more closely represent the speaker specifically includes: normalizing the speech features using the Z-score standardization method so as to unify them, wherein the normalization formula is x* = (x − μ) / σ, where μ is the mean of multiple pieces of voice information, σ is their standard deviation, x is the multiple single-frame voice data, and x* is the normalized speech feature. The normalized feature results are then spliced to form long, overlapping splice frames. Finally, the splice frames are input into a neural network for training so as to obtain the target speech features, thereby reducing the loss of the voice information.
Illustratively, referring to Fig. 2, the normalized feature results are spliced using a window length of 10 frames and a hop of 3 frames to form 390-dimensional features; every 10 frames form one splice unit. The specific splicing method is shown in Fig. 3.
It should be noted that since each frame has 39 dimensions, 10 frames spliced together give 390 dimensions. With a hop of 3 frames, three steps are taken forward from the first frame, so the next group of frames to be spliced is the 4th frame through the 13th frame, and so on.
By unifying the speech features, the embodiment of the present invention resolves the comparability between data indicators, reduces the adverse influence caused by outlier sample data, facilitates a comprehensive comparative evaluation of the speech features, and improves the voice training effect.
By splicing the features into longer, overlapping frames, the embodiment of the present invention captures more information over a longer duration and reduces the loss of information.
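The normalization-and-splicing step above can be sketched as follows: z-score each feature dimension, then concatenate every 10 consecutive 39-dimensional frames with a hop of 3 frames into overlapping 390-dimensional splice frames.

```python
# Sketch of the normalization-and-splicing step: z-score the features,
# then splice 10 consecutive 39-dim frames (hop 3) into 390-dim frames.
import numpy as np

def zscore(features):
    """Z-score per feature dimension: x* = (x - mu) / sigma."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-12   # guard constant dimensions
    return (features - mu) / sigma

def splice(features, window=10, hop=3):
    """Concatenate overlapping windows of frames into long splice frames."""
    n_frames, dim = features.shape
    starts = range(0, n_frames - window + 1, hop)
    return np.stack([features[s : s + window].reshape(window * dim)
                     for s in starts])
```

With a hop of 3, the second splice frame starts at the 4th input frame and covers frames 4 through 13, matching the description above.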
In a further preferred embodiment, after the speech features are processed to obtain target speech features that more closely represent the speaker, the target speech features are also input into a pre-trained speaker verification model and intrusive-noise model. Then, according to the output result, it is verified whether the voice information is the voice of one of the multiple preset speakers saved in the speaker verification model; when the voice information is the voice of a preset speaker, the voice information is obtained.
Specifically, after the speaker's speech features are extracted, it is verified whether the speech features belong to one of the preset speakers in the pre-trained speaker verification model, and the speaker is accepted or rejected according to the verification result. If the speech features are identified as an intruder impersonating a preset speaker, the voice information of that speaker is rejected.
Step S108: match the target speech features against the speaker speech features stored in the database.
Illustratively, the processed speech features are compared with the speaker speech features stored in the database to obtain the speaker speech features that match them.
In a preferred embodiment, before the extracted speech features are matched against the speaker speech features stored in the database, the speakers' speech features are collected in advance, and the speech features together with the corresponding speakers' identity information are saved in the database.
Specifically, since the environment is quiet during the collection of the speakers' speech features, the speakers' speech features are easy to obtain, and the speech features and the identity information of the corresponding speakers are stored in the database.
Step S110: according to the matching result, output the identity information of the speaker corresponding to the matched speaker speech feature, to obtain the speaker corresponding to the speech information.
Specifically, when the extracted speech feature matches the speech feature of identity information 1 of a speaker stored in the database, identity information 1 is output, and speaker A represented by identity information 1 is thereby obtained.
Through the embodiments of the present invention, the accuracy of speech recognition technology can be improved, and the user experience is greatly enhanced.
Embodiment two
Referring to Fig. 2, a schematic diagram of the hardware architecture of the computer device of Embodiment Two of the present invention is shown. The computer device 2 includes, but is not limited to, a memory 21, a processor 22, and a network interface 23, which can communicate with each other through a system bus. Fig. 2 illustrates only the computer device 2 with components 21-23; it should be understood that not all illustrated components are required, and more or fewer components may be implemented instead.
The memory 21 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as the hard disk or memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, smart media card (Smart Media Card, SMC), secure digital (Secure Digital, SD) card, or flash card (Flash Card) equipped on the computer device 2. Of course, the memory 21 may also include both the internal storage unit and the external storage device of the computer device 2. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the computer device 2, such as the program code of the speech recognition authentication system 20. In addition, the memory 21 may also be used to temporarily store various data that has been output or is to be output.
The processor 22 may, in some embodiments, be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is used to run the program code or process the data stored in the memory 21, for example to run the speech recognition authentication system 20.
The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and to establish a data transmission channel and communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet (Intranet), the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
Embodiment three
Referring to Fig. 3, a schematic diagram of the program modules of the speech recognition authentication system of Embodiment Three of the present invention is shown. In this embodiment, the speech recognition authentication system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors, so as to complete the present invention and realize the above-mentioned speech recognition authentication method. A program module in the embodiments of the present invention refers to a series of computer program instruction segments capable of completing a specific function, and is more suitable than the program itself for describing the execution process of the speech recognition authentication system 20 in the storage medium. The following description specifically introduces the function of each program module of this embodiment:
The obtaining module 201 is used for obtaining audio information.
In a preferred embodiment, when meeting minutes are being recorded, the environment contains the speaker's voice, silence, ambient noise, and non-ambient noise; the obtaining module 201 obtains these sounds, namely the audio information. It should be noted that non-ambient noise differs from the speaker's voice in short-time energy and spectral centroid.
The preprocessing module 202 is used for preprocessing the audio information, to obtain speech information from the audio information according to the short-time energy and spectral centroid of the audio information.
Illustratively, after the obtaining module 201 acquires the audio information, since the audio information includes the speech information of the speaker, silence, ambient noise, and non-ambient noise, the preprocessing module 202 needs to process the audio information to obtain the speech information from it. Silence refers to portions without vocalization: for example, the speaker may think or breathe while speaking, and makes no sound while thinking or breathing. The ambient noise includes, but is not limited to, sounds such as doors and windows opening or closing and objects colliding. The non-ambient noise includes, but is not limited to, coughing, clicking a mouse, or typing on a keyboard. Short-time energy and spectral centroid are two important indicators of audio information in silence detection technology: the short-time energy reflects the strength of the signal energy and can distinguish silence and ambient noise in a segment of audio, while the spectral centroid can distinguish the non-ambient noise portions. By combining the short-time energy and the spectral centroid, the effective audio, namely the speech information, is filtered out of the audio information.
In a preferred embodiment, the preprocessing module 202 is also used to extract multiple frames of short-time signals from the audio information according to preset rules, wherein the preset rules include a preset signal extraction time interval. The short-time energy and the spectral centroid of the multi-frame short-time signals are then calculated according to a silence detection algorithm. Next, the short-time energy is compared with a first preset value stored in the database, and the spectral centroid is compared with a second preset value stored in the database. When the short-time energy is higher than the first preset value and the spectral centroid is higher than the second preset value, the audio information is determined to be speech information, and the speech information is obtained.
The calculation formula of the short-time energy is: E = Σ_{n=1}^{N} s(n)^2, where E indicates the short-time energy, N indicates the number of frames of the short-time signal, N ≥ 2, and s(n) indicates the signal amplitude of the n-th frame short-time signal in the time domain.
Illustratively, the preprocessing module 202 extracts multi-frame short-time signals s(1), s(2), s(3), s(4), ..., s(N) from the audio information at a preset time interval (for example, 0.2ms), and then calculates the short-time energy of the extracted multi-frame short-time signals, to determine how strong the energy of the audio information is.
It should be noted that the short-time energy is the sum of squares of each frame signal and reflects the strength of the signal energy; when the signal energy is too weak, the signal is determined to be silence or ambient noise.
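The short-time energy computation described above can be sketched as follows (a minimal NumPy illustration, not the patent's implementation; the example amplitudes and frame length are assumptions for demonstration):

```python
import numpy as np

def short_time_energy(frame):
    """Sum of squared sample amplitudes: E = sum_n s(n)^2."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sum(frame ** 2))

# A weak frame (silence or ambient noise) has far lower energy than speech.
quiet = 0.01 * np.ones(160)   # hypothetical low-amplitude frame
loud = 0.5 * np.ones(160)     # hypothetical speech-level frame
```

Comparing `short_time_energy(quiet)` with `short_time_energy(loud)` against a preset threshold realizes the "too weak, hence silence or ambient noise" decision.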
In a further preferred embodiment, the preprocessing module 202 is also used to obtain the frequencies respectively corresponding to the multi-frame short-time signals, and to calculate the spectral centroid of the audio information from the frequencies and the multi-frame short-time signals according to the silence detection algorithm, where the calculation formula of the spectral centroid is: C = Σ_{k=1}^{K} k·S(k) / Σ_{k=1}^{K} S(k), where C indicates the spectral centroid, K indicates the number of frequencies respectively corresponding to the N frames s(n), K ≥ 2 and is an integer, and S(k) indicates the spectral energy distribution obtained by the discrete Fourier transform corresponding to s(n) in the frequency domain.
It should be noted that the spectral centroid is also known as the spectral first moment; the smaller the value of the spectral centroid, the more the spectral energy is concentrated in the low-frequency range. Using the spectral centroid, the non-ambient noise portions can be removed, such as coughing, clicking a mouse, or typing on a keyboard.
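The spectral centroid formula above can be sketched as follows (an illustrative NumPy version using physical frequencies f(k) as the weights; the sample rate and tone frequency are assumptions for demonstration):

```python
import numpy as np

def spectral_centroid(frame, sample_rate):
    """Energy-weighted mean frequency: C = sum_k f(k)*S(k) / sum_k S(k)."""
    spectrum = np.abs(np.fft.rfft(frame))                     # S(k): magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)  # f(k) in Hz
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

# A pure 1 kHz tone should have its centroid near 1 kHz; low-frequency
# sounds pull the centroid down toward the low-frequency range.
fs = 8000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 1000 * t)
```

A low centroid value indicates energy concentrated at low frequencies, which is the cue the embodiment uses to separate the non-ambient noise portions.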
It should be noted that when the short-time energy and spectral centroid indicators are both above the set thresholds, the audio information is effective audio, namely the speech information of the speaker, and most of the ambient and non-ambient noise has been removed, so that the remaining speech information is purer and of higher quality, removing a large number of interference factors from the speech recognition process. In the embodiments of the present invention, high-quality speech information is obtained by setting the first preset value and the second preset value relatively high.
In a further preferred embodiment, the preprocessing module 202 is also used to determine that the audio information is invalid audio information and to delete it when the short-time energy is lower than the first preset value and/or the spectral centroid is lower than the second preset value. The invalid audio information includes at least: silence, ambient noise, and non-ambient noise.
Illustratively, if the short-time energy is lower than the first preset value, a quiet environment is indicated, and the audio information is silence or ambient noise. If the spectral centroid is lower than the second preset value, a non-quiet environment is indicated, and the audio information is non-ambient noise.
The feature extraction module 203 is used for performing speech feature extraction on the speech information.
In a preferred embodiment, the feature extraction module 203 performs windowing on the speech information using a Hamming window with a window length of 10 frames (100 milliseconds) and a hop of 3 frames (30 milliseconds), and then extracts the corresponding speech features.
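The windowing step can be sketched as below (an illustrative NumPy version; the 16 kHz sample rate is an assumption, while the 100 ms window and 30 ms hop come from the embodiment):

```python
import numpy as np

def frame_with_hamming(signal, sample_rate, win_ms=100, hop_ms=30):
    """Slice a signal into overlapping Hamming-windowed frames
    (100 ms window, 30 ms hop, as in the embodiment)."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hamming(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.array(frames)
```

Each returned row is one windowed analysis frame from which the speech features are then extracted.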
It should be noted that the speech features include, but are not limited to, spectral features, sound-quality features, and voiceprint features. Spectral features separate voice data, such as target speech and interfering speech, according to acoustic vibration frequency. Sound-quality and voiceprint features identify the speaker corresponding to the voice data to be tested according to the timbre of the voiceprint and the voice. Since speech discrimination is used to distinguish target speech from interfering speech in the voice data, only the spectral features of the speech information need to be obtained to complete speech discrimination. Here, "spectrum" is short for spectral density, and a spectral feature is a parameter reflecting the spectral density.
In a preferred embodiment, the speech information includes multiple single-frame voice data items. The feature extraction module 203 is also used to first perform a fast Fourier transform (FFT) on the single-frame voice data to obtain the power spectrum of the speech information, then apply a Mel filter bank to the power spectrum for dimensionality reduction to obtain the Mel spectrum, and finally perform cepstral analysis on the Mel spectrum to obtain the speech features.
Illustratively, since the human auditory perception system behaves like a complex nonlinear system, the obtained power spectrum cannot represent the nonlinear characteristics of the voice data well; therefore the spectrum must also be reduced in dimensionality using a Mel filter bank, so that the spectrum of the voice data to be tested comes closer to the frequencies of auditory perception. The Mel filter bank is composed of multiple overlapping triangular band-pass filters, each carrying three frequencies: a lower cutoff frequency, an upper cutoff frequency, and a center frequency. The center frequencies of these triangular band-pass filters are equally spaced on the Mel scale, which increases linearly below 1000 Hz and logarithmically above 1000 Hz. The cepstrum refers to the inverse Fourier transform of the logarithm of a signal's Fourier spectrum; since the general Fourier spectrum is a complex spectrum, the cepstrum is also called the complex cepstrum.
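The FFT → power spectrum → Mel filter bank → cepstral analysis pipeline can be sketched roughly as below. This is a simplified illustration: the filter count (26), cepstral dimension (13), and the use of a DCT-II for the cepstral step are common MFCC conventions assumed here, not values stated in the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Overlapping triangular band-pass filters, equally spaced on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, mid):          # rising edge of the triangle
            fbank[i - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):          # falling edge of the triangle
            fbank[i - 1, k] = (hi - k) / max(hi - mid, 1)
    return fbank

def mel_cepstrum(frame, sample_rate, n_filters=26, n_ceps=13):
    power = np.abs(np.fft.rfft(frame)) ** 2 / len(frame)                   # power spectrum via FFT
    mel_spec = mel_filterbank(n_filters, len(frame), sample_rate) @ power  # Mel dimensionality reduction
    log_mel = np.log(mel_spec + 1e-12)
    n = np.arange(n_filters)                                               # cepstral analysis (DCT-II)
    return np.array([np.sum(log_mel * np.cos(np.pi * q * (2 * n + 1) / (2 * n_filters)))
                     for q in range(n_ceps)])
```

`mel_cepstrum` maps one windowed frame to a low-dimensional cepstral feature vector of the kind the feature extraction module produces.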
The processing module 204 is used for processing the speech features, to obtain target speech features closer to the speaker.
In a further preferred embodiment, the speech recognition authentication system further includes a normalization module 207, a splicing module 208, and a training module 209. The normalization module 207 is used to normalize the speech features using the Z-score standardization method, so as to unify the speech features, where the formula of the normalization is: x* = (x − μ)/σ, μ is the mean of the multiple speech information items, σ is the standard deviation of the multiple speech information items, x is the multiple single-frame voice data, and x* is the normalized speech feature. The splicing module 208 is used to splice the normalization results to form long, overlapping spliced frames. The training module 209 is used to input the spliced frames into a neural network to be trained, so as to obtain the target speech features and reduce the loss of the speech information.
Illustratively, referring to Fig. 2, the normalization results are spliced using a Hamming window with a window length of 10 frames and a hop of 3 frames, forming 390-dimensional features. Then every 10 frames form one splicing unit and are spliced together; for the specific splicing method, refer to Fig. 3.
It should be noted that since each frame is 39-dimensional, 10 frames stitched together yield 390 dimensions. Since the hop is 3 frames, stepping forward by 3 from the first frame, the next frames to be spliced together are the 4th frame to the 13th frame, and so on.
By unifying the speech features, the embodiments of the present invention resolve the comparability between data indicators, reduce the adverse effects caused by outlier sample data, help perform a comprehensive comparative evaluation of the speech features, and improve the speech training effect. The embodiments of the present invention form longer, overlapping frames by splicing features, in order to capture cross-frame information and reduce the loss of information over longer durations.
In a further preferred embodiment, the speech recognition authentication system further includes a speech verification module 210, which is used to input the speech features into a pre-trained speaker detection model and intrusion noise model, and to verify, according to the output result, whether the speech information is the voice of one of the multiple preset speakers saved in the speaker detection model; when the speech information is the voice of a preset speaker, the speech information is obtained.
Specifically, the speech verification module 210 verifies whether the speech feature belongs to one of the preset speakers in the pre-trained speaker detection model, and accepts or rejects the speaker according to the verification result. If the speech feature is identified as an impersonation by an intruder, the speech information of that speaker is rejected.
The matching module 205 is used for matching the target speech features against the speaker speech features stored in the database.
Illustratively, the matching module 205 compares the processed speech features with the speaker speech features stored in the database, to obtain the speaker speech feature that matches the speech feature.
In a preferred embodiment, the speech recognition authentication system 20 also collects the speech feature of the speaker in advance, and stores the speech feature and the identity information of the corresponding speaker in the database.
Specifically, since the environment during the collection of the speaker's speech feature is quiet, the speech feature of the speaker is easily obtained, and the speech feature and the identity information of the corresponding speaker are stored in the database.
The output module 206 is used for outputting, according to the matching result, the identity information of the speaker corresponding to the matched speaker speech feature, to obtain the speaker corresponding to the speech information.
Specifically, when the speech feature extracted by the feature extraction module 203 matches the speech feature of identity information 1 of a speaker stored in the database, the output module 206 outputs identity information 1, and speaker A represented by identity information 1 is thereby obtained.
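The matching and output steps can be sketched as below. The patent does not specify a similarity measure, so cosine similarity is an assumed stand-in, and the enrolled feature vectors are hypothetical:

```python
import numpy as np

def best_match(target, enrolled):
    """Return the identity whose enrolled feature vector is most similar
    to `target` (cosine similarity; an assumed measure, not the patent's)."""
    def cosine(a, b):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(enrolled, key=lambda ident: cosine(target, enrolled[ident]))

# Hypothetical database of identity information -> enrolled speech feature.
database = {"identity 1": [0.9, 0.1, 0.2], "identity 2": [0.1, 0.8, 0.5]}
```

Calling `best_match` with an extracted target feature returns the stored identity information, which the output module then emits to identify the speaker.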
Through the embodiments of the present invention, the accuracy of speech recognition technology can be improved, and the user experience is greatly enhanced.
The present invention also provides a computer device capable of executing programs, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server, or a server cluster composed of multiple servers). The computer device of this embodiment includes at least, but is not limited to: a memory and a processor that can communicate with each other through a system bus.
This embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, server, or app store, on which a computer program is stored; the corresponding function is realized when the program is executed by a processor. The computer-readable storage medium of this embodiment is used to store the speech recognition authentication system 20, and when executed by a processor it realizes the speech recognition authentication method of Embodiment One.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be realized by means of software plus the necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
The above is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.
Claims (10)
1. A speech recognition authentication method, characterized by comprising:
obtaining audio information;
preprocessing the audio information, to obtain speech information from the audio information according to the short-time energy and spectral centroid of the audio information;
performing speech feature extraction on the speech information;
processing the speech features, to obtain target speech features closer to the speaker;
matching the target speech features against the speaker speech features stored in a database; and
according to the matching result, outputting the identity information of the speaker corresponding to the matched speaker speech feature, to obtain the speaker corresponding to the speech information.
2. The speech recognition authentication method according to claim 1, characterized in that the step of preprocessing the audio information, to obtain speech information from the audio information according to the short-time energy and spectral centroid of the audio information, comprises:
extracting multiple frames of short-time signals from the audio information according to preset rules, wherein the preset rules include a preset signal extraction time interval;
calculating the short-time energy from the multi-frame short-time signals according to a silence detection algorithm;
calculating the spectral centroid according to the multi-frame short-time signals;
comparing the short-time energy with a first preset value stored in a database;
comparing the spectral centroid with a second preset value stored in the database;
when the short-time energy is higher than the first preset value and the spectral centroid is higher than the second preset value, determining that the audio information is speech information; and
obtaining the speech information.
3. The speech recognition authentication method according to claim 2, characterized in that the calculation formula of the short-time energy is: E = Σ_{n=1}^{N} s(n)^2, wherein E indicates the short-time energy, N indicates the number of frames of the short-time signal, N ≥ 2 and is an integer, and s(n) indicates the signal amplitude of the n-th frame short-time signal in the time domain.
4. The speech recognition authentication method according to claim 2, characterized in that the step of calculating the spectral centroid according to the multi-frame short-time signals comprises:
obtaining the frequencies respectively corresponding to the multi-frame short-time signals; and
calculating the spectral centroid of the audio information from the frequencies and the multi-frame short-time signals according to the silence detection algorithm, wherein the calculation formula of the spectral centroid is: C = Σ_{k=1}^{K} k·S(k) / Σ_{k=1}^{K} S(k), wherein C indicates the spectral centroid, K indicates the number of frequencies respectively corresponding to the N frames s(n), K ≥ 2 and is an integer, and S(k) indicates the spectral energy distribution obtained by the discrete Fourier transform corresponding to s(n) in the frequency domain.
5. The speech recognition authentication method according to claim 2, characterized in that after the step of comparing the spectral centroid with the second preset value stored in the database, the method further comprises:
when the short-time energy is lower than the first preset value and/or the spectral centroid is lower than the second preset value, determining that the audio information is invalid audio information, wherein the invalid audio information includes at least: silence, ambient noise, and non-ambient noise; and
deleting the audio information.
6. The speech recognition authentication method according to claim 1, characterized in that the step of processing the speech features, to obtain target speech features closer to the speaker, comprises:
normalizing the speech features using the Z-score standardization method, so as to unify the speech features, wherein the formula of the normalization is: x* = (x − μ)/σ, μ is the mean of the multiple speech information items, σ is the standard deviation of the multiple speech information items, x is the multiple single-frame voice data, and x* is the normalized speech feature;
splicing the normalization results to form long, overlapping spliced frames; and
inputting the spliced frames into a neural network, so as to train on the spliced frames and obtain the target speech features.
7. The speech recognition authentication method according to claim 1, characterized in that after the step of processing the speech features, to obtain target speech features closer to the speaker, the method further comprises:
inputting the target speech features into a pre-trained speaker detection model and intrusion noise model;
verifying, according to the output result, whether the speech information is the voice of one of the multiple preset speakers saved in the speaker detection model; and
when the speech information is the voice of the preset speaker, obtaining the speech information.
8. A speech recognition authentication system, characterized by comprising:
an obtaining module, for obtaining audio information;
a preprocessing module, for preprocessing the audio information, to obtain speech information from the audio information according to the short-time energy and spectral centroid of the audio information;
a feature extraction module, for performing speech feature extraction on the speech information;
a processing module, for processing the speech features, to obtain target speech features closer to the speaker;
a matching module, for matching the target speech features against the speaker speech features stored in a database; and
an output module, for outputting, according to the matching result, the identity information of the speaker corresponding to the matched speaker speech feature, to obtain the speaker corresponding to the speech information.
9. A computer device, characterized in that the computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the speech recognition authentication method according to any one of claims 1-7 are realized.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program can be executed by at least one processor, so that the at least one processor executes the steps of the speech recognition authentication method according to any one of claims 1-7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910832042.4A CN110473552A (en) | 2019-09-04 | 2019-09-04 | Speech recognition authentication method and system |
PCT/CN2019/117554 WO2021042537A1 (en) | 2019-09-04 | 2019-11-12 | Voice recognition authentication method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910832042.4A CN110473552A (en) | 2019-09-04 | 2019-09-04 | Speech recognition authentication method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110473552A true CN110473552A (en) | 2019-11-19 |
Family
ID=68514996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910832042.4A Pending CN110473552A (en) | 2019-09-04 | 2019-09-04 | Speech recognition authentication method and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110473552A (en) |
WO (1) | WO2021042537A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112053695A (en) * | 2020-09-11 | 2020-12-08 | 北京三快在线科技有限公司 | Voiceprint recognition method and device, electronic equipment and storage medium |
CN112348527A (en) * | 2020-11-17 | 2021-02-09 | 上海桂垚信息科技有限公司 | Identity authentication method in bank transaction system based on voice recognition |
CN112927680A (en) * | 2021-02-10 | 2021-06-08 | 中国工商银行股份有限公司 | Voiceprint effective voice recognition method and device based on telephone channel |
CN113716246A (en) * | 2021-09-16 | 2021-11-30 | 安徽世绿环保科技有限公司 | Resident rubbish throwing traceability system |
CN113879931A (en) * | 2021-09-13 | 2022-01-04 | 厦门市特种设备检验检测院 | Elevator safety monitoring method |
CN114697759A (en) * | 2022-04-25 | 2022-07-01 | 中国平安人寿保险股份有限公司 | Virtual image video generation method and system, electronic device and storage medium |
CN115214541A (en) * | 2022-08-10 | 2022-10-21 | 海南小鹏汽车科技有限公司 | Vehicle control method, vehicle, and computer-readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103705333A (en) * | 2013-08-30 | 2014-04-09 | 李峰 | Method and device for intelligently stopping snoring |
CN104538036A (en) * | 2015-01-20 | 2015-04-22 | 浙江大学 | Speaker recognition method based on semantic cell mixing model |
CN106356052A (en) * | 2016-10-17 | 2017-01-25 | 腾讯科技(深圳)有限公司 | Voice synthesis method and device |
CN106782564A (en) * | 2016-11-18 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for processing speech data |
US20180039888A1 (en) * | 2016-08-08 | 2018-02-08 | Interactive Intelligence Group, Inc. | System and method for speaker change detection |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030236663A1 (en) * | 2002-06-19 | 2003-12-25 | Koninklijke Philips Electronics N.V. | Mega speaker identification (ID) system and corresponding methods therefor |
JP4392805B2 (en) * | 2008-04-28 | 2010-01-06 | Kddi株式会社 | Audio information classification device |
CN102820033B (en) * | 2012-08-17 | 2013-12-04 | 南京大学 | Voiceprint identification method |
CN104078039A (en) * | 2013-03-27 | 2014-10-01 | 广东工业大学 | Voice recognition system of domestic service robot on basis of hidden Markov model |
CN106782565A (en) * | 2016-11-29 | 2017-05-31 | 重庆重智机器人研究院有限公司 | A kind of vocal print feature recognition methods and system |
CN108877775B (en) * | 2018-06-04 | 2023-03-31 | 平安科技(深圳)有限公司 | Voice data processing method and device, computer equipment and storage medium |
2019
- 2019-09-04: CN application CN201910832042.4A filed; published as CN110473552A (status: Pending)
- 2019-11-12: WO application PCT/CN2019/117554 filed (WO2021042537A1)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103705333A (en) * | 2013-08-30 | 2014-04-09 | 李峰 | Method and device for intelligently stopping snoring |
CN104538036A (en) * | 2015-01-20 | 2015-04-22 | 浙江大学 | Speaker recognition method based on semantic cell mixing model |
US20180039888A1 (en) * | 2016-08-08 | 2018-02-08 | Interactive Intelligence Group, Inc. | System and method for speaker change detection |
CN106356052A (en) * | 2016-10-17 | 2017-01-25 | 腾讯科技(深圳)有限公司 | Voice synthesis method and device |
CN106782564A (en) * | 2016-11-18 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for processing speech data |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112053695A (en) * | 2020-09-11 | 2020-12-08 | 北京三快在线科技有限公司 | Voiceprint recognition method and device, electronic equipment and storage medium |
CN112348527A (en) * | 2020-11-17 | 2021-02-09 | 上海桂垚信息科技有限公司 | Identity authentication method in bank transaction system based on voice recognition |
CN112927680A (en) * | 2021-02-10 | 2021-06-08 | 中国工商银行股份有限公司 | Voiceprint effective voice recognition method and device based on telephone channel |
CN112927680B (en) * | 2021-02-10 | 2022-06-17 | 中国工商银行股份有限公司 | Voiceprint effective voice recognition method and device based on telephone channel |
CN113879931A (en) * | 2021-09-13 | 2022-01-04 | 厦门市特种设备检验检测院 | Elevator safety monitoring method |
CN113716246A (en) * | 2021-09-16 | 2021-11-30 | 安徽世绿环保科技有限公司 | Resident rubbish throwing traceability system |
CN114697759A (en) * | 2022-04-25 | 2022-07-01 | 中国平安人寿保险股份有限公司 | Virtual image video generation method and system, electronic device and storage medium |
CN114697759B (en) * | 2022-04-25 | 2024-04-09 | 中国平安人寿保险股份有限公司 | Virtual image video generation method and system, electronic device and storage medium |
CN115214541A (en) * | 2022-08-10 | 2022-10-21 | 海南小鹏汽车科技有限公司 | Vehicle control method, vehicle, and computer-readable storage medium |
CN115214541B (en) * | 2022-08-10 | 2024-01-09 | 海南小鹏汽车科技有限公司 | Vehicle control method, vehicle, and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2021042537A1 (en) | 2021-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110473552A (en) | Speech recognition authentication method and system | |
WO2021128741A1 (en) | Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium | |
CN108564940A (en) | Audio recognition method, server and computer readable storage medium | |
CN110473566A (en) | Audio separation method, device, electronic equipment and computer readable storage medium | |
CN108428446A (en) | Audio recognition method and device | |
CN110457432A (en) | Interview methods of marking, device, equipment and storage medium | |
CN110675862A (en) | Corpus acquisition method, electronic device and storage medium | |
CN110544469B (en) | Training method and device of voice recognition model, storage medium and electronic device | |
CN110880329A (en) | Audio identification method and equipment and storage medium | |
CN109584884A (en) | Speech identity feature extractor and classifier training method, and related device | |
US20060100866A1 (en) | Influencing automatic speech recognition signal-to-noise levels | |
CN109599117A (en) | Audio data recognition method and human-voice anti-replay identification system | |
CN109147798B (en) | Speech recognition method, device, electronic equipment and readable storage medium | |
CN110136726A (en) | Voice gender estimation method, device, system and storage medium | |
CN108154371A (en) | Electronic device, the method for authentication and storage medium | |
CN112382300A (en) | Voiceprint identification method, model training method, device, equipment and storage medium | |
CN110738998A (en) | Voice-based personal credit evaluation method, device, terminal and storage medium | |
CN113223536A (en) | Voiceprint recognition method and device and terminal equipment | |
CN110992940B (en) | Voice interaction method, device, equipment and computer-readable storage medium | |
CN109545226A (en) | Speech recognition method, device and computer-readable storage medium | |
CN113782032A (en) | Voiceprint recognition method and related device | |
CN109273012A (en) | Identity authentication method based on speaker identification and spoken digit recognition | |
CN109087647A (en) | Application on Voiceprint Recognition processing method, device, electronic equipment and storage medium | |
CN113112992B (en) | Voice recognition method and device, storage medium and server | |
CN112420056A (en) | Speaker identity authentication method and system based on variational self-encoder and unmanned aerial vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191119 |