CN108460081B - Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium - Google Patents
- Publication number
- CN108460081B (application CN201810031164.9A / CN201810031164A)
- Authority
- CN
- China
- Prior art keywords
- voice data
- registration
- index
- primary
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/61—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Abstract
The invention discloses a voice database establishing method, a voiceprint registration method, and corresponding apparatus, equipment and media. The voice database establishing method includes: acquiring original voice data, the original voice data including an original user identifier and a voice collection time; preprocessing the original voice data to obtain effective voice data; obtaining the signal-to-noise ratio corresponding to the effective voice data; and storing the effective voice data in a voice database and establishing an index for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio. By preprocessing the original voice data, calculating the signal-to-noise ratio of the effective voice data, and establishing an index containing the user identifier, the voice collection time and the signal-to-noise ratio after creating the voice database, the method improves the data processing efficiency of the database.
Description
Technical field
The present invention relates to the field of data processing, and more particularly to a voice database establishing method, a voiceprint registration method, and corresponding apparatus, equipment and media.
Background art
With the development of artificial intelligence technology, technologies related to human biometric characteristics such as the face, voice and fingerprint are gradually being applied in daily life. A voiceprint is the sound-wave spectrum, displayed by an electro-acoustic instrument, of a signal carrying verbal information; it is both specific to the speaker and relatively stable. The production of human speech is a complex physiological and physical process between the body's language centers and the vocal organs. The vocal organs used in speech (the tongue, teeth, larynx, lungs and nasal cavity) differ greatly among individuals in size and form, so the voiceprint maps of any two people always differ, and a user's identity can therefore be verified by voiceprint. Voiceprint recognition requires the voiceprint to be registered in advance, and current voiceprint registration is typically performed by recording voice data in real time and extracting the voiceprint from the recording. The path from recording the voice data to extracting the voiceprint takes a long time, so the whole registration process is slow and registration efficiency is low. Moreover, when a voiceprint is registered from voice data recorded in real time, the ambient conditions and the user's state of health at recording time may cause the recorded voice data to differ considerably from voice data collected at other times, which affects the accuracy, in voiceprint recognition, of a voiceprint extracted from real-time recordings.
Summary of the invention
The embodiments of the present invention provide a voice database establishing method, apparatus, equipment and medium, to solve the problem of low database processing efficiency.
The embodiments of the present invention also provide a voiceprint registration method, apparatus, equipment and medium, to solve the problem of insufficient voiceprint feature accuracy.
In a first aspect, an embodiment of the present invention provides a voice database establishing method, comprising:
obtaining original voice data, the original voice data including an original user identifier and a voice collection time;
preprocessing the original voice data to obtain effective voice data;
obtaining the signal-to-noise ratio corresponding to the effective voice data; and
storing the effective voice data in a voice database and establishing an index for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.
In a second aspect, an embodiment of the present invention provides a voice database creating apparatus, comprising:
an original voice data obtaining module, configured to obtain original voice data, the original voice data including an original user identifier and a voice collection time;
a data preprocessing module, configured to preprocess the original voice data to obtain effective voice data;
a signal-to-noise ratio obtaining module, configured to obtain the signal-to-noise ratio corresponding to the effective voice data; and
a voice database index establishing module, configured to store the effective voice data in a voice database and to establish an index for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.
In a third aspect, an embodiment of the present invention provides a voiceprint registration method, comprising:
obtaining a voiceprint registration request, the voiceprint registration request including a registration user identifier and the current time;
querying a voice database based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that matches the registration user identifier, the voice database being a voice database created with the voice database establishing method of the first aspect;
obtaining, for each target index, the corresponding composite index according to the voice collection time and signal-to-noise ratio in the target index and the current time;
selecting the effective voice data corresponding to the target index with the highest composite index as the registration voice data; and
obtaining, based on the registration voice data, the corresponding voiceprint feature as the registration voiceprint.
In a fourth aspect, an embodiment of the present invention provides a voiceprint registration apparatus, comprising:
a voiceprint registration request module, configured to obtain a voiceprint registration request, the voiceprint registration request including a registration user identifier and the current time;
a target index obtaining module, configured to query a voice database based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that matches the registration user identifier, the voice database being a voice database created with the voice database establishing method of the first aspect;
a composite index obtaining module, configured to obtain, for each target index, the corresponding composite index according to the voice collection time and signal-to-noise ratio in the target index and the current time;
a registration voice data obtaining module, configured to select the effective voice data corresponding to the target index with the highest composite index as the registration voice data; and
a registration voiceprint obtaining module, configured to obtain, based on the registration voice data, the corresponding voiceprint feature as the registration voiceprint.
In a fifth aspect, the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the voice database establishing method of the first aspect of the present invention, or implements the steps of the voiceprint registration method of the third aspect of the present invention.
In a sixth aspect, the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the voice database establishing method of the first aspect of the present invention, or the steps of the voiceprint registration method of the third aspect of the present invention.
In the voice database establishing method, apparatus, equipment and storage medium provided by the embodiments of the present invention, original voice data is obtained to provide a data source for creating the voice database. The original voice data is then preprocessed to obtain effective voice data, which improves subsequent processing efficiency and saves data processing time. The signal-to-noise ratio corresponding to the effective voice data is obtained; from this signal-to-noise ratio, the noise level of the effective voice data can be judged directly, so that its voice quality is known. Finally, the effective voice data is stored in the voice database, and an index is established for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio. By preprocessing the original voice data, calculating the signal-to-noise ratio of the effective voice data, and establishing, after creating the voice database, an index containing the user identifier, the voice collection time and the signal-to-noise ratio, the method improves the database's data processing efficiency and also increases the accuracy of the voiceprint features. It further allows the subsequent voiceprint registration stage to quickly locate suitable effective voice data. Through this reasonable design of the voice database creation process, the accuracy of voiceprint feature extraction in the subsequent voiceprint registration stage is improved and the registration time of voiceprint registration is reduced.
In the voiceprint registration method, apparatus, equipment and storage medium provided by the embodiments of the present invention, voiceprint registration is performed on a voice database created with the voice database establishing method of the first aspect of the present invention, which improves the accuracy of voiceprint feature extraction in the registration stage and reduces the registration time. During voiceprint registration, a composite index is obtained for the effective voice data corresponding to each target index, which makes it possible to quickly locate the most suitable effective voice data and ensures that the voiceprint feature most consistent with the user is extracted, further improving the accuracy of voiceprint registration.
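The text does not fix how the composite index is computed from the voice collection time, the signal-to-noise ratio and the current time. The following is a minimal sketch under the assumption that it is a weighted combination of a normalized SNR score and a recency score; the weights, the 40 dB SNR ceiling and the one-year recency horizon are all assumptions, not values given by the patent:

```python
from datetime import datetime

def composite_index(entry, now, w_snr=0.5, w_recency=0.5, horizon_days=365.0):
    """Hypothetical composite index: weighted mix of SNR (normalized against an
    assumed 40 dB ceiling) and recency of the voice collection time."""
    snr_score = max(0.0, min(entry["snr_db"] / 40.0, 1.0))
    age_days = (now - entry["collect_time"]).total_seconds() / 86400.0
    recency_score = max(0.0, 1.0 - age_days / horizon_days)
    return w_snr * snr_score + w_recency * recency_score

def pick_registration_entry(entries, now):
    # Select the effective voice data whose target index scores highest.
    return max(entries, key=lambda e: composite_index(e, now))
```

Under these assumptions, a recent recording with slightly lower SNR can outrank an older recording with higher SNR, which matches the stated goal of preferring voice data close to the user's current voice.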
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without any creative effort.
Fig. 1 is a flow chart of the voice database establishing method provided in Embodiment 1 of the present invention;
Fig. 2 is a flow chart of a specific embodiment of step S12 in Fig. 1;
Fig. 3 is a flow chart of another specific embodiment of step S12 in Fig. 1;
Fig. 4 is a functional block diagram of the voice database creating apparatus provided in Embodiment 2 of the present invention;
Fig. 5 is a flow chart of the voiceprint registration method provided in Embodiment 3 of the present invention;
Fig. 6 is a flow chart of a specific embodiment in Embodiment 3 of the present invention;
Fig. 7 is a functional block diagram of the voiceprint registration apparatus provided in Embodiment 4 of the present invention;
Fig. 8 is a schematic diagram of the terminal device provided in Embodiment 6 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment 1
Fig. 1 shows a flow chart of the voice database establishing method in this embodiment. The method is applied in various terminal devices or servers to create a voice database and thereby solve the problem of low database processing efficiency. As shown in Fig. 1, the voice database establishing method includes the following steps:
S11: Obtain original voice data, the original voice data including an original user identifier and a voice collection time.
Here, original voice data refers to voice data that has not been processed after collection. The original user identifier is a label used to distinguish different users; one original user identifier corresponds to one unique user. In a specific embodiment, the original user identifier may be the user's phone number, a user account, an ID card number, or the like. The voice collection time refers to the time at which the original voice data was collected.
Preferably, the original voice data can be obtained from a database in which voice data of a large number of users has already been collected. For example, some enterprises set up customer service hotlines; users dial the hotline to resolve problems they encounter while using the enterprise's products or services, and the enterprise may also use the hotline to promote products to customers or pay return visits. Usually the enterprise records these calls and stores the recorded voice data in a database. Alternatively, in some applications, when voice interaction takes place between users or between a user and customer service, the application's database may store the users' voice data.
S12: Preprocess the original voice data to obtain effective voice data.
The original voice data is unprocessed data, so it may contain invalid or redundant voice data. For example, the voice duration of the original voice data may not meet the requirement, the original voice data may contain voice that does not belong to the user, or the voice quality of the original voice data may be unsatisfactory. Alternatively, a piece of original voice data may contain partially invalid or redundant speech periods, a speech period being a segment of the original voice data; the presence of such periods would harm subsequent voice data processing, so they need to be removed. By preprocessing the original voice data to obtain effective voice data, the processing efficiency of subsequent voice data handling is improved and time is saved.
S13: Obtain the signal-to-noise ratio corresponding to the effective voice data.
The signal-to-noise ratio (SNR) is a parameter describing the ratio between the effective component and the noise component of a signal. A higher signal-to-noise ratio indicates relatively less noise. By obtaining the signal-to-noise ratio of the effective voice data, the amount of noise in the effective voice data can be judged directly, so that the voice quality of the effective voice data is known. Specifically, the signal-to-noise ratio corresponding to the effective voice data can be obtained by calculation.
When the signal-to-noise ratio is obtained by calculation, the formula may be: SNR = 10·lg(P_S/P_N), where P_S and P_N are the effective powers of the effective component and the noise component, respectively. Optionally, the ratio of voltage amplitudes can be used instead, i.e. the signal-to-noise ratio may also be expressed as: SNR = 20·lg(V_S/V_N), where V_S and V_N are the effective (RMS) values of the effective-component voltage and the noise-component voltage, respectively.
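The two formulas are equivalent when the voltages are RMS values, since power is proportional to voltage squared. A small sketch of the power-based form, with powers estimated as mean squared sample amplitude (an illustrative estimator, not a procedure prescribed by the text):

```python
import math

def snr_db(signal, noise):
    """SNR = 10*lg(P_S / P_N), with each power estimated as the mean
    squared amplitude of the given sample sequence."""
    p_s = sum(x * x for x in signal) / len(signal)
    p_n = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_s / p_n)
```

For instance, a signal of amplitude 1.0 against noise of amplitude 0.1 gives a power ratio of 100 and therefore an SNR of 20 dB, the same value the voltage form 20·lg(1.0/0.1) yields.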
In one embodiment, obtaining the signal-to-noise ratio corresponding to the effective voice data specifically includes the following steps:
First, extract the pitch (fundamental tone) data in the effective voice data, for example using a pitch-synchronous overlap-add algorithm. The pitch data is the normal voice content in the effective voice data, as opposed to the noise data. Preferably, spectral subtraction, Wiener filtering, or minimum mean-square error short-time spectral estimation can be used to extract the pitch data from the voice data.
Then, obtain the noise data in the effective voice data from the pitch data: after the pitch data is extracted from the voice data, the remaining part is the noise data.
Finally, calculate the signal-to-noise ratio of the voice data from the pitch data and the noise data. After the pitch data and the noise data of the effective voice data are obtained, the effective powers of the pitch data and the noise data (or their voltage amplitudes) can be computed and their ratio taken, yielding the signal-to-noise ratio of the effective voice data.
In a specific embodiment, after the step of obtaining the signal-to-noise ratio corresponding to the effective voice data, the method further includes: removing the effective voice data whose signal-to-noise ratio is below a signal-to-noise ratio threshold.
After the signal-to-noise ratio of the effective voice data is obtained, effective voice data whose signal-to-noise ratio is too low can be removed to reduce the data volume and relieve the pressure of data processing and storage. Specifically, a signal-to-noise ratio threshold can be set; when the signal-to-noise ratio of a piece of effective voice data falls below this threshold, the noise in that piece of effective voice data is very high, so it is not suitable as voice data for voiceprint extraction. Removing the effective voice data whose signal-to-noise ratio is below the threshold reduces the data volume, relieves the pressure of data processing and storage, shortens subsequent data processing time, and improves processing efficiency.
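The threshold-based removal described above can be sketched as follows; the 15 dB default is an assumed value, since the text leaves the threshold open:

```python
def remove_low_snr(segments, snr_threshold_db=15.0):
    """Split effective voice segments into kept and removed by an SNR
    threshold. Each segment is a dict carrying at least an 'snr_db' field."""
    kept = [s for s in segments if s["snr_db"] >= snr_threshold_db]
    removed = [s for s in segments if s["snr_db"] < snr_threshold_db]
    return kept, removed
```

Returning the removed segments as well makes it easy to log or audit what was discarded before the kept segments are written to the voice database.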
S14: Store the effective voice data in the voice database, and establish an index for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.
Here, the voice database is the database that stores the effective voice data. The effective voice data obtained after preprocessing and signal-to-noise ratio calculation is stored in the voice database, and an index is established for each piece of effective voice data, which improves the efficiency of subsequent data processing using the voice database. Moreover, during voiceprint registration, suitable effective voice data can be located directly by searching the index, and the voiceprint feature can be extracted from the corresponding effective voice data, which improves the accuracy of the voiceprint feature.
Specifically, the index includes the original user identifier, the voice collection time and the signal-to-noise ratio. The original user identifier distinguishes the effective voice data of different users. The voice collection time represents the recording time of the voice; in general, a user's voice changes slightly over time, so the closer the voice collection time is to the current time, the closer that piece of effective voice data is to the voice the user would currently record, and the more consistent the voiceprint feature. The signal-to-noise ratio allows the noise level of the effective voice data to be judged directly, so that the voice quality of the effective voice data is known.
In one embodiment, the index established for the effective voice data in the voice database is a BRIN (Block Range Index) index. A BRIN index stores, for each range of consecutive data blocks of a table, the range of the corresponding data values, and has a great advantage in saving storage space. The voice database needs to store effective voice data corresponding to a large number of original user identifiers, which places high demands on the database's storage space; using a BRIN index saves a large amount of index space.
Therefore, establishing an index for the voice database improves database processing efficiency and also increases the accuracy of the voiceprint features. Moreover, in the voiceprint registration stage, the original user identifier, the voice collection time and the signal-to-noise ratio in the index can be considered together, so that the most suitable effective voice data can be located quickly and registration performed with the voiceprint feature of that effective voice data. This greatly shortens the time needed to form the voiceprint feature in the registration stage and, by selecting the most suitable effective voice data, also improves the accuracy of voiceprint registration.
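The block-range idea behind a BRIN index can be illustrated with a toy summarize-and-prune sketch; this shows the concept only and is not PostgreSQL's actual BRIN implementation:

```python
def build_brin(values, block_size=4):
    """Summarize each block of consecutive rows by its (start, min, max)
    value range, the core idea of a block range index."""
    summaries = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        summaries.append((start, min(block), max(block)))
    return summaries

def candidate_blocks(summaries, key):
    # Only blocks whose value range may contain the key need to be scanned.
    return [start for start, lo, hi in summaries if lo <= key <= hi]
```

Because only one (min, max) pair is stored per block rather than one entry per row, the index stays tiny even for very large tables, which is the space advantage the text refers to; it works best when values correlate with physical row order, as collection timestamps typically do.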
In the voice database establishing method provided by the embodiments of the present invention, original voice data is obtained to provide a data source for creating the voice database. The original voice data is then preprocessed to obtain effective voice data, improving subsequent processing efficiency and saving data processing time. The signal-to-noise ratio corresponding to the effective voice data is obtained, from which the noise level of the effective voice data can be judged directly and its voice quality known. Finally, the effective voice data is stored in the voice database, and an index including the original user identifier, the voice collection time and the signal-to-noise ratio is established for the effective voice data in the voice database. By preprocessing the original voice data, calculating the signal-to-noise ratio of the effective voice data, and establishing such an index after creating the voice database, the method improves database processing efficiency and also increases the accuracy of the voiceprint features; it further allows the subsequent voiceprint registration stage to quickly locate suitable effective voice data. Through this reasonable design of the voice database creation process, the accuracy of voiceprint feature extraction in the subsequent voiceprint registration stage is improved and the registration time of voiceprint registration is greatly reduced.
In a specific embodiment, preprocessing the original voice data to obtain effective voice data specifically includes the following step: performing filtering processing and silence removal processing on the original voice data corresponding to each original user identifier, to obtain the effective voice data.
Among the original voice data corresponding to the same original user identifier, there may be a small amount of original voice data that does not belong to the user corresponding to that identifier (i.e. cases where someone else used the account). Such original voice data does not record the voice of the user corresponding to the original user identifier, and needs to be removed so as to avoid deviations when voiceprint features are subsequently extracted from the original voice data.
Therefore, filtering the original voice data corresponding to each original user identifier means finding, among the original voice data, the data that does not belong to the user corresponding to that original user identifier and removing it. Specifically, a clustering algorithm, or one-by-one comparison and matching, can be used to find the original voice data that does not belong to the user.
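As a much-simplified stand-in for the clustering-based filtering described above, one can treat the centroid of a user's per-segment feature vectors as representing the user's own voice and drop segments that lie too far from it; the Euclidean distance threshold is an assumption, and a real system would use proper speaker features and a clustering algorithm such as k-means:

```python
def filter_impostor_segments(features, max_dist=2.0):
    """Keep only feature vectors within max_dist of the centroid of all
    vectors for one user; far-away vectors are treated as not belonging
    to that user."""
    dim = len(features[0])
    centroid = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    def dist(f):
        return sum((a - b) ** 2 for a, b in zip(f, centroid)) ** 0.5
    return [f for f in features if dist(f) <= max_dist]
```

Note that a single strong outlier shifts the centroid, so the threshold must be generous; the clustering approach mentioned in the text handles this more robustly by modeling the outliers as their own cluster.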
Within a segment of raw voice data, some periods may be silent, for example the waiting period during a call. The voice data in these periods is invalid or redundant, so silence-removal processing is needed.
Preferably, voice activity detection (VAD, Voice Activity Detection) can be applied to the raw voice data to distinguish the speech portions from the non-speech (silent) portions; the silent portions are then removed, yielding raw voice data with silence removed.
The purpose of voice activity detection is to detect whether a speech signal is present in the current signal, i.e. to distinguish the speech signal in the voice data from various background-noise signals so that the two kinds of signal can be processed differently. Through voice activity detection, the speech portions and silent portions of a segment of raw voice data are identified, and the silent portions are removed.
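A minimal energy-based sketch of this silence-removal step follows. It is a stand-in for a real VAD (production systems typically use a trained detector such as WebRTC VAD); the frame length and energy threshold here are assumptions.

```python
import numpy as np


def remove_silence(samples, rate, frame_ms=20, energy_thresh=1e-4):
    """Drop frames whose short-time energy falls below a threshold.

    A toy stand-in for the VAD step: frames with near-zero energy are
    treated as silence and removed; the rest are kept in order.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    kept = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if np.mean(frame.astype(float) ** 2) > energy_thresh:  # speech frame
            kept.append(frame)
    return np.concatenate(kept) if kept else np.array([], dtype=samples.dtype)
```

A pure energy gate cannot separate speech from loud noise, which is exactly why the text stresses that VAD must distinguish speech from background-noise signals; this sketch only illustrates the frame-wise keep/drop structure.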
It should be appreciated that the filtering processing and the silence-removal processing applied to the raw voice data corresponding to each original user identifier can be performed in either order: filtering first and then silence removal, or silence removal first and then filtering. The voice data obtained after both operations is called the effective voice data.
In this embodiment, removing from the raw voice data the recordings that do not belong to the user associated with the original user identifier improves the accuracy of the data stored in the speech database, while removing silence from the raw voice data shortens subsequent processing time and improves processing efficiency.
In a specific embodiment, filtering the raw voice data corresponding to each original user identifier, as shown in Fig. 2, specifically includes the following steps:
S121: extracting the voiceprint features of the raw voice data corresponding to the same original user identifier.

Based on the original user identifier, voiceprint feature extraction is performed on the raw voice data corresponding to that identifier. Voiceprint features are the basic characteristics of a person carried in raw voice data, such as the pitch contour, the formant bandwidths and trajectories, spectral envelope parameters, auditory characteristic parameters, and linear prediction cepstral coefficients and their derived or hybrid parameters. Specifically, the extraction can be based on linear predictive coding (LPC, Linear Predictive Coding) or mel-frequency cepstral coefficients (MFCC, Mel Frequency Cepstral Coefficient).
S122: based on the voiceprint features, performing cluster analysis on the raw voice data corresponding to the same original user identifier using the k-means clustering algorithm to obtain a target center point.

Cluster analysis is a statistical method for studying classification problems (of samples or indicators), and is also an important analysis method in data mining. The k-means algorithm is applied to the raw voice data to obtain the target center point. Specifically, the value of K is set according to the amount of raw voice data corresponding to the same original user identifier, and an initial center point is set for each cluster. After all points (raw voice recordings) have been assigned, the points in each cluster are recomputed (for example by averaging) to obtain the cluster's new center point. The steps of assigning points and updating the cluster centers are then repeated iteratively until the centers change little or a specified number of iterations is reached. The center of the cluster containing the most points (raw voice recordings) is taken as the target center point.
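Under the assumption that each voiceprint feature is a fixed-length vector, step S122 might be sketched as follows. This is an illustrative implementation, not the patent's: the initialization, iteration count and K value are all assumptions.

```python
import numpy as np


def target_center(features, k=2, iters=50, seed=0):
    """Cluster voiceprint feature vectors with k-means and return the
    centroid of the most populated cluster (the "target center point")."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each feature vector to its nearest center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned points
        new = np.array([features[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):  # centers vary little: stop iterating
            break
        centers = new
    # the cluster with the most points is taken to represent the genuine user
    biggest = np.bincount(labels, minlength=k).argmax()
    return centers[biggest]
```

The key design choice mirrors the text: the genuine user is assumed to contribute the majority of recordings, so the largest cluster's centroid is the target center point.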
S123: using a distance algorithm, calculating the distance between each raw voice recording corresponding to the same original user identifier and the target center point.

A distance algorithm is an algorithm for estimating the similarity between different samples. In one embodiment, the distance between each raw voice recording and the target center point can be calculated using the Manhattan distance, the Minkowski distance, cosine similarity, the Euclidean distance, or a similar measure.
In one embodiment, the Euclidean distance between each raw voice recording and the target center point is calculated using the Euclidean distance algorithm.

The Euclidean distance is the actual distance between two points in m-dimensional space, or the natural length of a vector (the distance from the point to the origin). For any two n-dimensional vectors a(X_i1, X_i2, ..., X_in) and b(X_j1, X_j2, ..., X_jn), the Euclidean distance is

d(a, b) = sqrt( (X_i1 - X_j1)^2 + (X_i2 - X_j2)^2 + ... + (X_in - X_jn)^2 )

Based on the voiceprint feature of each raw voice recording, the Euclidean distance between that recording and the target center point is calculated by this formula.
S124: removing, from the raw voice data corresponding to the same original user identifier, the recordings whose distance from the target center point exceeds a distance threshold.

After clustering, the raw voice data corresponding to the same original user identifier that belongs to the user will cluster near the target center point, so its distance from the center is small; raw voice data that does not belong to the user will lie far from the target center point, i.e. its distance from the center is large. Therefore, by setting a reasonable distance threshold, the raw voice data that does not belong to the user can be screened out and removed, ensuring the accuracy of the data.
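Steps S123 and S124 combine into a single vectorized filter; the distance threshold is an assumed parameter that would be tuned in practice.

```python
import numpy as np


def remove_distant(features, center, threshold):
    """Keep only feature vectors within `threshold` Euclidean distance
    of the target center point (steps S123-S124); distant vectors are
    presumed to belong to another speaker and are removed."""
    dists = np.linalg.norm(features - center, axis=1)  # d(a, b) per row
    return features[dists <= threshold]
```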
In this embodiment, the raw voice data corresponding to the same original user identifier is clustered using a clustering algorithm, the distance between each recording and the target center point of the cluster is calculated, and the recordings whose distance exceeds the distance threshold are removed. Removing the erroneous raw voice data ensures data accuracy while reducing the data volume, which also improves data processing efficiency.
In a specific embodiment, filtering the raw voice data corresponding to each original user identifier, as shown in Fig. 3, specifically includes the following steps:
S121': extracting the voiceprint features of the raw voice data corresponding to the same original user identifier.

Based on the original user identifier, voiceprint feature extraction is performed on the corresponding raw voice data. Specifically, the extraction can be based on linear predictive coding (LPC, Linear Predictive Coding) or mel-frequency cepstral coefficients (MFCC, Mel Frequency Cepstral Coefficient).
S122': comparing and matching the voiceprint feature of each raw voice recording under the same user identifier against the voiceprint features of the remaining recordings under that identifier, one by one, and counting the number of match failures for each recording according to the matching results.
A matching result is either a match success or a match failure. If, among the raw voice data corresponding to the same user identifier, some recordings do not belong to the user, their voiceprint features will not match the voiceprint features of the recordings that do belong to the user (i.e. matching fails). Therefore, when the voiceprint feature of each recording under the same user identifier is compared one by one against those of the remaining recordings, every comparison between a recording that does not belong to the user and one that does will produce a match failure.
S123': when the number of match failures for a segment of raw voice data exceeds a matching threshold, removing that raw voice data.

When a segment of raw voice data fails many matches, its voiceprint feature does not match those of most of the other recordings, from which it can be determined that the segment does not belong to the user. Therefore, a matching threshold can be preset; when the number of match failures for a segment exceeds the threshold, the segment is removed. This ensures data accuracy while reducing the data volume, which also improves data processing efficiency.
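Steps S122'–S123' can be sketched as a pairwise failure count. The matcher `match_fn` is a placeholder for whatever voiceprint comparison is used (e.g. a cosine-similarity threshold), which the text does not pin down.

```python
def remove_by_match_failures(recordings, match_fn, match_threshold):
    """Count, for each recording, how many pairwise voiceprint matches
    fail against the other recordings under the same user identifier,
    and drop recordings whose failure count exceeds the threshold."""
    n = len(recordings)
    failures = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and not match_fn(recordings[i], recordings[j]):
                failures[i] += 1  # one more failed comparison for recording i
    return [r for r, f in zip(recordings, failures) if f <= match_threshold]
```

Note the O(n^2) comparison cost, which is why the later embodiment reserves this path for small data volumes below the cluster threshold.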
In a specific embodiment, filtering the raw voice data corresponding to each original user identifier further includes the following specific step:

judging whether the amount of raw voice data corresponding to the same original user identifier is greater than or equal to a cluster threshold; if it is, performing steps S121 to S124; if it is less than the cluster threshold, performing steps S121' to S123'.
For a clustering algorithm, the accuracy of the cluster analysis is positively correlated with the data volume. When the data volume is small, clustering accuracy drops, and applying a clustering algorithm to a small data set adds computational complexity. Therefore, a cluster threshold can be set, whose specific value can be adjusted according to the characteristics of the algorithm and the actual demand; preferably, the cluster threshold is 10. When the amount of raw voice data corresponding to the same original user identifier is greater than or equal to the cluster threshold, the raw voice data is filtered using the embodiment of steps S121 to S124; when it is less than the cluster threshold, the raw voice data is filtered using steps S121' to S123'.

In this embodiment, selecting a suitable processing algorithm for filtering the raw voice data according to the size of the data volume improves the accuracy of the data processing.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
Embodiment 2
Fig. 4 shows a functional block diagram of a speech database creating device corresponding one-to-one to the speech database establishing method of Embodiment 1. As shown in Fig. 4, the speech database creating device includes a raw voice data acquisition module 11, a data preprocessing module 12, a signal-to-noise ratio acquisition module 13 and a speech database index establishing module 14. The functions implemented by these modules correspond one-to-one to the steps of the speech database establishing method in Embodiment 1; to avoid repetition, this embodiment does not describe each of them in detail.
The raw voice data acquisition module 11 is used to obtain raw voice data, which includes an original user identifier and a voice collection time.

The data preprocessing module 12 is used to preprocess the raw voice data to obtain effective voice data.

The signal-to-noise ratio acquisition module 13 is used to obtain the signal-to-noise ratio corresponding to the effective voice data.

The speech database index establishing module 14 is used to store the effective voice data in the speech database and to establish an index for the effective voice data in the database; the index includes the original user identifier, the voice collection time and the signal-to-noise ratio.
Preferably, the data preprocessing module 12 includes a voiceprint feature extraction unit 121, a cluster analysis unit 122, a distance calculation unit 123 and a first data removal unit 124.

The voiceprint feature extraction unit 121 is used to extract the voiceprint features of the raw voice data corresponding to the same original user identifier.

The cluster analysis unit 122 is used to perform, based on the voiceprint features, cluster analysis on the raw voice data corresponding to the same original user identifier using the k-means clustering algorithm to obtain a target center point.

The distance calculation unit 123 is used to calculate, using a distance algorithm, the distance between each raw voice recording corresponding to the same original user identifier and the target center point.

The first data removal unit 124 is used to remove, from the raw voice data corresponding to the same original user identifier, the recordings whose distance from the target center point exceeds the distance threshold.
Preferably, the data preprocessing module 12 further includes a data comparison and matching unit 122' and a second data removal unit 123'.

The data comparison and matching unit 122' is used to compare and match the voiceprint feature of each raw voice recording under the same user identifier against the voiceprint features of the remaining recordings under that identifier, one by one, and to count the number of match failures for each recording according to the matching results.

The second data removal unit 123' is used to remove a segment of raw voice data when its number of match failures exceeds the matching threshold.
Preferably, the data preprocessing module 12 further includes a raw voice data amount judging unit 120, which is used to judge whether the amount of raw voice data corresponding to the same original user identifier is greater than or equal to the cluster threshold.
Embodiment 3
Fig. 5 shows a flowchart of the voiceprint registration method in this embodiment. The voiceprint registration method is applied in various terminal devices and servers to perform voiceprint registration, so as to solve the problems of long registration time and low voiceprint feature accuracy during voiceprint registration. As shown in Fig. 5, the voiceprint registration method includes the following steps:
S21: obtaining a voiceprint registration request, which includes a registration user identifier and the current time.

A voiceprint registration request is a request made by a user to register using voiceprint features. The registration user identifier identifies the user making the request; in a specific embodiment it can be the user's phone number, user account or identity card number. Preferably, the registration user identifier corresponds to the original user identifier; for example, when the original user identifier is a phone number, the registration user identifier is also a phone number. The current time is the system time at which the voiceprint registration request is obtained.
S22: querying the speech database based on the registration user identifier to obtain the target indexes corresponding to the original user identifiers that match the registration user identifier, where the speech database is a database created using the speech database establishing method of Embodiment 1.

Based on the registration user identifier in the voiceprint registration request, the speech database is queried. When the original user identifier in an index matches the registration user identifier, that index is a target index; matching here means that the original user identifier and the registration user identifier are identical. Specifically, the indexes established for the effective voice data in the speech database are queried, and the indexes containing an original user identifier that matches the registration user identifier are obtained as the target indexes.
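Since matching means strict identity of the two identifiers, step S22 reduces to a filter over the index table. The dict schema below is an illustrative assumption, not the patent's storage format.

```python
def target_indexes(index_table, registration_user_id):
    """Return the index entries whose original user identifier is
    identical to the registration user identifier (step S22).

    `index_table` is assumed to be an iterable of dicts with keys
    'user_id', 'collected_at' and 'snr' (illustrative schema).
    """
    return [e for e in index_table if e["user_id"] == registration_user_id]
```

In a real system this lookup would be a database query over an indexed column rather than a linear scan.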
S23: obtaining the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index.

The voice collection time generally represents the time at which the voice was recorded, and a user's voice changes slightly over time. The closer the voice collection time is to the current time, the closer the effective voice data is to the user's current voice, and the more consistent the voiceprint features are. The signal-to-noise ratio gives an intuitive measure of the noise level of the effective voice data: the higher the signal-to-noise ratio, the smaller the noise, and hence the better the voice quality. By comprehensively considering the voice collection time and the signal-to-noise ratio relative to the current time, the composite index corresponding to each target index can be obtained.
S24: choosing the effective voice data corresponding to the target index with the highest composite index as the registration voice data.

The registration voice data is the effective voice data whose voiceprint features are most consistent with the user. Among the target indexes, the higher the composite index, the more consistent with the user are the voiceprint features obtained from the corresponding effective voice data. Therefore, the effective voice data corresponding to the target index with the highest composite index is chosen as the registration voice data, improving the accuracy of the registration voiceprint.
In a specific embodiment, obtaining the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index specifically includes: calculating the composite index of each target index using the composite index calculation formula:

composite index = a * signal-to-noise ratio + (1 - a) * [1 / (current time - voice collection time of the target index)]

where a is a preset weight and 0 ≤ a ≤ 1.
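The formula translates directly into code. The sketch below assumes times in consistent units (e.g. epoch seconds) and a nonzero age; the default weight of 0.7 follows the example given in the text.

```python
def composite_index(snr, collected_at, now, a=0.7):
    """Composite index = a*SNR + (1-a) * 1/(now - collection time).

    `a` is the preset weight, 0 <= a <= 1. A higher SNR or a more
    recent collection time both raise the score.
    """
    return a * snr + (1.0 - a) * (1.0 / (now - collected_at))


def best_target(entries, now, a=0.7):
    """Pick the index entry with the highest composite index (step S24).

    `entries` is assumed to be dicts with 'snr' and 'collected_at' keys.
    """
    return max(entries,
               key=lambda e: composite_index(e["snr"], e["collected_at"], now, a))
```

One caveat worth noting: the recency term 1/(now - collected_at) and the SNR live on different scales, so in practice the units of the time difference (seconds vs. days) strongly affect how much weight recency actually carries for a given `a`.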
The higher the signal-to-noise ratio of the effective voice data, the less noise it contains; and the closer its voice collection time is to the current time, the closer it is to the user's current voice, so the more consistent the voiceprint features. Based on these two factors, and according to the demands of the actual application scenario, preset weights are assigned to them, and the composite index of each effective voice recording is obtained by the composite index calculation formula. Once the composite index of each effective voice recording has been obtained, the recordings can be compared through this single intuitive value in order to select the most suitable target effective voice data.

For example, the preset weight a can be set to 0.7, in which case the composite index calculation formula becomes: composite index = 0.7 * signal-to-noise ratio + 0.3 * [1 / (current time - voice collection time of the target index)]. After any voiceprint registration request is received, the effective voice data stored in the speech database is queried according to the registration user identifier in the request, and the composite index of each effective voice recording is calculated by this formula.
S25: obtaining the voiceprint features corresponding to the registration voice data as the registration voiceprint.

After the registration voice data is obtained, the corresponding voiceprint features are extracted from it as the registration voiceprint.

In a specific embodiment, the voiceprint features of the effective voice data can be extracted in advance and associated with the index established in step S14, so that the corresponding voiceprint features can be found quickly from the index. In the voiceprint registration stage, once the registration voice data is obtained, its voiceprint features can be fetched directly as the registration voiceprint, further reducing the registration time.
In the voiceprint registration method provided by this embodiment of the present invention, a voiceprint registration request is obtained to trigger voiceprint registration. The speech database is then queried based on the registration user identifier to obtain the target indexes corresponding to the matching original user identifiers, where the speech database is the one created using the speech database establishing method of Embodiment 1. According to the current time and each target index's voice collection time and signal-to-noise ratio, the composite index corresponding to each target index is obtained. The effective voice data corresponding to the target index with the highest composite index is chosen as the registration voice data, improving the accuracy of the registration voiceprint, and the corresponding voiceprint features are then extracted from it as the registration voiceprint. Because this method performs voiceprint registration against a speech database created by the method of Embodiment 1, it improves the accuracy of voiceprint feature extraction in the registration stage and reduces the registration time. Using the composite index of the effective voice data, obtained through the target indexes, makes it possible to quickly locate suitable effective voice data and guarantees that the voiceprint features most consistent with the user are obtained, further improving the accuracy of voiceprint registration.
In a specific embodiment, querying the speech database based on the registration user identifier, as shown in Fig. 6, further includes the following steps:

S221: if no original user identifier matching the registration user identifier exists in the speech database, sending a voice recording request.

The speech database may contain no effective voice data matching the registration user identifier; in that case a voice recording request is sent and the registration voiceprint is obtained from voice data recorded in real time. Specifically, the indexes in the speech database are queried with the registration user identifier; if no index contains a matching original user identifier, the database contains no effective voice data matching the registration user identifier, and a voice recording request is sent.
S222: obtaining the voice recording data corresponding to the voice recording request.

After the voice recording request is sent, the user can record his or her voice according to the prompts; once the recording is finished, the voice recording data is obtained.
S223: extracting the corresponding voiceprint features from the voice recording data as the registration voiceprint.

After the voice recording data recorded by the user is obtained, the corresponding voiceprint features are extracted from it as the registration voiceprint. Here, voiceprint features are the basic characteristics of a person carried in voice data, such as the pitch contour, the formant bandwidths and trajectories, spectral envelope parameters, auditory characteristic parameters, and linear prediction cepstral coefficients and their derived or hybrid parameters; the extraction can be performed as in step S121 of the previous embodiment and is not repeated here.
In this embodiment, when the speech database contains no effective voice data corresponding to the registration user identifier, the registration voiceprint is obtained from voice data recorded in real time. This avoids the situation in which a user cannot register, improving the completeness and reasonableness of the voiceprint registration method.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
Embodiment 4
Fig. 7 shows a functional block diagram of a voiceprint registration device corresponding one-to-one to the voiceprint registration method of Embodiment 3. As shown in Fig. 7, the voiceprint registration device includes a voiceprint registration request module 21, a target index acquisition module 22, a composite index acquisition module 23, a registration voice data acquisition module 24 and a registration voiceprint acquisition module 25. The functions implemented by these modules correspond one-to-one to the steps of the voiceprint registration method in Embodiment 3; to avoid repetition, this embodiment does not describe each of them in detail.
The voiceprint registration request module 21 is used to obtain a voiceprint registration request, which includes the registration user identifier and the current time.

The target index acquisition module 22 is used to query the speech database based on the registration user identifier and obtain the target indexes corresponding to the original user identifiers that match the registration user identifier, where the speech database is the one created using the speech database establishing method of Embodiment 1.

The composite index acquisition module 23 is used to obtain the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index.

The registration voice data acquisition module 24 is used to choose the effective voice data corresponding to the target index with the highest composite index as the registration voice data.

The registration voiceprint acquisition module 25 is used to obtain, based on the registration voice data, the corresponding voiceprint features as the registration voiceprint.
Preferably, the target index acquisition module 22 further includes a voice recording request transmitting unit 221, a voice recording data acquisition unit 222 and a registration voiceprint extraction unit 223.

The voice recording request transmitting unit 221 is used to send a voice recording request when no original user identifier matching the registration user identifier exists in the speech database.

The voice recording data acquisition unit 222 is used to obtain the voice recording data corresponding to the voice recording request.

The registration voiceprint extraction unit 223 is used to extract the corresponding voiceprint features from the voice recording data as the registration voiceprint.
Embodiment 5
This embodiment provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the speech database establishing method of Embodiment 1 or the voiceprint registration method of Embodiment 3; alternatively, it implements the functions of the modules/units of the speech database creating device of Embodiment 2 or of the voiceprint registration device of Embodiment 4. To avoid repetition, these are not described again here.
Embodiment 6
Fig. 8 is a schematic diagram of a terminal device provided by an embodiment of the present invention. As shown in Fig. 8, the terminal device 80 of this embodiment includes a processor 81, a memory 82, and a computer program 83 stored in the memory 82 and executable on the processor 81. When the processor 81 executes the computer program 83, it implements the steps of the speech database establishing method of the above Embodiment 1, such as steps S11 to S14 shown in Fig. 1, or the functions of the modules/units of Embodiment 2, such as the raw voice data acquisition module 11, the data preprocessing module 12, the signal-to-noise ratio acquisition module 13 and the speech database index establishing module 14 shown in Fig. 4. Alternatively, when the processor 81 executes the computer program 83, it implements the steps of the voiceprint registration method of the above Embodiment 3, such as steps S21 to S25 shown in Fig. 5, or the functions of the modules/units of Embodiment 4, such as the voiceprint registration request module 21, the target index acquisition module 22, the composite index acquisition module 23, the registration voice data acquisition module 24 and the registration voiceprint acquisition module 25 shown in Fig. 7.
Illustratively, the computer program 83 can be divided into one or more modules/units, which are stored in the memory 82 and executed by the processor 81 to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of completing specific functions, the segments being used to describe the execution process of the computer program 83 in the terminal device 80. For example, the computer program 83 can be divided into the raw voice data acquisition module 11, the data preprocessing module 12, the signal-to-noise ratio acquisition module 13 and the speech database index establishing module 14 shown in Fig. 4, whose specific functions are as described in Embodiment 2 and are not repeated here. Alternatively, the computer program 83 can be divided into the voiceprint registration request module 21, the target index acquisition module 22, the composite index acquisition module 23, the registration voice data acquisition module 24 and the registration voiceprint acquisition module 25 shown in Fig. 7, whose specific functions are as described in Embodiment 4 and are not repeated here.
The terminal device 80 can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The terminal device may include, but is not limited to, the processor 81 and the memory 82. Those skilled in the art will understand that Fig. 8 is only an example of the terminal device 80 and does not constitute a limitation on it; the terminal device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the terminal device may also include input/output devices, network access devices, a bus, and so on.
The processor 81 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 82 may be an internal storage unit of the terminal device 80, such as a hard disk or memory of the terminal device 80. The memory 82 may also be an external storage device of the terminal device 80, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) equipped on the terminal device 80. Further, the memory 82 may include both an internal storage unit and an external storage device of the terminal device 80. The memory 82 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is merely illustrative. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all fall within the protection scope of the present invention.
Claims (10)
1. A voice database establishing method, comprising:
obtaining primary voice data, the primary voice data comprising an original user identifier and a voice collection time;
preprocessing the primary voice data to obtain valid voice data;
obtaining a signal-to-noise ratio corresponding to the valid voice data;
storing the valid voice data in a voice database, and establishing an index for the valid voice data in the voice database, the index comprising the original user identifier, the voice collection time and the signal-to-noise ratio.
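As an illustrative sketch only (not part of the claimed method), the indexing scheme of claim 1 could be realized with a simple SQLite table; the table name, column names and sample values here are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per piece of valid voice data, indexed by
# original user identifier, voice collection time and signal-to-noise ratio.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE voice_data (
        user_id      TEXT,  -- original user identifier
        collected_at REAL,  -- voice collection time (Unix timestamp)
        snr          REAL,  -- signal-to-noise ratio of the valid voice data
        audio        BLOB   -- the valid (preprocessed) voice data itself
    )
""")
conn.execute("CREATE INDEX idx_user ON voice_data (user_id, collected_at, snr)")

# Store one piece of valid voice data together with its index fields.
conn.execute("INSERT INTO voice_data VALUES (?, ?, ?, ?)",
             ("user-001", 1515715200.0, 23.5, b"\x00\x01"))
row = conn.execute("SELECT snr FROM voice_data WHERE user_id = ?",
                   ("user-001",)).fetchone()
print(row[0])  # 23.5
```

Keeping the user identifier, collection time and signal-to-noise ratio in the index lets later lookups (e.g. the registration query of claim 4) rank recordings without reading the audio blobs.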
2. The voice database establishing method according to claim 1, wherein the preprocessing the primary voice data to obtain valid voice data specifically comprises:
performing filtering processing and silence removal processing on the primary voice data corresponding to each original user identifier to obtain the valid voice data.
3. The voice database establishing method according to claim 2, wherein the performing filtering processing on the primary voice data corresponding to each original user identifier specifically comprises:
extracting voiceprint features of the primary voice data corresponding to a same original user identifier;
based on the voiceprint features, performing cluster analysis on the primary voice data corresponding to the same original user identifier using a k-means clustering algorithm to obtain a target center point;
using a distance algorithm, calculating a distance between each piece of primary voice data corresponding to the same original user identifier and the target center point;
removing, from the primary voice data corresponding to the same original user identifier, the primary voice data whose distance from the target center point is greater than a distance threshold.
4. A voiceprint registration method, comprising:
obtaining a voiceprint registration request, the voiceprint registration request comprising a registration user identifier and a current time;
querying a voice database based on the registration user identifier to obtain target indexes corresponding to an original user identifier matching the registration user identifier, the voice database being a voice database created using the voice database establishing method according to any one of claims 1-3;
obtaining a composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index;
selecting the valid voice data corresponding to the target index with the highest composite index as registration voice data;
obtaining, based on the registration voice data, a corresponding voiceprint feature as a registration voiceprint.
5. The voiceprint registration method according to claim 4, wherein the obtaining a composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index specifically comprises:
calculating the composite index corresponding to each target index using a composite index calculation formula according to the current time and the voice collection time and signal-to-noise ratio of the target index;
the composite index calculation formula being:
composite index = a * signal-to-noise ratio + (1 - a) * [1 / (current time - voice collection time of the target index)];
wherein a is a preset weight and 0 ≤ a ≤ 1.
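The formula of claim 5 and the selection step of claim 4 can be sketched as follows; the timestamps, SNR values and weight are illustrative, and times are assumed to be Unix timestamps so the recency term 1 / (current time - collection time) grows as a recording gets newer.

```python
def composite_index(snr: float, collected_at: float, now: float, a: float = 0.5) -> float:
    """Composite index per claim 5: weighted sum of the signal-to-noise
    ratio and the recency of the voice collection time, with preset
    weight 0 <= a <= 1."""
    return a * snr + (1 - a) * (1.0 / (now - collected_at))

# Pick the registration voice data: the target index with the highest
# composite index wins (claim 4). Candidate values are hypothetical.
now = 1_600_000_000.0
candidates = [
    {"snr": 20.0, "collected_at": now - 86_400},  # one day old, cleaner
    {"snr": 18.0, "collected_at": now - 3_600},   # one hour old, noisier
]
best = max(candidates, key=lambda c: composite_index(c["snr"], c["collected_at"], now))
print(best["snr"])  # → 20.0 (here SNR outweighs the recency term)
```

With a closer to 0, recency dominates and the newer recording would be chosen instead; a trades off audio quality against freshness.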
6. The voiceprint registration method according to claim 4, wherein the querying a voice database based on the registration user identifier further comprises:
if no original user identifier matching the registration user identifier exists in the voice database, sending a voice recording request;
obtaining voice recording data corresponding to the voice recording request;
extracting a corresponding voiceprint feature from the voice recording data as the registration voiceprint.
7. A voice database establishing apparatus, comprising:
a primary voice data obtaining module, configured to obtain primary voice data, the primary voice data comprising an original user identifier and a voice collection time;
a data preprocessing module, configured to preprocess the primary voice data to obtain valid voice data;
a signal-to-noise ratio obtaining module, configured to obtain a signal-to-noise ratio corresponding to the valid voice data;
a voice database index establishing module, configured to store the valid voice data in a voice database and establish an index for the valid voice data in the voice database, the index comprising the original user identifier, the voice collection time and the signal-to-noise ratio.
8. A voiceprint registration apparatus, comprising:
a voiceprint registration request module, configured to obtain a voiceprint registration request, the voiceprint registration request comprising a registration user identifier and a current time;
a target index obtaining module, configured to query a voice database based on the registration user identifier to obtain target indexes corresponding to an original user identifier matching the registration user identifier, the voice database being a voice database created using the voice database establishing method according to any one of claims 1-3;
a composite index obtaining module, configured to obtain a composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index;
a registration voice data obtaining module, configured to select the valid voice data corresponding to the target index with the highest composite index as registration voice data;
a registration voiceprint obtaining module, configured to obtain, based on the registration voice data, a corresponding voiceprint feature as a registration voiceprint.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the voice database establishing method according to any one of claims 1 to 3; or the processor, when executing the computer program, implements the steps of the voiceprint registration method according to any one of claims 4 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the voice database establishing method according to any one of claims 1 to 3; or the computer program, when executed by a processor, implements the steps of the voiceprint registration method according to any one of claims 4 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810031164.9A CN108460081B (en) | 2018-01-12 | 2018-01-12 | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium |
PCT/CN2018/077234 WO2019136801A1 (en) | 2018-01-12 | 2018-02-26 | Voice database creation method, voiceprint registration method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810031164.9A CN108460081B (en) | 2018-01-12 | 2018-01-12 | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108460081A CN108460081A (en) | 2018-08-28 |
CN108460081B true CN108460081B (en) | 2019-07-12 |
Family
ID=63221350
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810031164.9A Active CN108460081B (en) | 2018-01-12 | 2018-01-12 | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108460081B (en) |
WO (1) | WO2019136801A1 (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065056B (en) * | 2018-09-26 | 2021-05-11 | 珠海格力电器股份有限公司 | Method and device for controlling air conditioner through voice |
CN109727602A * | 2018-12-29 | 2019-05-07 | 苏州思必驰信息科技有限公司 | Voiceprint recognition method and device for a mobile terminal device |
CN111856399B (en) * | 2019-04-26 | 2023-06-30 | 北京嘀嘀无限科技发展有限公司 | Positioning identification method and device based on sound, electronic equipment and storage medium |
CN110689894B (en) * | 2019-08-15 | 2022-03-29 | 深圳市声扬科技有限公司 | Automatic registration method and device and intelligent equipment |
CN110648671A (en) * | 2019-08-21 | 2020-01-03 | 广州国音智能科技有限公司 | Voiceprint model reconstruction method, terminal, device and readable storage medium |
CN110600040B (en) * | 2019-09-19 | 2021-05-25 | 北京三快在线科技有限公司 | Voiceprint feature registration method and device, computer equipment and storage medium |
CN110738524A (en) * | 2019-10-15 | 2020-01-31 | 上海云从企业发展有限公司 | service data management method, system, equipment and medium |
CN110782902A (en) * | 2019-11-06 | 2020-02-11 | 北京远鉴信息技术有限公司 | Audio data determination method, apparatus, device and medium |
CN110875043B (en) * | 2019-11-11 | 2022-06-17 | 广州国音智能科技有限公司 | Voiceprint recognition method and device, mobile terminal and computer readable storage medium |
CN111128198B (en) * | 2019-12-25 | 2022-10-28 | 厦门快商通科技股份有限公司 | Voiceprint recognition method, voiceprint recognition device, storage medium, server and voiceprint recognition system |
CN111243601B (en) * | 2019-12-31 | 2023-04-07 | 北京捷通华声科技股份有限公司 | Voiceprint clustering method and device, electronic equipment and computer-readable storage medium |
CN111415669B (en) * | 2020-04-15 | 2023-03-31 | 厦门快商通科技股份有限公司 | Voiceprint model construction method, device and equipment |
CN112258220B (en) * | 2020-10-12 | 2024-06-07 | 北京豆牛网络科技有限公司 | Information acquisition and analysis method, system, electronic equipment and computer readable medium |
CN112992181A (en) * | 2021-02-08 | 2021-06-18 | 上海哔哩哔哩科技有限公司 | Audio classification method and device |
WO2024049311A1 (en) * | 2022-08-30 | 2024-03-07 | Biometriq Sp. Z O.O. | Method of selecting the optimal voiceprint |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9300790B2 (en) * | 2005-06-24 | 2016-03-29 | Securus Technologies, Inc. | Multi-party conversation analyzer and logger |
CN102509547B * | 2011-12-29 | 2013-06-19 | 辽宁工业大学 | Method and system for voiceprint recognition based on vector quantization |
CN106095799A * | 2016-05-30 | 2016-11-09 | 广州多益网络股份有限公司 | Voice storage and retrieval method and device |
CN106782564B (en) * | 2016-11-18 | 2018-09-11 | 百度在线网络技术(北京)有限公司 | Method and apparatus for handling voice data |
2018
- 2018-01-12 CN CN201810031164.9A patent/CN108460081B/en active Active
- 2018-02-26 WO PCT/CN2018/077234 patent/WO2019136801A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2019136801A1 (en) | 2019-07-18 |
CN108460081A (en) | 2018-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108460081B (en) | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium | |
CN106683680B (en) | Speaker recognition method and device, computer equipment and computer readable medium | |
CN106847292B (en) | Voiceprint recognition method and device | |
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization | |
CN107481717B (en) | Acoustic model training method and system | |
US6704708B1 (en) | Interactive voice response system | |
CN109243465A (en) | Voiceprint authentication method, device, computer equipment and storage medium | |
JP6303971B2 (en) | Speaker change detection device, speaker change detection method, and computer program for speaker change detection | |
CN110265040A (en) | Training method, device, storage medium and the electronic equipment of sound-groove model | |
CN110223673B (en) | Voice processing method and device, storage medium and electronic equipment | |
WO2019037205A1 (en) | Voice fraud identifying method and apparatus, terminal device, and storage medium | |
CN110415687A (en) | Method of speech processing, device, medium, electronic equipment | |
CN107680582A (en) | Acoustic model training method, speech recognition method, device, equipment and medium | |
CN110246490A (en) | Voice keyword detection method and relevant apparatus | |
JPH05216490A (en) | Apparatus and method for speech coding and apparatus and method for speech recognition | |
CN106128465A (en) | Voiceprint recognition system and method | |
CN109801634A (en) | Voiceprint feature fusion method and device | |
CN113436612B (en) | Intention recognition method, device, equipment and storage medium based on voice data | |
CN113223536B (en) | Voiceprint recognition method and device and terminal equipment | |
CN110428853A (en) | Voice activity detection method, Voice activity detection device and electronic equipment | |
CN109817196A (en) | Noise cancellation method, device, system, equipment and storage medium | |
CN110767238A (en) | Blacklist identification method, apparatus, device and storage medium based on address information | |
CN111048072A (en) | Voiceprint recognition method applied to power enterprises | |
CN112951256A (en) | Voice processing method and device | |
CN110033786A (en) | Gender identification method, apparatus, equipment and readable storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||