CN108460081B - Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium - Google Patents
- Publication number
- CN108460081B (application CN201810031164.9A / CN201810031164A)
- Authority
- CN
- China
- Prior art keywords
- voice data
- registration
- index
- primary
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/61—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Abstract
The invention discloses a voice database establishing method, a voiceprint registration method, and corresponding apparatus, equipment and media. The voice database establishing method includes: acquiring original voice data, the original voice data including an original user identifier and a voice collection time; preprocessing the original voice data to obtain effective voice data; obtaining the signal-to-noise ratio corresponding to the effective voice data; and storing the effective voice data in a voice database and establishing an index for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio. By preprocessing the original voice data, calculating the signal-to-noise ratio of the effective voice data, and establishing an index containing the user identifier, the voice collection time and the signal-to-noise ratio after creating the voice database, the method improves the data processing efficiency of the database.
Description
Technical field
The present invention relates to the field of data processing, and more particularly to a voice database establishing method, a voiceprint registration method, and corresponding apparatus, equipment and media.
Background art
With the development of artificial intelligence technology, technologies related to human biometric characteristics such as the face, voice and fingerprint are gradually being applied in daily life. A voiceprint is the sound-wave spectrum, displayed by an electro-acoustic instrument, of a signal carrying verbal information; it is both specific to the speaker and relatively stable. The production of human speech is a complex physiological and physical process between the body's language centers and the vocal organs. The vocal organs used in speech (the tongue, teeth, larynx, lungs and nasal cavity) differ greatly among individuals in size and form, so the voiceprint maps of any two people always differ, and a user's identity can therefore be verified by voiceprint. Voiceprint recognition requires the voiceprint to be registered in advance, and current voiceprint registration is typically performed by recording voice data in real time and extracting the voiceprint from the recording. The path from recording the voice data to extracting the voiceprint takes a long time, so the whole registration process is slow and registration efficiency is low. Moreover, when a voiceprint is registered from voice data recorded in real time, the ambient conditions and the user's state of health at recording time may cause the recorded voice data to differ considerably from voice data collected at other times, which affects the accuracy, in voiceprint recognition, of a voiceprint extracted from real-time recordings.
Summary of the invention
The embodiments of the present invention provide a voice database establishing method, apparatus, equipment and medium, to solve the problem of low database processing efficiency.
The embodiments of the present invention also provide a voiceprint registration method, apparatus, equipment and medium, to solve the problem of insufficient voiceprint feature accuracy.
In a first aspect, an embodiment of the present invention provides a voice database establishing method, comprising:
obtaining original voice data, the original voice data including an original user identifier and a voice collection time;
preprocessing the original voice data to obtain effective voice data;
obtaining the signal-to-noise ratio corresponding to the effective voice data; and
storing the effective voice data in a voice database and establishing an index for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.
In a second aspect, an embodiment of the present invention provides a voice database creating apparatus, comprising:
an original voice data obtaining module, configured to obtain original voice data, the original voice data including an original user identifier and a voice collection time;
a data preprocessing module, configured to preprocess the original voice data to obtain effective voice data;
a signal-to-noise ratio obtaining module, configured to obtain the signal-to-noise ratio corresponding to the effective voice data; and
a voice database index establishing module, configured to store the effective voice data in a voice database and to establish an index for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.
In a third aspect, an embodiment of the present invention provides a voiceprint registration method, comprising:
obtaining a voiceprint registration request, the voiceprint registration request including a registration user identifier and the current time;
querying a voice database based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that matches the registration user identifier, the voice database being a voice database created with the voice database establishing method of the first aspect;
obtaining, for each target index, the corresponding composite index according to the voice collection time and signal-to-noise ratio in the target index and the current time;
selecting the effective voice data corresponding to the target index with the highest composite index as the registration voice data; and
obtaining, based on the registration voice data, the corresponding voiceprint feature as the registration voiceprint.
In a fourth aspect, an embodiment of the present invention provides a voiceprint registration apparatus, comprising:
a voiceprint registration request module, configured to obtain a voiceprint registration request, the voiceprint registration request including a registration user identifier and the current time;
a target index obtaining module, configured to query a voice database based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that matches the registration user identifier, the voice database being a voice database created with the voice database establishing method of the first aspect;
a composite index obtaining module, configured to obtain, for each target index, the corresponding composite index according to the voice collection time and signal-to-noise ratio in the target index and the current time;
a registration voice data obtaining module, configured to select the effective voice data corresponding to the target index with the highest composite index as the registration voice data; and
a registration voiceprint obtaining module, configured to obtain, based on the registration voice data, the corresponding voiceprint feature as the registration voiceprint.
In a fifth aspect, the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the voice database establishing method of the first aspect of the present invention, or implements the steps of the voiceprint registration method of the third aspect of the present invention.
In a sixth aspect, the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the voice database establishing method of the first aspect of the present invention, or the steps of the voiceprint registration method of the third aspect of the present invention.
In the voice database establishing method, apparatus, equipment and storage medium provided by the embodiments of the present invention, original voice data is obtained to provide a data source for creating the voice database. The original voice data is then preprocessed to obtain effective voice data, which improves subsequent processing efficiency and saves data processing time. The signal-to-noise ratio corresponding to the effective voice data is obtained; from this signal-to-noise ratio, the noise level of the effective voice data can be judged directly, so that its voice quality is known. Finally, the effective voice data is stored in the voice database, and an index is established for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio. By preprocessing the original voice data, calculating the signal-to-noise ratio of the effective voice data, and establishing, after creating the voice database, an index containing the user identifier, the voice collection time and the signal-to-noise ratio, the method improves the database's data processing efficiency and also increases the accuracy of the voiceprint features. It further allows the subsequent voiceprint registration stage to quickly locate suitable effective voice data. Through this reasonable design of the voice database creation process, the accuracy of voiceprint feature extraction in the subsequent voiceprint registration stage is improved and the registration time of voiceprint registration is reduced.
In the voiceprint registration method, apparatus, equipment and storage medium provided by the embodiments of the present invention, voiceprint registration is performed on a voice database created with the voice database establishing method of the first aspect of the present invention, which improves the accuracy of voiceprint feature extraction in the registration stage and reduces the registration time. During voiceprint registration, a composite index is obtained for the effective voice data corresponding to each target index, which makes it possible to quickly locate the most suitable effective voice data and ensures that the voiceprint feature most consistent with the user is extracted, further improving the accuracy of voiceprint registration.
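The text does not fix how the composite index is computed from the voice collection time, the signal-to-noise ratio and the current time. The following is a minimal sketch under the assumption that it is a weighted combination of a normalized SNR score and a recency score; the weights, the 40 dB SNR ceiling and the one-year recency horizon are all assumptions, not values given by the patent:

```python
from datetime import datetime

def composite_index(entry, now, w_snr=0.5, w_recency=0.5, horizon_days=365.0):
    """Hypothetical composite index: weighted mix of SNR (normalized against an
    assumed 40 dB ceiling) and recency of the voice collection time."""
    snr_score = max(0.0, min(entry["snr_db"] / 40.0, 1.0))
    age_days = (now - entry["collect_time"]).total_seconds() / 86400.0
    recency_score = max(0.0, 1.0 - age_days / horizon_days)
    return w_snr * snr_score + w_recency * recency_score

def pick_registration_entry(entries, now):
    # Select the effective voice data whose target index scores highest.
    return max(entries, key=lambda e: composite_index(e, now))
```

Under these assumptions, a recent recording with slightly lower SNR can outrank an older recording with higher SNR, which matches the stated goal of preferring voice data close to the user's current voice.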
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without any creative effort.
Fig. 1 is a flow chart of the voice database establishing method provided in Embodiment 1 of the present invention;
Fig. 2 is a flow chart of a specific embodiment of step S12 in Fig. 1;
Fig. 3 is a flow chart of another specific embodiment of step S12 in Fig. 1;
Fig. 4 is a functional block diagram of the voice database creating apparatus provided in Embodiment 2 of the present invention;
Fig. 5 is a flow chart of the voiceprint registration method provided in Embodiment 3 of the present invention;
Fig. 6 is a flow chart of a specific embodiment in Embodiment 3 of the present invention;
Fig. 7 is a functional block diagram of the voiceprint registration apparatus provided in Embodiment 4 of the present invention;
Fig. 8 is a schematic diagram of the terminal device provided in Embodiment 6 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment 1
Fig. 1 shows a flow chart of the voice database establishing method in this embodiment. The method is applied in various terminal devices or servers to create a voice database and thereby solve the problem of low database processing efficiency. As shown in Fig. 1, the voice database establishing method includes the following steps:
S11: Obtain original voice data, the original voice data including an original user identifier and a voice collection time.
Here, original voice data refers to voice data that has not been processed after collection. The original user identifier is a label used to distinguish different users; one original user identifier corresponds to one unique user. In a specific embodiment, the original user identifier may be the user's phone number, a user account, an ID card number, or the like. The voice collection time refers to the time at which the original voice data was collected.
Preferably, the original voice data can be obtained from a database in which voice data of a large number of users has already been collected. For example, some enterprises set up customer service hotlines; users dial the hotline to resolve problems they encounter while using the enterprise's products or services, and the enterprise may also use the hotline to promote products to customers or pay return visits. Usually the enterprise records these calls and stores the recorded voice data in a database. Alternatively, in some applications, when voice interaction takes place between users or between a user and customer service, the application's database may store the users' voice data.
S12: Preprocess the original voice data to obtain effective voice data.
The original voice data is unprocessed data, so it may contain invalid or redundant voice data. For example, the voice duration of the original voice data may not meet the requirement, the original voice data may contain voice that does not belong to the user, or the voice quality of the original voice data may be unsatisfactory. Alternatively, a piece of original voice data may contain partially invalid or redundant speech periods, a speech period being a segment of the original voice data; the presence of such periods would harm subsequent voice data processing, so they need to be removed. By preprocessing the original voice data to obtain effective voice data, the processing efficiency of subsequent voice data handling is improved and time is saved.
S13: Obtain the signal-to-noise ratio corresponding to the effective voice data.
The signal-to-noise ratio (SNR) is a parameter describing the ratio between the effective component and the noise component of a signal. A higher signal-to-noise ratio indicates relatively less noise. By obtaining the signal-to-noise ratio of the effective voice data, the amount of noise in the effective voice data can be judged directly, so that the voice quality of the effective voice data is known. Specifically, the signal-to-noise ratio corresponding to the effective voice data can be obtained by calculation.
When the signal-to-noise ratio is obtained by calculation, the formula may be: SNR = 10·lg(P_S/P_N), where P_S and P_N are the effective powers of the effective component and the noise component, respectively. Optionally, the ratio of voltage amplitudes can be used instead, i.e. the signal-to-noise ratio may also be expressed as: SNR = 20·lg(V_S/V_N), where V_S and V_N are the effective (RMS) values of the effective-component voltage and the noise-component voltage, respectively.
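The two formulas are equivalent when the voltages are RMS values, since power is proportional to voltage squared. A small sketch of the power-based form, with powers estimated as mean squared sample amplitude (an illustrative estimator, not a procedure prescribed by the text):

```python
import math

def snr_db(signal, noise):
    """SNR = 10*lg(P_S / P_N), with each power estimated as the mean
    squared amplitude of the given sample sequence."""
    p_s = sum(x * x for x in signal) / len(signal)
    p_n = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_s / p_n)
```

For instance, a signal of amplitude 1.0 against noise of amplitude 0.1 gives a power ratio of 100 and therefore an SNR of 20 dB, the same value the voltage form 20·lg(1.0/0.1) yields.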
In one embodiment, obtaining the signal-to-noise ratio corresponding to the effective voice data specifically includes the following steps:
First, extract the pitch (fundamental tone) data in the effective voice data, for example using a pitch-synchronous overlap-add algorithm. The pitch data is the normal voice content in the effective voice data, as opposed to the noise data. Preferably, spectral subtraction, Wiener filtering, or minimum mean-square error short-time spectral estimation can be used to extract the pitch data from the voice data.
Then, obtain the noise data in the effective voice data from the pitch data: after the pitch data is extracted from the voice data, the remaining part is the noise data.
Finally, calculate the signal-to-noise ratio of the voice data from the pitch data and the noise data. After the pitch data and the noise data of the effective voice data are obtained, the effective powers of the pitch data and the noise data (or their voltage amplitudes) can be computed and their ratio taken, yielding the signal-to-noise ratio of the effective voice data.
In a specific embodiment, after the step of obtaining the signal-to-noise ratio corresponding to the effective voice data, the method further includes: removing the effective voice data whose signal-to-noise ratio is below a signal-to-noise ratio threshold.
After the signal-to-noise ratio of the effective voice data is obtained, effective voice data whose signal-to-noise ratio is too low can be removed to reduce the data volume and relieve the pressure of data processing and storage. Specifically, a signal-to-noise ratio threshold can be set; when the signal-to-noise ratio of a piece of effective voice data falls below this threshold, the noise in that piece of effective voice data is very high, so it is not suitable as voice data for voiceprint extraction. Removing the effective voice data whose signal-to-noise ratio is below the threshold reduces the data volume, relieves the pressure of data processing and storage, shortens subsequent data processing time, and improves processing efficiency.
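The threshold-based removal described above can be sketched as follows; the 15 dB default is an assumed value, since the text leaves the threshold open:

```python
def remove_low_snr(segments, snr_threshold_db=15.0):
    """Split effective voice segments into kept and removed by an SNR
    threshold. Each segment is a dict carrying at least an 'snr_db' field."""
    kept = [s for s in segments if s["snr_db"] >= snr_threshold_db]
    removed = [s for s in segments if s["snr_db"] < snr_threshold_db]
    return kept, removed
```

Returning the removed segments as well makes it easy to log or audit what was discarded before the kept segments are written to the voice database.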
S14: Store the effective voice data in the voice database, and establish an index for the effective voice data in the voice database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.
Here, the voice database is the database that stores the effective voice data. The effective voice data obtained after preprocessing and signal-to-noise ratio calculation is stored in the voice database, and an index is established for each piece of effective voice data, which improves the efficiency of subsequent data processing using the voice database. Moreover, during voiceprint registration, suitable effective voice data can be located directly by searching the index, and the voiceprint feature can be extracted from the corresponding effective voice data, which improves the accuracy of the voiceprint feature.
Specifically, the index includes the original user identifier, the voice collection time and the signal-to-noise ratio. The original user identifier distinguishes the effective voice data of different users. The voice collection time represents the recording time of the voice; in general, a user's voice changes slightly over time, so the closer the voice collection time is to the current time, the closer that piece of effective voice data is to the voice the user would currently record, and the more consistent the voiceprint feature. The signal-to-noise ratio allows the noise level of the effective voice data to be judged directly, so that the voice quality of the effective voice data is known.
In one embodiment, the index established for the effective voice data in the voice database is a BRIN (Block Range Index) index. A BRIN index stores, for each range of consecutive data blocks of a table, the range of the corresponding data values, and has a great advantage in saving storage space. The voice database needs to store effective voice data corresponding to a large number of original user identifiers, which places high demands on the database's storage space; using a BRIN index saves a large amount of index space.
Therefore, establishing an index for the voice database improves database processing efficiency and also increases the accuracy of the voiceprint features. Moreover, in the voiceprint registration stage, the original user identifier, the voice collection time and the signal-to-noise ratio in the index can be considered together, so that the most suitable effective voice data can be located quickly and registration performed with the voiceprint feature of that effective voice data. This greatly shortens the time needed to form the voiceprint feature in the registration stage and, by selecting the most suitable effective voice data, also improves the accuracy of voiceprint registration.
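The block-range idea behind a BRIN index can be illustrated with a toy summarize-and-prune sketch; this shows the concept only and is not PostgreSQL's actual BRIN implementation:

```python
def build_brin(values, block_size=4):
    """Summarize each block of consecutive rows by its (start, min, max)
    value range, the core idea of a block range index."""
    summaries = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        summaries.append((start, min(block), max(block)))
    return summaries

def candidate_blocks(summaries, key):
    # Only blocks whose value range may contain the key need to be scanned.
    return [start for start, lo, hi in summaries if lo <= key <= hi]
```

Because only one (min, max) pair is stored per block rather than one entry per row, the index stays tiny even for very large tables, which is the space advantage the text refers to; it works best when values correlate with physical row order, as collection timestamps typically do.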
In the voice database establishing method provided by the embodiments of the present invention, original voice data is obtained to provide a data source for creating the voice database. The original voice data is then preprocessed to obtain effective voice data, improving subsequent processing efficiency and saving data processing time. The signal-to-noise ratio corresponding to the effective voice data is obtained, from which the noise level of the effective voice data can be judged directly and its voice quality known. Finally, the effective voice data is stored in the voice database, and an index including the original user identifier, the voice collection time and the signal-to-noise ratio is established for the effective voice data in the voice database. By preprocessing the original voice data, calculating the signal-to-noise ratio of the effective voice data, and establishing such an index after creating the voice database, the method improves database processing efficiency and also increases the accuracy of the voiceprint features; it further allows the subsequent voiceprint registration stage to quickly locate suitable effective voice data. Through this reasonable design of the voice database creation process, the accuracy of voiceprint feature extraction in the subsequent voiceprint registration stage is improved and the registration time of voiceprint registration is greatly reduced.
In a specific embodiment, preprocessing the original voice data to obtain effective voice data specifically includes the following step: performing filtering processing and silence removal processing on the original voice data corresponding to each original user identifier, to obtain the effective voice data.
Among the original voice data corresponding to the same original user identifier, there may be a small amount of original voice data that does not belong to the user corresponding to that identifier (i.e. cases where someone else used the account). Such original voice data does not record the voice of the user corresponding to the original user identifier, and needs to be removed so as to avoid deviations when voiceprint features are subsequently extracted from the original voice data.
Therefore, filtering the original voice data corresponding to each original user identifier means finding, among the original voice data, the data that does not belong to the user corresponding to that original user identifier and removing it. Specifically, a clustering algorithm, or one-by-one comparison and matching, can be used to find the original voice data that does not belong to the user.
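As a much-simplified stand-in for the clustering-based filtering described above, one can treat the centroid of a user's per-segment feature vectors as representing the user's own voice and drop segments that lie too far from it; the Euclidean distance threshold is an assumption, and a real system would use proper speaker features and a clustering algorithm such as k-means:

```python
def filter_impostor_segments(features, max_dist=2.0):
    """Keep only feature vectors within max_dist of the centroid of all
    vectors for one user; far-away vectors are treated as not belonging
    to that user."""
    dim = len(features[0])
    centroid = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    def dist(f):
        return sum((a - b) ** 2 for a, b in zip(f, centroid)) ** 0.5
    return [f for f in features if dist(f) <= max_dist]
```

Note that a single strong outlier shifts the centroid, so the threshold must be generous; the clustering approach mentioned in the text handles this more robustly by modeling the outliers as their own cluster.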
Within a segment of raw voice data, some periods may be silent, for example the waiting period during a call. The voice data in these periods is invalid or redundant, so silence-removal processing is needed.
Preferably, voice activity detection (VAD, Voice Activity Detection) can be applied to the raw voice data to distinguish the speech portions from the non-speech (silent) portions; the silent portions are then removed, yielding raw voice data with silence removed.
The purpose of voice activity detection is to detect whether a speech signal is present in the current signal, i.e. to distinguish the speech signal in the voice data from various background-noise signals so that the two kinds of signal can be processed differently. Through voice activity detection, the speech portions and silent portions of a segment of raw voice data are identified, and the silent portions are removed.
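A minimal energy-based sketch of this silence-removal step follows. It is a stand-in for a real VAD (production systems typically use a trained detector such as WebRTC VAD); the frame length and energy threshold here are assumptions.

```python
import numpy as np


def remove_silence(samples, rate, frame_ms=20, energy_thresh=1e-4):
    """Drop frames whose short-time energy falls below a threshold.

    A toy stand-in for the VAD step: frames with near-zero energy are
    treated as silence and removed; the rest are kept in order.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    kept = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if np.mean(frame.astype(float) ** 2) > energy_thresh:  # speech frame
            kept.append(frame)
    return np.concatenate(kept) if kept else np.array([], dtype=samples.dtype)
```

A pure energy gate cannot separate speech from loud noise, which is exactly why the text stresses that VAD must distinguish speech from background-noise signals; this sketch only illustrates the frame-wise keep/drop structure.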
It should be appreciated that the filtering processing and the silence-removal processing applied to the raw voice data corresponding to each original user identifier can be performed in either order: filtering first and then silence removal, or silence removal first and then filtering. The voice data obtained after both operations is called the effective voice data.
In this embodiment, removing from the raw voice data the recordings that do not belong to the user associated with the original user identifier improves the accuracy of the data stored in the speech database, while removing silence from the raw voice data shortens subsequent processing time and improves processing efficiency.
In a specific embodiment, filtering the raw voice data corresponding to each original user identifier, as shown in Fig. 2, specifically includes the following steps:
S121: extracting the voiceprint features of the raw voice data corresponding to the same original user identifier.

Based on the original user identifier, voiceprint feature extraction is performed on the raw voice data corresponding to that identifier. Voiceprint features are the basic characteristics of a person carried in raw voice data, such as the pitch contour, the formant bandwidths and trajectories, spectral envelope parameters, auditory characteristic parameters, and linear prediction cepstral coefficients and their derived or hybrid parameters. Specifically, the extraction can be based on linear predictive coding (LPC, Linear Predictive Coding) or mel-frequency cepstral coefficients (MFCC, Mel Frequency Cepstral Coefficient).
S122: based on the voiceprint features, performing cluster analysis on the raw voice data corresponding to the same original user identifier using the k-means clustering algorithm to obtain a target center point.

Cluster analysis is a statistical method for studying classification problems (of samples or indicators), and is also an important analysis method in data mining. The k-means algorithm is applied to the raw voice data to obtain the target center point. Specifically, the value of K is set according to the amount of raw voice data corresponding to the same original user identifier, and an initial center point is set for each cluster. After all points (raw voice recordings) have been assigned, the points in each cluster are recomputed (for example by averaging) to obtain the cluster's new center point. The steps of assigning points and updating the cluster centers are then repeated iteratively until the centers change little or a specified number of iterations is reached. The center of the cluster containing the most points (raw voice recordings) is taken as the target center point.
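Under the assumption that each voiceprint feature is a fixed-length vector, step S122 might be sketched as follows. This is an illustrative implementation, not the patent's: the initialization, iteration count and K value are all assumptions.

```python
import numpy as np


def target_center(features, k=2, iters=50, seed=0):
    """Cluster voiceprint feature vectors with k-means and return the
    centroid of the most populated cluster (the "target center point")."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each feature vector to its nearest center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned points
        new = np.array([features[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):  # centers vary little: stop iterating
            break
        centers = new
    # the cluster with the most points is taken to represent the genuine user
    biggest = np.bincount(labels, minlength=k).argmax()
    return centers[biggest]
```

The key design choice mirrors the text: the genuine user is assumed to contribute the majority of recordings, so the largest cluster's centroid is the target center point.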
S123: using a distance algorithm, calculating the distance between each raw voice recording corresponding to the same original user identifier and the target center point.

A distance algorithm is an algorithm for estimating the similarity between different samples. In one embodiment, the distance between each raw voice recording and the target center point can be calculated using the Manhattan distance, the Minkowski distance, cosine similarity, the Euclidean distance, or a similar measure.
In one embodiment, the Euclidean distance between each raw voice recording and the target center point is calculated using the Euclidean distance algorithm.

The Euclidean distance is the actual distance between two points in m-dimensional space, or the natural length of a vector (the distance from the point to the origin). For any two n-dimensional vectors a(X_i1, X_i2, ..., X_in) and b(X_j1, X_j2, ..., X_jn), the Euclidean distance is

d(a, b) = sqrt( (X_i1 - X_j1)^2 + (X_i2 - X_j2)^2 + ... + (X_in - X_jn)^2 )

Based on the voiceprint feature of each raw voice recording, the Euclidean distance between that recording and the target center point is calculated by this formula.
S124: removing, from the raw voice data corresponding to the same original user identifier, the recordings whose distance from the target center point exceeds a distance threshold.

After clustering, the raw voice data corresponding to the same original user identifier that belongs to the user will cluster near the target center point, so its distance from the center is small; raw voice data that does not belong to the user will lie far from the target center point, i.e. its distance from the center is large. Therefore, by setting a reasonable distance threshold, the raw voice data that does not belong to the user can be screened out and removed, ensuring the accuracy of the data.
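Steps S123 and S124 combine into a single vectorized filter; the distance threshold is an assumed parameter that would be tuned in practice.

```python
import numpy as np


def remove_distant(features, center, threshold):
    """Keep only feature vectors within `threshold` Euclidean distance
    of the target center point (steps S123-S124); distant vectors are
    presumed to belong to another speaker and are removed."""
    dists = np.linalg.norm(features - center, axis=1)  # d(a, b) per row
    return features[dists <= threshold]
```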
In this embodiment, the raw voice data corresponding to the same original user identifier is clustered using a clustering algorithm, the distance between each recording and the target center point of the cluster is calculated, and the recordings whose distance exceeds the distance threshold are removed. Removing the erroneous raw voice data ensures data accuracy while reducing the data volume, which also improves data processing efficiency.
In a specific embodiment, filtering the raw voice data corresponding to each original user identifier, as shown in Fig. 3, specifically includes the following steps:
S121': extracting the voiceprint features of the raw voice data corresponding to the same original user identifier.

Based on the original user identifier, voiceprint feature extraction is performed on the corresponding raw voice data. Specifically, the extraction can be based on linear predictive coding (LPC, Linear Predictive Coding) or mel-frequency cepstral coefficients (MFCC, Mel Frequency Cepstral Coefficient).
S122': comparing and matching the voiceprint feature of each raw voice recording under the same user identifier against the voiceprint features of the remaining recordings under that identifier, one by one, and counting the number of match failures for each recording according to the matching results.
A matching result is either a match success or a match failure. If, among the raw voice data corresponding to the same user identifier, some recordings do not belong to the user, their voiceprint features will not match the voiceprint features of the recordings that do belong to the user (i.e. matching fails). Therefore, when the voiceprint feature of each recording under the same user identifier is compared one by one against those of the remaining recordings, every comparison between a recording that does not belong to the user and one that does will produce a match failure.
S123': when the number of match failures for a segment of raw voice data exceeds a matching threshold, removing that raw voice data.

When a segment of raw voice data fails many matches, its voiceprint feature does not match those of most of the other recordings, from which it can be determined that the segment does not belong to the user. Therefore, a matching threshold can be preset; when the number of match failures for a segment exceeds the threshold, the segment is removed. This ensures data accuracy while reducing the data volume, which also improves data processing efficiency.
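Steps S122'–S123' can be sketched as a pairwise failure count. The matcher `match_fn` is a placeholder for whatever voiceprint comparison is used (e.g. a cosine-similarity threshold), which the text does not pin down.

```python
def remove_by_match_failures(recordings, match_fn, match_threshold):
    """Count, for each recording, how many pairwise voiceprint matches
    fail against the other recordings under the same user identifier,
    and drop recordings whose failure count exceeds the threshold."""
    n = len(recordings)
    failures = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and not match_fn(recordings[i], recordings[j]):
                failures[i] += 1  # one more failed comparison for recording i
    return [r for r, f in zip(recordings, failures) if f <= match_threshold]
```

Note the O(n^2) comparison cost, which is why the later embodiment reserves this path for small data volumes below the cluster threshold.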
In a specific embodiment, filtering the raw voice data corresponding to each original user identifier further includes the following specific step:

judging whether the amount of raw voice data corresponding to the same original user identifier is greater than or equal to a cluster threshold; if it is, performing steps S121 to S124; if it is less than the cluster threshold, performing steps S121' to S123'.
For a clustering algorithm, the accuracy of the cluster analysis is positively correlated with the data volume. When the data volume is small, clustering accuracy drops, and applying a clustering algorithm to a small data set adds computational complexity. Therefore, a cluster threshold can be set, whose specific value can be adjusted according to the characteristics of the algorithm and the actual demand; preferably, the cluster threshold is 10. When the amount of raw voice data corresponding to the same original user identifier is greater than or equal to the cluster threshold, the raw voice data is filtered using the embodiment of steps S121 to S124; when it is less than the cluster threshold, the raw voice data is filtered using steps S121' to S123'.

In this embodiment, selecting a suitable processing algorithm for filtering the raw voice data according to the size of the data volume improves the accuracy of the data processing.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
Embodiment 2
Fig. 4 shows a functional block diagram of a speech database creating device corresponding one-to-one to the speech database establishing method of Embodiment 1. As shown in Fig. 4, the speech database creating device includes a raw voice data acquisition module 11, a data preprocessing module 12, a signal-to-noise ratio acquisition module 13 and a speech database index establishing module 14. The functions implemented by these modules correspond one-to-one to the steps of the speech database establishing method in Embodiment 1; to avoid repetition, this embodiment does not describe each of them in detail.
The raw voice data acquisition module 11 is used to obtain raw voice data, which includes an original user identifier and a voice collection time.

The data preprocessing module 12 is used to preprocess the raw voice data to obtain effective voice data.

The signal-to-noise ratio acquisition module 13 is used to obtain the signal-to-noise ratio corresponding to the effective voice data.

The speech database index establishing module 14 is used to store the effective voice data in the speech database and to establish an index for the effective voice data in the database; the index includes the original user identifier, the voice collection time and the signal-to-noise ratio.
Preferably, the data preprocessing module 12 includes a voiceprint feature extraction unit 121, a cluster analysis unit 122, a distance calculation unit 123 and a first data removal unit 124.

The voiceprint feature extraction unit 121 is used to extract the voiceprint features of the raw voice data corresponding to the same original user identifier.

The cluster analysis unit 122 is used to perform, based on the voiceprint features, cluster analysis on the raw voice data corresponding to the same original user identifier using the k-means clustering algorithm to obtain a target center point.

The distance calculation unit 123 is used to calculate, using a distance algorithm, the distance between each raw voice recording corresponding to the same original user identifier and the target center point.

The first data removal unit 124 is used to remove, from the raw voice data corresponding to the same original user identifier, the recordings whose distance from the target center point exceeds the distance threshold.
Preferably, the data preprocessing module 12 further includes a data comparison and matching unit 122' and a second data removal unit 123'.

The data comparison and matching unit 122' is used to compare and match the voiceprint feature of each raw voice recording under the same user identifier against the voiceprint features of the remaining recordings under that identifier, one by one, and to count the number of match failures for each recording according to the matching results.

The second data removal unit 123' is used to remove a segment of raw voice data when its number of match failures exceeds the matching threshold.
Preferably, the data preprocessing module 12 further includes a raw voice data amount judging unit 120, which is used to judge whether the amount of raw voice data corresponding to the same original user identifier is greater than or equal to the cluster threshold.
Embodiment 3
Fig. 5 shows a flowchart of the voiceprint registration method in this embodiment. The voiceprint registration method is applied in various terminal devices and servers to perform voiceprint registration, so as to solve the problems of long registration time and low voiceprint feature accuracy during voiceprint registration. As shown in Fig. 5, the voiceprint registration method includes the following steps:
S21: obtaining a voiceprint registration request, which includes a registration user identifier and the current time.

A voiceprint registration request is a request made by a user to register using voiceprint features. The registration user identifier identifies the user making the request; in a specific embodiment it can be the user's phone number, user account or identity card number. Preferably, the registration user identifier corresponds to the original user identifier; for example, when the original user identifier is a phone number, the registration user identifier is also a phone number. The current time is the system time at which the voiceprint registration request is obtained.
S22: querying the speech database based on the registration user identifier to obtain the target indexes corresponding to the original user identifiers that match the registration user identifier, where the speech database is a database created using the speech database establishing method of Embodiment 1.

Based on the registration user identifier in the voiceprint registration request, the speech database is queried. When the original user identifier in an index matches the registration user identifier, that index is a target index; matching here means that the original user identifier and the registration user identifier are identical. Specifically, the indexes established for the effective voice data in the speech database are queried, and the indexes containing an original user identifier that matches the registration user identifier are obtained as the target indexes.
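Since matching means strict identity of the two identifiers, step S22 reduces to a filter over the index table. The dict schema below is an illustrative assumption, not the patent's storage format.

```python
def target_indexes(index_table, registration_user_id):
    """Return the index entries whose original user identifier is
    identical to the registration user identifier (step S22).

    `index_table` is assumed to be an iterable of dicts with keys
    'user_id', 'collected_at' and 'snr' (illustrative schema).
    """
    return [e for e in index_table if e["user_id"] == registration_user_id]
```

In a real system this lookup would be a database query over an indexed column rather than a linear scan.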
S23: obtaining the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index.

The voice collection time generally represents the time at which the voice was recorded, and a user's voice changes slightly over time. The closer the voice collection time is to the current time, the closer the effective voice data is to the user's current voice, and the more consistent the voiceprint features are. The signal-to-noise ratio gives an intuitive measure of the noise level of the effective voice data: the higher the signal-to-noise ratio, the smaller the noise, and hence the better the voice quality. By comprehensively considering the voice collection time and the signal-to-noise ratio relative to the current time, the composite index corresponding to each target index can be obtained.
S24: choosing the effective voice data corresponding to the target index with the highest composite index as the registration voice data.

The registration voice data is the effective voice data whose voiceprint features are most consistent with the user. Among the target indexes, the higher the composite index, the more consistent with the user are the voiceprint features obtained from the corresponding effective voice data. Therefore, the effective voice data corresponding to the target index with the highest composite index is chosen as the registration voice data, improving the accuracy of the registration voiceprint.
In a specific embodiment, obtaining the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index specifically includes: calculating the composite index of each target index using the composite index calculation formula:

composite index = a * signal-to-noise ratio + (1 - a) * [1 / (current time - voice collection time of the target index)]

where a is a preset weight and 0 ≤ a ≤ 1.
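The formula translates directly into code. The sketch below assumes times in consistent units (e.g. epoch seconds) and a nonzero age; the default weight of 0.7 follows the example given in the text.

```python
def composite_index(snr, collected_at, now, a=0.7):
    """Composite index = a*SNR + (1-a) * 1/(now - collection time).

    `a` is the preset weight, 0 <= a <= 1. A higher SNR or a more
    recent collection time both raise the score.
    """
    return a * snr + (1.0 - a) * (1.0 / (now - collected_at))


def best_target(entries, now, a=0.7):
    """Pick the index entry with the highest composite index (step S24).

    `entries` is assumed to be dicts with 'snr' and 'collected_at' keys.
    """
    return max(entries,
               key=lambda e: composite_index(e["snr"], e["collected_at"], now, a))
```

One caveat worth noting: the recency term 1/(now - collected_at) and the SNR live on different scales, so in practice the units of the time difference (seconds vs. days) strongly affect how much weight recency actually carries for a given `a`.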
The higher the signal-to-noise ratio of the effective voice data, the less noise it contains; and the closer its voice collection time is to the current time, the closer it is to the user's current voice, so the more consistent the voiceprint features. Based on these two factors, and according to the demands of the actual application scenario, preset weights are assigned to them, and the composite index of each effective voice recording is obtained by the composite index calculation formula. Once the composite index of each effective voice recording has been obtained, the recordings can be compared through this single intuitive value in order to select the most suitable target effective voice data.

For example, the preset weight a can be set to 0.7, in which case the composite index calculation formula becomes: composite index = 0.7 * signal-to-noise ratio + 0.3 * [1 / (current time - voice collection time of the target index)]. After any voiceprint registration request is received, the effective voice data stored in the speech database is queried according to the registration user identifier in the request, and the composite index of each effective voice recording is calculated by this formula.
S25: obtaining the voiceprint features corresponding to the registration voice data as the registration voiceprint.

After the registration voice data is obtained, the corresponding voiceprint features are extracted from it as the registration voiceprint.

In a specific embodiment, the voiceprint features of the effective voice data can be extracted in advance and associated with the index established in step S14, so that the corresponding voiceprint features can be found quickly from the index. In the voiceprint registration stage, once the registration voice data is obtained, its voiceprint features can be fetched directly as the registration voiceprint, further reducing the registration time.
In the voiceprint registration method provided by this embodiment of the present invention, a voiceprint registration request is obtained to trigger voiceprint registration. The speech database is then queried based on the registration user identifier to obtain the target indexes corresponding to the matching original user identifiers, where the speech database is the one created using the speech database establishing method of Embodiment 1. According to the current time and each target index's voice collection time and signal-to-noise ratio, the composite index corresponding to each target index is obtained. The effective voice data corresponding to the target index with the highest composite index is chosen as the registration voice data, improving the accuracy of the registration voiceprint, and the corresponding voiceprint features are then extracted from it as the registration voiceprint. Because this method performs voiceprint registration against a speech database created by the method of Embodiment 1, it improves the accuracy of voiceprint feature extraction in the registration stage and reduces the registration time. Using the composite index of the effective voice data, obtained through the target indexes, makes it possible to quickly locate suitable effective voice data and guarantees that the voiceprint features most consistent with the user are obtained, further improving the accuracy of voiceprint registration.
In a specific embodiment, querying the speech database based on the registration user identifier, as shown in Fig. 6, further includes the following steps:

S221: if no original user identifier matching the registration user identifier exists in the speech database, sending a voice recording request.

The speech database may contain no effective voice data matching the registration user identifier; in that case a voice recording request is sent and the registration voiceprint is obtained from voice data recorded in real time. Specifically, the indexes in the speech database are queried with the registration user identifier; if no index contains a matching original user identifier, the database contains no effective voice data matching the registration user identifier, and a voice recording request is sent.
S222: obtaining the voice recording data corresponding to the voice recording request.

After the voice recording request is sent, the user can record his or her voice according to the prompts; once the recording is finished, the voice recording data is obtained.
S223: extracting the corresponding voiceprint features from the voice recording data as the registration voiceprint.

After the voice recording data recorded by the user is obtained, the corresponding voiceprint features are extracted from it as the registration voiceprint. Here, voiceprint features are the basic characteristics of a person carried in voice data, such as the pitch contour, the formant bandwidths and trajectories, spectral envelope parameters, auditory characteristic parameters, and linear prediction cepstral coefficients and their derived or hybrid parameters; the extraction can be performed as in step S121 of the previous embodiment and is not repeated here.
In this embodiment, when the speech database contains no effective voice data corresponding to the registration user identifier, the registration voiceprint is obtained from voice data recorded in real time. This avoids the situation in which a user cannot register, improving the completeness and reasonableness of the voiceprint registration method.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
Embodiment 4
Fig. 7 shows a functional block diagram of a voiceprint registration device corresponding one-to-one to the voiceprint registration method of Embodiment 3. As shown in Fig. 7, the voiceprint registration device includes a voiceprint registration request module 21, a target index acquisition module 22, a composite index acquisition module 23, a registration voice data acquisition module 24 and a registration voiceprint acquisition module 25. The functions implemented by these modules correspond one-to-one to the steps of the voiceprint registration method in Embodiment 3; to avoid repetition, this embodiment does not describe each of them in detail.
The voiceprint registration request module 21 is used to obtain a voiceprint registration request, which includes the registration user identifier and the current time.

The target index acquisition module 22 is used to query the speech database based on the registration user identifier and obtain the target indexes corresponding to the original user identifiers that match the registration user identifier, where the speech database is the one created using the speech database establishing method of Embodiment 1.

The composite index acquisition module 23 is used to obtain the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index.

The registration voice data acquisition module 24 is used to choose the effective voice data corresponding to the target index with the highest composite index as the registration voice data.

The registration voiceprint acquisition module 25 is used to obtain, based on the registration voice data, the corresponding voiceprint features as the registration voiceprint.
Preferably, the target index acquisition module 22 further includes a voice recording request transmitting unit 221, a voice recording data acquisition unit 222 and a registration voiceprint extraction unit 223.

The voice recording request transmitting unit 221 is used to send a voice recording request when no original user identifier matching the registration user identifier exists in the speech database.

The voice recording data acquisition unit 222 is used to obtain the voice recording data corresponding to the voice recording request.

The registration voiceprint extraction unit 223 is used to extract the corresponding voiceprint features from the voice recording data as the registration voiceprint.
Embodiment 5
This embodiment provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the speech database establishing method of Embodiment 1 or the voiceprint registration method of Embodiment 3; alternatively, it implements the functions of the modules/units of the speech database creating device of Embodiment 2 or of the voiceprint registration device of Embodiment 4. To avoid repetition, these are not described again here.
Embodiment 6
Fig. 8 is a schematic diagram of a terminal device provided by an embodiment of the present invention. As shown in Fig. 8, the terminal device 80 of this embodiment includes a processor 81, a memory 82, and a computer program 83 stored in the memory 82 and executable on the processor 81. When the processor 81 executes the computer program 83, it implements the steps of the speech database establishing method of the above Embodiment 1, such as steps S11 to S14 shown in Fig. 1, or the functions of the modules/units of Embodiment 2, such as the raw voice data acquisition module 11, the data preprocessing module 12, the signal-to-noise ratio acquisition module 13 and the speech database index establishing module 14 shown in Fig. 4. Alternatively, when the processor 81 executes the computer program 83, it implements the steps of the voiceprint registration method of the above Embodiment 3, such as steps S21 to S25 shown in Fig. 5, or the functions of the modules/units of Embodiment 4, such as the voiceprint registration request module 21, the target index acquisition module 22, the composite index acquisition module 23, the registration voice data acquisition module 24 and the registration voiceprint acquisition module 25 shown in Fig. 7.
Illustratively, the computer program 83 can be divided into one or more modules/units, which are stored in the memory 82 and executed by the processor 81 to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of completing specific functions, the segments being used to describe the execution process of the computer program 83 in the terminal device 80. For example, the computer program 83 can be divided into the raw voice data acquisition module 11, the data preprocessing module 12, the signal-to-noise ratio acquisition module 13 and the speech database index establishing module 14 shown in Fig. 4, whose specific functions are as described in Embodiment 2 and are not repeated here. Alternatively, the computer program 83 can be divided into the voiceprint registration request module 21, the target index acquisition module 22, the composite index acquisition module 23, the registration voice data acquisition module 24 and the registration voiceprint acquisition module 25 shown in Fig. 7, whose specific functions are as described in Embodiment 4 and are not repeated here.
The terminal device 80 can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The terminal device may include, but is not limited to, the processor 81 and the memory 82. Those skilled in the art will understand that Fig. 8 is only an example of the terminal device 80 and does not constitute a limitation on it; the terminal device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the terminal device may also include input/output devices, network access devices, a bus, and so on.
The processor 81 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 82 may be an internal storage unit of the terminal device 80, such as a hard disk or memory of the terminal device 80. The memory 82 may also be an external storage device of the terminal device 80, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) equipped on the terminal device 80. Further, the memory 82 may include both an internal storage unit and an external storage device of the terminal device 80. The memory 82 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is merely illustrative. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all fall within the protection scope of the present invention.
Claims (10)
1. A voice database establishing method, comprising:
obtaining primary voice data, the primary voice data comprising an original user identifier and a voice collection time;
preprocessing the primary voice data to obtain valid voice data;
obtaining a signal-to-noise ratio corresponding to the valid voice data;
storing the valid voice data in a voice database, and establishing an index for the valid voice data in the voice database, the index comprising the original user identifier, the voice collection time and the signal-to-noise ratio.
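As an illustrative sketch only (not part of the claimed method), the indexing scheme of claim 1 could be realized with a simple SQLite table; the table name, column names and sample values here are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per piece of valid voice data, indexed by
# original user identifier, voice collection time and signal-to-noise ratio.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE voice_data (
        user_id      TEXT,  -- original user identifier
        collected_at REAL,  -- voice collection time (Unix timestamp)
        snr          REAL,  -- signal-to-noise ratio of the valid voice data
        audio        BLOB   -- the valid (preprocessed) voice data itself
    )
""")
conn.execute("CREATE INDEX idx_user ON voice_data (user_id, collected_at, snr)")

# Store one piece of valid voice data together with its index fields.
conn.execute("INSERT INTO voice_data VALUES (?, ?, ?, ?)",
             ("user-001", 1515715200.0, 23.5, b"\x00\x01"))
row = conn.execute("SELECT snr FROM voice_data WHERE user_id = ?",
                   ("user-001",)).fetchone()
print(row[0])  # 23.5
```

Keeping the user identifier, collection time and signal-to-noise ratio in the index lets later lookups (e.g. the registration query of claim 4) rank recordings without reading the audio blobs.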
2. The voice database establishing method according to claim 1, wherein the preprocessing the primary voice data to obtain valid voice data specifically comprises:
performing filtering processing and silence removal processing on the primary voice data corresponding to each original user identifier to obtain the valid voice data.
3. The voice database establishing method according to claim 2, wherein the performing filtering processing on the primary voice data corresponding to each original user identifier specifically comprises:
extracting voiceprint features of the primary voice data corresponding to a same original user identifier;
based on the voiceprint features, performing cluster analysis on the primary voice data corresponding to the same original user identifier using a k-means clustering algorithm to obtain a target center point;
using a distance algorithm, calculating a distance between each piece of primary voice data corresponding to the same original user identifier and the target center point;
removing, from the primary voice data corresponding to the same original user identifier, the primary voice data whose distance from the target center point is greater than a distance threshold.
4. A voiceprint registration method, comprising:
obtaining a voiceprint registration request, the voiceprint registration request comprising a registration user identifier and a current time;
querying a voice database based on the registration user identifier to obtain target indexes corresponding to an original user identifier matching the registration user identifier, the voice database being a voice database created using the voice database establishing method according to any one of claims 1-3;
obtaining a composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index;
selecting the valid voice data corresponding to the target index with the highest composite index as registration voice data;
obtaining, based on the registration voice data, a corresponding voiceprint feature as a registration voiceprint.
5. The voiceprint registration method according to claim 4, wherein the obtaining a composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index specifically comprises:
calculating the composite index corresponding to each target index using a composite index calculation formula according to the current time and the voice collection time and signal-to-noise ratio of the target index;
the composite index calculation formula being:
composite index = a * signal-to-noise ratio + (1 - a) * [1 / (current time - voice collection time of the target index)];
wherein a is a preset weight and 0 ≤ a ≤ 1.
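The formula of claim 5 and the selection step of claim 4 can be sketched as follows; the timestamps, SNR values and weight are illustrative, and times are assumed to be Unix timestamps so the recency term 1 / (current time - collection time) grows as a recording gets newer.

```python
def composite_index(snr: float, collected_at: float, now: float, a: float = 0.5) -> float:
    """Composite index per claim 5: weighted sum of the signal-to-noise
    ratio and the recency of the voice collection time, with preset
    weight 0 <= a <= 1."""
    return a * snr + (1 - a) * (1.0 / (now - collected_at))

# Pick the registration voice data: the target index with the highest
# composite index wins (claim 4). Candidate values are hypothetical.
now = 1_600_000_000.0
candidates = [
    {"snr": 20.0, "collected_at": now - 86_400},  # one day old, cleaner
    {"snr": 18.0, "collected_at": now - 3_600},   # one hour old, noisier
]
best = max(candidates, key=lambda c: composite_index(c["snr"], c["collected_at"], now))
print(best["snr"])  # → 20.0 (here SNR outweighs the recency term)
```

With a closer to 0, recency dominates and the newer recording would be chosen instead; a trades off audio quality against freshness.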
6. The voiceprint registration method according to claim 4, wherein the querying a voice database based on the registration user identifier further comprises:
if no original user identifier matching the registration user identifier exists in the voice database, sending a voice recording request;
obtaining voice recording data corresponding to the voice recording request;
extracting a corresponding voiceprint feature from the voice recording data as the registration voiceprint.
7. A voice database establishing apparatus, comprising:
a primary voice data obtaining module, configured to obtain primary voice data, the primary voice data comprising an original user identifier and a voice collection time;
a data preprocessing module, configured to preprocess the primary voice data to obtain valid voice data;
a signal-to-noise ratio obtaining module, configured to obtain a signal-to-noise ratio corresponding to the valid voice data;
a voice database index establishing module, configured to store the valid voice data in a voice database and establish an index for the valid voice data in the voice database, the index comprising the original user identifier, the voice collection time and the signal-to-noise ratio.
8. A voiceprint registration apparatus, comprising:
a voiceprint registration request module, configured to obtain a voiceprint registration request, the voiceprint registration request comprising a registration user identifier and a current time;
a target index obtaining module, configured to query a voice database based on the registration user identifier to obtain target indexes corresponding to an original user identifier matching the registration user identifier, the voice database being a voice database created using the voice database establishing method according to any one of claims 1-3;
a composite index obtaining module, configured to obtain a composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index;
a registration voice data obtaining module, configured to select the valid voice data corresponding to the target index with the highest composite index as registration voice data;
a registration voiceprint obtaining module, configured to obtain, based on the registration voice data, a corresponding voiceprint feature as a registration voiceprint.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the voice database establishing method according to any one of claims 1 to 3; or the processor, when executing the computer program, implements the steps of the voiceprint registration method according to any one of claims 4 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the voice database establishing method according to any one of claims 1 to 3; or the computer program, when executed by a processor, implements the steps of the voiceprint registration method according to any one of claims 4 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810031164.9A CN108460081B (en) | 2018-01-12 | 2018-01-12 | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium |
PCT/CN2018/077234 WO2019136801A1 (en) | 2018-01-12 | 2018-02-26 | Voice database creation method, voiceprint registration method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810031164.9A CN108460081B (en) | 2018-01-12 | 2018-01-12 | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108460081A CN108460081A (en) | 2018-08-28 |
CN108460081B true CN108460081B (en) | 2019-07-12 |
Family
ID=63221350
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810031164.9A Active CN108460081B (en) | 2018-01-12 | 2018-01-12 | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108460081B (en) |
WO (1) | WO2019136801A1 (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065056B (en) * | 2018-09-26 | 2021-05-11 | 珠海格力电器股份有限公司 | Method and device for controlling air conditioner through voice |
CN109727602A * | 2018-12-29 | 2019-05-07 | 苏州思必驰信息科技有限公司 | Voiceprint recognition method and device for a mobile terminal device |
CN111856399B (en) * | 2019-04-26 | 2023-06-30 | 北京嘀嘀无限科技发展有限公司 | Positioning identification method and device based on sound, electronic equipment and storage medium |
CN110689894B (en) * | 2019-08-15 | 2022-03-29 | 深圳市声扬科技有限公司 | Automatic registration method and device and intelligent equipment |
CN110648671A (en) * | 2019-08-21 | 2020-01-03 | 广州国音智能科技有限公司 | Voiceprint model reconstruction method, terminal, device and readable storage medium |
CN110600040B (en) * | 2019-09-19 | 2021-05-25 | 北京三快在线科技有限公司 | Voiceprint feature registration method and device, computer equipment and storage medium |
CN110738524A (en) * | 2019-10-15 | 2020-01-31 | 上海云从企业发展有限公司 | service data management method, system, equipment and medium |
CN110782902A (en) * | 2019-11-06 | 2020-02-11 | 北京远鉴信息技术有限公司 | Audio data determination method, apparatus, device and medium |
CN110875043B (en) * | 2019-11-11 | 2022-06-17 | 广州国音智能科技有限公司 | Voiceprint recognition method and device, mobile terminal and computer readable storage medium |
CN111128198B (en) * | 2019-12-25 | 2022-10-28 | 厦门快商通科技股份有限公司 | Voiceprint recognition method, voiceprint recognition device, storage medium, server and voiceprint recognition system |
CN111243601B (en) * | 2019-12-31 | 2023-04-07 | 北京捷通华声科技股份有限公司 | Voiceprint clustering method and device, electronic equipment and computer-readable storage medium |
CN111415669B (en) * | 2020-04-15 | 2023-03-31 | 厦门快商通科技股份有限公司 | Voiceprint model construction method, device and equipment |
CN112258220B (en) * | 2020-10-12 | 2024-06-07 | 北京豆牛网络科技有限公司 | Information acquisition and analysis method, system, electronic equipment and computer readable medium |
CN112992181A (en) * | 2021-02-08 | 2021-06-18 | 上海哔哩哔哩科技有限公司 | Audio classification method and device |
WO2024049311A1 (en) * | 2022-08-30 | 2024-03-07 | Biometriq Sp. Z O.O. | Method of selecting the optimal voiceprint |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9300790B2 (en) * | 2005-06-24 | 2016-03-29 | Securus Technologies, Inc. | Multi-party conversation analyzer and logger |
CN102509547B * | 2011-12-29 | 2013-06-19 | 辽宁工业大学 | Method and system for voiceprint recognition based on vector quantization |
CN106095799A * | 2016-05-30 | 2016-11-09 | 广州多益网络股份有限公司 | Voice storage and retrieval method and device |
CN106782564B (en) * | 2016-11-18 | 2018-09-11 | 百度在线网络技术(北京)有限公司 | Method and apparatus for handling voice data |
2018
- 2018-01-12 CN CN201810031164.9A patent/CN108460081B/en active Active
- 2018-02-26 WO PCT/CN2018/077234 patent/WO2019136801A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2019136801A1 (en) | 2019-07-18 |
CN108460081A (en) | 2018-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108460081B (en) | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium | |
CN106683680B (en) | Speaker recognition method and device, computer equipment and computer readable medium | |
CN106847292B (en) | Voiceprint recognition method and device | |
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization | |
CN107481717B (en) | Acoustic model training method and system | |
US6704708B1 (en) | Interactive voice response system | |
CN109243465A (en) | Voiceprint authentication method, device, computer equipment and storage medium | |
JP6303971B2 (en) | Speaker change detection device, speaker change detection method, and computer program for speaker change detection | |
CN110265040A (en) | Training method, device, storage medium and the electronic equipment of sound-groove model | |
CN110223673B (en) | Voice processing method and device, storage medium and electronic equipment | |
WO2019037205A1 (en) | Voice fraud identifying method and apparatus, terminal device, and storage medium | |
CN110415687A (en) | Method of speech processing, device, medium, electronic equipment | |
CN107680582A (en) | Acoustic model training method, speech recognition method, device, equipment and medium | |
CN110246490A (en) | Voice keyword detection method and relevant apparatus | |
JPH05216490A (en) | Apparatus and method for speech coding and apparatus and method for speech recognition | |
CN106128465A (en) | Voiceprint recognition system and method | |
CN109801634A (en) | Voiceprint feature fusion method and device | |
CN113436612B (en) | Intention recognition method, device, equipment and storage medium based on voice data | |
CN113223536B (en) | Voiceprint recognition method and device and terminal equipment | |
CN110428853A (en) | Voice activity detection method, Voice activity detection device and electronic equipment | |
CN109817196A (en) | Noise cancellation method, device, system, equipment and storage medium | |
CN110767238A (en) | Blacklist identification method, apparatus, device and storage medium based on address information | |
CN111048072A (en) | Voiceprint recognition method applied to power enterprises | |
CN112951256A (en) | Voice processing method and device | |
CN110033786A (en) | Gender identification method, apparatus, equipment and readable storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||