CN111222005B

CN111222005B - Voiceprint data reordering method and device, electronic equipment and storage medium

Info

Publication number: CN111222005B
Application number: CN202010018417.6A
Authority: CN
Inventors: 孙伟; 李永超; 方昕; 黄志华; 柳林
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2020-01-08
Filing date: 2020-01-08
Publication date: 2023-01-13
Anticipated expiration: 2040-01-08
Also published as: CN111222005A

Abstract

The application provides a method, a device, electronic equipment and a storage medium for reordering voiceprint data. Similar voiceprint data corresponding to target voiceprint data are obtained from a preset voiceprint database; a first similarity score between the similar voiceprint data and the target voiceprint data is calculated; a target nearest neighbor set corresponding to the target voiceprint data is optimized to obtain a target optimized nearest neighbor set, and a similar nearest neighbor set corresponding to the similar voiceprint data is optimized to obtain a similar optimized nearest neighbor set; a second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set is then calculated; and finally, reordering data of the target voiceprint data and the similar voiceprint data are obtained by calculation based on the first similarity score and the second similarity score. The complexity of the voiceprint data retrieval process can thereby be reduced, and the accuracy of the voiceprint data reordering result is greatly improved.

Description

Voiceprint data reordering method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of data retrieval, and in particular, to a method and an apparatus for reordering voiceprint data, an electronic device, and a computer storage medium.

Background

With the development of technology, reordering has become a research focus. Reordering is the process of re-ranking original search results, starting from the original search ordering, by mining the internal associations of the data or by drawing on external knowledge and manual intervention. For example, in most image retrieval systems a user provides query text, and the retrieval system returns pictures with a high matching degree by matching the query text against text information extracted from the metadata attached to each picture. Image reordering then further optimizes the picture sequence by extracting the visual information of the pictures and analyzing the visual associations between them, thereby improving query performance.

In conventional voiceprint data retrieval methods, the voiceprint database contains so much data that directly applying a reordering method can lead to insufficient memory and inaccurate retrieval results, which greatly reduces the accuracy of the voiceprint data retrieval result.

Disclosure of Invention

Based on the above problems, the application provides a method, an apparatus, an electronic device and a storage medium for reordering voiceprint data, which can select a certain amount of similar voiceprint data from a voiceprint database based on target voiceprint data to reduce occupied memory, and then perform optimization processing based on a nearest neighbor set of the similar voiceprint data to finally obtain a reordering result, thereby avoiding the situation of insufficient memory during retrieval, and greatly improving the accuracy of the voiceprint data reordering result.

A first aspect of the embodiments of the present application provides a method for reordering voiceprint data, where the method includes:

acquiring similar voiceprint data corresponding to the target voiceprint data from a preset voiceprint database;

calculating a first similarity score between the similar voiceprint data and the target voiceprint data;

optimizing a target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing a similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set;

calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set;

and calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordering data of the similar voiceprint data.

A second aspect of the embodiments of the present application provides a voiceprint data reordering apparatus, where the apparatus includes a processing unit, where the processing unit is configured to:

A third aspect of embodiments of the present application provides an electronic device, comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps as described in any one of the first aspect of embodiments of the present application.

A fourth aspect of embodiments of the present application provides a computer storage medium storing a computer program comprising program instructions that, when executed by a processor, cause the processor to perform a method as described in any one of the first aspect of embodiments of the present application.

A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps as described in any one of the methods of the first aspect of embodiments of the present application. The computer program product may be a software installation package.

By implementing the embodiment of the application, the following beneficial effects can be obtained:

First, similar voiceprint data corresponding to target voiceprint data are acquired from a preset voiceprint database. A first similarity score between the similar voiceprint data and the target voiceprint data is then calculated. Meanwhile, a target nearest neighbor set corresponding to the target voiceprint data is optimized to obtain a target optimized nearest neighbor set, and a similar nearest neighbor set corresponding to the similar voiceprint data is optimized to obtain a similar optimized nearest neighbor set. A second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set is then calculated. Finally, reordering data of the target voiceprint data and the similar voiceprint data are obtained by calculation based on the first similarity score and the second similarity score. The complexity of the voiceprint data retrieval process can thereby be reduced, and the accuracy of the voiceprint data reordering result is greatly improved.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a system architecture diagram of a method for reordering voiceprint data according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a method for reordering voiceprint data according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a purification processing method according to an embodiment of the present application;

Fig. 4 is a schematic diagram of another purification processing method provided in an embodiment of the present application;

Fig. 5 is a schematic diagram of a purification processing method based on Fig. 3 and Fig. 4 according to an embodiment of the present application;

Fig. 6 is a schematic diagram of an expansion processing method based on Fig. 5 according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 8 is a block diagram illustrating functional units of a voiceprint data reordering apparatus according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," and the like in the description and claims of the present application and in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

The electronic device according to the embodiments of the present application may be an electronic device with communication capability. It may include various handheld devices with a wireless communication function, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.

Fig. 1 is a system architecture diagram of a method for reordering voiceprint data in an embodiment of the present application. The system includes a voice recognition unit 110, a preset voiceprint database 120, and a reordering unit 130. The voice recognition unit 110 may be connected to the reordering unit 130 and is configured to preprocess voice data, where the preprocessing may include noise reduction, feature extraction, and the like. The preprocessing yields target voiceprint data of a target user, which is sent to the reordering unit 130 for subsequent voiceprint data retrieval. The reordering unit 130 may be a processor provided with a related algorithm and may be connected to the preset voiceprint database 120. It screens out a certain amount of similar voiceprint data from the preset voiceprint database 120 according to the obtained target voiceprint data, and reorders the similar voiceprint data and the target voiceprint data to obtain a reordering result. The reordering result can improve the accuracy of voiceprint data retrieval.

Through the system architecture, a certain amount of similar voiceprint data can be selected from the preset voiceprint database based on the target voiceprint data to reduce the occupied memory, optimization processing is carried out based on the nearest neighbor set of the similar voiceprint data, and the reordering result is finally obtained, so that the condition of insufficient memory during retrieval is avoided, and the accuracy of the voiceprint data reordering result is greatly improved.

Fig. 2 is a schematic flow chart of a method for reordering voiceprint data according to an embodiment of the present application, and specifically includes the following steps:

Step 201, obtaining similar voiceprint data corresponding to the target voiceprint data from a preset voiceprint database.

The target voiceprint data is obtained by preprocessing the original voice of a target user. Specifically, noise reduction and human voice separation are performed on the original voice, and acoustic feature data of the original voice are then obtained through acoustic feature extraction; the acoustic feature data may include Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP) coefficients, and the like. A mean supervector is then obtained through the identity vector (i-vector) technique by combining the acoustic feature data, a preset Gaussian mixture model, and a factor loading matrix, and the target voiceprint data is finally obtained. The preset voiceprint database may be built on any of a variety of open-source tools, such as Lucene (an open-source full-text search engine toolkit released by the Apache Software Foundation), Elasticsearch (a highly scalable open-source search engine based on Lucene), or Faiss (an open-source, high-performance library developed by Facebook AI Research for similarity search and clustering of dense vectors); the present application takes Faiss as an example.

The cosine distance data between the target voiceprint data and the preset voiceprint data in the preset voiceprint database can be calculated, and the following formula can be specifically adopted:

$$d_1(p, g_i) = 1 - \cos(p, g_i)$$

where p represents the target voiceprint data, g_i represents the i-th preset voiceprint data in the preset voiceprint database, cos(p, g_i) represents the cosine similarity between the target voiceprint data and that preset voiceprint data, and d_1(p, g_i) represents the cosine distance between them. The cosine distance data comprise the cosine distance between the target voiceprint data and each preset voiceprint data, and M preset voiceprint data are selected as the similar voiceprint data based on the cosine distance data. The smaller the cosine distance, the higher the similarity between the target voiceprint data and the preset voiceprint data, so the first M preset voiceprint data, in ascending order of cosine distance, can be selected as the similar voiceprint data. The value of M is not particularly limited.

It should be noted that the essence of the target voiceprint data and the preset voiceprint data is a feature vector.

By acquiring similar voiceprint data corresponding to the target voiceprint data from the preset voiceprint database, the preset voiceprint database can be subjected to library reduction, namely, the total data volume is reduced, so that the space complexity can be reduced, and the memory limitation of hardware equipment is avoided.
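The library reduction of step 201 can be sketched as follows. This is a minimal illustration using plain Python lists instead of a Faiss index; the function names `cosine_distance` and `select_similar`, the identifiers `g1` to `g4`, and the toy two-dimensional vectors are assumptions made for the example and do not come from the application.

```python
import math

def cosine_distance(p, g):
    """d1(p, g) = 1 - cos(p, g) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, g))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_g = math.sqrt(sum(b * b for b in g))
    return 1.0 - dot / (norm_p * norm_g)

def select_similar(target, database, m):
    """Return the m (id, distance) pairs with the smallest cosine distance."""
    scored = [(vid, cosine_distance(target, vec)) for vid, vec in database.items()]
    scored.sort(key=lambda pair: pair[1])  # ascending: most similar first
    return scored[:m]

# Hypothetical preset voiceprint database (toy feature vectors).
database = {
    "g1": [1.0, 0.0],
    "g2": [0.0, 1.0],
    "g3": [1.0, 1.0],
    "g4": [-1.0, 0.0],
}
target = [1.0, 0.1]
similar = select_similar(target, database, m=2)
print([vid for vid, _ in similar])
```

Only the M retained vectors take part in the later steps, which is what keeps the memory footprint bounded regardless of the size of the full database.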

Step 202, obtaining a first similarity score between the similar voiceprint data and the target voiceprint data.

The M first similarity scores between the target voiceprint data and the M similar voiceprint data can be obtained from the cosine distance data. That is, the M cosine distances between the target voiceprint data and the M similar voiceprint data are obtained one by one, and each cosine distance is taken as the first similarity score between the target voiceprint data and the corresponding similar voiceprint data, finally yielding M first similarity scores.

By obtaining a first similarity score between the similar voiceprint data and the target voiceprint data, an initial ordering result can be obtained, and preparation is made for subsequent reordering.

Step 203, performing optimization processing on the target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and performing optimization processing on the similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set.

The target nearest neighbor set corresponding to the target voiceprint data may be constructed based on a Hamming distance and a cosine distance: K first similar samples are selected from the M similar voiceprint data to construct the target nearest neighbor set, where K is less than or equal to M. Similarly, M similar nearest neighbor sets corresponding to the M similar voiceprint data may be constructed based on the Hamming distance and the cosine distance, where each similar nearest neighbor set contains K second similar samples, for M × K second similar samples in total. It should be noted that the first similar samples and the second similar samples are also voiceprint feature vectors in nature, and that the target nearest neighbor set and the similar nearest neighbor sets may share samples. For example, if the first similar samples of the target nearest neighbor set are {a, b, c, d}, the second similar samples of the first similar nearest neighbor set are {z, o, v, y}, and the second similar samples of the second similar nearest neighbor set are {a, b, v, w}, then the first similar nearest neighbor set does not overlap with the target nearest neighbor set, while the second similar nearest neighbor set overlaps with it in the samples a and b.

For convenience of understanding, the method for constructing a nearest neighbor set is described here in a unified manner. A sign function may be applied to each dimension of the target voiceprint data and the similar voiceprint data to obtain binary codes, which form a hash index library. A first number of similar samples are then retrieved based on the Hamming distance, and a second number of similar samples are selected from the first number of similar samples based on the cosine distance to obtain the nearest neighbor set. This ensures that the nearest neighbor set has higher precision.

After the target nearest neighbor set and the similar nearest neighbor sets are obtained, they may be subjected to purification processing to obtain a target refined nearest neighbor set and similar refined nearest neighbor sets.
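The two-stage construction (coarse retrieval by Hamming distance on sign codes, then refinement by cosine distance) might look like the sketch below. Mapping the sign of each dimension to a bit (1 for non-negative, 0 for negative), the helper names, the parameter `coarse_n` for the "first number", and the sample vectors are all assumptions for illustration only.

```python
import math

def binarize(vec):
    """Sign function per dimension, encoded as bits (assumed: 1 if x >= 0 else 0)."""
    return tuple(1 if x >= 0 else 0 for x in vec)

def hamming(code_a, code_b):
    """Number of positions at which two binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def cosine_distance(p, g):
    dot = sum(a * b for a, b in zip(p, g))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_g = math.sqrt(sum(b * b for b in g))
    return 1.0 - dot / (norm_p * norm_g)

def nearest_neighbor_set(query_id, vectors, coarse_n, k):
    """Keep coarse_n candidates by Hamming distance, then the k closest by cosine distance."""
    codes = {vid: binarize(vec) for vid, vec in vectors.items()}  # hash index library
    others = [vid for vid in vectors if vid != query_id]
    coarse = sorted(others, key=lambda vid: hamming(codes[query_id], codes[vid]))[:coarse_n]
    return sorted(coarse, key=lambda vid: cosine_distance(vectors[query_id], vectors[vid]))[:k]

# Hypothetical feature vectors: p is the target, a to d are similar voiceprint data.
vectors = {
    "p": [1.0, 0.5, -0.2],
    "a": [0.9, 0.6, -0.1],
    "b": [1.0, 0.4, 0.3],
    "c": [-1.0, -0.5, 0.2],
    "d": [0.2, 1.0, -0.9],
}
print(nearest_neighbor_set("p", vectors, coarse_n=3, k=2))
```

The Hamming stage only compares short binary codes, so it is cheap; the cosine stage then runs on a small candidate list, which is where the precision comes from.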

Optionally, K first sample nearest neighbor sets corresponding to the K first similar samples may be obtained first, together with M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; that is, the nearest neighbor sets of the first similar samples and of the second similar samples are obtained in turn. Then, K first coincidence degrees between the K first sample nearest neighbor sets and the target nearest neighbor set are calculated, and M × K second coincidence degrees between the M × K second sample nearest neighbor sets and the corresponding M similar nearest neighbor sets are calculated. Next, the first similar samples whose first coincidence degree is greater than a preset coincidence threshold are screened out of the K first similar samples as first refined samples, and M groups of second similar samples whose second coincidence degree is greater than the preset coincidence threshold are screened out of the M × K second similar samples as second refined samples. Finally, the first refined samples are retained to obtain the target refined nearest neighbor set, and the second refined samples are retained to obtain the M similar refined nearest neighbor sets.

For convenience of understanding, the purification processing of the target nearest neighbor set is illustrated below with reference to Fig. 3, which is a schematic diagram of a purification processing method provided in an embodiment of the present application. N(p, k) represents the target nearest neighbor set, p represents the target voiceprint data, and k represents the number of samples; here k is 4 and the first similar samples are {b, c, d, g}. The first sample nearest neighbor set N(b, k) of the first similar sample b is {e, f, c, g}, N(c, k) of c is {p, h, f, b}, N(d, k) of d is {o, t, z, y}, and N(g, k) of g is {s, l, h, d}. The coincident samples of N(b, k) and N(p, k) are c and g, so the first coincidence degree, i.e., the number of coincident samples, is 2; the coincident sample of N(c, k) and N(p, k) is b, so the number is 1; N(d, k) and N(p, k) have no coincident samples, so the number is 0; and the coincident sample of N(g, k) and N(p, k) is d, so the number is 1. With a preset coincidence threshold of 2, only the first sample nearest neighbor set corresponding to the first similar sample b reaches the threshold, so b is retained and c, d, and g are deleted, and the first refined sample of the resulting target refined nearest neighbor set N*(p, k) is {b}. It should be noted that the purification processing of the target nearest neighbor set and that of the similar nearest neighbor sets follow the same steps, so the M similar refined nearest neighbor sets can be obtained in the same manner, which is not described herein again.
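The coincidence-based purification can be reproduced on the sample sets of Fig. 3 with a few lines of Python. The function name `purify_by_overlap` is hypothetical, and the comparison is written as "at least the threshold" because that is how the Fig. 3 example treats a coincidence degree of 2 against a threshold of 2; a strict "greater than" reading would need `>` instead.

```python
def purify_by_overlap(target_set, sample_sets, threshold):
    """Keep samples whose own nearest neighbor set shares at least
    `threshold` members with the target nearest neighbor set."""
    target_members = set(target_set)
    kept = []
    for sample in target_set:
        overlap = len(set(sample_sets[sample]) & target_members)
        if overlap >= threshold:  # assumption: threshold counts as met when equal
            kept.append(sample)
    return kept

# Sets taken from the Fig. 3 example.
n_p = ["b", "c", "d", "g"]              # N(p, k), k = 4
sample_sets = {
    "b": ["e", "f", "c", "g"],          # N(b, k): shares c and g with N(p, k)
    "c": ["p", "h", "f", "b"],          # N(c, k): shares b
    "d": ["o", "t", "z", "y"],          # N(d, k): shares nothing
    "g": ["s", "l", "h", "d"],          # N(g, k): shares d
}
refined = purify_by_overlap(n_p, sample_sets, threshold=2)
print(refined)  # ['b']
```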

Optionally, another purification processing method may be used. The K first sample nearest neighbor sets corresponding to the K first similar samples and the M × K second sample nearest neighbor sets corresponding to the M × K second similar samples are obtained first. Then, the first similar samples whose first sample nearest neighbor set contains the target voiceprint data are screened out as first refined samples, and the M groups of second similar samples whose second sample nearest neighbor set contains the corresponding similar voiceprint data are screened out as second refined samples. Finally, the first refined samples are retained to obtain the target refined nearest neighbor set, and the second refined samples are retained to obtain the M similar refined nearest neighbor sets.

For convenience of understanding, this other purification processing of the target nearest neighbor set is illustrated below with reference to Fig. 4, which is a schematic diagram of another purification processing method provided in an embodiment of the present application. As shown, N(p, k) represents the target nearest neighbor set, which includes the same first similar samples and first sample nearest neighbor sets as described for Fig. 3; they are not repeated here. The target voiceprint data p appears only in the first sample nearest neighbor set corresponding to the first similar sample c, that is, the first similar sample c and the target voiceprint data p are nearest neighbors of each other, whereas the first similar samples b, d, and g do not satisfy the condition of being mutual nearest neighbors with the target voiceprint data. Therefore c is retained and b, d, and g are deleted, and the first refined sample of the resulting target refined nearest neighbor set N*(p, k) is {c}. It should be noted that the purification processing of the target nearest neighbor set and that of the similar nearest neighbor sets follow the same steps, so the M similar refined nearest neighbor sets can be obtained in the same manner, which is not described herein again.

Optionally, in some cases, using only the purification processing method of Fig. 3 or only that of Fig. 4 may delete a positive sample, i.e., a sample that should be retained after purification. The two purification processing methods may therefore be combined. The first similar samples whose first coincidence degree is greater than the preset coincidence threshold are screened out of the K first similar samples as first refined samples, and the M groups of second similar samples whose second coincidence degree is greater than the preset coincidence threshold are screened out of the M × K second similar samples as second refined samples. At the same time, the first similar samples whose first sample nearest neighbor set contains the target voiceprint data are also screened out as first refined samples, and the M groups of second similar samples whose second sample nearest neighbor set contains the corresponding similar voiceprint data are also screened out as second refined samples. Finally, the first refined samples are retained to obtain the target refined nearest neighbor set, and the second refined samples are retained to obtain the M similar refined nearest neighbor sets.

For convenience of understanding, this combined purification processing is illustrated with reference to Fig. 5, which is a schematic diagram of a purification processing method based on Fig. 3 and Fig. 4 provided in an embodiment of the present application. Since the first similar sample b is a positive sample in Fig. 3 and the first similar sample c is a positive sample in Fig. 4, combining the purification processing methods of Fig. 3 and Fig. 4 allows both b and c to be taken as positive samples, and the first refined samples of the final target refined nearest neighbor set N*(p, k) are {b, c}. It should be noted that the purification processing of the target nearest neighbor set and that of the similar nearest neighbor sets follow the same steps, so the M similar refined nearest neighbor sets can be obtained in the same manner, which is not described herein again.
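The mutual nearest neighbor check of Fig. 4 reduces to a membership test, as in this small sketch. The function name `purify_by_reciprocity` is hypothetical; the sets are those of the Fig. 3 and Fig. 4 examples.

```python
def purify_by_reciprocity(target_id, target_set, sample_sets):
    """Keep samples whose own nearest neighbor set contains the target,
    i.e. samples that are mutual nearest neighbors of the target."""
    return [s for s in target_set if target_id in sample_sets[s]]

n_p = ["b", "c", "d", "g"]              # N(p, k)
sample_sets = {
    "b": ["e", "f", "c", "g"],
    "c": ["p", "h", "f", "b"],          # the only set that contains p
    "d": ["o", "t", "z", "y"],
    "g": ["s", "l", "h", "d"],
}
refined = purify_by_reciprocity("p", n_p, sample_sets)
print(refined)  # ['c']
```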
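Combining the two criteria amounts to keeping a sample when either test passes. The sketch below applies both tests in one pass over the Fig. 3 and Fig. 4 data; `purify_combined` is a hypothetical name, and the threshold is again treated as met when the coincidence count equals it, following the Fig. 3 example.

```python
def purify_combined(target_id, target_set, sample_sets, threshold):
    """Keep a sample if its neighbor set overlaps the target set enough
    (Fig. 3 criterion) or if it contains the target itself (Fig. 4 criterion)."""
    target_members = set(target_set)
    kept = []
    for sample in target_set:
        neighbors = set(sample_sets[sample])
        enough_overlap = len(neighbors & target_members) >= threshold
        reciprocal = target_id in neighbors
        if enough_overlap or reciprocal:
            kept.append(sample)
    return kept

n_p = ["b", "c", "d", "g"]
sample_sets = {
    "b": ["e", "f", "c", "g"],
    "c": ["p", "h", "f", "b"],
    "d": ["o", "t", "z", "y"],
    "g": ["s", "l", "h", "d"],
}
refined = purify_combined("p", n_p, sample_sets, threshold=2)
print(refined)  # ['b', 'c']
```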

Therefore, the interference of the negative sample, namely the sample data which should not be reserved, can be eliminated through purification treatment, the accuracy of the nearest neighbor set is greatly improved, and the accuracy of subsequent retrieval is also improved.

After the target refined nearest neighbor set and the similar refined nearest neighbor sets are obtained, the target refined nearest neighbor set may be expanded based on the K first similar samples to obtain the target optimized nearest neighbor set, and the M similar refined nearest neighbor sets may be expanded based on the M × K second similar samples to obtain M similar optimized nearest neighbor sets.

Specifically, a first refined nearest neighbor set corresponding to each first refined sample is first obtained, and M groups of second refined nearest neighbor sets corresponding to the second refined samples are obtained. Then, a first number of coincident samples between the first refined nearest neighbor set and the target refined nearest neighbor set is obtained, and M groups of second numbers of coincident samples between the M groups of second refined nearest neighbor sets and the corresponding similar refined nearest neighbor sets are obtained. Next, the first refined similar samples in each first refined nearest neighbor set whose first number of coincident samples is greater than a preset coincident sample number threshold are screened out as first extended samples, and the M groups of second refined similar samples in the second refined nearest neighbor sets whose second number of coincident samples is greater than the preset coincident sample number threshold are screened out as second extended samples. Finally, the union of the first extended samples and the first refined samples is taken to obtain the target optimized nearest neighbor set, and the union of the second extended samples and the second refined samples is taken to obtain the M similar optimized nearest neighbor sets.

For convenience of understanding, the expansion processing of the target refined nearest neighbor set is illustrated with reference to Fig. 6, which is a schematic diagram of an expansion processing method based on Fig. 5 provided in an embodiment of the present application. The number of samples included in a first refined nearest neighbor set may be set as needed. Since the target refined nearest neighbor set contains 2 first refined samples, the first refined nearest neighbor set may be taken as a k/2 nearest neighbor set, so that it also contains 2 samples. As shown, the first refined samples of N*(p, k) are {b, c}. The first refined nearest neighbor set N*(b, k/2) corresponding to the first refined sample b contains the first refined similar samples {c, f}, and the first refined nearest neighbor set N*(c, k/2) corresponding to the first refined sample c is {p, f}. N*(b, k/2) and N*(p, k) share the coincident sample c, so the number of coincident samples is 1; the number of coincident samples between N*(c, k/2) and N*(p, k) is 0. With a preset coincident sample number threshold of 1, the number of coincident samples between N*(b, k/2) and N*(p, k) is greater than or equal to the threshold, so {c, f} are first extended samples, and the target optimized nearest neighbor set N'(p, k), obtained by taking the union of {c, f} and {b, c}, is {b, c, f}.
It should be noted that the expansion processing of the target refined nearest neighbor set and that of the similar refined nearest neighbor sets follow the same steps, so the M similar optimized nearest neighbor sets can be obtained in the same way; for convenience of description, they are denoted N'(g_M, k), and details are not repeated here.
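The expansion of Fig. 6 can be traced with the following sketch. The function name `expand` and the parameter `min_overlap` are hypothetical; the comparison uses "greater than or equal to", as in the Fig. 6 example, and the data are the sets given there.

```python
def expand(refined_set, refined_neighbor_sets, min_overlap):
    """Add the members of a refined sample's k/2 neighbor set when that set
    shares at least `min_overlap` members with the target refined set."""
    refined_members = set(refined_set)
    optimized = list(refined_set)
    for sample in refined_set:
        candidates = refined_neighbor_sets[sample]
        overlap = len(set(candidates) & refined_members)
        if overlap >= min_overlap:
            for candidate in candidates:      # union, preserving insertion order
                if candidate not in optimized:
                    optimized.append(candidate)
    return optimized

refined = ["b", "c"]                 # N*(p, k)
half_sets = {
    "b": ["c", "f"],                 # N*(b, k/2): shares c with N*(p, k)
    "c": ["p", "f"],                 # N*(c, k/2): shares nothing
}
optimized = expand(refined, half_sets, min_overlap=1)
print(optimized)  # ['b', 'c', 'f']
```

The sample f enters the optimized set even though it was not among the original first similar samples {b, c, d, g}, which is the effect the expansion step is meant to achieve.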

Therefore, samples which are not in the original nearest neighbor set can be obtained through the expansion processing, the accuracy of the nearest neighbor samples is improved, and the accuracy of subsequent retrieval is also improved.

Step 204, calculating a second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set.

M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets may be calculated based on the Jaccard distance. The Jaccard distance d_2(p, g_M) between the target optimized nearest neighbor set and each of the M similar optimized nearest neighbor sets follows the standard definition:

$$d_2(p, g_M) = 1 - J(p, g_M), \qquad J(p, g_M) = \frac{|N'(p, k) \cap N'(g_M, k)|}{|N'(p, k) \cup N'(g_M, k)|}$$

where J(p, g_M) denotes the Jaccard similarity index, g_M represents any one of the M similar voiceprint data, N'(p, k) represents the target optimized nearest neighbor set, and N'(g_M, k) represents the similar optimized nearest neighbor set corresponding to that similar voiceprint data.

Optionally, to reduce the amount of computation, the target optimized nearest neighbor set and the similar optimized nearest neighbor sets may be encoded into vectors, and a Gaussian kernel function is used to obtain:

Thus, the calculation of the Jaccard distance can be simplified to:

thereby obtaining the M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets.
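The two formulas referred to in this passage are not reproduced in this text, so the following is only a rough sketch of the general idea and not the original encoding. It assumes that each nearest neighbor set is encoded as a vector over all samples, with a kernel weight exp(−d) for members (d being the original distance to the member) and 0 for non-members, and that intersection and union are replaced by element-wise minimum and maximum. The weight function, the function names and the sample distances are all assumptions.

```python
import math


def encode_neighbor_set(neighbor_distances, index):
    """Encode a neighbor set as a dense vector.

    neighbor_distances: dict mapping sample id -> original distance.
    index: ordered list of all sample ids (defines vector positions).
    Members get weight exp(-distance); non-members get 0.
    """
    return [math.exp(-neighbor_distances[s]) if s in neighbor_distances else 0.0
            for s in index]


def weighted_jaccard_distance(vec_a, vec_b):
    """1 - sum(min) / sum(max), a vector form of the Jaccard distance."""
    num = sum(min(x, y) for x, y in zip(vec_a, vec_b))
    den = sum(max(x, y) for x, y in zip(vec_a, vec_b))
    # Assumption of this sketch: two all-zero vectors have distance 0.
    return 0.0 if den == 0 else 1.0 - num / den


index = ["b", "c", "f", "h"]
v_p = encode_neighbor_set({"b": 0.2, "c": 0.3, "f": 0.5}, index)
v_g = encode_neighbor_set({"c": 0.3, "f": 0.5, "h": 0.4}, index)
d2 = weighted_jaccard_distance(v_p, v_g)
```

With all member weights equal to 1, this reduces to the ordinary set-based Jaccard distance, which is one reason the min/max form is a convenient vectorised substitute.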

By calculating a second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set, the similarity between nearest neighbor sets can be determined, so that a second sort result can be obtained according to the similarity between sets, and preparation is made for subsequent reordering.

Step 205, calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordered data of the similar voiceprint data.

First, a first weight corresponding to the M first similarity scores and a second weight corresponding to the M second similarity scores may be obtained based on a weighted average algorithm, and then M third similarity scores may be obtained through calculation according to the first weight, the second weight, the M first similarity scores, and the M second similarity scores; and finally, reordering the target voiceprint data and the similar voiceprint data according to the magnitude of the M third similarity scores to obtain the reordered data.

Specifically, the third similarity score is:

d(p, g_M) = λ·d₁(p, g_M) + (1 − λ)·d₂(p, g_M)

Optionally, λ may be 0.6, in which case the first weight is 0.6 and the second weight is 0.4.

After the M third similarity scores are obtained through calculation, reordering can be performed according to the score values, and reordering results are obtained.
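The fusion and reordering can be sketched as follows. The snippet assumes that both scores are distances, so that a smaller third score means a closer match and the candidates are sorted in ascending order; the original text does not state the sort direction. The function name `rerank` and the sample values are hypothetical.

```python
def rerank(candidates, d1, d2, lam=0.6):
    """Fuse two distance lists and return candidates sorted by fused score.

    candidates: ids of the M similar voiceprint data.
    d1: M first similarity scores; d2: M second similarity scores.
    lam: weight of the first score; (1 - lam) weights the second.
    """
    if not (len(candidates) == len(d1) == len(d2)):
        raise ValueError("candidates, d1 and d2 must have the same length")
    fused = [lam * a + (1.0 - lam) * b for a, b in zip(d1, d2)]
    # Assumption: smaller fused distance ranks first.
    order = sorted(range(len(candidates)), key=lambda i: fused[i])
    return [(candidates[i], fused[i]) for i in order]


candidates = ["g1", "g2", "g3"]
d1 = [0.20, 0.10, 0.40]
d2 = [0.50, 0.90, 0.10]
ranking = rerank(candidates, d1, d2)
print([c for c, _ in ranking])  # ['g3', 'g1', 'g2']
```

Ranked by d₁ alone, the order would be g2, g1, g3; the second score moves g3 to the front because its neighbor set is close to that of the target.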

Through the steps, a certain amount of similar voiceprint data can be selected from the voiceprint database based on the target voiceprint data to reduce the occupied memory, optimization processing is performed based on the nearest neighbor set of the similar voiceprint data, the reordering result is finally obtained, the condition of insufficient memory during retrieval is avoided, and the accuracy of the voiceprint data reordering result is greatly improved.

Fig. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present application, and includes an application processor 701, a communication interface 702, and a memory 703, where the application processor 701, the communication interface 702, and the memory 703 are connected to each other through a bus 704, and the bus 704 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, for example. The bus 704 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus. Wherein the memory 703 is configured to store a computer program comprising program instructions, and the application processor 701 is configured to call the program instructions to perform the method of: acquiring similar voiceprint data corresponding to the target voiceprint data from a preset voiceprint database; obtaining a first similarity score between the similar voiceprint data and the target voiceprint data; optimizing a target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing a similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set; calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set; and calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordering data of the similar voiceprint data.

In a possible embodiment, in the aspect of obtaining similar voiceprint data corresponding to the target voiceprint data from the preset voiceprint database, the instructions in the program are specifically configured to perform the following operations: calculating cosine distance data between the target voiceprint data and preset voiceprint data in the preset voiceprint database; and selecting M pieces of preset voiceprint data as the similar voiceprint data based on the size of the cosine distance data, wherein M is a positive integer.

In one possible embodiment, in terms of calculating a first similarity score between the similar voiceprint data and the target voiceprint data, the instructions in the program are specifically configured to perform the following operation: obtaining M first similarity scores between the target voiceprint data and the M similar voiceprint data based on the size of the cosine distance data.
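The selection of the M similar voiceprint data and their first similarity scores can be sketched as below. The snippet assumes that each voiceprint is a fixed-length embedding vector, that the cosine distance is defined as 1 minus the cosine similarity, and that the M smallest distances are selected; the function names and the toy database are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def select_similar(target, database, m):
    """Return the m (id, distance) pairs closest to target, nearest first."""
    scored = [(vid, cosine_distance(target, vec)) for vid, vec in database.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:m]


database = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
top = select_similar([1.0, 0.0], database, m=2)
print([vid for vid, _ in top])  # ['a', 'b']
```

The distances returned alongside the ids can serve directly as the M first similarity scores.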

In a possible embodiment, in the aspect that the optimization processing is performed on the target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and the optimization processing is performed on the similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set, the instructions in the program are specifically configured to perform the following operations: retrieving a target nearest neighbor set corresponding to the target voiceprint data based on a hamming distance and a cosine distance, and retrieving M similar nearest neighbor sets corresponding to the M similar voiceprint data based on the hamming distance and the cosine distance, wherein the target nearest neighbor set comprises K first similar samples, the M similar nearest neighbor sets comprise M × K second similar samples, and K is a positive integer; performing purification processing on the target nearest neighbor set based on the K first similar samples to obtain a target purified nearest neighbor set, and performing purification processing on the M similar nearest neighbor sets based on the MxK second similar samples to obtain M similar purified nearest neighbor sets; and performing expansion processing on the target purification nearest neighbor set based on the K first similar samples to obtain a target optimization nearest neighbor set, and performing expansion processing on the M similar purification nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar optimization nearest neighbor sets.

In a possible embodiment, in terms of refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, the instructions in the program are specifically configured to perform the following operations: acquiring K first sample nearest neighbor sets corresponding to the K first similar samples, and acquiring M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; calculating K first degrees of coincidence between the K first sample nearest neighbor sets and the target nearest neighbor set, and M × K second degrees of coincidence between the M × K second sample nearest neighbor sets and the M similar nearest neighbor sets; screening out, from the K first similar samples, the first similar samples whose first degree of coincidence is greater than a preset coincidence threshold as first refined samples, and screening out, from the M × K second similar samples, M groups of second similar samples whose second degree of coincidence is greater than the preset coincidence threshold as second refined samples; and retaining the first refined samples to obtain the target refined nearest neighbor set, and retaining the second refined samples to obtain the M similar refined nearest neighbor sets.
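A minimal sketch of this coincidence-based refinement for the target side is given below. This passage does not define how the degree of coincidence is measured, so the sketch assumes it is the fraction of a sample's nearest neighbors that also belong to the target nearest neighbor set; the function name, the threshold value and the sample data are hypothetical.

```python
def refine_by_overlap(target_neighbors, neighbor_sets, threshold):
    """Keep the neighbors whose own neighbor set overlaps the target's set.

    target_neighbors: the K first similar samples of the target.
    neighbor_sets: dict mapping each sample to its own nearest neighbor set.
    threshold: preset coincidence threshold (a fraction in this sketch).
    """
    target_set = set(target_neighbors)
    refined = []
    for sample in target_neighbors:
        sample_set = set(neighbor_sets.get(sample, ()))
        if not sample_set:
            continue
        # Assumed measure: share of the sample's neighbors found in target_set.
        overlap = len(sample_set & target_set) / len(sample_set)
        if overlap > threshold:
            refined.append(sample)
    return refined


target_neighbors = ["b", "c", "d", "e"]
neighbor_sets = {
    "b": ["c", "d", "p", "x"],  # overlap 2/4
    "c": ["b", "d", "e", "p"],  # overlap 3/4
    "d": ["x", "y", "z", "w"],  # overlap 0/4
    "e": ["b", "x", "y", "z"],  # overlap 1/4
}
refined = refine_by_overlap(target_neighbors, neighbor_sets, threshold=0.4)
print(refined)  # ['b', 'c']
```

The same routine would be applied to each of the M similar nearest neighbor sets to obtain the M similar refined nearest neighbor sets.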

In a possible embodiment, in terms of refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, the instructions in the program are specifically configured to perform the following operations: acquiring K first sample nearest neighbor sets corresponding to the K first similar samples, and acquiring M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; screening out, as first refined samples, the first similar samples whose first sample nearest neighbor set contains the target voiceprint data, and screening out, as second refined samples, M groups of second similar samples whose second sample nearest neighbor set contains the corresponding similar voiceprint data; and retaining the first refined samples to obtain the target refined nearest neighbor set, and retaining the second refined samples to obtain the M similar refined nearest neighbor sets.
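This alternative refinement keeps a neighbor only if the target appears in that neighbor's own nearest neighbor set, that is, the relation is mutual. A minimal sketch for the target side follows; the function name and the sample data are hypothetical.

```python
def refine_by_reciprocity(target_id, target_neighbors, neighbor_sets):
    """Keep the neighbors whose own nearest neighbor set contains the target.

    target_id: identifier of the target voiceprint data.
    target_neighbors: the K first similar samples of the target.
    neighbor_sets: dict mapping each sample to its own nearest neighbor set.
    """
    return [s for s in target_neighbors if target_id in neighbor_sets.get(s, ())]


target_neighbors = ["b", "c", "d", "e"]
neighbor_sets = {
    "b": ["p", "c"],  # contains the target p, kept
    "c": ["p", "f"],  # contains the target p, kept
    "d": ["x", "y"],  # does not contain p, dropped
    "e": ["b", "x"],  # does not contain p, dropped
}
refined = refine_by_reciprocity("p", target_neighbors, neighbor_sets)
print(refined)  # ['b', 'c']
```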

In one possible embodiment, in terms of expanding the target refined nearest neighbor set based on the K first similar samples to obtain a target optimized nearest neighbor set, and expanding the M similar refined nearest neighbor sets based on the M × K second similar samples to obtain M similar optimized nearest neighbor sets, the instructions in the program are specifically configured to perform the following operations: acquiring a first refined nearest neighbor set corresponding to the first refined sample, and acquiring M groups of second refined nearest neighbor sets corresponding to the second refined samples; obtaining a first number of coincident samples between the first refined nearest neighbor set and the target refined nearest neighbor set, and obtaining M groups of second numbers of coincident samples between the M groups of second refined nearest neighbor sets and the similar refined nearest neighbor sets; screening out, as first extended samples, the first refined similar samples corresponding to the first refined nearest neighbor sets whose first number of coincident samples is greater than a preset threshold number of coincident samples, and screening out, as second extended samples, M groups of second refined similar samples corresponding to the M groups of second refined nearest neighbor sets whose second number of coincident samples is greater than the preset threshold number of coincident samples; and merging the first extended samples with the first refined samples to obtain the target optimized nearest neighbor set, and merging the second extended samples with the second refined samples to obtain the M similar optimized nearest neighbor sets.

In one possible embodiment, in terms of calculating a second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set, the instructions in the program are specifically configured to perform the following operation: calculating M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets based on the Jaccard distance.

In one possible embodiment, in the aspect that the calculation based on the first similarity score and the second similarity score obtains the target voiceprint data and the re-ordered data of the similar voiceprint data, the instructions in the program are specifically configured to perform the following operations: acquiring first weights corresponding to the M first similarity scores and second weights corresponding to the M second similarity scores based on a weighted average algorithm; calculating according to the first weight, the second weight, the M first similarity scores and the M second similarity scores to obtain M third similarity scores; and reordering the target voiceprint data and the similar voiceprint data according to the magnitude of the M third similarity scores to obtain reordered data.

The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to realize the above functions, the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments provided herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that, in the embodiment of the present application, the division of the unit is schematic, and is only one logic function division, and when the actual implementation is realized, another division manner may be provided.

Fig. 8 is a block diagram illustrating functional units of a voiceprint data reordering apparatus 800 according to an embodiment of the present disclosure. The voiceprint data reordering device 800 is applied to an electronic device and comprises a processing unit 801, a communication unit 802 and a storage unit 803, wherein the processing unit 801 is used for executing any step in the method embodiments, and when data transmission such as sending is executed, the communication unit 802 can be optionally called to complete corresponding operation. The details will be described below.

The processing unit 801 is configured to obtain similar voiceprint data corresponding to the target voiceprint data from a preset voiceprint database; acquiring a first similarity score between the similar voiceprint data and the target voiceprint data; optimizing a target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing a similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set; calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set; and calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordering data of the similar voiceprint data.

In a possible embodiment, in the aspect of obtaining similar voiceprint data corresponding to the target voiceprint data from the preset voiceprint database, the processing unit 801 is specifically configured to: calculating cosine distance data between the target voiceprint data and preset voiceprint data in the preset voiceprint database; and selecting M pieces of preset voiceprint data as the similar voiceprint data based on the size of the cosine distance data, wherein M is a positive integer.

In one possible embodiment, in terms of the calculating the first similarity score between the similar voiceprint data and the target voiceprint data, the processing unit 801 is specifically configured to: and obtaining M first similarity scores of the target voiceprint data and the M similar voiceprint data based on the size of the cosine distance data.

In a possible embodiment, in terms of performing optimization processing on the target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and performing optimization processing on the similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set, the processing unit 801 is specifically configured to: retrieving a target nearest neighbor set corresponding to the target voiceprint data based on a hamming distance and a cosine distance, and retrieving M similar nearest neighbor sets corresponding to the M similar voiceprint data based on the hamming distance and the cosine distance, wherein the target nearest neighbor set comprises K first similar samples, the M similar nearest neighbor sets comprise M × K second similar samples, and K is a positive integer; carrying out purification treatment on the target nearest neighbor set based on the K first similar samples to obtain a target purified nearest neighbor set, and carrying out purification treatment on the M similar nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar purified nearest neighbor sets; and performing expansion processing on the target purification nearest neighbor set based on the K first similar samples to obtain a target optimization nearest neighbor set, and performing expansion processing on the M similar purification nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar optimization nearest neighbor sets.

In a possible embodiment, in terms of refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, the processing unit 801 is specifically configured to: acquire K first sample nearest neighbor sets corresponding to the K first similar samples, and acquire M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; calculate K first degrees of coincidence between the K first sample nearest neighbor sets and the target nearest neighbor set, and M × K second degrees of coincidence between the M × K second sample nearest neighbor sets and the M similar nearest neighbor sets; screen out, from the K first similar samples, the first similar samples whose first degree of coincidence is greater than a preset coincidence threshold as first refined samples, and screen out, from the M × K second similar samples, M groups of second similar samples whose second degree of coincidence is greater than the preset coincidence threshold as second refined samples; and retain the first refined samples to obtain the target refined nearest neighbor set, and retain the second refined samples to obtain the M similar refined nearest neighbor sets.

In a possible embodiment, in terms of refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, the processing unit 801 is specifically configured to: acquire K first sample nearest neighbor sets corresponding to the K first similar samples, and acquire M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; screen out, as first refined samples, the first similar samples whose first sample nearest neighbor set contains the target voiceprint data, and screen out, as second refined samples, M groups of second similar samples whose second sample nearest neighbor set contains the corresponding similar voiceprint data; and retain the first refined samples to obtain the target refined nearest neighbor set, and retain the second refined samples to obtain the M similar refined nearest neighbor sets.

In a possible embodiment, in terms of expanding the target refined nearest neighbor set based on the K first similar samples to obtain a target optimized nearest neighbor set, and expanding the M similar refined nearest neighbor sets based on the M × K second similar samples to obtain M similar optimized nearest neighbor sets, the processing unit 801 is specifically configured to: acquire a first refined nearest neighbor set corresponding to the first refined sample, and acquire M groups of second refined nearest neighbor sets corresponding to the second refined samples; obtain a first number of coincident samples between the first refined nearest neighbor set and the target refined nearest neighbor set, and obtain M groups of second numbers of coincident samples between the M groups of second refined nearest neighbor sets and the similar refined nearest neighbor sets; screen out, as first extended samples, the first refined similar samples corresponding to the first refined nearest neighbor sets whose first number of coincident samples is greater than a preset threshold number of coincident samples, and screen out, as second extended samples, M groups of second refined similar samples corresponding to the M groups of second refined nearest neighbor sets whose second number of coincident samples is greater than the preset threshold number of coincident samples; and merge the first extended samples with the first refined samples to obtain the target optimized nearest neighbor set, and merge the second extended samples with the second refined samples to obtain the M similar optimized nearest neighbor sets.

In a possible embodiment, in terms of calculating a second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set, the processing unit 801 is specifically configured to: calculate M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets based on the Jaccard distance.

In a possible embodiment, in terms of obtaining the re-ranking data of the target voiceprint data and the similar voiceprint data by performing the calculation based on the first similarity score and the second similarity score, the processing unit 801 is specifically configured to: acquiring first weights corresponding to the M first similarity scores and second weights corresponding to the M second similarity scores based on a weighted average algorithm; calculating according to the first weight, the second weight, the M first similarity scores and the M second similarity scores to obtain M third similarity scores; and reordering the target voiceprint data and the similar voiceprint data according to the M third similarity scores to obtain reordered data.

Embodiments of the present application further provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enables a computer to execute part or all of the steps of any one of the methods as described in the above method embodiments, and the computer includes an electronic device.

Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a division of logical functions, and other divisions are possible in actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.

The integrated unit may be stored in a computer-readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application in essence, or the part thereof that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.

Those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments may be implemented by a program, and the program may be stored in a computer-readable memory, which may include: flash memory disks, read-only memories (ROMs), random access memories (RAMs), magnetic disks, optical disks, and the like.

The above description of the embodiments is only for the purpose of helping to understand the method of the present application and its core ideas; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A method of reordering voiceprint data, the method comprising:

obtaining a first similarity score between the similar voiceprint data and the target voiceprint data;

purifying a target nearest neighbor set corresponding to the target voiceprint data, expanding a result obtained by the purification to obtain a target optimized nearest neighbor set, purifying a similar nearest neighbor set corresponding to the similar voiceprint data, and expanding a result obtained by the purification to obtain a similar optimized nearest neighbor set;

calculating a second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set;

2. The method according to claim 1, wherein the obtaining similar voiceprint data corresponding to the target voiceprint data from the preset voiceprint database comprises:

calculating cosine distance data between the target voiceprint data and preset voiceprint data in the preset voiceprint database;

and selecting M pieces of preset voiceprint data as the similar voiceprint data based on the size of the cosine distance data, wherein M is a positive integer.

3. The method of claim 2, wherein said calculating a first similarity score between the similar voiceprint data and the target voiceprint data comprises:

and obtaining M first similarity scores of the target voiceprint data and the M similar voiceprint data based on the size of the cosine distance data.

4. The method according to claim 2, wherein the purifying the target nearest neighbor set corresponding to the target voiceprint data and expanding the result obtained by the purifying to obtain a target optimized nearest neighbor set, and the purifying the similar nearest neighbor set corresponding to the similar voiceprint data and expanding the result obtained by the purifying to obtain a similar optimized nearest neighbor set, comprises:

retrieving a target nearest neighbor set corresponding to the target voiceprint data based on a hamming distance and a cosine distance, and retrieving M similar nearest neighbor sets corresponding to the M similar voiceprint data based on the hamming distance and the cosine distance, wherein the target nearest neighbor set comprises K first similar samples, the M similar nearest neighbor sets comprise M × K second similar samples, and K is a positive integer;

carrying out purification treatment on the target nearest neighbor set based on the K first similar samples to obtain a target purified nearest neighbor set, and carrying out purification treatment on the M similar nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar purified nearest neighbor sets;

and performing expansion processing on the target purification nearest neighbor set based on the K first similar samples to obtain a target optimization nearest neighbor set, and performing expansion processing on the M similar purification nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar optimization nearest neighbor sets.

5. The method of claim 4, wherein refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, comprises:

acquiring K first sample nearest neighbor sets corresponding to the K first similar samples, and acquiring M multiplied by K second sample nearest neighbor sets corresponding to the M multiplied by K second similar samples;

calculating K first degrees of overlap of the K first sample nearest neighbor sets with the target nearest neighbor set, and, M × K second degrees of overlap of the M × K second sample nearest neighbor sets with the M similar nearest neighbor sets;

screening out first similar samples of which the first coincidence degrees are greater than a preset coincidence threshold value from the K first similar samples as first purified samples, and screening out M groups of second similar samples of which the second coincidence degrees are greater than the preset coincidence threshold value from the M multiplied by K second similar samples as second purified samples;

retaining the first refined sample to obtain the target refined nearest neighbor set, and retaining the second refined sample to obtain the M similar refined nearest neighbor sets.
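As an illustrative sketch only, the coincidence-based purification of claim 5 could look like the following. The claim does not define how the coincidence degree is computed; here it is assumed to be the fraction of the anchor's nearest neighbor set that also appears in the sample's own nearest neighbor set. The function name `purify_by_overlap` and the dictionary layout are hypothetical.

```python
def purify_by_overlap(neighbor_sets, anchor, threshold):
    """Keep the neighbors of `anchor` whose own neighbor sets overlap enough.

    neighbor_sets: dict mapping a sample id to the set of its K nearest neighbors.
    threshold:     preset coincidence threshold (assumed to lie in [0, 1]).
    """
    anchor_set = neighbor_sets[anchor]
    purified = set()
    for sample in anchor_set:
        # Assumed coincidence degree: shared neighbors / size of the anchor set.
        overlap = len(neighbor_sets[sample] & anchor_set) / len(anchor_set)
        if overlap > threshold:
            purified.add(sample)
    return purified
```

Applying the function with the target voiceprint data as the anchor would give the target purified nearest neighbor set; applying it with each similar voiceprint data as the anchor would give the M similar purified nearest neighbor sets.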

6. The method of claim 4, wherein the purifying the target nearest neighbor set based on the K first similar samples to obtain a target purified nearest neighbor set, and the purifying the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar purified nearest neighbor sets, comprises:

screening out, as first purified samples, first similar samples whose first sample nearest neighbor sets, among the K first sample nearest neighbor sets, contain the target voiceprint data, and screening out, as second purified samples, M groups of second similar samples whose second sample nearest neighbor sets, among the M × K second sample nearest neighbor sets, contain the corresponding similar voiceprint data;

7. The method of claim 5 or 6, wherein the expanding the target purified nearest neighbor set based on the K first similar samples to obtain a target optimized nearest neighbor set, and the expanding the M similar purified nearest neighbor sets based on the M × K second similar samples to obtain M similar optimized nearest neighbor sets, comprises:

acquiring first purified nearest neighbor sets corresponding to the first purified samples, and acquiring M groups of second purified nearest neighbor sets corresponding to the second purified samples;

obtaining first coincident sample numbers between the first purified nearest neighbor sets and the target purified nearest neighbor set, and obtaining M groups of second coincident sample numbers between each of the M groups of second purified nearest neighbor sets and the corresponding similar purified nearest neighbor set;

screening out, as first extended samples, first purified similar samples corresponding to the first purified nearest neighbor sets whose first coincident sample numbers are greater than a preset coincident sample number threshold, and screening out, as second extended samples, M groups of second purified similar samples corresponding to the M groups of second purified nearest neighbor sets whose second coincident sample numbers are greater than the preset coincident sample number threshold;

and merging the first extended samples and the first purified samples to obtain the target optimized nearest neighbor set, and merging the second extended samples and the second purified samples to obtain the M similar optimized nearest neighbor sets.
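The expansion of claim 7 might be sketched as below. The claim wording leaves open exactly which samples become extended samples; this example assumes that, when the nearest neighbor set of a purified sample shares more than the preset number of samples with the purified set, the members of that nearest neighbor set are taken as extended samples. The function name `expand_purified_set` and the parameter `min_shared` are hypothetical.

```python
def expand_purified_set(purified, neighbor_sets, min_shared):
    """Merge a purified set with the neighbor sets that overlap it sufficiently.

    purified:      set of purified sample ids.
    neighbor_sets: dict mapping a sample id to the set of its nearest neighbors.
    min_shared:    preset coincident sample number threshold.
    """
    optimized = set(purified)
    for sample in purified:
        candidate = neighbor_sets[sample]
        # Number of coincident samples between the candidate and purified sets.
        if len(candidate & purified) > min_shared:
            optimized |= candidate  # assumed: the whole candidate set is merged in
    return optimized
```

Under this assumption, the result is the union of the purified samples and the extended samples, which corresponds to the optimized nearest neighbor set of the claim.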

8. The method of claim 2, wherein the calculating a second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set comprises:

calculating M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets based on the Jaccard distance.
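For illustration, the second similarity scores of claim 8 may be computed from the Jaccard index, that is, one minus the Jaccard distance. The function names below are hypothetical, and returning 0.0 for two empty sets is a convention chosen for this sketch.

```python
def jaccard_similarity(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B|; equals 1 minus the Jaccard distance."""
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)

def second_similarity_scores(target_set, similar_sets):
    """One score per similar optimized nearest neighbor set (M scores in total)."""
    return [jaccard_similarity(target_set, s) for s in similar_sets]
```

For example, the sets {1, 2, 3} and {2, 3, 4} share two of four distinct samples, giving a score of 0.5.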

9. The method according to any one of claims 2 to 8, wherein the calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the re-ranking data of the similar voiceprint data comprises:

acquiring first weights corresponding to the M first similarity scores and second weights corresponding to the M second similarity scores based on a weighted average algorithm;

calculating according to the first weight, the second weight, the M first similarity scores and the M second similarity scores to obtain M third similarity scores;

and reordering the target voiceprint data and the similar voiceprint data according to the magnitude of the M third similarity scores to obtain reordered data.
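The fusion and reordering of claim 9 could be sketched as follows. The claim names a weighted average algorithm but does not say how the weights are chosen; here they are simply passed in as the assumed parameters `w1` and `w2`, and the function name `rerank` is hypothetical.

```python
def rerank(first_scores, second_scores, w1, w2):
    """Fuse two score lists by weighted average and rank in descending order.

    Returns (order, third_scores), where `order` lists the indices of the M
    similar voiceprint data from most to least similar.
    """
    third_scores = [
        (w1 * s1 + w2 * s2) / (w1 + w2)
        for s1, s2 in zip(first_scores, second_scores)
    ]
    order = sorted(range(len(third_scores)),
                   key=lambda i: third_scores[i], reverse=True)
    return order, third_scores
```

Because `sorted` is stable, candidates with equal third similarity scores keep their original retrieval order.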

10. An apparatus for reordering voiceprint data, the apparatus comprising a processing unit configured to:

11. An electronic device comprising an application processor, a memory, and one or more programs stored in the memory and configured to be executed by the application processor, the programs comprising instructions for performing the steps of the method of any of claims 1-9.

12. A computer storage medium, characterized in that it stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 9.