CN111222005A

CN111222005A - Voiceprint data reordering method and device, electronic equipment and storage medium

Info

Publication number: CN111222005A
Application number: CN202010018417.6A
Authority: CN
Inventors: 孙伟; 李永超; 方昕; 黄志华; 柳林
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2020-01-08
Filing date: 2020-01-08
Publication date: 2020-06-02
Anticipated expiration: 2040-01-08
Also published as: CN111222005B

Abstract

The application provides a method and a device for reordering voiceprint data, electronic equipment and a storage medium, and the method comprises the steps of firstly obtaining similar voiceprint data corresponding to target voiceprint data from a preset voiceprint database; calculating a first similarity score between the similar voiceprint data and the target voiceprint data; meanwhile, optimizing a target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing a similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set; then calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set; and finally, calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordering data of the similar voiceprint data. The complexity of the voiceprint data retrieval process can be reduced, and the accuracy of the voiceprint data reordering result is greatly improved.

Description

Voiceprint data reordering method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of data retrieval, and in particular, to a method and an apparatus for reordering voiceprint data, an electronic device, and a computer storage medium.

Background

With the development of technology, reordering, i.e. the process of reordering original search results by mining the internal association of data or by referring to external knowledge and manual intervention, has become a research hotspot problem. For example, in the field of image retrieval, in most image retrieval systems, a user provides query text, and the retrieval system returns a picture with high matching degree to the user by extracting text information from metadata attached to the picture to match with the query text. Therefore, the image reordering further optimizes the picture sequence by extracting the visual information of the pictures and analyzing the visual association between the pictures, thereby greatly improving the query performance.

In the conventional voiceprint data retrieval method, due to the fact that the data volume contained in the database of the voiceprint data is too large, the conditions of insufficient memory, inaccurate retrieved result and the like can occur when the reordering method is directly adopted, and the accuracy of the voiceprint data retrieval result is greatly reduced.

Disclosure of Invention

Based on the above problems, the application provides a method, an apparatus, an electronic device and a storage medium for reordering voiceprint data, which can select a certain amount of similar voiceprint data from a voiceprint database based on target voiceprint data to reduce occupied memory, and then perform optimization processing based on a nearest neighbor set of the similar voiceprint data to finally obtain a reordering result, thereby avoiding the situation of insufficient memory during retrieval, and greatly improving the accuracy of the voiceprint data reordering result.

A first aspect of the embodiments of the present application provides a method for reordering voiceprint data, where the method includes:

acquiring similar voiceprint data corresponding to the target voiceprint data from a preset voiceprint database;

calculating a first similarity score between the similar voiceprint data and the target voiceprint data;

optimizing a target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing a similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set;

calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set;

and calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordering data of the similar voiceprint data.

A second aspect of the embodiments of the present application provides a voiceprint data reordering apparatus, where the apparatus includes a processing unit, where the processing unit is configured to:

A third aspect of embodiments of the present application provides an electronic device, comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps as described in any one of the first aspect of embodiments of the present application.

A fourth aspect of embodiments of the present application provides a computer storage medium storing a computer program comprising program instructions that, when executed by a processor, cause the processor to perform a method as described in any one of the first aspect of embodiments of the present application.

A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps as described in any one of the methods of the first aspect of embodiments of the present application. The computer program product may be a software installation package.

By implementing the embodiment of the application, the following beneficial effects can be obtained:

firstly, acquiring similar voiceprint data corresponding to target voiceprint data from a preset voiceprint database; then, calculating a first similarity score between the similar voiceprint data and the target voiceprint data; meanwhile, optimizing a target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing a similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set; then calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set; and finally, calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordering data of the similar voiceprint data. The complexity of the voiceprint data retrieval process can be reduced, and the accuracy of the voiceprint data reordering result is greatly improved.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a system architecture diagram of a method for reordering voiceprint data according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a method for reordering voiceprint data according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a purification processing method according to an embodiment of the present application;

Fig. 4 is a schematic diagram of another purification processing method according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a purification processing method based on Figs. 3 and 4 according to an embodiment of the present application;

Fig. 6 is a schematic diagram of an expansion processing method based on Fig. 5 according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 8 is a block diagram of the functional units of a voiceprint data reordering apparatus according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

The electronic device according to the embodiments of the present application may be an electronic device with communication capability, and the electronic device may include various handheld devices with wireless communication function, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), Mobile Stations (MS), terminal devices (terminal device), and so on.

The system architecture of the voiceprint data reordering method in an embodiment of the present application is described in detail below with reference to fig. 1. Fig. 1 is a system architecture diagram of a voiceprint data reordering method provided in an embodiment of the present application, which includes a voice recognition unit 110, a preset voiceprint database 120, and a reordering unit 130. The voice recognition unit 110 may be connected to the reordering unit 130 and is used to preprocess voice data, where the preprocessing may include noise reduction processing, feature extraction, and the like. Target voiceprint data of a target user is obtained through the preprocessing and sent to the reordering unit 130 for subsequent voiceprint data retrieval. The reordering unit 130 may be a processor with the relevant algorithms built in, and may be connected to the preset voiceprint database 120. It selects a certain amount of similar voiceprint data from the preset voiceprint database 120 according to the obtained target voiceprint data, and reorders the similar voiceprint data and the target voiceprint data to obtain a reordering result. The reordering result can improve the accuracy of voiceprint data retrieval.

Through the system architecture, a certain amount of similar voiceprint data can be selected from the preset voiceprint database based on the target voiceprint data to reduce the occupied memory, optimization processing is carried out based on the nearest neighbor set of the similar voiceprint data, and the reordering result is finally obtained, so that the condition of insufficient memory during retrieval is avoided, and the accuracy of the voiceprint data reordering result is greatly improved.

Fig. 2 is a schematic flow chart of a method for reordering voiceprint data according to an embodiment of the present application, and specifically includes the following steps:

step 201, obtaining similar voiceprint data corresponding to the target voiceprint data from a preset voiceprint database.

The target voiceprint data is obtained by preprocessing the original voice of a target user. The preprocessing specifically includes: performing noise reduction and human voice separation on the original voice; obtaining acoustic feature data of the original voice through acoustic feature extraction, where the acoustic feature data may include Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP) coefficients, and the like; then obtaining a mean supervector through the identity vector (i-vector) technique in combination with the acoustic feature data, a preset Gaussian mixture model, and a factor loading matrix; and finally obtaining the target voiceprint data. The preset voiceprint database may be built on any of various open-source tools, such as Lucene (an open-source full-text search engine toolkit published by the Apache Software Foundation), Elasticsearch (a highly scalable open-source search engine based on Lucene), and Faiss (an open-source high-performance library developed by Facebook AI Research for similarity search and clustering of dense vectors). The present application is described using Faiss as an example.

The cosine distance data between the target voiceprint data and the preset voiceprint data in the preset voiceprint database can be calculated, and the following formula can be specifically adopted:

d₁(p, g_i) = 1 − cos(p, g_i)

where p represents the target voiceprint data, g_i represents the i-th preset voiceprint data in the preset voiceprint database, cos(p, g_i) represents the cosine similarity between the target voiceprint data and the i-th preset voiceprint data, and d₁(p, g_i) represents the cosine distance between them. The cosine distance data comprises the cosine distance between the target voiceprint data and each preset voiceprint data. M preset voiceprint data are selected as the similar voiceprint data based on the cosine distance data: the smaller the cosine distance, the higher the similarity between the target voiceprint data and the preset voiceprint data, so the first M preset voiceprint data in ascending order of cosine distance can be selected as the similar voiceprint data. The value of M is not particularly limited.

It should be noted that the essence of the target voiceprint data and the preset voiceprint data is a feature vector.

By acquiring similar voiceprint data corresponding to the target voiceprint data from the preset voiceprint database, the preset voiceprint database can be subjected to library reduction, namely, the total data volume is reduced, so that the space complexity can be reduced, and the memory limitation of hardware equipment is avoided.
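The cosine-distance selection described above can be sketched as follows. This is a minimal illustration that assumes the voiceprints are plain lists of floats with non-zero norm; the function names `cosine_distance` and `select_similar` and the toy vectors are hypothetical and do not come from the application, which uses a Faiss-backed database rather than a Python list.

```python
import math

def cosine_distance(p, g):
    """d1(p, g) = 1 - cos(p, g) for two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(p, g))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_g = math.sqrt(sum(b * b for b in g))
    return 1.0 - dot / (norm_p * norm_g)

def select_similar(p, database, m):
    """Return the m (index, distance) pairs with the smallest cosine distance to p."""
    scored = [(i, cosine_distance(p, g)) for i, g in enumerate(database)]
    scored.sort(key=lambda pair: pair[1])  # ascending: smaller distance = more similar
    return scored[:m]

# Toy data: hypothetical 2-dimensional "voiceprints".
target = [1.0, 0.0]
database = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [0.7, 0.7]]
similar = select_similar(target, database, m=2)
print(similar)
```

Only the M selected entries need to be kept in memory for the later steps, which is the library reduction the text refers to.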

Step 202, obtaining a first similarity score between the similar voiceprint data and the target voiceprint data.

The M first similarity scores between the target voiceprint data and the M similar voiceprint data can be obtained from the cosine distance data. That is, the M cosine distances between the target voiceprint data and the M similar voiceprint data are obtained one by one, and each cosine distance is taken as the first similarity score between the target voiceprint data and the corresponding similar voiceprint data, giving M first similarity scores in total.

By obtaining a first similarity score between the similar voiceprint data and the target voiceprint data, an initial ordering result can be obtained, and preparation is made for subsequent reordering.

Step 203, performing optimization processing on the target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and performing optimization processing on the similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set.

A target nearest neighbor set corresponding to the target voiceprint data can be constructed based on Hamming distance and cosine distance: K first similar samples are selected from the M similar voiceprint data to form the target nearest neighbor set, where K is less than or equal to M. Similarly, M similar nearest neighbor sets corresponding to the M similar voiceprint data can be constructed based on Hamming distance and cosine distance. Each similar nearest neighbor set contains K second similar samples, giving M × K second similar samples in total. It should be noted that the first similar samples and the second similar samples are also voiceprint feature vectors in nature, and that the target nearest neighbor set and the similar nearest neighbor sets may share samples. For example, suppose the first similar samples of the target nearest neighbor set are {a, b, c, d}, the second similar samples of the first similar nearest neighbor set are {z, o, v, y}, and the second similar samples of the second similar nearest neighbor set are {a, b, v, w}. Then the target nearest neighbor set and the second similar nearest neighbor set share the samples a and b, and the first and second similar nearest neighbor sets share the sample v.

For ease of understanding, the construction of the above nearest neighbor sets is described in a unified manner. A sign function is applied to each dimension of the target voiceprint data and the similar voiceprint data to obtain binary codes, which form a hash index library. A first number of similar samples is then retrieved based on Hamming distance, and a second number of similar samples is selected from them based on cosine distance to obtain the nearest neighbor set. This ensures that the nearest neighbor set has higher precision.
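The two-stage construction (coarse retrieval by Hamming distance on sign-based binary codes, then fine selection by cosine distance) might look like the sketch below. Several details are assumptions made for illustration only: the sign function is mapped to one bit per dimension with non-negative values encoded as 1, samples are stored in a dictionary keyed by name, and the helper names and toy vectors are hypothetical.

```python
import math

def binary_code(vector):
    """Sign function per dimension, stored as bits (assumption: x >= 0 maps to 1)."""
    return tuple(1 if x >= 0 else 0 for x in vector)

def hamming(code_a, code_b):
    """Number of positions in which two binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def cosine_distance(p, g):
    dot = sum(a * b for a, b in zip(p, g))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g)))

def nearest_neighbor_set(query, vectors, first_number, second_number):
    """Coarse Hamming retrieval followed by cosine re-selection."""
    codes = {name: binary_code(v) for name, v in vectors.items()}  # hash index library
    others = [name for name in vectors if name != query]
    coarse = sorted(others, key=lambda n: hamming(codes[query], codes[n]))[:first_number]
    return sorted(coarse, key=lambda n: cosine_distance(vectors[query], vectors[n]))[:second_number]

# Toy 3-dimensional vectors (hypothetical).
vectors = {
    "p": [1.0, 0.2, -0.3],
    "a": [0.9, 0.3, -0.2],
    "b": [0.8, -0.1, -0.5],
    "c": [-1.0, 0.2, 0.4],
    "d": [0.5, 0.9, -0.1],
}
neighbors = nearest_neighbor_set("p", vectors, first_number=3, second_number=2)
print(neighbors)
```

In this toy data the sample d ties with a on Hamming distance but is dropped by the cosine stage in favour of b, which illustrates why the second stage improves precision.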

After the target nearest neighbor set and the similar nearest neighbor set are obtained, the target nearest neighbor set and the similar nearest neighbor set may be subjected to a purification process to obtain a target purified nearest neighbor set and a similar purified nearest neighbor set.

Optionally, K first sample nearest neighbor sets corresponding to the K first similar samples may be obtained first, together with M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; that is, the nearest neighbor sets of the first similar samples and the second similar samples are themselves obtained. Then, K first coincidence degrees between the K first sample nearest neighbor sets and the target nearest neighbor set are calculated, and M × K second coincidence degrees between the M × K second sample nearest neighbor sets and the M similar nearest neighbor sets are calculated. Next, the first similar samples whose first coincidence degree is greater than a preset coincidence threshold are selected from the K first similar samples as first purified samples, and M groups of second similar samples whose second coincidence degree is greater than the preset coincidence threshold are selected from the M × K second similar samples as second purified samples. Finally, the first purified samples are retained to obtain the target purified nearest neighbor set, and the second purified samples are retained to obtain the M similar purified nearest neighbor sets.

For ease of understanding, the purification of the target nearest neighbor set is illustrated below with reference to fig. 3, which is a schematic diagram of a purification processing method provided in an embodiment of the present application. Here N(p, k) denotes the target nearest neighbor set, p denotes the target voiceprint data, and k denotes the number of samples. In this example k is 4 and N(p, k) contains the first similar samples {b, c, d, g}. The first sample nearest neighbor set N(b, k) of the first similar sample b is {e, f, c, g}; N(c, k) of the first similar sample c is {p, h, f, b}; N(d, k) of the first similar sample d is {o, t, z, y}; and N(g, k) of the first similar sample g is {s, l, h, d}. The coincident samples of N(b, k) and N(p, k) are c and g, so the first coincidence degree, that is, the number of coincident samples, is 2. The coincident sample of N(c, k) and N(p, k) is b, so the number of coincident samples is 1. N(d, k) and N(p, k) have 0 coincident samples. The coincident sample of N(g, k) and N(p, k) is d, so the number of coincident samples is 1. With a preset coincidence threshold of 2, only the first sample nearest neighbor set of the first similar sample b meets the threshold, so b is retained and c, d, and g are deleted. The first purified sample of the resulting target purified nearest neighbor set N*(p, k) is {b}. It should be noted that the purification of the target nearest neighbor set and the purification of the similar nearest neighbor sets follow the same steps, so the M similar purified nearest neighbor sets can be obtained in the same manner, which is not described again here.
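The fig. 3 example can be reproduced with a few lines of set arithmetic. The sketch below is illustrative only: the function name `purify_by_overlap` is hypothetical, and the comparison is written as "at least the threshold" because the example retains b with exactly 2 coincident samples against a threshold of 2; that reading is an assumption.

```python
def purify_by_overlap(query_neighbors, sample_neighbor_sets, threshold):
    """Keep each neighbor whose own nearest neighbor set shares at least
    `threshold` samples with the query's nearest neighbor set."""
    query_set = set(query_neighbors)
    return [s for s in query_neighbors
            if len(query_set & set(sample_neighbor_sets[s])) >= threshold]

# Sets taken from the fig. 3 example: N(p, k) with k = 4 and N(s, k) for each member s.
n_p = ["b", "c", "d", "g"]
sample_neighbor_sets = {
    "b": ["e", "f", "c", "g"],  # shares c and g with N(p, k)
    "c": ["p", "h", "f", "b"],  # shares b
    "d": ["o", "t", "z", "y"],  # shares nothing
    "g": ["s", "l", "h", "d"],  # shares d
}
purified = purify_by_overlap(n_p, sample_neighbor_sets, threshold=2)
print(purified)
```

The same function would be applied to each of the M similar nearest neighbor sets, with the corresponding similar voiceprint data taking the role of p.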

Optionally, another purification processing method may be used. The K first sample nearest neighbor sets corresponding to the K first similar samples and the M × K second sample nearest neighbor sets corresponding to the M × K second similar samples are obtained first. Then, each first similar sample whose first sample nearest neighbor set contains the target voiceprint data is selected as a first purified sample, and the M groups of second similar samples whose second sample nearest neighbor sets contain the corresponding similar voiceprint data are selected as second purified samples. Finally, the first purified samples are retained to obtain the target purified nearest neighbor set, and the second purified samples are retained to obtain the M similar purified nearest neighbor sets.

For ease of understanding, this second purification of the target nearest neighbor set is illustrated below with reference to fig. 4, which is a schematic diagram of another purification processing method provided in an embodiment of the present application. As shown in the figure, N(p, k) denotes the target nearest neighbor set, with the same first similar samples and first sample nearest neighbor sets as in fig. 3, which are not repeated here. The target voiceprint data p appears only in the first sample nearest neighbor set of the first similar sample c; that is, the first similar sample c and the target voiceprint data p are nearest neighbors of each other, whereas none of the first similar samples b, d, and g is a mutual nearest neighbor of the target voiceprint data. Therefore c is retained and b, d, and g are deleted, and the first purified sample of the resulting target purified nearest neighbor set N*(p, k) is {c}. It should be noted that the purification of the target nearest neighbor set and the purification of the similar nearest neighbor sets follow the same steps, so the M similar purified nearest neighbor sets can be obtained in the same manner, which is not described again here.
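The mutual-nearest-neighbor test of fig. 4 reduces to a membership check. In this minimal sketch, the function name `purify_by_reciprocity` is hypothetical and the sets are those of the fig. 3 and fig. 4 examples.

```python
def purify_by_reciprocity(query, query_neighbors, sample_neighbor_sets):
    """Keep each neighbor whose own nearest neighbor set contains the query,
    i.e. the neighbor and the query are nearest neighbors of each other."""
    return [s for s in query_neighbors if query in sample_neighbor_sets[s]]

n_p = ["b", "c", "d", "g"]
sample_neighbor_sets = {
    "b": ["e", "f", "c", "g"],
    "c": ["p", "h", "f", "b"],  # contains the target p
    "d": ["o", "t", "z", "y"],
    "g": ["s", "l", "h", "d"],
}
purified = purify_by_reciprocity("p", n_p, sample_neighbor_sets)
print(purified)
```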

Optionally, using only the purification processing method of fig. 3 or only that of fig. 4 may in some cases delete a positive sample, where a positive sample is a sample that should be retained after purification. The two purification processing methods may therefore be combined. That is, the first similar samples whose first coincidence degree is greater than the preset coincidence threshold are selected from the K first similar samples as first purified samples, and M groups of second similar samples whose second coincidence degree is greater than the preset coincidence threshold are selected from the M × K second similar samples as second purified samples. At the same time, each first similar sample whose first sample nearest neighbor set contains the target voiceprint data is also selected as a first purified sample, and the M groups of second similar samples whose second sample nearest neighbor sets contain the corresponding similar voiceprint data are also selected as second purified samples. Finally, the first purified samples are retained to obtain the target purified nearest neighbor set, and the second purified samples are retained to obtain the M similar purified nearest neighbor sets.

For ease of understanding, the combined purification processing method is illustrated with reference to fig. 5, which is a schematic diagram of a purification processing method based on figs. 3 and 4 provided in an embodiment of the present application. Since the first similar sample b is a positive sample in fig. 3 and the first similar sample c is a positive sample in fig. 4, combining the two methods takes both b and c as positive samples, and the first purified samples of the final target purified nearest neighbor set N*(p, k) are {b, c}. It should be noted that the purification of the target nearest neighbor set and the purification of the similar nearest neighbor sets follow the same steps, so the M similar purified nearest neighbor sets can be obtained in the same manner, which is not described again here.
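Combining the two criteria amounts to keeping a sample when either one holds. The sketch below assumes the combination is a logical OR, which matches the fig. 5 result; the function name `purify_combined` is hypothetical and the "at least the threshold" comparison follows the fig. 3 example.

```python
def purify_combined(query, query_neighbors, sample_neighbor_sets, threshold):
    """Keep a neighbor if it passes the overlap test OR the reciprocity test."""
    query_set = set(query_neighbors)
    kept = []
    for s in query_neighbors:
        overlap = len(query_set & set(sample_neighbor_sets[s]))
        reciprocal = query in sample_neighbor_sets[s]
        if overlap >= threshold or reciprocal:
            kept.append(s)
    return kept

n_p = ["b", "c", "d", "g"]
sample_neighbor_sets = {
    "b": ["e", "f", "c", "g"],  # passes the overlap test (2 shared samples)
    "c": ["p", "h", "f", "b"],  # passes the reciprocity test (contains p)
    "d": ["o", "t", "z", "y"],
    "g": ["s", "l", "h", "d"],
}
purified = purify_combined("p", n_p, sample_neighbor_sets, threshold=2)
print(purified)
```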

Therefore, the interference of the negative sample, namely the sample data which should not be preserved, can be eliminated through the purification treatment, the accuracy of the nearest neighbor set is greatly improved, and the accuracy of the subsequent retrieval is also improved.

After the target purified nearest neighbor set and the similar purified nearest neighbor sets are obtained, the target purified nearest neighbor set may be expanded based on the K first similar samples to obtain a target optimized nearest neighbor set, and the M similar purified nearest neighbor sets may be expanded based on the M × K second similar samples to obtain M similar optimized nearest neighbor sets.

Specifically, a first purified nearest neighbor set corresponding to each first purified sample is obtained, and M groups of second purified nearest neighbor sets corresponding to the second purified samples are obtained. Next, a first number of coincident samples between each first purified nearest neighbor set and the target purified nearest neighbor set is obtained, and M groups of second numbers of coincident samples between the M groups of second purified nearest neighbor sets and the corresponding similar purified nearest neighbor sets are obtained. Then, the first purified similar samples in each first purified nearest neighbor set whose first number of coincident samples is greater than a preset coincident sample number threshold are selected as first extended samples, and the M groups of second purified similar samples in the second purified nearest neighbor sets whose second number of coincident samples is greater than the preset coincident sample number threshold are selected as second extended samples. Finally, the union of the first extended samples and the first purified samples gives the target optimized nearest neighbor set, and the union of the second extended samples and the second purified samples gives the M similar optimized nearest neighbor sets.

For ease of understanding, the expansion of the target purified nearest neighbor set is illustrated with reference to fig. 6, which is a schematic diagram of an expansion processing method based on fig. 5 provided in an embodiment of the present application. It should be noted that the number of samples in a first purified nearest neighbor set can be set as required. Since the target purified nearest neighbor set contains 2 first purified samples, the first purified nearest neighbor set may be set to the k/2 nearest neighbor set, so that it also contains 2 samples. The first purified samples of N*(p, k) are {b, c}. The first purified nearest neighbor set N*(b, k/2) of the first purified sample b contains the first purified similar samples {c, f}, and the first purified nearest neighbor set N*(c, k/2) of the first purified sample c contains {p, f}. N*(b, k/2) and N*(p, k) share the coincident sample c, so the number of coincident samples is 1, while N*(c, k/2) and N*(p, k) have 0 coincident samples. With a preset coincident sample number threshold of 1, the number of coincident samples between N*(b, k/2) and N*(p, k) is greater than or equal to the threshold, so {c, f} are first extended samples, and merging {c, f} with {b, c} gives the target optimized nearest neighbor set N'(p, k) = {b, c, f}.
It should be noted that the expansion of the target purified nearest neighbor set and the expansion of the similar purified nearest neighbor sets follow the same steps, so the M similar optimized nearest neighbor sets can be obtained in the same manner. For convenience of description, any one of the M similar optimized nearest neighbor sets is denoted N'(g_M, k), which is not described in detail here.
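The fig. 6 expansion can be sketched as a union over the qualifying half-size neighbor sets. The function name `expand` is hypothetical, and the "greater than or equal to" comparison follows the worked example rather than the general statement of the step.

```python
def expand(purified, purified_neighbor_sets, min_overlap):
    """Add the members of each purified sample's (smaller) nearest neighbor set
    when that set shares at least `min_overlap` samples with the purified set."""
    purified_set = set(purified)
    optimized = list(purified)
    for s in purified:
        candidates = purified_neighbor_sets[s]
        if len(purified_set & set(candidates)) >= min_overlap:
            for c in candidates:
                if c not in optimized:  # union without duplicates, order preserved
                    optimized.append(c)
    return optimized

# Values from the fig. 6 example: N*(p, k) = {b, c}, with k/2 = 2 neighbors each.
purified = ["b", "c"]
half_sets = {
    "b": ["c", "f"],  # shares c with N*(p, k): 1 coincident sample
    "c": ["p", "f"],  # shares nothing with N*(p, k)
}
optimized = expand(purified, half_sets, min_overlap=1)
print(optimized)
```

The sample f enters the optimized set even though it was not in the original N(p, k), which is the purpose of the expansion step.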

Therefore, samples which are not in the original nearest neighbor set can be obtained through the expansion processing, the accuracy of the nearest neighbor samples is improved, and the accuracy of subsequent retrieval is also improved.

Step 204, calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set.

M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets may be calculated based on the Jaccard distance. The Jaccard distance d₂(p, g_M) is calculated as follows:

d₂(p, g_M) = 1 − J(p, g_M) = 1 − |N'(p, k) ∩ N'(g_M, k)| / |N'(p, k) ∪ N'(g_M, k)|

where J(p, g_M) denotes the Jaccard similarity index, g_M represents any one of the M similar voiceprint data, N'(p, k) represents the target optimized nearest neighbor set, and N'(g_M, k) represents the similar optimized nearest neighbor set corresponding to that similar voiceprint data.
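Under the standard definition of the Jaccard distance (one minus the size of the intersection divided by the size of the union), the second similarity score for one pair of sets can be computed as in the sketch below. The function name and the example set for g are hypothetical; only {b, c, f} comes from the fig. 6 example.

```python
def jaccard_distance(set_a, set_b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of sample identifiers."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)

n_p = {"b", "c", "f"}   # target optimized nearest neighbor set from fig. 6
n_g = {"c", "f", "h"}   # hypothetical similar optimized nearest neighbor set
d2 = jaccard_distance(n_p, n_g)  # intersection has 2 samples, union has 4
print(d2)
```

Repeating the call for each of the M similar optimized nearest neighbor sets yields the M second similarity scores.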

Optionally, to reduce the amount of calculation, the target optimized nearest neighbor set and the similar optimized nearest neighbor sets may be encoded as vectors, with a Gaussian kernel function applied at the same time, to obtain:

Thus, the calculation of the Jaccard distance can be simplified to:

The M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets are thereby obtained.
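
One common way to realize such a vector encoding is sketched below. It is an assumption borrowed from the widely used k-reciprocal re-ranking formulation, not a reproduction of the formulas of this application: each member of a neighbor set receives the weight exp(-distance), non-members receive 0, and set intersection and union are replaced by element-wise minimum and maximum. The function names and the use of NumPy are hypothetical.

```python
import numpy as np


def encode_neighbor_set(neighbor_ids, distances, num_samples):
    """Encode a neighbor set as a weight vector of length num_samples.

    neighbor_ids: indices of the samples in the optimized set.
    distances: original distance from the anchor to every sample.
    Members get weight exp(-distance); non-members get 0.
    """
    vec = np.zeros(num_samples)
    ids = np.asarray(sorted(neighbor_ids), dtype=int)
    vec[ids] = np.exp(-np.asarray(distances, dtype=float)[ids])
    return vec


def vector_jaccard_distance(vec_p, vec_g):
    """Jaccard distance with min/max in place of set intersection/union."""
    union = np.maximum(vec_p, vec_g).sum()
    if union == 0.0:
        # Both sets empty: treated as maximally distant (an assumption).
        return 1.0
    return 1.0 - np.minimum(vec_p, vec_g).sum() / union


# Hypothetical distances from p and from g to four database samples.
dist_p = [0.0, 0.2, 0.3, 0.9]
dist_g = [0.2, 0.0, 0.4, 0.8]
v_p = encode_neighbor_set({1, 2}, dist_p, 4)
v_g = encode_neighbor_set({0, 2}, dist_g, 4)
print(vector_jaccard_distance(v_p, v_g))
```

When all distances are zero the weights become 1 and the result reduces to the ordinary set-based Jaccard distance, which makes the sketch easy to check by hand.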

By calculating the second similarity scores between the target optimized nearest neighbor set and the similar optimized nearest neighbor sets, the similarity between nearest neighbor sets can be determined, so that a second ranking can be obtained from the similarity between the sets in preparation for the subsequent reordering.

Step 205, calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordered data of the similar voiceprint data.

First, a first weight corresponding to the M first similarity scores and a second weight corresponding to the M second similarity scores are obtained based on a weighted average algorithm. Then, M third similarity scores are calculated from the first weight, the second weight, the M first similarity scores and the M second similarity scores. Finally, the target voiceprint data and the similar voiceprint data are reordered according to the magnitude of the M third similarity scores to obtain the reordered data.

Specifically, the third similarity score is:

d(p,g_M) = λ·d₁(p,g_M) + (1-λ)·d₂(p,g_M)

Here λ is the first weight and 1-λ is the second weight. Optionally, λ may be 0.6, in which case the first weight is 0.6 and the second weight is 0.4.

After the M third similarity scores are obtained through calculation, reordering can be performed according to the score values, and reordering results are obtained.
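
The weighted fusion and the final reordering can be sketched as follows. The function name `rerank` is hypothetical, and the sketch assumes that both scores behave like distances, so that a smaller third similarity score means a closer match and the candidates are sorted in ascending order; if the scores are similarities instead, the sort direction would be reversed.

```python
def rerank(candidates, first_scores, second_scores, lam=0.6):
    """Fuse two scores per candidate and sort by the fused score.

    candidates: identifiers of the M similar voiceprint data.
    first_scores: the M first similarity scores d1(p, g_M).
    second_scores: the M second similarity scores d2(p, g_M).
    lam: first weight; the second weight is 1 - lam.
    """
    if not (len(candidates) == len(first_scores) == len(second_scores)):
        raise ValueError("all inputs must have the same length")
    fused = [lam * d1 + (1.0 - lam) * d2
             for d1, d2 in zip(first_scores, second_scores)]
    # Ascending order: assumes smaller scores mean closer matches.
    order = sorted(range(len(candidates)), key=lambda i: fused[i])
    return [(candidates[i], fused[i]) for i in order]


ranking = rerank(["g1", "g2", "g3"], [0.2, 0.1, 0.5], [0.9, 0.1, 0.1], lam=0.6)
for name, score in ranking:
    print(name, round(score, 2))
```

In this toy input g1 has the second-best first score, but its poor second score moves it to the end of the list, which is the kind of correction the second similarity score is meant to provide.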

Through the steps, a certain amount of similar voiceprint data can be selected from the voiceprint database based on the target voiceprint data to reduce the occupied memory, optimization processing is carried out based on the nearest neighbor set of the similar voiceprint data, the reordering result is finally obtained, the condition of insufficient memory during retrieval is avoided, and the accuracy of the voiceprint data reordering result is greatly improved.

Fig. 7 is a schematic structural diagram of an electronic device 700 provided in the embodiment of the present application, and the electronic device 700 includes an application processor 701, a communication interface 702, and a memory 703, where the application processor 701, the communication interface 702, and the memory 703 are connected to each other through a bus 704, and the bus 704 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, for example. The bus 704 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus. Wherein the memory 703 is configured to store a computer program comprising program instructions, and the application processor 701 is configured to call the program instructions to perform the method of: acquiring similar voiceprint data corresponding to the target voiceprint data from a preset voiceprint database; acquiring a first similarity score between the similar voiceprint data and the target voiceprint data; optimizing a target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing a similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set; calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set; and calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordering data of the similar voiceprint data.

In a possible embodiment, in the aspect of obtaining similar voiceprint data corresponding to the target voiceprint data from the preset voiceprint database, the instructions in the program are specifically configured to perform the following operations: calculating cosine distance data between the target voiceprint data and preset voiceprint data in the preset voiceprint database; and selecting M pieces of preset voiceprint data as the similar voiceprint data based on the size of the cosine distance data, wherein M is a positive integer.

In one possible embodiment, in the calculating the first similarity score between the similar voiceprint data and the target voiceprint data, the instructions in the program are specifically configured to perform the following operations: and obtaining M first similarity scores of the target voiceprint data and the M similar voiceprint data based on the size of the cosine distance data.
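
The candidate selection and the first similarity scores described in the two embodiments above can be sketched with NumPy. This is only an illustration under stated assumptions: the function name `select_similar` is hypothetical, the voiceprint data are assumed to be non-zero embedding vectors, and "cosine distance" is taken as 1 minus the cosine similarity, which the text does not spell out.

```python
import numpy as np


def select_similar(target, database, m):
    """Return indices and cosine distances of the m closest vectors.

    target: 1-D embedding of the target voiceprint data.
    database: 2-D array, one preset voiceprint embedding per row.
    m: number of similar voiceprint data to keep (M in the text).
    """
    target = np.asarray(target, dtype=float)
    database = np.asarray(database, dtype=float)
    t = target / np.linalg.norm(target)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    cos_dist = 1.0 - db @ t          # assumed definition of cosine distance
    order = np.argsort(cos_dist)[:m]  # smallest distance first
    return order, cos_dist[order]


database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
idx, d1 = select_similar([1.0, 0.0], database, m=2)
print(idx, d1)
```

The returned distances can serve directly as the M first similarity scores, so the database only has to be scanned once for both steps.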

In a possible embodiment, in the aspect that the target nearest neighbor set corresponding to the target voiceprint data is optimized to obtain a target optimized nearest neighbor set, and the similar nearest neighbor set corresponding to the similar voiceprint data is optimized to obtain a similar optimized nearest neighbor set, the instructions in the program are specifically configured to perform the following operations: retrieving a target nearest neighbor set corresponding to the target voiceprint data based on a hamming distance and a cosine distance, and retrieving M similar nearest neighbor sets corresponding to the M similar voiceprint data based on the hamming distance and the cosine distance, wherein the target nearest neighbor set comprises K first similar samples, the M similar nearest neighbor sets comprise M × K second similar samples, and K is a positive integer; carrying out purification treatment on the target nearest neighbor set based on the K first similar samples to obtain a target purified nearest neighbor set, and carrying out purification treatment on the M similar nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar purified nearest neighbor sets; and performing expansion processing on the target purification nearest neighbor set based on the K first similar samples to obtain a target optimization nearest neighbor set, and performing expansion processing on the M similar purification nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar optimization nearest neighbor sets.

In a possible embodiment, in the aspect of refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, the instructions in the program are specifically configured to perform the following operations: acquiring K first sample nearest neighbor sets corresponding to the K first similar samples, and acquiring M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; calculating K first degrees of coincidence of the K first sample nearest neighbor sets with the target nearest neighbor set, and M × K second degrees of coincidence of the M × K second sample nearest neighbor sets with the M similar nearest neighbor sets; screening out, from the K first similar samples, the first similar samples whose first degrees of coincidence are greater than a preset coincidence threshold as first refined samples, and screening out, from the M × K second similar samples, M groups of second similar samples whose second degrees of coincidence are greater than the preset coincidence threshold as second refined samples; and retaining the first refined samples to obtain the target refined nearest neighbor set, and retaining the second refined samples to obtain the M similar refined nearest neighbor sets.
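
The coincidence-based refinement described above can be sketched as follows. The text here does not define the degree of coincidence, so this sketch assumes it is the fraction of the anchor's nearest neighbor set that also appears in the sample's own nearest neighbor set; the function name `refine_by_overlap` and the data layout are likewise hypothetical.

```python
def refine_by_overlap(anchor_neighbors, neighbor_sets, threshold):
    """Keep samples whose own neighbor set overlaps enough with the anchor's.

    anchor_neighbors: the K samples in the anchor's nearest neighbor set.
    neighbor_sets: dict mapping each of those samples to its own
        nearest neighbor set.
    threshold: preset coincidence threshold (assumed to be a fraction).
    """
    anchor = set(anchor_neighbors)
    refined = set()
    for sample in anchor:
        own = set(neighbor_sets.get(sample, ()))
        # Assumed definition: shared samples relative to the anchor set size.
        degree = len(own & anchor) / len(anchor)
        if degree > threshold:
            refined.add(sample)
    return refined


# Hypothetical neighbor sets for four candidates of one anchor.
neighbor_sets = {
    "a": {"b", "c", "x"},
    "b": {"c", "y", "z"},
    "c": {"a", "b", "d"},
    "d": {"x", "y", "z"},
}
print(sorted(refine_by_overlap({"a", "b", "c", "d"}, neighbor_sets, 0.4)))
```

The same function would be applied once to the target nearest neighbor set and once to each of the M similar nearest neighbor sets.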

In a possible embodiment, in the aspect of refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, the instructions in the program are specifically configured to perform the following operations: acquiring K first sample nearest neighbor sets corresponding to the K first similar samples, and acquiring M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; screening out, as first refined samples, the first similar samples whose first sample nearest neighbor sets, among the K first sample nearest neighbor sets, contain the target voiceprint data, and screening out, as second refined samples, M groups of second similar samples whose second sample nearest neighbor sets, among the M × K second sample nearest neighbor sets, contain the corresponding similar voiceprint data; and retaining the first refined samples to obtain the target refined nearest neighbor set, and retaining the second refined samples to obtain the M similar refined nearest neighbor sets.
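
This alternative refinement amounts to a reciprocity check: a neighbor is kept only if the anchor also appears in that neighbor's own nearest neighbor set. A minimal sketch, with the hypothetical name `refine_by_reciprocity` and illustrative data, is:

```python
def refine_by_reciprocity(anchor_id, anchor_neighbors, neighbor_sets):
    """Keep only neighbors that list the anchor among their own neighbors.

    anchor_id: identifier of the target (or similar) voiceprint data.
    anchor_neighbors: the K samples in the anchor's nearest neighbor set.
    neighbor_sets: dict mapping each sample to its own nearest neighbor set.
    """
    return {s for s in anchor_neighbors
            if anchor_id in neighbor_sets.get(s, ())}


# Hypothetical data: only b and c have p among their own neighbors.
neighbor_sets = {
    "a": {"x", "y"},
    "b": {"p", "c"},
    "c": {"p", "b"},
    "d": {"e", "x"},
}
print(sorted(refine_by_reciprocity("p", {"a", "b", "c", "d"}, neighbor_sets)))
```

Compared with the coincidence-based variant, this check needs no threshold, at the cost of being stricter when the nearest neighbor sets are small.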

In one possible embodiment, in terms of expanding the target refined nearest neighbor set based on the K first similar samples to obtain a target optimized nearest neighbor set, and expanding the M similar refined nearest neighbor sets based on the M × K second similar samples to obtain M similar optimized nearest neighbor sets, the instructions in the program are specifically configured to perform the following operations: acquiring first refined nearest neighbor sets corresponding to the first refined samples, and acquiring M groups of second refined nearest neighbor sets corresponding to the second refined samples; obtaining a first number of coincident samples between each first refined nearest neighbor set and the target refined nearest neighbor set, and obtaining M groups of second numbers of coincident samples between the M groups of second refined nearest neighbor sets and the corresponding similar refined nearest neighbor sets; screening out, as first extended samples, the first refined similar samples of those first refined nearest neighbor sets whose first number of coincident samples is larger than a preset threshold number of coincident samples, and screening out, as second extended samples, the M groups of second refined similar samples of those second refined nearest neighbor sets whose second number of coincident samples is larger than the preset threshold number of coincident samples; and merging the first extended samples with the first refined samples to obtain the target optimized nearest neighbor set, and merging the second extended samples with the second refined samples to obtain the M similar optimized nearest neighbor sets.

In one possible embodiment, in terms of calculating the second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set, the instructions in the program are specifically configured to perform the following operations: calculating M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets based on the Jaccard distance.

In one possible embodiment, in the aspect that the calculation based on the first similarity score and the second similarity score obtains the target voiceprint data and the re-ordered data of the similar voiceprint data, the instructions in the program are specifically configured to perform the following operations: acquiring first weights corresponding to the M first similarity scores and second weights corresponding to the M second similarity scores based on a weighted average algorithm; calculating according to the first weight, the second weight, the M first similarity scores and the M second similarity scores to obtain M third similarity scores; and reordering the target voiceprint data and the similar voiceprint data according to the magnitude of the M third similarity scores to obtain reordered data.

The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to realize the above functions, the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the illustrative units and algorithm steps described in connection with the embodiments provided herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.

Fig. 8 is a block diagram illustrating functional units of a voiceprint data reordering apparatus 800 according to an embodiment of the present disclosure. The voiceprint data reordering device 800 is applied to an electronic device and comprises a processing unit 801, a communication unit 802 and a storage unit 803, wherein the processing unit 801 is used for executing any step in the method embodiments, and when data transmission such as sending is executed, the communication unit 802 can be optionally called to complete corresponding operation. The details will be described below.

The processing unit 801 is configured to obtain similar voiceprint data corresponding to the target voiceprint data from a preset voiceprint database; acquiring a first similarity score between the similar voiceprint data and the target voiceprint data; optimizing a target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing a similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set; calculating a second similarity score between the target optimization nearest neighbor set and the similar optimization nearest neighbor set; and calculating based on the first similarity score and the second similarity score to obtain the target voiceprint data and the reordering data of the similar voiceprint data.

In a possible embodiment, in the aspect of obtaining similar voiceprint data corresponding to the target voiceprint data from the preset voiceprint database, the processing unit 801 is specifically configured to: calculating cosine distance data between the target voiceprint data and preset voiceprint data in the preset voiceprint database; and selecting M pieces of preset voiceprint data as the similar voiceprint data based on the size of the cosine distance data, wherein M is a positive integer.

In a possible embodiment, in the aspect of calculating the first similarity score between the similar voiceprint data and the target voiceprint data, the processing unit 801 is specifically configured to: and obtaining M first similarity scores of the target voiceprint data and the M similar voiceprint data based on the size of the cosine distance data.

In a possible embodiment, in terms of performing optimization processing on the target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and performing optimization processing on the similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set, the processing unit 801 is specifically configured to: retrieving a target nearest neighbor set corresponding to the target voiceprint data based on a hamming distance and a cosine distance, and retrieving M similar nearest neighbor sets corresponding to the M similar voiceprint data based on the hamming distance and the cosine distance, wherein the target nearest neighbor set comprises K first similar samples, the M similar nearest neighbor sets comprise M × K second similar samples, and K is a positive integer; carrying out purification treatment on the target nearest neighbor set based on the K first similar samples to obtain a target purified nearest neighbor set, and carrying out purification treatment on the M similar nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar purified nearest neighbor sets; and performing expansion processing on the target purification nearest neighbor set based on the K first similar samples to obtain a target optimization nearest neighbor set, and performing expansion processing on the M similar purification nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar optimization nearest neighbor sets.

In a possible embodiment, in terms of refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, the processing unit 801 is specifically configured to: acquire K first sample nearest neighbor sets corresponding to the K first similar samples, and acquire M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; calculate K first degrees of coincidence of the K first sample nearest neighbor sets with the target nearest neighbor set, and M × K second degrees of coincidence of the M × K second sample nearest neighbor sets with the M similar nearest neighbor sets; screen out, from the K first similar samples, the first similar samples whose first degrees of coincidence are greater than a preset coincidence threshold as first refined samples, and screen out, from the M × K second similar samples, M groups of second similar samples whose second degrees of coincidence are greater than the preset coincidence threshold as second refined samples; and retain the first refined samples to obtain the target refined nearest neighbor set, and retain the second refined samples to obtain the M similar refined nearest neighbor sets.

In a possible embodiment, in terms of refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and refining the M similar nearest neighbor sets based on the M × K second similar samples to obtain M similar refined nearest neighbor sets, the processing unit 801 is specifically configured to: acquire K first sample nearest neighbor sets corresponding to the K first similar samples, and acquire M × K second sample nearest neighbor sets corresponding to the M × K second similar samples; screen out, as first refined samples, the first similar samples whose first sample nearest neighbor sets, among the K first sample nearest neighbor sets, contain the target voiceprint data, and screen out, as second refined samples, M groups of second similar samples whose second sample nearest neighbor sets, among the M × K second sample nearest neighbor sets, contain the corresponding similar voiceprint data; and retain the first refined samples to obtain the target refined nearest neighbor set, and retain the second refined samples to obtain the M similar refined nearest neighbor sets.

In a possible embodiment, in terms of expanding the target refined nearest neighbor set based on the K first similar samples to obtain a target optimized nearest neighbor set, and expanding the M similar refined nearest neighbor sets based on the M × K second similar samples to obtain M similar optimized nearest neighbor sets, the processing unit 801 is specifically configured to: acquire first refined nearest neighbor sets corresponding to the first refined samples, and acquire M groups of second refined nearest neighbor sets corresponding to the second refined samples; obtain a first number of coincident samples between each first refined nearest neighbor set and the target refined nearest neighbor set, and obtain M groups of second numbers of coincident samples between the M groups of second refined nearest neighbor sets and the corresponding similar refined nearest neighbor sets; screen out, as first extended samples, the first refined similar samples of those first refined nearest neighbor sets whose first number of coincident samples is larger than a preset threshold number of coincident samples, and screen out, as second extended samples, the M groups of second refined similar samples of those second refined nearest neighbor sets whose second number of coincident samples is larger than the preset threshold number of coincident samples; and merge the first extended samples with the first refined samples to obtain the target optimized nearest neighbor set, and merge the second extended samples with the second refined samples to obtain the M similar optimized nearest neighbor sets.

In a possible embodiment, in terms of calculating the second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set, the processing unit 801 is specifically configured to: calculate M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets based on the Jaccard distance.

In a possible embodiment, in terms of obtaining the reordering data of the target voiceprint data and the similar voiceprint data by performing the calculation based on the first similarity score and the second similarity score, the processing unit 801 is specifically configured to: acquiring first weights corresponding to the M first similarity scores and second weights corresponding to the M second similarity scores based on a weighted average algorithm; calculating according to the first weight, the second weight, the M first similarity scores and the M second similarity scores to obtain M third similarity scores; and reordering the target voiceprint data and the similar voiceprint data according to the magnitude of the M third similarity scores to obtain reordered data.

Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.

Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit may be stored in a computer-readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application, in essence, or the part thereof that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above-mentioned methods of the embodiments of the present application. The aforementioned memory comprises: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware, and that the program may be stored in a computer-readable memory, which may include: a flash memory disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, and the like.

The above description of the embodiments is only for the purpose of helping to understand the method of the present application and its core ideas; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A method of reordering voiceprint data, the method comprising:

acquiring a first similarity score between the similar voiceprint data and the target voiceprint data;

2. The method according to claim 1, wherein the obtaining similar voiceprint data corresponding to the target voiceprint data from the preset voiceprint database comprises:

calculating cosine distance data between the target voiceprint data and preset voiceprint data in the preset voiceprint database;

and selecting M pieces of preset voiceprint data as the similar voiceprint data based on the size of the cosine distance data, wherein M is a positive integer.

3. The method of claim 2, wherein said calculating a first similarity score between the similar voiceprint data and the target voiceprint data comprises:

and obtaining M first similarity scores of the target voiceprint data and the M similar voiceprint data based on the size of the cosine distance data.

4. The method according to claim 2, wherein the optimizing the target nearest neighbor set corresponding to the target voiceprint data to obtain a target optimized nearest neighbor set, and optimizing the similar nearest neighbor set corresponding to the similar voiceprint data to obtain a similar optimized nearest neighbor set comprises:

retrieving a target nearest neighbor set corresponding to the target voiceprint data based on a hamming distance and a cosine distance, and retrieving M similar nearest neighbor sets corresponding to the M similar voiceprint data based on the hamming distance and the cosine distance, wherein the target nearest neighbor set comprises K first similar samples, the M similar nearest neighbor sets comprise M × K second similar samples, and K is a positive integer;

carrying out purification treatment on the target nearest neighbor set based on the K first similar samples to obtain a target purified nearest neighbor set, and carrying out purification treatment on the M similar nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar purified nearest neighbor sets;

and performing expansion processing on the target purification nearest neighbor set based on the K first similar samples to obtain a target optimization nearest neighbor set, and performing expansion processing on the M similar purification nearest neighbor sets based on the M multiplied by K second similar samples to obtain M similar optimization nearest neighbor sets.

5. The method of claim 4, wherein the refining the target nearest neighbor set based on the K first similar samples to obtain a target refined nearest neighbor set, and wherein the refining the M similar nearest neighbor sets based on the M x K second similar samples to obtain M similar refined nearest neighbor sets comprises:

acquiring K first sample nearest neighbor sets corresponding to the K first similar samples, and acquiring M multiplied by K second sample nearest neighbor sets corresponding to the M multiplied by K second similar samples;

calculating K first degrees of coincidence of the K first sample nearest neighbor sets with the target nearest neighbor set, and M × K second degrees of coincidence of the M × K second sample nearest neighbor sets with the M similar nearest neighbor sets;

screening out first similar samples of which the first coincidence degrees are greater than a preset coincidence threshold value from the K first similar samples as first purified samples, and screening out M groups of second similar samples of which the second coincidence degrees are greater than the preset coincidence threshold value from the M multiplied by K second similar samples as second purified samples;

retaining the first refined sample to obtain the target refined nearest neighbor set, and retaining the second refined sample to obtain the M similar refined nearest neighbor sets.

6. The method of claim 4, wherein purifying the target nearest neighbor set based on the K first similar samples to obtain the target purified nearest neighbor set, and purifying the M similar nearest neighbor sets based on the M × K second similar samples to obtain the M similar purified nearest neighbor sets, comprises:

selecting, as a first purified sample, each first similar sample whose first sample nearest neighbor set, among the K first sample nearest neighbor sets, contains the target voiceprint data, and selecting, as second purified samples, M groups of second similar samples whose second sample nearest neighbor sets, among the M × K second sample nearest neighbor sets, contain the corresponding similar voiceprint data;
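
The selection rule in this step amounts to a reciprocal-neighbor check: a sample is kept only if the anchor appears in that sample's own nearest neighbor set. The following Python sketch illustrates that reading; the function name `purify_by_reciprocity` and the `neighbors` mapping are hypothetical and are not taken from the application.

```python
from typing import Dict, Hashable, Sequence, Set


def purify_by_reciprocity(
    anchor: Hashable,
    neighbors: Dict[Hashable, Sequence[Hashable]],
) -> Set[Hashable]:
    """Keep the neighbors of `anchor` that list `anchor` among their own neighbors."""
    return {
        sample
        for sample in neighbors[anchor]
        if anchor in neighbors.get(sample, ())
    }


# Toy data: only "a" lists the target "q" among its own nearest neighbors.
toy_neighbors = {
    "q": ["a", "b", "c"],
    "a": ["q", "b", "c"],
    "b": ["a", "c", "x"],
    "c": ["x", "y", "z"],
}
reciprocal = purify_by_reciprocity("q", toy_neighbors)
```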

7. The method of claim 4, wherein expanding the target purified nearest neighbor set based on the K first similar samples to obtain the target optimized nearest neighbor set, and expanding the M similar purified nearest neighbor sets based on the M × K second similar samples to obtain the M similar optimized nearest neighbor sets, comprises:

acquiring a first purified nearest neighbor set corresponding to the first purified sample, and acquiring M groups of second purified nearest neighbor sets corresponding to the second purified samples;

obtaining a first number of coincident samples between the first purified nearest neighbor set and the target purified nearest neighbor set, and obtaining M groups of second numbers of coincident samples between each of the M groups of second purified nearest neighbor sets and the corresponding similar purified nearest neighbor set;

selecting, as first expansion samples, the first purified similar samples corresponding to a first purified nearest neighbor set whose first number of coincident samples is greater than a preset coincident sample number threshold, and selecting, as second expansion samples, the M groups of second purified similar samples corresponding to the M groups of second purified nearest neighbor sets whose second numbers of coincident samples are greater than the preset coincident sample number threshold;

and merging the first expansion samples with the first purified samples to obtain the target optimized nearest neighbor set, and merging the second expansion samples with the second purified samples to obtain the M similar optimized nearest neighbor sets.
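
The expansion of claim 7 can be illustrated with a minimal Python sketch. The claim wording leaves open exactly which samples become expansion samples, so the sketch assumes one plausible reading: when a purified sample's own nearest neighbor set shares more than a threshold number of members with the purified set, all members of that nearest neighbor set are merged in. The function name `expand_neighbor_set` and its parameters are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Set


def expand_neighbor_set(
    purified: Set[Hashable],
    neighbors: Dict[Hashable, Sequence[Hashable]],
    min_shared: int,
) -> Set[Hashable]:
    """Merge in the neighbor sets of purified samples that overlap enough.

    Assumption: a purified sample qualifies when its neighbor set shares more
    than `min_shared` members with the purified set; its neighbors then become
    expansion samples.
    """
    optimized = set(purified)  # the purified samples are always retained
    for sample in purified:
        candidate = set(neighbors.get(sample, ()))
        if len(candidate & purified) > min_shared:
            optimized |= candidate
    return optimized


# Toy data: both purified samples share exactly one member with the purified set.
toy_neighbors = {
    "a": ["q", "b", "c"],
    "b": ["a", "c", "x"],
}
optimized = expand_neighbor_set({"a", "b"}, toy_neighbors, min_shared=0)
```

Under this reading the anchor itself (here `"q"`) can enter the optimized set; whether it should be excluded is not stated in the claim.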

8. The method of claim 2, wherein the calculating a second similarity score between the target optimized nearest neighbor set and the similar optimized nearest neighbor set comprises:

calculating M second similarity scores between the target optimized nearest neighbor set and the M similar optimized nearest neighbor sets based on Jaccard distances.
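
For reference, the Jaccard distance between two sets A and B is 1 − |A ∩ B| / |A ∪ B|. The Python sketch below computes it between one target set and M similar sets. The function names are hypothetical, and the handling of two empty sets (treated as maximally distant) is an assumption, since the claim does not address that case.

```python
from typing import Hashable, List, Sequence, Set


def jaccard_distance(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Return 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 1.0  # assumption: no neighbor evidence counts as maximally distant
    return 1.0 - len(a & b) / len(union)


def jaccard_distances(
    target_set: Set[Hashable],
    similar_sets: Sequence[Set[Hashable]],
) -> List[float]:
    """One Jaccard distance per similar optimized nearest neighbor set."""
    return [jaccard_distance(target_set, s) for s in similar_sets]


distances = jaccard_distances(
    {"a", "b", "c"},
    [{"b", "c", "d"}, {"a", "b", "c"}, {"x"}],
)
```

A smaller distance indicates more shared neighbors; a similarity-style score can be obtained as 1 minus the distance.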

9. The method according to any one of claims 2 to 8, wherein the calculating based on the first similarity score and the second similarity score to obtain reordered data of the target voiceprint data and the similar voiceprint data comprises:

acquiring first weights corresponding to the M first similarity scores and second weights corresponding to the M second similarity scores based on a weighted average algorithm;

calculating according to the first weight, the second weight, the M first similarity scores and the M second similarity scores to obtain M third similarity scores;

and reordering the target voiceprint data and the similar voiceprint data according to the magnitude of the M third similarity scores to obtain reordered data.
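
The weighted fusion and reordering of claim 9 can be illustrated with a minimal Python sketch. The claim does not state how the weights are chosen, so the sketch assumes a single `first_weight` in [0, 1] with the second weight equal to its complement. It also assumes both score lists are oriented so that a higher value means more similar; a Jaccard distance would first be converted, for example as 1 minus the distance. The function name `fuse_and_rank` is hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_and_rank(
    first_scores: Sequence[float],
    second_scores: Sequence[float],
    first_weight: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return (candidate index, third score) pairs, highest third score first.

    Assumption: third = w1 * first + (1 - w1) * second, higher means more similar.
    """
    if len(first_scores) != len(second_scores):
        raise ValueError("both score lists must have M entries")
    second_weight = 1.0 - first_weight
    fused = [
        first_weight * f + second_weight * s
        for f, s in zip(first_scores, second_scores)
    ]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return [(i, fused[i]) for i in order]


# Toy data for M = 3 candidates.
ranked = fuse_and_rank([0.9, 0.8, 0.7], [0.1, 0.9, 0.5])
```

In this toy example the candidate with the highest first score drops to last place because its neighbor-set score is low, which is the effect the reordering is intended to capture.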

10. An apparatus for reordering voiceprint data, the apparatus comprising a processing unit configured to:

11. An electronic device comprising an application processor, a memory, and one or more programs stored in the memory and configured to be executed by the application processor, the programs comprising instructions for performing the steps of the method of any of claims 1-9.

12. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any of claims 1-9.