CN117423344A - Voiceprint recognition method and device based on neural network - Google Patents

Voiceprint recognition method and device based on neural network

Info

Publication number
CN117423344A
Authority
CN
China
Prior art keywords
voiceprint
identified
signal
mel
voice print
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311262765.8A
Other languages
Chinese (zh)
Inventor
胡光强
许敏
张军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HUADI COMPUTER GROUP CO Ltd
Original Assignee
HUADI COMPUTER GROUP CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HUADI COMPUTER GROUP CO Ltd filed Critical HUADI COMPUTER GROUP CO Ltd
Priority to CN202311262765.8A priority Critical patent/CN117423344A/en
Publication of CN117423344A publication Critical patent/CN117423344A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/18: Artificial neural networks; Connectionist approaches
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a voiceprint recognition method and device based on a neural network. The method comprises the following steps: collecting a voiceprint signal to be identified and preprocessing it; performing feature extraction on the preprocessed voiceprint signal to be identified, and determining voiceprint features of the voiceprint signal to be identified; inputting the voiceprint features into a pre-trained neural network classifier model, and outputting the set of all possible class labels of the voiceprint signal to be identified, wherein the classifier model adopts self-organizing mapping based on quantum optimization; and traversing the set of possible class labels according to a pre-constructed Bayesian optimization probability model to obtain the optimal class label of the voiceprint signal to be identified.

Description

Voiceprint recognition method and device based on neural network
Technical Field
The invention relates to the technical field of voiceprint recognition, in particular to a voiceprint recognition method and device based on a neural network.
Background
Voiceprint recognition is a technique for performing authentication or identification using individual voice features, and has been widely studied and used in recent years. With the rapid development of various application scenarios such as intelligent voice assistant, intelligent home, mobile payment, security verification and the like, the accuracy and reliability of the voiceprint recognition technology become particularly important.
However, in practical applications, existing voiceprint recognition techniques face a number of challenges. The following problems remain to be solved in the prior art:
data quality and resolution problems: conventional data acquisition methods may fail to ensure sufficient information volume and resolution, thereby affecting the accuracy of the identification.
Insufficient preprocessing: existing data preprocessing techniques may not be advanced enough; failing to effectively eliminate noise or to perform proper data normalization may affect the performance and generalization ability of the model.
Insufficient data diversity and volume: traditional data expansion approaches may be too simple and fail to take into account the time dependence of the voiceprint data, resulting in insufficient generalization ability of the model in the face of different environmental and noise conditions.
Limited feature extraction capability: conventional Convolutional Neural Networks (CNNs), while performing well in voiceprint recognition, may suffer from deficiencies in handling multi-scale features and in introducing attention mechanisms.
Classifier performance and efficiency problems: existing classifiers may not fully utilize advanced optimization techniques such as quantum optimization and thus have limitations in computational efficiency and accuracy.
Lack of uncertainty assessment: conventional models typically give only the single most likely classification result without evaluating the model's uncertainty about that result, which is a disadvantage in terms of safety and reliability.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a voiceprint recognition method and device based on a neural network.
According to one aspect of the present invention, there is provided a voiceprint recognition method based on a neural network, including:
collecting a voiceprint signal to be identified and preprocessing it;
performing feature extraction on the preprocessed voiceprint signal to be identified, and determining voiceprint features of the voiceprint signal to be identified;
inputting the voiceprint features into a pre-trained neural network classifier model, and outputting the set of all possible class labels of the voiceprint signal to be identified, wherein the classifier model adopts self-organizing mapping based on quantum optimization;
traversing the set of possible class labels according to a pre-constructed Bayesian optimization probability model, and obtaining the optimal class label of the voiceprint signal to be identified.
Optionally, the classifier model training process is as follows:
collecting a voiceprint signal dataset of a microphone device;
performing feature extraction on the voiceprint signal data set by using a Mel frequency cepstrum coefficient feature extraction method to determine a Mel frequency cepstrum coefficient set;
preprocessing the voiceprint signal data set by a combination of maximum-minimum normalization and a time-domain noise filter;
performing data expansion on the preprocessed voiceprint signal data set using an echo-state-based generative adversarial network algorithm to generate an expanded voiceprint signal data set;
performing feature extraction on the expanded voiceprint signal data set by adopting a multi-scale convolutional neural network, and determining a voiceprint feature set of the voiceprint signal data set;
the Mel frequency cepstrum coefficient set is formed into a Mel frequency cepstrum coefficient matrix, and is fused with the voiceprint feature set to determine the voiceprint fusion feature set;
inputting the voiceprint fusion feature set into a self-organizing map classifier based on quantum optimization for training to obtain a classifier model.
Optionally, the microphone device is any one of the following: a dynamic microphone, a condenser microphone, or an array microphone.
Optionally, performing feature extraction on the voiceprint signal data set by using a mel-frequency cepstral coefficient feature extraction method and determining the mel-frequency cepstral coefficient set includes:
performing Fourier transformation on the voiceprint signals of the voiceprint signal data set respectively to obtain a frequency spectrum feature set;
mapping the spectrum feature set to a Mel scale to obtain a Mel domain feature set;
and performing discrete cosine transform on the Mel domain feature set to obtain a Mel frequency cepstrum coefficient set.
Optionally, the maximum-minimum normalization formula is:
x' = (x - min(x)) / (max(x) - min(x))
where x' is the normalized data, and min(x) and max(x) are the minimum and maximum values of the original data x, respectively.
Optionally, in the time-domain noise filter formula, x' is the normalized data, α is a weight parameter between 0 and 1, n is the local window size, and w_i is the weight of each point in the window, satisfying Σ_i w_i = 1.
Optionally, the generative adversarial network algorithm comprises a generator G_ESN and a discriminator D, wherein the generator G_ESN consists of an original generator and an echo state network.
Optionally, in the Bayesian optimization model, Y is the set of all possible class labels, μ(x, y) and σ²(x, y) are the mean and variance, respectively, x is the voiceprint signal to be identified, and y is a category in Y.
Optionally, the method further comprises: performing uncertainty measurement on the optimal class label and determining the uncertainty of the optimal label, wherein the uncertainty measure is expressed by the following formula:
U(x, y*) = σ²(x, y*)
where x is the voiceprint signal to be identified, y* is the optimal class label, and σ² is the variance.
According to another aspect of the present invention, there is provided a voiceprint recognition apparatus based on a neural network, including:
the acquisition module is used for acquiring the voiceprint signal to be identified and preprocessing the voiceprint signal;
the feature extraction module is used for performing feature extraction on the preprocessed voiceprint signal to be identified and determining voiceprint features of the voiceprint signal to be identified;
the output module is used for inputting the voiceprint features into a pre-trained neural network classifier model and outputting the set of all possible class labels of the voiceprint signal to be identified, wherein the classifier model adopts self-organizing mapping based on quantum optimization;
the obtaining module is used for traversing the set of possible class labels according to a pre-constructed Bayesian optimization probability model to obtain the optimal class label of the voiceprint signal to be identified.
According to a further aspect of the present invention there is provided a computer readable storage medium storing a computer program for performing the method according to any one of the above aspects of the present invention.
According to still another aspect of the present invention, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method according to any of the above aspects of the present invention.
Therefore, the invention provides a voiceprint recognition method based on a neural network. On the design of the classifier, self-organizing mapping (SOM) based on quantum optimization is adopted to improve accuracy and calculation efficiency, uncertainty assessment based on Bayesian optimization is introduced, and a more reliable classification result is provided.
Drawings
Exemplary embodiments of the present invention may be more completely understood in consideration of the following drawings:
FIG. 1 is a flow chart of a voiceprint recognition method based on a neural network according to an exemplary embodiment of the present invention;
fig. 2 is a schematic structural diagram of a voiceprint recognition method based on a neural network according to an exemplary embodiment of the present invention;
fig. 3 is a schematic structural diagram of a voiceprint recognition device based on a neural network according to an exemplary embodiment of the present invention;
fig. 4 is a structure of an electronic device provided in an exemplary embodiment of the present invention.
Detailed Description
Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention and not all embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present invention are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present invention, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in an embodiment of the invention may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in the present invention is merely an association relationship describing the association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In the present invention, the character "/" generally indicates that the front and rear related objects are an or relationship.
It should also be understood that the description of the embodiments of the present invention emphasizes the differences between the embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, the techniques, methods, and apparatus should be considered part of the specification.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations with electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Exemplary method
Fig. 1 is a flowchart of a voiceprint recognition method based on a neural network according to an exemplary embodiment of the present invention. The embodiment can be applied to an electronic device. As shown in fig. 1, the voiceprint recognition method 100 based on a neural network includes the following steps:
Step 101, collecting a voiceprint signal to be identified and preprocessing it;
Step 102, performing feature extraction on the preprocessed voiceprint signal to be identified, and determining voiceprint features of the voiceprint signal to be identified;
Step 103, inputting the voiceprint features into a pre-trained neural network classifier model, and outputting the set of all possible class labels of the voiceprint signal to be identified, wherein the classifier model adopts self-organizing mapping based on quantum optimization;
Step 104, traversing the set of possible class labels according to a pre-constructed Bayesian optimization probability model to obtain the optimal class label of the voiceprint signal to be identified.
Specifically, the invention provides a voiceprint recognition method based on a neural network. The method focuses on high quality and high resolution from the data acquisition stage onward; advanced techniques, such as a local-correlation-based time-domain noise filter and a multi-scale convolutional neural network (MS-CNN), are employed during the data preprocessing and feature extraction stages to capture voiceprint characteristics more fully and accurately; in the classifier design, self-organizing mapping (SOM) based on quantum optimization is adopted to improve accuracy and computational efficiency; in addition, uncertainty evaluation based on Bayesian optimization is introduced to provide more reliable classification results.
Specifically, the training steps of the classifier model are as follows:
step one: data acquisition and annotation
The data of the invention mainly come from high-quality microphone equipment, in particular dynamic microphones, condenser microphones, array microphones and the like, so as to ensure that the collected voiceprint signals have sufficient information content and resolution. The data are stored in WAV or MP3 format for compatibility with various data analysis tools.
The voiceprint signal is one-dimensional time series data, typically denoted by x (t), where x is the amplitude of the voiceprint signal and t is time. Such signals may be further broken down into a number of attributes and features, such as frequency, phase, etc.
Further, preliminary feature extraction is performed on the voiceprint signal according to the Mel Frequency Cepstral Coefficient (MFCC) feature extraction method. The method comprises the following steps:
1. Apply the Fourier transform to the signal x(t) to obtain X(f).
Specifically, in order to extract effective features from the voiceprint signal x(t), the present invention employs the Fourier transform for preliminary feature extraction. Given a time signal x(t), its Fourier transform is expressed as:
X(f) = ∫ x(t)·e^(-j2πft) dt
where X(f) is the frequency-domain representation, f is the frequency, and j is the imaginary unit.
2. Map X(f) onto the Mel scale to obtain M(f).
Specifically, the calculation formula of M(f) is:
3. Perform the Discrete Cosine Transform (DCT) on M(f) to obtain the MFCC.
Specifically, the MFCC may be expressed as:
MFCC = DCT(M(f))
in a specific embodiment, consider a simple voiceprint signal x (t) =sin (2pi ft), where f=440 Hz.
Further, a fourier transform is applied, resulting in X (f) =δ (f-440) +δ (f+440), where δ is a dirac δ -function.
Further, mapping onto Mel scale to obtain
Further, the MFCC is characterized as 392.5. And taking the characteristic as a newly added characteristic, cascading with the original signal, and enhancing the characteristic characterization capability of the original voiceprint signal.
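For illustration, the three-step MFCC pipeline above can be sketched in Python roughly as follows. The 2595·log10(1 + f/700) Mel mapping and the simple energy binning are assumptions standing in for the formulas not reproduced in this text, and mfcc_features is a hypothetical helper name, not the patent's implementation.

# Minimal sketch of the three-step MFCC pipeline described above.
# The Mel mapping uses the common 2595*log10(1 + f/700) formula, which is an
# assumption; the patent's own Mel-scale expression is not reproduced here.
import numpy as np
from scipy.fftpack import dct  # discrete cosine transform

def mfcc_features(x, sample_rate, n_mel_bins=26, n_coeffs=13):
    # Step 1: Fourier transform of the time signal x(t) -> |X(f)|
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)

    # Step 2: map the spectrum onto the Mel scale (simple binning sketch)
    mel_freqs = 2595.0 * np.log10(1.0 + freqs / 700.0)
    bin_edges = np.linspace(mel_freqs.min(), mel_freqs.max(), n_mel_bins + 1)
    mel_energies = np.array([
        spectrum[(mel_freqs >= lo) & (mel_freqs < hi)].sum()
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:])
    ])

    # Step 3: discrete cosine transform of the log Mel energies -> MFCC
    return dct(np.log(mel_energies + 1e-10), norm='ortho')[:n_coeffs]

# Example: the 440 Hz sine used in the embodiment above
t = np.arange(0, 1.0, 1.0 / 16000)
x = np.sin(2 * np.pi * 440 * t)
coeffs = mfcc_features(x, sample_rate=16000)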
Step two: data preprocessing
The preprocessing of voiceprint data is a crucial step and affects the performance and generalization ability of the subsequent neural network model. The invention adopts a combination of maximum-minimum normalization and a specific noise filter for data preprocessing.
The collected voiceprint signal data consist of multiple frequency components and may contain different levels of noise. Maximum-minimum normalization rescales the data into a predetermined range (typically [0, 1] or [-1, 1]). Specifically, given a voiceprint signal sample x2(t), its normalization formula is:
x2(t)' = (x2(t) - min(x2(t))) / (max(x2(t)) - min(x2(t)))
where x2(t)' is the normalized data, min(x2(t)) and max(x2(t)) are the minimum and maximum values of the original data, respectively, and x2(t) is the given voiceprint signal sample before normalization.
Further, the normalized data are input to a noise filter. Conventional noise filters typically operate in the frequency domain, whereas the present invention employs a time-domain noise filter based on local correlation. For each time point t, the noise-filtered value x″(t) is calculated as follows:
where α is a weight parameter between 0 and 1, n is the local window size, and w_i is the weight of each point in the window, satisfying Σ_i w_i = 1.
After maximum and minimum normalization of the voiceprint signal x, it is input to a noise filter to obtain final preprocessed data x″. The calculation steps can be represented by the following formula:
x″=NoiseFilter(MinMaxNorm(x))
where NoiseFilter is the noise filter and MinMaxNorm is the maximum-minimum normalization function.
By combining maximum-minimum normalization with the specific noise filter, the important characteristics of the voiceprint data are retained, the influence of noise and other irrelevant factors is reduced, and the voiceprint recognition accuracy is further improved.
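A minimal Python sketch of the preprocessing pipeline x″ = NoiseFilter(MinMaxNorm(x)) might look as follows. Blending each sample with a weighted local average is one plausible reading of the local-correlation filter, and the α value, window size, and uniform weights are illustrative assumptions rather than the patent's exact parameters.

# Sketch of x'' = NoiseFilter(MinMaxNorm(x)).
# The filter blends each normalized sample with a weighted local average,
# which is one plausible reading of the local-correlation filter described above.
import numpy as np

def min_max_norm(x):
    # rescale the raw signal into [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def noise_filter(x_norm, alpha=0.7, n=5):
    weights = np.ones(n) / n          # w_i, uniform and summing to 1 (assumed)
    local_avg = np.convolve(x_norm, weights, mode='same')
    return alpha * x_norm + (1 - alpha) * local_avg

x = np.random.randn(16000)            # stand-in voiceprint signal
x_pre = noise_filter(min_max_norm(x)) # final preprocessed data x''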
Step three: data augmentation
It will be appreciated that the amount and diversity of data is a critical factor in the voiceprint recognition task. The invention provides an improved echo-state-based generative adversarial network algorithm for data expansion.
First, an Echo State Network (ESN) is used as the basic part of the generator. The echo state network comprises an input weight matrix W_in, an echo-state (internal) weight matrix W, and an output weight matrix W_out. For the preprocessed data, let the data sequence be X = [x(1), x(2), ..., x(N)], where N is the length of the time series.
Then, the dynamic update rule of the hidden state h (t) is:
h(t) = (1 - α)·h(t-1) + α·tanh(W_in·x(t) + W·h(t-1))
where α is a forgetting factor ranging from 0 to 1, W_in is the input-to-hidden-layer weight matrix, and W is the weight matrix of the connections within the hidden layer.
In a standard generative adversarial network, the loss function of the generator G and discriminator D is generally defined as:
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]
where p_data(x) is the true data distribution and p_z(z) is the noise distribution of the generator input.
In the present invention, an Echo State Network (ESN) is incorporated into the generator so that the generator can not only generate new samples but also capture the time-dependent nature of the voiceprint data. Specifically, the generator, denoted G_ESN, is formed by combining an original generator with an echo state network and can be expressed as:
G_ESN(z) = G(ESN(z))
Then, the modified loss function replaces G with G_ESN:
min_{G_ESN} max_D V(D, G_ESN) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G_ESN(z)))]
by incorporating the ESN into the GAN, the generated data better mimics the time-dependent nature of the voiceprint data.
Step four: data feature extraction
Since voiceprint recognition is a complex pattern recognition problem, feature extraction from the expanded sample data requires extracting key features from the high-dimensional raw sound signal for high-accuracy recognition. Traditional convolutional neural networks (CNNs) have performed well in voiceprint recognition but still suffer from certain limitations, particularly in dealing with multi-scale features.
The invention performs feature extraction based on a multi-scale convolutional neural network (MS-CNN); in particular, an adaptive learning rate and an attention mechanism are introduced to enhance model performance.
Specifically, in the feature extraction stage, a multi-scale convolution operation is first performed on the input voiceprint data X. Convolution kernels K_1, K_2, K_3 of three different scales are employed to perform the convolution operation, which can be expressed as:
F_1 = X * K_1
F_2 = X * K_2
F_3 = X * K_3
where * denotes the convolution operation, X is the input voiceprint data matrix of dimension N×M, N is the number of samples, M is the number of features per sample, and F is the feature matrix extracted by the MS-CNN.
Further, for finer feature extraction, the learning rate LR is adaptively adjusted after each convolution operation, which can be expressed as:
where L is the loss function, ∇_W L is the gradient of the loss function with respect to the weights W, W is a weight parameter in the CNN, and LR is the learning rate.
Further, attention-mechanism weighting is performed. After multi-scale feature extraction, an attention mechanism is adopted to weight and combine the features F_1, F_2, F_3 extracted at the different scales, which can be expressed as:
F = α_1×F_1 + α_2×F_2 + α_3×F_3
where α_1, α_2, α_3 are the attention weights.
Further, in processing voiceprint data, frequency domain information often contains a number of useful features in addition to time domain information. In order to more comprehensively capture the characteristics of the voiceprint, the invention introduces a frequency domain information fusion mechanism.
Specifically, the mel-frequency cepstral coefficients obtained in step one are first combined into a mel-frequency cepstral coefficient matrix by means of a multi-scale sliding window.
Further, a learnable weight parameter β is used to fuse this matrix with the time-domain information F, which can be expressed as:
where the frequency-domain feature is the mel-frequency cepstral coefficient matrix and its gradient with respect to the loss function L is used to update β; the fused data are:
In this fusion step, F_final is the feature set that is ultimately used for voiceprint recognition.
The frequency-domain information fusion mechanism provided by the invention further enriches the expressive power of the features and improves the accuracy of voiceprint recognition. By introducing multi-scale convolution, an adaptive learning rate, and an attention mechanism, the features extracted from the voiceprint data become more accurate and robust, thereby improving the accuracy of the voiceprint recognition task.
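A compact sketch of the MS-CNN feature extractor described in this step follows. The three kernel sizes, the softmax attention over F_1..F_3, and the β-weighted fusion with the MFCC matrix mirror the description above, while all layer sizes and tensor shapes are illustrative assumptions.

# Sketch of the multi-scale feature extractor: three 1-D convolutions stand in
# for K_1, K_2, K_3, learnable softmax weights play the role of alpha_1..alpha_3,
# and a learnable beta fuses the frequency-domain (MFCC) features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSCNN(nn.Module):
    def __init__(self, in_ch=1, out_ch=32):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1)   # K_1
        self.conv2 = nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2)   # K_2
        self.conv3 = nn.Conv1d(in_ch, out_ch, kernel_size=7, padding=3)   # K_3
        self.attn = nn.Parameter(torch.zeros(3))     # attention logits
        self.beta = nn.Parameter(torch.tensor(0.5))  # frequency-domain fusion weight

    def forward(self, x, mfcc):
        # x: (batch, 1, T) time-domain signal; mfcc: (batch, out_ch, T) frequency features
        feats = [self.conv1(x), self.conv2(x), self.conv3(x)]      # F_1, F_2, F_3
        alpha = F.softmax(self.attn, dim=0)
        fused_time = sum(a * f for a, f in zip(alpha, feats))      # F = sum alpha_i * F_i
        return fused_time + self.beta * mfcc                       # F_final (time + frequency)

model = MSCNN()
x = torch.randn(4, 1, 160)
mfcc = torch.randn(4, 32, 160)
features = model(x, mfcc)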
Step five: training classifier
The data after feature extraction are input to a classifier, and the classifier is trained. The invention uses a self-organizing map (SOM) classifier based on quantum optimization to achieve higher classification accuracy and computational efficiency.
Self-organizing maps (SOM) map the input space (i.e., the voiceprint feature space) to a low-dimensional grid. Specifically, let the feature-extracted data be X = {x_1, x_2, ..., x_N}, where x_i is a voiceprint feature vector and N is the number of samples.
Further, for an n×m SOM grid, a weight matrix W is defined, where w_ij, the weight of node (i, j) in the grid, is a d-dimensional vector.
Further, the basic update formula of the SOM is:
w_ij(t+1) = w_ij(t) + α(t)·h_ij(ξ, t)·(x(t) - w_ij(t))
where α(t) is the learning rate, h_ij(ξ, t) is a neighborhood function, typically a Gaussian function; ξ is the winning node (i.e., the node closest to the input x(t)); and t is the time step.
Further, the invention introduces a quantum optimization algorithm to adjust SOM grid weights. Specifically, the optimization objective is to minimize the following loss function J:
where w_c(i) is the weight of the SOM node closest to x_i, λ is the regularization parameter, and the final term is a quantum optimization operation.
The quantum optimization operation is realized with quantum gates and qubits and is used to quickly find the globally optimal solution.
Further, to minimize J(W), the gradient with respect to W is taken, yielding:
Further, the weights W are updated using stochastic gradient descent:
where η is the step size.
In this way, not only is each data point x_i drawn toward its nearest SOM node w_c(i), but the overall weight matrix W also approaches the quantum-optimized state.
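A classical SOM training loop implementing the update rule above is sketched below. The quantum-optimization regularization term is only summarized in the text, so it is deliberately omitted here; the grid size, learning-rate schedule, and Gaussian neighborhood are assumptions made for illustration.

# Sketch of the SOM update w_ij(t+1) = w_ij(t) + a(t)*h_ij(xi,t)*(x - w_ij(t)).
# The quantum-optimization term of the loss J is not implemented in this sketch.
import numpy as np

def train_som(X, grid=(8, 8), epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(size=(grid[0], grid[1], d))          # weight matrix w_ij
    rows, cols = np.indices(grid)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                     # decaying learning rate alpha(t)
        sigma = sigma0 * (1 - t / epochs) + 1e-3
        for x in rng.permutation(X):
            dist = np.linalg.norm(W - x, axis=2)
            bi, bj = np.unravel_index(dist.argmin(), grid)   # winning node xi
            h = np.exp(-((rows - bi) ** 2 + (cols - bj) ** 2) / (2 * sigma ** 2))
            W += lr * h[..., None] * (x - W)            # neighborhood-weighted update
    return W

X = np.random.randn(200, 16)      # stand-in voiceprint fusion features
W = train_som(X)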
The process of voiceprint recognition by the classifier model trained through the steps is as follows:
and performing voiceprint classification by using the trained classifier model. Based on the Bayesian optimization model reasoning strategy, not only can the classification accuracy be improved, but also an uncertainty measure can be provided for each classification result.
Specifically, the trained model f maps a voiceprint data point x to a corresponding class label y. The uncertainty of this mapping is expressed by the conditional probability P(y|x), which can be written as:
P(y|x) = exp(f(x, y)) / Σ_{y'} exp(f(x, y'))
where f(x, y) is the joint score of the model for the input x and the label y, and y' traverses all possible labels.
Further, Bayesian optimization is applied to the probabilistic model P(y|x) to find the most likely class label. Specifically, let Y be the set of all possible class labels; the optimization objective is to maximize P(y|x) over y in Y.
by Gaussian Process Regression (GPR), there are:
wherein μ (x, y) and σ 2 (x, y) are mean and variance, respectively.
Then, the optimization objective becomes:
y* = argmax_{y in Y} [μ(x, y) + β·σ(x, y)]
where β is an adjustable parameter used to trade off accuracy against uncertainty.
Further, the uncertainty measure is evaluated via the variance of the model, specifically:
U(x, y*) = σ²(x, y*)
in voiceprint recognition, the following steps are followed:
1. setting input and output.
Input x (voiceprint data point)
Output y * (most likely class label), U (x, y * ) (uncertainty measurement)
2. P (y|x) was modeled using GPR.
3. Calculate allMu (x, y) and sigma below 2 (x,y)。
4. Find out
5. Calculating uncertainty measure U (x, y * )=σ 2 (x,y * )。
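The five inference steps can be sketched with scikit-learn's Gaussian process regressor as follows. Modeling each candidate label with its own regressor, the choice of β, and the toy data are assumptions made for illustration, not the patent's exact inference procedure.

# Sketch of the inference steps above: one Gaussian-process regressor models
# the score for each candidate label y, the label maximizing mu + beta*sigma is
# selected, and sigma^2 is returned as the uncertainty U(x, y*).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_optimal_label(x, gprs, beta=0.1):
    # gprs: dict mapping each candidate label y to a fitted GaussianProcessRegressor
    scores = {}
    for y, gpr in gprs.items():
        mu, sigma = gpr.predict(x.reshape(1, -1), return_std=True)
        scores[y] = (mu[0] + beta * sigma[0], sigma[0] ** 2)
    y_star = max(scores, key=lambda y: scores[y][0])
    return y_star, scores[y_star][1]          # optimal label and U(x, y*) = sigma^2

# toy usage: two candidate speaker labels with toy training scores
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 8))
gprs = {y: GaussianProcessRegressor().fit(X_train, rng.normal(size=40))
        for y in ("spk_a", "spk_b")}
label, uncertainty = bayes_optimal_label(rng.normal(size=8), gprs)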
Based on the method, the accuracy of voiceprint recognition is improved, and a reliable uncertainty assessment is provided for each classification result by introducing Bayesian optimization and uncertainty measurement.
Therefore, the invention provides a voiceprint recognition method based on a neural network. On the design of the classifier, self-organizing mapping (SOM) based on quantum optimization is adopted to improve accuracy and calculation efficiency, uncertainty assessment based on Bayesian optimization is introduced, and a more reliable classification result is provided.
Exemplary apparatus
Fig. 3 is a schematic structural diagram of a voiceprint recognition device based on a neural network according to an exemplary embodiment of the present invention. As shown in fig. 3, the apparatus 300 includes:
the acquisition module 310 is used for acquiring and preprocessing the voiceprint signal to be identified;
the feature extraction module 320 is configured to perform feature extraction on the pre-processed voiceprint signal to be identified, and determine voiceprint features of the voiceprint signal to be identified;
the output module 330 is configured to input the voiceprint features into a pre-trained neural network classifier model, and output all possible class label sets of the voiceprint signals to be identified, where the classifier model adopts self-organizing mapping based on quantum optimization;
and the obtaining module 340 is configured to traverse the possible class label set according to the pre-constructed bayesian optimization probability model, and obtain an optimal class label of the voiceprint signal to be identified.
Optionally, the classifier model training process of the output module 330 is as follows:
the acquisition sub-module is used for acquiring a voiceprint signal data set of the microphone equipment;
the characteristic extraction submodule is used for carrying out characteristic extraction on the voiceprint signal data set by utilizing a Mel frequency cepstrum coefficient characteristic extraction method to determine a Mel frequency cepstrum coefficient set;
the preprocessing sub-module is used for preprocessing the voiceprint signal data set by adopting a maximum and minimum normalization and time domain noise filter combination method;
the expansion sub-module is used for carrying out data expansion on the voice print signal data set after pretreatment based on the generation of the echo state and the countermeasure network algorithm, and generating an expanded voice print signal data set;
the second feature extraction submodule is used for carrying out feature extraction on the expanded voiceprint signal data set by adopting a multi-scale convolutional neural network and determining a voiceprint feature set of the voiceprint signal data set;
the fusion submodule is used for forming a Mel frequency cepstrum coefficient set into a Mel frequency cepstrum coefficient matrix, and fusing the Mel frequency cepstrum coefficient matrix with the voiceprint feature set to determine the voiceprint fusion feature set;
and the training sub-module is used for inputting the voiceprint fusion feature set into the self-organizing map classifier based on quantum optimization for training to obtain a classifier model.
Optionally, the microphone device is any one of the following: a dynamic microphone, a condenser microphone, or an array microphone.
Optionally, the feature extraction submodule includes:
the first transformation unit is used for carrying out Fourier transformation on the voiceprint signals of the voiceprint signal data set respectively to obtain a frequency spectrum feature set;
the mapping unit is used for mapping the spectrum feature set to a Mel scale to obtain a Mel domain feature set;
and the second transformation unit is used for performing discrete cosine transformation on the Mel domain feature set to obtain a Mel frequency cepstrum coefficient set.
Optionally, the maximum-minimum normalization formula is:
x' = (x - min(x)) / (max(x) - min(x))
where x' is the normalized data, and min(x) and max(x) are the minimum and maximum values of the original data x, respectively.
Optionally, in the time-domain noise filter formula, x' is the normalized data, α is a weight parameter between 0 and 1, n is the local window size, and w_i is the weight of each point in the window, satisfying Σ_i w_i = 1.
Optionally, the generative adversarial network algorithm comprises a generator G_ESN and a discriminator D, wherein the generator G_ESN consists of an original generator and an echo state network.
Optionally, in the Bayesian optimization model, Y is the set of all possible class labels, μ(x, y) and σ²(x, y) are the mean and variance, respectively, x is the voiceprint signal to be identified, and y is a category in Y.
Optionally, the method further comprises: performing uncertainty measurement on the optimal class label and determining the uncertainty of the optimal label, wherein the uncertainty measure is expressed by the following formula:
U(x, y*) = σ²(x, y*)
where x is the voiceprint signal to be identified, y* is the optimal class label, and σ² is the variance.
Exemplary electronic device
Fig. 4 is a structure of an electronic device provided in an exemplary embodiment of the present invention. As shown in fig. 4, the electronic device 40 includes one or more processors 41 and memory 42.

Claims (10)

1. A voiceprint recognition method based on a neural network, comprising:
collecting a voiceprint signal to be identified and preprocessing it;
performing feature extraction on the preprocessed voiceprint signal to be identified, and determining voiceprint features of the voiceprint signal to be identified;
inputting the voiceprint features into a pre-trained neural network classifier model, and outputting the set of all possible class labels of the voiceprint signal to be identified, wherein the classifier model adopts self-organizing mapping based on quantum optimization;
traversing the set of possible class labels according to a pre-constructed Bayesian optimization probability model, and acquiring the optimal class label of the voiceprint signal to be identified.
2. The method of claim 1, wherein the classifier model training process is as follows:
collecting a voiceprint signal dataset of a microphone device;
performing feature extraction on the voiceprint signal data set by using a Mel frequency cepstrum coefficient feature extraction method to determine a Mel frequency cepstrum coefficient set;
preprocessing the voiceprint signal data set by a combination of maximum-minimum normalization and a time-domain noise filter;
performing data expansion on the preprocessed voiceprint signal data set using an echo-state-based generative adversarial network algorithm to generate an expanded voiceprint signal data set;
performing feature extraction on the extended voiceprint signal data set by adopting a multi-scale convolutional neural network, and determining a voiceprint feature set of the voiceprint signal data set;
the Mel frequency cepstrum coefficient set is formed into a Mel frequency cepstrum coefficient matrix, and is fused with the voiceprint feature set to determine a voiceprint fusion feature set;
inputting the voiceprint fusion feature set to a self-organizing map classifier based on quantum optimization for training to obtain the classifier model.
3. The method of claim 2, wherein the microphone device is any one of the following: a dynamic microphone, a condenser microphone, or an array microphone.
4. The method of claim 1, wherein performing feature extraction on the voiceprint signal data set using a mel-frequency cepstral coefficient feature extraction method and determining a mel-frequency cepstral coefficient set comprises:
performing Fourier transformation on the voiceprint signals of the voiceprint signal data set respectively to obtain a frequency spectrum feature set;
mapping the spectrum feature set to a Mel scale to obtain a Mel domain feature set;
and performing discrete cosine transform on the Mel domain feature set to obtain the Mel frequency cepstrum coefficient set.
5. The method of claim 2, wherein the maximum-minimum normalization formula is:
x' = (x - min(x)) / (max(x) - min(x))
where x' is the normalized data, and min(x) and max(x) are the minimum and maximum values of the original data x, respectively.
6. The method of claim 2, wherein, in the time-domain noise filter formula, x' is the normalized data, α is a weight parameter between 0 and 1, n is the local window size, and w_i is the weight of each point in the window, satisfying Σ_i w_i = 1.
7. The method of claim 2, wherein the generative adversarial network algorithm comprises a generator G_ESN and a discriminator D, wherein the generator G_ESN consists of an original generator and an echo state network.
8. The method according to claim 1, wherein, in the Bayesian optimization model, Y is the set of all possible class labels, μ(x, y) and σ²(x, y) are the mean and variance, respectively, x is the voiceprint signal to be identified, and y is a category in Y.
9. The method as recited in claim 1, further comprising: performing uncertainty measurement on the optimal class label and determining the uncertainty of the optimal label, wherein the uncertainty measure is expressed as:
U(x, y*) = σ²(x, y*)
where x is the voiceprint signal to be identified, y* is the optimal class label, and σ² is the variance.
10. A voiceprint recognition device based on a neural network, comprising:
an acquisition module, used for acquiring a voiceprint signal to be identified and preprocessing it;
a feature extraction module, used for performing feature extraction on the preprocessed voiceprint signal to be identified and determining voiceprint features of the voiceprint signal to be identified;
an output module, used for inputting the voiceprint features into a pre-trained neural network classifier model and outputting the set of all possible class labels of the voiceprint signal to be identified, wherein the classifier model adopts self-organizing mapping based on quantum optimization; and
an obtaining module, used for traversing the set of possible class labels according to a pre-constructed Bayesian optimization probability model and acquiring the optimal class label of the voiceprint signal to be identified.
CN202311262765.8A 2023-09-27 2023-09-27 Voiceprint recognition method and device based on neural network Pending CN117423344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311262765.8A CN117423344A (en) 2023-09-27 2023-09-27 Voiceprint recognition method and device based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311262765.8A CN117423344A (en) 2023-09-27 2023-09-27 Voiceprint recognition method and device based on neural network

Publications (1)

Publication Number Publication Date
CN117423344A true CN117423344A (en) 2024-01-19

Family

ID=89531564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311262765.8A Pending CN117423344A (en) 2023-09-27 2023-09-27 Voiceprint recognition method and device based on neural network

Country Status (1)

Country Link
CN (1) CN117423344A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118155662A (en) * 2024-05-09 2024-06-07 国网江西省电力有限公司南昌供电分公司 Transformer voiceprint fault identification method based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination