CN113823294B - Cross-channel voiceprint recognition method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113823294B
CN113823294B (application CN202111390613.7A)
Authority
CN
China
Prior art keywords
channel
voiceprint
audio data
data
cross
Prior art date
Legal status
Active
Application number
CN202111390613.7A
Other languages
Chinese (zh)
Other versions
CN113823294A (en)
Inventor
郑方
佴瑞乾
李蓝天
王东
张琛
谢弈峥
Current Assignee
Tsinghua University
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Tsinghua University
Shanghai Pudong Development Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University and Shanghai Pudong Development Bank Co Ltd
Priority to CN202111390613.7A
Publication of CN113823294A
Application granted
Publication of CN113823294B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/04: Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a cross-channel voiceprint recognition method, device, equipment, and storage medium. The method comprises: acquiring voiceprint audio data to be identified, the data being collected over channels in a set channel set, where the set channel set comprises at least two different channels; and inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result, and identifying the voiceprint audio data according to that result. The cross-channel voiceprint recognition model is obtained by training, over multiple iteration processes, on voiceprint audio data collected in the set channel set, the model parameters being trained in each iteration process on voiceprint audio data collected in two different channels. The technical scheme of the invention can improve the recognition accuracy of cross-channel voiceprint recognition.

Description

Cross-channel voiceprint recognition method, device, equipment and storage medium
Technical Field
The present invention relates to the field of voiceprint recognition technologies, and in particular, to a cross-channel voiceprint recognition method, apparatus, electronic device, and non-transitory computer-readable storage medium.
Background
In recent years, with intensive research on voiceprint recognition technology, voiceprint recognition systems have achieved satisfactory performance under single-channel conditions. In practical applications, however, the voice signal may be transmitted through different channels, such as a network channel or a telephone channel. Such channel differences distort the speech signal to different degrees and affect the performance of a voiceprint recognition system. For example, in the registration phase the user's voice may be collected over a network channel, while in the recognition phase it is picked up over a telephone channel; voiceprint recognition performance then degrades greatly due to channel mismatch. Considering the diversity of voiceprint authentication scenarios, single-channel voiceprint recognition technology greatly limits the popularization and application of voiceprint technology.
Therefore, how to overcome the influence of channel variation on recognition performance, so as to improve the performance of a voiceprint recognition system under cross-channel conditions, is a technical problem that currently needs to be solved.
Disclosure of Invention
The invention provides a cross-channel voiceprint recognition method and device, electronic equipment and a non-transitory computer readable storage medium, which are used for solving the problem that cross-channel voiceprint recognition is difficult in the prior art and improving the accuracy of cross-channel voiceprint recognition.
The invention provides a cross-channel voiceprint recognition method, which comprises the following steps: acquiring voiceprint audio data to be identified, wherein the voiceprint audio data to be identified are acquired in channels in a set channel set, and the set channel set comprises at least two different channels; inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result, and identifying the voiceprint audio data according to the voiceprint audio data processing result; the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in the set channel set through multiple iteration processes, and model parameters are trained through the voiceprint audio data collected in two different channels in each iteration process.
The cross-channel voiceprint recognition method provided by the invention further comprises a training process of the cross-channel voiceprint recognition model, the training process comprising: acquiring a sample voiceprint audio data set collected in the set channel set, wherein the set channel set comprises a first channel and a second channel, and the sample voiceprint audio data in the set are collected in the at least two different channels; selecting sample voiceprint audio data in one channel and calculating a first loss function and updated intermediate parameters of those data in the corresponding channel; based on the updated intermediate parameters and the first loss function, selecting sample voiceprint audio data in another channel and calculating a second loss function and updated model parameters of those data in the corresponding channel, thereby completing one iteration process; and reselecting sample voiceprint audio data to perform further iteration processes until the second loss function converges, so as to obtain the cross-channel voiceprint recognition model.
According to the cross-channel voiceprint recognition method provided by the invention, the at least two different channels comprise at least one of the following channel classifications: a wireless channel, a wired channel, and a storage channel.
According to the cross-channel voiceprint identification method provided by the invention, the voiceprint audio data to be identified comprise first data collected in a first channel and second data collected in a second channel; after obtaining the voiceprint audio data processing result, the method further includes: acquiring a similarity relation between the first data and the second data according to the voiceprint audio data processing result corresponding to the first data and that corresponding to the second data; and identifying whether the first data and the second data are from the same speaker according to the magnitude relation between the similarity relation and a set first threshold value.
According to the cross-channel voiceprint identification method provided by the invention, the voiceprint audio data to be identified comprise third data collected in a first channel; after obtaining the voiceprint audio data processing result, the method further includes: acquiring a similarity relation between the third data and the in-library data according to a voiceprint audio data processing result corresponding to the third data and in-library data in a voiceprint library, wherein the in-library data are obtained from voiceprint audio data collected in a second channel; selecting fourth data with the maximum similarity to the third data from the in-library data according to the similarity relation; and identifying whether the third data and the fourth data are from the same speaker according to the magnitude relation between the similarity of the third data and the fourth data and a set second threshold value.
According to the cross-channel voiceprint recognition method provided by the invention, the similarity relation is obtained by calculating the cosine distance or by probabilistic linear discriminant analysis.
According to the cross-channel voiceprint recognition method provided by the invention, in each iteration process the intermediate parameters are updated according to the following formula:

$$\hat{\theta} = \theta - \alpha \nabla_{\theta} \mathcal{L}_{a}(\theta; D_{a})$$

where $\mathcal{L}_{a}(\theta; D_{a})$ is the loss function of $D_{a}$ on channel $a$, $D_{a}$ is the voiceprint audio data collected from channel $a$, $\alpha$ is the learning rate of the local update, and $\hat{\theta} - \theta$ is the amount of change in $\theta$. The model parameters are updated from $\theta$ to $\theta'$ according to the following formula:

$$\theta' = \theta - \beta \nabla_{\theta} \mathcal{L}_{b}(\hat{\theta}; D_{b})$$

where $\mathcal{L}_{b}(\hat{\theta}; D_{b})$ is the loss function of $D_{b}$ on channel $b$, $D_{b}$ is the voiceprint audio data collected from channel $b$, and $\beta$ is the learning rate of the global update.
The invention provides a cross-channel voiceprint recognition device, which comprises: the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring voiceprint audio data to be identified, the voiceprint audio data to be identified are acquired in channels in a set channel set, and the set channel set comprises at least two different channels; the identification unit is used for inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result so as to identify the voiceprint audio data according to the voiceprint audio data processing result; the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in the set channel set through multiple iteration processes, and model parameters are trained through the voiceprint audio data collected in two different channels in each iteration process.
According to the cross-channel voiceprint recognition device provided by the invention, the device further comprises a training unit for performing the training process of the cross-channel voiceprint recognition model, the training unit comprising: a first obtaining subunit, configured to obtain a sample voiceprint audio data set collected in the set channel set, where the sample voiceprint audio data in the set are collected in the at least two different channels; and an iteration subunit, configured to select sample voiceprint audio data in one channel, calculate a first loss function and updated intermediate parameters of those data in the corresponding channel, select, based on the updated intermediate parameters and the first loss function, sample voiceprint audio data in another channel, calculate a second loss function and updated model parameters of those data in the corresponding channel to complete one iteration process, and reselect sample voiceprint audio data to iterate until the second loss function converges, so as to obtain the cross-channel voiceprint identification model.
According to the cross-channel voiceprint recognition device provided by the invention, the at least two different channels comprise at least one of the following channel classifications: a wireless channel, a wired channel, and a storage channel.
According to the cross-channel voiceprint recognition device provided by the invention, the voiceprint audio data to be recognized comprise first data collected in the first channel and second data collected in the second channel; the apparatus further includes a first similarity relation determination unit configured to: after the voiceprint audio data processing result is obtained, acquiring the similarity relation between the first data and the second data according to the voiceprint audio data processing result corresponding to the first data and the voiceprint audio data processing result corresponding to the second data; and identifying whether the first data and the second data are from the same speaker according to the magnitude relation between the similarity relation and a set first threshold value.
According to the cross-channel voiceprint recognition device provided by the invention, the voiceprint audio data to be recognized comprise third data collected in the first channel; the apparatus further includes a second similarity relation determination unit configured to: acquire a similarity relation between the third data and the in-library data according to the voiceprint audio data processing result corresponding to the third data and the in-library data in a voiceprint library, wherein the in-library data are obtained from the voiceprint audio data collected in the second channel; select fourth data with the maximum similarity to the third data from the in-library data according to the similarity relation; and identify whether the third data and the fourth data are from the same speaker according to the magnitude relation between the similarity of the third data and the fourth data and a set second threshold value.
According to the cross-channel voiceprint recognition device provided by the invention, the iteration subunit is further configured to update, during each iteration process, the intermediate parameters according to the following formula:

$$\hat{\theta} = \theta - \alpha \nabla_{\theta} \mathcal{L}_{a}(\theta; D_{a})$$

where $\mathcal{L}_{a}(\theta; D_{a})$ is the loss function of $D_{a}$ on channel $a$, $D_{a}$ is the voiceprint audio data collected from channel $a$, $\alpha$ is the learning rate of the local update, and $\hat{\theta} - \theta$ is the amount of change in $\theta$; and to update the model parameters from $\theta$ to $\theta'$ according to the following formula:

$$\theta' = \theta - \beta \nabla_{\theta} \mathcal{L}_{b}(\hat{\theta}; D_{b})$$

where $\mathcal{L}_{b}(\hat{\theta}; D_{b})$ is the loss function of $D_{b}$ on channel $b$, $D_{b}$ is the voiceprint audio data collected from channel $b$, and $\beta$ is the learning rate of the global update.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the cross-channel voiceprint recognition method as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the cross-channel voiceprint recognition method as described in any of the above.
According to the cross-channel voiceprint recognition method and device, the electronic equipment, and the non-transitory computer-readable storage medium, model training in each iteration process is performed on voiceprint audio data collected in two different channels, so that a cross-channel voiceprint recognition model applicable to different channels is obtained, and the voiceprint audio data to be recognized can be recognized accurately with this model.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a cross-channel voiceprint recognition method provided by the present invention;
FIG. 2 is a flow chart illustrating a training process of a cross-channel voiceprint recognition model provided by the present invention;
FIG. 3 is a schematic flow chart of the two-phase iteration process provided by the present invention;
FIG. 4 is a schematic structural diagram of a cross-channel voiceprint recognition apparatus provided by the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the one or more embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the invention. As used in one or more embodiments of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present invention refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used herein to describe various information in one or more embodiments of the present invention, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
The terms used in the examples of the present invention are explained below:
Voiceprint: one type of information in a speech signal; a general term for the speech features that characterize a speaker's identity and for the speech models built on those features. Because the vocal organs that different speakers use, such as the tongue, oral cavity, nasal cavity, vocal cords, and lungs, differ in size and form, and considering differences between speakers in age, character, language habits, and so on, characteristics such as voice volume and voice frequency differ greatly from speaker to speaker. It can be said that no two persons' voiceprints are identical.
Voiceprint recognition: also called speaker recognition, a biometric identification technology that uses a computer and various information recognition technologies to automatically identify a speaker according to the voiceprint characteristics in a speech signal that represent the speaker's personal information. Voiceprint recognition is essentially a pattern recognition problem. A typical voiceprint recognition system generally consists of two phases, registration and recognition: registration trains a speaker model from the user's retained speech, and recognition judges whether an unknown utterance comes from a specified speaker.
In the related art, traditional voiceprint recognition technology is based on statistical probability models, the most classical being the Gaussian mixture model-universal background model (GMM-UBM) architecture. To further enhance the expressive power of speaker characteristics under limited data, various subspace models were proposed in succession, the most notable of which is the i-vector model. The i-vector model introduces an important concept: the speaker characterization vector (speaker embedding), i.e., a continuous vector of fixed length used to characterize speaker characteristics.
In recent years, based on deep learning methods, researchers have successively proposed a series of voiceprint recognition models, such as the d-vector model and the x-vector model. Such models map a speech signal of arbitrary duration into a continuous vector of fixed length called a deep speaker characterization vector (deep speaker embedding). The speaker characterization vectors construct a space describing speaker characteristics; in this space, scoring and decision making for voiceprint recognition can be carried out.
For mainstream speaker models, the training goal is usually to maximally distinguish different speakers without considering channel disturbance, which makes them difficult to apply effectively in cross-channel tasks. To address the cross-channel problem, researchers have conducted a series of studies, mainly in two directions: one is channel adaptation, the other channel generalization. For channel adaptation, the basic idea is to project channel A into channel B through some mapping function and to complete registration and identification on channel B; for channel generalization, the basic idea is to learn a channel-independent space, project both channel A and channel B into that space, and perform registration and identification there.
Owing to channel disturbance, existing technical schemes for cross-channel voiceprint recognition find it difficult to achieve high recognition accuracy.
To solve this problem, an embodiment of the present invention provides a cross-channel voiceprint recognition scheme. The scheme is a channel-robustness optimization method that improves the channel generalization of a voiceprint recognition system and thereby addresses the cross-channel recognition problem. The technical scheme of the embodiment of the invention belongs to the second class above, channel generalization.
The following detailed description of exemplary embodiments of the invention refers to the accompanying drawings.
Fig. 1 is a flowchart illustrating a cross-channel voiceprint recognition method according to an embodiment of the present invention. The method provided by the embodiment of the invention can be executed by any electronic equipment with computer processing capability, such as a terminal or a server. As shown in fig. 1, the cross-channel voiceprint recognition method includes:
step 102, obtaining voiceprint audio data to be identified, wherein the voiceprint audio data to be identified is collected in a channel in a set channel set, and the set channel set comprises at least two different channels.
Specifically, the at least two channels may be a first channel and a second channel with different transmission media.
Step 104, inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result, and identifying the voiceprint audio data according to the voiceprint audio data processing result; the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in the set channel set through a plurality of iteration processes, and model parameters are trained by the voiceprint audio data collected in two different channels in each iteration process.
In particular, the cross-channel voiceprint recognition model is a deep neural network model. The data processing result is a feature vector, output by the cross-channel voiceprint recognition model, of the voiceprint audio data to be recognized; for voice data this is a speaker characterization vector. According to the feature vector or speaker characterization vector, two pieces of voiceprint audio data can be compared, or the current input voiceprint can be compared with the voiceprints in the database, within the space describing speaker characteristics.
In the embodiment of the invention, during training of the cross-channel voiceprint recognition model, each iteration uses voiceprint audio data from two different channels for training, so that channel generalization can be better realized and cross-channel voiceprint recognition is more accurate.
Before step 104, a training process for the cross-channel voiceprint recognition model is further included, as shown in fig. 2, the training process includes:
step 201, a sample voiceprint audio data set collected in a set channel set is obtained, and sample voiceprint audio data in the sample voiceprint audio data set are collected in at least two different channels.
Step 202, selecting sample voiceprint audio data for an iteration. Specifically, sample voiceprint audio data in one channel are selected, and a first loss function and updated intermediate parameters of those data in the corresponding channel are calculated; based on the updated intermediate parameters and the first loss function, sample voiceprint audio data in another channel are selected, and a second loss function and updated model parameters of those data in the corresponding channel are calculated, which completes one iteration process.
In step 203, it is determined whether the second loss function has converged; if so, step 204 is executed, and if not, step 202 is executed.
And step 204, obtaining a cross-channel voiceprint recognition model.
In step 202, the operation of updating the intermediate parameters is a local update phase of model parameter update, and the operation of updating the model parameters is a global update phase of model parameter update. The training data for these two phases come from different channels.
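The two-phase update above can be sketched as a first-order meta-learning loop. This is a minimal sketch under stated assumptions: a linear model with squared error stands in for the speaker network and its loss, the learning rates, tolerance, and function names are invented for illustration, and the global gradient is evaluated at the intermediate parameters (a first-order approximation); the patent does not fix any of these details.

```python
import numpy as np

def loss_and_grad(theta, X, y):
    """Squared-error loss of a linear model and its gradient; a stand-in
    for the per-channel speaker-recognition loss."""
    err = X @ theta - y
    return float(np.mean(err ** 2)), 2.0 * X.T @ err / len(y)

def train(data_a, data_b, theta, alpha=0.05, beta=0.05, tol=1e-6, max_iter=5000):
    """One iteration = local update on channel-a data (intermediate
    parameters) followed by a global update on channel-b data evaluated
    at those intermediate parameters. Iterate until the second (global)
    loss converges, mirroring steps 202-204."""
    Xa, ya = data_a
    Xb, yb = data_b
    prev_loss = np.inf
    for _ in range(max_iter):
        _, g_a = loss_and_grad(theta, Xa, ya)
        theta_hat = theta - alpha * g_a            # local update
        loss_b, g_b = loss_and_grad(theta_hat, Xb, yb)
        theta = theta - beta * g_b                 # global update (first-order)
        if abs(prev_loss - loss_b) < tol:          # convergence of second loss
            break
        prev_loss = loss_b
    return theta, loss_b
```

Two synthetic "channels" can be simulated by scaling the features of one data set; after training, the returned parameters fit data from both channels.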
In an embodiment of the invention, the at least two different channels comprise at least one of the following channel classes: a wireless channel, a wired channel, and a storage channel.
The two different channels may be different channels in the same category of channels, for example, channels of two different transmission media in a wired channel, or two channels in different categories, for example, one is a wired channel and one is a wireless channel.
In one embodiment, the training data of the two phases come from two different channels, $D = \{D_{a}, D_{b}\}$, where $D_{a}$ and $D_{b}$ denote the data sets from channel $a$ and channel $b$; the data used in a single iteration are subsets of $D_{a}$ and $D_{b}$, respectively. $\theta$ denotes the model parameters of the trained model.
During each iteration process, the intermediate parameters $\hat{\theta}$ are updated according to the following formula:

$$\hat{\theta} = \theta - \alpha \nabla_{\theta} \mathcal{L}_{a}(\theta; D_{a})$$

where $\mathcal{L}_{a}(\theta; D_{a})$ is the loss function of $D_{a}$ on channel $a$, $D_{a}$ is the voiceprint audio data collected from channel $a$, $\alpha$ is the learning rate of the local update, and $\hat{\theta} - \theta$ is the amount of change in $\theta$.

The model parameters are then updated from $\theta$ to $\theta'$ according to the following formula:

$$\theta' = \theta - \beta \nabla_{\theta} \mathcal{L}_{b}(\hat{\theta}; D_{b})$$

where $\mathcal{L}_{b}(\hat{\theta}; D_{b})$ is the loss function of $D_{b}$ on channel $b$, $D_{b}$ is the voiceprint audio data collected from channel $b$, and $\beta$ is the learning rate of the global update.

In this solution, the model parameters $\theta$ are updated only in the global update; the local update computes the intermediate parameters $\hat{\theta}$ only to provide the gradient for the global update.
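To make the two update formulas concrete, the sketch below carries out a single iteration by hand on one-dimensional quadratic losses, one per channel. The quadratic losses and learning rates are illustrative assumptions, and the global gradient is simply evaluated at the intermediate parameter (a first-order reading of the formula).

```python
# Illustrative per-channel losses: L_a(t) = (t - 2)^2, L_b(t) = (t - 4)^2.
alpha, beta = 0.1, 0.1        # local and global learning rates (assumed values)
theta = 0.0                   # current model parameter

grad_a = 2.0 * (theta - 2.0)            # dL_a/dtheta at theta
theta_hat = theta - alpha * grad_a      # local update -> intermediate parameter

grad_b = 2.0 * (theta_hat - 4.0)        # dL_b/dtheta evaluated at theta_hat
theta_new = theta - beta * grad_b       # global update: theta -> theta'
# theta_hat = 0.4, theta_new = 0.72; theta itself changes only in the global step.
```

A full second-order treatment would differentiate through the intermediate parameter when forming the global gradient; the first-order variant shown here is a common simplification.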
Before step 104, the voiceprint audio data to be recognized need to be preprocessed; for voice data, the preprocessing may be a noise reduction operation, a silent-segment removal operation, or both.
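As one possible form of the silent-segment removal just mentioned, a frame-energy gate can be used: split the signal into frames and drop frames whose energy is far below that of the loudest frame. The frame length and threshold below are assumed values; the patent does not prescribe a particular preprocessing algorithm.

```python
import numpy as np

def remove_silence(signal, frame_len=400, threshold_db=-40.0):
    """Keep only frames whose energy is within threshold_db of the
    loudest frame; a simple stand-in for silent-segment removal."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1) + 1e-12   # avoid log(0)
    energy_db = 10.0 * np.log10(energy / energy.max())
    return frames[energy_db > threshold_db].reshape(-1)
```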
In voiceprint recognition technology, two pieces of voiceprint audio data can be compared to decide whether they come from the same speaker, i.e., one-to-one confirmation; or the voiceprint audio data of the same speaker as the current data can be recognized from among many pieces, i.e., one-to-many recognition.
In one embodiment of the invention, the voiceprint audio data to be identified comprises first data collected in a first channel and second data collected in a second channel; after step 104, one-to-one confirmation of the voiceprint audio data may be performed, and specifically, a similarity relationship between the first data and the second data is obtained according to a voiceprint audio data processing result corresponding to the first data and a voiceprint audio data processing result corresponding to the second data; and identifying whether the first data and the second data are from the same speaker according to the magnitude relation between the similarity relation and the set first threshold value.
This embodiment may be used for one-to-one verification of voiceprint audio data collected under different channels for the same known user. For example, the voiceprint audio data of a user collected on a mobile phone is compared with voiceprint audio data of the same user collected on other equipment.
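A minimal sketch of this one-to-one verification, assuming cosine similarity as the scoring back end and an illustrative threshold of 0.6; the embeddings, function names, and threshold value are hypothetical (real thresholds are tuned on development data):

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two voiceprint embeddings
    (the 'processing results' produced by the cross-channel model)."""
    return float(np.dot(emb_a, emb_b) /
                 (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

def verify(emb_first, emb_second, threshold=0.6):
    """One-to-one verification: same speaker iff the similarity
    exceeds the set first threshold."""
    return cosine_score(emb_first, emb_second) > threshold

emb_phone = np.array([0.9, 0.1, 0.4])  # e.g. mobile (16 kHz) channel embedding
emb_call = np.array([0.8, 0.2, 0.5])   # e.g. telephone (8 kHz) channel embedding
same_speaker = verify(emb_phone, emb_call)
```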
In another embodiment of the invention, the voiceprint audio data to be identified comprises third data collected in the first channel. After step 104, one-to-many identification may be performed: a similarity between the third data and each piece of in-library data is obtained from the voiceprint audio data processing result corresponding to the third data and the in-library data in the voiceprint library, where the in-library data is obtained from voiceprint audio data collected in the second channel; fourth data with the maximum similarity to the third data is selected from the in-library data; and whether the third data and the fourth data come from the same speaker is decided by comparing their similarity with a set second threshold value.
The voiceprint library stores voiceprint audio data of a plurality of different speakers on the second channel. This embodiment can be used for one-to-many identification of voiceprint audio data acquired under different channels when the user is not yet determined. For example, the voiceprint audio data of some user collected on a mobile phone is compared against the in-library data collected on other equipment.
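A sketch of the one-to-many case under the same assumptions (cosine scoring, illustrative speaker IDs and threshold value):

```python
import numpy as np

def identify(query_emb, library, threshold=0.6):
    """One-to-many identification: score the query against every in-library
    embedding, pick the most similar one, and accept it only if the
    similarity exceeds the set second threshold."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {spk: cos(query_emb, emb) for spk, emb in library.items()}
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best, scores[best]
    return None, scores[best]   # no in-library speaker is close enough

library = {                     # embeddings enrolled from the second channel
    "alice": np.array([1.0, 0.0, 0.2]),
    "bob": np.array([0.1, 1.0, 0.0]),
}
speaker, score = identify(np.array([0.9, 0.1, 0.3]), library)
```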
The similarity can be obtained by computing a cosine distance or by probabilistic linear discriminant analysis (PLDA). The cosine-distance scheme is simple to implement; back-end algorithms such as PLDA are slightly more complex but more accurate.
The channel-robustness optimization method provided by the embodiment of the invention has great advantages under cross-channel conditions.
Performing a first-order Taylor expansion on L(D_j; θ − α∇_θ L(D_i; θ)) yields:

L(D_j; θ′) ≈ L(D_j; θ) − α · g_i · g_j

Assuming that the order of channel i and channel j in the training process is not restricted (both orderings occur across iterations), summing the two orderings gives:

L ≈ L(D_i; θ) + L(D_j; θ) − 2α · g_i · g_j    (2)

where g_i and g_j are the gradients of the loss function on the data from channel i and channel j respectively, i.e., g_i = ∇_θ L(D_i; θ) and g_j = ∇_θ L(D_j; θ).
In formula (2), the first term on the right side of the equation, L(D_i; θ) + L(D_j; θ), accumulates the loss of the data from each channel in the data set and is therefore equivalent to the loss of mixed multi-channel training. The second term on the right side can be regarded as a regularization term: the inner product of the gradients of the loss function on the different channels.
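The first-order expansion underlying formula (2) can be checked numerically with toy quadratic losses standing in for the per-channel voiceprint losses; every symbol and value below is illustrative:

```python
import numpy as np

# Toy quadratic losses: L_i(theta) = 0.5 * ||theta - m_i||^2,
# so grad L_i(theta) = theta - m_i.
m_i = np.array([1.0, -0.5])
m_j = np.array([0.2, 0.8])
L = lambda theta, m: 0.5 * np.sum((theta - m) ** 2)
grad = lambda theta, m: theta - m

theta = np.array([0.3, 0.4])
alpha = 1e-3

# Loss on channel j after the local step on channel i ...
exact = L(theta - alpha * grad(theta, m_i), m_j)
# ... versus its first-order Taylor expansion:
# L_j(theta) - alpha * <grad L_i(theta), grad L_j(theta)>.
approx = L(theta, m_j) - alpha * np.dot(grad(theta, m_i), grad(theta, m_j))
```

For a small learning rate α the two values differ only by a term of order α², which is what licenses the expansion used to obtain formula (2).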
In model training, the optimization objective is to minimize the loss function. Optimizing the first term on the right side of the equation drives the model parameters θ to converge gradually; meanwhile, because the inner-product term enters formula (2) with a negative sign, minimizing the loss maximizes the inner product, keeping the gradient directions of the different channels as consistent as possible. This means that optimizing the objective function ensures, on the one hand, that recognition performance is optimized on each individual channel and, on the other hand, that these optimizations remain consistent across channels.
Hereinafter, the training and testing process of the present invention is illustrated by taking 16 kHz voice data from a network channel and 8 kHz voice data from a telephone channel as an example.
In the iterative process diagram shown in fig. 3, two rounds of iteration are shown, where θ₀, θ₁ and θ₂ are the model parameters in each iteration and the solid arrows indicate the parameter-update direction of each iteration. As indicated by the dashed arrows in the figure, the first round of training includes two steps, a local update and a global update: the first step, the local update, uses 8 kHz channel data, and the second step, the global update, uses 16 kHz channel data. Similarly, in the second round of training, the local update uses 16 kHz channel data and the global update uses 8 kHz channel data. After multiple rounds of training, the finally optimized model parameters θ* are obtained.
In the test stage, the 16 kHz voice data of the network channel and the 8 kHz voice data of the telephone channel can each be mapped by the model into the same parameter space, in which registration and verification are completed.
In an application scenario of the embodiment of the invention, a user registers a voiceprint through the application program of a mobile terminal and consults services through a call center. In this process, the merchant's business system uses voiceprint recognition to authenticate the user's identity and ensure business security. Voice with a sampling rate of 16 kHz is collected through the network channel of the mobile terminal, while voice with a sampling rate of 8 kHz is collected through the telephone channel; comparing the two belongs to cross-channel comparison, namely cross-channel recognition.
The technical scheme of the embodiment of the invention has a simple training process and can easily be transferred to various deep learning frameworks. In addition, it guarantees both the optimization of each channel and the consistency of those optimizations across channels, avoiding optimization deviation among different channels and preventing overfitting on some channels.
As can be seen from formula (2), the method has a solid mathematical foundation, which demonstrates the effectiveness of the scheme. The technical scheme of the embodiment of the invention is not only suitable for the cross-channel voiceprint recognition problem but can also be extended to other pattern-recognition applications, such as cross-channel face recognition in image recognition.
According to the cross-channel voiceprint recognition method provided by the invention, model training in each iteration process is carried out by adopting the voiceprint audio data collected in two different channels, so that a cross-channel voiceprint recognition model suitable for different channels can be obtained, and the cross-channel voiceprint recognition model can be used for accurately recognizing the voiceprint audio data to be recognized.
The cross-channel voiceprint recognition device provided by the invention is described below, and the cross-channel voiceprint recognition device described below and the cross-channel voiceprint recognition method described above can be referred to correspondingly.
As shown in fig. 4, an apparatus for cross-channel voiceprint recognition according to an embodiment of the present invention includes:
an obtaining unit 402, configured to obtain voiceprint audio data to be identified, where the voiceprint audio data to be identified is collected in a channel in a set channel set, and the set channel set includes at least two different channels.
The identification unit 404 is configured to input voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result, and perform voiceprint audio data identification according to the voiceprint audio data processing result; the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in a set channel through a plurality of iteration processes, and model parameters are trained by the voiceprint audio data collected in two different channels in each iteration process.
In the embodiment of the present invention, the training unit is further included for performing a training process on the cross-channel voiceprint recognition model, and the training unit includes: the first acquisition subunit is used for acquiring a sample voiceprint audio data set acquired in a set channel set, wherein the sample voiceprint audio data in the sample voiceprint audio data set are acquired in at least two different channels; the iteration subunit is used for selecting sample voiceprint audio data in one channel, calculating a first loss function and an updated intermediate parameter of the sample voiceprint audio data in the channel corresponding to the iteration subunit, selecting sample voiceprint audio data in another channel except the channel based on the updated intermediate parameter and the first loss function, calculating a second loss function and an updated model parameter of the sample voiceprint audio data in the channel corresponding to the iteration subunit, completing an iteration process, and reselecting the sample voiceprint audio data to perform an iteration process until the second loss function is converged to obtain a cross-channel voiceprint recognition model.
In the embodiment of the invention, the voice print audio data to be identified comprises first data collected in a first channel and second data collected in a second channel; the apparatus further includes a first similarity relation determination unit configured to: after obtaining the voiceprint audio data processing result, acquiring a similarity relation between the first data and the second data according to the voiceprint audio data processing result corresponding to the first data and the voiceprint audio data processing result corresponding to the second data; and identifying whether the first data and the second data are from the same speaker according to the magnitude relation between the similarity relation and the set first threshold value.
In the embodiment of the invention, the voiceprint audio data to be identified comprises third data collected in a first channel; the apparatus further includes a second similarity relation determination unit configured to: acquire a similarity relation between the third data and the in-library data according to the voiceprint audio data processing result corresponding to the third data and the in-library data in the voiceprint library, wherein the in-library data is obtained according to the voiceprint audio data collected in the second channel; select fourth data with the maximum similarity to the third data from the in-library data according to the similarity relation; and identify whether the third data and the fourth data come from the same speaker according to the magnitude relation between the similarity of the third data and the fourth data and a set second threshold value.
In an embodiment of the invention, the at least two different channels comprise at least one of the following channel classes: a wireless channel, a wired channel, and a storage channel.
The two different channels may be different channels in the same category of channels, for example, channels of two different transmission media in a wired channel, or two channels in different categories, for example, one is a wired channel and one is a wireless channel.
In an embodiment of the present invention, the iteration unit is further configured to: during each iteration, update the intermediate parameters according to the following formula:

θ′ = θ − α · ∇_θ L(D_i; θ)

where L(D_i; θ) is the loss function of D_i on channel i; D_i is the voiceprint audio data collected from channel i; α is the learning rate of the local update; and θ′ is the variation of θ (the intermediate parameter).
The model parameters are updated from θ to θ* according to the following formula:

θ* = θ − β · ∇_θ L(D_j; θ′)

where j ≠ i; L(D_j; θ′) is the loss function of D_j on channel j; D_j is the voiceprint audio data collected from channel j; and β is the learning rate of the global update.
Since each functional module of the cross-channel voiceprint recognition apparatus in the exemplary embodiment of the present invention corresponds to the step of the exemplary embodiment of the cross-channel voiceprint recognition method, for details that are not disclosed in the embodiment of the apparatus of the present invention, please refer to the above-mentioned embodiment of the cross-channel voiceprint recognition method of the present invention.
According to the cross-channel voiceprint recognition device provided by the invention, model training in each iteration process is carried out by adopting the voiceprint audio data collected in two different channels, so that a cross-channel voiceprint recognition model suitable for different channels can be obtained, and the cross-channel voiceprint recognition model can be used for accurately recognizing the voiceprint audio data to be recognized.
Fig. 5 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 5: a processor (processor)510, a communication Interface (Communications Interface)520, a memory (memory)530 and a communication bus 540, wherein the processor 510, the communication Interface 520 and the memory 530 communicate with each other via the communication bus 540. Processor 510 may invoke logic instructions in memory 530 to perform a cross-channel voiceprint recognition method comprising: acquiring voiceprint audio data to be identified, wherein the voiceprint audio data to be identified are acquired in channels in a set channel set, and the set channel set comprises at least two different channels; inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result, and identifying the voiceprint audio data according to the voiceprint audio data processing result; the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in the set channel set through multiple iteration processes, and model parameters are trained through the voiceprint audio data collected in two different channels in each iteration process.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the cross-channel voiceprint recognition method provided by the above methods, the method comprising: acquiring voiceprint audio data to be identified, wherein the voiceprint audio data to be identified are acquired in channels in a set channel set, and the set channel set comprises at least two different channels; inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result, and identifying the voiceprint audio data according to the voiceprint audio data processing result; the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in the set channel set through multiple iteration processes, and model parameters are trained through the voiceprint audio data collected in two different channels in each iteration process.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the cross-channel voiceprint recognition method provided above, the method comprising: acquiring voiceprint audio data to be identified, wherein the voiceprint audio data to be identified are acquired in channels in a set channel set, and the set channel set comprises at least two different channels; inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result, and identifying the voiceprint audio data according to the voiceprint audio data processing result; the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in the set channel set through multiple iteration processes, and model parameters are trained through the voiceprint audio data collected in two different channels in each iteration process.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A cross-channel voiceprint recognition method, comprising:
acquiring voiceprint audio data to be identified, wherein the voiceprint audio data to be identified are acquired in channels in a set channel set, and the set channel set comprises at least two different channels;
inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result, and identifying the voiceprint audio data according to the voiceprint audio data processing result;
the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in the set channel set through multiple iteration processes, and model parameters are trained through voiceprint audio data collected in two different channels in each iteration process;
the training process of the cross-channel voiceprint recognition model comprises the following steps:
acquiring a sample voiceprint audio data set collected in the set channel set, wherein the sample voiceprint audio data in the sample voiceprint audio data set are collected in the at least two different channels;
selecting sample voiceprint audio data in one channel, calculating a first loss function and an updated intermediate parameter of the sample voiceprint audio data in the channel corresponding to the sample voiceprint audio data, selecting sample voiceprint audio data in another channel except the one channel based on the updated intermediate parameter and the first loss function, calculating a second loss function and an updated model parameter of the sample voiceprint audio data in the channel corresponding to the sample voiceprint audio data, and completing an iteration process;
and reselecting sample voiceprint audio data to perform an iterative process until the second loss function is converged to obtain the cross-channel voiceprint recognition model.
2. The method of claim 1, wherein the at least two different channels comprise at least one of the following channel classifications: a wireless channel, a wired channel, and a storage channel.
3. The method according to claim 1, wherein the voiceprint audio data to be identified comprises first data collected in a first channel and second data collected in a second channel; after obtaining the voiceprint audio data processing result, the method further includes:
acquiring a similar relation between the first data and the second data according to a voiceprint audio data processing result corresponding to the first data and a voiceprint audio data processing result corresponding to the second data;
and identifying whether the first data and the second data are from the same speaker according to the magnitude relation between the similarity relation and a set first threshold value.
4. The method of claim 1, wherein the voiceprint audio data to be identified comprises third data collected on a first channel; after obtaining the voiceprint audio data processing result, the method further includes:
acquiring a similarity relation between the third data and the in-library data according to a voiceprint audio data processing result corresponding to the third data and the in-library data in a voiceprint library, wherein the in-library data is obtained according to voiceprint audio data collected in a second channel;
selecting fourth data with the maximum similarity to the third data from the in-library data according to the similarity relation;
and identifying whether the third data and the fourth data are from the same speaker according to the magnitude relation between the similarity of the third data and the fourth data and a set second threshold value.
5. The method according to claim 3 or 4, wherein the similarity relation is obtained by calculating a cosine distance or performing a probabilistic linear discriminant analysis.
6. The method of claim 1, wherein during each of said iterations, the intermediate parameters are updated according to the following formula:

θ′ = θ − α · ∇_θ L(D_i; θ)

wherein L(D_i; θ) is the loss function of D_i on channel i; D_i is the voiceprint audio data collected from channel i; α is the learning rate of the local update; and θ′ is the variation of θ; and the model parameters are updated from θ to θ* according to the following formula:

θ* = θ − β · ∇_θ L(D_j; θ′)

wherein j ≠ i, L(D_j; θ′) is the loss function of D_j on channel j, D_j is the voiceprint audio data collected from channel j, and β is the learning rate of the global update.
7. A cross-channel voiceprint recognition apparatus comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring voiceprint audio data to be identified, the voiceprint audio data to be identified are acquired in channels in a set channel set, and the set channel set comprises at least two different channels;
the identification unit is used for inputting the voiceprint audio data to be identified into a preset cross-channel voiceprint identification model to obtain a voiceprint audio data processing result so as to identify the voiceprint audio data according to the voiceprint audio data processing result;
the cross-channel voiceprint recognition model is obtained by training voiceprint audio data collected in the set channel set through multiple iteration processes, and model parameters are trained through voiceprint audio data collected in two different channels in each iteration process;
a training unit for performing a training process on the cross-channel voiceprint recognition model, the training unit comprising: a first obtaining subunit, configured to obtain a sample voiceprint audio data set collected in the set channel set, where the sample voiceprint audio data in the sample voiceprint audio data set are collected in the at least two different channels; the iteration subunit is configured to select sample voiceprint audio data in one channel, calculate a first loss function and an updated intermediate parameter of the sample voiceprint audio data in a channel corresponding to the iteration subunit, select sample voiceprint audio data in another channel other than the one channel based on the updated intermediate parameter and the first loss function, calculate a second loss function and an updated model parameter of the sample voiceprint audio data in the channel corresponding to the iteration subunit, complete an iteration process, and reselect the sample voiceprint audio data to perform the iteration process until the second loss function converges, so as to obtain the cross-channel voiceprint identification model.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 6 are implemented when the processor executes the program.
9. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202111390613.7A 2021-11-23 2021-11-23 Cross-channel voiceprint recognition method, device, equipment and storage medium Active CN113823294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111390613.7A CN113823294B (en) 2021-11-23 2021-11-23 Cross-channel voiceprint recognition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113823294A CN113823294A (en) 2021-12-21
CN113823294B true CN113823294B (en) 2022-03-11

Family

ID=78919679





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant