CN114822510B - Voice awakening method and system based on binary convolutional neural network


Info

Publication number: CN114822510B
Application number: CN202210737439.7A
Authority: CN (China)
Filing/priority date: 2022-06-28
Publication date: 2022-10-04
Other versions: CN114822510A (application publication, in Chinese)
Inventors: 王啸, 李郡, 付冠宇, 尚德龙, 周玉梅
Applicant and current assignee: Zhongke Nanjing Intelligent Technology Research Institute
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L15/063: Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/16: Speech classification or search using artificial neural networks
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L25/24: Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L25/30: Speech or voice analysis techniques using neural networks
    • G10L2015/223: Execution procedure of a spoken command


Abstract

The invention relates to a voice wake-up method and system based on a binary convolutional neural network, in the field of voice recognition. The method comprises the following steps: performing MFCC feature extraction on each voice sample of a voice data set to obtain the continuous MFCC feature frames corresponding to each voice sample; training a teacher network that takes the continuous MFCC feature frames as input and the labels corresponding to the voice samples as output, to obtain a trained teacher network; guiding the training of a student network with the trained teacher network based on a knowledge distillation method, and taking the trained student network as the classifier of a voice wake-up system, the student network being a binary convolutional neural network; and performing MFCC feature extraction on a voice signal to be recognized, inputting the extracted continuous MFCC feature frames into the classifier, and inputting the classifier's output into the voice wake-up system. The invention reduces the amount of computation and the power consumption of voice recognition.

Description

Voice awakening method and system based on binary convolutional neural network
Technical Field
The invention relates to the technical field of voice recognition, in particular to a voice awakening method and system based on a binary convolutional neural network.
Background
A voice wake-up system usually runs on a mobile device, and mobile devices have small memories and limited computing power, so the voice wake-up system should simultaneously achieve high accuracy, a small runtime memory footprint and a small amount of computation. However, high-performance deep neural network models are highly complex, computation-heavy and memory-hungry, which makes them difficult to deploy on mobile terminals with smaller memories. The network therefore needs to be compressed into a lightweight model that is more convenient to deploy on mobile terminal devices.
Disclosure of Invention
The invention aims to provide a voice wake-up method and system based on a binary convolutional neural network that reduce the amount of computation and the power consumption of voice recognition.
In order to achieve the purpose, the invention provides the following scheme:
a voice awakening method based on a binary convolutional neural network comprises the following steps:
performing MFCC feature extraction on each voice sample of a voice data set to obtain the continuous MFCC feature frames corresponding to each voice sample, where the label of each voice sample is either a keyword or a non-keyword;
training a teacher network that takes the continuous MFCC feature frames as input and the labels corresponding to the voice samples as output, to obtain a trained teacher network;
based on a knowledge distillation method, using the trained teacher network to guide the training of a student network, and taking the trained student network as the voice wake-up system classifier, where the student network is a binary convolutional neural network;
and performing MFCC feature extraction on a voice signal to be recognized, inputting the extracted continuous MFCC feature frames into the voice wake-up system classifier, and inputting the output of the voice wake-up system classifier into the voice wake-up system.
Optionally, the loss function adopted in training the student network is a KD loss function, expressed as:

L_KD(W_student) = a·T^2·CrossEntropy(Q_s^T, Q_t^T) + (1 - a)·CrossEntropy(Q_s, y_true);

where L_KD(W_student) denotes the KD loss, CrossEntropy(·) denotes the cross-entropy loss function, Q_s^T denotes the probability output of the student network at temperature T, Q_t^T denotes the probability output of the teacher network at temperature T, T and a are set parameters, Q_s is the probability output of the student network, and y_true is the label obtained from the voice data set.
Optionally, the teacher network is Resnet152.
Optionally, the binary convolutional neural network includes a convolutional layer, a batch normalization layer, a ReLU activation function, 3 blocks, a maximum pooling layer, and a fully-connected layer, which are connected in sequence, where each Block includes a binarized convolutional layer, a batch normalization layer, and a ReLU activation function, which are connected in sequence.
Optionally, the voice data set is the Google voice command set.
The invention also discloses a voice awakening system based on the binary convolution neural network, which comprises the following components:
the MFCC feature extraction module is used for performing MFCC feature extraction on each voice sample of the voice data set to obtain the continuous MFCC feature frames corresponding to each voice sample; the label of each voice sample is either a keyword or a non-keyword;
the teacher network training module is used for training the teacher network by taking the continuous MFCC characteristic frames as the input of the teacher network and taking the labels corresponding to the voice samples as the output to obtain the trained teacher network;
the student network training module is used for adopting a trained teacher network to conduct guide training on the student network based on a knowledge distillation method, and taking the trained student network as a voice awakening system classifier; the student network is a binary convolution neural network;
and the to-be-recognized voice signal classification module is used for performing MFCC feature extraction on the to-be-recognized voice signal, inputting the extracted continuous MFCC feature frames into the voice awakening system classifier, and inputting the output of the voice awakening system classifier into the voice awakening system.
Optionally, the loss function adopted in training the student network is a KD loss function, expressed as:

L_KD(W_student) = a·T^2·CrossEntropy(Q_s^T, Q_t^T) + (1 - a)·CrossEntropy(Q_s, y_true);

where L_KD(W_student) denotes the KD loss, CrossEntropy(·) denotes the cross-entropy loss function, Q_s^T denotes the probability output of the student network at temperature T, Q_t^T denotes the probability output of the teacher network at temperature T, T and a are set parameters, Q_s is the probability output of the student network, and y_true is the label obtained from the voice data set.
Optionally, the teacher network is Resnet152.
Optionally, the binary convolutional neural network includes a convolutional layer, a batch normalization layer, a ReLU activation function, 3 blocks, a maximum pooling layer, and a fully-connected layer, which are connected in sequence, where each Block includes a binary convolutional layer, a batch normalization layer, and a ReLU activation function, which are connected in sequence.
Optionally, the voice data set is the Google voice command set.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects:
The invention discloses a voice wake-up method and system based on a binary convolutional neural network, which reduce the amount of computation and the power consumption of voice recognition and make the model easier to deploy on mobile terminal devices.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a voice wake-up method based on a binary convolutional neural network according to the present invention;
fig. 2 is a schematic structural diagram of a voice wake-up system based on a binary convolutional neural network according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The invention aims to provide a voice wake-up method and system based on a binary convolutional neural network that reduce the amount of computation and the power consumption of voice recognition.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a schematic flow chart of a voice wake-up method based on a binary convolutional neural network according to the present invention, and as shown in fig. 1, a voice wake-up method based on a binary convolutional neural network includes:
step 101: performing MFCC feature extraction on each voice sample of the voice data set to obtain a continuous MFCC feature frame corresponding to each voice sample; the labels of each speech sample include keywords and non-keywords.
The voice data set is the Google voice command set (GSCD).
In step 101, the continuous MFCC feature frames, i.e. the Mel cepstral coefficient feature matrices, are used as the input for pre-training the teacher network.
The keyword labels include "on", "off" and "zero"; the non-keyword label is the "silent state".
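For illustration only (this sketch is not part of the patent text), step 101's MFCC extraction could look as follows in Python with the librosa library; the concrete frame parameters (40 coefficients, 25 ms window, 10 ms hop) are assumptions, since the patent does not fix them:

```python
import librosa
import numpy as np

def extract_mfcc_frames(wav_path, sr=16000, n_mfcc=40):
    """Return the continuous MFCC feature frames (Mel cepstral coefficient
    feature matrix) of one voice sample, shaped (num_frames, n_mfcc)."""
    signal, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(
        y=signal, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.025 * sr),       # 25 ms analysis window (assumed)
        hop_length=int(0.010 * sr))  # 10 ms frame shift (assumed)
    return mfcc.T.astype(np.float32)  # time-major: one row per frame
```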
Step 102: and taking the continuous MFCC characteristic frames as the input of the teacher network, and taking the labels corresponding to the voice samples as the output to train the teacher network, thereby obtaining the trained teacher network.
The teacher network is a residual network, specifically Resnet152.
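As a hedged sketch of step 102 (not code given by the patent), the teacher could be built from the torchvision Resnet152 and pre-trained with ordinary cross entropy; the single-channel input stem and the four-class head ("silent state", "on", "off", "zero") are assumptions drawn from the embodiment:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet152

def build_teacher(num_classes=4):
    """Resnet152 adapted to MFCC feature matrices: one input channel
    instead of RGB, and a head for the 4 keyword/non-keyword classes."""
    net = resnet152(weights=None)
    net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net

teacher = build_teacher()
criterion = nn.CrossEntropyLoss()           # standard supervised pre-training
optimizer = torch.optim.Adam(teacher.parameters(), lr=1e-3)
```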
Step 103: based on a knowledge distillation method, using the trained teacher network to guide the training of a student network, and taking the trained student network as the voice wake-up system classifier; the student network is a Binary Convolutional Neural Network (BCNN).
Network training usually minimizes the error between the output and the label through a loss function; classification tasks such as voice wake-up generally adopt the cross-entropy loss, i.e. Loss = CrossEntropy(output, target), where output is the output of the neural network and target is the label obtained from the data set. The final network output is a probability output. In this embodiment, the non-keyword "silent state" and the three keywords "on", "off" and "zero" are selected, which is equivalent to a four-way classification task, so the final output consists of four corresponding probability outputs. In the traditional training process, only the maximum probability output and the label enter the loss-function calculation, which wastes the information contained in the remaining probability outputs. The student network therefore computes the loss function not only between its own probability output and the label, but also between its probability output and the teacher network's probability output. This reduces the information loss during student-network training; information loss is precisely the weakness of a binarized network. Specifically, this is realized by modifying the loss function used in the student network's training process.
In the invention, the loss function adopted in training the student network is the KD loss function (KD loss), expressed as:

L_KD(W_student) = a·T^2·CrossEntropy(Q_s^T, Q_t^T) + (1 - a)·CrossEntropy(Q_s, y_true);

where L_KD(W_student) denotes the KD loss, CrossEntropy(·) denotes the cross-entropy loss function, Q_s^T denotes the probability output of the student network at temperature T, Q_t^T denotes the probability output of the teacher network at temperature T, T and a are set parameters, Q_s is the probability output of the student network, and y_true is the label obtained from the voice data set. y_true takes the value 0 or 1: the label of the correct class is 1 and the other labels are 0; for example, the labels of the four classes may be 0, 0, 1, 0.
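A minimal PyTorch rendering of this KD loss is sketched below. It assumes, as in standard knowledge distillation, that both networks produce raw logits and that the temperature-T probability outputs are temperature-softened softmax distributions; the values of T and a here are illustrative, not prescribed by the patent:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, y_true, T=4.0, a=0.9):
    """L_KD = a*T^2 * CrossEntropy(Q_s^T, Q_t^T) + (1-a) * CrossEntropy(Q_s, y_true)."""
    # Soft term: cross entropy between the temperature-softened student
    # and teacher probability outputs.
    log_q_s_T = F.log_softmax(student_logits / T, dim=1)
    q_t_T = F.softmax(teacher_logits / T, dim=1)
    soft_term = -(q_t_T * log_q_s_T).sum(dim=1).mean()
    # Hard term: ordinary cross entropy against the dataset label
    # (y_true as class indices; the one-hot form 0,0,1,0 is equivalent).
    hard_term = F.cross_entropy(student_logits, y_true)
    return a * T * T * soft_term + (1.0 - a) * hard_term
```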
The binary convolutional neural network comprises a convolutional layer, a batch normalization layer, a ReLU activation function, 3 Blocks, a maximum pooling layer and a fully-connected layer which are connected in sequence, where each Block comprises a binarized convolutional layer, a batch normalization layer and a ReLU activation function connected in sequence. Before the convolution operation, the binarized convolutional layer quantizes the input activation values and weight values to 1 and -1, which reduces the number of parameters and converts complex floating-point convolution operations into simple shift operations.
The binary quantization is specifically:

a_b = Sign(a_r) = +1 if a_r >= 0, -1 otherwise;
w_b = Sign(w_r) = +1 if w_r >= 0, -1 otherwise;

where a_r denotes a full-precision input activation value and w_r denotes a full-precision input weight value; a_b denotes the binarized activation value and w_b denotes the binarized weight value.
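Below is a hedged sketch of the binarized convolution and the network skeleton described above. The forward sign quantization follows the formula; the straight-through estimator in the backward pass and the channel widths are common binary-network choices assumed here, not specified by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Forward: quantize to +1/-1 by the Sign rule above.
    Backward: straight-through estimator, gradient passed where |x| <= 1."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).float()

class BinaryConv2d(nn.Conv2d):
    """Binarizes the input activations (a_b) and weights (w_b) before convolving."""
    def forward(self, x):
        a_b = BinarizeSTE.apply(x)
        w_b = BinarizeSTE.apply(self.weight)
        return F.conv2d(a_b, w_b, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

def block(c_in, c_out):
    # Block = binarized convolutional layer -> batch normalization -> ReLU
    return nn.Sequential(BinaryConv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU())

class BCNN(nn.Module):
    """Convolution -> BN -> ReLU -> 3 Blocks -> max pooling -> fully connected."""
    def __init__(self, num_classes=4, c=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(),
            block(c, c), block(c, c), block(c, c),
            nn.AdaptiveMaxPool2d(1))  # max pooling over the feature map (assumed global)
        self.fc = nn.Linear(c, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))
```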
Step 104: performing MFCC feature extraction on the voice signal to be recognized, inputting the extracted continuous MFCC feature frames into the voice wake-up system classifier, and inputting the output of the voice wake-up system classifier into the voice wake-up system.
Step 104 specifically includes: acquiring the audio file to be recognized to obtain the voice signal to be recognized; performing MFCC feature extraction on the voice signal to obtain a Mel cepstral coefficient feature matrix; inputting the feature matrix into the voice wake-up system classifier, which outputs the probabilities of the keywords and the non-keyword; taking the output with the maximum probability as the final output; and inputting the final output into the voice wake-up system.
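Putting step 104 together as a usage sketch, reusing the illustrative helpers extract_mfcc_frames and BCNN from the sketches above; the label ordering is an assumption:

```python
import torch

LABELS = ["silent state", "on", "off", "zero"]  # assumed ordering

def wake_word_decision(wav_path, classifier):
    """Extract MFCC frames, classify, and return the maximum-probability
    output, which is what gets passed on to the voice wake-up system."""
    feats = extract_mfcc_frames(wav_path)            # (frames, n_mfcc)
    x = torch.from_numpy(feats)[None, None]          # (1, 1, frames, n_mfcc)
    with torch.no_grad():
        probs = torch.softmax(classifier(x), dim=1)[0]
    idx = int(probs.argmax())
    return LABELS[idx], float(probs[idx])

# Example: label, confidence = wake_word_decision("clip.wav", BCNN().eval())
```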
Compared with a traditional binarized neural network, the voice wake-up method based on the binary convolutional neural network improves recognition accuracy and greatly improves the feasibility of applying a binarized network to a voice wake-up system.
The method uses knowledge-distillation training: a pre-trained teacher network guides the training of the student network, and the loss function used in student-network training is optimized. Compared with the traditional cross-entropy loss function, the proposed KD loss increases the knowledge (amount of information) acquired during student-network training, which alleviates the large information loss of a binarized network and improves the recognition accuracy of the network.
Compared with a traditional neural-network voice wake-up system, the binary convolutional neural network trades a certain amount of accuracy for a smaller data-storage footprint, greatly reduces the computation and power consumption of the voice wake-up system, and lowers the difficulty of hardware implementation. This advantage comes from the binarization of the network's inputs and weights, which greatly reduces the amount of stored data and the amount of computation, further reduces power consumption, and yields a lightweight voice wake-up implementation convenient for mobile terminals.
Fig. 2 is a schematic structural diagram of a voice wake-up system based on a binary convolutional neural network according to the present invention, and as shown in fig. 2, a voice wake-up system based on a binary convolutional neural network includes:
the MFCC feature extraction module 201 is configured to perform MFCC feature extraction on each voice sample of the voice data set to obtain a continuous MFCC feature frame corresponding to each voice sample; the labels of each speech sample include keywords and non-keywords.
And the teacher network training module 202 is used for training the teacher network by taking the continuous MFCC characteristic frames as input of the teacher network and taking the labels corresponding to the voice samples as output, so as to obtain the trained teacher network.
The student network training module 203 is configured to guide the training of a student network with the trained teacher network based on a knowledge distillation method, and to take the trained student network as the voice wake-up system classifier; the student network is a binary convolutional neural network.
And the to-be-recognized voice signal classification module 204 is configured to perform MFCC feature extraction on the to-be-recognized voice signal, input the extracted continuous MFCC feature frames into the voice wake-up system classifier, and input the output of the voice wake-up system classifier into the voice wake-up system.
The loss function adopted in training the student network is a KD loss function, expressed as:

L_KD(W_student) = a·T^2·CrossEntropy(Q_s^T, Q_t^T) + (1 - a)·CrossEntropy(Q_s, y_true);

where L_KD(W_student) denotes the KD loss, CrossEntropy(·) denotes the cross-entropy loss function, Q_s^T denotes the probability output of the student network at temperature T, Q_t^T denotes the probability output of the teacher network at temperature T, T and a are set parameters, Q_s is the probability output of the student network, and y_true is the label obtained from the voice data set.
The teacher network is Resnet152.
The binary convolutional neural network comprises a convolutional layer, a batch normalization layer, a ReLU activation function, 3 blocks, a maximum pooling layer and a full connection layer which are sequentially connected, wherein each Block comprises a binary convolutional layer, a batch normalization layer and a ReLU activation function which are sequentially connected.
The voice data set is the Google voice command set.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (6)

1. A voice wake-up method based on a binary convolutional neural network is characterized by comprising the following steps:
performing MFCC feature extraction on each voice sample of a voice data set to obtain continuous MFCC feature frames corresponding to each voice sample; the label of each voice sample is either a keyword or a non-keyword;
training the teacher network by taking the continuous MFCC characteristic frames as input of the teacher network and taking the labels corresponding to the voice samples as output, and obtaining the trained teacher network;
based on a knowledge distillation method, using the trained teacher network to guide the training of a student network, and taking the trained student network as a voice wake-up system classifier; the student network is a binary convolutional neural network; the binary convolutional neural network comprises a convolutional layer, a batch normalization layer, a ReLU activation function, 3 Blocks, a maximum pooling layer and a fully-connected layer which are connected in sequence, wherein each Block comprises a binarized convolutional layer, a batch normalization layer and a ReLU activation function which are connected in sequence; the binarized convolutional layer is used for quantizing the input activation values and weight values to 1 and -1 before the convolution operation is carried out, and for converting floating-point convolution operations into shift operations;
performing MFCC feature extraction on a voice signal to be recognized, inputting an extracted continuous MFCC feature frame into the voice awakening system classifier, and inputting the output of the voice awakening system classifier into a voice awakening system;
the loss function adopted during the student network training is a KD loss function, and the KD loss function is expressed as:
L_KD(W_student) = a·T^2·CrossEntropy(Q_s^T, Q_t^T) + (1 - a)·CrossEntropy(Q_s, y_true);

wherein L_KD(W_student) denotes the KD loss function, CrossEntropy(·) denotes the cross-entropy loss function, Q_s^T denotes the probability output of the student network at temperature T, Q_t^T denotes the probability output of the teacher network at temperature T, T is a first set parameter, a is a second set parameter, Q_s is the probability output of the student network, and y_true is the label obtained from the voice data set.
2. The binary convolutional neural network-based voice wakeup method according to claim 1, wherein the teacher network is Resnet152.
3. The binary convolutional neural network-based voice wakeup method according to claim 1, wherein the voice data set is a Google voice command set.
4. A voice wake-up system based on a binary convolutional neural network, comprising:
the MFCC feature extraction module is used for performing MFCC feature extraction on each voice sample of a voice data set to obtain continuous MFCC feature frames corresponding to each voice sample; the label of each voice sample is either a keyword or a non-keyword;
the teacher network training module is used for training the teacher network by taking the continuous MFCC characteristic frames as the input of the teacher network and taking the labels corresponding to the voice samples as the output to obtain the trained teacher network;
the student network training module is used for guiding the training of a student network with a trained teacher network based on a knowledge distillation method, and for taking the trained student network as a voice wake-up system classifier; the student network is a binary convolutional neural network; the binary convolutional neural network comprises a convolutional layer, a batch normalization layer, a ReLU activation function, 3 Blocks, a maximum pooling layer and a fully-connected layer which are connected in sequence, wherein each Block comprises a binarized convolutional layer, a batch normalization layer and a ReLU activation function which are connected in sequence; the binarized convolutional layer is used for quantizing the input activation values and weight values to 1 and -1 before the convolution operation is carried out, and for converting floating-point convolution operations into shift operations;
the voice signal classification module to be recognized is used for performing MFCC feature extraction on a voice signal to be recognized, inputting extracted continuous MFCC feature frames into the voice awakening system classifier, and inputting the output of the voice awakening system classifier into a voice awakening system;
the loss function adopted during the student network training is a KD loss function, and the KD loss function is expressed as:
L_KD(W_student) = a·T^2·CrossEntropy(Q_s^T, Q_t^T) + (1 - a)·CrossEntropy(Q_s, y_true);

wherein L_KD(W_student) denotes the KD loss function, CrossEntropy(·) denotes the cross-entropy loss function, Q_s^T denotes the probability output of the student network at temperature T, Q_t^T denotes the probability output of the teacher network at temperature T, T and a are set parameters, Q_s is the probability output of the student network, and y_true is the label obtained from the voice data set.
5. The binary convolutional neural network-based voice wake-up system of claim 4, wherein the teacher network is Resnet152.
6. The binary convolutional neural network-based voice wake-up system of claim 4, wherein the voice data set is a Google Voice Command set.
Application CN202210737439.7A (filed 2022-06-28, priority 2022-06-28): Voice awakening method and system based on binary convolutional neural network. Active; granted as CN114822510B (en).

Priority Applications (1)

CN202210737439.7A (priority date and filing date 2022-06-28): Voice awakening method and system based on binary convolutional neural network; granted as CN114822510B (en).

Publications (2)

CN114822510A (en), published 2022-07-29
CN114822510B (en), published 2022-10-04

Family

ID: 82522967

Family Applications (1): CN202210737439.7A (Active): Voice awakening method and system based on binary convolutional neural network; granted as CN114822510B.

Country Status (1): CN - CN114822510B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
US 11410029 B2 (priority 2018-01-02, published 2022-08-09), International Business Machines Corporation: Soft label generation for knowledge distillation *

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091819A (en) * 2018-10-08 2020-05-01 蔚来汽车有限公司 Voice recognition device and method, voice interaction system and method
CN110265002A (en) * 2019-06-04 2019-09-20 北京清微智能科技有限公司 Audio recognition method, device, computer equipment and computer readable storage medium
CN111583940A (en) * 2020-04-20 2020-08-25 东南大学 Very low power consumption keyword awakening neural network circuit
WO2022016556A1 (en) * 2020-07-24 2022-01-27 华为技术有限公司 Neural network distillation method and apparatus
CN112233675A (en) * 2020-10-22 2021-01-15 中科院微电子研究所南京智能技术研究院 Voice awakening method and system based on separation convolutional neural network
CN112365885A (en) * 2021-01-18 2021-02-12 深圳市友杰智新科技有限公司 Training method and device of wake-up model and computer equipment
CN113191489A (en) * 2021-04-30 2021-07-30 华为技术有限公司 Training method of binary neural network model, image processing method and device
CN113409773A (en) * 2021-08-18 2021-09-17 中科南京智能技术研究院 Binaryzation neural network voice awakening method and system
CN113782009A (en) * 2021-11-10 2021-12-10 中科南京智能技术研究院 Voice awakening system based on Savitzky-Golay filter smoothing method
CN114358206A (en) * 2022-01-12 2022-04-15 合肥工业大学 Binary neural network model training method and system, and image processing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
知识蒸馏(Knowledge Distillation)简述(一) [A brief introduction to knowledge distillation, part 1]; Ivan Yan; 《百度》 (Baidu); 2019-11-25; full web page *

Also Published As

CN114822510A, published 2022-07-29


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant