CN113823274B - Voice keyword sample screening method based on detection error weighted editing distance - Google Patents

Voice keyword sample screening method based on detection error weighted editing distance

Info

Publication number
CN113823274B
CN113823274B (Application CN202110938700.5A)
Authority
CN
China
Prior art keywords
samples
training
sample
voice keyword
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110938700.5A
Other languages
Chinese (zh)
Other versions
CN113823274A (en)
Inventor
贺前华
严海康
兰小添
郑若伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110938700.5A priority Critical patent/CN113823274B/en
Publication of CN113823274A publication Critical patent/CN113823274A/en
Application granted granted Critical
Publication of CN113823274B publication Critical patent/CN113823274B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G10L 15/08: Speech classification or search
    • G10L 15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L 15/26: Speech to text systems
    • G10L 2015/0631: Creating reference templates; Clustering
    • G10L 2015/0635: Training updating or merging of old and new templates; Mean values; Weighting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a voice keyword sample screening method based on a detection-error-weighted edit distance. It uses the output information produced during training of a voice keyword recognition model, weighting the detection errors of sample keywords to revise the edit distance between the decoded sequence and the label sequence, so that important samples receive greater attention and unqualified voice keyword samples can be screened out. The invention greatly reduces the workload of manually auditing all samples and improves screening efficiency. It provides an effective scheme for cleaning a corpus and constructing a high-quality voice data set, reduces the difficulty of building low-resource minority-language corpora, supplies higher-quality voice keyword samples to deep neural networks, and promotes research and development of speech technology for low-resource languages.

Description

Voice keyword sample screening method based on detection error weighted editing distance
Technical Field
The invention relates to the technical field of data processing, and in particular to a voice keyword sample screening method based on a detection-error-weighted edit distance.
Background
In recent years, with the rapid development of deep learning, the performance of tasks such as speech recognition and voice keyword detection has improved greatly. Neural-network-based methods such as recurrent neural networks (Recurrent Neural Network, RNN), convolutional neural networks (Convolutional Neural Network, CNN) and Transformers are currently the mainstream of research. However, deep neural networks place high demands on both the size and the quality of the data set: they show remarkable performance only when the amount of training data is sufficient and the sample quality is high. Large-scale speech data sets are generally recorded by organized recording staff or collected from the Internet, and owing to various objective factors a corpus often contains inaccurate text labels, i.e. the semantic content of a sample's speech and its text differ to some extent, yet it is not known which samples are labeled inaccurately. When the corpus is large, it is difficult to perform manual auditing and label correction on all samples, which incurs huge time and labor costs while remaining inefficient. Such training data may cause the model to learn a wrong mapping, thereby degrading its performance. It is therefore necessary to clean the corpus and screen out voice keyword samples with unqualified labels, so as to construct a high-quality voice data set applicable to different tasks.
Voice keyword detection is an effective intelligent speech processing technique. Current neural-network-based voice keyword detection methods require a large number of voice keyword samples for iterative training. During training the model learns the commonalities among samples and forms different mapping relationships. The model outputs much useful information during training, and one heuristic is to use this intermediate information from the keyword model training process to help screen out unqualified labeled samples. The main basis of this heuristic is that most samples are reliable; under this assumption the trained model is also basically reliable, i.e. mislabeled samples are a minority and qualified samples a majority, so the correct mapping relationships learned by the model dominate. Across different iterations, the model's predictions tend to be consistent for a qualified sample, while for an unqualified sample they may fluctuate significantly. Abnormal samples can be eliminated by fusing the model's output information over the whole training period. How to use the model's output information during training thus becomes the key to screening.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a voice keyword sample screening method based on a detection-error-weighted edit distance, which uses the output information produced during training of a voice keyword recognition model and weights the detection errors of sample keywords to revise the edit distance between the decoded sequence and the label sequence, so that important samples receive greater attention and unqualified voice keyword samples are screened out.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
a voice keyword sample screening method based on detection error weighted editing distance comprises the following steps:
S1, using an original voice data set as a training sample set Z = {(X_n, Y_n), n = 1, ..., N}, where X_n is a voice keyword sample, Y_n is the corresponding recording text, and N is the total number of training samples; then transcribing each recording text Y_n in the training sample set into a sequence of toned syllables. The toned syllables of all keywords are represented by the numbers 0 to K-2 respectively, and every other toned syllable is represented by the number K-1, where K-1 is the total number of keyword toned syllables; a label sequence is thus constructed for each voice keyword sample, yielding the training sample set Z_t;
S2, iteratively training a voice keyword recognition model on the obtained training sample set Z_t until the model converges; after each iteration, recognizing all voice keyword samples and recording the decoded sequence of each voice keyword sample X_n;
S3, respectively decoding each sequenceCorresponding tag sequence->Comparing to calculate the edit distance, the number of missed keyword detection and the number of false alarms, and checking the missed keyword detectionThe number and the false alarm number are weighted respectively to revise the editing distance, and the revised editing distance is obtained;
S4, fusing all revised edit distances obtained for each voice keyword sample during training of the voice keyword recognition model to obtain an error value for each training sample;
and S5, screening the voice keyword samples according to the error value of each training sample until the proportion of the qualified samples meets the requirement.
Further, in step S3, the numbers of missed detections and false alarms are weighted respectively to revise the edit distance, using the following revision formula:
D = D_e + n_FR · D_FR + n_FA · D_FA   (1)
In formula (1), D_e is the edit distance between the decoded sequence and the corresponding label sequence, n_FR is the number of missed keyword detections, D_FR is the keyword miss cost, n_FA is the number of keyword false alarms, and D_FA is the keyword false-alarm cost; D_FR and D_FA are empirical constants satisfying D_FR ≥ 0 and D_FA ≥ 0.
Further, when step S4 fuses all revised edit distances obtained during model training for each voice keyword sample, specifically the revised edit distances from the second iteration to the m-th iteration are fused using the following fusion formula:
ε = (1 / (m - 1)) · Σ_{i=2}^{m} D_i   (2)

In formula (2), ε represents the error value of the training sample, m represents the number of iterations of keyword recognition model training on the training sample set Z_t, and D_i represents the revised edit distance calculated at the i-th iteration.
Further, the edit distance is the number of insertion, deletion and substitution operations required to convert the decoded sequence into the label sequence.
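The insertion/deletion/substitution count described here is the standard Levenshtein distance over syllable-number sequences. A minimal sketch (illustrative only, not part of the patent text) could look like:

```python
def edit_distance(decoded, label):
    """Minimum number of insertion, deletion and substitution operations
    needed to convert `decoded` into `label` (classic Levenshtein DP)."""
    m, n = len(decoded), len(label)
    prev = list(range(n + 1))              # distances against empty prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if decoded[i - 1] == label[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[n]
```

Only two rows of the dynamic-programming table are kept at a time, which is enough because each cell depends only on the previous row and the current row's left neighbor.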
Further, the step S5 of screening the voice keyword samples according to the error value of each training sample specifically includes:
Firstly, all training samples are sorted by error value from large to small and a threshold is set; if a training sample's error value is smaller than or equal to the threshold, it is put into the qualified sample set, and if its error value is greater than the threshold, it is put into the candidate sample set. Then the training samples in the candidate sample set are manually audited, and those found qualified are moved into the qualified sample set. Finally, the resulting qualified sample set is taken as the new training sample set Z_t and steps S2-S5 are repeated until the proportion of qualified samples meets the requirement.
Further, the set threshold is selected by the following process:
1) Dividing the sorted error-value sequence into different continuous intervals according to the numerical range of the training samples' error values;
2) Starting from the interval with the largest values, randomly extracting k training samples from each interval in turn for manual auditing; if a training sample's recording text is consistent with the semantic content of its speech, it is regarded as a qualified sample, otherwise as an unqualified sample, and the audited training samples are recorded;
3) Counting the proportion of unqualified samples among the k training samples extracted from the current interval; if this proportion is smaller than a coefficient α, the manual auditing is stopped and the maximum error value of that interval is taken as the set threshold, where k and α are empirical parameters.
Further, the screening is repeated until the proportion of unqualified samples in every continuous interval is smaller than the coefficient α, i.e. until the proportion of qualified samples meets the requirement.
In this technical scheme, voice keyword samples refer to all samples used to train a voice keyword detector, including positive samples that contain keywords and negative samples that do not. Both positive and negative samples are indispensable in training and in evaluating a voice keyword detector, and negative samples generally outnumber positive samples. The literature commonly refers to both as voice keyword samples.
Compared with the prior art, the principle of the technical scheme is as follows:
The technical scheme uses the output information produced during training of the voice keyword recognition model and weights the detection errors of sample keywords to revise the edit distance between the decoded sequence and the label sequence, so that important samples receive greater attention and unqualified voice keyword samples are screened out.
Compared with the prior art, the technical scheme has the following advantages:
The technical scheme greatly reduces the workload of manually auditing all samples and improves screening efficiency. It provides an effective scheme for cleaning a corpus and constructing a high-quality voice data set, reduces the difficulty of building low-resource minority-language corpora, supplies higher-quality voice keyword samples to deep neural networks, and promotes research and development of speech technology for low-resource languages.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings required in the embodiments or in the description of the prior art will be briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of the voice keyword sample screening method based on the detection-error-weighted edit distance in an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
As shown in Fig. 1, the voice keyword sample screening method based on the detection-error-weighted edit distance of this embodiment comprises the following steps:
S1, using an original voice data set as a training sample set Z = {(X_n, Y_n), n = 1, ..., N}, where X_n is a voice keyword sample, Y_n is the corresponding recording text, and N is the total number of training samples; then transcribing each recording text Y_n in the training sample set into a sequence of toned syllables. The toned syllables of all keywords are represented by the numbers 0 to K-2 respectively, and every other toned syllable is represented by the number K-1, where K-1 is the total number of keyword toned syllables; the label sequence of each voice keyword sample is thus constructed, yielding the training sample set Z_t.
In this embodiment, 392.67 hours of recorded Hakka speech data from Ganzhou, Jiangxi, which have not undergone manual review, are used as the training sample set, containing 761 different speakers in total; in addition, 15.51 hours of manually audited Hakka speech data are used as the verification sample set, which serves as the criterion for judging model convergence and has no intersection with the training sample set. The number of keywords to be identified is predefined as 50. First, all keywords are represented by toned syllables; after the keywords' toned syllables are sorted, they are represented by the numbers 0 to K-2 respectively, giving a mapping table. For each voice keyword sample, its text sequence is converted into a sequence of toned syllables, which is traversed according to the mapping table: if the current toned syllable is in the mapping table it is represented by the corresponding number, otherwise by the number K-1, yielding the label sequence of each voice keyword sample.
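The mapping step above can be sketched in a few lines. The table contents below are hypothetical placeholders (the patent does not publish its 50-keyword list), used only to show the keyword-id / shared non-keyword-id scheme:

```python
def build_label_sequence(toned_syllables, keyword_table):
    """Map each toned syllable to its keyword number (0..K-2) via the
    mapping table, or to the shared non-keyword number K-1 otherwise."""
    k_minus_1 = len(keyword_table)   # total number of keyword toned syllables
    return [keyword_table.get(s, k_minus_1) for s in toned_syllables]

# Hypothetical mapping table with three keyword toned syllables (ids 0..2),
# so every other toned syllable maps to K-1 = 3.
table = {"ni3": 0, "hao3": 1, "ma5": 2}
label_seq = build_label_sequence(["ni3", "hao3", "shi4", "jie4"], table)  # [0, 1, 3, 3]
```

Collapsing all non-keyword syllables to one shared id keeps the output alphabet small (K symbols) while still letting the decoder localize keyword syllables exactly.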
S2, iteratively training a keyword recognition model on the obtained training sample set Z_t until the model converges; after each iteration, recognizing all voice keyword samples and recording the decoded sequence of each voice keyword sample X_n.
In this embodiment, 80-dimensional logarithmic Mel-spectrogram features are first extracted from each voice keyword sample in the training and verification sample sets, with a frame length of 25 ms and a frame shift of 10 ms, to obtain the sample's speech feature representation. The keyword recognition model is a convolutional recurrent neural network, specifically comprising 3 convolutional layers, 2 bidirectional gated recurrent unit layers and 2 fully connected layers; the convolution kernel size is 3 × 3, the loss function is connectionist temporal classification (CTC), the optimizer is Adam, and the initial learning rate is 0.001. During model training, all training samples participate once in each iteration, and the criterion for model convergence is that the keyword recognition performance on the verification sample set no longer improves. After each round of iterative training, all training samples are recognized once with the current model; decoding uses a beam search algorithm with a beam width of 20, and the decoded sequence of each voice keyword sample is obtained and recorded.
S3, respectively decoding each sequenceAnd tag sequence->Comparing to calculate the editing distance, the number of missed detection of the keywords and the number of false alarms, and respectively weighting the number of missed detection and the number of false alarms to revise the editing distance to obtain revised editing distance;
In this embodiment, the keyword recognition performance on the verification sample set is optimal at round 16. Each training sample therefore has 16 decoded sequences, corresponding to the outputs of the models of rounds 1 to 16; each decoded sequence is compared with the label sequence to calculate an edit distance D_e. In the keyword detection task, two types of detection errors deserve particular attention: keyword misses and keyword false alarms. Although the edit distance measures the deviation of the decoded sequence from the label sequence, it assigns the same cost to keyword misses and false alarms as to errors on non-keyword syllables, so the edit distance must be revised so that the sample quality index highlights samples with keyword detection errors. Therefore, based on the keyword sequence, the numbers of false alarms and missed detections of each sample are counted and weighted respectively to revise the edit distance D_e, using the following formula:
D = D_e + n_FR · D_FR + n_FA · D_FA   (1)
In formula (1), D_e is the edit distance between the decoded sequence and the corresponding label sequence, n_FR is the number of missed keyword detections of the voice keyword sample, D_FR is the keyword miss cost, n_FA is the number of keyword false alarms of the voice keyword sample, and D_FA is the keyword false-alarm cost; D_FR and D_FA are empirical constants satisfying D_FR ≥ 0 and D_FA ≥ 0. In this embodiment, D_FR = 3 and D_FA = 3.
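A self-contained sketch of formula (1) follows. The patent does not spell out how n_FR and n_FA are derived from the two sequences; here they are counted by comparing per-keyword occurrence counts in the label and decoded sequences, which is one plausible reading of step S3, not necessarily the authors' exact procedure:

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: the D_e term of formula (1)."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]

def count_occurrences(seq, keyword):
    """Non-overlapping occurrences of a keyword's syllable-id tuple in seq."""
    i = count = 0
    while i + len(keyword) <= len(seq):
        if tuple(seq[i:i + len(keyword)]) == tuple(keyword):
            count += 1
            i += len(keyword)
        else:
            i += 1
    return count

def revised_edit_distance(decoded, label, keywords, d_fr=3.0, d_fa=3.0):
    """D = D_e + n_FR * D_FR + n_FA * D_FA, with D_FR = D_FA = 3 as in
    this embodiment.  Misses and false alarms are counted by comparing
    per-keyword occurrence counts (an assumed reading of step S3)."""
    n_fr = n_fa = 0
    for kw in keywords:
        diff = count_occurrences(label, kw) - count_occurrences(decoded, kw)
        if diff > 0:
            n_fr += diff      # keyword in label but missing from the decode
        else:
            n_fa += -diff     # keyword decoded but absent from the label
    return edit_distance(decoded, label) + n_fr * d_fr + n_fa * d_fa
```

With d_fr = d_fa = 3, a single missed or falsely detected keyword adds three times the penalty of an ordinary syllable error, which is exactly the emphasis the revision is meant to achieve.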
S4, fusing all revised editing distances obtained in the keyword recognition model training process of each voice keyword sample to obtain an error value of each training sample;
Specifically, the revised edit distances from the second iteration to the m-th iteration are fused using the following fusion formula:
ε = (1 / (m - 1)) · Σ_{i=2}^{m} D_i   (2)

In formula (2), ε represents the error value of the training sample, m represents the number of iterations of keyword recognition model training on the training sample set Z_t, and D_i represents the revised edit distance calculated at the i-th iteration. The keyword recognition model starts training from random initialization, so the round-1 model output may contain large errors; hence only the revised edit distances from the second to the m-th iteration are fused. In this embodiment, m is 16, and an error value ε is calculated for each training sample, with values in the range [0, +∞).
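The fusion step is then a one-liner per sample. Since the published formula image is unavailable, a simple mean over rounds 2..m is assumed here (consistent with the stated [0, +∞) range and the decision to drop round 1):

```python
def error_value(revised_distances):
    """Fuse one sample's revised edit distances over training.
    `revised_distances[0]` is the round-1 distance and is dropped, since
    the randomly initialized model's output is unreliable; the remaining
    rounds 2..m are averaged (an assumption, as the original formula
    image is not available)."""
    later_rounds = revised_distances[1:]
    return sum(later_rounds) / len(later_rounds)
```

A sample whose decodes stay close to its label from round 2 onward gets a small error value, while a sample the model never fits well keeps a large one, which is the consistency signal the screening relies on.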
S5, screening the voice keyword samples according to the error value of each training sample to obtain qualified voice keyword samples, wherein the specific process is as follows:
The error values of the training samples are sorted from large to small and a threshold is set, dividing the training sample set into two subsets: if a sample's error value is greater than the threshold it is put into the candidate sample set, otherwise into the qualified sample set. The threshold is selected as follows:
Dividing the numerical range of the samples' error values into different continuous intervals;
Starting from the interval with the largest values, randomly extracting k training samples from each interval in turn for manual auditing; if a training sample's recording text is consistent with the semantic content of its speech, it is regarded as a qualified sample, otherwise as an unqualified sample, and the audited training samples are recorded;
Counting the proportion of unqualified samples among the k training samples extracted from the current interval; if this proportion is smaller than a coefficient α, the manual auditing is stopped and the maximum error value of that interval is taken as the set threshold, where k and α are empirical parameters.
In this embodiment, the divided intervals are [0, 2), [2, 4), [4, 6), [6, 8), [8, 10), [10, 12), [12, 14), [14, 16) and [16, +∞), and the empirical parameters are k = 100 and α = 0.1.
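The interval-auditing threshold search can be sketched as below. The `audit` callable stands in for the manual review of one sample (an assumption: any function returning qualified/unqualified works), and the error values in the usage example are hypothetical:

```python
import random

def select_threshold(errors, audit, k=100, alpha=0.1, width=2.0, top=16.0):
    """Threshold selection of step S5: bucket samples into the intervals
    [0, w), [w, 2w), ..., [top, +inf); audit up to k random samples per
    interval, walking from the largest values downward; stop at the first
    interval whose unqualified ratio drops below alpha, and return that
    interval's maximum error value as the threshold."""
    buckets = {}
    for idx, e in enumerate(errors):
        lo = top if e >= top else width * int(e // width)
        buckets.setdefault(lo, []).append(idx)
    for lo in sorted(buckets, reverse=True):      # largest interval first
        members = buckets[lo]
        sampled = random.sample(members, min(k, len(members)))
        fail_ratio = sum(not audit(i) for i in sampled) / len(sampled)
        if fail_ratio < alpha:
            return max(errors[i] for i in members)
    return 0.0  # every interval exceeded alpha: no samples pass yet

# Hypothetical error values; the stand-in audit accepts errors below 10.
errors = [1.0, 1.5, 3.0, 17.0]
threshold = select_threshold(errors, lambda i: errors[i] < 10.0)
```

Walking from the worst interval downward means manual effort is spent only until the first "mostly clean" interval is found, which is the efficiency gain claimed over auditing everything.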
The samples in the candidate sample set are the screened-out suspect samples; to avoid mistakenly discarding correct samples and to allow their reuse, they can be manually audited, and samples that pass the audit, or whose labels are revised so that they qualify, are returned to the qualified sample set.
The qualified sample set is then taken as the new training sample set Z_t and steps S2-S5 are repeated until the proportion of unqualified samples in every continuous interval is smaller than the coefficient α; the finally obtained qualified sample set is the cleaned corpus.
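The overall repeat-until-clean loop can be summarized as the driver below. All three callables are placeholders: `train_and_score` stands in for S2-S4 (train, decode, fuse) and returns one error value per sample, `pick_threshold` for the interval-auditing threshold choice, and `audit` for the manual review of a candidate; the stopping rule (stop once auditing rejects nothing) is an illustrative simplification of the proportion criterion:

```python
def screen_corpus(samples, train_and_score, audit, pick_threshold):
    """Illustrative driver for the iterative screening loop of S2-S5."""
    current = list(samples)
    while True:
        errors = train_and_score(current)               # S2-S4 placeholder
        thr = pick_threshold(errors)                    # S5 threshold choice
        qualified = [s for s, e in zip(current, errors) if e <= thr]
        candidates = [s for s, e in zip(current, errors) if e > thr]
        recovered = [s for s in candidates if audit(s)]  # audited as qualified
        current = qualified + recovered                  # new training set Z_t
        if len(recovered) == len(candidates):            # nothing rejected
            return current
```

Samples rejected by the audit are dropped before retraining, so each pass trains on a cleaner set and the candidate set shrinks until the loop terminates.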
The above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope of protection; any variation made according to the shape and principle of the present invention shall be covered by its scope of protection.

Claims (7)

1. The voice keyword sample screening method based on the detection error weighted editing distance is characterized by comprising the following steps of:
S1, using an original voice data set as a training sample set Z = {(X_n, Y_n), n = 1, ..., N}, where X_n is a voice keyword sample, Y_n is the corresponding recording text, and N is the total number of training samples; then transcribing each recording text Y_n in the training sample set into a sequence of toned syllables. The toned syllables of all keywords are represented by the numbers 0 to K-2 respectively, and every other toned syllable is represented by the number K-1, where K-1 is the total number of keyword toned syllables; a label sequence is thus constructed for each voice keyword sample, yielding the training sample set Z_t;
S2, iteratively training a voice keyword recognition model on the obtained training sample set Z_t until the model converges; after each iteration, recognizing all voice keyword samples and recording the decoded sequence of each voice keyword sample X_n;
S3, respectively decoding each sequenceCorresponding tag sequence->Comparing to calculate the editing distance, the number of missed detection of the keywords and the number of false alarms, and respectively weighting the number of missed detection and the number of false alarms to revise the editing distance to obtain revised editing distance;
S4, fusing the revised edit distances obtained for each voice keyword sample during training of the voice keyword recognition model to obtain an error value for each training sample;
and S5, screening the voice keyword samples according to the error value of each training sample until the proportion of the qualified samples meets the requirement.
2. The method for screening voice keyword samples based on the detection-error-weighted edit distance according to claim 1, wherein in step S3 the numbers of missed detections and false alarms are weighted respectively to revise the edit distance, using the following revision formula:
D = D_e + n_FR · D_FR + n_FA · D_FA   (1)
In formula (1), D_e is the edit distance between the decoded sequence and the corresponding label sequence, n_FR is the number of missed keyword detections, D_FR is the keyword miss cost, n_FA is the number of keyword false alarms, and D_FA is the keyword false-alarm cost; D_FR and D_FA are empirical constants satisfying D_FR ≥ 0 and D_FA ≥ 0.
3. The method for screening voice keyword samples based on the detection-error-weighted edit distance according to claim 1, wherein when the revised edit distances obtained during model training for each voice keyword sample are fused in step S4, specifically the revised edit distances from the second iteration to the m-th iteration are fused using the following fusion formula:
ε = (1 / (m - 1)) · Σ_{i=2}^{m} D_i   (2)

In formula (2), ε represents the error value of the training sample, m represents the number of iterations of keyword recognition model training on the training sample set Z_t, and D_i represents the revised edit distance calculated at the i-th iteration.
4. A method for screening voice keyword samples based on the detection-error-weighted edit distance according to any of claims 1-3, wherein the edit distance is the number of insertion, deletion and substitution operations required to convert the decoded sequence into the label sequence.
5. The method for screening voice keyword samples based on the detection error weighted editing distance according to claim 1, wherein the step S5 is to screen the voice keyword samples according to the error value of each training sample, and specifically comprises:
Firstly, all training samples are sorted by error value from large to small and a threshold is set; if a training sample's error value is smaller than or equal to the threshold, it is put into the qualified sample set, and if its error value is greater than the threshold, it is put into the candidate sample set. Then the training samples in the candidate sample set are manually audited, and those found qualified are moved into the qualified sample set. Finally, the resulting qualified sample set is taken as the new training sample set Z_t and steps S2-S5 are repeated until the proportion of qualified samples meets the requirement.
6. The method for screening a voice keyword sample based on detecting an error weighted edit distance according to claim 5, wherein the set threshold is selected by:
1) Dividing the sorted error-value sequence into different continuous intervals according to the numerical range of the training samples' error values;
2) Starting from the interval with the largest values, randomly extracting k training samples from each interval in turn for manual auditing; if a training sample's recording text is consistent with the semantic content of its speech, it is regarded as a qualified sample, otherwise as an unqualified sample, and the audited training samples are recorded;
3) Counting the proportion of unqualified samples among the k training samples extracted from the current interval; if this proportion is smaller than a coefficient α, the manual auditing is stopped and the maximum error value of that interval is taken as the set threshold, where k and α are empirical parameters.
7. The method for screening voice keyword samples based on the detection-error-weighted edit distance according to claim 6, wherein steps S2-S5 are repeated until the proportion of unqualified samples in all continuous intervals is smaller than the coefficient α, i.e. until the proportion of qualified samples meets the requirement.
CN202110938700.5A 2021-08-16 2021-08-16 Voice keyword sample screening method based on detection error weighted editing distance Active CN113823274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110938700.5A CN113823274B (en) 2021-08-16 2021-08-16 Voice keyword sample screening method based on detection error weighted editing distance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110938700.5A CN113823274B (en) 2021-08-16 2021-08-16 Voice keyword sample screening method based on detection error weighted editing distance

Publications (2)

Publication Number Publication Date
CN113823274A CN113823274A (en) 2021-12-21
CN113823274B true CN113823274B (en) 2023-10-27

Family

ID=78923084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110938700.5A Active CN113823274B (en) 2021-08-16 2021-08-16 Voice keyword sample screening method based on detection error weighted editing distance

Country Status (1)

Country Link
CN (1) CN113823274B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559881A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Language-independent keyword recognition method and system
WO2019153996A1 (en) * 2018-02-09 2019-08-15 叶伟 Text error correction method and apparatus for voice recognition
WO2019214149A1 (en) * 2018-05-11 2019-11-14 平安科技(深圳)有限公司 Text key information identification method, electronic device, and readable storage medium
CN110827806A (en) * 2019-10-17 2020-02-21 清华大学深圳国际研究生院 Voice keyword detection method and system
CN111128128A (en) * 2019-12-26 2020-05-08 华南理工大学 Voice keyword detection method based on complementary model scoring fusion
WO2021057038A1 (en) * 2019-09-24 2021-04-01 上海依图信息技术有限公司 Apparatus and method for speech recognition and keyword detection based on multi-task model

Similar Documents

Publication Publication Date Title
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN111709242B (en) Chinese punctuation mark adding method based on named entity recognition
CN110930993A (en) Specific field language model generation method and voice data labeling system
CN116127952A (en) Multi-granularity Chinese text error correction method and device
CN113408287B (en) Entity identification method and device, electronic equipment and storage medium
CN113672732B (en) Method and device for classifying service data
CN116542256B (en) Natural language understanding method and device integrating dialogue context information
CN112767921A (en) Voice recognition self-adaption method and system based on cache language model
CN111831902A (en) Recommendation reason screening method and device and electronic equipment
CN112417132A (en) New intent recognition method for screening negative samples using predicate-object information
CN114661872A (en) Beginner-oriented API self-adaptive recommendation method and system
CN116502646A (en) Semantic drift detection method and device, electronic equipment and storage medium
CN115983274A (en) Noise event extraction method based on two-stage label correction
CN113239694B (en) Argument role identification method based on argument phrase
CN111862963A (en) Voice wake-up method, device and equipment
CN113823274B (en) Voice keyword sample screening method based on detection error weighted editing distance
CN117634615A (en) Multi-task code retrieval method based on mode irrelevant comparison learning
CN115376547B (en) Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium
CN116050419A (en) Unsupervised identification method and system oriented to scientific literature knowledge entity
CN113707131B (en) Speech recognition method, device, equipment and storage medium
CN114418111A (en) Label prediction model training and sample screening method, device and storage medium
CN114333828A (en) Quick voice recognition system for digital product
CN113495964A (en) Method, device and equipment for screening triples and readable storage medium
CN111444708A (en) SQL statement intelligent completion method based on usage scenarios
CN117350288B (en) Case matching-based network security operation auxiliary decision-making method, system and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant