CN114846543A - Voice recognition result detection method and device and storage medium - Google Patents

Publication number
CN114846543A
Authority
CN
China
Prior art keywords
result
recognition result
score
evaluation
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080088999.3A
Other languages
Chinese (zh)
Inventor
薛征山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd and Shenzhen Huantai Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

A method, a device and a storage medium for detecting a voice recognition result are provided. The method comprises: acquiring a speech recognition result to be detected, and translating the speech recognition result to be detected from a first language to a second language by using a machine translation model to obtain a translation result to be detected (S101); determining a first evaluation feature based on the translation result to be detected and the speech recognition result to be detected, the first evaluation feature being used for representing the recognition and translation effects of the speech recognition result to be detected (S102); evaluating the speech recognition result to be detected based on the first evaluation feature to obtain a first evaluation result (S103); and judging whether the first evaluation result meets a preset condition to obtain a first judgment result, and determining an error detection result of the speech recognition result to be detected according to the first judgment result (S104).

Description

Voice recognition result detection method and device and storage medium
Technical Field
The embodiment of the application relates to the technical field of voice recognition, in particular to a method and a device for detecting a voice recognition result and a storage medium.
Background
Limited by the accuracy of speech recognition, speech recognition results often contain errors, which hinder subsequent processing and make speech understanding more difficult. In general, after speech is recognized and a speech recognition result is obtained, error detection is performed on the speech recognition result so that any errors can then be corrected.
At present, language features of the language to which the speech recognition result belongs are mainly used to train a corresponding error detection model, which then performs error detection on the speech recognition result. For example, for a Chinese speech recognition result, an error detection model can be designed based on word collocation and context information, so that error detection of the Chinese speech recognition result is realized. However, when error detection relies only on an error detection model for the language to which the speech recognition result belongs, few detection features are available, and the detection accuracy is low.
Disclosure of Invention
In order to solve the related technical problem, embodiments of the present application provide a method and an apparatus for detecting a speech recognition result, and a storage medium.
The technical scheme of the embodiment of the application can be realized as follows:
the embodiment of the application provides a method for detecting a voice recognition result, which comprises the following steps:
acquiring a speech recognition result to be detected, and translating the speech recognition result to be detected from a first language to a second language by using a machine translation model to obtain a translation result to be detected;
determining a first evaluation characteristic based on the translation result to be tested and the voice recognition result to be tested; the first evaluation feature is used for representing the recognition and translation effects of the voice recognition result to be detected;
evaluating the voice recognition result to be tested based on the first evaluation characteristic to obtain a first evaluation result;
and judging whether the first evaluation result meets a preset condition or not to obtain a first judgment result, and determining an error detection result of the to-be-detected voice recognition result according to the first judgment result.
In the foregoing solution, the determining a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested includes:
obtaining a translation score and a confusion score of the machine translation model on the translation result to be detected, and obtaining a first translation score and a first confusion score;
inputting the speech recognition result to be tested into the language model corresponding to the first language to obtain a first language model score;
determining the first translation score, the first confusion score, and the first language model score as the first assessment feature.
In the foregoing solution, the evaluating the to-be-tested speech recognition result based on the first evaluation feature to obtain a first evaluation result includes:
acquiring a feature weight corresponding to each of the first translation score, the first confusion score and the first language model score to obtain three feature weights;
and weighting the first translation score, the first confusion score and the first language model score by using the three feature weights to obtain the first evaluation result.
In the foregoing solution, before obtaining the feature weight corresponding to each of the first translation score, the first confusion score, and the first language model score, and obtaining three feature weights, the method further includes:
acquiring a sample voice recognition result and a preset detection result of the sample voice recognition result;
translating the sample speech recognition result from the first language to the second language by using the machine translation model to obtain a sample translation result;
determining a second translation score, a second confusion score, and a second language model score based on the sample translation results and the sample speech recognition results;
acquiring a preset weight corresponding to each of the second translation score, the second confusion score and the second language model score to obtain three preset weights;
and determining the three feature weights by using the second translation score, the second confusion score, the second language model score, the preset detection result and the three preset weights.
In the above solution, the determining the three feature weights by using the second translation score, the second confusion score, the second language model score, the preset detection result, and the three preset weights includes:
weighting the second translation score, the second confusion score and the second language model score by using the three preset weights to obtain a second evaluation result; the second evaluation result is used for characterizing the recognition and translation effects of the sample voice recognition result;
judging whether the second evaluation result meets the preset condition or not to obtain a second judgment result, and determining an error detection result of the sample voice recognition result according to the second judgment result;
and adjusting the three preset weights based on the error detection result of the sample voice recognition result and the preset detection result to obtain the three characteristic weights.
In the foregoing solution, the adjusting the three preset weights based on the error detection result of the sample speech recognition result and the preset detection result to obtain the three feature weights includes:
and adjusting the three preset weights according to a weight adjustment algorithm until the error detection result of the sample voice recognition result is the same as the preset detection result, thereby obtaining the three characteristic weights.
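The text above does not name the weight adjustment algorithm. As a minimal sketch only, the loop below substitutes a perceptron-style update, which is an assumption and not the method of the application. The function name `adjust_weights`, the learning rate, the epoch limit, the threshold and the sample values are all hypothetical.

```python
def adjust_weights(samples, preset_weights, threshold,
                   learning_rate=0.1, max_epochs=100):
    """Adjust weights until predicted and labelled results agree.

    samples: list of ((tm, pp, lm), labelled_error) pairs, where
    labelled_error is True when the preset detection result says
    the sample speech recognition result contains an error.
    The perceptron-style update is an assumption of this sketch.
    """
    weights = list(preset_weights)
    for _ in range(max_epochs):
        mismatches = 0
        for features, labelled_error in samples:
            score = sum(w * f for w, f in zip(weights, features))
            predicted_error = score < threshold
            if predicted_error == labelled_error:
                continue
            mismatches += 1
            # Raise the score of a sample wrongly flagged as an error,
            # lower the score of a sample wrongly passed as error-free.
            direction = 1.0 if predicted_error else -1.0
            weights = [w + learning_rate * direction * f
                       for w, f in zip(weights, features)]
        if mismatches == 0:
            break  # every sample now matches its preset detection result
    return tuple(weights)


# Hypothetical (translation, confusion, language model) scores and labels.
samples = [
    ((0.9, 0.8, 0.9), False),
    ((0.2, 0.3, 0.1), True),
    ((0.8, 0.2, 0.7), False),
    ((0.3, 0.9, 0.2), True),
]
feature_weights = adjust_weights(samples, (0.2, 0.6, 0.2), threshold=0.5)
```

The epoch limit is there because an update rule of this kind is not guaranteed to reach full agreement on every data set; a real system would need to decide what to do when the limit is hit.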
In the foregoing solution, the determining the error detection result of the to-be-detected speech recognition result according to the first determination result includes:
determining that an error detection result of the speech recognition result to be detected is an error when the first judgment result is that the first evaluation result does not meet the preset condition;
and determining that the error detection result of the speech recognition result to be detected is error-free under the condition that the first judgment result is that the first evaluation result meets the preset condition.
In the foregoing scheme, the determining whether the first evaluation result meets a preset condition to obtain a first determination result includes:
comparing the first evaluation result with an evaluation threshold;
determining that the first judgment result is that the first evaluation result does not meet the preset condition under the condition that the first evaluation result is smaller than the evaluation threshold;
and determining that the first judgment result is that the first evaluation result meets the preset condition when the first evaluation result is greater than or equal to the evaluation threshold.
The embodiment of the application provides a voice recognition result detection device, including:
the translation module is configured to obtain a speech recognition result to be tested, and translate the speech recognition result to be tested from a first language to a second language by using a machine translation model to obtain a translation result to be tested;
a determination module configured to determine a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested; the first evaluation feature is used for representing the recognition and translation effects of the voice recognition result to be detected;
the evaluation module is configured to evaluate the speech recognition result to be tested based on the first evaluation characteristic to obtain a first evaluation result;
and the judging module is configured to judge whether the first evaluation result meets a preset condition to obtain a first judgment result, and determine an error detection result of the to-be-detected voice recognition result according to the first judgment result.
The embodiment of the application provides a voice recognition result detection device, which comprises a processor and a memory;
the processor is configured to execute the voice recognition result detection program stored in the memory to implement the voice recognition result detection method.
An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the above-mentioned voice recognition result detection method.
The embodiment of the application provides a method and a device for detecting a voice recognition result and a storage medium, wherein the method comprises the following steps: acquiring a speech recognition result to be detected, and translating the speech recognition result to be detected from a first language to a second language by using a machine translation model to obtain a translation result to be detected; determining a first evaluation feature based on the translation result to be detected and the speech recognition result to be detected; the first evaluation feature is used for representing the recognition and translation effects of the speech recognition result to be detected; evaluating the speech recognition result to be detected based on the first evaluation feature to obtain a first evaluation result; and judging whether the first evaluation result meets a preset condition or not to obtain a first judgment result, and determining an error detection result of the speech recognition result to be detected according to the first judgment result. According to this technical scheme, the speech recognition result to be detected is translated into another language, and error detection is carried out on the speech recognition result to be detected in combination with the translation result. Compared with error detection based only on features of a single language, more features are available for error detection, and the accuracy of error detection is improved.
Drawings
Fig. 1 is a schematic flowchart of a method for detecting a speech recognition result according to an embodiment of the present application;
Fig. 2 is a schematic diagram of an exemplary process for detecting an error in a speech recognition result according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of determining three feature weights according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a system architecture of an exemplary application of a speech recognition result detection method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a speech recognition result detection apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a speech recognition result detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and do not limit it. It should be noted that, for convenience of description, only the parts related to the application are shown in the drawings.
The embodiment of the application provides a voice recognition result detection method which is realized by a voice recognition result detection device. Fig. 1 is a schematic flow chart of a method for detecting a speech recognition result according to an embodiment of the present application. As shown in fig. 1, the method for detecting the speech recognition result mainly includes the following steps:
S101, obtaining a speech recognition result to be detected, and translating the speech recognition result to be detected from a first language to a second language by using a machine translation model to obtain a translation result to be detected.
In an embodiment of the application, the voice recognition result detection device may obtain a voice recognition result to be detected, and translate the voice recognition result to be detected from the first language to the second language by using the machine translation model to obtain a translation result to be detected.
It should be noted that, in the embodiment of the present application, the speech recognition result to be detected is a speech recognition result on which error detection needs to be performed, and is a text generated by speech recognition. The voice recognition result detection device may receive the speech recognition result to be detected from a speech recognition processing device that performs the speech recognition. Alternatively, the voice recognition result detection device may itself be provided with a recognition module that performs speech recognition to obtain the speech recognition result to be detected. The specific speech recognition result to be detected and its source are not limited in the embodiments of the present application.
It should be noted that, in the embodiment of the present application, the language of the speech recognition result to be detected is the first language. The machine translation model is capable of translating an input text from a first language to a second language, wherein the first language and the second language are two different languages. The specific machine translation model, the first language and the second language may be selected according to actual requirements, and the embodiment of the present application is not limited.
For example, in an embodiment of the present application, the speech recognition result to be tested is in Chinese, that is, the first language is Chinese, and the machine translation model may translate the speech recognition result to be tested from Chinese to English, that is, the second language is English, so as to obtain the translation result to be tested in English.
S102, determining a first evaluation characteristic based on a translation result to be tested and a voice recognition result to be tested; the first evaluation feature is used for representing the recognition and translation effects of the voice recognition result to be tested.
In an embodiment of the application, after the voice recognition result detection device obtains the translation result to be detected and the voice recognition result to be detected, a first evaluation feature may be determined based on the translation result to be detected and the voice recognition result to be detected, where the first evaluation feature is used to characterize the recognition and translation effects of the voice recognition result to be detected.
Specifically, in an embodiment of the present application, the determining, by the speech recognition result detecting device, the first evaluation feature based on the translation result to be tested and the speech recognition result to be tested includes: acquiring a first translation score and a first confusion score of a machine translation model on a translation result to be detected; inputting the speech recognition result to be tested into a language model corresponding to the first language to obtain a score of the first language model; a first translation score, a first confusion score, and a first language model score are determined as first assessment features.
It should be noted that, in the embodiment of the present application, when the speech recognition result to be tested is translated by the machine translation model in step S101, the machine translation model may actually produce a plurality of translation results in the second language and determine a corresponding translation score and confusion score for each of them. The translation score represents the overall translation effect of a translation result, such as fluency and degree of semantic matching, and the confusion score represents the degree of confusion of the translation result in the translation process. The machine translation model determines the translation result with the highest translation score among the plurality of translation results as the translation result to be tested corresponding to the speech recognition result to be tested. The speech recognition result detection device can therefore directly obtain the translation score and the confusion score that the machine translation model assigned to the translation result to be tested, and determine them as the first translation score and the first confusion score.
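The selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Candidate` type, the function `select_translation` and all score values are hypothetical, and the sketch assumes that a higher translation score means a better translation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str                 # candidate translation in the second language
    translation_score: float  # assumed: higher means a better translation
    confusion_score: float    # confusion score reported for this candidate


def select_translation(candidates):
    """Return the candidate with the highest translation score."""
    if not candidates:
        raise ValueError("the machine translation model returned no candidates")
    return max(candidates, key=lambda c: c.translation_score)


# Hypothetical candidates for one speech recognition result.
candidates = [
    Candidate("What's the weather like today", -0.4, 3.1),
    Candidate("How is weather today", -0.9, 5.6),
]
best = select_translation(candidates)
first_translation_score = best.translation_score
first_confusion_score = best.confusion_score
```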
It should be noted that, in the embodiment of the present application, the language of the speech recognition result to be detected is the first language. The speech recognition result detection device may therefore input the speech recognition result to be detected into the language model corresponding to the first language. Based on word collocation and context structure designed for the first language, the language model evaluates the speech recognition result to be detected in terms of fluency, sentence structure and the like, so as to obtain the corresponding language model score. That is, the first language model score represents the semantic expression effect of the speech recognition result to be detected in the first language environment.
For example, in the embodiment of the present application, the speech recognition result to be tested is "how good the weather is today". The speech recognition result detection device may input the speech recognition result to be tested into a Chinese language model, which evaluates it in terms of sentence fluency, structure and the like, so as to obtain the first language model score A1. In addition, the speech recognition result detection device translates the speech recognition result to be tested into English by using the machine translation model, and obtains the translation result to be tested, "What's the weather like today", together with the first translation score A2 and the first confusion score A3 of the translation result to be tested. A1, A2 and A3 are all first evaluation features.
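To make the idea of a language model score concrete, the sketch below uses a toy bigram model with add-one smoothing as a stand-in. The application does not specify the language model, so this choice is an assumption; the function names, the whitespace tokenisation and the tiny English corpus are hypothetical and only illustrate that a fluent sentence receives a higher score than a scrambled one.

```python
import math
from collections import Counter


def train_bigram_counts(corpus):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams


def language_model_score(sentence, unigrams, bigrams, vocab_size):
    """Average log-probability per transition, with add-one smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(prob)
    return log_prob / (len(tokens) - 1)


# Hypothetical training corpus for the first language.
corpus = [
    "the weather is nice today",
    "the weather is cold today",
    "today the weather is nice",
]
unigrams, bigrams = train_bigram_counts(corpus)
vocab_size = 8  # six words plus the <s> and </s> markers

fluent = language_model_score("the weather is nice today",
                              unigrams, bigrams, vocab_size)
scrambled = language_model_score("nice the today is weather",
                                 unigrams, bigrams, vocab_size)
```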
It can be understood that, in the embodiment of the present application, the voice recognition result detecting device determines the first translation score, the first confusion score and the first language model score as the first evaluation feature, so as to not only consider the recognition effect from the language feature of the voice recognition result to be detected, but also consider the expression result of another language of the voice recognition result to be detected, and obtain more information representing the recognition effect of the voice recognition result to be detected, thereby performing subsequent error detection more accurately by using the first evaluation feature.
It should be noted that, in the embodiment of the present application, the speech recognition result detection apparatus may also determine the first evaluation feature based on the translation result to be tested and the speech recognition result to be tested in other manners from other angles. For example, the speech recognition result detection device may also respectively score the translation result to be tested and the speech recognition result to be tested by using corresponding models from the aspects of smoothness, language logic and the like, and determine each score as a first evaluation feature. The number and type of specific first evaluation features are not limited in the embodiments of the present application.
S103, evaluating the voice recognition result to be tested based on the first evaluation characteristic to obtain a first evaluation result.
In an embodiment of the application, after the voice recognition result detection device obtains the first evaluation feature, the voice recognition result to be tested is further evaluated based on the first evaluation feature, so as to obtain a first evaluation result.
Specifically, in an embodiment of the present application, the evaluating the to-be-tested speech recognition result by the speech recognition result detecting device based on the first evaluation feature to obtain a first evaluation result, including: acquiring a feature weight corresponding to each of the first translation score, the first confusion score and the first language model score to obtain three feature weights; and carrying out weighting processing on the first translation score, the first confusion score and the first language model score by using three feature weights to obtain a first evaluation result.
It should be noted that, in the embodiment of the present application, for each type of evaluation feature in the first evaluation features, a corresponding feature weight is set to characterize the importance degree of using the evaluation feature in the process of evaluating the speech recognition result to be tested. The speech recognition result detection means may multiply each of the first evaluation features by the corresponding feature weight to obtain three product results, and then calculate the sum of the three product results as the first evaluation result.
Specifically, in an embodiment of the present application, the first evaluation feature includes the first translation score, the first confusion score and the first language model score, and the speech recognition result detection device calculates the first evaluation result of the speech recognition result to be detected according to the following formula (1):
$$\mathrm{Score}(s) = \lambda_1 \times TM + \lambda_2 \times PP + \lambda_3 \times LM \tag{1}$$
where $\mathrm{Score}(s)$ is the first evaluation result, $TM$ is the translation score, $\lambda_1$ is the feature weight corresponding to the translation score, $PP$ is the confusion score, $\lambda_2$ is the feature weight corresponding to the confusion score, $LM$ is the language model score, and $\lambda_3$ is the feature weight corresponding to the language model score.
In the embodiment of the present application, besides obtaining the first evaluation result by the weighted summation described above, the speech recognition result detection device may also weight the first evaluation feature in other forms, for example by weighted averaging; the embodiment of the present application is not limited in this respect.
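As a worked illustration of formula (1) and of the weighted-average variant, the sketch below computes both for one set of scores. The function names and all numeric values are hypothetical and are not taken from the application.

```python
def weighted_sum(features, weights):
    """Formula (1): sum of each evaluation feature times its weight."""
    if len(features) != len(weights):
        raise ValueError("each feature needs exactly one weight")
    return sum(w * f for w, f in zip(weights, features))


def weighted_average(features, weights):
    """Alternative weighting: the weighted sum divided by the total weight."""
    total = sum(weights)
    if total == 0:
        raise ValueError("the weights must not sum to zero")
    return weighted_sum(features, weights) / total


# Hypothetical scores: translation (TM), confusion (PP), language model (LM).
tm, pp, lm = 0.8, 0.5, 0.6

# Hypothetical feature weights lambda_1, lambda_2, lambda_3.
first_evaluation_result = weighted_sum((tm, pp, lm), (0.5, 0.25, 0.25))

# The same relative weights, not normalised, through the averaging variant.
averaged_result = weighted_average((tm, pp, lm), (2.0, 1.0, 1.0))
```

With these values both variants give 0.675, because the second set of weights is the first one scaled by a constant and the average divides that constant out.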
S104, judging whether the first evaluation result meets a preset condition or not to obtain a first judgment result, and determining an error detection result of the speech recognition result to be detected according to the first judgment result.
In the embodiment of the application, after the voice recognition result detection device obtains the first evaluation result, it may be determined whether the first evaluation result meets a preset condition, so as to obtain a first determination result, and determine an error detection result of the to-be-detected voice recognition result according to the first determination result.
Specifically, in an embodiment of the present application, the determining, by the speech recognition result detecting device, an error detection result of the speech recognition result to be detected according to the first determination result includes: determining that an error detection result of the speech recognition result to be detected is an error when the first judgment result is that the first evaluation result does not meet the preset condition; and determining that the error detection result of the speech recognition result to be detected is error-free under the condition that the first judgment result is that the first evaluation result meets the preset condition.
Specifically, in an embodiment of the present application, the determining, by the speech recognition result detecting device, whether the first evaluation result satisfies a preset condition to obtain a first determination result includes: comparing the first evaluation result with an evaluation threshold; under the condition that the first evaluation result is smaller than the evaluation threshold, determining that the first judgment result is that the first evaluation result does not meet the preset condition; and under the condition that the first evaluation result is greater than or equal to the evaluation threshold, determining that the first judgment result is that the first evaluation result meets the preset condition.
It can be understood that, in the embodiment of the present application, the first evaluation result represents the overall score of the speech recognition result to be tested in various aspects such as fluency, wrongly written characters, and the like, and therefore, in a case where the first evaluation result is smaller than the evaluation threshold, it indicates that the speech recognition result to be tested has an error, and correspondingly, in a case where the first evaluation result is greater than or equal to the evaluation threshold, it indicates that the speech recognition result to be tested has no error. The specific evaluation threshold may be set according to actual requirements, and the embodiment of the present application is not limited.
It should be noted that, in the embodiment of the present application, the preset condition may be set in advance according to the actual error detection criterion, and the specific preset condition is not limited in the embodiment of the present application. For example, the preset condition may also be a preset interval: when the first evaluation result falls outside the preset interval, the error detection result of the speech recognition result to be detected is determined to be that an error exists, and when the first evaluation result falls within the preset interval, the error detection result is determined to be error-free.
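The two forms of the preset condition described above, a threshold and an interval, can be sketched as follows. The function names, the result strings and the numeric values are hypothetical, and the interval is assumed here to be closed at both ends.

```python
def meets_threshold(evaluation_result, threshold):
    """Threshold condition: met when the result is at least the threshold."""
    return evaluation_result >= threshold


def meets_interval(evaluation_result, lower, upper):
    """Interval condition: met when the result lies within [lower, upper]."""
    return lower <= evaluation_result <= upper


def error_detection_result(condition_met):
    """Map the judgment result to the error detection result."""
    return "no error" if condition_met else "error"


# Hypothetical first evaluation result and preset conditions.
first_evaluation_result = 0.675
by_threshold = error_detection_result(
    meets_threshold(first_evaluation_result, threshold=0.7))
by_interval = error_detection_result(
    meets_interval(first_evaluation_result, lower=0.6, upper=1.0))
```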
Fig. 2 is a schematic diagram of an exemplary error detection process of a speech recognition result according to an embodiment of the present application. As shown in fig. 2, the speech recognition result detection apparatus inputs the speech recognition result to be detected into the machine translation model, which translates the speech recognition result to be detected from the first language into the second language and outputs the translation result to be detected. The apparatus then obtains the translation score, the confusion score and the language model score, and performs weighted summation with the corresponding feature weights to obtain the first evaluation result. Finally, it compares the first evaluation result with the evaluation threshold and determines the error detection result of the speech recognition result to be detected.
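The flow of Fig. 2 can be sketched end to end as below. The machine translation model and the language model are replaced by stub functions that return fixed values, so every name and number here is a hypothetical placeholder and not a real model interface.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class Translation(NamedTuple):
    text: str
    translation_score: float
    confusion_score: float


def detect_error(asr_text: str,
                 translate: Callable[[str], Translation],
                 lm_score: Callable[[str], float],
                 weights: Sequence[float],
                 threshold: float) -> Tuple[float, bool]:
    """Return (first evaluation result, True if an error is detected)."""
    translation = translate(asr_text)                    # S101
    features = (translation.translation_score,           # S102
                translation.confusion_score,
                lm_score(asr_text))
    evaluation = sum(w * f for w, f in zip(weights, features))  # S103
    return evaluation, evaluation < threshold            # S104


# Stubs standing in for the real models (assumptions for illustration only).
def fake_translate(text: str) -> Translation:
    return Translation("What's the weather like today", 0.8, 0.5)


def fake_lm_score(text: str) -> float:
    return 0.6


evaluation, has_error = detect_error(
    "speech recognition result to be detected",
    fake_translate, fake_lm_score,
    weights=(0.5, 0.25, 0.25), threshold=0.6)
```

Passing the models in as callables keeps the sketch independent of any particular translation or language model library.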
It should be noted that, in the embodiment of the present application, the speech recognition result detecting device may actually determine the three feature weights in a certain manner before acquiring the three feature weights in step S103.
Fig. 3 is a schematic flowchart of determining three feature weights according to an embodiment of the present disclosure. As shown in fig. 3, the method mainly comprises the following steps:
S301, obtaining a sample voice recognition result and a preset detection result of the sample voice recognition result.
In an embodiment of the present application, the speech recognition result detection device may obtain the sample speech recognition result and a preset detection result of the sample speech recognition result first.
It should be noted that, in the embodiment of the present application, there may be multiple sample speech recognition results, for example hundreds of thousands. The preset detection result of a sample speech recognition result is the result of manually judging whether that sample speech recognition result contains an error.
Illustratively, in the embodiment of the present application, one sample speech recognition result is "new school in 2019 of first university", and the preset detection result is error-free. In addition, a sample speech recognition result is "popular with second university in popular with new-born study in 2019", and the preset detection result is that an error exists.
It should be noted that, in the embodiment of the present application, a large number of sample speech recognition results may be obtained such that the samples whose preset detection result is error-free and the samples whose preset detection result is erroneous appear in a certain proportion. The specific proportion is not limited in the embodiment of the present application.
It should be noted that, in the embodiment of the present application, the sample speech recognition result and the speech recognition result to be detected are in the same language, that is, both the sample speech recognition result and the speech recognition result to be detected are in the first language.
S302, translating the sample voice recognition result from the first language to the second language by using a machine translation model to obtain a sample translation result.
In an embodiment of the application, after obtaining the sample speech recognition result, the speech recognition result detecting device may translate the sample speech recognition result from the first language to the second language by using a machine translation model to obtain a sample translation result.
It should be noted that, in the embodiment of the present application, a process of translating the sample speech recognition result by the speech recognition result detecting device using the machine translation model is the same as that of translating the speech recognition result to be tested in the step S101, and details are not repeated here.
S303, determining a second translation score, a second confusion score and a second language model score based on the sample translation result and the sample voice recognition result.
In an embodiment of the application, the speech recognition result detecting device, after obtaining the sample translation result and the sample speech recognition result, further determines a second translation score, a second confusion score and a second language model score based on the sample translation result and the sample speech recognition result.
It should be noted that, in the embodiment of the present application, the voice recognition result detecting device determines the second translation score, the second confusion score and the second language model score based on the sample translation result and the sample voice recognition result, which is similar to the process of determining the first translation score, the first confusion score and the first language model score based on the translation result to be detected and the voice recognition result to be detected in step S102, and is not described herein again.
S304, acquiring a preset weight corresponding to each of the second translation score, the second confusion score and the second language model score to obtain three preset weights.
In an embodiment of the application, the voice recognition result detecting device may further obtain preset weights corresponding to each of the second translation score, the second confusion score, and the second language model score, so as to obtain three preset weights.
It should be noted that, in the embodiment of the present application, since the degree of importance of different types of features in performing error detection cannot be determined in the initial case, three preset weights may be set in advance, for example, each of the three preset weights may be set to 1. The specific three preset weights are not limited in the embodiments of the present application.
S305, determining three characteristic weights by using the second translation score, the second confusion score, the second language model score, the preset detection result and the three preset weights.
In an embodiment of the application, after obtaining the second translation score, the second confusion score, the second language model score, the preset detection result and the three preset weights, the voice recognition result detecting device may determine the three feature weights by using the second translation score, the second confusion score, the second language model score, the preset detection result and the three preset weights.
Specifically, in the embodiment of the present application, the voice recognition result detecting device determines three feature weights by using the second translation score, the second confusion score, the second language model score, the preset detection result, and three preset weights, including: weighting the second translation score, the second confusion score and the second language model score by using three preset weights to obtain a second evaluation result; judging whether the second evaluation result meets a preset condition or not to obtain a second judgment result, and determining an error detection result of the sample voice recognition result according to the second judgment result; and adjusting the three preset weights based on the error detection result and the preset detection result of the sample voice recognition result to obtain three characteristic weights.
It should be noted that, in the embodiment of the present application, the speech recognition result detecting apparatus performs weighting processing on the second evaluation feature by using three preset weights, which is similar to the process of performing weighting processing on the first evaluation feature by using three feature weights in step S103, and the difference is only in the specific values of the features and the weights, and details are not repeated here.
It should be noted that, in the embodiment of the present application, the voice recognition result detecting device determines whether the second evaluation result meets the preset condition, which is similar to the process of determining whether the first evaluation result meets the preset condition in step S104, and the difference is only that the determined objects are different, and details are not repeated here.
Specifically, in the embodiment of the present application, the voice recognition result detecting apparatus adjusts the three preset weights based on the error detection result and the preset detection result of the sample voice recognition result to obtain the three feature weights, including: adjusting the three preset weights according to a weight adjustment algorithm until the error detection result of the sample voice recognition result is the same as the preset detection result, thereby obtaining the three feature weights.
It can be understood that, in the embodiment of the present application, when the error detection result of the sample speech recognition result is the same as the preset detection result, the weight setting is appropriate, and therefore the three adjusted preset weights may be determined as the three feature weights.
It should be noted that, in the embodiment of the present application, the weight adjustment algorithm, such as the Minimum Error Rate Training (MERT) algorithm, may be preset according to actual requirements. The specific weight adjustment algorithm is not limited in the embodiment of the present application.
It can be understood that, in the embodiment of the present application, the speech recognition result detecting apparatus may determine the three feature weights by using a large number of sample speech recognition results. Therefore, during weight adjustment, once the error detection results of a sufficiently high proportion of the sample speech recognition results are the same as their preset detection results, the weights obtained at that point may be determined as the feature weights.
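The weight adjustment of steps S304 and S305 can be illustrated with a small Python sketch. It is not an implementation of MERT: a plain exhaustive search over a few candidate values stands in for the weight adjustment algorithm, which the application leaves open. The names `agreement` and `tune_weights`, the candidate grid, the use of an evaluation threshold as the preset condition, and the `target` agreement ratio are all assumptions for illustration.

```python
from itertools import product
from typing import Sequence, Tuple

Scores = Tuple[float, float, float]   # (translation, confusion, language model)
Sample = Tuple[Scores, bool]          # (second scores, preset "has error" label)


def agreement(weights: Sequence[float],
              samples: Sequence[Sample],
              threshold: float) -> float:
    """Fraction of samples whose detected result equals the preset result."""
    if not samples:
        raise ValueError("at least one sample is required")
    hits = 0
    for scores, labelled_error in samples:
        evaluation = sum(w * s for w, s in zip(weights, scores))
        predicted_error = evaluation < threshold
        hits += int(predicted_error == labelled_error)
    return hits / len(samples)


def tune_weights(samples: Sequence[Sample],
                 threshold: float,
                 preset: Scores = (1.0, 1.0, 1.0),
                 candidates: Sequence[float] = (0.0, 0.5, 1.0, 1.5, 2.0),
                 target: float = 1.0):
    """Start from the preset weights and search until `target` is reached."""
    best = tuple(preset)
    best_rate = agreement(best, samples, threshold)
    for trial in product(candidates, repeat=3):
        if best_rate >= target:
            break  # a high enough proportion of samples already agrees
        rate = agreement(trial, samples, threshold)
        if rate > best_rate:
            best, best_rate = trial, rate
    return best, best_rate
```

Setting `target` below 1.0 corresponds to accepting the weights once a sufficiently high proportion of the samples agrees with the manual labels, rather than requiring every sample to agree.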
It should be noted that, in the embodiment of the present application, the voice recognition result detection method may be applied to various application scenarios requiring voice recognition, so as to implement error detection of a voice recognition result.
Fig. 4 is a schematic system architecture diagram of an exemplary application of the speech recognition result detection method according to an embodiment of the present application. As shown in fig. 4, the system may include a client, a cloud, a voice processing server, and a display screen, where the voice recognition result detection method provided by the present application is integrated in the cloud.
In practical application, during a conference speech, the client collects the voice data of the speaker and sends the collected voice data to the voice processing server. The voice processing server recognizes the voice data to obtain the voice recognition result to be detected and sends it to the cloud. The cloud performs error detection on the voice recognition result to be detected according to the voice recognition result detection method and returns the error detection result to the voice processing server. If the error detection result indicates an error, the voice processing server may correct the voice recognition result to be detected in a certain manner according to the error detection result to obtain a correct voice recognition result. Finally, the correct voice recognition result is projected onto the display screen for display.
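The data flow among the client, the voice processing server, the cloud, and the display screen can be sketched as a single function that takes each role as a callable. The function name `process_utterance` and the four callable parameters are hypothetical stand-ins; how recognition, error detection, and correction are actually performed is not specified here.

```python
from typing import Callable


def process_utterance(audio: bytes,
                      recognize: Callable[[bytes], str],
                      detect_error: Callable[[str], bool],
                      correct: Callable[[str], str],
                      display: Callable[[str], None]) -> str:
    """Run one piece of collected voice data through the flow of fig. 4."""
    text = recognize(audio)       # voice processing server: speech recognition
    if detect_error(text):        # cloud: error detection on the result
        text = correct(text)      # server: correction "in a certain manner"
    display(text)                 # display screen: show the final text
    return text
```

Passing the roles in as callables keeps the sketch independent of any particular recognizer, network transport, or correction strategy.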
It should be noted that, in the embodiment of the present application, the voice recognition result detection method is implemented in a software manner, and may be integrated in the cloud and the mobile terminal.
The embodiment of the application provides a method for detecting a voice recognition result, which comprises the following steps: acquiring a speech recognition result to be detected, and translating the speech recognition result to be detected from a first language to a second language by using a machine translation model to obtain a translation result to be detected; determining a first evaluation feature based on the translation result to be detected and the speech recognition result to be detected, where the first evaluation feature is used for representing the recognition and translation effects of the speech recognition result to be detected; evaluating the speech recognition result to be detected based on the first evaluation feature to obtain a first evaluation result; and judging whether the first evaluation result meets a preset condition to obtain a first judgment result, and determining an error detection result of the speech recognition result to be detected according to the first judgment result. According to this technical scheme, the speech recognition result to be detected is translated into another language, and error detection is performed on the speech recognition result to be detected in combination with the translation result. Compared with error detection based only on features of a single language, more features are available for error detection, which improves its accuracy.
The embodiment of the application provides a voice recognition result detection device. Fig. 5 is a schematic structural diagram of a speech recognition result detection apparatus according to an embodiment of the present application. As shown in fig. 5, the voice recognition result detecting device includes:
the translation module 501 is configured to obtain a speech recognition result to be tested, and translate the speech recognition result to be tested from a first language to a second language by using a machine translation model to obtain a translation result to be tested;
a determining module 502 configured to determine a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested; the first evaluation feature is used for representing the recognition and translation effects of the voice recognition result to be detected;
the evaluation module 503 is configured to evaluate the to-be-tested speech recognition result based on the first evaluation feature to obtain a first evaluation result;
the judging module 504 is configured to judge whether the first evaluation result meets a preset condition to obtain a first judgment result, and determine an error detection result of the to-be-detected speech recognition result according to the first judgment result.
In an embodiment, the determining module 502 is configured to obtain a translation score and a confusion score of the machine translation model for the translation result to be tested, so as to obtain a first translation score and a first confusion score; input the speech recognition result to be detected into the language model corresponding to the first language to obtain a first language model score; and determine the first translation score, the first confusion score, and the first language model score as the first evaluation feature.
In an embodiment, the evaluation module 503 is configured to obtain a feature weight corresponding to each of the first translation score, the first confusion score, and the first language model score, so as to obtain three feature weights; and weighting the first translation score, the first confusion score and the first language model score by using the three feature weights to obtain the first evaluation result.
In an embodiment, the determining module 502 is configured to obtain a sample speech recognition result and a preset detection result of the sample speech recognition result; translating the sample speech recognition result from the first language to the second language by using the machine translation model to obtain a sample translation result; determining a second translation score, a second confusion score, and a second language model score based on the sample translation results and the sample speech recognition results; acquiring a preset weight corresponding to each of the second translation score, the second confusion score and the second language model score to obtain three preset weights; and determining the three feature weights by using the second translation score, the second confusion score, the second language model score, the preset detection result and the three preset weights.
In an embodiment, the determining module 502 is configured to perform weighting processing on the second translation score, the second confusion score, and the second language model score by using the three preset weights, so as to obtain a second evaluation result; judging whether the second evaluation result meets the preset condition or not to obtain a second judgment result, and determining an error detection result of the sample voice recognition result according to the second judgment result; and adjusting the three preset weights based on the error detection result of the sample voice recognition result and the preset detection result to obtain the three characteristic weights.
In an embodiment, the determining module 502 is configured to adjust the three preset weights according to a weight adjustment algorithm until an error detection result of the sample speech recognition result is the same as the preset detection result, so as to obtain the three feature weights.
In an embodiment, the judging module 504 is configured to determine that an error detection result of the to-be-detected speech recognition result is an error if the first judgment result is that the first evaluation result does not satisfy the preset condition; and determine that the error detection result of the speech recognition result to be detected is error-free if the first judgment result is that the first evaluation result satisfies the preset condition.
In one embodiment, the judging module 504 is configured to compare the first evaluation result with an evaluation threshold; determine that the first judgment result is that the first evaluation result does not meet the preset condition when the first evaluation result is smaller than the evaluation threshold; and determine that the first judgment result is that the first evaluation result meets the preset condition when the first evaluation result is greater than or equal to the evaluation threshold.
It should be noted that, in practical applications, the steps executed by the translation module 501, the determining module 502, the evaluation module 503, and the judging module 504 may be implemented by a processor.
It should be noted that, when the voice recognition result detection apparatus provided in the above embodiment performs error detection on a voice recognition result, the division into the program modules described above is only an example. In practical applications, the processing may be allocated to different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the voice recognition result detection apparatus provided in the above embodiment and the voice recognition result detection method embodiment belong to the same concept, and the specific implementation process is described in the method embodiment and is not repeated here.
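One possible way to realize the module division of fig. 5 in software is sketched below, with each module represented by a callable supplied from outside. The class name `SpeechRecognitionResultDetector`, its field names, and the return strings are hypothetical; as noted above, the processing may equally be divided among program modules in other ways.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Features = Tuple[float, float, float]


@dataclass
class SpeechRecognitionResultDetector:
    translate: Callable[[str], str]                      # role of translation module 501
    determine_features: Callable[[str, str], Features]   # role of determining module 502
    evaluate: Callable[[Features], float]                # role of evaluation module 503
    judge: Callable[[float], bool]                       # role of judging module 504; True = condition met

    def detect(self, recognition_result: str) -> str:
        """Chain the four modules and return the error detection result."""
        translation = self.translate(recognition_result)
        features = self.determine_features(translation, recognition_result)
        evaluation = self.evaluate(features)
        return "error-free" if self.judge(evaluation) else "error"
```

Because every module is injected, the same class can be exercised with simple stand-in functions before real translation and language models are connected.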
Based on the hardware implementation of the program module, and in order to implement the method of the application embodiment, the embodiment of the application further provides a device for detecting a voice recognition result. Fig. 6 is a schematic structural diagram of a speech recognition result detection apparatus according to an embodiment of the present application. As shown in fig. 6, the voice recognition result detecting device includes: a processor 601, memory 602, and a communication bus 603;
the communication bus 603 is configured to implement communication connection between the processor 601 and the memory 602;
the processor 601 is configured to execute the voice recognition result detection program stored in the memory 602 to implement the voice recognition result detection method.
The embodiment of the application provides a voice recognition result detection device, which acquires a voice recognition result to be detected, and translates the voice recognition result to be detected from a first language to a second language by using a machine translation model to obtain a translation result to be detected; determines a first evaluation feature based on the translation result to be detected and the voice recognition result to be detected, where the first evaluation feature is used for representing the recognition and translation effects of the voice recognition result to be detected; evaluates the voice recognition result to be detected based on the first evaluation feature to obtain a first evaluation result; and judges whether the first evaluation result meets a preset condition to obtain a first judgment result, and determines an error detection result of the voice recognition result to be detected according to the first judgment result. The voice recognition result detection device provided by the embodiment of the application translates the voice recognition result to be detected into another language and performs error detection on it in combination with the translation result. Compared with error detection based only on features of a single language, more features are available for error detection, which improves its accuracy.
Embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, the computer program, when executed by one or more processors, implementing the above voice recognition result detection method. The computer-readable storage medium may be a volatile memory, such as a Random-Access Memory (RAM); or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or may be a device that includes one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

  1. A voice recognition result detection method comprises the following steps:
    acquiring a speech recognition result to be detected, and translating the speech recognition result to be detected from a first language to a second language by using a machine translation model to obtain a translation result to be detected;
    determining a first evaluation characteristic based on the translation result to be tested and the voice recognition result to be tested; the first evaluation feature is used for representing the recognition and translation effects of the voice recognition result to be detected;
    evaluating the voice recognition result to be tested based on the first evaluation characteristic to obtain a first evaluation result;
    and judging whether the first evaluation result meets a preset condition or not to obtain a first judgment result, and determining an error detection result of the to-be-detected voice recognition result according to the first judgment result.
  2. The method of claim 1, wherein the determining a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested comprises:
    obtaining a translation score and a confusion score of the machine translation model on the translation result to be detected, and obtaining a first translation score and a first confusion score;
    inputting the speech recognition result to be tested into the language model corresponding to the first language to obtain a first language model score;
    determining the first translation score, the first confusion score, and the first language model score as the first evaluation feature.
  3. The method of claim 2, wherein the evaluating the speech recognition result to be tested based on the first evaluation feature to obtain a first evaluation result comprises:
    acquiring a feature weight corresponding to each of the first translation score, the first confusion score and the first language model score to obtain three feature weights;
    and weighting the first translation score, the first confusion score and the first language model score by using the three feature weights to obtain the first evaluation result.
  4. The method of claim 3, wherein before the acquiring a feature weight corresponding to each of the first translation score, the first confusion score, and the first language model score to obtain three feature weights, the method further comprises:
    acquiring a sample voice recognition result and a preset detection result of the sample voice recognition result;
    translating the sample speech recognition result from the first language to the second language by using the machine translation model to obtain a sample translation result;
    determining a second translation score, a second confusion score, and a second language model score based on the sample translation results and the sample speech recognition results;
    acquiring a preset weight corresponding to each of the second translation score, the second confusion score and the second language model score to obtain three preset weights;
    and determining the three feature weights by using the second translation score, the second confusion score, the second language model score, the preset detection result and the three preset weights.
  5. The method of claim 4, wherein said determining the three feature weights using the second translation score, the second confusion score, the second language model score, the preset detection results, and the three preset weights comprises:
    weighting the second translation score, the second confusion score and the second language model score by using the three preset weights to obtain a second evaluation result; the second evaluation feature is used for characterizing the recognition and translation effects of the sample voice recognition result;
    judging whether the second evaluation result meets the preset condition or not to obtain a second judgment result, and determining an error detection result of the sample voice recognition result according to the second judgment result;
    and adjusting the three preset weights based on the error detection result of the sample voice recognition result and the preset detection result to obtain the three characteristic weights.
  6. The method of claim 5, wherein the adjusting the three preset weights based on the error detection result of the sample speech recognition result and the preset detection result to obtain the three feature weights comprises:
    and adjusting the three preset weights according to a weight adjustment algorithm until the error detection result of the sample voice recognition result is the same as the preset detection result, thereby obtaining the three characteristic weights.
  7. The method according to any one of claims 1 to 6, wherein the determining the error detection result of the speech recognition result to be detected according to the first judgment result comprises:
    determining that an error detection result of the speech recognition result to be detected is an error when the first judgment result is that the first evaluation result does not meet the preset condition;
    and determining that the error detection result of the speech recognition result to be detected is error-free under the condition that the first judgment result is that the first evaluation result meets the preset condition.
  8. The method according to any one of claims 1 to 7, wherein the determining whether the first evaluation result satisfies a preset condition to obtain a first determination result comprises:
    comparing the first evaluation result with an evaluation threshold;
    determining that the first judgment result is that the first evaluation result does not meet the preset condition under the condition that the first evaluation result is smaller than the evaluation threshold;
    and determining that the first judgment result is that the first evaluation result meets the preset condition when the first evaluation result is greater than or equal to the evaluation threshold.
  9. A speech recognition result detection apparatus comprising:
    the translation module is configured to obtain a speech recognition result to be tested, and translate the speech recognition result to be tested from a first language to a second language by using a machine translation model to obtain a translation result to be tested;
    a determination module configured to determine a first evaluation feature based on the translation result to be tested and the speech recognition result to be tested; the first evaluation feature is used for representing the recognition and translation effects of the voice recognition result to be detected;
    the evaluation module is configured to evaluate the speech recognition result to be tested based on the first evaluation characteristic to obtain a first evaluation result;
    and the judging module is configured to judge whether the first evaluation result meets a preset condition to obtain a first judgment result, and determine an error detection result of the to-be-detected voice recognition result according to the first judgment result.
  10. A speech recognition result detection apparatus, the apparatus comprising a processor and a memory;
    the processor configured to execute the voice recognition result detection program stored in the memory to implement the voice recognition result detection method of any one of claims 1 to 8.
  11. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the speech recognition result detection method according to any one of claims 1 to 8.
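
The module chain of claim 9 and the threshold comparison of claim 8 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the names `detect`, `judge`, `Detection`, and all `toy_*` helpers are hypothetical, the toy translator is a dictionary lookup standing in for a machine translation model, the toy feature (the fraction of words that could be translated) stands in for the first evaluation feature, and the mapping "condition not met means an error is reported" is an assumption about how the error detection result follows from the first judgment result.

```python
from dataclasses import dataclass
from typing import Callable

UNKNOWN = "<unk>"  # hypothetical marker for a word the toy translator cannot translate


@dataclass
class Detection:
    evaluation: float      # first evaluation result
    meets_condition: bool  # first judgment result
    has_error: bool        # error detection result (assumed: error when condition not met)


def judge(evaluation: float, threshold: float) -> bool:
    """Claim 8: the preset condition is met when evaluation >= threshold."""
    return evaluation >= threshold


def detect(recognized: str,
           translate: Callable[[str], str],
           extract_feature: Callable[[str, str], float],
           evaluate: Callable[[float], float],
           threshold: float) -> Detection:
    """Chain the four modules of claim 9; every callable is a stand-in."""
    translation = translate(recognized)                 # translation module
    feature = extract_feature(translation, recognized)  # determination module
    evaluation = evaluate(feature)                      # evaluation module
    meets = judge(evaluation, threshold)                # judging module
    return Detection(evaluation, meets, has_error=not meets)


# Toy stand-ins, all hypothetical and not taken from the patent.
TOY_LEXICON = {"hello": "bonjour", "world": "monde"}


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words become the UNKNOWN marker.
    return " ".join(TOY_LEXICON.get(word, UNKNOWN) for word in text.split())


def toy_feature(translation: str, recognized: str) -> float:
    # Fraction of translated words; this toy ignores the recognized text itself.
    tokens = translation.split()
    if not tokens:
        return 0.0
    return sum(token != UNKNOWN for token in tokens) / len(tokens)


def toy_evaluate(feature: float) -> float:
    # Identity scoring: the feature is used directly as the evaluation result.
    return feature


if __name__ == "__main__":
    for text in ("hello world", "hello wrld"):
        print(text, detect(text, toy_translate, toy_feature, toy_evaluate, threshold=0.75))
```

Passing the modules in as callables keeps the sketch independent of any particular translation model or scoring function; only the comparison in `judge` is taken directly from claim 8.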
CN202080088999.3A 2020-01-10 2020-01-10 Voice recognition result detection method and device and storage medium Pending CN114846543A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/071389 WO2021138898A1 (en) 2020-01-10 2020-01-10 Speech recognition result detection method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
CN114846543A (en) 2022-08-02

Family

ID=76787656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080088999.3A Pending CN114846543A (en) 2020-01-10 2020-01-10 Voice recognition result detection method and device and storage medium

Country Status (2)

Country Link
CN (1) CN114846543A (en)
WO (1) WO2021138898A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113746989A (en) * 2021-08-23 2021-12-03 北京高阳捷迅信息技术有限公司 Method, device, equipment and storage medium for customer service intelligent quality inspection

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000222406A (en) * 1999-01-27 2000-08-11 Sony Corp Voice recognition and translation device and its method
CN104050160B (en) * 2014-03-12 2017-04-05 北京紫冬锐意语音科技有限公司 Interpretation method and apparatus combining machine translation and human translation
CN105336342B (en) * 2015-11-17 2019-05-28 科大讯飞股份有限公司 Speech recognition result evaluation method and system
CN107086040B (en) * 2017-06-23 2021-03-02 歌尔股份有限公司 Voice recognition capability test method and device
CN107544726B (en) * 2017-07-04 2021-04-16 百度在线网络技术(北京)有限公司 Speech recognition result error correction method and device based on artificial intelligence and storage medium
CN110211571B (en) * 2019-04-26 2023-05-26 平安科技(深圳)有限公司 Sentence fault detection method, sentence fault detection device and computer readable storage medium
CN110556127B (en) * 2019-09-24 2021-01-01 北京声智科技有限公司 Method, device, equipment and medium for detecting voice recognition result

Also Published As

Publication number Publication date
WO2021138898A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN108091328B (en) Speech recognition error correction method and device based on artificial intelligence and readable medium
US11093813B2 (en) Answer to question neural networks
US20210027788A1 (en) Conversation interaction method, apparatus and computer readable storage medium
US11775777B2 (en) Artificial intelligence system using phrase tables to evaluate and improve neural network based machine translation
CN110415679B (en) Voice error correction method, device, equipment and storage medium
JP5901001B1 (en) Method and device for acoustic language model training
CN106875949B (en) Correction method and device for voice recognition
US20140350934A1 (en) Systems and Methods for Voice Identification
CN106503231B (en) Search method and device based on artificial intelligence
US11270686B2 (en) Deep language and acoustic modeling convergence and cross training
US20180277145A1 (en) Information processing apparatus for executing emotion recognition
WO2018153316A1 (en) Method and apparatus for obtaining text extraction model
CN109102824B (en) Voice error correction method and device based on man-machine interaction
WO2020252935A1 (en) Voiceprint verification method, apparatus and device, and storage medium
CN109947651B (en) Artificial intelligence engine optimization method and device
WO2023045184A1 (en) Text category recognition method and apparatus, computer device, and medium
CN111583906A (en) Role recognition method, device and terminal for voice conversation
CN111326177B (en) Voice evaluation method, electronic equipment and computer readable storage medium
KR20180039371A (en) Automatic Interpretation System
CN111354354B (en) Training method, training device and terminal equipment based on semantic recognition
KR101374900B1 (en) Apparatus for grammatical error correction and method for grammatical error correction using the same
CN114846543A (en) Voice recognition result detection method and device and storage medium
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN110335628B (en) Voice test method and device of intelligent equipment and electronic equipment
CN114783405B (en) Speech synthesis method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination