CN110085261B - Pronunciation correction method, device, equipment and computer readable storage medium - Google Patents

Publication number
CN110085261B
Authority
CN
China
Prior art keywords
phoneme
pronunciation
confusion
audio data
actual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910409877.9A
Other languages
Chinese (zh)
Other versions
CN110085261A (en
Inventor
刘晨晨
沈欣尧
张蕾
杨晓飞
蒋成林
张潇君
周达
李坤
马义飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Liulishuo Information Technology Co ltd
Original Assignee
Shanghai Liulishuo Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Liulishuo Information Technology Co ltd
Priority to CN201910409877.9A
Publication of CN110085261A
Application granted
Publication of CN110085261B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00: Teaching not covered by other main groups of this subclass
    • G09B19/04: Speaking
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025: Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a pronunciation correction method comprising: acquiring audio data entered for a predetermined text; comparing the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced; and, when a pronunciation error is detected, determining the actual confusion phoneme corresponding to the mispronounced target phoneme. The method not only detects that the user's pronunciation contains errors but also identifies the confusion phoneme that the user actually produced in place of the target phoneme. This addresses the problem that users cannot recognize their own pronunciation errors, lets them correct existing pronunciation problems in a targeted way, and improves learning efficiency. The application also provides a pronunciation correction device, equipment, and a computer-readable storage medium with the same technical effects.

Description

Pronunciation correction method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of speech technology, and in particular, to a pronunciation correction method, apparatus, device, and computer-readable storage medium.
Background
With the development of science and technology, internet-based language learning applications have developed rapidly. In some language learning applications, the application provider sends learning materials to a client over the internet, and the user obtains the materials through the client and studies them. In language learning, pronunciation is one of the most important skills alongside grammar and vocabulary. Users can generally improve their pronunciation by reading aloud, reciting, and similar practice. In most cases, however, the user cannot tell whether the pronunciation is accurate.
The conventional approach scores the learner's speech against a standard pronunciation, and the learner corrects the pronunciation by imitating the standard. However, users sometimes cannot clearly hear the difference between their own pronunciation and the standard, and do not know how to adjust their articulation to produce the target sound. The conventional approach therefore cannot help users correct pronunciation problems in a targeted and efficient manner.
Disclosure of Invention
The invention aims to provide a pronunciation correction method, a pronunciation correction device, pronunciation correction equipment and a computer readable storage medium, which aim to solve the problem that the existing method cannot carry out pronunciation correction in a targeted and efficient manner.
In order to solve the above technical problem, the present invention provides a pronunciation correction method, including:
acquiring audio data entered for a predetermined text;
comparing the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced;
and when a pronunciation error is detected, determining the actual confusion phoneme corresponding to the mispronounced target phoneme.
Optionally, after determining the actual confusing phoneme corresponding to the target phoneme with the current pronunciation error when the pronunciation error is detected, the method further includes:
selecting a plurality of words respectively comprising the target phoneme and the actual confusion phoneme;
playing demonstration recordings of the words, and/or randomly playing the audio of any one of the words so that the learner selects the word that was played.
Optionally, after determining the actual confusing phoneme corresponding to the target phoneme with the current pronunciation error when the pronunciation error is detected, the method further includes:
and prompting the difference between the target phoneme and the actual confusion phoneme through voice and/or text, so as to assist the user in adjusting the pronunciation.
Optionally, after determining the actual confusing phoneme corresponding to the target phoneme with the current pronunciation error when the pronunciation error is detected, the method further includes:
selecting a plurality of words containing the target phoneme for the user to practice sound correction;
acquiring practice audio data entered by the user for the words;
analyzing the practice audio data to obtain practice evaluation indication information, wherein the practice evaluation indication information indicates which of the actual confusion phoneme and the target phoneme the practice audio data is closer to;
and feeding back the practice evaluation indication information through a display interface.
Optionally, comparing the audio data with the standard pronunciation model of the predetermined text to detect whether any phoneme is mispronounced includes:
comparing the audio data with the standard pronunciation model of the predetermined text using an acoustic model to obtain a likelihood score and time boundaries for each phoneme;
and performing edit-distance alignment to determine whether any phoneme is mispronounced.
Optionally, determining the actual confusion phoneme corresponding to the mispronounced target phoneme when a pronunciation error is detected includes:
pre-constructing a confusion phoneme set for each phoneme;
when a pronunciation error is detected, applying a Bayesian decision at each inconsistent position to estimate the probability that the current pronunciation is each confusion phoneme in the confusion phoneme set;
and determining the confusion phoneme with the highest probability as the actual confusion phoneme.
Optionally, after determining the actual confusion phoneme corresponding to the mispronounced target phoneme when a pronunciation error is detected, the method further includes:
and adjusting the likelihood score of the corresponding phoneme according to the determined actual confusion phoneme.
The present application also provides a pronunciation correction device, including:
an acquisition module, configured to acquire audio data entered for a predetermined text;
a detection module, configured to compare the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced;
and a determining module, configured to determine, when a pronunciation error is detected, the actual confusion phoneme corresponding to the mispronounced target phoneme.
The application also provides pronunciation correction equipment applied to a server side, the equipment including:
a memory for storing a computer program;
a processor for implementing the following steps when executing the computer program: acquiring audio data entered for a predetermined text; comparing the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced; and, when a pronunciation error is detected, determining the actual confusion phoneme corresponding to the mispronounced target phoneme.
The application also provides pronunciation correction equipment applied to a client, the equipment including:
an audio acquisition device for recording audio data entered for a predetermined text;
a communication device for sending the audio data to a server, so that the server compares the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme is mispronounced and, when a pronunciation error is detected, determines the actual confusion phoneme corresponding to the mispronounced target phoneme; and for receiving the actual confusion phoneme returned by the server;
a display device for displaying the actual confusion phoneme on a display interface.
The present application further provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of any of the pronunciation correction methods described above.
The pronunciation correction method provided by the invention comprises: acquiring audio data entered for a predetermined text; comparing the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced; and, when a pronunciation error is detected, determining the actual confusion phoneme corresponding to the mispronounced target phoneme. The method not only detects that the user's pronunciation contains errors but also identifies the confusion phoneme that the user actually produced in place of the target phoneme. This addresses the problem that users cannot recognize their own pronunciation errors, lets them correct existing pronunciation problems in a targeted way, and improves learning efficiency. The application also provides a pronunciation correction device, equipment, and a computer-readable storage medium with the same technical effects.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of one embodiment of the pronunciation correction method provided herein;
FIG. 2 is a flowchart of the process of detecting whether a phoneme is mispronounced;
FIG. 3 is a flowchart of the process of determining the actual confusion phoneme corresponding to the mispronounced target phoneme;
FIG. 4 is a flowchart of another embodiment of the pronunciation correction method provided herein;
FIG. 5 is a schematic diagram of a first targeted sound correction exercise mode;
FIG. 6 is a schematic diagram of a second targeted sound correction exercise mode;
FIG. 7 is a schematic diagram of a third targeted sound correction exercise mode;
FIG. 8 is a block diagram of a pronunciation correction device according to an embodiment of the present invention;
FIG. 9 is a block diagram of pronunciation correction equipment applied to a server according to an embodiment of the present invention;
FIG. 10 is a block diagram of pronunciation correction equipment applied to a client according to an embodiment of the present invention;
FIG. 11 is a block diagram of a pronunciation correction system according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
The embodiments of the invention can be used in pronunciation learning scenarios, in particular pronunciation learning or pronunciation correction in language learning. The languages include but are not limited to foreign languages such as English, French, German and Japanese, and varieties of Chinese such as Mandarin, Cantonese and Sichuanese. The language learning scenario may be, for example, a pronunciation evaluation or pronunciation correction scenario in language learning software or a language learning terminal, or another language learning scenario; the embodiments of the invention are not limited in this respect.
The application scenario of the embodiments is explained in detail below. A user may learn pronunciation through a client. The client may show the content to be learned on a display interface and may play audio content to the user through an audio playback device such as a speaker. While the user practices pronunciation, the client may collect the user's audio data through an audio acquisition device for subsequent processing. The pronunciation correction operation may be performed by either the client or the server; this does not affect the implementation of the present application.
The client in the embodiment of the present invention may include, but is not limited to: smart phones, tablet computers, MP4, MP3, PCs, PDAs, wearable devices, head-mounted display devices, and the like; the server may include, but is not limited to: a single web server, a server group of multiple web servers, or a cloud based on cloud computing consisting of a large number of computers or web servers.
With reference to the above application scenarios, a flowchart of a specific embodiment of the pronunciation correction method provided in the present application is shown in fig. 1, and the method includes:
Step S101: acquiring audio data entered for a predetermined text;
The predetermined text comprises one or more sentences, each comprising one or more words. The user reads the predetermined text aloud; the client's audio acquisition device records the speech and produces the corresponding audio data.
Step S102: comparing the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced;
This step may be executed by the client or by a background server; either choice does not affect the implementation of the present application.
As shown in fig. 2, the process of detecting whether there is a pronunciation error in the phoneme in this step specifically includes:
Step S1021: comparing the audio data with the standard pronunciation model of the predetermined text using an acoustic model to obtain a likelihood score and time boundaries for each phoneme;
Specifically, a CE model may be used to analyze the audio data, the Viterbi algorithm to compute the time boundaries of each phoneme, and the GOP (Goodness of Pronunciation) algorithm to compute the score of each phoneme.
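The patent does not give a formula for the GOP score, so the following is only a minimal sketch of one common posterior-based variant: for each aligned segment, average the target phoneme's log posterior minus the best competing log posterior. The function name `gop_scores`, the frame data, and the segment boundaries are hypothetical; in practice the posteriors would come from the acoustic model and the boundaries from the Viterbi alignment.

```python
import math


def gop_scores(frame_log_posteriors, segments):
    """Return (phoneme, score) per aligned segment.

    frame_log_posteriors: one dict per frame, mapping phoneme -> log posterior.
    segments: (phoneme, start_frame, end_frame) tuples, end exclusive,
              e.g. taken from a forced alignment (assumed input).
    """
    scores = []
    for phoneme, start, end in segments:
        frames = frame_log_posteriors[start:end]
        # Per frame: target log posterior minus the best log posterior.
        # The result is 0 when the target is the top phoneme, negative otherwise.
        total = sum(f[phoneme] - max(f.values()) for f in frames)
        scores.append((phoneme, total / len(frames)))
    return scores


# Hypothetical posteriors for the first two phonemes of "bit".
frames = [
    {"b": math.log(0.9), "p": math.log(0.1)},
    {"b": math.log(0.8), "p": math.log(0.2)},
    {"I": math.log(0.3), "i:": math.log(0.7)},
    {"I": math.log(0.4), "i:": math.log(0.6)},
]
segments = [("b", 0, 2), ("I", 2, 4)]
scores = gop_scores(frames, segments)
```

A score near zero suggests the target phoneme was the most likely one throughout its segment; a clearly negative score suggests a competitor was more likely.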
Step S1022: performing edit-distance alignment to determine whether any phoneme is mispronounced.
Once the time boundaries of each phoneme are obtained, the phoneme sequence that the user actually pronounced can be derived. This sequence is compared with the phoneme sequence of the standard pronunciation model to check whether the phonemes are consistent. If any phoneme is inconsistent, that phoneme is determined to be mispronounced; if the two sequences are completely consistent, no phoneme is determined to be mispronounced.
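The alignment step can be sketched with a standard Levenshtein (edit-distance) alignment between the reference phoneme sequence and the sequence the user produced. This is an illustrative sketch, not the patent's implementation; the function name `align_phonemes` and the ASCII phoneme symbols are assumptions.

```python
def align_phonemes(reference, hypothesis):
    """Align two phoneme sequences by edit distance.

    Returns a list of (kind, ref_phoneme, hyp_phoneme) tuples where kind is
    "match", "sub", "del" (missing in hypothesis) or "ins" (extra in hypothesis).
    """
    n, m = len(reference), len(hypothesis)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Trace back from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                kind = "match" if cost == 0 else "sub"
                ops.append((kind, reference[i - 1], hypothesis[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", reference[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hypothesis[j - 1]))
            j -= 1
    ops.reverse()
    return ops


ops = align_phonemes(["b", "I", "t"], ["b", "i:", "t"])
errors = [op for op in ops if op[0] != "match"]
```

Every non-match operation marks an inconsistent position, which is where the confusion phoneme decision of the next step would be applied.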
Step S103: and when the pronunciation error is detected, determining the actual confusion phoneme corresponding to the target phoneme with the current pronunciation error.
When a pronunciation error is detected, this step further determines the type of error, that is, which confusion phoneme the user actually produced in place of the target phoneme.
As shown in fig. 3, the process of determining the actual confusing phoneme corresponding to the current mispronunciation target phoneme in this step specifically includes:
Step S1031: pre-constructing a confusion phoneme set for each phoneme;
For each phoneme, all phonemes that are easily confused with it are collected and used as the elements of its confusion phoneme set.
Step S1032: when a pronunciation error is detected, applying a Bayesian decision at each inconsistent position to estimate the probability that the current pronunciation is each confusion phoneme in the confusion phoneme set;
Step S1033: determining the confusion phoneme with the highest probability as the actual confusion phoneme.
That is, once the preceding steps have detected a pronunciation error, a Bayesian decision is applied at each inconsistent position to estimate the probability that the current pronunciation is each confusion phoneme in the set, and the most probable one is taken as the actual confusion phoneme.
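Steps S1031 to S1033 can be sketched as a lookup in a pre-built confusion set followed by an argmax over the candidates' probabilities. The table contents, the function name `actual_confusion_phoneme`, and the probability values below are hypothetical and serve only to illustrate the selection.

```python
# Hypothetical confusion sets; a real table would be curated or learned.
CONFUSION_SETS = {
    "I": ["i:", "e"],
    "s": ["S", "T"],
}


def actual_confusion_phoneme(target, posteriors, confusion_sets=CONFUSION_SETS):
    """Pick the most probable confusion phoneme for a mispronounced target.

    posteriors: dict mapping phoneme -> probability for the segment in question.
    Returns None when no confusion set is known for the target.
    """
    candidates = confusion_sets.get(target, [])
    if not candidates:
        return None
    # Candidates missing from the posteriors are treated as probability 0.
    return max(candidates, key=lambda h: posteriors.get(h, 0.0))


result = actual_confusion_phoneme("I", {"I": 0.2, "i:": 0.7, "e": 0.1})
```

Restricting the search to the confusion set keeps the decision small and lets the feedback name a specific, teachable contrast.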
Suppose there is an audio segment $o$ and it must be decided whether it belongs to phoneme $h_i$ or $h_j$. Ordinarily it suffices to compare the probabilities $p(h_i \mid o)$ and $p(h_j \mid o)$ and take the phoneme with the larger one as the recognition result. In the present embodiment a factor $\alpha_{ij}$ is added: the decision compares $p(h_i \mid o)$ with $\alpha_{ij}\,p(h_j \mid o)$, where $\alpha_{ij}$ serves as a prior probability. This is why the procedure is referred to as a Bayesian decision.
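One way to write the two decisions described in the preceding paragraph is given below. This is an interpretation rather than the patent's own equation: it assumes that the weighted term is the posterior of the competing phoneme $h_j$, and the tie-breaking toward $h_i$ is likewise an assumption.

```latex
\hat{h}_{\text{plain}} =
\begin{cases}
h_i, & p(h_i \mid o) \ge p(h_j \mid o) \\
h_j, & \text{otherwise}
\end{cases}
\qquad
\hat{h}_{\alpha} =
\begin{cases}
h_i, & p(h_i \mid o) \ge \alpha_{ij}\, p(h_j \mid o) \\
h_j, & \text{otherwise}
\end{cases}
```

Under this reading, $\alpha_{ij} = 1$ recovers the plain comparison, while $\alpha_{ij} > 1$ makes the decision favor $h_j$ and $\alpha_{ij} < 1$ makes it favor $h_i$.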
A confusion phoneme table (minimal pairs) is established for each phoneme. The table may be compiled in advance from teaching and research experience, or it may be learned by a neural network from a large amount of user pronunciation data. For any phoneme, the optimal threshold for each of its confusion phonemes can be searched using the confusion phoneme table and a development set; this optimal threshold is the prior factor that the phoneme recognition network uses when recognizing the phoneme. See the following equation:
Figure BDA0002062477530000081
where $h$ denotes a phoneme and $o$ denotes the acoustic signal. Of $p(h_i \mid o)$ and $p(h_j \mid o)$, the phoneme $h_i$ or $h_j$ with the larger conditional probability is the phoneme to which the audio $o$ finally corresponds. With the prior factor $\alpha_{ij}$ tuned on the development set, the detection result is computed by comparing $p(h_i \mid o)$ with $\alpha_{ij}\,p(h_j \mid o)$, so $\alpha_{ij}$ acts as a prior probability. This makes the phoneme recognition network of the embodiment more flexible, and more accurate confusion phoneme recognition can be achieved without increasing the data volume of the word dictionary relative to the prior art.
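The search for the prior factor on a development set can be illustrated with a simple grid search. The sketch below assumes the decision "target if p_target >= alpha * p_confusion", a small labeled development set, and accuracy as the selection criterion; none of these details are specified in the source, and the names `decide` and `search_alpha` are hypothetical.

```python
def decide(p_target, p_confusion, alpha):
    """Return True when the segment is judged to be the target phoneme."""
    return p_target >= alpha * p_confusion


def search_alpha(dev_examples, candidates):
    """Pick the candidate alpha with the highest accuracy on the dev set.

    dev_examples: (p_target, p_confusion, is_target) tuples, where is_target
                  is the human label for the segment (assumed available).
    Returns (best_alpha, accuracy); the first candidate wins on ties.
    """
    best_alpha, best_correct = None, -1
    for alpha in candidates:
        correct = sum(
            decide(p_t, p_c, alpha) == is_target
            for p_t, p_c, is_target in dev_examples
        )
        if correct > best_correct:
            best_alpha, best_correct = alpha, correct
    return best_alpha, best_correct / len(dev_examples)


# Hypothetical development data for one (target, confusion) pair.
dev = [
    (0.60, 0.40, True),
    (0.55, 0.45, False),
    (0.70, 0.30, True),
    (0.52, 0.48, False),
    (0.45, 0.55, False),
]
alpha, accuracy = search_alpha(dev, [0.5, 1.0, 1.3, 2.0])
```

In this toy data the plain comparison (alpha of 1.0) mislabels two borderline segments, while a larger factor separates them, which is the kind of flexibility the prior factor is meant to provide.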
Further, after the actual confusion phoneme corresponding to the mispronounced target phoneme is determined upon detection of a pronunciation error, the method further includes adjusting the score of the corresponding phoneme according to the determined actual confusion phoneme. Because phoneme scoring and phoneme error detection use different models, their results may disagree. For example, when correcting and scoring the word bit [bIt], the correction model may detect that the user pronounced [I] as [I:], while the CE model still gives [I] a high score. In that case the score can be adjusted appropriately according to the error detection result.
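The source does not say how the score is adjusted. As one possible sketch, the score of a phoneme flagged by the error detector could be capped so that it never reads as a good score; the 0 to 100 scale, the cap of 60, and the function name `adjust_score` are all assumptions made for illustration.

```python
def adjust_score(score, error_detected, cap=60.0):
    """Cap a phoneme score when the error detector flagged the phoneme.

    score: phoneme score on an assumed 0-100 scale.
    cap: hypothetical ceiling applied to flagged phonemes.
    """
    return min(score, cap) if error_detected else score


adjusted = adjust_score(85.0, error_detected=True)
```

A cap keeps already-low scores untouched and only reconciles the case where the scoring model and the error detector disagree.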
The pronunciation correction method provided by the invention comprises: acquiring audio data entered for a predetermined text; comparing the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced; and, when a pronunciation error is detected, determining the actual confusion phoneme corresponding to the mispronounced target phoneme. The method not only detects that the user's pronunciation contains errors but also identifies the confusion phoneme that the user actually produced in place of the target phoneme. This addresses the problem that users cannot recognize their own pronunciation errors, lets them correct existing pronunciation problems in a targeted way, and improves learning efficiency.
Referring to fig. 4, on the basis of the foregoing embodiment, after a pronunciation error is determined, the pronunciation correction method provided in the present application may further include step S104: performing targeted pronunciation exercises on the target phoneme and the confusion phoneme, so as to help the user correct the pronunciation problem.
One specific implementation may be: selecting a plurality of words respectively containing the target phoneme and the actual confusion phoneme; and playing demonstration recordings of the words, and/or randomly playing the audio of any one of the words so that the learner selects the word that was played.
For example, two words that contain the actual confusion phoneme and the target phoneme respectively and are otherwise pronounced identically (e.g., beat /bi:t/ and bit /bIt/) can be selected, and one of them played at random for the learner to identify. If the learner selects the wrong word, the demonstration recordings of the two words are played alternately, so that the user can compare them side by side within a short time and hear the difference between the two sounds. This exercise trains the user to distinguish the target phoneme from the actual confusion phoneme by ear.
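Selecting such word pairs can be sketched as a search over a pronunciation lexicon for words whose phoneme sequences differ in exactly one position, with the target phoneme on one side and the confusion phoneme on the other. The tiny lexicon, the ASCII phoneme symbols, and the function names `minimal_pairs` and `make_quiz` are hypothetical.

```python
import random

# Hypothetical mini lexicon: word -> phoneme sequence.
LEXICON = {
    "beat": ["b", "i:", "t"],
    "bit": ["b", "I", "t"],
    "sheep": ["S", "i:", "p"],
    "ship": ["S", "I", "p"],
    "bat": ["b", "ae", "t"],
}


def minimal_pairs(target, confusion, lexicon=LEXICON):
    """Return (target_word, confusion_word) pairs that differ only in
    having `target` where the other word has `confusion`."""
    pairs = []
    for word_a, phones_a in lexicon.items():
        for word_b, phones_b in lexicon.items():
            if len(phones_a) != len(phones_b):
                continue
            diffs = [(a, b) for a, b in zip(phones_a, phones_b) if a != b]
            if diffs == [(target, confusion)]:
                pairs.append((word_a, word_b))
    return pairs


def make_quiz(pair, rng=random):
    """Choose which word of the pair to play; the learner must identify it."""
    return {"options": list(pair), "played": rng.choice(pair)}


pairs = minimal_pairs("I", "i:")
quiz = make_quiz(pairs[0], random.Random(0))
```

If the learner's answer differs from `quiz["played"]`, the application would then alternate the two demonstration recordings as described above.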
Another specific embodiment may be: prompting the difference between the target phoneme and the actual confusion phoneme through voice and/or text, so as to assist the user in adjusting the pronunciation.
The difference between the two sounds is explained directly in voice and text, together with how to adjust the user's current, habitually incorrect articulation. Through this exercise, the user learns how the pronunciation of the target phoneme differs from that of the actual confusion phoneme.
Yet another embodiment may be: selecting a plurality of words containing the target phoneme for the user to practice sound correction; acquiring practice audio data entered by the user for the words; analyzing the practice audio data to obtain practice evaluation indication information, wherein the practice evaluation indication information indicates which of the actual confusion phoneme and the target phoneme the practice audio data is closer to; and feeding back the practice evaluation indication information through a display interface.
The learner applies the sound correction techniques just learned and reads aloud words containing the target phoneme. The feedback does not simply show a score against the target sound; it shows whether the user's pronunciation is closer to the confusion sound or to the target sound. Adult learners have established pronunciation habits that are difficult to correct quickly, so a way of making their relatively limited progress visible is needed. By showing which sound the pronunciation leans toward, the feedback lets users keep trying the correction techniques and confirm from the score deviation whether they are adjusting in the right direction and whether the pronunciation has reached the target. With this exercise, users can more easily perceive their incremental progress, and the feedback on which sound the pronunciation leans toward helps learners judge whether their correction practice and direction are right, thereby helping them recognize and correct their pronunciation.
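The "which sound is it closer to" feedback can be sketched as a normalized position between the confusion sound and the target sound, rendered as a simple text bar. How the source actually computes the indicator is not stated; the ratio below, the bar layout, and the function names `lean_toward_target` and `render_bar` are assumptions.

```python
def lean_toward_target(p_target, p_confusion):
    """Return a position in [0, 1]: 0 means the pronunciation sounds like the
    confusion phoneme, 1 means it sounds like the target phoneme."""
    total = p_target + p_confusion
    if total <= 0:
        return 0.5  # no evidence either way
    return p_target / total


def render_bar(position, width=20, left="confusion", right="target"):
    """Draw a text progress bar with a marker at the given position."""
    marker = min(width - 1, int(position * width))
    cells = ["-"] * width
    cells[marker] = "|"
    return f"{left} [{''.join(cells)}] {right}"


position = lean_toward_target(0.3, 0.1)
bar = render_bar(0.75)
```

Showing a position rather than a pass or fail verdict lets a learner see small movements toward the target sound between attempts.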
Figs. 5 to 7 are schematic diagrams of specific sound correction exercise modes. The first diagram shows words containing the target phoneme and the actual confusion phoneme. The second diagram shows prompt information on the pronunciation difference between the target phoneme and the actual confusion phoneme, including mouth-shape comparison pictures and a text description of the difference. The third diagram shows a specific practice word, "sheet", and uses a progress bar on the interface to show whether the user's current pronunciation is closer to the target phoneme or to the confusion phoneme.
In the following, the pronunciation correction device provided by the embodiment of the present invention is introduced, and the pronunciation correction device described below and the pronunciation correction method described above may be referred to correspondingly.
Fig. 8 is a block diagram of a pronunciation correction apparatus according to an embodiment of the present invention, where the pronunciation correction apparatus according to fig. 8 may include:
an obtaining module 100, configured to acquire audio data entered for a predetermined text;
a detection module 200, configured to compare the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced;
a determining module 300, configured to determine, when a pronunciation error is detected, the actual confusion phoneme corresponding to the mispronounced target phoneme.
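The three modules can be pictured as three pluggable functions chained in order. The sketch below is only a structural illustration: the class name `PronunciationCorrector`, the callable signatures, and the stub implementations are hypothetical and stand in for real audio capture and acoustic models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class PronunciationCorrector:
    # Obtaining module: text -> recorded audio bytes.
    obtain: Callable[[str], bytes]
    # Detection module: (audio, text) -> mispronounced target phonemes.
    detect: Callable[[bytes, str], Sequence[str]]
    # Determining module: (audio, target phoneme) -> actual confusion phoneme.
    determine: Callable[[bytes, str], Optional[str]]

    def run(self, text: str) -> Dict[str, Optional[str]]:
        """Map each mispronounced target phoneme to its confusion phoneme."""
        audio = self.obtain(text)
        return {
            target: self.determine(audio, target)
            for target in self.detect(audio, text)
        }


# Stub modules that pretend the user said "beat" for "bit".
corrector = PronunciationCorrector(
    obtain=lambda text: b"",
    detect=lambda audio, text: ["I"],
    determine=lambda audio, target: "i:",
)
report = corrector.run("bit")
```

Keeping the modules as separate callables mirrors the description's point that the same steps may run on either the client or the server.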
On the basis of any of the above embodiments, the pronunciation correction device provided by the present application may further include:
a first feedback module, configured to: after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, select a plurality of words that respectively contain the target phoneme and the actual confusion phoneme; and play a demonstration recording of each word, and/or randomly play the audio of any one of the words so that the user selects the corresponding word, for learning.
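One way to pick words for this listening exercise is to look for pairs that differ only in the target phoneme versus the actual confusion phoneme. The sketch below assumes such minimal pairs are wanted, which is narrower than what the text requires; the lexicon, its phoneme symbols and the function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical pronunciation lexicon: word -> phoneme sequence (illustrative symbols).
LEXICON: Dict[str, List[str]] = {
    "sheep": ["SH", "IY", "P"],
    "ship": ["SH", "IH", "P"],
    "seat": ["S", "IY", "T"],
    "sit": ["S", "IH", "T"],
    "dog": ["D", "AO", "G"],
}


def minimal_pairs(lexicon: Dict[str, List[str]], target: str,
                  confusion: str) -> List[Tuple[str, str]]:
    """Return (word_with_target, word_with_confusion) pairs that differ in one phoneme."""
    pairs = []
    for w1, p1 in lexicon.items():
        for w2, p2 in lexicon.items():
            if len(p1) != len(p2):
                continue
            diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
            if diffs == [(target, confusion)]:
                pairs.append((w1, w2))
    return pairs


def pick_quiz_item(pair: Tuple[str, str],
                   rng: random.Random) -> Tuple[str, Tuple[str, str]]:
    """Randomly choose which word's audio to play; the user then picks from the pair."""
    return rng.choice(pair), pair
```

For example, `minimal_pairs(LEXICON, "IY", "IH")` yields the pairs sheep/ship and seat/sit, and `pick_quiz_item` decides which member of a pair is played.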
On the basis of any of the above embodiments, the pronunciation correction device provided by the present application may further include:
a second feedback module, configured to: after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, prompt the difference between the target phoneme and the actual confusion phoneme by voice and/or text, so as to assist the user in adjusting the pronunciation.
On the basis of any of the above embodiments, the pronunciation correction device provided by the present application may further include:
a third feedback module, configured to: after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, select a plurality of words containing the target phoneme for the user to carry out sound-correction practice; acquire practice audio data entered by the user for the words; analyze the practice audio data to obtain practice evaluation indication information, wherein the practice evaluation indication information indicates whether the practice audio data is closer to the actual confusion phoneme or to the target phoneme; and feed back the practice evaluation indication information through a display interface.
As a specific implementation manner, in this embodiment, the detection module 200 is specifically configured to: compare the audio data with a standard pronunciation model of the predetermined text using an acoustic model to obtain a likelihood score and a time boundary for each phoneme; and perform edit-distance alignment to determine whether any phoneme is mispronounced.
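The edit-distance alignment step can be sketched as follows. The sketch assumes that the expected phoneme sequence comes from the predetermined text and that a recognized phoneme sequence is available from decoding the audio; what exactly is aligned is not spelled out in this passage, so both inputs and the function names are assumptions. The likelihood scores and time boundaries are left out.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align(expected: List[str], recognized: List[str]) -> List[Pair]:
    """Levenshtein alignment; None marks a deleted or inserted phoneme."""
    n, m = len(expected), len(recognized)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (expected[i - 1] != recognized[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (
            expected[i - 1] != recognized[j - 1]
        ):
            pairs.append((expected[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((expected[i - 1], None))  # phoneme was dropped
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # extra phoneme was inserted
            j -= 1
    pairs.reverse()
    return pairs


def mismatches(pairs: List[Pair]) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Positions where the aligned sequences disagree."""
    return [(k, e, r) for k, (e, r) in enumerate(pairs) if e != r]
```

The positions returned by `mismatches` correspond to the "inconsistent positions" at which the confusion phoneme is subsequently judged.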
As a specific implementation manner, in this embodiment, the determining module 300 is specifically configured to: pre-construct a confusion phoneme set for each phoneme; when a pronunciation error is detected, apply Bayesian decision at each mismatched position to estimate the probability that the current pronunciation is each confusion phoneme in the confusion phoneme set; and determine the confusion phoneme with the highest probability as the actual confusion phoneme.
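The Bayesian decision over a confusion phoneme set can be sketched as a posterior computation: the posterior of each candidate is proportional to its acoustic likelihood times its prior, and the candidate with the largest posterior is chosen. The sketch assumes per-candidate log-likelihoods for the mismatched segment and a prior for each candidate (for instance, how often learners make that substitution); where these values come from is not specified here, and the function names are hypothetical.

```python
import math
from typing import Dict


def confusion_posteriors(loglik: Dict[str, float],
                         prior: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(q | audio) for each candidate q in the confusion set."""
    log_joint = {q: loglik[q] + math.log(prior[q]) for q in loglik}
    peak = max(log_joint.values())  # subtract the peak for numerical stability
    unnorm = {q: math.exp(v - peak) for q, v in log_joint.items()}
    total = sum(unnorm.values())
    return {q: v / total for q, v in unnorm.items()}


def actual_confusion(loglik: Dict[str, float], prior: Dict[str, float]) -> str:
    """Pick the confusion phoneme with the highest posterior probability."""
    post = confusion_posteriors(loglik, prior)
    return max(post, key=post.get)
```

With a uniform prior the decision reduces to choosing the candidate with the highest likelihood; a non-uniform prior lets frequent learner substitutions win when the acoustic evidence is close.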
As a specific implementation manner, in this embodiment, the detection module 200 is further configured to: after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, adjust the likelihood score of the corresponding phoneme according to the determined actual confusion phoneme.
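This passage does not state how the likelihood score is adjusted. Purely as an illustration, the sketch below substitutes a goodness-of-pronunciation (GOP) style measure: the target's log-likelihood minus that of the actual confusion phoneme, normalized by the number of frames in the segment. The function name and all inputs are assumptions, and this is not presented as the rule used by the invention.

```python
def adjusted_score(loglik_target: float, loglik_confusion: float,
                   num_frames: int) -> float:
    """GOP-style adjusted score (an assumed rule, see the lead-in).

    Values near zero mean the target and the confusion phoneme fit the audio
    about equally well; strongly negative values mean the audio fits the
    confusion phoneme much better than the target.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (loglik_target - loglik_confusion) / num_frames
```

Normalizing by the frame count keeps scores comparable between short and long phoneme segments.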
The pronunciation correcting device of this embodiment is used for implementing the pronunciation correcting method, and therefore the specific implementation of the pronunciation correcting device can be seen in the foregoing embodiments of the pronunciation correcting method, for example, the obtaining module 100, the detecting module 200, and the determining module 300 are respectively used for implementing steps S101, S102, and S103 of the pronunciation correcting method, so the specific implementation thereof can refer to the description of the corresponding embodiments of each part, and will not be described herein again.
The pronunciation correction device provided by the invention acquires audio data entered for a predetermined text; compares the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme is mispronounced; and, when a pronunciation error is detected, determines the actual confusion phoneme corresponding to the currently mispronounced target phoneme. The application thus not only detects that the user's pronunciation contains errors but also identifies the confusion phoneme the user actually produced. This solves the problem that users cannot recognize their own pronunciation errors, lets them correct existing pronunciation problems in a targeted way, and improves learning efficiency.
In addition, the present application also provides a pronunciation correction device applied to a server 1. As shown in Fig. 9, the device includes:
a memory 11 for storing a computer program;
a processor 12 for implementing the following steps when executing the computer program: acquiring audio data entered for a predetermined text; comparing the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme is mispronounced; and, when a pronunciation error is detected, determining the actual confusion phoneme corresponding to the currently mispronounced target phoneme.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the pronunciation correction device, such as a hard disk. The memory 11 may also be an external storage device of the pronunciation correction device in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit of the pronunciation correction device and an external storage device. The memory 11 may be used not only to store application software installed in the pronunciation correction device and various types of data such as the code of the pronunciation correction program 01, but also to temporarily store data that has been output or is to be output.
The processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data Processing chip in some embodiments, and is used for executing program codes stored in the memory 11 or Processing data, such as executing the pronunciation correction program 01.
Optionally, the processor 12 is configured to implement the following steps when executing the computer program: comparing the audio data with a standard pronunciation model of the predetermined text using an acoustic model to obtain a likelihood score and a time boundary for each phoneme; and performing edit-distance alignment to determine whether any phoneme is mispronounced.
Optionally, the processor 12 is configured to implement the following steps when executing the computer program: pre-constructing a confusion phoneme set for each phoneme; when a pronunciation error is detected, applying Bayesian decision at each mismatched position to estimate the probability that the current pronunciation is each confusion phoneme in the confusion phoneme set; and determining the confusion phoneme with the highest probability as the actual confusion phoneme.
Optionally, the processor 12 is configured to implement the following step when executing the computer program: after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, adjusting the likelihood score of the corresponding phoneme according to the determined actual confusion phoneme.
It can be understood that the server in the embodiment of the present application may include, but is not limited to: a single web server, a server group of multiple web servers, or a cloud based on cloud computing consisting of a large number of computers or web servers.
Furthermore, the present application also provides a pronunciation correction device applied to a client 2. As shown in Fig. 10, the device includes:
an audio acquisition device 21, configured to record audio data entered for a predetermined text;
a communication device 22, configured to send the audio data to a server, so that the server compares the audio data with a standard pronunciation model of the predetermined text to detect whether any phoneme in the audio data is mispronounced and, when a pronunciation error is detected, determines the actual confusion phoneme corresponding to the currently mispronounced target phoneme; and to receive the actual confusion phoneme returned by the server;
a display device 23, configured to display the actual confusion phoneme on a display interface.
Optionally, the pronunciation correction device provided by the present application may further include:
a first feedback device, configured to: after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, select a plurality of words that respectively contain the target phoneme and the actual confusion phoneme; and play a demonstration recording of each word, and/or randomly play the audio of any one of the words so that the user selects the corresponding word, for learning.
Optionally, the pronunciation correction device provided by the present application may further include:
a second feedback device, configured to: after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, prompt the difference between the target phoneme and the actual confusion phoneme by voice and/or text, so as to assist the user in adjusting the pronunciation.
Optionally, the pronunciation correction device provided by the present application may further include:
a third feedback device, configured to: after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, select a plurality of words containing the target phoneme for the user to carry out sound-correction practice; acquire practice audio data entered by the user for the words; analyze the practice audio data to obtain practice evaluation indication information, wherein the practice evaluation indication information indicates whether the practice audio data is closer to the actual confusion phoneme or to the target phoneme; and feed back the practice evaluation indication information through a display interface.
It can be understood that the client in the embodiment of the present application may include, but is not limited to: smart phones, tablets, MP4 players, MP3 players, PCs, PDAs, wearable devices, head-mounted display devices, and the like.
Further, the present application also provides a pronunciation correction system. As shown in Fig. 11, the system includes any one of the above-mentioned servers 1 and any one of the above-mentioned clients 2. The user studies pronunciation through the client, which shows the content to be learned on the display interface and can also output audio content to the user through an audio playback device such as a speaker. While the user practices, the client collects the audio data of the user's pronunciation through the audio acquisition device and sends it to the server, which carries out the pronunciation correction process. After the server has analyzed the audio data and obtained feedback information, it sends the feedback information to the client. The client presents the feedback information through its display device, providing the user with visual auxiliary information.
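The exchange between client and server can be illustrated with a minimal message sketch. The JSON layout, the field names (`text`, `audio_b64`, `errors`, `target`, `actual_confusion`) and the helper functions are hypothetical; this document does not define a wire format or a transport.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CorrectionRequest:
    text: str        # the predetermined text the user read aloud
    audio_b64: str   # recorded audio, base64-encoded so it fits in JSON


@dataclass
class PhonemeFeedback:
    target: str            # phoneme the user should have produced
    actual_confusion: str  # phoneme the server judged was produced instead


def build_request(text: str, audio: bytes) -> str:
    """Client side: package the recorded audio for sending to the server."""
    encoded = base64.b64encode(audio).decode("ascii")
    return json.dumps(asdict(CorrectionRequest(text, encoded)))


def parse_response(payload: str) -> List[PhonemeFeedback]:
    """Client side: turn the server's reply into items for the display device."""
    data = json.loads(payload)
    return [PhonemeFeedback(**item) for item in data["errors"]]
```

In this sketch the client only records, sends and displays; all acoustic analysis stays on the server, matching the division of work described above.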
Furthermore, the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of any of the pronunciation correction methods described above.
The pronunciation correction device, pronunciation correction system, computer readable storage medium provided by the present application correspond to the aforementioned method. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In conclusion, the present application not only detects that the user's pronunciation contains errors but also identifies the confusion phoneme the user actually produced. This solves the problem that users cannot recognize their own pronunciation errors, lets them correct existing pronunciation problems in a targeted way, and improves learning efficiency.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The pronunciation correction method, apparatus, device and computer readable storage medium provided by the present invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (7)

1. A pronunciation correction method, comprising:
acquiring audio data for a predetermined text entry;
comparing the audio data with a standard pronunciation model of the preset text by adopting an acoustic model to obtain a likelihood score and a time boundary of each phoneme;
carrying out editing distance alignment and determining whether the phoneme has pronunciation errors;
when pronunciation errors are detected, Bayesian judgment is adopted at inconsistent positions to detect the probability that the current pronunciation is each confusion phoneme in the confusion phoneme set; the confusion phoneme set is preset and corresponds to each phoneme;
determining the confusion phoneme with the maximum probability as an actual confusion phoneme;
adjusting the likelihood score of the mispronounced phoneme according to the actual confused phoneme;
when a pronunciation error is detected, after determining an actual confusion phoneme corresponding to a target phoneme with the current pronunciation error, the method further comprises:
selecting a plurality of words containing target phonemes for a user to carry out sound correction practice;
acquiring practice audio data input by a user for the words;
analyzing the practice audio data to obtain practice evaluation indication information, wherein the practice evaluation indication information indicates whether the practice audio data is closer to the actual confusion phoneme or to the target phoneme;
and feeding back the exercise evaluation indication information through a display interface.
2. The pronunciation correction method as claimed in claim 1, wherein, after determining the actual confusion phoneme corresponding to the currently mispronounced target phoneme when a pronunciation error is detected, the method further comprises:
selecting a plurality of words respectively comprising the target phoneme and the actual confusion phoneme;
playing a demonstration record of the corresponding word, and/or randomly playing the audio of any one of the words for selecting the corresponding word for learning.
3. The pronunciation correction method as claimed in claim 1, wherein, after determining the actual confusion phoneme corresponding to the currently mispronounced target phoneme when a pronunciation error is detected, the method further comprises:
and prompting the difference between the target phoneme and the actual confusion phoneme through voice and/or characters so as to assist a user in adjusting the pronunciation mode.
4. A pronunciation correction device, comprising:
the acquisition module is used for acquiring audio data aiming at the preset text entry;
the detection module is used for comparing the audio data with a standard pronunciation model of the preset text by adopting an acoustic model to obtain a likelihood score and a time boundary of each phoneme; carrying out editing distance alignment and determining whether the phoneme has pronunciation errors;
the determining module is used for detecting the probability that the current pronunciation is each confusion phoneme in the confusion phoneme set by adopting Bayesian judgment at the inconsistent position when the pronunciation error is detected; the confusion phoneme set is preset and corresponds to each phoneme; determining the confusion phoneme with the maximum probability as an actual confusion phoneme;
the detection module is further used for adjusting the likelihood score of the mispronounced phoneme according to the actual confusion phoneme;
the third feedback module is used for, after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, selecting a plurality of words containing the target phoneme for the user to carry out sound-correction practice; acquiring practice audio data entered by the user for the words; analyzing the practice audio data to obtain practice evaluation indication information, wherein the practice evaluation indication information indicates whether the practice audio data is closer to the actual confusion phoneme or to the target phoneme; and feeding back the practice evaluation indication information through a display interface.
5. A pronunciation correction device applied to a server, the device comprising:
a memory for storing a computer program;
a processor for implementing the following steps when executing the computer program: acquiring audio data for a predetermined text entry;
comparing the audio data with a standard pronunciation model of the preset text by adopting an acoustic model to obtain a likelihood score and a time boundary of each phoneme;
carrying out editing distance alignment and determining whether the phoneme has pronunciation errors;
when pronunciation errors are detected, Bayesian judgment is adopted at inconsistent positions to detect the probability that the current pronunciation is each confusion phoneme in the confusion phoneme set; the confusion phoneme set is preset and corresponds to each phoneme;
determining the confusion phoneme with the maximum probability as an actual confusion phoneme;
adjusting the likelihood score of the mispronounced phoneme according to the actual confused phoneme;
when a pronunciation error is detected, after determining an actual confusion phoneme corresponding to a target phoneme with the current pronunciation error, the method further comprises:
selecting a plurality of words containing target phonemes for a user to carry out sound correction practice;
acquiring practice audio data input by a user for the words;
analyzing the practice audio data to obtain practice evaluation indication information, wherein the practice evaluation indication information indicates whether the practice audio data is closer to the actual confusion phoneme or to the target phoneme;
and feeding back the exercise evaluation indication information through a display interface.
6. A pronunciation correction device applied to a client, the device comprising:
the audio acquisition device is used for inputting audio data aiming at the preset text;
the communication device is used for sending the audio data to a server so that the server can compare the audio data with a standard pronunciation model of the preset text by adopting an acoustic model to obtain a likelihood score and a time boundary of each phoneme;
carrying out editing distance alignment and determining whether the phoneme has pronunciation errors;
when pronunciation errors are detected, Bayesian judgment is adopted at inconsistent positions to detect the probability that the current pronunciation is each confusion phoneme in the confusion phoneme set; the confusion phoneme set is preset and corresponds to each phoneme;
determining the confusion phoneme with the maximum probability as an actual confusion phoneme;
adjusting the likelihood score of the mispronounced phoneme according to the actual confused phoneme;
the third feedback device is used for, after the actual confusion phoneme corresponding to the currently mispronounced target phoneme has been determined upon detecting a pronunciation error, selecting a plurality of words containing the target phoneme for the user to carry out sound-correction practice; acquiring practice audio data entered by the user for the words; analyzing the practice audio data to obtain practice evaluation indication information, wherein the practice evaluation indication information indicates whether the practice audio data is closer to the actual confusion phoneme or to the target phoneme; and feeding back the practice evaluation indication information through a display interface;
display means for displaying the actual confusing phoneme on a display interface.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the pronunciation correction method as claimed in any one of claims 1 to 3.
CN201910409877.9A 2019-05-16 2019-05-16 Pronunciation correction method, device, equipment and computer readable storage medium Active CN110085261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910409877.9A CN110085261B (en) 2019-05-16 2019-05-16 Pronunciation correction method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110085261A CN110085261A (en) 2019-08-02
CN110085261B true CN110085261B (en) 2021-08-24

Family

ID=67420623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910409877.9A Active CN110085261B (en) 2019-05-16 2019-05-16 Pronunciation correction method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110085261B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782921B (en) * 2019-09-19 2023-09-22 腾讯科技(深圳)有限公司 Voice evaluation method and device, storage medium and electronic device
CN111292769A (en) * 2020-03-04 2020-06-16 苏州驰声信息科技有限公司 Method, system, device and storage medium for correcting pronunciation of spoken language
CN111462553B (en) * 2020-04-17 2021-03-30 杭州菲助科技有限公司 Language learning method and system based on video dubbing and sound correction training
CN111613244A (en) * 2020-05-20 2020-09-01 北京搜狗科技发展有限公司 Scanning and reading-following processing method and related device
CN113744718A (en) * 2020-05-27 2021-12-03 海尔优家智能科技(北京)有限公司 Voice text output method and device, storage medium and electronic device
CN111862960B (en) * 2020-08-07 2024-04-30 广州视琨电子科技有限公司 Pronunciation error detection method, pronunciation error detection device, electronic equipment and storage medium
CN112133325B (en) * 2020-10-14 2024-05-07 北京猿力未来科技有限公司 Wrong phoneme recognition method and device
CN112365752A (en) * 2020-12-03 2021-02-12 安徽信息工程学院 Parent-child interaction type early education system
CN112634862B (en) * 2020-12-18 2024-01-23 北京大米科技有限公司 Information interaction method and device, readable storage medium and electronic equipment
CN112614510B (en) * 2020-12-23 2024-04-30 北京猿力未来科技有限公司 Audio quality assessment method and device
CN113192494A (en) * 2021-04-15 2021-07-30 辽宁石油化工大学 Intelligent English language identification and output system and method
CN112992184B (en) * 2021-04-20 2021-09-10 北京世纪好未来教育科技有限公司 Pronunciation evaluation method and device, electronic equipment and storage medium
CN113536776B (en) * 2021-06-22 2024-06-14 深圳价值在线信息科技股份有限公司 Method for generating confusion statement, terminal device and computer readable storage medium
CN114327357B (en) * 2022-01-05 2024-02-02 郑州市金水区正弘国际小学 Language learning assisting method, electronic equipment and storage medium
CN115083437B (en) * 2022-05-17 2023-04-07 北京语言大学 Method and device for determining uncertainty of learner pronunciation
CN116894442B (en) * 2023-09-11 2023-12-05 临沂大学 Language translation method and system for correcting guide pronunciation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060116877A1 (en) * 2004-12-01 2006-06-01 Pickering John B Methods, apparatus and computer programs for automatic speech recognition
CN101727764A (en) * 2008-10-21 2010-06-09 微星科技股份有限公司 Method and device for assisting in correcting pronunciation
CN104485116A (en) * 2014-12-04 2015-04-01 上海流利说信息技术有限公司 Voice quality evaluation equipment, voice quality evaluation method and voice quality evaluation system
CN104681037A (en) * 2015-03-19 2015-06-03 广东小天才科技有限公司 Pronunciation guiding method and device and point reading machine
CN109036464A (en) * 2018-09-17 2018-12-18 腾讯科技(深圳)有限公司 Pronounce error-detecting method, device, equipment and storage medium
CN109545244A (en) * 2019-01-29 2019-03-29 北京猎户星空科技有限公司 Speech evaluating method, device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《基于语音心理声学分析的驾驶疲劳检测》;李响等;《仪器仪表学报》;20181031;第39卷(第10期);第166-175页 *
《基于贝叶斯网络的云南民族口音说话人识别》;普园媛等;《2007仪表自动化及先进集成技术大会论文集(一)》;20071231;第379-382页 *

Also Published As

Publication number Publication date
CN110085261A (en) 2019-08-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant