CN113096690A

CN113096690A - Pronunciation evaluation method, device, equipment and storage medium

Info

Publication number: CN113096690A
Application number: CN202110318661.9A
Authority: CN
Inventors: 徐燃
Original assignee: Beijing Roobo Technology Co ltd
Current assignee: Beijing Rubu Technology Co.,Ltd.
Priority date: 2021-03-25
Filing date: 2021-03-25
Publication date: 2021-07-09

Abstract

The embodiments of the invention provide a pronunciation evaluation method, device, equipment and storage medium. The method acquires speech material to be evaluated; scores the speech material to obtain an original score and performs feature classification on it to obtain classification information; and passes the original score and the classification information through a classification scoring module to obtain a final output score. The scheme can output either strict or encouraging scores depending on the speaker type, and can therefore meet the different requirements that young children and formal learners place on evaluation criteria.

Description

Pronunciation evaluation method, device, equipment and storage medium

Technical Field

The invention relates to the field of computer technology, and in particular to a pronunciation evaluation method, device, equipment and storage medium.

Background

Automatic spoken-pronunciation evaluation is increasingly used in educational assistance: it can automatically accompany a language learner through pronunciation training and give accuracy scores for the learner's pronunciation, helping the learner move toward more standard pronunciation. However, language learners may be adults or young children. For adults or teenagers who are formally learning a language, strict pronunciation evaluation and objective scoring are important. Young beginners, most of them under 10 years old, are very sensitive to evaluation scores; parents commonly report that strict, objective scores seriously hurt their children's enthusiasm for learning, leaving the children frustrated and resistant to learning.

For the education of young children, approaches that encourage participation and affirm effort are generally adopted both in China and abroad, so that children learn more, participate more, and gain confidence and interest. There is therefore a need for a pronunciation evaluation method, device, equipment and storage medium that meets the different requirements of both young children and formal learners on evaluation criteria.

Disclosure of Invention

In view of the above problems, the present invention provides a pronunciation evaluation method, device, apparatus and storage medium, which can simultaneously meet different requirements of young children and formal learners on evaluation criteria.

In order to achieve the purpose, the invention adopts the following technical scheme:

In a first aspect, an embodiment of the present invention provides a pronunciation evaluation method, including:

obtaining speech material to be evaluated;

scoring the speech material to be evaluated to obtain an original score, and performing feature classification on the speech material to be evaluated to obtain classification information; and

passing the original score and the classification information through a classification scoring module to obtain a final output score.

In a second aspect, an embodiment of the present invention further provides a pronunciation evaluation device, including:

a voice acquisition module, configured to acquire the user's current speech material to be evaluated;

a pronunciation evaluation module, configured to score the speech material to be evaluated to obtain an original score;

a feature classification module, configured to perform feature classification on the speech material to be evaluated to obtain classification information; and

a classification scoring module, configured to obtain a final output score according to the original score and the classification information of the speech material to be evaluated.

In a third aspect, an embodiment of the present invention further provides an apparatus, including:

one or more processors;

storage means for storing one or more programs;

a sound collector, configured to collect the user's speech material to be evaluated;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the pronunciation evaluation method according to any embodiment of the invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the pronunciation assessment method according to any of the embodiments of the present invention.

The above is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments are described below.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

Fig. 1 is a schematic flow chart of a pronunciation evaluation method provided in an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another pronunciation evaluation method provided in an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a pronunciation evaluation device provided in an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a pronunciation evaluation apparatus provided in an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations (or steps) can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

The pronunciation evaluation method, device, equipment and storage medium provided in the embodiments of the present invention are described in detail below. This embodiment is applicable to assisting a learner in pronunciation training. The method can be executed by a pronunciation evaluation device, which can be implemented in software and/or hardware and integrated on equipment with a network communication function. The equipment may be a user terminal or a server, and the user terminal may specifically be a mobile phone, a computer, a tablet computer, or the like.

Fig. 1 is a schematic flow chart of a pronunciation evaluation method provided in an embodiment of the present invention. As shown in fig. 1, the pronunciation evaluation method provided in the embodiment of the present invention may include:

S101, obtaining the user's current speech material to be evaluated.

In this embodiment, after the current user (the language learner) utters the speech corresponding to a text, the device captures that speech as the speech material to be evaluated. For example, when pronunciation training is needed, the user can select a suitable text from preset pronunciation training texts, according to personal preference and level of mastery, as the current pronunciation training text. Here, a preset pronunciation training text is text content that the learner reads during pronunciation training.

S102, scoring the current speech material to be evaluated to obtain an original score, and performing feature classification on the speech material to be evaluated to obtain classification information.

In this embodiment, the original score of the current speech material to be evaluated depends on the user's spoken-language performance when reading the current pronunciation training text. Scoring is performed by a pronunciation evaluation engine, i.e., an engine that accepts input speech and a reference answer and outputs evaluation information along several dimensions. These dimensions include, but are not limited to: word pronunciation accuracy, sentence pronunciation accuracy, phoneme-level accuracy and error correction, overall fluency, and completeness.
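The source does not say how the engine's per-dimension outputs are combined into the original score. As a minimal sketch, assuming every dimension is reported on a 0-100 scale and that a weighted average is acceptable, the engine output could be modelled as follows; the `EvaluationResult` fields and the `WEIGHTS` values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvaluationResult:
    """Hypothetical multi-dimensional engine output; all values assumed on a 0-100 scale."""
    word_accuracy: float
    sentence_accuracy: float
    phoneme_accuracy: float
    fluency: float
    completeness: float


# Hypothetical weights: the source does not specify how dimensions are combined.
WEIGHTS = {
    "word_accuracy": 0.25,
    "sentence_accuracy": 0.25,
    "phoneme_accuracy": 0.2,
    "fluency": 0.15,
    "completeness": 0.15,
}


def original_score(result: EvaluationResult) -> float:
    """Collapse the per-dimension scores into one original score (weighted average)."""
    total = sum(WEIGHTS.values())
    return sum(getattr(result, name) * w for name, w in WEIGHTS.items()) / total
```

For example, a result with 100 for word accuracy and 0 everywhere else yields an original score of 25 under these weights.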

In this embodiment, the classification information of the current speech material to be evaluated depends on the type of user. It is produced by a speaker feature classifier, which can be a neural-network-based decision device that classifies speech features frame by frame. The input layer of the neural network takes speech spectral features, such as MFCC, fbank (filter-bank) features, or raw Fourier-transform spectra, and the output layer has three classes: adult, child and silence.
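The frame-by-frame classifier can be sketched in plain Python. This is an illustration under stated assumptions, not the patented network: a naive DFT log-power spectrum stands in for MFCC or fbank features, the one-hidden-layer network uses random placeholder weights, and the majority vote that turns frame labels into one utterance-level speaker type is an assumption, since the source only describes the frame-level decision. All names (`FrameClassifier`, `utterance_speaker_type`, and so on) are hypothetical.

```python
import cmath
import math
import random
from collections import Counter

LABELS = ("adult", "child", "silence")  # output classes named in the text


def frame_signal(samples, frame_len=16, hop=8):
    """Split a 1-D signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def log_power_spectrum(frame):
    """Naive DFT log-power spectrum over the first N/2+1 bins (stand-in for MFCC/fbank)."""
    n = len(frame)
    feats = []
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        feats.append(math.log(abs(coeff) ** 2 + 1e-10))
    return feats


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class FrameClassifier:
    """One-hidden-layer MLP applied independently to every frame.

    Weights are random placeholders; a usable classifier would be trained on
    labelled adult, child and silence frames."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in LABELS]
        self.b2 = [0.0] * len(LABELS)

    def posteriors(self, feats):
        # ReLU hidden layer, then a softmax output layer over LABELS.
        h = [max(0.0, sum(w * x for w, x in zip(row, feats)) + b)
             for row, b in zip(self.w1, self.b1)]
        return softmax([sum(w * x for w, x in zip(row, h)) + b
                        for row, b in zip(self.w2, self.b2)])

    def classify_frames(self, samples, frame_len=16, hop=8):
        """Frame-by-frame decision: one label per frame."""
        labels = []
        for frame in frame_signal(samples, frame_len, hop):
            p = self.posteriors(log_power_spectrum(frame))
            labels.append(LABELS[p.index(max(p))])
        return labels


def utterance_speaker_type(frame_labels, min_voiced_ratio=0.1):
    """Assumed utterance-level rule: majority vote over non-silence frames.
    Ties go to 'child', the more lenient scoring path."""
    voiced = [lab for lab in frame_labels if lab != "silence"]
    if not frame_labels or len(voiced) / len(frame_labels) < min_voiced_ratio:
        return "silence"
    counts = Counter(voiced)
    return "child" if counts["child"] >= counts["adult"] else "adult"
```

With `frame_len=16`, each feature vector has 9 bins, so the classifier is built with `FrameClassifier(n_in=9)`.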

In this embodiment, the original score and the classification information are obtained in parallel, and both the pronunciation evaluation engine and the speaker feature classifier are real-time streaming engines.
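A minimal sketch of the parallel, streaming arrangement, assuming a producer that forwards each audio chunk to two consumer threads as it arrives. `toy_scorer` and `toy_classifier` are hypothetical placeholders whose internal logic is deliberately trivial; only the wiring (one input stream, two concurrent engines, results joined at the end) reflects the text.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_END = object()  # sentinel marking the end of the audio stream


def toy_scorer(q):
    """Hypothetical streaming scorer: consumes chunks until the sentinel.
    Placeholder logic: the score is the number of samples received, capped at 100."""
    n = 0
    while (chunk := q.get()) is not _END:
        n += len(chunk)
    return min(100.0, float(n))


def toy_classifier(q):
    """Hypothetical streaming classifier: 'silence' if every sample is zero, else 'child'."""
    voiced = False
    while (chunk := q.get()) is not _END:
        voiced = voiced or any(s != 0 for s in chunk)
    return "child" if voiced else "silence"


def evaluate_parallel(chunks, scorer=toy_scorer, classifier=toy_classifier):
    """Feed every incoming chunk to both engines, which run concurrently in worker threads."""
    q_score, q_class = queue.Queue(), queue.Queue()
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_score = pool.submit(scorer, q_score)
        f_class = pool.submit(classifier, q_class)
        for chunk in chunks:  # e.g. audio arriving from the microphone
            q_score.put(chunk)
            q_class.put(chunk)
        q_score.put(_END)
        q_class.put(_END)
        return f_score.result(), f_class.result()
```

Because each engine reads its own queue, neither waits for the other, and both results are ready shortly after the last chunk arrives.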

S103, passing the original score and the classification information through a classification scoring module to obtain a final output score.

In this embodiment, the final output score of the current speech material to be evaluated depends on both the original score and the classification information of the user's speech material. It is produced by a classification scorer, which receives the original pronunciation evaluation score and applies a mapping conversion to it according to the speaker type. For example, if the original score of the pronunciation evaluation engine is designed to agree strictly with expert scores and objectively reflects actual pronunciation quality, it can be output directly for adults. For children, the need to encourage them is taken into account: for example, a child whose pronunciation scores above 60 points is considered to be doing very well.
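The classification scorer can be sketched as a function of the original score and the speaker type. Assumptions: scores are on a 0-100 scale; the piecewise-linear child curve (a raw 60 maps to 90) is a hypothetical choice that only illustrates the idea that 60 already counts as very good; and an utterance classified as silence receives 0.

```python
def child_encouraging_map(raw: float) -> float:
    """Hypothetical encouraging curve for children (0-100 scale).
    Raw 0-60 is stretched linearly to 0-90 and raw 60-100 is compressed to 90-100,
    so a child at 60 already sees a 'very good' result."""
    raw = max(0.0, min(100.0, raw))  # clamp to the assumed 0-100 range
    if raw <= 60:
        return raw * 1.5
    return 90 + (raw - 60) * 0.25


def classification_score(raw: float, speaker_type: str) -> float:
    """Map the original score to the final output score according to speaker type."""
    if speaker_type == "adult":
        return raw  # strict path: output the expert-consistent original score unchanged
    if speaker_type == "child":
        return child_encouraging_map(raw)
    if speaker_type == "silence":
        return 0.0  # assumption: no speech detected, nothing to reward
    raise ValueError(f"unknown speaker type: {speaker_type!r}")
```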

An alternative embodiment applies a star rating to children's scores, for example a three-star scale: 3 stars (full stars) for scores above 60, 2 stars for scores from 40 to 60, 1 star for scores below 40, and 0 stars when the evaluation engine refuses to score (for example, because the child did not say anything).
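The three-star scheme maps directly to a small function. Using `None` to represent the engine refusing to score is an assumption, and the source does not say whether exactly 60 or exactly 40 falls into the higher band, so the `>=` comparisons are a guess.

```python
from typing import Optional


def child_stars(raw: Optional[float]) -> int:
    """Three-star rating for children; raw=None means the engine refused to score."""
    if raw is None:
        return 0  # e.g. the child did not say anything
    if raw >= 60:  # boundary handling at 60 is an assumption
        return 3   # full stars
    if raw >= 40:  # boundary handling at 40 is an assumption
        return 2
    return 1       # scores below 40 still earn one star
```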

The mappings above are given only as references. Any other reasonable score mapping and transformation can be regarded as the classification score mapping module described in the present invention, including but not limited to configuring different working parameters, difficulty control coefficients and the like of the pronunciation evaluation engine for different groups of people.
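Configuring the engine per group of people could look like the following sketch. `EngineConfig`, its `difficulty` and `reject_threshold` fields, and the values in `GROUP_CONFIGS` are hypothetical illustrations of a difficulty control coefficient and other working parameters; the source names these ideas but gives no values or semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical per-group working parameters for a pronunciation evaluation engine."""
    difficulty: float        # hypothetical coefficient: <1 more lenient, >1 stricter
    reject_threshold: float  # raw scores below this are treated as 'no valid speech'


GROUP_CONFIGS = {
    "adult": EngineConfig(difficulty=1.0, reject_threshold=10.0),
    "child": EngineConfig(difficulty=0.8, reject_threshold=5.0),
}


def apply_config(raw: float, config: EngineConfig) -> Optional[float]:
    """Return None (rejected) or the raw score divided by the difficulty coefficient, clamped to 0-100."""
    if raw < config.reject_threshold:
        return None
    return max(0.0, min(100.0, raw / config.difficulty))
```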

Fig. 2 is a schematic flow chart of another pronunciation evaluation method provided in an embodiment of the present invention; this embodiment may be combined with the alternatives in one or more of the above embodiments. As shown in Fig. 2, the pronunciation evaluation method provided in the embodiment of the present invention may include:

S201, obtaining the user's current speech material to be evaluated.

S202, performing feature classification on the speech material to be evaluated to obtain classification information.

S203, scoring the current speech material to be evaluated to obtain an original score.

In this embodiment, the original score of the current speech material to be evaluated depends on the user's spoken-language performance when reading the current pronunciation training text and also on the classification information obtained in S202. Scoring is performed by a pronunciation evaluation engine, i.e., an engine that accepts input speech and a reference answer and outputs evaluation information along several dimensions. These dimensions include, but are not limited to: word pronunciation accuracy, sentence pronunciation accuracy, phoneme-level accuracy and error correction, overall fluency, and completeness.

In this embodiment, the pronunciation evaluation engine and the speaker feature classifier are both real-time streaming engines.

S204, passing the original score and the classification information through a classification scoring module to obtain a final output score.

In this embodiment, the speaker feature classifier and the pronunciation evaluation engine are connected in series, with the speaker feature classifier placed first. The original score of the current speech material to be evaluated therefore depends not only on the user's spoken-language performance when reading the current pronunciation training text but also on the classification information obtained in S202.
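In the serial arrangement the classifier's output is available before scoring starts, so it can influence how the engine scores. A minimal sketch, assuming the three stages are supplied as callables; the function name `evaluate_serial` and its signature are hypothetical.

```python
from typing import Callable, Sequence


def evaluate_serial(
    audio: Sequence[float],
    classify: Callable[[Sequence[float]], str],
    score: Callable[[Sequence[float], str], float],
    map_score: Callable[[float, str], float],
) -> float:
    """Serial pipeline: the speaker feature classifier runs first and its output
    is passed into the scoring step as well as into the final mapping."""
    speaker_type = classify(audio)       # S202: classify before scoring
    raw = score(audio, speaker_type)     # S203: scoring may depend on the speaker type
    return map_score(raw, speaker_type)  # S204: classification scoring module
```

Compared with the parallel arrangement, this adds the classifier's latency before scoring begins, but lets the engine itself adapt, for example by loading group-specific parameters.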

Fig. 3 is a schematic structural diagram of a pronunciation evaluation device provided in an embodiment of the present invention. This embodiment is applicable to assisting a learner in pronunciation training; the pronunciation evaluation device can be implemented in software and/or hardware and integrated on equipment with a network communication function. The equipment may be a user terminal or a server, and the user terminal may specifically be a mobile phone, a computer, a tablet computer, or the like.

As shown in fig. 3, the pronunciation evaluation device provided in the embodiment of the present invention may include: a voice acquisition module 301, a pronunciation evaluation module 302, a feature classification module 303 and a classification scoring module 304. Wherein:

the voice acquisition module 301 is configured to acquire the user's current speech material to be evaluated;

the pronunciation evaluation module 302 is configured to score the speech material to be evaluated to obtain an original score;

Optionally, the pronunciation evaluation module is a pronunciation evaluation engine, i.e., an engine that accepts input speech and a reference answer and outputs evaluation information along several dimensions. These dimensions include, but are not limited to: word pronunciation accuracy, sentence pronunciation accuracy, phoneme-level accuracy and error correction, overall fluency, and completeness.

the feature classification module 303 is configured to perform feature classification on the speech material to be evaluated to obtain classification information;

Optionally, the feature classification module may be a speaker feature classifier, which is a neural-network-based decision device that classifies speech features frame by frame. The input layer of the neural network takes speech spectral features, such as MFCC, fbank features, or raw Fourier-transform spectra, and the output layer has three classes: adult, child and silence.

the classification scoring module 304 is configured to obtain a final output score according to the original score and the classification information of the speech material to be evaluated.

Optionally, the classification scoring module is a classification score mapping module, which receives the original pronunciation evaluation score and applies a mapping conversion to produce the final output score according to the speaker type.
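The division into modules 301-304 can be mirrored by a small class whose modules are injected as callables. This is a hypothetical sketch of the architecture only; the class and field names are not from the source, and the stages run sequentially here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PronunciationEvaluationDevice:
    """Hypothetical composition of modules 301-304, each supplied as a callable."""
    acquire: Callable[[], Sequence[float]]               # voice acquisition module 301
    evaluate: Callable[[Sequence[float]], float]         # pronunciation evaluation module 302
    classify: Callable[[Sequence[float]], str]           # feature classification module 303
    classification_score: Callable[[float, str], float]  # classification scoring module 304

    def run(self) -> float:
        audio = self.acquire()
        raw = self.evaluate(audio)                    # original score
        info = self.classify(audio)                   # classification information
        return self.classification_score(raw, info)  # final output score
```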

On the basis of the above embodiments, referring to Fig. 4, an embodiment of the present invention further provides a pronunciation evaluation apparatus, which includes a memory 31 and a processor 32, wherein:

a memory 31 for storing a computer program;

a processor 32 for implementing the steps of the pronunciation assessment method as described above when executing the computer program.

The apparatus further includes a sound collector 33 for collecting the user's speech material to be evaluated.

Of course, an embodiment of the present invention also provides a computer-readable storage medium. The computer program stored on it is not limited to performing the method operations described above, and may also perform related operations in the pronunciation evaluation method provided by any embodiment of the present invention.

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

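1. A pronunciation assessment method, the method comprising:

obtaining speech material to be evaluated;

scoring the speech material to be evaluated to obtain an original score, and performing feature classification on the speech material to be evaluated to obtain classification information; and

passing the original score and the classification information through a classification scoring module to obtain a final output score.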

2. The pronunciation assessment method according to claim 1, further comprising:

the scoring of the speech material to be evaluated to obtain an original score is performed by a pronunciation evaluation engine, and the feature classification of the speech material to be evaluated to obtain the classification information is performed by a speaker feature classifier.

3. The pronunciation assessment method according to claim 2, further comprising:

the pronunciation evaluation engine and the speaker feature classifier are both real-time streaming engines.

4. The pronunciation assessment method according to claim 2, wherein said speaker feature classifier is a speech feature frame-by-frame classification decision device based on neural network; the pronunciation evaluation engine is a type of engine which receives input voice and reference answers and gives evaluation information of different dimensions.

5. The pronunciation assessment method according to claim 2, wherein the speaker feature classifier is in a serial relationship with the pronunciation evaluation engine.

6. The pronunciation assessment method according to claim 2, wherein the speaker feature classifier is in a parallel relationship with the pronunciation evaluation engine.

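7. A pronunciation evaluation device, comprising:

a voice acquisition module, configured to acquire the user's current speech material to be evaluated;

a pronunciation evaluation module, configured to score the speech material to be evaluated to obtain an original score;

a feature classification module, configured to perform feature classification on the speech material to be evaluated to obtain classification information; and

a classification scoring module, configured to obtain a final output score according to the original score and the classification information of the speech material to be evaluated.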

8. A pronunciation evaluation apparatus, characterized by comprising a memory and a processor, wherein:

the memory for storing a computer program;

the processor is configured to implement the pronunciation assessment method according to any one of claims 1 to 6 when executing the computer program.

9. The pronunciation evaluation apparatus according to claim 8, further comprising a sound collector for collecting the user's speech material to be evaluated.

10. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, implements the pronunciation assessment method as claimed in any one of claims 1 to 6.