CN108257615A - User language assessment method and system - Google Patents

User language assessment method and system

Info

Publication number
CN108257615A
CN108257615A (application CN201810036799.8A)
Authority
CN
China
Prior art keywords
audio
user
image
follow-read audio
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810036799.8A
Other languages
Chinese (zh)
Inventor
蔡森川
杜娟
何嘉斌
顾嘉唯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Luka Beijing Intelligent Technology Co ltd
Original Assignee
Beijing Genius Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Genius Intelligent Technology Co Ltd
Priority to CN201810036799.8A
Publication of CN108257615A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

This application discloses a user language assessment method and system. The method includes: identifying whether the current text is a target text; when the current text is identified as a target text, receiving a follow-read operation from the user; determining the first follow-read audio to be played that corresponds to the follow-read operation; playing the first follow-read audio; capturing the second follow-read audio produced by the user repeating the first follow-read audio; and assessing the user's second follow-read audio. With this method, the user's follow-reading can be assessed while the user reads along, so that the user can effectively grasp his or her own language learning situation, improving the efficiency with which the user learns a language.

Description

User language assessment method and system
Technical field
This application relates to the field of computer technology, and in particular to a user language assessment method and system.
Background Art
With the continuous development of the global economy, learning different languages, such as English, has become increasingly important in people's lives.
At present, people usually learn languages through language learning devices in order to improve learning efficiency.
In the prior art, the language learning devices people use are typically language repeaters or reading pens (point-and-read devices).
However, when learning a language with a language repeater, the user can only study by playing a tape, and the repeater cannot assess the user's language after the user reads along. Likewise, when learning with a reading pen, the device cannot assess the user's language after the user reads along. As a result, the user cannot effectively grasp his or her own language learning situation while studying, for example whether his or her pronunciation is accurate, and the user learns the language inefficiently.
Summary of the Invention
The embodiments of the present application provide a user language assessment method and system, which solve the prior-art problem that, when using a language repeater or a reading pen to learn a language, the user cannot effectively grasp his or her own language learning situation and therefore learns the language inefficiently.
A user language assessment method provided by the embodiments of the present application includes:
identifying whether the current text is a target text;
when the current text is identified as a target text, receiving a follow-read operation from the user;
determining the first follow-read audio to be played that corresponds to the follow-read operation;
playing the first follow-read audio;
capturing the second follow-read audio produced by the user repeating the first follow-read audio;
assessing the user's second follow-read audio.
Preferably, the current text image is acquired; the image features of the current text image are extracted; the pre-stored image feature database is searched for the extracted image features; if they are found, the current text is identified as a target text; if not, the current text is identified as a non-target text.
Preferably, the image features of the current text image are extracted by a convolutional neural network algorithm, by a recurrent neural network algorithm, or by a scale-invariant feature transform (SIFT) algorithm.
Preferably, the image of the follow-read text page in the target text is acquired; the image features of the follow-read text page image are extracted; according to those image features, the page number corresponding to the follow-read text page is determined in a page-number feature database; according to that page number, the sentence markup text corresponding to the page number is determined in a sentence markup text database; the first follow-read audio is obtained according to the page number and its corresponding sentence markup text; and the obtained first follow-read audio is determined as the first follow-read audio to be played that corresponds to the follow-read operation.
Preferably, each word audio in the user's second follow-read audio is extracted in turn; for each word audio, in the order of extraction, each phoneme audio in the word audio is extracted in turn; for each phoneme audio, in the order of extraction, the standard phoneme audio corresponding to the phoneme audio is determined, the phoneme audio is compared with the standard phoneme audio, and a first score value of the phoneme audio is determined; for any word audio, the sum of the first score values of all phoneme audios contained in that word audio is taken as its second score value; the sum of the second score values of all word audios contained in the user's second follow-read audio is taken as the assessment value of the user's second follow-read audio, and the user's second follow-read audio is assessed according to that assessment value.
Preferably, according to the order of extraction of the word audio containing the phoneme audio, the word corresponding to the word audio is extracted from the sentence markup text; according to the order of extraction of the phoneme audio, the word phonetic symbol corresponding to the phoneme audio is extracted from the extracted word; and according to the extracted word phonetic symbol, the standard phoneme audio corresponding to that word phonetic symbol is determined in a standard phoneme audio database.
Preferably, it is judged whether the assessment value of the user's second follow-read audio is greater than a preset threshold; if so, the user's second follow-read audio is qualified and the user is prompted; if not, the user's second follow-read audio is unqualified and the second follow-read audio is replayed.
A user language assessment system provided by the embodiments of the present application includes:
a central processing unit, configured to identify whether the current text is a target text;
an image feedback device, configured to receive the user's follow-read operation when the central processing unit identifies the current text as a target text;
the central processing unit, further configured to determine the first follow-read audio to be played that corresponds to the follow-read operation;
a loudspeaker, configured to play the first follow-read audio;
a microphone, configured to capture the second follow-read audio produced by the user repeating the first follow-read audio;
a cloud server, configured to assess the user's second follow-read audio.
Preferably, the system further includes:
a camera, configured to scan the current text image.
The central processing unit is specifically configured to acquire the current text image, extract the image features of the current text image, and search the pre-stored image feature database for the extracted image features; if they are found, the current text is identified as a target text; if not, the current text is identified as a non-target text.
Preferably, the central processing unit is further configured to extract the image features of the current text image by a convolutional neural network algorithm, by a recurrent neural network algorithm, or by a scale-invariant feature transform algorithm.
Preferably, the system further includes:
a camera, configured to scan the image of the follow-read text page in the target text.
The central processing unit is specifically configured to acquire the image of the follow-read text page in the target text, extract the image features of the follow-read text page image, determine the page number corresponding to the follow-read text page in the page-number feature database according to those image features, determine the sentence markup text corresponding to that page number in the sentence markup text database according to the page number, obtain the first follow-read audio according to the page number and its corresponding sentence markup text, and determine the obtained first follow-read audio as the first follow-read audio to be played that corresponds to the follow-read operation.
Preferably, the cloud server is specifically configured to extract each word audio in the user's second follow-read audio in turn; for each word audio, in the order of extraction, extract each phoneme audio in the word audio in turn; for each phoneme audio, in the order of extraction, determine the standard phoneme audio corresponding to the phoneme audio, compare the phoneme audio with the standard phoneme audio, and determine a first score value of the phoneme audio; for any word audio, take the sum of the first score values of all phoneme audios contained in that word audio as its second score value; take the sum of the second score values of all word audios contained in the user's second follow-read audio as the assessment value of the user's second follow-read audio; and assess the user's second follow-read audio according to that assessment value.
Preferably, the cloud server is further configured to extract, according to the order of extraction of the word audio containing the phoneme audio, the word corresponding to the word audio from the sentence markup text; extract, according to the order of extraction of the phoneme audio, the word phonetic symbol corresponding to the phoneme audio from the extracted word; and determine, according to the extracted word phonetic symbol, the standard phoneme audio corresponding to that word phonetic symbol in the standard phoneme audio database.
Preferably, the cloud server is specifically configured to judge whether the assessment value of the user's second follow-read audio is greater than a preset threshold; if so, the user's second follow-read audio is qualified and the user is prompted through the loudspeaker and the image feedback device; if not, the user's second follow-read audio is unqualified and the second follow-read audio is replayed through the loudspeaker.
The embodiments of the present application provide a user language assessment method and system. The method includes: identifying whether the current text is a target text; when the current text is identified as a target text, receiving the user's follow-read operation; determining the first follow-read audio to be played that corresponds to the follow-read operation; playing the first follow-read audio; capturing the second follow-read audio produced by the user repeating the first follow-read audio; and assessing the user's second follow-read audio. With this method, the user's follow-reading can be assessed while the user reads along, so that the user can effectively grasp his or her own language learning situation, improving the efficiency with which the user learns a language.
Description of the Drawings
The accompanying drawings described herein are provided for further understanding of the present application and form a part of the application. The illustrative embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of the application. In the drawings:
Fig. 1 is a schematic diagram of the user language assessment process provided by the embodiments of the present application;
Fig. 2 is a structural diagram of the user language assessment system provided by the embodiments of the present application.
Detailed Description
To make the purposes, technical solutions, and advantages of the present application clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the application without creative effort fall within the protection scope of the application.
Fig. 1 shows the user language assessment process provided by the embodiments of the present application, which specifically includes the following steps:
S101: Identify whether the current text is a target text.
In practical applications, people usually learn languages through language learning devices in order to improve learning efficiency.
Further, when the user learns a language through the system of the present application shown in Fig. 2, the system first identifies whether the current text is a target text.
It should be noted that in this application a text may refer to a complete book or to a text composed of several separate sheets of paper. In addition, in this application, the user needs to place the text to be studied in a designated region so that the system can identify it; the text that the user currently needs to study and has placed in the designated region is defined as the current text.
It should also be noted that in this application the system can only identify and help the user learn texts that the system supports; texts the system does not support cannot be identified or studied. Which texts are supported can be set according to actual demand. In this application, a text that the system can identify and help the user learn is defined as a target text.
Further, the application provides an embodiment of identifying whether the current text is a target text, as follows: the current text image is acquired; the image features of the current text image are extracted; the pre-stored image feature database is searched for the extracted image features; if they are found, the current text is identified as a target text; if not, the current text is identified as a non-target text.
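As an illustration of the database search in this embodiment, the following is a minimal sketch that assumes the image features have already been extracted as fixed-length vectors and that "exists in the database" is decided by a cosine-similarity threshold; the database contents, names, and threshold value are assumptions for illustration, not values from the application.

```python
from math import sqrt

# Hypothetical pre-stored image feature database: one feature vector per
# supported (target) text cover. Contents and threshold are illustrative.
IMAGE_FEATURE_DB = {
    "storybook_a": [0.9, 0.1, 0.3, 0.7],
    "storybook_b": [0.2, 0.8, 0.5, 0.1],
}
MATCH_THRESHOLD = 0.95  # cosine similarity required to count as "present"

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def is_target_text(current_feature):
    """True when the extracted feature is found in the image feature database."""
    return any(cosine(current_feature, f) >= MATCH_THRESHOLD
               for f in IMAGE_FEATURE_DB.values())
```

A feature vector matching a stored cover is identified as a target text; anything else is identified as a non-target text.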
It should be noted that in this application the current text image refers to the cover of the text, and the current text image can be scanned and acquired by a camera. Of course, in practical applications, the current text image to be scanned and identified can be set in advance to another page of the text; how it is set can be determined according to actual demand. In addition, when extracting the image features of the current text image, the features can be extracted by a convolutional neural network algorithm. Specifically, the current text image is input into the convolutional neural network through its three RGB channels; the convolutional neural network performs convolution processing on the current text image, then pooling processing on the convolved image, and repeats this sequence of convolution and pooling several times until local features are extracted; finally, the extracted local features are passed through multiple fully connected layers to compute a global feature, which is the image feature of the current text image.
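The convolution-and-pooling pipeline described above can be sketched as follows. This is a deliberately minimal, single-channel illustration with an untrained, made-up kernel (the application feeds the three RGB channels into a trained network and finishes with multilayer fully connected layers); every name and value here is an assumption for illustration.

```python
def conv2d(img, kernel):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def maxpool2x2(img):
    """2x2 max pooling with stride 2."""
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, len(img[0]) - 1, 2)]
            for i in range(0, len(img) - 1, 2)]

def extract_feature(img, kernel, rounds=2):
    """Repeat convolution + pooling, then flatten. A real system would
    finish with trained fully connected layers producing the global feature."""
    for _ in range(rounds):
        img = maxpool2x2(conv2d(img, kernel))
    return [v for row in img for v in row]  # flattened feature vector
```

For a 14x14 input and a 3x3 kernel, two conv+pool rounds shrink the map to 2x2, so the flattened feature has length 4.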
It should also be noted that in this application the image features of the current text image can be extracted not only by a convolutional neural network algorithm but also by a recurrent neural network algorithm or by a scale-invariant feature transform algorithm; these two algorithms for extracting the image features of the current text image are not described in detail here.
S102: When the current text is identified as a target text, receive the user's follow-read operation.
In this application, after the current text is identified as a target text, the user can be prompted that the current text supports follow-read mode. The user can input or click a follow-read operation in the system shown in Fig. 2 according to the actual learning situation. For example, if the user currently wants to learn the language content on page three of the target text (that is, the current text), the user can turn the target text to page three and input or click the follow-read operation in the system shown in Fig. 2. After the system receives the user's follow-read operation, it can execute step S103. If the current text is identified as a non-target text, the user is prompted that the current text does not support follow-read mode.
It should be noted that follow-read mode means that the system plays the speech of the current text and the user repeats it.
S103: Determine the first follow-read audio to be played that corresponds to the follow-read operation.
Further, after receiving the user's follow-read operation, the system shown in Fig. 2 needs to determine what the follow-read audio for the text page the user has currently turned to is. Therefore, in this application, the first follow-read audio to be played that corresponds to the follow-read operation needs to be determined.
It should be noted that, to better distinguish the follow-read audio played by the system from the follow-read audio the user repeats after the system plays it, in this application the follow-read audio played by the system is defined as the first follow-read audio, and the follow-read audio the user repeats after the system plays it is defined as the second follow-read audio.
Further, the application provides a specific embodiment of determining the first follow-read audio to be played that corresponds to the follow-read operation, as follows:
The image of the follow-read text page in the target text is acquired, and the image features of the follow-read text page image are extracted. According to those image features, the page number corresponding to the follow-read text page is determined in the page-number feature database. According to that page number, the sentence markup text corresponding to the page number is determined in the sentence markup text database. The first follow-read audio is obtained according to the page number and its corresponding sentence markup text, and the obtained first follow-read audio is determined as the first follow-read audio to be played that corresponds to the follow-read operation.
It should be noted that in this application the follow-read text page is a certain page in the target text, namely the page the user has currently turned to and wishes to study. The image features of the follow-read text page image are extracted in the same way as the image features of the current text image described above. The page-number feature database stores the page numbers contained in the supported texts. The sentence markup text database stores sentence markup texts. During follow-reading, a one-sentence pattern is usually used, that is, one sentence is played and the user repeats that sentence; the sentence markup text therefore records the points in time at which each sentence appears in an audio file. The sentence markup text can be defined in many formats according to the actual situation; the application gives one definition format, shown in Table 1:
Time of occurrence                 Sentence
00:00:00.000 --> 00:00:10.000      Good morning
00:00:11.000 --> 00:00:15.000      Nice to meet you
00:00:16.000 --> 00:00:25.000      I’m very happy to meet you
Table 1
Here, the left part of Table 1 indicates the time, and the right part indicates the corresponding content in the audio.
It should also be noted that during follow-reading the application can also use a multi-sentence pattern, that is, several sentences are played at once and the user repeats those sentences at once. How many sentences are used can be determined according to the situation and is not specifically limited by this application. When the multi-sentence pattern is used, the sentence markup text likewise records the overall points in time at which each group of sentences appears in the audio, as shown in Table 2:
Time of occurrence                 Sentence
00:00:00.000 --> 00:00:15.000      Good morning, nice to meet you
00:00:16.000 --> 00:00:25.000      I’m very happy to meet you, thank you
Table 2
In addition, the first follow-read audio is obtained according to the page number and its corresponding sentence markup text. Specifically, a playback controller can use the page number to find the first follow-read audio corresponding to each sentence on that page and then, from the time of occurrence of each sentence in the sentence markup text, determine the first follow-read audio corresponding to the sentence that currently needs to be played.
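The sentence markup text of Table 1 can be parsed along the following lines; the format follows the table above, while the function names and the tolerance for spacing around "-->" are assumptions.

```python
import re

# One markup line: start time --> end time, then the sentence.
_LINE = re.compile(
    r"(\d+):(\d+):(\d+)\.(\d+)\s*-->\s*(\d+):(\d+):(\d+)\.(\d+)\s+(.*)")

def _ms(h, m, s, ms):
    """Convert HH:MM:SS.mmm components to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_markup(markup):
    """Return a list of (start_ms, end_ms, sentence) entries."""
    entries = []
    for line in markup.strip().splitlines():
        g = _LINE.match(line.strip())
        if g:
            entries.append((_ms(*g.groups()[0:4]),
                            _ms(*g.groups()[4:8]),
                            g.group(9)))
    return entries

markup = """\
00:00:00.000 --> 00:00:10.000 Good morning
00:00:11.000 --> 00:00:15.000 Nice to meet you
00:00:16.000 --> 00:00:25.000 I'm very happy to meet you
"""
entries = parse_markup(markup)
# The playback controller can then cut the page's follow-read audio at
# these offsets, e.g. entries[1] spans 11 s to 15 s: "Nice to meet you".
```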
S104:First is played with pronunciation frequency.
S105:Acquire user according to described first with pronunciation frequency re-reading second with pronunciation frequency.
S106:It is assessed with pronunciation frequency the second of user.
Further, in order to let the user clearly grasp his or her own language learning situation, in this application, after the second follow-read audio produced by the user repeating the first follow-read audio has been captured, the user's second follow-read audio needs to be assessed.
Further, the application provides a specific embodiment of assessing the user's second follow-read audio, as follows:
Each word audio in the user's second follow-read audio is extracted in turn. For each word audio, in the order of extraction, each phoneme audio in the word audio is extracted in turn. For each phoneme audio, in the order of extraction, the standard phoneme audio corresponding to the phoneme audio is determined, the phoneme audio is compared with the standard phoneme audio, and a first score value of the phoneme audio is determined. For any word audio, the sum of the first score values of all phoneme audios contained in that word audio is taken as its second score value. The sum of the second score values of all word audios contained in the user's second follow-read audio is taken as the assessment value of the user's second follow-read audio, and the user's second follow-read audio is assessed according to that assessment value.
For example, suppose the first follow-read audio currently obtained according to the page number and its corresponding sentence markup text is "good luck". The second follow-read audio captured as the user repeats the first follow-read audio is then an audio of "good luck". Each word audio in the user's second follow-read audio is extracted in turn: the audio of "good" and the audio of "luck". For the audio of "good", the phoneme audios extracted in turn are the audio of g, the audio of u, and the audio of d; for the audio of "luck", the phoneme audios extracted in turn are the audio of l, the audio of Λ, and the audio of k. For each of these phoneme audios, the corresponding standard phoneme audio is determined and the phoneme audio is compared with it to determine a first score value: the audio of g is compared with the standard audio of g, the audio of u with the standard audio of u, the audio of d with the standard audio of d, the audio of l with the standard audio of l, the audio of Λ with the standard audio of Λ, and the audio of k with the standard audio of k. For the audio of "good", the sum of the first score values of the audios of g, u, and d that it contains is taken as its second score value; for the audio of "luck", the sum of the first score values of the audios of l, Λ, and k that it contains is taken as its second score value. Finally, the sum of the second score values of the audio of "good" and the audio of "luck" contained in the user's second follow-read audio is taken as the assessment value of the user's second follow-read audio.
In addition, it should be noted that the application provides a specific embodiment of determining the standard phoneme audio corresponding to a phoneme audio, as follows:
According to the order of extraction of the word audio containing the phoneme audio, the word corresponding to the word audio is extracted from the sentence markup text; according to the order of extraction of the phoneme audio, the word phonetic symbol corresponding to the phoneme audio is extracted from the extracted word; and according to the extracted word phonetic symbol, the standard phoneme audio corresponding to that word phonetic symbol is determined in the standard phoneme audio database.
For example, continuing the previous case: for the audio of g, the sentence "good luck" corresponding to the current first follow-read audio is first determined in the sentence markup text. According to the order of extraction of the word audio containing the audio of g (that is, the audio of "good", which was extracted first from the second follow-read audio), the word corresponding to that word audio is extracted from the sentence markup text: the first word, "good", of the sentence "good luck" corresponding to the current first follow-read audio; the word "good" is the word corresponding to the audio of "good" containing the audio of g. According to the order of extraction of the audio of g (that is, it was extracted first from the audio of "good"), the word phonetic symbol corresponding to the audio of g is extracted from the extracted word "good": its first word phonetic symbol, g. Finally, according to the extracted word phonetic symbol g, the standard phoneme audio corresponding to that word phonetic symbol is determined in the standard phoneme audio database.
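The index-based lookup in this example can be sketched as follows, assuming the sentence, the word phonetic symbols, and the standard phoneme audio database are available; all spellings and file names are illustrative assumptions.

```python
# Sentence from the sentence markup text, its per-word phonetic symbols,
# and a hypothetical standard phoneme audio database keyed by symbol.
SENTENCE = "good luck"
WORD_PHONETICS = {"good": ["g", "u", "d"], "luck": ["l", "Λ", "k"]}
STANDARD_PHONEME_AUDIO_DB = {
    "g": "std/g.wav", "u": "std/u.wav", "d": "std/d.wav",
    "l": "std/l.wav", "Λ": "std/lambda.wav", "k": "std/k.wav",
}

def standard_phoneme_audio(word_index, phoneme_index):
    """Map (word extraction order, phoneme extraction order) to the
    standard phoneme audio for the matching word phonetic symbol."""
    word = SENTENCE.split()[word_index]          # e.g. 0 -> "good"
    symbol = WORD_PHONETICS[word][phoneme_index]  # e.g. 0 -> "g"
    return STANDARD_PHONEME_AUDIO_DB[symbol]
```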
Further, when the present application assesses the user's second follow-read audio according to its assessed value, it may specifically proceed as follows:
judging whether the assessed value of the user's second follow-read audio exceeds a preset threshold;
if so, the user's second follow-read audio is qualified, and the user is prompted accordingly;
if not, the user's second follow-read audio is not qualified, the second follow-read audio is replayed, and the user's follow-read audio is assessed again, until it qualifies.
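The assess-and-retry cycle above can be sketched as a simple loop. The threshold value and the capture/score/play callables are hypothetical stand-ins; the patent does not specify them:

```python
THRESHOLD = 60.0  # assumed pass mark; the patent leaves the value unspecified

def assess_follow_read(capture_audio, score_audio, play_reference,
                       threshold=THRESHOLD):
    """Capture-score loop: repeat until the assessed value exceeds the threshold."""
    while True:
        audio = capture_audio()      # microphone: user's follow-read audio
        value = score_audio(audio)   # cloud server: assessed value
        if value > threshold:
            return value             # qualified: prompt the user and stop
        play_reference()             # not qualified: replay the audio and retry

# First attempt scores 40 (fails), second scores 70 (passes).
attempts = iter([40.0, 70.0])
result = assess_follow_read(lambda: None, lambda _a: next(attempts), lambda: None)
print(result)  # 70.0
```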
Through the above method, the user's follow-reading can be assessed, enabling the user to effectively grasp his or her own language-learning situation during follow-reading and improving the efficiency with which the user learns the language.
The above is the user language assessment method provided by the embodiments of the present application. Based on the same idea, the embodiments of the present application also provide a user language assessment system, which, as shown in Fig. 2, includes:
a central processing unit 201, configured to identify whether a current text is a target text;
an image feedback device 202, configured to receive a user's follow-read operation when the central processing unit 201 identifies the current text as the target text;
the central processing unit 201, further configured to determine a to-be-played first follow-read audio corresponding to execution of the follow-read operation;
a loudspeaker 203, configured to play the first follow-read audio;
a microphone 204, configured to collect a second follow-read audio in which the user repeats the first follow-read audio;
a cloud server 205, configured to assess the user's second follow-read audio.
The system further includes:
a camera 206, configured to scan a current text image.
The central processing unit 201 is specifically configured to: obtain the current text image, extract image features of the current text image, and search a pre-stored image feature base for the extracted image features; if the features are found, identify the current text as the target text; if not, identify the current text as a non-target text.
The central processing unit 201 is further configured to extract the image features of the current text image by a convolutional neural network algorithm, by a recurrent neural network algorithm, or by a scale-invariant feature transform (SIFT) algorithm.
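The "extract features, search the pre-stored feature base" flow can be sketched as follows. To keep the sketch self-contained, the "feature" is a toy binary-intensity signature rather than actual CNN/RNN/SIFT features; the image data and text identifiers are hypothetical:

```python
# Toy feature extractor: threshold each pixel against the image mean to get a
# binary signature that can serve as a lookup key in the feature base.
def extract_feature(image):
    """image: 2-D list of grayscale values -> hashable binary signature."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p >= mean else 0 for p in flat)

FEATURE_BASE = {}  # pre-stored image feature base: feature -> text identifier

def identify(image):
    """Return the matching target-text id, or None for a non-target text."""
    return FEATURE_BASE.get(extract_feature(image))

# Register one known book cover, then classify two images.
cover = [[10, 200], [220, 30]]
FEATURE_BASE[extract_feature(cover)] = "target_picture_book"
print(identify(cover))             # matches -> "target_picture_book"
print(identify([[0, 0], [0, 0]]))  # no match -> None (non-target text)
```

In a real system the CNN/RNN/SIFT features would be high-dimensional vectors matched by nearest-neighbor search rather than exact dictionary lookup, but the found/not-found branching is the same.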
The system further includes:
the camera 206, further configured to scan an image of a to-be-read text page in the target text.
The central processing unit 201 is specifically configured to: obtain the image of the to-be-read text page in the target text; extract image features of the to-be-read text page image; determine, according to those image features, the page number corresponding to the to-be-read text page in a page-number feature database; determine, according to that page number, the corresponding sentence text in a sentence text database; obtain the first follow-read audio according to the page number and its corresponding sentence text; and take the obtained first follow-read audio as the to-be-played first follow-read audio corresponding to execution of the follow-read operation.
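The lookup chain described above (page feature → page number → sentence text → first follow-read audio) can be sketched with three tables. All keys and file names are hypothetical stand-ins:

```python
# Three databases from the description, modeled as dictionaries.
PAGE_FEATURE_DB = {"feat_p3": 3}                    # page-number feature database
SENTENCE_TEXT_DB = {3: "good luck"}                 # sentence text database
AUDIO_DB = {(3, "good luck"): "page3_good_luck.wav"}  # audio keyed by page + sentence

def first_follow_read_audio(page_feature: str) -> str:
    """Resolve a scanned page's feature to the first follow-read audio to play."""
    page = PAGE_FEATURE_DB[page_feature]   # image feature -> page number
    sentence = SENTENCE_TEXT_DB[page]      # page number -> sentence text
    return AUDIO_DB[(page, sentence)]      # page + sentence -> audio

print(first_follow_read_audio("feat_p3"))  # page3_good_luck.wav
```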
The cloud server 205 is specifically configured to: extract, in turn, each word audio in the user's second follow-read audio; for each word audio, in the order of extraction of the word audios, extract in turn each phoneme audio in that word audio; for each phoneme audio, in the order of extraction of the phoneme audios, determine the standard phoneme audio corresponding to that phoneme audio, compare the phoneme audio with the standard phoneme audio, and determine a first score value of the phoneme audio; for any word audio, take the sum of the first score values of all phoneme audios contained in the word audio as a second score value of that word audio; take the sum of the second score values of all word audios contained in the user's second follow-read audio as the assessed value of the user's second follow-read audio; and assess the user's second follow-read audio according to that assessed value.
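The hierarchical scoring can be sketched as follows: per-phoneme first score values sum into a word's second score value, and word scores sum into the assessed value. The per-phoneme comparison is stubbed out with a hypothetical similarity function, since the patent does not specify how the two audios are compared:

```python
def phoneme_score(phoneme_audio, standard_audio) -> float:
    """First score value: stand-in for the audio comparison. A real system
    would compare acoustic features of the two recordings."""
    return 10.0 if phoneme_audio == standard_audio else 0.0

def word_score(phonemes, standards) -> float:
    """Second score value: sum of the word's per-phoneme first score values."""
    return sum(phoneme_score(p, s) for p, s in zip(phonemes, standards))

def assessment_value(words) -> float:
    """Assessed value: sum of the second score values of all word audios."""
    return sum(word_score(p, s) for p, s in words)

words = [(["g", "u", "d"], ["g", "ʊ", "d"]),   # "good": 2 of 3 phonemes match
         (["l", "ʌ", "k"], ["l", "ʌ", "k"])]   # "luck": all 3 match
print(assessment_value(words))  # 20.0 + 30.0 = 50.0
```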
The cloud server 205 is further configured to: extract, according to the extraction order of the word audio containing the phoneme audio, the word corresponding to that word audio from the sentence text; extract, according to the extraction order of the phoneme audio, the word phonetic symbol corresponding to the phoneme audio from the word so obtained; and determine, according to the extracted word phonetic symbol, the standard phoneme audio corresponding to that phonetic symbol in the standard phoneme audio database.
The cloud server 205 is specifically configured to judge whether the assessed value of the user's second follow-read audio exceeds a preset threshold; if so, the user's second follow-read audio is qualified, and the user is prompted through the loudspeaker 203 and the image feedback device 202; if not, the user's second follow-read audio is not qualified, and the second follow-read audio is replayed through the loudspeaker 203.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The foregoing is merely the embodiments of the present application and is not intended to limit the application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall be included within the scope of the claims of the present application.

Claims (14)

1. A user language assessment method, characterized by comprising:
identifying whether a current text is a target text;
when the current text is identified as the target text, receiving a user's follow-read operation;
determining a to-be-played first follow-read audio corresponding to execution of the follow-read operation;
playing the first follow-read audio;
collecting a second follow-read audio in which the user repeats the first follow-read audio;
assessing the user's second follow-read audio.
2. The method according to claim 1, characterized in that identifying whether the current text is the target text specifically comprises:
obtaining a current text image;
extracting image features of the current text image;
searching a pre-stored image feature base for the extracted image features;
if found, identifying the current text as the target text;
if not, identifying the current text as a non-target text.
3. The method according to claim 2, characterized in that extracting the image features of the current text image specifically comprises:
extracting the image features of the current text image by a convolutional neural network algorithm; or
extracting the image features of the current text image by a recurrent neural network algorithm; or
extracting the image features of the current text image by a scale-invariant feature transform algorithm.
4. The method according to claim 1, characterized in that determining the to-be-played first follow-read audio corresponding to execution of the follow-read operation specifically comprises:
obtaining an image of a to-be-read text page in the target text;
extracting image features of the to-be-read text page image;
determining, according to the image features of the to-be-read text page image, the page number corresponding to the to-be-read text page in a page-number feature database;
determining, according to the page number corresponding to the to-be-read text page, the sentence text corresponding to the page number in a sentence text database;
obtaining the first follow-read audio according to the page number and the sentence text corresponding to the page number;
taking the obtained first follow-read audio as the to-be-played first follow-read audio corresponding to execution of the follow-read operation.
5. The method according to claim 4, characterized in that assessing the user's second follow-read audio specifically comprises:
extracting, in turn, each word audio in the user's second follow-read audio;
for each word audio, in the order of extraction of the word audios, extracting in turn each phoneme audio in the word audio;
for each phoneme audio, in the order of extraction of the phoneme audios, determining the standard phoneme audio corresponding to the phoneme audio, comparing the phoneme audio with the standard phoneme audio, and determining a first score value of the phoneme audio;
for any word audio, taking the sum of the first score values of all phoneme audios contained in the word audio as a second score value of the word audio;
taking the sum of the second score values of all word audios contained in the user's second follow-read audio as an assessed value of the user's second follow-read audio;
assessing the user's second follow-read audio according to the assessed value of the user's second follow-read audio.
6. The method according to claim 5, characterized in that determining the standard phoneme audio corresponding to the phoneme audio specifically comprises:
extracting, according to the extraction order of the word audio containing the phoneme audio, the word corresponding to the word audio from the sentence text;
extracting, according to the extraction order of the phoneme audio, the word phonetic symbol corresponding to the phoneme audio from the extracted word corresponding to the word audio;
determining, according to the extracted word phonetic symbol corresponding to the phoneme audio, the standard phoneme audio corresponding to the word phonetic symbol in a standard phoneme audio database.
7. The method according to claim 5, characterized in that assessing the user's second follow-read audio according to the assessed value of the user's second follow-read audio specifically comprises:
judging whether the assessed value of the user's second follow-read audio exceeds a preset threshold;
if so, the user's second follow-read audio is qualified, and the user is prompted;
if not, the user's second follow-read audio is not qualified, and the second follow-read audio is replayed.
8. A user language assessment system, characterized by comprising:
a central processing unit, configured to identify whether a current text is a target text;
an image feedback device, configured to receive a user's follow-read operation when the central processing unit identifies the current text as the target text;
the central processing unit, further configured to determine a to-be-played first follow-read audio corresponding to execution of the follow-read operation;
a loudspeaker, configured to play the first follow-read audio;
a microphone, configured to collect a second follow-read audio in which the user repeats the first follow-read audio;
a cloud server, configured to assess the user's second follow-read audio.
9. The system according to claim 8, characterized in that the system further comprises:
a camera, configured to scan a current text image;
wherein the central processing unit is specifically configured to: obtain the current text image, extract image features of the current text image, and search a pre-stored image feature base for the extracted image features; if found, identify the current text as the target text; if not, identify the current text as a non-target text.
10. The system according to claim 9, characterized in that the central processing unit is further configured to extract the image features of the current text image by a convolutional neural network algorithm, by a recurrent neural network algorithm, or by a scale-invariant feature transform algorithm.
11. The system according to claim 8, characterized in that the system further comprises:
a camera, configured to scan an image of a to-be-read text page in the target text;
wherein the central processing unit is specifically configured to: obtain the image of the to-be-read text page in the target text; extract image features of the to-be-read text page image; determine, according to the image features of the to-be-read text page image, the page number corresponding to the to-be-read text page in a page-number feature database; determine, according to the page number corresponding to the to-be-read text page, the sentence text corresponding to the page number in a sentence text database; obtain the first follow-read audio according to the page number and the sentence text corresponding to the page number; and take the obtained first follow-read audio as the to-be-played first follow-read audio corresponding to execution of the follow-read operation.
12. The system according to claim 11, characterized in that the cloud server is specifically configured to: extract, in turn, each word audio in the user's second follow-read audio; for each word audio, in the order of extraction of the word audios, extract in turn each phoneme audio in the word audio; for each phoneme audio, in the order of extraction of the phoneme audios, determine the standard phoneme audio corresponding to the phoneme audio, compare the phoneme audio with the standard phoneme audio, and determine a first score value of the phoneme audio; for any word audio, take the sum of the first score values of all phoneme audios contained in the word audio as a second score value of the word audio; take the sum of the second score values of all word audios contained in the user's second follow-read audio as an assessed value of the user's second follow-read audio; and assess the user's second follow-read audio according to the assessed value.
13. The system according to claim 12, characterized in that the cloud server is further configured to: extract, according to the extraction order of the word audio containing the phoneme audio, the word corresponding to the word audio from the sentence text; extract, according to the extraction order of the phoneme audio, the word phonetic symbol corresponding to the phoneme audio from the extracted word corresponding to the word audio; and determine, according to the extracted word phonetic symbol corresponding to the phoneme audio, the standard phoneme audio corresponding to the word phonetic symbol in a standard phoneme audio database.
14. The system according to claim 12, characterized in that the cloud server is specifically configured to: judge whether the assessed value of the user's second follow-read audio exceeds a preset threshold; if so, the user's second follow-read audio is qualified, and the user is prompted through the loudspeaker and the image feedback device; if not, the user's second follow-read audio is not qualified, and the second follow-read audio is replayed through the loudspeaker.
CN201810036799.8A 2018-01-15 2018-01-15 A user language assessment method and system Pending CN108257615A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810036799.8A CN108257615A (en) 2018-01-15 2018-01-15 A user language assessment method and system


Publications (1)

Publication Number Publication Date
CN108257615A true CN108257615A (en) 2018-07-06

Family

ID=62726503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810036799.8A Pending CN108257615A (en) A user language assessment method and system

Country Status (1)

Country Link
CN (1) CN108257615A (en)


Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998014934A1 (en) * 1996-10-02 1998-04-09 Sri International Method and system for automatic text-independent grading of pronunciation for language instruction
CN1510590A (en) * 2002-12-24 2004-07-07 英业达股份有限公司 Language learning system and method with visual prompting to pronunciaton
CN2768109Y (en) * 2004-02-18 2006-03-29 陈德卫 Portable learning machine
CN102163428A (en) * 2011-01-19 2011-08-24 无敌科技(西安)有限公司 Method for judging Chinese pronunciation
CN202084197U (en) * 2011-04-26 2011-12-21 黄瀚磊 Language pronunciation sense capacity index assessment device
CN103761893A (en) * 2013-01-25 2014-04-30 陈旭 Book reader
CN103782342A (en) * 2011-07-26 2014-05-07 布克查克控股有限公司 Soundtrack for electronic text
CN104239392A (en) * 2013-06-17 2014-12-24 雅马哈株式会社 User bookmarks by touching the display of a music score while recording
CN104750791A (en) * 2015-03-12 2015-07-01 百度在线网络技术(北京)有限公司 Image retrieval method and device
CN104835366A (en) * 2015-05-07 2015-08-12 肖思俐 Chinese language studying system
CN105070118A (en) * 2015-07-30 2015-11-18 广东小天才科技有限公司 Pronunciation correcting method and device for language learning
CN105335421A (en) * 2014-08-06 2016-02-17 阿里巴巴集团控股有限公司 Method and apparatus for making and displaying e-book expansion content
CN105788377A (en) * 2016-04-08 2016-07-20 苏州润云网络科技有限公司 Independent learning machine for ancient poems
CN107274731A (en) * 2017-06-05 2017-10-20 深圳市科迈爱康科技有限公司 Towards children's study method, facility for study and storage medium


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726300A (en) * 2018-12-29 2019-05-07 北京金山安全软件有限公司 Multimedia data processing method and device
CN111723606A (en) * 2019-03-19 2020-09-29 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN109979482A (en) * 2019-05-21 2019-07-05 科大讯飞股份有限公司 A kind of evaluating method and device for audio
CN109979482B (en) * 2019-05-21 2021-12-07 科大讯飞股份有限公司 Audio evaluation method and device
CN110210462A (en) * 2019-07-02 2019-09-06 北京工业大学 A kind of bionical hippocampus cognitive map construction method based on convolutional neural networks
CN111128237A (en) * 2019-12-26 2020-05-08 北京大米未来科技有限公司 Voice evaluation method and device, storage medium and electronic equipment
CN111128237B (en) * 2019-12-26 2022-12-30 北京大米未来科技有限公司 Voice evaluation method and device, storage medium and electronic equipment
WO2021197296A1 (en) * 2020-01-19 2021-10-07 托普朗宁(北京)教育科技有限公司 Assisted reading method and apparatus, and storage medium and electronic device
CN111459449A (en) * 2020-01-19 2020-07-28 托普朗宁(北京)教育科技有限公司 Reading assisting method and device, storage medium and electronic equipment
CN111462546A (en) * 2020-04-03 2020-07-28 北京儒博科技有限公司 Voice teaching method, device, equipment and storage medium
CN111613244A (en) * 2020-05-20 2020-09-01 北京搜狗科技发展有限公司 Scanning and reading-following processing method and related device
CN112230875A (en) * 2020-10-13 2021-01-15 华南师范大学 Artificial intelligence following reading method and following reading robot
CN112614510A (en) * 2020-12-23 2021-04-06 北京猿力未来科技有限公司 Audio quality evaluation method and device
CN112614510B (en) * 2020-12-23 2024-04-30 北京猿力未来科技有限公司 Audio quality assessment method and device
CN115273898A (en) * 2022-08-16 2022-11-01 安徽淘云科技股份有限公司 Pronunciation training method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100000 Room D529, No. 501, Floor 5, Building 2, Fourth District, Wangjing Dongyuan, Chaoyang District, Beijing

Applicant after: Beijing Wuling Technology Co.,Ltd.

Address before: 100102 room 3602, 36 / F, building 101, building 13, District 4, Wangjing East Garden, Chaoyang District, Beijing

Applicant before: BEIJING LING TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20221227

Address after: 100000 Room 815, Floor 8, Building 6, Yard 33, Guangshun North Street, Chaoyang District, Beijing

Applicant after: Luka (Beijing) Intelligent Technology Co.,Ltd.

Address before: 100000 Room D529, No. 501, Floor 5, Building 2, Fourth District, Wangjing Dongyuan, Chaoyang District, Beijing

Applicant before: Beijing Wuling Technology Co.,Ltd.
