CN103366742A - Voice input method and system - Google Patents

Voice input method and system

Info

Publication number
CN103366742A
CN103366742A CN2012101013029A CN201210101302A
Authority
CN
China
Prior art keywords
syllable
candidate
word
text
candidate word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101013029A
Other languages
Chinese (zh)
Other versions
CN103366742B (en)
Inventor
李曜
许东星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI GEAK ELECTRONICS Co.,Ltd.
Original Assignee
Shengle Information Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shengle Information Technology Shanghai Co Ltd
Priority to CN201210101302.9A priority Critical patent/CN103366742B/en
Publication of CN103366742A publication Critical patent/CN103366742A/en
Application granted granted Critical
Publication of CN103366742B publication Critical patent/CN103366742B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a voice input method and system. The method includes: while recording, segmenting the input voice into voice segments and generating a text for each voice segment; and displaying the text of each voice segment in order and correcting the text of each voice segment in order according to the user's selections. The voice input method and system automatically segment the speech recognition result and return it paragraph by paragraph for a second confirmation by the user, so that the user can correct and confirm the returned text while still recording.

Description

Voice input method and system
Technical field
The invention belongs to the field of speech recognition, and in particular relates to a voice input method and system.
Background technology
With the progress of speech recognition technology and the rise of cloud computing, a scheme in which speech is input on a mobile terminal, transcribed to text by a cloud server, and the text returned to the mobile terminal has become a trend. Because of the size restrictions of mobile terminals, entering text directly through a physical or virtual keyboard is often unsatisfactory, and it can be predicted that voice input will replace keystroke input in more and more places.
However, the fact that speech recognition accuracy cannot reach 100% has hindered voice input from completely replacing keystroke input. In fact, owing to the complexity of real speech under the various conditions of daily life, recognition accuracy may never reach 100%; especially in noisy environments, the recognition result is bound to contain errors. That is to say, the result of speech recognition necessarily requires a process of second confirmation. An existing voice input scheme works as follows: after the record button is pressed, an interface indicating that recording is in progress, as shown in Fig. 1, pops up on the mobile terminal; the user then speaks; after the user finishes, the recognized text is displayed in a text input box 21 on an interface as shown in Fig. 2; if the text in the text input box 21 contains recognition errors, the user calls up a keyboard 22 to correct it, then confirms and saves it. In this voice input scheme, however, the user cannot edit the recognition result during recording; only after all the speech has been input in one pass can the user correct the errors in the returned text one by one, confirm and save it, and then use the confirmed text in subsequent applications such as sending short messages, sending mail, or keeping accounts. This confirmation process is therefore usually cumbersome and unfriendly for the user.
Summary of the invention
The object of the present invention is to provide a voice input method and system that can automatically segment and recognize the input speech, so that the user can correct the text of each recognized segment while still recording.
To solve the above problem, the invention provides a voice input method, comprising:
while recording, continuously segmenting the input speech into voice segments and generating a text for each voice segment; and
displaying the text of each voice segment in order, and correcting the text of each voice segment in order according to the user's selections.
Further, in the above method, a cloud server continuously segments the input speech into voice segments and generates the text of each voice segment.
Further, in the above method, the input speech is continuously segmented into voice segments by a voice activity detection algorithm.
Further, in the above method, the step of correcting the text of each voice segment in order according to the user's selections comprises:
the user selecting, in the text of each voice segment, the content that needs to be corrected;
generating candidate words for each word in said content, the syllable of each word in said content, and candidate syllables for each word in said content; and
correcting the text of the voice segment according to the candidate word, the syllable, or the candidate syllable selected by the user.
Further, in the above method, the step of correcting the text of the voice segment according to the candidate word, the syllable, or the candidate syllable selected by the user comprises:
when the user selects a candidate word, replacing the corresponding word in said content with the selected candidate word;
when the user selects the syllable, generating candidate words corresponding to the syllable, and selecting the correct candidate word from them to replace the corresponding word in said content;
when the user selects a candidate syllable, generating candidate words corresponding to the candidate syllable, and selecting the correct candidate word from them to replace the corresponding word in said content; and
when none of the generated candidate words or candidate syllables yields the correct result, calling up an input method to modify the text.
Further, in the above method, before the step of continuously segmenting the input speech into voice segments while recording and generating the text of each voice segment, the method further comprises: monitoring the noise of the recording environment while recording and obtaining a signal-to-noise ratio (SNR).
Further, in the above method, the step of generating candidate words for each word in said content, the syllable of each word in said content, and candidate syllables for each word in said content comprises:
when the SNR is greater than a predetermined threshold, reducing the number of candidate words and candidate syllables; and
when the SNR is less than the predetermined threshold, increasing the number of candidate words and candidate syllables.
According to another aspect of the present invention, a voice input system is provided, comprising:
a segmentation module, for continuously segmenting the input speech into voice segments while recording and generating a text for each voice segment; and
a correction module, for displaying the text of each voice segment in order and correcting the text of each voice segment in order according to the user's selections.
Further, in the above system, the segmentation module is located on a cloud server.
Further, in the above system, the segmentation module continuously segments the input speech into voice segments by a voice activity detection algorithm.
Further, in the above system, the correction module comprises:
a selection unit, for obtaining the content that the user has selected for correction in the text of each voice segment;
a candidate unit, for generating candidate words for each word in said content, the syllable of each word in said content, and candidate syllables for each word in said content; and
a correction unit, for correcting the text of the voice segment according to the candidate word, the syllable, or the candidate syllable selected by the user.
Further, in the above system, the correction unit is used for: when the user selects a candidate word, replacing the corresponding word in said content with the selected candidate word; when the user selects the syllable, generating candidate words corresponding to the syllable and selecting the correct candidate word from them to replace the corresponding word in said content; when the user selects a candidate syllable, generating candidate words corresponding to the candidate syllable and selecting the correct candidate word from them to replace the corresponding word in said content; and when none of the generated candidate words or candidate syllables yields the correct result, calling up an input method to modify the text.
Further, the above system also comprises a noise monitoring unit, for monitoring the noise of the recording environment while recording and obtaining the SNR.
Further, in the above system, the candidate unit is used for reducing the number of candidate words and candidate syllables when the SNR is greater than a predetermined threshold, and increasing the number of candidate words and candidate syllables when the SNR is less than the predetermined threshold.
Compared with the prior art, the present invention continuously segments the input speech into voice segments while recording, generates a text for each voice segment, displays the texts in order, and corrects them in order according to the user's selections. The speech recognition result can thus be segmented automatically and returned paragraph by paragraph for a second confirmation by the user, so that the user can correct and confirm the returned text while still recording.
In addition, the user selects the content that needs correction in the text of each voice segment; candidate words for each word in said content, the syllable of each word, and candidate syllables for each word are then generated; and the text of the voice segment is corrected according to the candidate word, the syllable, or the candidate syllable the user selects. This lets the user quickly pick the correct characters to correct the content of the text.
In addition, by monitoring the noise of the recording environment while recording to obtain the SNR, reducing the number of candidate words and candidate syllables when the SNR is greater than a predetermined threshold, and increasing them when the SNR is less than the threshold, the number of candidate results can be adjusted according to different SNRs.
Brief description of the drawings
Fig. 1 is a schematic diagram of the recording interface of an existing voice input scheme;
Fig. 2 is a schematic diagram of the recognized-text display and correction interface of an existing voice input scheme;
Fig. 3 is a flow chart of the voice input method of Embodiment 1 of the invention;
Fig. 4 is a schematic diagram of the combined recording and recognized-text display and correction interface of Embodiment 1;
Fig. 5 is a schematic diagram of the interface of Embodiment 1 that displays and corrects the recognized text segment by segment;
Fig. 6 is a flow chart of the voice input method of Embodiment 2 of the invention;
Fig. 7 is a schematic diagram of the noise monitoring interface of Embodiment 2;
Fig. 8 is a functional block diagram of the voice input system of Embodiment 3 of the invention.
Embodiment
To make the above objects, features, and advantages of the present invention more apparent, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1
As shown in Figs. 3-5, the invention provides a voice input method, comprising:
Step S11: while recording, continuously segment the input speech into voice segments and generate a text for each voice segment. Specifically, the invention can automatically segment the speech recognition result and return it paragraph by paragraph for a second confirmation by the user. A cloud server may continuously segment the input speech into voice segments and generate the text of each voice segment, segmenting by a voice activity detection algorithm. Endpoint detection means accurately determining the starting point and ending point of speech within a signal that contains speech, distinguishing speech from non-speech signals; it is an important aspect of speech processing technology. For example, while the user inputs speech continuously, the cloud server may use an endpoint detection algorithm to cut the effective speech into segments according to the rhythm of the user's pauses, convert the segments into text one by one, and return the texts to the display interface of the mobile terminal as shown in Fig. 4, which integrates the recording interface and the recognition-result display interface into a single interface;
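The pause-based endpoint detection described in Step S11 can be illustrated with a minimal energy-based sketch. This is not the patent's actual algorithm; the frame length, energy threshold, and minimum pause length are illustrative assumptions, and a production system would use a trained voice activity detector.

```python
import numpy as np

def segment_by_energy(samples, rate, frame_ms=30, threshold=0.01, min_pause_frames=10):
    """Split an audio signal into speech segments at pauses.

    A run of at least `min_pause_frames` consecutive frames whose RMS
    energy stays below `threshold` is treated as a pause, ending the
    current segment. Returns (start, end) sample indices per segment.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    rms = np.array([
        np.sqrt(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2))
        for i in range(n_frames)
    ])
    segments, start, silence_run = [], None, 0
    for i, e in enumerate(rms):
        if e >= threshold:
            if start is None:
                start = i          # speech begins
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause_frames:
                # pause long enough: close the segment at the frame
                # where silence began
                segments.append((start * frame_len,
                                 (i - silence_run + 1) * frame_len))
                start, silence_run = None, 0
    if start is not None:          # trailing speech with no final pause
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

Each returned segment would then be sent to the recognizer independently, which is what allows the texts to come back one by one while recording continues.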
Step S12: display the text of each voice segment in order;
Step S13: correct the text of each voice segment in order according to the user's selections. Specifically, in the present invention the user can correct and confirm the returned text while still recording. It should be noted that in the interaction scheme of the invention, not all text recognition results are displayed at once; only the text recognition result of the current segment is shown on the interface, as in Fig. 5. After the user corrects and confirms recognition result 1 of voice segment 1, the next recognition result 2 is displayed. The benefit of this display scheme is that a limited number of results are shown in turn on a limited screen, letting the user concentrate on the current recognition result and improving the efficiency of correcting the text. This step can specifically comprise:
Step S131: the user selects the content that needs correction in the text of each voice segment. Specifically, when the user needs to correct some words in the text recognition result, the user can tap the specific characters in the result;
Step S132: generate candidate words for each word in said content, the syllable of each word in said content, and candidate syllables for each word in said content. Specifically, when the user taps a character in the recognition result that needs correction, several candidate words for that character, the corresponding syllable, and several candidate syllables can be arranged to pop up. In this way the speech recognition result is effectively combined with the input method: multiple candidates are offered for the user to choose from, and the recognition result is degraded from characters to syllables, enlarging the hit range, so that the user does not have to type a string of letters but can find the desired word among the candidates;
Step S133: correct the text of the voice segment according to the candidate word, the syllable, or the candidate syllable selected by the user. Specifically, while the user corrects and confirms the returned recognition result, "Cancel" and "Confirm" commands as shown in Fig. 5 can be provided, for quickly deleting and for saving the text recognition result, respectively. This step can further comprise:
Step S1331: when the user selects a candidate word, replace the corresponding word in said content with the selected candidate word. Specifically, if the correct character is among the candidate words, the user simply taps that candidate word to replace the originally misrecognized character;
Step S1332: when the user selects the syllable, generate candidate words corresponding to the syllable and select the correct candidate word from them to replace the corresponding word in said content. Specifically, if the correct character is not among the candidate words, the user can tap the correct syllable and then choose the desired word from the candidate words offered for that syllable;
Step S1333: when the user selects a candidate syllable, generate candidate words corresponding to the candidate syllable and select the correct candidate word from them to replace the corresponding word in said content. Specifically, if the correct character is not among the candidate words of the correct syllable, the user can tap a candidate syllable and then choose the desired word from the candidate words offered for that candidate syllable;
Step S1334: when none of the generated candidate words or candidate syllables yields the correct result, an input method can be called up to modify the text.
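The fallback chain of steps S1331-S1334 can be sketched as follows. The candidate table, function name, and data shapes are hypothetical, purely to show the replacement logic of candidate word → syllable → keyboard fallback:

```python
# hypothetical homophone table: pinyin syllable -> candidate characters
CANDIDATES_BY_SYLLABLE = {
    "shi": ["是", "事", "市", "式"],
    "shu": ["书", "数", "树", "输"],
}

def correct_word(text, index, choice, fallback_input=None):
    """Replace the character at `index` in `text` according to the
    user's `choice`: either a candidate word (a character, S1331) or a
    syllable whose candidate list supplies the replacement (S1332/S1333);
    `fallback_input` stands in for keyboard entry (S1334).
    """
    if choice in CANDIDATES_BY_SYLLABLE:
        # user tapped a syllable or candidate syllable: its candidate
        # words are offered; the first one stands in for the user's pick
        replacement = CANDIDATES_BY_SYLLABLE[choice][0]
    elif choice is not None:
        replacement = choice          # user tapped a candidate word directly
    elif fallback_input is not None:
        replacement = fallback_input  # no correct candidate: keyboard input
    else:
        return text                   # nothing chosen, leave text unchanged
    return text[:index] + replacement + text[index + 1:]
```

In the real system the user, not the code, picks from the popped-up list; the point is that every branch ends in the same single-character replacement.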
The present invention can display the recording interface and the returned-result interface simultaneously on the interface of the mobile terminal, so that the user can see the returned text results while recording and correct them in real time. That is, the user can speak a passage continuously, correct and confirm the returned text without stopping the recording, and then continue recording; the user can also record another person's speech while simultaneously correcting and confirming the returned recognition results.
Embodiment 2
As shown in Figs. 6 and 7, the invention provides another voice input method. The difference between this embodiment and Embodiment 1 is that a step of monitoring the noise of the recording environment while recording and obtaining the SNR is added; the number of candidate results can then be adjusted according to different SNRs, and the user can be prompted when the environment is too noisy for voice input. This example can specifically comprise:
Step S21: while recording, monitor the noise of the recording environment and obtain the SNR. Specifically, this step can automatically detect the SNR of the input speech and feed it back on the interactive interface, prompting the user when the environment is too noisy for voice input; the number of candidate results can also be adjusted according to the SNR in the subsequent step S242. Because noise strongly affects speech recognition, when the ambient noise is high the recognition accuracy drops rapidly and the number of characters the user must correct rises sharply. A noise monitoring function can therefore be added in this embodiment: based on the result of endpoint detection, for each recognition result the energy of the corresponding speech segment and the energy of the silent segment (the silent-segment energy is equivalent to the noise energy) are computed respectively, so that the SNR of that section of speech can be estimated. During recording, the pollution level of the ambient noise is displayed on an interface with a recording volume bar 71 and a noise volume bar 72, as shown in Fig. 7; after the ambient noise exceeds a certain threshold, the user can be prompted with "The current noise is too high; keyboard input is recommended";
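The SNR estimate described in Step S21, which compares speech-segment energy with silent-segment (noise) energy, can be sketched as below. The function name and the 1e-12 floor are illustrative assumptions; segment bounds are the sample indices produced by endpoint detection.

```python
import numpy as np

def estimate_snr_db(samples, speech_segments):
    """Estimate the SNR of a recording in dB, treating everything
    outside the detected speech segments as noise (the silent-segment
    energy is equivalent to the noise energy).
    """
    mask = np.zeros(len(samples), dtype=bool)
    for start, end in speech_segments:
        mask[start:end] = True
    speech_power = np.mean(samples[mask] ** 2) if mask.any() else 0.0
    noise_power = np.mean(samples[~mask] ** 2) if (~mask).any() else 0.0
    # tiny floor avoids log(0) on perfectly silent input
    return 10.0 * np.log10(max(speech_power, 1e-12) / max(noise_power, 1e-12))
```

The same two power values could also drive the recording volume bar 71 and noise volume bar 72 of Fig. 7.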
Step S22: while recording, continuously segment the input speech into voice segments and generate a text for each voice segment. Specifically, a cloud server continuously segments the input speech into voice segments by a voice activity detection algorithm and generates the text of each voice segment;
Step S23: display the text of each voice segment in order;
Step S24: correct the text of each voice segment in order according to the user's selections. This step can specifically comprise:
Step S241: the user selects the content that needs correction in the text of each voice segment;
Step S242: generate candidate words for each word in said content, the syllable of each word in said content, and candidate syllables for each word in said content, so that the user can quickly pick the correct characters to correct the content of the text. This step can further comprise:
Step S2421: when the SNR is greater than a predetermined threshold, reduce the number of candidate words and candidate syllables. Specifically, a large SNR means the speech is little polluted by noise and the recognition result is highly accurate, so the number of candidate results can be reduced appropriately;
Step S2422: when the SNR is less than the predetermined threshold, increase the number of candidate words and candidate syllables. Specifically, a small SNR means the speech is heavily polluted by noise and the recognition result is much more likely to contain errors, so the number of candidate results needs to be increased so that the user can select the correct characters from them;
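The threshold rule of steps S2421-S2422 amounts to a small mapping from SNR to candidate-list length. The threshold value and the counts below are illustrative assumptions, not values from the patent:

```python
def candidate_count(snr_db, threshold_db=15.0, base=5, low=3, high=9):
    """Choose how many candidate words/syllables to pop up: fewer when
    recognition is likely accurate (high SNR), more when noise makes
    errors likely (low SNR).
    """
    if snr_db > threshold_db:
        return low    # clean audio: a short candidate list suffices
    if snr_db < threshold_db:
        return high   # noisy audio: widen the candidate list
    return base       # exactly at threshold: keep the default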
Step S243: correct the text of the voice segment according to the candidate word, the syllable, or the candidate syllable selected by the user. This step can further comprise:
Step S2431: when the user selects a candidate word, replace the corresponding word in said content with the selected candidate word;
Step S2432: when the user selects the syllable, generate candidate words corresponding to the syllable and select the correct candidate word from them to replace the corresponding word in said content;
Step S2433: when the user selects a candidate syllable, generate candidate words corresponding to the candidate syllable and select the correct candidate word from them to replace the corresponding word in said content;
Step S2434: when none of the generated candidate words or candidate syllables yields the correct result, an input method can be called up to modify the text.
In this embodiment, multiple voice technologies such as noise monitoring, endpoint detection, and continuous speech recognition are integrated into one interaction process, letting the user fully experience the convenience of voice input and improving the user experience when mixing voice input and keystroke input.
Embodiment 3
As shown in Fig. 8, the present invention also provides a voice input system, comprising a segmentation module 41, a correction module 42, and a noise monitoring unit 43.
The segmentation module 41 is used for continuously segmenting the input speech into voice segments while recording and generating a text for each voice segment. Specifically, the segmentation module 41 is located on a cloud server and continuously segments the input speech into voice segments by a voice activity detection algorithm; this module can automatically segment the speech recognition result and return it paragraph by paragraph for a second confirmation by the user.
The correction module 42 is used for displaying the text of each voice segment in order and correcting the text of each voice segment in order according to the user's selections. Specifically, this module lets the user correct and confirm the returned text while still recording. It should be noted that in the interaction scheme of the invention, not all text recognition results are displayed at once; only the text recognition result of the current segment is shown on the interface. After the user corrects and confirms the text recognition result of one voice segment, the next recognition result is displayed. The benefit of this display scheme is that a limited number of results are shown in turn on a limited screen, letting the user concentrate on the current recognition result and improving the efficiency of correcting the text. The correction module 42 can further comprise a selection unit 421, a candidate unit 422, and a correction unit 423.
The selection unit 421 is used for obtaining the content that the user has selected for correction in the text of each voice segment.
The candidate unit 422 is used for generating candidate words for each word in said content, the syllable of each word in said content, and candidate syllables for each word in said content. Specifically, when the user taps a character in the recognition result that needs correction, several candidate words for that character, the corresponding syllable, and several candidate syllables can be arranged to pop up. In this way the speech recognition result is effectively combined with the input method: multiple candidates are offered for the user to choose from, and the recognition result is degraded from characters to syllables, enlarging the hit range, so that the user does not have to type a string of letters but can find the desired word among the candidates. In addition, the candidate unit 422 can also be used for reducing the number of candidate words and candidate syllables when the SNR is greater than a predetermined threshold (a large SNR means the speech is little polluted by noise and the recognition result is highly accurate, so the number of candidate results can be reduced appropriately), and for increasing the number of candidate words and candidate syllables when the SNR is less than the predetermined threshold (a small SNR means the speech is heavily polluted by noise and the recognition result is much more likely to contain errors, so the number of candidate results needs to be increased so that the user can select the correct characters from them).
The correction unit 423 is used for correcting the text of the voice segment according to the candidate word, the syllable, or the candidate syllable selected by the user. Specifically, the correction unit 423 is used for: when the user selects a candidate word, replacing the corresponding word in said content with the selected candidate word; when the user selects the syllable, generating candidate words corresponding to the syllable and selecting the correct candidate word from them to replace the corresponding word in said content; when the user selects a candidate syllable, generating candidate words corresponding to the candidate syllable and selecting the correct candidate word from them to replace the corresponding word in said content; and when none of the generated candidate words or candidate syllables yields the correct result, calling up an input method to modify the text.
The noise monitoring unit 43 is used for monitoring the noise of the recording environment while recording and obtaining the SNR, so that the number of candidate results can be adjusted according to different SNRs and the user can be prompted when the environment is too noisy for voice input.
The present invention continuously segments the input speech into voice segments while recording, generates a text for each voice segment, displays the texts in order, and corrects them in order according to the user's selections. The speech recognition result can thus be segmented automatically and returned paragraph by paragraph for a second confirmation by the user, so that the user can correct and confirm the returned text while still recording.
In addition, the user selects the content that needs correction in the text of each voice segment; candidate words for each word in said content, the syllable of each word, and candidate syllables for each word are then generated; and the text of the voice segment is corrected according to the candidate word, the syllable, or the candidate syllable the user selects. This lets the user quickly pick the correct characters to correct the content of the text.
In addition, by monitoring the noise of the recording environment while recording to obtain the SNR, reducing the number of candidate words and candidate syllables when the SNR is greater than a predetermined threshold, and increasing them when the SNR is less than the threshold, the number of candidate results can be adjusted according to different SNRs.
Each embodiment adopts the mode of going forward one by one to describe in this instructions, and what each embodiment stressed is and the difference of other embodiment that identical similar part is mutually referring to getting final product between each embodiment.For the disclosed system of embodiment, because corresponding with the disclosed method of embodiment, so description is fairly simple, relevant part partly illustrates referring to method and gets final product.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each particular application, but such implementations should not be regarded as exceeding the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to encompass them as well.

Claims (14)

1. A voice input method, characterized in that it comprises:
continuously segmenting the input speech into speech segments during recording and generating text for each speech segment; and
displaying the text of each speech segment in sequence, and revising the text of each speech segment in sequence according to the user's selections.
2. The voice input method of claim 1, characterized in that a cloud server continuously segments the input speech into speech segments and generates the text of each speech segment.
3. The voice input method of claim 1, characterized in that a voice activity detection algorithm continuously segments the input speech into speech segments.
4. The voice input method of claim 1, characterized in that the step of revising the text of each speech segment in sequence according to the user's selections comprises:
the user selecting the content to be revised in the text of each speech segment;
generating candidate words for each word in the content, the syllable of each word in the content, and candidate syllables for each word in the content; and
revising the text of the speech segment according to the candidate word, syllable, or candidate syllable selected by the user.
5. The voice input method of claim 4, characterized in that the step of revising the text of the speech segment according to the candidate word, syllable, or candidate syllable selected by the user comprises:
when the user selects a candidate word, replacing the corresponding word in the content with the selected candidate word;
when the user selects a syllable, generating candidate words corresponding to the syllable, and selecting the correct candidate word from among them to replace the corresponding word in the content;
when the user selects a candidate syllable, generating candidate words corresponding to the candidate syllable, and selecting the correct candidate word from among them to replace the corresponding word in the content; and
when none of the generated candidate words and candidate syllables is correct, invoking an input method to modify the text.
6. The voice input method of claim 5, characterized in that, before the step of continuously segmenting the input speech into speech segments during recording and generating the text of each speech segment, the method further comprises: monitoring the recording environment for noise during recording to obtain a signal-to-noise ratio.
7. The voice input method of claim 6, characterized in that the step of generating candidate words for each word in the content, the syllable of each word in the content, and candidate syllables for each word in the content comprises:
when the signal-to-noise ratio is greater than a predetermined threshold, reducing the numbers of candidate words and candidate syllables; and
when the signal-to-noise ratio is less than the predetermined threshold, increasing the numbers of candidate words and candidate syllables.
8. A voice input system, characterized in that it comprises:
a segmentation module, configured to continuously segment the input speech into speech segments during recording and generate the text of each speech segment; and
a correction module, configured to display the text of each speech segment in sequence and revise the text of each speech segment in sequence according to the user's selections.
9. The voice input system of claim 8, characterized in that the segmentation module is located on a cloud server.
10. The voice input system of claim 8, characterized in that the segmentation module continuously segments the input speech into speech segments by means of a voice activity detection algorithm.
11. The voice input system of claim 8, characterized in that the correction module comprises:
a selection unit, configured to obtain the content to be revised that the user selects in the text of each speech segment;
a candidate unit, configured to generate candidate words for each word in the content, the syllable of each word in the content, and candidate syllables for each word in the content; and
an amendment unit, configured to revise the text of the speech segment according to the candidate word, syllable, or candidate syllable selected by the user.
12. The voice input system of claim 11, characterized in that the amendment unit is configured to: when the user selects a candidate word, replace the corresponding word in the content with the selected candidate word; when the user selects a syllable, generate candidate words corresponding to the syllable, and select the correct candidate word from among them to replace the corresponding word in the content; when the user selects a candidate syllable, generate candidate words corresponding to the candidate syllable, and select the correct candidate word from among them to replace the corresponding word in the content; and when none of the generated candidate words and candidate syllables is correct, invoke an input method to modify the text.
13. The voice input system of claim 12, characterized in that the system further comprises a noise monitoring unit, configured to monitor the recording environment for noise during recording and obtain a signal-to-noise ratio.
14. The voice input system of claim 13, characterized in that the candidate unit is configured to reduce the numbers of candidate words and candidate syllables when the signal-to-noise ratio is greater than a predetermined threshold, and to increase the numbers of candidate words and candidate syllables when the signal-to-noise ratio is less than the predetermined threshold.
CN201210101302.9A 2012-03-31 2012-03-31 Voice input method and system Active CN103366742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210101302.9A CN103366742B (en) 2012-03-31 2012-03-31 Voice input method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210101302.9A CN103366742B (en) 2012-03-31 2012-03-31 Voice input method and system

Publications (2)

Publication Number Publication Date
CN103366742A true CN103366742A (en) 2013-10-23
CN103366742B CN103366742B (en) 2018-07-31

Family

ID=49367943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210101302.9A Active CN103366742B (en) 2012-03-31 2012-03-31 Voice input method and system

Country Status (1)

Country Link
CN (1) CN103366742B (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559880A (en) * 2013-11-08 2014-02-05 百度在线网络技术(北京)有限公司 Voice input system and voice input method
CN103871401A (en) * 2012-12-10 2014-06-18 联想(北京)有限公司 Method for voice recognition and electronic equipment
CN105206267A (en) * 2015-09-09 2015-12-30 中国科学院计算技术研究所 Voice recognition error correction method with integration of uncertain feedback and system thereof
CN105469801A (en) * 2014-09-11 2016-04-06 阿里巴巴集团控股有限公司 Input speech restoring method and device
CN105630959A (en) * 2015-12-24 2016-06-01 联想(北京)有限公司 Text information displaying method and electronic equipment
CN106331893A (en) * 2016-08-31 2017-01-11 科大讯飞股份有限公司 Real-time subtitle display method and system
CN106603381A (en) * 2016-11-24 2017-04-26 北京小米移动软件有限公司 Chat information processing method and device
CN106710597A (en) * 2017-01-04 2017-05-24 广东小天才科技有限公司 Voice data recording method and device
CN107068145A (en) * 2016-12-30 2017-08-18 中南大学 Speech evaluating method and system
CN107230478A (en) * 2017-05-03 2017-10-03 上海斐讯数据通信技术有限公司 A kind of voice information processing method and system
CN107644646A (en) * 2017-09-27 2018-01-30 北京搜狗科技发展有限公司 Method of speech processing, device and the device for speech processes
CN107679032A (en) * 2017-09-04 2018-02-09 百度在线网络技术(北京)有限公司 Voice changes error correction method and device
CN108039173A (en) * 2017-12-20 2018-05-15 深圳安泰创新科技股份有限公司 Voice messaging input method, mobile terminal, system and readable storage medium storing program for executing
CN108320747A (en) * 2018-02-08 2018-07-24 广东美的厨房电器制造有限公司 Appliances equipment control method, equipment, terminal and computer readable storage medium
CN108600773A (en) * 2018-04-25 2018-09-28 腾讯科技(深圳)有限公司 Caption data method for pushing, subtitle methods of exhibiting, device, equipment and medium
CN108632465A (en) * 2018-04-27 2018-10-09 维沃移动通信有限公司 A kind of method and mobile terminal of voice input
CN108737634A (en) * 2018-02-26 2018-11-02 珠海市魅族科技有限公司 Pronunciation inputting method and device, computer installation and computer readable storage medium
CN109471537A (en) * 2017-09-08 2019-03-15 腾讯科技(深圳)有限公司 Pronunciation inputting method, device, computer equipment and storage medium
CN109739425A (en) * 2018-04-19 2019-05-10 北京字节跳动网络技术有限公司 A kind of dummy keyboard, pronunciation inputting method, device and electronic equipment
CN110347996A (en) * 2019-07-15 2019-10-18 北京百度网讯科技有限公司 Amending method, device, electronic equipment and the storage medium of text
CN110491370A (en) * 2019-07-15 2019-11-22 北京大米科技有限公司 A kind of voice stream recognition method, device, storage medium and server
CN110600039A (en) * 2019-09-27 2019-12-20 百度在线网络技术(北京)有限公司 Speaker attribute determination method and device, electronic equipment and readable storage medium
CN111326144A (en) * 2020-02-28 2020-06-23 网易(杭州)网络有限公司 Voice data processing method, device, medium and computing equipment
CN112151072A (en) * 2020-08-21 2020-12-29 北京搜狗科技发展有限公司 Voice processing method, apparatus and medium
CN114120992A (en) * 2020-09-01 2022-03-01 北京字节跳动网络技术有限公司 Method and device for generating video through voice, electronic equipment and computer readable medium
CN117251556A (en) * 2023-11-17 2023-12-19 北京遥领医疗科技有限公司 Patient screening system and method in registration queue

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1181574A (en) * 1996-10-31 1998-05-13 微软公司 Method and system for selecting recognized words when correcting recognized speech
JP2000259178A (en) * 1999-03-08 2000-09-22 Fujitsu Ten Ltd Speech recognition device
JP2002156996A (en) * 2000-11-16 2002-05-31 Toshiba Corp Voice recognition device, recognition result correcting method, and recording medium
CN1901041A (en) * 2005-07-22 2007-01-24 康佳集团股份有限公司 Voice dictionary forming method and voice identifying system and its method
US7310602B2 (en) * 2004-09-27 2007-12-18 Kabushiki Kaisha Equos Research Navigation apparatus
CN101131636A (en) * 2006-08-18 2008-02-27 李颖 On-line voice or Pinyin input method
CN101593076A (en) * 2008-05-28 2009-12-02 Lg电子株式会社 Portable terminal and the method that is used to revise its text
JP2010231433A (en) * 2009-03-26 2010-10-14 Fujitsu Ten Ltd Retrieval device
CN102122506A (en) * 2011-03-08 2011-07-13 天脉聚源(北京)传媒科技有限公司 Method for recognizing voice
CN102215233A (en) * 2011-06-07 2011-10-12 盛乐信息技术(上海)有限公司 Information system client and information publishing and acquisition methods
CN102299934A (en) * 2010-06-23 2011-12-28 上海博路信息技术有限公司 Voice input method based on cloud mode and voice recognition
CN102779511A (en) * 2011-05-12 2012-11-14 Nhn株式会社 Speech recognition system and method based on word-level candidate generation


Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103871401B (en) * 2012-12-10 2016-12-28 联想(北京)有限公司 A kind of method of speech recognition and electronic equipment
CN103871401A (en) * 2012-12-10 2014-06-18 联想(北京)有限公司 Method for voice recognition and electronic equipment
US10068570B2 (en) 2012-12-10 2018-09-04 Beijing Lenovo Software Ltd Method of voice recognition and electronic apparatus
CN103559880A (en) * 2013-11-08 2014-02-05 百度在线网络技术(北京)有限公司 Voice input system and voice input method
CN103559880B (en) * 2013-11-08 2015-12-30 百度在线网络技术(北京)有限公司 Voice entry system and method
CN105469801A (en) * 2014-09-11 2016-04-06 阿里巴巴集团控股有限公司 Input speech restoring method and device
CN105469801B (en) * 2014-09-11 2019-07-12 阿里巴巴集团控股有限公司 A kind of method and device thereof for repairing input voice
CN105206267B (en) * 2015-09-09 2019-04-02 中国科学院计算技术研究所 A kind of the speech recognition errors modification method and system of fusion uncertainty feedback
CN105206267A (en) * 2015-09-09 2015-12-30 中国科学院计算技术研究所 Voice recognition error correction method with integration of uncertain feedback and system thereof
CN105630959A (en) * 2015-12-24 2016-06-01 联想(北京)有限公司 Text information displaying method and electronic equipment
CN106331893A (en) * 2016-08-31 2017-01-11 科大讯飞股份有限公司 Real-time subtitle display method and system
CN106331893B (en) * 2016-08-31 2019-09-03 科大讯飞股份有限公司 Real-time caption presentation method and system
CN106603381A (en) * 2016-11-24 2017-04-26 北京小米移动软件有限公司 Chat information processing method and device
CN107068145A (en) * 2016-12-30 2017-08-18 中南大学 Speech evaluating method and system
CN106710597A (en) * 2017-01-04 2017-05-24 广东小天才科技有限公司 Voice data recording method and device
CN107230478A (en) * 2017-05-03 2017-10-03 上海斐讯数据通信技术有限公司 A kind of voice information processing method and system
CN107679032A (en) * 2017-09-04 2018-02-09 百度在线网络技术(北京)有限公司 Voice changes error correction method and device
CN109471537A (en) * 2017-09-08 2019-03-15 腾讯科技(深圳)有限公司 Pronunciation inputting method, device, computer equipment and storage medium
CN107644646A (en) * 2017-09-27 2018-01-30 北京搜狗科技发展有限公司 Method of speech processing, device and the device for speech processes
CN107644646B (en) * 2017-09-27 2021-02-02 北京搜狗科技发展有限公司 Voice processing method and device for voice processing
CN108039173A (en) * 2017-12-20 2018-05-15 深圳安泰创新科技股份有限公司 Voice messaging input method, mobile terminal, system and readable storage medium storing program for executing
CN108320747A (en) * 2018-02-08 2018-07-24 广东美的厨房电器制造有限公司 Appliances equipment control method, equipment, terminal and computer readable storage medium
CN108737634A (en) * 2018-02-26 2018-11-02 珠海市魅族科技有限公司 Pronunciation inputting method and device, computer installation and computer readable storage medium
CN108737634B (en) * 2018-02-26 2020-03-27 珠海市魅族科技有限公司 Voice input method and device, computer device and computer readable storage medium
CN109739425A (en) * 2018-04-19 2019-05-10 北京字节跳动网络技术有限公司 A kind of dummy keyboard, pronunciation inputting method, device and electronic equipment
CN109739425B (en) * 2018-04-19 2020-02-18 北京字节跳动网络技术有限公司 Virtual keyboard, voice input method and device and electronic equipment
CN108600773A (en) * 2018-04-25 2018-09-28 腾讯科技(深圳)有限公司 Caption data method for pushing, subtitle methods of exhibiting, device, equipment and medium
CN108632465A (en) * 2018-04-27 2018-10-09 维沃移动通信有限公司 A kind of method and mobile terminal of voice input
CN110491370A (en) * 2019-07-15 2019-11-22 北京大米科技有限公司 A kind of voice stream recognition method, device, storage medium and server
CN110347996A (en) * 2019-07-15 2019-10-18 北京百度网讯科技有限公司 Amending method, device, electronic equipment and the storage medium of text
CN110600039A (en) * 2019-09-27 2019-12-20 百度在线网络技术(北京)有限公司 Speaker attribute determination method and device, electronic equipment and readable storage medium
CN110600039B (en) * 2019-09-27 2022-05-20 百度在线网络技术(北京)有限公司 Method and device for determining speaker attribute, electronic equipment and readable storage medium
CN111326144A (en) * 2020-02-28 2020-06-23 网易(杭州)网络有限公司 Voice data processing method, device, medium and computing equipment
CN111326144B (en) * 2020-02-28 2023-03-03 网易(杭州)网络有限公司 Voice data processing method, device, medium and computing equipment
CN112151072A (en) * 2020-08-21 2020-12-29 北京搜狗科技发展有限公司 Voice processing method, apparatus and medium
CN112151072B (en) * 2020-08-21 2024-07-02 北京搜狗科技发展有限公司 Voice processing method, device and medium
CN114120992A (en) * 2020-09-01 2022-03-01 北京字节跳动网络技术有限公司 Method and device for generating video through voice, electronic equipment and computer readable medium
CN117251556A (en) * 2023-11-17 2023-12-19 北京遥领医疗科技有限公司 Patient screening system and method in registration queue

Also Published As

Publication number Publication date
CN103366742B (en) 2018-07-31

Similar Documents

Publication Publication Date Title
CN103366742A (en) Voice input method and system
US11037566B2 (en) Word-level correction of speech input
EP3469592B1 (en) Emotional text-to-speech learning system
US20120016671A1 (en) Tool and method for enhanced human machine collaboration for rapid and accurate transcriptions
KR101312849B1 (en) Combined speech and alternate input modality to a mobile device
US20210004768A1 (en) System and method for interview training with time-matched feedback
US10089974B2 (en) Speech recognition and text-to-speech learning system
EP3120345B1 (en) Incremental utterance decoder combination for efficient and accurate decoding
EP2609588B1 (en) Speech recognition using language modelling
US9135231B1 (en) Training punctuation models
KR20180090869A (en) Determine dialog states for language models
US9244906B2 (en) Text entry at electronic communication device
CN106484131B (en) Input error correction method and input method device
US10553206B2 (en) Voice keyword detection apparatus and voice keyword detection method
US20170169812A1 (en) Providing intelligent transcriptions of sound messages in a messaging application
CN103369122A (en) Voice input method and system
US9772816B1 (en) Transcription and tagging system
JP2015158582A (en) Voice recognition device and program
US20140365218A1 (en) Language model adaptation using result selection
JP6930538B2 (en) Information processing equipment, information processing methods, and programs
JP6559417B2 (en) Information processing apparatus, information processing method, dialogue system, and control program
JP2013050605A (en) Language model switching device and program for the same
CN116564286A (en) Voice input method and device, storage medium and electronic equipment
JP6260138B2 (en) COMMUNICATION PROCESSING DEVICE, COMMUNICATION PROCESSING METHOD, AND COMMUNICATION PROCESSING PROGRAM
Glackin et al. Smart Transcription

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
ASS Succession or assignment of patent right

Owner name: SHANGHAI GUOKE ELECTRONIC CO., LTD.

Free format text: FORMER OWNER: SHENGYUE INFORMATION TECHNOLOGY (SHANGHAI) CO., LTD.

Effective date: 20140919

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20140919

Address after: 201203, room 1, building 380, 108 Yin Yin Road, Shanghai, Pudong New Area

Applicant after: Shanghai Guoke Electronic Co., Ltd.

Address before: 201203 Shanghai City, Pudong New Area Shanghai City, Guo Shou Jing Road, Zhangjiang hi tech Park No. 356 building 3 Room 102

Applicant before: Shengle Information Technology (Shanghai) Co., Ltd.

EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 127, building 3, 356 GuoShouJing Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201204

Patentee after: SHANGHAI GEAK ELECTRONICS Co.,Ltd.

Address before: Room 108, building 1, 380 Yinbei Road, Pudong New Area, Shanghai 201203

Patentee before: Shanghai Nutshell Electronics Co.,Ltd.

CP03 Change of name, title or address