WO2017114020A1 - Voice input method and terminal device - Google Patents

Voice input method and terminal device

Info

Publication number
WO2017114020A1
WO2017114020A1 · PCT/CN2016/106261
Authority
WO
WIPO (PCT)
Prior art keywords
recognition result
user
voice input
content
error correction
Prior art date
Application number
PCT/CN2016/106261
Other languages
English (en)
French (fr)
Inventor
李利平
王苏杭
严从现
杨磊
刘敏
赵虹
姚佳
Original Assignee
北京搜狗科技发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京搜狗科技发展有限公司
Priority to US15/781,902 (published as US10923118B2)
Publication of WO2017114020A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/01 Assessment or evaluation of speech recognition systems
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G10L2015/221 Announcement of recognition results
    • G10L2015/223 Execution procedure of a spoken command
    • G10L2015/225 Feedback of the input speech

Definitions

  • The present application relates to the field of human-computer interaction technologies, and in particular, to a voice input method and a terminal device.
  • Speech recognition technology is a technology by which a machine correctly recognizes human speech and converts the vocabulary content of human speech into corresponding computer-readable, inputtable text or commands. With the continuous advancement of technology, speech recognition technology is being applied in an increasingly wide range of fields.
  • The technical problem to be solved by the embodiments of the present application is to provide a voice input method and a terminal device, which are used to improve the accuracy of voice input, the richness of voice input content, and the speed of voice processing.
  • a voice input method, including: in a voice input mode, receiving a first voice input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result; in an edit mode, receiving a second voice input by the user and recognizing it to generate a second recognition result; and converting the second recognition result into an editing instruction and performing a corresponding operation according to the editing instruction;
  • the voice input mode and the edit mode can switch to each other.
  • the application further discloses a terminal device, including:
  • a voice input module, configured to receive a first voice input by a user in a voice input mode, and receive a second voice input by the user in an edit mode;
  • a voice recognition module, configured to recognize the first voice and the second voice, respectively, and generate a first recognition result and a second recognition result;
  • a display module, configured to display corresponding text content to the user according to the first recognition result; and
  • an editing operation processing module, configured to convert the second recognition result into an editing instruction in the edit mode and perform a corresponding operation according to the editing instruction; the voice input mode and the edit mode can switch between each other.
  • the application also discloses an apparatus for voice input, including: a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for: in a voice input mode, receiving a first voice input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result; in an edit mode, receiving a second voice input by the user and recognizing it to generate a second recognition result; and converting the second recognition result into an editing instruction and performing a corresponding operation according to the editing instruction;
  • the voice input mode and the edit mode can be switched to each other.
  • Compared with the background art, the embodiments of the present application include the following advantages:
  • The voice input method and the terminal device provided by the present application have two different modes in the voice input process, a voice input mode and an edit mode, and can switch between the two modes; different data processing procedures are performed in the two modes, enabling both original input and further processing on the basis of the original input (including operation actions, error correction, adding content elements, and so on), thereby improving the accuracy of voice input and the richness of voice input content, increasing the speed of voice processing, and greatly improving the user experience.
  • FIG. 1 is a flowchart of a voice input method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a voice input mode according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a voice input method according to another embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a terminal device according to another embodiment of the present application.
  • FIG. 6 is a block diagram of an apparatus 800 for voice input, according to an exemplary embodiment
  • FIG. 7 is a schematic structural diagram of a server in an embodiment of the present application.
  • The present application provides a voice input method, as shown in FIG. 1, including: S11. in a voice input mode, receiving a first voice input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result; S12. in an edit mode, receiving a second voice input by the user and recognizing it to generate a second recognition result; S13. converting the second recognition result into an editing instruction and performing a corresponding operation according to the editing instruction. The voice input mode and the edit mode are switchable from each other.
  • The execution body of the method of this embodiment is a terminal device, and the terminal device may be a mobile phone, a tablet computer, a PDA or a notebook computer, or, of course, any other electronic device that requires input; the present application does not limit this.
  • By distinguishing the data processing procedures of the voice input mode and the edit mode, the present application realizes both original input and further operation processing on the basis of the original input. On the one hand, the step of the user manually selecting the content to be edited can be omitted, realizing fully voice-driven editing; on the other hand, the convenience and accuracy of voice-driven editing and the richness of the input content can be improved.
  • In step S11, in the voice input mode, the first voice input by the user may be received through a microphone or another voice collection device, the first voice is recognized to generate a first recognition result, and the recognition result is then presented to the user as text.
  • Specifically, speech recognition is a process of model matching: a speech model is first established according to the characteristics of human speech, and the templates required for speech recognition are built by analyzing input speech signals and extracting the required features. Recognizing the first voice is then the process of comparing the features of the input first voice signal with these templates and finally determining the best template matching the first voice, thereby obtaining the speech recognition result.
  • As for the specific speech recognition algorithm, a statistics-based hidden Markov model (HMM) recognition and training algorithm may be adopted, and a neural-network-based training and recognition algorithm, a recognition algorithm based on dynamic time warping (DTW), or other algorithms may also be adopted; no restriction is placed here.
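
As an illustration of the template-matching idea sketched above (and not the patent's actual implementation), the following minimal Python example recognizes an input by dynamic time warping, one of the algorithm families just named; a real recognizer would compare sequences of acoustic feature vectors rather than the toy one-dimensional sequences used here.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])      # local distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def recognize(input_features, templates):
    """Return the label of the stored template closest to the input under DTW."""
    return min(templates, key=lambda label: dtw_distance(input_features, templates[label]))

# Toy templates: each label maps to a previously stored feature sequence.
templates = {"delete": [1.0, 3.0, 2.0, 0.5], "send": [0.2, 0.4, 2.5, 2.0]}
print(recognize([1.1, 2.9, 1.8, 0.4], templates))  # -> delete
```
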
  • In step S11, the corresponding text content is generated and presented by recognizing the first voice input by the user. After the text content is presented, the user may need to perform command operations such as delete, line feed, carriage return, clear, send or undo, to correct errors in the presented text content, or to add other content elements (including pictures, images, videos, audio, animations, and files in various formats, which may also be regarded as content elements) to the text content; the pictures may include still pictures.
  • In the method described in the embodiments of the present application, the user can manually switch from the voice input mode to the edit mode, or from the edit mode back to the voice input mode.
  • In step S12, in the edit mode, the second voice input by the user is received and recognized to generate a second recognition result. In a specific implementation, after the user switches to the edit mode, the second voice may be received through a microphone or another voice collection device and then recognized to generate the second recognition result; the specific speech recognition means may be the same as in step S11 and is not repeated here.
  • The biggest difference between the voice input mode and the edit mode is that, in the voice input mode, the corresponding text content is displayed directly according to the first recognition result, whereas in the edit mode, step S13 converts the second recognition result into an editing instruction and performs the corresponding operation according to that instruction.
  • Converting the second recognition result into an editing instruction may specifically include: performing semantic analysis on the second recognition result, matching the semantic analysis result with pre-stored operation information models, and determining the type of the editing instruction according to the matching result.
  • In a specific implementation, the operation information models can be divided into three types: command-type operation information models, error-correction-type operation information models, and added-content-element-type operation information models; each type includes at least one operation information model.
  • For example, the command-type operation information models may include: a deletion operation information model (applicable to deleting a character, symbol, letter, content element or the like before or after the cursor; for example, with the displayed text content "we are going to school" and the cursor at the end, if the user wants to delete the last word, he can input the voice "delete the previous word"; the recognition result "delete the previous word" is matched against the deletion operation information model, and the deletion is performed if the matching succeeds), a line feed operation information model, a carriage return operation information model, a clear operation information model, a send operation information model, an undo operation information model, and so on.
  • The error-correction-type operation information models include: a replacement operation information model for replacing words, an insertion operation information model for adding words, a shift operation information model for moving a word's position, and a word-removal operation information model for deleting words (applicable to removing some of the words in the displayed text content; for example, if the displayed text is "Shall we go to the barbecue today?" and the user wants to remove "today", he can input the voice "delete today"; the speech recognition result "delete today" is matched against the removal operation information model; after the matching succeeds, the operation is determined to be "remove" and the content to be removed is determined to be "today", after which the removal of "today" is performed).
  • The biggest difference between the removal operation information model and the deletion operation information model is that the removal model must take content matching into account, that is, it must determine which part of the content is to be deleted.
  • The added-content-element-type operation models may include: adding content on the terminal device or on the server side, specifically adding at least one of text, an application, face characters, a picture, an animation, a video, an audio file, and the like.
  • More specifically, the added-content-element-type operation models may include: a first added-element operation information model for adding the current page file (including at least one of a webpage, an application, text, face characters, a picture, an animation, a video, an audio file, and the like; process data can be used to obtain the content element or a screenshot of the current page file), a second added-element operation information model for adding a file at a certain storage location (including at least one of text, an application, face characters, a picture, an animation, a video, an audio file, and the like), a third added-element operation information model for adding at least one of the text, applications, face characters, pictures, animations, videos and audio shot or acquired within a certain period of time, and a fourth added-element operation information model for adding at least one of the pictures, face characters, text, animations, audio and video in an application's element library (also called a media library). The operation information models are not limited to the cases enumerated above.
  • The above refers to using information such as the application process, the storage location, the shooting time and element attributes to obtain content elements. The present application is not limited to these manners: content elements obtained by any means can be added to the input box, presented directly to the user, or sent directly to the opposite user. A content element includes at least one of text, an application, face characters, a picture, an animation, a video, an audio file, and the like.
  • After the second recognition result is matched against the operation information models, a matching result is obtained. If the type of the editing instruction is determined to be a command according to the matching result, the command is directly executed; if the type of the editing instruction is determined to be error correction, an error correction operation is performed on the displayed text content according to the second recognition result; and if the type of the editing instruction is determined to be adding a content element, the corresponding content element is pushed according to the second recognition result.
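
The three-way dispatch just described might look like the sketch below; the keyword rules and type names are invented stand-ins for the pre-stored operation information models, whose internal form the text does not specify.

```python
from typing import Dict, List

# Hypothetical keyword rules standing in for the pre-stored operation information models.
OPERATION_MODELS: Dict[str, List[str]] = {
    "command":          ["delete", "newline", "enter", "clear", "send", "undo"],
    "error_correction": ["replace", "remove", "insert", "is wrong"],
    "add_element":      ["add picture", "add file", "send screenshot", "add video"],
}

def classify_edit_instruction(second_result: str) -> str:
    """Return the editing-instruction type whose model keywords the result matches."""
    for instruction_type, keywords in OPERATION_MODELS.items():
        if any(kw in second_result for kw in keywords):
            return instruction_type
    return "unknown"  # no model matched; the edit voice would be ignored

print(classify_edit_instruction("delete the previous word"))  # -> command
print(classify_edit_instruction("add picture of Wang Tsai"))  # -> add_element
```
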
  • By providing different operations for different types of editing instructions, the present application broadens the range covered by voice input: voice is used not only to input the on-screen text content but also to input command operation instructions, error correction instructions, and instructions for adding rich content elements.
  • The accuracy of voice-driven error correction is also improved, and the user does not need to select the content to be corrected: as long as the voice input mode is switched to the edit mode, the displayed text content can be corrected directly according to the input second voice. Voice input can further be used to input commands and to add content elements, which greatly enriches the content of voice input and removes the current limitation of voice input producing only on-screen text. In short, the user experience is greatly improved.
  • The present application is also not limited to using operation information models to determine which command operations to perform, how to correct errors, and which content elements to add; any means capable of data processing, analysis and judgment that determines the operation to be performed corresponding to the speech recognition result falls within the protection scope of the present application.
  • The means for switching between the voice input mode and the edit mode may be triggering a button in the display interface, including clicking a button or long-pressing a button.
  • For example, a "press and hold to edit" button is displayed below the display interface; when the user wants to switch into the edit mode, he presses and holds the button and performs the second voice input, and when the button is released, the edit mode automatically switches back to the voice input mode. The label of the button is not limited to "press and hold to edit"; it may also be a graphic element, another text element, or a combination of graphic and text elements.
  • The two modes can also be switched by clicking a button: for example, a "switch to edit mode" button is displayed below the display interface in the voice input mode, and a "switch to voice input mode" button is displayed below the display interface in the edit mode.
  • The means for switching between the voice input mode and the edit mode may also be another triggering means such as gesture triggering, which research and development personnel may design flexibly according to the actual application; this is not specifically limited in the embodiments of the present application.
  • In another embodiment, as shown in FIG. 3, in the voice input mode, the first voice input by the user is received and recognized to generate a first recognition result, and the corresponding text content is presented to the user according to the first recognition result; in the edit mode, the second voice input by the user is received and recognized to generate a second recognition result. If the type of the editing instruction is determined to be a command, the command is directly executed; if the type of the editing instruction is error correction, an error correction operation is performed on the displayed text content according to the second recognition result; and if the type of the editing instruction is adding a content element, the corresponding content element is pushed according to the second recognition result.
  • Specifically, the second recognition result is matched against the operation information models exemplified above (though not limited to those models), and the type of the editing instruction can be determined according to the matched operation information model.
  • In step S24, each command-type operation information model has a mapping relationship with a command; after the second recognition result matches an operation information model, the corresponding command can be determined from the mapping relationship between the operation information model and the command and executed directly. The command includes at least one of delete, line feed, carriage return, clear, send, and undo. Specifically, delete removes the character or other content element before the current cursor; line feed starts a new line at the current cursor; carriage return confirms the on-screen content; clear empties the currently displayed text content and other content elements; send transmits the on-screen content; and undo reverses the previous operation.
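
A minimal sketch of such a model-to-command mapping, with the editing state reduced to a (text, cursor) pair; the function names and the dictionary are illustrative, not taken from the patent.

```python
def delete_before_cursor(text: str, cursor: int):
    """Delete the character immediately before the cursor."""
    if cursor == 0:
        return text, cursor
    return text[:cursor - 1] + text[cursor:], cursor - 1

def insert_newline(text: str, cursor: int):
    """Start a new line at the current cursor position."""
    return text[:cursor] + "\n" + text[cursor:], cursor + 1

def clear_all(text: str, cursor: int):
    """Empty everything currently on screen."""
    return "", 0

# Mapping from a matched command-type operation information model to its command.
COMMANDS = {"delete": delete_before_cursor, "newline": insert_newline, "clear": clear_all}

# "delete the previous character" with the cursor at the end of 我们要去上学去:
text, cursor = COMMANDS["delete"]("我们要去上学去", 7)
print(text)  # -> 我们要去上学
```
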
  • After switching from the voice input mode to the edit mode, prompt information about editing commands and/or input content may be provided; specifically, as shown in FIG. 2, the user may be prompted with available commands or input content while in the edit mode.
  • In step S25, if the type of the editing instruction is determined to be error correction according to the matching result, an error correction operation is performed on the presented text content according to the second recognition result. Since error correction involves the specific content to be corrected and the corrected content, a preferred implementation performs semantic analysis on the second recognition result and determines the corresponding type of error correction operation and the content to be corrected or the corrected content according to the result of the semantic analysis.
  • For example, the user inputs the voice "li xiang" in the voice input mode, and the first recognition result is 理想 ("ideal"), while the user actually wants to output the name 李响 (Li Xiang). The user triggers the toggle button shown in FIG. 2 to switch from the voice input mode to the edit mode, and then says "the li of mu zi li, the xiang of sheng xiang" (木子李的李，声响的响). The terminal device performs semantic analysis on the recognition result and determines that "the li of mu zi li" is structural information describing the character 李 and "the xiang of sheng xiang" is semantic information describing the character 响; 李 and 响 are thereby determined to be the corrected content. The content to be corrected is correspondingly determined to be 理 and 想, the error correction operation type is determined to be replacement, and 理 and 想 are replaced with 李 and 响, completing the error correction process.
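
The structural-description lookup in this example could be sketched as follows; the description table and homophone groups are tiny hypothetical stand-ins for the disambiguation data a real input method would carry.

```python
# Toy table mapping spoken character descriptions to the intended character:
# 木子李的李 names 李 by its components; 声响的响 names 响 by a word containing it.
STRUCTURE_DESCRIPTIONS = {"木子李的李": "李", "声响的响": "响"}

# Toy homophone groups: characters sharing a pronunciation (tones ignored).
HOMOPHONES = {"李": ["理", "里"], "响": ["想", "向"]}

def correct_by_description(display_text: str, spoken: str) -> str:
    """Replace an on-screen homophone with the character the user described."""
    for phrase, target in STRUCTURE_DESCRIPTIONS.items():
        if phrase not in spoken:
            continue
        for candidate in HOMOPHONES.get(target, []):
            if candidate in display_text:
                display_text = display_text.replace(candidate, target, 1)
                break
    return display_text

print(correct_by_description("理想", "木子李的李，声响的响"))  # -> 李响
```
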
  • Structural information and semantic information are the main ways of expressing the corrected content: the user speaks the structural or semantic description of the corrected content, semantic analysis is performed on the corresponding second recognition result, the corrected content is determined first, and the displayed text content is then corrected accordingly. Since voice input is based on speech recognition, the most important relationship between the content before correction and the content after correction is that they sound the same or similar; in the replacement type of error correction operation, the content before correction is often matched from the corrected content, or the corrected content is matched from the content before correction, by pronunciation.
  • For another example, the user inputs the first voice and the displayed text content is "the weather is cool and the nights are cold; we need to buy a cup (杯子) to keep warm", while the user actually meant "the weather is cool and the nights are cold; we need to buy a quilt (被子) to keep warm"; 杯子 ("cup") and 被子 ("quilt") are near-homophones, both read "bei zi". The user triggers the edit mode and inputs the second voice "quilt" (被子); the terminal device recognizes the second voice as "cup" (杯子), matches it by pronunciation against the displayed text content, and determines that the content to be corrected is 杯子. The terminal device then performs context analysis on the presented text: based on "sleeping at night" and "keep warm", it concludes that 杯子 should be 被子, determines that the operation type is replacement, and replaces 杯子 with 被子. In other words, the part to be corrected can be determined first, the corrected content can then be determined from the context of that part, and the replacement can be performed.
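
A toy sketch of this sound-then-context matching; the pronunciation table and context scores are hypothetical stand-ins for a real pronunciation lexicon and language model.

```python
# Toy pronunciation table (tones ignored): 杯子 "cup" and 被子 "quilt" both read "bei zi".
PRONUNCIATION = {"杯子": "bei zi", "被子": "bei zi", "保暖": "bao nuan"}

# Toy context plausibility: how well each candidate fits next to a context word.
CONTEXT_SCORES = {("被子", "保暖"): 0.9, ("杯子", "保暖"): 0.1}

def correct_homophone(display_text: str, heard_word: str, context_word: str) -> str:
    """Locate the same-sounding on-screen word and replace it with the candidate
    that fits the surrounding context best."""
    sound = PRONUNCIATION.get(heard_word)
    candidates = [w for w, p in PRONUNCIATION.items() if p == sound]
    best = max(candidates, key=lambda w: CONTEXT_SCORES.get((w, context_word), 0.0))
    for w in candidates:
        if w != best and w in display_text:
            return display_text.replace(w, best, 1)
    return display_text

print(correct_homophone("天凉了，晚上睡觉冷，要买杯子保暖", "杯子", "保暖"))
# -> 天凉了，晚上睡觉冷，要买被子保暖
```
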
  • The voice input method proposed in this embodiment of the present application determines the part of the displayed text content to be corrected by matching it against the user's second recognition result, and automatically corrects the determined part. It can quickly locate and correct voice input errors, thereby completing the error correction process rapidly, further improving the accuracy of voice input, and improving the user experience.
  • In other scenarios, the user may input a second voice such as "delete some content" or "some content is redundant"; the terminal device determines from the recognition result corresponding to the second voice that the error correction operation type is deletion, determines the content to be corrected from "some content", and deletes it. The user may also input a second voice such as "add some content before (or after) a certain word"; from the position information "before (or after) a certain word" and the word "add", the terminal device determines that the error correction operation type is adding content, determines from "some content" the content to be added, that is, the corrected content, and then performs the error correction operation. As the two scenarios above illustrate, the type of the error correction operation and the content before or after correction can be determined directly from the second recognition result, after which accurate error correction is performed.
  • In the error correction process, there may well be several candidates when determining the corrected content. In this case, all the candidates can be displayed to the user, and the user can select one of them either by inputting a third voice describing the candidate's position, such as "the first one" or "the second one", or by clicking it; this guarantees both the accuracy and the speed of error correction.
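
Selecting among candidates with a third voice might be sketched as below; the ordinal table is illustrative only.

```python
def choose_candidate(candidates, third_result):
    """Pick one of several correction candidates from a spoken ordinal."""
    ordinals = {"first": 0, "第一": 0, "second": 1, "第二": 1, "third": 2, "第三": 2}
    for word, index in ordinals.items():
        if word in third_result and index < len(candidates):
            return candidates[index]
    return None  # no ordinal recognised; fall back to selection by tap

print(choose_candidate(["李响", "李湘", "李想"], "the second item"))  # -> 李湘
```
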
  • In step S13, the second recognition result is converted into an editing instruction and the corresponding operation is performed according to the editing instruction; specifically, the second recognition result can further be matched against the added-content-element-type operation models to determine whether the operation type is adding a content element.
  • Content elements can be added according to various kinds of information: a file or page of the current window (including a webpage file) can be added according to process data; a file at a predetermined storage location can be added according to storage location information; photos, videos and recorded audio shot or acquired at a certain time can be added according to time information; and face characters, pictures, animations and the like in the media library of an application can be added according to attribute information or identification information. Different kinds of information can use different identification and matching manners, of which the aforementioned operation information models are one method; the present application is not limited to the technical means of matching operation information models, and any manner of determining the corresponding operation from the recognition result falls within the scope to be protected by the present application.
  • In one scenario, pictures in the media library of an application are added through the voice input box. For example, user A and user B are chatting; user A inputs the voice "Wang Tsai" (旺财) in the edit mode, and the terminal device matches the second recognition result corresponding to the voice against the identification information (or attribute information) of the face characters, applications, pictures, text, animations, audio and/or video in the media library, and displays in the user's input box, or directly transmits, at least one matched content element whose identification information corresponds, such as an animation or picture identified as Wang Tsai.
  • This embodiment of the present application obtains content elements by matching the speech recognition result against the identification information (or attribute information) of the content elements in the media library, thereby providing the user with a very convenient way of obtaining at least one of the face characters, pictures, text, animations, audio and video in the media library, and greatly enriching the voice input content.
  • A face character is a graphic composed of words, numbers and/or symbols, and face characters include emoji; the audio includes at least one of expression sounds, recordings, and music.
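
Matching the recognition result against identification tags might be sketched as follows; the library entries and tag fields are hypothetical.

```python
# Hypothetical media library: each element carries identification/attribute tags.
MEDIA_LIBRARY = [
    {"type": "animation", "file": "wangcai_wave.gif", "tags": ["旺财", "dog"]},
    {"type": "picture",   "file": "wangcai_sit.png",  "tags": ["旺财", "dog"]},
    {"type": "audio",     "file": "laugh.mp3",        "tags": ["haha", "laugh"]},
]

def match_content_elements(recognition_result: str):
    """Return the library elements whose identification tags appear in the result."""
    return [element for element in MEDIA_LIBRARY
            if any(tag in recognition_result for tag in element["tags"])]

# The second voice "旺财" matches the two elements tagged with Wang Tsai.
for element in match_content_elements("旺财"):
    print(element["type"], element["file"])
```
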
  • The present application further proposes an implementation that pushes content elements according to the user's historical information.
  • For example, user A and user B are chatting through an instant messaging application, user A inputs "haha", and the content elements matching "haha" may be several smiley faces, a laughing-ball animation, and other emoticon content. After matching these content elements, the terminal device can push one of them at random; it can push a content element that the local user (user A) habitually uses, such as a laughing-ball picture or a big-laugh animation; or it can push a content element that the opposite user (user B) habitually uses, such as a Crayon Shin-chan laughing picture or laughing animation. That is, at least one of face characters, pictures, text, animations, applications, audio and video is recommended to the user based on the user's own habits or the opposite user's habits.
  • Specifically, the local terminal can retrieve the historical information or preferences of the content elements used by the local user (for example, user A), determine from the historical information how frequently each matched content element has been used, and push or prompt to the user the matched content elements whose historical frequency ranks first (e.g., highest or lowest). The local terminal may also request from the server the historical information or preferences of the content elements used by the opposite user (for example, user B), determine from that historical information how frequently each matched content element has been used, and push or prompt to the user the matched content elements whose historical frequency ranks first (e.g., highest or lowest).
  • At least one of the currently popular face characters, pictures, text, animations, applications, audio and video may also be recommended based on the user's or the opposite user's habits, where popularity can be judged from the preferences and attention of users similar to the local or opposite user, or from factors such as the popularity and attention of most users on the network.
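
Habit-based pushing could be sketched as below, assuming the usage history is available as a flat list of previously sent elements; for the opposite user, the same history would be requested from the server instead.

```python
from collections import Counter

def push_by_habit(matches, usage_history, top_n=1):
    """Rank the matched content elements by how often they were used before."""
    frequency = Counter(usage_history)  # element -> historical use count (0 if unseen)
    ranked = sorted(matches, key=lambda element: frequency[element], reverse=True)
    return ranked[:top_n]

# Elements matching "haha", ranked by user A's own sending history.
matches = ["smiley.png", "ball_laugh.gif", "big_laugh.gif"]
history = ["ball_laugh.gif", "ball_laugh.gif", "smiley.png"]
print(push_by_habit(matches, history))  # -> ['ball_laugh.gif']
```
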
  • In another scenario, a file at a predetermined storage location is added to the voice input box or the sending list. For example, user C and user D are chatting, and user C wants to send an already-stored file to the other party; user C only needs to input the second voice "add the file in the ljl folder on the D drive whose file name contains 'voice input'". The terminal device matches the second recognition result corresponding to the second voice against the second added-element operation information model ("add", "D drive", "folder", "file name"), determines that the editing instruction is to add an already-stored file, extracts the specific address information and/or file name from the second recognition result, obtains the file to be added, and displays it in the voice input box (for example, in a form such as "D:\My Documents\ljl\voice input method\FileRecv") or at a predetermined position outside the input box on the human-computer interaction interface.
  • The user can also simply say the file name, a file-name keyword, the file name plus an approximate storage location, or a file-name keyword plus an approximate storage location; the terminal device obtains this information from the recognition result, determines that the editing instruction is to add an already-stored file, automatically queries for the file according to the recognition result, and pushes it to the user.
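
Such a file query could be sketched with Python's standard glob module; the drive path comes from the example above, and extraction of the keyword from the recognition result is assumed to have already happened.

```python
import glob
import os

def find_stored_files(keyword: str, root: str) -> list:
    """Query files under an assumed root whose names contain the spoken keyword."""
    pattern = os.path.join(root, "**", f"*{keyword}*")
    return glob.glob(pattern, recursive=True)

# "add the file in the ljl folder on the D drive whose name contains 'voice input'"
candidates = find_stored_files("voice input", root="D:/ljl")
```
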
  • The user can likewise add photos, videos and recorded audio shot or acquired within a certain period of time to the input box or the sending list. For example, the user inputs a second voice such as "add the photo taken today" or "add the video just shot" in the edit mode; the terminal device matches the second recognition result corresponding to the second voice against the third added-element operation information model ("today", "just", "shot", "video", "photo"), determines an add-element-type editing instruction, then acquires the photo or video according to the second recognition result, and displays a thumbnail of the obtained photo or video in the input box, or displays the corresponding file address information in the sending list.
  • The user can also add the currently active webpage or application interface to the input box or the sending list. For example, the user is modifying a Word document and, in the process, needs to communicate the modification details with the other party; an instant messaging application window using the voice input method proposed by the present application floats above the Word application window. When the user needs to send the content of the current Word page to the other party, the user only needs to activate the instant messaging application window, enter the edit mode, and input the voice "current page", and a screenshot of the current page can be added to the input box (the picture can be displayed directly). When the user needs to send the current Word file itself to the other party, the user only needs to activate the instant messaging application window, enter the edit mode, and input the voice "current file", and the Word file can be added to the input box (a link address can be displayed) or added to the sending list.
  • This greatly facilitates the user in flexibly adding page content or file content according to process data during voice input, avoiding the cumbersome screen-capture operations of the prior art, or even having to browse from the root directory to find the target file, and thus greatly improves convenience.
  • In another example, the user browses a Taobao webpage and finds a very good product, or a series of recommendation pages, to recommend to a friend. The user can take a screenshot of the current page and then, in the edit mode, input the second voice "send screenshot" to add the content of the most recent screenshot to the input box or to the sending list on the user interface side, or input "send three screenshots" to add the content of the three most recent screenshots. Of course, a link to the current webpage can also be sent to the other party directly. This makes it very convenient for the user to send the current window page, which improves the smoothness of communication.
  • By adopting the technical means of adding content elements, the present application achieves the purpose of transmitting a file or a page image through the input of one simple voice.
  • Correspondingly, the present application further provides a terminal device. As shown in FIG. 4, the terminal device includes a voice input module 101, a voice recognition module 102, a display module 103, and an editing operation processing module 104, where:
  • the voice input module 101 is configured to receive a first voice input by the user in the voice input mode and receive a second voice input by the user in the edit mode;
  • the voice recognition module 102 is configured to recognize the first voice and the second voice, respectively, and generate a first recognition result and a second recognition result;
  • the display module 103 is configured to display the corresponding text content to the user according to the first recognition result; and
  • the editing operation processing module 104 is configured to convert the second recognition result into an editing instruction in the edit mode and perform the corresponding operation according to the editing instruction; the voice input mode and the edit mode are switchable with each other.
  • The voice input module 101 and the voice recognition module 102 collect and recognize voice in both the voice input mode and the edit mode; the display module 103 directly displays the corresponding text content according to the first recognition result generated in the voice input mode; and the editing operation processing module 104 performs error correction or command operations on the text content, or adds other content elements beyond the text, according to the second voice input in the edit mode.
  • Because the terminal device divides the input voice into two modes, fewer processing resources are required when the second recognition result is converted into an editing instruction, and the matching accuracy between the second recognition result and the editing instruction is high. On the one hand, the step of the user selecting the content to be edited is omitted, realizing fully voice-driven input; on the other hand, the convenience and accuracy of voice-driven editing are improved.
  • In an embodiment, the editing operation processing module 104 specifically includes a matching module 1041, a determining module 1042, and an executing module 1043, where:
  • the matching module 1041 is configured to match the second recognition result with the pre-stored operation information models;
  • the determining module 1042 is configured to determine the type of the editing instruction according to the matching result; and
  • the executing module 1043 is configured to perform the corresponding operation according to the type of the editing instruction.
  • When the determining module 1042 determines that the type of the editing instruction is a command, the executing module directly executes the command; when the determining module determines that the type of the editing instruction is error correction, the executing module performs an error correction operation on the displayed text content according to the second recognition result; and when the determining module determines that the type of the editing instruction is adding a content element, the executing module pushes the corresponding content element according to the second recognition result.
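
The module split might be skeletonized as follows; the class and method names are illustrative, not the patent's, and the collaborating modules are passed in as plain objects.

```python
class TerminalDevice:
    """Skeleton mirroring the module split described above (illustrative only)."""

    def __init__(self, recognizer, display, editor):
        self.mode = "voice_input"      # or "edit"
        self.recognizer = recognizer   # voice recognition module 102
        self.display = display         # display module 103
        self.editor = editor           # editing operation processing module 104

    def toggle_mode(self):
        """The voice input mode and the edit mode are switchable with each other."""
        self.mode = "edit" if self.mode == "voice_input" else "voice_input"

    def on_voice(self, audio):
        result = self.recognizer.recognize(audio)
        if self.mode == "voice_input":
            self.display.show_text(result)                    # first recognition result
        else:
            instruction = self.editor.to_instruction(result)  # second recognition result
            self.editor.execute(instruction)
```
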
  • As with the method, by providing different operations for different types of editing instructions, the terminal device broadens the range covered by voice input: voice is used not only to input the on-screen text content but also to input command operation instructions, error correction instructions, and instructions for adding rich content elements. The accuracy of voice-driven error correction is improved, and the user does not need to select the content to be corrected: as long as the voice input mode is switched to the edit mode, the displayed content can be corrected directly according to the input second voice. Voice input can also be used to input commands and other content elements, which greatly enriches the content of voice input and removes the limitation of obtaining only on-screen text through voice input. In short, the user experience is greatly improved.
  • FIG. 6 is a block diagram of an apparatus 800 for voice input, according to an exemplary embodiment.
  • device 800 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
  • Device 800 can include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
  • Processing component 802 typically controls the overall operation of device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • Processing component 802 can include one or more processors 820 to execute instructions to perform all or part of the steps of the above described methods.
  • In addition, the processing component 802 can include one or more modules to facilitate interaction between the processing component 802 and other components; for example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
  • The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, and the like.
  • The memory 804 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
  • Power component 806 provides power to various components of device 800.
  • Power component 806 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 800.
  • The multimedia component 808 includes a screen providing an output interface between the device 800 and the user.
  • In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may sense not only the boundary of the touch or sliding action, but also the duration and pressure associated with the touch or slide operation.
  • the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 810 is configured to output and/or input an audio signal.
  • The audio component 810 includes a microphone (MIC) that is configured to receive an external audio signal when the device 800 is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode.
  • the received audio signal may be further stored in memory 804 or transmitted via communication component 816.
  • the audio component 810 also includes a speaker for outputting an audio signal.
  • the I/O interface 812 provides an interface between the processing component 802 and the peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
  • Sensor assembly 814 includes one or more sensors for providing device 800 with a status assessment of various aspects.
  • For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800; the sensor component 814 can also detect a change in the position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and temperature changes of the device 800.
  • Sensor assembly 814 can include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • In some embodiments, the sensor component 814 may further include a light sensor, such as a complementary metal oxide semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications.
  • the sensor assembly 814 can also include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 816 is configured to facilitate wired or wireless communication between device 800 and other devices.
  • The device 800 can access a wireless network based on a communication standard, such as Wi-Fi, 2G or 3G, or a combination thereof.
  • the communication component 816 receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. The NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
  • In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 804 comprising instructions, which are executable by the processor 820 of the device 800 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • Also provided is a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a voice input method, the method comprising: in a voice input mode, receiving a first voice input by the user, recognizing it to generate a first recognition result, and displaying corresponding text content to the user according to the first recognition result; in an edit mode, receiving a second voice input by the user and recognizing it to generate a second recognition result; and converting the second recognition result into an editing instruction and performing a corresponding operation according to the editing instruction; the voice input mode and the edit mode can be switched to each other.
  • Optionally, the step of converting the second recognition result into an editing instruction comprises: matching the second recognition result with pre-stored operation information models, and determining the type of the editing instruction according to the matching result.
  • Optionally, the step of performing a corresponding operation according to the editing instruction includes at least one of the following: if the type of the editing instruction is determined to be a command according to the matching result, the command is directly executed; if the type of the editing instruction is determined to be error correction according to the matching result, an error correction operation is performed on the displayed text content according to the second recognition result; and if the type of the editing instruction is determined to be adding a content element according to the matching result, the corresponding content element is pushed according to the second recognition result.
  • The command includes at least one of delete, line feed, carriage return, clear, send, and undo.
  • Optionally, if the type of the editing instruction is determined to be error correction according to the matching result, the step of performing an error correction operation on the displayed text content according to the second recognition result specifically comprises: determining the part to be corrected, and correcting the part to be corrected according to the type of the error correction operation.
  • Optionally, the step of correcting the part to be corrected according to the type of the error correction operation specifically comprises: determining the corrected content according to the context of the part to be corrected, and performing the correction.
  • Optionally, the step of performing an error correction operation on the displayed text content according to the second recognition result specifically comprises: determining the type of the error correction operation and the corrected content, and correcting the displayed text content according to the type of the error correction operation and the corrected content.
  • Optionally, the error correction operation type is replacement, and the step of correcting the displayed text content according to the error correction operation type and the corrected content specifically comprises: replacing the displayed text that sounds the same as or similar to the corrected content.
  • the second voice includes structural information or semantic information of the replacement word.
  • Optionally, if the type of the editing instruction is determined to be adding a content element according to the matching result, the step of pushing the corresponding content element according to the second recognition result specifically comprises: recommending to the user at least one of face characters, pictures, text, animations, applications, audio and video based on the user's habits or the opposite user's habits.
  • the picture comprises a static picture.
  • the method further includes: after switching from the voice input mode to the edit mode, providing an edit instruction and/or input content prompt information.
  • FIG. 7 is a schematic structural diagram of a server in an embodiment of the present application.
  • The server 1900 can vary considerably with configuration or performance, and can include one or more central processing units (CPUs) 1922 (e.g., one or more processors), a memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage medium 1930 can be transient or persistent storage. The program stored on the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 can be configured to communicate with the storage medium 1930 and execute the series of instruction operations in the storage medium 1930. The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
  • For the device embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
  • The embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
  • The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A voice input method and a terminal device, the method including: in a voice input mode, receiving a first voice input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result (S11); in an edit mode, receiving a second voice input by the user and recognizing it to generate a second recognition result (S12); and converting the second recognition result into an editing instruction and performing a corresponding operation according to the editing instruction, the voice input mode and the edit mode being switchable between each other (S13). By dividing voice input into a voice input mode and an edit mode, and by switching between the two, the method realizes voice input of text content while also performing corresponding editing operations according to the user's voice input, thereby improving the efficiency and enjoyment of voice input and enhancing the user experience.

Description

Voice input method and terminal device
This application claims priority to Chinese Patent Application No. 201511032340.3, filed with the Chinese Patent Office on December 31, 2015 and entitled "Voice input method and terminal device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of human-computer interaction technologies, and in particular, to a voice input method and a terminal device.
Background
Speech recognition technology is a technology by which a machine correctly recognizes human speech and converts the vocabulary content of human speech into corresponding computer-readable, inputtable text or commands. With continuous technological progress, speech recognition technology is being applied in an increasingly wide range of fields.
As voice input becomes increasingly widespread, approaches have gradually emerged that use speech recognition technology to convert the voice information input by a user into corresponding text information for presentation. However, this form of output is rather monotonous and lacks interest, and because speech recognition models are imperfect, recognition results may contain errors, leading to relatively low speech recognition rates and a poor user experience.
Summary
The technical problem to be solved by the embodiments of this application is to provide a speech input method and a terminal device that improve the accuracy of speech input, the richness of speech-input content, and the speed of speech processing.
To solve the above problem, this application discloses a speech input method, comprising:
in a speech input mode, receiving a first speech input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result;
in an editing mode, receiving a second speech input by the user and recognizing it to generate a second recognition result; converting the second recognition result into an editing instruction, and performing a corresponding operation according to the editing instruction;
wherein the speech input mode and the editing mode are switchable between each other.
In another aspect, this application further discloses a terminal device, comprising:
a speech input module configured to receive, in a speech input mode, a first speech input by a user, and to receive, in an editing mode, a second speech input by the user;
a speech recognition module configured to recognize the first speech and the second speech respectively, generating a first recognition result and a second recognition result;
a display module configured to present corresponding text content to the user according to the first recognition result; and
an editing operation processing module configured to convert, in the editing mode, the second recognition result into an editing instruction and to perform a corresponding operation according to the editing instruction; the speech input mode and the editing mode being switchable between each other.
In yet another aspect, this application further discloses an apparatus for speech input, comprising:
a memory, and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for:
in a speech input mode, receiving a first speech input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result;
in an editing mode, receiving a second speech input by the user and recognizing it to generate a second recognition result; converting the second recognition result into an editing instruction, and performing a corresponding operation according to the editing instruction;
wherein the speech input mode and the editing mode are switchable between each other.
Compared with the background art, the embodiments of this application have the following advantages:
In the speech input method and terminal device provided by this application, the speech input process has two distinct modes, a speech input mode and an editing mode, between which the user can switch. Different data processing takes place in the two modes, enabling both original input and further processing on top of that input (including operational actions, error correction, adding content elements, and so on). This improves the accuracy of speech input and the richness of speech-input content, increases the speed of speech processing, and to a large extent enhances the user experience.
Additional aspects and advantages of this application will be set forth in part in the description below, will in part become apparent from that description, or may be learned through practice of this application.
Brief Description of the Drawings
FIG. 1 is a flowchart of a speech input method according to an embodiment of this application;
FIG. 2 is a schematic diagram of the speech input mode according to an embodiment of this application;
FIG. 3 is a flowchart of a speech input method according to another embodiment of this application;
FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of a terminal device according to another embodiment of this application;
FIG. 6 is a block diagram of an apparatus 800 for speech input according to an exemplary embodiment;
FIG. 7 is a schematic structural diagram of a server in an embodiment of this application.
Detailed Description
To make the above objectives, features and advantages of this application clearer and easier to understand, specific embodiments of this application are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments are intended to illustrate this application, not to limit its scope.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of this application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meanings as commonly understood by a person of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general-purpose dictionaries should be construed as having meanings consistent with their meanings in the context of the prior art and, unless specifically defined, will not be interpreted in an idealized or overly formal sense.
The speech input method and terminal device of the embodiments of this application are described in detail below with reference to the accompanying drawings.
To achieve accuracy of speech input and richness of content, this application proposes a speech input method, as shown in FIG. 1, comprising: S11, in a speech input mode, receiving a first speech input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result; S12, in an editing mode, receiving a second speech input by the user and recognizing it to generate a second recognition result; S13, converting the second recognition result into an editing instruction and performing a corresponding operation according to the editing instruction; the speech input mode and the editing mode being switchable between each other.
The method of this embodiment is performed by a terminal device, which may be a mobile phone, a tablet computer, a PDA, a laptop or the like; of course, it may also be any other electronic device requiring input, and this application imposes no limitation in this regard. By distinguishing the data processing performed in the speech input mode from that performed in the editing mode, this application enables both original input and further processing on top of the original input. On one hand, the step of the user manually selecting the content to be edited can be omitted, enabling fully speech-driven editing; on the other hand, the convenience and accuracy of speech-driven editing and the richness of input content are improved.
In step S11, in the speech input mode, the first speech input by the user may be received through a microphone or another speech-capture component and recognized to generate a first recognition result, which is then presented to the user as text. Specifically, speech recognition is a model-matching process: first, a speech model is established according to the characteristics of human speech, and the input speech signal is analyzed and the required features extracted to build the templates needed for speech recognition. Recognizing the first speech is then a process of comparing the features of the input first speech signal with these templates, and finally determining the template that best matches the first speech, thereby obtaining the recognition result. As for the specific speech recognition algorithm, statistics-based hidden Markov model training and recognition algorithms may be used, as may neural-network-based training and recognition algorithms, recognition algorithms based on dynamic time warping (DTW) matching, and other algorithms; this application imposes no limitation here.
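By way of a purely illustrative sketch (not part of the disclosed embodiments), the template comparison mentioned above can be pictured as dynamic time warping over feature sequences; the feature dimensions and the template data below are stand-in assumptions, since a real system would derive both from training utterances:

```python
import numpy as np

def dtw_distance(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """Classic dynamic time warping over two feature sequences
    (frames x feature-dim), using Euclidean frame distance."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])

def recognize(features: np.ndarray, templates: dict) -> str:
    """Return the label of the stored template closest to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Hypothetical usage: the templates would be built from training utterances.
templates = {"delete": np.random.rand(20, 13), "send": np.random.rand(18, 13)}
print(recognize(np.random.rand(19, 13), templates))
```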
In step S11, the first speech input by the user is recognized, and the corresponding text content is generated and presented.
After the text content is presented, the user may need to perform command operations such as delete, line feed, enter, clear, send or undo, or may need to correct errors in the presented text content, or may need to add other content elements (including pictures, images, video, audio, animation, and so on) to the text content, or attach files to it (files of various formats, which may also be regarded as content elements); the pictures may include still pictures.
In the method of the embodiments of this application, the user may switch manually from the speech input mode to the editing mode, or from the editing mode back to the speech input mode.
In step S12, in the editing mode, the second speech input by the user is received and recognized to generate a second recognition result. In a specific embodiment, after the user switches to the editing mode, the second speech input by the user may be received through a microphone or another speech-capture component and then recognized to generate the second recognition result; the specific speech recognition means may be the same as in step S11 and are not repeated here. The biggest difference between the two modes is this: in the speech input mode, the corresponding text content is displayed directly according to the first recognition result, whereas in the editing mode, through step S13, the second recognition result is converted into an editing instruction and the corresponding operation is performed according to that instruction. Converting the second recognition result into an editing instruction may specifically include: performing semantic analysis on the second recognition result, matching the semantic analysis result against pre-stored operation information models, and determining the type of editing instruction according to the matching result.
In a specific embodiment, the operation information models can be divided into three kinds: command-type operation information models, error-correction-type operation information models, and add-content-element-type operation information models, each kind comprising at least one operation information model.
For example, the command-type operation information models may include: a delete operation information model (applicable to deleting a single character, symbol, letter or content element before or after the cursor; for instance, with the presented text "我们要去上学去" and the cursor displayed after "学去", a user who wants to delete the character "去" can input the speech "delete the previous character"; the recognition result is matched against the delete operation information model, and if the match succeeds the delete operation is performed), a line-feed operation information model, an enter operation information model, a clear operation information model, a send operation information model, an undo operation information model, and so on.
The error-correction-type operation information models include: a replacement operation information model for replacing words, an insertion operation information model for adding words, a shift operation information model for moving words, and a word-removal operation information model for deleting words (applicable to removing part of the words in the presented text content; for instance, if the presented text is "今天我们去烧烤?" ("Shall we go barbecue today?") and the user wants to remove "今天" ("today"), the user can input the speech "删除今天" ("delete today"); the recognition result is matched against the removal operation information model; after a successful match the operation is determined to be "remove" and the content to be removed is determined to be "今天", and finally the operation of removing "今天" is performed. The biggest difference between the removal operation information model and the delete operation information model is that the removal model must take content matching into account, i.e., it must determine which part of the content is to be deleted), and so on.
The add-content-element-type operation models may cover adding content on the terminal device side or the server side, specifically adding at least one of text, applications, kaomoji, pictures, animation, video, audio and other files.
The add-content-element-type operation models may specifically include: a first add-element operation information model for adding the current page or file (including at least one of web pages, applications, text, kaomoji, pictures, animation, video, audio and other files; process data may be used to obtain the content element or a screenshot of the current page), a second add-element operation information model for adding a file at a given storage location (including at least one of text, applications, kaomoji, pictures, animation, video, audio and other files), a third add-element operation information model for adding at least one of text, applications, kaomoji, pictures, animation, video, audio and other files shot or obtained at a given time, and a fourth add-element operation information model for adding at least one of the pictures, kaomoji, text, animation, audio and video in an application's element library (also called a media library). It should be noted that the specific operation information models and kinds listed above merely illustrate the meaning of operation information models; the models are not limited to the cases listed.
The foregoing mentions using information such as application processes, storage locations, shooting times and element attributes to obtain content elements, but this application is not limited to these approaches; a content element obtained in any way may be added to the input box and presented directly to the user, or sent directly to the user on the other side. The content elements include at least one of text, applications, kaomoji, pictures, animation, video, audio and other files.
After the second recognition result is matched against the operation information models, a matching result is obtained. If the type of editing instruction determined from the matching result is a command, the command is executed directly; if the type is error correction, an error correction operation is performed on the presented text content according to the second recognition result; if the type is adding a content element, the corresponding content element is pushed according to the second recognition result. By performing different operations for different types of editing instructions, this application broadens the scope of speech input: speech can be used not only to input on-screen text content, but also to input command-type operation instructions, error-correction instructions, and rich content-element-adding instructions. By matching the speech recognition results of command-type, error-correction-type and add-content-element-type editing instructions against different operation information models, this application improves the accuracy of speech-driven error correction: the user need not select the content to be corrected, and simply by switching from the speech input mode to the editing mode, the presented text content can be corrected directly according to the input second speech. Moreover, this application pioneers the use of speech input for entering commands and adding content elements, greatly enriching the content of speech input and removing the current limitation of speech input to on-screen text only; in short, the user experience is improved to a large extent.
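As a non-limiting sketch only, the matching-and-dispatch described above might be organized along the following lines; the pattern table and function names are illustrative assumptions rather than the disclosed implementation:

```python
import re

# Stand-ins for pre-stored operation information models: each entry pairs an
# editing-instruction type with a pattern the second recognition result is
# matched against.
OPERATION_MODELS = [
    ("command", re.compile(r"^(delete|newline|enter|clear|send|undo)$")),
    ("error_correction", re.compile(r"^(replace|remove|insert|move)\s+(.+)$")),
    ("add_element", re.compile(r"^add\s+(.+)$")),
]

def to_editing_instruction(second_result: str):
    """Return (instruction_type, match) for the first model that matches."""
    text = second_result.strip().lower()
    for instruction_type, pattern in OPERATION_MODELS:
        m = pattern.match(text)
        if m:
            return instruction_type, m
    return None, None

instruction_type, m = to_editing_instruction("add the photo taken today")
if instruction_type == "command":
    pass  # execute the command directly
elif instruction_type == "error_correction":
    pass  # correct the presented text using the recognition result
elif instruction_type == "add_element":
    print("push content element for:", m.group(1))
```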
This application is also not limited to using operation information models to determine which command-type operation to perform, how to correct errors, or what content element to add; any approach that can process, analyze and judge the speech recognition result and determine the corresponding operation falls within the scope of the ideas of this application.
The means for switching between the speech input mode and the editing mode may be triggering a button in the display interface, including tapping the button and long-pressing it. As one embodiment, as shown in FIG. 2, in the speech input mode a "hold to edit" button is displayed at the bottom of the display interface; when the user wants to switch to the editing mode, the user holds the button down and can then input the second speech. When the user releases the button, the editing mode automatically switches back to the speech input mode. Of course, the button label is not limited to "hold to edit"; it may also include graphic elements, other text elements, or a combination of graphic and text elements. As another embodiment, the two modes may be switched by tapping a button; for example, in the speech input mode a "switch to editing mode" button is displayed at the bottom of the interface, and in the editing mode a "switch to speech input mode" button is displayed. The switching means may also be gesture triggering or another triggering means; developers may design the switching mechanism flexibly according to the actual application, and the embodiments of this application impose no specific limitation.
As a preferred embodiment of the speech input method, referring to FIG. 3, the method comprises the following steps:
S21. In the speech input mode, receive a first speech input by the user, recognize it to generate a first recognition result, and present corresponding text content to the user according to the first recognition result.
S22. In the editing mode, receive a second speech input by the user and recognize it to generate a second recognition result.
S23. Match the second recognition result against pre-stored operation information models, and determine the type of editing instruction according to the matching result.
S24. If the type of editing instruction is a command, execute the command directly.
S25. If the type of editing instruction is error correction, perform an error correction operation on the presented text content according to the second recognition result.
S26. If the type of editing instruction is adding a content element, push the corresponding content element according to the second recognition result.
In step S23, the second recognition result is matched against the operation information models exemplified above (not limited to those listed), and the type of the corresponding editing instruction can be determined from the matched operation information model. Specifically, in step S24, each operation information model has a mapping relationship with a command; after the second recognition result matches an operation information model, the corresponding command can be determined according to the mapping between each operation information model and its command and executed directly. The commands include at least one of delete, line feed, enter, clear, send and undo. Delete specifically means deleting the character or other content element immediately before the current cursor; line feed means moving to the next line at the current cursor; enter means confirming the on-screen content; clear means clearing the text content and other content elements currently on screen; send means sending out the on-screen content; undo means undoing the previous operation. After switching from the speech input mode to the editing mode, editing-instruction and/or input-content prompt information is provided, as may be seen in FIG. 2, prompting the user as to which instructions or input content can be spoken in the editing mode.
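As an illustrative sketch only (the class and command names are assumptions, not the disclosed implementation), the one-to-one mapping between matched operation information models and executable commands could be applied to a simple text buffer with a cursor as follows:

```python
class EditBuffer:
    """A minimal on-screen text buffer used to demonstrate command dispatch."""
    def __init__(self, text: str = ""):
        self.text = text
        self.cursor = len(text)
        self.history = []

    def delete(self):        # remove the character before the cursor
        if self.cursor > 0:
            self.history.append((self.text, self.cursor))
            self.text = self.text[: self.cursor - 1] + self.text[self.cursor :]
            self.cursor -= 1

    def newline(self):       # line feed at the cursor
        self.history.append((self.text, self.cursor))
        self.text = self.text[: self.cursor] + "\n" + self.text[self.cursor :]
        self.cursor += 1

    def clear(self):         # clear all on-screen content
        self.history.append((self.text, self.cursor))
        self.text, self.cursor = "", 0

    def undo(self):          # undo the previous operation
        if self.history:
            self.text, self.cursor = self.history.pop()

# Each matched operation information model maps to exactly one command.
COMMAND_MAP = {
    "delete": EditBuffer.delete,
    "newline": EditBuffer.newline,
    "clear": EditBuffer.clear,
    "undo": EditBuffer.undo,
}

buf = EditBuffer("我们要去上学去")
COMMAND_MAP["delete"](buf)   # speech "delete the previous character"
print(buf.text)              # -> 我们要去上学
```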
In step S25, if the type of editing instruction determined from the matching result is error correction, an error correction operation is performed on the presented text content according to the second recognition result. Since error correction involves both the specific content to be corrected and the corrected content, a preferred embodiment performs semantic analysis on the second recognition result and determines, from the semantic analysis result, the corresponding type of error correction operation and the content to be corrected or the corrected content.
As a specific application scenario, the user inputs the speech "li xiang" in the speech input mode; the first recognition result is "理想" ("ideal"), but what the user actually wants to output is the name "李响". The user triggers the switch button shown in FIG. 2 to switch from the speech input mode to the editing mode and, in the editing mode, says "木子李的李，响声的响" ("the 李 composed of 木 and 子, the 响 of 响声"). The terminal device performs semantic analysis on this recognition result: "木子李的李" is structural information for the character "李", and "响声的响" is semantic information for the character "响"; it thereby determines that "李" and "响" are the corrected characters, then determines from pre-stored speech models of identical or similar pronunciation that the corresponding characters to be corrected are "理" and "想", determines the error correction operation type to be replacement, and replaces "理" and "想" with "李" and "响", completing the error correction. For specific content, structural information and semantic information are the main forms of expression; in this scenario, what the user inputs is speech about the structural and semantic information of the corrected content. Semantic analysis of the second recognition result corresponding to that speech makes it possible to determine the corrected content first and then correct the presented text content accordingly. Because speech input is grounded in speech recognition, the principal relationship between the content before and after correction is identical or similar pronunciation. In the replacement type of error correction, identical or similar pronunciation is often used to match from the pre-correction content to the corrected content, or from the corrected content to the pre-correction content.
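A minimal sketch of the pronunciation-based replacement in this scenario follows, assuming the corrected characters have already been obtained from the structural/semantic description such as "木子李的李"; the tiny pinyin table stands in for a real pronunciation lexicon and is purely illustrative:

```python
# Stand-in pronunciation lexicon; a real system would use a full lexicon.
PINYIN = {"理": "li", "李": "li", "想": "xiang", "响": "xiang", "要": "yao"}

def replace_homophones(presented: str, corrected_chars: list) -> str:
    """Replace each on-screen character whose pronunciation matches a
    corrected character with that corrected character."""
    out = []
    remaining = list(corrected_chars)
    for ch in presented:
        match = next((c for c in remaining if PINYIN.get(c) == PINYIN.get(ch)), None)
        if match is not None:
            out.append(match)
            remaining.remove(match)
        else:
            out.append(ch)
    return "".join(out)

# "理想" was recognized; the user described 李 and 响 -> "李响"
print(replace_homophones("理想", ["李", "响"]))
```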
As another specific scenario, the user inputs a first speech and the presented text content is "天凉了，晚上睡觉冷，想买杯子，需要保暖" ("it's getting cold, nights are cold for sleeping, I want to buy a cup (杯子), need to keep warm"), while what the user actually wants is "…想买被子…" ("…want to buy a quilt (被子)…"). The user triggers the editing mode and inputs the second speech "被子"; the terminal device recognizes this second speech as "杯子", matches it phonetically against the presented text content, and determines the content to be corrected to be "杯子". It then performs context analysis on the presented text: from "晚上睡觉" ("sleeping at night") and "保暖" ("keeping warm"), the terminal device concludes that "杯子" should be "被子", determines the operation type to be replacement, and replaces "杯子" with "被子". In this scenario, the part to be corrected can be determined from the user's second speech, the corrected content is determined from the context of that part, and the part to be corrected is then replaced. The speech input method of this embodiment matches the user's second recognition result to determine the part of the presented text content that needs correction and corrects it automatically, so that speech input errors can be found and corrected quickly, completing the error correction process rapidly, further improving the accuracy of speech input, and enhancing the user experience.
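The context-based choice between homophones in this scenario can be sketched, purely illustratively, as follows; the candidate table and context-hint words stand in for a real language model:

```python
# Illustrative homophone candidates and context hints (assumptions only).
HOMOPHONES = {"杯子": ["被子", "杯子"]}
CONTEXT_HINTS = {"被子": ["睡觉", "保暖", "床"], "杯子": ["喝水", "茶", "咖啡"]}

def correct_by_context(sentence: str, target: str) -> str:
    """Score each homophone candidate by how many of its context hint words
    appear in the sentence, and replace the target with the best candidate."""
    candidates = HOMOPHONES.get(target, [target])
    best = max(candidates,
               key=lambda c: sum(hint in sentence for hint in CONTEXT_HINTS.get(c, [])))
    return sentence.replace(target, best)

print(correct_by_context("天凉了，晚上睡觉冷，想买杯子，需要保暖", "杯子"))
# -> 天凉了，晚上睡觉冷，想买被子，需要保暖
```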
As a third specific scenario, the user may also input a second speech such as "delete such-and-such content" or "such-and-such content is superfluous"; the terminal device determines from the recognition result corresponding to the second speech that the error correction operation type is deletion, determines the content to be corrected from "such-and-such content", and performs the deletion. As a fourth specific scenario, the user inputs the second speech "add such-and-such content before (or after) a certain word"; from the positional information "before or after a certain word" and "add", the error correction operation type is determined to be "add content", the content to be added (i.e., the corrected content) is determined from "such-and-such content", and the error correction operation is then performed. The two scenarios above show that the error correction operation type and the pre- or post-correction content can also be determined directly from the second recognition result, followed by accurate correction.
From the several specific scenarios described above, it is not difficult to see that by mining the types of error correction (including pre-establishing error-correction operation information models) and the semantic analysis results, this application can accurately correct the presented text content or other content elements.
During error correction, when the corrected content is being determined there may well be several candidates. In that case, the candidates can all be displayed to the user, and the user can input a third speech about a candidate's position, such as "the first one" or "the second one", or select one of the candidates by tapping, which ensures the accuracy and speed of error correction.
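A minimal illustration (with an assumed phrase table) of mapping a third speech about candidate position onto the displayed correction candidates:

```python
# Assumed ordinal phrases; a real system would cover more variants.
ORDINALS = {"第一项": 0, "第二项": 1, "第三项": 2,
            "the first one": 0, "the second one": 1}

def pick_candidate(third_result: str, candidates: list):
    """Return the candidate selected by the user's third speech, if any."""
    index = ORDINALS.get(third_result.strip())
    if index is not None and index < len(candidates):
        return candidates[index]
    return None

print(pick_candidate("第二项", ["李响", "理想", "丽香"]))  # -> 理想
```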
In step S13, converting the second recognition result into an editing instruction and performing the corresponding operation according to that instruction may further include matching the second recognition result against the add-content-element-type operation models to determine whether the operation type is adding a content element.
Content elements can be added on the basis of multiple kinds of information: adding the file or page of the current window (including web page files) on the basis of process data; adding a file at a predetermined storage location on the basis of storage-location information; adding photos, videos and recorded audio shot or obtained at a given time on the basis of time information; and adding graphics, pictures, animation and the like from an application's media library on the basis of attribute information or identification information. Different information recognition and matching approaches can be used for different kinds of information, the operation information models mentioned above being one such approach. Of course, the technique is not limited to matching operation information models; any approach that determines the corresponding operation from the recognition result is within the scope this application seeks to protect.
As a specific embodiment, a picture from an application's media library is added to the speech input box. As a fifth application scenario, user A and user B are chatting; user A inputs the speech "汪仔" (the name of a mascot) in the editing mode. The terminal device matches the second recognition result corresponding to "汪仔" against the identification information (or attribute information) of the kaomoji, applications, pictures, text, animation, audio and/or video in the media library, and displays at least one item of content corresponding to the successfully matched identification (or attribute) information, such as a recognized animation or picture of 汪仔, in the user's input box, or sends it directly. By matching the speech recognition result against the identification (or attribute) information of content elements in the media library to obtain content elements, this embodiment provides users with a very convenient way to obtain content elements such as at least one of the kaomoji, pictures, text, animation, audio and video in the media library, and greatly enriches the content of speech input. Kaomoji are graphics composed of characters, digits and/or symbols, and include emoticons. The audio includes at least one of expression sounds, recordings and music.
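A non-limiting sketch of matching a recognition result against the identification or attribute information of media-library elements follows; the library entries below are illustrative assumptions, as a real media library would be supplied by the application:

```python
# Illustrative media library; ids, kinds and tags are assumptions.
MEDIA_LIBRARY = [
    {"id": "wangzai_laugh.gif", "kind": "animation", "tags": ["汪仔", "大笑"]},
    {"id": "wangzai_hello.png", "kind": "picture", "tags": ["汪仔", "问好"]},
    {"id": "maruko_laugh.gif", "kind": "animation", "tags": ["小丸子", "哈哈", "大笑"]},
]

def match_elements(second_result: str) -> list:
    """Return all media-library elements whose tags appear in the result."""
    return [e for e in MEDIA_LIBRARY
            if any(tag in second_result for tag in e["tags"])]

for element in match_elements("汪仔"):
    print(element["kind"], element["id"])  # candidates to show in the input box
```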
In many cases more than one content element is matched. Here, this application proposes an embodiment that pushes content elements according to the user's history. For example, user A and user B are chatting through an instant messaging application; user A speaks "哈哈" ("haha"); the content elements matching "哈哈" might include several smiley-face pictures, several laughing animated emoticons such as Chibi Maruko-chan, and other content elements. After matching these, the terminal device may push one at random, may push the content element the local user (user A) habitually uses, such as a Chibi Maruko-chan laughing picture or animation, or may of course push the content element the user on the other side (user B) habitually uses, such as a Crayon Shin-chan laughing picture or animation.
In the editing mode, at least one of kaomoji, pictures, text, animation, applications, audio and video is recommended to the user on the basis of the user's habits or the habits of the user on the other side.
For recommendation based on the user's habits, the local terminal can retrieve the local user's (e.g., user A's) history or preferences for content elements, determine from the history how frequently the matched content elements have been used, and push or prompt the user with the matched content elements ranked at the top of historical usage frequency (e.g., highest or lowest).
For recommendation based on the habits of the user on the other side, the local terminal can request from the server the history or preferences of the other-side user (e.g., user B) for content elements, determine from the history how frequently the matched content elements have been used, and push or prompt the user with the matched content elements ranked at the top of historical usage frequency (e.g., highest or lowest).
In the editing mode, recommendation based on the user's habits or the other-side user's habits may also recommend at least one of the currently trending kaomoji, pictures, text, animation, applications, audio and video. Popularity can be judged by factors such as the preferences and attention of users similar to the user or to the other-side user, or the preferences and attention of the great majority of users on the network.
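The habit-based pushing described above can be sketched as simple frequency ranking over a usage history, whether that history is the local user's or is requested from the server for the user on the other side; the data below is illustrative only:

```python
from collections import Counter

def rank_by_habit(matched_ids: list, usage_history: list) -> list:
    """Order matched element ids by historical usage frequency, highest first."""
    counts = Counter(usage_history)
    return sorted(matched_ids, key=lambda eid: counts[eid], reverse=True)

matched = ["maruko_laugh.gif", "smiley.png", "shinchan_laugh.gif"]
history = ["maruko_laugh.gif", "maruko_laugh.gif", "smiley.png"]
print(rank_by_habit(matched, history)[0])  # push the habitual favourite
```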
As another specific embodiment, a file at a predetermined storage location is added to the speech input box or the send list. As a sixth application scenario, user C and user D are chatting and user C wants to send an already stored file to the other side. User C need only input the second speech "add the file in the ljl folder on drive D whose file name contains 语音输入". The terminal device matches the second recognition result corresponding to this second speech against the second add-element operation information model ("add", "drive D", "folder", "file name"), determines the editing instruction to be adding an already stored file, extracts the specific address information and/or file name from the second recognition result, obtains the file to be added, and displays it in the speech input box in a form such as "D:\My Documents\ljl\语音输入方法\FileRecv", or displays it outside the input box at a predetermined position of the human-computer interaction interface. In specific implementations, the user may also directly speak information from which the file can be obtained, such as the file name, a keyword of the file name, the file name plus an approximate storage location, or a file-name keyword plus an approximate storage location; when the terminal device determines from the recognition result that the editing instruction is adding an already stored file, it automatically looks up the file according to the recognition result and pushes it to the user.
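A minimal sketch of looking up a stored file from the storage-location and file-name-keyword information extracted from the second recognition result; the extraction step itself, and the folder and keyword below, are assumed from the scenario:

```python
from pathlib import Path

def find_files(folder: str, name_keyword: str) -> list:
    """Return paths under `folder` whose file names contain `name_keyword`."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return [p for p in root.rglob("*") if p.is_file() and name_keyword in p.name]

# e.g. speech "add the file in the ljl folder on drive D containing 语音输入"
for path in find_files(r"D:\My Documents\ljl", "语音输入"):
    print(path)  # candidate files to show in the input box or send list
```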
As a third specific embodiment of this type of editing instruction, the user may add photos, videos and recorded audio shot or obtained at a given time to the input box or the send list. As a seventh application scenario, the user inputs in the editing mode a second speech such as "add the photos taken today" or "add the video just taken"; the terminal device matches the corresponding second recognition result against the third add-element operation information model ("today", "just now", "taken", "video", "photo"), determines the instruction to be of the add-element type, then obtains the photos or video according to the second recognition result and displays their thumbnails in the input box, or displays the corresponding file address information in the send list.
As a fourth specific embodiment of this type of editing instruction, the user may add the currently active web page or application interface to the input box or the send list. As an eighth application scenario, the user is revising a Word document and, during the revision, needs to discuss the details with the other party. An instant messaging application window using the speech input method proposed by this application floats above the Word application window. When the user needs to send the content of the current Word page to the other party, the user need only activate the instant messaging window, enter the editing mode and speak "current page" to add the current Word page to the input box (the picture can be displayed directly); if the user needs to send the current Word file, the user need only activate the window, enter the editing mode and speak "current file" to add the Word file to the input box (a link address can be displayed, or the file can be added to the send list). The embodiments above make it very convenient for the user, during speech input, to flexibly add page content or file content according to process data; compared with the prior-art approach of using complicated screenshot operations, or even browsing files from the root directory to find the target file, convenience is greatly improved.
As a ninth application scenario, the user is browsing a Taobao web page and finds an excellent product, or a series of pages, to recommend to a friend. The user can take a screenshot of the current page and then, in the editing mode, input the second speech "send screenshot" to add the most recent screenshot to the input box or to the send list on one side of the user interface, or input "send three screenshots" to add the three most recent screenshots to the input box or send list. Of course, a link to the current web page can also be sent directly to the other user. This approach makes it very convenient for the user to send the current window page, improving the smoothness of communication.
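Resolving "send three screenshots" to the most recent screenshot files can be sketched as follows; the folder and naming pattern are assumptions, and a real terminal device would more likely query its own gallery database:

```python
from pathlib import Path

def recent_screenshots(folder: str, count: int) -> list:
    """Return the `count` most recently modified screenshot files."""
    shots = [p for p in Path(folder).glob("Screenshot*.png") if p.is_file()]
    shots.sort(key=lambda p: p.stat().st_mtime, reverse=True)  # newest first
    return shots[:count]

for shot in recent_screenshots(r"D:\Pictures\Screenshots", 3):
    print(shot)  # add to the input box or the send list
```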
Through the above four specific embodiments, this application uses the technique of adding content elements so that the goal of sending a file or a page image can be achieved by inputting simple speech.
It should be noted that, for simplicity of description, the method embodiments are expressed as combinations of series of actions, but those skilled in the art should know that the embodiments of this application are not limited by the described order of actions, because according to the embodiments of this application some steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of this application.
This application further provides a terminal device. As shown in FIG. 4, the terminal device comprises: a speech input module 101, a speech recognition module 102, a display module 103 and an editing operation processing module 104, wherein:
the speech input module 101 is configured to receive, in the speech input mode, a first speech input by the user and, in the editing mode, a second speech input by the user; the speech recognition module 102 is configured to recognize the first speech and the second speech respectively, generating a first recognition result and a second recognition result;
the display module 103 is configured to present corresponding text content to the user according to the first recognition result;
the editing operation processing module 104 is configured to convert, in the editing mode, the second recognition result into an editing instruction and to perform a corresponding operation according to the editing instruction; the speech input mode and the editing mode are switchable between each other.
In the terminal device provided by this embodiment of the application, the speech input module 101 and the speech recognition module 102 capture and recognize speech in both the speech input mode and the editing mode; the display module 103 displays the corresponding text content directly according to the first recognition result generated in the speech input mode, while the editing operation processing module 104 performs, according to the second speech input in the editing mode, error correction on the text content, command-type operations, or the addition of content elements other than text. By dividing the input speech between two modes, this terminal device reduces the processing resources needed to convert the second recognition result into an editing instruction and improves the accuracy of matching the second recognition result to the editing instruction. As to user experience, on one hand the user's selection of the content to be edited is omitted, achieving fully speech-driven input; on the other hand, the convenience and accuracy of speech-driven editing are improved.
Further, as shown in FIG. 5, the editing operation processing module 104 specifically comprises a matching module 1041, a determining module 1042 and an executing module 1043, wherein:
the matching module 1041 is configured to match the second recognition result against pre-stored operation information models;
the determining module 1042 is configured to determine the type of editing instruction according to the matching result;
the executing module 1043 is configured to perform the corresponding operation according to the type of editing instruction.
According to a preferred embodiment of this application, when the determining module 1042 determines that the type of editing instruction is a command, the executing module executes the command directly; when the determining module determines that the type is error correction, the executing module performs an error correction operation on the presented text content according to the second recognition result; when the determining module determines that the type is adding a content element, the executing module pushes the corresponding content element according to the second recognition result.
By performing different operations for different types of editing instructions, this application broadens the scope of speech input: speech can be used not only to input on-screen text content, but also to input command-type operation instructions, error-correction instructions, and rich content-element-adding instructions. By matching command-type operation instructions and the speech recognition results used for error correction and for adding other content elements against different operation information models, this application improves the accuracy of speech-driven error correction, so that the user need not select the content to be corrected: simply by switching from the speech input mode to the editing mode, the presented text content can be corrected directly according to the input second speech. Moreover, this application pioneers the use of speech input for entering commands and other content elements, greatly enriching the content of speech input and removing the current limitation of speech input to on-screen text only; in short, the user experience is improved to a large extent.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
FIG. 6 is a block diagram of an apparatus 800 for speech input according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to FIG. 6, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operations of the apparatus 800, such as operations associated with display, telephone calls, data communication, camera operations and recording operations. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or part of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and the other components; for example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the apparatus 800. Examples of such data include instructions for any application program or method operated on the apparatus 800, contact data, phonebook data, messages, pictures, video, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.
The power component 806 provides power for the various components of the apparatus 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the apparatus 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC); when the apparatus 800 is in an operating mode, such as a call mode, a recording mode or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a loudspeaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include but are not limited to: a home button, volume buttons, a start button and a lock button.
The sensor component 814 includes one or more sensors for providing the apparatus 800 with status assessments of various aspects. For example, the sensor component 814 can detect the on/off state of the apparatus 800 and the relative positioning of components, for example the display and keypad of the apparatus 800; the sensor component 814 can also detect a change in position of the apparatus 800 or of one of its components, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and temperature changes of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS (complementary metal oxide semiconductor) or CCD (charge-coupled device) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wide band (UWB) technology, Bluetooth (BT) technology and other technologies.
In exemplary embodiments, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic elements, for performing the methods described above.
In exemplary embodiments, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, which are executable by the processor 820 of the apparatus 800 to complete the methods described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium: when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a speech input method, the method comprising: in a speech input mode, receiving a first speech input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result;
in an editing mode, receiving a second speech input by the user and recognizing it to generate a second recognition result; converting the second recognition result into an editing instruction, and performing a corresponding operation according to the editing instruction;
wherein the speech input mode and the editing mode are switchable between each other.
Optionally, the step of converting the second recognition result into an editing instruction specifically includes: matching the second recognition result against pre-stored operation information models, and determining the type of editing instruction according to the matching result.
Optionally, the step of performing a corresponding operation according to the editing instruction includes at least one of the following steps:
if the type of editing instruction determined from the matching result is a command, executing the command directly;
if the type of editing instruction determined from the matching result is error correction, performing an error correction operation on the presented text content according to the second recognition result;
if the type of editing instruction determined from the matching result is adding a content element, pushing the corresponding content element according to the second recognition result.
Optionally, the command includes at least one of delete, line feed, enter, clear, send and undo.
Optionally, the step of performing an error correction operation on the presented text content according to the second recognition result, if the type of editing instruction determined from the matching result is error correction, specifically includes:
performing semantic analysis on the second recognition result, and determining the corresponding type of error correction operation and the part to be corrected according to the semantic analysis result;
correcting the part to be corrected according to the type of error correction operation.
Optionally, the step of correcting the part to be corrected according to the type of error correction operation specifically includes: determining the corrected content according to the context of the part to be corrected, and correcting that part.
Optionally, the step of performing an error correction operation on the presented text content according to the second recognition result specifically includes:
performing semantic analysis on the second recognition result, and determining the corresponding type of error correction operation and the corrected content according to the semantic analysis result;
correcting the presented text content according to the type of error correction operation and the corrected content.
Optionally, the type of error correction operation is replacement; the step of correcting the presented text content according to the type of error correction operation and the corrected content is specifically: replacing characters with identical or similar pronunciation.
Optionally, the second speech includes structural information or semantic information of the replacement words.
Optionally, the step of pushing the corresponding content element according to the second recognition result, if the type of editing instruction determined from the matching result is adding a content element, specifically includes:
matching the second recognition result against identification information and/or attribute information of at least one of pre-stored kaomoji, pictures, text, animation, applications, audio and video;
presenting to the user at least one of the matched kaomoji, pictures, text, animation, applications, audio and video.
Optionally, in the editing mode, at least one of kaomoji, pictures, text, animation, applications, audio and video is recommended to the user based on the user's habits or the habits of the user on the other side.
Optionally, the pictures include static pictures.
Optionally, the method further includes: after switching from the speech input mode to the editing mode, providing editing-instruction and/or input-content prompt information.
FIG. 7 is a schematic structural diagram of the server in an embodiment of this application. The server 1900 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), a memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage media 1930 may provide transient or persistent storage. The programs stored in a storage medium 1930 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations in the server. Furthermore, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
As the terminal device and apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant points, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may refer to one another.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The embodiments of this application are described with reference to flowcharts and/or block diagrams of the methods, terminal devices (systems) and computer program products according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, causing a series of operational steps to be executed on the computer or other programmable terminal device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this disclosure. The specification and embodiments are to be regarded as exemplary only, the true scope and spirit of this application being indicated by the following claims.
It should be understood that this application is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
The above are merely preferred embodiments of this application and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of this application shall be included within its scope of protection.

Claims (15)

  1. A speech input method, characterized in that the method comprises:
    in a speech input mode, receiving a first speech input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result;
    in an editing mode, receiving a second speech input by the user and recognizing it to generate a second recognition result; converting the second recognition result into an editing instruction, and performing a corresponding operation according to the editing instruction;
    the speech input mode and the editing mode being switchable between each other.
  2. The method according to claim 1, characterized in that the step of converting the second recognition result into an editing instruction specifically comprises: matching the second recognition result against pre-stored operation information models, and determining the type of editing instruction according to the matching result.
  3. The method according to claim 2, characterized in that the step of performing a corresponding operation according to the editing instruction comprises at least one of the following steps:
    if the type of editing instruction determined from the matching result is a command, executing the command directly;
    if the type of editing instruction determined from the matching result is error correction, performing an error correction operation on the presented text content according to the second recognition result;
    if the type of editing instruction determined from the matching result is adding a content element, pushing the corresponding content element according to the second recognition result.
  4. The method according to claim 3, characterized in that the command comprises at least one of delete, line feed, enter, clear, send and undo.
  5. The method according to claim 3 or 4, further characterized in that the step of performing an error correction operation on the presented text content according to the second recognition result, if the type of editing instruction determined from the matching result is error correction, specifically comprises:
    performing semantic analysis on the second recognition result, and determining the corresponding type of error correction operation and the part to be corrected according to the semantic analysis result;
    correcting the part to be corrected according to the type of error correction operation.
  6. The method according to claim 5, further characterized in that the step of correcting the part to be corrected according to the type of error correction operation specifically comprises: determining the corrected content according to the context of the part to be corrected, and correcting the part to be corrected.
  7. The method according to claim 3 or 4, further characterized in that the step of performing an error correction operation on the presented text content according to the second recognition result specifically comprises:
    performing semantic analysis on the second recognition result, and determining the corresponding type of error correction operation and the corrected content according to the semantic analysis result;
    correcting the presented text content according to the type of error correction operation and the corrected content.
  8. The method according to claim 7, further characterized in that the type of error correction operation is replacement; the step of correcting the presented text content according to the type of error correction operation and the corrected content is specifically: replacing characters with identical or similar pronunciation.
  9. The method according to claim 8, further characterized in that the second speech comprises structural information or semantic information of the replacement words.
  10. The method according to any one of claims 1-9, further characterized in that the step of pushing the corresponding content element according to the second recognition result, if the type of editing instruction determined from the matching result is adding a content element, specifically comprises:
    matching the second recognition result against identification information and/or attribute information of at least one of pre-stored kaomoji, pictures, text, animation, applications, audio and video;
    presenting to the user at least one of the matched kaomoji, pictures, text, animation, applications, audio and video.
  11. The method according to any one of claims 1-10, characterized in that, in the editing mode, at least one of kaomoji, pictures, text, animation, applications, audio and video is recommended to the user based on the user's habits or the habits of the user on the other side.
  12. The method according to claim 10 or 11, characterized in that the pictures comprise static pictures.
  13. The method according to any one of claims 1-12, further characterized in that the method further comprises: after switching from the speech input mode to the editing mode, providing editing-instruction and/or input-content prompt information.
  14. A terminal device, characterized in that the device comprises:
    a speech input module configured to receive, in a speech input mode, a first speech input by a user, and to receive, in an editing mode, a second speech input by the user;
    a speech recognition module configured to recognize the first speech and the second speech respectively, generating a first recognition result and a second recognition result;
    a display module configured to present corresponding text content to the user according to the first recognition result;
    an editing operation processing module configured to convert, in the editing mode, the second recognition result into an editing instruction and to perform a corresponding operation according to the editing instruction; the speech input mode and the editing mode being switchable between each other.
  15. An apparatus for speech input, characterized by comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for:
    in a speech input mode, receiving a first speech input by a user, recognizing it to generate a first recognition result, and presenting corresponding text content to the user according to the first recognition result;
    in an editing mode, receiving a second speech input by the user and recognizing it to generate a second recognition result; converting the second recognition result into an editing instruction, and performing a corresponding operation according to the editing instruction;
    the speech input mode and the editing mode being switchable between each other.
PCT/CN2016/106261 2015-12-31 2016-11-17 Speech input method and terminal device WO2017114020A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/781,902 US10923118B2 (en) 2015-12-31 2016-11-17 Speech recognition based audio input and editing method and terminal device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511032340.3A 2015-12-31 2015-12-31 Speech input method and terminal device
CN201511032340.3 2015-12-31

Publications (1)

Publication Number Publication Date
WO2017114020A1 true WO2017114020A1 (zh) 2017-07-06

Family

ID=59224521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/106261 WO2017114020A1 (zh) 2015-12-31 2016-11-17 Speech input method and terminal device

Country Status (4)

Country Link
US (1) US10923118B2 (zh)
CN (1) CN106933561A (zh)
TW (1) TWI720062B (zh)
WO (1) WO2017114020A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110018746A (zh) * 2018-01-10 2019-07-16 微软技术许可有限责任公司 Processing documents through multiple input modalities

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI647609B (zh) * 2017-04-14 2019-01-11 緯創資通股份有限公司 Instant messaging method, system, electronic device and server
CN107300986B (zh) * 2017-06-30 2022-01-18 联想(北京)有限公司 Input method switching method and device
CN109410915B (zh) * 2017-08-15 2022-03-04 ***通信集团终端有限公司 Speech quality evaluation method and device, and computer-readable storage medium
CN107978310B (zh) * 2017-11-30 2022-11-25 腾讯科技(深圳)有限公司 Audio processing method and device
CN109903754B (zh) * 2017-12-08 2022-04-26 北京京东尚科信息技术有限公司 Method, device and memory device for speech recognition
TWI651714B (zh) * 2017-12-22 2019-02-21 隆宸星股份有限公司 Voice option selection system and method, and smart robot using the same
CN108089836A (zh) * 2017-12-29 2018-05-29 上海与德科技有限公司 Robot-based assisted learning method and robot
TWI664627B (zh) * 2018-02-06 2019-07-01 宣威科技股份有限公司 Apparatus for optimizing external voice signals
CN108509393A (zh) * 2018-03-20 2018-09-07 联想(北京)有限公司 Method and device for editing text information
TWI662542B (zh) * 2018-04-26 2019-06-11 鴻海精密工業股份有限公司 Speech recognition device and method
TWI679632B (zh) * 2018-05-09 2019-12-11 和碩聯合科技股份有限公司 Voice detection method and voice detection device
CN110491377A (zh) * 2018-05-14 2019-11-22 成都野望数码科技有限公司 Input method and device
CN110493447A (zh) * 2018-05-14 2019-11-22 成都野望数码科技有限公司 Message processing method and related device
CN108965584A (zh) * 2018-06-21 2018-12-07 北京百度网讯科技有限公司 Method, device, terminal and storage medium for processing voice information
CN108446278B (zh) * 2018-07-17 2018-11-06 弗徕威智能机器人科技(上海)有限公司 Natural-language-based semantic understanding system and method
CN109119079B (zh) * 2018-07-25 2022-04-01 天津字节跳动科技有限公司 Voice input processing method and device
CN109119076B (zh) * 2018-08-02 2022-09-30 重庆柚瓣家科技有限公司 System and method for collecting the communication habits of elderly users
CN110838291B (zh) * 2018-08-16 2024-06-18 北京搜狗科技发展有限公司 Input method and device, and electronic device
CN109284501A (zh) * 2018-08-30 2019-01-29 上海与德通讯技术有限公司 Text correction method, device, server and storage medium
CN109388792B (zh) * 2018-09-30 2020-11-24 珠海格力电器股份有限公司 Text processing method, device, equipment, computer device and storage medium
CN109599114A (zh) * 2018-11-07 2019-04-09 重庆海特科技发展有限公司 Voice processing method, storage medium and device
CN111385462A (zh) * 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related products
CN109637541B (zh) * 2018-12-29 2021-08-17 联想(北京)有限公司 Method and electronic device for converting speech into text
TWI767498B (zh) * 2019-02-13 2022-06-11 華南商業銀行股份有限公司 Cross-channel artificial intelligence conversational platform integrating machine learning and operating method thereof
CN111859089B (zh) * 2019-04-30 2024-02-06 北京智慧星光信息技术有限公司 Wrong-word detection control method for Internet information
CN110297928A (zh) * 2019-07-02 2019-10-01 百度在线网络技术(北京)有限公司 Method, device, equipment and storage medium for recommending emoticon pictures
CN110457105B (zh) * 2019-08-07 2021-11-09 腾讯科技(深圳)有限公司 Interface operation method, device, equipment and storage medium
CN112612442A (zh) * 2019-09-19 2021-04-06 北京搜狗科技发展有限公司 Input method and device, and electronic device
US11289092B2 (en) * 2019-09-25 2022-03-29 International Business Machines Corporation Text editing using speech recognition
CN110827815B (zh) * 2019-11-07 2022-07-15 深圳传音控股股份有限公司 Speech recognition method, terminal, system and computer storage medium
US11138386B2 (en) * 2019-11-12 2021-10-05 International Business Machines Corporation Recommendation and translation of symbols
CN112905079B (zh) * 2019-11-19 2022-12-13 北京搜狗科技发展有限公司 Data processing method, device and medium
CN113761843B (zh) * 2020-06-01 2023-11-28 华为技术有限公司 Speech editing method, electronic device and computer-readable storage medium
CN112463105A (zh) * 2020-11-10 2021-03-09 北京搜狗科技发展有限公司 Data processing method and device, and electronic device
CN112637407A (zh) * 2020-12-22 2021-04-09 维沃移动通信有限公司 Voice input method and device, and electronic device
TWI763207B (zh) 2020-12-25 2022-05-01 宏碁股份有限公司 Method and device for evaluating audio signal processing
CN113326279A (zh) * 2021-05-27 2021-08-31 阿波罗智联(北京)科技有限公司 Voice search method and device, electronic device, and computer-readable medium
CN113378530A (zh) * 2021-06-28 2021-09-10 北京七维视觉传媒科技有限公司 Speech editing method and device, equipment and medium
CN115442469A (zh) * 2022-08-31 2022-12-06 南京维沃软件技术有限公司 Message output and message sending method and device thereof
US11825004B1 (en) 2023-01-04 2023-11-21 Mattel, Inc. Communication device for children

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177724A (zh) * 2013-03-19 2013-06-26 华为终端有限公司 Method, device and terminal for controlling text operations by voice
CN103246648A (zh) * 2012-02-01 2013-08-14 腾讯科技(深圳)有限公司 Voice input control method and device
CN103645876A (zh) * 2013-12-06 2014-03-19 百度在线网络技术(北京)有限公司 Voice input method and device
CN104346127A (zh) * 2013-08-02 2015-02-11 腾讯科技(深圳)有限公司 Method, device and terminal for implementing voice input

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5909667A (en) * 1997-03-05 1999-06-01 International Business Machines Corporation Method and apparatus for fast voice selection of error words in dictated text
EP1093058A1 (en) * 1999-09-28 2001-04-18 Cloanto Corporation Method and apparatus for processing text and character data
US7444286B2 (en) * 2001-09-05 2008-10-28 Roth Daniel L Speech recognition using re-utterance recognition
US20030220788A1 (en) * 2001-12-17 2003-11-27 Xl8 Systems, Inc. System and method for speech recognition and transcription
NZ564249A (en) * 2005-06-16 2010-12-24 Firooz Ghassabian Data entry system
WO2007121441A2 (en) * 2006-04-17 2007-10-25 Vovision Llc Methods and systems for correcting transcribed audio files
US8352264B2 (en) * 2008-03-19 2013-01-08 Canyon IP Holdings, LLC Corrective feedback loop for automated speech recognition
US20090326938A1 (en) * 2008-05-28 2009-12-31 Nokia Corporation Multiword text correction
US20110035209A1 (en) * 2009-07-06 2011-02-10 Macfarlane Scott Entry of text and selections into computing devices
KR101681281B1 (ko) * 2010-04-12 2016-12-12 구글 인코포레이티드 Extension framework for an input method editor
US8797266B2 (en) * 2011-05-16 2014-08-05 John Zachary Dennis Typing input systems, methods, and devices
KR20140008835A (ko) * 2012-07-12 2014-01-22 삼성전자주식회사 Method for correcting speech recognition errors and broadcast receiving apparatus applying the same
KR20140108995A (ko) * 2013-03-04 2014-09-15 삼성전자주식회사 Method and apparatus for processing data using a partial area of a page
US10121493B2 (en) * 2013-05-07 2018-11-06 Veveo, Inc. Method of and system for real time feedback in an incremental speech input interface
US10586556B2 (en) * 2013-06-28 2020-03-10 International Business Machines Corporation Real-time speech analysis and method using speech recognition and comparison with standard pronunciation
US20160004502A1 (en) * 2013-07-16 2016-01-07 Cloudcar, Inc. System and method for correcting speech input
US20160210276A1 (en) * 2013-10-24 2016-07-21 Sony Corporation Information processing device, information processing method, and program
CN105094371A (zh) * 2015-08-28 2015-11-25 努比亚技术有限公司 Device and method for switching text input modes of a mobile terminal

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246648A (zh) * 2012-02-01 2013-08-14 腾讯科技(深圳)有限公司 Voice input control method and device
CN103177724A (zh) * 2013-03-19 2013-06-26 华为终端有限公司 Method, device and terminal for controlling text operations by voice
CN104346127A (zh) * 2013-08-02 2015-02-11 腾讯科技(深圳)有限公司 Method, device and terminal for implementing voice input
CN103645876A (zh) * 2013-12-06 2014-03-19 百度在线网络技术(北京)有限公司 Voice input method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110018746A (zh) * 2018-01-10 2019-07-16 微软技术许可有限责任公司 Processing documents through multiple input modalities
CN110018746B (zh) * 2018-01-10 2023-09-01 微软技术许可有限责任公司 Processing documents through multiple input modalities

Also Published As

Publication number Publication date
US10923118B2 (en) 2021-02-16
TWI720062B (zh) 2021-03-01
US20180366119A1 (en) 2018-12-20
CN106933561A (zh) 2017-07-07
TW201725580A (zh) 2017-07-16

Similar Documents

Publication Publication Date Title
WO2017114020A1 (zh) Speech input method and terminal device
CN105426152B (zh) Method and device for displaying bullet-screen comments
WO2017088247A1 (zh) Input processing method, device and equipment
CN107291260B (zh) Information input method and device, and device for information input
WO2017080084A1 (zh) Font adding method and device
CN110391966B (zh) Message processing method and device, and device for message processing
CN104079964B (zh) Method and device for transmitting video information
WO2020019683A1 (zh) Input method and device, and electronic device
CN104268151B (zh) Contact grouping method and device
CN110244860B (zh) Input method and device, and electronic device
CN111046210A (zh) Information recommendation method and device, and electronic device
CN105487799A (zh) Content conversion method and device
CN110968246A (zh) Intelligent Chinese handwriting input recognition method and device
CN112988956B (zh) Method and device for automatically generating dialogue, and method and device for detecting information recommendation effects
CN109977424A (zh) Training method and device for a machine translation model
CN110780749B (zh) Character string error correction method and device
CN112000766A (zh) Data processing method, device and medium
WO2019196527A1 (zh) Data processing method and device, and electronic device
WO2019144724A1 (zh) Emoticon input method and device
CN109284510B (zh) Text processing method and system, and device for text processing
CN111831132A (zh) Information recommendation method and device, and electronic device
CN107977089B (zh) Input method and device, and device for input
CN113127613B (zh) Chat information processing method and device
CN109558017B (zh) Input method and device, and electronic device
CN112905023A (zh) Input error correction method and device, and device for input error correction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16880810

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16880810

Country of ref document: EP

Kind code of ref document: A1