WO2017128991A1 - Voice recognition based instant messaging method and instant messaging *** - Google Patents

Voice recognition based instant messaging method and instant messaging ***

Info

Publication number
WO2017128991A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
text information
voice
instant messaging
text
Prior art date
Application number
PCT/CN2017/071382
Other languages
English (en)
French (fr)
Inventor
鄢志杰
Original Assignee
阿里巴巴集团控股有限公司
鄢志杰
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 鄢志杰
Publication of WO2017128991A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04 Real-time or near real-time messaging, e.g. instant messaging [IM]

Definitions

  • The present application relates to the field of instant messaging technologies, and in particular to a voice recognition based instant messaging method and an instant messaging system.
  • Voice chat (walkie-talkie style messaging) in social apps on a mobile phone or tablet is a convenient function offered by many applications, such as Tencent's WeChat and Alibaba's DingTalk, Alipay, and Taobao.
  • Such functions are mainly implemented as follows: the sending party records a message by voice, and the receiving party taps the received message and listens to it through the earpiece or the speaker.
  • The main disadvantage is that the receiving party cannot read the content at a glance, as is possible with a text message. The user has to tap the message and hold the phone or tablet to the ear, or play it aloud through the speaker of the phone or tablet. On many occasions (for example, in a meeting or with other people nearby) this is very inconvenient and may leak private information.
  • In view of the above problems, embodiments of the present application are proposed to provide a voice recognition based instant messaging method and instant messaging system that overcome the above problems or at least partially solve them.
  • The present application discloses a voice recognition based instant messaging method, including:
  • sending the text information to the receiving terminal.
  • Another embodiment of the present application provides a voice recognition based instant messaging method, including:
  • displaying the edited text information and sending the edited text information to the server.
  • Another embodiment of the present application provides a voice recognition based instant messaging method, including:
  • An embodiment of the present application provides an instant messaging system based on voice recognition, which includes:
  • a voice information receiving module configured to receive voice information sent by the sending terminal
  • a text information generating module configured to perform voice recognition on the voice information to generate text information
  • a first sending module configured to send the voice information to the receiving terminal
  • a second sending module configured to send the text information to the receiving terminal.
  • Another embodiment of the present application provides a voice recognition based instant messaging system, including:
  • a voice information recording and sending module configured to record voice information and send it to a server
  • a text information receiving display module configured to receive text information generated by identifying the voice information, and display the text information
  • an editing module configured to enter an interface for editing the text information after receiving a correcting operation instruction;
  • a display sending module configured to display the edited text information and send the edited text information to the server.
  • Another embodiment of the present application provides an instant messaging system based on voice recognition, including:
  • a voice information acquiring module configured to receive voice information sent by the server
  • a text information obtaining module configured to receive text information generated by the server after identifying the voice information
  • a text information display marking module configured to display and mark the text information.
  • In the embodiments of the present application, both the voice information and the text information obtained from it by voice recognition are sent to the receiving terminal. This removes the obstacle the receiving terminal faces in obtaining the information, makes the function more convenient for the user, and avoids the problem of privacy leaks.
  • FIG. 1 is a flow chart of a voice recognition based instant messaging method according to a first embodiment of the present application.
  • FIG. 2 is a flow chart of a voice recognition based instant messaging method according to a second embodiment of the present application.
  • FIG. 3 is a flow chart of a voice recognition based instant messaging method according to a third embodiment of the present application.
  • FIG. 4 is a flow chart of a voice recognition based instant messaging method according to a fourth embodiment of the present application.
  • FIG. 5 is a block diagram of an instant messaging system corresponding to the voice recognition based instant messaging method of the first embodiment of the present application.
  • FIG. 6 is a block diagram of an instant messaging system corresponding to the voice recognition based instant messaging method of the second embodiment of the present application.
  • FIG. 7 is a block diagram of an instant messaging system corresponding to the voice recognition based instant messaging method of the third embodiment of the present application.
  • FIG. 8 is a block diagram of an instant messaging system corresponding to the voice recognition based instant messaging method of the fourth embodiment of the present application.
  • One of the core ideas of the present application is to propose an instant messaging method and an instant messaging system that use voice recognition to convert voice information into text information and, through the server, display the text information directly on the screens of the transmitting terminal and the receiving terminal. This makes it easier for the receiving terminal to take in the information, overcomes the obstacle that on some occasions the receiving terminal cannot listen to received voice information, and avoids the problem of user privacy leakage.
  • FIG. 1 is a flowchart of a voice recognition-based instant messaging method according to a first embodiment of the present application.
  • the instant messaging method in the first embodiment of the present application is applied to a server, and includes the following steps:
  • S101 Receive voice information sent by the sending terminal.
  • The transmitting terminal can record the voice information in an instant messaging interface (for example, a chat interface), for example by pressing and holding a specified mark or button to start recording and releasing it when the recording is complete. The transmitting terminal then sends the voice information to the server through the network.
  • After receiving the voice information sent by the transmitting terminal, the server converts the voice information into text information through voice recognition technology. Voice recognition is a commonly used technology in the field and is not described in detail here.
  • The server transmits the voice information received in step S101 to the receiving terminal.
  • Step S103 may be performed simultaneously with step S102 or one after the other; when they are performed one after the other, their order is not particularly limited.
  • The server transmits the text information generated by the voice recognition processing to the receiving terminal.
  • While transmitting the text information, the server also transmits a specified mark, which distinguishes text information converted from voice information from text information that the sender entered directly as text.
  • Step S104 may be performed simultaneously with step S103, or before or after it; the order is not particularly limited.
  • For example, step S103 may be performed first to send the voice information received in step S101 to the receiving terminal; step S102 is then performed to generate the text information by voice recognition; and step S104 is then performed to send the generated text information to the receiving terminal.
  • Alternatively, step S102 may be performed first to generate text information by voice recognition of the voice information received in step S101, after which steps S103 and S104 are performed simultaneously or one after the other to transmit the voice information and the recognized text information to the receiving terminal.
  • The first embodiment of the present application provides a voice recognition based instant messaging method that generates text information by recognizing the voice information, and sends both the voice information and the text information to the receiving terminal through the server.
  • The instant messaging method provided by this embodiment makes it easier for the receiving terminal to take in the information, overcomes the obstacle that on some occasions the receiving terminal cannot listen to received voice information, and avoids the problem of user privacy leakage.
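  • The server-side flow of the first embodiment (steps S101 to S104) can be sketched roughly as follows. This is only an illustrative sketch, not the implementation described in the application: the names `handle_voice_message`, `TextMessage`, and `ASR_MARK` are hypothetical, the recognizer and the network send are passed in as stand-in callables, and the sequential order shown is just one of the orders the text allows.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical mark value; the text only says "a specified mark".
ASR_MARK = "ASR"


@dataclass
class TextMessage:
    text: str
    mark: str = ""  # empty for text that the sender typed directly


def handle_voice_message(
    voice: bytes,
    recognize: Callable[[bytes], str],
    send_to_receiver: Callable[[object], None],
) -> TextMessage:
    """Handle voice information already received from the sending terminal (S101)."""
    text = recognize(voice)                # S102: voice recognition produces text
    send_to_receiver(voice)                # S103: forward the original voice information
    message = TextMessage(text, ASR_MARK)  # the mark flags text that came from recognition
    send_to_receiver(message)              # S104: forward the text information
    return message


# Usage with stand-ins: no real recognizer or network is involved.
outbox: List[object] = []
result = handle_voice_message(b"\x00\x01", lambda v: "hello", outbox.append)
```

  • Passing the recognizer and the sender in as callables keeps the sketch independent of any particular speech engine or transport, neither of which the text specifies.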
  • FIG. 2 is a flowchart of a voice recognition based instant messaging method according to a second embodiment of the present application.
  • the instant messaging method in the second embodiment of the present application is applied to a server, and includes the following steps:
  • S201 Receive voice information sent by the sending terminal.
  • Steps S201 to S204 are the same as or similar to steps S101 to S104 in the first embodiment and are not described again here.
  • The method may further include:
  • The server transmits the text information generated in step S202 to the transmitting terminal.
  • The execution order of steps S203, S204, and S205 is not limited; the three may be executed at the same time or one after another in any order.
  • The method may further include:
  • The server sends the recognized text information to a database connected to the server for later use.
  • Step S206 can be performed simultaneously with any of steps S203 to S205 or in any order relative to them; the present application does not particularly limit this.
  • the method may further include:
  • Step S207 can be performed simultaneously with step S205; that is, while the recognized text information is transmitted to the transmitting terminal, auxiliary error correction information is transmitted as well, so that the user of the transmitting terminal can use it to modify the recognized text information.
  • During voice recognition, a word graph and multiple candidate words are generated. Based on the information in the word graph, an algorithm may recommend alternative words to the user for error correction. Returning this information to the transmitting terminal helps the user correct the recognized text more accurately.
  • For example, when the user of the transmitting terminal chooses to correct an error and clicks on a misrecognized word, the other candidates for that word can be obtained from the auxiliary correction information and displayed on the virtual keyboard, and the user can click the correct candidate to correct the error efficiently.
  • Specifically, suppose the user says "I want to buy a yellow one" and the recognizer outputs "I want to buy red". Based on the word graph information, the algorithm can offer the second candidate, "yellow", for the user to click. When the user clicks "yellow", the replacement is complete, which is simple and fast.
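  • The candidate-based correction in the example above can be sketched as follows. The data layout is an assumption made for illustration: the text does not specify how the word graph or the candidate words are encoded, so the sketch simply maps a word position to a ranked candidate list, and the function names are hypothetical.

```python
from typing import Dict, List


def alternatives(words: List[str], index: int, ranked: Dict[int, List[str]]) -> List[str]:
    """Return the ranked candidates for one position, excluding the word currently shown."""
    return [w for w in ranked.get(index, []) if w != words[index]]


def replace_word(words: List[str], index: int, choice: str) -> List[str]:
    """Return a copy of the recognized words with one word replaced by the chosen candidate."""
    corrected = list(words)
    corrected[index] = choice
    return corrected


recognized = ["I", "want", "to", "buy", "red"]
# Hypothetical auxiliary correction information: position -> candidates, best first.
ranked_candidates = {4: ["red", "yellow"]}

options = alternatives(recognized, 4, ranked_candidates)  # the user clicked the word at position 4
corrected = replace_word(recognized, 4, options[0])       # the user clicked the offered candidate
```

  • Because the top-ranked candidate is the word already displayed, the first alternative offered is the second-ranked one, which matches the "second candidate" wording of the example.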
  • the method may further include:
  • Step S208: receive the edited text information sent by the transmitting terminal, and transmit the edited text information to the receiving terminal.
  • The transmitting terminal sends the edited text information to the server, and the server receives it and forwards it to the receiving terminal.
  • The method may further include:
  • Step S209: send the edited text information to the database.
  • A corrected automatic speech recognition result is of high value. It indicates that: 1) the server failed to recognize the voice information correctly; and 2) the correct text for the voice information has been supplied by the user through the correction.
  • The training algorithm of the voice recognition system can use the recorded erroneous text content, the corresponding voice content, and the correct text content to avoid similar mistakes in the future. The ability of such error correction data to let the voice recognition system improve itself is unmatched by other data.
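  • One way to picture how the recognized text (step S206) and the edited text (step S209) could be kept together for later training is sketched below. It assumes an in-memory dictionary as a stand-in for the database and a hypothetical `voice_id` key for associating the two texts; the application does not describe the storage schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RecognitionRecord:
    voice_id: str
    recognized_text: str
    corrected_text: Optional[str] = None  # set only if the sender edits the text


class RecognitionStore:
    """In-memory stand-in for the database connected to the server."""

    def __init__(self) -> None:
        self._records: Dict[str, RecognitionRecord] = {}

    def save_recognized(self, voice_id: str, text: str) -> None:
        # S206: store the text produced by voice recognition
        self._records[voice_id] = RecognitionRecord(voice_id, text)

    def save_corrected(self, voice_id: str, text: str) -> None:
        # S209: associate the edited text with the text before correction
        self._records[voice_id].corrected_text = text

    def correction_pairs(self) -> List[RecognitionRecord]:
        """Records whose user correction differs from the recognizer output."""
        return [
            r for r in self._records.values()
            if r.corrected_text is not None and r.corrected_text != r.recognized_text
        ]


store = RecognitionStore()
store.save_recognized("v1", "I want to buy red")
store.save_recognized("v2", "hello")
store.save_corrected("v1", "I want to buy a yellow one")
pairs = store.correction_pairs()
```

  • Only the records returned by `correction_pairs` carry both an erroneous and a corrected text, which is the kind of data the paragraph above describes as valuable for training.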
  • The second embodiment of the present application provides a voice recognition based instant messaging method that generates text information by recognizing voice information and sends both the voice information and the text information to the receiving terminal through the server. When the text information is sent to the transmitting terminal, auxiliary modification information is also provided, with which the user of the transmitting terminal can modify the text efficiently.
  • The instant messaging method provided by this embodiment makes it easier for the receiving terminal to take in the information, overcomes the obstacle that on some occasions the receiving terminal cannot listen to received voice information, avoids the problem of user privacy leakage, and further ensures the accuracy of the information received by the receiving terminal.
  • FIG. 3 is a flowchart of a voice recognition-based instant messaging method according to a third embodiment of the present application.
  • the instant messaging method in the third embodiment of the present application is applied to a transmitting terminal of information, and includes the following steps:
  • The transmitting terminal can record voice information in an instant messaging interface (such as a chat interface), for example by pressing and holding a specified mark or button of the input box to start recording and releasing the mark or button when the recording is complete.
  • After the recording is complete, the instant messaging interface may send the voice information directly by default, or the user of the transmitting terminal may click another mark or button to send it to the server over the network.
  • S302. Receive the text information generated after the server recognizes the voice information, and display the text information.
  • The server performs voice recognition on the voice information sent by the transmitting terminal to generate text information and transmits the text information to the transmitting terminal, which receives and displays it.
  • For example, after the transmitting terminal sends the recorded voice information to the server in step S301, it can receive, in the same chat interface, the text information that the server generated by recognizing the voice information, and display it in that chat interface.
  • The error correction interface can be opened by issuing a correcting operation instruction.
  • The correcting operation instruction may be, for example, the user pressing the text information. On receiving the instruction, the transmitting terminal opens the error correction interface and enters the text editing state; the interface may display an input interface such as a virtual keyboard or a handwriting keyboard for the user to correct the error.
  • The user can add or delete text through the virtual keyboard or a similar input method.
  • the method can further include:
  • S304 Display the edited text information, and send the edited text information to the server.
  • The text information edited by the user of the transmitting terminal is displayed on the transmitting terminal and, at the same time, uploaded by the transmitting terminal to the server, which sends it to the receiving party for synchronized display; this is not described again here.
  • the method may further include:
  • Step S302a: receive auxiliary modification information sent by the server.
  • The word graph and the candidate word information generated during voice recognition are transmitted to the transmitting terminal, which helps the user of the transmitting terminal correct the recognized text more efficiently.
  • The error correction interface can display not only the text information in its editing state and the virtual keyboard or handwriting keyboard, but also the auxiliary modification information sent by the server in step S302a. For example, if the server considers that a sentence or a word in the recognized text does not conform to grammar, the sentence or word can be given a dotted underline, and the candidate words included in the auxiliary modification information can be displayed elsewhere in the display interface of the transmitting terminal (such as in the input interface) for the user to select the correct one.
  • step S302 the method further includes:
  • the transmitting terminal can play the voice information recorded in step S301 through the earpiece or the speaker.
  • The third embodiment of the present application provides a voice recognition based instant messaging method in which text information is generated by recognizing voice information and an error correction function is provided, so that the user of the transmitting terminal can modify the recognized text information.
  • The instant messaging method provided by this embodiment makes it easier for the receiving terminal to take in the information, overcomes the obstacle that on some occasions the receiving terminal cannot listen to received voice information, avoids the problem of user privacy leakage, and ensures the accuracy of the information received by the receiving terminal.
  • In the third embodiment of the present application, the transmitting terminal can also receive the auxiliary modification information sent by the server, so that the user can modify the text information efficiently, which further improves the accuracy and timeliness of the information.
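  • The transmitting-terminal behaviour of the third embodiment can be sketched as a small state holder. Everything here is illustrative: the class and method names are hypothetical, the server connection is replaced by a callable that records what would be sent, and the user interface is reduced to a `displayed_text` field and an `editing` flag.

```python
from typing import Callable, List, Optional, Tuple


class SendingTerminal:
    def __init__(self, send_to_server: Callable[[str, object], None]) -> None:
        self._send = send_to_server
        self.displayed_text: Optional[str] = None
        self.editing = False

    def record_and_send(self, voice: bytes) -> None:
        # S301: the recorded voice information is sent to the server
        self._send("voice", voice)

    def on_text_received(self, text: str) -> None:
        # S302: display the text that the server produced by voice recognition
        self.displayed_text = text

    def on_correcting_instruction(self) -> None:
        # A correcting operation instruction (e.g. pressing the text) opens the edit state
        self.editing = True

    def finish_editing(self, edited: str) -> None:
        # S304: display the edited text and send it to the server
        if not self.editing:
            raise RuntimeError("no correcting operation instruction was received")
        self.displayed_text = edited
        self.editing = False
        self._send("edited_text", edited)


sent: List[Tuple[str, object]] = []
terminal = SendingTerminal(lambda kind, payload: sent.append((kind, payload)))
terminal.record_and_send(b"\x01\x02")
terminal.on_text_received("I want to buy red")
terminal.on_correcting_instruction()
terminal.finish_editing("I want to buy a yellow one")
```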
  • FIG. 4 is a flowchart of a voice recognition based instant messaging method according to a fourth embodiment of the present application.
  • the instant messaging method in the fourth embodiment of the present application is applied to a receiving terminal of information, and includes the following steps:
  • S401 Receive voice information sent by a server.
  • The transmitting terminal records the voice information and sends it to the server, and the server sends the voice information to the receiving terminal.
  • S402. Receive text information generated by the server by recognizing the voice information.
  • The server generates the text information through voice recognition and then sends it to the receiving terminal, which receives the recognized text information.
  • Steps S401 and S402 can be performed simultaneously or one after the other; that is, the receiving terminal can receive the voice information and the generated text information at the same time or one after the other, and the present application does not particularly limit this.
  • For example, after the server converts the voice information into the text information, the voice information and the text information are sent to the receiving terminal together, and the receiving terminal receives them at the same time.
  • The receiving terminal can display the text information in the instant messaging interface. Since the text information is generated by recognizing the voice information, it can be marked to distinguish it from text information that the sender entered directly as text, for example by a special background color, a special font, or special characters (for example, "voice recognition" or "ASR").
  • One possible way is that, after receiving the voice information and the corresponding text information from the server, the receiving terminal itself marks the text information to distinguish it from text information that the transmitting terminal entered directly as text. Another possible way is that the server sends a mark together with the text information, and the mark is displayed on the display interface of the receiving terminal together with the text information.
  • the method further includes:
  • S402a Receive tag information sent by the server.
  • The tag information may be, for example, a special background color, a special font, special characters (for example, "speech recognition" or "ASR"), or the like.
  • the method may further include:
  • The instruction to play the voice information may be the user clicking on the text information: when the user clicks on the displayed text information, the receiving terminal plays the voice information received in step S401 through the earpiece or the speaker.
  • The method may further include:
  • After the user of the transmitting terminal corrects the text information, the transmitting terminal sends the corrected text information to the server, the server sends it to the receiving terminal, and the receiving terminal receives and displays the edited text information.
  • The receiving terminal may overwrite the text information from before the modification with the edited text information.
  • The fourth embodiment of the present application provides a voice recognition based instant messaging method that generates text information by recognizing voice information and provides an error correction function, so that the user of the receiving terminal can directly receive the recognized text information and can tell whether the text information was entered by the transmitting terminal directly as text or generated by voice recognition.
  • The instant messaging method provided by this embodiment makes it easier for the receiving terminal to take in the information, overcomes the obstacle that on some occasions the receiving terminal cannot listen to received voice information, and avoids the problem of user privacy leakage.
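  • The receiving-terminal behaviour of the fourth embodiment (receiving voice and text in either order, marking recognized text, overwriting it with an edited version, and playing the voice on click) can be sketched as follows. The `message_id` used to pair voice with text is an assumption, as are the class and method names; playback is reduced to returning the stored voice bytes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DisplayedMessage:
    text: str = ""
    mark: str = ""                 # e.g. "ASR" for recognized text, "" for typed text
    voice: Optional[bytes] = None


class ReceivingTerminal:
    def __init__(self) -> None:
        self.messages: Dict[str, DisplayedMessage] = {}

    def _entry(self, message_id: str) -> DisplayedMessage:
        # Create the entry on first use so voice and text may arrive in either order.
        return self.messages.setdefault(message_id, DisplayedMessage())

    def on_voice(self, message_id: str, voice: bytes) -> None:
        self._entry(message_id).voice = voice          # S401

    def on_text(self, message_id: str, text: str, mark: str = "ASR") -> None:
        entry = self._entry(message_id)                # S402
        entry.text = text
        entry.mark = mark                              # mark from the server or set locally

    def on_edited_text(self, message_id: str, text: str) -> None:
        self._entry(message_id).text = text            # overwrite the pre-edit text

    def on_click(self, message_id: str) -> Optional[bytes]:
        return self.messages[message_id].voice         # voice to play via earpiece or speaker


receiver = ReceivingTerminal()
receiver.on_text("m1", "I want to buy red")
receiver.on_voice("m1", b"\x01\x02")
receiver.on_edited_text("m1", "I want to buy a yellow one")
played = receiver.on_click("m1")
```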
  • FIG. 5 shows an instant messaging system corresponding to the voice recognition based instant messaging method according to the first embodiment of the present invention.
  • the instant messaging system 500 in this embodiment includes the following modules:
  • the voice information receiving module 501 is configured to receive voice information sent by the sending terminal.
  • the text information generating module 502 is configured to perform voice recognition on the voice information to generate text information.
  • a first sending module 503, configured to send the voice information to the receiving terminal
  • the second sending module 504 is configured to send the text information to the receiving terminal.
  • FIG. 6 is a view showing an instant messaging system corresponding to the voice recognition based instant messaging method according to the second embodiment of the present invention.
  • In this embodiment, in addition to the voice information receiving module 601, the text information generating module 602, the first sending module 603, and the second sending module 604 described above, the system 600 further includes:
  • the third sending module 605 is configured to send the text information to the sending terminal.
  • system 600 further includes:
  • the information transceiver module 606 is configured to receive the edited text information sent by the sending terminal, and send the information to the receiving terminal.
  • system further includes:
  • the first storage module 607 stores the text information in a database.
  • system further includes:
  • a fourth sending module 608, configured to send the auxiliary error correction information to the sending terminal
  • the information transceiver module 609 is configured to receive the edited text information sent by the sending terminal, and send the information to the receiving terminal.
  • system further includes:
  • the text information association module 610 is configured to send the edited text information to the database and associate with the text information before the correction.
  • the auxiliary error correction information includes a word graph and candidate words for a specified character, word, or sentence of the text information.
  • the word graph and the candidate words for the specified character, word, or sentence are obtained from the database.
  • the first sending module and the second sending module are simultaneously executed, and the voice information and the text information are simultaneously sent to the receiving terminal.
  • FIG. 7 shows an instant messaging system corresponding to the voice recognition based instant messaging method according to the third embodiment of the present invention.
  • the instant messaging system 700 in this embodiment includes the following modules:
  • the voice information recording and sending module 701 is configured to record voice information and send it to the server;
  • the text information receiving and displaying module 702 is configured to receive the text information generated by identifying the voice information, and display the text information;
  • the editing module 703 is configured to enter an interface for editing the text information after receiving a correcting operation instruction;
  • the display sending module 704 is configured to display the edited text information, and send the edited text information to the server.
  • system further includes:
  • the auxiliary modification information receiving module 705 is configured to receive auxiliary modification information sent by the server.
  • the auxiliary modification information includes a word graph and candidate words for a specified character, word, or sentence of the text information, the candidate words being displayed in the interface for editing the text information.
  • the interface for editing text information includes an input interface.
  • system further includes:
  • the voice information playing module 706 is configured to play the voice information after receiving the instruction to play the voice information.
  • the instruction to play the voice information is generated by the user clicking the text information.
  • FIG. 8 shows an instant messaging system corresponding to the voice recognition based instant messaging method according to the fourth embodiment of the present invention.
  • the instant messaging system 800 in this embodiment includes the following modules:
  • the voice information obtaining module 801 is configured to receive voice information sent by the server.
  • the text information obtaining module 802 is configured to receive text information generated by the server by recognizing the voice information;
  • the text information display marking module 803 is configured to display and mark the text information.
  • system further includes:
  • the tag information obtaining module 804 is configured to receive tag information sent by the server.
  • the text information acquisition module and the mark information acquisition module are simultaneously executed, and the text information and the mark information are simultaneously acquired.
  • the text information display tagging module is configured to display the text information, and mark the text information by using the tag information.
  • system further includes:
  • the voice information playing module 805 is configured to: when receiving an instruction of the user to play the voice information, play the voice information.
  • the instruction to play the voice message is generated by the user clicking the text message.
  • system further includes:
  • the receiving display module 806 is configured to receive the edited text information sent by the server, and display the edited text information.
  • the edited text information is displayed in a manner that overwrites the pre-edit text information.
  • The description of the system embodiments is relatively brief; for related details, refer to the description of the method embodiments.
  • the voice recognition based instant messaging method and the instant messaging system proposed by the embodiments of the present application have at least the following advantages:
  • The voice recognition based instant messaging method and instant messaging system proposed in the embodiments of the present application use the voice recognition function to overcome the obstacle the receiving terminal faces in obtaining information, which is convenient for the user and avoids the problem of privacy leakage.
  • The error modifying function gives the transmitting terminal an opportunity to correct errors made by the voice recognition system.
  • The data collection function obtains real recognition error data, which can be used to improve the performance of the voice recognition system.
  • The error correction step makes it easier for the user of the transmitting terminal to correct errors.
  • The information marking step makes it easy for the receiving terminal to tell whether the received information was entered on a virtual keyboard or generated from voice information.
  • The receiving terminal can select the text information generated from the voice information and play back the original voice information.
  • Embodiments of the present application can be provided as a method, apparatus, or computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include computer readable media Non-permanent memory, random access memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash memory.
  • RAM random access memory
  • ROM read only memory
  • Memory is an example of a computer readable medium.
  • Computer readable media including both permanent and non-persistent, removable and non-removable media may be implemented by any method or technology for signal storage.
  • the signals can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory.
  • PRAM phase change memory
  • SRAM static random access memory
  • DRAM dynamic random access memory
  • RAM random access memory
  • ROM read only memory
  • EEPROM electrically erasable programmable read only memory
  • flash memory or other memory technology
  • compact disk read only memory CD-ROM
  • DVD digital versatile disk
  • a magnetic tape cartridge, magnetic tape storage or other magnetic storage device or any other non-transporting medium can be used to store signals that can be accessed by a computing device.
  • computer readable media does not include non-persistent computer readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowchart illustrations and/or FIG.
  • These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing terminal device to produce a machine such that instructions are executed by a processor of a computer or other programmable data processing terminal device
  • Means are provided for implementing the functions specified in one or more of the flow or in one or more blocks of the flow chart.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • the instruction device implements the functions specified in one or more blocks of the flowchart or in a flow or block of the flowchart.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, such that the computer or other programmable terminal device
  • the instructions executed on the instructions are provided for implementing one or more blocks in a flow or a flow and/or block diagram of the flowchart The steps of the function specified in the box.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

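An instant messaging method and an instant messaging system based on voice recognition. The instant messaging method includes: receiving voice information sent by a sending terminal (S101); performing voice recognition on the voice information to generate text information (S102); sending the voice information to a receiving terminal (S103); and sending the text information to the receiving terminal (S104). The method and system overcome the obstacle that, in some situations, the receiving terminal cannot listen to voice information after receiving it, and avoid leaking the user's privacy.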

Description

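Instant messaging method and instant messaging system based on voice recognition
Technical Field
The present application relates to the field of instant messaging technologies, and in particular to an instant messaging method and an instant messaging system based on voice recognition.
Background
Walkie-talkie-style voice chat in social apps on mobile phones or tablets is a convenient feature found in many applications; for example, Tencent's WeChat and Alibaba's DingTalk, Alipay, and Taobao all provide it. At present the feature is mainly implemented as follows: the sending terminal records a voice message, and the recipient taps the received message and listens to it through the earpiece or the loudspeaker.
While this feature is convenient for the sending terminal, it creates an obstacle for the receiving terminal. The main drawback is that the recipient cannot see the content at a glance as with a text message. The recipient has to tap the message and then either hold the phone or tablet to the ear or play it through the loudspeaker. In many situations (for example in a meeting, or with other people nearby) this is very inconvenient, and it may also leak private information.
Summary
In view of the above problems, the embodiments of the present application are proposed to provide an instant messaging method and an instant messaging system based on voice recognition that overcome the above problems or at least partially solve them.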
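To solve the above problems, the present application discloses an instant messaging method based on voice recognition, including:
receiving voice information sent by a sending terminal;
performing voice recognition on the voice information to generate text information;
sending the voice information to a receiving terminal; and
sending the text information to the receiving terminal.
Another embodiment of the present application proposes an instant messaging method based on voice recognition, including:
recording voice information and sending it to a server;
receiving text information generated by recognizing the voice information, and displaying the text information;
after receiving a correction operation instruction, entering an interface for editing the text information;
displaying the edited text information, and sending the edited text information to the server.
A further embodiment of the present application proposes an instant messaging method based on voice recognition, including:
receiving voice information sent by a server;
receiving text information sent by the server and generated by recognizing the voice information;
displaying and marking the text information.
An embodiment of the present application proposes an instant messaging system based on voice recognition, characterized by including:
a voice information receiving module, configured to receive voice information sent by a sending terminal;
a text information generating module, configured to perform voice recognition on the voice information to generate text information;
a first sending module, configured to send the voice information to a receiving terminal; and
a second sending module, configured to send the text information to the receiving terminal.
Another embodiment of the present application proposes an instant messaging system based on voice recognition, including:
a voice information recording and sending module, configured to record voice information and send it to a server;
a text information receiving and display module, configured to receive text information generated by recognizing the voice information, and to display the text information;
an editing module, configured to enter an interface for editing the text information after receiving a correction operation instruction;
a display and sending module, configured to display the edited text information and send the edited text information to the server.
A further embodiment of the present application proposes an instant messaging system based on voice recognition, including:
a voice information acquiring module, configured to receive voice information sent by a server;
a text information acquiring module, configured to receive text information sent by the server and generated by recognizing the voice information;
a text information display and marking module, configured to display and mark the text information.
The embodiments of the present application have at least the following advantages:
In the instant messaging method and instant messaging system based on voice recognition proposed by the embodiments of the present application, the voice recognition function is used and both the voice information and the text information are sent to the receiving terminal. This overcomes the obstacle the receiving terminal faces in obtaining information, is convenient for the user, and avoids the problem of privacy leakage.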
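Brief Description of the Drawings
FIG. 1 is a flowchart of the instant messaging method based on voice recognition according to the first embodiment of the present application.
FIG. 2 is a flowchart of the instant messaging method based on voice recognition according to the second embodiment of the present application.
FIG. 3 is a flowchart of the instant messaging method based on voice recognition according to the third embodiment of the present application.
FIG. 4 is a flowchart of the instant messaging method based on voice recognition according to the fourth embodiment of the present application.
FIG. 5 is a block diagram of the instant messaging system corresponding to the method of the first embodiment of the present application.
FIG. 6 is a block diagram of the instant messaging system corresponding to the method of the second embodiment of the present application.
FIG. 7 is a block diagram of the instant messaging system corresponding to the method of the third embodiment of the present application.
FIG. 8 is a block diagram of the instant messaging system corresponding to the method of the fourth embodiment of the present application.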
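Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application fall within the scope of protection of the present application.
One of the core ideas of the present application is to propose an instant messaging method and an instant messaging system that recognize voice information by voice recognition and, through a server, display the resulting text information directly on the screens of the sending terminal and the receiving terminal. This makes it easier for the receiving terminal to receive information, overcomes the obstacle that in some situations the receiving terminal cannot listen to voice information after receiving it, and avoids leaking the user's privacy.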
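First Embodiment
The first embodiment of the present application proposes an instant messaging method based on voice recognition; FIG. 1 is a flowchart of the method. The method of the first embodiment is applied to a server and includes the following steps:
S101: receive voice information sent by a sending terminal.
In this step, the sending terminal may record voice information in an instant messaging interface (for example a chat interface); recording ends when the user releases the record mark or button. The sending terminal then sends the voice information to the server over the network.
S102: recognize the voice information as text information.
In this step, after the server receives the voice information sent by the sending terminal, it recognizes the voice information as text information using voice recognition technology. Voice recognition is a commonly used technology in this field and is not described in detail here.
S103: send the voice information to a receiving terminal.
In this step, the server sends the voice information received in step S101 to the receiving terminal.
Note that step S103 may be executed at the same time as step S102, or before or after it; when the two are executed one after the other, their order is not particularly limited.
S104: send the text information generated by recognition to the receiving terminal.
In this step, the server sends the text information generated by voice recognition to the receiving terminal. Preferably, the server sends a designated mark together with the text information, to distinguish text converted from voice information from text that the sender typed directly.
Note that when step S103 is executed after step S102, step S104 may be executed at the same time as step S103, or before or after it; the present application imposes no particular limitation.
In one embodiment, step S103 is executed first, sending the voice information received in step S101 to the receiving terminal; step S102 is then executed to generate text information from the voice information by voice recognition; and step S104 is then executed to send the generated text information to the receiving terminal. In another embodiment, step S102 is executed first to generate text information from the voice information received in step S101, and steps S103 and S104 are then executed simultaneously or one after the other to send the voice information and the generated text information to the receiving terminal.
In summary, the first embodiment of the present application proposes an instant messaging method based on voice recognition in which text information is generated from voice information by recognition and the server sends both the voice information and the text information to the receiving terminal. The method makes it easier for the receiving terminal to receive information, overcomes the obstacle that in some situations the receiving terminal cannot listen to voice information after receiving it, and avoids leaking the user's privacy.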
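The server-side flow of steps S101 to S104 can be sketched as follows. This is only a minimal illustration, not the implementation described in the application: the `RelayServer` class, the message dictionaries, the `"ASR"` mark value, and the `fake_recognizer` placeholder are all assumptions introduced for the example. The sketch uses the ordering in which the audio is forwarded first (S103) and the recognized text afterwards (S104).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RelayServer:
    """Hypothetical server for steps S101-S104; all names are illustrative."""
    recognize: Callable[[bytes], str]  # stand-in for any speech recognizer
    outbox: List[Tuple[str, dict]] = field(default_factory=list)

    def handle_voice(self, sender: str, receiver: str, audio: bytes) -> None:
        # S101: voice information has arrived from the sending terminal.
        # S103: forward the original audio to the receiving terminal.
        self.outbox.append(
            (receiver, {"type": "voice", "from": sender, "audio": audio}))
        # S102: recognize the audio to produce text information.
        text = self.recognize(audio)
        # S104: forward the text with a mark saying it came from recognition.
        self.outbox.append(
            (receiver, {"type": "text", "from": sender,
                        "text": text, "mark": "ASR"}))


def fake_recognizer(audio: bytes) -> str:
    # Placeholder only: a real system would call a recognition engine here.
    return audio.decode("utf-8")


server = RelayServer(recognize=fake_recognizer)
server.handle_voice("alice", "bob", "I want the yellow one".encode("utf-8"))
```

Because the text says the order of S102, S103, and S104 is not limited, the two `append` calls could equally be issued together after recognition finishes.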
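Second Embodiment
The second embodiment of the present application proposes an instant messaging method based on voice recognition; FIG. 2 is a flowchart of the method. The method of the second embodiment is applied to a server and includes the following steps:
S201: receive voice information sent by a sending terminal.
S202: recognize the voice information as text information.
S203: send the voice information to a receiving terminal.
S204: send the text information generated by recognition to the receiving terminal.
Steps S201 to S204 are the same as or similar to steps S101 to S104 of the first embodiment and are not described again here.
In a preferred embodiment, after step S202, the method may further include:
S205: send the text information generated by recognition to the sending terminal.
In this step, the server sends the text information generated in step S202 to the sending terminal.
The execution order of steps S205, S204, and S203 is not limited; the three may be executed simultaneously or one after another in any order.
In addition, after step S202, the method may further include:
S206: store the text information generated by recognition in a database.
In this step, the server sends the text information generated by recognition to a database connected to the server for later use. Step S206 may be executed at the same time as any of steps S203 to S205, or before or after them in any order; the present application imposes no particular limitation.
After step S202, the method may further include:
S207: send auxiliary error correction information to the sending terminal.
This step may be executed at the same time as any of steps S203 to S205, or before or after them in any order; the present application imposes no particular limitation. Preferably, step S207 is executed together with step S205, that is, the auxiliary error correction information is sent to the sending terminal at the same time as the recognized text information, so that the sending terminal can use it to modify the recognized text.
During voice recognition, a word graph and multiple-candidate information for the recognized words are produced. In step S207, an algorithm can use the information in the word graph to recommend alternative correction words for the user to tap. Sending this information back to the sending terminal helps the user correct the recognized text more efficiently. For example, when the user of the sending terminal chooses error correction and taps a misrecognized word, the other candidates for that word can be obtained from the auxiliary correction information and displayed on the virtual keyboard, and the user can correct the error efficiently by tapping the right candidate. Concretely, suppose the user says "我要买黄色的" ("I want to buy the yellow one") and voice recognition wrongly outputs "我要买红色的" ("I want to buy the red one"). When the user taps the word "红色" (red), the algorithm can use the word graph information to offer "黄色" (yellow) as the second candidate. The user taps "黄色" and the replacement is complete, which is simple and fast.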
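The candidate lookup described for step S207 can be illustrated with a small sketch. It assumes a heavily simplified stand-in for a word graph, namely a per-position list of scored candidates; real recognizers produce richer lattices, and the data layout, scores, and function names below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical simplification of a word graph: for each word position, the
# recognizer's candidates, sorted by descending score.
Candidates = Dict[int, List[Tuple[str, float]]]


def best_text(graph: Candidates) -> List[str]:
    """Return the top-scoring word at every position (the recognized text)."""
    return [graph[i][0][0] for i in sorted(graph)]


def alternatives(graph: Candidates, position: int, limit: int = 3) -> List[str]:
    """Return up to `limit` candidates other than the current top choice."""
    return [word for word, _ in graph[position][1:limit + 1]]


def apply_choice(text: List[str], position: int, chosen: str) -> List[str]:
    """Replace the word at `position` with the candidate the user tapped."""
    fixed = list(text)
    fixed[position] = chosen
    return fixed


graph: Candidates = {
    0: [("I", 0.99)],
    1: [("want", 0.97)],
    2: [("the", 0.95)],
    3: [("red", 0.52), ("yellow", 0.41), ("bread", 0.07)],
    4: [("one", 0.98)],
}

recognized = best_text(graph)             # top choice at position 3 is "red"
suggestions = alternatives(graph, 3)      # shown when the user taps "red"
corrected = apply_choice(recognized, 3, suggestions[0])
```

Tapping the misrecognized word maps to `alternatives`, and tapping a suggestion maps to `apply_choice`, so a single tap pair completes the correction.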
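After that, the method may further include:
Step S208: receive the edited text information sent by the sending terminal, and send it to the receiving terminal.
In this step, once the user of the sending terminal has finished the correction, the sending terminal sends the edited text information to the server, and the server receives it and sends it on to the receiving terminal.
Preferably, after step S208, the method may further include:
Step S209: send the edited text information to the database.
Corrected automatic voice recognition results are highly valuable and especially important. A correction indicates that: 1) the server failed to recognize the voice information entirely correctly; and 2) the correct text for the voice information has been supplied by the user through the correction. For such edited text information, the training algorithm of the voice recognition system can be used to record the misrecognized text content, the corresponding voice content, and the correct content, so that similar errors are avoided in the future. For the self-improvement of the voice recognition system, this kind of error correction data is more useful than any other data.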
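Steps S206 and S209 amount to storing the recognized text and later associating the user's edited text with it. A minimal sketch using Python's built-in `sqlite3` module follows; the table layout, column names, and the `audio_ref` identifiers are assumptions made for illustration, since the application does not specify a schema.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE recognition ("
        "msg_id TEXT PRIMARY KEY, audio_ref TEXT, "
        "recognized TEXT, edited TEXT)")
    return db


def save_recognized(db, msg_id: str, audio_ref: str, recognized: str) -> None:
    # S206: store the text generated by recognition.
    db.execute(
        "INSERT INTO recognition (msg_id, audio_ref, recognized) "
        "VALUES (?, ?, ?)", (msg_id, audio_ref, recognized))


def save_edited(db, msg_id: str, edited: str) -> None:
    # S209: store the edited text, associated with the pre-correction text.
    db.execute(
        "UPDATE recognition SET edited = ? WHERE msg_id = ?", (edited, msg_id))


def training_pairs(db):
    """Rows where the user changed the text: candidate training data."""
    return db.execute(
        "SELECT audio_ref, recognized, edited FROM recognition "
        "WHERE edited IS NOT NULL AND edited <> recognized").fetchall()


db = open_store()
save_recognized(db, "m1", "audio/m1", "I want the red one")
save_recognized(db, "m2", "audio/m2", "see you at noon")
save_edited(db, "m1", "I want the yellow one")
pairs = training_pairs(db)
```

Keeping the original and edited text in the same row is one simple way to realize the association between the two; how the pairs are then fed to a training algorithm is outside this sketch.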
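In summary, the second embodiment of the present application proposes an instant messaging method based on voice recognition in which text information is generated from voice information by recognition, the server sends both the voice information and the text information to the receiving terminal, and the text information is also sent to the sending terminal together with auxiliary modification information that lets the user of the sending terminal correct the text efficiently. The method makes it easier for the receiving terminal to receive information, overcomes the obstacle that in some situations the receiving terminal cannot listen to voice information after receiving it, avoids leaking the user's privacy, and further ensures the accuracy of the information received by the receiving terminal.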
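Third Embodiment
The third embodiment of the present application proposes an instant messaging method based on voice recognition; FIG. 3 is a flowchart of the method. The method of the third embodiment is applied to the terminal that sends the information and includes the following steps:
S301: record voice information and send it to a server.
In this step, the sending terminal may record voice information in an instant messaging interface (for example a chat interface). For example, recording starts when the user presses and holds a designated mark or button in the input box and ends when the user releases it. After recording ends, the interface may send the message directly by default, or the user may tap another mark or button to send the information to the server over the network.
S302: receive the text information generated by the server from the voice information, and display the text information.
In this step, the server performs voice recognition on the voice information sent by the sending terminal, generates text information, and returns it to the sending terminal, which receives and displays it. For example, in a chat interface, the sending terminal sends the recorded voice information to the server in step S301; in step S302 it receives, in the same chat interface, the text information returned by the server and displays it there.
S303: after receiving a correction operation instruction, open an error correction interface and enter an interface for editing the text information.
In this step, if the user of the sending terminal considers that the text generated by voice recognition does not match the voice information, the user can open the error correction interface by issuing a correction operation instruction. For example, the instruction may be a long press on the text information; the sending terminal receives the instruction, opens the error correction interface, and enters the text editing state. The correction interface may also display an input interface such as a virtual keyboard or a handwriting keyboard for the user to correct errors. The user can add to, delete from, or otherwise change the text information through the virtual keyboard or similar means.
After that, the method may further include:
S304: display the edited text information, and send the edited text information to the server.
In this step, the text information edited by the user of the sending terminal is displayed on the sending terminal and is also uploaded to the server, which sends it to the recipient for synchronized display. This is not described further here.
In a preferred embodiment, after step S302, the method may further include:
Step S302a: receive auxiliary modification information sent by the server.
In this step, the word graph and the multiple-candidate information for the recognized words produced during voice recognition are sent to the sending terminal, which helps the user of the sending terminal correct the recognized text more efficiently.
In step S303, the error correction interface can show not only the text in its editing state and an input interface such as a virtual keyboard or handwriting keyboard, but also the auxiliary modification information sent by the server in step S302a. For example, when the server considers that a sentence or word in the recognized text is not grammatical, the sentence or word can be underlined with a dashed line, and the candidate words contained in the auxiliary modification information can be shown elsewhere on the sending terminal's display (for example in the input interface) for the user to tap the correct one. Alternatively, when the sender chooses error correction and taps a misrecognized word, the other candidates for that word can be obtained from the auxiliary correction information and displayed on the virtual keyboard, and the user can correct the error efficiently by tapping the right candidate.
In a preferred embodiment, after step S302, the method further includes:
S302b: after receiving an instruction to play the voice information, play the voice information.
In this step, if the user of the sending terminal issues an instruction to play the voice information, for example by tapping the displayed text information, the sending terminal can play the voice information recorded in step S301 through the earpiece or the loudspeaker.
In summary, the third embodiment of the present application proposes an instant messaging method based on voice recognition in which text information is generated from voice information by recognition and an error correction function lets the user of the sending terminal modify the recognized text. The method makes it easier for the receiving terminal to receive information, overcomes the obstacle that in some situations the receiving terminal cannot listen to voice information after receiving it, avoids leaking the user's privacy, and ensures the accuracy of the information received by the receiving terminal.
Preferably, in the third embodiment the sending terminal can also receive the auxiliary modification information sent by the server, which lets the user modify the text efficiently and further improves the accuracy and timeliness of the information.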
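The sending-terminal steps S302 to S304 can be sketched as a small state holder. This is an illustration under assumptions: the `OutgoingVoiceMessage` class, its method names, and the payload dictionary are hypothetical, and networking, audio capture, and the user interface are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutgoingVoiceMessage:
    """Hypothetical sender-side state for one recorded voice message."""
    audio: bytes                       # recorded in S301
    recognized_text: Optional[str] = None
    editing: bool = False

    def on_text_from_server(self, text: str) -> None:
        # S302: the server returned the recognized text; display it.
        self.recognized_text = text

    def on_long_press(self) -> None:
        # S303: a correction instruction opens the editing interface.
        if self.recognized_text is None:
            raise RuntimeError("no recognized text to correct yet")
        self.editing = True

    def submit_edit(self, edited_text: str) -> dict:
        # S304: show the edited text and build the payload for the server.
        if not self.editing:
            raise RuntimeError("not in editing state")
        self.recognized_text = edited_text
        self.editing = False
        return {"type": "edited_text", "text": edited_text}

    def on_tap_text(self) -> bytes:
        # S302b: tapping the text plays back the originally recorded audio.
        return self.audio


msg = OutgoingVoiceMessage(audio=b"\x00\x01")
msg.on_text_from_server("I want the red one")
msg.on_long_press()
payload = msg.submit_edit("I want the yellow one")
```

Guarding `on_long_press` and `submit_edit` with state checks reflects the ordering in the text: editing is only possible after the recognized text has arrived.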
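Fourth Embodiment
The fourth embodiment of the present application proposes an instant messaging method based on voice recognition; FIG. 4 is a flowchart of the method. The method of the fourth embodiment is applied to the terminal that receives the information and includes the following steps:
S401: receive voice information sent by a server.
In this step, the sending terminal records voice information and sends it to the server, and the server then sends the voice information to the receiving terminal.
S402: receive text information sent by the server and generated by recognizing the voice information.
In this step, after the server generates text information from the voice information by voice recognition, it sends the text information to the receiving terminal, which receives it.
Note that steps S401 and S402 may be executed simultaneously or one after the other, that is, the receiving terminal may receive the voice information and the generated text information at the same time or at different times; the present application imposes no particular limitation. Preferably, the server converts the voice information to text information and then sends both to the receiving terminal together, and the receiving terminal receives both at the same time.
S403: display and mark the text information.
In this step, the receiving terminal can display the text information in the instant messaging interface. Because the text information was generated by recognizing voice information, it can be marked to distinguish it from text that the sender typed directly, for example with a special background color, a special font, or special characters (such as "语音识别" ("voice recognition") or "ASR").
There are two possible ways to mark the text information. In the first, when the receiving terminal receives voice information together with the corresponding text information, the receiving terminal itself marks the text, so that it differs from text sent by the server that the sending terminal entered directly as text. In the second, the server sends a mark along with the text information, and the mark is displayed with the text on the receiving terminal's display. In the second case, after step S402 the method further includes:
S402a: receive mark information sent by the server.
In this step, the mark information may be, for example, a special background color, a special font, or special characters (such as "语音识别" or "ASR").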
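The two marking options for step S403 can be sketched as one small rendering function. The message dictionary, its `mark` and `voice_id` keys, and the bracketed prefix are assumptions chosen for illustration; a real client would more likely change the background color or font, as the text suggests.

```python
from typing import Optional


def render(message: dict) -> str:
    """Return the line shown in the chat view for one text message.

    Hypothetical message layout:
      text      the text to display
      mark      optional mark supplied by the server (S402a)
      voice_id  optional id of the voice message the text was recognized from
    """
    mark: Optional[str] = message.get("mark")
    if mark is None and message.get("voice_id") is not None:
        # First option: the receiving terminal marks the text itself because
        # it arrived together with the corresponding voice information.
        mark = "ASR"
    prefix = f"[{mark}] " if mark else ""
    return prefix + message["text"]


typed = render({"text": "See you at noon"})
client_marked = render({"text": "I want the yellow one", "voice_id": "v1"})
server_marked = render({"text": "I want the yellow one", "mark": "语音识别"})
```

A server-supplied mark takes precedence here, and text with neither a mark nor an associated voice message is shown unmarked, which is how typed text stays distinguishable.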
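Preferably, after step S403, the method may further include:
S404: on receiving the user's instruction to play the voice information, play the voice information.
In this embodiment, the instruction to play the voice information may be the user tapping the text information: when the user taps the displayed text, the receiving terminal plays the voice information received in step S401 through the earpiece or the loudspeaker.
Preferably, after step S403, the method may further include:
S405: receive edited text information sent by the server, and display the edited text information.
In this step, after the sending terminal has corrected errors in the text information, it sends the corrected text to the server, which sends it to the receiving terminal; the receiving terminal receives the edited text information and displays it. Preferably, the receiving terminal overwrites the pre-edit text information with the edited text information.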
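Steps S404 and S405 on the receiving terminal can be sketched with a per-message store in which edited text overwrites the earlier text and a tap returns the original audio. The `Conversation` class and its method names are hypothetical; audio playback itself is not modeled.

```python
from typing import Dict, Optional


class Conversation:
    """Hypothetical receiver-side store keyed by message id."""

    def __init__(self) -> None:
        self.messages: Dict[str, dict] = {}

    def on_voice(self, msg_id: str, audio: bytes) -> None:
        # S401: voice information arrives (possibly before or after the text).
        self.messages.setdefault(msg_id, {})["audio"] = audio

    def on_text(self, msg_id: str, text: str) -> None:
        # S402: recognized text arrives for the same message.
        self.messages.setdefault(msg_id, {})["text"] = text

    def on_edited_text(self, msg_id: str, text: str) -> None:
        # S405: the edited text overwrites the pre-edit text.
        self.messages.setdefault(msg_id, {})["text"] = text

    def on_tap(self, msg_id: str) -> Optional[bytes]:
        # S404: tapping the text yields the original audio for playback.
        return self.messages.get(msg_id, {}).get("audio")


chat = Conversation()
chat.on_text("m1", "I want the red one")
chat.on_voice("m1", b"\x00\x01")
chat.on_edited_text("m1", "I want the yellow one")
audio_to_play = chat.on_tap("m1")
```

Using `setdefault` lets the voice and text parts arrive in either order, matching the note that S401 and S402 may happen simultaneously or one after the other.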
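In summary, the fourth embodiment of the present application proposes an instant messaging method based on voice recognition in which text information is generated from voice information by recognition and an error correction function is provided. The user of the receiving terminal directly receives the recognized text information and can tell clearly whether a piece of text was sent by the sending terminal directly as text or was generated by voice recognition. The method makes it easier for the receiving terminal to receive information, overcomes the obstacle that in some situations the receiving terminal cannot listen to voice information after receiving it, and avoids leaking the user's privacy.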
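FIG. 5 shows the instant messaging system corresponding to the instant messaging method based on voice recognition of the first embodiment. As shown in FIG. 5, the instant messaging system 500 of this embodiment includes the following modules:
a voice information receiving module 501, configured to receive voice information sent by a sending terminal;
a text information generating module 502, configured to perform voice recognition on the voice information to generate text information;
a first sending module 503, configured to send the voice information to a receiving terminal; and
a second sending module 504, configured to send the text information to the receiving terminal.
FIG. 6 shows the instant messaging system corresponding to the instant messaging method based on voice recognition of the second embodiment. As shown in FIG. 6, in a preferred embodiment, in addition to the voice information receiving module 601, text information generating module 602, first sending module 603, and second sending module 604, the system 600 further includes:
a third sending module 605, configured to send the text information to the sending terminal.
In addition, the system 600 further includes:
an information transceiver module 606, configured to receive the edited text information sent by the sending terminal and send it to the receiving terminal.
In a preferred embodiment, the system further includes:
a first storage module 607, configured to store the text information in a database.
In a preferred embodiment, the system further includes:
a fourth sending module 608, configured to send auxiliary error correction information to the sending terminal; and
an information transceiver module 609, configured to receive the edited text information sent by the sending terminal and send it to the receiving terminal.
In a preferred embodiment, the system further includes:
a text information association module 610, configured to send the edited text information to the database and associate it with the text information before correction.
In a preferred embodiment, the auxiliary error correction information includes a word graph and candidate words for a designated character, word, or sentence of the text information.
In a preferred embodiment, the word graph and candidate words for the designated character, word, or sentence are obtained from the database.
In a preferred embodiment, the first sending module and the second sending module operate simultaneously, sending the voice information and the text information to the receiving terminal at the same time.
FIG. 7 shows the instant messaging system corresponding to the instant messaging method based on voice recognition of the third embodiment. As shown in FIG. 7, the instant messaging system 700 of this embodiment includes the following modules:
a voice information recording and sending module 701, configured to record voice information and send it to a server;
a text information receiving and display module 702, configured to receive text information generated by recognizing the voice information, and to display the text information;
an editing module 703, configured to enter an interface for editing the text information after receiving a correction operation instruction;
a display and sending module 704, configured to display the edited text information and send the edited text information to the server.
In a preferred embodiment, the system further includes:
an auxiliary modification information receiving module 705, configured to receive auxiliary modification information sent by the server.
In a preferred embodiment, the auxiliary error correction information includes a word graph and candidate words for a designated character, word, or sentence of the text information, and the candidate words are displayed in the interface for editing the text information.
In a preferred embodiment, the interface for editing the text information includes an input interface.
In a preferred embodiment, the system further includes:
a voice information playing module 706, configured to play the voice information after receiving an instruction to play the voice information.
In a preferred embodiment, the instruction to play the voice information is generated by the user tapping the text information.
FIG. 8 shows the instant messaging system corresponding to the instant messaging method based on voice recognition of the fourth embodiment. As shown in FIG. 8, the instant messaging system 800 of this embodiment includes the following modules:
a voice information acquiring module 801, configured to receive voice information sent by a server;
a text information acquiring module 802, configured to receive text information sent by the server and generated by recognizing the voice information;
a text information display and marking module 803, configured to display and mark the text information.
In a preferred embodiment, the system further includes:
a mark information acquiring module 804, configured to receive mark information sent by the server.
In a preferred embodiment, the text information acquiring module and the mark information acquiring module operate simultaneously, acquiring the text information and the mark information at the same time.
In a preferred embodiment, the text information display and marking module is configured to display the text information and to mark the text information using the mark information.
In a preferred embodiment, the system further includes:
a voice information playing module 805, configured to play the voice information on receiving the user's instruction to play the voice information.
In a preferred embodiment, the instruction to play the voice information is generated by the user tapping the text information.
In a preferred embodiment, the system further includes:
a receiving and display module 806, configured to receive edited text information sent by the server and display the edited text information.
In a preferred embodiment, the edited text information is displayed by overwriting the pre-edit text information.
Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding parts of the method embodiments.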
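In summary, the instant messaging method and instant messaging system based on voice recognition proposed by the embodiments of the present application have at least the following advantages:
(1) The voice recognition function overcomes the obstacle the receiving terminal faces in obtaining information, which is convenient for the user and avoids the problem of privacy leakage.
(2) The error modification function gives the sending terminal an opportunity to correct errors made by the voice recognition system.
(3) The data collection function gathers real recognition-error data that can be used to improve the performance of the voice recognition system.
(4) The error correction step makes it easy for the sending terminal to correct errors.
(5) The information marking step makes it easy for the receiving terminal to tell whether a received message was typed on a virtual keyboard or came from voice information.
(6) If the message is voice information, the receiving terminal can tap the text information generated by recognizing it and play back the original voice information.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced.
Those skilled in the art should understand that the embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store signals by any method or technology. The signals can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store signals accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The instant messaging method and instant messaging system based on voice recognition provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.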

Claims (32)

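  1. An instant messaging method based on voice recognition, characterized by comprising:
    receiving voice information sent by a sending terminal;
    performing voice recognition on the voice information to generate text information;
    sending the voice information to a receiving terminal; and
    sending the text information to the receiving terminal.
  2. The instant messaging method according to claim 1, characterized in that, after performing voice recognition on the voice information to generate text information, the method further comprises:
    sending the text information to the sending terminal.
  3. The instant messaging method according to claim 2, characterized in that, after sending the text information to the sending terminal, the method further comprises:
    receiving edited text information sent by the sending terminal, and sending it to the receiving terminal.
  4. The instant messaging method according to claim 3, characterized in that, after performing voice recognition on the voice information to generate text information, and before receiving the edited text information sent by the sending terminal and sending it to the receiving terminal, the method further comprises:
    sending auxiliary error correction information to the sending terminal, the auxiliary error correction information comprising a word graph and candidate words for a designated character, word, or sentence of the text information.
  5. The instant messaging method according to claim 2, characterized in that, after performing voice recognition on the voice information to generate text information, the method further comprises:
    storing the text information in a database;
    after performing voice recognition on the voice information to generate text information, the method further comprises:
    sending auxiliary error correction information to the sending terminal;
    receiving edited text information sent by the sending terminal, and sending it to the receiving terminal;
    after receiving the edited text information sent by the sending terminal and sending it to the receiving terminal, the method further comprises:
    sending the edited text information to the database, and associating it with the text information before correction.
  6. The instant messaging method according to claim 5, characterized in that the auxiliary error correction information comprises a word graph and candidate words for a designated character, word, or sentence of the text information, and the word graph and candidate words for the designated character, word, or sentence are obtained from the database.
  7. An instant messaging method based on voice recognition, characterized by comprising:
    recording voice information and sending it to a server;
    receiving text information generated by recognizing the voice information, and displaying the text information;
    after receiving a correction operation instruction, entering an interface for editing the text information;
    displaying the edited text information, and sending the edited text information to the server.
  8. The instant messaging method according to claim 7, characterized in that, after receiving the text information generated by recognizing the voice information and displaying the text information, the method further comprises:
    receiving auxiliary modification information sent by the server, the auxiliary error correction information comprising a word graph and candidate words for a designated character, word, or sentence of the text information, the candidate words being displayed in the interface for editing the text information.
  9. The instant messaging method according to claim 7, characterized in that, after receiving the text information generated by recognizing the voice information and displaying the text information, the method further comprises:
    after receiving an instruction to play the voice information, playing the voice information.
  10. The instant messaging method according to claim 9, characterized in that the instruction to play the voice information is generated by the user tapping the text information.
  11. An instant messaging method based on voice recognition, characterized by comprising:
    receiving voice information sent by a server;
    receiving text information sent by the server and generated by recognizing the voice information;
    displaying and marking the text information.
  12. The instant messaging method according to claim 11, characterized in that the method further comprises:
    receiving mark information sent by the server.
  13. The instant messaging method according to claim 12, characterized in that the step of displaying and marking the text information comprises:
    displaying the text information, and marking the text information using the mark information.
  14. The instant messaging method according to claim 11, characterized in that, after the step of displaying and marking the text information, the method further comprises:
    on receiving the user's instruction to play the voice information, playing the voice information, the instruction to play the voice information being generated by the user tapping the text information.
  15. The instant messaging method according to claim 11, characterized in that, after the step of displaying and marking the text information, the method further comprises:
    receiving edited text information sent by the server, and displaying the edited text information.
  16. The instant messaging method according to claim 15, characterized in that the edited text information is displayed by overwriting the pre-edit text information.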
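  17. An instant messaging system based on voice recognition, characterized by comprising:
    a voice information receiving module, configured to receive voice information sent by a sending terminal;
    a text information generating module, configured to perform voice recognition on the voice information to generate text information;
    a first sending module, configured to send the voice information to a receiving terminal; and
    a second sending module, configured to send the text information to the receiving terminal.
  18. The instant messaging system according to claim 17, characterized in that the system further comprises:
    a third sending module, configured to send the text information to the sending terminal.
  19. The instant messaging system according to claim 18, characterized in that the system further comprises:
    an information transceiver module, configured to receive edited text information sent by the sending terminal and send it to the receiving terminal.
  20. The instant messaging system according to claim 19, characterized in that the system further comprises:
    a fourth sending module, configured to send auxiliary error correction information to the sending terminal, the auxiliary error correction information comprising a word graph and candidate words for a designated character, word, or sentence of the text information.
  21. The instant messaging system according to claim 18, characterized in that the system further comprises:
    a first storage module, configured to store the text information in a database;
    a fourth sending module, configured to send auxiliary error correction information to the sending terminal;
    an information transceiver module, configured to receive edited text information sent by the sending terminal and send it to the receiving terminal;
    a text information association module, configured to send the edited text information to the database and associate it with the text information before correction.
  22. The instant messaging system according to claim 21, characterized in that the auxiliary error correction information comprises a word graph and candidate words for a designated character, word, or sentence of the text information, and the word graph and candidate words for the designated character, word, or sentence are obtained from the database.
  23. An instant messaging system based on voice recognition, characterized by comprising:
    a voice information recording and sending module, configured to record voice information and send it to a server;
    a text information receiving and display module, configured to receive text information generated by recognizing the voice information, and to display the text information;
    an editing module, configured to enter an interface for editing the text information after receiving a correction operation instruction;
    a display and sending module, configured to display the edited text information and send the edited text information to the server.
  24. The instant messaging system according to claim 23, characterized in that the system further comprises:
    an auxiliary modification information receiving module, configured to receive auxiliary modification information sent by the server, the auxiliary error correction information comprising a word graph and candidate words for a designated character, word, or sentence of the text information, the candidate words being displayed in the interface for editing the text information.
  25. The instant messaging system according to claim 23, characterized in that the system further comprises:
    a voice information playing module, configured to play the voice information after receiving an instruction to play the voice information.
  26. The instant messaging system according to claim 25, characterized in that the instruction to play the voice information is generated by the user tapping the text information.
  27. An instant messaging system based on voice recognition, characterized by comprising:
    a voice information acquiring module, configured to receive voice information sent by a server;
    a text information acquiring module, configured to receive text information sent by the server and generated by recognizing the voice information;
    a text information display and marking module, configured to display and mark the text information.
  28. The instant messaging system according to claim 27, characterized in that the system further comprises:
    a mark information acquiring module, configured to receive mark information sent by the server.
  29. The instant messaging system according to claim 28, characterized in that the text information display and marking module is configured to display the text information and to mark the text information using the mark information.
  30. The instant messaging system according to claim 27, characterized in that the system further comprises:
    a voice information playing module, configured to play the voice information on receiving the user's instruction to play the voice information, the instruction to play the voice information being generated by the user tapping the text information.
  31. The instant messaging system according to claim 27, characterized in that the system further comprises:
    a receiving and display module, configured to receive edited text information sent by the server and display the edited text information.
  32. The instant messaging system according to claim 31, characterized in that the edited text information is displayed by overwriting the pre-edit text information.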
PCT/CN2017/071382 2016-01-26 2017-01-17 一种基于语音识别的即时通信方法和即时通信系统 WO2017128991A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610052305.6 2016-01-26
CN201610052305.6A CN106997764B (zh) 2016-01-26 2016-01-26 一种基于语音识别的即时通信方法和即时通信系统

Publications (1)

Publication Number Publication Date
WO2017128991A1 true WO2017128991A1 (zh) 2017-08-03

Family

ID=59397373

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/071382 WO2017128991A1 (zh) 2016-01-26 2017-01-17 一种基于语音识别的即时通信方法和即时通信系统

Country Status (3)

Country Link
CN (1) CN106997764B (zh)
TW (1) TWI774654B (zh)
WO (1) WO2017128991A1 (zh)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107689912B (zh) * 2017-09-15 2020-05-12 珠海格力电器股份有限公司 语音消息发送、播放、传输方法及装置、终端和服务器
CN107888479A (zh) * 2017-10-31 2018-04-06 深圳云之家网络有限公司 语音通信方法、装置、计算机设备及存储介质
CN108109625B (zh) * 2017-12-21 2021-07-20 北京华夏电通科技股份有限公司 手机语音识别内外网传输系统及方法
CN110392158A (zh) * 2018-04-19 2019-10-29 成都野望数码科技有限公司 一种消息处理方法、装置以及终端设备
CN110570865A (zh) * 2018-06-06 2019-12-13 上海擎感智能科技有限公司 一种基于云端服务器的通信方法、系统及云端服务器
CN109087641A (zh) * 2018-08-27 2018-12-25 杭州安恒信息技术股份有限公司 智能音箱、指令录入器及其安全预警方法、装置
CN111147948A (zh) * 2018-11-02 2020-05-12 北京快如科技有限公司 信息处理方法、装置及电子设备
CN109493665A (zh) * 2018-12-28 2019-03-19 南京红松信息技术有限公司 基于语音识别的快速答题方法及其系统
CN109600307A (zh) * 2019-01-29 2019-04-09 北京百度网讯科技有限公司 即时通讯方法、终端、设备、计算机可读介质
CN109801627A (zh) * 2019-01-31 2019-05-24 冯泽 语音类信息处理方法、装置、计算机设备和存储介质
CN109922371B (zh) * 2019-03-11 2021-07-09 海信视像科技股份有限公司 自然语言处理方法、设备及存储介质
CN110943908A (zh) * 2019-11-05 2020-03-31 上海盛付通电子支付服务有限公司 语音消息发送方法、电子设备及介质
CN111698446B (zh) * 2020-05-26 2021-09-21 上海智勘科技有限公司 在实时视频中同时进行文本信息传输的方法
CN112651125A (zh) * 2020-12-22 2021-04-13 郑州捷安高科股份有限公司 仿真列车通信方法、装置、设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102710539A (zh) * 2012-05-02 2012-10-03 中兴通讯股份有限公司 语音信息传送方法及装置
CN102946499A (zh) * 2012-11-14 2013-02-27 广州市讯飞樽鸿信息技术有限公司 可视化语音信箱系统及应用于可视化语音信箱系统的方法
WO2013184048A1 (en) * 2012-06-04 2013-12-12 Telefonaktiebolaget Lm Ericsson (Publ) Method and message server for routing a speech message
CN103632670A (zh) * 2013-11-30 2014-03-12 青岛英特沃克网络科技有限公司 语音和文本消息自动转换系统及其方法
CN104732975A (zh) * 2013-12-20 2015-06-24 华为技术有限公司 一种语音即时通讯方法及装置
CN105430208A (zh) * 2015-10-23 2016-03-23 小米科技有限责任公司 语音会话方法、装置及终端设备

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254238A1 (en) * 2007-10-26 2015-09-10 Facebook, Inc. System and Methods for Maintaining Speech-To-Speech Translation in the Field
CN104007832B (zh) * 2013-02-25 2017-09-01 上海触乐信息科技有限公司 连续滑行输入文本的方法、系统及设备
CN104700836B (zh) * 2013-12-10 2019-01-29 阿里巴巴集团控股有限公司 一种语音识别方法和系统
KR20160008949A (ko) * 2014-07-15 2016-01-25 한국전자통신연구원 음성 대화 기반의 외국어 학습 방법 및 이를 위한 장치
CN104407834A (zh) * 2014-11-13 2015-03-11 腾讯科技(成都)有限公司 信息输入方法和装置
CN105159870B (zh) * 2015-06-26 2018-06-29 徐信 一种精准完成连续自然语音文本化的处理系统及方法
CN105068982A (zh) * 2015-08-26 2015-11-18 百度在线网络技术(北京)有限公司 输入内容的修改方法和装置
CN105245917B (zh) * 2015-09-28 2018-05-04 徐信 一种多媒体语音字幕生成的系统和方法

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112530435A (zh) * 2019-09-19 2021-03-19 比亚迪股份有限公司 数据传输方法、装置、系统、可读存储介质及电子设备
CN112530435B (zh) * 2019-09-19 2024-04-16 比亚迪股份有限公司 数据传输方法、装置、系统、可读存储介质及电子设备
CN113571061A (zh) * 2020-04-28 2021-10-29 阿里巴巴集团控股有限公司 语音转写文本编辑系统、方法、装置及设备
CN115442273A (zh) * 2022-09-14 2022-12-06 润芯微科技(江苏)有限公司 一种基于语音识别的音频传输完整性监控方法和装置
CN115442273B (zh) * 2022-09-14 2023-04-07 润芯微科技(江苏)有限公司 一种基于语音识别的音频传输完整性监控方法和装置

Also Published As

Publication number Publication date
CN106997764B (zh) 2021-07-27
TWI774654B (zh) 2022-08-21
CN106997764A (zh) 2017-08-01
TW201733376A (zh) 2017-09-16

Similar Documents

Publication Publication Date Title
WO2017128991A1 (zh) 一种基于语音识别的即时通信方法和即时通信系统
US11388291B2 (en) System and method for processing voicemail
US8605868B2 (en) System and method for externally mapping an interactive voice response menu
CN103035240B (zh) 用于使用上下文信息的语音识别修复的方法和***
CN205647778U (zh) 一种智能会议系统
US8583093B1 (en) Playing local device information over a telephone connection
US20130144619A1 (en) Enhanced voice conferencing
CN103916513A (zh) 在通信终端记录通话信息的方法和设备
US9661133B2 (en) Electronic device and method for extracting incoming/outgoing information and managing contacts
TW201520794A (zh) 資料遷移方法及裝置
US20200118569A1 (en) Conference sound box and conference recording method, apparatus, system and computer storage medium
CN106847256A (zh) 一种语音转化聊天方法
CN107480146A (zh) 一种识别语种语音的会议纪要快速翻译方法
CN106899486B (zh) 一种消息显示方法及装置
CN106558311B (zh) 语音内容提示方法和装置
CN113055529A (zh) 录音控制方法和录音控制装置
CN111462726A (zh) 一种外呼应答方法、装置、设备及介质
WO2016107001A1 (zh) 一种记录语音通信信息的方法、终端及计算机存储介质
WO2023226726A1 (zh) 语音数据处理方法及装置
CN109147791A (zh) 一种速记系统和方法
WO2022213943A1 (zh) 消息发送方法、消息发送装置、电子设备和存储介质
WO2017071210A1 (zh) 联系人的创建方法及装置
US20160028871A1 (en) Voice mail transcription
CN112911074A (zh) 一种语音通信处理方法、装置、设备和机器可读介质
JP2017216672A (ja) 通話装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17743606

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17743606

Country of ref document: EP

Kind code of ref document: A1