WO2018133798A1 - Data transmission method and apparatus based on voice recognition - Google Patents

Data transmission method and apparatus based on voice recognition

Info

Publication number
WO2018133798A1
Authority
WO
WIPO (PCT)
Prior art keywords
data transmission
transmission channel
voice
recognition result
segment
Prior art date
Application number
PCT/CN2018/073021
Other languages
English (en)
French (fr)
Inventor
林剑城
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Publication of WO2018133798A1


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/04 - Segmentation; Word boundary detection
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M 1/00 - Substation equipment, e.g. for use by subscribers
    • H04M 1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M 1/7243 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M 1/72433 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M 1/00 - Substation equipment, e.g. for use by subscribers
    • H04M 1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72469 - User interfaces specially adapted for cordless or mobile telephones for operating the device by selecting functions from two or more displayed items, e.g. menus or icons
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M 1/00 - Substation equipment, e.g. for use by subscribers
    • H04M 1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/725 - Cordless telephones
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data transmission method and apparatus based on voice recognition.
  • the embodiment of the present application provides a data transmission method and apparatus based on voice recognition, which can improve the efficiency of online voice recognition.
  • a data transmission method based on speech recognition comprising:
  • a speech recognition based data transmission apparatus comprising: a processor and a memory, the memory storing computer readable instructions, the computer readable instructions being executed by the processor to:
  • a voice recognition-based data transmission method for a terminal comprising a processor and a memory, the method comprising:
  • a non-volatile storage medium having stored therein computer readable instructions executable by a processor to perform:
  • FIG. 1 is an application environment diagram of a data transmission method based on voice recognition in an embodiment
  • FIG. 2 is a schematic diagram showing the internal structure of a terminal for implementing a voice recognition based data transmission method in an embodiment
  • FIG. 3 is a schematic flow chart of a data transmission method based on voice recognition in an embodiment
  • FIG. 4 is a flow chart showing the steps of entering a voice input state in an embodiment
  • FIG. 5 is a schematic diagram of an interface when a voice input interface is not turned on in an embodiment
  • FIG. 6 is a schematic diagram of an interface after a voice input interface is opened in an embodiment
  • FIG. 7 is a schematic diagram of an interface when a voice input interface is opened in another embodiment
  • FIG. 8 is a flow chart showing the steps of receiving a voice recognition result matched with a transmitted voice segment through a data transmission channel in an embodiment
  • FIG. 9 is a schematic flow chart showing steps of establishing and maintaining a data transmission channel in an embodiment
  • FIG. 10 is a schematic flow chart of a data transmission method based on voice recognition in another embodiment
  • FIG. 11 is a timing diagram of a data transmission method based on voice recognition in an embodiment
  • FIG. 12 is a structural block diagram of a data transmission apparatus based on voice recognition in an embodiment
  • FIG. 13 is a structural block diagram of a data transmission apparatus based on speech recognition in another embodiment.
  • The traditional online voice recognition method requires waiting for a period of time for each recognition, so voice recognition efficiency is low; the problem is especially pronounced for mobile terminals that communicate over a mobile network.
  • FIG. 1 is an application environment diagram of a data transmission method based on voice recognition in an embodiment.
  • the speech recognition based data transmission method is applied to a speech recognition based data transmission system.
  • the voice recognition based data transmission system includes a terminal 110 and a server 120, and the terminal 110 is connected to the server 120 through a network.
  • the terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like.
  • the server 120 may be an independent physical server or a physical server cluster.
  • FIG. 2 is a schematic diagram showing the internal structure of a terminal in an embodiment.
  • The terminal includes a processor, a non-volatile storage medium, an internal memory, a network interface, a sound collecting device, a display screen, and an input device, all connected through a system bus.
  • the non-volatile storage medium of the terminal stores an operating system, and further includes a data transmission device based on voice recognition, and the data transmission device based on voice recognition is used to implement a data transmission method based on voice recognition.
  • the processor is used to provide computing and control capabilities to support the operation of the entire terminal.
  • The internal memory in the terminal provides an environment for the operation of the voice recognition based data transmission device in the non-volatile storage medium, and the internal memory can store computer readable instructions that, when executed by the processor, cause the processor to perform a speech recognition based data transmission method.
  • the network interface is used for network communication with the server, such as sending a voice segment to the server, receiving a voice recognition result returned by the server, and the like.
  • the display screen of the terminal may be a liquid crystal display or an electronic ink display screen.
  • The input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the terminal housing, or an external keyboard, trackpad, or mouse.
  • the terminal can be a mobile phone, a tablet or a personal digital assistant or a wearable device.
  • the structure shown in FIG. 2 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the terminal to which the solution of the present application is applied.
  • The specific terminal may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • A data transmission method based on voice recognition is provided. This embodiment is illustrated as applied to the terminal 110 in FIG. 1. The method specifically includes the following steps:
  • the voice input state refers to a state in which voice data is input.
  • the data transmission channel refers to the channel used for data transmission.
  • The terminal can run a client that supports voice input. When detecting that the client enters the voice input state, the terminal can establish a data transmission channel and maintain the established data transmission channel to transmit voice data subsequently entered in the voice input state.
  • the terminal can detect an instruction to enter a voice input state, according to which the voice input state is entered.
  • the terminal may detect a pre-defined triggering operation for triggering an instruction to enter the voice input state, and trigger the corresponding instruction to enter the voice input state when the triggering operation is detected.
  • the triggering operation may be an operation on a control in the interface of the terminal, such as a touch operation on the control or a click operation of the cursor.
  • the triggering operation can also be a click on a predefined physical button, or a swaying operation on a predefined interface that can trigger an instruction to enter a voice input state, and the like.
  • the terminal may also detect a predefined interface state change for triggering the incoming voice input state, and enter a voice input state upon detecting the interface state change.
  • The predefined interface state change may be the change that occurs when, as the client running on the terminal starts, the terminal interface changes from the desktop to the client's main interface; or the change that occurs when, based on a user operation after the client is running, the terminal interface changes from the client's main interface to an interface on which voice input can be performed.
  • After detecting that the voice input state is currently entered, the terminal sends a request for establishing a data transmission channel to the server; after receiving a response message for the request fed back by the server, the terminal establishes a data transmission channel with the server and maintains the data transmission channel.
  • The terminal may establish a TCP (Transmission Control Protocol) based data transmission channel with the server. Specifically, after detecting that the voice input state is entered, the terminal sends a connection request message carrying a SYN (synchronize) message to the server. After receiving the connection request message, the server in the listening state feeds back an ACK (acknowledgement) response message to the terminal to confirm the connection request, and changes its current state from the listening state to the response state. After receiving the response message fed back by the server, the terminal updates its current state to the connection-established state and feeds back an ACK response message to the server to confirm the connection, causing the server to change its current state from the response state to the connection-established state.
  • After the data transmission channel is established, the terminal can perform data transmission through it, and maintain it through a heartbeat mechanism during idle phases in which no data is transmitted, until the terminal actively closes the data transmission channel.
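A minimal Python sketch of establishing such a channel. The host and port values are placeholders and the OS-level TCP keepalive option is an assumption; the embodiment specifies only a TCP connection that is kept open until the terminal actively closes it (the three-way handshake itself is carried out by the operating system inside the connect call):

```python
import socket

def open_channel(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP-based data transmission channel to the recognition server.

    The SYN / SYN+ACK / ACK exchange described above happens inside
    create_connection(); the caller only sees the established channel.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    # Keep the connection alive during idle phases; the application-level
    # heartbeat mechanism described in this document supplements this.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock
```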
  • the voice segment refers to the voice data segmented according to a certain manner.
  • The voice segment may be voice data manually input by the user during voice input; the terminal may acquire the voice data input by the user on each voice input and treat the voice data input each time as one voice segment.
  • the terminal may call a local sound collection device to collect sound when detecting the user performing voice input, and form voice data.
  • the speech segment may be speech data of a predetermined duration.
  • the preset duration is a preset time interval for performing voice data interception, such as 200 milliseconds.
  • The terminal may start a timer when detecting that the user begins voice input. Each time the timed duration reaches the preset duration, the terminal intercepts the currently input voice data as a voice segment and restarts the timer, repeating this operation until the user ends the voice input.
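The fixed-interval interception above can be sketched as follows. The 16 kHz / 16-bit mono PCM format is an assumption; 200 milliseconds is the example preset duration given in this embodiment:

```python
def split_into_segments(pcm: bytes, sample_rate: int = 16000,
                        sample_width: int = 2, segment_ms: int = 200) -> list:
    """Split raw mono PCM audio into voice segments of a preset duration."""
    # Bytes per segment: samples/sec * bytes/sample * segment length in seconds
    seg_bytes = sample_rate * sample_width * segment_ms // 1000
    return [pcm[i:i + seg_bytes] for i in range(0, len(pcm), seg_bytes)]
```

For example, one second of such audio yields five 6400-byte segments.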
  • the terminal may sequentially send the obtained voice segments to the server through the data transmission channel in the order of acquisition.
  • After receiving the voice segment sent by the terminal, the server performs voice recognition according to the received voice segment, obtains a voice recognition result that matches the received voice segment, and sends the voice recognition result to the terminal through the data transmission channel.
  • the terminal can run a client that supports voice input, and the terminal can close the data transmission channel when detecting that the client exits the voice input state.
  • the terminal can detect an instruction to exit the voice input state and exit the voice input state in accordance with the instruction.
  • the terminal may detect a pre-defined triggering operation for triggering an instruction to exit the voice input state, and trigger a corresponding instruction to exit the voice input state when the triggering operation is detected.
  • the terminal may also detect a predefined interface state change for triggering the exit voice input state, and exit the voice input state when the interface state change is detected.
  • The predefined interface state change may be the change that occurs when, as the client running on the terminal is closed, the terminal interface changes from the client's main interface to the desktop; or the change that occurs when, based on a user operation while the client is running, the terminal interface changes from the interface on which voice input can be performed back to the client's main interface.
  • After detecting that the voice input state is currently exited, the terminal sends a connection close message carrying a FIN (finish) message to the server. After receiving the connection close message, the server in the connection-established state feeds back to the terminal a response message carrying an ACK (acknowledgement) to confirm that the terminal has stopped transmitting data to the server.
  • After feeding back the response message to the terminal, and after the voice recognition results obtained from the voice segments sent by the terminal have all been sent, the server sends a connection close message carrying a FIN message to the terminal to notify the terminal that the server has finished sending the data to be sent to the terminal.
  • After receiving the connection close message sent by the server, the terminal updates its current state to the connection-closed state and feeds back a response message carrying an ACK to the server to confirm that the connection is closed, so that the server also updates its current state to the connection-closed state.
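Assuming the channel is a plain TCP socket, the close sequence above maps onto a half-close: shutting down the write side sends the terminal's FIN, after which the terminal drains any recognition results the server still has to send until the server's own FIN arrives. A hedged sketch:

```python
import socket

def close_channel(sock: socket.socket) -> bytes:
    """Close the data transmission channel in the order described above."""
    sock.shutdown(socket.SHUT_WR)   # terminal -> server: FIN (no more segments)
    leftover = b""
    while True:
        chunk = sock.recv(4096)     # server may still be sending results
        if not chunk:               # empty read: the server's FIN arrived
            break
        leftover += chunk
    sock.close()                    # final ACK is handled by the OS
    return leftover
```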
  • When the terminal performs step S304, S306, or S308, if it detects that the voice input state is currently exited, step S310 may be performed.
  • The above voice recognition-based data transmission method establishes a data transmission channel when entering the voice input state, so that a voice segment can be transmitted immediately after it is input, improving data transmission efficiency and thereby the efficiency of voice recognition.
  • each of the voice segments acquired in sequence and the voice recognition result matched with the transmitted voice segment can be transmitted on the data transmission channel, and the data transmission channel is not closed until the voice input state is exited.
  • the step of entering a voice input state in the voice recognition based data transmission method includes:
  • the voice input interface is a window for performing voice input in the terminal main interface.
  • the voice input interface has two states: a collapsed state and an expanded state.
  • the open entry of the voice input interface is an operation entry for changing the state of the voice input interface.
  • The terminal detects a triggering operation acting on the open entry and updates the current state of the voice input interface: if the voice input interface is currently hidden in the collapsed state, the voice input interface is opened; if the voice input interface is currently in the expanded state, the voice input interface is closed and hidden.
  • The voice input interface in the main interface of the terminal is usually in the collapsed state, with the open entry of the voice input interface displayed.
  • the open command refers to an instruction for triggering the opening of the voice input interface.
  • The terminal can obtain an opening instruction for the voice input interface triggered by the user acting on the open entry. Specifically, the terminal may detect a predefined triggering operation on the open entry, and trigger the corresponding opening instruction when the triggering operation is detected.
  • The triggering operation is an operation acting on the open entry, such as a touch operation or a cursor click operation on the open entry.
  • the terminal displays the voice input interface according to the opening instruction.
  • It can be assumed that the user intends to perform voice input when expanding the voice input interface, so the terminal enters the voice input state when the voice input interface is displayed. A data transmission channel is thus established as soon as it is determined that the user intends to perform voice input, and a voice segment can be transmitted immediately after it is input, improving data transmission efficiency and thereby the efficiency of voice recognition.
  • the step of exiting the voice input state in the voice recognition based data transmission method comprises: acquiring an interface hiding instruction for the voice input interface; hiding the voice input interface according to the interface hiding instruction.
  • the terminal may detect a predefined triggering operation for triggering the interface hiding instruction, and trigger the interface hiding instruction when the triggering operation is detected.
  • The triggering operation may be, for example, a touch operation or a cursor click operation.
  • the triggering operation may also be a click on a predefined physical button, or an operation on other areas outside the voice input interface in the main interface of the terminal.
  • It can be assumed that the user intends to end voice input when triggering hiding of the voice input interface, so the terminal exits the voice input state when the voice input interface is hidden, and closes the data transmission channel after determining that the user intends to end voice input. The data transmission channel is thus maintained while the user can perform voice input, and data can be transmitted through it whenever needed, improving data transmission efficiency and thereby the efficiency of voice recognition.
  • FIG. 5 is an interface diagram when the voice input interface is not turned on in an embodiment, and the interface includes an open entry 510 of the voice input interface.
  • FIG. 6 is a schematic diagram of an interface after the voice input interface is opened in an embodiment.
  • When the user clicks the open entry 510 of the voice input interface in the interface shown in FIG. 5, the terminal displays the voice input interface 620 in the interface shown in FIG. 6.
  • When the user clicks the voice input control 621, the terminal acquires the voice segment input by the user.
  • the terminal display interface will change to the interface as shown in FIG.
  • The voice recognition based data transmission method further comprises: outputting a voice recognition result on the voice input interface; canceling the output voice recognition result when a cancel operation for the output voice recognition result is detected; and performing a text entry operation based on the output voice recognition result when a confirmation input operation for the output voice recognition result is detected.
  • the cancel operation refers to an operation set in advance for canceling the voice recognition result of the current output.
  • the confirmation input operation refers to an operation set in advance for confirming the voice recognition result of the current output.
  • the terminal may display the voice recognition result in a text form in a predefined area in the voice input interface. The user can perform corresponding operations according to whether the displayed voice recognition result conforms to the content expressed by the user, so that the terminal performs different responses.
  • The terminal may detect the user's operation on the voice recognition result. When the detected operation matches the preset cancel operation, it is determined that the user intends to cancel the currently output voice recognition result, and the terminal may cancel the output voice recognition result. When the detected operation matches the preset confirmation input operation, it is determined that the user intends to confirm the currently output voice recognition result, and the terminal may perform a text entry operation according to the output voice recognition result.
  • the speech recognition based data transmission method may be specifically applied to a session scenario of a client supporting a session.
  • the terminal may establish a session connection with the session server, and send the voice recognition result of the text entry operation to the session server through the session connection, so that the session server responds according to the session message sent by the terminal with the voice recognition result as the content.
  • The speech recognition result is output in the speech input interface, and different user operations on the output result are detected so as to perform different responses, which improves the accuracy of speech recognition.
  • FIG. 7 is a schematic diagram of an interface when the voice input interface is opened in another embodiment, where the interface diagram includes a voice input control 710 and a voice recognition result display area 720 .
  • The terminal acquires the voice segment input by the user, sends the acquired voice segment to the server, receives the voice recognition result fed back by the server that matches the transmitted voice segment, and displays the voice recognition result in the speech recognition result display area 720.
  • The terminal may cancel the output speech recognition result when the cancel operation is detected; the cancel operation may be, for example, pressing the voice input control 710 and sliding upward.
  • The terminal may send the output speech recognition result to the session server upon detecting the confirmation input operation; the confirmation input operation may be, for example, a click operation on the voice input control 710.
  • Step S306 specifically includes: transmitting, through the data transmission channel, a voice segment to the server to which the data transmission channel is connected, so that after receiving the transmitted voice segment, the server performs speech recognition according to the multiple received voice segments and obtains a speech recognition result that matches the transmitted voice segment.
  • the server may perform voice recognition according to the voice segment sent by the terminal. After receiving the voice segment, the server may perform voice recognition on the voice segment based on the voice recognition result of the plurality of voice segments that have completed the voice recognition, and obtain a voice recognition result that matches the voice segment received by the server. The server may also perform voice recognition on the voice segment after receiving the voice segment, and combine the received plurality of voice segments to obtain a voice recognition result that matches the voice segment received by the server.
  • The server performs voice recognition according to the plurality of received voice segments and obtains a voice recognition result that matches the transmitted voice segments; combining the context of preceding and following voice segments in this way makes the voice recognition result more accurate.
  • Step S308 includes: while the voice segment is being sent through the data transmission channel, receiving in parallel, through the data transmission channel, the voice recognition result fed back by the server.
  • the terminal sends the voice segment to the server through the data transmission channel, and the server sends the voice recognition result to the terminal through the data transmission channel, which can be performed asynchronously.
  • the server may perform voice recognition on the received voice segment every time after receiving the voice segment sent by the terminal, and when the voice recognition result is obtained, the obtained voice recognition result may be immediately sent to the terminal through the data transmission channel.
  • Whenever the server obtains a voice recognition result, it can send the obtained result to the terminal without waiting for the terminal to finish transmitting all the voice segments to be sent, thereby improving data transmission efficiency and further improving voice recognition efficiency.
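The parallel behaviour can be sketched with two threads sharing one socket: one thread writes voice segments while the other reads whatever recognition results the server has already produced. Segment framing and encryption are omitted, and the function name is illustrative:

```python
import socket
import threading

def stream_segments(sock: socket.socket, segments: list, results: list) -> None:
    """Send voice segments and receive recognition results in parallel."""
    def sender():
        for seg in segments:
            sock.sendall(seg)
        sock.shutdown(socket.SHUT_WR)   # signal that no more segments follow

    t = threading.Thread(target=sender)
    t.start()
    while True:                         # runs concurrently with sender()
        chunk = sock.recv(4096)
        if not chunk:                   # server closed after its last result
            break
        results.append(chunk)
    t.join()
```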
  • Step S308 in the voice recognition based data transmission method specifically includes the following steps:
  • S802. Receive, by using a data transmission channel, a data packet encapsulated according to an application layer protocol.
  • the data packet encapsulated according to the application layer protocol refers to a data packet obtained by the server to encapsulate the data to be transmitted according to the data packet format specified by the application layer protocol.
  • the server may encrypt the obtained voice recognition result according to a preset encryption manner.
  • The server then creates a data packet according to an application-layer binary protocol, fills in the packet header according to the protocol standard, adds the encrypted voice recognition result to the packet body, completes the packet encapsulation, and sends the encapsulated data packet to the terminal through the data transmission channel.
  • the voice recognition result that needs to be transmitted is encrypted and then transmitted, thereby improving the security of the voice recognition result transmission.
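The patent does not publish the actual header layout or cipher, so the sketch below substitutes a hypothetical 6-byte header (magic byte, version byte, 4-byte big-endian body length) and a single-byte XOR in place of the unspecified encryption scheme:

```python
import struct

MAGIC = 0x56  # hypothetical protocol identifier (not from the patent)

def encapsulate(result: bytes, key: int = 0x5A) -> bytes:
    """Encapsulate an encrypted recognition result as header + packet body."""
    body = bytes(b ^ key for b in result)         # stand-in "encryption"
    header = struct.pack(">BBI", MAGIC, 1, len(body))
    return header + body
```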
  • the step of establishing a data transmission channel and maintaining in the voice recognition based data transmission method includes:
  • the idle state refers to a state in which data transmission is not performed.
  • The terminal may start a timer when establishing the data transmission channel and periodically detect whether there is data transmission on the channel. When detecting that data is currently being transmitted through the channel, the terminal determines that the channel is being maintained and waits for the next detection time point; when detecting that no data is currently being transmitted through the channel, the terminal determines that the channel is in an idle state.
  • S906 Send a heartbeat packet through the data transmission channel when detecting that the data transmission channel is in an idle state.
  • the heartbeat data packet refers to a custom data packet in which the terminal notifies the server of the state of the terminal.
  • When the terminal determines that the data transmission channel is in an idle state, it cannot tell whether the channel is still being maintained, so it can send a heartbeat data packet to the server through the channel to notify the server that the terminal needs to maintain the data transmission channel with it.
  • the preset duration is a time that the terminal presets to receive the response packet for the heartbeat packet.
  • A response packet is a custom data packet by which the server notifies the terminal of the server's state. Receiving a response packet for the heartbeat packet through the data transmission channel within the preset duration indicates that the data transmission channel is maintained. If no response packet for the heartbeat packet is received through the data transmission channel within the preset duration, the data transmission channel is abnormal; the terminal closes the abnormal channel, then re-establishes and maintains a data transmission channel.
  • The heartbeat mechanism ensures that a data transmission channel normally available for data transmission is maintained until the channel is closed according to the user's intention, so data can be transmitted immediately when needed, improving data transmission efficiency and further improving speech recognition efficiency.
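A sketch of that heartbeat probe, assuming the channel is a plain TCP socket; the `b"PING"`/`b"PONG"` payloads are made up, since the heartbeat and response packets are custom formats not given in the patent:

```python
import socket

def heartbeat_ok(sock: socket.socket, timeout: float = 3.0) -> bool:
    """Probe an idle channel; False means the caller should rebuild it."""
    try:
        sock.sendall(b"PING")           # heartbeat packet (assumed payload)
        sock.settimeout(timeout)        # preset duration for the response
        return sock.recv(4) == b"PONG"  # response packet (assumed payload)
    except (socket.timeout, OSError):
        return False                    # channel abnormal: close and re-establish
```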
  • In one embodiment, a voice-recognition-based data transmission method is provided for a terminal; the method specifically includes the following steps:
  • S1002: Display an open entry of the voice input interface; obtain an open command on the entry; and display the voice input interface according to the open command.
  • S1004: Establish a data transmission channel.
  • S1006: Periodically check whether the data transmission channel is idle; if so, go to step S1008; if not, repeat step S1006.
  • S1008: Send a heartbeat packet through the data transmission channel.
  • S1010: Detect whether a response packet to the heartbeat packet is received through the data transmission channel within the preset duration; if so, continue with step S1006; if not, go to step S1012.
  • S1012: Close the current data transmission channel, re-establish the data transmission channel, and jump to step S1006.
  • S1014: Sequentially acquire the input voice segments.
  • S1016: When preparing to send a voice segment, detect whether the data transmission channel is abnormal; if so, perform step S1018; if not, perform step S1020.
  • In this embodiment, when the terminal has acquired an input voice segment and is ready to send it, it may first detect whether the current data transmission channel is abnormal. Specifically, the terminal may invoke an operating-system interface to detect the current network state. When the network state is normal, the terminal determines that the channel is normal and maintained, and sends the voice segment through it; when the network state is abnormal, the terminal determines that the channel is abnormal, closes the abnormal channel, re-establishes a data transmission channel, and sends the voice segment through the re-established channel.
  • S1018: Close the abnormal data transmission channel, re-establish the data transmission channel, and perform step S1020.
  • S1020: Sequentially send the voice segments through the data transmission channel to the server the channel is connected to, so that after receiving each sent voice segment the server performs voice recognition according to the multiple voice segments received so far and obtains a speech recognition result matching the sent segments.
  • S1022: While sending voice segments through the data transmission channel, receive in parallel, through the same channel, the data packets fed back by the server and encapsulated according to the application-layer protocol.
  • S1024: Parse each data packet to obtain the encrypted speech recognition result encapsulated in it; decrypt the encrypted result to obtain the speech recognition result matching the sent voice segments.
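The packet handling of steps S1022–S1024 can be sketched as follows. The 4-byte length-prefix framing and the symmetric XOR "cipher" are placeholders: the patent specifies a binary application-layer protocol with an adjusted header and an encrypted body, but not the concrete framing or encryption algorithm.

```python
import struct

KEY = 0x5A  # placeholder key; the patent does not specify the cipher

def xor_crypt(data: bytes) -> bytes:
    """Symmetric placeholder for the unspecified encrypt/decrypt step."""
    return bytes(b ^ KEY for b in data)

def pack_result(text: str) -> bytes:
    """Server side: encrypt the recognition result, then frame it.

    Header = 4-byte big-endian body length; body = encrypted UTF-8 text.
    """
    body = xor_crypt(text.encode("utf-8"))
    return struct.pack(">I", len(body)) + body

def parse_packet(packet: bytes) -> str:
    """Terminal side (S1024): parse the frame, then decrypt the body."""
    (length,) = struct.unpack(">I", packet[:4])
    body = packet[4:4 + length]
    return xor_crypt(body).decode("utf-8")
```

A production implementation would replace `xor_crypt` with a real cipher agreed on by terminal and server; only the frame-then-decrypt order matters to the step above.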
  • S1026: Output the speech recognition result on the voice input interface.
  • S1028: Determine whether the operation on the output speech recognition result is a cancel operation or a confirm-input operation; for a cancel operation, perform step S1030; for a confirm-input operation, perform step S1032.
  • S1030: Withdraw the output speech recognition result.
  • S1032: Perform a text entry operation according to the output speech recognition result.
  • S1034: Obtain an interface-hiding instruction for the voice input interface, and hide the voice input interface according to the instruction.
  • S1036: Close the data transmission channel.
  • In this embodiment, a method for handling an abnormal data transmission channel is provided, which ensures that the channel can be properly maintained when the terminal needs to transmit data, improving data transmission efficiency and, further, voice recognition efficiency.
  • In one embodiment, each time the terminal sends a voice segment through the data transmission channel, and/or each time it receives a speech recognition result through the channel, the method detects whether the channel is abnormal. When the channel is abnormal, the channel is closed and then re-established and maintained; the voice segment to be sent and/or the speech recognition result to be received is then transmitted through the re-established channel.
  • Specifically, while transmitting data through the data transmission channel, the terminal may watch for an error message fed back through the channel. When an error message is detected, the channel is judged abnormal; the terminal closes the abnormal channel, re-establishes a data transmission channel, and transmits the data through the re-established channel.
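A sketch of this close-and-retry behavior, with a toy channel object standing in for the real transport. All names are illustrative, and `ConnectionError` stands in for the fed-back error message, whose form the patent does not specify:

```python
class FlakyChannel:
    """Toy channel that fails every send while `broken` is True."""
    def __init__(self, broken=False):
        self.broken = broken
        self.sent = []

    def send(self, data):
        if self.broken:
            raise ConnectionError("error message fed back through the channel")
        self.sent.append(data)

def send_with_recovery(channel, data, reconnect, attempts=2):
    """Send data; if the channel reports an error, rebuild it and retry."""
    for _ in range(attempts):
        try:
            channel.send(data)
            return channel              # sent through a healthy channel
        except ConnectionError:         # stand-in for the fed-back error message
            channel = reconnect()       # close the abnormal channel, re-establish
    raise ConnectionError("data transmission channel could not be re-established")
```

The same wrapper shape applies to the receive direction: catch the error, rebuild the channel, and continue receiving the result that was due.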
  • Figure 11 is a timing diagram of a data transmission method based on speech recognition in one embodiment.
  • Referring to Figure 11, after the user taps the open entry shown on the terminal interface, the terminal enters the voice input interface, initiates a request to the server to establish a data transmission channel, and prepares to output speech recognition results.
  • After listening for and receiving the terminal's request, the server accepts the request to establish a data transmission channel, establishes the channel with the terminal, and maintains it.
  • The terminal sequentially acquires the voice segments input by the user and, immediately after acquiring each segment, can send it to the server through the data transmission channel.
  • Upon receiving a voice segment, the server can immediately perform voice recognition, encrypt the resulting speech recognition result, and send it asynchronously to the terminal through the data transmission channel.
  • The terminal decrypts the encrypted speech recognition result sent by the server and displays the decrypted result.
  • While the data transmission channel is maintained, the terminal's sending of voice segments to the server and the server's sending of speech recognition results to the terminal can proceed in parallel.
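A minimal sketch of this parallel behavior over a local socket pair, with a toy recognizer standing in for the server. The fixed segment size and the `b"R:"` result format are demo assumptions, not from the patent:

```python
import socket
import threading

SEG_LEN = 4                              # fixed demo segment size (not from the patent)

def recv_exact(conn, n):
    """Read exactly n bytes from a stream socket (or fewer at EOF)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

def toy_recognizer(conn):
    """Stand-in server: feeds a result back for every segment it receives."""
    while True:
        seg = recv_exact(conn, SEG_LEN)
        if len(seg) < SEG_LEN:
            break
        conn.sendall(b"R:" + seg)        # result is sent back asynchronously
    conn.close()

def stream_segments(conn, segments):
    """Send all segments while a parallel thread collects the results."""
    results = []

    def receiver():
        for _ in segments:               # one result expected per segment
            results.append(recv_exact(conn, SEG_LEN + 2))

    t = threading.Thread(target=receiver)
    t.start()
    for seg in segments:                 # sending never waits for a result
        conn.sendall(seg)
    t.join()
    return results
```

The key property mirrored here is that `stream_segments` never blocks a send waiting for the previous result; results arrive on the same channel in a parallel thread.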
  • When the terminal hides the voice input interface, voice input ends; the terminal initiates a request to the server to close the data transmission channel and closes the channel after the server accepts the close request.
  • As shown in Figure 12, in one embodiment a voice-recognition-based data transmission apparatus is provided, including: a channel establishment module 1201, an acquisition module 1202, a sending module 1203, a receiving module 1204, and a channel closing module 1205.
  • the channel establishing module 1201 is configured to establish and maintain a data transmission channel when entering a voice input state.
  • the obtaining module 1202 is configured to sequentially acquire the input voice segment.
  • the sending module 1203 is configured to sequentially send the voice segment through the data transmission channel.
  • the receiving module 1204 is configured to receive, by using a data transmission channel, a voice recognition result that matches the transmitted voice segment.
  • the channel closing module 1205 is configured to close the data transmission channel when exiting the voice input state.
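The five modules above can be sketched as a small class with an injected channel factory. This is a minimal sketch: the method names and the channel API (`send`/`recv`) are illustrative, since the patent defines the modules functionally, not as code.

```python
class FakeChannel:
    """Toy channel used only to exercise the skeleton below."""
    def __init__(self):
        self.sent = []
    def send(self, segment):
        self.sent.append(segment)
    def recv(self):
        return "recognized text"

class VoiceDataTransmitter:
    """Skeleton mirroring modules 1201-1205 of the apparatus."""
    def __init__(self, open_channel, close_channel):
        self._open, self._close = open_channel, close_channel
        self.channel = None

    def enter_voice_input_state(self):       # channel establishment module 1201
        if self.channel is None:             # establish once, then keep it held
            self.channel = self._open()

    def send_segment(self, segment):         # acquisition 1202 + sending module 1203
        self.channel.send(segment)           # segments are sent one by one, in order

    def receive_result(self):                # receiving module 1204
        return self.channel.recv()

    def exit_voice_input_state(self):        # channel closing module 1205
        if self.channel is not None:
            self._close(self.channel)
            self.channel = None
```

Note that the channel is opened once on entering the voice input state and closed only on exit, matching the apparatus's central claim.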
  • The above voice-recognition-based data transmission apparatus establishes a data transmission channel upon entering the voice input state, so each voice segment can be transmitted immediately after it is input, improving data transmission efficiency and thus the efficiency of voice recognition.
  • After the channel is established, each sequentially acquired voice segment, as well as the speech recognition results matching the sent segments, can be transmitted over that channel, and the channel is not closed until the voice input state is exited. No new channel needs to be established for each transmission, which largely avoids the extra time cost of frequently establishing and closing channels, improves data transmission efficiency, and further improves speech recognition efficiency.
  • the channel establishment module 1201 is further configured to display an open entry of the voice input interface; obtain an open command for opening the entry; and display the voice input interface according to the open command.
  • In this embodiment, based on human-factors reasoning, the user is deemed to intend voice input when expanding the voice input interface, and the voice input state is entered when the interface is displayed, so that the data transmission channel is established as soon as the user's intent to perform voice input is determined. A voice segment can then be transmitted immediately after it is input, improving data transmission efficiency and thus the efficiency of voice recognition.
  • The channel closing module 1205 is further configured to obtain an interface-hiding instruction for the voice input interface, and to hide the voice input interface according to that instruction.
  • In this embodiment, the user is deemed to intend to end voice input when triggering hiding of the voice input interface, and the voice input state is exited when the interface is hidden, so that the data transmission channel is closed only after the user's intent to end voice input is determined. The channel thus remains maintained while the user may still perform voice input, and data can be transmitted through it as soon as there is data to transmit, improving data transmission efficiency and thus the efficiency of voice recognition.
  • The sending module 1203 is further configured to send the voice segments sequentially through the data transmission channel to the server the channel is connected to, so that after receiving each sent voice segment the server performs voice recognition according to the multiple voice segments received so far, obtaining a speech recognition result that matches the sent segments.
  • In this embodiment, the server performing the recognition works from the multiple voice segments already received to obtain a result matching the sent segments; recognizing speech with the context of the preceding and following segments in this way makes the recognition result more accurate.
  • The receiving module 1204 is further configured to receive, in parallel through the data transmission channel while voice segments are being sent, the speech recognition results fed back by the server that match the sent segments.
  • In this embodiment, as soon as the server obtains a speech recognition result it can send it to the terminal, without waiting for the terminal to finish transmitting the voice segments to be sent, which improves data transmission efficiency and further improves speech recognition efficiency.
  • The receiving module 1204 is further configured to receive, through the data transmission channel, data packets encapsulated according to an application-layer protocol; to parse each packet and obtain the encrypted speech recognition result encapsulated in it; and to decrypt the encrypted result, obtaining the speech recognition result that matches the sent voice segments.
  • the voice recognition result that needs to be transmitted is encrypted and then transmitted, thereby improving the security of the voice recognition result transmission.
  • the channel establishing module 1201 is further configured to establish a data transmission channel; periodically detecting whether the data transmission channel is in an idle state; and when detecting that the data transmission channel is in an idle state, sending a heartbeat data packet through the data transmission channel; If the response packet for the heartbeat packet transmitted through the data transmission channel is not received within the preset duration, the data transmission channel is closed, and the data transmission channel is re-established and maintained.
  • The heartbeat mechanism ensures that a data transmission channel available for normal data transmission is maintained until the channel is closed according to the user's intention, so data can be transmitted immediately when needed, which improves data transmission efficiency and, further, speech recognition efficiency.
  • FIG. 13 is a structural block diagram of a voice recognition based data transmission apparatus 1200 in another embodiment.
  • the voice recognition based data transmission apparatus 1200 further includes an output module 1206.
  • The output module 1206 is configured to output the speech recognition result on the voice input interface; to withdraw the output result when a cancel operation on it is detected; and to perform a text entry operation according to the output result when a confirm-input operation on it is detected.
  • In this embodiment, the speech recognition result is output on the voice input interface, and the user's different operations on the output result are detected so that different responses can be made, improving the accuracy of speech recognition.
  • In one embodiment, the voice-recognition-based data transmission apparatus 1200 further includes a detection module 1207, configured to detect, each time a voice segment is sent through the data transmission channel and/or each time a speech recognition result is received through the channel, whether the channel is abnormal; when the channel is abnormal, to close it and to re-establish and maintain a data transmission channel; and to continue, through the re-established channel, sending the voice segment to be sent and/or receiving the speech recognition result to be received.
  • In this embodiment, a method for handling an abnormal data transmission channel is provided, which ensures that the channel can be properly maintained when data transmission is required, improves data transmission efficiency, and further improves speech recognition efficiency.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.


Abstract

A voice-recognition-based data transmission method and apparatus (1200). The method includes: when entering a voice input state, establishing and maintaining a data transmission channel (S302); sequentially acquiring input voice segments (S304); sequentially sending the voice segments through the data transmission channel (S306); receiving, through the data transmission channel, speech recognition results matching the sent voice segments (S308); and, when exiting the voice input state, closing the data transmission channel (S310).

Description

基于语音识别的数据传输方法和装置
本申请要求于2017年01月22日提交中国专利局、申请号为201710047882.0、发明名称为“基于语音识别的数据传输方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种基于语音识别的数据传输方法和装置。
背景技术
随着计算机技术的发展,越来越多的计算机用户选择在计算机平台通过语音来进行意愿表达,以使计算机通过对用户语音数据进行识别,从而基于语音识别结果进行进一步处理。伴随着人们生活水平的提高,用户对于语音在线识别的需求越来越强烈。
发明内容
本申请实施例提供了一种基于语音识别的数据传输方法和装置,可以提高在线语音识别的效率。
一种基于语音识别的数据传输方法,所述方法包括:
当进入语音输入状态时,建立数据传输通道并保持;
依次获取输入的语音片段;
通过所述数据传输通道,依次发送所述语音片段;
通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果;
当退出所述语音输入状态时,关闭所述数据传输通道。
一种基于语音识别的数据传输装置，所述装置包括：处理器和存储器，所述存储器上存储有计算机可读指令，所述计算机可读指令由所述处理器执行以完成以下操作：
当进入语音输入状态时,建立数据传输通道并保持;
依次获取输入的语音片段;
通过所述数据传输通道,依次发送所述语音片段;
通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果;
当退出所述语音输入状态时,关闭所述数据传输通道。
一种基于语音识别的数据传输方法,用于终端,该终端包括处理器和存储器,所述方法包括:
当进入语音输入状态时,建立数据传输通道并保持;
依次获取输入的语音片段;
通过所述数据传输通道,依次发送所述语音片段;
通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果;
当退出所述语音输入状态时,关闭所述数据传输通道。
一种非易失性存储介质,其中存储有计算机可读指令,所述计算机可读指令可以由处理器执行以完成:
当进入语音输入状态时,建立数据传输通道并保持;
依次获取输入的语音片段;
通过所述数据传输通道,依次发送所述语音片段;
通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果;
当退出所述语音输入状态时,关闭所述数据传输通道。
附图简要说明
图1为一个实施例中基于语音识别的数据传输方法的应用环境图;
图2为一个实施例中用于实现基于语音识别的数据传输方法的终端的内部结构示意图；
图3为一个实施例中基于语音识别的数据传输方法的流程示意图;
图4为一个实施例中进入语音输入状态的步骤的流程示意图;
图5为一个实施例中语音输入界面未开启时的界面示意图;
图6为一个实施例中语音输入界面开启后的界面示意图;
图7为另一个实施例中语音输入界面开启时的界面示意图;
图8为一个实施例中通过数据传输通道,接收与发送的语音片段匹配的语音识别结果的步骤的流程示意图;
图9为一个实施例中建立数据传输通道并保持的步骤的流程示意图;
图10为另一个实施例中基于语音识别的数据传输方法的流程示意图;
图11为一个实施例中基于语音识别的数据传输方法的时序图;
图12为一个实施例中基于语音识别的数据传输装置的结构框图;
图13为另一个实施例中基于语音识别的数据传输装置的结构框图。
实施本发明的方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
传统的语音在线识别方式,每次进行语音识别都需要进行一段时间的等待,语音识别效率较低。尤其对于通过移动网络进行网络通信的移动终端来说,问题更为明显。
图1为一个实施例中基于语音识别的数据传输方法的应用环境图。参照图1，该基于语音识别的数据传输方法应用于基于语音识别的数据传输系统。基于语音识别的数据传输系统包括终端110和服务器120，终端110通过网络与服务器120连接。终端110具体可以是台式终端或移动终端，移动终端具体可以是手机、平板电脑、笔记本电脑等中的至少一种。服务器120具体可以是独立的物理服务器，也可以是物理服务器集群。
图2为一个实施例中终端的内部结构示意图。如图2所示，该终端包括通过系统总线连接的处理器、非易失性存储介质、内存储器和网络接口、声音采集装置、显示屏和输入装置。其中，终端的非易失性存储介质存储有操作系统，还包括一种基于语音识别的数据传输装置，该基于语音识别的数据传输装置用于实现一种基于语音识别的数据传输方法。该处理器用于提供计算和控制能力，支撑整个终端的运行。终端中的内存储器为非易失性存储介质中的基于语音识别的数据传输装置的运行提供环境，该内存储器中可储存有计算机可读指令，该计算机可读指令被所述处理器执行时，可使得所述处理器执行基于语音识别的数据传输方法。网络接口用于与服务器进行网络通信，如发送语音片段至服务器，接收服务器返回的语音识别结果等。终端的显示屏可以是液晶显示屏或者电子墨水显示屏等，输入装置可以是显示屏上覆盖的触摸层，也可以是终端外壳上设置的按键、轨迹球或触控板，也可以是外接的键盘、触控板或鼠标等。该终端可以是手机、平板电脑或者个人数字助理或穿戴式设备等。图2中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的终端的限定，具体的终端可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。
如图3所示,在一个实施例中,提供了一种基于语音识别的数据传输方法,本实施例以该方法应用于上述图1中的终端110来举例说明。该方法具体包括如下步骤:
S302,当进入语音输入状态时,建立数据传输通道并保持。
其中,语音输入状态是指进行语音数据输入的状态。数据传输通道是指用于进行数据传输的通道。在本实施例中,终端上可运行有支持语音输入的客户端,终端可在检测到该客户端进入语音输入状态时,建立数据传输通道,并保持建立的数据传输通道,以传输后续在语音输入状态下输入的语音数据。
在一个实施例中,终端可检测用于进入语音输入状态的指令,根据该指令进入语音输入状态。具体地,终端可检测预定义的用于触发进入语音输入状态的指令的触发操作,在检测到该触发操作时触发相应的进入语音输入状态的指 令。触发操作可以是对终端的界面中的控件的操作,比如对控件的触控操作或者光标的点击操作等。触发操作也可以是对预定义的物理按钮的点击,或者针对预定义的可触发进入语音输入状态的指令的界面的晃动操作等。
在一个实施例中,终端也可检测预定义的用于触发进入语音输入状态的界面状态变化,在检测到该界面状态变化时进入语音输入状态。具体地,预定义的界面状态变化可以是终端上运行的客户端启动时,终端界面由桌面变化为客户端主界面时的界面状态变化;也可以是该客户端在运行后,终端界面根据用户操作由客户端主界面变化为可进行语音输入的界面时的界面状态变化。
进一步地,终端在检测到当前进入语音输入状态后,向服务器发送建立数据传输通道的请求,并在接收到服务器反馈的针对该请求的应答消息后,建立与服务器之间的数据传输通道,并保持该数据传输通道。
在一个实施例中,终端可与服务器间建立基于TCP(Transmission Control Protocol传输控制协议)的数据传输通道。具体地,终端在检测到当前进入语音输入状态后,向服务器发送携带SYN(synchronous同步)消息的连接请求报文;处于监听状态的服务器在接收到该连接请求报文后,向终端反馈携带ACK(Acknowledgement确认字符)的应答消息以确认连接请求,并将当前状态由监听状态变化为响应状态;终端在接收到服务器反馈的应答消息后,将当前状态更新为连接建立状态,并向服务器反馈携带ACK(Acknowledgement确认字符)的应答消息以确认连接,使得服务器将当前状态由响应状态变化为连接建立状态。
更进一步地,终端在建立与服务器之间的数据传输通道后,可通过该数据传输通道进行数据传输,并在数据传输通道在未进行数据传输的空闲阶段,通过心跳机制保持该数据传输通道直至终端主动关闭该数据传输通道。
S304,依次获取输入的语音片段。
其中,语音片段是指按照某种方式分段划分后的语音数据。在一个实施例中,语音片段可以是用户在进行语音输入时,人为分次输入的语音数据,终端可在每次检测到用户进行语音输入时,获取用户输入的语音数据,将用户每次 输入的语音数据作为一个语音片段。具体地,终端可在检测到用户进行语音输入时调用本地的声音采集装置采集声音,形成语音数据。
在一个实施例中,语音片段可以是预设时长的语音数据。预设时长是预先设置的用于进行语音数据截取的时间间隔,比如200毫秒等。具体地,终端可在检测到用户进行语音输入时开始计时,当计时时长达到预设时长时,获取当前输入的语音数据为一个语音片段,并重新开始计时,且继续执行当计时时长达到预设时长时,截取当前输入的语音数据为一个语音片段,并重新开始计时的操作,直至用户结束语音输入。
S306,通过数据传输通道,依次发送语音片段。
具体地,终端可将依次获取的语音片段,按照获取顺序,依次通过数据传输通道发送至服务器。
S308,通过数据传输通道,接收与发送的语音片段匹配的语音识别结果。
具体地,服务器在接收到终端发送的语音片段后,根据接收到的语音片段进行语音识别,得到与接收到的语音片段匹配的语音识别结果,再将该语音识别结果通过数据传输通道发送至终端。
S310,当退出语音输入状态时,关闭数据传输通道。
具体地,终端上可运行有支持语音输入的客户端,终端可在检测到该客户端退出语音输入状态时,关闭数据传输通道。
在一个实施例中,终端可检测用于退出语音输入状态的指令,根据该指令退出语音输入状态。具体地,终端可检测预定义的用于触发退出语音输入状态的指令的触发操作,在检测到该触发操作时触发相应的退出语音输入状态的指令。终端也可检测预定义的用于触发退出语音输入状态的界面状态变化,在检测到该界面状态变化时退出语音输入状态。具体地,预定义的界面状态变化可以是终端上运行的客户端关闭时,终端界面由客户端主界面变化为桌面时的界面状态变化;也可以是该客户端在运行时,终端界面根据用户操作由可进行语音输入的界面变化为客户端主界面时的界面状态变化。
在一个实施例中,终端在检测到当前退出语音输入状态后,向服务器发送 携带FIN(final结束)消息的连接关闭报文;处于连接建立状态的服务器在接收到该连接关闭报文后,向终端反馈携带ACK(Acknowledgement确认字符)的应答消息,以确认知晓终端结束向服务器继续发送数据。服务器在向终端反馈应答消息后,且在将根据该终端发送的语音片段识别得到的语音识别结果发送完毕后,向该终端发送携带FIN(final结束)消息的连接关闭报文,以通知该终端,服务器已将需发送给该终端的数据发送完毕。终端在接收到服务器发送的连接关闭报文后,将当前状态更新为连接关闭状态,并向服务器反馈携带ACK(Acknowledgement确认字符)的应答消息以确认连接关闭,使得服务器将当前状态更新为连接关闭状态。
在一个实施例中,终端在执行步骤S304、S306或S308时,若检测到当前退出语音输入状态,均可执行步骤S310。
上述基于语音识别的数据传输方法,进入语音输入状态时便建立数据传输通道,后续在输入语音片段后可立即传输,可提高数据传输效率,从而提高了语音识别的效率。在建立数据传输通道后,依次获取的各语音片段,以及与发送的语音片段匹配的语音识别结果,均可在该数据传输通道上进行传输,直到退出语音输入状态时才关闭该数据传输通道,不需要在每次进行数据传输时重新建立新的数据传输通道,这样极大地避免了由于频繁地建立和关闭数据传输通道导致的额外耗时,提高了数据传输效率,进一步提高了语音识别效率。
如图4所示,在一个实施例中,基于语音识别的数据传输方法中进入语音输入状态的步骤包括:
S402,显示语音输入界面的开启入口。
其中,语音输入界面是终端主界面中用于进行语音输入的窗口。语音输入界面具有两种状态:收起状态和展开状态。语音输入界面的开启入口是用于改变语音输入界面状态的操作入口。终端检测作用于该开启入口的触发操作,更新语音输入界面的当前状态。若语音输入界面当前被隐藏,处于收起状态,则开启语音输入界面;若语音输入界面当前处于展开状态,则关闭语音输入界面,使得语音输入界面被隐藏。终端主界面中的语音输入界面通常处于收起状态, 显示语音输入界面的开启入口。
S404,获取针对开启入口的开启指令。
其中,开启指令是指用于触发开启语音输入界面的指令。终端可获取用户作用于开启入口而触发的针对语音输入界面的开启指令。具体地,终端可检测针对开启入口的预定义的触发操作,在检测到该触发操作时触发相应的开启指令。触发操作是对开启入口的操作,比如对开启入口的触控操作或者光标点击操作等。
S406,根据开启指令,展示语音输入界面。
具体地,终端在检测到针对语音输入界面的开启指令后,根据该开启指令展示语音输入界面。
在本实施例中,基于人因工程学认定用户在展开语音输入界面时意图进行语音输入,并设定展示语音输入界面时进入语音输入状态,以在判定用户意图进行语音输入时便建立数据传输通道,后续在输入语音片段后可立即传输,可提高数据传输效率,从而提高了语音识别的效率。
进一步地,在一个实施例中,基于语音识别的数据传输方法中退出语音输入状态的步骤包括:获取针对语音输入界面的界面隐藏指令;根据界面隐藏指令隐藏语音输入界面。
具体地，终端可检测预定义的用于触发界面隐藏指令的触发操作，在检测到该触发操作时触发界面隐藏指令。触发操作是对开启入口的操作，比如对开启入口的触控操作或者光标点击操作等。触发操作也可以是对预定义的物理按钮的点击，或者是对终端主界面中语音输入界面外的其他区域的操作等。
在本实施例中,基于人因工程学认定用户在触发隐藏语音输入界面时意图结束语音输入,并设定隐藏语音输入界面时退出语音输入状态,以在判定用户意图结束语音输入后才关闭数据传输通道,使得在用户可能进行语音输入时,数据传输通道被保持,并在有数据需要传输时即可通过该数据传输通道进行数据传输,可提高数据传输效率,从而提高了语音识别的效率。
举例说明,参照图5,图5为一个实施例中语音输入界面未开启时的界面示 意图,该界面包括语音输入界面的开启入口510。参照图6,图6为一个实施例中语音输入界面开启后的界面示意图,当用户点击如图5所示界面中的语音输入界面的开启入口510时,终端将在如图6所示的界面中展示语音输入界面620。当用户点击语音输入控件621时,终端将获取用户输入的语音片段。当用户点击语音输入界面的开启入口610时,终端展示界面将变化至如图5所示的界面。
更进一步地,在一个实施例中,步骤S308之后,基于语音识别的数据传输方法还包括:在语音输入界面输出语音识别结果;当检测到针对输出的语音识别结果的取消操作时,撤销输出的语音识别结果;当检测到针对输出的语音识别结果的确认输入操作时,根据输出的语音识别结果进行文本录入操作。
其中,取消操作是指预先设置的用于取消当前输出的语音识别结果的操作。确认输入操作是指预先设置的用于确认当前输出的语音识别结果的操作。具体地,终端在接收到服务器返回的语音识别结果后,可将该语音识别结果以文本形式在语音输入界面中预定义的区域展示。用户可根据展示的语音识别结果是否符合用户意图表达的内容进行相应的操作,以使得终端进行不同的响应。
终端可在语音输入界面输出语音识别结果后，检测用户针对语音识别结果的操作，在检测到的操作与预先设置的取消操作一致时，判定用户意图取消当前输出的语音识别结果，终端可撤销输出的语音识别结果。终端检测到的操作与预先设置的确认输入操作一致时，判定用户此时意图确认当前输出的语音识别结果，终端可根据输出的语音识别结果进行文本录入操作。
在一个实施例中,基于语音识别的数据传输方法可具体应用于支持会话的客户端的会话场景中。终端可建立与会话服务器间的会话连接,将进行文本录入操作的语音识别结果通过会话连接发送至会话服务器,以使得会话服务器根据终端发送的以语音识别结果为内容的会话消息进行响应。
在本实施例中,将语音识别结果在语音输入界面中输出,通过检测用户针对输出的语音识别结果的不同操作以进行不同的响应,提高语音识别的准确性。
举例说明,参照图7,图7为另一个实施例中语音输入界面开启时的界面示意图,该界面示意图包括语音输入控件710和语音识别结果展示区720。当用户 点击语音输入控件710时,终端将获取用户输入的语音片段,并将获取的语音片段发送至服务器,以接收服务器反馈的与发送的语音片段匹配的语音识别结果,并将服务器反馈的语音识别结果在语音识别结果展示区720进行展示。终端可在检测到取消操作时撤销输出的语音识别结果,取消操作比如作用于语音输入控件710且向上滑动的操作。终端可在检测到确认输入操作时向会话服务器发送输出的语言识别结果。确认输入操作比如点击语音输入控件710后的抬起操作。
在一个实施例中,步骤S306具体包括:通过数据传输通道,依次向数据传输通道所连接至的服务器发送语音片段,使得服务器在接收到发送的语音片段后,根据已接收到的多个语音片段进行语音识别,得到与发送的语音片段匹配的语音识别结果。
具体地,终端在将获取的语音片段发送至服务器后,服务器可根据终端发送的语音片段进行语音识别。服务器可在每次接收到语音片段后,基于已完成语音识别的多个语音片段的语音识别结果,对该语音片段进行语音识别,得到与服务器接收到的语音片段匹配的语音识别结果。服务器也可在每次接收到语音片段后,将该语音片段结合已接收到的多个语音片段进行语音识别,得到与服务器接收到的语音片段匹配的语音识别结果。
在本实施例中,进行语音识别的服务器根据已接收到的多个语音片段进行语音识别,得到与发送的语音片段匹配的语音识别结果,这种结合前后语音片段的语境进行语音识别的方式使得语音识别结果更为准确。
进一步地，在一个实施例中，步骤S308包括：在通过数据传输通道发送语音片段时，通过数据传输通道并行接收服务器反馈的与发送的语音片段匹配的语音识别结果。
具体地,终端通过数据传输通道向服务器发送语音片段与服务器通过数据传输通道向终端发送语音识别结果可异步进行。服务器可在每次接收到终端发送的语音片段后即对已经接收到的语音片段进行语音识别,并在得到语音识别结果时,可立即将得到的语音识别结果通过数据传输通道发送至终端。
在本实施例中，服务器在得到语音识别结果时，即可向终端发送得到的识别结果，无需在终端将需要发送至服务器的语音片段传输完成后再进行发送，提高了数据传输效率，进一步提高了语音识别效率。
如图8所示,在一个实施例中,基于语音识别的数据传输方法中步骤S308具体包括如下步骤:
S802,通过数据传输通道,接收按照应用层协议封装的数据包。
具体地,按照应用层协议封装的数据包是指服务器根据应用层协议规定的数据包格式将需要进行传输的数据进行封装得到的数据包。在本实施例中,服务器在根据终端发送的语音片段得到语音识别结果后,可根据预先设置的加密方式对得到的语音识别结果进行加密。服务器再根据基于应用层的二进制协议制作数据包,按照协议标准调整数据包头,并将加密后的语音识别结果添加的到数据包的包体中,完成数据包的封装,再将封装完成的数据包通过数据传输通道发送至终端。
S804,解析数据包,得到数据包中封装的加密的语音识别结果。
S806,将加密的语音识别结果解密,得到与发送的语音片段匹配的语音识别结果。
在本实施例中,将需要进行传输的语音识别结果进行加密后再传输,提高了语音识别结果传输的安全性。
如图9所示,在一个实施例中,基于语音识别的数据传输方法中建立数据传输通道并保持的步骤包括:
S902,建立数据传输通道。
S904,定期检测数据传输通道是否处于空闲状态。
其中,定期是指周期性地执行某操作。空闲状态是指未进行数据传输的状态。具体地,终端可在建立数据传输通道时开始,定期检测数据传输通道中是否有数据传输。终端在检测到当前有数据通过数据传输通道传输时,判定此时数据传输通道被保持,等待下次检测时间点进行检测;终端在检测到当前未有数据通过数据传输通道传输时,判定此时数据传输通道处于空闲状态。
S906，当检测到数据传输通道处于空闲状态时，通过数据传输通道发送心跳数据包。
具体地,心跳数据包是指终端向服务器通知终端状态的自定义数据包。终端在判定数据传输通道处于空闲状态时,无法判定数据传输通道是否被保持,可通过数据传输通道向服务器发送心跳数据包,以通知服务器终端需要保持与服务器之间的数据传输通道。
S908,若在预设时长内未接收到通过数据传输通道传来的针对心跳数据包的应答包,则关闭数据传输通道,重新建立数据传输通道并保持。
具体地，预设时长是终端预先设置的接收针对心跳数据包的应答包的等待时间。应答包是指服务器向终端通知服务器状态的自定义数据包。在预设时长内接收到通过数据传输通道传来的针对心跳数据包的应答包，则表示数据传输通道被保持。若在预设时长内未接收到通过数据传输通道传来的针对心跳数据包的应答包，则表示数据传输通道异常，终端则关闭出现异常的数据传输通道，重新建立数据传输通道并保持。
在本实施例中,通过心跳机制保证在根据用户意图关闭数据传输通道前,保持有正常可供数据传输的数据传输通道,在需要传输数据时可立即传输,提高了数据传输效率,进一步提高了语音识别效率。
如图10所示,在一个实施例中,提供了一种基于语音识别的数据传输方法,用于终端,该方法具体包括如下步骤:
S1002,显示语音输入界面的开启入口;获取针对开启入口的开启指令;根据开启指令,展示语音输入界面。
S1004,建立数据传输通道。
S1006,定期检测数据传输通道是否处于空闲状态;若是,则跳转到步骤S1008,若否,则继续执行步骤S1006。
S1008,通过数据传输通道发送心跳数据包。
S1010,检测预设时长内是否接收到通过数据传输通道传来的针对心跳数据包的应答包;若是,则继续执行步骤S1006;若否,则跳转到步骤S1012。
S1012，关闭当前数据传输通道，重新建立数据传输通道，并跳转到步骤S1006。
S1014,依次获取输入的语音片段。
S1016,在准备发送语音片段时检测数据传输通道是否异常;若是,则执行步骤S1018;若否,则执行步骤S1020。
在本实施例中，终端在获取输入的语音片段准备发送时，可先检测当前数据传输通道是否异常。具体地，终端可调用操作系统接口检测当前网络状态，在当前网络状态正常时，判定数据传输通道正常且被保持，并通过该数据传输通道发送语音片段；在当前网络状态异常时，判定数据传输通道出现异常，则关闭该出现异常的数据传输通道，并重新建立数据传输通道，通过重新建立的数据传输通道发送语音片段。
S1018,关闭异常的数据传输通道,重新建立数据传输通道,并执行步骤S1020。
S1020,通过数据传输通道依次向数据传输通道所连接至的服务器发送语音片段,使得服务器在接收到发送的语音片段后,根据已接收到的多个语音片段进行语音识别,得到与发送的语音片段匹配的语音识别结果。
S1022,在通过数据传输通道发送语音片段时,通过数据传输通道并行接收服务器反馈的按照应用层协议封装的数据包。
S1024,解析数据包,得到数据包中封装的加密的语音识别结果;将加密的语音识别结果解密,得到与发送的语音片段匹配的语音识别结果。
S1026,在语音输入界面输出语音识别结果。
S1028,判断针对输出的语音识别结果的操作为取消操作还是确认输入操作;若为取消操作,则执行步骤S1030;若为确认输入操作,则执行步骤S1032。
S1030,撤销输出的语音识别结果。
S1032,根据输出的语音识别结果进行文本录入操作。
S1034,获取针对语音输入界面的界面隐藏指令;根据界面隐藏指令隐藏语音输入界面。
S1036,关闭数据传输通道。
在本实施例中,提供了数据传输通道出现异常时的处理方法,保证了终端在需要进行数据传输时,数据传输通道能被正常保持,提高了数据传输效率,进一步提高了语音识别效率。
在一个实施例中,基于语音识别的数据传输方法中在终端每次通过数据传输通道发送语音片段时,和/或,在终端每次通过数据传输通道接收语音识别结果时,则检测数据传输通道是否异常;当数据传输通道异常时,则关闭数据传输通道,重新建立数据传输通道并保持;通过重新建立的数据传输通道继续发送当次需发送的所述语音片段和/或接收当次需接收的所述语音识别结果。
具体地,终端可在通过数据传输通道传输数据时,检测通过数据传输通道反馈的错误消息,在检测到错误消息时,判定数据传输通道出现异常,则关闭该出现异常的数据传输通道,并重新建立数据传输通道,通过重新建立的数据传输通道传输数据。
图11为一个实施例中基于语音识别的数据传输方法的时序图。参考图11,用户在点击终端界面展示的开启入口后,终端进入语音输入界面,向服务器发起建立数据传输通道的请求,并准备输出语音识别结果。服务器监听到终端的请求后,接受该建立数据传输通道的请求,建立与终端间的数据传输通道并保持。
终端依次获取用户输入的语音片段,并在每次获取一个语音片段后,可立即通过数据传输通道将该语音片段发送至服务器。服务器在接收到该语音片段后,可立即进行语音识别,将得到的语音识别结果加密后通过数据传输通道异步发送至终端。终端对服务器发送的加密后的语音识别结果进行解密,将解密的语音识别结果进行展示。
在数据传输通道保持阶段,终端向服务器发送语音片段与服务器向终端发送语音识别结果可并行进行。在终端隐藏语音输入界面时,结束语音输入,向服务器发起关闭数据传输通道请求,在服务器接受数据传输通道关闭请求后关闭数据传输通道。
如图12所示,在一个实施例中,提供了一种基于语音识别的数据传输装置,包括:通道建立模块1201、获取模块1202、发送模块1203、接收模块1204和通道关闭模块1205。
通道建立模块1201,用于当进入语音输入状态时,建立数据传输通道并保持。
获取模块1202,用于依次获取输入的语音片段。
发送模块1203,用于通过数据传输通道,依次发送语音片段。
接收模块1204,用于通过数据传输通道,接收与发送的语音片段匹配的语音识别结果。
通道关闭模块1205,用于当退出语音输入状态时,关闭数据传输通道。
上述基于语音识别的数据传输装置,进入语音输入状态时便建立数据传输通道,后续在输入语音片段后可立即传输,可提高数据传输效率,从而提高了语音识别的效率。在建立数据传输通道后,依次获取的各语音片段,以及与发送的语音片段匹配的语音识别结果,均可在该数据传输通道上进行传输,直到退出语音输入状态时才关闭该数据传输通道,不需要在每次进行数据传输时重新建立新的数据传输通道,这样极大地避免了由于频繁地建立和关闭数据传输通道导致的额外耗时,提高了数据传输效率,进一步提高了语音识别效率。
在一个实施例中,通道建立模块1201还用于显示语音输入界面的开启入口;获取针对开启入口的开启指令;根据开启指令,展示语音输入界面。
在本实施例中,基于人因工程学认定用户在展开语音输入界面时意图进行语音输入,并设定展示语音输入界面时进入语音输入状态,以在判定用户意图进行语音输入时便建立数据传输通道,后续在输入语音片段后可立即传输,可提高数据传输效率,从而提高了语音识别的效率。
在一个实施例中,通道关闭模块1205还用于获取针对语音输入界面的界面隐藏指令;根据界面隐藏指令隐藏语音输入界面。
在本实施例中,基于人因工程学认定用户在触发隐藏语音输入界面时意图结束语音输入,并设定隐藏语音输入界面时退出语音输入状态,以在判定用户 意图结束语音输入后才关闭数据传输通道,使得在用户可能进行语音输入时,数据传输通道被保持,并在有数据需要传输时即可通过该数据传输通道进行数据传输,可提高数据传输效率,从而提高了语音识别的效率。
在一个实施例中,发送模块1203还用于通过数据传输通道,依次向数据传输通道所连接至的服务器发送语音片段,使得服务器在接收到发送的语音片段后,根据已接收到的多个语音片段进行语音识别,得到与发送的语音片段匹配的语音识别结果。
在本实施例中,进行语音识别的服务器根据已接收到的多个语音片段进行语音识别,得到与发送的语音片段匹配的语音识别结果,这种结合前后语音片段的语境进行语音识别的方式使得语音识别结果更为准确。
在一个实施例中，接收模块1204还用于在通过数据传输通道发送语音片段时，通过数据传输通道并行接收服务器反馈的与发送的语音片段匹配的语音识别结果。
在本实施例中,服务器在得到语音识别结果时,即可向终端发送得到的识别结果,无需在终端将需要发送至服务器的语音片段传输完成后再进行发送,提高了数据传输效率,进一步提高了语音识别效率。
在一个实施例中,接收模块1204还用于通过数据传输通道,接收按照应用层协议封装的数据包;解析数据包,得到数据包中封装的加密的语音识别结果;将加密的语音识别结果解密,得到与发送的语音片段匹配的语音识别结果。
在本实施例中,将需要进行传输的语音识别结果进行加密后再传输,提高了语音识别结果传输的安全性。
在一个实施例中,通道建立模块1201还用于建立数据传输通道;定期检测数据传输通道是否处于空闲状态;当检测到数据传输通道处于空闲状态时,通过数据传输通道发送心跳数据包;若在预设时长内未接收到通过数据传输通道传来的针对心跳数据包的应答包,则关闭数据传输通道,重新建立数据传输通道并保持。
在本实施例中,通过心跳机制保证在根据用户意图关闭数据传输通道前,保持有正常可供数据传输的数据传输通道,在需要传输数据时可立即传输,提 高了数据传输效率,进一步提高了语音识别效率。
图13为另一个实施例中基于语音识别的数据传输装置1200的结构框图,参照图13,该基于语音识别的数据传输装置1200还包括:输出模块1206。
输出模块1206,用于在语音输入界面输出语音识别结果;当检测到针对输出的语音识别结果的取消操作时,撤销输出的语音识别结果;当检测到针对输出的语音识别结果的确认输入操作时,根据输出的语音识别结果进行文本录入操作。
在本实施例中,将语音识别结果在语音输入界面中输出,通过检测用户针对输出的语音识别结果的不同操作以进行不同的响应,提高语音识别的准确性。
在一个实施例中,该基于语音识别的数据传输装置1200还包括:检测模块1207,用于在每次通过数据传输通道发送语音片段时,和/或,在每次通过数据传输通道接收语音识别结果时,则检测数据传输通道是否异常;当数据传输通道异常时,则关闭数据传输通道,重新建立数据传输通道并保持;通过重新建立的数据传输通道继续发送当次需发送的所述语音片段和/或接收当次需接收的所述语音识别结果。
在本实施例中,提供了数据传输通道出现异常时的处理方法,保证了在需要进行数据传输时,数据传输通道能被正常保持,提高了数据传输效率,进一步提高了语音识别效率。
实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细, 但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (28)

  1. 一种基于语音识别的数据传输方法,所述方法包括:
    当进入语音输入状态时,建立数据传输通道并保持;
    依次获取输入的语音片段;
    通过所述数据传输通道,依次发送所述语音片段;
    通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果;
    当退出所述语音输入状态时,关闭所述数据传输通道。
  2. 根据权利要求1所述的方法,其特征在于,所述进入语音输入状态的步骤包括:
    显示语音输入界面的开启入口;
    获取针对所述开启入口的开启指令;
    根据所述开启指令,展示语音输入界面。
  3. 根据权利要求2所述的方法,其特征在于,所述退出所述语音输入状态的步骤包括:
    获取针对所述语音输入界面的界面隐藏指令;
    根据所述界面隐藏指令隐藏所述语音输入界面。
  4. 根据权利要求2所述的方法,其特征在于,所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果之后,所述方法还包括:
    在所述语音输入界面输出所述语音识别结果;
    当检测到针对输出的所述语音识别结果的取消操作时,撤销输出的所述语音识别结果;
    当检测到针对输出的所述语音识别结果的确认输入操作时,根据输出的所述语音识别结果进行文本录入操作。
  5. 根据权利要求1所述的方法,其特征在于,所述通过所述数据传输通道,依次发送所述语音片段的步骤包括:
    通过所述数据传输通道,依次向所述数据传输通道所连接至的服务器发送所述语音片段,使得所述服务器在接收到发送的所述语音片段后,根据已接收 到的多个语音片段进行语音识别,得到与发送的所述语音片段匹配的语音识别结果。
  6. 根据权利要求5所述的方法,其特征在于,所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果的步骤包括:
    在通过所述数据传输通道发送语音片段时，通过所述数据传输通道并行接收所述服务器反馈的与发送的所述语音片段匹配的语音识别结果。
  7. 根据权利要求1至4中任一项所述的方法,其特征在于,所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果包括:
    通过所述数据传输通道,接收按照应用层协议封装的数据包;
    解析所述数据包,得到所述数据包中封装的加密的语音识别结果;
    将加密的语音识别结果解密,得到与发送的所述语音片段匹配的语音识别结果。
  8. 根据权利要求1至6中任一项所述的方法,其特征在于,所述建立数据传输通道并保持的步骤包括:
    建立数据传输通道;
    定期检测所述数据传输通道是否处于空闲状态;
    当检测到所述数据传输通道处于空闲状态时,通过所述数据传输通道发送心跳数据包;
    若在预设时长内未接收到通过所述数据传输通道传来的针对所述心跳数据包的应答包,则关闭所述数据传输通道,重新建立数据传输通道并保持。
  9. 根据权利要求1至6中任一项所述的方法,其特征在于,所述方法还包括:
    在每次通过所述数据传输通道发送所述语音片段时,和/或,在每次通过所述数据传输通道接收所述语音识别结果时,则
    检测所述数据传输通道是否异常;
    当所述数据传输通道异常时,则
    关闭所述数据传输通道,重新建立数据传输通道并保持;
    通过重新建立的所述数据传输通道,继续发送当次需发送的所述语音片段和/或接收当次需接收的所述语音识别结果。
  10. 一种基于语音识别的数据传输装置,其特征在于,所述装置包括:处理器和存储器,所述存储器上存储有计算机可读指令,所述计算机可读指令由所述处理器执行以完成以下操作:
    当进入语音输入状态时,建立数据传输通道并保持;
    依次获取输入的语音片段;
    通过所述数据传输通道,依次发送所述语音片段;
    通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果;
    当退出所述语音输入状态时,关闭所述数据传输通道。
  11. 根据权利要求10所述的装置,其特征在于,所述进入语音输入状态的步骤包括:
    显示语音输入界面的开启入口;
    获取针对所述开启入口的开启指令;
    根据所述开启指令,展示语音输入界面。
  12. 根据权利要求11所述的装置,其特征在于,所述退出所述语音输入状态的步骤包括:
    获取针对所述语音输入界面的界面隐藏指令;
    根据所述界面隐藏指令隐藏所述语音输入界面。
  13. 根据权利要求11所述的装置,其特征在于,在所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果之后,所述计算机可读指令还使所述处理器执行以下操作:
    在所述语音输入界面输出所述语音识别结果;
    当检测到针对输出的所述语音识别结果的取消操作时,撤销输出的所述语音识别结果;
    当检测到针对输出的所述语音识别结果的确认输入操作时,根据输出的所 述语音识别结果进行文本录入操作。
  14. 根据权利要求10所述的装置,其特征在于,所述通过所述数据传输通道,依次发送所述语音片段的步骤包括:
    通过所述数据传输通道,依次向所述数据传输通道所连接至的服务器发送所述语音片段,使得所述服务器在接收到发送的所述语音片段后,根据已接收到的多个语音片段进行语音识别,得到与发送的所述语音片段匹配的语音识别结果。
  15. 根据权利要求14所述的装置,其特征在于,所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果的步骤包括:
    在通过所述数据传输通道发送语音片段时，通过所述数据传输通道并行接收所述服务器反馈的与发送的所述语音片段匹配的语音识别结果。
  16. 根据权利要求10至13中任一项所述的装置,其特征在于,所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果包括:
    通过所述数据传输通道,接收按照应用层协议封装的数据包;
    解析所述数据包,得到所述数据包中封装的加密的语音识别结果;
    将加密的语音识别结果解密,得到与发送的所述语音片段匹配的语音识别结果。
  17. 根据权利要求10至15中任一项所述的装置,其特征在于,所述建立数据传输通道并保持的步骤包括:
    建立数据传输通道;
    定期检测所述数据传输通道是否处于空闲状态;
    当检测到所述数据传输通道处于空闲状态时,通过所述数据传输通道发送心跳数据包;
    若在预设时长内未接收到通过所述数据传输通道传来的针对所述心跳数据包的应答包,则关闭所述数据传输通道,重新建立数据传输通道并保持。
  18. 根据权利要求10至15中任一项所述的装置,其特征在于,所述计算机可读指令还使所述处理器执行以下操作:
    在每次通过所述数据传输通道发送所述语音片段时,和/或,在每次通过所述数据传输通道接收所述语音识别结果时,则
    检测所述数据传输通道是否异常;
    当所述数据传输通道异常时,则关闭所述数据传输通道,重新建立数据传输通道并保持;
    通过重新建立的所述数据传输通道继续发送当次需发送的所述语音片段和/或接收当次需接收的所述语音识别结果。
  19. 一种基于语音识别的数据传输方法,用于终端,该终端包括处理器和存储器,所述方法包括:
    当进入语音输入状态时,建立数据传输通道并保持;
    依次获取输入的语音片段;
    通过所述数据传输通道,依次发送所述语音片段;
    通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果;
    当退出所述语音输入状态时,关闭所述数据传输通道。
  20. 根据权利要求19所述的方法,其特征在于,所述进入语音输入状态的步骤包括:
    显示语音输入界面的开启入口;
    获取针对所述开启入口的开启指令;
    根据所述开启指令,展示语音输入界面。
  21. 根据权利要求20所述的方法,其特征在于,所述退出所述语音输入状态的步骤包括:
    获取针对所述语音输入界面的界面隐藏指令;
    根据所述界面隐藏指令隐藏所述语音输入界面。
  22. 根据权利要求20所述的方法,其特征在于,所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果之后,所述方法还包括:
    在所述语音输入界面输出所述语音识别结果;
    当检测到针对输出的所述语音识别结果的取消操作时,撤销输出的所述语音识别结果;
    当检测到针对输出的所述语音识别结果的确认输入操作时,根据输出的所述语音识别结果进行文本录入操作。
  23. 根据权利要求19所述的方法,其特征在于,所述通过所述数据传输通道,依次发送所述语音片段的步骤包括:
    通过所述数据传输通道,依次向所述数据传输通道所连接至的服务器发送所述语音片段,使得所述服务器在接收到发送的所述语音片段后,根据已接收到的多个语音片段进行语音识别,得到与发送的所述语音片段匹配的语音识别结果。
  24. 根据权利要求23所述的方法,其特征在于,所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果的步骤包括:
    在通过所述数据传输通道发送语音片段时，通过所述数据传输通道并行接收所述服务器反馈的与发送的所述语音片段匹配的语音识别结果。
  25. 根据权利要求19至22中任一项所述的方法,其特征在于,所述通过所述数据传输通道,接收与发送的所述语音片段匹配的语音识别结果包括:
    通过所述数据传输通道,接收按照应用层协议封装的数据包;
    解析所述数据包,得到所述数据包中封装的加密的语音识别结果;
    将加密的语音识别结果解密,得到与发送的所述语音片段匹配的语音识别结果。
  26. 根据权利要求19至24中任一项所述的方法,其特征在于,所述建立数据传输通道并保持的步骤包括:
    建立数据传输通道;
    定期检测所述数据传输通道是否处于空闲状态;
    当检测到所述数据传输通道处于空闲状态时,通过所述数据传输通道发送心跳数据包;
    若在预设时长内未接收到通过所述数据传输通道传来的针对所述心跳数据 包的应答包,则关闭所述数据传输通道,重新建立数据传输通道并保持。
  27. 根据权利要求19至24中任一项所述的方法,其特征在于,所述方法还包括:
    在每次通过所述数据传输通道发送所述语音片段时,和/或,在每次通过所述数据传输通道接收所述语音识别结果时,则
    检测所述数据传输通道是否异常;
    当所述数据传输通道异常时,则
    关闭所述数据传输通道,重新建立数据传输通道并保持;
    通过重新建立的所述数据传输通道,继续发送当次需发送的所述语音片段和/或接收当次需接收的所述语音识别结果。
  28. 一种非易失性存储介质,其特征在于,其中存储有计算机可读指令,所述计算机可读指令可以由处理器执行以完成如权利要求1至9中任一项的方法。
PCT/CN2018/073021 2017-01-22 2018-01-17 基于语音识别的数据传输方法和装置 WO2018133798A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710047882.0 2017-01-22
CN201710047882.0A CN108346429B (zh) 2017-01-22 2017-01-22 基于语音识别的数据传输方法和装置

Publications (1)

Publication Number Publication Date
WO2018133798A1 true WO2018133798A1 (zh) 2018-07-26

Family

ID=62907776

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/073021 WO2018133798A1 (zh) 2017-01-22 2018-01-17 基于语音识别的数据传输方法和装置

Country Status (2)

Country Link
CN (1) CN108346429B (zh)
WO (1) WO2018133798A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081248A (zh) * 2019-12-27 2020-04-28 安徽仁昊智能科技有限公司 Artificial intelligence speech recognition device
CN111755008B (zh) * 2020-06-11 2022-05-27 北京字节跳动网络技术有限公司 Information processing method and apparatus, electronic device, and medium

Citations (5)

Publication number Priority date Publication date Assignee Title
US20020116180A1 (en) * 2001-02-20 2002-08-22 Grinblat Zinovy D. Method for transmission and storage of speech
US20040215464A1 (en) * 2002-01-09 2004-10-28 Nelson Warren Fred Voice activated-automotive window display unit
US20090018818A1 (en) * 2007-07-10 2009-01-15 Aibelive Co., Ltd. Operating device for natural language input
CN103209271A (zh) * 2013-03-05 2013-07-17 胡东明 Remote intelligent control system and method with embedded mobile voice communication device
CN104285428A (zh) * 2012-05-08 2015-01-14 三星电子株式会社 Method and system for operating a communication service

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP5244663B2 (ja) * 2009-03-18 2013-07-24 Kddi株式会社 Speech recognition processing method and system for inputting text by voice
CN102299934A (zh) * 2010-06-23 2011-12-28 上海博路信息技术有限公司 Voice input method based on cloud mode and speech recognition
CN104517609A (zh) * 2013-09-27 2015-04-15 华为技术有限公司 Speech recognition method and apparatus
CN105988581B (zh) * 2015-06-16 2019-03-08 恒大法拉第未来智能汽车(广东)有限公司 Voice input method and apparatus
CN105094717B (zh) * 2015-07-15 2019-02-26 百度在线网络技术(北京)有限公司 Printing method, printing apparatus, and printer based on voice input
CN105302925A (zh) * 2015-12-10 2016-02-03 百度在线网络技术(北京)有限公司 Method and apparatus for pushing voice search data

Also Published As

Publication number Publication date
CN108346429A (zh) 2018-07-31
CN108346429B (zh) 2022-07-08

Similar Documents

Publication Publication Date Title
US9666190B2 (en) Speech recognition using loosely coupled components
US11057500B2 (en) Publication of applications using server-side virtual screen change capture
JP5883841B2 (ja) 片方向通信を使用する分散型音声認識
CN105099984B (zh) Method and apparatus for account interworking between apps
JP6266812B2 (ja) マルチレートストリーミングを用いたBluetooth(登録商標) Low Energy2次データチャネル
CN106888079A (zh) Resource allocation method and apparatus
CN113676741B (zh) Data transmission method and apparatus, storage medium, and electronic device
US11140534B2 (en) Non-intrusive proximity based advertising and message delivery
WO2019120102A1 (zh) Wireless handheld electronic device, intelligent electronic apparatus, and pairing connection method therefor
CN109994115B (zh) Communication method and apparatus, and data processing method and device
CN111930709B (zh) Data storage method and apparatus, electronic device, and computer-readable medium
WO2018133798A1 (zh) Data transmission method and apparatus based on speech recognition
US10133307B2 (en) Dock for extending the utility of an electronic device
US11341963B2 (en) Electronic apparatus and method for controlling same
WO2019032205A1 (en) VIRTUAL PROFILE FOR BLUETOOTH
US20180048624A1 (en) Extensible, plug-n-play, private, secure network gateway
US11947640B2 (en) Adaptive, multi-channel, embedded application programming interface (API)
US20140370814A1 (en) Connecting wireless devices
WO2021103741A1 (zh) Content processing method and apparatus, computer device, and storage medium
US20160285924A1 (en) Communication channel creation using sound stream
JP7242248B2 (ja) Electronic device, control method therefor, and program therefor
US10778674B2 (en) Voice authentication and setup for wireless media rendering system
JP2018521551A (ja) Method and system for establishing an encrypted audio session
US10027652B2 (en) Secured agent communications
WO2019090712A1 (zh) Online screen sharing method and main sharing terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18742357
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18742357
    Country of ref document: EP
    Kind code of ref document: A1