CN104899087A - Speech recognition achieving method and system for third-party applications - Google Patents

Speech recognition achieving method and system for third-party applications

Info

Publication number
CN104899087A
CN104899087A (application CN201510334239.7A)
Authority
CN
China
Prior art keywords
terminal
speech
client
primary client
speech recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510334239.7A
Other languages
Chinese (zh)
Other versions
CN104899087B (en)
Inventor
王夏鸣
胡浩
赵志翔
陶涛
童勇勇
崔阿鹏
储双双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201510334239.7A priority Critical patent/CN104899087B/en
Publication of CN104899087A publication Critical patent/CN104899087A/en
Application granted granted Critical
Publication of CN104899087B publication Critical patent/CN104899087B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Telephonic Communication Services (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a speech recognition implementation method and system for third-party applications. In the method, an auxiliary client configured on a first terminal obtains a speech-input instruction initiated by a primary client configured on a second terminal; the auxiliary client generates a background recording request according to the speech-input instruction and transmits it to the operating system of the first terminal, so as to request that the first terminal call the recording device of the second terminal to perform recording; and the auxiliary client controls, through the first terminal, the second terminal to recognize the voice information obtained by the recording, so that the primary client can process the speech recognition result. By means of the method and system, user operations for third-party speech recognition are simplified and its efficiency is improved.

Description

Speech recognition implementation method and system for third-party applications
Technical field
Embodiments of the present invention relate to application software and network communication technology, and in particular to a speech recognition implementation method and system for third-party applications.
Background art
iOS 8.0 on the Apple iPhone supports third-party keyboard input methods. However, under the system's permission rules a third-party keyboard is not authorized to access the iPhone's microphone, so it cannot offer a recording function on the keyboard and therefore cannot support speech recognition for a third-party input method.
The current speech recognition scheme for third-party keyboards on iOS 8 works as follows: when the user wants to enter text by speech recognition, the user first taps an input button in the third-party keyboard's interface and jumps to the speech recognition main program provided by iOS, where speech recognition is performed (this main program is developed and provided by iOS, so it has permission to access the microphone and accept voice input). After recognition, the user must manually return to the application that hosts the keyboard, long-press the target text area to bring up the system paste menu, and paste the text to complete the input.
The existing third-party speech recognition scheme on iOS 8 is therefore cumbersome, and its interaction flow is tedious. The complete flow requires seven steps in total: 1. tap the microphone -> 2. jump to the main program -> 3. input speech -> 4. copy the recognized content -> 5. manually return to the original application -> 6. long-press the text area -> 7. tap Paste.
Summary of the invention
The present invention provides a speech recognition implementation method and system for third-party applications, so as to offer a simple third-party speech recognition scheme.
In a first aspect, an embodiment of the present invention provides a speech recognition implementation method for a third-party application, comprising:
obtaining, by an auxiliary client configured on a first terminal, a speech-input instruction initiated by a primary client configured on a second terminal;
generating, by the auxiliary client, a background recording request according to the speech-input instruction and transmitting it to the operating system of the first terminal, so as to request that the first terminal call the recording device of the second terminal to perform recording; and
controlling, by the auxiliary client through the first terminal, the second terminal to recognize the voice information obtained by the recording, so that the primary client can process the speech recognition result.
In a second aspect, an embodiment of the present invention further provides a speech recognition implementation system for a third-party application, comprising:
an auxiliary client and a primary client, the auxiliary client being configured on a first terminal and the primary client being configured on a second terminal. The auxiliary client comprises:
an instruction acquisition module, configured to obtain the speech-input instruction initiated by the primary client;
a recording control module, configured to generate a background recording request according to the speech-input instruction and transmit it to the operating system of the first terminal, so as to request that the first terminal call the recording device of the second terminal to perform recording; and
a speech recognition control module, configured to control, through the first terminal, the second terminal to recognize the voice information obtained by the recording, so that the primary client can process the speech recognition result.
The primary client comprises:
an instruction initiation module, configured to initiate the speech-input instruction; and
a result processing module, configured to process the speech recognition result.
In the present invention, after the auxiliary client configured on the first terminal obtains the speech-input instruction from the primary client configured on the second terminal, it generates a background recording request and, based on that request, calls the second terminal to record. Recording in the background in this way effectively gives the primary client recording authority on the second terminal. The auxiliary client then controls the second terminal, through the first terminal, to recognize the recorded voice information and output the result, so that the primary client can post-process the speech recognition result; this realizes third-party speech recognition. In the prior art, the user must perform: 1. tap the microphone -> 2. jump to the main program -> 3. input speech -> 4. copy the recognized content -> 5. manually return to the original application -> 6. long-press the text area -> 7. tap Paste, which is cumbersome. In the present invention, the user only needs to tap the microphone (triggering the speech-input instruction) and speak; speech recognition proceeds without jumping to the main program, copying the recognized content, manually returning to the original application, long-pressing the text area, or tapping Paste. This simplifies the user operations of third-party speech recognition and improves its efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of a speech recognition implementation method for a third-party application in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a speech recognition implementation method for a third-party application in Embodiment 2 of the present invention;
Fig. 3 is a flowchart of another speech recognition implementation method for a third-party application in Embodiment 2 of the present invention;
Fig. 4 is a schematic structural diagram of a speech recognition implementation system for a third-party application in Embodiment 3 of the present invention;
Fig. 5 is a schematic structural diagram of another speech recognition implementation system for a third-party application in Embodiment 3 of the present invention;
Fig. 6 is a schematic structural diagram of another speech recognition implementation system for a third-party application in Embodiment 3 of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of the speech recognition implementation method for a third-party application provided by Embodiment 1 of the present invention. This embodiment is applicable to performing speech recognition through a third-party application on iOS. The method can be performed cooperatively by a first terminal configured with an auxiliary client (e.g., an Apple Watch) and a second terminal configured with a primary client (e.g., an iPhone), and specifically comprises the following steps:
Step 110: the auxiliary client configured on the first terminal obtains the speech-input instruction initiated by the primary client configured on the second terminal.
Preferably, the first terminal is a portable smart wearable device, such as a smart watch or smart glasses, and the second terminal is an electronic device with higher processing capability than the first terminal, such as a smartphone or a tablet computer.
In this embodiment of the present invention, a third-party application with speech recognition needs, such as a third-party keyboard, is installed as the primary client. Taking a third-party keyboard as an example, the keyboard is configured with a record button, and the primary client listens for taps on that button. If the user presses the record button, a press event is heard; if the user lifts off the record button, a lift event is heard. When a press event is heard, a voice-input start instruction is triggered; when a lift event is heard, a voice-input stop instruction is triggered. Both the voice-input start instruction and the voice-input stop instruction are speech-input instructions. Alternatively, speech-input instructions can be produced through voice activity detection (VAD) with silence suppression, or one tap of the button can serve as the start instruction and a second tap as the stop instruction.
The first terminal obtains the speech-input instruction initiated by the primary client by communicating with the second terminal. Communication between the first terminal and the second terminal can be realized through WatchKit on iOS, for example by a WatchKit monitoring thread that lets the iPhone communicate with the Apple Watch.
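The event-to-instruction mapping just described (press triggers the start instruction, lift triggers the stop instruction, or tap-to-toggle) can be sketched as follows. This is a minimal, platform-agnostic illustration in Python; the event and instruction names are assumptions for illustration, not identifiers from the patent.

```python
# Map record-button events to speech-input instructions (press/lift mode).
def instruction_for_event(event: str) -> str:
    """Return the speech-input instruction triggered by a button event."""
    if event == "press":   # user presses the record button
        return "voice_input_start"
    if event == "lift":    # user lifts off the record button
        return "voice_input_stop"
    raise ValueError(f"unknown button event: {event}")

# The tap-to-start / tap-again-to-stop variant can be modeled as a toggle.
class TapToggle:
    def __init__(self):
        self.recording = False

    def on_tap(self) -> str:
        self.recording = not self.recording
        return "voice_input_start" if self.recording else "voice_input_stop"
```

Either mode produces the same two instructions, so the rest of the pipeline does not need to know how the user triggered them.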
Step 120: the auxiliary client generates a background recording request according to the speech-input instruction and transmits it to the operating system of the first terminal, so as to request that the first terminal call the recording device of the second terminal to perform recording.
When the auxiliary client obtains a speech-input instruction initiated by the primary client, it generates the corresponding background recording request. For example, on receiving a voice-input start instruction it generates a background recording start request, and on receiving a voice-input stop instruction it generates a background recording stop request.
After generating the background recording request (a start request or a stop request), the auxiliary client transmits it locally to the operating system of the first terminal, which has the authority to schedule background recording on the second terminal. The operating system of the first terminal then calls the recording device of the second terminal to perform the recording.
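Step 120 can be sketched as follows: the auxiliary client maps instructions to background recording requests, and the first terminal's operating system (which holds the scheduling authority) drives the second terminal's recording device. All class and function names here are illustrative stand-ins, not the patent's actual code.

```python
class RecordingDevice:
    """Stands in for the second terminal's recording equipment."""
    def __init__(self):
        self.recording = False

    def start(self):
        self.recording = True

    def stop(self):
        self.recording = False

class FirstTerminalOS:
    """The watch OS, authorized to schedule background recording."""
    def __init__(self, device: RecordingDevice):
        self.device = device

    def handle(self, request: str) -> None:
        if request == "background_recording_start":
            self.device.start()
        elif request == "background_recording_stop":
            self.device.stop()

def background_request_for(instruction: str) -> str:
    """Auxiliary client: map a speech-input instruction to a request."""
    return {
        "voice_input_start": "background_recording_start",
        "voice_input_stop": "background_recording_stop",
    }[instruction]
```

The key design point mirrored here is that only `FirstTerminalOS` touches the recording device: the auxiliary client never needs microphone permission of its own.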
Step 130: the auxiliary client controls, through the first terminal, the second terminal to recognize the voice information obtained by the recording, so that the primary client can process the speech recognition result.
When speech recognition is needed, the auxiliary client notifies the operating system of the first terminal, which controls the second terminal to recognize the recorded voice information. The recognition itself is realized through speech recognition technology; any scheme provided in the prior art can be used and is not repeated here.
With the technical scheme of this embodiment, the operating system of the first terminal can call the second terminal to record, so a third-party application can record through a background operation. In the prior art, the user must perform: 1. tap the microphone -> 2. jump to the main program -> 3. input speech -> 4. copy the recognized content -> 5. manually return to the original application -> 6. long-press the text area -> 7. tap Paste, which is cumbersome. In this embodiment, the user only needs to tap the microphone (triggering the speech-input instruction) and speak, without jumping to the main program, copying the recognized content, manually returning to the original application, long-pressing the text area, or tapping Paste. This simplifies the user operations of third-party speech recognition on iOS and improves its efficiency.
Embodiment two
This embodiment further provides a speech recognition implementation method for a third-party application that elaborates on Embodiment 1. As shown in Fig. 2, step 110 — the auxiliary client configured on the first terminal obtaining the speech-input instruction initiated by the primary client configured on the second terminal — comprises:
Step 110': the auxiliary client configured on the first terminal monitors, through a monitoring thread, a shared region configured on the second terminal, so as to obtain the speech-input instruction that the primary client writes into the shared region.
Because the second terminal has larger storage capacity than the first terminal, a dedicated storage area for sharing data with the first terminal, called the shared region, can be allocated on the second terminal. After the first terminal connects with the second terminal, the auxiliary client monitors the shared region through a monitoring thread (such as a WatchKit monitoring thread). Whenever new data is stored in the shared region, the auxiliary client reads the newly stored data.
Accordingly, step 130 — the auxiliary client controlling, through the first terminal, the second terminal to recognize the voice information obtained by the recording so that the primary client can process the speech recognition result — comprises:
Step 130': the auxiliary client controls, through the first terminal, the second terminal to recognize the voice information obtained by the recording and to write the speech recognition result into the shared region, so that the primary client can process it.
After performing speech recognition, the second terminal writes the recognition result into the shared region. The primary client configured on the second terminal reads the result from the shared region and processes it.
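The shared-region exchange above can be sketched with a plain in-memory queue standing in for the shared storage area (in the patent's setting this would be storage shared between watch and phone, e.g. via WatchKit; the class and method names are assumptions for illustration):

```python
from collections import deque

class SharedRegion:
    """A dedicated storage area on the second terminal for data sharing.

    The primary client writes speech-input instructions into it; the
    auxiliary client's monitoring thread polls it. Recognition results
    travel back through the same region.
    """
    def __init__(self):
        self._items = deque()

    def write(self, item):
        """Called by either client to store new data."""
        self._items.append(item)

    def poll(self):
        """Called by the monitoring thread; None when nothing is new."""
        return self._items.popleft() if self._items else None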
This embodiment further details step 130 of the above embodiments. The auxiliary client controlling, through the first terminal, the second terminal to recognize the voice information obtained by the recording can be implemented in either of the following ways:
1. the auxiliary client controls the second terminal, through the first terminal, to send the voice information to a server for recognition and to receive the speech recognition result; or
2. the auxiliary client controls the second terminal, through the first terminal, to recognize the voice information locally.
The first terminal controls the second terminal to perform speech recognition through a control thread (such as a WatchKit control thread). Whether to recognize on a server or locally can be decided according to the processing capability of the second terminal and its network conditions.
With server-side recognition, the speech recognition function can be realized with fewer system resources on the second terminal, improving its resource utilization. With local recognition on the client, recognition does not depend on a server, which avoids failing to obtain a recognition result because of a network failure and improves the reliability of speech recognition.
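The server-versus-local routing decision can be sketched as a small policy function. The patent only says the decision depends on the second terminal's processing capability and network conditions; the threshold, score scale, and parameter names below are assumptions added for illustration.

```python
def choose_recognizer(cpu_score: float, network_ok: bool,
                      local_threshold: float = 0.7) -> str:
    """Pick 'server' or 'local' recognition for a recording.

    cpu_score: assumed 0..1 rating of the phone's processing capability.
    network_ok: whether the phone currently has a usable network.
    """
    if network_ok and cpu_score < local_threshold:
        return "server"   # offload to save the phone's resources
    return "local"        # no network, or the phone is powerful enough
```

Local recognition is the fallback whenever the network is unavailable, which matches the reliability argument above.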
The embodiment of the present invention further details how, in step 110, the primary client initiates the speech-input instruction:
the primary client of a third-party input method receives the voice-input start instruction and the voice-input stop instruction that the user enters on the input-method interface, and writes them into the shared region.
One implementation of the third-party application is a third-party input method, with an auxiliary client configured on the first terminal and a primary client configured on the second terminal. In one implementation, on the input-method interface of the primary client the user presses the corresponding function button to trigger the voice-input start instruction and lifts off the button to trigger the voice-input stop instruction. The function button may be, for example, a record button with a loudspeaker icon or a red circular record button.
With the technical scheme provided by this embodiment, the voice-input start and stop instructions entered by the user can be received on the third-party input-method interface and sent to the first terminal through the shared region, so that voice input is triggered from within the third-party input method's own interface.
The embodiment of the present invention further details how the primary client processes the speech recognition result:
the primary client reads the speech recognition result from the shared region and displays it in the text box of the input-method interface.
The shared region provides data read and write operations for both the primary client and the auxiliary client. After recognizing the voice information, the second terminal writes the recognition result into the shared region; the primary client reads it from the shared region and displays it in the text box of the input-method interface, thereby converting the user's voice input into text.
It should be noted that in the above embodiments the first terminal is a smart watch, the second terminal is a smartphone, and the operating system is the iOS operating system.
The above embodiments are further described below through a usage scenario:
In this scenario, the first terminal is a smart watch (Apple Watch) and the second terminal is a smartphone (iPhone). A third-party input-method application (APP) is installed on both the smart watch and the smartphone; the third-party input method on the smartphone is the primary client, and the one on the smart watch is the auxiliary client. The user pairs the smartphone with the smart watch and starts the third-party input-method application on each device.
As shown in Fig. 3, voice input for the third-party input method on the smartphone is realized in this scenario through the following steps:
Step 301: when the user starts the third-party input method on the smartphone and the smart watch, the auxiliary client configured on the smart watch starts a WatchKit monitoring thread in the background of the smart watch to monitor the shared region.
Step 302: the user presses the voice-input function key on the keyboard of the primary client's third-party input method.
Step 303: when the user presses the voice-input function key, the primary client initiates a voice-input start instruction and writes it into the shared region. The voice-input function key bears a microphone icon.
Step 304: the auxiliary client reads the voice-input start instruction from the shared region and generates a background recording start request according to it.
Step 305: the auxiliary client transmits the recording start request to the operating system of the first terminal.
Step 306: after receiving the recording start request, the operating system of the first terminal calls the recording device of the second terminal to start recording.
Step 307: the recording device prompts the user to input voice information.
Step 308: the user inputs voice information as prompted by the recording device. After finishing, the user lifts off the voice-input function key on the keyboard of the primary client's third-party input method.
Step 309: when the user lifts off the voice-input function key, the primary client initiates a voice-input stop instruction and writes it into the shared region.
Step 310: the auxiliary client reads the voice-input stop instruction initiated by the primary client from the shared region.
Step 311: the auxiliary client generates a background recording stop request according to the voice-input stop instruction and transmits it to the operating system of the first terminal.
Step 312: after receiving the recording stop request, the operating system of the first terminal calls the recording device of the second terminal to stop recording.
Step 313: the operating system of the first terminal controls the operating system of the second terminal to recognize the voice information obtained by the recording.
Here, the auxiliary client may send a recognition request to the operating system of the first terminal, which then controls the operating system of the second terminal to perform speech recognition; alternatively, after receiving the recording stop request, the operating system of the first terminal may directly control the operating system of the second terminal to perform speech recognition.
Step 314: the operating system of the second terminal writes the speech recognition result into the shared region.
Step 315: the primary client reads the speech recognition result from the shared region and processes it.
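The full scenario (steps 301-315) can be simulated end to end with plain Python objects. This is a hedged sketch of the control flow only: `SharedRegion`, `Phone`, and `WatchOS` are illustrative stand-ins for the shared storage area, the second terminal, and the first terminal's operating system, and the recognizer is a placeholder that returns fixed text.

```python
class SharedRegion:
    def __init__(self):
        self.items = []
    def write(self, x):
        self.items.append(x)
    def poll(self):
        return self.items.pop(0) if self.items else None

class Phone:
    """Second terminal: owns the recording device and the recognizer."""
    def __init__(self):
        self.recording = False
        self.audio = None
    def start_recording(self):
        self.recording = True
    def stop_recording(self):
        self.recording = False
        self.audio = b"raw-audio"
    def recognize(self):
        # Placeholder for a local or server-side recognizer.
        return "hello world" if self.audio else None

class WatchOS:
    """First terminal OS: authorized to schedule background recording."""
    def __init__(self, phone):
        self.phone = phone
    def handle(self, request):
        if request == "start":
            self.phone.start_recording()
        elif request == "stop":
            self.phone.stop_recording()

def run_scenario():
    phone, shared = Phone(), SharedRegion()
    watch_os = WatchOS(phone)
    shared.write("voice_input_start")            # steps 302-303
    assert shared.poll() == "voice_input_start"  # step 304
    watch_os.handle("start")                     # steps 305-306
    assert phone.recording                       # steps 307-308: user speaks
    shared.write("voice_input_stop")             # step 309
    assert shared.poll() == "voice_input_stop"   # step 310
    watch_os.handle("stop")                      # steps 311-312
    shared.write(phone.recognize())              # steps 313-314
    return shared.poll()                         # step 315
```

Throughout the round trip the primary client only ever touches the shared region, which is the mechanism that lets a keyboard without microphone permission obtain recognized text.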
In this scenario, the user can perform voice input in the smartphone's third-party keyboard simply by operating the voice-input function key. Compared with the prior-art approach of exiting the third-party keyboard, recording through the smartphone, and copying the recording result back to the third-party application, the technical scheme provided by this embodiment simplifies user operations and is convenient to use.
Embodiment three
The embodiment of the present invention further provides a speech recognition implementation system for a third-party application for realizing the above method. As shown in Fig. 4, the system comprises:
an auxiliary client 41 and a primary client 51, the auxiliary client 41 being configured on a first terminal 4 and the primary client 51 being configured on a second terminal 5. As shown in Fig. 5, the auxiliary client 41 comprises:
an instruction acquisition module 411, configured to obtain the speech-input instruction initiated by the primary client 51;
a recording control module 412, configured to generate a background recording request according to the speech-input instruction and transmit it to the operating system of the first terminal 4, so as to request that the first terminal 4 call the recording device of the second terminal 5 to perform recording; and
a speech recognition control module 413, configured to control, through the first terminal 4, the second terminal 5 to recognize the voice information obtained by the recording, so that the primary client 51 can process the speech recognition result.
As shown in Fig. 6, the primary client 51 comprises:
an instruction initiation module 511, configured to initiate the speech-input instruction; and
a result processing module 512, configured to process the speech recognition result.
Further, the instruction acquisition module 411 is specifically configured to monitor, through a monitoring thread, the shared region configured on the second terminal 5, so as to obtain the speech-input instruction that the primary client 51 writes into the shared region;
and the speech recognition control module 413 is specifically configured to control the second terminal 5, through the first terminal 4, to write the speech recognition result into the shared region.
Further, the speech recognition control module 413 is specifically configured to:
control the second terminal 5, through the first terminal 4, to send the voice information to a server for recognition and to receive the speech recognition result; or
control the second terminal 5, through the first terminal 4, to recognize the voice information locally.
Further, the instruction initiation module 511 is specifically configured to:
receive the voice-input start instruction and the voice-input stop instruction that the user enters on the input-method interface, and write them into the shared region.
Further, the result processing module 512 is specifically configured to:
read the speech recognition result from the shared region in the primary client 51 and display it in the text box of the input-method interface.
Further, the first terminal 4 is a smart watch, the second terminal 5 is a smartphone, and the operating system is the iOS operating system.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described here; various obvious changes, readjustments, and substitutions can be made by a person skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the concept of the present invention; its scope is determined by the appended claims.

Claims (12)

1. A speech recognition implementation method for a third-party application, characterized by comprising:
obtaining, by an auxiliary client configured on a first terminal, a speech-input instruction initiated by a primary client configured on a second terminal;
generating, by the auxiliary client, a background recording request according to the speech-input instruction and transmitting it to the operating system of the first terminal, so as to request that the first terminal call the recording device of the second terminal to perform recording; and
controlling, by the auxiliary client through the first terminal, the second terminal to recognize the voice information obtained by the recording, so that the primary client can process the speech recognition result.
2. The method according to claim 1, characterized in that:
obtaining, by the auxiliary client configured on the first terminal, the speech-input instruction initiated by the primary client configured on the second terminal comprises: monitoring, by the auxiliary client configured on the first terminal through a monitoring thread, a shared region configured on the second terminal, so as to obtain the speech-input instruction that the primary client writes into the shared region;
and, accordingly, controlling, by the auxiliary client through the first terminal, the second terminal to recognize the voice information obtained by the recording so that the primary client can process the speech recognition result comprises: controlling, by the auxiliary client through the first terminal, the second terminal to recognize the voice information obtained by the recording and to write the speech recognition result into the shared region, so that the primary client can process the speech recognition result.
3. The method according to claim 2, characterized in that controlling, by the auxiliary client through the first terminal, the second terminal to recognize the voice information obtained by the recording comprises:
controlling, by the auxiliary client through the first terminal, the second terminal to send the voice information to a server for recognition and to receive the speech recognition result; or
controlling, by the auxiliary client through the first terminal, the second terminal to recognize the voice information locally.
4. The method according to claim 2 or 3, characterized in that initiating the speech-input instruction by the primary client comprises:
receiving, by the primary client of a third-party input method, the voice-input start instruction and the voice-input stop instruction that the user enters on the input-method interface, and writing them into the shared region.
5. The method according to claim 4, characterized in that processing the speech recognition result by the primary client comprises:
reading, by the primary client, the speech recognition result from the shared region, and displaying it in the text box of the input-method interface.
6. The method according to claim 2, characterized in that the first terminal is a smart watch, the second terminal is a smartphone, and the operating system is an iOS operating system.
7. A speech recognition implementation system for third-party applications, comprising:
an assistant client and a primary client, the assistant client being configured on a first terminal and the primary client being configured on a second terminal; the assistant client comprising:
an instruction acquisition module, configured to acquire a speech input instruction initiated by the primary client;
a recording control module, configured to generate a background recording request according to the speech input instruction and transmit it to the operating system of the first terminal, so as to request the first terminal to invoke a recording device of the second terminal to record;
a speech recognition control module, configured to control the second terminal via the first terminal to recognize the recorded voice information, so that the primary client processes the speech recognition result;
and the primary client comprising:
an instruction initiation module, configured to initiate the speech input instruction; and
a result processing module, configured to process the speech recognition result.
8. The system according to claim 7, wherein:
the instruction acquisition module is specifically configured to monitor, via a monitoring thread, a shared region configured on the second terminal, so as to acquire the speech input instruction written into the shared region by the primary client; and
the speech recognition control module is specifically configured to control the second terminal via the first terminal and write the speech recognition result into the shared region.
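The monitoring thread of claim 8 can be sketched as a background thread that polls the shared region until the primary client writes an instruction. This is an illustrative sketch only; the function name `monitor`, the key `"instruction"`, and the polling interval are assumptions, and a production iOS implementation would more likely observe a shared store or use a messaging channel between the devices than poll a dictionary.

```python
import threading
import time

def monitor(region, lock, found, stop, interval=0.01):
    """Poll the shared region until the primary client writes an instruction.

    region: dict standing in for the shared region on the second terminal
    lock:   guards concurrent access by the two clients
    found:  list that receives the consumed instruction
    stop:   event allowing the thread to be shut down cleanly
    """
    while not stop.is_set():
        with lock:
            if "instruction" in region:
                # Consume the instruction, as the instruction
                # acquisition module of claim 8 would.
                found.append(region.pop("instruction"))
                return
        time.sleep(interval)  # polling interval between checks
```

To use it, start the thread before the primary client writes, then join it once the instruction has been written; the thread exits as soon as it has consumed the instruction.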
9. The system according to claim 8, wherein the speech recognition control module is specifically configured to:
control the second terminal via the first terminal to send the voice information to a server for recognition, and receive the speech recognition result; or
control the second terminal via the first terminal to recognize the voice information locally.
10. The system according to claim 8 or 9, wherein the instruction initiation module is specifically configured to:
receive a speech input start instruction and a speech input stop instruction entered by the user on the input method interface, and write them into the shared region.
11. The system according to claim 10, wherein the result processing module is specifically configured to:
read the speech recognition result from the shared region and display it in a text box of the input method interface.
12. The system according to claim 8, wherein the first terminal is a smart watch, the second terminal is a smartphone, and the operating system is the iOS operating system.
CN201510334239.7A 2015-06-16 2015-06-16 The speech recognition method and system of third-party application Active CN104899087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510334239.7A CN104899087B (en) 2015-06-16 2015-06-16 The speech recognition method and system of third-party application


Publications (2)

Publication Number Publication Date
CN104899087A true CN104899087A (en) 2015-09-09
CN104899087B CN104899087B (en) 2018-08-24

Family

ID=54031765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510334239.7A Active CN104899087B (en) 2015-06-16 2015-06-16 The speech recognition method and system of third-party application

Country Status (1)

Country Link
CN (1) CN104899087B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060015336A1 (en) * 2004-07-19 2006-01-19 Sarangarajan Parthasarathy System and method for spelling recognition using speech and non-speech input
CN101739284A (en) * 2008-11-20 2010-06-16 联想(北京)有限公司 Computer and information processing method
CN102945673A (en) * 2012-11-24 2013-02-27 安徽科大讯飞信息科技股份有限公司 Continuous speech recognition method with speech command range changed dynamically
CN103730116A (en) * 2014-01-07 2014-04-16 苏州思必驰信息科技有限公司 System and method for achieving intelligent home device control on smart watch


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105895134A (en) * 2016-05-10 2016-08-24 安徽声讯信息技术有限公司 Recording device with remote recording and cloud transcription control functions and implementation method thereof
CN106201015A (en) * 2016-07-08 2016-12-07 百度在线网络技术(北京)有限公司 Speech input method and device based on input method application software
CN106201015B (en) * 2016-07-08 2019-04-19 百度在线网络技术(北京)有限公司 Speech input method and device based on input method application software
CN107016998A (en) * 2017-03-20 2017-08-04 奇酷互联网络科技(深圳)有限公司 Method and system for voice entry between devices
CN107016998B (en) * 2017-03-20 2020-08-18 奇酷互联网络科技(深圳)有限公司 Method and system for voice recording between devices
CN107463539A (en) * 2017-07-20 2017-12-12 北京云知声信息技术有限公司 Information pasting method and device
CN109966750A (en) * 2019-03-29 2019-07-05 浙江传媒学院 Voice-controlled puzzle with elements movable relative to each other
CN113157351A (en) * 2021-03-18 2021-07-23 福建马恒达信息科技有限公司 Voice plug-in construction method for quickly invoking form tools

Also Published As

Publication number Publication date
CN104899087B (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN104899087A (en) Speech recognition achieving method and system for third-party applications
US10055190B2 (en) Attribute-based audio channel arbitration
US11282519B2 (en) Voice interaction method, device and computer readable storage medium
KR20190031167A (en) Electronic Device and method for controlling the electronic device
CN110097897A (en) Android device recording multiplexing method and system
US20150229756A1 (en) Device and method for authenticating a user of a voice user interface and selectively managing incoming communications
CN105208014A (en) Voice communication processing method, electronic device and system
US11269591B2 (en) Artificial intelligence based response to a user based on engagement level
CN105786356B (en) Application operating method and device
KR20210038811A (en) Speech recognition control method, apparatus, electronic device and readable storage medium
CN109462546A (en) Voice dialogue history message recording method, apparatus and system
WO2024103926A1 (en) Voice control methods and apparatuses, storage medium, and electronic device
CN103812996A (en) Information prompting method and apparatus, and terminal
US20220309040A1 (en) Method and apparatus of synchronizing data, electronic device and storage medium
CN104572007A (en) Method for adjusting sound volume of terminal
CN117472321A (en) Audio processing method and device, storage medium and electronic equipment
US9805721B1 (en) Signaling voice-controlled devices
CN108881766A (en) Method for processing video frequency, device, terminal and storage medium
CN104571856A (en) Terminal
US20230290348A1 (en) Coordination and execution of actions on a plurality of heterogenous ai systems during a conference call
CN113452853B (en) Voice interaction method and device, electronic equipment and storage medium
CN113765939B (en) Calling method, device, equipment and storage medium
EP4027630A1 (en) Group calling system, group calling method, and program
CN110634478A (en) Method and apparatus for processing speech signal
CN114139120A (en) Identity authentication method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant