CN105931642B - Voice recognition method, device and system - Google Patents

Voice recognition method, device and system

Info

Publication number
CN105931642B
CN105931642B (application CN201610375073.8A)
Authority
CN
China
Prior art keywords
user
recognition
speech
voice
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610375073.8A
Other languages
Chinese (zh)
Other versions
CN105931642A (en)
Inventor
汤跃忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
iFlytek Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201610375073.8A
Publication of CN105931642A
Application granted
Publication of CN105931642B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G10L 15/08: Speech classification or search
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/183: Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/187: Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a voice recognition method, device, and system. The method comprises the following steps: acquiring a voice input of a user; selecting a speech database to recognize the voice input and outputting the resulting recognition output; selecting one or more candidate optimal recognition outputs from the recognition output using a domain decision; and determining the optimal recognition output among the one or more candidate optimal recognition outputs, with the individual identification information of the user as the decision condition. This scheme improves the accuracy of speech recognition without increasing the response time.

Description

Voice recognition method, device and system
Technical Field
The invention relates to the field of voice recognition, and in particular to a voice recognition method, device, and system.
Background
With the popularization of intelligent devices, voice recognition systems have become a new means of information access, and at the same time enable intelligent control of those devices.
In the use of speech recognition systems, user experience has become a central concern. For voice recognition applications, response time and decision accuracy are the core factors in improving user experience. In current approaches, a single general-purpose data model is often used to judge the speech data, so that decisions for all speech environments are made by one common system. This inevitably increases the workload of voice recognition, prolongs the response time, and thus degrades the user experience.
In the art, automatic speech recognition (ASR) systems commonly recognize speech input through a recognition engine. The engine model of a speech recognition system usually consists of two parts, an acoustic model and a language model, corresponding respectively to the computation of speech-to-syllable probabilities and of syllable-to-word probabilities. Language models are mainly divided into rule-based models and statistical models; the statistical regularities inherent in language units are revealed by probabilistic-statistical methods. The engine unit completes the recognition output of the voice input through a knowledge-domain decision.
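The two-part engine model described above can be pictured as a toy decoder that combines acoustic and language-model probabilities in log space. This is a minimal sketch: the dictionary-based scoring and all names are illustrative stand-ins, since a real engine decodes over a lattice of syllable hypotheses rather than a fixed candidate list.

```python
import math

def decode(acoustic_scores, language_model, candidates):
    """Pick the candidate word sequence maximizing the combined
    acoustic (speech-to-syllable) and language-model (syllable-to-word)
    log-probability, as in a conventional ASR engine."""
    best, best_score = None, float("-inf")
    for cand in candidates:
        score = math.log(acoustic_scores[cand]) + math.log(language_model[cand])
        if score > best_score:
            best, best_score = cand, score
    return best
```

With equal acoustic scores, the language model breaks the tie; that tie-breaking is exactly where the domain and identity decisions of the later steps come in.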
There are various ways to restrict the voice decision to a specific range by adding specific user information identifiers to the general system, thereby improving the response time and the decision accuracy. A common form in the art is to set up database classifications for different dialects and accents, so that the voice input can be classified at the initial decision stage and a fast response achieved. Specific information identifiers can be added to this database-selection step. The information identifier can come from the user terminal, can be obtained by processing the user's voice input, or can be obtained in other ways, such as from the user's location information or the signal source of the mobile device. This information is input into the ASR system as the user's identification information, assisting data selection and decision for that user, shortening the response time and reducing the misjudgment rate.
However, although the above forms add identification information of the user, that information merely assists the system in selecting the language database by supplying the language type and the location. While the response time is reduced, the final recognition output cannot be targeted to the particular user through that identification information; that is, the recognition efficiency is not high.
There is therefore a need for a recognition method that can improve recognition accuracy for a particular user while keeping the response time short.
Disclosure of Invention
In order to solve the above problems, embodiments of the present invention provide a speech recognition method, device, and system that improve the accuracy of speech recognition without increasing the response time.
According to an aspect of the present invention, there is provided a speech recognition method comprising: acquiring a voice input of a user; selecting a speech database to recognize the voice input and outputting the resulting recognition output; selecting one or more candidate optimal recognition outputs from the recognition output using a domain decision; and determining the optimal recognition output among the one or more candidate optimal recognition outputs, with the individual identification information of the user as the decision condition.
According to another aspect of the present invention, there is provided a speech recognition apparatus including: a voice acquisition unit for acquiring a voice input of a user; a voice recognition unit for selecting a voice database to recognize a voice input by a user and outputting a recognition output as a result; a first decision unit for selecting one or more candidate optimal recognition outputs from the recognition outputs using domain decision; and a second determination unit configured to determine an optimal recognition output among the one or more candidate optimal recognition outputs using the individual identification information of the user as a determination condition.
According to a third aspect of the present invention, there is provided a speech recognition system comprising: the above-described voice recognition device; and a client device communicatively coupled to the speech recognition device.
According to the above scheme, a secondary decision on the voice recognition result is performed using the user's specific information identifier, and the result of that decision is output as the final result, realizing a multi-stage decision on the recognition output; the newly added decision stage takes the output of the domain decision as its input. Since only a small number of results are retained for the final decision, the scheme does not increase the load on the system, and can determine the voice recognition output more accurately without lengthening the response time.
Drawings
The above features and advantages of the present invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow diagram of a speech recognition method according to an embodiment of the present invention;
FIG. 2 provides a flow diagram of a method for speech recognition using native information of a user in accordance with an embodiment of the present invention;
FIG. 3 shows a flow diagram of another speech recognition method according to an embodiment of the invention;
FIG. 4 is a schematic block diagram illustrating a speech recognition device for implementing a speech recognition method according to an embodiment of the present invention; and
FIG. 5 shows a schematic block diagram of a speech recognition system according to an embodiment of the present invention.
Detailed Description
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used to designate the same or similar components, although they are shown in different drawings. For the purposes of clarity and conciseness, a detailed description of known functions and configurations incorporated herein will be omitted to avoid making the subject matter of the present invention unclear.
Fig. 1 shows a schematic flow diagram of a speech recognition method according to an embodiment of the present invention.
As shown in fig. 1, in step S01, a voice input of the user is acquired.
In some examples, the user's voice input may be obtained through a client device (e.g., a voice receiving unit of the client device, such as a microphone, etc.) that the user is using. A speech recognition device communicatively coupled to the client device may then obtain speech input from the client device.
Here, the client device used by the user may be a mobile phone, a fixed terminal, a PDA (personal digital assistant), a notebook, a netbook, a tablet computer, etc. of the user, however, the present invention is not limited thereto, and any mobile or non-mobile device conceivable to those skilled in the art may be used as the client device.
The voice recognition device described in the present application may be referred to as a server, a cloud server, a remote terminal, etc. in some implementations, however, the present invention is not limited thereto, and the voice recognition device in the present invention may be any device that can be used to implement the technical solution of the present invention, regardless of whether it is mobile or non-mobile, and regardless of the name of the device in the specific implementation.
In some examples, the user's voice information may be read by a unit such as a microphone of the client device, converted to an electronic signal, and stored. For example, the user may make a voice input through the microphone system of the electronic device: "play a musical drama", "play a drama", "I want to listen to a drama", and the like.
In some examples the client device may even be omitted, for example where the speech recognition device is local to the user, who may then input speech directly at the speech recognition device (e.g., its microphone).
At step S02, the speech database is selected to recognize the speech input by the user, and the resulting recognition output is output.
In some examples, a speech database to be used may be selected, and recognition of speech is performed using an acoustic model and a language model of a speech recognition engine, etc., according to the selected speech database, and a recognition result is output.
In step S03, one or more candidate optimal recognition outputs are selected from the recognition outputs using domain determination.
The most preferable candidate output results can be selected for output by the domain decision. The output may include a plurality of candidate results; for example, the candidates may be "I want to listen to Yue opera", "I want to listen to Cantonese opera", and the like. Of course, in some cases only one output result may be produced.
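A minimal sketch of such a domain decision, assuming the engine emits scored hypotheses; the threshold and candidate cap are illustrative parameters, not values from the patent:

```python
def domain_decision(recognition_outputs, threshold=0.2, max_candidates=3):
    """Retain only the highest-scoring hypotheses as candidate optimal
    recognition outputs; may return one candidate or several.

    recognition_outputs: list of (text, score) pairs from the engine.
    """
    kept = [(text, s) for text, s in recognition_outputs if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[:max_candidates]]
```

When only one hypothesis survives, the later identity-based decision of step S05 can be bypassed, as the text below notes.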
Optionally, in step S04, the personal identification information of the user is detected.
This step may be performed between step S03 and step S05 (described in detail below), but the present invention is not limited thereto; the step may be performed at any time before step S05. For example, where the user has used the voice recognition device before, the personal identification information detected on a previous use may be stored and reused in the present recognition.
The individual identification information may include, for example, the user's geographic location, the signal source currently connected to the user's mobile device, the user's native place, and any other information known to those skilled in the art that can individually identify the user. The user's geographic location can be obtained in a variety of ways, used singly or in combination. For example: the IP address may be obtained through the user's network connection (when the user uses an intelligent voice device connected to a cloud server, inspection of the network information may show the user's location to be "Shaoxing City, Zhejiang Province"); the location may be determined from the base station with which the user's mobile device is associated; or the geographic location may be obtained from the GPS system of the user's mobile device. One of these acquisition methods may be used alone, or several may be combined to avoid misjudgment (for example, when the user connects through a proxy server, it is difficult to judge the user's location from network information alone).
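The combination of acquisition methods might look like the following sketch, which trusts a value confirmed by two sources before falling back to the single most reliable one; the source ordering and the agreement rule are assumptions for illustration:

```python
def detect_location(gps=None, base_station=None, ip_lookup=None):
    """Fuse several geographic-location sources for the user.

    A value reported by at least two sources is trusted (guarding
    against e.g. a proxied IP address); otherwise fall back to the
    first available source, in rough order of reliability.
    """
    sources = [s for s in (gps, base_station, ip_lookup) if s is not None]
    if not sources:
        return None
    for s in sources:
        if sources.count(s) >= 2:  # two independent sources agree
            return s
    return sources[0]
```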
In step S05, an optimal recognition output among the one or more candidate optimal recognition outputs is determined with the individual identification information of the user as a determination condition.
The plurality of candidate optimal recognition outputs are further judged using the user's individual identification information as the decision condition, and the most suitable one among them is determined by small-range retrieval and recognition. For example, suppose the candidate optimal recognition outputs determined in step S03 are "I want to hear Yue opera" and "I want to hear Cantonese opera", and the geographic location obtained in step S04 is "Shaoxing City, Zhejiang Province". Using that location as the decision condition, a small-sample retrieval over the candidates from step S03 determines the output result to be "I want to hear Yue opera".
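The small-range retrieval can be pictured as re-ranking the handful of candidates by their relevance to the identification information. The `affinity` table here is a hypothetical stand-in for whatever retrieval index the system maintains:

```python
def decide_optimal(candidates, identity_info, affinity):
    """Return the candidate whose association with the user's
    identification information scores highest; unknown pairs score 0."""
    return max(candidates, key=lambda c: affinity.get((c, identity_info), 0.0))
```

Because only the few candidates from step S03 are scored, this second decision stays cheap relative to the full recognition pass.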
In this way, recognition accuracy is improved through the association between the user's individual identification information and the recognition output. In the above method, the second recognition decision in step S05 is performed only within a small recognition domain, so it does not impose an excessive load on the overall response time; the method therefore improves the recognition rate of the user's voice input without substantially increasing the response time, yielding a better user experience.
In another example, if the user inputs "I want to listen to 'swan goose'" through the intelligent voice system, the system may judge that the combinations with higher probability are "swan" and "red goose", and output both as candidate optimal combinations; the individual identification is then applied in the final selection stage for the system's decision, so that different results are output according to the different individual identifications acquired. This improves the user experience to a greater extent and identifies the user's requirement accurately.
When the user's input clearly points to a geographic place name, the small-sample retrieval can be skipped after the multiple candidate optimal results are output: the user's identified geographic information is directly compared, as the decision identifier, against the multiple optimal solutions, so the result is output more quickly. For example, the user inputs "Chaoyang weather" ("Chaoyang" is the name of several districts and cities); among the multiple output areas named Chaoyang, one is selected by the user's identified geographic information. This approach further simplifies the recognition pattern, but applies only when the multiple optimal solutions all turn on the same item of individual identification information (e.g., geographic information).
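When the candidates differ only in a place name, the shortcut reduces to a direct comparison against the user's own region tag. A sketch under that assumption (the substring matching is illustrative):

```python
def direct_geo_match(candidates, user_region):
    """Pick the candidate that mentions the user's region directly,
    skipping small-sample retrieval; fall back to the top candidate."""
    for cand in candidates:
        if user_region in cand:
            return cand
    return candidates[0]
```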
There may also be cases where only one candidate recognition output is produced in step S03. In that case, step S05 may be bypassed. In other examples, however, step S05 may still be used to judge whether that single candidate is suitable, discarding a clearly unsuitable output and prompting the user to input speech again.
When the optimal recognition output has been determined, it is output in step S06. The output means may include, but is not limited to, sound, image, text, or any other means used in the art to output information; the present invention is not limited in this respect.
In the above description of the technical solution, the user's geographic location has been used as the example of individual identification information; however, other individual identification information, such as the user's native place, may also be used.
In the case of using the user's native place, the dialect or accent of the user's voice input can be judged to determine the native place. FIG. 2 provides a flow diagram of a method for speech recognition using the user's native-place information, according to an embodiment of the present invention.
In step S01 shown in fig. 2, when the voice input of the user is acquired, the dialect and/or accent attributes of the user may be recognized from the acquired voice to determine the user's native place (step S07).
After the above-mentioned native information is acquired, the determination of the optimum output result is made in step S05 using the native information as the individual identification information of the user.
For example, in step S02, the dialect attribute of the speech may be determined by the speech recognition system; the determination result might be, for example, "Zhejiang dialect".
The plurality of candidate optimal results selected in step S03 may then be further judged in step S05 using the "Zhejiang dialect" attribute as the decision condition. For example, when the candidates to be judged are "I want to listen to Yue opera" and "I want to listen to Cantonese opera", the decision condition "Zhejiang dialect" determines the final output to be "I want to listen to Yue opera".
Making the determination with the native place as the individual identification information avoids decision errors caused by geographic identifiers obtained through device association, for example the error that could arise when a user whose native place is Zhejiang is currently using the voice recognition device in Guangdong.
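Mapping a detected dialect or accent label to a native place can be as simple as a lookup; the labels and mapping below are illustrative stand-ins, and a real system would obtain the label from an acoustic dialect classifier:

```python
# Illustrative dialect-label -> native-place mapping (not exhaustive).
DIALECT_TO_NATIVE = {
    "zhejiang_dialect": "Zhejiang",
    "cantonese": "Guangdong",
}

def native_place(detected_dialect):
    """Derive the user's native place from the dialect detected in the
    voice input, independent of where the device currently is."""
    return DIALECT_TO_NATIVE.get(detected_dialect)
```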
The case of using the geographical location information and the native information as the individual identification information of the user is described above with reference to fig. 1 and 2, respectively. However, in some examples, the two situations may be combined to obtain a more accurate determination result. For example, the native information of the user and the geographical location information of the user may be used in combination as the individual identification information.
A third specific embodiment is a combination of the first and second embodiments. In one form, the native-place determination and the geographic-location determination are used together: the two determination results are compared, and the comparison result serves as the decision identification information in S05. For example, when the two results are the same (e.g., both are Zhejiang), that result is used as the decision identification information. In other embodiments, if the two determinations differ, a higher priority may be given to the native-place determination or to the geographic-location determination, for example based on system settings or user settings. In still other embodiments, where more items of individual identification information are available, the determination may combine them, for example by assigning different weights to the different items and selecting the result with the largest total score. Any other determination method using various items of individual identification information that is readily conceivable to those skilled in the art may be adopted in the technical solution of the present invention, and is not described further here.
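The weighted-combination variant can be sketched as follows, summing the weight of every signal voting for each value and keeping the value with the largest total; the weight values themselves are assumptions for illustration:

```python
from collections import defaultdict

def combined_identity(signals, weights):
    """Fuse several identification signals (e.g. native place, geographic
    location) into one decision value by weighted voting.

    signals: dict mapping signal name -> value, e.g. {"native": "Zhejiang"}
    weights: dict mapping signal name -> weight; missing names weigh 1.0
    """
    totals = defaultdict(float)
    for name, value in signals.items():
        totals[value] += weights.get(name, 1.0)
    return max(totals, key=totals.get)  # value with the largest total score
```

When the signals agree, the weights are irrelevant; when they disagree, the weights encode the priority the text describes (e.g. trusting native place over device location).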
The individual identification information of the user is used only in the determination of step S05 in the above example, however, in some examples, the individual identification information of the user may also be used in the voice recognition of step S02. FIG. 3 shows a flow diagram of another speech recognition method according to an embodiment of the invention.
As shown in fig. 3, in step S01, a voice input of the user is acquired.
In a next step, the individual identification information of the user is detected. For example, the user's native place can be detected from the voice input, or the user's geographic location can be detected by other means; the invention is not limited in this respect. As previously described, this detection step may be performed at any time before the individual identification information is used (in this example, before step S02), and in some cases previously acquired and stored identification information may even be reused.
Then, in the voice recognition step S02, the individual identification information is used as the criterion for database selection, to speed up the voice recognition. In the subsequent steps, data recognition proceeds as before, and in S05 the individual identification information is used again for the small-sample decision, so that the output data is finally obtained accurately.
In the above example, the individual identification information is used twice. The first use selects the voice decision database (for example, a specific geographic identifier selects the database used in voice recognition); the second use selects the appropriate output from the candidate optimal results, because even with a suitable voice database, outputs that are inappropriate for this user can still arise from the probability combination. The optimal result can therefore be screened by the individual identification information (for example, native-place information or a geographic identifier).
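Pulling the pieces together, the twice-used identification information of fig. 3 can be sketched as the pipeline below; every callable is a hypothetical stand-in for a component described above, not an API from the patent:

```python
def recognize(voice_input, identity, databases, engine, select_candidates, final_decision):
    """End-to-end sketch: the identity first picks the speech database,
    then disambiguates among the candidate optimal outputs."""
    db = databases.get(identity, databases["default"])  # first use: DB selection (S02)
    outputs = engine(voice_input, db)                   # scored hypotheses
    candidates = select_candidates(outputs)             # domain decision (S03)
    return final_decision(candidates, identity)         # second use: final decision (S05)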
Fig. 4 is a schematic block diagram illustrating a speech recognition apparatus for implementing the above-described speech recognition method according to an embodiment of the present invention. As shown in fig. 4, the voice recognition apparatus may include a voice acquiring unit 410 for acquiring a voice input of a user; a voice recognition unit 420 for selecting a voice database to recognize a voice input by a user and outputting a recognition output as a result; a first decision unit 430 for selecting one or more candidate optimal recognition outputs from the recognition outputs using domain decision; and a second decision unit 440 for deciding an optimal recognition output among the one or more candidate optimal recognition outputs with the individual identification information of the user as a decision condition.
In some examples, the speech recognition unit 420 may also be to: the speech input by the user is recognized using the acoustic model and the language model of the speech recognition engine according to the selected speech database.
In some examples, the speech recognition device may further include: the information detecting unit 450 is configured to detect the individual identification information of the user.
In some examples, the voice recognition apparatus may further include a memory 460 for storing the individual identification information detected by the information detecting unit 450. In addition, the memory may also store any data used by the voice recognition device in performing voice recognition, such as the voice database described above, which is not limited by the present invention.
The individual identification information of the user in the present invention may include one or more of: the user's geographic location, the signal source currently connected to the user's mobile device, and the user's native place. As described above, the individual identification information in the present invention is not limited to these, and may be any information used in the art to individually identify a user.
In some examples, the information detection unit 450 is further configured to: the native place of the user is obtained by recognizing the dialect and/or accent attribute of the user when performing recognition of the voice input by the user.
In some examples, the speech recognition unit 420 is further to: the individual identification information of the user is used to select a voice database for voice recognition.
A schematic block diagram of a speech recognition device according to an embodiment of the present invention has been described above in terms of modules/units. It should be noted, however, that one or more of the modules/units may be implemented by specific hardware, and fig. 4 is only a schematic diagram for explaining the technical solution; an actual implementation may include more or fewer modules/units. For example, some implementations may also include an output device such as a speaker or display for outputting information, and various storage devices for the data/programs required by, or generated when implementing, the technical solution; the present invention is not limited in this respect.
FIG. 5 shows a schematic block diagram of a speech recognition system according to an embodiment of the present invention. As shown in fig. 5, the speech recognition system includes a cloud server (i.e., the speech recognition device shown in fig. 4) and a client voice intelligent device (i.e., the client device) communicatively connected to it. As previously described, the client device may be omitted when the user is co-located with the speech recognition device; the user may then input speech directly at the speech recognition device.
The speech recognition process of the speech recognition apparatus shown in fig. 5 is the same as the process described with reference to fig. 1, 2, and 3, and will not be described again.
It should be noted that the technical solutions described in the embodiments of the present invention may be arbitrarily combined without conflict.
In the embodiments provided in the present invention, it should be understood that the disclosed method and apparatus may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above description covers only specific implementations of the embodiments of the present invention. Those skilled in the art will understand that any modification or partial replacement that does not depart from the scope of the present invention falls within the scope defined by the claims; the protection scope of the present invention shall therefore be subject to the claims.

Claims (13)

1. A speech recognition method comprising:
acquiring voice input of a user;
selecting a speech database to recognize the speech input by the user, and outputting resulting recognition outputs;
selecting one or more candidate optimal recognition outputs from the recognition outputs using a domain decision; and
performing retrieval and discrimination on the one or more candidate optimal recognition outputs, using the individual identification information of the user as a decision condition, so as to determine the optimal recognition output among the one or more candidate optimal recognition outputs.
2. The speech recognition method of claim 1, wherein the selecting a speech database to recognize speech input by a user comprises:
recognizing the speech input by the user using the acoustic model and the language model of a speech recognition engine, according to the selected speech database.
3. The speech recognition method of claim 1, further comprising:
detecting the individual identification information of the user.
4. The speech recognition method of claim 3, wherein the individual identification information of the user comprises one or more of: geographical location information of the user, a signal source to which a mobile device used by the user is currently connected, and the user's whereabouts.
5. The speech recognition method of claim 4, wherein the individual identification information of the user includes the native place of the user, which is obtained by recognizing dialect and/or accent attributes of the user when the speech input by the user is recognized.
6. The speech recognition method of claim 1, further comprising:
selecting a voice database for voice recognition using the individual identification information of the user.
7. A speech recognition device comprising:
a voice acquisition unit for acquiring a voice input of a user;
a speech recognition unit for selecting a speech database to recognize the speech input by the user, and outputting resulting recognition outputs;
a first decision unit for selecting one or more candidate optimal recognition outputs from the recognition outputs using domain decision; and
a second decision unit for performing retrieval and discrimination on the one or more candidate optimal recognition outputs, using the individual identification information of the user as a decision condition, so as to determine the optimal recognition output among the one or more candidate optimal recognition outputs.
8. The speech recognition device of claim 7, wherein the speech recognition unit is further configured to:
the speech input by the user is recognized using the acoustic model and the language model of the speech recognition engine according to the selected speech database.
9. The speech recognition device of claim 7, further comprising an information detection unit for:
detecting the individual identification information of the user.
10. The speech recognition device of claim 9, wherein the individual identification information of the user comprises one or more of: geographical location information of the user, a signal source to which a mobile device used by the user is currently connected, and the user's whereabouts.
11. The speech recognition device of claim 10, wherein the information detection unit is further configured to: obtain the native place of the user by recognizing dialect and/or accent attributes of the user when the speech input by the user is recognized.
12. The speech recognition device of claim 7, wherein the speech recognition unit is further configured to:
selecting a voice database for voice recognition using the individual identification information of the user.
13. A speech recognition system comprising:
the speech recognition device of any one of claims 7 to 12; and
a client device in communication with the speech recognition device.
CN201610375073.8A 2016-05-31 2016-05-31 Voice recognition method, device and system Active CN105931642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610375073.8A CN105931642B (en) 2016-05-31 2016-05-31 Voice recognition method, device and system


Publications (2)

Publication Number Publication Date
CN105931642A CN105931642A (en) 2016-09-07
CN105931642B true CN105931642B (en) 2020-11-10

Family

ID=56832830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610375073.8A Active CN105931642B (en) 2016-05-31 2016-05-31 Voice recognition method, device and system

Country Status (1)

Country Link
CN (1) CN105931642B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108206020A (en) * 2016-12-16 2018-06-26 北京智能管家科技有限公司 A kind of audio recognition method, device and terminal device
CN109101475B (en) * 2017-06-20 2021-07-27 北京嘀嘀无限科技发展有限公司 Travel voice recognition method and system and computer equipment
TW201921336A (en) 2017-06-15 2019-06-01 大陸商北京嘀嘀無限科技發展有限公司 Systems and methods for speech recognition
CN107464115A (en) * 2017-07-20 2017-12-12 北京小米移动软件有限公司 personal characteristic information verification method and device
CN107785021B (en) * 2017-08-02 2020-06-02 深圳壹账通智能科技有限公司 Voice input method, device, computer equipment and medium
CN110517660A (en) * 2019-08-22 2019-11-29 珠海格力电器股份有限公司 Noise-reduction method and device based on built-in Linux real-time kernel

Citations (1)

Publication number Priority date Publication date Assignee Title
CN104836720A (en) * 2014-02-12 2015-08-12 北京三星通信技术研究有限公司 Method for performing information recommendation in interactive communication, and device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US6983244B2 (en) * 2003-08-29 2006-01-03 Matsushita Electric Industrial Co., Ltd. Method and apparatus for improved speech recognition with supplementary information
CN103037117B (en) * 2011-09-29 2016-08-03 中国电信股份有限公司 Audio recognition method, system and audio access platform
CN103578469A (en) * 2012-08-08 2014-02-12 百度在线网络技术(北京)有限公司 Method and device for showing voice recognition result
CN103903611B (en) * 2012-12-24 2018-07-03 联想(北京)有限公司 A kind of recognition methods of voice messaging and equipment
CN103956169B (en) * 2014-04-17 2017-07-21 北京搜狗科技发展有限公司 A kind of pronunciation inputting method, device and system
CN105070288B (en) * 2015-07-02 2018-08-07 百度在线网络技术(北京)有限公司 Vehicle-mounted voice instruction identification method and device

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN104836720A (en) * 2014-02-12 2015-08-12 北京三星通信技术研究有限公司 Method for performing information recommendation in interactive communication, and device



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190312

Address after: 100086 8th Floor, 76 Zhichun Road, Haidian District, Beijing

Applicant after: Beijing Jingdong Shangke Information Technology Co., Ltd.

Applicant after: Iflytek Co., Ltd.

Address before: Room C-301, 3rd floor, No. 2 Building, 20 Suzhou Street, Haidian District, Beijing 100080

Applicant before: BEIJING LINGLONG TECHNOLOGY CO., LTD.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant