CN105355195A - Audio frequency recognition method and audio frequency recognition device

Info

Publication number: CN105355195A
Application number: CN201510623617.3A
Authority: CN
Inventors: 傅强; 王阳; 侯恩星
Original assignee: Xiaomi Inc
Current assignee: Beijing Xiaomi Technology Co Ltd; Xiaomi Inc
Priority date: 2015-09-25
Filing date: 2015-09-25
Publication date: 2016-02-24

Abstract

The invention discloses an audio recognition method and an audio recognition device. The audio recognition method includes: collecting calibration speech; acquiring speech feature information of the calibration speech; collecting speech to be recognized; detecting, in the collected speech to be recognized, command speech matching the speech feature information; and, in response to detecting the command speech, executing the operation corresponding to the command speech. In this technical scheme, collected speech serves as the calibration speech, and command speech matching the calibration speech is detected in the speech to be recognized; if the command speech is detected, the corresponding operation is executed. A user can therefore record his or her own speech in advance as the calibration speech, so that the user's input can be recognized easily even when it is not standard Mandarin, which is convenient for the user and improves the user experience.

Description

Audio recognition method and device

Technical field

The disclosure relates to the technical field of speech recognition, and in particular to an audio recognition method and device.

Background technology

Speech recognition technology is now widely used. Also known as automatic speech recognition (Automatic Speech Recognition, ASR), its goal is to convert the lexical content of human speech into computer-readable input, such as key presses, binary codes, or character strings. The fields involved in speech recognition include signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence, and so on.

Applications of speech recognition technology include voice dialing, voice navigation, indoor device control, voice document retrieval, simple dictation data entry, and so on. Combined with other natural language processing technologies such as machine translation and speech synthesis, speech recognition technology can be used to build more complex applications.

Summary of the invention

Embodiments of the disclosure provide an audio recognition method and device. The technical solution is as follows:

According to a first aspect, an audio recognition method is provided, comprising:

collecting calibration speech;

obtaining speech feature information of the calibration speech;

collecting speech to be recognized;

detecting, in the collected speech to be recognized, command speech that matches the speech feature information; and

in response to detecting the command speech, performing the operation corresponding to the command speech.

In one embodiment, the method may further comprise:

obtaining an identification of the calibration speech, the identification of the calibration speech comprising biometric information or an identifier; and

storing the identification in association with the speech feature information.

In one embodiment, detecting, in the collected speech to be recognized, the command speech that matches the speech feature information may comprise:

obtaining an identification of the speech to be recognized, the identification of the speech to be recognized comprising biometric information or an identifier;

searching the identifications of the calibration speech for a target identification identical to the identification of the speech to be recognized;

obtaining target speech feature information corresponding to the target identification; and

detecting, in the collected speech to be recognized, command speech that matches the target speech feature information.

In one embodiment, the biometric information may comprise one or more of the following: voiceprint information, iris information, and fingerprint information.

In one embodiment, detecting, in the collected speech to be recognized, the command speech that matches the speech feature information may comprise:

extracting speech feature information of the speech to be recognized; and

detecting the speech feature information of the calibration speech in the extracted speech feature information of the speech to be recognized;

and performing, in response to detecting the command speech, the operation corresponding to the command speech comprises:

in response to detecting the speech feature information of a target calibration speech, determining the target calibration speech to be the command speech; and

performing the operation corresponding to the command speech.

In one embodiment, collecting the calibration speech may comprise:

collecting input speech a preset number of times, the time interval between adjacent collections being less than or equal to a preset time;

calculating feature values of the speech feature information of two input speeches collected at adjacent times; and

in response to determining that the difference between the feature values of the two input speeches is less than a preset error value, determining the two input speeches to be the calibration speech.

In one embodiment, the speech feature information may comprise one or more of the following: the timbre, pitch, duration, and loudness of the speech.

According to a second aspect, an audio recognition device is provided, comprising:

a first collection module, for collecting calibration speech;

a first obtaining module, for obtaining speech feature information of the calibration speech collected by the first collection module;

a second collection module, for collecting speech to be recognized after the first obtaining module obtains the speech feature information of the calibration speech;

a detection module, for detecting, after the second collection module collects the speech to be recognized, command speech that matches the speech feature information in the collected speech to be recognized; and

an execution module, for performing, in response to detecting the command speech, the operation corresponding to the command speech.

In one embodiment, the device may further comprise:

a second obtaining module, for obtaining an identification of the calibration speech, the identification of the calibration speech comprising biometric information or an identifier; and

a storage module, for storing the identification in association with the speech feature information after the second obtaining module obtains the identification of the calibration speech.

In one embodiment, the detection module may comprise:

a first obtaining submodule, for obtaining an identification of the speech to be recognized, the identification of the speech to be recognized comprising biometric information or an identifier;

a search submodule, for searching the identifications of the calibration speech for a target identification identical to the identification of the speech to be recognized, after the first obtaining submodule obtains the identification of the speech to be recognized;

a second obtaining submodule, for obtaining target speech feature information corresponding to the target identification; and

a first detection submodule, for detecting, after the second obtaining submodule obtains the target speech feature information corresponding to the target identification, command speech that matches the target speech feature information in the collected speech to be recognized.

In one embodiment, the detection module may comprise:

an extraction submodule, for extracting speech feature information of the speech to be recognized; and

a second detection submodule, for detecting the speech feature information of the calibration speech in the speech feature information of the speech to be recognized extracted by the extraction submodule;

and the execution module comprises:

a determining submodule, for determining, in response to the second detection submodule detecting the speech feature information of a target calibration speech, the target calibration speech to be the command speech; and

an execution submodule, for performing the operation corresponding to the command speech.

In one embodiment, the first collection module may comprise:

a collection submodule, for collecting input speech a preset number of times, the time interval between adjacent collections being less than or equal to a preset time;

a calculation submodule, for calculating feature values of the speech feature information of two input speeches collected at adjacent times; and

a determining submodule, for determining, in response to determining that the difference between the feature values of the two input speeches is less than a preset error value, the two input speeches to be the calibration speech.

According to a third aspect, an audio recognition device is provided, comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to:

collect calibration speech;

obtain speech feature information of the calibration speech;

collect speech to be recognized;

detect, in the collected speech to be recognized, command speech that matches the speech feature information; and

in response to detecting the command speech, perform the operation corresponding to the command speech.

The technical solution provided by the embodiments of the disclosure may have the following beneficial effects:

In the above technical solution, collected speech is used as calibration speech, and command speech matching the calibration speech is detected in the speech to be recognized; if the command speech is detected, the operation corresponding to it is performed. A user can therefore collect his or her own speech in advance as the calibration speech, so that even if the speech the user inputs is not standard Mandarin, it can still be recognized easily, which is convenient for the user and improves the user experience.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure.

Fig. 1 is a flowchart of an audio recognition method according to an exemplary embodiment.

Fig. 2 is a flowchart of another audio recognition method according to an exemplary embodiment.

Fig. 3 is a flowchart of step S104 of an audio recognition method according to an exemplary embodiment.

Fig. 4 is a flowchart of a method for detecting, in the collected speech to be recognized, command speech that matches the speech feature information, according to an exemplary embodiment.

Fig. 5 is a flowchart of step S101 of an audio recognition method according to an exemplary embodiment.

Fig. 6 is a flowchart of an audio recognition method according to exemplary embodiment one.

Fig. 7 is a block diagram of an audio recognition device according to an exemplary embodiment.

Fig. 8 is a block diagram of another audio recognition device according to an exemplary embodiment.

Fig. 9 is a block diagram of the detection module 74 of an audio recognition device according to an exemplary embodiment.

Fig. 10 is a block diagram of another audio recognition device according to an exemplary embodiment.

Fig. 11 is a block diagram of the first collection module 71 of an audio recognition device according to an exemplary embodiment.

Fig. 12 is a block diagram of a device suitable for audio recognition according to an exemplary embodiment.

Detailed description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the disclosure, as detailed in the appended claims.

Because users' accents differ, some users do not speak Mandarin well, and a device may fail to recognize what they say accurately and so fail to understand them. Embodiments of the disclosure provide an audio recognition method that can recognize a user's speech accurately. Fig. 1 is a flowchart of an audio recognition method according to an exemplary embodiment. The method may be used in a terminal, which may be any device such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant. As shown in Fig. 1, the audio recognition method comprises the following steps S101 to S105:

In step S101, calibration speech is collected.

In this step, input speech is used as the calibration speech. The input speech is a user's speech, and the user may be the owner of the device or another user. The calibration speech may be collected through the microphone of a terminal device such as a mobile phone. Some users do not speak standard Mandarin well and speak with an accent, so the device may fail to recognize their words accurately and therefore fail to understand them. In this step, the user's speech is collected in advance as the calibration speech. When the user later speaks and the terminal device detects that the words are consistent with the collected calibration speech, the words are easy to recognize.

In step S102, speech feature information of the calibration speech is obtained.

In one embodiment, the speech feature information may comprise one or more of the following: the timbre, pitch, duration, and loudness of the speech.
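As a non-limiting illustration, three of the features named above can be approximated from raw audio samples. The sketch below is an assumption for illustration only, not the extraction method of the disclosure: the names `SpeechFeatures` and `extract_features` are hypothetical, loudness is approximated by root-mean-square amplitude, pitch by the zero-crossing rate, and timbre is left out because it needs spectral analysis.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeechFeatures:
    duration_s: float    # duration of the utterance, in seconds
    loudness_rms: float  # loudness, approximated by root-mean-square amplitude
    pitch_hz: float      # pitch, approximated from the zero-crossing rate


def extract_features(samples, sample_rate):
    """Compute crude duration, loudness, and pitch estimates for mono samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    duration_s = len(samples) / sample_rate
    loudness_rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Count sign changes between consecutive samples. A pure tone crosses
    # zero twice per period, so pitch is roughly crossings / (2 * duration).
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    pitch_hz = crossings / (2 * duration_s)
    return SpeechFeatures(duration_s, loudness_rms, pitch_hz)


# Synthetic 200 Hz tone standing in for a recorded utterance (assumed input).
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / RATE + 0.3) for n in range(RATE)]
features = extract_features(tone, RATE)
```

Real speech is not a pure tone, so a practical system would likely use frame-wise analysis and a more robust pitch estimator; the sketch only shows what kind of values "speech feature information" could hold.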

In step S103, speech to be recognized is collected.

After the user's speech has been collected as the calibration speech in step S101, the speech to be recognized is collected in this step so that it can be recognized. The speech to be recognized is also speech input by the user, and may be a single sentence or a longer passage. Speech feature information such as timbre, pitch, duration, and loudness is identified from the speech to be recognized, and is matched against the speech feature information of the calibration speech in step S104.

In step S104, command speech that matches the speech feature information is detected in the collected speech to be recognized.

The speech feature information of the speech to be recognized is obtained. When it contains speech consistent with the speech feature information of the calibration speech, that speech is the command speech matching the speech feature information of the calibration speech.

For example, if the calibration speech is "Little Bai" and the speech to be recognized is "I am Little Bai", then the "Little Bai" in the speech to be recognized is the command speech matching the speech feature information of the calibration speech.

In step S105, in response to detecting the command speech, the operation corresponding to the command speech is performed.

The command speech may be a trigger password for entering speech recognition; when this command speech is detected, the recognition mode is entered. The command speech may also be speech that triggers a call; after this command speech is detected, a call operation is performed. For example, if the command speech is "call Zhang San", the operation of calling Zhang San is performed after the command speech is detected.

For example, calibration speech is collected first. The calibration speech is the user's speech, here "Little Bai" spoken in non-standard Mandarin. After the calibration speech has been collected, it is saved. Speech to be recognized is then collected, for example "I am Little Bai", and the command speech matching the calibration speech, namely "Little Bai", is detected in it. When the command speech "Little Bai" is detected in the speech to be recognized, the operation corresponding to the command speech is performed; this operation may be, for example, turning the speech recognition mode on or off.

In the above method of the embodiments of the disclosure, collected speech is used as calibration speech, and command speech matching the calibration speech is detected in the speech to be recognized; if the command speech is detected, the operation corresponding to it is performed. A user can therefore collect his or her own speech in advance as the calibration speech, so that even if the speech the user inputs is not standard Mandarin, it can still be recognized easily, which is convenient for the user and improves the user experience.

In one embodiment, referring to Fig. 2, which is a flowchart of another audio recognition method according to an exemplary embodiment, the audio recognition method may further comprise the following steps S106 to S107, which may be performed after step S102:

In step S106, an identification of the calibration speech is obtained. The identification of the calibration speech may include, but is not limited to, biometric information or an identifier.

In step S107, the identification and the speech feature information are stored in association.

The speech of multiple users may be determined to be calibration speech. To facilitate speech recognition, a user's identifier or biometric information may be saved in correspondence with the speech feature information of that user's calibration speech. The user's biometric information may be, for example, voiceprint information, iris information, fingerprint information, or other information that uniquely represents the user. The identifier may be the user's account, telephone number, instant messaging account, or the like.

In this embodiment, the identification is saved in correspondence with the speech feature information of the calibration speech. When speech to be recognized is collected, the user is determined first, and the search is then made in the calibration speech corresponding to that user, so that command speech matching that user's calibration speech can be detected conveniently and efficiently in the speech to be recognized.
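The association storage of step S107 can be pictured as a keyed table. The following is a minimal sketch under stated assumptions: the class name `CalibrationStore`, the account-style identifiers, and the dictionary layout of the features are all hypothetical and are not taken from the disclosure.

```python
class CalibrationStore:
    """Keeps speech feature information keyed by the speaker's identification."""

    def __init__(self):
        self._features_by_id = {}

    def save(self, identification, features):
        # Step S107: store the identification in association with the features.
        self._features_by_id[identification] = features

    def lookup(self, identification):
        # Returns None when no calibration speech was stored for this user.
        return self._features_by_id.get(identification)


# Hypothetical identifiers and feature values for two users.
store = CalibrationStore()
store.save("account:alice", {"pitch_hz": 210.0, "duration_s": 0.6})
store.save("account:bob", {"pitch_hz": 120.0, "duration_s": 0.7})
```

Keying by identification means that later detection only has to compare against one user's calibration speech rather than every stored entry.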

In one embodiment, referring to Fig. 3, which is a flowchart of step S104 of an audio recognition method according to an exemplary embodiment, step S104 may be implemented as the following steps S301 to S304:

In step S301, an identification of the speech to be recognized is obtained, the identification of the speech to be recognized comprising biometric information or an identifier.

In one embodiment, the biometric information may include, but is not limited to, one or more of the following: voiceprint information, iris information, and fingerprint information.

In step S302, the identifications of the calibration speech are searched for a target identification identical to the identification of the speech to be recognized.

In step S303, target speech feature information corresponding to the target identification is obtained.

In step S304, command speech that matches the target speech feature information is detected in the collected speech to be recognized.

In this embodiment, obtaining the identification of the speech to be recognized determines which user the speech belongs to. The calibration speech stored in association with that identification is found, and command speech matching the speech feature information of that calibration speech is detected in the collected speech to be recognized. Thus, when calibration speech exists for multiple users, the calibration speech corresponding to the current user can be determined conveniently and efficiently, and command speech matching it can be detected quickly in the collected speech to be recognized.
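Steps S302 to S304 can be sketched as a lookup followed by a match. This is an illustrative sketch only: the function names, the representation of speech as a list of per-frame feature values, and the tolerance of 0.5 are assumptions, and the identification of step S301 is taken as already obtained.

```python
def detect_command(utterance_frames, target_frames, tolerance=0.5):
    """Return True if target_frames occurs in utterance_frames within tolerance."""
    n = len(target_frames)
    if n == 0:
        return False
    for start in range(len(utterance_frames) - n + 1):
        window = utterance_frames[start:start + n]
        if all(abs(u - t) <= tolerance for u, t in zip(window, target_frames)):
            return True
    return False


def detect_for_speaker(calibrations, identification, utterance_frames):
    # S302: search the stored identifications for the one matching the speaker.
    if identification not in calibrations:
        return False
    # S303: obtain the target speech feature information for that identification.
    target_frames = calibrations[identification]
    # S304: detect command speech matching the target speech feature information.
    return detect_command(utterance_frames, target_frames)


# Hypothetical per-user calibration features and one utterance.
calibrations = {"user-a": [3.0, 5.0], "user-b": [8.0, 9.0]}
utterance = [1.0, 2.0, 3.2, 4.8]
matched_a = detect_for_speaker(calibrations, "user-a", utterance)
matched_b = detect_for_speaker(calibrations, "user-b", utterance)
```

Here the utterance matches only the calibration features stored for "user-a", so the same audio attributed to "user-b" is not treated as command speech.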

In one embodiment, referring to Fig. 4, which is a flowchart of a method for detecting, in the collected speech to be recognized, command speech that matches the speech feature information according to an exemplary embodiment, step S104 may be implemented as the following steps S401 to S402:

In step S401, speech feature information of the speech to be recognized is extracted.

In step S402, the speech feature information of the calibration speech is detected in the extracted speech feature information of the speech to be recognized.

In this case, step S105 may be implemented as the following steps S403 to S404:

In step S403, in response to detecting the speech feature information of a target calibration speech, the target calibration speech is determined to be the command speech.

In step S404, the operation corresponding to the command speech is performed.

In this embodiment, the target calibration speech is the calibration speech that matches the speech to be recognized. When the speech feature information of the target calibration speech is detected, the target calibration speech is determined to be the command speech, so that the command speech is determined accurately.
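One possible reading of steps S402 to S404 is a sliding-window search followed by a dispatch. The sketch below assumes that step S401 has already produced per-frame feature tuples; the function names, the Euclidean distance threshold, and the feature values are hypothetical and are not specified by the disclosure.

```python
import math


def find_calibration(utterance, calibration, max_distance=0.5):
    """S402: return the start index where the calibration features occur, else None.

    Both arguments are lists of per-frame feature tuples of equal width.
    """
    n = len(calibration)
    if n == 0:
        return None
    for start in range(len(utterance) - n + 1):
        window = utterance[start:start + n]
        if all(math.dist(u, c) <= max_distance for u, c in zip(window, calibration)):
            return start
    return None


def recognize(utterance, calibrations, operations):
    """S402 to S404: detect a calibration speech and run its operation."""
    for name, calibration in calibrations.items():
        if find_calibration(utterance, calibration) is not None:
            # S403: the matched target calibration speech is the command speech.
            # S404: perform the operation that corresponds to it.
            return operations[name]()
    return None


# Hypothetical features: two frames for the phrase, four for the utterance.
calibrations = {"little_bai": [(3.0, 1.0), (5.0, 2.0)]}
operations = {"little_bai": lambda: "enter recognition mode"}
utterance = [(1.0, 0.0), (2.0, 0.5), (3.1, 1.1), (4.9, 2.2)]
result = recognize(utterance, calibrations, operations)
```

A fixed-length window is a simplification: real speech varies in speed, so a practical matcher would probably need some form of time alignment.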

In one embodiment, referring to Fig. 5, which is a flowchart of step S101 of an audio recognition method according to an exemplary embodiment, step S101 may be implemented as the following steps S501 to S503:

In step S501, input speech is collected a preset number of times, the time interval between adjacent collections being less than or equal to a preset time.

In step S502, feature values of the speech feature information of two input speeches collected at adjacent times are calculated.

In step S503, in response to determining that the difference between the feature values of the two input speeches is less than a preset error value, the two input speeches are determined to be the calibration speech.

In this embodiment, to prevent misoperation, input speech is collected a preset number of times, with the time interval between adjacent collections less than or equal to a preset time, for example 3 seconds. The feature values of the speech feature information of two input speeches collected at adjacent times are calculated. If the difference between the feature values of the two input speeches is less than the preset error value, they can be regarded as speech with the same content input by the same user, and the input speech is determined to be the calibration speech. This effectively prevents other speech from being determined to be calibration speech because of user misoperation.
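The consistency check of steps S501 to S503 can be sketched as follows. Only the 3-second preset time comes from the text above; the function name `select_calibration`, the scalar feature value, the preset error value of 0.1, and the choice to skip pairs recorded too far apart are assumptions made for illustration.

```python
def select_calibration(recordings, max_interval_s=3.0, max_error=0.1):
    """Return the first adjacent pair of inputs accepted as calibration speech.

    recordings: list of (timestamp_s, feature_value) tuples in collection order.
    """
    for (t1, v1), (t2, v2) in zip(recordings, recordings[1:]):
        # S501: adjacent collections must be no more than the preset time apart.
        if t2 - t1 > max_interval_s:
            continue
        # S502 and S503: compare the feature values of the two adjacent inputs.
        if abs(v1 - v2) < max_error:
            return (t1, v1), (t2, v2)
    return None


# Three hypothetical inputs: the first differs, the last two agree.
recordings = [(0.0, 0.80), (2.0, 0.30), (4.5, 0.34)]
accepted = select_calibration(recordings)
```

In this example the first input is rejected because it differs too much from the second, while the second and third inputs agree within the preset error value and are accepted together.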

The technical solution provided by the embodiments of the disclosure is described below with a specific embodiment.

Embodiment one

Referring to Fig. 6, which shows an audio recognition method according to exemplary embodiment one, the method is used in a terminal, which may be a mobile terminal such as a mobile phone. As shown in Fig. 6, the terminal operates as follows:

In step S601, calibration speech is collected; this calibration speech is the trigger password for entering the recognition mode.

To use the speech recognition function, a user needs some trigger to enter the recognition mode. For example, an iPhone triggers its speech recognition function by a long press of the Home button. Some devices listen to what the user says at all times and enter the speech recognition mode after receiving a particular password spoken by the user; this particular password is the trigger password for entering the recognition mode. For example, the user says: "Little Bai, what time is it now?" Here "Little Bai" is the trigger password for entering speech recognition. After the device recognizes "Little Bai", it enters the speech recognition mode, starts recognizing the content that follows, and then responds. However, because users' accents differ, some users do not speak Mandarin well, and the "Little Bai" they say may not be understood by the device. In this step, when the ambient noise is low, the user enters a "calibration mode" through a mobile phone app or a preset button on the device; after the calibration mode is entered, the calibration speech is collected.

In step S602, the speech feature information of the calibration speech is extracted and saved.

Saving the speech feature information of the calibration speech completes the speech calibration.

In step S603, speech to be recognized is collected, and command speech that matches the speech feature information is detected in the collected speech to be recognized.

The speech to be recognized is collected. Because the calibration speech is the trigger password for entering the recognition mode, for example "Little Bai", the speech to be recognized is checked for command speech containing "Little Bai".

In step S604, in response to detecting the command speech, the operation corresponding to the command speech is performed.

When command speech containing "Little Bai" is detected, the recognition mode is entered.

In embodiment one, collected speech is used as the calibration speech, namely the trigger password for entering the recognition mode, and the speech feature information of the calibration speech is saved. Command speech matching the calibration speech is detected in the speech to be recognized, and if the command speech is detected, the operation corresponding to it is performed. A user can therefore collect his or her own speech in advance as the calibration speech, so that even if the speech the user inputs is not standard Mandarin, it can still be recognized easily, which is convenient for the user and improves the user experience.
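The flow of steps S601 to S604 resembles a small state machine: calibrate once, then listen until the trigger password is heard. The sketch below is a hypothetical illustration, not the implementation of the disclosure. The class `VoiceTrigger` is an assumed name, and word tokens stand in for the acoustic feature information that a real device would compare.

```python
class VoiceTrigger:
    """Tracks calibration and recognition modes for one trigger password."""

    def __init__(self, matcher):
        self._matcher = matcher  # callable(utterance, calibration) -> bool
        self._calibration = None
        self.recognition_mode = False

    def calibrate(self, features):
        # S601 and S602: save the features of the calibration speech.
        self._calibration = features

    def hear(self, utterance):
        # S603: look for the trigger password in the speech to be recognized.
        if self._calibration is None:
            return False
        if self._matcher(utterance, self._calibration):
            # S604: the corresponding operation is entering recognition mode.
            self.recognition_mode = True
        return self.recognition_mode


# Word tokens stand in for acoustic features in this example (assumption).
trigger = VoiceTrigger(matcher=lambda utterance, calibration: calibration in utterance)
trigger.calibrate("little_bai")
entered = trigger.hear(["little_bai", "what", "time", "is", "it"])
```

Passing the matcher in as a parameter keeps the mode handling separate from whatever feature comparison a device actually uses.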

The following are device embodiments of the disclosure, which may be used to perform the method embodiments of the disclosure.

Fig. 7 is a block diagram of an audio recognition device according to an exemplary embodiment. The device may be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Fig. 7, the audio recognition device comprises:

a first collection module 71, configured to collect calibration speech;

a first obtaining module 72, configured to obtain speech feature information of the calibration speech collected by the first collection module;

a second collection module 73, configured to collect speech to be recognized after the first obtaining module obtains the speech feature information of the calibration speech;

a detection module 74, configured to detect, after the second collection module collects the speech to be recognized, command speech that matches the speech feature information in the collected speech to be recognized; and

an execution module 75, configured to perform, in response to detecting the command speech, the operation corresponding to the command speech.

In the above device of the embodiments of the disclosure, collected speech is used as calibration speech, and command speech matching the calibration speech collected by the first collection module 71 is detected in the speech to be recognized collected by the second collection module 73; if the command speech is detected, the operation corresponding to it is performed. A user can therefore collect his or her own speech in advance as the calibration speech, so that even if the speech the user inputs is not standard Mandarin, it can still be recognized easily, which is convenient for the user and improves the user experience.

In one embodiment, referring to Fig. 8, which is a block diagram of another audio recognition device according to an exemplary embodiment, the audio recognition device may further comprise:

a second obtaining module 76, configured to obtain an identification of the calibration speech, the identification of the calibration speech comprising biometric information or an identifier; and

a storage module 77, configured to store the identification in association with the speech feature information after the second obtaining module obtains the identification of the calibration speech.

In this embodiment, the second obtaining module 76 obtains the identification of the calibration speech, and the storage module 77 saves the identification in correspondence with the speech feature information of the calibration speech. When speech to be recognized is collected, the user is determined first, and the search is then made in the calibration speech corresponding to that user, so that command speech matching that user's calibration speech can be detected conveniently and efficiently in the speech to be recognized.

In one embodiment, referring to Fig. 9, which is a block diagram of the detection module 74 of an audio recognition device according to an exemplary embodiment, the detection module 74 may comprise:

a first obtaining submodule 91, configured to obtain an identification of the speech to be recognized, the identification of the speech to be recognized comprising biometric information or an identifier;

a search submodule 92, configured to search the identifications of the calibration speech for a target identification identical to the identification of the speech to be recognized, after the first obtaining submodule obtains the identification of the speech to be recognized;

a second obtaining submodule 93, configured to obtain target speech feature information corresponding to the target identification; and

a first detection submodule 94, configured to detect, after the second obtaining submodule obtains the target speech feature information corresponding to the target identification, command speech that matches the target speech feature information in the collected speech to be recognized.

In this embodiment, the first obtaining submodule 91 obtains the identification of the speech to be recognized; the search submodule 92 determines which user the speech to be recognized belongs to and finds the calibration speech stored in association with that identification; the second obtaining submodule 93 obtains the target speech feature information corresponding to the target identification; and the first detection submodule 94 detects, in the collected speech to be recognized, command speech matching the speech feature information of that calibration speech. Thus, when calibration speech exists for multiple users, the calibration speech corresponding to the current user can be determined conveniently and efficiently, and command speech matching it can be detected quickly in the collected speech to be recognized.

In one embodiment, the biometric information may comprise one or more of the following: voiceprint information, iris information, and fingerprint information.

In one embodiment, please refer to Figure 10, it is the block diagram of the another kind of speech recognizing device according to an exemplary embodiment, and as shown in Figure 10, detection module 74, can comprise:

Extraction submodule 101, configured to extract the voice characteristics information of the voice to be identified;

Second detection submodule 102, configured to detect the voice characteristics information of a calibration voice in the voice characteristics information of the voice to be identified extracted by the extraction submodule;

The execution module 75 may comprise:

Determination submodule 103, configured to, in response to the second detection submodule detecting the voice characteristics information of a target calibration voice, determine the target calibration voice as the instruction voice;

Execution submodule 104, configured to perform the operation corresponding to the instruction voice.
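The cooperation of these submodules can be illustrated with a minimal sketch. It assumes, purely for illustration, that voice characteristics information is a sequence of per-frame numbers and that "detecting" a calibration voice means finding its sequence inside the sequence of the voice to be identified within a tolerance. The function names `contains_pattern` and `detect_and_execute`, the `tolerance` value and the `operations` table are hypothetical and do not come from the disclosure.

```python
from typing import Callable, Dict, Optional, Sequence


def contains_pattern(stream: Sequence[float],
                     pattern: Sequence[float],
                     tolerance: float = 0.05) -> bool:
    """Check whether `pattern` occurs somewhere in `stream` within `tolerance`."""
    n, m = len(stream), len(pattern)
    if m == 0 or m > n:
        return False
    # Slide a window of the pattern's length over the stream.
    for start in range(n - m + 1):
        if all(abs(stream[start + i] - pattern[i]) <= tolerance
               for i in range(m)):
            return True
    return False


def detect_and_execute(stream: Sequence[float],
                       calibrations: Dict[str, Sequence[float]],
                       operations: Dict[str, Callable[[], str]],
                       tolerance: float = 0.05) -> Optional[str]:
    """Treat the first detected calibration voice as the instruction voice
    and run the operation registered for it."""
    for name, pattern in calibrations.items():
        if contains_pattern(stream, pattern, tolerance):
            return operations[name]()
    return None


calibrations = {"lamp_on": [0.5, 0.9, 0.4]}
operations = {"lamp_on": lambda: "lamp switched on"}
outcome = detect_and_execute([0.1, 0.5, 0.9, 0.4, 0.2], calibrations, operations)
```

A real system would compare richer acoustic features than single numbers per frame; the sliding window is only the simplest way to show detection followed by execution.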

In one embodiment, refer to Figure 11, which is a block diagram of the first acquisition module 71 in a speech recognizing device according to an exemplary embodiment. As shown in Figure 11, the first acquisition module 71 may comprise:

Gathering submodule 111, configured to gather input voices a preset number of times, the time interval between adjacent gatherings being less than or equal to a preset time;

Calculation submodule 112, configured to calculate the feature values of the voice characteristics information of two input voices gathered at adjacent times;

Determination submodule 113, configured to, in response to determining that the difference between the feature values of the two input voices is less than a preset error value, determine the two input voices as calibration voices.

In the present embodiment, in order to prevent misoperation, the gathering submodule 111 gathers input voices a preset number of times, and the time interval between adjacent gatherings is less than or equal to a preset time, which may be, for example, 3 seconds. The calculation submodule 112 calculates the feature values of the voice characteristics information of two input voices gathered at adjacent times. If the difference between the feature values of the two input voices is less than a preset error value, the two input voices can be regarded as the same content spoken by the same user, and the input voices are then determined as calibration voices. This effectively prevents some other voice from being determined as a calibration voice because of user misoperation.
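The check described above can be sketched as follows. The sketch assumes that each input voice has already been reduced to a single numeric feature value and a capture timestamp; the names `InputVoice` and `select_calibration`, and the default values other than the 3-second interval mentioned in the text, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputVoice:
    timestamp: float      # capture time in seconds
    feature_value: float  # assumed scalar feature value of the voice characteristics


def select_calibration(inputs: List[InputVoice],
                       preset_times: int = 2,
                       preset_interval: float = 3.0,
                       preset_error: float = 0.05) -> Optional[List[InputVoice]]:
    """Return the last `preset_times` inputs if they qualify as calibration voices.

    Adjacent inputs must be gathered at most `preset_interval` seconds apart,
    and their feature values must differ by less than `preset_error`.
    """
    if preset_times < 2:
        raise ValueError("at least two input voices are needed for comparison")
    if len(inputs) < preset_times:
        return None
    recent = inputs[-preset_times:]
    for prev, curr in zip(recent, recent[1:]):
        if curr.timestamp - prev.timestamp > preset_interval:
            return None  # gathered too far apart
        if abs(curr.feature_value - prev.feature_value) >= preset_error:
            return None  # content or speaker probably differs
    return recent


accepted = select_calibration([InputVoice(0.0, 1.00), InputVoice(2.0, 1.02)])
too_late = select_calibration([InputVoice(0.0, 1.00), InputVoice(5.0, 1.00)])
different = select_calibration([InputVoice(0.0, 1.00), InputVoice(1.0, 1.50)])
```

Requiring both conditions means a stray utterance, or a second utterance spoken long after the first, is not stored as a calibration voice.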

An embodiment of the disclosure further provides a speech recognizing device, comprising:

A processor;

A memory for storing processor-executable instructions;

Wherein the processor is configured to:

Gather calibration voice;

Obtain the voice characteristics information of the calibration voice;

Gather voice to be identified;

The above processor may also be configured to:

The biological information may comprise one or more of the following: voiceprint information, iris information and fingerprint information.

The above processor may also be configured to:

Extract the voice characteristics information of the voice to be identified;

Perform the operation corresponding to the instruction voice.

The above processor may also be configured to:

The voice characteristics information may comprise one or more of the following: the timbre, pitch, duration and loudness of the voice.

As for the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.

Figure 12 is a block diagram of a speech recognizing device according to an exemplary embodiment; the device is applicable to a terminal device. For example, the device 1200 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, and so on.

The device 1200 may comprise one or more of the following components: a processing component 1202, a memory 1204, a power component 1206, a multimedia component 1208, an audio component 1210, an input/output (I/O) interface 1212, a sensor component 1214 and a communication component 1216.

The processing component 1202 generally controls the overall operation of the device 1200, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 1202 may comprise one or more processors 1220 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 1202 may comprise one or more modules that facilitate interaction between the processing component 1202 and other components. For example, the processing component 1202 may comprise a multimedia module to facilitate interaction between the multimedia component 1208 and the processing component 1202.

The memory 1204 is configured to store various types of data to support the operation of the device 1200. Examples of such data include instructions for any application program or method operated on the device 1200, contact data, phone book data, messages, pictures, videos, and so on. The memory 1204 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.

The power component 1206 supplies power to the various components of the device 1200. The power component 1206 may comprise a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the device 1200.

The multimedia component 1208 includes a screen that provides an output interface between the device 1200 and the user. In some embodiments, the screen may comprise a liquid crystal display (LCD) and a touch panel (TP). If the screen comprises a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel comprises one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1208 comprises a front camera and/or a rear camera. When the device 1200 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 1210 is configured to output and/or input audio signals. For example, the audio component 1210 comprises a microphone (MIC), which is configured to receive external audio signals when the device 1200 is in an operating mode, such as a call mode, a recording mode or a speech recognition mode. The received audio signal may be further stored in the memory 1204 or sent via the communication component 1216. In some embodiments, the audio component 1210 also comprises a loudspeaker for outputting audio signals.

The I/O interface 1212 provides an interface between the processing component 1202 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and so on. These buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.

The sensor component 1214 comprises one or more sensors for providing state assessments of various aspects of the device 1200. For example, the sensor component 1214 can detect the open/closed state of the device 1200 and the relative positioning of components, for example the display and the keypad of the device 1200. The sensor component 1214 can also detect a change in position of the device 1200 or of a component of the device 1200, the presence or absence of user contact with the device 1200, the orientation or acceleration/deceleration of the device 1200, and temperature changes of the device 1200. The sensor component 1214 may comprise a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 1214 may also comprise an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1214 may also comprise an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.

The communication component 1216 is configured to facilitate wired or wireless communication between the device 1200 and other devices. The device 1200 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1216 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1216 also comprises a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, the device 1200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above method.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, for example the memory 1204 comprising instructions, and the instructions can be executed by the processor 1220 of the device 1200 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so on.

A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by the processor of the device 1200, the device 1200 is enabled to perform the above audio identification method, the method comprising:

Gather calibration voice;

Obtain the voice characteristics information of the calibration voice;

Gather voice to be identified;

In one embodiment, the method may also comprise:

Extract the voice characteristics information of the voice to be identified;

Perform the operation corresponding to the instruction voice.

In one embodiment, gathering the calibration voice may comprise:

Those skilled in the art will readily conceive of other embodiments of the disclosure after considering the specification and practicing the disclosure herein. This application is intended to cover any variations, uses or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the following claims.

It should be understood that the disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the disclosure is limited only by the appended claims.

Claims

1. An audio identification method, characterized by comprising:

Gather calibration voice;

Obtain the voice characteristics information of the calibration voice;

Gather voice to be identified;

2. The method of claim 1, characterized in that the method further comprises:

3. The method of claim 2, characterized in that detecting, in the gathered voice to be identified, the instruction voice matching the voice characteristics information comprises:

4. The method of claim 2 or 3, characterized in that the biological information comprises one or more of the following: voiceprint information, iris information and fingerprint information.

5. The method of claim 1, characterized in that detecting, in the gathered voice to be identified, the instruction voice matching the voice characteristics information comprises:

Extract the voice characteristics information of the voice to be identified;

Perform the operation corresponding to the instruction voice.

6. The method of claim 1, characterized in that gathering the calibration voice comprises:

7. The method of any one of claims 1 to 6, characterized in that the voice characteristics information comprises one or more of the following: the timbre, pitch, duration and loudness of the voice.

8. A speech recognizing device, characterized by comprising:

First acquisition module, for gathering calibration voice;

9. The device of claim 8, characterized in that the device further comprises:

10. The device of claim 9, characterized in that the detection module comprises:

11. The device of claim 9 or 10, characterized in that the biological information comprises one or more of the following: voiceprint information, iris information and fingerprint information.

12. The device of claim 8, characterized in that the detection module comprises:

The execution module comprises:

13. The device of claim 8, characterized in that the first acquisition module comprises:

14. The device of any one of claims 8 to 13, characterized in that the voice characteristics information comprises one or more of the following: the timbre, pitch, duration and loudness of the voice.

15. A speech recognizing device, characterized by comprising:

A processor;

A memory for storing processor-executable instructions;

Wherein the processor is configured to:

Gather calibration voice;

Obtain the voice characteristics information of the calibration voice;

Gather voice to be identified;