CN109473111A - A voice enabling apparatus and method
- Publication number: CN109473111A
- Application number: CN201811644724.4A
- Authority: CN (China)
- Prior art keywords: audio data, sound source, voice, denoising, data
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a voice enabling apparatus comprising: a sound source acquisition module, for acquiring audio data and outputting it to a speech processing module; the speech processing module, for processing the audio data to generate first audio data and second audio data; and a data transmission module, for exchanging data with an external device and outputting the first audio data and the second audio data to the external device connected to it. The invention also discloses a method of voice enabling using the apparatus. The apparatus and method give voice interaction capability to a host device that has no speech recognition function, overcome the noise handling problems that affect speech recognition in the prior art, and improve speech recognition results, while reducing power consumption and not occupying host resources.
Description
Technical field
The present invention relates to the technical field of voice interaction, and in particular to a voice enabling apparatus and method.
Background
With the development of technology, smart devices are becoming more and more common, but most smart devices currently on the market have no voice interaction capability. Devices that do have a voice interaction function are mostly designed for near-field pickup or simple single-turn dialogue. Their handling of noise during voice interaction and their speech recognition accuracy are poor, and they cannot cancel the audio played by the host device itself, so far-field speech signal processing cannot be achieved.
In addition, voice interaction on most devices runs on the host device, which affects power consumption and usually makes low-power requirements unattainable. Most front-end signal processing is also computed on the host device, which occupies considerable system resources and reduces system efficiency.
Summary of the invention
In view of the above problems, the present invention aims to provide a technical solution that enables far-field voice interaction for a host device, and in particular a solution that conveniently extends a host device with a far-field voice interaction function without changing the structure of the host device.
According to a first aspect of the invention, a voice enabling apparatus is provided, comprising:
a sound source acquisition module, for acquiring audio data and outputting it to the speech processing module described below;
a speech processing module, for processing the audio data to generate first audio data;
a data transmission module, for exchanging data with an external device and outputting the first audio data to the external device connected to it.
According to a second aspect of the invention, a method of voice enabling by means of a voice enabling apparatus is provided, comprising the following steps:
connecting the voice enabling apparatus to a host device through the data transmission module;
the voice enabling apparatus acquiring audio data and processing the audio data to generate first audio data and second audio data;
the voice enabling apparatus outputting the first audio data and the second audio data to the host device.
The apparatus and method provided by the invention give voice interaction capability to a host device that has no speech recognition function. The apparatus communicates with the host device directly through the data transmission module to acquire and process audio information, so that the connected host device easily gains far-field voice interaction capability and its voice functions are extended with very little effort. In addition, the apparatus and method provided by the embodiments of the invention perform front-end signal processing on the audio data themselves, overcoming the prior-art problems of higher power consumption and resource occupation that arise when the host device performs the front-end signal processing.
Brief description of the drawings
Fig. 1 is a functional block diagram of a voice enabling apparatus according to one embodiment of the invention;
Fig. 2 is a functional block diagram of a voice enabling apparatus according to another embodiment of the invention;
Fig. 3 is a flow chart of a method of voice enabling by means of a voice enabling apparatus according to one embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
It should be noted that, where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another.
The invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In the present invention, terms such as "module", "device" and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program and/or a computer. An application program or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Elements may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal from data that interacts with another element in a local system or a distributed system, and/or interacts with other systems by way of the signal across a network such as the Internet.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements not explicitly listed, as well as elements inherent to the process, method, article or device. Without further limitation, an element introduced by the phrase "including ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 schematically shows a functional block diagram of a voice enabling apparatus according to one embodiment of the invention. As shown in Fig. 1, the voice enabling apparatus includes a sound source acquisition module 1, a speech processing module 2 and a data transmission module 3.
The sound source acquisition module 1 acquires audio data and outputs it to the speech processing module 2. By way of example, the module is implemented as multiple microphones, in particular movable directional microphones, which make it possible to localize the sound source. The user can speak a voice interaction instruction, such as "I want to record", directly toward the module, which achieves far-field pickup. Because the microphones are movable, their orientation can be adjusted so that sound from the source direction is enhanced and noise from other angles is attenuated, which safeguards audio quality.
The speech processing module 2 processes the audio data to generate first audio data and second audio data. The first audio data is the voice instruction issued by the user. The second audio data is the wake-up control signal, that is, content data related to the wake-up result; the apparatus can perform voice wake-up on the voice instruction issued by the user to obtain the wake-up control signal. In other implementations, the voice enabling apparatus may be configured without voice wake-up recognition, in which case it generates only the first audio data, that is, it performs only front-end signal processing.
The data transmission module 3 exchanges data with an external device and outputs the first audio data and the second audio data to the external device connected to it, so that a host device without a voice interaction function can implement voice interaction based on the first audio data and the second audio data. The data transmission module 3 supports at least one of the USB protocol, the Bluetooth protocol and the WiFi protocol, and may for example be implemented as a USB interface. When the voice enabling apparatus performs only front-end signal processing, the data transmission module 3 outputs only the first audio data to the connected external device.
The sound source acquisition module 1 includes a first sound source acquisition component 101 and a second sound source acquisition component 102. The first sound source acquisition component 101 acquires sound source audio data; the second sound source acquisition component 102 acquires reference audio data. By way of example, the first sound source acquisition component 101 and the second sound source acquisition component 102 are implemented as two movable microphones that capture the incoming speech as 16k/16bit audio. To acquire sound source audio data, the user speaks directly toward the two movable microphones, and the first sound source acquisition component 101 records the sound source audio. The reference audio data is mainly the background sound of the connected host device. It can be collected by placing a movable microphone close to the sound outlet of the host device (such as its loudspeaker), or by rotating the microphone toward the direction of the sound that needs to be masked, so that the audio played by the host device, or the sound from the masked direction, is captured as reference audio data. The two acquired audio data streams are passed to the speech processing module 2.
The speech processing module 2 includes a noise elimination unit 201 and a beam forming unit 203.
The noise elimination unit 201 denoises the sound source audio data on the basis of the sound source audio data and the reference audio data. This improves the speech recognition result, gives more accurate recognition, and overcomes the background sound interference found in the prior art.
The beam forming unit 203 performs beamforming on the denoised sound source audio data, which filters the denoised sound source audio data and yields clean first audio data that can be output to the external device.
The noise elimination unit 201 mainly applies DSP (digital signal processing) noise reduction techniques and includes an analog-to-digital conversion component 2011, an echo cancellation component 2012 and a digital-to-analog conversion component 2013. The analog-to-digital conversion component 2011 performs analog-to-digital conversion on the sound source audio data and the reference audio data; it contains a circuit capable of analog-to-digital conversion and generates digital signals in the manner of the prior art. The echo cancellation component 2012 performs a subtraction on the digital signals generated by the analog-to-digital conversion component to obtain the denoised sound source digital signal: the digital signal corresponding to the reference audio data is subtracted from the digital signal corresponding to the sound source audio data, and the result is the denoised sound source digital signal. The digital-to-analog conversion component 2013 performs digital-to-analog conversion on the denoised sound source digital signal to generate the denoised sound source audio data. Working together, these components yield audio data from which the reference sound has been removed.
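The subtraction performed by the echo cancellation component can be sketched as follows. This is a minimal illustration only, assuming that both signals are already digitized as 16-bit samples and are time-aligned; the function name `subtract_reference` and the sample values are hypothetical and do not come from the patent, which does not specify alignment or scaling.

```python
def subtract_reference(source, reference):
    """Subtract reference PCM samples from source PCM samples.

    Both inputs are equal-length sequences of 16-bit signed integers,
    assumed to be time-aligned already (an assumption of this sketch).
    The result is clipped to the int16 range.
    """
    if len(source) != len(reference):
        raise ValueError("source and reference must have the same length")
    out = []
    for s, r in zip(source, reference):
        d = s - r
        out.append(max(-32768, min(32767, d)))
    return out


# Hypothetical data: user speech mixed with the host device's own playback.
speech = [100, -200, 300, -400]
playback = [1000, 1000, -1000, -1000]
mic = [s + p for s, p in zip(speech, playback)]

print(subtract_reference(mic, playback))  # prints [100, -200, 300, -400]
```

In a real room the reference sound reaches the two microphones with different delay and gain, so a plain sample-wise subtraction like this one only works under the idealized alignment assumed here.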
The beam forming unit 203 can be implemented with reference to the prior art, so its implementation is not described further here.
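Since the patent leaves beamforming to the prior art, the following sketch shows one well-known technique, delay-and-sum beamforming, purely as an illustration. It is not stated to be the method used by the beam forming unit 203. The function name `delay_and_sum`, the integer sample delays and the toy signals are assumptions of this sketch.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average them.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-channel arrival delay in samples (non-negative integers)
              for the steering direction; a channel with delay d is read
              d samples ahead so that all channels line up.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        count = 0
        for ch, d in zip(channels, delays):
            idx = t + d
            if 0 <= idx < n:
                acc += ch[idx]
                count += 1
        out.append(acc / count if count else 0.0)
    return out


# Hypothetical two-microphone capture: the same pulse arrives one sample
# later at the second microphone.
mic0 = [0, 0, 1, 0, 0, 0]
mic1 = [0, 0, 0, 1, 0, 0]

steered = delay_and_sum([mic0, mic1], [0, 1])    # steered at the source
unsteered = delay_and_sum([mic0, mic1], [0, 0])  # steered elsewhere

print(max(steered), max(unsteered))  # prints 1.0 0.5
```

Steering toward the source adds the pulse coherently, while the mismatched delays spread it out, which is the enhancement of the source direction and attenuation of other angles described above.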
This embodiment gives voice interaction capability to host devices that have no voice interaction function, and applies front-end signal processing such as denoising and filtering to the acquired voice instructions and content of the user, so that a better speech recognition result is obtained. The apparatus of this embodiment is also a simple external device that achieves far-field pickup. Its design, which integrates multiple microphones, makes sound source localization easier, so that sound from the source direction is enhanced and noise from other angles is attenuated, which safeguards audio quality. For background sound emitted by the host device, a microphone can be placed close to the sound outlet of the host device so that the audio played by the host device, or the sound from a masked direction, is captured as reference sound and used for echo cancellation. Such interfering sound is thereby suppressed, which optimizes the audio used for recognition.
In addition, functions such as front-end signal processing and voice wake-up are integrated into a hardware chip, so they no longer occupy system resources of the host device. As for power consumption, the speech algorithms can be heavily optimized on a dedicated speech chip, which makes it possible to meet low-power requirements.
Fig. 2 is a functional block diagram of a voice enabling apparatus according to another embodiment of the invention. As shown in Fig. 2, the speech processing module 2 of this voice enabling apparatus further includes a wake-up verification unit 202 and a second audio data generation unit 205.
The wake-up verification unit 202 performs wake-up recognition on the denoised sound source audio data and generates a wake-up control signal and a wake-up angle. Wake-up recognition parses the voice content of the denoised sound source audio, or obtains the corresponding semantic interpretation, and identifies from the semantics the wake-up word the user intends to express; it can be implemented with reference to the prior art. The wake-up angle is a parameter that the inventors add on top of the semantic parsing. It may be obtained as follows: at the point of sound acquisition, multiple microphones form a microphone acquisition array, and the data acquired by the microphones is delivered simultaneously to the wake-up verification unit 202. Using a wake-up speech algorithm, the unit determines the sound source point from the time delays and energy distribution of the audio received at the different microphones. Because every audio frame carries a sound localization, determining the sound source point at wake-up verification gives the sound localization result, which is output as the wake-up angle. Determining the time delays and energy distribution of the audio with a speech algorithm can be achieved with the prior art.
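As a rough illustration of how a time delay between microphones can be turned into an angle, the sketch below estimates the delay between two microphones by cross-correlation and converts it to an arrival angle with a far-field model. It uses the delay only and ignores the energy distribution mentioned above. The function names, the 0.1 m microphone spacing and the test signals are hypothetical; the patent does not disclose the actual algorithm.

```python
import math


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) by which sig_b trails sig_a,
    chosen as the lag with the largest cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                score += sig_a[t] * sig_b[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def wake_angle_degrees(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Convert an inter-microphone lag into an arrival angle measured from
    the array broadside, using the far-field relation
    sin(angle) = lag / sample_rate * speed_of_sound / mic_spacing_m."""
    x = lag / sample_rate * speed_of_sound / mic_spacing_m
    x = max(-1.0, min(1.0, x))  # guard against rounding outside [-1, 1]
    return math.degrees(math.asin(x))


# Hypothetical capture: the same burst reaches microphone B two samples late.
mic_a = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]

lag = estimate_delay(mic_a, mic_b, max_lag=4)
angle = wake_angle_degrees(lag, sample_rate=16000, mic_spacing_m=0.1)
print(lag, round(angle, 1))
```

With more than two microphones, the same pairwise delays can be combined to resolve the direction unambiguously.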
Preferably, in this embodiment the speech processing module further includes a first audio data generation unit 204. In this case, the beam forming unit 203 performs beamforming on the denoised sound source audio data and generates three audio streams, that is, three 16k audio outputs. The first audio data generation unit 204 processes the three audio streams generated by the beam forming unit 203 to produce the first audio data output. Which stream is output as the first audio data depends on the wake-up angle indicated by sound source localization; this wake-up angle is output together with the wake-up result during wake-up processing.
The second audio data contains the wake-up control signal. The wake-up control signal generated by the wake-up verification unit 202 is passed directly to the second audio data generation unit 205, which converts it from digital data to audio and likewise produces 48k audio, which is output as the second audio data.
The first audio data and the second audio data are transmitted through the driver of the data transmission module 3 to the application layer of the host device. The application layer takes the two acquired audio streams, splits the first audio data into three audio streams A, B and C, and stores them in a circular queue so that earlier audio can be recalled for OneShot. It continuously monitors the second audio data for the wake-up signal. When a wake-up signal is detected, the application layer determines, according to the beam forming unit 203, which of the streams A, B and C the wake-up signal corresponds to and takes that stream as the recognition object. The recognition object is then matched with the wake-up signal to realize voice interaction.
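The circular queue used by the application layer can be sketched with a bounded double-ended queue per stream. This is only an illustrative sketch: the class name `OneShotBuffer`, the frame representation and the queue depth are hypothetical, and the patent does not describe how the wake-up signal encodes the chosen stream.

```python
from collections import deque


class OneShotBuffer:
    """Keeps the most recent frames of streams A, B and C so that audio
    preceding a wake-up event can be recalled."""

    def __init__(self, max_frames):
        # One bounded queue per stream; old frames drop off automatically.
        self.queues = {name: deque(maxlen=max_frames) for name in "ABC"}

    def push(self, frame_a, frame_b, frame_c):
        for name, frame in zip("ABC", (frame_a, frame_b, frame_c)):
            self.queues[name].append(frame)

    def recall(self, stream, n_frames):
        """Return up to the last n_frames frames of one stream, oldest first."""
        if n_frames <= 0:
            return []
        return list(self.queues[stream])[-n_frames:]


# Hypothetical usage: five frames arrive, but only three are retained.
buf = OneShotBuffer(max_frames=3)
for i in range(5):
    buf.push(f"a{i}", f"b{i}", f"c{i}")

# Suppose the wake-up signal points at stream B (an assumption of this sketch).
print(buf.recall("B", 2))  # prints ['b3', 'b4']
```

Keeping recent frames lets the host recover speech that immediately precedes or follows the wake-up word, which is what the OneShot recall relies on.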
This embodiment gives voice interaction capability to a host device that has no speech recognition function, overcomes the noise handling problems that affect speech recognition in the prior art, and improves the speech recognition result. Moreover, functions such as front-end signal processing and voice wake-up are integrated into a hardware chip, so they no longer occupy system resources of the host device; as for power consumption, the speech algorithms can be heavily optimized on a dedicated speech chip, which makes it possible to meet low-power requirements.
Fig. 3 schematically shows a flow chart of a method of voice enabling using a voice enabling apparatus according to an embodiment of the invention. As shown in Fig. 3, this embodiment includes the following steps:
Step S301: the voice enabling apparatus is connected to the host device through the data transmission module. The connection to the host device can be established over the USB protocol, the Bluetooth protocol, the WiFi protocol and so on, and the apparatus supports multiple types of host device.
Step S302: the voice enabling apparatus acquires audio data and processes it to generate first audio data and second audio data. The audio data acquired by the voice enabling apparatus includes sound source audio data and reference audio data. Specifically, the sound source audio data is denoised on the basis of the sound source audio data and the reference audio data, using DSP noise reduction techniques. To simplify the denoising computation, the sound source audio data and the reference audio data are first converted to digital signals, a subtraction is performed on the converted digital signals, and the digital signal obtained from the subtraction is converted back to an analog signal, giving the denoised sound source audio data. This improves the quality of voice interaction.
Beamforming is then performed on the denoised sound source audio data to generate the first audio data, and at the same time wake-up recognition is performed on the denoised sound source audio data to generate the second audio data. When beamforming is performed on the denoised sound source audio data, audio is also selected according to the wake-up angle. Specifically, because the sound source acquisition module 1 includes multiple microphones, the beamforming algorithm produces multiple audio streams, each an enhanced audio signal for a different angle. Which stream is output as the first audio data depends on the wake-up angle indicated by sound source localization; this wake-up angle is output together with the wake-up result during wake-up processing. For the specific implementation, refer to the operating principle of the apparatus of Fig. 2.
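Selecting the audio stream by wake-up angle can be pictured as choosing the beam whose steering direction is closest to that angle. The sketch below assumes three beams steered at 0, 120 and 240 degrees; these angles and the function name `select_beam` are hypothetical, since the patent states only that the streams correspond to different angles.

```python
def select_beam(beam_angles, wake_angle):
    """Return the index of the beam whose steering angle (degrees) is
    closest to wake_angle, measuring the difference around the circle."""

    def circular_diff(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(
        range(len(beam_angles)),
        key=lambda i: circular_diff(beam_angles[i], wake_angle),
    )


# Hypothetical beam layout: three beams spaced evenly around the device.
beam_angles = [0.0, 120.0, 240.0]

print(select_beam(beam_angles, 100.0))  # prints 1
print(select_beam(beam_angles, 350.0))  # prints 0
```

Measuring the difference around the circle matters for angles near the wrap-around point, such as 350 degrees, which is closer to the 0 degree beam than to the 240 degree beam.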
Step S303: the voice enabling apparatus outputs the first audio data and the second audio data to the host device. For the manner of data transmission, refer to step S301; in a specific implementation, the voice enabling apparatus may be provided with multiple interfaces to suit multiple types of host device.
This method gives voice interaction capability to a host device that has no speech recognition function, overcomes the noise handling problems that affect speech recognition in the prior art, improves the speech recognition result, and reduces the power consumption of the host device without occupying its resources.
Taking a television set as an example of the external host device, the voice enabling apparatus of the invention is used to achieve far-field pickup for the television set as follows:
First, the user mounts the voice enabling apparatus on top of the television set, making sure that the microphone array in its main body faces the direction from which the user usually speaks, stays as level as possible, and has no major obstacle in between. Next, the USB cable of the voice enabling apparatus is plugged into the connector at the rear of the television set to provide power and signal transmission. Then a microphone of the voice enabling apparatus is fixed near the loudspeaker of the television set, for example by adhesive. This completes the installation of the voice enabling apparatus.
In use, the voice enabling apparatus picks up the sound uttered by the user through a microphone (the first sound source acquisition component 101 described above), and picks up the sound produced by the television set itself through the microphone attached near the television loudspeaker (the second sound source acquisition component 102 described above). The voice enabling apparatus compares the two captured sounds, filters out the television's own sound, and obtains the instruction actively uttered by the user, which is then passed on for further signal processing. For the subsequent processing, refer to the method described above.
Transmitting the audio externally in this way avoids the system debugging work that a software loopback requires in order to pass audio through the system layer, and also avoids the dependence of a hardware loopback on adaptation to the terminal and its system. At the same time it reproduces more faithfully the real interference from the device's own sound output, avoiding the problems caused when the played sound and the echo signal are out of sync because of the power amplifier, the loudspeaker and other stages of the playback chain.
From the above description of the embodiments, a person skilled in the art will clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, or of course by hardware. On this understanding, the above technical solutions, or the part of them that contributes over the related art, can in essence be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the application and do not limit them. Although the application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the application.
Claims (10)
1. A voice enabling apparatus, characterized by comprising:
a sound source acquisition module, configured to acquire audio data and output it to the speech processing module described below;
a speech processing module, configured to process the audio data to generate first audio data; and
a data transmission module, configured to implement data interaction with an external device and to output the first audio data to the external device connected to it.
2. The apparatus according to claim 1, characterized in that the sound source acquisition module comprises a first sound source acquisition component, configured to acquire sound source audio data; and
a second sound source acquisition component, configured to acquire reference audio data;
the speech processing module comprises
a noise elimination unit, configured to denoise the sound source audio data according to the sound source audio data and the reference audio data; and
a beamforming unit, configured to perform beamforming on the denoised sound source audio data to generate and output the first audio data.
3. The apparatus according to claim 2, wherein the noise elimination unit comprises
an analog-to-digital conversion component, configured to perform analog-to-digital conversion on the sound source audio data and the reference audio data to generate digital signals;
an echo cancellation component, configured to perform a subtraction operation on the digital signals generated by the analog-to-digital conversion component to obtain a denoised sound source digital signal; and
a digital-to-analog conversion component, configured to perform digital-to-analog conversion on the denoised sound source digital signal to generate the denoised sound source audio data.
4. The apparatus according to claim 3, characterized in that the speech processing module further generates second audio data, which is output to the external device through the data transmission module, the speech processing module further comprising
a wake-up recognition unit, configured to perform wake-up recognition on the denoised sound source audio data to generate a wake-up control signal; and
a second audio data generation unit, configured to process the wake-up control signal generated by the wake-up recognition unit to generate and output the second audio data.
5. The apparatus according to any one of claims 2 to 4, characterized in that the first sound source acquisition component and the second sound source acquisition component are implemented as at least two movable microphones.
6. The apparatus according to claim 5, wherein the data transmission module supports at least one of the USB protocol, the Wi-Fi protocol, and the Bluetooth protocol.
7. A voice enabling method implemented by the voice enabling apparatus according to claim 4, characterized by comprising the following steps:
connecting the voice enabling apparatus to a main device through the data transmission module;
acquiring, by the voice enabling apparatus, audio data, and processing the audio data to generate first audio data and second audio data; and
outputting, by the voice enabling apparatus, the first audio data and the second audio data to the main device.
8. The method according to claim 7, characterized in that the audio data acquired by the voice enabling apparatus comprises sound source audio data and reference audio data, and the processing of the audio data by the voice enabling apparatus comprises:
denoising the sound source audio data according to the sound source audio data and the reference audio data;
performing beamforming on the denoised sound source audio data to generate the first audio data; and
performing wake-up recognition on the denoised sound source audio data to generate the second audio data.
9. The method according to claim 8, characterized in that the acquisition of sound source audio data by the voice enabling apparatus is implemented as:
arranging the first sound source acquisition component of the voice enabling apparatus to face the direction from which the user habitually speaks, and picking up the sound source audio through the first sound source acquisition component;
and the acquisition of reference audio data by the voice enabling apparatus is implemented as:
fixing the second sound source acquisition component of the voice enabling apparatus near the loudspeaker of the main device, and picking up the reference audio of the main device through the second sound source acquisition component.
10. The method according to claim 9, characterized in that the denoising of the sound source audio data according to the sound source audio data and the reference audio data comprises:
converting the sound source audio data and the reference audio data into digital signals, respectively;
performing a subtraction operation on the converted digital signals; and
converting the digital signal obtained from the subtraction into an analog signal to obtain the denoised sound source audio data.
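The beamforming step named in claims 2 and 8 is not tied to a particular algorithm in the text. As a rough illustration only, the sketch below uses delay-and-sum beamforming with integer sample delays; the function name `delay_and_sum`, the per-channel delays, and the equal channel weighting are assumptions made for the example.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Align each denoised microphone channel and average the result.

    channels: one sample sequence per microphone.
    delays:   hypothetical per-channel arrival delays in whole samples,
              measured relative to the earliest channel (all >= 0).
    """
    if not channels or len(channels) != len(delays):
        raise ValueError("need one non-negative delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")

    # Only the region where every shifted channel still has samples is usable.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    if usable <= 0:
        return []

    n = len(channels)
    # Advancing channel i by delays[i] lines up the target source, so the
    # average reinforces it while uncorrelated noise tends to cancel.
    return [
        sum(ch[d + t] for ch, d in zip(channels, delays)) / n
        for t in range(usable)
    ]


if __name__ == "__main__":
    source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
    mic_a = source                  # source reaches this microphone first
    mic_b = [0.0, 0.0] + source     # same source, two samples later
    print(delay_and_sum([mic_a, mic_b], [0, 2]))
```

In a real device the delays would come from the microphone geometry and the estimated direction of the speaker, and fractional delays or frequency-domain weighting would usually replace the integer shifts used here.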
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811644724.4A CN109473111B (en) | 2018-12-29 | 2018-12-29 | Voice enabling device and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811644724.4A CN109473111B (en) | 2018-12-29 | 2018-12-29 | Voice enabling device and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109473111A (en) | 2019-03-15 |
CN109473111B (en) | 2024-03-08 |
Family
ID=65678383
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811644724.4A Active CN109473111B (en) | 2018-12-29 | 2018-12-29 | Voice enabling device and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109473111B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110213696A (en) * | 2019-06-30 | 2019-09-06 | 联想(北京)有限公司 | Audio frequency apparatus, signal processing method and system |
CN110265029A (en) * | 2019-06-21 | 2019-09-20 | 百度在线网络技术(北京)有限公司 | Speech chip and electronic equipment |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101246687A (en) * | 2008-03-20 | 2008-08-20 | 北京航空航天大学 | Intelligent voice interaction system and method thereof |
CN101753871A (en) * | 2008-11-28 | 2010-06-23 | 康佳集团股份有限公司 | Voice remote control TV system |
US20120233765A1 (en) * | 2007-07-31 | 2012-09-20 | Mitchell Altman | System and Method for Controlling the Environment of a Steambath |
CN202721771U (en) * | 2012-04-24 | 2013-02-06 | 青岛海尔电子有限公司 | Television system with audio recognition function |
CN107566874A (en) * | 2017-09-22 | 2018-01-09 | 百度在线网络技术(北京)有限公司 | Far field speech control system based on television equipment |
CN207603830U (en) * | 2017-12-05 | 2018-07-10 | 炬芯(珠海)科技有限公司 | A kind of household electrical appliance intelligent voice system |
CN108364648A (en) * | 2018-02-11 | 2018-08-03 | 北京百度网讯科技有限公司 | Method and device for obtaining audio-frequency information |
CN108447483A (en) * | 2018-05-18 | 2018-08-24 | 深圳市亿道数码技术有限公司 | Speech recognition system |
CN108538305A (en) * | 2018-04-20 | 2018-09-14 | 百度在线网络技术(北京)有限公司 | Audio recognition method, device, equipment and computer readable storage medium |
CN208256287U (en) * | 2017-09-29 | 2018-12-18 | 杭州聪普智能科技有限公司 | Control device and smart home device based on speech recognition |
CN209515191U (en) * | 2018-12-29 | 2019-10-18 | 苏州思必驰信息科技有限公司 | A kind of voice enabling apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN109473111B (en) | 2024-03-08 |
Similar Documents
Publication | Title |
---|---|
US10438607B2 (en) | Device and method for cancelling echo |
JP7434137B2 (en) | Speech recognition method, device, equipment and computer readable storage medium |
US11295760B2 (en) | Method, apparatus, system and storage medium for implementing a far-field speech function |
CN110288997A (en) | Equipment awakening method and system for acoustics networking |
CN108681440A (en) | A kind of smart machine method for controlling volume and system |
US11997448B2 (en) | Multi-modal audio processing for voice-controlled devices |
US10923138B2 (en) | Sound collection apparatus for far-field voice |
CN109817238A (en) | Audio signal sample device, acoustic signal processing method and device |
CN205004033U (en) | Cloud intelligence speech recognition PA -system |
US10667045B1 (en) | Robot and auto data processing method thereof |
CN110349582A (en) | Display device and far field speech processing circuit |
CN110875045A (en) | Voice recognition method, intelligent device and intelligent television |
CN103885744A (en) | Sound based gesture recognition method |
CN109473111A (en) | A kind of voice enabling apparatus and method |
CN110992967A (en) | Voice signal processing method and device, hearing aid and storage medium |
CN114640938A (en) | Hearing aid function implementation method based on Bluetooth headset chip and Bluetooth headset |
CN109524004A (en) | The voice interaction device and system of a kind of method of parallel transmission that realizing MCVF multichannel voice frequency and data, circumscribed |
CN209515191U (en) | A kind of voice enabling apparatus |
US10747494B2 (en) | Robot and speech interaction recognition rate improvement circuit and method thereof |
WO2017000772A1 (en) | Front-end audio processing system |
CN109697987A (en) | A kind of the far field voice interaction device and implementation method of circumscribed |
CN208094741U (en) | A kind of intelligent microphone based on speech recognition technology |
CN110517682A (en) | Audio recognition method, device, equipment and storage medium |
CN207039811U (en) | A kind of multimedia microphone Intelligent Measurement audio amplifier |
US20190152061A1 (en) | Motion control method and device, and robot with enhanced motion control |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant after: Sipic Technology Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant before: AI SPEECH Co.,Ltd. |
GR01 | Patent grant | |