CN109473111A - A kind of voice enabling apparatus and method - Google Patents

Voice enabling apparatus and method

Publication number
CN109473111A
Authority
CN
China
Prior art keywords
audio data
sound source
voice
denoising
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811644724.4A
Other languages
Chinese (zh)
Other versions
CN109473111B (en
Inventor
雷雄国
涂长宇
郑炜乔
郭彭亮
刘强
何家锋
徐瑞婷
卢玉环
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd
Priority to CN201811644724.4A
Publication of CN109473111A
Application granted
Publication of CN109473111B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention discloses a voice enabling apparatus comprising: a sound source acquisition module for acquiring audio data and outputting it to a speech processing module; a speech processing module for processing the audio data to generate first audio data and second audio data; and a data transmission module for data interaction with an external device, which outputs the first audio data and the second audio data to the external device connected to it. The invention also discloses a method for voice enabling using the apparatus. The apparatus and method can give voice interaction capability to a host device that has no speech recognition function, overcome the prior-art noise-handling problems in speech recognition, and optimize the speech recognition result, while reducing power consumption and not occupying host resources.

Description

Voice enabling apparatus and method
Technical field
The present invention relates to the technical field of voice interaction, and in particular to a voice enabling apparatus and method.
Background
With the development of science and technology, smart devices are becoming increasingly common. However, most smart devices currently on the market have no voice interaction capability, and the devices that do are mostly designed for near-field pickup or simple single-turn dialogue. Their noise handling and speech recognition accuracy during voice interaction are poor, and they cannot cancel the audio that the host device itself is playing, so far-field speech signal processing cannot be achieved.
On the other hand, voice interaction on most devices runs entirely on the host device. This affects power consumption, so low-power requirements usually cannot be met. Most front-end signal processing is also computed on the host device, which occupies substantial system resources and reduces system operating efficiency.
Summary of the invention
In view of the above problems, the present invention aims to provide a technical solution that enables far-field voice interaction for a host device, and in particular a solution that conveniently extends a host device with far-field voice interaction without changing the structure of the host device.
According to a first aspect of the invention, a voice enabling apparatus is provided, comprising:
a sound source acquisition module for acquiring audio data and outputting it to the speech processing module described below;
a speech processing module for processing the audio data to generate first audio data; and
a data transmission module for data interaction with an external device, which outputs the first audio data to the external device connected to it.
According to a second aspect of the invention, a method for voice enabling by means of the voice enabling apparatus is provided, comprising the following steps:
the voice enabling apparatus is connected to a main device through the data transmission module;
the voice enabling apparatus acquires audio data and processes it to generate first audio data and second audio data;
the voice enabling apparatus outputs the first audio data and the second audio data to the main device.
The apparatus and method provided by the present invention can give voice interaction capability to a host device that has no speech recognition function. The apparatus communicates directly with the host device through the data transmission module and handles the acquisition and processing of audio information itself, so the connected host device easily gains far-field voice interaction capability and its voice functions are very convenient to extend. In addition, the apparatus and method provided by the embodiments of the present invention perform front-end signal processing on the audio data, overcoming the prior-art problems of power consumption and resource occupation that arise when the host device itself performs front-end signal processing.
Brief description of the drawings
Fig. 1 is a functional block diagram of a voice enabling apparatus according to one embodiment of the present invention;
Fig. 2 is a functional block diagram of a voice enabling apparatus according to another embodiment of the present invention;
Fig. 3 is a flow chart of a method for voice enabling by means of the voice enabling apparatus according to one embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with one another.
The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In the present invention, terms such as "module", "device", and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program, and/or a computer. An application or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution; an element may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Elements may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal from data that interacts with another element in a local system or a distributed system, and/or interacts with other systems across a network such as the Internet.
Finally, it should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element introduced by "including ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
The invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 schematically shows a functional block diagram of a voice enabling apparatus according to one embodiment of the present invention. As shown in Fig. 1, the voice enabling apparatus includes a sound source acquisition module 1, a speech processing module 2, and a data transmission module 3.
Sound source acquisition module 1 acquires audio data and outputs it to speech processing module 2. For example, the module is implemented as multiple microphones, in particular movable directional microphones, which allow the sound source to be located. The user can speak a voice interaction instruction, such as "I want to record", directly toward the module, which achieves far-field pickup. Because the microphones are movable, their orientation can be adjusted to enhance sound from the source direction and attenuate noise from other angles, which helps ensure audio quality.
Speech processing module 2 processes the audio data to generate first audio data and second audio data. The first audio data is the voice instruction issued by the user. The second audio data is the wake-up control signal, i.e., content data related to the wake-up result; voice wake-up can be performed within the apparatus on the user's voice instruction to obtain the wake-up control signal. In other implementation examples, the voice enabling apparatus may omit wake-up recognition and generate only the first audio data, i.e., perform front-end signal processing only.
Data transmission module 3 implements data interaction with an external device and outputs the first audio data and the second audio data to the external device connected to it, so that a host device without voice interaction capability can implement voice interaction based on the first and second audio data. Data transmission module 3 supports at least one of the USB protocol, the Bluetooth protocol, and the WiFi protocol; for example, it may be implemented as a USB interface. When the voice enabling apparatus performs front-end signal processing only, data transmission module 3 outputs the first audio data to the connected external device.
Sound source acquisition module 1 includes a first sound source acquisition component 101 and a second sound source acquisition component 102. The first sound source acquisition component 101 acquires sound source audio data; the second sound source acquisition component 102 acquires reference audio data. For example, components 101 and 102 are implemented as two movable microphones that capture the incoming sound as 16k/16bit audio. To acquire sound source audio data, the user speaks directly toward the two movable microphones, and the first sound source acquisition component 101 records the sound source audio. The reference audio data is mainly the background sound of the connected host device: a movable microphone can be placed close to the sound outlet of the host device (such as its loudspeaker), or rotated to face the direction of the sound to be shielded, so that the audio played by the host device, or the sound from the shielded direction, is collected as reference audio data. The two acquired audio streams are passed to speech processing module 2.
Speech processing module 2 includes a noise elimination unit 201 and a beam forming unit 203.
Noise elimination unit 201 denoises the sound source audio data using the sound source audio data and the reference audio data. This optimizes the speech recognition result, yields more accurate recognition, and overcomes the background-sound interference found in the prior art.
Beam forming unit 203 performs beamforming on the denoised sound source audio data, thereby filtering it and producing clean first audio data for output to the external device.
Noise elimination unit 201 mainly applies DSP (digital signal processing) noise reduction and includes an analog-to-digital conversion component 2011, an echo cancellation component 2012, and a digital-to-analog conversion component 2013. Analog-to-digital conversion component 2011 performs analog-to-digital conversion on the sound source audio data and the reference audio data; it contains a circuit capable of analog-to-digital conversion and generates digital signals in the manner known from the prior art. Echo cancellation component 2012 performs a subtraction on the digital signals generated by the analog-to-digital conversion component: the digital signal corresponding to the reference audio data is subtracted from the digital signal corresponding to the sound source audio data, and the result is the denoised sound source digital signal. Digital-to-analog conversion component 2013 performs digital-to-analog conversion on the denoised sound source digital signal to generate the denoised sound source audio data. Working together, these components produce audio data from which the reference sound has been removed.
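The three components of the noise elimination unit can be pictured with a minimal Python sketch. It assumes a simple 16-bit quantiser and a plain sample-by-sample subtraction, as the text describes; all function names are hypothetical and are not taken from the patent. Practical echo cancellers usually add an adaptive filter to compensate for delay and gain between loudspeaker and microphone, which this sketch deliberately leaves out.

```python
# Sketch only: the quantiser, the plain subtraction, and all names below are
# illustrative assumptions, not the patent's implementation.

FULL_SCALE = 32767  # largest 16-bit signed sample value


def to_digital(samples):
    """Analog-to-digital step: map floats in [-1.0, 1.0] to 16-bit integers."""
    return [max(-FULL_SCALE, min(FULL_SCALE, round(s * FULL_SCALE))) for s in samples]


def subtract_reference(source_digital, reference_digital):
    """Echo cancellation step: subtract the reference signal sample by sample."""
    if len(source_digital) != len(reference_digital):
        raise ValueError("source and reference must have the same length")
    return [
        max(-FULL_SCALE, min(FULL_SCALE, s - r))
        for s, r in zip(source_digital, reference_digital)
    ]


def to_analog(digital):
    """Digital-to-analog step: map 16-bit integers back to floats."""
    return [d / FULL_SCALE for d in digital]


def denoise(source, reference):
    """Run the three components in order: ADC, subtraction, DAC."""
    return to_analog(subtract_reference(to_digital(source), to_digital(reference)))


# Usage: the first microphone hears speech plus host playback,
# the reference microphone hears the playback only.
speech = [0.10, -0.20, 0.30, 0.00]
playback = [0.05, 0.05, -0.10, 0.20]
mixed = [s + p for s, p in zip(speech, playback)]
recovered = denoise(mixed, playback)
```

In this idealised case `recovered` matches `speech` up to quantisation error, because the reference is an exact, time-aligned copy of the playback component.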
Beam forming unit 203, which performs the filtering, can be implemented with reference to the prior art, so its implementation is not described in detail here.
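Since the text leaves beamforming to the prior art, the following is only a sketch of one well-known prior-art technique, delay-and-sum beamforming, and not necessarily the algorithm the inventors use. It assumes a linear microphone array, a far-field source, whole-sample delays, and a 16000 Hz sample rate; the function names and the steering angles are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_lags(mic_positions, angle_deg, sample_rate):
    """Arrival lag of each microphone, in whole samples, for a far-field source.

    mic_positions are x coordinates in metres along a linear array; the angle
    is measured from broadside, positive toward +x. The smallest lag is zero.
    """
    sin_a = math.sin(math.radians(angle_deg))
    raw = [-x * sin_a / SPEED_OF_SOUND * sample_rate for x in mic_positions]
    base = min(raw)
    return [round(r - base) for r in raw]


def delay_and_sum(channels, lags):
    """Advance each channel by its lag and average the aligned samples."""
    length = len(channels[0])
    out = []
    for n in range(length):
        total = 0.0
        for channel, lag in zip(channels, lags):
            idx = n + lag
            if idx < length:  # samples past the end count as silence
                total += channel[idx]
        out.append(total / len(channels))
    return out


def beamform(channels, mic_positions, angles_deg, sample_rate=16000):
    """Return one enhanced stream per steering angle."""
    return {
        angle: delay_and_sum(channels, steering_lags(mic_positions, angle, sample_rate))
        for angle in angles_deg
    }


# Usage: a pulse that reaches the second microphone three samples later.
pulse_early = [0.0] * 5 + [1.0] + [0.0] * 10
pulse_late = [0.0] * 8 + [1.0] + [0.0] * 7
aligned = delay_and_sum([pulse_early, pulse_late], [0, 3])
unaligned = delay_and_sum([pulse_early, pulse_late], [0, 0])
streams = beamform([pulse_early, pulse_late], [0.0, 0.1], (-45, 0, 45))
```

When the lags match the true arrival delays the pulses add coherently, so `aligned` has a stronger peak than `unaligned`; sound from other directions adds incoherently and is attenuated.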
This embodiment gives voice interaction capability to host devices that lack it, and applies front-end signal processing such as denoising and filtering to the acquired user voice instructions and content, so that a better speech recognition result is obtained. The apparatus of this embodiment is a simple external device that nevertheless achieves far-field pickup. Its design, which integrates multiple microphones, makes it easy to locate the sound source, so that sound from the source direction is enhanced and noise from other angles is attenuated, ensuring audio quality. For background sound emitted by the host device, a microphone can be placed close to the sound outlet of the host device so that the audio played by the host device, or sound from a shielded direction, is collected as a reference sound and used for echo cancellation. Such interfering sources are thereby suppressed, which improves the captured audio.
In addition, functions such as front-end signal processing and voice wake-up are integrated into a hardware chip, so they no longer occupy the system resources of the host device. In terms of power consumption, the speech algorithms can be substantially optimized on a dedicated speech chip, so that low-power requirements are met.
Fig. 2 is a functional block diagram of a voice enabling apparatus according to another embodiment of the present invention. As shown in Fig. 2, speech processing module 2 of this voice enabling apparatus further includes a wake-up verification unit 202 and a second audio data generation unit 205.
Wake-up verification unit 202 performs wake-up recognition on the denoised sound source audio data and generates a wake-up control signal and a wake-up angle. Wake-up recognition parses the voice content of the denoised sound source audio to obtain the corresponding semantic interpretation and identifies from the semantics the wake-up word the user intended; it can be implemented with reference to the prior art. The wake-up angle is a parameter that the inventors add alongside the semantic parsing. It may be obtained as follows: at the sound acquisition stage, multiple microphones form a microphone acquisition array, and the data acquired by the microphones is delivered simultaneously to wake-up verification unit 202. Using a wake-up speech algorithm, the unit determines the sound source point from the propagation delay and energy distribution of the audio received by the different microphones. Because every audio frame carries a sound localization, determining the sound source point during wake-up verification yields the sound localization result, which is output as the wake-up angle. Determining the delay and energy distribution of the audio with a speech algorithm can be achieved with the prior art.
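How a propagation delay between microphones maps to an angle can be sketched as follows. This is a textbook two-microphone, far-field estimate based on cross-correlation and is offered only as an illustration; the patent does not specify its localization algorithm, and the function names, microphone spacing, and sample rate below are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_lag(first, second, max_lag):
    """Return the lag in samples at which `second` best matches `first`.

    A positive result means the sound reached the second microphone later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(first):
            m = n + lag
            if 0 <= m < len(second):
                score += value * second[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def wake_angle_deg(first, second, mic_spacing_m, sample_rate=16000):
    """Angle from broadside; positive means the source is nearer the first microphone."""
    # No physical delay can exceed the time sound needs to travel between the microphones.
    max_lag = math.ceil(mic_spacing_m / SPEED_OF_SOUND * sample_rate)
    lag = estimate_lag(first, second, max_lag)
    ratio = lag / sample_rate * SPEED_OF_SOUND / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


# Usage: the same short burst arrives two samples later at the second microphone.
first_mic = [0.0] * 10 + [1.0, 0.5, -0.5] + [0.0] * 20
second_mic = [0.0] * 2 + first_mic[:-2]
angle = wake_angle_deg(first_mic, second_mic, mic_spacing_m=0.1)
```

The sketch uses the delay only; the energy distribution mentioned in the text could serve as an additional cue but is not modelled here.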
Preferably, in this embodiment of the present invention, the speech processing module further includes a first audio data generation unit 204. In this case, beam forming unit 203 performs beamforming on the denoised sound source audio data and generates three audio streams, i.e., three 16k audio outputs. First audio data generation unit 204 processes the three audio streams generated by beam forming unit 203 and outputs the first audio data. Which stream is taken as the first audio data depends on the wake-up angle indicated by the sound source localization; this wake-up angle is output together with the wake-up result during wake-up processing.
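One plausible reading of this selection step is to pick the beamformed stream whose steering angle lies closest to the wake-up angle. The sketch below assumes that rule; the steering angles and the function name are hypothetical, since the text does not state how the three streams are oriented.

```python
def select_stream(streams_by_angle, wake_angle_deg):
    """Pick the beamformed stream whose steering angle is closest to the wake-up angle.

    streams_by_angle maps a steering angle in degrees to that stream's samples.
    """
    if not streams_by_angle:
        raise ValueError("no beamformed streams available")
    best_angle = min(streams_by_angle, key=lambda angle: abs(angle - wake_angle_deg))
    return best_angle, streams_by_angle[best_angle]


# Usage with three streams steered at assumed angles of -60, 0 and 60 degrees.
streams = {
    -60: [0.1, 0.2, 0.1],
    0: [0.4, 0.9, 0.4],
    60: [0.0, 0.1, 0.0],
}
chosen_angle, first_audio_data = select_stream(streams, wake_angle_deg=12.5)
```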
The second audio data contains the wake-up control signal. The signal is passed directly to second audio data generation unit 205, which processes the wake-up control signal generated by wake-up verification unit 202 (digital-to-audio conversion) and likewise generates 48k audio, i.e., the second audio data output.
The first audio data and the second audio data are transmitted to the application layer of the host device by the driver of data transmission module 3. From the two audio streams it receives, the application layer splits the first audio data into three audio parts A, B, and C, stores them in a circular queue, and recalls them based on OneShot. It continuously monitors the wake-up signal in the second audio data. When a wake-up signal is detected, the application layer determines, according to beam forming unit 203, which of the audio streams A, B, and C the wake-up signal corresponds to and uses that stream as the recognition object, so that the recognition object is matched with the wake-up signal and voice interaction is achieved.
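The host-side buffering can be sketched with bounded queues. The sketch assumes that "recalled based on OneShot" means the host looks back into recently buffered frames so that a command spoken together with the wake-up word is not lost, and that the wake-up signal tells the host which of A, B, and C to use. The class name, the frame representation, and the `stream_index` parameter are all hypothetical.

```python
from collections import deque

STREAM_NAMES = ("A", "B", "C")


class HostAudioBuffer:
    """Keeps the most recent frames of streams A, B and C in bounded queues."""

    def __init__(self, max_frames=50):
        # deque(maxlen=...) drops the oldest frame automatically,
        # which gives circular-queue behaviour.
        self.queues = {name: deque(maxlen=max_frames) for name in STREAM_NAMES}

    def push(self, frame_a, frame_b, frame_c):
        """Store one frame from each of the three split streams."""
        for name, frame in zip(STREAM_NAMES, (frame_a, frame_b, frame_c)):
            self.queues[name].append(frame)

    def on_wake(self, stream_index, lookback_frames):
        """Return the selected stream name and its most recent buffered frames."""
        name = STREAM_NAMES[stream_index]
        frames = list(self.queues[name])
        recalled = frames[-lookback_frames:] if lookback_frames > 0 else []
        return name, recalled


# Usage: five frames arrive, but only the latest three fit in the queue.
host_buffer = HostAudioBuffer(max_frames=3)
for i in range(5):
    host_buffer.push(f"A{i}", f"B{i}", f"C{i}")
name, recalled = host_buffer.on_wake(stream_index=1, lookback_frames=2)
```

The recalled frames would then be handed to the recognizer together with the audio that follows the wake-up event.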
This embodiment gives voice interaction capability to a host device that has no speech recognition function, overcomes the prior-art noise-handling problems in speech recognition, and optimizes the speech recognition result. Moreover, functions such as front-end signal processing and voice wake-up are integrated into a hardware chip, so they no longer occupy the system resources of the host device; in terms of power consumption, the speech algorithms can be substantially optimized on a dedicated speech chip, so that low-power requirements are met.
Fig. 3 schematically shows a flow chart of a method for voice enabling using the voice enabling apparatus according to an embodiment of the present invention. As shown in Fig. 3, this embodiment includes the following steps:
Step S301: the voice enabling apparatus is connected to a main device through the data transmission module. The connection with the main device can be established over the USB protocol, the Bluetooth protocol, the WiFi protocol, and so on; the apparatus supports multiple types of main device.
Step S302: the voice enabling apparatus acquires audio data and processes it to generate first audio data and second audio data. The audio data acquired by the voice enabling apparatus includes sound source audio data and reference audio data. Specifically, the sound source audio data is denoised using the sound source audio data and the reference audio data, applying DSP noise reduction. To simplify the denoising computation, the sound source audio data and the reference audio data are first each converted into digital signals, a subtraction is performed on the converted digital signals, and the resulting digital signal is converted back into an analog signal, yielding the denoised sound source audio data. This optimizes the voice interaction.
Beamforming is then performed on the denoised sound source audio data to generate the first audio data; at the same time, wake-up recognition is performed on the denoised sound source audio data to generate the second audio data. When beamforming is performed on the denoised sound source audio data, audio is also selected according to the wake-up angle. Specifically, because sound source acquisition module 1 includes multiple microphones, the beamforming algorithm produces multiple audio streams, each enhanced for a different angle. Which stream is output as the first audio data depends on the wake-up angle indicated by the sound source localization; this wake-up angle is output together with the wake-up result during wake-up processing. For the specific implementation, refer to the principle of the apparatus in Fig. 2.
Step S303: the voice enabling apparatus outputs the first audio data and the second audio data to the main device. For the manner of data transmission, refer to step S301; in a specific implementation, the voice enabling apparatus may be provided with multiple interfaces suited to multiple types of main device.
This method gives voice interaction capability to a host device that has no speech recognition function, overcomes the prior-art noise-handling problems in speech recognition, optimizes the speech recognition result, and reduces host device power consumption without occupying host resources.
Taking a television set as the external host device, the voice enabling apparatus of the present invention is applied to the television set to achieve far-field pickup for the television as follows:
First, the user mounts the voice enabling apparatus on top of the television set, making sure that the microphone array of its main body faces the direction from which the user usually speaks, is kept as level as possible, and has no major obstacle in between. Next, the USB cable of the voice enabling apparatus is plugged into the port at the rear of the television set to provide power and signal transmission. Then a microphone array of the voice enabling apparatus is fixed near the loudspeaker of the television set, for example by pasting it in place. This completes the installation of the voice enabling apparatus.
In use, the voice enabling apparatus picks up the sound uttered by the user through a microphone (the first sound source acquisition component 101 described above), and picks up the sound emitted by the television itself through the microphone pasted near the television loudspeaker (the second sound source acquisition component 102 described above). The voice enabling apparatus compares the two groups of acquired sound, filters out the television's own sound, and obtains the instruction sound actively uttered by the user, which is then passed on for further signal processing. For the subsequent processing, refer to the method described above.
Transmitting the audio externally in this way avoids the system-level debugging work that a software loopback would require for transmitting audio, and also avoids the terminal-specific and system-specific adaptation work on which a hardware loopback depends. At the same time, it reproduces more faithfully the actual interference from the sound the device itself emits, and avoids the problems caused when the sound in the playback chain (power amplifier, loudspeaker, and so on) is out of sync with the voice signal.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes to the related art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. voice enabling apparatus, which is characterized in that including
Sound source acquisition module is exported for acquiring audio data to following speech processing modules;
Speech processing module generates the first audio data for handling the audio data;
Data transmission module exports the first audio data to being attached thereto for realizing the data interaction with external equipment External equipment.
2. the apparatus according to claim 1, which is characterized in that the sound source acquisition module includes the first sound source acquisition group Part, for acquiring sound source audio data;
Second sound source acquisition component, for acquiring reference audio data;
The speech processing module includes
Noise eliminates unit, for being carried out at denoising according to sound source audio data and reference audio data to sound source audio data Reason;With
Beam forming unit generates the output of the first audio data for carrying out Wave beam forming to the sound source audio data after denoising.
3. The apparatus according to claim 2, wherein the noise elimination unit comprises
an analog-to-digital conversion component for performing analog-to-digital conversion on the sound source audio data and the reference audio data to generate digital signals;
an echo cancellation component for performing a subtraction on the digital signals generated by the analog-to-digital conversion component to obtain a denoised sound source digital signal; and
a digital-to-analog conversion component for performing digital-to-analog conversion on the denoised sound source digital signal to generate the denoised sound source audio data.
4. The apparatus according to claim 3, characterized in that the speech processing module also generates second audio data, which is output to the external device through the data transmission module, and the speech processing module further comprises
a wake-up authentication unit for performing wake-up identification on the denoised sound source audio data to generate a wake-up control signal; and
a second audio data generation unit for processing the wake-up control signal generated by the wake-up authentication unit to generate and output the second audio data.
5. The apparatus according to any one of claims 2 to 4, characterized in that the first sound source acquisition component and the second sound source acquisition component are implemented as at least two movable microphones.
6. The apparatus according to claim 5, wherein the data transmission module supports at least one of the USB protocol, the Wi-Fi protocol, and the Bluetooth protocol.
7. A voice enabling method implemented by the voice enabling apparatus according to claim 4, characterized by comprising the following steps:
connecting the voice enabling apparatus to a main device through the data transmission module;
acquiring audio data by the voice enabling apparatus and processing the audio data to generate first audio data and second audio data; and
outputting, by the voice enabling apparatus, the first audio data and the second audio data to the main device.
8. The method according to claim 7, characterized in that the audio data acquired by the voice enabling apparatus comprises sound source audio data and reference audio data, and the processing of the audio data by the voice enabling apparatus comprises:
denoising the sound source audio data according to the sound source audio data and the reference audio data;
performing beamforming on the denoised sound source audio data to generate the first audio data; and
performing wake-up identification on the denoised sound source audio data to generate the second audio data.
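The data flow of claim 8 can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented implementation: the function `process_audio`, the idea of passing the three stages in as callables, and the representation of the second audio data as a small dictionary are all hypothetical. The claims only say that both outputs are derived from the denoised sound source audio data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Signal = List[float]


def process_audio(
    source: Sequence[float],
    reference: Sequence[float],
    denoise: Callable[[Sequence[float], Sequence[float]], Signal],
    beamform: Callable[[Signal], Signal],
    detect_wake: Callable[[Signal], bool],
) -> Tuple[Signal, Dict[str, bool]]:
    """Run the three processing steps of claim 8 in order.

    The stage implementations are supplied by the caller because the claims
    do not fix them.
    """
    # Step 1: denoise the source audio using the reference audio.
    denoised = denoise(source, reference)
    # Step 2: beamforming on the denoised audio gives the first audio data.
    first_audio = beamform(denoised)
    # Step 3: wake-up identification on the same denoised audio gives the
    # second audio data (modeled here, as an assumption, as a wake flag).
    second_audio = {"wake": detect_wake(denoised)}
    return first_audio, second_audio


# Usage with trivial stand-in stages (placeholders, not real algorithms).
first, second = process_audio(
    [1.0, 2.0],
    [0.5, 0.5],
    denoise=lambda s, r: [a - b for a, b in zip(s, r)],
    beamform=lambda x: list(x),
    detect_wake=lambda x: max(x) > 1.0,
)
```

Both outputs branch from the same denoised signal, which matches the order of the steps in the claim.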
9. The method according to claim 8, characterized in that the acquisition of the sound source audio data by the voice enabling apparatus is implemented as
arranging the first sound source acquisition component of the voice enabling apparatus to face the user's habitual direction, and picking up the sound source audio through the first sound source acquisition component;
and the acquisition of the reference audio data by the voice enabling apparatus is implemented as
fixing the second sound source acquisition component of the voice enabling apparatus near the loudspeaker of the main device, and picking up the reference audio of the main device through the second sound source acquisition component.
10. The method according to claim 9, characterized in that denoising the sound source audio data according to the sound source audio data and the reference audio data comprises:
converting the sound source audio data and the reference audio data into digital signals respectively;
performing a subtraction on the converted digital signals; and
converting the digital signal obtained from the subtraction into an analog signal to obtain the denoised sound source audio data.
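The three steps of claim 10 can be sketched as below. This is a simplified illustration under stated assumptions: the helper names `adc`, `dac`, and `denoise`, the 16-bit resolution, and the modeling of analog samples as floats in [-1.0, 1.0] are hypothetical. The sketch also assumes the reference signal is already time-aligned and level-matched with the echo in the source signal, which the claim does not address; practical echo cancellers typically estimate that relationship with an adaptive filter before subtracting.

```python
from typing import List, Sequence


def adc(samples: Sequence[float], bits: int = 16) -> List[int]:
    """Quantize analog samples in [-1.0, 1.0] to signed integer codes."""
    full_scale = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * full_scale) for s in samples]


def dac(codes: Sequence[int], bits: int = 16) -> List[float]:
    """Map integer codes back to analog values, clipped to [-1.0, 1.0]."""
    full_scale = 2 ** (bits - 1) - 1
    return [max(-1.0, min(1.0, c / full_scale)) for c in codes]


def denoise(source: Sequence[float], reference: Sequence[float],
            bits: int = 16) -> List[float]:
    """Convert to digital, subtract reference from source, convert back."""
    if len(source) != len(reference):
        raise ValueError("source and reference must have the same length")
    src_codes = adc(source, bits)
    ref_codes = adc(reference, bits)
    # Sample-wise subtraction removes the reference (loudspeaker) content.
    diff = [s - r for s, r in zip(src_codes, ref_codes)]
    return dac(diff, bits)


# Usage: the source contains speech plus the loudspeaker signal.
cleaned = denoise([0.5, -0.25, 0.75], [0.25, -0.25, 0.5])
```

The result differs from the ideal difference only by quantization error, which shrinks as the bit depth increases.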
CN201811644724.4A 2018-12-29 2018-12-29 Voice enabling device and method Active CN109473111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811644724.4A CN109473111B (en) 2018-12-29 2018-12-29 Voice enabling device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811644724.4A CN109473111B (en) 2018-12-29 2018-12-29 Voice enabling device and method

Publications (2)

Publication Number Publication Date
CN109473111A true CN109473111A (en) 2019-03-15
CN109473111B CN109473111B (en) 2024-03-08

Family

ID=65678383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811644724.4A Active CN109473111B (en) 2018-12-29 2018-12-29 Voice enabling device and method

Country Status (1)

Country Link
CN (1) CN109473111B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110213696A (en) * 2019-06-30 2019-09-06 联想(北京)有限公司 Audio frequency apparatus, signal processing method and system
CN110265029A (en) * 2019-06-21 2019-09-20 百度在线网络技术(北京)有限公司 Speech chip and electronic equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246687A (en) * 2008-03-20 2008-08-20 北京航空航天大学 Intelligent voice interaction system and method thereof
CN101753871A (en) * 2008-11-28 2010-06-23 康佳集团股份有限公司 Voice remote control TV system
US20120233765A1 (en) * 2007-07-31 2012-09-20 Mitchell Altman System and Method for Controlling the Environment of a Steambath
CN202721771U (en) * 2012-04-24 2013-02-06 青岛海尔电子有限公司 Television system with audio recognition function
CN107566874A (en) * 2017-09-22 2018-01-09 百度在线网络技术(北京)有限公司 Far field speech control system based on television equipment
CN207603830U (en) * 2017-12-05 2018-07-10 炬芯(珠海)科技有限公司 A kind of household electrical appliance intelligent voice system
CN108364648A (en) * 2018-02-11 2018-08-03 北京百度网讯科技有限公司 Method and device for obtaining audio-frequency information
CN108447483A (en) * 2018-05-18 2018-08-24 深圳市亿道数码技术有限公司 Speech recognition system
CN108538305A (en) * 2018-04-20 2018-09-14 百度在线网络技术(北京)有限公司 Audio recognition method, device, equipment and computer readable storage medium
CN208256287U (en) * 2017-09-29 2018-12-18 杭州聪普智能科技有限公司 Control device and smart home device based on speech recognition
CN209515191U (en) * 2018-12-29 2019-10-18 苏州思必驰信息科技有限公司 A kind of voice enabling apparatus

Also Published As

Publication number Publication date
CN109473111B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
US10438607B2 (en) Device and method for cancelling echo
JP7434137B2 (en) Speech recognition method, device, equipment and computer readable storage medium
US11295760B2 (en) Method, apparatus, system and storage medium for implementing a far-field speech function
CN110288997A (en) Equipment awakening method and system for acoustics networking
CN108681440A (en) A kind of smart machine method for controlling volume and system
US11997448B2 (en) Multi-modal audio processing for voice-controlled devices
US10923138B2 (en) Sound collection apparatus for far-field voice
CN109817238A (en) Audio signal sample device, acoustic signal processing method and device
CN205004033U (en) Cloud intelligence speech recognition PA -system
US10667045B1 (en) Robot and auto data processing method thereof
CN110349582A (en) Display device and far field speech processing circuit
CN110875045A (en) Voice recognition method, intelligent device and intelligent television
CN103885744A (en) Sound based gesture recognition method
CN109473111A (en) A kind of voice enabling apparatus and method
CN110992967A (en) Voice signal processing method and device, hearing aid and storage medium
CN114640938A (en) Hearing aid function implementation method based on Bluetooth headset chip and Bluetooth headset
CN109524004A (en) The voice interaction device and system of a kind of method of parallel transmission that realizing MCVF multichannel voice frequency and data, circumscribed
CN209515191U (en) A kind of voice enabling apparatus
US10747494B2 (en) Robot and speech interaction recognition rate improvement circuit and method thereof
WO2017000772A1 (en) Front-end audio processing system
CN109697987A (en) A kind of the far field voice interaction device and implementation method of circumscribed
CN208094741U (en) A kind of intelligent microphone based on speech recognition technology
CN110517682A (en) Audio recognition method, device, equipment and storage medium
CN207039811U (en) A kind of multimedia microphone Intelligent Measurement audio amplifier
US20190152061A1 (en) Motion control method and device, and robot with enhanced motion control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Co.,Ltd.

GR01 Patent grant