CN108573699A - Voice sharing recognition methods - Google Patents

Voice sharing recognition methods

Info

Publication number
CN108573699A
Authority
CN
China
Prior art keywords
signal
speech recognition
voice
voice signal
background service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710144058.7A
Other languages
Chinese (zh)
Inventor
陈新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201710144058.7A
Publication of CN108573699A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02163 Only one microphone
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice sharing recognition method. When multiple speech recognition devices are present in the same environment, each device is triggered by a voice signal, a proximity sensor, a button, or a remote-control signal. The devices can improve the speech recognition rate through a background service device or by cooperating with one another, and one or more designated speech recognition devices respond with the service data. The background service device distinguishes the source and position of each voice signal according to the device identifier and/or network address of each speech recognition device, and preselects a speech recognition device according to the highest recognition matching rate. It aligns all voice signals on the same starting feature, merges and tunes the signals and fills in missing segments, and generates a better voice signal and a new recognition matching rate. Finally, it calculates from the strength, sound source position, and features of the voice signals which speech recognition device captured the best signal, and that device continues the multi-turn interaction.

Description

Voice sharing recognition methods
Technical field
The present invention relates to a voice sharing recognition method. It is a processing mechanism for the case where multiple speech recognition devices detect a voice signal at the same time: it governs how the devices cooperate to determine the complete voice signal and how one or more designated devices respond.
Technical background
A common speech recognition device can only be triggered by a specific voice signal and then respond. Its recognition rate drops sharply when the voice source is more than 5 meters away, so responding to the user's voice at any time in a large environment requires multiple speech recognition devices. At present each device recognizes and responds independently, so the devices interfere with one another and the user experience is poor. If the devices could coordinate, recognition could be strengthened by using the voice signals captured over multiple channels, and the device that best matches the user could be found from parameters such as signal strength, direction, and distance and made to respond, which would greatly improve the speech recognition service experience.
Summary of the invention
The invention discloses a voice sharing recognition method, characterized in that it applies to speech recognition devices that have voice triggering and that uniformly send the captured voice signals to a designated background service device for processing. When multiple speech recognition devices are present in the same environment, the background service device can use the voice sharing recognition method to improve the speech recognition rate, find the relevant service data, and designate one or more speech recognition devices to respond. The method is as follows. When the background service device receives data in the idle state, it starts timing from the moment it receives the voice signal sent by the first speech recognition device and, within a specific time range, waits to receive the voice signals sent by the other speech recognition devices. The background service device distinguishes the source of each voice signal according to the device identifier and/or network address of the speech recognition device. It recognizes each voice signal, selects the speech recognition device corresponding to the preliminary result with the highest recognition matching rate, and sets that device as the preliminary response device. If the highest recognition matching rate does not reach the configured minimum requirement, the background service device aligns all voice signals on the same starting feature, merges and tunes the signals and fills in missing segments, and recognizes the resulting better voice signal. If the new recognition result differs and has a higher recognition matching rate, it replaces the preliminary result; the background service device also calculates, from the strength, sound source position, and features of the voice signals, which speech recognition device captured the best signal, and that device replaces the preliminary response device. The recognition result is received by the finally determined response device, and the subsequent processing flow continues. When multiple speech recognition devices are on the same network, the background service device sees the same external network address for all of them, but because the intranet address of each speech recognition device differs, it can still distinguish each voice signal source by intranet address.
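As a non-limiting illustration, the timing window, source distinction, and preliminary selection described above could be sketched as follows. The `Capture` record, the function names, and the parameters `window_s` and `min_rate` are assumptions made for this sketch and do not appear in the application; the application specifies neither the window length nor the minimum matching rate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Capture:
    """One voice signal as seen by the background service (hypothetical shape)."""
    device_id: str      # device identifier set by the user
    intranet_addr: str  # differs per device even behind one external address
    arrival: float      # arrival time in seconds
    match_rate: float   # recognition matching rate, assumed to lie in [0, 1]
    text: str           # preliminary recognition result


def source_key(c: Capture) -> Tuple[str, str]:
    """Distinguish signal sources by device identifier and network address."""
    return (c.device_id, c.intranet_addr)


def collect_window(captures: List[Capture],
                   window_s: float) -> Dict[Tuple[str, str], Capture]:
    """Keep the captures that arrive within window_s of the first one."""
    if not captures:
        return {}
    ordered = sorted(captures, key=lambda c: c.arrival)
    start = ordered[0].arrival  # timing starts at the first received signal
    kept: Dict[Tuple[str, str], Capture] = {}
    for c in ordered:
        if c.arrival - start <= window_s:
            kept.setdefault(source_key(c), c)  # one capture per source
    return kept


def preliminary_choice(kept: Dict[Tuple[str, str], Capture],
                       min_rate: float) -> Tuple[Optional[Capture], bool]:
    """Return the best capture and whether merging is still required."""
    if not kept:
        return None, False
    best = max(kept.values(), key=lambda c: c.match_rate)
    return best, best.match_rate < min_rate
```

In this sketch the device behind the returned capture plays the role of the preliminary response device, and the boolean tells the caller whether the highest matching rate fell short of the minimum requirement so that alignment and merging should follow.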
The voice sharing recognition method is characterized in that the voice triggering of the speech recognition device is implemented by capturing the voice signal in real time through a single microphone. The speech recognition device captures the voice signal in a low-power mode and monitors its sustained amplitude; when the sustained amplitude exceeds a set value, the device activates and enters real-time capture. It then judges, according to a local recognition library, whether the voice signal is a trigger signal or a local execution instruction. If it is neither, the device ignores the current voice signal and continues capturing. If it is a trigger signal, the device first issues a response signal, then continues to capture the subsequent voice signal and sends it to the background service device. If it is a local execution instruction, the device first issues a response signal and then performs the local control operation.
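As a non-limiting illustration, the sustained-amplitude activation and the local classification described above could be sketched as follows. The per-frame amplitude list, the `min_frames` count, and the phrase sets standing in for the local recognition library are assumptions made for this sketch; the application does not state how "sustained" is measured.

```python
from typing import Optional, Sequence, Set


def first_activation(frame_amplitudes: Sequence[float],
                     threshold: float,
                     min_frames: int) -> Optional[int]:
    """Return the frame index at which real-time capture would start.

    Activation requires min_frames consecutive frames above threshold,
    which is one possible reading of "sustained amplitude".
    """
    run = 0
    for i, amp in enumerate(frame_amplitudes):
        run = run + 1 if amp > threshold else 0
        if run >= min_frames:
            return i
    return None


def classify(phrase: str, triggers: Set[str], local_commands: Set[str]) -> str:
    """Stand-in for the local recognition library lookup."""
    if phrase in triggers:
        return "trigger"  # respond, then forward later speech to the service
    if phrase in local_commands:
        return "local"    # respond, then perform the local control operation
    return "ignore"       # discard the current signal and keep capturing
```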
The voice sharing recognition method is characterized in that the voice triggering of the speech recognition device is implemented by capturing the voice signal in real time through a microphone array. The speech recognition device calculates the sound source position from the voice signals of the microphone array, reinforces the microphone signal at the corresponding position, and attenuates the microphone signals at the other positions, finally generating a high-quality voice signal together with sound source position information. The speech recognition device judges, according to a local recognition library, whether the voice signal is a trigger signal or a local execution instruction. If it is neither, the device ignores the voice signal and continues capturing. If it is a trigger signal, the device first issues a response signal and sends the voice signal and the sound source position information to the background service device, while continuing to capture the subsequent voice signal and sending it continuously to the background service device. If it is a local execution instruction, the device first issues a response signal and then performs the local control operation.
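As a non-limiting illustration, the reinforce-and-attenuate step described above could be sketched as follows. This is a deliberate simplification: it stands in for real sound source localization by assuming that the microphone with the highest signal energy faces the speaker, and it is not a beamforming implementation. The function names and the `boost` and `cut` weights are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one channel (assumes a non-empty channel)."""
    return sum(s * s for s in samples) / len(samples)


def reinforce(channels: Sequence[Sequence[float]],
              boost: float = 1.0,
              cut: float = 0.2) -> Tuple[int, List[float]]:
    """Return (index of the strongest microphone, weighted mix of all channels).

    The strongest channel is weighted by boost and every other channel by cut,
    so the presumed speaker direction dominates the output.
    """
    energies = [energy(ch) for ch in channels]
    lead = max(range(len(channels)), key=energies.__getitem__)
    weights = [boost if i == lead else cut for i in range(len(channels))]
    total = sum(weights)
    n = min(len(ch) for ch in channels)  # mix only the overlapping samples
    mixed = [
        sum(w * ch[t] for w, ch in zip(weights, channels)) / total
        for t in range(n)
    ]
    return lead, mixed
```

The returned index is only a coarse proxy for the sound source position information mentioned in the text; a real array would estimate direction from inter-microphone time differences.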
The voice sharing recognition method is characterized in that the voice triggering of the speech recognition device is implemented by capturing signals in real time through a human body sensor. When a person approaches or touches the speech recognition device, the human body sensor is triggered and issues a trigger signal; the speech recognition device first issues a response signal, then continues to capture the subsequent voice signal and sends it to the background service device.
The voice sharing recognition method is characterized in that the voice triggering of the speech recognition device is implemented by a button. When the button is pressed, a trigger signal is issued; the speech recognition device first issues a response signal, then continues to capture the subsequent voice signal and sends it to the background service device.
The voice sharing recognition method is characterized in that the voice triggering of the speech recognition device is implemented by a wireless receiver. When the wireless receiver receives a specific wireless signal, a trigger signal is issued; the speech recognition device first issues a response signal, then continues to capture the subsequent voice signal and sends it to the background service device.
The voice sharing recognition method is characterized in that, after receiving the trigger signal, the speech recognition device performs local recognition and at the same time sends the trigger signal to the background service device. According to the trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to issue a response signal, while all triggered speech recognition devices continue to capture the subsequent voice signal and send it to the background service device.
The voice sharing recognition method is characterized in that, after receiving the trigger signal, the speech recognition device performs local recognition and at the same time sends the trigger signal to the background service device. According to the trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to issue a response signal; those devices continue to capture the subsequent voice signal and send it to the background service device, whereas the other speech recognition devices, which are not allowed to respond, do not continue to capture the subsequent voice signal.
The voice sharing recognition method is characterized in that, after receiving the trigger signal, the speech recognition device first performs local recognition and sends the trigger signal to the background service device only after recognition succeeds. According to the trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to issue a response signal; those devices continue to capture the subsequent voice signal and send it to the background service device, whereas the other speech recognition devices, which are not allowed to respond, do not continue to capture the subsequent voice signal.
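As a non-limiting illustration, the arbitration common to the three preceding variants could be sketched as follows. The `Trigger` record, the numeric `quality` score, the tie-break by earlier arrival, and the `all_continue` flag that separates the first variant from the other two are assumptions made for this sketch; the application does not define how quality and arrival time are weighed against each other.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trigger:
    device_id: str
    quality: float  # hypothetical score, higher means a cleaner trigger signal
    arrival: float  # arrival time at the background service, in seconds


def arbitrate(triggers: List[Trigger], max_responders: int = 1) -> List[str]:
    """Return the devices allowed to issue a response signal.

    Devices are ranked by quality first and by earlier arrival second.
    """
    ranked = sorted(triggers, key=lambda t: (-t.quality, t.arrival))
    return [t.device_id for t in ranked[:max_responders]]


def may_keep_capturing(device_id: str,
                       allowed: List[str],
                       all_continue: bool) -> bool:
    """Decide whether a triggered device keeps capturing subsequent speech.

    all_continue=True models the variant in which every triggered device
    keeps sending audio; False models the variants in which only the
    permitted responders do.
    """
    return all_continue or device_id in allowed
```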
The voice sharing recognition method is characterized in that, after the speech recognition device has received the trigger signal, passed it to the background service device, and been allowed to issue a response signal, it sends the subsequently captured voice signal to the background service device. The returned data is presented in one or more of the following ways: playing sound, controlling lighting, transmitting infrared forwarding data, transmitting wireless data, or executing a local control instruction. For a certain period of time the device continues to capture voice signals and send them back to the background service device, forming a multi-turn interaction in which no trigger signal is needed. Only if no voice signal is detected within the timeout period does the speech recognition device return to the waiting-for-trigger state, after which a trigger signal is required to enter the interaction flow again.
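As a non-limiting illustration, the interaction flow with its timeout could be sketched as a two-state machine. The class name, the state labels, and the `timeout_s` parameter are assumptions made for this sketch; the application does not give a timeout value.

```python
from typing import Optional


class Session:
    """Tracks whether a device is waiting for a trigger or interacting."""

    WAITING = "waiting_for_trigger"
    INTERACTING = "interacting"

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.state = self.WAITING
        self.last_voice: Optional[float] = None

    def on_trigger(self, now: float) -> None:
        """A trigger signal (voice, sensor, button, wireless) opens a session."""
        self.state = self.INTERACTING
        self.last_voice = now

    def tick(self, now: float) -> None:
        """Fall back to the waiting state once the silence exceeds the timeout."""
        if (self.state == self.INTERACTING
                and self.last_voice is not None
                and now - self.last_voice > self.timeout_s):
            self.state = self.WAITING

    def on_voice(self, now: float) -> bool:
        """Return True if this speech is forwarded without a new trigger."""
        self.tick(now)
        if self.state != self.INTERACTING:
            return False  # a trigger signal is needed first
        self.last_voice = now
        return True
```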
Detailed description of embodiments
A specific embodiment of the voice sharing recognition method of the present invention is as follows. The main controller of the speech recognition device is a programmable high-speed processor with a DSP and a built-in storage unit, externally connected to a connection module with infrared and wireless functions as well as a trigger button and an infrared proximity sensor. On first use, the user must set the position and device identifier of the speech recognition device through an application on an external device, and must register and log in to a background service account, before the device can be used normally.
When the main controller receives a trigger signal from the button, the infrared proximity sensor, or a voice signal, it issues sound and light response signals, continues to detect the voice signal, and sends it to the background service device for processing. The service data received is presented by playing sound, controlling lights, transmitting infrared signals, or transmitting wireless signals.
When there are multiple speech recognition devices, they are coordinated by the background service device or coordinate among themselves, so that only the speech recognition device nearest to the user, or the one that captured the best signal, responds to the user's voice and gives feedback.
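As a non-limiting illustration, the way a background service device might combine the recordings of several devices and then pick the responding device could be sketched as follows. The amplitude-onset rule used as the shared starting feature, the averaging used for merging, and the mean absolute amplitude used as the strength measure are assumptions made for this sketch; the application names the steps (alignment, merging and tuning, segment filling, selection by strength) without specifying the signal processing.

```python
from typing import Dict, List, Optional, Sequence


def onset(samples: Sequence[float], level: float) -> Optional[int]:
    """Index of the first sample whose magnitude reaches level."""
    for i, s in enumerate(samples):
        if abs(s) >= level:
            return i
    return None


def align_and_merge(signals: Dict[str, List[float]], level: float) -> List[float]:
    """Align every recording on its onset, then average sample by sample.

    Where a shorter recording has ended, the remaining recordings fill in
    the missing segment on their own.
    """
    aligned = []
    for samples in signals.values():
        start = onset(samples, level)
        if start is not None:  # skip recordings that never reach the level
            aligned.append(samples[start:])
    if not aligned:
        return []
    length = max(len(a) for a in aligned)
    merged = []
    for t in range(length):
        present = [a[t] for a in aligned if t < len(a)]
        merged.append(sum(present) / len(present))
    return merged


def best_device(signals: Dict[str, List[float]]) -> str:
    """Pick the device whose recording has the highest mean absolute amplitude.

    Raises ValueError when signals is empty.
    """
    def strength(device: str) -> float:
        samples = signals[device]
        return sum(abs(s) for s in samples) / max(len(samples), 1)

    return max(signals, key=strength)
```

A production system would more plausibly align by cross-correlation and weight each recording by its signal quality; the plain average here only shows where the steps fit together.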

Claims (10)

1. A voice sharing recognition method, characterized in that it applies to speech recognition devices that have voice triggering and that uniformly send the captured voice signals to a designated background service device for processing; when multiple speech recognition devices are present in the same environment, the background service device can use the voice sharing recognition method to improve the speech recognition rate, find the relevant service data, and designate one or more speech recognition devices to respond; in the voice sharing recognition method, the background service device, on receiving data in the idle state, starts timing from the moment it receives the voice signal sent by the first speech recognition device and, within a specific time range, waits to receive the voice signals sent by the other speech recognition devices; the background service device distinguishes the source of each voice signal according to the device identifier and/or network address of the speech recognition device; the background service device recognizes each voice signal, selects the speech recognition device corresponding to the preliminary result with the highest recognition matching rate, and sets that device as the preliminary response device; if the highest recognition matching rate does not reach the configured minimum requirement, the background service device aligns all voice signals on the same starting feature, merges and tunes the signals and fills in missing segments, and recognizes the resulting better voice signal; if the new recognition result differs and has a higher recognition matching rate, the new recognition result replaces the preliminary result, and the speech recognition device that captured the best signal, calculated from the strength, sound source position, and features of the voice signals, replaces the preliminary response device; the recognition result is received by the finally determined response device, and the subsequent processing flow continues; when multiple speech recognition devices are on the same network, the background service device sees the same external network address for all of them, but because the intranet address of each speech recognition device differs, it can still distinguish each voice signal source by intranet address.
2. The voice sharing recognition method according to claim 1, characterized in that the voice triggering of the speech recognition device is implemented by capturing the voice signal in real time through a single microphone; the speech recognition device captures the voice signal in a low-power mode and, according to the sustained amplitude of the voice signal, activates and enters real-time capture when the sustained amplitude exceeds a set value; it judges according to a local recognition library whether the voice signal is a trigger signal or a local execution instruction; if it is neither a trigger signal nor a local execution instruction, the current voice signal is ignored and capture continues; if it is a trigger signal, the device first issues a response signal, then continues to capture the subsequent voice signal and sends it to the background service device; if it is a local execution instruction, the device first issues a response signal and then performs the local control operation.
3. The voice sharing recognition method according to claim 1, characterized in that the voice triggering of the speech recognition device is implemented by capturing the voice signal in real time through a microphone array; the speech recognition device calculates the sound source position from the voice signals of the microphone array, reinforces the microphone signal at the corresponding position, and attenuates the microphone signals at the other positions, finally generating a high-quality voice signal and sound source position information; the speech recognition device judges according to a local recognition library whether the voice signal is a trigger signal or a local execution instruction; if it is neither a trigger signal nor a local execution instruction, the voice signal is ignored and capture continues; if it is a trigger signal, the device first issues a response signal and sends the voice signal and the sound source position information to the background service device, while continuing to capture the subsequent voice signal and sending it continuously to the background service device; if it is a local execution instruction, the device first issues a response signal and then performs the local control operation.
4. The voice sharing recognition method according to claim 1, characterized in that the voice triggering of the speech recognition device is implemented by capturing signals in real time through a human body sensor; when a person approaches or touches the speech recognition device, the human body sensor is triggered and issues a trigger signal; the speech recognition device first issues a response signal, then continues to capture the subsequent voice signal and sends it to the background service device.
5. The voice sharing recognition method according to claim 1, characterized in that the voice triggering of the speech recognition device is implemented by a button; when the button is pressed, a trigger signal is issued; the speech recognition device first issues a response signal, then continues to capture the subsequent voice signal and sends it to the background service device.
6. The voice sharing recognition method according to claim 1, characterized in that the voice triggering of the speech recognition device is implemented by a wireless receiver; when the wireless receiver receives a specific wireless signal, a trigger signal is issued; the speech recognition device first issues a response signal, then continues to capture the subsequent voice signal and sends it to the background service device.
7. The voice sharing recognition method according to any one of claims 1 to 6, characterized in that, after receiving the trigger signal, the speech recognition device performs local recognition and at the same time sends the trigger signal to the background service device; according to the trigger signal quality, the background service device allows only one or more speech recognition devices to issue a response signal; all triggered speech recognition devices can continue to capture the subsequent voice signal and send it to the background service device.
8. The voice sharing recognition method according to claim 7, characterized in that, after receiving the trigger signal, the speech recognition device performs local recognition and at the same time sends the trigger signal to the background service device; according to the trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to issue a response signal, and those devices continue to capture the subsequent voice signal and send it to the background service device; the other speech recognition devices, which are not allowed to respond, do not continue to capture the subsequent voice signal.
9. The voice sharing recognition method according to claim 8, characterized in that, after receiving the trigger signal, the speech recognition device first performs local recognition and sends the trigger signal to the background service device only after recognition succeeds; according to the trigger signal quality and/or arrival time, the background service device allows only one or more speech recognition devices to issue a response signal, and those devices continue to capture the subsequent voice signal and send it to the background service device; the other speech recognition devices, which are not allowed to respond, do not continue to capture the subsequent voice signal.
10. The voice sharing recognition method according to claim 9, characterized in that, after the speech recognition device has received the trigger signal, passed it to the background service device, and been allowed to issue a response signal, it sends the subsequently captured voice signal to the background service device; the returned data is presented in one or more of the following ways: playing sound, controlling lighting, transmitting infrared forwarding data, transmitting wireless data, or executing a local control instruction; for a certain period of time the device continues to capture voice signals and send them back to the background service device, forming a multi-turn interaction in which no trigger signal is needed; only if no voice signal is detected within the timeout period does the speech recognition device return to the waiting-for-trigger state, after which a trigger signal is required to enter the interaction flow again.
CN201710144058.7A 2017-03-13 2017-03-13 Voice sharing recognition methods Pending CN108573699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710144058.7A CN108573699A (en) 2017-03-13 2017-03-13 Voice sharing recognition methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710144058.7A CN108573699A (en) 2017-03-13 2017-03-13 Voice sharing recognition methods

Publications (1)

Publication Number Publication Date
CN108573699A true CN108573699A (en) 2018-09-25

Family

ID=63577952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710144058.7A Pending CN108573699A (en) 2017-03-13 2017-03-13 Voice sharing recognition methods

Country Status (1)

Country Link
CN (1) CN108573699A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111370012A (en) * 2020-05-27 2020-07-03 北京小米移动软件有限公司 Bluetooth voice audio acquisition method and system
CN112820287A (en) * 2020-12-31 2021-05-18 乐鑫信息科技(上海)股份有限公司 Distributed speech processing system and method

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180925