CN110517678A - An AI voice response system based on visual sensing - Google Patents
- Publication number
- CN110517678A (application CN201910804779.5A)
- Authority
- CN
- China
- Prior art keywords
- voice
- equipment
- people
- visual response
- phone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Alarm Systems (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to an AI voice response system based on visual sensing, comprising a voice output device, a voice input device, a voice conversion device, and a voice response device, together with a person-count visual sensing device, a phone visual sensing device, a mouth-shape visual sensing device, an earphone visual sensing device, and a monitoring device. The user inputs speech through the voice input device; the voice conversion device converts the input between analog and digital signals; and the voice response device judges whether the input is the specific voice. If it is, the system responds and enters AI dialogue mode through the voice output device. If it is not, the input is handled as an other-voice response: the monitoring device is started, and whether to respond is decided from the information produced by the mouth-shape, phone, and person-count visual sensing devices. Only when all three judge YES does the system respond and enter AI dialogue mode through the voice output device.
Description
Technical field
The present invention relates to an artificial-intelligence voice response system, specifically an AI voice response system based on visual sensing.
Background art
A smart speaker is an upgraded loudspeaker: a tool that lets household consumers go online by voice, for example to request songs, shop online, or check the weather forecast. It can also control smart home devices, for example opening curtains, setting the refrigerator temperature, or heating the water heater in advance.
Smart speakers belong to intelligent voice technology, whose core aim is simple: to give a machine a human-like ability in voice dialogue. Smart speakers have become as common as small household appliances and have penetrated daily living space, but the response systems of current intelligent voice technology are not fully satisfactory at simulating people's daily habits and behavior.
The response system of current intelligent voice technology requires the user to say a specific word, and the smart speaker responds to that word, which is usually the name of the smart speaker. In everyday conversation, however, people talking face to face seldom say the other person's name before speaking. The requirement therefore does not match people's daily habits and behavior, and this is the deficiency of the prior art.
Summary of the invention
To solve the above problem in the prior art, the technical solution adopted by the present invention is an AI voice response system based on visual sensing, characterized in that it comprises a voice output device, a voice input device, a voice conversion device, and a voice response device, together with a person-count visual sensing device, a phone visual sensing device, a mouth-shape visual sensing device, and a monitoring device.
The invention may also be described as an intelligent voice interaction platform. The monitoring device is installed in the area where responses are needed and monitors that area in real time.
In the monitoring device, a camera capable of 360° rotation performs panoramic video monitoring of the response area.
The invention may also be described as an AI voice judgment dialogue system. The voice output device is connected to the voice conversion device and is the output device that produces speech.
In the voice output device, a dynamic (moving-coil) loudspeaker is provided, which uses the force between the voice coil and a constant magnetic field to vibrate the diaphragm and produce sound.
In the voice output device, a cone loudspeaker is provided, whose diaphragm is made mainly of paper pulp, optionally blended with wool, silk, or carbon fiber to increase its rigidity, internal damping, and water resistance.
In the voice output device, a frequency divider (crossover) is provided. It is a power divider, also called a passive post-stage frequency divider, which divides the signal after the power amplifier. It consists mainly of passive components (inductors, resistors, and capacitors) that form a filter network, and it sends the audio signal of each frequency band to the loudspeaker for that band for reproduction.
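As a rough illustration of how such a passive filter network can be sized, the sketch below computes component values for a two-way first-order network from the standard relations L = R/(2πf) and C = 1/(2πfR). The patent does not state a filter order, a crossover frequency, or a loudspeaker impedance; the function name, the 3000 Hz crossover point, and the 8 Ω impedance are assumptions chosen only for the example.

```python
import math


def first_order_crossover(crossover_hz, speaker_ohms):
    """Return (inductor_henries, capacitor_farads) for a two-way first-order network.

    Assumes a purely resistive loudspeaker load, which is a simplification.
    """
    omega = 2 * math.pi * crossover_hz
    inductor = speaker_ohms / omega           # series inductor: low-pass branch for the woofer
    capacitor = 1 / (omega * speaker_ohms)    # series capacitor: high-pass branch for the tweeter
    return inductor, capacitor


# Illustrative values only: 3000 Hz crossover into 8-ohm drivers.
inductor, capacitor = first_order_crossover(3000.0, 8.0)
print(f"woofer inductor: {inductor * 1e3:.3f} mH")
print(f"tweeter capacitor: {capacitor * 1e6:.2f} uF")
```

Higher-order networks add more inductors and capacitors per branch, but the idea of routing each band to its own loudspeaker is the same.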
The invention may also be described as an artificial-intelligence voice response interaction platform. The voice input device is connected to the voice conversion device and is the human-machine interface device through which a person's speech is input directly to the computer.
The invention may also be described as an AI voice technology response system. The voice conversion device is connected to the voice input device and the voice output device and converts speech between analog and digital signals: the characteristic information of the speech from the voice input device (changes in frequency, period, tone, and so on) is digitized and recorded in the computer, or information in the computer is converted into the characteristic information of speech for output.
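The digitizing step can be pictured with a small sketch: a continuous signal is sampled and quantized, and one characteristic (its frequency) is then recovered from the digital samples. The patent does not specify a sample rate, bit depth, or analysis method; the 8000 Hz rate, 16-bit depth, zero-crossing estimate, and all function names here are assumptions for illustration.

```python
import math


def digitize(signal, sample_rate, duration_s, bits=16):
    """Sample a continuous signal(t) in [-1, 1] and quantize it to signed integers."""
    full_scale = 2 ** (bits - 1) - 1
    count = int(sample_rate * duration_s)
    return [round(signal(n / sample_rate) * full_scale) for n in range(count)]


def estimate_frequency(samples, sample_rate):
    """Estimate the dominant frequency from the spacing of rising zero crossings."""
    rising = [n for n in range(1, len(samples)) if samples[n - 1] < 0 <= samples[n]]
    if len(rising) < 2:
        return 0.0
    periods = len(rising) - 1
    return periods * sample_rate / (rising[-1] - rising[0])


# A 200 Hz test tone stands in for the analog microphone signal.
tone = lambda t: math.sin(2 * math.pi * 200.0 * t)
samples = digitize(tone, sample_rate=8000, duration_s=0.1)
print(len(samples), "samples, estimated frequency:", estimate_frequency(samples, 8000))
```

A real speech front end would use richer features than a single zero-crossing estimate; the sketch only shows the analog-to-digital direction described above.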
The mouth-shape visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges whether the mouth of the person in the video is still; if the mouth is still, the system does not respond.
In the mouth-shape visual sensing device, a face recognition system is provided. Within the recognized face region, the lip region is detected by setting a threshold for a specific color, and the previous and next video frames are compared: if the lip boundaries do not coincide, the person's mouth is not still.
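The color-threshold and frame-comparison idea can be sketched in a few lines. This is a toy version under stated assumptions: the face region is given as rows of (red, green, blue) tuples, the threshold values are invented for the example, and the boundaries must match exactly, whereas a practical system would tolerate small differences. None of the names come from the patent.

```python
def lip_mask(face_pixels, min_red=150, margin=50):
    """Return the set of (row, col) positions whose color passes the lip threshold."""
    return {
        (r, c)
        for r, row in enumerate(face_pixels)
        for c, (red, green, blue) in enumerate(row)
        if red >= min_red and red - green >= margin and red - blue >= margin
    }


def boundary(mask):
    """Keep the mask pixels that touch at least one non-mask 4-neighbor."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c)
        for (r, c) in mask
        if any((r + dr, c + dc) not in mask for dr, dc in steps)
    }


def mouth_is_still(previous_face, next_face):
    """The mouth counts as still only when the lip boundaries of both frames coincide."""
    return boundary(lip_mask(previous_face)) == boundary(lip_mask(next_face))


# Two tiny synthetic face regions: lips closed, then lips parted.
SKIN, LIP = (200, 170, 150), (190, 60, 70)


def face(lip_rows):
    return [[LIP if r in lip_rows and 1 <= c <= 4 else SKIN for c in range(6)]
            for r in range(5)]


print(mouth_is_still(face({2}), face({2})))     # same lip boundary in both frames
print(mouth_is_still(face({2}), face({1, 3})))  # lip boundary moved between frames
```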
The person-count visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges how many people are in the video; if there are two or more, the system does not respond.
In the person-count visual sensing device, a counter is provided: if the counter is 1 the system responds, and if the counter is greater than 1 it does not.
The phone visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges whether the person in the video is holding a phone or wearing an earphone; if so, the system does not respond.
In the phone visual sensing device, a three-dimensional model library of mobile phones and fixed-line telephones is provided. The person's hand is recognized, the object in the hand is compared against the model library, and the device thereby judges whether the object is a phone.
In the phone visual sensing device, a three-dimensional model library of Bluetooth earphones and ordinary earphones is also provided. The person's ear is recognized, the object worn on the ear is compared against the model library, and the device thereby judges whether the object is an earphone.
The voice response device is connected to the voice input device and the voice conversion device and is the device that generates a response to speech. Voice responses are of two kinds: the specific-voice response and the other-voice response. In a specific-voice response, the voice response device responds as soon as it receives the specific voice and enters dialogue mode through the voice output device. In an other-voice response, the voice response device has received speech other than the specific voice; the monitoring device is then started, and whether to respond is decided from the information produced by the mouth-shape, phone, and person-count visual sensing devices. Only when the person-count, phone, and mouth-shape visual sensing devices all judge YES does the system enter AI dialogue mode through the voice output device.
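The two kinds of response can be summarized as a small decision function. The sketch below is one possible reading, not the patented implementation: the class, field, and function names are hypothetical, and each visual sensing device is reduced to a single YES/NO flag.

```python
from dataclasses import dataclass


@dataclass
class VisualJudgments:
    """YES/NO outputs of the three visual sensing devices (hypothetical structure)."""
    mouth_moving: bool          # mouth-shape device: YES when the mouth is not still
    single_person: bool         # person-count device: YES when exactly one person is seen
    no_phone_or_earphone: bool  # phone device: YES when no phone is held and no earphone worn


def should_respond(heard_specific_voice, observe):
    """Decide whether to enter AI dialogue mode.

    `observe` is a zero-argument callable that starts the monitoring device and
    returns a VisualJudgments; it is only called for speech other than the
    specific voice.
    """
    if heard_specific_voice:
        return True  # specific-voice response: no visual check is needed
    judgments = observe()  # other-voice response: start the monitoring device
    return all((judgments.mouth_moving,
                judgments.single_person,
                judgments.no_phone_or_earphone))


# One person, speaking, no phone: the system responds without the specific voice.
print(should_respond(False, lambda: VisualJudgments(True, True, True)))
```

Passing the camera step as a callable mirrors the text: the monitoring device is started only when the speech is not the specific voice.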
The workflow of the voice response is as follows. The user inputs speech through the voice input device; the voice conversion device converts the input between analog and digital signals; and the voice response device judges whether the input is the specific voice. If it is, the system responds and enters AI dialogue mode through the voice output device; if it is not, the input is handled as an other-voice response.
The workflow of the other-voice response is as follows. First, the mouth-shape visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the mouth of the person in the video is still; if it is still, the system does not respond. If it is not still, the person-count visual sensing device judges from the video how many people are present; if there are two or more, the system does not respond. If there is one person, the phone visual sensing device judges from the video whether that person is holding a phone or wearing an earphone; if so, the system does not respond. If not, the system responds and enters AI dialogue mode through the voice output device.
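The step-by-step workflow above amounts to a chain of checks with early exit. In the sketch below each sensing device is stood in for by a zero-argument callable, so a later device is never consulted once an earlier one has ruled out a response. The function name and the returned strings are illustrative assumptions.

```python
def other_voice_response(mouth_is_still, person_count, phone_or_earphone):
    """Run the three checks in the order given in the workflow, stopping at the first 'no'."""
    if mouth_is_still():
        return "no response: mouth is still"
    if person_count() >= 2:
        return "no response: two or more people"
    if phone_or_earphone():
        return "no response: phone held or earphone worn"
    return "respond: enter AI dialogue mode"


# One person, mouth moving, no phone or earphone.
print(other_voice_response(lambda: False, lambda: 1, lambda: False))
```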
Brief description of the drawings
Fig. 1 is an overall structural diagram of the invention.
Fig. 2 is a workflow diagram of the voice response of the invention.
Fig. 3 is a workflow diagram of the other-voice response in Embodiment one of the invention.
Fig. 4 is a workflow diagram of the other-voice response in Embodiment two of the invention.
Fig. 5 is a workflow diagram of the other-voice response in Embodiment three of the invention.
Fig. 6 is a workflow diagram of the other-voice response in Embodiment four of the invention.
Fig. 7 is a workflow diagram of the other-voice response in Embodiment five of the invention.
Fig. 8 is a workflow diagram of the other-voice response in Embodiment six of the invention.
Detailed description of the embodiments
Embodiments of the system of the invention are described in detail below with reference to the accompanying drawings.
Embodiment one
To solve the above problem in the prior art, the technical solution adopted by the present invention is an AI voice response system based on visual sensing, characterized in that it comprises a voice output device, a voice input device, a voice conversion device, and a voice response device, together with a person-count visual sensing device, a phone visual sensing device, a mouth-shape visual sensing device, and a monitoring device.
The monitoring device is installed in the area where responses are needed and monitors that area in real time.
In the monitoring device, a camera capable of 360° rotation performs panoramic video monitoring of the response area.
The voice output device is connected to the voice conversion device and is the output device that produces speech.
In the voice output device, a dynamic (moving-coil) loudspeaker is provided, which uses the force between the voice coil and a constant magnetic field to vibrate the diaphragm and produce sound.
In the voice output device, a cone loudspeaker is provided, whose diaphragm is made mainly of paper pulp, optionally blended with wool, silk, or carbon fiber to increase its rigidity, internal damping, and water resistance.
In the voice output device, a frequency divider (crossover) is provided. It is a power divider, also called a passive post-stage frequency divider, which divides the signal after the power amplifier. It consists mainly of passive components (inductors, resistors, and capacitors) that form a filter network, and it sends the audio signal of each frequency band to the loudspeaker for that band for reproduction.
The voice input device is connected to the voice conversion device and is the human-machine interface device through which a person's speech is input directly to the computer.
The voice conversion device is connected to the voice input device and the voice output device and converts speech between analog and digital signals: the characteristic information of the speech from the voice input device (changes in frequency, period, tone, and so on) is digitized and recorded in the computer, or information in the computer is converted into the characteristic information of speech for output.
The mouth-shape visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges whether the mouth of the person in the video is still; if the mouth is still, the system does not respond.
In the mouth-shape visual sensing device, a face recognition system is provided. Within the recognized face region, the lip region is detected by setting a threshold for a specific color, and the previous and next video frames are compared: if the lip boundaries do not coincide, the person's mouth is not still.
In the mouth-shape visual sensing device, the face recognition system also detects rectangular frames and ignores image recognition inside them.
This is mainly so that the mouth-shape visual sensing device excludes faces shown on a television. Because a television has a rectangular frame, faces inside the frame are ignored so that they are not mistakenly recognized.
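One simple way to picture the television exclusion is as a containment test on bounding boxes: any detected face that lies entirely inside a detected rectangular frame is dropped before the mouth is analyzed. The sketch assumes that face and frame detection have already produced axis-aligned boxes in pixel coordinates; the Box type and function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def contains(outer, inner):
    """True when `inner` lies entirely within `outer`."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def faces_outside_frames(faces, frames):
    """Drop every face box that sits inside any detected rectangular frame."""
    return [face for face in faces
            if not any(contains(frame, face) for frame in frames)]


television = Box(100, 50, 500, 300)
on_screen_face = Box(200, 100, 260, 180)
viewer_face = Box(600, 200, 680, 300)
print(faces_outside_frames([on_screen_face, viewer_face], [television]))
```

A face standing partly in front of the television is kept under this rule, because it is not fully contained in the frame.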
The person-count visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges how many people are in the video; if there are two or more, the system does not respond.
In the person-count visual sensing device, a counter is provided: if the counter is 1 the system responds, and if the counter is greater than 1 it does not.
The phone visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges whether the person in the video is holding a phone; if so, the system does not respond.
In the phone visual sensing device, a three-dimensional model library of mobile phones and fixed-line telephones is provided. The person's hand is recognized, the object in the hand is compared against the model library, and the device thereby judges whether the object is a phone.
In the phone visual sensing device, a three-dimensional model library of Bluetooth earphones and ordinary earphones is also provided. The person's ear is recognized, the object worn on the ear is compared against the model library, and the device thereby judges whether the object is an earphone.
Recognizing whether the user is wearing an earphone serves to judge whether the user is making a call through it.
The voice response device is connected to the voice input device and the voice conversion device and is the device that generates a response to speech. Voice responses are of two kinds: the specific-voice response and the other-voice response. In a specific-voice response, the voice response device responds as soon as it receives the specific voice and enters dialogue mode through the voice output device. In an other-voice response, the voice response device has received speech other than the specific voice; the monitoring device is then started, and whether to respond is decided from the information produced by the mouth-shape, phone, and person-count visual sensing devices. Only when the person-count, phone, and mouth-shape visual sensing devices all judge YES does the voice response device respond and enter AI dialogue mode through the voice output device.
The workflow of the voice response is as follows. The user inputs speech through the voice input device; the voice conversion device converts the input between analog and digital signals; and the voice response device judges whether the input is the specific voice. If it is, the system responds and enters AI dialogue mode through the voice output device; if it is not, the input is handled as an other-voice response.
The workflow of the other-voice response is as follows. First, the mouth-shape visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the mouth of the person in the video is still; if it is still, the system does not respond. If it is not still, the person-count visual sensing device judges from the video how many people are present; if there are two or more, the system does not respond. If there is one person, the phone visual sensing device judges from the video whether that person is holding a phone or wearing an earphone; if so, the system does not respond. If not, the system responds and enters AI dialogue mode through the voice output device.
Embodiment two
The workflow of the other-voice response is as follows. First, the mouth-shape visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the mouth of the person in the video is still; if it is still, the system does not respond. If it is not still, the phone visual sensing device judges from the video whether the person is holding a phone or wearing an earphone; if so, the system does not respond. If not, the person-count visual sensing device judges from the video how many people are present; if there are two or more, the system does not respond. If there is one person, the system responds and enters AI dialogue mode through the voice output device.
Embodiment three
The workflow of the other-voice response is as follows. First, the person-count visual sensing device judges, from the video that the monitoring device shoots of the monitored area, how many people are in the video; if there are two or more, the system does not respond. Otherwise, the mouth-shape visual sensing device judges from the video whether the person's mouth is still; if it is still, the system does not respond. If it is not still, the phone visual sensing device judges from the video whether the person is holding a phone or wearing an earphone; if so, the system does not respond. If not, the system responds and enters AI dialogue mode through the voice output device.
Embodiment four
The workflow of the other-voice response is as follows. First, the person-count visual sensing device judges, from the video that the monitoring device shoots of the monitored area, how many people are in the video; if there are two or more, the system does not respond. Otherwise, the phone visual sensing device judges from the video whether the person is holding a phone or wearing an earphone; if so, the system does not respond. If not, the mouth-shape visual sensing device judges from the video whether the person's mouth is still; if it is still, the system does not respond. If it is not still, the system responds and enters AI dialogue mode through the voice output device.
Embodiment five
The workflow of the other-voice response is as follows. First, the phone visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the person in the video is holding a phone or wearing an earphone; if so, the system does not respond. If not, the person-count visual sensing device judges from the video how many people are present; if there are two or more, the system does not respond. If there is one person, the mouth-shape visual sensing device judges from the video whether that person's mouth is still; if it is still, the system does not respond. If it is not still, the system responds and enters AI dialogue mode through the voice output device.
Embodiment six
The workflow of the other-voice response is as follows. First, the phone visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the person in the video is holding a phone or wearing an earphone; if so, the system does not respond. If not, the mouth-shape visual sensing device judges from the video whether the person's mouth is still; if it is still, the system does not respond. If it is not still, the person-count visual sensing device judges from the video how many people are present; if there are two or more, the system does not respond. If there is one person, the system responds and enters AI dialogue mode through the voice output device.
The mouth-shape visual sensing device judges whether the person's mouth is changing. If the mouth does not change, the user is not speaking and the sound comes from a television, a radio, or other noise, so the system does not respond. If the mouth does change, the sound is the user's voice, but the user may be speaking to someone else, so the person-count visual sensing device then judges the number of people in the video. If there are two or more, the speech may be a conversation between them, so the system does not respond. If there is one person, that person is probably speaking to the intelligent response system but may instead be on a phone call or wearing an earphone, so the phone visual sensing device then judges whether the person in the video is holding a phone or wearing an earphone. If so, the person is on a call or using the earphone, and the system does not respond; if not, the person is speaking to the intelligent voice response system, so the system responds and enters AI dialogue mode through the voice output device.
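Embodiments one to six apply the same three checks in each of the six possible orders. Because a response requires all three to pass, the final decision is the same for every order; only the point at which a failing case stops differs. The sketch below checks this on a few hypothetical scenes; the scene fields and names are assumptions for illustration.

```python
from itertools import permutations

# Each check returns True (YES) when it allows a response. Scene keys are hypothetical.
CHECKS = {
    "mouth": lambda scene: scene["mouth_moving"],
    "count": lambda scene: scene["people"] == 1,
    "phone": lambda scene: not scene["phone_or_earphone"],
}


def respond(scene, order):
    """Run the checks in the given order; all() stops at the first failing check."""
    return all(CHECKS[name](scene) for name in order)


ORDERS = list(permutations(CHECKS))  # six orderings, one per embodiment

scenes = [
    {"mouth_moving": True, "people": 1, "phone_or_earphone": False},   # talking to the system
    {"mouth_moving": False, "people": 1, "phone_or_earphone": False},  # sound from a TV or radio
    {"mouth_moving": True, "people": 2, "phone_or_earphone": False},   # conversation between people
    {"mouth_moving": True, "people": 1, "phone_or_earphone": True},    # phone call
]
for scene in scenes:
    decisions = {respond(scene, order) for order in ORDERS}
    print(scene, "->", decisions)
```

The choice among the orderings would therefore rest on practical grounds, such as running the cheapest check first, rather than on the outcome.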
The purpose of the invention is to let the intelligent voice response system imitate human behavioral habits more reasonably. If the intelligent voice response system is regarded as a "person", then responding when a person would respond makes it more human-like. The intelligent voice response system uses the mouth-shape, phone, and person-count visual sensing devices to judge whether the user is talking to it, without requiring a particular word as a stiff command. There are exceptions, of course, such as a user talking to himself. First, such cases are rare. Second, if the intelligent voice response system is regarded as a "person": when two people A and B are together and A talks to himself, B may well think that A is speaking to him, and this is simply how people behave.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.
Claims (8)
1. An AI voice response system based on visual sensing, characterized in that it comprises a voice output device, a voice input device, a voice conversion device, and a voice response device, together with a person-count visual sensing device, a phone visual sensing device, a mouth-shape visual sensing device, and a monitoring device;
the monitoring device is installed in the area where responses are needed and monitors that area in real time;
the voice output device is connected to the voice conversion device and is the output device that produces speech;
the voice input device is connected to the voice conversion device and is the human-machine interface device through which a person's speech is input directly to the computer;
the voice conversion device is connected to the voice input device and the voice output device and converts speech between analog and digital signals, the characteristic information of the speech from the voice input device being digitized and recorded in the computer, or information in the computer being converted into the characteristic information of speech for output;
the mouth-shape visual sensing device is connected to the voice response device and the monitoring device and judges, from the video that the monitoring device shoots of the monitored area, whether the mouth of the person in the video is still; if the mouth is still, the system does not respond;
the person-count visual sensing device is connected to the voice response device and the monitoring device and judges, from the video that the monitoring device shoots of the monitored area, how many people are in the video; if there are two or more, the system does not respond;
the phone visual sensing device is connected to the voice response device and the monitoring device and judges, from the video that the monitoring device shoots of the monitored area, whether the person in the video is holding a phone or wearing an earphone; if so, the system does not respond;
the voice response device is connected to the voice input device and the voice conversion device and is the device that generates a response to speech; voice responses are of two kinds, the specific-voice response and the other-voice response; in a specific-voice response, the voice response device responds as soon as it receives the specific voice and enters dialogue mode through the voice output device; in an other-voice response, the voice response device has received speech other than the specific voice, the monitoring device is then started, and whether to respond is decided from the information produced by the mouth-shape, phone, and person-count visual sensing devices; only when the person-count, phone, and mouth-shape visual sensing devices all judge YES does the system enter AI dialogue mode through the voice output device;
the workflow of the voice response is that the user inputs speech through the voice input device, the voice conversion device converts the input between analog and digital signals, and the voice response device judges whether the input is the specific voice; if it is, the system responds and enters AI dialogue mode through the voice output device; if it is not, the input is handled as an other-voice response;
The workflow of other voice responses is, by nozzle type visual response equipment, the view that monitoring device shoots monitoring area
Frequently, carry out judging whether the nozzle type of the people in video static, the nozzle type of people be it is static, be not responding to, the nozzle type of people is not static then,
The nozzle type of people is not static then, is judged by number visual response equipment the video that monitoring device shoots monitoring area,
There are several individuals in video, two or more people are judged as being just to be not responding to, and a people is just set by phone visual response
It is standby, to the video that monitoring device shoots monitoring area, judge the whether hand-held phone of people in video, human hand held phone is then
It is not responding to, the not hand-held phone of people then, carries out voice response, is output by voice equipment and carries out AI dialogue mode.
2. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that, in the monitoring device, a camera capable of rotating through 360° performs panoramic video monitoring of the response region.
3. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the voice output device is provided with a cone loudspeaker whose diaphragm is made of paper pulp mixed with wool, silk, and carbon fiber materials.
4. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the voice output device is provided with a crossover (frequency divider); the crossover is a power crossover, also called a passive post-stage crossover, which divides the frequencies after the power amplifier; it mainly comprises the passive components inductors, resistors, and capacitors, which form a filter network that sends the audio signal of each frequency band to the loudspeaker of the corresponding band for reproduction.
5. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the mouth-shape visual sensing device is provided with a face recognition system; the lip region is detected by applying a threshold for a specific color within the recognized face region; the previous frame and the next frame of the video are compared, and if the lip boundaries do not coincide, the person's mouth shape is not static.
6. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the people-count visual sensing device is provided with a counter; if the counter is 1, a response is made; if the counter is greater than 1, no response is made.
7. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the phone visual sensing device is provided with a 3D model library of mobile phones and fixed-line telephones; the person's hand is recognized, the object in the person's hand is then compared against the 3D model library, and it is thereby judged whether the object is a phone.
8. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the phone visual sensing device is provided with a 3D model library of Bluetooth earphones and ordinary earphones; the person's ear is recognized, the object worn on the person's ear is then compared against the 3D model library, and it is thereby judged whether the object is an earphone.
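The response workflow recited in claim 1, together with the lip-boundary comparison of claim 5 and the counter rule of claim 6, can be illustrated with a minimal Python sketch. This is not the patent's implementation: the names `SceneObservation`, `mouth_is_static`, and `should_respond` are hypothetical, lip boundaries are assumed to arrive as sets of pixel coordinates, the boundary check uses exact equality where a real system would likely need a tolerance, and a count of zero people is assumed to mean no response because the claims do not cover that case.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Pixel = Tuple[int, int]  # (x, y) coordinate; an assumed representation


@dataclass(frozen=True)
class SceneObservation:
    """Hypothetical summary of what the three visual sensing devices report."""
    mouth_static: bool      # mouth-shape visual sensing device
    people_count: int       # counter of the people-count visual sensing device
    holding_phone: bool     # phone visual sensing device (hand check)
    wearing_earphone: bool  # phone visual sensing device (ear check)


def mouth_is_static(prev_lip_boundary: FrozenSet[Pixel],
                    next_lip_boundary: FrozenSet[Pixel]) -> bool:
    """Claim 5 idea: if the lip boundaries of two consecutive frames
    do not coincide, the mouth is not static.
    Assumption: exact set equality stands in for 'coincide'."""
    return prev_lip_boundary == next_lip_boundary


def should_respond(is_specific_voice: bool,
                   observe: Callable[[], SceneObservation]) -> bool:
    """Decide whether the system enters the AI dialogue mode.

    `observe` is only called for voices other than the specific voice,
    mirroring the claim that the monitoring device is started in that case.
    """
    if is_specific_voice:
        return True  # specific voice: always respond
    scene = observe()  # other voice: start monitoring and inspect the scene
    if scene.mouth_static:
        return False  # nobody appears to be speaking
    if scene.people_count != 1:
        # counter > 1: do not respond; counter == 0 is an assumption here
        return False
    if scene.holding_phone or scene.wearing_earphone:
        return False  # the person is probably talking on a call
    return True


if __name__ == "__main__":
    alone_and_speaking = SceneObservation(False, 1, False, False)
    print(should_respond(False, lambda: alone_and_speaking))
```

The checks run in the order given by the other-voice workflow (mouth shape, then people count, then phone), so a scene is rejected at the first failed condition.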
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910804779.5A CN110517678B (en) | 2019-08-28 | 2019-08-28 | AI voice response system based on visual sense |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910804779.5A CN110517678B (en) | 2019-08-28 | 2019-08-28 | AI voice response system based on visual sense |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110517678A (en) | 2019-11-29 |
CN110517678B (en) | 2022-04-08 |
Family
ID=68627619
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910804779.5A Active CN110517678B (en) | 2019-08-28 | 2019-08-28 | AI voice response system based on visual sense |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110517678B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014159581A1 (en) * | 2013-03-12 | 2014-10-02 | Nuance Communications, Inc. | Methods and apparatus for detecting a voice command |
US20180158449A1 (en) * | 2016-12-02 | 2018-06-07 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for waking up via speech based on artificial intelligence |
CN108337362A (en) * | 2017-12-26 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | Voice interactive method, device, equipment and storage medium |
CN109658925A (en) * | 2018-11-28 | 2019-04-19 | 上海蔚来汽车有限公司 | It is a kind of that wake-up vehicle-mounted voice dialogue method and system are exempted from based on context |
CN109767774A (en) * | 2017-11-08 | 2019-05-17 | 阿里巴巴集团控股有限公司 | A kind of exchange method and equipment |
CN109979036A (en) * | 2019-04-03 | 2019-07-05 | 深圳市海圳汽车技术有限公司 | With recorder control and the system and control method of speech recognition controlled, recorder |
CN110010125A (en) * | 2017-12-29 | 2019-07-12 | 深圳市优必选科技有限公司 | A kind of control method of intelligent robot, device, terminal device and medium |
Non-Patent Citations (2)
Title |
---|
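JUN'ICHI IDO, et al.: "Interaction of receptionist ASKA using vision and speech information", 《IEEE CONFERENCE ON MULTISENSOR FUSION AND INTEGRATION FOR INTELLIGENT SYSTEMS 2003》 * |
ZHENG Zhihui, et al.: "Research and development of an air-conditioner controller implementing human-machine dialogue based on voice", 《Proceedings of the 2018 China Household Electrical Appliances Technology Conference》 * |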
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114360527A (en) * | 2021-12-30 | 2022-04-15 | 亿咖通(湖北)技术有限公司 | Vehicle-mounted voice interaction method, device, equipment and storage medium |
CN114360527B (en) * | 2021-12-30 | 2023-09-26 | 亿咖通(湖北)技术有限公司 | Vehicle-mounted voice interaction method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110517678B (en) | 2022-04-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11386905B2 (en) | Information processing method and device, multimedia device and storage medium | |
CN107978316A (en) | Method and device for controlling a terminal | |
CN109446876A (en) | Sign language information processing method and device, electronic equipment and readable storage medium | |
US10225510B2 (en) | Providing a log of events to an isolated user | |
US20230045237A1 (en) | Wearable apparatus for active substitution | |
CN107741698A (en) | Linkage control method, device and system for intelligent household appliance and entrance guard equipment | |
CN106235931A (en) | Method and device for controlling the operation of a face cleaning instrument | |
CN103139351A (en) | Volume control method and device, and communication terminal | |
US20180054688A1 (en) | Personal Audio Lifestyle Analytics and Behavior Modification Feedback | |
CN106067996B (en) | Voice reproduction method, voice dialogue device | |
CN106205628A (en) | Sound signal optimization method and device | |
CN112532266A (en) | Intelligent helmet and voice interaction control method of intelligent helmet | |
CN113038337B (en) | Audio playing method, wireless earphone and computer readable storage medium | |
US20240096343A1 (en) | Voice quality enhancement method and related device | |
CN108900951A (en) | Volume adjusting method, earphone and computer readable storage medium | |
CN111692418A (en) | Water outlet device and control method thereof | |
CN110211583A (en) | Voice interaction method and voice interaction device based on intelligent wire control | |
CN110232909A (en) | Audio processing method, device, equipment and readable storage medium | |
CN110517678A (en) | AI voice response system based on visual sensing | |
CN108347522A (en) | Method and device for adjusting volume | |
Dargie | Adaptive audio-based context recognition | |
CN113709291A (en) | Audio processing method and device, electronic equipment and readable storage medium | |
CN106210247A (en) | Terminal control method and device | |
CN106686245A (en) | Working mode adjusting method and device | |
CN106328131A (en) | Interaction system capable of sensing position of caller and starting method thereof | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||