CN110517678A - An AI voice response system based on visual sensing - Google Patents

An AI voice response system based on visual sensing

Publication number
CN110517678A
Authority
CN
China
Prior art keywords
voice
equipment
people
visual response
phone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910804779.5A
Other languages
Chinese (zh)
Other versions
CN110517678B (en)
Inventor
邹珺
熊阿伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Baolai Technology Co Ltd
Original Assignee
Nanchang Baolai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Baolai Technology Co Ltd filed Critical Nanchang Baolai Technology Co Ltd
Priority to CN201910804779.5A priority Critical patent/CN110517678B/en
Publication of CN110517678A publication Critical patent/CN110517678A/en
Application granted granted Critical
Publication of CN110517678B publication Critical patent/CN110517678B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Alarm Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to an AI voice response system based on visual sensing, comprising a voice output device, a voice input device, a voice conversion device, and a voice response device, together with a people-count visual sensing device, a telephone visual sensing device, a mouth-shape visual sensing device, an earphone visual sensing device, and a monitoring device. The user inputs voice through the voice input device; the voice conversion device converts the input voice between analog and digital signals; and the voice response device judges whether the input is the specific voice. If it is, a voice response is made and AI dialogue mode is carried out through the voice output device. If it is not, the input is handled as an other-voice response: the monitoring device is started, and whether to respond is judged from the information produced by the mouth-shape, telephone, and people-count visual sensing devices. Only when all three are judged YES is AI dialogue mode carried out through the voice output device.

Description

An AI voice response system based on visual sensing
Technical field
The present invention relates to an artificial-intelligence voice response system, specifically an AI voice response system based on visual sensing.
Background art
A smart speaker is an upgraded loudspeaker: a tool with which home consumers access the Internet by voice, for example to request songs, shop online, or check the weather forecast. It can also control smart home devices, for example opening curtains, setting the refrigerator temperature, or heating the water heater in advance.
Smart speakers are in essence an application of intelligent voice technology, whose core is simple: to give the machine a human-like ability in voice dialogue. Smart speakers have become as commonplace as small household appliances and have entered everyday living spaces, but the response systems of current intelligent voice technology still fall short in simulating people's everyday habits and behavior.
The response system of current intelligent voice technology requires the user to say a specific word, usually the name of the smart speaker, before the smart speaker responds. In everyday conversation, however, people talking face to face seldom say the other person's name before speaking. The requirement therefore does not match people's everyday habits and behavior, which is a shortcoming of the prior art.
Summary of the invention
To address the intelligent retrieval function of the prior art, the technical solution adopted by the present invention is an AI voice response system based on visual sensing, characterized in that it comprises a voice output device, a voice input device, a voice conversion device, and a voice response device, together with a people-count visual sensing device, a telephone visual sensing device, a mouth-shape visual sensing device, and a monitoring device.
The present invention may also be described as an intelligent voice interaction platform. The monitoring device is installed in the area where responses are needed and monitors that area in real time.
In the monitoring device, a camera capable of 360° rotation performs panoramic video monitoring of the response area.
The present invention may also be described as an AI voice-judgment dialogue system. The voice output device is connected to the voice conversion device and is the output device that produces speech.
The voice output device is provided with an electrodynamic loudspeaker, which uses the interaction force between the voice coil and a constant magnetic field to vibrate the diaphragm and produce sound.
The voice output device is provided with a cone loudspeaker whose diaphragm is made mainly of paper pulp, or of paper pulp mixed with wool, silk, or carbon fiber, to increase its rigidity, internal damping, and water resistance.
The voice output device is provided with a crossover (frequency divider). The crossover is a power divider, also called a passive post-stage crossover, which divides the signal after the power amplifier. It consists mainly of passive components (inductors, resistors, and capacitors) that form a filter network, sending the audio signal of each frequency band to the loudspeaker of the corresponding band for reproduction.
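As a rough illustration of the passive filter network described above, the following sketch computes component values for a first-order two-way crossover. The patent does not specify the filter order, the crossover frequency, or the loudspeaker impedance; the 8 Ω load, the 3 kHz crossover point, and the function name are assumptions chosen only for illustration.

```python
import math


def first_order_crossover(impedance_ohms: float, crossover_hz: float) -> tuple[float, float]:
    """Return (capacitor_farads, inductor_henries) for a first-order two-way crossover.

    The series capacitor feeds the tweeter (high-pass) and the series inductor
    feeds the woofer (low-pass). Each filter has its -3 dB point at crossover_hz
    when the loudspeaker is idealized as a pure resistance.
    """
    if impedance_ohms <= 0 or crossover_hz <= 0:
        raise ValueError("impedance and frequency must be positive")
    omega = 2 * math.pi * crossover_hz
    capacitor = 1 / (omega * impedance_ohms)  # C = 1 / (2*pi*f*R)
    inductor = impedance_ohms / omega         # L = R / (2*pi*f)
    return capacitor, inductor


# Assumed example values: 8-ohm drivers, 3 kHz crossover point.
c, l = first_order_crossover(8.0, 3000.0)
print(f"C = {c * 1e6:.2f} uF, L = {l * 1e3:.3f} mH")
```

Real loudspeakers are not purely resistive, so values computed this way are only a starting point for the filter network.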
The present invention may also be described as an artificial-intelligence voice response interaction platform. The voice input device is connected to the voice conversion device and is a human-machine interface device that inputs a person's voice information directly into the computer.
The present invention may also be described as an AI voice-technology response system. The voice conversion device is connected to the voice input device and the voice output device and converts between analog and digital signals: the characteristic information of the voice from the voice input device (changes in frequency, period, tone, and so on) is digitized and recorded in the computer, or information in the computer is converted into voice characteristic information for output.
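The analog-to-digital and digital-to-analog conversion described above can be pictured with a minimal quantization sketch. The 16 kHz sample rate, the 16-bit depth, the 440 Hz test tone, and the function names are assumptions for illustration; the patent does not state any of them.

```python
import math


def quantize(sample: float, bits: int = 16) -> int:
    """Map an analog sample in [-1.0, 1.0] to a signed integer code (A/D step)."""
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))  # clip out-of-range input
    return round(clipped * max_code)


def dequantize(code: int, bits: int = 16) -> float:
    """Map an integer code back to a value in [-1.0, 1.0] (D/A step)."""
    max_code = 2 ** (bits - 1) - 1
    return code / max_code


SAMPLE_RATE_HZ = 16_000  # assumed sample rate
TONE_HZ = 440.0          # assumed test tone standing in for a voice signal

analog = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ) for n in range(160)]
digital = [quantize(s) for s in analog]      # what would be recorded in the computer
restored = [dequantize(c) for c in digital]  # what would be sent back for output

max_error = max(abs(a - r) for a, r in zip(analog, restored))
print(f"max round-trip error: {max_error:.6f}")
```

The round-trip error stays below one quantization step, which is why the digitized record can stand in for the original voice features.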
The mouth-shape visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device captures of the monitored area, it judges whether the mouth shape of the person in the video is static; if the mouth shape is static, no response is made.
The mouth-shape visual sensing device is provided with a face recognition system. Within the recognized face region, the lip region is detected by setting a threshold for a specific color. The previous and next video frames are then compared; if the lip boundaries do not coincide, the person's mouth shape is not static.
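The color-threshold and frame-comparison step described above can be sketched on tiny synthetic frames. Everything concrete here is an assumption for illustration: the RGB threshold values, the 6×6 face crops, and the strict "boundaries must match exactly" test. A practical system would take face crops from a real detector and allow some tolerance.

```python
Pixel = tuple[int, int, int]
Frame = list[list[Pixel]]

SKIN: Pixel = (200, 160, 140)
LIP: Pixel = (180, 60, 70)
DARK: Pixel = (40, 20, 20)  # inside of an open mouth


def lip_mask(face: Frame, min_red: int = 150, max_other: int = 110) -> set[tuple[int, int]]:
    """Return coordinates of pixels whose color passes the assumed lip threshold."""
    return {
        (y, x)
        for y, row in enumerate(face)
        for x, (r, g, b) in enumerate(row)
        if r >= min_red and g <= max_other and b <= max_other
    }


def boundary(mask: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Mask pixels that have at least one 4-neighbor outside the mask."""
    return {
        (y, x)
        for (y, x) in mask
        if any(n not in mask for n in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)))
    }


def mouth_is_static(prev_face: Frame, next_face: Frame) -> bool:
    """The mouth counts as static only if the lip boundaries of both frames coincide."""
    return boundary(lip_mask(prev_face)) == boundary(lip_mask(next_face))


def make_face(rows: dict[int, Pixel], size: int = 6) -> Frame:
    """Build a synthetic face crop: skin everywhere, chosen rows recolored in the middle."""
    face = [[SKIN] * size for _ in range(size)]
    for y, color in rows.items():
        for x in range(1, size - 1):
            face[y][x] = color
    return face


closed_mouth = make_face({3: LIP})
open_mouth = make_face({2: LIP, 3: DARK, 4: LIP})

print(mouth_is_static(closed_mouth, closed_mouth))
print(mouth_is_static(closed_mouth, open_mouth))
```

Comparing boundaries rather than whole masks follows the wording of the text; with noisy camera input, an overlap ratio would likely be a more forgiving measure.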
The people-count visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device captures of the monitored area, it judges how many people are in the video; if there are two or more people, no response is made.
The people-count visual sensing device is provided with a counter: if the counter equals 1, a response is made; if the counter is greater than 1, no response is made.
The telephone visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device captures of the monitored area, it judges whether the person in the video is holding a telephone or wearing an earphone; if so, no response is made.
The telephone visual sensing device is provided with a three-dimensional model library of mobile phones and fixed-line telephones. It identifies the person's hand, compares the object in the hand against the model library, and thereby judges whether the object is a telephone.
The telephone visual sensing device is also provided with a three-dimensional model library of Bluetooth earphones and ordinary earphones. It identifies the person's ear, compares the object worn on the ear against the model library, and thereby judges whether the object is an earphone.
The voice response device is connected to the voice input device and the voice conversion device and is the device that responds to voice. Voice responses are of two kinds: the specific-voice response and the other-voice response. In a specific-voice response, the voice response device responds as soon as it receives the specific voice and enters dialogue mode through the voice output device. An other-voice response handles any voice the voice response device receives other than the specific voice: the monitoring device is started, and whether to respond is judged from the information produced by the mouth-shape, telephone, and people-count visual sensing devices. Only when the people-count, telephone, and mouth-shape visual sensing devices all judge YES is AI dialogue mode carried out through the voice output device.
The workflow of the voice response is as follows. The user inputs voice through the voice input device, the voice conversion device converts the input voice between analog and digital signals, and the voice response device judges whether it is the specific voice. If it is, a voice response is made and AI dialogue mode is carried out through the voice output device; if it is not, the input is handled as an other-voice response.
The workflow of the other-voice response is as follows. The mouth-shape visual sensing device judges, from the video that the monitoring device captures of the monitored area, whether the mouth shape of the person in the video is static. If it is static, no response is made. If it is not static, the people-count visual sensing device judges from the video how many people are in it. If there are two or more people, no response is made. If there is one person, the telephone visual sensing device judges from the video whether the person is holding a telephone or wearing an earphone. If so, no response is made; if not, a voice response is made and AI dialogue mode is carried out through the voice output device.
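The two workflows above amount to a short decision function. The sketch below is one way to write it; the `Observation` type, its field names, and the treatment of a zero count as "no response" are assumptions, since the text only distinguishes one person from two or more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical summary of what the three visual sensing devices report."""
    mouth_moving: bool
    people_count: int
    holding_phone_or_wearing_earphone: bool


def should_respond(heard_specific_voice: bool, obs: Observation) -> bool:
    """Decide whether to enter AI dialogue mode for one voice input."""
    if heard_specific_voice:
        return True   # specific-voice response: no visual checks needed
    # Other-voice response: all three visual checks must pass.
    if not obs.mouth_moving:
        return False  # static mouth: the sound did not come from the user
    if obs.people_count != 1:
        return False  # two or more people (or, by assumption, nobody in view)
    if obs.holding_phone_or_wearing_earphone:
        return False  # the person is probably on a call
    return True


alone_and_talking = Observation(mouth_moving=True, people_count=1,
                                holding_phone_or_wearing_earphone=False)
on_the_phone = Observation(mouth_moving=True, people_count=1,
                           holding_phone_or_wearing_earphone=True)

print(should_respond(False, alone_and_talking))
print(should_respond(False, on_the_phone))
print(should_respond(True, on_the_phone))
```

In the described system the monitoring device is started only for the other-voice path, so a fuller version would gather the observation lazily instead of passing it in up front.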
Brief description of the drawings
Fig. 1 is the overall structure diagram of the present invention.
Fig. 2 is the workflow diagram of the voice response of the present invention.
Fig. 3 is the workflow diagram of embodiment one of the other-voice response of the present invention.
Fig. 4 is the workflow diagram of embodiment two of the other-voice response of the present invention.
Fig. 5 is the workflow diagram of embodiment three of the other-voice response of the present invention.
Fig. 6 is the workflow diagram of embodiment four of the other-voice response of the present invention.
Fig. 7 is the workflow diagram of embodiment five of the other-voice response of the present invention.
Fig. 8 is the workflow diagram of embodiment six of the other-voice response of the present invention.
Detailed description of the embodiments
Embodiments of the intelligent-retrieval monitoring platform system of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment one
To address the intelligent retrieval function of the prior art, the technical solution adopted by the present invention is an AI voice response system based on visual sensing, characterized in that it comprises a voice output device, a voice input device, a voice conversion device, and a voice response device, together with a people-count visual sensing device, a telephone visual sensing device, a mouth-shape visual sensing device, and a monitoring device.
The monitoring device is installed in the area where responses are needed and monitors that area in real time.
In the monitoring device, a camera capable of 360° rotation performs panoramic video monitoring of the response area.
The voice output device is connected to the voice conversion device and is the output device that produces speech.
The voice output device is provided with an electrodynamic loudspeaker, which uses the interaction force between the voice coil and a constant magnetic field to vibrate the diaphragm and produce sound.
The voice output device is provided with a cone loudspeaker whose diaphragm is made mainly of paper pulp, or of paper pulp mixed with wool, silk, or carbon fiber, to increase its rigidity, internal damping, and water resistance.
The voice output device is provided with a crossover (frequency divider). The crossover is a power divider, also called a passive post-stage crossover, which divides the signal after the power amplifier. It consists mainly of passive components (inductors, resistors, and capacitors) that form a filter network, sending the audio signal of each frequency band to the loudspeaker of the corresponding band for reproduction.
The voice input device is connected to the voice conversion device and is a human-machine interface device that inputs a person's voice information directly into the computer.
The voice conversion device is connected to the voice input device and the voice output device and converts between analog and digital signals: the characteristic information of the voice from the voice input device (changes in frequency, period, tone, and so on) is digitized and recorded in the computer, or information in the computer is converted into voice characteristic information for output.
The mouth-shape visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device captures of the monitored area, it judges whether the mouth shape of the person in the video is static; if the mouth shape is static, no response is made.
The mouth-shape visual sensing device is provided with a face recognition system. Within the recognized face region, the lip region is detected by setting a threshold for a specific color. The previous and next video frames are then compared; if the lip boundaries do not coincide, the person's mouth shape is not static.
The face recognition system of the mouth-shape visual sensing device also looks for rectangular borders and ignores any image recognized inside such a border.
This is mainly so that the mouth-shape visual sensing device excludes faces shown on a television. Because a television has a rectangular border, faces inside it are ignored and are not mistaken for a person in the room.
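The rectangular-border exclusion can be sketched as a containment test on bounding boxes. The sketch assumes that an upstream detector (not shown, and not specified in the text) has already produced boxes for faces and for rectangular borders such as a television screen; the `Box` type and the pixel coordinates are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical representation)."""
    left: int
    top: int
    right: int
    bottom: int


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies entirely within outer."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def faces_outside_frames(faces: list[Box], frames: list[Box]) -> list[Box]:
    """Drop every face that sits inside a detected rectangular border."""
    return [face for face in faces if not any(contains(frame, face) for frame in frames)]


tv_screen = Box(100, 50, 400, 250)
face_on_tv = Box(200, 100, 260, 170)
face_in_room = Box(450, 120, 520, 210)

kept = faces_outside_frames([face_on_tv, face_in_room], [tv_screen])
print(kept)
```

Only the face outside the television border is passed on to the lip comparison, which matches the stated purpose of not reacting to faces on screen.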
The people-count visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device captures of the monitored area, it judges how many people are in the video; if there are two or more people, no response is made.
The people-count visual sensing device is provided with a counter: if the counter equals 1, a response is made; if the counter is greater than 1, no response is made.
The telephone visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device captures of the monitored area, it judges whether the person in the video is holding a telephone; if so, no response is made.
The telephone visual sensing device is provided with a three-dimensional model library of mobile phones and fixed-line telephones. It identifies the person's hand, compares the object in the hand against the model library, and thereby judges whether the object is a telephone.
The telephone visual sensing device is also provided with a three-dimensional model library of Bluetooth earphones and ordinary earphones. It identifies the person's ear, compares the object worn on the ear against the model library, and thereby judges whether the object is an earphone.
Identifying whether the user is wearing an earphone serves to judge whether the user is on a phone call through the earphone.
The voice response device is connected to the voice input device and the voice conversion device and is the device that responds to voice. Voice responses are of two kinds: the specific-voice response and the other-voice response. In a specific-voice response, the voice response device responds as soon as it receives the specific voice and enters dialogue mode through the voice output device. An other-voice response handles any voice the voice response device receives other than the specific voice: the monitoring device is started, and whether to respond is judged from the information produced by the mouth-shape, telephone, and people-count visual sensing devices. Only when the people-count, telephone, and mouth-shape visual sensing devices all judge YES does the voice response device respond and carry out AI dialogue mode through the voice output device.
The workflow of the voice response is as follows. The user inputs voice through the voice input device, the voice conversion device converts the input voice between analog and digital signals, and the voice response device judges whether it is the specific voice. If it is, a voice response is made and AI dialogue mode is carried out through the voice output device; if it is not, the input is handled as an other-voice response.
The workflow of the other-voice response is as follows. The mouth-shape visual sensing device judges, from the video that the monitoring device captures of the monitored area, whether the mouth shape of the person in the video is static. If it is static, no response is made. If it is not static, the people-count visual sensing device judges from the video how many people are in it. If there are two or more people, no response is made. If there is one person, the telephone visual sensing device judges from the video whether the person is holding a telephone or wearing an earphone. If so, no response is made; if not, a voice response is made and AI dialogue mode is carried out through the voice output device.
Embodiment two
The workflow of the other-voice response is as follows. The mouth-shape visual sensing device judges, from the video that the monitoring device captures of the monitored area, whether the mouth shape of the person in the video is static. If it is static, no response is made. If it is not static, the telephone visual sensing device judges from the video whether the person is holding a telephone or wearing an earphone. If so, no response is made. If not, the people-count visual sensing device judges from the video how many people are in it. If there are two or more people, no response is made; if there is one person, a voice response is made and AI dialogue mode is carried out through the voice output device.
Embodiment three
The workflow of the other-voice response is as follows. The people-count visual sensing device judges, from the video that the monitoring device captures of the monitored area, how many people are in the video. If there are two or more people, no response is made. If there is one person, the mouth-shape visual sensing device judges from the video whether the person's mouth shape is static. If it is static, no response is made. If it is not static, the telephone visual sensing device judges from the video whether the person is holding a telephone or wearing an earphone. If so, no response is made; if not, a voice response is made and AI dialogue mode is carried out through the voice output device.
Embodiment four
The workflow of the other-voice response is as follows. The people-count visual sensing device judges, from the video that the monitoring device captures of the monitored area, how many people are in the video. If there are two or more people, no response is made. If there is one person, the telephone visual sensing device judges from the video whether the person is holding a telephone or wearing an earphone. If so, no response is made. If not, the mouth-shape visual sensing device judges from the video whether the person's mouth shape is static. If it is static, no response is made; if it is not static, a voice response is made and AI dialogue mode is carried out through the voice output device.
Embodiment five
The workflow of the other-voice response is as follows. The telephone visual sensing device judges, from the video that the monitoring device captures of the monitored area, whether the person in the video is holding a telephone or wearing an earphone. If so, no response is made. If not, the people-count visual sensing device judges from the video how many people are in it. If there are two or more people, no response is made. If there is one person, the mouth-shape visual sensing device judges from the video whether the person's mouth shape is static. If it is static, no response is made; if it is not static, a voice response is made and AI dialogue mode is carried out through the voice output device.
Embodiment six
The workflow of the other-voice response is as follows. The telephone visual sensing device judges, from the video that the monitoring device captures of the monitored area, whether the person in the video is holding a telephone or wearing an earphone. If so, no response is made. If not, the mouth-shape visual sensing device judges from the video whether the person's mouth shape is static. If it is static, no response is made. If it is not static, the people-count visual sensing device judges from the video how many people are in it. If there are two or more people, no response is made; if there is one person, a voice response is made and AI dialogue mode is carried out through the voice output device.
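The six embodiments apply the same three checks in the six possible orders. Because a response requires every check to pass, the order can change how early the workflow stops but not the final decision. The sketch below illustrates this; the check names and the dictionary-based observation format are assumptions made for the example.

```python
from itertools import permutations, product

# Each check returns True when it allows a response (names are hypothetical).
CHECKS = {
    "mouth": lambda obs: obs["mouth_moving"],
    "people": lambda obs: obs["people_count"] == 1,
    "phone": lambda obs: not obs["phone_or_earphone"],
}


def respond_in_order(order: tuple[str, ...], obs: dict) -> bool:
    """Run the checks in the given order, stopping at the first one that fails."""
    for name in order:
        if not CHECKS[name](obs):
            return False
    return True


orders = list(permutations(CHECKS))  # the six possible orderings of three checks

# Try every combination of observations and confirm all orderings agree.
all_agree = True
for moving, count, phone in product([False, True], [1, 2, 3], [False, True]):
    obs = {"mouth_moving": moving, "people_count": count, "phone_or_earphone": phone}
    decisions = {respond_in_order(order, obs) for order in orders}
    all_agree = all_agree and len(decisions) == 1

print(len(orders), all_agree)
```

A practical reason to prefer one order over another would be cost: running the cheapest or most frequently failing check first avoids unnecessary video analysis.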
If the mouth-shape visual sensing device judges that the person's mouth shape is not changing, the user is not speaking and the sound comes from a television, radio, or other noise source, so no response is made. If the mouth shape is changing, the sound is the user's voice, but the user may be speaking to someone else, so the people-count visual sensing device then judges the number of people in the video. If there are two or more people, the conversation may be between them, so no response is made. If there is one person, that person is probably speaking to the intelligent response system, but may instead be on a phone call or wearing an earphone. The telephone visual sensing device therefore judges whether the person in the video is holding a telephone or wearing an earphone. If so, the person is on a call or wearing an earphone, and no response is made. If not, the person is speaking to the intelligent voice response system, so a voice response is made and AI dialogue mode is carried out through the voice output device.
The purpose of the present invention is to enable the intelligent voice response system to imitate human behavioral habits more reasonably. If the intelligent voice response system is regarded as a "person", its decisions about when to respond should be more human-like. Through the judgments of the mouth-shape, telephone, and people-count visual sensing devices, the system determines whether the user is talking to it, without needing a particular word as a stiff command. There are exceptions, of course, such as a user talking to themselves. First, such cases are rare. Second, if the intelligent voice response system is regarded as a "person": when two people A and B are together and A talks to himself, B may well think that A is speaking to him. This is simply how people behave.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (8)

1. a kind of AI voice answer-back response system of view-based access control model induction, which is characterized in that including voice-output device, voice is defeated Enter equipment, speech apparatus, speech sound reponsive apparatus;Number visual response equipment, phone visual response equipment, nozzle type visual impression Answer equipment, monitoring device;
Monitoring device is mounted on the region for needing to respond, monitors in real time to the region;
Voice-output device is connected with speech apparatus, is the output equipment for generating voice;
Voice-input device is connected with speech apparatus, and the voice messaging of people is directly inputted to the man-machine interface of computer Equipment;
Speech apparatus is connected with voice-input device and voice-output device, and the voice of input carries out analog signal sum number The conversion of word signal records in a computer after the characteristic information of the voice of voice-input device input is made digitized processing; Or the characteristic information that the information of computer is converted to voice is exported;
Nozzle type visual response equipment, with speech sound reponsive apparatus, monitoring device is connected, the view shot to monitoring device to monitoring area Frequently, carry out judging whether the nozzle type of the people in video static, the nozzle type of people be it is static, then be not responding to;
Number visual response equipment, with speech sound reponsive apparatus, monitoring device is connected, the view shot to monitoring device to monitoring area Frequently, judged there are several individuals in video, two or more people are judged as being just to be not responding to;
Phone visual response equipment, with speech sound reponsive apparatus, monitoring device is connected, the view shot to monitoring device to monitoring area Frequently, it carries out judging the whether hand-held phone of the people in video or wears earphone, human hand held phone or wear earphone, be then not responding to;
Speech sound reponsive apparatus, with voice-input device, speech apparatus is connected, and to the equipment that voice generates response, voice is rung Two kinds should be divided into, one kind is that special sound responds, and one kind is that special sound responds, and one kind is other voice responses.Special sound Response, exactly generates response as long as speech sound reponsive apparatus receives special sound, is output by voice equipment and engages in the dialogue mould Formula;Other voice responses are other voices for receiving special sound in addition to speech sound reponsive apparatus, then start monitoring device, this When will be according to nozzle type visual response equipment, phone visual response equipment, information that number visual response equipment generates judges Whether respond, only when number visual response equipment, phone visual response equipment, nozzle type visual response equipment is all judged as YES When, it is output by voice equipment and carries out AI dialogue mode;
The workflow of voice response is: the user inputs voice through the voice input device; the voice device converts the input voice between analog and digital signals; the voice response device judges whether the input is the specific voice. If it is, voice response is carried out and AI dialogue mode proceeds through the voice output device; if it is not, other-voice response applies;
The workflow of other-voice response is: the mouth-shape visual response device judges, from the video of the monitored area captured by the monitoring device, whether the mouth shape of the person in the video is static; if it is static, no response is made. If it is not static, the people-count visual response device judges from the same video how many people are present; if two or more people are detected, no response is made. If there is exactly one person, the phone visual response device judges from the video whether the person is holding a phone; if so, no response is made; if not, voice response is carried out and AI dialogue mode proceeds through the voice output device.
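The response decision described in claim 1 can be read as a short chain of checks. The following Python sketch is illustrative only and is not taken from the patent: the names `VisualJudgement` and `should_respond` are hypothetical, the three visual judgements are assumed to be already available as plain values, and treating a count of zero people as "no response" is an assumption, since the claim only covers one person and two or more.

```python
from dataclasses import dataclass


@dataclass
class VisualJudgement:
    """Hypothetical container for what the three visual response devices report."""
    mouth_static: bool              # mouth-shape visual response device
    people_count: int               # people-count visual response device
    holding_phone: bool             # phone visual response device
    wearing_earphone: bool = False  # earphone check named in the device description


def should_respond(is_specific_voice: bool, judgement: VisualJudgement) -> bool:
    """Return True when the system should enter AI dialogue mode."""
    # Specific-voice response: respond immediately, without any video check.
    if is_specific_voice:
        return True
    # Other-voice response: checks run in the order given in the workflow.
    if judgement.mouth_static:
        return False  # the person in view is not speaking
    if judgement.people_count != 1:
        # Two or more people: no response (claim text).
        # Zero people: no response (assumption, not stated in the claim).
        return False
    if judgement.holding_phone or judgement.wearing_earphone:
        return False  # the person is presumably talking to someone else
    return True
```

For example, `should_respond(False, VisualJudgement(False, 1, False))` returns `True`, while changing the people count to 2 returns `False`.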
2. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the monitoring device includes a camera capable of 360° rotation that performs panoramic video monitoring of the response area.
3. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the voice output device is provided with a cone-type loudspeaker whose diaphragm is made of paper pulp mixed with wool, silk, and carbon fiber.
4. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the voice output device is provided with a frequency divider (crossover); the frequency divider is a power divider, also called a passive post-stage frequency divider, which performs frequency division after the power amplifier; it mainly comprises passive components (inductors, resistors, and capacitors) that form a filter network, sending the audio signal of each frequency band to the loudspeaker of the corresponding band for reproduction.
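The patent gives no component values for the passive filter network. As a hedged illustration, the sketch below computes the parts for the simplest case, a first-order two-way crossover, using the standard relations C = 1 / (2πfZ) for the series capacitor on the high-frequency loudspeaker and L = Z / (2πf) for the series inductor on the low-frequency loudspeaker. The function name, the choice of a first-order network, and the example frequency and impedance are all assumptions made for illustration.

```python
import math


def first_order_crossover(crossover_hz: float, impedance_ohm: float) -> tuple[float, float]:
    """Return (capacitance in farads, inductance in henries).

    Assumes a first-order two-way passive crossover and a purely
    resistive loudspeaker load; neither is specified in the patent.
    """
    if crossover_hz <= 0 or impedance_ohm <= 0:
        raise ValueError("frequency and impedance must be positive")
    omega = 2 * math.pi * crossover_hz
    capacitance = 1 / (omega * impedance_ohm)  # high-pass branch (tweeter)
    inductance = impedance_ohm / omega         # low-pass branch (woofer)
    return capacitance, inductance
```

With an illustrative 3 kHz crossover point and 8 Ω loudspeakers, `first_order_crossover(3000, 8)` gives roughly 6.6 µF and 0.42 mH.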
5. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the mouth-shape visual response device is provided with a face recognition system; within the recognized face region, the lip region is detected by setting a threshold on a specific color; the previous frame and the next frame of the video are compared, and if the lip boundaries do not coincide, the mouth shape of the person is judged not to be static.
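A minimal pure-Python sketch of the color-threshold and frame-comparison idea in claim 5 follows. It assumes the face region has already been cropped and aligned by the face recognition system, represents a frame as rows of (R, G, B) tuples, and uses an arbitrary "reddish" rule with made-up threshold values; none of these details come from the patent. Exact equality of the two boundaries is the literal reading of the claim; a practical system would likely need a tolerance for noise.

```python
Frame = list[list[tuple[int, int, int]]]  # rows of (R, G, B) pixels
Pixels = set[tuple[int, int]]             # set of (row, column) coordinates


def lip_mask(face: Frame, min_red: int = 150, min_gap: int = 50) -> Pixels:
    """Pixels that pass a simple 'reddish' color threshold (assumed rule)."""
    return {
        (y, x)
        for y, row in enumerate(face)
        for x, (r, g, b) in enumerate(row)
        if r >= min_red and r - max(g, b) >= min_gap
    }


def boundary(mask: Pixels) -> Pixels:
    """Mask pixels with at least one 4-neighbor outside the mask."""
    return {
        (y, x)
        for (y, x) in mask
        if any(n not in mask for n in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)))
    }


def mouth_is_static(prev_face: Frame, next_face: Frame) -> bool:
    """True when the lip boundaries of two consecutive frames coincide."""
    return boundary(lip_mask(prev_face)) == boundary(lip_mask(next_face))
```

The result of `mouth_is_static` corresponds to the judgement the mouth-shape visual response device passes on: a static mouth means no response.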
6. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the people-count visual response device is provided with a counter; if the counter equals 1, a response is made; if the counter is greater than 1, no response is made.
7. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the phone visual response device is provided with a three-dimensional model library of mobile phones and fixed-line telephones; the person's hand is identified, the object in the hand is compared against the three-dimensional model library, and it is thereby judged whether the object is a phone.
8. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the phone visual response device is provided with a three-dimensional model library of Bluetooth earphones and ordinary earphones; the person's ear is identified, the object worn on the ear is compared against the three-dimensional model library, and it is thereby judged whether the object is an earphone.
CN201910804779.5A 2019-08-28 2019-08-28 AI voice response system based on visual sense Active CN110517678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910804779.5A CN110517678B (en) 2019-08-28 2019-08-28 AI voice response system based on visual sense

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910804779.5A CN110517678B (en) 2019-08-28 2019-08-28 AI voice response system based on visual sense

Publications (2)

Publication Number Publication Date
CN110517678A (en) 2019-11-29
CN110517678B (en) 2022-04-08

Family

ID=68627619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910804779.5A Active CN110517678B (en) 2019-08-28 2019-08-28 AI voice response system based on visual sense

Country Status (1)

Country Link
CN (1) CN110517678B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014159581A1 (en) * 2013-03-12 2014-10-02 Nuance Communications, Inc. Methods and apparatus for detecting a voice command
US20180158449A1 (en) * 2016-12-02 2018-06-07 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for waking up via speech based on artificial intelligence
CN108337362A (en) * 2017-12-26 2018-07-27 百度在线网络技术(北京)有限公司 Voice interactive method, device, equipment and storage medium
CN109658925A (en) * 2018-11-28 2019-04-19 上海蔚来汽车有限公司 It is a kind of that wake-up vehicle-mounted voice dialogue method and system are exempted from based on context
CN109767774A (en) * 2017-11-08 2019-05-17 阿里巴巴集团控股有限公司 A kind of exchange method and equipment
CN109979036A (en) * 2019-04-03 2019-07-05 深圳市海圳汽车技术有限公司 With recorder control and the system and control method of speech recognition controlled, recorder
CN110010125A (en) * 2017-12-29 2019-07-12 深圳市优必选科技有限公司 A kind of control method of intelligent robot, device, terminal device and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
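JUN'ICHI IDO et al.: "Interaction of receptionist ASKA using vision and speech information", IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems 2003 *
ZHENG Zhihui et al.: "Research and development of an air-conditioner controller implementing human-machine dialogue based on voice", Proceedings of the 2018 China Household Electrical Appliances Technology Conference *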

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114360527A (en) * 2021-12-30 2022-04-15 亿咖通(湖北)技术有限公司 Vehicle-mounted voice interaction method, device, equipment and storage medium
CN114360527B (en) * 2021-12-30 2023-09-26 亿咖通(湖北)技术有限公司 Vehicle-mounted voice interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110517678B (en) 2022-04-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant