CN110517678A

CN110517678A - An AI voice answering and response system based on visual sensing

Info

Publication number: CN110517678A
Application number: CN201910804779.5A
Authority: CN
Inventors: 邹珺; 熊阿伟
Original assignee: Nanchang Baolai Technology Co Ltd
Current assignee: Nanchang Baolai Technology Co Ltd
Priority date: 2019-08-28
Filing date: 2019-08-28
Publication date: 2019-11-29
Anticipated expiration: 2039-08-28
Also published as: CN110517678B

Abstract

The present invention relates to an AI voice answering and response system based on visual sensing, comprising a voice output device, a voice input device, a voice conversion device, and a voice response device; and a people-count visual sensing device, a phone visual sensing device, a mouth-shape visual sensing device, an earphone visual sensing device, and a monitoring device. The user inputs speech through the voice input device, and the voice conversion device converts the input speech between analog and digital signals. The voice response device judges whether the speech is the specific voice. If it is, a voice response is made and AI dialogue mode is carried out through the voice output device. If it is not, the input is handled as an other-voice response: the monitoring device is started, and whether to respond is judged from the information produced by the mouth-shape visual sensing device, the phone visual sensing device, and the people-count visual sensing device. Only when all three judge yes is AI dialogue mode carried out through the voice output device.

Description

An AI voice answering and response system based on visual sensing

Technical field

The present invention relates to an artificial intelligence voice response system, specifically an AI voice answering and response system based on visual sensing.

Background art

A smart speaker is an upgraded loudspeaker, a tool with which household consumers go online by voice, for example to request songs, shop online, or check the weather forecast. It can also control smart home devices, for example opening the curtains, setting the refrigerator temperature, or having the water heater warm up in advance.

Smart speakers belong to intelligent voice technology, whose core idea is simple: to give machines a human-like ability in voice dialogue. Smart speakers have become as commonplace as small household appliances and have entered daily living spaces, but the response systems of current intelligent voice technology are not satisfactory at imitating people's everyday habits and behavior.

The response system of current intelligent voice technology requires the user to say a specific word, usually the name of the smart speaker, before the smart speaker responds. In everyday conversation, however, people talking face to face seldom say the other person's name before starting to speak. This requirement therefore does not match people's daily habits and behavior, which is the shortcoming of the prior art.

Summary of the invention

To solve the problem of the intelligent retrieval function in the prior art, the technical solution adopted by the present invention is an AI voice answering and response system based on visual sensing, characterized in that it comprises a voice output device, a voice input device, a voice conversion device, and a voice response device; and a people-count visual sensing device, a phone visual sensing device, a mouth-shape visual sensing device, and a monitoring device.

The present invention may also be described as an intelligent voice interaction platform. The monitoring device is installed in the area where responses are needed and monitors that area in real time.

In the monitoring device, a camera capable of 360° rotation performs panoramic video monitoring of the response area.

The present invention may also be described as an AI voice judgment dialogue system. The voice output device is connected to the voice conversion device and is the output device that produces speech.

The voice output device is provided with a moving-coil (electrodynamic) loudspeaker, which uses the interaction force between the voice coil and a constant magnetic field to make the diaphragm vibrate and produce sound.

The voice output device is provided with a cone loudspeaker whose diaphragm is made mainly of paper pulp, optionally mixed with wool, silk, or carbon fiber to increase its rigidity, internal damping, and water resistance.

The voice output device is provided with a crossover (frequency divider). The crossover is a power divider, also called a passive post-stage crossover, which divides the frequency bands after the power amplifier. It consists mainly of passive components (inductors, resistors, and capacitors) that form a filter network, sending the audio signal of each frequency band to the loudspeaker of the corresponding band for reproduction.
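
As a rough illustration of how such a passive filter network might be sized, the sketch below computes component values for a first-order two-way crossover. The patent does not state the filter order, the crossover frequency, or the loudspeaker impedance; the first-order topology, the function name, and the example values are all assumptions, and the loudspeaker is treated as a pure resistance.

```python
import math


def first_order_crossover(crossover_hz: float, speaker_ohms: float) -> tuple[float, float]:
    """Return (capacitor_farads, inductor_henries) for a first-order passive crossover.

    Assumption: each loudspeaker behaves as a pure resistance of speaker_ohms.
    The capacitor sits in series with the high-frequency loudspeaker (high-pass),
    the inductor in series with the low-frequency loudspeaker (low-pass).
    """
    if crossover_hz <= 0 or speaker_ohms <= 0:
        raise ValueError("frequency and impedance must be positive")
    omega = 2.0 * math.pi * crossover_hz
    capacitor = 1.0 / (omega * speaker_ohms)  # C = 1 / (2*pi*f*R)
    inductor = speaker_ohms / omega           # L = R / (2*pi*f)
    return capacitor, inductor
```

For example, `first_order_crossover(3000, 8)` gives roughly 6.6 µF and 0.42 mH. Real crossovers are often higher order and must account for the loudspeaker's frequency-dependent impedance.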

The present invention may also be described as an artificial intelligence voice response interaction platform. The voice input device is connected to the voice conversion device and is the human-machine interface device that inputs a person's speech directly into the computer.

The present invention may also be described as an AI voice technology response system. The voice conversion device is connected to the voice input device and the voice output device and converts the input speech between analog and digital signals: the characteristic information of the speech from the voice input device (variations in frequency, period, tone, and so on) is digitized and recorded in the computer, or information in the computer is converted into speech characteristic information for output.
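
The analog-to-digital step and the extraction of one characteristic (frequency) can be sketched as follows. The patent gives no sample rate, bit depth, or feature-extraction method, so the 8-bit quantization, the zero-crossing frequency estimate, and all names here are illustrative assumptions rather than the patented method.

```python
import math
from typing import Callable


def digitize(signal: Callable[[float], float], sample_rate_hz: int,
             duration_s: float, bits: int = 8) -> list[int]:
    """Sample an analog signal with values in [-1, 1] and quantize it to unsigned integers."""
    top = 2 ** bits - 1
    samples = []
    for i in range(int(sample_rate_hz * duration_s)):
        value = max(-1.0, min(1.0, signal(i / sample_rate_hz)))  # clip to the valid range
        samples.append(round((value + 1.0) / 2.0 * top))         # map [-1, 1] to [0, top]
    return samples


def estimate_frequency_hz(samples: list[int], sample_rate_hz: int, bits: int = 8) -> float:
    """Estimate the dominant frequency by counting upward crossings of the mid level."""
    if not samples:
        raise ValueError("no samples to analyze")
    mid = (2 ** bits - 1) / 2.0
    rising = sum(1 for a, b in zip(samples, samples[1:]) if a < mid <= b)
    return rising * sample_rate_hz / len(samples)
```

Counting zero crossings only works for clean, single-tone input; real speech would need a more robust pitch estimator.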

The mouth-shape visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges whether the mouth shape of the person in the video is static; if the mouth shape is static, no response is made.

The mouth-shape visual sensing device is provided with a face recognition system. Within the recognized face region, the lip region is detected by setting a threshold on a specific color. The previous and next video frames are then compared; if the lip boundaries do not coincide, the person's mouth shape is not static.
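
The color-threshold and frame-comparison steps can be sketched in plain Python. The patent does not give the color threshold or the boundary definition; the RGB limits, the 4-neighbour boundary rule, and every name below are assumptions made for illustration, and the face region is assumed to be already cropped by the face recognition system.

```python
Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255
Frame = list[list[Pixel]]     # recognized face region as rows of pixels
Point = tuple[int, int]       # (row, column)


def lip_mask(face: Frame, min_red: int = 150, max_green: int = 110) -> set[Point]:
    """Pixels whose color passes the (hypothetical) lip-color threshold."""
    return {
        (r, c)
        for r, row in enumerate(face)
        for c, (red, green, _blue) in enumerate(row)
        if red >= min_red and green <= max_green
    }


def lip_boundary(mask: set[Point]) -> set[Point]:
    """Mask pixels that have at least one 4-neighbour outside the mask."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c)
        for (r, c) in mask
        if any((r + dr, c + dc) not in mask for dr, dc in steps)
    }


def mouth_is_static(previous: Frame, following: Frame) -> bool:
    """The mouth counts as static when the lip boundaries of both frames coincide."""
    return lip_boundary(lip_mask(previous)) == lip_boundary(lip_mask(following))
```

Exact equality of boundaries is the strictest reading of "do not coincide"; with real camera noise a tolerance on the number of differing boundary pixels would likely be needed.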

The people-count visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges how many people are in the video; if there are two or more people, no response is made.

The people-count visual sensing device is provided with a counter: if the counter is 1, a response is made; if the counter is greater than 1, no response is made.

The phone visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges whether the person in the video is holding a phone or wearing an earphone; if the person is holding a phone or wearing an earphone, no response is made.

The phone visual sensing device is provided with a three-dimensional model library of mobile phones and fixed-line telephones. It recognizes the person's hand, compares the object in the hand against the model library, and thereby judges whether the object is a phone.

The phone visual sensing device is also provided with a three-dimensional model library of Bluetooth earphones and ordinary earphones. It recognizes the person's ear, compares the object worn on the ear against the model library, and thereby judges whether the object is an earphone.

The voice response device is connected to the voice input device and the voice conversion device and is the device that generates a response to speech. Voice responses are of two kinds: the specific-voice response and the other-voice response. In the specific-voice response, the voice response device responds as soon as it receives the specific voice and enters dialogue mode through the voice output device. In the other-voice response, the voice response device receives speech other than the specific voice and the monitoring device is started; whether to respond is then judged from the information produced by the mouth-shape visual sensing device, the phone visual sensing device, and the people-count visual sensing device. Only when the people-count, phone, and mouth-shape visual sensing devices all judge yes is AI dialogue mode carried out through the voice output device.

The workflow of the voice response is as follows. The user inputs speech through the voice input device, and the voice conversion device converts the input speech between analog and digital signals. The voice response device judges whether the speech is the specific voice. If it is, a voice response is made and AI dialogue mode is carried out through the voice output device; if it is not, the other-voice response is used.
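
The two-way split described above can be sketched as a small dispatch function. How the specific voice is recognized is not described in the patent; matching it as a substring of already-transcribed text, and all class, field, and parameter names, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VisualVerdicts:
    """Yes/no outputs of the three visual sensing devices (field names are hypothetical)."""
    mouth_moving: bool          # mouth-shape device: the lips are not static
    single_person: bool         # people-count device: exactly one person in view
    no_phone_or_earphone: bool  # phone device: no phone held, no earphone worn


def should_enter_dialogue(
    utterance: str,
    specific_voice: str,
    start_monitoring: Callable[[], VisualVerdicts],
) -> bool:
    """Return True when the system should enter AI dialogue mode."""
    # Specific-voice response: respond at once, without starting the monitoring device.
    if specific_voice.lower() in utterance.lower():
        return True
    # Other-voice response: start monitoring and require all three verdicts to be yes.
    verdicts = start_monitoring()
    return (
        verdicts.mouth_moving
        and verdicts.single_person
        and verdicts.no_phone_or_earphone
    )
```

Passing `start_monitoring` as a callable mirrors the text: the monitoring device is only started when the input is not the specific voice.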

The workflow of the other-voice response is as follows. First, the mouth-shape visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the mouth shape of the person in the video is static; if it is static, no response is made. If it is not static, the people-count visual sensing device judges from the video how many people are present; if there are two or more, no response is made. If there is only one person, the phone visual sensing device judges from the video whether the person is holding a phone or wearing an earphone; if so, no response is made. If the person is neither holding a phone nor wearing an earphone, a voice response is made and AI dialogue mode is carried out through the voice output device.
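
This chain of checks can be sketched as a function that stops at the first check that rules out a response. The `Scene` fields and the reason strings are hypothetical. The text only covers one person and two or more people, so treating an empty scene as "no response" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """What the visual sensing devices report for the current video (hypothetical fields)."""
    mouth_static: bool
    person_count: int
    holding_phone_or_wearing_earphone: bool


def other_voice_response(scene: Scene) -> tuple[bool, str]:
    """Check mouth shape, then people count, then phone; stop at the first 'no'."""
    if scene.mouth_static:
        return False, "mouth static"
    if scene.person_count != 1:  # assumption: zero people is also treated as no response
        return False, "not exactly one person"
    if scene.holding_phone_or_wearing_earphone:
        return False, "phone or earphone"
    return True, "respond"
```

For instance, `other_voice_response(Scene(False, 2, False))` stops at the second check and never consults the phone verdict.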

Brief description of the drawings

Fig. 1 is an overall structural diagram of the present invention.

Fig. 2 is a workflow diagram of the voice response of the present invention.

Fig. 3 is a workflow diagram of Embodiment One of the other-voice response of the present invention.

Fig. 4 is a workflow diagram of Embodiment Two of the other-voice response of the present invention.

Fig. 5 is a workflow diagram of Embodiment Three of the other-voice response of the present invention.

Fig. 6 is a workflow diagram of Embodiment Four of the other-voice response of the present invention.

Fig. 7 is a workflow diagram of Embodiment Five of the other-voice response of the present invention.

Fig. 8 is a workflow diagram of Embodiment Six of the other-voice response of the present invention.

Detailed description of the embodiments

The embodiments of the intelligent-retrieval monitoring platform system of the present invention are described in detail below with reference to the accompanying drawings.

Embodiment one

The monitoring device is installed in the area where responses are needed and monitors that area in real time.

The voice output device is connected to the voice conversion device and is the output device that produces speech.

In the voice output device, a moving-coil (electrodynamic) loudspeaker uses the interaction force between the voice coil and a constant magnetic field to make the diaphragm vibrate and produce sound.

The voice output device is provided with a cone loudspeaker whose diaphragm is made mainly of paper pulp, optionally mixed with wool, silk, or carbon fiber to increase its rigidity, internal damping, and water resistance.

The voice input device is connected to the voice conversion device and is the human-machine interface device that inputs a person's speech directly into the computer.

The voice conversion device is connected to the voice input device and the voice output device and converts the input speech between analog and digital signals: the characteristic information of the speech from the voice input device (variations in frequency, period, tone, and so on) is digitized and recorded in the computer, or information in the computer is converted into speech characteristic information for output.

The mouth-shape visual sensing device is provided with a face recognition system which, by comparing rectangular borders, ignores image recognition inside a rectangular frame.

This is mainly so that the mouth-shape visual sensing device excludes faces shown on a television. Because a television has a rectangular border, faces inside it are ignored, so that faces on the television are not recognized by mistake.
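
One way to picture this exclusion is a bounding-box containment test: any detected face that lies wholly inside a detected rectangular frame is dropped. The patent does not say how frames or faces are detected, so the sketch assumes both are already available as pixel boxes; the `Box` type and function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical representation)."""
    left: int
    top: int
    right: int
    bottom: int


def contains(outer: Box, inner: Box) -> bool:
    """True when inner lies entirely within outer."""
    return (
        outer.left <= inner.left
        and outer.top <= inner.top
        and inner.right <= outer.right
        and inner.bottom <= outer.bottom
    )


def faces_outside_frames(faces: list[Box], frames: list[Box]) -> list[Box]:
    """Keep only the faces that are not inside any rectangular frame (e.g. a television)."""
    return [face for face in faces if not any(contains(frame, face) for frame in frames)]
```

A containment test this simple would also discard a real person standing in front of a picture frame or window, so a practical system would need extra cues.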

The phone visual sensing device is connected to the voice response device and the monitoring device. From the video that the monitoring device shoots of the monitored area, it judges whether the person in the video is holding a phone; if the person is holding a phone, no response is made.

To judge whether the user is making a phone call through an earphone, the device also identifies whether the user is wearing an earphone.

The voice response device is connected to the voice input device and the voice conversion device and is the device that generates a response to speech. Voice responses are of two kinds: the specific-voice response and the other-voice response. In the specific-voice response, the voice response device responds as soon as it receives the specific voice and enters dialogue mode through the voice output device. In the other-voice response, the voice response device receives speech other than the specific voice and the monitoring device is started; whether to respond is then judged from the information produced by the mouth-shape visual sensing device, the phone visual sensing device, and the people-count visual sensing device. Only when the people-count, phone, and mouth-shape visual sensing devices all judge yes does the voice response device respond and carry out AI dialogue mode through the voice output device.

Embodiment two

The workflow of the other-voice response is as follows. First, the mouth-shape visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the mouth shape of the person in the video is static; if it is static, no response is made. If it is not static, the phone visual sensing device judges from the video whether the person is holding a phone or wearing an earphone; if so, no response is made. If not, the people-count visual sensing device judges from the video how many people are present; if there are two or more, no response is made. If there is only one person, a voice response is made and AI dialogue mode is carried out through the voice output device.

Embodiment three

The workflow of the other-voice response is as follows. First, the people-count visual sensing device judges, from the video that the monitoring device shoots of the monitored area, how many people are in the video; if there are two or more, no response is made. Otherwise, the mouth-shape visual sensing device judges from the video whether the person's mouth shape is static; if it is static, no response is made. If it is not static, the phone visual sensing device judges from the video whether the person is holding a phone or wearing an earphone; if so, no response is made. If not, a voice response is made and AI dialogue mode is carried out through the voice output device.

Embodiment four

The workflow of the other-voice response is as follows. First, the people-count visual sensing device judges, from the video that the monitoring device shoots of the monitored area, how many people are in the video; if there are two or more, no response is made. Otherwise, the phone visual sensing device judges from the video whether the person is holding a phone or wearing an earphone; if so, no response is made. If not, the mouth-shape visual sensing device judges from the video whether the person's mouth shape is static; if it is static, no response is made. If it is not static, a voice response is made and AI dialogue mode is carried out through the voice output device.

Embodiment five

The workflow of the other-voice response is as follows. First, the phone visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the person in the video is holding a phone or wearing an earphone; if so, no response is made. If not, the people-count visual sensing device judges from the video how many people are present; if there are two or more, no response is made. If there is only one person, the mouth-shape visual sensing device judges from the video whether the person's mouth shape is static; if it is static, no response is made. If it is not static, a voice response is made and AI dialogue mode is carried out through the voice output device.

Embodiment six

The workflow of the other-voice response is as follows. First, the phone visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the person in the video is holding a phone or wearing an earphone; if so, no response is made. If not, the mouth-shape visual sensing device judges from the video whether the person's mouth shape is static; if it is static, no response is made. If it is not static, the people-count visual sensing device judges from the video how many people are present; if there are two or more, no response is made. If there is only one person, a voice response is made and AI dialogue mode is carried out through the voice output device.

If the mouth-shape visual sensing device judges that the person's mouth shape has not changed, the user is not speaking and the sound comes from a television, a radio, or other noise, so no response is made. If the mouth shape has changed, the sound is the user's voice, but the user may be talking to someone else, so the people-count visual sensing device then judges the number of people in the video. If there are two or more people, they may be talking to each other, so no response is made. If there is only one person, that person is probably speaking to the intelligent response system, but may instead be making a phone call or wearing an earphone. The phone visual sensing device therefore judges whether the person is holding a phone or wearing an earphone. If so, the person is on a call or using the earphone, and no response is made. If not, the person is speaking to the intelligent voice response system, so a voice response is made and AI dialogue mode is carried out through the voice output device.
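
The six embodiments run the same three checks in the six possible orders. Because a response requires every check to pass, the final decision is the same for any order; only the check that stops the chain first differs. The sketch below enumerates all orders and scenes to confirm this. The dictionary keys and function names are hypothetical, and the scene encoding is a simplification.

```python
from itertools import permutations, product

# Each check maps a scene to True when that check permits a response.
CHECKS = {
    "mouth": lambda scene: scene["mouth_moving"],
    "count": lambda scene: scene["person_count"] == 1,
    "phone": lambda scene: not scene["phone_or_earphone"],
}


def decide(order, scene):
    """Apply the checks in the given order, stopping at the first 'no'."""
    for name in order:
        if not CHECKS[name](scene):
            return False
    return True


def orders_agree():
    """True when all six check orders give the same decision for every scene."""
    scenes = [
        {"mouth_moving": m, "person_count": n, "phone_or_earphone": p}
        for m, n, p in product([False, True], [1, 2], [False, True])
    ]
    orders = list(permutations(CHECKS))
    return all(
        len({decide(order, scene) for order in orders}) == 1
        for scene in scenes
    )
```

Order can still matter in practice for cost: placing the cheapest or most frequently failing check first avoids running the others.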

The purpose of the present invention is to enable the intelligent voice response system to imitate human behavioral habits more reasonably. If the intelligent voice response system is regarded as a "person", the question of when it should respond can be answered in a more human way. Through the mouth-shape, phone, and people-count visual sensing devices, the system judges whether the user is talking to it, without needing a specific word as a rigid instruction. There are exceptions, of course, such as a user talking to themselves. First, such cases are rare. Furthermore, if the system is regarded as a "person", consider two people, A and B, in the same place: if A talks to themselves, B may well assume that A is speaking to B. This is simply how people behave.

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. An AI voice answering and response system based on visual sensing, characterized in that it comprises a voice output device, a voice input device, a voice conversion device, and a voice response device; and a people-count visual sensing device, a phone visual sensing device, a mouth-shape visual sensing device, and a monitoring device;

the monitoring device is installed in the area where responses are needed and monitors that area in real time;

the voice output device is connected to the voice conversion device and is the output device that produces speech;

the voice input device is connected to the voice conversion device and is the human-machine interface device that inputs a person's speech directly into the computer;

the voice conversion device is connected to the voice input device and the voice output device and converts the input speech between analog and digital signals, the characteristic information of the speech from the voice input device being digitized and recorded in the computer, or information in the computer being converted into speech characteristic information for output;

the mouth-shape visual sensing device is connected to the voice response device and the monitoring device and, from the video that the monitoring device shoots of the monitored area, judges whether the mouth shape of the person in the video is static; if the mouth shape is static, no response is made;

the people-count visual sensing device is connected to the voice response device and the monitoring device and, from the video that the monitoring device shoots of the monitored area, judges how many people are in the video; if there are two or more people, no response is made;

the phone visual sensing device is connected to the voice response device and the monitoring device and, from the video that the monitoring device shoots of the monitored area, judges whether the person in the video is holding a phone or wearing an earphone; if the person is holding a phone or wearing an earphone, no response is made;

the voice response device is connected to the voice input device and the voice conversion device and is the device that generates a response to speech; voice responses are of two kinds, the specific-voice response and the other-voice response; in the specific-voice response, the voice response device responds as soon as it receives the specific voice and enters dialogue mode through the voice output device; in the other-voice response, the voice response device receives speech other than the specific voice and the monitoring device is started, whereupon whether to respond is judged from the information produced by the mouth-shape visual sensing device, the phone visual sensing device, and the people-count visual sensing device, and only when the people-count, phone, and mouth-shape visual sensing devices all judge yes is AI dialogue mode carried out through the voice output device;

the workflow of the voice response is that the user inputs speech through the voice input device, the voice conversion device converts the input speech between analog and digital signals, and the voice response device judges whether the speech is the specific voice; if it is, a voice response is made and AI dialogue mode is carried out through the voice output device; if it is not, the other-voice response is used;

the workflow of the other-voice response is that the mouth-shape visual sensing device judges, from the video that the monitoring device shoots of the monitored area, whether the mouth shape of the person in the video is static; if it is static, no response is made; if it is not static, the people-count visual sensing device judges from the video how many people are present; if there are two or more, no response is made; if there is only one person, the phone visual sensing device judges from the video whether the person is holding a phone; if the person is holding a phone, no response is made; if the person is not holding a phone, a voice response is made and AI dialogue mode is carried out through the voice output device.

2. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that, in the monitoring device, a camera capable of 360° rotation performs panoramic video monitoring of the response area.

3. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the voice output device is provided with a cone loudspeaker whose diaphragm material is paper pulp mixed with wool, silk, or carbon fiber.

4. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the voice output device is provided with a crossover (frequency divider); the crossover is a power divider, also called a passive post-stage crossover, which divides the frequency bands after the power amplifier; it consists mainly of passive components (inductors, resistors, and capacitors) that form a filter network, sending the audio signal of each frequency band to the loudspeaker of the corresponding band for reproduction.

5. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the mouth-shape visual sensing device is provided with a face recognition system; within the recognized face region, the lip region is detected by setting a threshold on a specific color; the previous and next video frames are compared, and if the lip boundaries do not coincide, the person's mouth shape is not static.

6. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the people-count visual sensing device is provided with a counter; if the counter is 1, a response is made; if the counter is greater than 1, no response is made.

7. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the phone visual sensing device is provided with a three-dimensional model library of mobile phones and fixed-line telephones; it recognizes the person's hand, compares the object in the hand against the model library, and thereby judges whether the object is a phone.

8. The intelligent-retrieval monitoring platform system according to claim 1, characterized in that the phone visual sensing device is provided with a three-dimensional model library of Bluetooth earphones and ordinary earphones; it recognizes the person's ear, compares the object worn on the ear against the model library, and thereby judges whether the object is an earphone.