WO2020244410A1 - Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition - Google Patents

Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition

Info

Publication number
WO2020244410A1
WO2020244410A1 (PCT/CN2020/092190)
Authority
WO
WIPO (PCT)
Prior art keywords
mouth
user
electronic device
covering
gesture
Prior art date
Application number
PCT/CN2020/092190
Other languages
English (en)
French (fr)
Inventor
喻纯 (Chun Yu)
史元春 (Yuanchun Shi)
Original Assignee
清华大学 (Tsinghua University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学 (Tsinghua University)
Priority to US17/616,075 (published as US20220319520A1)
Publication of WO2020244410A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/24 - Speech recognition using non-acoustical features
    • G10L15/25 - Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 - Details of transducers, loudspeakers or microphones
    • H04R1/10 - Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1041 - Mechanical or electronic switches, or control elements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 - Details of transducers, loudspeakers or microphones
    • H04R1/20 - Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers, the transducers being microphones
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2420/00 - Details of connection covered by H04R, not provided for in its groups
    • H04R2420/07 - Applications of wireless loudspeakers or wireless microphones
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 - Circuits for transducers, loudspeakers or microphones
    • H04R3/005 - Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones

Definitions

  • the present invention generally relates to control and interaction methods of smart electronic portable devices.
  • a specific instruction is triggered or modal information input such as voice is activated.
  • Tap (or press and hold) an interface element (such as an icon) on the screen of the mobile device to trigger a command or activate modal information input such as voice.
  • a specific word (such as product nickname) can be used as the wake-up word, and the device activates the voice input after detecting the corresponding wake-up word.
  • an intelligent electronic portable device including a sensor system, capable of capturing signals from which it can be judged that the user's hand is placed on the user's mouth to make a mouth-covering gesture
  • the intelligent electronic portable device includes a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, are operable to perform the following interaction method: processing the signal to determine whether the user puts a hand on the mouth and makes a mouth-covering gesture; in response to determining that the user puts a hand on the mouth to make a mouth-covering gesture, using the mouth-covering gesture as a user interactive input control method to control program execution on the smart electronic device, including triggering corresponding control instructions or triggering other input methods.
  • the mouth-covering gesture is distinguished between using the left hand and using the right hand.
  • the mouth-covering gesture distinguishes different positions of the palm relative to the mouth, including the palm between the mouth and the left ear, the palm between the mouth and the right ear, and the palm directly in front of the mouth.
  • the mouth-covering gesture distinguishes the types of gestures that touch the face and those that do not touch the face.
  • the specific hand shapes of the mouth-covering gesture include, but are not limited to, the following categories: a gesture in which the palm covers the entire mouth; a gesture with the thumb at the side of the mouth and the index finger above the lips, the mouth exposed below the palm; a gesture with the thumb against the chin and the index finger above the lips, the mouth exposed below the palm; and a gesture with the thumb at the side of the mouth and the little finger touching the chin, the mouth exposed above the palm.
  • when the smart electronic device recognizes that the mouth-covering gesture is of a predetermined category, it executes a specific control instruction.
  • the executed control instruction is to trigger other input methods other than the mouth-covering gesture, that is, to process information input by other input methods.
  • the other input modes include one of voice input, non-mouth covering gesture input, line of sight input, blinking input, head movement input, or a combination thereof.
  • the signal is processed to detect whether the user removes the mouth-covering gesture; in response to detecting that the user removes the mouth-covering gesture, the smart electronic device ends the interaction process.
  • visual or auditory feedback is provided to remind the user that the smart electronic device has triggered the other input method.
  • the other input mode triggered is voice input
  • the smart electronic device processes the voice input made by the user while keeping the mouth-covering gesture.
  • the smart electronic device processes the voice signal as a voice input.
  • the smart electronic device is a mobile phone equipped with one of the following sensors: a binaural Bluetooth headset, a wired headset, or a camera.
  • the smart electronic device is a smart wearable device such as a watch, a smart ring, or a wristwatch.
  • the smart electronic device is a head-mounted smart display device equipped with a microphone or a multi-microphone group.
  • the sensor system includes one or more of the following items: a camera; an infrared camera; a depth camera; a microphone; a dual microphone group; a multiple microphone group; a proximity sensor; and an accelerometer.
  • the signal used for recognition by the sensor system includes a facial image taken by a camera.
  • when the signal includes a facial image captured by a camera, one or more types of the user's mouth-covering gestures are recognized after the user makes the gesture.
  • when the smart electronic device is a smartphone, the camera includes the smartphone's front-facing camera.
  • the characteristics of the signal used for recognition by the sensor system include one or more of the time domain characteristics, the frequency spectrum characteristics, or the sound source location characteristics of the sound signal received by a single microphone.
  • the microphone is a microphone on a mobile phone and/or a microphone on a wire-controlled headset.
  • the characteristics of the signal used for recognition by the sensor system include the characteristics of differences between sound signals received by multiple microphones.
  • when the sensing device is a wireless Bluetooth headset, the mouth-covering gesture is recognized through the signal difference between the left and right earpieces.
  • the signal is a proximity light sensor signal on the smart ring.
  • an interaction method for a smart electronic device includes a sensor system capable of capturing a signal of a user with one hand on the mouth and making a mouth-covering gesture.
  • the interaction method performed includes: processing the signal to determine that the user makes a mouth-covering gesture with one hand; in response to determining that the user keeps the hand at the mouth and holds the gesture, parsing the user's interaction intention according to the type of the mouth-covering gesture, the interactive content of the smart device's current application, and the information the user inputs through other modalities at the same time; and, according to the parsed intention, receiving and analyzing the user's input information and outputting the corresponding content.
  • the signal is processed to determine that the user removes the mouth-covering gesture; in response to determining that the user removes the mouth-covering gesture, the interaction process is ended.
  • the content output form includes one or a combination of voice and image.
  • the user's input information includes not only the mouth-covering gesture itself, but also other modal information of the user.
  • the other modal information includes voice or gaze.
  • a computer-readable medium having computer-executable instructions stored thereon, and the computer-executable instructions can execute the aforementioned voice interactive wake-up method when executed by a computer.
  • the interaction is more natural.
  • the user can interact by making a mouth-covering gesture, which conforms to user habits and cognition.
  • the use efficiency is higher. It can be used with one hand. Users do not need to operate the device or switch between different user interfaces/applications, do not need to hold down a button or repeat the wake-up word, just raise their hand to the mouth to use.
  • Fig. 1 is a schematic flowchart of a voice input interaction method according to an embodiment of the present invention
  • FIG. 2 is a schematic front view of the right hand covering the mouth to the left in the triggering posture according to an embodiment of the present invention
  • FIG. 3 is a schematic side view of the right hand covering the mouth to the left in the triggering posture according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of the trigger posture in which the four fingers do not extend past the nose, according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the trigger posture in which the thumb rests against the chin, according to an embodiment of the present invention.
  • unless otherwise specified, the camera in this document refers to an ordinary camera, excluding infrared cameras.
  • FIG. 1 is a schematic flowchart of an interaction method for a smart electronic device to start and end an interaction with a user by recognizing and removing a mouth-covering gesture by a user according to an embodiment of the present invention.
  • the intelligent electronic portable device includes a sensor system and can capture signals from which it can be judged that the user's hand is placed on the user's mouth to make, or to remove, the mouth-covering gesture.
  • the sensor system includes one or more of the following items: a camera, an infrared camera, a microphone, a dual-microphone group, a multi-microphone group, a proximity sensor, and an accelerometer.
  • the interaction here may include, but is not limited to: voice interaction, eye contact, gesture interaction, and so on.
  • S101 processing the signal to determine that the user puts his hand on the mouth to make a mouth-covering gesture.
  • a mouth-covering gesture distinguishes between using the left hand and using the right hand.
  • the mouth-covering gesture distinguishes different positions of the palm relative to the mouth, including the palm between the mouth and the left ear, the palm between the mouth and the right ear, and the palm directly in front of the mouth.
  • the mouth-covering gesture distinguishes the types of gestures that touch the face and those that do not touch the face.
  • the mouth-covering gesture may include one of the following items:
  • the user covers his mouth with one hand to the left or right;
  • the user covers the mouth with one hand touching the face, hiding the entire mouth;
  • the user covers the mouth with one hand touching the face, the thumb at the side of the mouth, the index finger above the lips, and the mouth exposed below the palm;
  • the user covers the mouth with one hand touching the face, the thumb at the side of the mouth, the little finger touching the chin, and the mouth exposed above the palm;
  • the user covers the mouth with one hand without touching the face, hiding the entire mouth;
  • the user covers the mouth with one hand without touching the face, the thumb at the side of the mouth, the index finger above the lips, and the mouth exposed below the palm;
  • the user covers the mouth with one hand without touching the face, the thumb at the side of the mouth, the little finger touching the chin, and the mouth exposed above the palm.
  • Figures 2 to 5 show a few cases where a user puts one hand on his mouth and makes a mouth-covering gesture to trigger information input.
  • Figures 2 and 3 are front and side schematic views, respectively, of the trigger posture in which the left hand covers the mouth toward the right.
  • in this posture, the user places the left hand at the left of the mouth and stretches the fingers to cover the mouth, keeping the thumb up and the other four fingers to the left above the lips and below the nose, so that the top and left of the mouth are blocked by the left hand.
  • depending on the user's habits, the four fingers other than the thumb may or may not extend past the right side of the nose, and the thumb may rest against the side of the face or against the chin.
  • Figures 4 and 5 are schematic diagrams of the postures in which the four fingers do not pass the nose and in which the thumb rests against the chin, respectively. Similar to the aforementioned posture of covering the mouth with the left hand, the two postures differ in the position and extension of the thumb and the other four fingers.
  • the above description of the triggering postures is exemplary, not exhaustive, and is not limited to the disclosed postures.
  • the mouth-covering gesture is used as a user interactive input control method to control the execution of programs on the smart electronic device, including triggering a corresponding control instruction or triggering other input methods.
  • the smart electronic device is a smart phone
  • when the front camera of the smart phone detects that the user puts a hand on the mouth to make a mouth-covering gesture, the triggered control command sets the phone to mute; in another design, when the gesture is detected, the smart phone vibrates to remind the user that voice input mode has been entered, and the user can make voice input by speaking.
  • the smart electronic device may also be a wireless headset. By analyzing the difference of the microphone signals on the headset, it is determined that the user places the hand on the mouth to make a mouth-covering gesture.
  • the user's interaction intention is parsed according to the type of mouth-covering gesture, the interactive content of the smart device's current application, and the information input by the user through other modes at the same time.
  • the smart electronic device recognizes which mouth-covering gesture the user makes, and then maps the gesture to a predetermined user intention (instruction) (the mapping may be defined according to human usage habits), so as to respond to that instruction.
  • when the smart electronic device recognizes that the mouth-covering gesture is of a predetermined category, it executes a specific control instruction.
  • when the mouth-covering gesture is of a first predetermined category, such as a gesture of covering the mouth to the left, the smart device receives, analyzes, and produces the corresponding content output for the voice input made by the user while maintaining the mouth-covering gesture.
  • when the mouth-covering gesture is of a second predetermined category, such as a gesture of covering the mouth to the right, the smart device receives, analyzes, and produces the corresponding content output for the head-movement input made by the user while maintaining the mouth-covering gesture.
  • when the mouth-covering gesture is of a third predetermined category, such as a gesture that covers the entire mouth with one hand, it is determined that the user intends to execute a specific control instruction on the smart device; that is, when the smart device recognizes that the user keeps the mouth covered, it parses this as the specific control instruction.
  • when the mouth-covering gesture distinguishes different postures, such as covering the mouth with the left hand versus the right hand, these determine the different control instructions that the user intends to execute on the smart device.
  • when the smart device recognizes that the user keeps covering the mouth, it parses the gesture into different control instructions according to its category.
  • depending on the current application, the mouth-covering gesture triggers different control instructions or the input of different modal information.
  • each type of gesture triggers different control instructions or triggers the input of different modal types of information in different applications.
  • the input of different modal information or other input methods include one of voice input, non-mouth covering gesture input, line of sight input, blinking input, head movement input, or a combination thereof.
  • any feedback including visual and auditory can be provided to remind the user that the smart electronic device has triggered other input methods.
  • the other input mode triggered is voice input
  • the smart electronic device processes the voice input made by the user while keeping the mouth-covering gesture.
  • the signal for recognizing the mouth-covering gesture includes the user's voice signal
  • the smart electronic device processes the voice signal as a voice input.
  • the signal is processed to detect whether the user removes the mouth-covering gesture.
  • the voice interaction process is ended.
  • the smart electronic portable device detects and recognizes the position and posture of the hand through its various sensors.
  • in the first embodiment, the smart portable device is a mobile phone, and the sensor system includes a camera.
  • the signal used by the sensor system for recognition at this time includes the facial image taken by the camera.
  • the signal includes a facial image captured by a camera
  • one or more types of mouth-covering gestures of the user are recognized.
  • the mobile phone is equipped with a front-facing camera to capture an image of the user covering the mouth with one hand, and the mobile phone processes the image to recognize that the user is making a gesture of covering the mouth with one hand.
  • this gesture of covering the mouth with one hand can be parsed as a control command to the mobile phone, such as mute.
  • in the second embodiment, the smart portable device is a mobile phone, the sensor system includes a camera, and a voice prompt is given before input.
  • the front camera of the mobile phone captures the user covering the mouth and judges that the user is making a gesture of covering the mouth with one hand.
  • the mouth-covering gesture can be parsed as the user's voice input intention.
  • the headset (if worn by the user) or the mobile phone emits a prompt tone to tell the user that voice input is available, and the user starts voice input after hearing the tone.
  • the smart portable device is a smart wearable device such as a smart watch or a smart ring or a wrist watch, and the sensor system includes a proximity sensor and a microphone.
  • by monitoring the proximity sensor and microphone located on the smart watch or ring, when the proximity sensor detects proximity and the microphone simultaneously receives a voice signal, it is determined that the user may be making a gesture of covering the mouth with one hand.
  • the smart portable device is a mobile phone and/or a wired headset, and the sensor system includes a microphone
  • when the user makes a one-hand mouth-covering gesture, the sound passes the occluding hand before entering the microphone, and the voice features differ significantly from the unoccluded case in the above respects, so it can be judged whether the user is making the gesture.
  • the characteristics of the signal used by the sensor system for identification include one or more of the time domain characteristics, frequency spectrum characteristics, or sound source location characteristics of the sound signal received by a single microphone
  • in the fifth embodiment, the smart portable device is a mobile phone with binaural Bluetooth earphones, and the sensor system includes the pair of microphones located at the two ears.
  • the sound signals received on the two sides differ significantly in volume and in energy distribution across frequencies, which can be used to determine that the user may be making a one-hand mouth-covering gesture.
  • the smart portable device is a head-mounted display device, and the sensor system includes multiple microphones
  • the user wears a head-mounted display device equipped with multiple microphones at different positions; similar to the fifth embodiment, the differences between the sound signals collected at the different positions can be compared to judge whether the user is making a one-hand mouth-covering gesture.
  • the user wears a wearable device located near the hand, such as a smart watch or a ring.
  • the wearable device is equipped with a motion sensor and a direction sensor.
  • the head wears a smart display device or earphone, and the device or earphone is equipped with a direction sensor.
  • the hand-raising motion is recognized from the motion sensor signal; the signals of the direction sensors located at the head and hand are then analyzed to compute the direction relationship between the user's head and hand.
  • when the direction relationship between head and hand meets the requirements of a mouth-covering gesture, for example when the palm surface is roughly parallel to the face surface, voice interaction is activated.
  • in addition to using mouth-covering gestures to execute control instructions, other modal information can also be used for interaction.
  • Other modal information may include one of the user's voice, head movement, eye movement, or a combination thereof.
  • a voice input is triggered, and the user directly controls the smart electronic device through voice.
  • the head movement input is activated, and the user performs the confirmation operation by nodding the head. In this way, the mouth-covering gesture can conveniently and accurately open other modal inputs.
  • when the signal used includes an image of the vicinity of the face taken by the camera, after the user makes a mouth-covering gesture and before other modal inputs are made, the user's interaction intention is recognized by identifying the specific mouth-covering gesture through image processing.
  • a prompt including any one of visual and auditory senses is provided to confirm whether to activate other modal input.
  • smart electronic portable devices can use the aforementioned sensors, and can also include, but are not limited to, microphones, dual/multi-microphone arrays, cameras, proximity sensors, and so on. Combining multiple sensor signals can raise the accuracy and recall of the detection and judgment of whether to activate voice input. At the same time, the use of various sensor signals enables the present invention to be applied to a wider range of intelligent electronic portable devices and to adapt to more usage situations.
  • the characteristics of the signal used for recognition by the sensor system include one or more of the time domain characteristics, the frequency spectrum characteristics, or the sound source location characteristics of the sound signal received by the microphone.
  • an interaction method for a smart electronic device includes a sensor system capable of capturing a signal that a user is making a mouth-covering gesture with one hand.
  • the interaction method performed by the device includes: processing the signal to determine that the user makes a mouth-covering gesture with one hand; in response to determining that the user keeps the hand at the mouth and holds the gesture, parsing the user's interaction intention according to the type of the gesture, the interactive content of the device's current application, and the information input by the user through other modalities at the same time; according to the parsed intention, receiving and analyzing the user's input information and making the corresponding content output; after responding to the user's mouth-covering gesture, while the user is interacting with the smart device, processing the signal to determine that the user removes the gesture; and, in response to determining removal, ending the interaction process.
  • the content output form may include one or a combination of voice and image.
  • the user's input information may also include other modal information or other input information of the user.
  • other modal information or other input information may include voice input, non-mouth covering gesture input, line of sight input, blinking input, head movement input, etc., or a combination of these.
  • An example of an application scenario is given below, taking a user carrying a smartphone while wearing a binaural Bluetooth headset while being in a public place as an example.
  • the user wants to inquire about the weather conditions of the day through voice input.
  • the user puts one hand on his mouth to make a mouth-covering gesture, and at the same time says "How is the weather today?"
  • the smart phone recognizes the user's one-handed mouth-covering gesture and the voice input content, and the weather information can be output through the headset.
  • the user does not need to touch the mobile phone or perform information query through the interface of the mobile phone; there is no need to say a specific wake-up word to wake up the voice interaction; at the same time, the gesture of covering the mouth reduces the interference of voice input to others around, and protects the privacy of the user’s voice input.
  • this conforms to the user's daily language communication habits and cognition, and is simple and natural.
  • the interaction is more natural.
  • the user can interact by making a mouth-covering gesture, which conforms to user habits and cognition.
  • the use efficiency is higher. It can be used with one hand. Users do not need to operate the device or switch between different user interfaces/applications, do not need to press a button or repeat the wake-up words, just lift their hands to the mouth to use.
  • the sensor system includes one or more of the following items: camera; infrared camera; depth camera; microphone; dual microphone group; multiple microphone group; proximity Sensor; and accelerometer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An interaction method triggered by a mouth-covering gesture, and a smart portable electronic device. The interaction method applies to smart portable electronic devices equipped with sensors. The device includes a sensor system capable of capturing signals from which it can be determined that the user's hand is placed at the user's mouth in a mouth-covering gesture; the signal is processed to determine that the user puts a hand at the mouth and makes a mouth-covering gesture (S101); in response to determining that the user puts a hand at the mouth and makes a mouth-covering gesture, the gesture is used as a means of user interactive input control to control program execution on the smart electronic device, including triggering a corresponding control instruction or triggering another input mode (S102). The method lets a user carrying a smart electronic device trigger interaction instructions simply, without touching the device, which streamlines the interaction; during operations such as voice input, the mouth-covering gesture reduces disturbance to others nearby, protects the user's privacy, and lowers the psychological burden of interacting; and since the gesture is common in daily life, the learning cost is low and the interaction is natural.

Description

Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition
This application claims priority to Chinese patent application No. 201910475947.0, entitled "Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition" and filed with the China National Intellectual Property Administration on June 3, 2019, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates generally to control and interaction methods for smart portable electronic devices.
Background
With the development of computer technology, speech recognition algorithms have matured, and voice input is becoming increasingly important because of its high naturalness and effectiveness as an interaction style. Users can interact with mobile devices (phones, watches, etc.) by voice to complete tasks such as entering instructions, querying information, and voice chat.
Likewise, smart portable electronic devices have become ubiquitous; users can control them through control instructions, or interact by inputting different modalities such as voice and images, to complete tasks such as instruction input and information queries.
As to when a control instruction or the input of a modality such as voice should be triggered, existing solutions all have shortcomings:
1. Physical button triggering
Pressing (or holding) one or more physical buttons of the mobile device triggers a specific instruction or activates the input of a modality such as voice.
Drawbacks of this solution: a physical button is required; accidental triggering is easy; the user must press the button.
2. Interface element triggering
Tapping (or holding) an interface element (such as an icon) on the screen of the mobile device triggers an instruction or activates the input of a modality such as voice.
Drawbacks of this solution: the device must have a screen; the trigger element occupies screen space; software UI constraints may make triggering cumbersome; accidental triggering is easy.
3. Wake-word (speech) detection
For triggering voice input, a specific word (such as a product nickname) can serve as the wake word, and the device activates voice input after detecting it.
Drawbacks of this solution: poor privacy and social acceptability; low interaction efficiency; when everyday conversation happens to contain the wake word, accidental triggering results.
Summary of the invention
In view of the above, the present invention is proposed.
According to one aspect of the invention, a smart portable electronic device is provided, including a sensor system capable of capturing signals from which it can be determined that the user's hand is placed at the user's mouth in a mouth-covering gesture. The device includes a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, are operable to perform the following interaction method: processing the signal to determine whether the user puts a hand at the mouth and makes a mouth-covering gesture; in response to determining that the user puts a hand at the mouth and makes a mouth-covering gesture, using the gesture as a means of user interactive input control to control program execution on the smart electronic device, including triggering a corresponding control instruction or triggering another input mode.
Preferably, the mouth-covering gesture distinguishes between being made with the left hand and with the right hand.
Preferably, the mouth-covering gesture distinguishes different positions of the palm relative to the mouth, including the palm between the mouth and the left ear, the palm between the mouth and the right ear, and the palm directly in front of the mouth.
Preferably, the mouth-covering gesture distinguishes gesture categories that touch the face from those that do not.
Preferably, the specific hand shapes of the mouth-covering gesture include, but are not limited to, the following categories:
a gesture in which the palm covers the entire mouth;
a gesture with the thumb at the side of the mouth and the index finger above the lips, the mouth exposed below the palm;
a gesture with the thumb against the chin and the index finger above the lips, the mouth exposed below the palm;
a gesture with the thumb at the side of the mouth and the little finger touching the chin, the mouth exposed above the palm.
Preferably, when the smart electronic device recognizes that the mouth-covering gesture is of a predetermined category, it executes a specific control instruction.
Preferably, the executed control instruction triggers an input mode other than the mouth-covering gesture, i.e. the information entered through that other input mode is processed.
Preferably, the other input modes include one or a combination of voice input, non-mouth-covering gesture input, gaze input, blink input, and head-motion input.
Preferably, the signal is processed to detect whether the user removes the mouth-covering gesture; in response to detecting that the user removes the gesture, the smart electronic device ends the interaction process.
Preferably, visual or auditory feedback is provided to remind the user that the smart electronic device has triggered the other input mode.
Preferably, the other input mode triggered is voice input, and the smart electronic device processes the voice input made by the user while holding the mouth-covering gesture.
Preferably, when the signal used to recognize the mouth-covering gesture includes the user's speech signal, the smart electronic device treats that speech signal as voice input.
Preferably, the smart electronic device is a mobile phone equipped with one of the following sensors: a binaural Bluetooth headset, a wired headset, or a camera.
Preferably, the smart electronic device is a smart wearable device such as a watch, a smart ring, or a wristwatch.
Preferably, the smart electronic device is a head-mounted smart display equipped with a microphone or a multi-microphone array.
Preferably, the sensor system includes one or more of the following: a camera; an infrared camera; a depth camera; a microphone; a dual-microphone pair; a multi-microphone array; a proximity sensor; and an accelerometer.
Preferably, the signal used by the sensor system for recognition includes a facial image captured by a camera.
Preferably, when the signal includes a facial image captured by a camera, one or more categories of the user's mouth-covering gesture are recognized after the user makes the gesture.
Preferably, the smart electronic device is a smartphone, and the camera includes the smartphone's front camera.
Preferably, the signal features used by the sensor system for recognition include one or more of the time-domain features, spectral features, or sound-source-position features of the sound signal received by a single microphone.
Preferably, the microphone is a microphone on the mobile phone and/or a microphone on a wired inline-control headset.
Preferably, the signal features used by the sensor system for recognition include difference features between the sound signals received by multiple microphones.
Preferably, when the sensing device is a wireless Bluetooth headset, the mouth-covering gesture is recognized from the signal difference between the left and right earpieces.
Preferably, the signal is a proximity light sensor signal on a smart ring.
According to another aspect of the invention, an interaction method for a smart electronic device is provided. The smart electronic device includes a sensor system capable of capturing a signal of the user holding one hand at the mouth in a mouth-covering gesture. The interaction method performed by the device includes: processing the signal to determine that the user makes a mouth-covering gesture with one hand at the mouth; in response to determining that the user keeps the hand at the mouth and holds the gesture, parsing the user's interaction intent according to the category of the gesture, the interactive content of the device's current application, and the information the user inputs through other modalities at the same time; according to the parsed intent, receiving and analyzing the user's input information and producing the corresponding content output; after responding to the user's mouth-covering gesture, while the user is interacting with the device, processing the signal to determine that the user removes the gesture; and, in response to determining that the user removes the gesture, ending the interaction process.
Preferably, the content output form includes voice, images, or a combination of the two.
Preferably, the user's input information includes, besides the mouth-covering gesture itself, other modal information of the user.
Preferably, the other modal information includes speech or gaze.
According to another aspect of the invention, a computer-readable medium is provided, having computer-executable instructions stored thereon which, when executed by a computer, can perform the aforementioned voice interaction wakeup method.
Technical solutions according to embodiments of the invention have one or more of the following advantages:
1. The interaction is more natural. The user can interact simply by making a mouth-covering gesture, which conforms to user habits and cognition.
2. Use is more efficient. One hand suffices. The user need not operate the device or switch between user interfaces/applications, need not hold a button or repeat a wake word, and can simply raise a hand to the mouth.
3. High privacy and social acceptability. With the mouth-covering gesture, the user's voice input disturbs others less and enjoys better privacy protection, lowering the psychological burden of voice input.
Brief description of the drawings
The above and/or other objects, features and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the invention taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a voice input interaction method according to an embodiment of the invention;
Fig. 2 is a schematic front view of the trigger posture in which the right hand covers the mouth toward the left, according to an embodiment of the invention;
Fig. 3 is a schematic side view of the trigger posture in which the right hand covers the mouth toward the left, according to an embodiment of the invention;
Fig. 4 is a schematic view of the trigger posture in which the four fingers do not extend past the nose, according to an embodiment of the invention;
Fig. 5 is a schematic view of the trigger posture in which the thumb rests against the chin, according to an embodiment of the invention.
Detailed description
To help those skilled in the art better understand the present invention, it is described in further detail below in conjunction with the drawings and specific embodiments.
First, terms used herein are explained.
Camera: unless otherwise specified, "camera" herein refers to an ordinary camera and excludes infrared cameras.
Fig. 1 is a schematic flowchart of an interaction method, according to an embodiment of the invention, in which a smart electronic device starts and ends interaction with the user by recognizing the user's mouth-covering gesture and its removal. The smart portable electronic device includes a sensor system capable of capturing signals from which it can be determined that the user's hand is placed at the mouth making, or removing, a mouth-covering gesture. The sensor system includes one or more of the following: a camera, an infrared camera, a microphone, a dual-microphone pair, a multi-microphone array, a proximity sensor, and an accelerometer.
Interaction here may include, but is not limited to, voice interaction, gaze interaction, gesture interaction, and so on.
Note that, taking voice interaction as an example, according to embodiments of the invention the user, in order to interact by voice, only needs to place a hand at the mouth in a mouth-covering gesture and speak at the same time or immediately afterwards; to end the voice interaction, the user simply lowers the hand and stops covering the mouth.
As shown in Fig. 1, in S101 the signal is processed to determine that the user puts a hand at the mouth and makes a mouth-covering gesture.
As an example, the mouth-covering gesture distinguishes between being made with the left hand and with the right hand.
As an example, the gesture distinguishes different positions of the palm relative to the mouth, including the palm between the mouth and the left ear, between the mouth and the right ear, and directly in front of the mouth.
As an example, the gesture distinguishes categories that touch the face from those that do not.
Specifically, the mouth-covering gesture may include one of the following:
the user covers the mouth toward the left or right side with one hand;
the user covers the mouth with one hand touching the face, hiding the entire mouth;
the user covers the mouth with one hand touching the face, the thumb at the side of the mouth, the index finger above the lips, and the mouth exposed below the palm;
the user covers the mouth with one hand touching the face, the thumb at the side of the mouth, the little finger touching the chin, and the mouth exposed above the palm;
the user covers the mouth with one hand without touching the face, hiding the entire mouth;
the user covers the mouth with one hand without touching the face, the thumb at the side of the mouth, the index finger above the lips, and the mouth exposed below the palm;
the user covers the mouth with one hand without touching the face, the thumb at the side of the mouth, the little finger touching the chin, and the mouth exposed above the palm.
Figs. 2 to 5 show several cases in which a user places one hand at the mouth in a mouth-covering gesture to trigger information input. Figs. 2 and 3 are front and side views, respectively, of the trigger posture in which the left hand covers the mouth toward the right. In this posture, the user places the left hand at the left of the mouth and extends the fingers to cover the mouth, keeping the thumb up and the other four fingers pointing left above the lips and below the nose, so that the top and left of the mouth are blocked by the left hand. Depending on the user's habits, the four fingers other than the thumb may or may not extend past the right side of the nose, and the thumb may rest against the side of the face or against the chin. Figs. 4 and 5 show, respectively, the postures in which the four fingers do not pass the nose and in which the thumb rests against the chin. They are similar to the posture of covering the mouth with the left hand described above, differing in the position and extension of the thumb and the other four fingers. The above description of trigger postures is illustrative, not exhaustive, and is not limited to the postures disclosed.
In step S102, in response to determining that the user puts a hand at the mouth and makes a mouth-covering gesture, the gesture is used as a means of user interactive input control to control program execution on the smart electronic device, including triggering a corresponding control instruction or triggering another input mode. For example, when the smart electronic device is a smartphone and its front camera detects that the user puts a hand at the mouth in a mouth-covering gesture, the triggered control instruction may set the phone to mute; in another design, when the gesture is detected, the smartphone vibrates to inform the user that voice input mode has been entered, and the user can then speak to provide voice input. As another example, the smart electronic device may be a wireless headset, and the gesture is determined by analyzing differences between the microphone signals on the headset.
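To make the S101/S102 flow above concrete, the following is a minimal sketch in Python. It is not the patent's implementation: the `sensors` and `device` objects and the `classify_gesture()` helper are hypothetical stand-ins for whatever sensor pipeline the device provides, and the mapping from gesture category to action is only an example.

```python
import time
from enum import Enum, auto

class Gesture(Enum):
    NONE = auto()
    COVER_LEFT = auto()    # palm between mouth and left ear
    COVER_RIGHT = auto()   # palm between mouth and right ear
    COVER_FULL = auto()    # palm directly in front of the mouth

def interaction_loop(sensors, device):
    """S101: watch for a mouth-covering gesture; S102: act on its category."""
    while True:
        gesture = sensors.classify_gesture()      # hypothetical classifier
        if gesture is Gesture.NONE:
            time.sleep(0.05)                      # idle until a gesture appears
            continue
        if gesture is Gesture.COVER_FULL:
            device.set_mute(True)                 # example: direct control instruction
        else:
            device.vibrate()                      # feedback: voice input mode entered
            # keep feeding audio while the same gesture is held
            while sensors.classify_gesture() is gesture:
                device.feed_voice_input(sensors.read_audio_frame())
            device.end_voice_input()              # gesture removed: end the interaction
```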
For example, in response to determining that the user keeps the hand at the mouth and holds the mouth-covering gesture, the user's interaction intent is parsed according to the category of the gesture, the interactive content of the application currently running on the smart device, and the information the user inputs through other modalities at the same time.
In other words, the smart electronic device recognizes which mouth-covering gesture the user makes and maps that gesture to a predetermined user intent (instruction) (the mapping may be defined according to human usage habits), and then responds to that instruction.
Specifically, when the smart electronic device recognizes that the mouth-covering gesture is of a predetermined category, it executes a specific control instruction.
For example, when the gesture is of a first predetermined category, such as covering the mouth toward the left, the user's intent is determined to be voice input, and the smart device receives, analyzes, and produces the corresponding content output for the voice input made while the gesture is held.
For example, when the gesture is of a second predetermined category, such as covering the mouth toward the right, the user's intent is determined to be head-motion input, and the smart device receives, analyzes, and produces the corresponding content output for the head motions made while the gesture is held.
For example, when the gesture is of a third predetermined category, such as one hand covering the entire mouth, it is determined that the user intends to execute a specific control instruction on the smart device; that is, when the device recognizes that the user keeps the mouth covered, it parses this as the specific control instruction.
When the gesture distinguishes different postures, such as covering with the left hand versus the right hand, these determine different control instructions that the user intends to execute on the smart device; when the device recognizes the held gesture, it parses it into different control instructions according to its category.
Preferably, when the current application differs, the mouth-covering gesture triggers different control instructions or the input of different modalities of information.
Preferably, when the gesture distinguishes different postures, such as left-hand versus right-hand covering, each gesture category triggers, within each application, different control instructions or the input of different modalities of information. The different modalities of input, or other input modes, include one or a combination of voice input, non-mouth-covering gesture input, gaze input, blink input, and head-motion input.
As an example, visual or auditory feedback may be provided to inform the user that the smart electronic device has triggered the other input mode.
As an example, the triggered input mode is voice input, and the smart electronic device processes the voice input made by the user while the gesture is held. Further, when the signal used to recognize the gesture includes the user's speech signal, the device treats that speech signal as the voice input.
Optionally, while the user interacts with the smart device in response to the mouth-covering gesture, the signal is processed to detect whether the user removes the gesture.
In response to detecting that the user removes the gesture, the interaction process is ended.
When the user places one hand at the mouth and makes a mouth-covering gesture, the smart portable electronic device detects and recognizes the position and posture of the hand through its various sensors.
Several kinds of smart portable devices and sensors are described below by way of example; in these descriptions, judging that the user makes the mouth-covering gesture is equivalent to judging that the user wants to trigger information input.
First embodiment: the smart portable device is a mobile phone, and the sensor system includes a camera.
Here the signal used by the sensor system for recognition includes a facial image captured by the camera. When the signal includes such an image, one or more categories of the user's mouth-covering gesture are recognized after the gesture is made.
For example, the phone is fitted with a front camera that captures an image of the user covering the mouth with one hand; the phone processes the image and recognizes that the user is making a one-handed mouth-covering gesture, which can be parsed as a control instruction to the phone, such as mute.
Second embodiment: the smart portable device is a mobile phone, the sensor system includes a camera, and a voice prompt is given before input.
The phone's front camera captures the user covering the mouth, and it is judged that the user is making a one-handed mouth-covering gesture. The gesture can be parsed as the user's intent to give voice input. The headset (if the user wears one) or the phone plays a prompt tone to tell the user that voice input is available, and the user starts speaking after hearing it.
Third embodiment: the smart portable device is a wearable device such as a smart watch, smart ring or wristwatch, and the sensor system includes a proximity sensor and a microphone.
The proximity sensor and microphone on the smart watch or ring are monitored; when the proximity sensor reads "near" while the microphone receives a speech signal, it is judged that the user may be making a one-handed mouth-covering gesture.
Fourth embodiment: the smart portable device is a mobile phone and/or a wired inline-control headset, and the sensor system includes a microphone.
Features of the user's speech recorded by the inline microphone, such as nasality, tone and volume, are analyzed. When the user makes a one-handed mouth-covering gesture, the sound reaches the microphone past the occluding hand, and in the above respects its features differ markedly from the unoccluded case, so it can be judged whether the user is making the gesture.
The signal features used by the sensor system for recognition include one or more of the time-domain features, spectral features, or sound-source-position features of the sound signal received by a single microphone.
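As a hedged illustration of such single-microphone features, the sketch below computes band-energy ratios from a short audio frame; a hand in front of the mouth typically attenuates high frequencies more than low ones. The band limits and the decision threshold are assumptions that would need per-device calibration against recordings of covered and uncovered speech.

```python
import numpy as np

def band_energy(frame, sr, lo, hi):
    """Energy of `frame` in the [lo, hi] Hz band from the magnitude spectrum."""
    windowed = frame * np.hanning(len(frame))
    spec = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(spec[(freqs >= lo) & (freqs <= hi)] ** 2))

def looks_occluded(frame, sr=16000, threshold=4.0):
    """Heuristic: occlusion by the hand raises the low/high band-energy ratio.
    `threshold` is an assumed calibration value, not taken from the patent."""
    low = band_energy(frame, sr, 100.0, 1000.0)
    high = band_energy(frame, sr, 2000.0, 6000.0) + 1e-12  # avoid division by zero
    return low / high > threshold
```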
Fifth embodiment: the smart portable device is a mobile phone with binaural Bluetooth earphones, and the sensor system includes the pair of microphones located at the two ears.
The difference between the sound signals received by the two microphones is compared. Taking the left hand covering the mouth toward the right as an example: the user's left hand sits between the mouth and the left ear, blocking the propagation path from the mouth to the left microphone, so the signals arriving at the left and right microphones differ markedly in volume and in energy distribution across frequencies, which can be used to judge that the user may be making a one-handed mouth-covering gesture.
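The same idea can be phrased as an interaural level difference. A minimal sketch, assuming time-aligned left and right microphone frames and an illustrative 6 dB margin:

```python
import numpy as np

def level_db(frame):
    """RMS level of an audio frame in decibels."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(frame))) + 1e-12)

def covering_side(left_frame, right_frame, margin_db=6.0):
    """Guess which side the hand shadows from the left/right level difference.
    `margin_db` is an assumed threshold; a real device would calibrate it."""
    diff = level_db(left_frame) - level_db(right_frame)
    if diff < -margin_db:
        return "left"    # left microphone shadowed: hand between mouth and left ear
    if diff > margin_db:
        return "right"   # right microphone shadowed
    return None          # no clear asymmetry: probably no covering gesture
```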
Sixth embodiment: the smart portable device is a head-mounted display, and the sensor system includes multiple microphones.
The user wears a head-mounted display fitted with several microphones at different positions. As in the fifth embodiment, the differences between the sound signals collected at the different positions can be compared to judge whether the user is making a one-handed mouth-covering gesture.
Seventh embodiment: using a combination of multiple sensor signals.
The user wears a device near the hand, such as a smart watch or ring, fitted with a motion sensor and an orientation sensor, and wears on the head a smart display or earphones fitted with an orientation sensor. By analyzing the motion sensor signal at the hand, the hand-raising motion is recognized; the orientation sensor signals at the head and hand are then analyzed to compute the directional relationship between the user's head and hand. When that relationship satisfies the requirements of a mouth-covering gesture, for example when the palm surface is roughly parallel to the face surface, voice interaction is activated.
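The "palm roughly parallel to the face" condition can be checked by rotating each device's assumed normal axis into a common world frame and measuring the angle between the two normals. In the sketch below, the choice of local axes and the 25-degree tolerance are assumptions, and the orientation sensors are taken to report 3x3 rotation matrices.

```python
import numpy as np

def world_axis(rotation, local_axis):
    """Map a device-local axis into the world frame via the 3x3 rotation
    matrix reported by that device's orientation sensor."""
    return rotation @ np.asarray(local_axis, dtype=float)

def palm_parallel_to_face(head_rotation, hand_rotation, tol_deg=25.0):
    """True when the palm normal roughly faces the face normal."""
    face_normal = world_axis(head_rotation, [0.0, 0.0, 1.0])  # assumed face-forward axis
    palm_normal = world_axis(hand_rotation, [0.0, 0.0, 1.0])  # assumed palm-normal axis
    cos_angle = np.clip(np.dot(face_normal, palm_normal), -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    return abs(angle - 180.0) < tol_deg  # anti-parallel: the surfaces face each other
```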
Eighth embodiment: interaction combining the mouth-covering gesture with other modal inputs.
According to this embodiment, in addition to executing control instructions with the mouth-covering gesture, other modal information can be combined into the interaction. Other modal information may include one or a combination of the user's speech, head motion and eye movement. For example, after the gesture is detected, voice input is triggered and the user controls the smart electronic device directly by speech. As another example, after the gesture is detected, head-motion input is activated and the user confirms by nodding. In this way, the mouth-covering gesture conveniently and accurately opens other modal inputs.
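How a recognized gesture category opens a further modality can be as simple as a lookup. The assignment below is purely illustrative, since the disclosure leaves the concrete mapping to the designer.

```python
# Illustrative mapping from gesture category to the input modality it opens.
MODALITY_BY_GESTURE = {
    "cover_left": "voice",        # speak a command while covering
    "cover_right": "head_motion", # nod to confirm, shake to cancel
    "cover_full": "mute",         # direct control instruction, no further input
}

def modality_for(gesture_category):
    """Return the modality to activate, or None for unknown categories."""
    return MODALITY_BY_GESTURE.get(gesture_category)
```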
When the signal used includes an image of the vicinity of the face captured by a camera, after the user makes the mouth-covering gesture and before any other modal input is given, the specific gesture is recognized through image processing and the user's interaction intent is thereby identified.
In one example, before the user gives any other modal input, a visual or auditory prompt is provided to confirm whether the input of the other modality should be activated.
Ninth embodiment: using a combination of multiple sensor signals.
The smart portable electronic device may use the sensors described above, and may also include, without limitation, microphones, dual/multi-microphone arrays, cameras, proximity sensors, and so on. Combining multiple sensor signals can raise the accuracy and recall of the detection and judgment of whether to activate voice input. Using a variety of sensor signals also lets the invention be applied to a wider range of smart portable electronic devices and adapt to more usage situations.
Note that the signal features used by the sensor system for recognition include one or more of the time-domain features, spectral features, or sound-source-position features of the sound signal received by a microphone.
According to another embodiment of the invention, an interaction method for a smart electronic device is provided. The device includes a sensor system capable of capturing a signal of the user holding one hand at the mouth in a mouth-covering gesture. The interaction method performed by the device includes: processing the signal to determine that the user makes a mouth-covering gesture with one hand at the mouth; in response to determining that the user keeps the hand at the mouth and holds the gesture, parsing the user's interaction intent according to the category of the gesture, the interactive content of the device's current application, and the information the user inputs through other modalities at the same time; according to the parsed intent, receiving and analyzing the user's input information and producing the corresponding content output; after responding to the gesture, while the user is interacting with the device, processing the signal to determine that the user removes the gesture; and, in response to determining removal, ending the interaction process.
As an example, the content output form may include voice, images, or a combination of the two.
Besides the mouth-covering gesture itself, the user's input information may also include the user's other modal information, i.e. other input information.
As an example, the other modal information or other input information may include voice input, non-mouth-covering gesture input, gaze input, blink input, head-motion input and the like, or combinations of these.
An example application scenario follows: a user carries a smartphone, wears binaural Bluetooth earphones, and is in a public place. The user wants to ask about the day's weather by voice input. Using the invention, the user places one hand at the mouth in a mouth-covering gesture and says "How is the weather today?". Through the method above, the smartphone recognizes the one-handed mouth-covering gesture and the voice input content, and the weather information can be output through the earphones. Thus the user neither touches the phone nor queries information through its interface; no specific wake word is needed to wake voice interaction; and the covering gesture reduces the disturbance of voice input to others nearby and protects the privacy of the user's voice input, matching the user's everyday communication habits and intuition, simple and natural.
In summary, technical solutions according to embodiments of the invention have one or more of the following advantages:
1. The interaction is more natural. The user can interact simply by making a mouth-covering gesture, which conforms to user habits and cognition.
2. Use is more efficient. One hand suffices. The user need not operate the device or switch between user interfaces/applications, need not hold a button or repeat a wake word, and can simply raise a hand to the mouth.
3. High privacy and social acceptability. With the mouth-covering gesture, the user's voice input disturbs others less and enjoys better privacy protection, lowering the psychological burden of voice input.
The sensor types above are given as examples, not limitations. In general, the sensor system includes one or more of the following: a camera; an infrared camera; a depth camera; a microphone; a dual-microphone pair; a multi-microphone array; a proximity sensor; and an accelerometer.
Embodiments of the present invention have been described above. The description is illustrative, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. Therefore, the protection scope of the present invention shall be that of the claims.

Claims (29)

  1. A smart portable electronic device, including a sensor system capable of capturing signals from which it can be determined that the user's hand is placed at the user's mouth in a mouth-covering gesture, the device including a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, are operable to perform the following interaction method:
    processing the signal to determine whether the user puts a hand at the mouth and makes a mouth-covering gesture;
    in response to determining that the user puts a hand at the mouth and makes a mouth-covering gesture, using the gesture as a means of user interactive input control to control program execution on the smart electronic device, including triggering a corresponding control instruction or triggering another input mode.
  2. The smart portable electronic device according to claim 1, wherein the mouth-covering gesture distinguishes between being made with the left hand and with the right hand.
  3. The smart electronic device according to claim 1, wherein the mouth-covering gesture distinguishes different positions of the palm relative to the mouth, including the palm between the mouth and the left ear, the palm between the mouth and the right ear, and the palm directly in front of the mouth.
  4. The smart electronic device according to claim 1, wherein the mouth-covering gesture distinguishes gesture categories that touch the face from those that do not.
  5. The smart electronic device according to claim 1, wherein the specific hand shapes of the mouth-covering gesture include, but are not limited to, the following categories:
    a gesture in which the palm covers the entire mouth;
    a gesture with the thumb at the side of the mouth and the index finger above the lips, the mouth exposed below the palm;
    a gesture with the thumb against the chin and the index finger above the lips, the mouth exposed below the palm;
    a gesture with the thumb at the side of the mouth and the little finger touching the chin, the mouth exposed above the palm.
  6. The smart electronic device according to claims 1-5, wherein, when the device recognizes that the mouth-covering gesture is of a predetermined category, it executes a specific control instruction.
  7. The smart electronic device according to claim 6, wherein the executed control instruction triggers an input mode other than the mouth-covering gesture, i.e. the information entered through that other input mode is processed.
  8. The smart electronic device according to claim 7, wherein the other input modes include one or a combination of voice input, non-mouth-covering gesture input, gaze input, blink input, and head-motion input.
  9. The smart electronic device according to claim 7, wherein the signal is processed to detect whether the user removes the mouth-covering gesture;
    in response to detecting that the user removes the mouth-covering gesture, the device ends the interaction process.
  10. The smart electronic device according to claim 7, wherein visual or auditory feedback is provided to remind the user that the device has triggered the other input mode.
  11. The smart electronic device according to claim 7, wherein the other input mode triggered is voice input, and the device processes the voice input made by the user while holding the mouth-covering gesture.
  12. The smart electronic device according to claim 11, wherein,
    when the signal used to recognize the mouth-covering gesture includes the user's speech signal, the device treats that speech signal as voice input.
  13. The smart electronic device according to claim 1, being a mobile phone equipped with one of the following sensors: a binaural Bluetooth headset, a wired headset, or a camera.
  14. The smart electronic device according to claim 1, being a smart wearable device such as a watch, a smart ring, or a wristwatch.
  15. The smart electronic device according to claim 1, being a head-mounted smart display equipped with a microphone or a multi-microphone array.
  16. The smart electronic device according to claim 1, wherein the sensor system includes one or more of the following:
    a camera;
    an infrared camera;
    a depth camera;
    a microphone;
    a dual-microphone pair;
    a multi-microphone array;
    a proximity sensor; and
    an accelerometer.
  17. The smart electronic device according to claim 1, wherein the signal used by the sensor system for recognition includes a facial image captured by a camera.
  18. The smart electronic device according to claim 17, wherein, when the signal includes a facial image captured by a camera, one or more categories of the user's mouth-covering gesture are recognized after the user makes the gesture.
  19. The smart electronic device according to claim 17, being a smartphone, the camera including the smartphone's front camera.
  20. The smart electronic device according to claim 11, wherein the signal features used by the sensor system for recognition include one or more of the time-domain features, spectral features, or sound-source-position features of the sound signal received by a single microphone.
  21. The smart electronic device according to claim 20, wherein the microphone is a microphone on a mobile phone and/or a microphone on a wired inline-control headset.
  22. The smart electronic device according to claim 11, wherein the signal features used by the sensor system for recognition include difference features between the sound signals received by multiple microphones.
  23. The smart electronic device according to claim 22, wherein, when the sensing device is a wireless Bluetooth headset, the mouth-covering gesture is recognized from the signal difference between the left and right earpieces.
  24. The smart electronic device according to claim 1, wherein the signal is a proximity light sensor signal on a smart ring.
  25. A voice interaction wakeup method for a smart electronic device, the device including a sensor system capable of capturing a signal of the user holding one hand at the mouth in a mouth-covering gesture,
    the voice interaction wakeup method performed by the device including:
    processing the signal to determine that the user makes a mouth-covering gesture with one hand at the mouth;
    in response to determining that the user keeps the hand at the mouth and holds the mouth-covering gesture, parsing the user's interaction intent according to the category of the gesture, the interactive content of the device's current application, and the information the user inputs through other modalities at the same time;
    according to the parsed intent, receiving and analyzing the user's input information and producing the corresponding content output;
    after responding to the user's mouth-covering gesture, while the user interacts with the device, processing the signal to determine that the user removes the gesture;
    in response to determining that the user removes the gesture, ending the interaction process.
  26. The interaction method according to claim 24, wherein the content output form includes voice, images, or a combination of the two.
  27. The interaction method according to claim 24, wherein the user's input information includes, besides the mouth-covering gesture itself, other modal information of the user.
  28. The interaction method according to claim 26, wherein the other modal information includes speech or gaze.
  29. A computer-readable medium having computer-executable instructions stored thereon which, when executed by a computer, can perform the voice interaction wakeup method of any one of claims 24-27.
PCT/CN2020/092190 2019-06-03 2020-05-26 Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition WO2020244410A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/616,075 US20220319520A1 (en) 2019-06-03 2020-05-26 Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910475947.0 2019-06-03
CN201910475947.0A CN110164440B (zh) 2019-06-03 2019-06-03 Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition

Publications (1)

Publication Number Publication Date
WO2020244410A1 (zh) 2020-12-10

Family

ID=67627224

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092190 WO2020244410A1 (zh) 2019-06-03 2020-05-26 Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition

Country Status (3)

Country Link
US (1) US20220319520A1 (zh)
CN (1) CN110164440B (zh)
WO (1) WO2020244410A1 (zh)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110164440B (zh) * 2019-06-03 2022-08-09 交互未来(北京)科技有限公司 Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition
CN110866465A (zh) * 2019-10-29 2020-03-06 维沃移动通信有限公司 Electronic device control method and electronic device
CN111432303B (zh) * 2020-03-19 2023-01-10 交互未来(北京)科技有限公司 Monaural earphone, smart electronic device, method and computer-readable medium
CN111625094B (zh) * 2020-05-25 2023-07-14 阿波罗智联(北京)科技有限公司 Interaction method and apparatus for a smart rearview mirror, electronic device and storage medium
CN112216030B (zh) * 2020-08-31 2022-02-22 厦门宸力科技有限公司 Intelligent medication monitoring method, intelligent medication dispenser, intelligent medication station and management system
CN112133313A (zh) * 2020-10-21 2020-12-25 交互未来(北京)科技有限公司 Method for recognizing mouth-covering gestures during voice dialogue based on a single earphone
CN112259124B (zh) * 2020-10-21 2021-06-15 交互未来(北京)科技有限公司 Method for recognizing mouth-covering gestures during dialogue based on audio frequency-domain features
CN113805691A (zh) * 2020-12-28 2021-12-17 京东科技控股股份有限公司 Electronic device control method and apparatus, electronic device and storage medium
CN114915682B (zh) * 2021-02-10 2023-11-03 华为技术有限公司 Speech processing method and apparatus, storage medium and chip
JP2022125782A (ja) * 2021-02-17 2022-08-29 京セラドキュメントソリューションズ株式会社 Electronic apparatus and image forming apparatus
CN113191184A (zh) * 2021-03-02 2021-07-30 深兰科技(上海)有限公司 Real-time video processing method and apparatus, electronic device and storage medium
CN114527924A (zh) * 2022-02-16 2022-05-24 珠海读书郎软件科技有限公司 Control method based on a dual-screen device, storage medium and device
CN116301389B (zh) * 2023-05-17 2023-09-01 广东皮阿诺科学艺术家居股份有限公司 Multimodal smart furniture control method based on deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102013001331A1 (de) * 2013-01-26 2014-07-31 Audi Ag Method for operating a device, in particular a motor vehicle or a mobile terminal, by means of gesture control and voice input, and corresponding device
CN104731315A (zh) * 2013-12-20 2015-06-24 联想(新加坡)私人有限公司 Enabling device features according to gesture input
CN104781782A (zh) * 2012-11-08 2015-07-15 索尼公司 Information processing device, information processing method and program
US9443536B2 (en) * 2009-04-30 2016-09-13 Samsung Electronics Co., Ltd. Apparatus and method for detecting voice based on motion information
CN108181992A (zh) * 2018-01-22 2018-06-19 北京百度网讯科技有限公司 Gesture-based voice wakeup method, apparatus, device and computer-readable medium
CN110164440A (zh) * 2019-06-03 2019-08-23 清华大学 Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition
CN111432303A (zh) * 2020-03-19 2020-07-17 清华大学 Monaural earphone, smart electronic device, method and computer-readable medium

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3997392B2 (ja) * 2001-12-13 2007-10-24 セイコーエプソン株式会社 Display device and input method for the display device
EP2524279A1 (en) * 2010-01-14 2012-11-21 BrainLAB AG Gesture support for controlling and/or operating a medical device
DE102011075467A1 (de) * 2011-05-06 2012-11-08 Deckel Maho Pfronten Gmbh Device for operating an automated machine for handling, assembling or machining workpieces
KR101749143B1 (ko) * 2011-12-26 2017-06-20 인텔 코포레이션 Vehicle-based determination of occupant audio and visual input
US20140268016A1 (en) * 2013-03-13 2014-09-18 Kopin Corporation Eyewear spectacle with audio speaker in the temple
KR102091028B1 (ko) * 2013-03-14 2020-04-14 삼성전자 주식회사 Method and apparatus for operating objects in a user device
US9436287B2 (en) * 2013-03-15 2016-09-06 Qualcomm Incorporated Systems and methods for switching processing modes using gestures
US10884493B2 (en) * 2013-06-20 2021-01-05 Uday Parshionikar Gesture based user interfaces, apparatuses and systems using eye tracking, head tracking, hand tracking, facial expressions and other user actions
WO2015079441A1 (en) * 2013-11-26 2015-06-04 Yoav Shefi Method and system for constructing a virtual image anchored onto a real-world object
NZ735465A (en) * 2015-03-05 2021-07-30 Magic Leap Inc Systems and methods for augmented reality
CN104835059A (zh) * 2015-04-27 2015-08-12 东华大学 Intelligent advertisement delivery system based on somatosensory interaction technology
CA3002369A1 (en) * 2015-10-20 2017-04-27 Magic Leap, Inc. Selecting virtual objects in a three-dimensional space
CN106155311A (zh) * 2016-06-28 2016-11-23 努比亚技术有限公司 AR headset, AR interaction system and interaction method for AR scenes
CN106774917A (zh) * 2016-12-27 2017-05-31 努比亚技术有限公司 Terminal control apparatus, wearable device, terminal and terminal control method
CN108304062A (zh) * 2017-01-11 2018-07-20 西门子公司 Virtual environment interaction method, device and system
US20200050256A1 (en) * 2017-01-25 2020-02-13 Google Llc Techniques to cause changes in both virtual environment and physical environment
EP3602544A4 (en) * 2017-03-23 2020-02-05 Joyson Safety Systems Acquisition LLC SYSTEM AND METHOD FOR CORRELATION OF MOUTH IMAGES WITH INPUT COMMANDS
CN108052202B (zh) * 2017-12-11 2021-06-11 深圳市星野信息技术有限公司 3D interaction method and apparatus, computer device and storage medium
CN108271078A (zh) * 2018-03-07 2018-07-10 康佳集团股份有限公司 Voice wakeup method through gesture recognition, smart television and storage medium
CN108492825A (zh) * 2018-03-12 2018-09-04 陈火 Method for starting speech recognition, head-mounted device and speech recognition system
US10554886B2 (en) * 2018-05-18 2020-02-04 Valve Corporation Power management for optical position tracking devices
US10948993B2 (en) * 2018-06-07 2021-03-16 Facebook, Inc. Picture-taking within virtual reality
US11017217B2 (en) * 2018-10-09 2021-05-25 Midea Group Co., Ltd. System and method for controlling appliances using motion gestures
EP3660633A1 (en) * 2019-07-31 2020-06-03 Taurum Technologies SL Hand-worn data-input device
CN112596605A (zh) * 2020-12-14 2021-04-02 清华大学 AR glasses control method and apparatus, AR glasses and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9443536B2 (en) * 2009-04-30 2016-09-13 Samsung Electronics Co., Ltd. Apparatus and method for detecting voice based on motion information
CN104781782A (zh) * 2012-11-08 2015-07-15 索尼公司 Information processing device, information processing method and program
DE102013001331A1 (de) * 2013-01-26 2014-07-31 Audi Ag Method for operating a device, in particular a motor vehicle or a mobile terminal, by means of gesture control and voice input, and corresponding device
CN104731315A (zh) * 2013-12-20 2015-06-24 联想(新加坡)私人有限公司 Enabling device features according to gesture input
CN108181992A (zh) * 2018-01-22 2018-06-19 北京百度网讯科技有限公司 Gesture-based voice wakeup method, apparatus, device and computer-readable medium
CN110164440A (zh) * 2019-06-03 2019-08-23 清华大学 Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition
CN111432303A (zh) * 2020-03-19 2020-07-17 清华大学 Monaural earphone, smart electronic device, method and computer-readable medium

Also Published As

Publication number Publication date
CN110164440B (zh) 2022-08-09
US20220319520A1 (en) 2022-10-06
CN110164440A (zh) 2019-08-23

Similar Documents

Publication Publication Date Title
WO2020244410A1 (zh) Voice interaction wakeup electronic device, method and medium based on mouth-covering action recognition
AU2020257155B2 (en) Motion gesture input detected using optical sensors
JP6721713B2 (ja) Optimal control method based on multimodal motion-voice commands and electronic device applying the same
WO2021184549A1 (zh) Monaural earphone, smart electronic device, method and computer-readable medium
US10873798B1 (en) Detecting through-body inputs at a wearable audio device
US20230297169A1 (en) Measurement of Facial Muscle EMG Potentials for Predictive Analysis Using a Smart Wearable System and Method
TWI590640B (zh) Call method and electronic device thereof
US20150222742A1 (en) Wrist wearable apparatus with transformable substrate
CN108595003A (zh) Function control method and related device
CN110428806A (zh) Voice interaction wakeup electronic device, method and medium based on microphone signals
CN110097875A (zh) Voice interaction wakeup electronic device, method and medium based on microphone signals
KR20130040607A (ko) Mobile terminal and method for distinguishing which ear is used during a call
CN110780769A (zh) Control method and apparatus, wearable device and medium
WO2020244401A1 (zh) Voice input wakeup apparatus, method and medium based on detection of proximity to the mouth
CN117130469A (zh) Air gesture recognition method, electronic device and chip system
JP2023001310A (ja) Mobile terminal
CN109891364A (zh) Information processing apparatus, method and program
KR20140057831A (ko) Method for controlling a smartphone using breath blown from the mouth
US10540085B2 (en) Microphone control via contact patch
TWI495903B (zh) Glasses-type mobile phone with contactless gesture control
WO2023226031A1 (zh) Shortcut operation execution method, apparatus, device and storage medium
WO2015092635A1 (en) Device, method and computer program product
WO2015062502A1 (zh) Incoming call answering method and terminal
CN106361301A (zh) Mobile device and control method thereof
CN116189718A (zh) Voice activity detection method, apparatus, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20818843

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20818843

Country of ref document: EP

Kind code of ref document: A1