CN110865705B - Multi-mode fusion communication method and device, head-mounted equipment and storage medium - Google Patents

Multi-mode fusion communication method and device, head-mounted equipment and storage medium Download PDF

Info

Publication number
CN110865705B
Authority
CN
China
Prior art keywords
information
myoelectricity
user
voice
facial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911019740.9A
Other languages
Chinese (zh)
Other versions
CN110865705A (en)
Inventor
印二威
鲁金朋
马权智
谢良
邓宝松
闫野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center, National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
Priority to CN201911019740.9A priority Critical patent/CN110865705B/en
Publication of CN110865705A publication Critical patent/CN110865705A/en
Application granted granted Critical
Publication of CN110865705B publication Critical patent/CN110865705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/011Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Dermatology (AREA)
  • Neurology (AREA)
  • Neurosurgery (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a multi-mode fusion communication method and device, a head-mounted device, and a storage medium. The method comprises the following steps: acquiring voice information, lip image information and facial myoelectricity information of a user; determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial myoelectricity information; and identifying instruction information of the user according to the model parameters, the voice information, the lip image information and the facial myoelectricity information. In the embodiment of the application, the coordinated processing of lip-image and facial-myoelectricity signals greatly improves the environmental adaptability and instruction-recognition accuracy of the interactive communication system. The equipment is easy to wear, simple to use and easy to operate. Because the positions of the collectors are relatively fixed, the variation between signal acquisitions is reduced and the accuracy of model prediction is improved.

Description

Multi-mode fusion communication method and device, head-mounted equipment and storage medium
Technical Field
The application belongs to the technical field of data processing and communication, and particularly relates to a multi-mode fusion communication method, a device, a head-mounted device and a storage medium.
Background
Cooperation within a team depends on information interaction, and voice communication is the most direct and accurate way to communicate. However, some complex environments place limits on voice communication between people. For example, when crew members converse while an aircraft is being flown, the roar of the engines severely disturbs the voice signal. As another example, in special operations, information must be passed in a whisper; the loudness of the speech is then so low that correct and effective transfer of the information is difficult to guarantee.
Currently, for complex environments that affect voice communication, the related art recognizes what a user says by means of an electronic larynx, a sensor that picks up throat vibration. In use, the electronic larynx must be pressed tightly against the user's throat. When the user speaks, the electronic larynx collects the vibration signal of the larynx and converts it into an audio signal, from which the user's speech is recognized.
However, the electronic larynx imposes strict requirements on how it is used, its wearing style is uncomfortable for the user, and throat vibration produced by the user's swallowing actions cannot be distinguished, so considerable errors exist.
Disclosure of Invention
The application provides a multi-mode fusion communication method and device, a head-mounted device, and a storage medium. The coordinated processing of lip-image and facial-myoelectricity signals greatly improves the environmental adaptability and instruction-recognition accuracy of the interactive communication system, and the device is easy to wear, simple to use and easy to operate.
An embodiment of a first aspect of the present application provides a multi-mode fusion communication method, including:
acquiring voice information, lip image information and facial myoelectricity information of a user;
determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial myoelectricity information;
and identifying instruction information of the user according to the model parameters, the voice information, the lip image information and the facial myoelectricity information.
In some embodiments of the present application, the acquiring voice information, lip image information, and facial myoelectricity information of the user includes:
collecting voice information of a user through a voice collecting device included in the head-mounted equipment;
shooting a lip region of a user through a miniature camera arranged on the voice acquisition device to obtain lip image information of the user;
and acquiring the facial myoelectricity information of the user through the myoelectricity signal acquisition equipment attached to the face of the user on the head-mounted equipment.
In some embodiments of the present application, the identifying instruction information of the user according to the model parameter, the voice information, the lip image information, and the facial myoelectricity information includes:
determining a corresponding image processing model and a myoelectricity processing model according to the model parameters;
according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model;
identifying facial instructions corresponding to the facial myoelectricity information through the myoelectricity processing model according to the facial myoelectricity information;
and identifying instruction information of the user according to the voice information, the lip instruction and the face instruction.
In some embodiments of the present application, before determining the model parameters corresponding to the current environment through the pre-trained environment assessment model according to the voice information, the lip image information and the facial myoelectricity information, the method further includes:
acquiring voice information, lip image information and facial myoelectricity information under different environments;
dividing different data sets according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the facial myoelectricity information;
and training the model according to the different data sets to obtain an environment assessment model.
In some embodiments of the present application, after the identifying the instruction information of the user, the method further includes:
displaying the instruction information through a display device;
and receiving confirmation information of the user and sending the instruction information to a receiver.
An embodiment of a second aspect of the present application provides a multimode-converged communication device, including:
the information acquisition module is used for acquiring voice information, lip image information and facial myoelectricity information of a user;
the environment determining module is used for determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial myoelectricity information;
and the instruction identification module is used for identifying the instruction information of the user according to the model parameters, the voice information, the lip image information and the facial myoelectricity information.
In some embodiments of the present application, the information acquisition module includes:
the voice acquisition unit is used for acquiring voice information of a user through a voice acquisition device included in the head-mounted equipment;
the image shooting unit is used for shooting the lip area of the user through a miniature camera arranged on the voice acquisition device to obtain lip image information of the user;
and the myoelectricity acquisition unit is used for acquiring the myoelectricity information of the face of the user through the myoelectricity acquisition equipment attached to the face of the user on the head-mounted equipment.
In some embodiments of the present application, the instruction identifying module is configured to determine a corresponding image processing model and myoelectricity processing model according to the model parameters; according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model; identifying facial instructions corresponding to the facial myoelectricity information through the myoelectricity processing model according to the facial myoelectricity information; and identifying instruction information of the user according to the voice information, the lip instruction and the face instruction.
An embodiment of a third aspect of the present application provides a head-mounted device, including: a voice acquisition device, an electromyographic signal acquisition device, a miniature camera arranged on the voice acquisition device, a memory, a processor and an executable program stored on the memory, wherein the executable program is executed by the processor to implement the method of the first aspect.
An embodiment of a fourth aspect of the present application proposes a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first aspect described above.
The technical scheme provided by the embodiment of the application has at least the following technical effects or advantages:
In the embodiment of the application, the collected signal streams are processed to obtain an instruction prediction result. The prediction result can be displayed on an external display device, and after the speaker confirms that it is correct, the instruction information is sent via a confirmation button of the information transceiver unit. The coordinated processing of lip-image and facial-myoelectricity signals greatly improves the environmental adaptability and instruction-recognition accuracy of the interactive communication system. The device is easy to wear, simple to use and easy to operate. Because the positions of the collectors are relatively fixed, the variation between signal acquisitions is reduced and the accuracy of model prediction is improved.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures.
In the drawings:
FIG. 1 is a flow chart of a method for multimodal fusion communication according to an embodiment of the application;
fig. 2 shows a schematic diagram of an electromyographic signal acquisition device according to an embodiment of the application;
FIG. 3 illustrates a schematic diagram of a headset-style head-mounted device provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of image and electromyographic signal processing provided by an embodiment of the application;
FIG. 5 is a schematic diagram illustrating functional modules of a multi-mode converged communication system according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for multi-mode converged communication according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a multi-mode converged communication device according to an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
In the embodiment of the application, in order to cope with a complex communication environment, a head-mounted device for rapidly extracting voice information, lip image information and facial myoelectricity information is designed. And evaluating the use environment according to the extracted information, selecting a processing model suitable for the use environment to process and predict signals, and identifying instruction information of a user, thereby realizing information exchange under the complex environment.
The following describes a communication method, a device, a head-mounted device and a storage medium for multi-mode fusion according to an embodiment of the present application with reference to the accompanying drawings.
Example 1
The embodiment of the application provides a multi-mode fusion communication method, which realizes silent communication based on multi-modal information fusion, as shown in fig. 1, and specifically comprises the following steps:
step 101: and acquiring voice information, lip image information and facial myoelectricity information of the user.
The method is executed by a head-mounted device. The head-mounted device is provided with the electromyographic signal acquisition device shown in fig. 2; when the user wears the device on the head, the electromyographic signal acquisition device is attached to the surface of the facial muscles near the user's mouth. It mainly consists of sensors attached around the mouth, and the acquisition sites belong to the mouth muscle groups that are driven during speech; it collects the user's facial electromyographic information in real time. The head-mounted device is also provided with a voice acquisition device, which may be a microphone; when the user wears the device on the head, the voice acquisition device sits right near the user's mouth and collects voice information when the user speaks. A miniature camera is arranged on the head-mounted device, mounted on the voice acquisition device; when the user wears the device on the head, the miniature camera is aimed directly at the user's lip region and photographs it to obtain the user's lip image information. In the embodiment of the application, the head-mounted device can be designed in a style similar to the headset shown in fig. 3, and the electromyographic signal acquisition device shown in fig. 2 and the headset structure shown in fig. 3 can be embedded into a helmet, where the headset shown in fig. 3 comprises a microphone and a miniature camera mounted on the microphone.
In the embodiment of the application, the miniature camera only collects picture information of the fixed lip region of the user rather than an image of the whole head, so preprocessing steps such as locating the lips in a head image and cropping the image can be omitted, which speeds up the operation of the system.
Step 102: and determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial myoelectricity information.
Before step 102 is executed, the embodiment of the present application collects a large amount of user voice information, lip image information and facial myoelectricity information in different environments, including noisy environments, silent environments and so on. Different data sets are then formed by grading the data according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the facial myoelectricity information. Models are trained on these different data sets and the model parameters for each environment are determined, yielding the environment assessment model.
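As a purely illustrative sketch of how such an environment assessment could be organized (the patent does not specify an implementation; the feature estimators, the classifier choice and all names below are assumptions), the three cues named above can be computed and fed to a small classifier whose prediction then indexes a table of model parameters:

```python
import numpy as np

def environment_features(audio, lip_frames, emg):
    """Compute the three cues named in the patent: speech SNR, lip-image
    brightness, and facial-EMG intensity (all estimators here are assumptions)."""
    # crude SNR estimate: signal power over an assumed noise-floor power
    signal_power = np.mean(audio ** 2)
    noise_power = np.percentile(audio ** 2, 10) + 1e-12
    snr_db = 10.0 * np.log10(signal_power / noise_power)
    brightness = float(np.mean(lip_frames))             # mean pixel intensity
    emg_intensity = float(np.sqrt(np.mean(emg ** 2)))   # RMS amplitude
    return np.array([snr_db, brightness, emg_intensity])

def train_environment_model(recordings, env_labels):
    """Hypothetical training step: each recording is labelled with the
    environment grade (e.g. 0 = quiet, 1 = noisy, 2 = silent/whisper)
    that was used to bin the data sets."""
    from sklearn.ensemble import RandomForestClassifier  # assumed model choice
    X = np.stack([environment_features(a, v, e) for a, v, e in recordings])
    return RandomForestClassifier(n_estimators=100).fit(X, env_labels)

# At run time the predicted environment indexes a table of model parameters, e.g.
#   env = model.predict(environment_features(a, v, e)[None])[0]
#   params = PARAMS_BY_ENV[env]
```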
And then, according to the voice information, the lip image information and the facial myoelectricity information acquired in the step 101, carrying out environment assessment through the environment assessment model, and determining model parameters corresponding to the current environment.
According to the embodiment of the application, based on the user's multi-modal information, namely the voice information, the lip image information and the facial myoelectricity information, the signal magnitude corresponding to the user's current environment is estimated through the environment assessment model and the corresponding model parameters are selected, so that a degree of environmental self-adaptation is achieved.
Step 103: and identifying instruction information of the user according to the model parameters, the voice information, the lip image information and the facial myoelectricity information.
Determining a corresponding image processing model and an myoelectricity processing model according to the model parameters; according to the lip image information, identifying a lip instruction corresponding to the lip image information through an image processing model; according to the facial myoelectricity information, identifying facial instructions corresponding to facial myoelectricity information through a myoelectricity processing model; and identifying instruction information of the user according to the voice information, the lip instruction and the face instruction.
As shown in fig. 4, the lip image information is input into an image processing model. The model first processes the lip image information through a 3D convolution layer, which applies a spatio-temporal convolution to the preprocessed stream of image frames; the spatio-temporal convolution captures the short-term dynamics of the lip region. The 3D convolution layer consists of 64 3D kernels of size 5×7×7 (time × width × height), followed by batch normalization (BN) and rectified linear units (ReLU). The 3D feature map output by the 3D convolution layer then passes through a 34-layer ResNet (Residual Neural Network) and two BGRU (Bidirectional Gated Recurrent Unit) layers.
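A hedged PyTorch sketch of an image branch with this shape (3D front end, per-frame ResNet-34 trunk, 2-layer BGRU) is shown below; the strides, padding, pooling and the reuse of torchvision's ResNet-34 are assumptions rather than the patent's exact configuration:

```python
import torch
import torch.nn as nn
import torchvision

class LipFrontEnd(nn.Module):
    """Image branch sketch: 3D convolution over the frame stream, a ResNet-34
    trunk applied per frame, and a 2-layer bidirectional GRU."""
    def __init__(self, gru_hidden=1024):
        super().__init__()
        self.front3d = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        trunk = torchvision.models.resnet34(weights=None)
        trunk.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        trunk.fc = nn.Identity()                     # keep the 512-d pooled feature
        self.trunk = trunk
        self.bgru = nn.GRU(512, gru_hidden, num_layers=2,
                           batch_first=True, bidirectional=True)

    def forward(self, x):                            # x: (batch, 1, time, H, W)
        x = self.front3d(x)                          # (batch, 64, time, H', W')
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)
        feats = self.trunk(x).reshape(b, t, -1)      # per-frame 512-d features
        out, _ = self.bgru(feats)                    # (batch, time, 2 * gru_hidden)
        return out
```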
The facial myoelectricity information is input into the myoelectricity processing model. First, 50 Hz Chebyshev type I IIR notch filtering is applied, followed by 0.1–70 Hz Chebyshev type I IIR band-pass filtering. The resulting signal stream is fed into an 18-layer ResNet followed by two BGRU layers. Because of the particularity of the myoelectric signal (it is one-dimensional), the ResNet uses 1-dimensional kernels, and its first convolution layer uses a 5 ms temporal kernel with a 0.25 ms step in order to extract fine-scale spectral information. An average pooling layer in the ResNet makes the number of output frames match the video frame rate. These frames are then fed to the remaining ResNet layers, which consist of default kernels of size 3×1, so that longer-term myoelectric features are extracted in the deeper layers. The output of the ResNet-18 is fed to a 2-layer BGRU with 1024 units per layer.
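The following sketch illustrates, under stated assumptions, the EMG branch described above: Chebyshev type I IIR notch and band-pass filtering with SciPy, a 1-D front end with a roughly 5 ms kernel and 0.25 ms step, average pooling to the video frame count, and a 2-layer BGRU. The filter orders, ripple, sampling rate and the shallow stand-in trunk (not a faithful 18-layer ResNet) are assumptions:

```python
import torch
import torch.nn as nn
from scipy import signal

def filter_emg(emg, fs=1000.0):
    """Notch (around 50 Hz) plus 0.1-70 Hz band-pass filtering; the orders,
    ripple and sampling rate are assumptions, the patent only names the types."""
    b_notch, a_notch = signal.cheby1(4, 0.5, [48, 52], btype='bandstop', fs=fs)
    b_band, a_band = signal.cheby1(4, 0.5, [0.1, 70], btype='bandpass', fs=fs)
    emg = signal.filtfilt(b_notch, a_notch, emg)
    return signal.filtfilt(b_band, a_band, emg)

class EmgFrontEnd(nn.Module):
    """1-D front end: ~5 ms kernel with ~0.25 ms step in the first convolution,
    a small 1-D trunk standing in for the ResNet-18 body, average pooling to
    the video frame count, and a 2-layer BGRU with 1024 units per direction."""
    def __init__(self, fs=1000, video_fps=25, gru_hidden=1024):
        super().__init__()
        k = max(3, int(0.005 * fs))            # ~5 ms kernel
        s = max(1, int(0.00025 * fs))          # ~0.25 ms step
        self.conv1 = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=k, stride=s, padding=k // 2),
            nn.BatchNorm1d(64), nn.ReLU(inplace=True))
        self.trunk = nn.Sequential(            # stand-in for the 1-D ResNet-18 body
            nn.Conv1d(64, 512, kernel_size=3, padding=1),
            nn.BatchNorm1d(512), nn.ReLU(inplace=True))
        self.video_fps = video_fps
        self.bgru = nn.GRU(512, gru_hidden, num_layers=2,
                           batch_first=True, bidirectional=True)

    def forward(self, x, duration_s):          # x: (batch, 1, samples)
        x = self.trunk(self.conv1(x))
        t_frames = int(round(duration_s * self.video_fps))
        x = nn.functional.adaptive_avg_pool1d(x, t_frames)  # match video frames
        out, _ = self.bgru(x.transpose(1, 2))  # (batch, frames, 2 * gru_hidden)
        return out
```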
As shown in fig. 4, the image processing model processes the lip image information and the myoelectricity processing model processes the facial myoelectricity information; the processing results of the two are then input into the information recognition model. The final BGRU output of the image processing model and the final BGRU output of the myoelectricity processing model are concatenated and fed to a 2-layer BGRU of the information recognition model, which fuses the information from the video stream and the electromyographic signal stream and models their temporal dynamics in parallel. The output layer of the information recognition model is a softmax layer that outputs the finally recognized instruction information of the user, thereby realizing silent communication based on the fusion of facial myoelectricity and image information.
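A minimal sketch of this fusion step, assuming each branch emits 2048-dimensional bidirectional features per frame and that the instruction library holds num_classes commands (both assumptions):

```python
import torch
import torch.nn as nn

class FusionRecognizer(nn.Module):
    """Concatenate the per-frame outputs of the two branches, pass them through
    a 2-layer BGRU, and score the instruction classes with a softmax layer."""
    def __init__(self, branch_dim=2048, hidden=1024, num_classes=50):
        super().__init__()
        self.bgru = nn.GRU(2 * branch_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, lip_feats, emg_feats):
        # lip_feats, emg_feats: (batch, frames, branch_dim), time-aligned
        fused, _ = self.bgru(torch.cat([lip_feats, emg_feats], dim=-1))
        return torch.softmax(self.head(fused), dim=-1)   # per-frame class probabilities
```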
The user's instruction information may be freely spoken natural language, or an instruction from a fixed instruction library. When the instruction information is an instruction from the fixed instruction library, a label can be assigned to each instruction, and when the user's instruction information is recognized, the recognized instruction sequence is assigned the label with the highest average probability.
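For the fixed-library case, the label decision described above can be read directly off the per-frame softmax output, for example:

```python
import torch

def predict_instruction(frame_probs):
    """Pick the fixed-library label with the highest probability averaged over
    all frames (a minimal interpretation of the rule above).
    frame_probs: (frames, num_classes) softmax output of the recognizer."""
    return int(frame_probs.mean(dim=0).argmax())
```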
In the embodiment of the application, the signal processing adopts an end-to-end approach, so a separate data preprocessing stage is omitted. The image-stream part uses 3D convolution, which captures temporal information well. The two signal streams (image and electromyographic) are then each processed by a ResNet; a ResNet is composed of residual blocks, and its skip connections effectively avoid the problems of vanishing and exploding gradients, which improves training.
Wherein, the relevant forward-propagation formulas are as follows:

z^[l+1] = w^[l+1] a^[l] + b^[l+1]

a^[l+1] = g(z^[l+1])

z^[l+2] = w^[l+2] a^[l+1] + b^[l+2]

a^[l+2] = g(z^[l+2] + a^[l])

where l is the layer index of the neural network, z is the pre-activation result of each layer, a is the value after the layer's activation function, and w and b are the parameters of the corresponding network layer (the correspondence is indicated by the superscript), i.e. the part of the network that is trained and updated after being given initial values; g is the chosen activation function (for example the ReLU function). The residual block in ResNet is characterized by the fourth formula above: instead of simply computing a^[l+2] = g(z^[l+2]), it adds a^[l] from two layers earlier, so that every second layer the computation of a also takes the information of the earlier layer into account.
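A residual block implementing exactly these four formulas might look as follows (a 2-D convolutional variant with arbitrary channel counts; the 1-D EMG trunk would use Conv1d instead):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two-layer residual block following the forward-propagation formulas above:
    the input a^[l] is added back in before the second activation (skip connection)."""
    def __init__(self, channels):
        super().__init__()
        # each convolution plays the role of z = w * a + b in the formulas
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.g = nn.ReLU(inplace=True)        # the activation function g

    def forward(self, a_l):
        z1 = self.conv1(a_l)                  # z^[l+1]
        a1 = self.g(z1)                       # a^[l+1] = g(z^[l+1])
        z2 = self.conv2(a1)                   # z^[l+2]
        return self.g(z2 + a_l)               # a^[l+2] = g(z^[l+2] + a^[l])
```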
The BGRU unit adds memory-cell information; it is a simplification of the LSTM (Long Short-Term Memory) network, runs faster than the LSTM, and still preserves memory over the time sequence. For time t, the operation of each layer of GRU (gated recurrent unit) cells is given by the following formulas:

c^<t-1> = a^<t-1>

Γ_r = σ(w_r [c^<t-1>, x^<t>] + b_r)

Γ_u = σ(w_u [c^<t-1>, x^<t>] + b_u)

c̃^<t> = tanh(w_c [Γ_r * c^<t-1>, x^<t>] + b_c)

c^<t> = Γ_u * c̃^<t> + (1 − Γ_u) * c^<t-1>

a^<t> = c^<t>

The formulas above describe the computation of one layer of the neural network. Because the input is a sequence, time t corresponds to one element of the sequence sample (in this example, one frame of the image sequence or one short segment of the electromyographic signal). At each time step the inputs are the information x^<t> at that time and the output c^<t-1> of the previous time step, and the output is c^<t> at the current time; depending on the need, c^<t> may also be further processed as the input to the next layer of the network.

Here x^<t> is the information input at time t, c^<t-1> is the memory-cell information at time t−1 (the previous time step), and a^<t-1> is the value after the activation function at the previous time step. In the GRU its value is identical to c^<t-1>, so it is not represented separately; in the LSTM the two values differ, so a^<t-1> is kept in the expression for uniformity. c̃^<t> is the candidate value at time t used to update c^<t>.

Wherein w_c and b_c are used to compute c̃^<t>, i.e. a part of the network that is trained and updated after being given initial values. Γ_r is the relevance gate, representing the relevance between c̃^<t> and c^<t-1>; w_r and b_r are used to compute Γ_r. Γ_u is the update gate (a value between 0 and 1) that controls how c^<t> is updated; w_u and b_u are used to compute Γ_u. σ is the Sigmoid activation function, σ(x) = 1 / (1 + e^(−x)). c^<t> is the memory-cell information at time t; it is the output at the current time step and serves as an input at the next time step.
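Written directly from these formulas, one GRU time step can be implemented as below (a plain re-implementation for illustration only; in practice a library cell such as torch.nn.GRU is used):

```python
import torch

def gru_step(x_t, c_prev, w_r, b_r, w_u, b_u, w_c, b_c):
    """One GRU time step following the formulas above.
    x_t: input at time t; c_prev: memory cell c^<t-1> from the previous step."""
    concat = torch.cat([c_prev, x_t], dim=-1)
    gamma_r = torch.sigmoid(concat @ w_r + b_r)            # relevance gate Γ_r
    gamma_u = torch.sigmoid(concat @ w_u + b_u)            # update gate Γ_u
    c_tilde = torch.tanh(torch.cat([gamma_r * c_prev, x_t], dim=-1) @ w_c + b_c)
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev     # new memory cell c^<t>
    return c_t                                             # also the output a^<t>
```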
After the user's instruction information has been recognized in the above manner, the instruction information is displayed through a display device. The display device may be the user's mobile phone or an external display screen of the head-mounted device.
After the instruction information is displayed, the user can see the recognized instruction information and confirm whether it is the instruction the user really wanted to express. Once the user confirms that the recognized instruction information is indeed the intended instruction, the user submits confirmation to the head-mounted device. The head-mounted device receives the user's confirmation and sends the instruction information through its transmitting end to the receiving end of the intended receiver, thereby realizing communication between the user and the receiver.
In the embodiment of the application, the user can also receive the instruction information sent by the transmitting end of the opposite party through the receiving end of the head-mounted equipment and transmit the instruction information to the user through the earphone arranged on the head-mounted equipment.
In order to facilitate understanding of the communication system provided by the embodiments of the present application, a specific description follows with reference to fig. 5. As shown in fig. 5, the communication system includes a signal acquisition module, an environment evaluation module, an image sequence processing module, an electromyographic signal processing module, an information recognition module, an instruction information display module, and an information transceiver module. The signal acquisition module comprises a lip image acquisition module and a facial electromyographic signal acquisition module; physically it is the head-mounted device, consisting of a headset, a camera and a myoelectricity acquisition system: the microphone handles communication in a normal environment, a small camera mounted on the microphone acquires lip information of fixed size from a fixed area, and the myoelectricity acquisition device is attached to the muscles near the mouth and is responsible for acquiring facial myoelectric signals during speech. The environment evaluation module evaluates the signal magnitude corresponding to the current environment and selects the corresponding model parameters. The image sequence processing module and the electromyographic signal processing module process the lip image information and the facial myoelectricity information respectively. The information recognition module fuses the processing results of the image sequence processing module and the electromyographic signal processing module and recognizes the user's instruction information. The instruction information display module displays the recognized instruction information of the user. The information transceiver module receives and transmits instruction information and realizes communication with other users. The system adopts a multi-modal perception fusion processing technique, so that information recognition no longer depends on a single sensing system, which improves the universality of communication and the accuracy of information recognition in different environments.
In the embodiment of the application, in a normal communication environment, the user can use the voice acquisition device included in the head-mounted device to carry out normal voice communication through the signal transceiver module. However, in some abnormal, complex environments, for example where a special-operations user cannot speak aloud, or where the user's voice is easily masked in a very noisy environment, the voice information is of little use for recognizing the user's instruction information. Therefore, after the voice information, lip image information and facial myoelectricity information are obtained, the signal-to-noise ratio of the voice information can be determined first; if the signal-to-noise ratio exceeds the preset threshold, the voice information is not considered, and the user's instruction information is determined only from the lip image information and the facial myoelectricity information by fusing the image and myoelectric signals. In this way, communication is realized through the voice acquisition device in a normal environment, while in a complex, silenced environment it relies on the electromyographic signal acquisition device and the miniature camera.
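A hedged sketch of this gating logic (the helper names, threshold value and exact decision rule are assumptions, not the patent's text):

```python
def decide_instruction(voice_audio, lip_feats, emg_feats, snr_db,
                       fusion_recognizer, speech_recognizer, snr_threshold=15.0):
    """Gating mirroring the paragraph above: when the measured signal-to-noise
    ratio crosses the preset threshold, the voice channel is ignored and the
    instruction is decided only from the fused lip-image and facial-EMG branches."""
    if snr_db > snr_threshold:
        # complex/silenced environment: image + EMG fusion only
        return fusion_recognizer(lip_feats, emg_feats)
    # normal environment: ordinary voice communication is available
    return speech_recognizer(voice_audio)
```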
The usage flow of the head-mounted device is shown in fig. 6. The user wears the device and turns on the device switch, and the system checks whether each module operates normally and whether the modules can communicate normally. If the equipment cannot operate normally, the user is prompted to check and repair the corresponding part. The lip image acquisition module detects the lip state and judges whether the user has started speaking; if not, the system stays in a standby state, otherwise signal acquisition starts, different processing models are selected according to the environment, and the signals are processed to obtain the instruction information. The speaker obtains the system's instruction judgment through the instruction information display module and clicks the send button after confirming that the information is correct, whereupon the instruction is sent through the information transceiver module; if the information is incorrect, the speaker's signals are collected again for a new round of judgment.
In the embodiment of the application, environmental noise is removed from the electromyographic signals by baseline processing, while the lip picture sequence is not preprocessed. The two signal streams pass through the environment evaluation module for model selection, then each signal enters its corresponding processing module, and the information recognition module combines the features of the two signals to judge the speaker's instruction. After the instruction is obtained, information is transmitted and communication is carried out through the information transceiver module.
The collected signal streams are processed to obtain an instruction prediction result. The prediction result can be displayed on an external display device, and after the speaker confirms that it is correct, the instruction information is sent via a confirmation button of the information transceiver unit. The coordinated processing of lip-image and facial-myoelectricity signals greatly improves the environmental adaptability and instruction-recognition accuracy of the interactive communication system. The device is easy to wear, simple to use and easy to operate. Because the positions of the collectors are relatively fixed, the variation between signal acquisitions is reduced and the accuracy of model prediction is improved.
Example 2
The embodiment of the application provides a multimode-fused communication device, which is used for executing the multimode-fused communication method described in the above embodiment, as shown in fig. 7, and the device comprises:
the information acquisition module 301 is configured to acquire voice information, lip image information, and facial myoelectricity information of a user;
the environment determining module 302 is configured to determine model parameters corresponding to a current environment according to the voice information, the lip image information, and the facial myoelectricity information through a pre-trained environment assessment model;
the instruction identifying module 303 is configured to identify instruction information of the user according to the model parameter, the voice information, the lip image information and the facial myoelectricity information.
The information acquisition module 301 includes:
the voice acquisition unit is used for acquiring voice information of a user through a voice acquisition device included in the head-mounted equipment;
the image shooting unit is used for shooting the lip area of the user through a miniature camera arranged on the voice acquisition device to obtain lip image information of the user;
and the myoelectricity acquisition unit is used for acquiring the myoelectricity information of the face of the user through the myoelectricity acquisition equipment attached to the face of the user on the head-mounted equipment.
The instruction identifying module 303 is configured to determine a corresponding image processing model and an myoelectricity processing model according to the model parameters; according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model; identifying facial instructions corresponding to the facial myoelectricity information through the myoelectricity processing model according to the facial myoelectricity information; and identifying instruction information of the user according to the voice information, the lip instruction and the face instruction.
The apparatus further comprises: the model training module is used for acquiring voice information, lip image information and facial myoelectricity information under different environments; dividing different data sets according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the facial myoelectricity information; and training the model according to the different data sets to obtain an environment assessment model.
Further comprises: the display module is used for displaying the instruction information through display equipment; and the receiving and transmitting module is used for receiving the confirmation information of the user and transmitting the instruction information to the receiving party.
In the embodiment of the application, the collected signal streams are processed to obtain an instruction prediction result. The prediction result can be displayed on an external display device, and after the speaker confirms that it is correct, the instruction information is sent via a confirmation button of the information transceiver unit. The coordinated processing of lip-image and facial-myoelectricity signals greatly improves the environmental adaptability and instruction-recognition accuracy of the interactive communication system. The device is easy to wear, simple to use and easy to operate. Because the positions of the collectors are relatively fixed, the variation between signal acquisitions is reduced and the accuracy of model prediction is improved.
It should be noted that the foregoing explanation of the embodiment of the multi-mode fusion communication method is also applicable to the multi-mode fusion communication device of the foregoing embodiment, and thus will not be repeated herein.
Example 3
The embodiment of the application provides a head-mounted device, comprising: a voice acquisition device, an electromyographic signal acquisition device, a miniature camera arranged on the voice acquisition device, a memory, a processor and an executable program stored on the memory, wherein the executable program is executed by the processor to implement the multi-mode fusion communication method described above.
Example 4
In order to implement the embodiments described above, the embodiments of the present application also provide a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multi-mode fusion communication method according to any of the embodiments described above.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may also be used with the teachings herein. The required structure for the construction of such devices is apparent from the description above. In addition, the present application is not directed to any particular programming language. It will be appreciated that the teachings of the present application described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in the creation means of a virtual machine according to an embodiment of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present application can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A method of multimodal fusion communication, comprising:
acquiring voice information, lip image information and facial myoelectricity information of a user;
determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial myoelectricity information;
identifying instruction information of the user according to the model parameters, the voice information, the lip image information and the facial myoelectricity information;
before determining the model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial myoelectricity information, the method further comprises the following steps:
acquiring voice information, lip image information and facial myoelectricity information under different environments;
dividing different data sets according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the facial myoelectricity information;
model training is carried out according to the different data sets, and an environment assessment model is obtained;
the identifying instruction information of the user according to the model parameters, the voice information, the lip image information and the facial myoelectricity information comprises the following steps:
determining a corresponding image processing model and a myoelectricity processing model according to the model parameters;
according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model;
identifying facial instructions corresponding to the facial myoelectricity information through the myoelectricity processing model according to the facial myoelectricity information;
identifying instruction information of the user according to the voice information, the lip instruction and the face instruction;
if the signal-to-noise ratio exceeds the preset threshold, voice information is not considered, and instruction information of the user is determined only according to lip image information and facial myoelectricity information in a mode of fusing the image and the myoelectricity signals.
2. The method of claim 1, wherein the obtaining the voice information, lip image information, and facial myoelectricity information of the user comprises:
collecting voice information of a user through a voice collecting device included in the head-mounted equipment;
shooting a lip region of a user through a miniature camera arranged on the voice acquisition device to obtain lip image information of the user;
and acquiring the facial myoelectricity information of the user through the myoelectricity signal acquisition equipment attached to the face of the user on the head-mounted equipment.
3. The method according to any one of claims 1-2, further comprising, after said identifying instruction information of said user:
displaying the instruction information through a display device;
and receiving confirmation information of the user and sending the instruction information to a receiver.
4. A multi-modality converged communication device, comprising:
the information acquisition module is used for acquiring voice information, lip image information and facial myoelectricity information of a user;
the environment determining module is used for determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial myoelectricity information;
the instruction identification module is used for identifying instruction information of the user according to the model parameters, the voice information, the lip image information and the facial myoelectricity information;
further comprises: the model training module is used for acquiring voice information, lip image information and facial myoelectricity information under different environments; dividing different data sets according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the facial myoelectricity information; model training is carried out according to the different data sets, and an environment assessment model is obtained;
the instruction identification module is used for determining a corresponding image processing model and a myoelectricity processing model according to the model parameters; according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model; identifying facial instructions corresponding to the facial myoelectricity information through the myoelectricity processing model according to the facial myoelectricity information; identifying instruction information of the user according to the voice information, the lip instruction and the face instruction; if the signal-to-noise ratio exceeds the preset threshold, voice information is not considered, and instruction information of a user is determined only according to lip image information and facial myoelectricity information in a mode of fusing the image and the myoelectricity signals.
5. The apparatus of claim 4, wherein the information acquisition module comprises:
the voice acquisition unit is used for acquiring voice information of a user through a voice acquisition device included in the head-mounted equipment;
the image shooting unit is used for shooting the lip area of the user through a miniature camera arranged on the voice acquisition device to obtain lip image information of the user;
and the myoelectricity acquisition unit is used for acquiring the myoelectricity information of the face of the user through the myoelectricity acquisition equipment attached to the face of the user on the head-mounted equipment.
6. A head-mounted device, comprising: a voice acquisition device, an electromyographic signal acquisition apparatus, a miniature camera arranged on the voice acquisition device, a memory, a processor and an executable program stored on the memory, the executable program being executed by the processor to implement the method of any one of claims 1-3.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-3.
CN201911019740.9A 2019-10-24 2019-10-24 Multi-mode fusion communication method and device, head-mounted equipment and storage medium Active CN110865705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911019740.9A CN110865705B (en) 2019-10-24 2019-10-24 Multi-mode fusion communication method and device, head-mounted equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911019740.9A CN110865705B (en) 2019-10-24 2019-10-24 Multi-mode fusion communication method and device, head-mounted equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110865705A CN110865705A (en) 2020-03-06
CN110865705B true CN110865705B (en) 2023-09-19

Family

ID=69653139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911019740.9A Active CN110865705B (en) 2019-10-24 2019-10-24 Multi-mode fusion communication method and device, head-mounted equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110865705B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111755004A (en) * 2020-06-29 2020-10-09 苏州思必驰信息科技有限公司 Voice activity detection method and device
CN111798849A (en) * 2020-07-06 2020-10-20 广东工业大学 Robot instruction identification method and device, electronic equipment and storage medium
CN111899713A (en) * 2020-07-20 2020-11-06 中国人民解放军军事科学院国防科技创新研究院 Method, device, equipment and storage medium for silencing communication
CN111986674B (en) * 2020-08-13 2021-04-09 广州仿真机器人有限公司 Intelligent voice recognition method based on three-level feature acquisition
CN112001444A (en) * 2020-08-25 2020-11-27 斑马网络技术有限公司 Multi-scene fusion method for vehicle
CN113274038B (en) * 2021-04-02 2023-06-13 上海大学 Lip-shaped sensor device combining myoelectricity and pressure signals
CN113793047A (en) * 2021-09-22 2021-12-14 中国民航大学 Pilot cooperative communication capacity evaluation method and device
CN114917544B (en) * 2022-05-13 2023-09-22 上海交通大学医学院附属第九人民医院 Visual method and device for assisting orbicularis stomatitis function training
CN116766207B (en) * 2023-08-02 2024-05-28 中国科学院苏州生物医学工程技术研究所 Robot control method based on multi-mode signal motion intention recognition


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4381404B2 (en) * 2006-09-25 2009-12-09 株式会社エヌ・ティ・ティ・ドコモ Speech synthesis system, speech synthesis method, speech synthesis program
KR101092820B1 (en) * 2009-09-22 2011-12-12 현대자동차주식회사 Lipreading and Voice recognition combination multimodal interface system
WO2019050881A1 (en) * 2017-09-05 2019-03-14 Massachusetts Institute Of Technology Methods and apparatus for silent speech interface

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104410883A (en) * 2014-11-29 2015-03-11 华南理工大学 Mobile wearable non-contact interaction system and method
WO2016150001A1 (en) * 2015-03-24 2016-09-29 中兴通讯股份有限公司 Speech recognition method, device and computer storage medium
CN104951077A (en) * 2015-06-24 2015-09-30 百度在线网络技术(北京)有限公司 Man-machine interaction method and device based on artificial intelligence and terminal equipment
CN108228285A (en) * 2016-12-14 2018-06-29 中国航空工业集团公司西安航空计算技术研究所 A kind of human-computer interaction instruction identification method multi-modal end to end
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
CN108594987A (en) * 2018-03-20 2018-09-28 中国科学院自动化研究所 More man-machine coordination Behavior Monitor Systems based on multi-modal interaction and its control method
CN108537207A (en) * 2018-04-24 2018-09-14 Oppo广东移动通信有限公司 Lip reading recognition methods, device, storage medium and mobile terminal
CN108597501A (en) * 2018-04-26 2018-09-28 深圳市唯特视科技有限公司 A kind of audio-visual speech model based on residual error network and bidirectional valve controlled cycling element
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN108877801A (en) * 2018-06-14 2018-11-23 南京云思创智信息科技有限公司 More wheel dialog semantics based on multi-modal Emotion identification system understand subsystem
CN108899050A (en) * 2018-06-14 2018-11-27 南京云思创智信息科技有限公司 Speech signal analysis subsystem based on multi-modal Emotion identification system
CN109558788A (en) * 2018-10-08 2019-04-02 清华大学 Silent voice inputs discrimination method, computing device and computer-readable medium
CN110059575A (en) * 2019-03-25 2019-07-26 中国科学院深圳先进技术研究院 A kind of augmentative communication system based on the identification of surface myoelectric lip reading
CN110110603A (en) * 2019-04-10 2019-08-09 天津大学 A kind of multi-modal labiomaney method based on facial physiologic information
CN110109541A (en) * 2019-04-25 2019-08-09 广州智伴人工智能科技有限公司 A kind of method of multi-modal interaction
CN110276259A (en) * 2019-05-21 2019-09-24 平安科技(深圳)有限公司 Lip reading recognition methods, device, computer equipment and storage medium
CN110286756A (en) * 2019-06-13 2019-09-27 深圳追一科技有限公司 Method for processing video frequency, device, system, terminal device and storage medium

Also Published As

Publication number Publication date
CN110865705A (en) 2020-03-06

Similar Documents

Publication Publication Date Title
CN110865705B (en) Multi-mode fusion communication method and device, head-mounted equipment and storage medium
CN110544488B (en) Method and device for separating multi-person voice
US20230045237A1 (en) Wearable apparatus for active substitution
WO2021036568A1 (en) Fitness-assisted method and electronic apparatus
CN110035141A (en) A kind of image pickup method and equipment
CN108198130B (en) Image processing method, image processing device, storage medium and electronic equipment
CN102760077A (en) Method and device for self-adaptive application scene mode on basis of human face recognition
CN109167910A (en) focusing method, mobile terminal and computer readable storage medium
WO2022033556A1 (en) Electronic device and speech recognition method therefor, and medium
CN109819167B (en) Image processing method and device and mobile terminal
WO2022199500A1 (en) Model training method, scene recognition method, and related device
CN109743504A (en) A kind of auxiliary photo-taking method, mobile terminal and storage medium
CN114242037A (en) Virtual character generation method and device
CN113611318A (en) Audio data enhancement method and related equipment
WO2015100923A1 (en) User information obtaining method and mobile terminal
CN112489036A (en) Image evaluation method, image evaluation device, storage medium, and electronic apparatus
CN113574525A (en) Media content recommendation method and equipment
CN111191018B (en) Response method and device of dialogue system, electronic equipment and intelligent equipment
CN116137673A (en) Digital human expression driving method and device, equipment and medium thereof
CN107003736A (en) For the method and apparatus for the status data for obtaining instruction user state
CN109986553B (en) Active interaction robot, system, method and storage device
WO2022041182A1 (en) Method and device for making music recommendation
CN110491384B (en) Voice data processing method and device
CN113948076A (en) Voice interaction method, device and system
CN111524518B (en) Augmented reality processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant