CN111985432B - Multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment - Google Patents

Multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment Download PDF

Info

Publication number
CN111985432B
CN111985432B CN202010882365.7A
Authority
CN
China
Prior art keywords
interaction
probability
decision
given
bayesian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010882365.7A
Other languages
Chinese (zh)
Other versions
CN111985432A (en)
Inventor
左韬
王星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aobo Jiangsu Robot Co ltd
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE filed Critical Wuhan University of Science and Engineering WUSE
Priority to CN202010882365.7A priority Critical patent/CN111985432B/en
Publication of CN111985432A publication Critical patent/CN111985432A/en
Application granted granted Critical
Publication of CN111985432B publication Critical patent/CN111985432B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/70Multimodal biometrics, e.g. combining information from different biometric modalities
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • G10L15/144Training of HMMs
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-modal data fusion method based on Bayes' theorem and adaptive weight adjustment, which comprises the following steps: multiple interaction experiments are carried out and data are collected in order to calculate the prior probabilities for Bayes' theorem. During actual interaction, the real-time interaction quality is assessed using the distance between the actual interaction point and the center of the preset interaction range, and this distance factor is folded into an adaptive parameter that serves as the adaptive weight for multi-modal data fusion. Bayes' theorem, combined with the actual interaction result of each modality, then provides a Bayesian weight that further adjusts the interaction result. The method adjusts the decision-fusion weights of the multi-modal data from the two angles of interaction process and interaction result, improving the accuracy and robustness of human-machine interaction based on multi-modal fusion.

Description

Multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment
Technical Field
The invention relates to the field of multi-modal data fusion, in particular to a multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment.
Background
Multi-modal fusion combines information from several modalities to perform target prediction (classification or regression). It is one of the earliest research directions of MMML (MultiModal Machine Learning) and remains the most widely applied one. According to the fusion level, multi-modal fusion can be divided into three categories, pixel level, feature level and decision level, which fuse raw data, abstract features and decision results respectively. Feature-level fusion can be further divided into early fusion and late fusion, depending on whether fusion occurs in the early or late stage of feature extraction; hybrid methods that mix several fusion levels also exist.
The fusion model provided by the invention belongs to back-end fusion, i.e. decision fusion: the decision probabilities output by classifiers trained separately on the data of different modalities are fused. The advantage of doing so is that the errors of the fusion model come from different classifiers, and errors from different classifiers are often uncorrelated, do not affect each other, and do not accumulate further. Common back-end fusion modes include maximum-value fusion (max-fusion), average-value fusion (average-fusion), fusion based on Bayes' rule, ensemble learning, and the like; selecting appropriate weights for back-end fusion to improve the robustness of the data-fusion structure is also a hot research issue (see the sketch below).
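The following minimal sketch is not part of the patent; the function names and example values are illustrative. It shows decision-level (back-end) fusion of per-modality probability vectors over N preset interaction targets using max-fusion and (weighted) average fusion:

```python
import numpy as np

def max_fusion(prob_vectors):
    """Element-wise maximum over the modality probability vectors."""
    return np.max(np.stack(prob_vectors), axis=0)

def average_fusion(prob_vectors, weights=None):
    """(Weighted) average of the modality probability vectors."""
    stacked = np.stack(prob_vectors)          # shape (M, N)
    if weights is None:
        weights = np.ones(len(prob_vectors))
    weights = np.asarray(weights, dtype=float)
    return weights @ stacked / weights.sum()

# Example: three modalities, four preset interaction targets.
p_face  = np.array([0.6, 0.2, 0.1, 0.1])
p_point = np.array([0.5, 0.3, 0.1, 0.1])
p_sound = np.array([0.2, 0.5, 0.2, 0.1])
fused = average_fusion([p_face, p_point, p_sound])
print(fused, int(np.argmax(fused)))          # fused probabilities and selected target
```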
The fusion model provides appropriate weights for the fusion process mainly by means of Bayes' theorem and an evaluation of interaction quality. The contribution of the whole system is a method for integrating information from two modalities, a sound modality and a human body posture modality (the latter comprising finger pointing and face orientation). Compared with single-modality human-machine interaction, multi-modal data give higher robustness, and data from different modalities can correct one another.
Disclosure of Invention
The invention integrates, into a multi-modal decision-fusion framework, both Bayes' theorem, which comprehensively examines the judgment results of the different modalities, and adaptive weights, which dynamically examine the interaction quality of each modality. The influence of the correctness of each modality's judgment on the whole interaction process is analysed, and Bayes' theorem is used to consider the results given by the modalities jointly, so that a corresponding decision coefficient is given to each modality. The adaptive weight is derived from the interaction quality of the two video modalities (finger pointing and face orientation); its purpose is to improve the robustness of the interactive system by giving a higher weight to the better-performing modality. The interaction quality is reflected in the geometric distance between the physical interaction direction in a video frame and the center of the preset interaction target, and giving a larger weight when the interaction point lies closer to the target further improves interaction accuracy.
The specific invention content is as follows: a multi-modal data back-end decision fusion method based on Bayesian theorem adaptive parameter adjustment comprises the following steps:
step 1: obtaining the prior probability of a Bayesian formula for multiple times according to the original model;
step 1.1: selecting a proper video frame to judge the finger direction and the face direction of the interactor;
step 1.2: For each video frame selected for the finger pointing modality, the finger interaction point determined in that frame is judged, and a decision vector of length N over the preset interaction points is given, in the following form:
t_pt(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
where t_j = 1 if the j-th preset interaction point is the one selected in the frame, and t_j = 0 otherwise;
step 1.3: adding the given decision vectors corresponding to the finger direction modes to obtain decision probability:
P_pt = (1/F_p) · Σ_{i=1}^{F_p} α_i · t_pt(i)
where F_p is the number of selected video frames.
In the stage of calculating the prior probability by Bayes' theorem, the coefficient α_i is taken as a constant initially set to 1; each element of the decision probability represents the probability that the corresponding preset interaction point is the target interaction point of the finger modality;
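As an illustration of steps 1.2 to 1.3, the sketch below assumes that each selected video frame yields a one-hot decision vector of length N and that the finger modality's decision probability is the α-weighted average of those vectors over the F_p frames; the patent gives the exact formula as an image, so this normalization and the function names are assumptions:

```python
import numpy as np

def decision_vector(selected_target, n_targets):
    """One-hot decision vector t_pt(i): 1 at the selected target, 0 elsewhere."""
    t = np.zeros(n_targets)
    t[selected_target] = 1.0
    return t

def decision_probability(selected_per_frame, n_targets, alpha=None):
    """Alpha-weighted average of per-frame decision vectors (assumed form)."""
    f_p = len(selected_per_frame)            # number of selected video frames F_p
    if alpha is None:
        alpha = np.ones(f_p)                 # constant coefficient in step 1
    weighted = [a * decision_vector(j, n_targets)
                for a, j in zip(alpha, selected_per_frame)]
    return np.sum(weighted, axis=0) / f_p

# Example: 5 frames over 4 preset targets; target 0 is picked in 4 of them.
print(decision_probability([0, 0, 1, 0, 0], 4))   # -> [0.8 0.2 0.  0. ]
```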
step 1.4: the decision vector and decision probability for the face orientation modality are given in the same way:
t_et(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
where t_j = 1 if the j-th preset interaction point is the one determined in the frame and t_j = 0 otherwise,
P_et = (1/F_e) · Σ_{i=1}^{F_e} α_i · t_et(i)
where F_e is the number of video frames selected for the face orientation modality;
step 1.5: Calculating the similarity between the HMM (Hidden Markov Model) of the speech signal and each template through HTK (the Hidden Markov Model Toolkit), and obtaining the probability of each interaction target given by the sound modality:
P_s = [p_s1, p_s2, ..., p_si, ..., p_sN]^T
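How HTK's per-template similarity scores are turned into the probability vector P_s is not spelled out here; the sketch below assumes the scores are log-likelihoods and normalizes them with a softmax, which is one reasonable choice rather than the patent's prescribed one:

```python
import numpy as np

def scores_to_probabilities(log_likelihoods):
    """Turn per-template HMM log-likelihood scores into P_s = [p_s1, ..., p_sN]^T."""
    scores = np.asarray(log_likelihoods, dtype=float)
    scores -= scores.max()        # shift for numerical stability
    p = np.exp(scores)
    return p / p.sum()

# Example: scores of one utterance against four interaction-target templates.
print(scores_to_probabilities([-120.3, -118.9, -131.0, -125.4]))
```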
step 1.6: adding the decision probabilities determined by the three modalities to obtain the decision probability corresponding to each preset interaction point given by the whole interaction process:
P = (P_pt + P_et + P_s) / M
where M is the number of modalities; the interaction point corresponding to the element with the largest value in this decision vector is the interaction point determined for the interaction process;
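A small sketch of step 1.6 follows; whether the summed vector is divided by the number of modalities M is taken from the claim wording and treated here as an assumption:

```python
import numpy as np

def fuse_modalities(p_point, p_face, p_sound, m=3):
    """Add the three modality decision probabilities (averaged over M modalities)
    and return the fused vector plus the index of the selected interaction point."""
    fused = (np.asarray(p_point) + np.asarray(p_face) + np.asarray(p_sound)) / m
    return fused, int(np.argmax(fused))

fused, target = fuse_modalities([0.8, 0.2, 0.0], [0.6, 0.3, 0.1], [0.1, 0.7, 0.2])
print(fused, target)   # the largest element marks the chosen interaction point
```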
step 1.7: the process from step 1.1 to step 1.6 is repeated for a plurality of times, the actual target interaction point of the interactor in each test is given artificially, the correctness of each mode and the whole interaction model are given by taking the actual target interaction point as a standard, and the recorded data are as follows:
Number of experiments  Modality e (eye)  Modality p (point)  Modality s (sound)  g (general)
1 T T T T
2 F T T T
3 T F T T
4 T T T T
5 T T T T
6 T T F T
7 T F F F
8 T T T T
9 T T T T
10 F T T F
11 T T T T
12 T F F F
13 T F T F
14 F T T T
15 T T T F
... ... ... ... ...
In the table, T represents a correct judgment, F represents an incorrect one, and g (general) represents the overall judgment of the interactive system;
step 1.8: counting the number of related events according to the data given in the table, and giving the prior probability of a Bayesian formula according to the central limit theorem:
Figure BDA0002654470990000041
Figure BDA0002654470990000042
Figure BDA0002654470990000051
Figure BDA0002654470990000052
Figure BDA0002654470990000053
The probabilities in the formulas above provide the prior probabilities for the Bayes formula, where N(a) represents the number of occurrences of event a in the table above; for example, N(g = T) represents the number of times the general (overall) judgment is correct in the data table;
step 1, providing a parameter basis (namely Bayesian prior probability) for Bayesian theorem through experiments of which weights are not considered for a plurality of times;
step 2: providing Bayesian prior probability in step 1, then performing man-machine interaction, and adding self-adaptive weight adjustment in step 2;
step 2.1: selecting a proper video frame during interaction to judge the finger direction and the face direction of an interactor;
step 2.2: For each video frame selected for the finger pointing modality, the finger-pointing interaction point determined in that frame is judged, and a decision vector of length N over the preset interaction points is given, of the following form:
t_pt(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
each element in the vector corresponds to a preset interaction target:
Figure BDA0002654470990000054
the value of the element corresponding to the selected interactive point is 1, and correspondingly, the values of other elements are 0 (one interactive target is selected by one video frame);
step 2.3: calculating the geometric distance between the interaction point determined by the image frame and the center of the preset interaction area (because the two points are in the same plane, the distance calculation only considers two-dimensional coordinates):
L_i = ((x_1 - x_2)^2 + (y_1 - y_2)^2)^0.5
step 2.4: the distance L_i is assigned to the adaptive weight α_i:
L_i = α_i
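Steps 2.3 to 2.4 reduce to the short computation below; as stated above, the frame's distance L_i is assigned directly to the adaptive weight α_i (the function name is illustrative):

```python
import math

def adaptive_weight(interaction_point, target_center):
    """L_i = ((x1-x2)^2 + (y1-y2)^2)^0.5, assigned to alpha_i (step 2.4)."""
    x1, y1 = interaction_point
    x2, y2 = target_center
    l_i = math.hypot(x1 - x2, y1 - y2)
    return l_i            # alpha_i = L_i

print(adaptive_weight((0.12, 0.30), (0.10, 0.25)))
```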
Step 2.5: substituting the obtained self-adaptive weight into decision probability vectors of e and p modes:
Figure BDA0002654470990000061
and step 3: on the basis of the Bayes prior probability given in the step 1 and the self-adaptive weight given in the step 2, the final weight adjustment is carried out by utilizing a Bayes formula according to the judgment results of 3 modes;
step 3.1: when the interaction points judged by the three modes are the same, the interaction points are directly selected as interaction targets of the whole system;
step 3.2: when the interaction point of one mode judgment is different from the results of other two same mode judgments (namely the element positions corresponding to the maximum values in the probability vectors given by the three modes are different), a Bayesian formula is introduced for weighting. The following formula is given as an example of different e-mode results and the same p-and s-mode results:
Figure BDA0002654470990000062
the probability corresponds to the final interactive result and takes the result of e-mode judgment,
Figure BDA0002654470990000063
the probability corresponds to the final interaction result and takes the judgment result of the p and s modes;
step 3.2.1: applying the self-adaptive weight and the Bayes formula to the probability vector fusion process of each mode to obtain an improved probability vector:
Figure BDA0002654470990000071
at this time, the interactive target corresponding to the maximum value element in the probability vector is the interactive target determined by the whole improved system;
step 3.3: when the interaction targets determined by the three modalities are different from each other (namely, the three modalities determine three different interaction targets), applying a Bayesian formula of a corresponding situation to the weighting process:
Figure BDA0002654470990000072
Figure BDA0002654470990000073
Figure BDA0002654470990000074
step 3.4: the three Bayes formulas are substituted into the following formula to obtain the final probability vector,
Figure BDA0002654470990000075
and the interactive target corresponding to the element with the largest value in the probability vector is the interactive target selected by the whole improved system.
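As a rough, illustrative sketch of step 3 (the patent's exact Bayes formulas are given as images and are not reproduced here), the snippet below uses the per-modality prior correctness rates from step 1.8 to apportion trust among the proposed interaction targets when the modalities disagree; the weighting scheme shown is an assumption, not the patent's formula:

```python
def bayes_trust(priors, proposals):
    """priors: modality -> prior probability that the modality judges correctly.
    proposals: modality -> index of the interaction target it proposes.
    Returns a normalized trust weight per modality (simplified, assumed scheme)."""
    targets = set(proposals.values())
    # mass supporting each proposed target = sum of priors of modalities proposing it
    mass = {t: sum(priors[m] for m, tgt in proposals.items() if tgt == t)
            for t in targets}
    total = sum(mass.values())
    return {m: mass[proposals[m]] / total for m in proposals}

priors    = {"e": 0.80, "p": 0.85, "s": 0.90}   # estimated in step 1.8
proposals = {"e": 2, "p": 1, "s": 1}            # e dissents; p and s agree on target 1
print(bayes_trust(priors, proposals))           # the agreeing pair receives more trust
```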
The invention has the following advantages and beneficial effects:
the invention utilizes Bayesian theorem to carry out comprehensive classification analysis on the interaction results given by the three modes, thereby providing Bayesian weight values which refer to various mode output results, and the process considers the accuracy of data from the mode output results and aims to give higher weight values to the possibly correct mode analysis results and correspondingly give lower weight values to the possibly wrong mode results, thereby improving the interaction accuracy. Different from the attention of the Bayes principle to interaction results of various modes, the self-adaptive weight adjustment focuses on the interaction process. The quality of the interaction process is evaluated by observing the geometric distance between the actual physical interaction point corresponding to the selected video frame and the preset interaction point in the interaction process, and the interaction accuracy is improved from another angle by endowing an interaction mode with good quality with a higher weight.
Drawings
FIG. 1 is a schematic diagram of human-machine interaction operation, wherein a depth camera is located on an interaction plane.
Fig. 2 is a flow chart of the proposed multimodal data fusion.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is further described in detail below with reference to the accompanying drawings and specific embodiments; the description only explains the invention and is not intended to limit it. Fig. 1 is a schematic diagram of human-computer interaction by an operator. As shown in the figure, the interactor faces the plane in which the preset interaction targets lie (the targets are coplanar, and the z coordinate of this plane is 0 in Fig. 1), and a Kinect depth camera is placed on the interaction-target plane. The interactor selects the interaction target through three modalities: face orientation, finger pointing and voice information (ideally the targets indicated by the three modalities coincide). To ensure interaction accuracy, as shown in the flow of the interaction system in Fig. 2, a large number of interaction experiments (i.e. repeated interaction processes) must first be performed; the correct/incorrect data of each modality and the actual target interaction points of the interactors are collected, the table in step 1.7 is compiled, and the prior probabilities for Bayes' theorem are calculated, providing a data basis for correcting the weight of each modality through the Bayesian principle in step 3. In the actual interaction process (distinct from the repeated data-collection interactions of step 1), step 2 evaluates the interaction video frames in real time (the evaluation criterion being the interaction-point distance) and gives the adaptive weight accordingly. Step 3 uses the Bayesian principle to consider the analysis results of all modalities jointly, gives a result-based weight, and substitutes both the adaptive weight and the Bayesian weight into the probability vector to improve interaction accuracy. The method specifically comprises the following steps:
step 1: obtaining the prior probability of a Bayesian formula for multiple times according to the original model;
step 1.1: selecting a proper video frame to judge the finger direction and the face direction of the interactor;
step 1.2: For each video frame selected for the finger pointing modality, the finger interaction point determined in that frame is judged, and a decision vector of length N over the preset interaction points is given, in the following form:
t_pt(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
where t_j = 1 if the j-th preset interaction point is the one selected in the frame, and t_j = 0 otherwise;
step 1.3: adding the given decision vectors corresponding to the finger direction modes to obtain decision probability:
P_pt = (1/F_p) · Σ_{i=1}^{F_p} α_i · t_pt(i)
where F_p is the number of selected video frames.
In the stage of calculating the prior probability by Bayes' theorem, the coefficient α_i is taken as a constant initially set to 1; each element of the decision probability represents the probability that the corresponding preset interaction point is the target interaction point of the finger modality;
step 1.4: the decision vector and decision probability for the face orientation modality are given in the same way:
t_et(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
where t_j = 1 if the j-th preset interaction point is the one determined in the frame and t_j = 0 otherwise,
P_et = (1/F_e) · Σ_{i=1}^{F_e} α_i · t_et(i)
where F_e is the number of video frames selected for the face orientation modality;
step 1.5: Calculating the similarity between the HMM (Hidden Markov Model) of the speech signal and each template through HTK (the Hidden Markov Model Toolkit), and obtaining the probability of each interaction target given by the sound modality:
P_s = [p_s1, p_s2, ..., p_si, ..., p_sN]^T
step 1.6: adding the decision probabilities determined by the three modalities to obtain the decision probability corresponding to each preset interaction point given by the whole interaction process:
P = (P_pt + P_et + P_s) / M
where M is the number of modalities; the interaction point corresponding to the element with the largest value in this decision vector is the interaction point determined for the interaction process;
step 1.7: the process from step 1.1 to step 1.6 is repeated for a plurality of times, the actual target interaction point of the interactor in each test is given artificially, the correctness of each mode and the whole interaction model are given by taking the actual target interaction point as a standard, and the recorded data are as follows:
Number of experiments  Modality e (eye)  Modality p (point)  Modality s (sound)  g (general)
1 T T T T
2 F T T T
3 T F T T
4 T T T T
5 T T T T
6 T T F T
7 T F F F
8 T T T T
9 T T T T
10 F T T F
11 T T T T
12 T F F F
13 T F T F
14 F T T T
15 T T T F
... ... ... ... ...
In the table, T represents a correct judgment, F represents an incorrect one, and g (general) represents the overall judgment of the interactive system;
step 1.8: counting the number of related events according to the data given in the table, and giving the prior probability of a Bayesian formula according to the central limit theorem:
Figure BDA0002654470990000111
Figure BDA0002654470990000112
Figure BDA0002654470990000113
Figure BDA0002654470990000114
Figure BDA0002654470990000115
The probabilities in the formulas above provide the prior probabilities for the Bayes formula, where N(a) represents the number of occurrences of event a in the table above; for example, N(g = T) represents the number of times the general (overall) judgment is correct in the data table;
step 1, providing a parameter basis (namely Bayesian prior probability) for Bayesian theorem through experiments of which weights are not considered for a plurality of times;
step 2: providing Bayesian prior probability in step 1, then performing man-machine interaction, and adding self-adaptive weight adjustment in step 2;
step 2.1: selecting a proper video frame during interaction to judge the finger direction and the face direction of an interactor;
step 2.2: For each video frame selected for the finger pointing modality, the finger-pointing interaction point determined in that frame is judged, and a decision vector of length N over the preset interaction points is given, of the following form:
t_pt(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
each element in the vector corresponds to a preset interaction target:
Figure BDA0002654470990000121
the value of the element corresponding to the selected interactive point is 1, and correspondingly, the values of other elements are 0 (one interactive target is selected by one video frame);
step 2.3: calculating the geometric distance between the interaction point determined by the image frame and the center of the preset interaction area (since the two points are in the same plane, the distance calculation only considers two-dimensional coordinates):
L_i = ((x_1 - x_2)^2 + (y_1 - y_2)^2)^0.5
step 2.4: the distance L_i is assigned to the adaptive weight value α_i:
L_i = α_i
Step 2.5: substituting the obtained self-adaptive weight into decision probability vectors of e and p modes:
Figure BDA0002654470990000122
and step 3: on the basis of the Bayes prior probability given in the step 1 and the self-adaptive weight given in the step 2, the final weight adjustment is carried out by utilizing a Bayes formula according to the judgment results of 3 modes;
step 3.1: when the interaction points judged by the three modes are the same, the interaction points are directly selected as interaction targets of the whole system;
step 3.2: when the interaction point of one mode judgment is different from the results of other two same mode judgments (namely the element positions corresponding to the maximum values in the probability vectors given by the three modes are different), a Bayesian formula is introduced for weighting. The following formula is given as an example that the e mode results are different and the p and s mode results are the same:
Figure BDA0002654470990000131
the probability corresponds to the final interactive result and takes the result of e-mode judgment,
Figure BDA0002654470990000132
the probability is corresponding to the final interaction result, and the result of judging the p and s modes is taken;
step 3.2.1: applying the self-adaptive weight and the Bayes formula to the probability vector fusion process of each mode to obtain an improved probability vector:
Figure BDA0002654470990000133
at this time, the interactive target corresponding to the maximum value element in the probability vector is the interactive target determined by the whole improved system;
step 3.3: when the interaction targets determined by the three modalities are different from each other (namely, the three modalities determine three different interaction targets), applying a Bayesian formula of a corresponding situation to a weighting process:
Figure BDA0002654470990000134
Figure BDA0002654470990000135
Figure BDA0002654470990000136
step 3.4: the three Bayes formulas are substituted into the following formula to obtain the final probability vector,
Figure BDA0002654470990000141
and the interactive target corresponding to the element with the largest value in the probability vector is the interactive target selected by the whole improved system.

Claims (1)

1. A multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment is characterized by comprising the following steps:
step 1: obtaining the prior probability of a Bayesian formula for multiple times according to the original model;
step 1.1: selecting a proper video frame to judge the finger direction and the face direction of the interactor;
step 1.2: for each video frame selected for the finger pointing modality, judging the finger interaction point determined in the frame, and giving a decision vector of length N over the preset interaction points, of the following form:
t_pt(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
where t_j = 1 if the j-th preset interaction point is the one selected in the frame, and t_j = 0 otherwise;
step 1.3: adding the given decision vectors corresponding to the finger direction modes to obtain decision probability:
P_pt = (1/F_p) · Σ_{i=1}^{F_p} α_i · t_pt(i)
where F_p is the number of selected video frames,
in the stage of calculating the prior probability by Bayes' theorem, the coefficient α_i is taken as a constant initially set to 1; each element of the decision probability represents the probability that the corresponding preset interaction point is the target interaction point of the finger modality;
step 1.4: the decision vector and decision probability for the face orientation modality are given in the same way:
t_et(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
where t_j = 1 if the j-th preset interaction point is the one determined in the frame and t_j = 0 otherwise,
P_et = (1/F_e) · Σ_{i=1}^{F_e} α_i · t_et(i)
where F_e is the number of video frames selected for the face orientation modality;
step 1.5: calculating, through the Hidden Markov Model Toolkit (HTK), i.e. the speech-recognition toolkit, the similarity between the Hidden Markov Model of the voice signal and each template, and obtaining the probability of each interaction target given by the voice modality:
P_s = [p_s1, p_s2, ..., p_si, ..., p_sN]^T
step 1.6: adding the decision probabilities determined by the three modalities to obtain the decision probability corresponding to each preset interaction point given by the whole interaction process:
P = (P_pt + P_et + P_s) / M
the interaction point corresponding to the element with the largest value in this decision vector is the interaction point determined in the interaction process, wherein M is the number of modalities;
step 1.7: the process from step 1.1 to step 1.6 is repeated for a plurality of times, the actual target interaction point of the interactor in each test is given artificially, the correctness of each mode and the whole interaction model are given by taking the actual target interaction point as a standard, and the recorded data are as follows:
Number of experiments  Modality e (eye)  Modality p (point)  Modality s (sound)  g (general)
1 T T T T
2 F T T T
3 T F T T
4 T T T T
5 T T T T
6 T T F T
7 T F F F
8 T T T T
9 T T T T
10 F T T F
11 T T T T
12 T F F F
13 T F T F
14 F T T T
15 T T T F
... ... ... ... ...
In the table, T represents a correct judgment, F represents an incorrect one, and g, i.e. general, represents the overall judgment of the interactive system;
step 1.8: the number of relevant events is counted according to the data given in the table above, and the prior probability of the Bayes formula is given according to the central limit theorem:
Figure FDA0003726101670000031
Figure FDA0003726101670000032
Figure FDA0003726101670000033
Figure FDA0003726101670000034
Figure FDA0003726101670000035
The probabilities in the above formulas provide the prior probabilities for the Bayes formula, where N(a) represents the number of occurrences of event a in the above table, and N(g = T) represents the number of times the general judgment is correct in the data table;
step 1, providing a parameter basis, namely Bayesian prior probability, for Bayesian theorem through experiments in which weights are not considered for multiple times;
step 2: providing Bayesian prior probability in step 1, then performing man-machine interaction, and adding self-adaptive weight adjustment in step 2;
step 2.1: selecting a proper video frame during interaction to judge the finger direction and the face direction of an interactor;
step 2.2: for each video frame selected for the finger pointing modality, judging the finger-pointing interaction point determined in the frame, and giving a decision vector of length N over the preset interaction points, of the following form:
t_pt(i) = [t_1, t_2, ..., t_j, ..., t_N]^T
each element in the vector corresponds to a preset interaction target:
Figure FDA0003726101670000041
the value of the element corresponding to the selected interactive point is 1, correspondingly, the values of other elements are 0, and one interactive target is selected from one video frame;
step 2.3: and calculating the geometric distance between the interaction point determined by the video frame and the center of the preset interaction area, wherein the two points are positioned on the same plane, so that the distance calculation only considers two-dimensional coordinates:
L_i = ((x_1 - x_2)^2 + (y_1 - y_2)^2)^0.5
step 2.4: the distance L_i is assigned to the adaptive weight α_i:
L_i = α_i
step 2.5: substituting the obtained self-adaptive weight into decision probability vectors of e and p modes:
Figure FDA0003726101670000042
and step 3: on the basis of the Bayes prior probability given in the step 1 and the self-adaptive weight given in the step 2, the final weight adjustment is carried out by utilizing a Bayes formula according to the judgment results of 3 modes;
step 3.1: when the interaction points judged by the three modes are the same, directly selecting the interaction points as interaction targets of the whole system;
step 3.2: when the interaction point judged by a certain mode is different from the results judged by other two same modes, namely the element positions corresponding to the maximum values in the probability vectors given by the three modes are different, introducing a Bayesian formula for weighting, wherein the following formula takes the example that the results of e modes are different and the results of p and s modes are the same:
Figure FDA0003726101670000051
the probability corresponds to the final interaction result and takes the result of e-mode judgment;
Figure FDA0003726101670000052
the probability is corresponding to the final interaction result, and the result of judging the p and s modes is taken;
step 3.2.1: applying the self-adaptive weight and the Bayes formula to the probability vector fusion process of each mode to obtain an improved probability vector:
Figure FDA0003726101670000053
at this time, the interactive target corresponding to the maximum value element in the probability vector is the interactive target determined by the whole improved system;
step 3.3: when the interaction targets determined by the three modalities are different, namely when the three modalities determine three different interaction targets, applying a Bayesian formula of the corresponding situation to the weighting process:
Figure FDA0003726101670000054
Figure FDA0003726101670000055
Figure FDA0003726101670000056
step 3.4: the three Bayes formulas are substituted into the following formula to obtain the final probability vector,
Figure FDA0003726101670000061
and the interactive target corresponding to the element with the largest value in the probability vector is the interactive target selected by the whole improved system.
CN202010882365.7A 2020-08-28 2020-08-28 Multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment Active CN111985432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010882365.7A CN111985432B (en) 2020-08-28 2020-08-28 Multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010882365.7A CN111985432B (en) 2020-08-28 2020-08-28 Multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment

Publications (2)

Publication Number Publication Date
CN111985432A CN111985432A (en) 2020-11-24
CN111985432B true CN111985432B (en) 2022-08-12

Family

ID=73440805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010882365.7A Active CN111985432B (en) 2020-08-28 2020-08-28 Multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment

Country Status (1)

Country Link
CN (1) CN111985432B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065277B (en) * 2021-03-11 2021-10-08 自然资源部国土卫星遥感应用中心 High-resolution remote sensing satellite flutter detection and modeling method in cooperation with multi-load data
CN113616184B (en) * 2021-06-30 2023-10-24 北京师范大学 Brain network modeling and individual prediction method based on multi-mode magnetic resonance image

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7979363B1 (en) * 2008-03-06 2011-07-12 Thomas Cecil Minter Priori probability and probability of error estimation for adaptive bayes pattern recognition
CN102646200A (en) * 2012-03-08 2012-08-22 武汉大学 Image classifying method and system for self-adaption weight fusion of multiple classifiers

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7979363B1 (en) * 2008-03-06 2011-07-12 Thomas Cecil Minter Priori probability and probability of error estimation for adaptive bayes pattern recognition
CN102646200A (en) * 2012-03-08 2012-08-22 武汉大学 Image classifying method and system for self-adaption weight fusion of multiple classifiers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on trust measurement model based on dynamic Bayesian network; Liang Hongquan et al.; Journal on Communications; 2013-09-25 (No. 09); full text *

Also Published As

Publication number Publication date
CN111985432A (en) 2020-11-24

Similar Documents

Publication Publication Date Title
EP3843035A1 (en) Image processing method and apparatus for target recognition
CN108596087B (en) Driving fatigue degree detection regression model based on double-network result
CN112507996A (en) Face detection method of main sample attention mechanism
CN111985432B (en) Multi-modal data fusion method based on Bayesian theorem and adaptive weight adjustment
CN110555417A (en) Video image recognition system and method based on deep learning
CN115861772A (en) Multi-scale single-stage target detection method based on RetinaNet
CN109389105B (en) Multitask-based iris detection and visual angle classification method
CN109919055B (en) Dynamic human face emotion recognition method based on AdaBoost-KNN
Zheng et al. Improvement of grayscale image 2D maximum entropy threshold segmentation method
CN115439458A (en) Industrial image defect target detection algorithm based on depth map attention
CN109934129B (en) Face feature point positioning method, device, computer equipment and storage medium
CN111860587A (en) Method for detecting small target of picture
EP2535787B1 (en) 3D free-form gesture recognition system and method for character input
CN103105924A (en) Man-machine interaction method and device
CN113799124A (en) Robot flexible grabbing detection method in unstructured environment
WO2019085060A1 (en) Method and system for detecting waving of robot, and robot
CN117274774A (en) Yolov 7-based X-ray security inspection image dangerous goods detection algorithm
CN111144462A (en) Unknown individual identification method and device for radar signals
CN114821423A (en) Fire detection method based on improved YOLOV5
CN110163130A (en) A kind of random forest grader and classification method of the feature pre-align for gesture identification
CN111860265B (en) Multi-detection-frame loss balanced road scene understanding algorithm based on sample loss
CN112329571B (en) Self-adaptive human body posture optimization method based on posture quality evaluation
CN113327269A (en) Unmarked cervical vertebra movement detection method
CN111368625B (en) Pedestrian target detection method based on cascade optimization
CN117237902A (en) Robot character recognition system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231108

Address after: Room B-301, Zhongke Entrepreneurship Center, Changzhou Science and Education City, No. 18 Changwu Middle Road, Changzhou City, Jiangsu Province, 213100

Patentee after: AOBO (JIANGSU) ROBOT CO.,LTD.

Address before: 430081 No. 947 Heping Avenue, Qingshan District, Hubei, Wuhan

Patentee before: WUHAN University OF SCIENCE AND TECHNOLOGY

TR01 Transfer of patent right