CN112101219A - Intention understanding method and system for elderly accompanying robot - Google Patents

Intention understanding method and system for elderly accompanying robot

Info

Publication number
CN112101219A
CN112101219A (application number CN202010970662.7A)
Authority
CN
China
Prior art keywords
gesture
intention
gesture recognition
probability set
cin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010970662.7A
Other languages
Chinese (zh)
Other versions
CN112101219B (en)
Inventor
冯志全
豆少松
郭庆北
杨晓晖
徐涛
田京兰
范雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan filed Critical University of Jinan
Priority to CN202010970662.7A priority Critical patent/CN112101219B/en
Publication of CN112101219A publication Critical patent/CN112101219A/en
Application granted granted Critical
Publication of CN112101219B publication Critical patent/CN112101219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L2015/088Word spotting
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Acoustics & Sound (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)

Abstract

The invention provides an intention understanding method and system for an elderly accompanying robot, wherein the method comprises the following steps: acquiring gesture images and posture information of the elderly in real time, and performing image segmentation on the gesture images and the posture information to respectively form a gesture data set and a posture data set; inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set; performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and calculating the weight proportions under different intentions when the two probability sets are fused by adopting the F1 scores under different intention classifications; and then determining the final recognition intention. Based on the method, an intention understanding system is also provided. The invention improves the intention understanding rate of the elderly accompanying robot system and the elderly's satisfaction in using the social accompanying robot.

Description

Intention understanding method and system for elderly accompanying robot
Technical Field
The invention belongs to the technical field of elderly accompanying robots, and particularly relates to an intention understanding method and system for an elderly accompanying robot.
Background
Aging has become a problem in many countries around the world, and children who are busy with work find it difficult to give their parents the necessary care at all times. Meanwhile, practical research in nursing homes and studies of accompanying robots by Sari Merilampi and others show that robot accompaniment is increasingly accepted by the elderly, and elderly accompanying robots provide many services. However, the recognition rate and intention understanding rate of robot accompanying systems when understanding the intentions of the elderly still need to be improved; in particular, the movements of the elderly have many unique characteristics, which increases the interaction burden when the elderly use an accompanying robot and easily causes negative emotions. Therefore, a model fusion algorithm (SDFM) that effectively improves the robot's understanding rate of elderly behavior intentions is proposed, and the human-computer interaction action design is applied to a real scene.
Deep learning models and statistical models each have advantages and disadvantages in pattern recognition. A statistical method offers high judgment efficiency (whereas the judgment efficiency of a neural network is slower), its construction can be derived entirely from theory, and its recognition effect is clearly more suitable for training on gesture and posture information when the data volume is small or training data are difficult to collect. The structure and algorithm design of a neural network must rely on the designer's experience; a high recognition effect can be ensured when the data volume is large enough, so it is suitable for gesture recognition where information is easy to collect, but the recognition effect is often unsatisfactory when the data volume is small or training data are difficult to collect, and the success of such a system involves a large element of chance.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an intention understanding method and system for an elderly accompanying robot. The provided method and system adopt a fusion scheme in which an intention recognition result set is fused with a weight matrix: the F1-score under each classification in the confusion matrix of the sub-model recognition results is used as a weight value to form the weight matrix, and a fuzzy evaluation operator is established with reference to fuzzy evaluation theory to fuse the recognition results of the two models, improving the accuracy and sensitivity of the elderly accompanying robot's intention recognition.
In order to achieve the purpose, the invention adopts the following technical scheme:
an intention understanding method for an elderly accompanying robot, comprising the following steps:
acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and performing image segmentation on the gesture image and the posture information to respectively form a gesture data set and a posture data set;
inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and calculating the weight proportions under different intentions when the gesture recognition probability set and the posture recognition probability set are fused by adopting F1 scores under different intention classifications; and then determining the final recognition intention.
Further, before the behavior image of the old person is obtained in real time, voice channel information is obtained, and keywords of the voice channel information are extracted to start the robot.
Further, the method for training the neural network model and the hidden markov model comprises:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Further, the process of performing intention fusion on the gesture recognition probability set and the posture recognition probability set by the fusion algorithm based on the confusion matrix is as follows:
building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set;
assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi;
applying a fuzzy transformation to Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called the composite evaluation operator.
Further, the method for calculating the weight proportions under different intentions when the gesture recognition probability set and the posture recognition probability set are fused, by adopting the F1 scores under different intention classifications, is as follows:
the F1 scores under the different intentions are used as assignments: T_i^C is used as the weight value under the i-th intention classification of Cin, and T_i^H is used as the weight value under the i-th intention classification of Hin, wherein T_i^C and T_i^H denote the F1 scores of the two sub-models under the i-th intention classification (i = 1, 2, …, n);
based on T_i^C, the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [T_1^C, T_2^C, …, T_n^C]^T; based on T_i^H, the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [T_1^H, T_2^H, …, T_n^H]^T;
Cin, Hin, Cconfi and Hconfi are subjected to a fuzzy transformation to obtain a one-dimensional matrix [λ_1, λ_2, …, λ_n]^T; the maximum value λ_i in the matrix is selected, and the i-th intention is the final recognition intention of the user; wherein [λ_1, λ_2, …, λ_n]^T = Cin × Cconfi + Hin × Hconfi.
Further, when the robot does not complete the specified action according to the final intention:
starting target identification, and judging the distance of a specific obstacle by adopting image acquisition equipment;
after the designated target area is reached, a target object is captured through voice interaction, and the initial coordinate of the target object is (x, y); the robot moves the target object into the video frame to obtain coordinates (x1, y1); the transformation of the target object is (x → x1, y → y1);
after the target object is located, a grasping operation is performed; after grasping is completed, the intention of the elderly person is captured in real time, and the robot is moved.
The invention also provides an intention understanding system for the elderly accompanying robot, which comprises an acquisition module, a training module and a building calculation module;
the acquisition module is used for acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to respectively form a gesture data set and a posture data set;
the training module inputs the gesture data set into a trained neural network model to perform gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into a trained hidden Markov model to perform posture recognition to obtain a posture recognition probability set;
the building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and calculating the weight proportions under different intentions when the gesture recognition probability set and the posture recognition probability set are fused by adopting F1 scores under different intention classifications; the final recognition intention is then determined.
Further, the device also comprises a starting module;
the starting module is used for acquiring voice channel information and extracting keywords of the voice channel information to start the robot.
Further, the execution process of the training module is as follows:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Further, the building calculation module comprises a building module and a calculation module;
the process of the building module is as follows: an intention fusion model F = f(I, Cin, Hin) is built, wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set; weight values are assigned to Cin to form an n×1-dimensional weight matrix Cconfi, and weight values are assigned to Hin to form an n×1-dimensional weight matrix Hconfi; a fuzzy transformation is applied to Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called the composite evaluation operator;
the process of the calculation module is as follows: the F1 scores under the different intentions are used as assignments: T_i^C as the weight value under each intention classification of Cin, and T_i^H as the weight value under each intention classification of Hin, wherein T_i^C and T_i^H denote the F1 scores of the two sub-models under the i-th intention classification (i = 1, 2, …, n); based on T_i^C, the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [T_1^C, T_2^C, …, T_n^C]^T; based on T_i^H, the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [T_1^H, T_2^H, …, T_n^H]^T; Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ_1, λ_2, …, λ_n]^T; the maximum value λ_i in the matrix is selected, and the i-th intention is the final recognition intention of the user; wherein [λ_1, λ_2, …, λ_n]^T = Cin × Cconfi + Hin × Hconfi.
The effect provided in the summary of the invention is only the effect of the embodiment, not all the effects of the invention, and one of the above technical solutions has the following advantages or beneficial effects:
the invention provides an intention understanding method and system for an elderly accompanying robot, wherein the method comprises the following steps: acquiring a behavior image of the old in real time, wherein the behavior image comprises a gesture image and gesture information, and the gesture image and the gesture information are subjected to image segmentation to respectively form a gesture data set and a gesture data set; inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the gesture data set into a trained hidden Markov model for gesture recognition to obtain a gesture recognition probability set; performing intention fusion on the gesture recognition probability set and the gesture recognition probability set based on a fusion algorithm of a confusion matrix, and calculating weight proportion under different intentions when the gesture recognition probability set and the gesture recognition probability set are fused by adopting an F1 score under different intention classifications; and then determining the final recognition intention. Based on the intention understanding method for the old accompanying robot provided by the invention, an intention understanding system for the old accompanying robot is also provided. The invention provides a novel method for gesture recognition and posture recognition based on deep learning, and solves the key problems of poor recognition rate, low robustness, poor universality and the like in the traditional gesture recognition algorithm and posture recognition algorithm. Calculating weight proportions under different intentions when the gesture recognition probability set and the gesture recognition probability set are fused by adopting an F1 score; and then final recognition intentions are determined, the current intention understanding rate of the old accompanying robot system is improved, and the use satisfaction of the old to the social accompanying robot is improved.
Drawings
Fig. 1 is a flowchart of an intention understanding method for an elderly accompanying robot according to embodiment 1 of the present invention;
a schematic diagram of a CNN neural network recognition model framework is given in fig. 2;
a schematic diagram of an HMM posture recognition model framework is given in FIG. 3;
FIG. 4 is a schematic diagram of a dual-model decision-level fusion algorithm according to embodiment 1 of the present invention;
fig. 5 is a diagram of a fusion model architecture for deep learning and statistical probability provided in embodiment 1 of the present invention;
FIG. 6 is a confusion matrix of multiple classification intentions corresponding to CNN and HMM, respectively, in embodiment 1 of the present invention;
fig. 7 is a schematic view of an intention understanding system for an elderly accompanying robot in embodiment 2 of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Example 1
According to the intention understanding method for the aged accompanying robot, which is provided by the embodiment 1 of the invention, through a fusion scheme of fusion of an intention recognition result set and a weight matrix, F1-score under each classification in a confusion matrix of sub-model recognition results is used as a weight value to form the weight matrix, and a fuzzy evaluation operator is established by referring to a fuzzy evaluation theory to fuse two model recognition results. Fig. 1 shows a flowchart of an intention understanding method for an elderly accompanying robot in embodiment 1 of the present invention.
In step S101, voice channel information is acquired, and keywords of the voice channel information are extracted to start the robot. The invention triggers the system through keywords captured by the robot's microphone: word-stock template matching is performed using the Baidu speech recognition interface of Baidu Intelligent Cloud, and when a preset keyword is captured the robot interaction system is started or another human-computer interaction action is performed.
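As an illustration only, the following minimal Python sketch shows how such keyword-triggered wake-up could be organised; the transcribe() function and the keyword list are placeholders standing in for the Baidu speech recognition interface and the word-stock templates, and are not part of the disclosed implementation.

```python
# Hypothetical sketch of keyword-triggered wake-up; transcribe() stands in for
# the cloud speech-recognition call and WAKE_KEYWORDS for the word-stock templates.
WAKE_KEYWORDS = ("你好机器人", "start")  # assumed example keywords

def transcribe(audio_chunk) -> str:
    """Placeholder for the speech-recognition interface (returns recognized text)."""
    raise NotImplementedError

def should_wake(audio_chunk) -> bool:
    """Return True when any preset keyword appears in the recognized text."""
    text = transcribe(audio_chunk)
    return any(keyword in text for keyword in WAKE_KEYWORDS)
```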
In step S102, a behavior image of the elderly person is obtained in real time; the behavior image includes a gesture image and posture information, and both the gesture image and the posture information are subjected to image segmentation to form a gesture data set and a posture data set, respectively.
Although many motion data sets are available to researchers, most of them capture the motion of middle-aged people or of people of all ages, and the behaviors of the elderly differ obviously from those groups. Therefore, to improve the model's intention recognition rate for elderly behavior, a number of elderly people (for example 10, 20 or more) are recruited to capture image data of the several gestures and postures used to operate the robot. Image segmentation, which is generally a prerequisite of image recognition, is carried out on the elderly person's region in the collected image data, based on the hands and body posture, and the images are segmented with the Otsu algorithm to form a gesture data set and a posture data set containing the gesture and posture features of the elderly.
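A minimal sketch of the Otsu-based segmentation step is given below, assuming OpenCV as the image-processing library; the Gaussian blur and the function name are illustrative choices rather than the exact preprocessing used in the patent.

```python
import cv2

def segment_foreground(frame_bgr):
    """Segment the hand/body region with Otsu's threshold (illustrative sketch)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu's method selects the threshold that minimises intra-class variance.
    _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
```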
In step S103, the gesture data set is input into the trained neural network model for gesture recognition to obtain a gesture recognition probability set, and the posture data set is input into the trained hidden Markov model for posture recognition to obtain a posture recognition probability set.
CNN neural networks perform increasingly well in gesture recognition. At present, static gesture recognition based on interactive teaching platforms has revealed the relation between deep-learning network training parameters and the model recognition rate. Gesture recognition based on a CNN neural network solves key problems of the traditional gesture recognition algorithms such as poor recognition rate, low robustness and poor universality. A schematic diagram of the CNN neural network recognition model framework is given in Fig. 2.
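For illustration, a minimal PyTorch sketch of a CNN gesture classifier producing the probability set Cin is shown below; the layer sizes, the 64x64 single-channel input and the four-intention output are assumptions, not the architecture of Fig. 2.

```python
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Toy CNN whose softmax output plays the role of the probability set Cin."""
    def __init__(self, n_intents: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_intents)

    def forward(self, x):
        # x: (batch, 1, 64, 64) segmented gesture images
        x = self.features(x).flatten(1)
        return torch.softmax(self.classifier(x), dim=1)
```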
A Hidden Markov Model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. Current HMM models can achieve an average arm-motion recognition rate of 96%; the user's motion feature trajectories are input into an HMM-based classifier for training and recognition. After similar HMM models are built, model training is carried out with the elderly behavior data set to obtain HMM models capable of recognizing the behavior intentions of the elderly. A schematic diagram of the HMM posture recognition model framework is given in Fig. 3.
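The following sketch illustrates per-intention HMM scoring using the hmmlearn package as an assumed stand-in for the patent's HMM implementation: one Gaussian HMM is trained per intention on posture feature trajectories, and the normalised log-likelihoods of a new trajectory form the probability set Hin.

```python
import numpy as np
from hmmlearn import hmm

def train_posture_hmms(trajectories_by_intent, n_states=5):
    """Train one GaussianHMM per intention from lists of (T, dim) feature tracks."""
    models = {}
    for intent, trajectories in trajectories_by_intent.items():
        X = np.concatenate(trajectories)
        lengths = [len(t) for t in trajectories]
        model = hmm.GaussianHMM(n_components=n_states, n_iter=50)
        model.fit(X, lengths)
        models[intent] = model
    return models

def posture_probability_set(models, trajectory):
    """Normalise per-model log-likelihoods into the probability set Hin."""
    log_likelihoods = np.array([m.score(trajectory) for m in models.values()])
    p = np.exp(log_likelihoods - log_likelihoods.max())
    return p / p.sum()
```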
The neural network model is trained with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; the hidden Markov model is trained with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
In step S104, intention fusion is performed on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and the weight proportions of the two probability sets under different intentions are calculated by adopting F1 scores under different intention classifications; the final recognition intention is then determined.
An intention fusion model F = f(I, Cin, Hin) is built, wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set.
Weight values are assigned to Cin to form an n×1-dimensional weight matrix Cconfi, and weight values are assigned to Hin to form an n×1-dimensional weight matrix Hconfi.
A fuzzy transformation is applied to Cconfi and Hconfi to obtain the latest intention probability matrix C, where C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called the composite evaluation operator.
In the invention, the weight matrices of the two sub-models under different intention classifications are calculated, and the recognition correctness of the two models under different intentions is evaluated by means of model evaluation with a multi-classification confusion matrix. The confusion matrix of a multi-classification task is a situation analysis table that summarizes the prediction results of a classification model in machine learning: the records in a data set are summarized in matrix form according to two criteria, the true class and the class predicted by the classification model. When facing a large amount of data, the raw counts in the confusion matrix alone make it difficult to measure the quality of a model, so four secondary indicators are derived from the basic statistics, namely accuracy, precision, recall and specificity. These convert the counts in the confusion matrix into ratios between 0 and 1, which facilitates standardized measurement. Expanding on these four indicators yields a tertiary indicator, the F1 Score, which is used in statistics to measure the accuracy of a binary classification model and takes both the precision and the recall of the model into account; the F1 score is the harmonic mean of precision and recall, calculated as
F1 = 2 × precision × recall / (precision + recall)
Machine learning approaches to multi-class problems often use the F1-score as the final measure, which is consistent with its use here as the weight of each intention class in the model fusion.
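A short numpy sketch of deriving the per-intention F1 weights from a multi-classification confusion matrix is given below; it assumes rows are true classes and columns are predicted classes, as described later for Fig. 6, and is illustrative only.

```python
import numpy as np

def f1_weights(confusion):
    """Per-class F1 = 2PR/(P+R) from a confusion matrix (rows true, columns predicted)."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    precision = tp / np.maximum(confusion.sum(axis=0), 1e-12)
    recall = tp / np.maximum(confusion.sum(axis=1), 1e-12)
    return 2 * precision * recall / np.maximum(precision + recall, 1e-12)
```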
The F1 scores under the different intentions are used as assignments: T_i^C is used as the weight value under the i-th intention classification of Cin, and T_i^H is used as the weight value under the i-th intention classification of Hin, wherein T_i^C and T_i^H denote the F1 scores of the two sub-models under the i-th intention classification (i = 1, 2, …, n).
Based on T_i^C, the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [T_1^C, T_2^C, …, T_n^C]^T; based on T_i^H, the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [T_1^H, T_2^H, …, T_n^H]^T.
Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ_1, λ_2, …, λ_n]^T; the maximum value λ_i in the matrix is selected, and the i-th intention is the user's final recognized intention. The fusion process is: [λ_1, λ_2, …, λ_n]^T = Cin × Cconfi + Hin × Hconfi. Fig. 4 is a schematic diagram of the dual-model decision-level fusion algorithm according to embodiment 1 of the present invention.
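The decision-level fusion itself can be sketched as below, assuming the composite evaluation operator reduces to element-wise weighting of each probability set by its F1 weight vector; this is an illustrative reading of the formula, not the patent's reference implementation.

```python
import numpy as np

def fuse_intention(Cin, Hin, Cconfi, Hconfi):
    """Fuse gesture (Cin) and posture (Hin) probability sets with F1 weight vectors."""
    lam = np.asarray(Cin) * np.asarray(Cconfi) + np.asarray(Hin) * np.asarray(Hconfi)
    # Index of the maximum fused value is the final recognised intention.
    return int(np.argmax(lam)), lam
```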
In the present invention, if the robot does not complete the specified action according to the final intention: target recognition is started, and the image acquisition device is used to judge the distance to the specific obstacle; after the designated target area is reached, the target object is captured through voice interaction, with initial coordinates (x, y); the robot moves the target object into the video frame to obtain coordinates (x1, y1), so the transformation of the target object is (x → x1, y → y1); after the target object is located, the grasping operation is performed; after grasping is completed, the elderly person's intention is captured in real time and the robot is moved. After the target object is handed to the elderly person, the interaction ends.
Fig. 5 is a diagram of the fusion model architecture for deep learning and statistical probability provided in embodiment 1 of the present invention. The method first obtains the raw voice channel information and sends it to a preprocessing layer for voice system wake-up and image information preprocessing. The preprocessed information is sent to a recognition layer, which comprises a CNN and a Hidden Markov Model (HMM) trained on the elderly behavior data set (EIDS). The two sub-models produce two intention probability sets in real time, which are provided to a model fusion layer; the model fusion algorithm captures the user's real intention and passes it to an interactive behavior layer, where human-computer interaction is performed so that the robot completes the user's operation and meets the user's needs. The system feeds back the user's intention and performs the interactive action through the Pepper robot.
In the embodiment 1 of the invention, the intentions of the elderly are divided into four intentions of controlling the robot, namely advancing (I), stopping (II), turning left (III) and turning right (IV).
Each column of the confusion matrix represents a predicted class, and the total of each column is the number of samples predicted as that class; each row represents the true class of the samples. In the experiments, 4 intention classifications were used, with 200 sample data in total, divided into 4 classes of 50 samples each, and a multi-classification confusion matrix was established for each of the two sub-models. Fig. 6 shows the multi-classification-intention confusion matrices corresponding to the CNN and the HMM, respectively, in embodiment 1 of the present invention. The four indicators of the two confusion matrices (accuracy, precision, recall/sensitivity and specificity) are then calculated, and the formula F1 = 2 × precision × recall / (precision + recall) is further used to calculate the F1-score under each confusion matrix as the weight value under each intention of the corresponding sub-model.
[Table: F1-score for each intention of the two sub-models]
Example 2
Fig. 7 is a schematic diagram of the intention understanding system for an elderly accompanying robot in embodiment 2 of the present invention. The system comprises an acquisition module, a training module and a building calculation module.
The acquisition module is used for acquiring a behavior image of the elderly in real time; the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to respectively form a gesture data set and a posture data set.
The training module inputs the gesture data set into the trained neural network model to perform gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into the trained hidden Markov model to perform posture recognition to obtain a posture recognition probability set.
The building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and calculating the weight proportions under different intentions when the two probability sets are fused by adopting F1 scores under different intention classifications; the final recognition intention is then determined.
The system also includes a start module; the starting module is used for acquiring the voice channel information and extracting keywords of the voice channel information to start the robot.
The execution process of the training module is as follows: acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set; training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
The building calculation module comprises a building module and a calculation module. The process of the building module is as follows: an intention fusion model F = f(I, Cin, Hin) is built, wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set; weight values are assigned to Cin to form an n×1-dimensional weight matrix Cconfi, and weight values are assigned to Hin to form an n×1-dimensional weight matrix Hconfi; a fuzzy transformation is applied to Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called the composite evaluation operator. The process of the calculation module is as follows: the F1 scores under the different intentions are used as assignments: T_i^C as the weight value under each intention classification of Cin, and T_i^H as the weight value under each intention classification of Hin, wherein T_i^C and T_i^H denote the F1 scores of the two sub-models under the i-th intention classification (i = 1, 2, …, n); based on T_i^C, the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [T_1^C, T_2^C, …, T_n^C]^T; based on T_i^H, the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [T_1^H, T_2^H, …, T_n^H]^T; Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ_1, λ_2, …, λ_n]^T; the maximum value λ_i is selected, and the i-th intention is the final recognition intention of the user; wherein [λ_1, λ_2, …, λ_n]^T = Cin × Cconfi + Hin × Hconfi.
The invention improves the current intention understanding rate of the old accompanying robot system and improves the use satisfaction of the old to the social accompanying robot.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto. Various modifications and alterations will occur to those skilled in the art based on the foregoing description; the described embodiments are neither required nor exhaustive of all possible embodiments. On the basis of the technical scheme of the invention, various modifications or changes that can be made by a person skilled in the art without creative effort are still within the protection scope of the invention.

Claims (10)

1. An intention understanding method for an elderly accompanying robot is characterized by comprising the following steps:
acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and performing image segmentation on the gesture image and the posture information to respectively form a gesture data set and a posture data set;
inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and calculating the weight proportions under different intentions when the gesture recognition probability set and the posture recognition probability set are fused by adopting F1 scores under different intention classifications; and then determining the final recognition intention.
2. The method for understanding the intention of the elderly accompanying robot as claimed in claim 1, further comprising obtaining voice channel information and extracting keywords of the voice channel information to start the robot before the real-time obtaining of the behavior image of the elderly.
3. The method for understanding the intention of the elderly accompanying robot as claimed in claim 1, wherein the method for training the neural network model and the hidden Markov model comprises:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
4. The method for understanding the intention of the elderly accompanying and attending robot as claimed in claim 1, wherein the fusion algorithm based on the confusion matrix performs intention fusion on the gesture recognition probability set and the posture recognition probability set by:
building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set;
assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi;
applying a fuzzy transformation to Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called the composite evaluation operator.
5. The method for understanding the intention of the elderly accompanying and attending robot as claimed in claim 4, wherein the method for calculating the weight proportions under different intentions when the gesture recognition probability set and the posture recognition probability set are fused, by adopting F1 scores under different intention classifications, is:
the F1 scores under the different intentions are used as assignments: T_i^C is used as the weight value under the i-th intention classification of Cin, and T_i^H is used as the weight value under the i-th intention classification of Hin, wherein T_i^C and T_i^H denote the F1 scores of the two sub-models under the i-th intention classification (i = 1, 2, …, n);
based on T_i^C, the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [T_1^C, T_2^C, …, T_n^C]^T; based on T_i^H, the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [T_1^H, T_2^H, …, T_n^H]^T;
Cin, Hin, Cconfi and Hconfi are subjected to a fuzzy transformation to obtain a one-dimensional matrix [λ_1, λ_2, …, λ_n]^T; the maximum value λ_i in the matrix is selected, and the i-th intention is the final recognition intention of the user; wherein [λ_1, λ_2, …, λ_n]^T = Cin × Cconfi + Hin × Hconfi.
6. The intent understanding method for an elderly accompanying robot as claimed in claim 1, wherein when the robot does not complete the designated action according to the final intent:
starting target identification, and judging the distance of a specific obstacle by adopting image acquisition equipment;
after the designated target area is reached, a target object is captured through voice interaction, and the initial coordinate of the target object is (x, y); the robot moves the target object into the video frame to obtain coordinates (x1, y1); the transformation of the target object is (x → x1, y → y1);
after the target object is located, a grasping operation is performed; after grasping is completed, the intention of the elderly person is captured in real time, and the robot is moved.
7. An intention understanding system for an elderly accompanying robot is characterized by comprising an acquisition module, a training module and a building calculation module;
the acquisition module is used for acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to respectively form a gesture data set and a posture data set;
the training module inputs the gesture data set into a trained neural network model to perform gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into a trained hidden Markov model to perform posture recognition to obtain a posture recognition probability set;
the building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and calculating the weight proportions under different intentions when the gesture recognition probability set and the posture recognition probability set are fused by adopting F1 scores under different intention classifications; and then determining the final recognition intention.
8. The elderly accompanying robot oriented intention understanding system of claim 7, further comprising a starting module;
the starting module is used for acquiring voice channel information and extracting keywords of the voice channel information to start the robot.
9. The elderly accompanying and attending robot oriented intention understanding system as claimed in claim 7, wherein the training module is implemented by the following steps:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
10. The elderly accompanying robot oriented intention understanding system of claim 7, wherein the building calculation module comprises a building module and a calculation module;
the process of building the module comprises the following steps: building an intention fusion model F ═ F (I, Cin, Hin); wherein f is the model of the intended fusion; i is an intention weight matrix; cin is a gesture recognition probability set; hin is a gesture recognition probability set; distributing weight values to the Cin to form an n x1 dimensional weight matrix Cconfi; distributing weight values to Hin to form n x1 dimensional weight matrix Hconfi respectively; carrying out vague change on Cconfi and Hconfi to obtain a latest intention probability matrix C; wherein C ═ Cconfi omicron Cin + Hconfi omichin; where o is called a composite evaluation operator
The process of the calculation module is as follows: assigning values to the system using F1 scores under different intentions
Figure FDA0002682708180000031
As weight values under each intent classification of Cin; is assigned to
Figure FDA0002682708180000032
As the weight value under each intention classification of Hin; wherein
Figure FDA0002682708180000033
Based on
Figure FDA0002682708180000034
Obtaining n x1 dimensional weight matrix of Cin
Figure FDA0002682708180000035
Based on
Figure FDA0002682708180000036
N x1 dimensional weight matrix of Hin can be obtained
Figure FDA0002682708180000041
Cin, Hin, Cconfi and Hconfi are subjected to fuzzy change to obtain a one-dimensional matrix [ lambda ]12,…,λn]T(ii) a Selecting the maximum value gamma in the matrixi(ii) a The i intention is the final recognition intention of the user; wherein [ lambda ]12,…,λn]T=Cin×Cconfi+Hin×Hconfi。
CN202010970662.7A 2020-09-15 2020-09-15 Intention understanding method and system for elderly accompanying robot Active CN112101219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010970662.7A CN112101219B (en) 2020-09-15 2020-09-15 Intention understanding method and system for elderly accompanying robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010970662.7A CN112101219B (en) 2020-09-15 2020-09-15 Intention understanding method and system for elderly accompanying robot

Publications (2)

Publication Number Publication Date
CN112101219A true CN112101219A (en) 2020-12-18
CN112101219B CN112101219B (en) 2022-11-04

Family

ID=73759249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010970662.7A Active CN112101219B (en) 2020-09-15 2020-09-15 Intention understanding method and system for elderly accompanying robot

Country Status (1)

Country Link
CN (1) CN112101219B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112684711A (en) * 2020-12-24 2021-04-20 青岛理工大学 Interactive identification method for human behavior and intention
CN112766041A (en) * 2020-12-25 2021-05-07 北京理工大学 Method for identifying hand washing action of senile dementia patient based on inertial sensing signal
CN113705440A (en) * 2021-08-27 2021-11-26 华中师范大学 Head posture estimation method and system for visual understanding of educational robot
CN113780750A (en) * 2021-08-18 2021-12-10 同济大学 Medical risk assessment method and device based on medical image segmentation
CN113848790A (en) * 2021-09-28 2021-12-28 德州学院 Intelligent nursing type robot system and control method thereof
CN114092967A (en) * 2021-11-19 2022-02-25 济南大学 Real-time multi-mode accompanying robot intention understanding method and system
CN116028880A (en) * 2023-02-07 2023-04-28 支付宝(杭州)信息技术有限公司 Method for training behavior intention recognition model, behavior intention recognition method and device


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593680A (en) * 2013-11-19 2014-02-19 南京大学 Dynamic hand gesture recognition method based on self incremental learning of hidden Markov model
CN105787471A (en) * 2016-03-25 2016-07-20 南京邮电大学 Gesture identification method applied to control of mobile service robot for elder and disabled
CN108986801A (en) * 2017-06-02 2018-12-11 腾讯科技(深圳)有限公司 A kind of man-machine interaction method, device and human-computer interaction terminal
US20190272764A1 (en) * 2018-03-03 2019-09-05 Act, Inc. Multidimensional assessment scoring using machine learning
WO2019204186A1 (en) * 2018-04-18 2019-10-24 Sony Interactive Entertainment Inc. Integrated understanding of user characteristics by multimodal processing
CN110554774A (en) * 2019-07-22 2019-12-10 济南大学 AR-oriented navigation type interactive normal form system
CN110717381A (en) * 2019-08-28 2020-01-21 北京航空航天大学 Human intention understanding method facing human-computer cooperation and based on deep stacking Bi-LSTM
CN111222341A (en) * 2020-01-16 2020-06-02 中国平安人寿保险股份有限公司 Method, device, equipment and storage medium for training hidden Markov model
CN111582108A (en) * 2020-04-28 2020-08-25 河北工业大学 Gait recognition and intention perception method
CN111596767A (en) * 2020-05-27 2020-08-28 广州市大湾区虚拟现实研究院 Gesture capturing method and device based on virtual reality

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHENGKUN CUI ET AL.: "A Multimodal Framework Based on Integration of Cortical and Muscular Activities for Decoding Human Intentions About Lower Limb Motions", 《IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS》 *
JUN LEI ET AL.: "Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model model", 《IET COMPUTER VISION》 *
JUNHONG MENG ET AL.: "A Method of Fusing Gesture and Speech for Human-robot Interaction", 《ICCDE 2020》 *
MING-HAO YANG ET AL.: "Data fusion methods in multimodal human computer dialog", 《虚拟现实与智能硬件》 *
ZHANG Rui et al.: "Research on an intention understanding method for contact human-robot collaboration based on a GA-BP neural network", 《组合机床与自动化加工技术》 (Modular Machine Tool & Automatic Manufacturing Technique) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112684711A (en) * 2020-12-24 2021-04-20 青岛理工大学 Interactive identification method for human behavior and intention
CN112684711B (en) * 2020-12-24 2022-10-11 青岛理工大学 Interactive recognition method for human behavior and intention
CN112766041A (en) * 2020-12-25 2021-05-07 北京理工大学 Method for identifying hand washing action of senile dementia patient based on inertial sensing signal
CN113780750A (en) * 2021-08-18 2021-12-10 同济大学 Medical risk assessment method and device based on medical image segmentation
CN113780750B (en) * 2021-08-18 2024-03-01 同济大学 Medical risk assessment method and device based on medical image segmentation
CN113705440A (en) * 2021-08-27 2021-11-26 华中师范大学 Head posture estimation method and system for visual understanding of educational robot
CN113705440B (en) * 2021-08-27 2023-09-01 华中师范大学 Head posture estimation method and system for visual understanding of educational robot
CN113848790A (en) * 2021-09-28 2021-12-28 德州学院 Intelligent nursing type robot system and control method thereof
CN114092967A (en) * 2021-11-19 2022-02-25 济南大学 Real-time multi-mode accompanying robot intention understanding method and system
CN116028880A (en) * 2023-02-07 2023-04-28 支付宝(杭州)信息技术有限公司 Method for training behavior intention recognition model, behavior intention recognition method and device

Also Published As

Publication number Publication date
CN112101219B (en) 2022-11-04

Similar Documents

Publication Publication Date Title
CN112101219B (en) Intention understanding method and system for elderly accompanying robot
CN107563494B (en) First-view-angle fingertip detection method based on convolutional neural network and heat map
CN110647612A (en) Visual conversation generation method based on double-visual attention network
CN105739688A (en) Man-machine interaction method and device based on emotion system, and man-machine interaction system
CN109271876B (en) Video motion detection method based on time evolution modeling and multi-example learning
CN109993102A (en) Similar face retrieval method, apparatus and storage medium
CN110781829A (en) Light-weight deep learning intelligent business hall face recognition method
CN111402928B (en) Attention-based speech emotion state evaluation method, device, medium and equipment
CN111339847A (en) Face emotion recognition method based on graph convolution neural network
CN107016046A Intelligent robot dialogue method and system based on visual display
CN111666845B (en) Small sample deep learning multi-mode sign language recognition method based on key frame sampling
CN109815920A (en) Gesture identification method based on convolutional neural networks and confrontation convolutional neural networks
CN112101243A (en) Human body action recognition method based on key posture and DTW
CN111444488A (en) Identity authentication method based on dynamic gesture
CN108537109B (en) OpenPose-based monocular camera sign language identification method
CN106127112A (en) Data Dimensionality Reduction based on DLLE model and feature understanding method
CN111128240B (en) Voice emotion recognition method based on anti-semantic-erasure
Zhang et al. Improvement of dynamic hand gesture recognition based on HMM algorithm
CN110796090A Human behavior intention judging method for human-machine cooperation based on recurrent neural network
Chaaban et al. Automatic annotation and segmentation of sign language videos: Base-level features and lexical signs classification
CN114495211A (en) Micro-expression identification method, system and computer medium based on graph convolution network
CN112580527A (en) Facial expression recognition method based on convolution long-term and short-term memory network
CN112308827A (en) Hair follicle detection method based on deep convolutional neural network
CN111695507A (en) Static gesture recognition method based on improved VGGNet network and PCA
Gao et al. Chinese fingerspelling sign language recognition using a nine-layer convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant