CN108921037A - Emotion recognition method based on a BN-Inception two-stream network - Google Patents

Emotion recognition method based on a BN-Inception two-stream network

Info

Publication number
CN108921037A
Authority
CN
China
Prior art keywords
network
inception
binary
flow
spp
Prior art date
Legal status
Granted
Application number
CN201810579049.5A
Other languages
Chinese (zh)
Other versions
CN108921037B (en)
Inventor
卿粼波
王露
滕奇志
何小海
熊文诗
吴晓红
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201810579049.5A
Publication of CN108921037A
Application granted
Publication of CN108921037B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an individual emotion recognition method based on posture information, which mainly uses deep learning to infer an individual's emotion from the individual's posture. The method first introduces a two-stream network model based on BN-Inception that extracts the static and dynamic features of the input sequence by learning from the raw images and from optical flow images. Spatial Pyramid Pooling (SPP) is then added on top of the two-stream network so that images can be fed into the network at their original size, reducing the impact of deformation on model performance. The invention first uses the two-stream network to learn spatio-temporal features of the input sequence and then introduces pyramid pooling to preserve the original information of the video frames, enabling the network to learn the features of individual posture and emotion effectively and to achieve a higher recognition rate.

Description

Emotion recognition method based on a BN-Inception two-stream network
Technical field
The present invention relates to the problem of emotion recognition in the field of deep learning, and more particularly to an individual emotion analysis method based on a BN-Inception+SPP two-stream network.
Background art
Emotion is a state that combines a person's feelings, thoughts, and behavior, and it plays an important role in communication between people. A person's emotional state can generally be judged from facial expression, but in certain environments, for example in surveillance views or when the face is occluded, a clear facial expression is not always available. In fact, true emotion is not expressed through facial expression alone: body movements can also convey emotional information. The research of the present invention therefore focuses on video-based emotion recognition from individual posture.
Emotion recognition is an important research topic and direction in the field of computer vision; many authoritative international journals and top conferences now feature related themes and content, and many elite universities abroad have opened related courses. Traditional video-based emotion recognition methods rely mainly on manually selected features; such methods are time-consuming and labor-intensive, the resulting model parameters generalize poorly, and this limits emotion recognition. Deep learning is an important component of the development of artificial intelligence and has become a very popular research direction in the field in recent years. It has achieved major breakthroughs in many areas (such as image recognition and speech recognition), and in video analysis in particular it has attained higher recognition rates and better generalization. This patent therefore exploits the advantages of deep learning in video analysis to study the emotion recognition of individuals in video.
Emotion recognition based on posture information has emerged only in recent years; related research is scarce and has focused mainly on traditional algorithms. Li et al. [1] proposed behavior recognition and classification using raw skeleton coordinates and skeleton motion. Piana et al. [2] proposed an automatic emotion recognition model and system based on full-body movement, used to help children with autism learn to recognize and express emotions through full-body movement. Others have likewise combined motion features of human posture with higher-level kinematic and geometric features for clustering and classification. Crenn et al. [3] used 3D skeleton sequences of people to obtain low-level features such as motion data, decomposed them into three kinds of features (geometric, motion, and Fourier), computed meta-features of these low-level features (such as mean and standard deviation), and finally classified the meta-features with a classifier. Deep learning improves considerably over traditional methods in both recognition time and accuracy, but because emotion datasets related to posture are scarce, studies that apply deep learning to posture-based individual emotion recognition are still rare.
Summary of the invention
The object of the present invention is to provide an individual emotion recognition method based on posture. The method combines deep learning with human posture in video, makes full use of the advantages of the BN-Inception+SPP network structure, and introduces a two-stream network structure to perform video-based individual emotion recognition, so that the emotional features of individual posture are learned effectively and a higher recognition rate is obtained.
For convenience of explanation, the following concepts are introduced first:
Optical flow method: a simple and practical way of representing image motion, usually defined as the apparent motion of image brightness patterns in an image sequence, i.e., the expression on the imaging plane of a visual sensor of the motion velocity of points on the surface of a spatial object.
Convolutional neural network: a multilayer feedforward neural network in which every layer consists of multiple two-dimensional planes and the neurons of each plane work independently; a convolutional neural network comprises convolutional layers and pooling layers.
Two-stream convolutional neural network: designed for extracting behavioral features from video, the network takes single-frame RGB images and optical flow images obtained from the video data as two separate inputs, so as to extract both the appearance information of the action in space and the temporal dynamics of the action process.
Spatial Pyramid Pooling (SPP): composed of multiple down-sampling layers, SPP partitions the input feature map from coarse to fine and converts a feature map of any size into a feature vector of fixed length, so an SPP layer can extract a variety of local information; a minimal sketch of such a layer is given after this list.
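For illustration, the following is a minimal sketch of an SPP layer, written here in PyTorch for readability (an assumption: the patent itself trains with Caffe). The pyramid levels (1x1, 2x2, 4x4) and the 1024-channel input, matching the channel count of BN-Inception's last concatenation, are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidPooling(nn.Module):
    """Pools the input feature map on several grids and concatenates the
    results into one fixed-length vector, so the preceding convolutional
    layers can accept images of arbitrary size."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels  # pooling grids: 1x1, 2x2, 4x4

    def forward(self, x):  # x: (batch, channels, H, W) with arbitrary H, W
        batch = x.shape[0]
        pooled = []
        for n in self.levels:
            # Adaptive pooling yields an n x n grid regardless of H and W.
            p = F.adaptive_max_pool2d(x, output_size=(n, n))
            pooled.append(p.view(batch, -1))  # flatten to (batch, channels*n*n)
        return torch.cat(pooled, dim=1)       # length: channels * sum(n*n)

# Feature maps of different spatial sizes yield the same output length.
spp = SpatialPyramidPooling()
for h, w in [(7, 7), (10, 13)]:
    out = spp(torch.randn(2, 1024, h, w))
    print(out.shape)  # torch.Size([2, 21504]) = 1024 * (1 + 4 + 16) in both cases
```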
The present invention specifically adopts the following technical scheme:
An individual emotion recognition method based on a BN-Inception+SPP two-stream network is proposed. The main features of the method are:
1. The individual posture dataset is divided into four emotion categories: bored, excited, frantic, and relaxed;
2. Spatial Pyramid Pooling (SPP) is added before the fully connected layer of the BN-Inception two-stream network, and the spatial and temporal networks are trained separately on the dataset.
The method mainly includes the following steps:
(1) Divide the individual posture sequence dataset into four emotion categories: bored, excited, frantic, and relaxed;
(2) Generate the optical flow image sequence corresponding to the dataset using the optical flow algorithm of reference [4], representing the motion features of the individual posture;
(3) Divide the raw dataset and the optical flow dataset proportionally into training, validation, and test sets;
(4) Introduce the two-stream convolutional neural network model based on BN-Inception, add an SPP layer before its fully connected layer to optimize the BN-Inception network, train the spatial and temporal networks using the training and validation sets, and verify using the test set;
(5) Average-fuse the two channels, the BN-Inception+SPP spatial stream and the temporal stream, and obtain the accuracy ACC (Accuracy) and the macro average precision MAP (Macro Average Precision) on the test set; a sketch of this fusion and of the two metrics is given after these steps.
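To make step (5) concrete, here is a minimal sketch of the average fusion and of the two reported metrics. The score arrays, class names, and sample count are illustrative stand-ins, and scikit-learn's macro-averaged precision is used for MAP as defined above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

classes = ["bored", "excited", "frantic", "relaxed"]

# Per-video class probabilities from each trained stream (random stand-ins here).
rng = np.random.default_rng(0)
spatial_scores = rng.random((100, 4))   # spatial stream: raw RGB frames
temporal_scores = rng.random((100, 4))  # temporal stream: optical flow stacks
y_true = rng.integers(0, 4, size=100)   # ground-truth emotion labels

# Average fusion of the two channels, then argmax for the final label.
fused = (spatial_scores + temporal_scores) / 2.0
y_pred = fused.argmax(axis=1)

acc = accuracy_score(y_true, y_pred)                     # ACC
map_ = precision_score(y_true, y_pred, average="macro")  # macro average precision
print(f"ACC = {acc:.3f}, MAP = {map_:.3f}")
```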
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall framework of the individual emotion recognition method of the present invention based on the BN-Inception+SPP two-stream network.
Fig. 2-a and Fig. 2-b are the accuracy confusion matrices obtained by the present invention on the test set without the SPP layer, where 2-a is the test matrix of the spatial-stream BN-Inception network and 2-b is the test matrix of the temporal-stream BN-Inception network.
Fig. 3-a and Fig. 3-b are the accuracy confusion matrices obtained on the test set with the SPP layer added, where 3-a is the test matrix of the spatial-stream BN-Inception+SPP network and 3-b is the test matrix of the temporal-stream BN-Inception+SPP network.
Fig. 4 shows the ACC and MAP obtained on the test set after average fusion of the BN-Inception+SPP spatial-stream and temporal-stream networks.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It must be noted that the following embodiments serve only to further illustrate the present invention and should not be understood as limiting its scope of protection; non-essential modifications and adaptations made to the present invention by persons skilled in the art in light of the foregoing disclosure still fall within its scope of protection.
As shown in Fig. 1, an individual emotion recognition method based on a BN-Inception+SPP two-stream network includes the following steps:
(1) After obtaining the individual dataset captured in a public space, generate the optical flow image sequence of the raw dataset using the optical flow algorithm of reference [4] to represent the motion features of the individual posture (a sketch of this flow extraction is given after these steps);
(2) Divide the raw dataset and the resulting optical flow dataset proportionally into test, validation, and training sets, and assign the corresponding emotion category to each;
(3) Remove the SPP layer shown in Fig. 1, feed the training and validation data into the spatial and temporal networks separately for learning to obtain trained models, and test with the test set to verify their effect;
(4) Add the SPP layer, feed the training set into the spatial and temporal networks at original image size for learning to obtain trained models, and test with the test set to verify their effect;
(5) Average-fuse the BN-Inception+SPP spatial-stream and temporal-stream networks and obtain the ACC and MAP on the test set.
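As a concrete illustration of step (1), the sketch below writes per-frame optical flow images for one video. OpenCV's Farneback flow stands in here for the Brox et al. [4] warping-based method actually cited (an assumption made for brevity), and each flow component is clipped and rescaled to an 8-bit image, a common preprocessing for two-stream inputs; the function name, output naming, and clipping bound are illustrative.

```python
import cv2
import numpy as np

def flow_images(video_path, out_prefix, bound=20.0):
    """Compute frame-to-frame optical flow and save each component
    (horizontal and vertical) as an 8-bit grayscale image."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    assert ok, "could not read the first frame"
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Clip each flow component to [-bound, bound] and map to [0, 255].
        for c, name in ((0, "x"), (1, "y")):
            comp = np.clip(flow[..., c], -bound, bound)
            img = ((comp + bound) * (255.0 / (2 * bound))).astype(np.uint8)
            cv2.imwrite(f"{out_prefix}_{name}_{i:05d}.jpg", img)
        prev_gray = gray
        i += 1
    cap.release()
```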
The present invention trains the convolutional neural networks of the two channels, the spatial stream and the temporal stream, separately using Caffe, and the parameters of the temporal and spatial networks are set through experiment, as shown in Table 1. Because the constructed individual posture emotion dataset has a small number of samples, data augmentation and Dropout layers in the network are used to prevent overfitting; a sketch of these two measures follows Table 1.
Table 1. Training parameter settings
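For illustration, the two overfitting countermeasures named above are sketched below in PyTorch-style code (an assumption of form: the patent trains with Caffe, so this is not the authors' configuration); the crop size, flip augmentation, dropout probability, and layer sizes are illustrative.

```python
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random crops and horizontal flips enlarge the small
# posture-emotion training set with plausible variants of each frame.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Dropout before the classifier randomly silences units during training,
# which discourages co-adaptation on a small dataset.
classifier = nn.Sequential(
    nn.Dropout(p=0.7),
    nn.Linear(1024 * 21, 4),  # SPP output (1024 channels x 21 bins) -> 4 emotions
)
```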
References:
[1] Li C, Zhong Q, Xie D, et al. Skeleton-based Action Recognition with Convolutional Neural Networks[J]. 2017: 597-600.
[2] Piana S, Staglianò A, Odone F, et al. Adaptive Body Gesture Representation for Automatic Emotion Recognition[J]. ACM Transactions on Interactive Intelligent Systems (TiiS), 2016, 6(1): 6.
[3] Crenn A, Khan R A, Meyer A, et al. Body Expression Recognition from Animated 3D Skeleton[C]// International Conference on 3D Imaging. IEEE, 2017: 1-7.
[4] Brox T, Bruhn A, Papenberg N, et al. High Accuracy Optical Flow Estimation Based on a Theory for Warping[C]// European Conference on Computer Vision (ECCV), 2004: 25-36.

Claims (3)

1. An individual emotion recognition method based on a BN-Inception+SPP two-stream network, characterized in that:
A. The individual posture dataset is divided into four emotion categories: bored, excited, frantic, and relaxed;
B. Spatial Pyramid Pooling (SPP) is added before the fully connected layer of the BN-Inception two-stream network, and the spatial and temporal networks are trained separately on the dataset;
The method mainly includes the following steps:
(1) Generate the optical flow image sequence corresponding to the dataset using the optical flow algorithm of reference [1], representing the motion features of the individual posture;
(2) Divide the dataset into training, validation, and test sets, and assign an emotion category to each sequence;
(3) Introduce the two-stream convolutional neural network model based on BN-Inception, add an SPP layer before its fully connected layer to optimize the BN-Inception network, train the spatial and temporal networks using the training and validation sets, and verify using the test set;
(4) Average-fuse the two channels, the BN-Inception+SPP spatial stream and the temporal stream, and obtain the accuracy ACC (Accuracy) and the macro average precision MAP (Macro Average Precision) on the test set.
2. The individual emotion recognition method based on a BN-Inception+SPP two-stream network according to claim 1, characterized in that in step (3) the two-stream network is used to learn the spatio-temporal features of the dataset separately.
3. The emotion recognition method based on a BN-Inception two-stream network according to claim 1, characterized in that in step (3) an SPP layer is first added before the fully connected layer of the BN-Inception two-stream network, so that the training set is fed into the network at original size and the loss of motion information caused by a fixed input size is avoided, and the spatial and temporal networks are then trained separately on the dataset.
CN201810579049.5A 2018-06-07 2018-06-07 Emotion recognition method based on a BN-Inception two-stream network Active CN108921037B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810579049.5A CN108921037B (en) 2018-06-07 2018-06-07 Emotion recognition method based on a BN-Inception two-stream network

Publications (2)

Publication Number Publication Date
CN108921037A (en) 2018-11-30
CN108921037B (en) 2022-06-03

Family

ID=64418934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810579049.5A Active CN108921037B (en) 2018-06-07 2018-06-07 Emotion recognition method based on BN-acceptance double-flow network

Country Status (1)

Country Link
CN (1) CN108921037B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050265580A1 (en) * 2004-05-27 2005-12-01 Paul Antonucci System and method for a motion visualizer
CN102663429A (en) * 2012-04-11 2012-09-12 上海交通大学 Method for motion pattern classification and action recognition of moving target
CN103544963A (en) * 2013-11-07 2014-01-29 东南大学 Voice emotion recognition method based on core semi-supervised discrimination and analysis
CN104732203A (en) * 2015-03-05 2015-06-24 中国科学院软件研究所 Emotion recognizing and tracking method based on video information
CN106295568A (en) * 2016-08-11 2017-01-04 上海电力学院 The mankind's naturalness emotion identification method combined based on expression and behavior bimodal
CN106897671A (en) * 2017-01-19 2017-06-27 山东中磁视讯股份有限公司 A kind of micro- expression recognition method encoded based on light stream and FisherVector
CN107368798A (en) * 2017-07-07 2017-11-21 四川大学 A kind of crowd's Emotion identification method based on deep learning
CN107491731A (en) * 2017-07-17 2017-12-19 南京航空航天大学 A kind of Ground moving target detection and recognition methods towards precision strike
CN107784114A (en) * 2017-11-09 2018-03-09 广东欧珀移动通信有限公司 Recommendation method, apparatus, terminal and the storage medium of facial expression image
CN107944442A (en) * 2017-11-09 2018-04-20 北京智芯原动科技有限公司 Based on the object test equipment and method for improving convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAIMING HE ET AL: "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence *
CHEN SHENGDI ET AL: "Human action recognition method based on an improved deep convolutional neural network" (in Chinese), Application Research of Computers *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766856A (en) * 2019-01-16 2019-05-17 华南农业大学 A kind of method of double fluid RGB-D Faster R-CNN identification milking sow posture
CN109766856B (en) * 2019-01-16 2022-11-15 华南农业大学 Method for recognizing postures of lactating sows through double-current RGB-D Faster R-CNN
CN109814565A (en) * 2019-01-30 2019-05-28 上海海事大学 The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study
CN109886160A (en) * 2019-01-30 2019-06-14 浙江工商大学 It is a kind of it is non-limiting under the conditions of face identification method
CN110147729A (en) * 2019-04-16 2019-08-20 深圳壹账通智能科技有限公司 User emotion recognition methods, device, computer equipment and storage medium
CN110175596A (en) * 2019-06-04 2019-08-27 重庆邮电大学 The micro- Expression Recognition of collaborative virtual learning environment and exchange method based on double-current convolutional neural networks
CN112131908A (en) * 2019-06-24 2020-12-25 北京眼神智能科技有限公司 Action identification method and device based on double-flow network, storage medium and equipment
CN112131908B (en) * 2019-06-24 2024-06-11 北京眼神智能科技有限公司 Action recognition method, device, storage medium and equipment based on double-flow network
CN110414561A (en) * 2019-06-26 2019-11-05 武汉大学 A kind of construction method of the natural scene data set suitable for machine vision
WO2022037642A1 (en) * 2020-08-19 2022-02-24 南京图格医疗科技有限公司 Method for detecting and classifying lesion area in clinical image

Also Published As

Publication number Publication date
CN108921037B (en) 2022-06-03

Similar Documents

Publication Title
CN108921037A Emotion recognition method based on a BN-Inception two-stream network
CN107292813B Multi-pose face generation method based on generative adversarial networks
CN105005774B Face kinship recognition method and apparatus based on convolutional neural networks
CN107423678A Training method for feature-extracting convolutional neural networks and face recognition method
CN108830252A Convolutional neural network human action recognition method fusing global spatio-temporal features
CN109919031A Human behavior recognition method based on deep neural networks
CN111274921B Method for recognizing human behaviors using pose masks
CN109508669A Facial expression recognition method based on generative adversarial networks
CN109101865A Pedestrian re-identification method based on deep learning
CN106570474A Micro-expression recognition method based on 3D convolutional neural networks
CN108052884A Gesture recognition method based on an improved residual neural network
CN105160310A Human behavior recognition method based on 3D convolutional neural networks
CN107092894A Motion behavior recognition method based on LSTM models
CN109376720A Action classification method based on a joint spatio-temporal simple recurrent network and an attention mechanism
CN107392131A Action recognition method based on distances between skeleton joints
CN110232361B Human behavior intention recognition method and system based on a three-dimensional residual dense network
CN104063721B Human behavior recognition method based on automatic semantic feature learning and screening
CN108537181A Gait recognition method based on large-margin deep metric learning
CN109858407A Video behavior recognition method based on multi-information-stream features and asynchronous fusion
CN110119707A Human action recognition method
CN114821640A Skeleton action recognition method based on a multi-stream multi-scale dilated spatio-temporal graph convolutional network
CN113111857A Human pose estimation method based on multimodal information fusion
CN110163567A Classroom roll-call system based on multi-task cascaded convolutional neural networks
CN110046544A Digital gesture recognition method based on convolutional neural networks
CN110059593A Facial expression recognition method based on feedback convolutional neural networks

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant