CN112529054B - Multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data - Google Patents


Info

Publication number: CN112529054B (application CN202011355627.0A)
Authority: CN (China)
Other versions: CN112529054A (Chinese)
Inventors: 杨宗凯, 廖盛斌, 王小丰
Original and current assignee: Central China Normal University
Application filed by Central China Normal University; priority to CN202011355627.0A
Publication of application CN112529054A; application granted and published as CN112529054B
Legal status: Active (granted)
Prior art keywords: data, learner, layer, neural network, dimensional

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a multi-dimensional convolutional neural network learner modeling method for multi-source heterogeneous data. The method comprises: synchronously acquiring eye movement data, voice data and video data of a learner; preprocessing the eye movement data, voice data and video data; training a multi-dimensional convolutional neural network, and inputting the heat point diagram, spectrogram and human body posture image to be recognized into multi-dimensional convolutional neural networks of identical structure for feature extraction, obtaining three output classification results; and performing space-time multi-dimensional feature modeling analysis by combining the three output classification results. The invention models the learner with multi-source heterogeneous data and can fuse and analyze the learner's learning state from different data sources, a mode that better matches the nature of learning. The learner is modeled in an all-round, multi-dimensional way from characteristics such as emotion, cognition and interaction, so that the learner's real learning state can be represented.

Description

Multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data
Technical Field
The application relates to the technical field of education informatization, in particular to a multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data.
Background
With the development of education informatization, constructing a personalized learner model has become key to intelligent education, and big data and artificial intelligence technology are the basis for constructing such a model. Deep modeling research on learners can mine the latent information in the data and reveal the rules and mechanisms of learner emotion, cognition, knowledge construction patterns and the like, thereby further improving education services.
For learner data collection, researchers before the era of artificial intelligence tended to collect learners' basic personal information and behavior information. With the application and popularization of artificial intelligence technology and online learning platforms, people have started to collect all-around data about learners and to use it to restore, to the maximum extent, the learner's real learning state in the learning environment. Such all-around data includes video, audio, biological data, etc., where information from the brain, skin and heart is an important source of learners' biological data; the frequently used biological data include heart rate, electroencephalogram, electrodermal activity, etc.
Convolutional Neural Networks (CNNs) are a research hotspot in the field of artificial intelligence and the basis of computer vision. A convolutional neural network is a typical hierarchical structure that classifies and identifies data by automatically extracting features layer by layer. A 2D convolutional neural network is commonly used to classify static pictures, but a 2D network has certain limitations when extracting features from serialized data such as video and speech, because it cannot identify the temporal relationship between the items in a sequence. A 3D convolutional neural network can capture the correlation between sequence data, but because its convolution kernel has one more (time) dimension than a 2D kernel, the network has more parameters, calculation consumption increases, and calculation is slower.
Chinese patent application No. 201710049075.2 discloses an emotion classification method based on facial expressions, learning scores and voice data, used to evaluate how well students master a class. The method classifies multi-modal data with a convolutional neural network, where each datum is labeled dysphoric, pleased or calm, and finally fuses the classification results with a Gaussian mixture model to obtain a final result that predicts the student's emotion, from which the student's learning state is judged. However, the method has shortcomings: although multi-modal data is used to model students, the selected student characteristics are single; the degree of classroom mastery cannot be analyzed accurately from the perspective of student emotion alone, and deeper information in the data is not fully mined.
Chinese patent application No. 201910056952.8 discloses a method for modeling cognitive ability of a student and recommending personalized courses based on a cognitive diagnosis model, firstly, courses are modeled quantitatively, then, the cognitive ability of the student is modeled according to the learning condition of the student, and finally, personalized course recommendation is carried out on the student.
In summary, existing learner modeling methods mainly have the following disadvantages: 1. Modeling focuses on data of a single structure, which cannot accurately characterize the learner. 2. Attention is paid mostly to characteristics expressing the learner's intelligence, such as knowledge level and cognitive ability, while non-intelligence characteristics of the student, such as emotion and interaction, receive little attention; most existing research focuses on single characteristics, and comprehensive modeling analysis across multiple characteristics is scarce.
Disclosure of Invention
In order to solve the above problems, an embodiment of the present application provides a multi-dimensional convolutional neural network learner modeling method for multi-source heterogeneous data. The method integrates two-dimensional and three-dimensional convolution; compared with using three-dimensional convolution alone, it has fewer parameters, higher calculation speed and reduced calculation consumption. Meanwhile, the method takes multi-source heterogeneous data as input and can model the learner's characteristics, such as emotion, interaction and cognition, at multiple levels and from multiple angles, further improving the precision of learner modeling.
In a first aspect, an embodiment of the present application provides a multi-dimensional convolutional neural network learner modeling method for multi-source heterogeneous data, where the method includes:
(1) Synchronously acquiring eye movement data, voice data and video data of a learner;
specifically, the eye movement data of the learner is collected through an eye movement instrument, the voice data of the learner is collected through a microphone or a professional recording pen, the video data of the learner is collected through a camera, and the eye movement data, the voice data and the video data are collected synchronously.
(2) Preprocessing the eye movement data, the voice data and the video data to respectively obtain a heat point diagram corresponding to the eye movement data, a spectrogram corresponding to the voice data and a human body posture image corresponding to the video data;
specifically, the collected learner eye movement data, voice data and video data are preprocessed: a hotspot graph for each frame of the video is generated with professional software from the learner's eye movement data and video data; the video data is processed by extracting video frames and generating a human body posture image for each frame, yielding an image sequence; the voice data is processed by generating a spectrogram from the audio information. The resulting data are encoded in time order with the format frame000001, frame000002, …, so that the hotspot graphs, spectrograms and human body posture images correspond one to one.
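The time-ordered encoding and one-to-one alignment described above can be sketched as follows; the `frame_name` helper and the alignment table are hypothetical illustrations, since the patent does not specify an implementation:

```python
def frame_name(index: int, prefix: str = "frame", width: int = 6) -> str:
    """Zero-padded, time-ordered frame identifier, e.g. frame000001."""
    return f"{prefix}{index:0{width}d}"

# Hypothetical alignment table: one hotspot graph, spectrogram and posture
# image per time step, all sharing the same frame code.
aligned = [
    {"hotspot": frame_name(i), "spectrogram": frame_name(i), "pose": frame_name(i)}
    for i in range(1, 4)
]
```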
(3) Setting a label for the hotspot graph based on the cognitive state classification of the learner, setting a label for the spectrogram based on the interactive state classification of the learner, and setting a label for the human posture image based on the emotional state classification of the learner;
specifically, the hotspot graph corresponds to the cognitive state of the learner, and the labels are set to be difficult, non-participatory and easy; the spectrogram reflects the interaction level of a learner, and the labels of the spectrogram are set to be high-rising tone and low-depression tone according to the tone; the human body posture image represents the emotional state of the learner, and the label is set to be interested, confused, stressed, boring and relaxed.
(4) Training a multi-dimensional convolutional neural network, and respectively inputting the heat point diagram, the spectrogram and the human body posture image to be recognized into the multi-dimensional convolutional neural network with the same structure for feature extraction to respectively obtain output classification results;
(5) And performing space-time multi-dimensional feature modeling analysis by combining the three output classification results.
Preferably, the heat point diagram, the spectrogram and the human body posture image are one-to-one corresponding serialized data;
in the step (4), the feature extraction of the multi-dimensional convolutional neural network with the same input structure includes:
dividing the serialized data evenly into K segments {S_1, S_2, S_3, …, S_K} in time order;
randomly sampling N images from each segment of the serialized data with equal probability, and inputting the K*N images as input data into the multi-dimensional convolutional neural network, which processes them according to the following formulas:
m(T_1, T_2, T_3, …, T_K) = (f(T_1, W), f(T_2, W), f(T_3, W), …, f(T_K, W))
M(T_1, T_2, T_3, …, T_K) = H(F(m(T_1, T_2, T_3, …, T_K), W_1))
wherein m(T_1, T_2, T_3, …, T_K) represents the feature extraction of the data by the two-dimensional convolutional neural network; T_1, T_2, T_3, …, T_K are sequences of N pictures, each obtained by random sampling from the segments S_1, S_2, S_3, …, S_K respectively; f(T_K, W) is the two-dimensional convolution layer with parameter W; M(T_1, T_2, T_3, …, T_K) represents the final prediction result of the network; F(m(T_1, T_2, T_3, …, T_K), W_1) represents the feature extraction of the data by the three-dimensional convolutional neural network with parameter W_1; and H represents the SoftMax function.
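The segmentation and equal-probability sampling step above can be sketched as follows; this is a minimal illustration in which frames left over when the count is not divisible by K are simply dropped, a choice the patent does not specify:

```python
import random

def sparse_sample(frames, k, n, rng=None):
    """Split `frames` evenly into k time-ordered segments and draw n frames
    from each segment with equal probability, giving k*n frames in total."""
    rng = rng or random.Random(0)
    seg_len = len(frames) // k  # any remainder frames are dropped
    segments = [frames[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    return [rng.sample(seg, n) for seg in segments]

samples = sparse_sample(list(range(90)), k=3, n=3)
```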
Preferably, the training of the multidimensional convolutional neural network in the step (4) includes:
forming a multi-dimensional convolutional neural network by a 2D network and a 3D network, the multi-dimensional convolutional neural network comprising an input layer, two-dimensional convolutional layers, three-dimensional convolutional layers, a maximum pooling layer, an average pooling layer, a BatchNorm layer, and a SoftMax classification layer, the BatchNorm layer following each convolutional layer;
the input layer inputs a sample heat point diagram, a sample spectrogram or a sample human body posture image into the two-dimensional convolutional layer and the maximum pooling layer to obtain static characteristics of input data;
the static features are expanded according to time dimension and then input into the three-dimensional convolutional layer and the maximum pooling layer, wherein the last pooling layer is an average pooling layer, and dynamic information of the input data is obtained;
calculating the error between the classification result output by the SoftMax classification layer and the actual class, back-propagating the error to compute the gradient of each layer's parameters, adjusting the parameters of each layer according to the gradients, and repeating the error back-propagation until the parameters of each layer reach a minimum of the classification output error, at which point iteration stops.
Preferably, the error is calculated by:
loss(x_i) = −l(x_i) · log(p(x_i))
wherein l(x_i) represents the tag value of the ith input data, p(x_i) is the predicted value obtained after the ith input data passes through the convolutional network, and loss(x_i) represents the loss function.
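As a minimal sketch, assuming the standard cross-entropy form used with a SoftMax classifier (the exact form in the patent's formula image is not reproduced here), the per-sample error can be computed as:

```python
import math

def cross_entropy(label_onehot, probs, eps=1e-12):
    """Cross-entropy between a one-hot tag l(x_i) and predicted SoftMax
    probabilities p(x_i): loss = -sum_c l_c * log(p_c)."""
    return -sum(l * math.log(p + eps) for l, p in zip(label_onehot, probs))
```

The loss approaches 0 for a confident correct prediction and grows as the predicted probability of the true class shrinks.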
Preferably, the method for calculating the gradient of each layer's parameters by back-propagating the calculated error includes:
δ^l = ∂L/∂W^l = (∂L/∂z^l) · (∂z^l/∂W^l),  W^l ← W^l − η · δ^l
wherein L represents the error obtained after training on sample data, W^l is the convolution kernel parameter of layer l, z^l represents the convolved output, a^l is the result after activating the convolution, δ^l represents the gradient of the error with respect to the convolution kernel parameters, and η is the learning rate.
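The update rule, parameter minus learning rate times gradient, can be illustrated on a toy one-dimensional problem; this is a sketch of gradient descent in general, not of the patent's network:

```python
def sgd_step(weights, grads, lr):
    """One gradient-descent update per layer: W <- W - eta * delta."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Minimizing L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3):
w = 0.0
for _ in range(100):
    (w,) = sgd_step([w], [2 * (w - 3)], lr=0.1)
```

After enough steps, w converges to the minimizer 3, just as the layer parameters converge toward a minimum of the classification error.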
Preferably, the unfolding manner of unfolding the static features according to the time dimension is as follows:
[B*S,C,H,W]→[B,C,S,H,W]
where B denotes the batch size, S denotes the number of input images, and C, H, and W denote the number of channels and the height and width of the feature map, respectively.
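In NumPy, for example, this unfolding is a reshape that first recovers the batch and sequence axes, followed by an axis transpose; a flat reshape straight to [B, C, S, H, W] would scramble the data. The toy dimensions below are assumptions for illustration:

```python
import numpy as np

B, S, C, H, W = 2, 3, 96, 28, 28
x = np.arange(B * S * C * H * W, dtype=np.float32).reshape(B * S, C, H, W)

# [B*S, C, H, W] -> [B, S, C, H, W] -> [B, C, S, H, W]
y = x.reshape(B, S, C, H, W).transpose(0, 2, 1, 3, 4)
```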
Preferably, the performing spatiotemporal multidimensional feature modeling analysis by combining the three output classification results in the step (5) includes:
classifying the learner into three dimensions according to characteristics according to the output classification result, wherein the dimensions comprise a cognitive dimension, an interaction dimension and an emotion dimension;
sending suggestion information to the learner based on the classification result of the cognitive dimension;
sending interaction reminding information to the learner based on the classification result of the interaction dimension;
and sending learning difficulty adjustment information to the learner based on the classification result of the emotion dimension.
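The three feedback rules above can be sketched as a dispatch table; the action strings and the exact label-to-action pairs below are hypothetical illustrations, not fixed by the patent:

```python
# Hypothetical mapping from (dimension, classification result) to feedback.
FEEDBACK = {
    ("cognitive", "difficult"): "suggest reviewing prerequisite material",
    ("cognitive", "easy"): "suggest advancing to harder content",
    ("interaction", "low-depression tone"): "remind the learner to interact actively",
    ("emotion", "stressed"): "lower the learning difficulty",
    ("emotion", "bored"): "raise the learning difficulty",
}

def feedback_for(dimension: str, label: str) -> str:
    """Return the feedback action for a classification result, if any."""
    return FEEDBACK.get((dimension, label), "no action")
```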
Specifically, the inputs of the three networks are the hotspot graph, the spectrogram and the human body posture image respectively. The hotspot graph reflects the learner's cognitive process, and its labels are set to difficult, non-participatory and easy. Tone in voice data is a concrete reflection of inner mood and emotion, so the spectrogram labels are set to high-rising tone and low-depression tone, where a high-rising tone indicates that the learner has mastered the knowledge, is confident and interacts actively, while a low-depression tone indicates that the learner is confused about the knowledge and unwilling to interact. Different human postures reflect different emotional states, so the human body posture graph is labeled with five classes: interested, confused, stressed, bored and relaxed.
According to the final classification results of the three networks, space-time multi-dimensional modeling analysis is performed on the learner. The learner is divided into three dimensions by characteristic, namely the cognitive, interaction and emotion dimensions, forming a learner model oriented to cognition, interaction and emotion. In the cognitive dimension, the learner's cognitive development is analyzed by effectively extracting information from the eye movement data, and reasonable suggestions are given for the different classification results. In the interaction dimension, the student's interaction is analyzed through the voice data and the learner is reminded to interact actively. In the emotion dimension, the learner's emotional changes over a period of time are analyzed and the learning difficulty is adjusted according to the emotional state. Through this multi-dimensional, comprehensive modeling analysis, the learner's internal cognitive structure is accurately represented, providing support for formulating precise teaching strategies.
The invention has the beneficial effects that: (1) The learner is modeled with multi-source heterogeneous data, so the learning state can be fused and analyzed from different data sources, a mode more consistent with the nature of learning.
(2) The learner is modeled in an all-round, multi-dimensional way from characteristics such as emotion, cognition and interaction, so that the learner's real learning state can be represented.
(3) The convolutional neural network is constructed by fusing two-dimensional and three-dimensional convolution, so that three-dimensional convolution can extract features of the data in the time dimension, while adding two-dimensional convolution to the network effectively reduces training time and calculation overhead.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic flowchart of a multi-dimensional convolutional neural network learner modeling method for multi-source heterogeneous data according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating an example of a multidimensional convolution network according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a 2D network according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a 3D network provided in an embodiment of the present application;
fig. 5 is an exemplary schematic diagram of a subnetwork Inc in a 2D network according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating an example of a learner modeling process according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In the following description, the terms "first" and "second" are used for descriptive purposes only and are not intended to indicate or imply relative importance. The following description provides embodiments of the invention, which may be combined with or substituted for one another, and the invention is thus to be construed as embracing all possible combinations of the embodiments described. Thus, if one embodiment includes features A, B and C and another embodiment includes features B and D, the invention should also be considered to include embodiments containing one or more of all other possible combinations of A, B, C and D, even though such embodiments may not be explicitly recited below.
The following description provides examples, and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For example, the described methods may be performed in an order different than the order described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.
The technical idea of the application is as follows: eye movement data, voice data and video data of a learner are collected through devices such as an eye movement instrument, a microphone and a camera, and the corresponding hotspot graph, spectrogram and human body posture image are generated. A deep convolutional neural network is constructed by fusing two-dimensional and three-dimensional convolution: two-dimensional convolution first extracts static features of the input data in the spatial dimension, and three-dimensional convolution then extracts dynamic features from the static features in the time and spatial dimensions. The three convolutional neural networks run independently, and the resulting classification results are used for space-time multi-dimensional modeling analysis, on the basis of which opinions or suggestions are proposed.
Referring to fig. 1, fig. 1 is a schematic flowchart of a multi-dimensional convolutional neural network learner modeling method for multi-source heterogeneous data according to an embodiment of the present application.
Illustratively, the method comprises the steps of:
(1) The voice data, the video data and the eye movement data of the learner are acquired by utilizing equipment such as a microphone, a camera, an eye movement instrument and the like, and all data are acquired synchronously.
(2) The collected voice data, video data and eye movement data are converted into a spectrogram, a human body posture graph and a hotspot graph respectively, the image size is set to 224 × 224, and all images are encoded sequentially in time order.
(3) And sampling the obtained data by using a sparse sampling strategy, and inputting the obtained image data with a fixed size into the three multi-dimensional convolutional neural networks as input values, wherein the structures of the networks are the same, and refer to the attached figure 2.
The structure of the multidimensional convolutional neural network of the embodiment of the invention is shown in fig. 2. The network is composed of 2DNets and 3DNets. The structure of the 2DNets network is shown in fig. 3: 2DNets is composed of two convolutional layers and Inc modules, and the structure of the Inc module is shown in fig. 5. The 3D network structure, shown in fig. 4, is a residual structure composed of 12 three-dimensional convolutions, since shortcut connections and residual networks can effectively avoid the degradation problem. The input of the multidimensional convolutional network is composed of pictures obtained by sparse sampling: the data is first divided into 3 segments in time order, then 1 picture is randomly sampled from each of the three segments to form picture data with 9 channels, so the format of the input data is 9 × 224 × 224. A mini-batch gradient descent algorithm is adopted, converting the data format into [B*3, 3, 224, 224], where B represents the size of each batch of data. The output after the 2D network is [B*3, 96, 28, 28]; this is expanded along the time dimension and input to the 3D network, and the final network output passes through an average pooling layer and a SoftMax layer to obtain the classification.
The feature extraction process of each layer is described in detail below with reference to fig. 3, 4 and 5:
2DNets: the input data format of the network is [9, 224, 224]; it is then set to [B*3, 3, 224, 224] and input into the first convolutional layer and pooling layer. The convolutional layer extracts features from the input data with 64 convolution kernels of size 7*7, the pooling layer has a 3*3 kernel, and the output data format is [B*3, 64, 56, 56]. The second convolution and pooling layer has kernels of size 3*3 with 192 convolution kernels, and the output data format is [B*3, 192, 28, 28]. The result is then input into the Inc modules, each composed of 1*1 and 3*3 convolution kernels, with a different number of kernels per module; the output of the first Inc module is [B*3, 256, 28, 28] and the output of the second is [B*3, 320, 28, 28]. These are input into two convolutional layers with kernels of 1*1 and 3*3 respectively, and the final output of the network is [B*3, 96, 28, 28].
3DNets: the network consists of 6 residual blocks, each containing two three-dimensional convolutional layers whose kernels are uniformly set to 3 x 3; the numbers of feature maps finally output by the residual blocks are 128, 256, 512 and 512 respectively. The result of 2DNets is expanded along the time dimension to [B, 96, 3, 28, 28] and input to the network; the final output is [B, 512, 1, 1, 1].
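The spatial sizes quoted above follow from the usual convolution/pooling output-size formula; the strides and paddings below are assumptions chosen to reproduce the stated 224 → 112 → 56 → 28 progression, since the patent does not list them explicitly:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

s = conv_out(224, kernel=7, stride=2, padding=3)  # first 7*7 convolution
s = conv_out(s, kernel=3, stride=2, padding=1)    # first 3*3 max pool
s = conv_out(s, kernel=3, stride=1, padding=1)    # second 3*3 convolution
s = conv_out(s, kernel=3, stride=2, padding=1)    # second 3*3 max pool
```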
(4) And inputting the network output value into a SoftMax layer to obtain a final classification result.
(5) And (3) performing space-time multi-dimensional feature modeling analysis, and referring to the attached figure 6.
According to the final classification results of the three networks, space-time multi-dimensional modeling analysis is performed on the learner, specifically as follows: the learner is divided into three dimensions by characteristic, namely the cognitive, interaction and emotion dimensions, forming a learner model oriented to cognition, interaction and emotion. In the cognitive dimension, the learner's cognitive development is analyzed by effectively extracting information from the eye movement data, and reasonable suggestions are given for the different classification results. In the interaction dimension, the student's interaction is analyzed through the voice data and the learner is reminded to interact actively. In the emotion dimension, the learner's emotional changes within a certain period are analyzed and the learning difficulty is adjusted according to the emotional state. Through this multi-dimensional, comprehensive modeling analysis, the learner's internal cognitive structure is represented more accurately, providing support for formulating precise teaching strategies.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (5)

1. A multi-dimensional convolutional neural network learner modeling method of multi-source heterogeneous data, the method comprising:
(1) Synchronously acquiring eye movement data, voice data and video data of a learner;
(2) Preprocessing the eye movement data, the voice data and the video data to respectively obtain a heat point diagram corresponding to the eye movement data, a spectrogram corresponding to the voice data and a human body posture image corresponding to the video data;
(3) Setting a label for the hotspot graph based on the cognitive state classification of the learner, setting a label for the spectrogram based on the interactive state classification of the learner, and setting a label for the human posture image based on the emotional state classification of the learner;
(4) Training a multi-dimensional convolutional neural network, and respectively inputting the heat point diagram, the spectrogram and the human body posture image to be recognized into the multi-dimensional convolutional neural network with the same structure for feature extraction to respectively obtain output classification results;
the heat point diagram, the spectrogram and the human body posture image are serialized data which correspond to each other one by one;
in the step (4), inputting into the multi-dimensional convolutional neural network with the same structure for feature extraction comprises:
averagely dividing the serialized data into K segments {S1, S2, S3, …, SK} according to time sequence;
carrying out equal-probability random sampling of N images on each segment of the serialized data, and inputting the K*N sampled images as input data into the multi-dimensional convolutional neural network for processing according to the following formulas:
m(T1, T2, T3, …, TK) = (f(T1, W), f(T2, W), f(T3, W), …, f(TK, W))
M(T1, T2, T3, …, TK) = H(F(m(T1, T2, T3, …, TK), W1))
wherein m(T1, T2, T3, …, TK) represents feature extraction of the data by the two-dimensional convolutional neural network; T1, T2, T3, …, TK are sequences of N pictures each, the sequences being obtained by random sampling from the segments S1, S2, S3, …, SK respectively; f(TK, W) is the two-dimensional convolution layer with parameter W; M(T1, T2, T3, …, TK) represents the final prediction result of the network; F(m(T1, T2, T3, …, TK), W1) represents feature extraction of the data by the three-dimensional convolutional neural network with parameter W1; and H represents the SoftMax function;
the training multidimensional convolutional neural network in the step (4) comprises the following steps:
forming a multi-dimensional convolutional neural network by a 2D network and a 3D network, the multi-dimensional convolutional neural network comprising an input layer, two-dimensional convolutional layers, three-dimensional convolutional layers, a maximum pooling layer, an average pooling layer, a BatchNorm layer, and a SoftMax classification layer, the BatchNorm layer following each convolutional layer;
the input layer inputs a sample heat point diagram, a sample spectrogram or a sample human body posture image into the two-dimensional convolutional layer and the maximum pooling layer to obtain static characteristics of input data;
the static features are expanded according to time dimension and then input into the three-dimensional convolutional layer and the maximum pooling layer, wherein the last pooling layer is an average pooling layer, and dynamic information of the input data is obtained;
calculating the error between the classification result output by the SoftMax classification layer and the actual classification, back-propagating the calculated error to obtain the gradient of the parameters of each layer, adjusting the connection parameters of each layer according to the gradients, and repeating the error back propagation until the parameters of each layer reach the minimum point of the classification output error, then stopping the iteration;
(5) And performing space-time multi-dimensional feature modeling analysis by combining the three output classification results.
2. The method of claim 1, wherein the error is calculated by:
loss(x_i) = −l(x_i)·log(p(x_i))
wherein l(x_i) represents the tag value of the ith input data, p(x_i) is the predicted value obtained after the ith input data passes through the convolutional network, and loss(x_i) represents the loss function.
3. The method of claim 2, wherein the calculating the gradient of each layer parameter based on the calculated error back propagation comprises:
δ^l = ∂L/∂W^l
W^l ← W^l − η·δ^l
wherein L represents the error obtained after training with the sample data, W^l is the convolution kernel parameter of layer l, z^l represents the convolved output, a^l is the result after activating the convolved output, δ^l represents the gradient of the error with respect to the convolution kernel parameters, and η is the learning rate.
4. The method of claim 1, wherein the unfolding manner of unfolding the static features in the time dimension is as follows:
[B*S,C,H,W]→[B,C,S,H,W]
where B denotes the batch size, S denotes the number of input images, and C, H, and W denote the number of channels and the height and width of the feature map, respectively.
5. The method according to claim 1, wherein said combining three said output classification results in step (5) for spatiotemporal multidimensional feature modeling analysis comprises:
classifying the learner into three dimensions according to characteristics according to the output classification result, wherein the dimensions comprise a cognitive dimension, an interaction dimension and an emotion dimension;
sending suggestion information to the learner based on the classification result of the cognitive dimension;
sending interaction reminding information to the learner based on the classification result of the interaction dimension;
and sending learning difficulty adjustment information to the learner based on the classification result of the emotional dimension.
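The segment sampling recited in claim 1 and the time-dimension unfolding recited in claim 4 can be sketched in NumPy as follows. This is a non-authoritative illustration under hypothetical sizes (120 frames, K=4, N=2, batch B=1, C=1 channel, 32×32 images); the 2D-convolution stage is stood in for by an identity mapping so that only the sampling and the [B*S, C, H, W] → [B, C, S, H, W] reshaping are demonstrated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical serialized data: 120 frames of 1-channel 32x32 images.
frames = rng.standard_normal((120, 1, 32, 32)).astype(np.float32)

def sample_segments(data, K=4, N=2, rng=rng):
    """Divide the sequence into K equal segments according to time
    order and draw N frames from each with equal probability
    (the sampling step of claim 1)."""
    segments = np.array_split(data, K)
    picks = [seg[rng.choice(len(seg), size=N, replace=False)] for seg in segments]
    return np.concatenate(picks)  # K*N frames

sampled = sample_segments(frames)  # shape (8, 1, 32, 32)

# Stand-in for the 2D stage output: identity here; a real network would
# emit C feature maps per frame, stacked along the batch axis as B*S.
B, S = 1, sampled.shape[0]
static = sampled.reshape(B * S, 1, 32, 32)

# Claim 4 unfolding: [B*S, C, H, W] -> [B, C, S, H, W], so that the
# 3D stage can convolve over the time axis S.
unfolded = static.reshape(B, S, 1, 32, 32).transpose(0, 2, 1, 3, 4)

print(sampled.shape, unfolded.shape)
```

The transpose places the sampled-frame axis S between the channel axis and the spatial axes, which matches the `(N, C, D, H, W)` layout that 3D convolution layers conventionally expect.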
CN202011355627.0A 2020-11-27 2020-11-27 Multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data Active CN112529054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011355627.0A CN112529054B (en) 2020-11-27 2020-11-27 Multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data


Publications (2)

Publication Number Publication Date
CN112529054A CN112529054A (en) 2021-03-19
CN112529054B true CN112529054B (en) 2023-04-07

Family

ID=74994046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011355627.0A Active CN112529054B (en) 2020-11-27 2020-11-27 Multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data

Country Status (1)

Country Link
CN (1) CN112529054B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591988B (en) * 2021-07-30 2023-08-29 华中师范大学 Knowledge cognitive structure analysis method, system, computer equipment, medium and terminal
CN114224342B (en) * 2021-12-06 2023-12-15 南京航空航天大学 Multichannel electroencephalogram signal emotion recognition method based on space-time fusion feature network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9754351B2 (en) * 2015-11-05 2017-09-05 Facebook, Inc. Systems and methods for processing content using convolutional neural networks
US10818019B2 (en) * 2017-08-14 2020-10-27 Siemens Healthcare Gmbh Dilated fully convolutional network for multi-agent 2D/3D medical image registration
NZ759804A (en) * 2017-10-16 2022-04-29 Illumina Inc Deep learning-based techniques for training deep convolutional neural networks
CN108399376B (en) * 2018-02-07 2020-11-06 华中师范大学 Intelligent analysis method and system for classroom learning interest of students
CN108596039B (en) * 2018-03-29 2020-05-05 南京邮电大学 Bimodal emotion recognition method and system based on 3D convolutional neural network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant