CN111401147A - Intelligent analysis method and device based on video behavior data and storage medium - Google Patents

Intelligent analysis method and device based on video behavior data and storage medium

Info

Publication number
CN111401147A
Authority
CN
China
Prior art keywords
video
recognition result
data
expression
expression recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010122870.1A
Other languages
Chinese (zh)
Other versions
CN111401147B (en)
Inventor
吴智炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202010122870.1A priority Critical patent/CN111401147B/en
Priority claimed from CN202010122870.1A external-priority patent/CN111401147B/en
Publication of CN111401147A publication Critical patent/CN111401147A/en
Application granted granted Critical
Publication of CN111401147B publication Critical patent/CN111401147B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/176 Dynamic expression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent analysis method based on video behavior data, which comprises the following steps: receiving a pre-recorded user video and performing a voice extraction operation on the user video to obtain voice data and video data; inputting the video data into a pre-trained expression recognition model to obtain an expression recognition result; inputting the voice data into a pre-trained speech state recognition model to obtain a speech state recognition result; constructing a classification tree according to the speech state recognition result and the expression recognition result to obtain a deep-shallow psychological characteristic set; constructing an objective function according to the deep-shallow psychological characteristic set and solving partial derivatives of the objective function to obtain an offset value; and outputting a psychological state analysis result if the offset value is less than or equal to a preset offset error. The invention also provides an intelligent analysis device based on video behavior data and a computer readable storage medium. The invention can realize an accurate and efficient intelligent analysis function based on video behavior data.

Description

Intelligent analysis method and device based on video behavior data and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for intelligent analysis based on video behavior data and a computer readable storage medium.
Background
The intelligent analysis based on the video behavior data is applied to a plurality of fields at present, for example, in the process of claims collection of insurance companies, a video recording device is used for recording the communication video between business personnel and the personnel to be claimed, then whether the personnel to be claimed have cheating and insurance behaviors is analyzed intelligently, when a policeman examines a criminal, the psychological state of the criminal is analyzed to give psychological attack to the criminal, and the criminal is expected to be in good faith and wait.
At present, the common approach to intelligent analysis based on video behavior data is to record videos capturing micro-expressions, body movements, speaking tone and the like, and to have psychological experts observe and analyze the videos to summarize the psychological state. Although this can achieve the purpose of psychological state identification, it requires a large investment of time and manpower, so its efficiency in fields such as insurance and investigation is low.
Disclosure of Invention
The invention provides an intelligent analysis method and device based on video behavior data and a computer readable storage medium, whose main purpose is to recognize a user's expressions and speech states through models so as to perform intelligent analysis of the user's psychological state.
In order to achieve the above object, the present invention provides an intelligent analysis method based on video behavior data, which includes:
receiving a pre-recorded user video, and executing voice extraction operation on the user video to obtain voice data and video data not including the voice data;
inputting the video data into a pre-trained expression recognition model for expression recognition to obtain an expression recognition result;
inputting the voice data into a pre-trained speech state recognition model for speech state recognition to obtain a speech state recognition result;
and constructing a classification tree according to the speech state recognition result and the expression recognition result, obtaining a deep-shallow psychological characteristic set according to the classification tree, constructing an objective function according to the deep-shallow psychological characteristic set, solving a partial derivative of the objective function to obtain an offset value, feeding back the speech state recognition result and the expression recognition result to a preset user if the offset value is greater than a preset offset error, generating a psychological state analysis result according to the expression recognition result and the speech state recognition result if the offset value is less than or equal to the preset offset error, and outputting the psychological state analysis result.
Optionally, the voice extraction operation includes:
carrying out pre-emphasis operation on the user video;
performing frame division and windowing operation on the user video subjected to the pre-emphasis operation;
and separating voice data from the user video subjected to the framing windowing operation based on a discrete Fourier transform method to obtain the voice data and the video data not comprising the voice data.
Optionally, the intelligent analysis method based on video behavior data further includes training the expression recognition model, where the training includes:
constructing the expression recognition model;
establishing a facial expression library and a comparative expression library;
positioning and cutting a face area of the face expression library according to the expression recognition model to obtain a cut face expression library;
predicting the feature points of the facial expression library by using the expression recognition model, and judging the error between the predicted feature points and the comparison expression library; if the error is larger than a preset error, adjusting the parameters of the expression recognition model and predicting the feature points of the facial expression library again; and if the error is smaller than the preset error, exiting the prediction and finishing the training of the expression recognition model.
Optionally, the deep-shallow psychological characteristic set is obtained by calculating a Gini index of the classification tree by using the Gini index method;
wherein the Gini index method is as follows:
Gini(D, A) = Σ_s (T_s/K)(1 - T_s/K)
Gini(D, A) = 1 - Σ_s (T_s/K)^2
wherein A represents the deep-shallow psychological characteristic set, D represents the set formed by the speech state recognition result and the expression recognition result, T_s represents the data volume of the different label classifications (for example, T_1 represents the data volume of the anger label, and so on), and K represents the data volume of the set formed by the speech state recognition result and the expression recognition result.
Optionally, the constructing an objective function according to the set of deep and shallow psychological features, and solving a partial derivative of the objective function to obtain an offset value includes:
respectively constructing a penalty item and an error function based on the depth shallow psychological characteristic set;
adding the error function and the penalty term to obtain a target function;
solving a first order partial derivative result and a second order partial derivative result of the error function;
and reversely deducing to obtain an offset value in the target function according to the first-order partial derivative result and the second-order partial derivative result.
In addition, in order to achieve the above object, the present invention further provides an intelligent analysis device based on video behavior data, the device including a memory and a processor, the memory storing therein an intelligent analysis program based on video behavior data, the intelligent analysis program based on video behavior data being executable on the processor, and the intelligent analysis program based on video behavior data implementing the following steps when executed by the processor:
receiving a pre-recorded user video, and executing voice extraction operation on the user video to obtain voice data and video data not including the voice data;
inputting the video data into a pre-trained expression recognition model for expression recognition to obtain an expression recognition result;
inputting the voice data into a pre-trained speech state recognition model for speech state recognition to obtain a speech state recognition result;
and constructing a classification tree according to the speech state recognition result and the expression recognition result, obtaining a deep-shallow psychological characteristic set according to the classification tree, inputting the deep-shallow psychological characteristic set into a pre-constructed psychological analysis model to obtain an offset value, feeding back the speech state recognition result and the expression recognition result to a preset user if the offset value is greater than a preset offset error, generating a psychological state analysis result according to the expression recognition result and the speech state recognition result if the offset value is less than or equal to the preset offset error, and outputting the psychological state analysis result.
Optionally, the voice extraction operation includes:
carrying out pre-emphasis operation on the user video;
performing frame division and windowing operation on the user video subjected to the pre-emphasis operation;
and separating voice data from the user video subjected to the framing windowing operation based on a discrete Fourier transform method to obtain the voice data and the video data not comprising the voice data.
Optionally, when executed by the processor, the intelligent analysis program based on video behavior data further implements the following steps: training the expression recognition model, the training comprising:
constructing the expression recognition model;
establishing a facial expression library and a comparative expression library;
positioning and cutting a face area of the face expression library according to the expression recognition model to obtain a cut face expression library;
predicting the feature points of the facial expression library by using the expression recognition model, and judging the error between the predicted feature points and the comparison expression library; if the error is larger than a preset error, adjusting the parameters of the expression recognition model and predicting the feature points of the facial expression library again; and if the error is smaller than the preset error, exiting the prediction and finishing the training of the expression recognition model.
Optionally, the deep-shallow psychological characteristic set is obtained by calculating a Gini index of the classification tree by using the Gini index method;
wherein the Gini index method is as follows:
Gini(D, A) = Σ_s (T_s/K)(1 - T_s/K)
Gini(D, A) = 1 - Σ_s (T_s/K)^2
wherein A represents the deep-shallow psychological characteristic set, D represents the set formed by the speech state recognition result and the expression recognition result, T_s represents the data volume of the different label classifications (for example, T_1 represents the data volume of the anger label, and so on), and K represents the data volume of the set formed by the speech state recognition result and the expression recognition result.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, on which an intelligent analysis program based on video behavior data is stored, the intelligent analysis program based on video behavior data being executable by one or more processors to implement the steps of the intelligent analysis method based on video behavior data as described above.
According to the method, the user video is recorded in advance, the voice data and the video data not including the voice data are obtained through the voice extraction operation, and the expressions and speech states are recognized by models to obtain the recognition results, so the degree of automation is high and no large investment of time and manpower is required; meanwhile, error analysis is carried out according to the constructed classification tree, so the psychological state analysis can be completed automatically. Therefore, the intelligent analysis method and device based on video behavior data and the computer-readable storage medium of the invention can achieve the purpose of intelligently analyzing the psychological state.
Drawings
Fig. 1 is a schematic flowchart of an intelligent analysis method based on video behavior data according to an embodiment of the present invention;
fig. 2 is a schematic internal structural diagram of an intelligent analysis apparatus based on video behavior data according to an embodiment of the present invention;
fig. 3 is a block diagram illustrating an intelligent analysis program based on video behavior data in an intelligent analysis device based on video behavior data according to an embodiment of the present invention;
fig. 4 is a structural diagram of a speech state recognition model in the intelligent analysis method based on video behavior data according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an intelligent analysis method based on video behavior data. Fig. 1 is a schematic flow chart of an intelligent analysis method based on video behavior data according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the intelligent analysis method based on video behavior data includes:
and S1, receiving the pre-recorded user video, and performing voice extraction operation on the user video to obtain voice data and video data not including the voice data.
Preferably, the pre-recorded user video can be captured in different scenarios. For example, during an insurance company's claim settlement process, the communication video between the service personnel and the claimant is recorded; when the public security system interrogates a suspect, the entire interrogation process is recorded.
Preferably, the performing of a voice extraction operation on the user video to obtain voice data and video data not including the voice data includes: carrying out a pre-emphasis operation on the user video, performing a framing and windowing operation on the user video after the pre-emphasis operation, and separating the voice data from the user video after the framing and windowing operation based on the discrete Fourier transform.
The pre-emphasis operation compensates the voice signal in the user video. The human vocal system suppresses the high-frequency part of speech; in addition, to give the voice energy of the high-frequency part an amplitude similar to that of the low-frequency part, flatten the spectrum of the signal, and keep the same signal-to-noise ratio over the whole band from low frequency to high frequency, the energy of the high-frequency part needs to be boosted. The pre-emphasis operation may be calculated by:
y(n) = x(n) - μx(n-1)
wherein y(n) is the user video after the pre-emphasis operation, x(n) is the user video, n is the sample index of the waveform, μ is the adjustment value of the pre-emphasis operation, and its value range is [0.9, 1.0].
Preferably, the framing and windowing operation removes the overlapping voice parts in the user video. For example, in the recorded communication video between the claim-settlement service personnel and the claimant, the voices of the service personnel and the claimant overlap, so the framing and windowing operation can be used to remove the voice of the service personnel and retain the voice of the claimant. The framing and windowing operation is performed as follows:
[Windowing function formula, given as an image in the original document]
wherein w(n) is the user video after the framing and windowing operation, n is the sample index, and L is the frame length of the user video.
Preferably, the voice data are separated from the user video after the framing and windowing operation based on the discrete Fourier transform, which is calculated as:
S(k) = Σ_{n=0}^{N-1} w(n) e^(-j2πkn/N)
wherein S(k) is the separated voice data, N is the number of points of the discrete Fourier transform, w(n) is the user video after the framing and windowing operation, j is the imaginary unit, and k is the index of the frequency interval of the waveform.
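The following is a minimal NumPy sketch of the pre-emphasis, framing/windowing and discrete-Fourier-transform steps described above. The frame length, hop size and the use of a Hamming window are illustrative assumptions; the patent only specifies a window over the frame length L, not a particular window shape.

```python
import numpy as np

def pre_emphasis(x, mu=0.97):
    """y(n) = x(n) - mu * x(n-1); mu is assumed to lie in [0.9, 1.0]."""
    return np.append(x[0], x[1:] - mu * x[:-1])

def frame_and_window(x, frame_len=400, hop=160):
    """Split the waveform into overlapping frames of length L = frame_len and
    apply a window to each frame (Hamming chosen here as an illustrative window).
    Assumes len(x) >= frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)

def dft_spectrum(frames, n_fft=512):
    """Discrete Fourier transform of each windowed frame:
    S(k) = sum_n w(n) * exp(-j*2*pi*k*n/N)."""
    return np.fft.rfft(frames, n=n_fft, axis=1)

# Usage sketch: `audio` is the 1-D audio track extracted from the user video
# (for example with ffmpeg); the resulting spectra can feed the subsequent
# speech/non-speech separation step.
# spectra = dft_spectrum(frame_and_window(pre_emphasis(audio)))
```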
And S2, inputting the video data into a pre-trained expression recognition model for expression recognition to obtain an expression recognition result.
Preferably, the training process of the expression recognition model includes: constructing the expression recognition model; establishing a facial expression library and a comparison expression library; locating and cropping the face region of the facial expression library according to the expression recognition model to obtain a cut facial expression library; predicting the feature points of the cut facial expression library by using the expression recognition model and judging the error between the predicted feature points and the comparison expression library; if the error is larger than a preset error, predicting the feature points of the cut facial expression library again; and if the error is smaller than the preset error, exiting the prediction to obtain the pre-trained expression recognition model.
The JAFFE expression database of the Japanese ATR (Advanced Telecommunications Research Institute International) is a database built specifically for expression recognition research. It contains 213 images of Japanese female faces (resolution: 256 × 256 pixels per image), each annotated with its original expression definition. The database covers 10 subjects in total, each with 7 expressions (normal, also called the neutral face, plus happiness, sadness, surprise, anger, disgust and fear).
Preferably, the facial expression library is created by crawling facial expression images with a web crawler and normalizing the brightness of the captured faces. The facial expression library uses six emotions as labels: happiness, sadness, surprise, anger, disgust and fear. The facial characteristics of each label are different, for example happiness (the face smiles, the corners of the mouth are raised, and the eyes are smaller than in the normal state because the pupils contract when a person is happy) and anger (the main characteristic is enlarged pupils, so the eyes are larger than in the normal state).
In a preferred embodiment of the present invention, the expression recognition model adopts a DCNN (Deep Convolutional Network cascade for Facial Point Detection) model.
The locating and cropping step addresses the problem that a facial expression image covering too large a region interferes with the judgment of facial expression recognition: the first-stage convolutional network model of the DCNN locates the face and crops it by searching for 5 facial feature points (left and right eyes, nose, left and right mouth corners).
Specifically, the first stage of the DCNN consists of three convolutional neural networks, named F1 (whose input is the whole face picture), EN1 (whose input picture contains the eyes and nose) and NM1 (whose input contains the nose and mouth region). For the input facial expression image, F1 outputs a 10-dimensional feature vector (5 feature points); based on this 10-dimensional feature vector, EN1 locates the three feature points of the left eye, right eye and nose; meanwhile, NM1 locates the three feature points of the left mouth corner, right mouth corner and nose, and, after combining the nose feature point located by EN1, a face region picture containing the eyes, nose and mouth is cropped out.
The above operation roughly locates the positions of the 5 facial feature points; the second-stage convolutional neural network model of the DCNN then continues the feature localization with the five predicted feature points as centers. The second stage consists of 10 CNNs, which are used to predict the 5 feature points: each feature point uses two CNNs, and the predictions of the two CNNs are averaged.
The third-stage neural network model of the DCNN crops the face again based on the positions predicted in the second stage. The third-stage model has the same structure as the second stage and also consists of 10 CNNs.
Further, the error is calculated as:
err = ||x - x'|| / l
wherein l is the image width of the facial expression image, x is the vector representation of the 5 feature points of the facial expression library picture, x' is the feature vector of the facial expression library data, and y' is the corresponding facial expression label of the facial expression library.
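As a rough illustration of the training loop described above, the sketch below predicts the 5 feature points, measures the width-normalized error against the comparison expression library, and keeps adjusting the model until the error falls below the preset error. The threshold value, the round limit and the predict()/update() interface of the model are assumptions made purely for illustration.

```python
import numpy as np

def normalized_point_error(pred_pts, reference_pts, img_width):
    """Mean distance between predicted and reference feature points,
    normalized by the image width l."""
    return np.mean(np.linalg.norm(pred_pts - reference_pts, axis=-1)) / img_width

def train_expression_landmarks(model, faces, reference_pts, img_width,
                               preset_error=0.05, max_rounds=100):
    """Illustrative training loop for the feature point predictor: predict,
    compare with the comparison expression library, adjust parameters while
    the error is above the preset error, and stop once it drops below it."""
    for _ in range(max_rounds):
        pred_pts = model.predict(faces)            # (N, 5, 2) array of landmarks
        err = normalized_point_error(pred_pts, reference_pts, img_width)
        if err < preset_error:                     # exit prediction, training finished
            break
        model.update(faces, reference_pts)         # adjust the model parameters
    return model
```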
And S3, inputting the voice data into a pre-trained speech state recognition model for speech state recognition to obtain a speech state recognition result.
Preferably, the speech state recognition model is based on a convolutional-recurrent neural network, and the network structure of the whole speech state recognition model is shown in fig. 4 of the specification.
As shown in fig. 4 of the specification, the speech state recognition model comprises a convolutional layer, a pooling layer, a Permute layer, an LSTM layer and a fully connected layer.
Preferably, the speech state recognition includes: the convolutional layer and the pooling layer receive the voice data and perform convolution processing and pooling processing.
The calculation method of the convolution processing comprises the following steps:
x_j^m = f( Σ_{i∈M_j} x_i^(m-1) * k_ij^m + b_j^m )
wherein x_j^m represents the input of the j-th feature map of the m-th convolutional layer, k_ij^m represents the convolution kernel, b_j^m represents the bias term, * represents the convolution operation, M_j represents the set of feature maps, and f represents the activation function.
The calculation method of the pooling processing is as follows:
x^n = f( β^n · down(x^(n-1)) + b^n )
wherein x^n represents the input feature map of the n-th layer, x^(n-1) represents the output feature map of the (n-1)-th layer, β^n and b^n respectively represent the weight and bias terms, and down represents the down-sampling function from layer n-1 to layer n.
The Permute layer performs dimensional expansion on the data after the convolution processing and the pooling processing, and the LSTM layer and the fully connected layer then compute the speech state recognition result, which uses the same six categories as the expression recognition: happiness, sadness, surprise, anger, disgust and fear.
The calculation method of the LSTM layer is as follows:
i_t = σ(W_i x_t + W_i m_(t-1) + b_i)
f_t = σ(W_f x_t + W_f m_(t-1) + b_f)
o_t = σ(W_o x_t + W_o m_(t-1) + b_o)
c_new = h(W_c x_t + W_c m_(t-1) + b_c)
wherein c_new is the output value of the LSTM layer, i_t, f_t and o_t respectively represent the input gate, output gate and forget gate of the LSTM layer, t is the time step, σ is the sigmoid function, h is the tanh function, W is the weight, b is the bias, and m_(t-1) is the hidden state at time t-1.
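A compact Keras sketch of the convolution, pooling, Permute, LSTM and fully connected structure described above is given below. The filter counts, kernel size, input feature shape and optimizer are illustrative assumptions; only the layer ordering follows the description.

```python
from tensorflow.keras import layers, models

def build_speech_state_model(n_frames=128, n_features=40, n_classes=6):
    """Convolution -> pooling -> Permute -> LSTM -> fully connected, ending in
    a softmax over the 6 speech states (happiness, sadness, surprise, anger,
    disgust, fear)."""
    model = models.Sequential([
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu",
                      input_shape=(n_frames, n_features)),
        layers.MaxPooling1D(pool_size=2),
        layers.Permute((2, 1)),   # reorder dimensions before the recurrent layer
        layers.LSTM(64),          # gates i_t, f_t, o_t as in the equations above
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```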
S4, constructing a classification tree according to the speech state recognition result and the expression recognition result, and obtaining a deep-shallow psychological characteristic set according to the classification tree.
Preferably, the constructing of the deep-shallow psychological characteristics based on the speech state recognition result and the expression recognition result includes: constructing a classification feature sequence tree according to the speech state recognition result and the expression recognition result, and obtaining the deep-shallow psychological characteristic set according to the classification feature sequence tree.
Preferably, the classification feature sequence tree may adopt a CART tree.
Further, the deep-shallow psychological characteristic set may be obtained from the classification feature sequence tree by the Gini index method, whose calculation formula is as follows:
Gini(D, A) = Σ_s (T_s/K)(1 - T_s/K)
wherein A represents the deep-shallow psychological characteristics, D represents the set formed by the speech state recognition result and the expression recognition result, and T_s indicates a label classification (including happiness, anger and the like; for example, T_1 indicates anger). Further,
Gini(D, A) = 1 - Σ_s (T_s/K)^2
wherein K represents the data volume of the set formed by the speech state recognition result and the expression recognition result.
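To make the Gini computation and the CART construction concrete, the sketch below computes Gini(D) for a set of labels and fits a CART tree with the Gini criterion on a toy encoding of the recognition results. The label values, the integer encoding of the six emotion categories and the use of scikit-learn's DecisionTreeClassifier are illustrative assumptions, not part of the patent text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def gini_index(labels):
    """Gini(D) = 1 - sum_s (T_s / K)^2, where T_s is the count of samples with
    label s and K is the total number of samples in D."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

# D: each row pairs a speech state id with an expression id (0..5 for the six
# emotion categories); y: hypothetical psychological tags used only as a demo.
D = np.array([[0, 0], [1, 1], [3, 3], [4, 4], [2, 0], [5, 5]])
y = np.array([1, 0, 1, 1, 0, 1])

cart = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(D, y)
print("Gini(D) =", gini_index(y))
print("feature importances:", cart.feature_importances_)
```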
S5, constructing an objective function according to the depth shallow layer psychological characteristic set, and solving a partial derivative of the objective function to obtain an offset value.
Preferably, the constructing an objective function according to the set of deep and shallow psychological features, and solving a partial derivative of the objective function to obtain an offset value includes: and respectively constructing a penalty term and an error function based on the deep shallow psychological characteristic set, adding the error function and the penalty term to obtain an objective function, solving a first-order partial derivative result and a second-order partial derivative result of the error function, and reversely deducing to obtain an offset value in the objective function according to the first-order partial derivative result and the second-order partial derivative result.
Preferably, after the pre-constructed psychological analysis model receives the deep-shallow psychological characteristics, an objective function is constructed based on the deep-shallow psychological characteristics:
y = Σ_{k=1}^{K} f_k(x_i), x_i ∈ deep_show
wherein y is the offset value, deep_show represents the deep-shallow psychological characteristic set, K is the data size of the deep-shallow psychological characteristic set, and f_k(x_i) is the objective function.
Further, the objective function is:
Obj = Σ_i l(x_i) + Σ_i Ω(f_i)
wherein l(x_i) is the error function of the deep-shallow psychological characteristics and Ω(f_i) is the penalty term function, which is intended to improve the accuracy of the evaluation of the invention. Further, the penalty term Ω(f_t) is:
Ω(f_t) = γM + (1/2) Σ_{j=1}^{M} ω_j^2
wherein M is the number of leaf nodes of the CART tree and ω_j is the weight of the j-th leaf node of the CART tree. Further, the error function is:
l ≈ Σ_i [ g_i f_t(x_i) + (1/2) h_i f_t^2(x_i) ]
wherein g_i and h_i are respectively the first-order and second-order partial derivatives of l(x_i).
Combining the above formulas gives the final objective function:
Obj = -(1/2) Σ_{j=1}^{T} G_j^2 / H_j + γT
wherein G_j and H_j respectively denote the accumulated first-order and second-order partial derivatives, T is the penalty term (the number of leaf nodes) and γ is the penalty term coefficient; from this the offset value is obtained.
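The following toy sketch mirrors the second-order construction above: it forms the first- and second-order partial derivatives of a squared-error function, accumulates them, and recovers the offset value by setting the derivative of the objective to zero. The squared-error choice, the single-leaf simplification and the value of γ are assumptions made purely for illustration.

```python
import numpy as np

def offset_from_derivatives(g, h, gamma=1.0):
    """Accumulate G = sum(g_i), H = sum(h_i); the optimal leaf weight is -G/H
    (obtained by setting the partial derivative of the objective to zero) and
    the objective value is -0.5 * G^2 / H + gamma (single-leaf case)."""
    G, H = float(np.sum(g)), float(np.sum(h))
    offset = -G / H
    objective = -0.5 * G * G / H + gamma
    return offset, objective

# Example with the squared error l = 0.5 * (pred - target)^2 over the
# deep-shallow psychological feature set (values below are hypothetical):
pred = np.array([0.2, 0.7, 0.4])
target = np.array([0.0, 1.0, 1.0])
g = pred - target            # first-order partial derivative of l
h = np.ones_like(pred)       # second-order partial derivative of l
print(offset_from_derivatives(g, h))
```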
And S6, judging whether the offset value is greater than a preset offset error, and if so, feeding the speech state recognition result and the expression recognition result back to a professional psychoanalyst for further psychological state analysis.
If the offset value is greater than the preset offset error, the deep-shallow psychological characteristic set has not reached the expected psychological analysis result and the obtained expression recognition result and speech state recognition result are inconsistent, so further analysis by a professional psychoanalyst is required.
And S7, if the offset value is less than or equal to a preset offset error, generating a psychological state analysis result according to the expression recognition result and the speech recognition result, and outputting the psychological state analysis result.
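A small sketch of the S6/S7 decision step follows. The threshold value and the way the two recognition results are packaged into an analysis result are illustrative assumptions.

```python
def analyze_mental_state(offset_value, expression_result, speech_state_result,
                         preset_offset_error=0.1):
    """If the offset value exceeds the preset offset error, return the
    recognition results for expert review; otherwise generate and return a
    psychological state analysis result."""
    if offset_value > preset_offset_error:
        return {"status": "needs_expert_review",
                "expression": expression_result,
                "speech_state": speech_state_result}
    return {"status": "ok",
            "analysis": {"expression": expression_result,
                         "speech_state": speech_state_result}}

# Example: analyze_mental_state(0.05, "happiness", "happiness") -> status "ok"
```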
The invention also provides an intelligent analysis device based on the video behavior data. Fig. 2 is a schematic diagram illustrating an internal structure of an intelligent analysis device based on video behavior data according to an embodiment of the present invention.
In the present embodiment, the intelligent analysis device 1 based on video behavior data may be a PC (personal computer), a terminal device such as a smart phone, a tablet computer, or a mobile computer, or may be a server. The intelligent analysis device 1 based on video behavior data comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the intelligent analysis device 1 based on video behavior data, for example a hard disk of the intelligent analysis device 1 based on video behavior data. The memory 11 may also be an external storage device of the intelligent analysis device 1 based on the video behavior data in other embodiments, such as a plug-in hard disk provided on the intelligent analysis device 1 based on the video behavior data, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and so on. Further, the memory 11 may also include both an internal storage unit and an external storage device of the intelligent analysis apparatus 1 based on video behavior data. The memory 11 may be used not only to store application software installed in the intelligent analysis device 1 based on video behavior data and various types of data, such as codes of the intelligent analysis program 01 based on video behavior data, but also to temporarily store data that has been output or is to be output.
The processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data Processing chip in some embodiments, and is used for executing program codes stored in the memory 11 or Processing data, such as executing the intelligent analysis program 01 based on video behavior data.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication link between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may include a display (Display) and an input unit such as a keyboard (Keyboard); the optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like.
Fig. 2 shows only the intelligent analysis apparatus 1 based on video behavior data with the components 11-14 and the intelligent analysis program 01 based on video behavior data, and it will be understood by those skilled in the art that the structure shown in fig. 2 does not constitute a limitation of the intelligent analysis apparatus 1 based on video behavior data, which may include fewer or more components than those shown, or combine some components, or have a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores therein an intelligent analysis program 01 based on video behavior data; the processor 12 executes the intelligent analysis program 01 based on video behavior data stored in the memory 11 to implement the following steps:
step one, receiving a pre-recorded user video, and executing voice extraction operation on the user video to obtain voice data and video data not including the voice data.
Preferably, the pre-recorded user video can be captured in different scenarios. For example, during an insurance company's claim settlement process, the communication video between the service personnel and the claimant is recorded; when the public security system interrogates a suspect, the entire interrogation process is recorded.
Preferably, the performing of a voice extraction operation on the user video to obtain voice data and video data not including the voice data includes: carrying out a pre-emphasis operation on the user video, performing a framing and windowing operation on the user video after the pre-emphasis operation, and separating the voice data from the user video after the framing and windowing operation based on the discrete Fourier transform.
The pre-emphasis operation compensates the voice signal in the user video. The human vocal system suppresses the high-frequency part of speech; in addition, to give the voice energy of the high-frequency part an amplitude similar to that of the low-frequency part, flatten the spectrum of the signal, and keep the same signal-to-noise ratio over the whole band from low frequency to high frequency, the energy of the high-frequency part needs to be boosted. The pre-emphasis operation may be calculated by:
y(n) = x(n) - μx(n-1)
wherein y(n) is the user video after the pre-emphasis operation, x(n) is the user video, n is the sample index of the waveform, μ is the adjustment value of the pre-emphasis operation, and its value range is [0.9, 1.0].
Preferably, the framing and windowing operation removes the overlapping voice parts in the user video. For example, in the recorded communication video between the claim-settlement service personnel and the claimant, the voices of the service personnel and the claimant overlap, so the framing and windowing operation can be used to remove the voice of the service personnel and retain the voice of the claimant. The framing and windowing operation is performed as follows:
[Windowing function formula, given as an image in the original document]
wherein w(n) is the user video after the framing and windowing operation, n is the sample index, and L is the frame length of the user video.
Preferably, the voice data are separated from the user video after the framing and windowing operation based on the discrete Fourier transform, which is calculated as:
S(k) = Σ_{n=0}^{N-1} w(n) e^(-j2πkn/N)
wherein S(k) is the separated voice data, N is the number of points of the discrete Fourier transform, w(n) is the user video after the framing and windowing operation, j is the imaginary unit, and k is the index of the frequency interval of the waveform.
And step two, inputting the video data into an expression recognition model which is trained in advance to perform expression recognition to obtain an expression recognition result.
Preferably, the training process of the expression recognition model includes: constructing the expression recognition model; establishing a facial expression library and a comparison expression library; locating and cropping the face region of the facial expression library according to the expression recognition model to obtain a cut facial expression library; predicting the feature points of the cut facial expression library by using the expression recognition model and judging the error between the predicted feature points and the comparison expression library; if the error is larger than a preset error, predicting the feature points of the cut facial expression library again; and if the error is smaller than the preset error, exiting the prediction to obtain the pre-trained expression recognition model.
The JAFFE expression database of the Japanese ATR (Advanced Telecommunications Research Institute International) is a database built specifically for expression recognition research. It contains 213 images of Japanese female faces (resolution: 256 × 256 pixels per image), each annotated with its original expression definition. The database covers 10 subjects in total, each with 7 expressions (normal, also called the neutral face, plus happiness, sadness, surprise, anger, disgust and fear).
Preferably, the facial expression library is created by crawling facial expression images with a web crawler and normalizing the brightness of the captured faces. The facial expression library uses six emotions as labels: happiness, sadness, surprise, anger, disgust and fear. The facial characteristics of each label are different, for example happiness (the face smiles, the corners of the mouth are raised, and the eyes are smaller than in the normal state because the pupils contract when a person is happy) and anger (the main characteristic is enlarged pupils, so the eyes are larger than in the normal state).
In a preferred embodiment of the present invention, the expression recognition model adopts a DCNN (Deep Convolutional Network cascade for Facial Point Detection) model.
The locating and cropping step addresses the problem that a facial expression image covering too large a region interferes with the judgment of facial expression recognition: the first-stage convolutional network model of the DCNN locates the face and crops it by searching for 5 facial feature points (left and right eyes, nose, left and right mouth corners).
Specifically, the first stage of the DCNN consists of three convolutional neural networks, named F1 (whose input is the whole face picture), EN1 (whose input picture contains the eyes and nose) and NM1 (whose input contains the nose and mouth region). For the input facial expression image, F1 outputs a 10-dimensional feature vector (5 feature points); based on this 10-dimensional feature vector, EN1 locates the three feature points of the left eye, right eye and nose; meanwhile, NM1 locates the three feature points of the left mouth corner, right mouth corner and nose, and, after combining the nose feature point located by EN1, a face region picture containing the eyes, nose and mouth is cropped out.
The above operation roughly locates the positions of the 5 facial feature points; the second-stage convolutional neural network model of the DCNN then continues the feature localization with the five predicted feature points as centers. The second stage consists of 10 CNNs, which are used to predict the 5 feature points: each feature point uses two CNNs, and the predictions of the two CNNs are averaged.
The third-stage neural network model of the DCNN crops the face again based on the positions predicted in the second stage. The third-stage model has the same structure as the second stage and also consists of 10 CNNs.
Further, the error is calculated as:
err = ||x - x'|| / l
wherein l is the image width of the facial expression image, x is the vector representation of the 5 feature points of the facial expression library picture, x' is the feature vector of the facial expression library data, and y' is the corresponding facial expression label of the facial expression library.
And step three, inputting the voice data into a pre-trained speech state recognition model for speech state recognition to obtain a speech state recognition result.
Preferably, the speech state recognition model is based on a convolutional-recurrent neural network, and the network structure of the whole speech state recognition model is shown in fig. 4 of the specification.
As shown in fig. 4 of the specification, the speech state recognition model comprises a convolutional layer, a pooling layer, a Permute layer, an LSTM layer and a fully connected layer.
Preferably, the speech state recognition includes: the convolutional layer and the pooling layer receive the voice data and perform convolution processing and pooling processing.
The calculation method of the convolution processing comprises the following steps:
x_j^m = f( Σ_{i∈M_j} x_i^(m-1) * k_ij^m + b_j^m )
wherein x_j^m represents the input of the j-th feature map of the m-th convolutional layer, k_ij^m represents the convolution kernel, b_j^m represents the bias term, * represents the convolution operation, M_j represents the set of feature maps, and f represents the activation function.
The calculation method of the pooling processing is as follows:
x^n = f( β^n · down(x^(n-1)) + b^n )
wherein x^n represents the input feature map of the n-th layer, x^(n-1) represents the output feature map of the (n-1)-th layer, β^n and b^n respectively represent the weight and bias terms, and down represents the down-sampling function from layer n-1 to layer n.
And the Permute layer performs dimensional expansion on the data after the convolution processing and the pooling processing, and the LSTM layer and the fully connected layer then compute the speech state recognition result, which uses the same six categories as the expression recognition: happiness, sadness, surprise, anger, disgust and fear.
The calculation method of the LSTM layer is as follows:
i_t = σ(W_i x_t + W_i m_(t-1) + b_i)
f_t = σ(W_f x_t + W_f m_(t-1) + b_f)
o_t = σ(W_o x_t + W_o m_(t-1) + b_o)
c_new = h(W_c x_t + W_c m_(t-1) + b_c)
wherein c_new is the output value of the LSTM layer, i_t, f_t and o_t respectively represent the input gate, output gate and forget gate of the LSTM layer, t is the time step, σ is the sigmoid function, h is the tanh function, W is the weight, b is the bias, and m_(t-1) is the hidden state at time t-1.
And fourthly, constructing a classification tree according to the speech state recognition result and the expression recognition result, and obtaining a deep-shallow psychological characteristic set according to the classification tree.
Preferably, the constructing of the deep-shallow psychological characteristics based on the speech state recognition result and the expression recognition result includes: constructing a classification feature sequence tree according to the speech state recognition result and the expression recognition result, and obtaining the deep-shallow psychological characteristic set according to the classification feature sequence tree.
Preferably, the classification feature sequence tree may adopt a CART tree.
Further, the deep-shallow psychological characteristic set may be obtained from the classification feature sequence tree by the Gini index method, whose calculation formula is as follows:
Gini(D, A) = Σ_s (T_s/K)(1 - T_s/K)
wherein A represents the deep-shallow psychological characteristics, D represents the set formed by the speech state recognition result and the expression recognition result, and T_s indicates a label classification (including happiness, anger and the like; for example, T_1 indicates anger). Further,
Gini(D, A) = 1 - Σ_s (T_s/K)^2
wherein K represents the data volume of the set formed by the speech state recognition result and the expression recognition result.
And fifthly, constructing an objective function according to the depth shallow psychological characteristic set, and solving a partial derivative of the objective function to obtain an offset value.
Preferably, the constructing an objective function according to the set of deep and shallow psychological features, and solving a partial derivative of the objective function to obtain an offset value includes: and respectively constructing a penalty term and an error function based on the deep shallow psychological characteristic set, adding the error function and the penalty term to obtain an objective function, solving a first-order partial derivative result and a second-order partial derivative result of the error function, and reversely deducing to obtain an offset value in the objective function according to the first-order partial derivative result and the second-order partial derivative result.
Preferably, after the pre-constructed psychological analysis model receives the deep-shallow psychological characteristics, an objective function is constructed based on the deep-shallow psychological characteristics:
y = Σ_{k=1}^{K} f_k(x_i), x_i ∈ deep_show
wherein y is the offset value, deep_show represents the deep-shallow psychological characteristic set, K is the data size of the deep-shallow psychological characteristic set, and f_k(x_i) is the objective function.
Further, the objective function is:
Obj = Σ_i l(x_i) + Σ_i Ω(f_i)
wherein l(x_i) is the error function of the deep-shallow psychological characteristics and Ω(f_i) is the penalty term function, which is intended to improve the accuracy of the evaluation of the invention. Further, the penalty term Ω(f_t) is:
Ω(f_t) = γM + (1/2) Σ_{j=1}^{M} ω_j^2
wherein M is the number of leaf nodes of the CART tree and ω_j is the weight of the j-th leaf node of the CART tree. Further, the error function is:
l ≈ Σ_i [ g_i f_t(x_i) + (1/2) h_i f_t^2(x_i) ]
wherein g_i and h_i are respectively the first-order and second-order partial derivatives of l(x_i).
Combining the above formulas gives the final objective function:
Obj = -(1/2) Σ_{j=1}^{T} G_j^2 / H_j + γT
wherein G_j and H_j respectively denote the accumulated first-order and second-order partial derivatives, T is the penalty term (the number of leaf nodes) and γ is the penalty term coefficient; from this the offset value is obtained.
And step six, judging whether the offset value is greater than a preset offset error, and if so, feeding the speech state recognition result and the expression recognition result back to a professional psychoanalyst for further psychological state analysis.
If the offset value is greater than the preset offset error, the deep-shallow psychological characteristic set has not reached the expected psychological analysis result and the obtained expression recognition result and speech state recognition result are inconsistent, so further analysis by a professional psychoanalyst is required.
And seventhly, if the bias value is smaller than or equal to a preset bias error, generating a psychological state analysis result according to the expression recognition result and the speech state recognition result, and outputting the psychological state analysis result.
Alternatively, in other embodiments, the intelligent analysis program based on video behavior data may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention.
For example, referring to fig. 3, which is a schematic diagram of the program modules of the intelligent analysis program based on video behavior data in an embodiment of the intelligent analysis device based on video behavior data of the present invention: in this embodiment, the intelligent analysis program based on video behavior data may be divided into a data receiving and separating module 10, an expression and speech state recognition module 20, a classification tree construction module 30 and a mental state analysis module 40, which exemplarily:
the data receiving and separating module 10 is configured to: receiving a pre-recorded user video, and executing voice extraction operation on the user video to obtain voice data and video data not including the voice data.
The expression and speech state recognition module 20 is configured to: inputting the video data into a pre-trained expression recognition model for expression recognition to obtain an expression recognition result, and inputting the voice data into a pre-trained speech state recognition model for speech state recognition to obtain a speech state recognition result.
The classification tree construction module 30 is configured to: constructing a classification tree according to the speech state recognition result and the expression recognition result, and obtaining a deep-shallow psychological characteristic set according to the classification tree.
The mental state analysis module 40 is configured to: constructing an objective function according to the deep-shallow psychological characteristic set, solving a partial derivative of the objective function to obtain an offset value, feeding back the speech state recognition result and the expression recognition result to a preset user if the offset value is greater than a preset offset error, generating a psychological state analysis result according to the expression recognition result and the speech state recognition result if the offset value is less than or equal to the preset offset error, and outputting the psychological state analysis result.
The functions or operation steps implemented when the data receiving and separating module 10, the expression and speech state recognition module 20, the classification tree construction module 30, the mental state analysis module 40 and the other program modules are executed are substantially the same as those of the above embodiments and are not described again here.
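As a structural illustration of the module division above, the sketch below wires the four modules into one program object. The method names and the module interfaces are hypothetical; the patent only names the modules and their responsibilities.

```python
class VideoBehaviorAnalysisProgram:
    """Illustrative composition of the four program modules; each injected
    module object is assumed to expose the methods used below."""

    def __init__(self, data_module, recognition_module, tree_module, analysis_module):
        self.data_module = data_module                 # data receiving and separating module 10
        self.recognition_module = recognition_module   # expression and speech state recognition module 20
        self.tree_module = tree_module                 # classification tree construction module 30
        self.analysis_module = analysis_module         # mental state analysis module 40

    def run(self, user_video):
        voice, video = self.data_module.extract(user_video)
        expression = self.recognition_module.recognize_expression(video)
        speech_state = self.recognition_module.recognize_speech_state(voice)
        features = self.tree_module.build(speech_state, expression)
        return self.analysis_module.analyze(features, expression, speech_state)
```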
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has stored thereon an intelligent analysis program based on video behavior data, and the intelligent analysis program based on video behavior data is executable by one or more processors to implement the following operations:
receiving a pre-recorded user video, and executing voice extraction operation on the user video to obtain voice data and video data not including the voice data.
And inputting the video data into a pre-trained expression recognition model for expression recognition to obtain an expression recognition result, and inputting the voice data into a pre-trained speech state recognition model for speech state recognition to obtain a speech state recognition result.
And constructing a classification tree according to the speech state recognition result and the expression recognition result, and obtaining a deep-shallow psychological characteristic set according to the classification tree.
And constructing an objective function according to the deep-shallow psychological characteristic set, solving a partial derivative of the objective function to obtain an offset value, feeding back the speech state recognition result and the expression recognition result to a preset user if the offset value is greater than a preset offset error, generating a psychological state analysis result according to the expression recognition result and the speech state recognition result if the offset value is less than or equal to the preset offset error, and outputting the psychological state analysis result.
It should be noted that the numbering of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product that is stored in a storage medium as described above (e.g., ROM/RAM, magnetic disk, optical disc) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structural or process modifications made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of the present invention.

Claims (10)

1. An intelligent analysis method based on video behavior data, the method comprising:
receiving a pre-recorded user video, and performing a voice extraction operation on the user video to obtain voice data and video data that does not include the voice data;
inputting the video data into a pre-trained expression recognition model for expression recognition to obtain an expression recognition result;
inputting the voice data into a pre-trained speech state recognition model for speech state recognition to obtain a speech state recognition result;
and constructing a classification tree according to the speech state recognition result and the expression recognition result, obtaining a deep and shallow psychological feature set according to the classification tree, constructing an objective function according to the deep and shallow psychological feature set, solving the partial derivative of the objective function to obtain an offset value, feeding the speech state recognition result and the expression recognition result back to a preset user if the offset value is greater than a preset offset error, and, if the offset value is less than or equal to the preset offset error, generating a psychological state analysis result according to the expression recognition result and the speech state recognition result and outputting the psychological state analysis result.
2. The intelligent analysis method based on video behavior data according to claim 1, wherein performing the voice extraction operation on the user video to obtain the voice data and the video data that does not include the voice data comprises:
performing a pre-emphasis operation on the user video;
performing framing and windowing operations on the user video subjected to the pre-emphasis operation;
and separating voice data from the user video subjected to the framing and windowing operations based on a discrete Fourier transform method, to obtain the voice data and the video data that does not include the voice data.
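Claim 2 names three audio-side steps: pre-emphasis, framing with windowing, and a discrete Fourier transform. Below is a small NumPy sketch of those steps applied to the extracted audio track; the pre-emphasis coefficient, frame length, hop size and Hamming window are conventional defaults, not values taken from the claim.

```python
import numpy as np

def preprocess_audio(signal, sample_rate, alpha=0.97, frame_ms=25, hop_ms=10):
    """Pre-emphasis, framing/windowing and per-frame DFT of an audio signal.
    alpha, frame_ms and hop_ms are conventional defaults, not claim values."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1], boosts the high frequencies.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len  # assumes len >= frame_len

    # Framing and windowing with a Hamming window.
    window = np.hamming(frame_len)
    frames = np.stack([
        emphasized[i * hop_len: i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])

    # Discrete Fourier transform of each frame (magnitude spectrum).
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    return frames, spectra

# Example: one second of a synthetic 16 kHz signal.
frames, spectra = preprocess_audio(np.random.randn(16000), sample_rate=16000)
```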
3. The intelligent analysis method based on video behavior data according to claim 1, further comprising training the expression recognition model, the training comprising:
constructing the expression recognition model;
establishing a facial expression library and a comparison expression library;
locating and cropping the face region of the facial expression library according to the expression recognition model to obtain a cropped facial expression library;
predicting feature points of the facial expression library by using the expression recognition model, and determining the error between the predicted feature points and the comparison expression library; if the error is greater than a preset error, adjusting the parameters of the expression recognition model and predicting the feature points of the facial expression library again; if the error is less than the preset error, stopping the prediction, thereby completing the training of the expression recognition model.
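Claim 3 describes an iterative loop: predict feature points for the cropped facial expression library, compare them with the comparison expression library, and adjust the model parameters until the error falls below the preset error. The sketch below assumes a hypothetical model object exposing crop_face(), predict_landmarks() and update(); the error metric, threshold and iteration cap are likewise illustrative.

```python
import numpy as np

def train_expression_model(model, face_library, reference_points,
                           preset_error=0.05, max_iters=1000):
    """Iterative training sketch for the expression recognition model.
    `model` is assumed to expose crop_face(), predict_landmarks() and update();
    preset_error and max_iters are illustrative values."""
    # Locate and crop the face region of every image in the expression library.
    cropped = [model.crop_face(image) for image in face_library]

    for _ in range(max_iters):
        # Predict facial feature points for the cropped expression library.
        predicted = np.stack([model.predict_landmarks(image) for image in cropped])
        # Mean distance between predicted points and the comparison library.
        error = np.mean(np.linalg.norm(predicted - reference_points, axis=-1))
        if error < preset_error:
            break                                   # error small enough: stop
        model.update(predicted, reference_points)   # adjust model parameters
    return model
```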
4. The intelligent analysis method based on video behavior data according to any one of claims 1 to 3, characterized in that the deep and shallow psychological feature set is obtained by calculating the Gini index of the classification tree using a Gini index method;
wherein the Gini index method is as follows:
$$\operatorname{Gini}(D) = 1 - \sum_{s}\left(\frac{T_s}{K}\right)^{2}$$
$$\operatorname{Gini}(D) = 1 - \left(\frac{T_1}{K}\right)^{2} - \left(\frac{T_2}{K}\right)^{2}$$
wherein A represents the deep and shallow psychological feature set, D represents the set formed by the speech state recognition result and the expression recognition result, $T_s$ represents the data volume of each label classification s, $T_1$ represents the data volume of the anger label, $T_2$ represents the data volume of the joy label, and K represents the data volume of the set formed by the speech state recognition result and the expression recognition result.
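As a concrete reading of the Gini index method above, the sketch below computes the Gini impurity of a label set (for example, anger and joy labels drawn from the recognition results) and the weighted Gini index of a candidate split; the label names and the split are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label collection, e.g. ["anger", "joy", "joy"]."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def gini_index(feature_values, labels):
    """Weighted Gini index of splitting the recognition-result set on one
    candidate feature: group by feature value, then weight each group's Gini."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    return sum(len(subset) / total * gini(subset) for subset in groups.values())

# Example: two labels, split by whether a (hypothetical) feature is high or low.
print(gini(["anger", "joy", "joy"]))                              # ~0.444
print(gini_index(["high", "high", "low"], ["anger", "joy", "joy"]))  # ~0.333
```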
5. The intelligent analysis method based on video behavior data according to claim 4, wherein constructing an objective function according to the deep and shallow psychological feature set and solving the partial derivative of the objective function to obtain an offset value comprises:
respectively constructing a penalty term and an error function based on the deep and shallow psychological feature set;
adding the error function and the penalty term to obtain an objective function;
solving a first-order partial derivative result and a second-order partial derivative result of the error function;
and deriving the offset value in the objective function according to the first-order partial derivative result and the second-order partial derivative result.
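To make the derivative step of claim 5 concrete, the sketch below assumes a squared error function over feature-derived targets and an L2 penalty term; the claim itself does not fix the exact form of either. Under these assumptions the offset value follows in closed form from the first- and second-order partial derivative results.

```python
import numpy as np

def solve_offset(targets, lam=1.0):
    """Offset value from first- and second-order partial derivatives, assuming
    objective(w) = sum_i (w - t_i)^2 + lam * w^2 (both choices are assumptions)."""
    targets = np.asarray(targets, dtype=float)
    g = -2.0 * targets.sum()   # first-order partial derivative of the error at w = 0
    h = 2.0 * len(targets)     # second-order partial derivative of the error (constant)
    # Setting d objective / dw = 0 and solving for w yields the offset value.
    return -g / (h + 2.0 * lam)

# Example: three feature-derived targets; the resulting offset would then be
# compared against the preset offset error in the analysis step.
print(solve_offset([0.2, 0.4, 0.9]))   # 0.375
```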
6. An intelligent analysis device based on video behavior data, characterized in that the device comprises a memory and a processor, the memory having stored thereon an intelligent analysis program based on video behavior data, the program being executable on the processor and, when executed by the processor, implementing the following steps:
receiving a pre-recorded user video, and performing a voice extraction operation on the user video to obtain voice data and video data that does not include the voice data;
inputting the video data into a pre-trained expression recognition model for expression recognition to obtain an expression recognition result;
inputting the voice data into a pre-trained speech state recognition model for speech state recognition to obtain a speech state recognition result;
and constructing a classification tree according to the speech state recognition result and the expression recognition result, obtaining a deep and shallow psychological feature set according to the classification tree, inputting the deep and shallow psychological feature set into a pre-constructed psychological analysis model to obtain an offset value, feeding the speech state recognition result and the expression recognition result back to a preset user if the offset value is greater than a preset offset error, and, if the offset value is less than or equal to the preset offset error, generating a psychological state analysis result according to the expression recognition result and the speech state recognition result and outputting the psychological state analysis result.
7. The intelligent analysis device based on video behavior data according to claim 6, wherein performing the voice extraction operation on the user video to obtain the voice data and the video data that does not include the voice data comprises:
performing a pre-emphasis operation on the user video;
performing framing and windowing operations on the user video subjected to the pre-emphasis operation;
and separating voice data from the user video subjected to the framing and windowing operations based on a discrete Fourier transform method, to obtain the voice data and the video data that does not include the voice data.
8. The intelligent analysis device based on video behavior data according to claim 6, wherein the intelligent analysis program based on video behavior data, when executed by the processor, further implements the step of training the expression recognition model, the training comprising:
constructing the expression recognition model;
establishing a facial expression library and a comparison expression library;
locating and cropping the face region of the facial expression library according to the expression recognition model to obtain a cropped facial expression library;
predicting feature points of the facial expression library by using the expression recognition model, and determining the error between the predicted feature points and the comparison expression library; if the error is greater than a preset error, adjusting the parameters of the expression recognition model and predicting the feature points of the facial expression library again; if the error is less than the preset error, stopping the prediction, thereby completing the training of the expression recognition model.
9. The intelligent analysis device based on video behavior data according to any one of claims 6 to 8, characterized in that the deep and shallow psychological feature set is obtained by calculating the Gini index of the classification tree using a Gini index method;
wherein the Gini index method is as follows:
$$\operatorname{Gini}(D) = 1 - \sum_{s}\left(\frac{T_s}{K}\right)^{2}$$
$$\operatorname{Gini}(D) = 1 - \left(\frac{T_1}{K}\right)^{2} - \left(\frac{T_2}{K}\right)^{2}$$
wherein A represents the deep and shallow psychological feature set, D represents the set formed by the speech state recognition result and the expression recognition result, $T_s$ represents the data volume of each label classification s, $T_1$ represents the data volume of the anger label, $T_2$ represents the data volume of the joy label, and K represents the data volume of the set formed by the speech state recognition result and the expression recognition result.
10. A computer-readable storage medium, wherein the computer-readable storage medium has stored thereon a video behavior data-based intelligent analysis program, which is executable by one or more processors to implement the steps of the video behavior data-based intelligent analysis method according to any one of claims 1 to 5.
CN202010122870.1A 2020-02-26 Intelligent analysis method, device and storage medium based on video behavior data Active CN111401147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010122870.1A CN111401147B (en) 2020-02-26 Intelligent analysis method, device and storage medium based on video behavior data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010122870.1A CN111401147B (en) 2020-02-26 Intelligent analysis method, device and storage medium based on video behavior data

Publications (2)

Publication Number Publication Date
CN111401147A true CN111401147A (en) 2020-07-10
CN111401147B CN111401147B (en) 2024-06-04

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976809A (en) * 2016-05-25 2016-09-28 中国地质大学(武汉) Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion
CN107292256A (en) * 2017-06-14 2017-10-24 西安电子科技大学 Depth convolved wavelets neutral net expression recognition method based on secondary task
CN109858330A (en) * 2018-12-15 2019-06-07 深圳壹账通智能科技有限公司 Expression analysis method, apparatus, electronic equipment and storage medium based on video
CN109871751A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Attitude appraisal procedure, device and storage medium based on facial expression recognition
CN109829388A (en) * 2019-01-07 2019-05-31 平安科技(深圳)有限公司 Video data handling procedure, device and computer equipment based on micro- expression
CN110246506A (en) * 2019-05-29 2019-09-17 平安科技(深圳)有限公司 Voice intelligent detecting method, device and computer readable storage medium
CN110826466A (en) * 2019-10-31 2020-02-21 南京励智心理大数据产业研究院有限公司 Emotion identification method, device and storage medium based on LSTM audio-video fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李昊璇 et al.: "Speech emotion recognition fusing spectral features of the glottal wave signal" (融合声门波信号频谱特征的语音情感识别), 测试技术学报 (Journal of Test and Measurement Technology), vol. 31, no. 01, 28 February 2017 (2017-02-28), page 8 *
杨勇 et al.: "Research on expression recognition based on MSVR and the Arousal-Valence emotion model" (基于MSVR和Arousal-Valence情感模型的表情识别研究), 重庆邮电大学学报(自然科学版) (Journal of Chongqing University of Posts and Telecommunications, Natural Science Edition), vol. 28, no. 06, 15 December 2016 (2016-12-15), pages 836-843 *

Similar Documents

Publication Publication Date Title
CN106803069B (en) Crowd happiness degree identification method based on deep learning
CN110188615B (en) Facial expression recognition method, device, medium and system
CN109117777A (en) The method and apparatus for generating information
JP2017062781A (en) Similarity-based detection of prominent objects using deep cnn pooling layers as features
CN110580516B (en) Interaction method and device based on intelligent robot
CN113255557B (en) Deep learning-based video crowd emotion analysis method and system
Kallipolitis et al. Affective analysis of patients in homecare video-assisted telemedicine using computational intelligence
CN110705490A (en) Visual emotion recognition method
CN112418059A (en) Emotion recognition method and device, computer equipment and storage medium
Premaladha et al. Recognition of facial expression using haar cascade classifier and deep learning
Gantayat et al. Study of algorithms and methods on emotion detection from facial expressions: a review from past research
Cai et al. Pedestrian detection algorithm in traffic scene based on weakly supervised hierarchical deep model
Panda et al. Feedback through emotion extraction using logistic regression and CNN
Pallavi et al. Retrieval of facial sketches using linguistic descriptors: an approach based on hierarchical classification of facial attributes
CN111401147A (en) Intelligent analysis method and device based on video behavior data and storage medium
Takalkar et al. Improving micro-expression recognition accuracy using twofold feature extraction
CN111401147B (en) Intelligent analysis method, device and storage medium based on video behavior data
US20220180129A1 (en) Fcn-based multivariate time series data classification method and device
Rohini et al. A framework to identify allergen and nutrient content in fruits and packaged food using deep learning and OCR
CN115294621A (en) Expression recognition system and method based on two-stage self-healing network
CN113705328A (en) Depression detection method and system based on facial feature points and facial movement units
CN112668631A (en) Mobile terminal community pet identification method based on convolutional neural network
Gawade et al. Algorithm for safety decisions in social media feeds using personification patterns
CN112132175A (en) Object classification method and device, electronic equipment and storage medium
Bennur et al. Face Mask Detection and Face Recognition of Unmasked People in Organizations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant