CN117556084B - Video emotion analysis system based on multiple modes - Google Patents


Info

Publication number
CN117556084B
CN117556084B
Authority
CN
China
Prior art keywords
analysis
emotion
vector
processor
unit
Prior art date
Legal status
Active
Application number
CN202311812195.5A
Other languages
Chinese (zh)
Other versions
CN117556084A (en)
Inventor
张卫平
张伟
李显阔
王丹
邵胜博
Current Assignee
Global Digital Group Co Ltd
Original Assignee
Global Digital Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Global Digital Group Co Ltd filed Critical Global Digital Group Co Ltd
Priority to CN202311812195.5A priority Critical patent/CN117556084B/en
Publication of CN117556084A publication Critical patent/CN117556084A/en
Application granted granted Critical
Publication of CN117556084B publication Critical patent/CN117556084B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video emotion analysis system based on multiple modes, which relates to the field of electric digital data processing and comprises an audio and video acquisition module, an expression recognition module, a voice analysis module and an emotion comprehensive analysis module, wherein the audio and video acquisition module is used for acquiring facial video information and voice information of a user, the expression recognition module is used for analyzing and processing the facial video information, the voice analysis module is used for analyzing and processing the voice information, and the emotion comprehensive analysis module is used for obtaining emotion information of the user based on the video analysis result and the voice analysis result; the system approaches the analysis from two modalities, video information and audio information, and fuses the two analysis results, so that a more accurate emotion result can be obtained.

Description

Video emotion analysis system based on multiple modes
Technical Field
The invention relates to the field of electric digital data processing, in particular to a video emotion analysis system based on multiple modes.
Background
With the development of artificial intelligence, more and more application products for emotional communication are emerging. A prerequisite for such products is that the emotional state of the user can be grasped accurately. Existing emotion analysis systems often analyze only a single modality, or combine the results of several modalities merely in a simple way, so a multi-modal system is required to analyze the user's emotion accurately.
The foregoing discussion of the background art is intended to facilitate an understanding of the present invention only. This discussion is not an acknowledgement or admission that any of the material referred to was common general knowledge.
A number of emotion analysis systems have already been developed. Extensive searching and review of references found, for example, the system disclosed in publication No. CN111222464B, which generally includes: acquiring a physiological signal corresponding to a target user, wherein the physiological signal comprises an electroencephalogram signal and an electromyogram signal; acquiring facial image information corresponding to the target user; inputting the physiological signal and the facial image information respectively into at least one pre-trained target classification model to obtain a physiological signal recognition result and a micro-expression recognition result corresponding to the target user; and determining an emotion analysis result corresponding to the target user based on the physiological signal recognition result and the micro-expression recognition result. However, that system needs to acquire physiological signals, which is more complex than acquiring audio and video information, it does not perform comprehensive multi-modal analysis, and it is prone to misjudging the emotion.
Disclosure of Invention
To address the above deficiencies, the invention aims to provide a video emotion analysis system based on multiple modes.
The invention adopts the following technical scheme:
a video emotion analysis system based on multiple modes comprises an audio and video acquisition module, an expression recognition module, a voice analysis module and an emotion comprehensive analysis module;
the emotion comprehensive analysis module is used for obtaining emotion information of the user based on a video analysis result and a voice analysis result;
the audio and video acquisition module comprises a video acquisition unit, an audio acquisition unit and a synchronous marking unit, wherein the video acquisition unit is used for acquiring facial video information of a user, the audio acquisition unit is used for acquiring voice information of the user, and the synchronous marking unit is used for marking synchronous time points in the video information and the voice information;
the expression recognition module comprises a facial feature extraction unit and an expression analysis unit, wherein the facial feature extraction unit is used for extracting facial features of a user from video information, and the expression analysis unit is used for analyzing emotion of the user based on the facial features;
the voice analysis module comprises a voice feature extraction unit and an intonation analysis unit, wherein the voice feature extraction unit is used for extracting key features in voice information, and the intonation analysis unit is used for analyzing emotion of a user according to the key features;
the emotion comprehensive analysis module comprises a data fusion unit and an emotion judgment unit, wherein the data fusion unit is used for carrying out multi-mode fusion on analysis data of the expression recognition module and analysis data of the voice analysis module, and the emotion judgment unit is used for carrying out judgment analysis on the overall emotion state of the user based on the fused data;
further, the facial feature extraction unit comprises a frame information extraction processor, a face alignment processor, a key point positioning processor and a feature vector processor, wherein the frame information extraction processor is used for sequentially extracting frame information from video information, the face alignment processor is used for acquiring a local facial picture from the frame information, the key point positioning processor is used for acquiring position information of a key point in the facial picture, and the feature vector processor is used for calculating a feature vector according to the position information of the key point;
further, the expression analysis unit comprises a vector analysis processor, a first emotion feature register and a first proofreading analysis processor, wherein the vector analysis processor is used for calculating and processing feature vectors to obtain expression data, the first emotion feature register is used for storing the expression data of each emotion, and the first proofreading analysis processor is used for comparing the calculated expression data with recorded expression data and outputting a first judgment vector;
the first collation analysis processor calculates a first judgment vector Jv1 according to the following formula:
wherein Jv1_i is the i-th element value of the first judgment vector, Jv1 has n elements in total, n is the number of emotions recorded in the first emotion feature register, Ep_1 and Ep_2 are respectively the transverse ratio and the longitudinal ratio of the expression data, and Ep_1(i) and Ep_2(i) are the transverse ratio and the longitudinal ratio for the i-th emotion;
further, the intonation analysis unit includes a second emotion feature register for storing intonation data of each emotion, and a second correction analysis processor for comparing the peak feature vector with the intonation data and outputting a second judgment vector Jv2, specifically expressed as follows:
wherein Jv2_i represents the i-th element value of the second judgment vector, Jv2 has n elements in total, (σ_t(i), σ_h(i)) is the intonation feature vector stored for the i-th emotion, and (σ_t, σ_h) is the intonation feature vector in the corresponding target time period;
further, the data fusion unit comprises a time matching processor and a fusion analysis processor, wherein the time matching processor divides the first judgment vector into a plurality of sets according to the synchronous time point, each set is matched with a corresponding second judgment vector, and the fusion analysis processor analyzes and processes the matched first judgment vector set and second judgment vector set;
the fusion analysis processor performs primary fusion processing on the first judgment vector set according to the following steps of:
wherein Jv1 i ' is the i element value of the first-level fusion vector, N is the number of vectors in the first judgment vector set, and Jv1 i (j) N (i, j) is the sorting value of the ith element value of the jth vector in the first judging vector set in the element value of the present vector;
the fusion analysis processor performs secondary fusion processing according to the following steps to obtain a secondary fusion vector Jv2':
wherein Jv2 i ' is the value of the i-th element in the secondary fusion vector.
The beneficial effects obtained by the invention are as follows:
the system obtains the judgment vectors by independently analyzing the video information and the audio information, and then fuses the judgment vectors to obtain the emotion analysis result under multiple modes, compared with a single mode, the system is more accurate, and the independently analyzed judgment vectors do not directly represent emotion results, but represent the possibility of various emotions, so that the two judgment vectors can be organically fused, and the results are not simply combined.
For a further understanding of the nature and the technical aspects of the present invention, reference should be made to the following detailed description of the invention and the accompanying drawings, which are provided for purposes of reference only and are not intended to limit the invention.
Drawings
FIG. 1 is a schematic diagram of the overall structural framework of the present invention;
FIG. 2 is a schematic diagram of an audio/video acquisition module according to the present invention;
FIG. 3 is a schematic diagram of an expression recognition module according to the present invention;
FIG. 4 is a schematic diagram of a voice analysis module according to the present invention;
FIG. 5 is a schematic diagram of the emotion comprehensive analysis module of the present invention.
Detailed Description
The following embodiments of the present invention are described in terms of specific examples, and those skilled in the art will appreciate the advantages and effects of the present invention from the disclosure herein. The invention is capable of other and different embodiments and its several details are capable of modification and variation in various respects, all without departing from the spirit of the present invention. The drawings of the present invention are merely schematic illustrations, and are not intended to be drawn to actual dimensions. The following embodiments will further illustrate the related art content of the present invention in detail, but the disclosure is not intended to limit the scope of the present invention.
Embodiment one: the embodiment provides a video emotion analysis system based on multiple modes, which comprises an audio and video acquisition module, an expression recognition module, a voice analysis module and an emotion comprehensive analysis module;
the emotion comprehensive analysis module is used for obtaining emotion information of the user based on a video analysis result and a voice analysis result;
the audio and video acquisition module comprises a video acquisition unit, an audio acquisition unit and a synchronous marking unit, wherein the video acquisition unit is used for acquiring facial video information of a user, the audio acquisition unit is used for acquiring voice information of the user, and the synchronous marking unit is used for marking synchronous time points in the video information and the voice information;
the expression recognition module comprises a facial feature extraction unit and an expression analysis unit, wherein the facial feature extraction unit is used for extracting facial features of a user from video information, and the expression analysis unit is used for analyzing emotion of the user based on the facial features;
the voice analysis module comprises a voice feature extraction unit and an intonation analysis unit, wherein the voice feature extraction unit is used for extracting key features in voice information, and the intonation analysis unit is used for analyzing emotion of a user according to the key features;
the emotion comprehensive analysis module comprises a data fusion unit and an emotion judgment unit, wherein the data fusion unit is used for carrying out multi-mode fusion on analysis data of the expression recognition module and analysis data of the voice analysis module, and the emotion judgment unit is used for carrying out judgment analysis on the overall emotion state of the user based on the fused data;
the facial feature extraction unit comprises a frame information extraction processor, a face alignment processor, a key point positioning processor and a feature vector processor, wherein the frame information extraction processor is used for sequentially extracting frame information from video information, the face alignment processor is used for acquiring local facial pictures from the frame information, the key point positioning processor is used for acquiring position information of key points in the facial pictures, and the feature vector processor is used for calculating feature vectors according to the position information of the key points;
the expression analysis unit comprises a vector analysis processor, a first emotion feature register and a first proofreading analysis processor, wherein the vector analysis processor is used for calculating feature vectors to obtain expression data, the first emotion feature register is used for storing the expression data of each emotion, and the first proofreading analysis processor is used for comparing the calculated expression data with recorded expression data and outputting a first judgment vector;
the first collation analysis processor calculates a first judgment vector Jv1 according to the following formula:
wherein Jv1_i is the i-th element value of the first judgment vector, Jv1 has n elements in total, n is the number of emotions recorded in the first emotion feature register, Ep_1 and Ep_2 are respectively the transverse ratio and the longitudinal ratio of the expression data, and Ep_1(i) and Ep_2(i) are the transverse ratio and the longitudinal ratio for the i-th emotion;
the intonation analysis unit comprises a second emotion feature register and a second correction analysis processor, wherein the second emotion feature register is used for storing intonation data of each emotion, and the second correction analysis processor is used for comparing peak feature vectors with the intonation data and outputting second judgment vectors Jv2, and the specific formula is as follows:
wherein Jv2_i represents the i-th element value of the second judgment vector, Jv2 has n elements in total, (σ_t(i), σ_h(i)) is the intonation feature vector stored for the i-th emotion, and (σ_t, σ_h) is the intonation feature vector in the corresponding target time period;
the data fusion unit comprises a time matching processor and a fusion analysis processor, wherein the time matching processor divides a first judgment vector into a plurality of sets according to a synchronous time point, each set is matched with a corresponding second judgment vector, and the fusion analysis processor analyzes and processes the matched first judgment vector set and second judgment vector set;
the fusion analysis processor performs primary fusion processing on the first judgment vector set according to the following formula to obtain a primary fusion vector Jv1':
wherein Jv1_i' is the i-th element value of the primary fusion vector, N is the number of vectors in the first judgment vector set, Jv1_i(j) is the i-th element value of the j-th vector in the first judgment vector set, and n(i,j) is the sorting value of that element value among the element values of the same vector;
the fusion analysis processor performs secondary fusion processing according to the following steps to obtain a secondary fusion vector Jv2':
wherein Jv2 i ' is the value of the i-th element in the secondary fusion vector.
Embodiment two: the embodiment comprises the whole content of the first embodiment, and provides a video emotion analysis system based on multiple modes, which comprises an audio and video acquisition module, an expression recognition module, a voice analysis module and an emotion comprehensive analysis module;
the emotion comprehensive analysis module is used for obtaining emotion information of the user based on a video analysis result and a voice analysis result;
referring to fig. 2, the audio/video acquisition module includes a video acquisition unit, an audio acquisition unit and a synchronization marking unit, wherein the video acquisition unit is used for acquiring facial video information of a user, the audio acquisition unit is used for acquiring voice information of the user, and the synchronization marking unit is used for marking synchronization time points in the video information and the voice information;
referring to fig. 3, the expression recognition module includes a facial feature extraction unit for extracting facial features of a user from video information and an expression analysis unit for analyzing emotion of the user based on the facial features;
referring to fig. 4, the voice analysis module includes a voice feature extraction unit and an intonation analysis unit, wherein the voice feature extraction unit is used for extracting key features in voice information, and the intonation analysis unit analyzes the emotion of the user according to the key features;
referring to fig. 5, the emotion comprehensive analysis module includes a data fusion unit and an emotion judgment unit, where the data fusion unit is configured to perform multi-modal fusion on analysis data of the expression recognition module and analysis data of the voice analysis module, and the emotion judgment unit performs judgment analysis on an overall emotion state of a user based on the fused data;
the facial feature extraction unit comprises a frame information extraction processor, a face alignment processor, a key point positioning processor and a feature vector processor, wherein the frame information extraction processor is used for sequentially extracting frame information from video information, the face alignment processor is used for acquiring local facial pictures from the frame information, the key point positioning processor is used for acquiring position information of key points in the facial pictures, and the feature vector processor is used for calculating feature vectors according to the position information of the key points;
the frame information extraction processor detects frames containing synchronous time point information as basic frames; after each basic frame, it extracts one frame at a fixed frame interval, stores the basic frames and the extracted frames as analysis frames, and sends the analysis frames to the face alignment processor in sequence;
the face alignment processor intercepts a rectangular picture from an analysis frame, wherein the two sides of the rectangular picture are boundary vertical lines of ears, the bottom side of the rectangular picture is a boundary horizontal line of chin, the upper side of the rectangular picture is a boundary horizontal line of eyebrows, and the face alignment processor marks the width and the height of the rectangular picture as w and h respectively;
the process of acquiring the key point position information by the key point positioning processor comprises the following steps:
s1, acquiring edge information of eyes, a mouth, a nose and eyebrows in a rectangular picture;
s2, intersecting the edge information by using a preset intercept line, wherein the intersection point is used as a key point;
s3, reading coordinate information of the key points in the rectangular picture;
the preset stub includes three pieces of information: the key points obtained by the corresponding sectional lines of the eyes, the vertical line and the 0 are the left end point of the eyes, and the two key points obtained by the corresponding sectional lines of the mouth, the vertical line and the 0.5 are the upper end point and the lower end point in the middle of the mouth;
the feature vector processor uses the nose core key points as vector starting points and other key points as vector ending points to calculate feature vectors, and usesRepresenting an ith feature vector;
the facial feature extraction unit sends the feature vector of each analysis frame to the expression analysis unit;
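For illustration only, a minimal sketch of how the feature vectors described above could be computed is given below; the key-point names and coordinates in the sketch are assumptions, since the description only fixes the nose-center key point as the common start point:

```python
import numpy as np

def facial_feature_vectors(keypoints, nose_key="nose"):
    """Compute feature vectors v_i from the nose-center key point to every other key point.

    `keypoints` maps a key-point name to its (x, y) coordinate inside the cropped
    w x h face rectangle, e.g. {"nose": (52, 60), "eye_left": (30, 35)}.
    The key-point names and coordinates are illustrative assumptions.
    """
    origin = np.asarray(keypoints[nose_key], dtype=float)
    vectors = []
    for name, point in keypoints.items():
        if name == nose_key:
            continue
        # v_i = end point (other key point) - start point (nose-center key point)
        vectors.append(np.asarray(point, dtype=float) - origin)
    return vectors
```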
the expression analysis unit comprises a vector analysis processor, a first emotion feature register and a first proofreading analysis processor, wherein the vector analysis processor is used for calculating feature vectors to obtain expression data, the first emotion feature register is used for storing the expression data of each emotion, and the first proofreading analysis processor is used for comparing the calculated expression data with recorded expression data and outputting a first judgment vector;
the vector analysis processor calculates and processes the feature vector according to the following steps:
wherein Ep is 1 And Ep is a 2 To represent two ratios of expression data, respectively referred to as a transverse ratio and a longitudinal ratio, { k 1i And } is a transverse coefficient group, { k 2i Is the longitudinal coefficient group, m isThe number of feature vectors;
the transverse coefficient group and the longitudinal coefficient group are obtained by measuring and counting a large number of face images;
the first collation analysis processor calculates a first judgment vector Jv1 according to the following formula:
wherein Jv1_i is the i-th element value of the first judgment vector, Jv1 has n elements in total, n is the number of emotions recorded in the first emotion feature register, and Ep_1(i) and Ep_2(i) are the transverse ratio and the longitudinal ratio for the i-th emotion;
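The formula for Jv1 is likewise not reproduced; the sketch below only preserves the stated reading of Jv1_i as a likelihood of the i-th emotion, assuming that a smaller distance between the measured ratios and an emotion's recorded ratios yields a larger element value:

```python
import math

def first_judgment_vector(ep1, ep2, emotion_register):
    """Compare the measured expression data with the expression data recorded per emotion.

    `emotion_register` is a list of (Ep_1(i), Ep_2(i)) pairs, one per emotion.
    The inverse-distance score used here is an assumption standing in for the
    patented formula; it only keeps the per-emotion likelihood interpretation.
    """
    jv1 = []
    for ref1, ref2 in emotion_register:
        distance = math.hypot(ep1 - ref1, ep2 - ref2)
        jv1.append(1.0 / (1.0 + distance))  # assumed similarity score
    return jv1
```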
the expression recognition module sends the first judgment vector of each analysis frame to the emotion comprehensive analysis module;
the voice feature extraction unit comprises a peak detection processor and a peak feature processor, wherein the peak detection processor is used for detecting a peak time point from audio data, and the peak feature processor is used for processing according to the interval time of the peak time point and the change of the amplitude at the peak time point to obtain voice features;
the time intervals between adjacent peak time points are denoted Δt, and the amplitude changes at the peak time points are denoted Δh; for each pair of adjacent synchronous time points, the peak feature processor calculates the standard deviations of Δt and Δh, denoted σ_t and σ_h respectively; the period between two adjacent synchronous time points is called the target time period, and the vector (σ_t, σ_h) formed by σ_t and σ_h serves as the intonation feature vector of the corresponding target time period;
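A minimal sketch of this intonation feature vector computation, assuming the peak time points and peak amplitudes of one target time period have already been detected by the peak detection processor:

```python
import numpy as np

def intonation_feature_vector(peak_times, peak_amplitudes):
    """Compute the intonation feature vector (sigma_t, sigma_h) of one target time period.

    sigma_t is the standard deviation of the time intervals between adjacent peak
    time points, and sigma_h the standard deviation of the amplitude changes
    between adjacent peaks; the amplitude-change reading is an assumption.
    """
    peak_times = np.asarray(peak_times, dtype=float)
    peak_amplitudes = np.asarray(peak_amplitudes, dtype=float)
    intervals = np.diff(peak_times)          # time intervals between adjacent peaks
    amp_changes = np.diff(peak_amplitudes)   # amplitude changes between adjacent peaks
    return float(np.std(intervals)), float(np.std(amp_changes))
```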
the intonation analysis unit comprises a second emotion feature register and a second correction analysis processor, wherein the second emotion feature register is used for storing intonation data of each emotion, and the second correction analysis processor is used for comparing peak feature vectors with the intonation data and outputting second judgment vectors Jv2, and the specific formula is as follows:
wherein Jv2_i represents the i-th element value of the second judgment vector, Jv2 has n elements in total, and (σ_t(i), σ_h(i)) is the intonation feature vector stored for the i-th emotion;
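As with Jv1, the formula for Jv2 is not reproduced; the sketch below assumes the same distance-based reading, comparing the measured (σ_t, σ_h) with the intonation data stored per emotion:

```python
import math

def second_judgment_vector(sigma_t, sigma_h, intonation_register):
    """Compare the measured intonation feature vector with the stored per-emotion data.

    `intonation_register` is a list of (sigma_t(i), sigma_h(i)) pairs, one per emotion;
    the inverse-distance score is an assumption, not the patented formula.
    """
    jv2 = []
    for ref_t, ref_h in intonation_register:
        distance = math.hypot(sigma_t - ref_t, sigma_h - ref_h)
        jv2.append(1.0 / (1.0 + distance))  # assumed similarity score
    return jv2
```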
the second judgment vector of each target time period of the voice analysis module is sent to the emotion comprehensive analysis module;
the data fusion unit comprises a time matching processor and a fusion analysis processor, wherein the time matching processor divides a first judgment vector into a plurality of sets according to a synchronous time point, each set is matched with a corresponding second judgment vector, and the fusion analysis processor analyzes and processes the matched first judgment vector set and second judgment vector set;
the fusion analysis processor performs primary fusion processing on the first judgment vector set according to the following formula to obtain a primary fusion vector Jv1':
wherein Jv1_i' is the i-th element value of the primary fusion vector, N is the number of vectors in the first judgment vector set, Jv1_i(j) is the i-th element value of the j-th vector in the first judgment vector set, and n(i,j) is the sorting value of that element value among the element values of the same vector;
the sorting value refers to the sequence number of the element values when sorting from small to large;
the fusion analysis processor performs secondary fusion processing on the primary fusion vector and the second judgment vector according to the following formula to obtain a secondary fusion vector Jv2':
wherein Jv2_i' is the value of the i-th element in the secondary fusion vector;
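The two fusion formulas are not reproduced in the text; the sketch below therefore stands in for them with an assumed sorting-value-weighted average for the primary fusion and an assumed element-wise average for the secondary fusion, while keeping the quantities named above (N, Jv1_i(j), n(i,j)):

```python
import numpy as np

def fuse_judgment_vectors(jv1_set, jv2):
    """Two-stage fusion for one target time period.

    Stage 1 aggregates the N first judgment vectors of the period into a primary
    fusion vector Jv1'; stage 2 combines Jv1' with the matching second judgment
    vector Jv2 into the secondary fusion vector Jv2'. Both formulas below are
    assumptions standing in for the ones not reproduced in the text.
    """
    jv1_set = np.asarray(jv1_set, dtype=float)  # shape (N, n): N vectors, n emotions
    jv2 = np.asarray(jv2, dtype=float)          # shape (n,)
    n_emotions = jv1_set.shape[1]
    # n(i, j): sorting value (1 = smallest) of the i-th element within the j-th vector.
    sorting_values = np.argsort(np.argsort(jv1_set, axis=1), axis=1) + 1
    # Assumed stage 1: mean of the element values weighted by their sorting value.
    jv1_prime = (jv1_set * sorting_values / n_emotions).mean(axis=0)
    # Assumed stage 2: element-wise average of the two modality scores.
    jv2_prime = (jv1_prime + jv2) / 2.0
    return jv1_prime, jv2_prime
```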
the emotion judging unit comprises a data receiving processor and an emotion output processor, wherein the data receiving processor is used for receiving the secondary fusion vector, and the emotion output processor outputs emotion information according to the secondary fusion vector;
the emotion output processor retrieves the element item with the maximum element value from each secondary fusion vector, converts the element item into corresponding emotion, and then arranges the emotion in sequence and outputs the emotion as emotion information;
both i and j appearing above are ordinals used to represent sequence numbers.
The foregoing disclosure is only a preferred embodiment of the present invention and is not intended to limit the scope of the invention, so that all equivalent technical changes made by applying the description of the present invention and the accompanying drawings are included in the scope of the present invention, and in addition, elements in the present invention can be updated as the technology develops.

Claims (2)

1. The video emotion analysis system based on the multiple modes is characterized by comprising an audio and video acquisition module, an expression recognition module, a voice analysis module and an emotion comprehensive analysis module;
the emotion comprehensive analysis module is used for obtaining emotion information of the user based on a video analysis result and a voice analysis result;
the audio and video acquisition module comprises a video acquisition unit, an audio acquisition unit and a synchronous marking unit, wherein the video acquisition unit is used for acquiring facial video information of a user, the audio acquisition unit is used for acquiring voice information of the user, and the synchronous marking unit is used for marking synchronous time points in the video information and the voice information;
the expression recognition module comprises a facial feature extraction unit and an expression analysis unit, wherein the facial feature extraction unit is used for extracting facial features of a user from video information, and the expression analysis unit is used for analyzing emotion of the user based on the facial features;
the voice analysis module comprises a voice feature extraction unit and an intonation analysis unit, wherein the voice feature extraction unit is used for extracting key features in voice information, and the intonation analysis unit is used for analyzing emotion of a user according to the key features;
the emotion comprehensive analysis module comprises a data fusion unit and an emotion judgment unit, wherein the data fusion unit is used for carrying out multi-mode fusion on analysis data of the expression recognition module and analysis data of the voice analysis module, and the emotion judgment unit is used for carrying out judgment analysis on the overall emotion state of the user based on the fused data;
the expression analysis unit comprises a vector analysis processor, a first emotion feature register and a first proofreading analysis processor, wherein the vector analysis processor is used for calculating feature vectors to obtain expression data, the first emotion feature register is used for storing the expression data of each emotion, and the first proofreading analysis processor is used for comparing the calculated expression data with recorded expression data and outputting a first judgment vector;
the first collation analysis processor calculates a first judgment vector Jv1 according to the following formula:
wherein Jv1_i is the i-th element value of the first judgment vector, Jv1 has n elements in total, n is the number of emotions recorded in the first emotion feature register, Ep_1 and Ep_2 are respectively the transverse ratio and the longitudinal ratio of the expression data, and Ep_1(i) and Ep_2(i) are the transverse ratio and the longitudinal ratio for the i-th emotion;
the intonation analysis unit comprises a second emotion feature register and a second correction analysis processor, wherein the second emotion feature register is used for storing intonation data of each emotion, and the second correction analysis processor is used for comparing peak feature vectors with the intonation data and outputting second judgment vectors Jv2, and the specific formula is as follows:
wherein Jv2_i represents the i-th element value of the second judgment vector, Jv2 has n elements in total, (σ_t(i), σ_h(i)) is the intonation feature vector stored for the i-th emotion, and (σ_t, σ_h) is the intonation feature vector in the corresponding target time period;
the data fusion unit comprises a time matching processor and a fusion analysis processor, wherein the time matching processor divides a first judgment vector into a plurality of sets according to a synchronous time point, each set is matched with a corresponding second judgment vector, and the fusion analysis processor analyzes and processes the matched first judgment vector set and second judgment vector set;
the fusion analysis processor performs primary fusion processing on the first judgment vector set according to the following formula to obtain a primary fusion vector Jv1':
wherein Jv1_i' is the i-th element value of the primary fusion vector, N is the number of vectors in the first judgment vector set, Jv1_i(j) is the i-th element value of the j-th vector in the first judgment vector set, and n(i,j) is the sorting value of that element value among the element values of the same vector;
the fusion analysis processor performs secondary fusion processing according to the following steps to obtain a secondary fusion vector Jv2':
wherein Jv2 i ' is the value of the i-th element in the secondary fusion vector.
2. The multi-modality based video emotion analysis system of claim 1, wherein the facial feature extraction unit includes a frame information extraction processor for sequentially extracting frame information from video information, a face alignment processor for acquiring a partial face picture from the frame information, a key point location processor for acquiring position information of a key point in the face picture, and a feature vector processor for calculating a feature vector from the position information of the key point.
CN202311812195.5A 2023-12-27 2023-12-27 Video emotion analysis system based on multiple modes Active CN117556084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311812195.5A CN117556084B (en) 2023-12-27 2023-12-27 Video emotion analysis system based on multiple modes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311812195.5A CN117556084B (en) 2023-12-27 2023-12-27 Video emotion analysis system based on multiple modes

Publications (2)

Publication Number Publication Date
CN117556084A (en) 2024-02-13
CN117556084B (en) 2024-03-26

Family

ID=89811171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311812195.5A Active CN117556084B (en) 2023-12-27 2023-12-27 Video emotion analysis system based on multiple modes

Country Status (1)

Country Link
CN (1) CN117556084B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007098560A1 (en) * 2006-03-03 2007-09-07 The University Of Southern Queensland An emotion recognition system and method
CN114399818A (en) * 2022-01-05 2022-04-26 广东电网有限责任公司 Multi-mode face emotion recognition method and device
CN114724224A (en) * 2022-04-15 2022-07-08 浙江工业大学 Multi-mode emotion recognition method for medical care robot
CN116167015A (en) * 2023-02-28 2023-05-26 南京邮电大学 Dimension emotion analysis method based on joint cross attention mechanism
WO2023139559A1 (en) * 2022-01-24 2023-07-27 Wonder Technology (Beijing) Ltd Multi-modal systems and methods for voice-based mental health assessment with emotion stimulation
CN116883888A (en) * 2023-06-06 2023-10-13 交通银行股份有限公司 Bank counter service problem tracing system and method based on multi-mode feature fusion

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10204625B2 (en) * 2010-06-07 2019-02-12 Affectiva, Inc. Audio analysis learning using video data
US20180160200A1 (en) * 2016-12-03 2018-06-07 Streamingo Solutions Private Limited Methods and systems for identifying, incorporating, streamlining viewer intent when consuming media
CN110677598B (en) * 2019-09-18 2022-04-12 北京市商汤科技开发有限公司 Video generation method and device, electronic equipment and computer storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007098560A1 (en) * 2006-03-03 2007-09-07 The University Of Southern Queensland An emotion recognition system and method
CN114399818A (en) * 2022-01-05 2022-04-26 广东电网有限责任公司 Multi-mode face emotion recognition method and device
WO2023139559A1 (en) * 2022-01-24 2023-07-27 Wonder Technology (Beijing) Ltd Multi-modal systems and methods for voice-based mental health assessment with emotion stimulation
CN114724224A (en) * 2022-04-15 2022-07-08 浙江工业大学 Multi-mode emotion recognition method for medical care robot
CN116167015A (en) * 2023-02-28 2023-05-26 南京邮电大学 Dimension emotion analysis method based on joint cross attention mechanism
CN116883888A (en) * 2023-06-06 2023-10-13 交通银行股份有限公司 Bank counter service problem tracing system and method based on multi-mode feature fusion

Also Published As

Publication number Publication date
CN117556084A (en) 2024-02-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant