CN112560668A - Human behavior identification method based on scene prior knowledge

Human behavior identification method based on scene prior knowledge

Info

Publication number
CN112560668A
CN112560668A
Authority
CN
China
Prior art keywords
scene
prior knowledge
video
human behavior
human
Prior art date
2020-12-14
Legal status
Pending
Application number
CN202011470438.8A
Other languages
Chinese (zh)
Inventor
袁家斌
刘昕
王天星
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
2020-12-14
Filing date
2020-12-14
Publication date
2021-03-26
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202011470438.8A
Publication of CN112560668A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training


Abstract

The invention discloses a human behavior recognition method based on scene prior knowledge, comprising the following steps: preprocessing an input video; establishing an indoor scene–human behavior prior knowledge base; establishing a video scene recognition model and a human behavior recognition model M; and performing scene prediction on the input video, then fusing the corresponding scene prior knowledge into the human behavior recognition network model M based on the scene recognition result to obtain the human behavior classification. The method fully exploits the correlation between scenes and human activity, optimizes the objective function by converting the prior knowledge into constraints on the weights of the behavior recognition model, and effectively improves human behavior recognition in video.

Description

Human behavior identification method based on scene prior knowledge
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a human behavior identification method based on scene prior knowledge.
Background
In recent years, the growth of video platforms has made video one of the most widely used forms of data, and understanding human behavior in video has attracted increasing attention; recognizing human behavior across different environments using the elements contained in video remains a challenging task. In the real world, human behavior is closely related to the scene in which it occurs: elements of the scene, such as its objects, environmental structure and scene attributes, influence the behavior of the subject. While some actions are relatively scene-independent, in a particular scene a subject may only be able to perform particular behaviors.
Mainstream human behavior recognition methods in the prior art optimize a deep neural network, or improve how human features are extracted and how the extracted features are processed. A typical two-stream network learns the spatial features of single frames with a spatial stream channel and the motion information of optical flow images with a temporal stream channel, and finally fuses the two channels by their softmax scores to obtain the behavior classification. A 3D convolutional network exploits the fact that 3D convolution is better suited to learning spatio-temporal features, and fully fuses low-dimensional and high-dimensional human body features. These methods effectively improve the accuracy of human behavior recognition, but they focus on the single task of human behavior recognition and ignore the objective relation between the scene in which a human body is located and the human subject.
Regarding the relation between scene context and human behavior, some human behavior recognition methods feed human features and scene features into different channels and extract and analyze them separately; such methods do not fully exploit the constraint that human activities are limited by the scene in which they take place.
Disclosure of Invention
The invention provides a human behavior recognition method based on scene prior knowledge, in order to solve the low accuracy of behavior recognition caused in the prior art by neglecting the association between scenes and human behaviors.
To achieve this purpose, the invention adopts the following technical scheme:
a human behavior identification method based on scene prior knowledge comprises the following steps:
s1, preprocessing an input video;
S2, establishing an indoor scene–human behavior prior knowledge base;
S3, establishing a video scene recognition model and a human behavior recognition model;
s4, model training and testing;
and S5, inputting the video to perform human behavior recognition to obtain human behavior classification.
Further, the specific steps of step S1 are:
S11, arbitrarily extracting one frame of the video to serve as a scene image, and establishing a mapping relation between the scene image and the video after extraction;
S12, sparsely sampling the input video with the TSN (Temporal Segment Networks) method: the video is divided evenly into N segments by frame count and one frame is randomly extracted from each segment; after the optical flow maps are obtained with the TV-L1 method in OpenCV, one optical flow image is likewise randomly extracted from each segment using the TSN method;
S13, center-cropping all preprocessed images to a size of 224 × 224 pixels.
Further, the specific steps of step S2 are:
S21, quantifying the prior knowledge: the prior knowledge G_j of the j-th scene is expressed as

G_j = (g_j1, g_j2, ……, g_jt, ……, g_jk)

where g_jt represents the prior probability that the t-th action occurs in the j-th scene;
S22, storing the prior knowledge of all scenes in a prior knowledge base; the mapping relation G_S from scenes to behaviors is expressed as G_S = (G_1, G_2, ……, G_j)^T, where G_j denotes the prior knowledge of the j-th scene and T denotes the transpose.
Further, the specific steps of step S3 are:
S31, selecting ResNet152 as the network model for scene recognition, and pre-training the network on the large-scale Places365 and SUN397 datasets;
S32, establishing an improved network model M based on I3D: the prior knowledge is fused in the softmax function with the predicted classification result according to the prior-knowledge influence factor μ to obtain the final prediction result, and for a training set with k classes the loss function, optimized by comparing the label values with the softmax predictions, is

L(Y, y) = −Σ_{t=1}^{k} Y_t · log(y_t)

where Y denotes the correct label vector, y denotes the final prediction vector, and Y_t and y_t are the label value and predicted value of the t-th class.
Further, the specific steps of step S4 are:
S41, inputting the scene image into the scene recognition model for scene classification, taking the TOP-3 of the recognition result, and renormalizing the three scores in proportion so that they sum to 100%;
S42, based on the result of step S41, recombining the prior knowledge corresponding to each scene, in proportion to the scene scores, into new fused scene prior knowledge;
S43, the network model M has a modular structure comprising convolution layers, max pooling layers, an average pooling layer, Inception modules and a final softmax. The fused scene prior knowledge is introduced to constrain the softmax weights, so that the final softmax output excludes classes of behavior that are impossible in the scene, improving the accuracy of behavior recognition. Meanwhile, to avoid the vanishing-gradient problem, two auxiliary softmax classifiers for propagating the gradient are added to the network model M on the basis of I3D; each takes the output of an intermediate Inception module for classification, and the two auxiliary classifiers are fused into the final classification result with a weight of 0.2. In the final test, the two auxiliary classifiers are removed.
Compared with the prior art, the invention has the following beneficial effects:
In the human behavior recognition method based on scene prior knowledge, the input video is preprocessed to obtain a scene image and sparsely sampled images; an indoor scene–human behavior prior knowledge base is then established; the preprocessed video is input into the video scene recognition model to predict the scene of the input video, and based on the scene recognition result the corresponding scene prior knowledge is fused into the human behavior recognition network model M to obtain the human behavior classification. In this way the method accomplishes multi-task content recognition of the video.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a model diagram of the scene recognition model and the human behavior recognition network model M.
Fig. 3 is a block diagram of the human behavior recognition network model M.
Detailed Description
The present invention will be further described with reference to the following examples.
As shown in fig. 1, a human behavior recognition method based on scene prior knowledge includes the following steps:
s1, preprocessing an input video;
S11, one frame of the video is arbitrarily extracted to serve as the scene image, and after extraction a mapping relation between the scene image and the video is established, for use when associating scene, video and prior knowledge;
S12, the input video is sparsely sampled with the TSN (Temporal Segment Networks) method: the video is divided evenly into N segments by frame count and one frame is randomly extracted from each segment; after the optical flow maps are obtained with the TV-L1 method in OpenCV, one optical flow image is likewise randomly extracted from each segment using the TSN method;
S13, all preprocessed images are center-cropped to a size of 224 × 224 pixels.
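As an illustration, a minimal Python sketch of steps S11–S13 follows (TSN-style sparse sampling, TV-L1 optical flow, center cropping). It is a sketch under assumptions, not the patent's code: the video is assumed to be decoded into a list of BGR frames, opencv-contrib-python is assumed for the TV-L1 implementation, and the helper names and segment count N are illustrative.

```python
import random
import cv2

def center_crop(img, size=224):
    # S13: crop the central size x size region.
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def tsn_sample(items, n_segments):
    # S12: divide evenly into N segments, draw one random item per segment.
    seg_len = len(items) // n_segments
    return [items[i * seg_len + random.randrange(seg_len)]
            for i in range(n_segments)]

def preprocess(frames, n_segments=3):
    # S11: one arbitrary frame serves as the scene image.
    scene_image = center_crop(random.choice(frames))
    # S12: sparse RGB sampling with the TSN scheme.
    rgb_samples = [center_crop(f) for f in tsn_sample(frames, n_segments)]
    # S12: TV-L1 optical flow between consecutive grayscale frames
    # (requires the opencv-contrib "optflow" module).
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    flows = [tvl1.calc(grays[i], grays[i + 1], None)
             for i in range(len(grays) - 1)]
    flow_samples = [center_crop(f) for f in tsn_sample(flows, n_segments)]
    return scene_image, rgb_samples, flow_samples
```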
S2, establishing an indoor scene–human behavior prior knowledge base;
S21, the prior knowledge is quantified: the prior knowledge G_j of the j-th scene is expressed as

G_j = (g_j1, g_j2, ……, g_jt, ……, g_jk)

where g_jt represents the prior probability that the t-th action occurs in the j-th scene;
S22, the prior knowledge of all scenes is stored in a prior knowledge base; the mapping relation G_S from scenes to behaviors is expressed as G_S = (G_1, G_2, ……, G_j)^T, where G_j denotes the prior knowledge of the j-th scene and T denotes the transpose.
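A small numpy sketch of such a knowledge base follows; the scene and action counts are hypothetical, and in practice each g_jt would be estimated from an annotated indoor scene–behavior corpus rather than from random counts.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scenes, n_actions = 4, 6           # j scenes, k behavior classes (illustrative)
counts = rng.integers(1, 50, size=(n_scenes, n_actions)).astype(float)

# G_S stacks one prior vector G_j per scene; g_jt is the prior probability
# of the t-th action occurring in the j-th scene, so each row sums to 1.
G_S = counts / counts.sum(axis=1, keepdims=True)
print(G_S[2], G_S[2].sum())          # prior knowledge G_j for scene j = 2
```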
S3, establishing a video scene recognition model and a human behavior recognition model;
S31, ResNet152 is selected as the network model for scene recognition, and the network is pre-trained on the large-scale Places365 and SUN397 datasets;
S32, an improved network model M based on I3D is established: the prior knowledge is fused in the softmax function with the predicted classification result according to the prior-knowledge influence factor μ to obtain the final prediction result, and for a training set with k classes the loss function, optimized by comparing the label values with the softmax predictions, is

L(Y, y) = −Σ_{t=1}^{k} Y_t · log(y_t)

where Y denotes the correct label vector, y denotes the final prediction vector, and Y_t and y_t are the label value and predicted value of the t-th class.
In particular: the main body framework of the human behavior recognition part of the network model M adopts an I3D network with a good effect on human behavior modeling, the I3D network is a network which expands a 2D filter into 3D, the 2D neural network uses an increment-V1 network, the expanded 3D network can use parameters of the 2D network in ImageNet pre-training, and the method is that the parameters of a 2D convolution kernel are copied along time and then divided by the time dimension of the 3D convolution kernel. Using 9 Inception modules, in the last average pooling layer of the model, reducing the dimensions of the human behavior features extracted from the previous layers into 1024-dimensional characteristic vectors, inputting the 1024-dimensional characteristic vectors into a softmax classification layer, and outputting a result A of the softmax classification layer for a training set with k classificationsiTo representComprises the following steps: a. thei=(ai1,ai2,……,ait,……aik) Wherein a isitRepresenting the probability value of the ith video belonging to the t-th action, fusing the priori knowledge in the softmax function with the predicted classification result according to the influence factor mu of the priori knowledge to obtain the final prediction result, and comparing the label value with the prediction result of the softmax function to optimize the loss function of a single training sample
Figure BDA0002833595970000042
Wherein: y denotes the correct tag value, Y denotes the final predicted value, YtIs a tag value, ytFor the prediction, the final loss function J is expressed as
Figure BDA0002833595970000043
Wherein, ω isi、biMu is weight, bias value and priori knowledge weight to be learned of softmax, N is video number, Y isiIs the tag value, y, of the ith videoiIs a prediction value for the ith video.
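The 2D-to-3D parameter inflation described above can be sketched in a few lines of numpy; the kernel shape and the first-layer example are illustrative, not the patent's exact configuration.

```python
import numpy as np

def inflate_2d_kernel(w2d, t):
    # Copy a 2D kernel (out_c, in_c, kh, kw) t times along a new time axis
    # and divide by t, giving a 3D kernel (out_c, in_c, t, kh, kw) whose
    # response on a temporally constant clip matches the 2D network's.
    return np.repeat(w2d[:, :, np.newaxis, :, :], t, axis=2) / t

w2d = np.random.randn(64, 3, 7, 7)   # e.g. an ImageNet-pretrained first-layer kernel
w3d = inflate_2d_kernel(w2d, t=7)
print(w3d.shape)                     # (64, 3, 7, 7, 7)
```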
S4, model training and testing, wherein the process is shown in figure 2;
S41, the scene image is input into the scene recognition model for scene classification; the TOP-3 of the recognition result is taken, and the three scores are renormalized in proportion so that they sum to 100%;
S42, based on the result of step S41, the prior knowledge corresponding to each scene is recombined, in proportion to the scene scores, into new fused scene prior knowledge;
The recalculated prior knowledge of the i-th video is

G'_i = Σ_{m=1}^{3} p_i^{j_m} · G_{j_m}

where p_i^{j_m} denotes the renormalized probability that the i-th video belongs to scene j_m, and m indicates which of the TOP-3 recognition results of scene recognition is meant, so m ranges from 1 to 3, with

Σ_{m=1}^{3} p_i^{j_m} = 1

and G'_i = (g'_i1, g'_i2, ……, g'_ik) giving the behavior occurrence probability of each of the k behaviors in the prior knowledge corresponding to the video.
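Steps S41–S42 can be illustrated with a short numpy sketch: take the TOP-3 scene scores, renormalize them to 100%, and mix the corresponding per-scene priors G_j into the fused prior G'_i. All numbers below are illustrative.

```python
import numpy as np

scene_scores = np.array([0.05, 0.40, 0.30, 0.10, 0.15])   # scene softmax output
G_S = np.array([                                           # one prior row G_j per scene
    [0.60, 0.30, 0.10],
    [0.10, 0.70, 0.20],
    [0.25, 0.25, 0.50],
    [0.40, 0.40, 0.20],
    [0.33, 0.33, 0.34],
])

top3 = np.argsort(scene_scores)[-3:]               # indices of the TOP-3 scenes
p = scene_scores[top3] / scene_scores[top3].sum()  # S41: renormalize to 100%
G_fused = p @ G_S[top3]                            # S42: G'_i = sum_m p_m * G_{j_m}
print(G_fused, G_fused.sum())                      # still a distribution over behaviors
```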
The scene prior knowledge G'_i is introduced into the training process through the influence factor μ. In the backward learning process, if the prior knowledge is positively correlated with the prediction result, the prior-knowledge weight μ is increased; if the prior knowledge is inaccurate, μ is decreased. Meanwhile, the deep neural network learns the samples by analyzing the distribution of the data, and to prevent the introduced prior knowledge from having a decisive influence on the final result, μ is set as the influence factor of the prior knowledge with 0 < μ < 1, initialized to μ = 0.5. The behavior recognition result of the i-th video combined with the prior knowledge is represented as

y_i = (1 − μ) · A_i + μ · G'_i
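A minimal sketch of the fused prediction and the per-sample loss follows. The additive form y_i = (1 − μ)·A_i + μ·G'_i matches the reconstruction above but is an assumption about the exact fusion rule, and μ, learnable in the patent, is held fixed here for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fused_prediction(logits, g_fused, mu=0.5):
    a = softmax(logits)                   # A_i: softmax scores over the k behaviors
    return (1.0 - mu) * a + mu * g_fused  # y_i: combination with the scene prior G'_i

def sample_loss(y_pred, y_true):
    # L(Y, y) = -sum_t Y_t * log(y_t): cross-entropy of the fused prediction.
    return -np.sum(y_true * np.log(y_pred + 1e-12))

logits = np.array([2.0, 0.5, -1.0])       # behavior logits for one video
g_fused = np.array([0.70, 0.25, 0.05])    # fused scene prior G'_i for that video
y_true = np.array([1.0, 0.0, 0.0])        # one-hot label Y_i
print(sample_loss(fused_prediction(logits, g_fused), y_true))
```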
S43, the network model M has a modular structure, as shown in FIG. 3, comprising convolution layers, max pooling layers, an average pooling layer, Inception modules and a final softmax. Specifically, the convolution kernel of the last convolution layer is 1 × 1 to generate the classification scores, all other convolution layers are followed by batch normalization (BN) and ReLU activation, which allows the learning rate to be increased, and the dropout layer is placed only after the average pooling layer. The fused scene prior knowledge is introduced to constrain the softmax weights, so that the final softmax output excludes classes of behavior that are impossible in the scene, improving the accuracy of behavior recognition. Meanwhile, to avoid the vanishing-gradient problem, two auxiliary softmax classifiers for propagating the gradient forward are added to the network model M on the basis of I3D; each takes the output of an intermediate Inception module for classification, and the two auxiliary classifiers are fused into the final classification result with a weight of 0.2. In the final test, the two auxiliary classifiers are removed.
And S5, inputting the video to perform human behavior recognition to obtain human behavior classification.
The human behavior recognition method based on scene prior knowledge can accomplish multi-task content recognition of video: it fully exploits the association between scene information and human behavior, optimizes the objective function by converting the prior knowledge into constraints on the weights in the behavior recognition model, and improves the accuracy of behavior recognition.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (5)

1. A human behavior recognition method based on scene prior knowledge is characterized by comprising the following steps:
s1, preprocessing an input video;
S2, establishing an indoor scene–human behavior prior knowledge base;
S3, establishing a video scene recognition model and a human behavior recognition model;
s4, model training and testing;
and S5, inputting the video to perform human behavior recognition to obtain human behavior classification.
2. The human behavior recognition method based on scene prior knowledge of claim 1, wherein the specific steps of step S1 are:
S11, arbitrarily extracting one frame of the video to serve as a scene image, and establishing a mapping relation between the scene image and the video after extraction;
S12, sparsely sampling the input video with the TSN (Temporal Segment Networks) method: the video is divided evenly into N segments by frame count and one frame is randomly extracted from each segment; after the optical flow maps are obtained with the TV-L1 method in OpenCV, one optical flow image is likewise randomly extracted from each segment using the TSN method;
S13, center-cropping all preprocessed images to a size of 224 × 224 pixels.
3. The human behavior recognition method based on scene prior knowledge of claim 1, wherein the specific steps of step S2 are:
S21, quantifying the prior knowledge: the prior knowledge G_j of the j-th scene is expressed as

G_j = (g_j1, g_j2, ……, g_jt, ……, g_jk)

where g_jt represents the prior probability that the t-th action occurs in the j-th scene;
S22, storing the prior knowledge of all scenes in a prior knowledge base; the mapping relation G_S from scenes to behaviors is expressed as G_S = (G_1, G_2, ……, G_j)^T, where G_j denotes the prior knowledge of the j-th scene and T denotes the transpose.
4. The human behavior recognition method based on scene prior knowledge of claim 1, wherein the specific steps of step S3 are:
S31, selecting ResNet152 as the network model for scene recognition, and pre-training the network on the large-scale Places365 and SUN397 datasets;
S32, establishing an improved network model M based on I3D: the prior knowledge is fused in the softmax function with the predicted classification result according to the prior-knowledge influence factor μ to obtain the final prediction result, and for a training set with k classes the loss function, optimized by comparing the label values with the softmax predictions, is

L(Y, y) = −Σ_{t=1}^{k} Y_t · log(y_t)

where Y denotes the correct label vector, y denotes the final prediction vector, and Y_t and y_t are the label value and predicted value of the t-th class.
5. The human behavior recognition method based on scene prior knowledge of claim 1, wherein the specific steps of step S4 are:
S41, inputting the scene image into the scene recognition model for scene classification, taking the TOP-3 of the recognition result, and renormalizing the three scores in proportion so that they sum to 100%;
S42, based on the result of step S41, recombining the prior knowledge corresponding to each scene, in proportion to the scene scores, into new fused scene prior knowledge;
S43, the network model M has a modular structure comprising convolution layers, max pooling layers, an average pooling layer, Inception modules and a final softmax; the fused scene prior knowledge is introduced to constrain the softmax weights so that the final softmax output excludes classes of behavior that are impossible in the scene; two auxiliary softmax classifiers for propagating the gradient forward are additionally added to the network model M on the basis of I3D, each taking the output of an intermediate Inception module for classification, and the two auxiliary classifiers are fused into the final classification result according to the weight, but in the final test the two auxiliary classifiers are removed.
CN202011470438.8A 2020-12-14 2020-12-14 Human behavior identification method based on scene prior knowledge Pending CN112560668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011470438.8A CN112560668A (en) 2020-12-14 2020-12-14 Human behavior identification method based on scene prior knowledge


Publications (1)

Publication Number Publication Date
CN112560668A true CN112560668A (en) 2021-03-26

Family

ID=75063141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011470438.8A Pending CN112560668A (en) 2020-12-14 2020-12-14 Human behavior identification method based on scene prior knowledge

Country Status (1)

Country Link
CN (1) CN112560668A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642644A (en) * 2021-08-13 2021-11-12 北京赛目科技有限公司 Method and device for determining vehicle environment grade, electronic equipment and storage medium
CN113642644B (en) * 2021-08-13 2024-05-10 北京赛目科技有限公司 Method and device for determining vehicle environment level, electronic equipment and storage medium
WO2023108968A1 (en) * 2021-12-14 2023-06-22 北京邮电大学 Image classification method and system based on knowledge-driven deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination