CN113449564A - Behavior image classification method based on human body local semantic knowledge

Behavior image classification method based on human body local semantic knowledge

Info

Publication number
CN113449564A
CN113449564A (application CN202010228189.5A)
Authority
CN
China
Prior art keywords
behavior
human body
body part
local
state
Prior art date
Legal status
Granted
Application number
CN202010228189.5A
Other languages
Chinese (zh)
Other versions
CN113449564B (en)
Inventor
Li Yonglu (李永露)
Xu Liang (徐良)
Liu Xinpeng (刘欣鹏)
Xu Yue (许越)
Lu Cewu (卢策吾)
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010228189.5A priority Critical patent/CN113449564B/en
Publication of CN113449564A publication Critical patent/CN113449564A/en
Application granted granted Critical
Publication of CN113449564B publication Critical patent/CN113449564B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An image classification method based on human body local behavior semantic knowledge: a human body part behavior state recognition model is established to obtain a human body local fine-grained semantic representation, and the model is trained; then, using natural language understanding, the visual information in the image to be detected is converted into language-based prior knowledge, the prior knowledge and the visual information are fused to generate a fine-grained behavior characterization vector, and the vector is transferred to computer vision behavior recognition tasks; finally, the overall behavior is inferred by combining the human body local fine-grained features, completing the behavior understanding process and obtaining the classification result. The invention achieves a marked recognition performance improvement on a variety of complex behavior understanding tasks; at the same time, it has the advantage of pre-training once and transferring many times, offering both generality and flexibility.

Description

Behavior image classification method based on human body local semantic knowledge
Technical Field
The invention relates to a technology in the field of image recognition and artificial intelligence, in particular to an image classification method based on human body local behavior semantic knowledge.
Background
Human behavior detection is an important branch of computer vision whose goal is to infer human behavior and interaction with the environment from an image or video. Behavior detection is widely applied in intelligent driving, security and robotics, is one of the most important artificial intelligence technologies for industry, and has attracted increasing attention. Machine learning mainly studies computer algorithms that improve automatically through experience, generally obtaining, abstracting and summarizing key information and knowledge from large amounts of empirical data; the artificial neural network is an important branch of machine learning and is now widely applied to artificial-intelligence-related tasks. Existing image behavior detection methods infer a person's behavior directly from image-level features, and because the modal gap between image-level features and behavior semantics is too large, such methods easily run into a performance bottleneck.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an image classification method based on human body local behavior semantic knowledge that achieves a marked recognition performance improvement on a variety of complex behavior understanding tasks; at the same time, it has the advantage of pre-training once and transferring many times, offering both generality and flexibility.
The invention is realized by the following technical scheme:
the invention relates to an image classification method based on human body local behavior semantic knowledge, which comprises the steps of establishing a human body part behavior state recognition model for obtaining human body local fine-grained semantic representation and carrying out model training; then, converting visual information in the image to be detected into language-based priori knowledge by using natural language understanding, fusing the priori knowledge and the visual information to generate a fine-grained behavior characterization vector, and transferring the fine-grained behavior characterization vector to a computer visual behavior and recognition task; and finally, reasoning the overall behavior by combining the local fine-grained characteristics of the human body to finish the behavior understanding process to obtain a classification result.
The human body part behavior state recognition model comprises: a pre-trained 50-layer residual convolutional neural network, ten 512-dimensional region-of-interest pooling layers, two-layer perceptrons with ReLU nonlinear activation, and ten human body part behavior state classifiers with 76-dimensional output.
The model training uses a human body part behavior state training sample set, obtained as follows: on an image dataset containing human behaviors and their labels (including the human bounding box $b_h$, the object bounding box $b_o$ (when the behavior is a human-object interaction) and the behavior label $label_{action}$), the human body part behavior states of the people participating in the interaction are defined, finally yielding 76 distinct human body part states; based on these definitions, the human body part states of every person behavior instance in the image dataset are labeled, the result comprising two parts: the body part state label $label_{pasta}$ and the human body part attention vector $label_{att}$, which characterizes whether each part contributes to the behavior sample. Two-dimensional human pose estimation is performed on the people in the training set, and from the estimates the bounding boxes $b_{p1} \sim b_{p10}$ of the ten parts of each person are generated. All of the above bounding boxes are four-dimensional vectors $(x_1, y_1, x_2, y_2)$, where $(x_1, y_1)$ is the upper-left corner of the box and $(x_2, y_2)$ the lower-right corner.
The visual information is obtained as follows: the public image dataset HICO-DET, which contains human behavior labels, serves as the transfer-task dataset on which the trained human body part behavior state recognition model extracts the human body local fine-grained visual semantic representation and the estimate of the human body part attention vectors.
The language-based prior knowledge is: following a natural language understanding approach, the language representation vector of each part name and of the human body is extracted with a BERT model (pre-training of deep bidirectional Transformers for language understanding).
The fusion is: the language-based prior knowledge and the visual information are combined by concatenation to obtain the fine-grained behavior characterization vector.
The computer vision behavior recognition tasks are as follows: an overall behavior inference model based on the human body local fine-grained semantic representation is constructed, taking $f_{pasta}$ as input and deriving the inferred human behavior score $S_{pasta}$; the overall behavior inference model comprises: a hierarchical graph model, linear combination, a multilayer perceptron, a graph convolutional network, a sequence model, and tree-structured information propagation, wherein: the hierarchical graph model divides the human body parts by functional module, merges and summarizes them layer by layer, and performs behavior reasoning; linear combination, the multilayer perceptron, the graph convolutional network, the sequence model and tree-structured information propagation classify the fine-grained behavior characterization vector and infer human behavior using, respectively, a single fully connected layer, multiple fully connected layers, graph convolution, LSTM, and tree-structured operations.
The loss function adopted for training the overall behavior inference model is

$L = L_{pasta} + L_{cls}^{pasta} + L_{cls}^{inst}$

wherein: $L_{pasta}$ is the loss function adopted for training the human body part behavior state recognition model, namely the cross entropy between the model output and the labels; it is omitted when the migration task carries no human body part state information. $L_{cls}^{pasta}$ is the cross entropy computed from the behavior detection result obtained after feeding $f_{pasta}$ into the model. $L_{cls}^{inst}$ is the cross entropy computed by the conventional method, omitted when not combined with the conventional method. (A code sketch follows below.)
The inference of the overall behavior combining the human body local fine-grained features is: the behavior detection score output by the overall behavior inference model is combined with the behavior detection score $S_{inst}$ of a method that maps only image-level features to human behaviors; the combined output $S = S_{pasta} + S_{inst}$ gives the final detection result.
The invention also relates to a recognition system implementing the method, comprising: an image feature extraction unit, a local state recognition unit, a local state language feature unit and a behavior reasoning unit, wherein: the image feature extraction unit extracts features from the input image and passes them to the connected local state recognition unit, which recognizes the local states and extracts their visual features; the local state language feature unit reads the recognition result of the local state recognition unit and converts it into language features; the local state recognition unit and the local state language feature unit pass the visual features and the language features, respectively, to the behavior reasoning unit for final behavior recognition.
Technical effects
The invention as a whole solves the problem that the large number of behavior categories hinders transfer learning: by learning human body local behavior semantic knowledge, which has fewer categories and transfers more easily, it realizes knowledge transfer between different behaviors and thereby improves behavior recognition in the small-sample setting.
Compared with the prior art, the method markedly improves the accuracy of human behavior detection in images. By introducing human body local behavior semantic knowledge and combining visual and language information to construct a fine-grained human body local semantic representation, it generally yields an improvement of about 10% on common behavior understanding datasets; moreover, the feature extraction model, trained once, can be applied through transfer learning to a variety of behavior understanding and recognition tasks, such as human-object interaction understanding and video or image behavior understanding.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the system of the present invention;
FIG. 3 is a schematic diagram illustrating the effect of the present invention.
Detailed Description
As shown in fig. 1, the present embodiment relates to a behavior image classification method based on human body local semantic knowledge, comprising the following steps:
step 1, constructing a data set: using the public image data set containing human behavior and obtaining the human bounding box bhObject bounding box bo(when the behavior is human-object interaction behavior) and behavior tag labelactionDefining the human body part behavior states of the people participating in the interaction, and finally obtaining 76 different human body part states; based on these definitions, the human body part status of each human behavior instance in the image dataset is labeled, and the following results are obtained: body part state labelpastaAnd a human body part attention vector label with a length of 10attCharacterizing whether each part contributes to the behavior sample; and carrying out two-dimensional human body posture estimation on the people in the training set, and generating a boundary box b of ten parts of each person according to the estimation resultp1~bp10
The bounding boxes are all four-dimensional vectors (x)1,y1,x2,y2) The coordinate of the upper left corner of the bounding box is (x)1,y1) The coordinate of the lower right corner is (x)2,y2)。
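One plausible way to generate the ten part boxes $b_{p1} \sim b_{p10}$ from a 2D pose estimate is to group keypoints per part and take a padded box around each group. The grouping below assumes a COCO-style 17-keypoint skeleton and is purely illustrative; the patent does not fix a keypoint format or grouping:

```python
import numpy as np

# Hypothetical grouping of COCO 17-keypoint indices into ten body parts.
PART_KEYPOINTS = {
    "head":       [0, 1, 2, 3, 4],   # nose, eyes, ears
    "left_arm":   [5, 7],            # left shoulder, left elbow
    "right_arm":  [6, 8],            # right shoulder, right elbow
    "left_hand":  [9],               # left wrist
    "right_hand": [10],              # right wrist
    "hip":        [11, 12],          # both hips
    "left_leg":   [11, 13],          # left hip, left knee
    "right_leg":  [12, 14],          # right hip, right knee
    "left_foot":  [15],              # left ankle
    "right_foot": [16],              # right ankle
}

def part_boxes(keypoints: np.ndarray, pad: float = 20.0) -> np.ndarray:
    """keypoints: (17, 2) array of (x, y); returns (10, 4) boxes (x1, y1, x2, y2)."""
    boxes = []
    for idxs in PART_KEYPOINTS.values():
        pts = keypoints[idxs]
        x1, y1 = pts.min(axis=0) - pad   # pad so single-joint parts still get an area
        x2, y2 = pts.max(axis=0) + pad
        boxes.append([x1, y1, x2, y2])
    return np.array(boxes, dtype=np.float32)
```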
Step 2: training a human body part behavior state recognition model.
Step 2.1: construct the human body part behavior state recognition model, which comprises: a pre-trained 50-layer residual convolutional neural network, ten 512-dimensional region-of-interest pooling layers, two-layer perceptrons with ReLU nonlinear activation, and ten human body part behavior state classifiers with 76-dimensional output, wherein: the RGB three-channel color image $I_{RGB}$ is fed into the residual convolutional network, yielding a 1024-channel feature map whose resolution is 1/16 of the original; this feature map and $b_{p1} \sim b_{p10}$ are input to region-of-interest pooling, producing ten features, one per human body part, which are then sent to the corresponding multilayer perceptrons and human body part behavior state classifiers to obtain $P_{pasta}$ (a code sketch follows).
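A minimal PyTorch sketch of this architecture, assuming one annotated person per image: the backbone is ResNet-50 truncated after its third stage (1024 channels at 1/16 resolution, matching the text); the use of RoIAlign with a 7x7 output, the ImageNet weights, and the separate scalar attention heads are illustrative choices not dictated by the patent:

```python
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import roi_align

class PartStateModel(nn.Module):
    def __init__(self, n_parts=10, n_states=76, feat_dim=512):
        super().__init__()
        backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
        # Keep ResNet-50 up to its third stage: 1024 channels at 1/16 resolution.
        self.backbone = nn.Sequential(*list(backbone.children())[:-3])
        self.mlps = nn.ModuleList(
            nn.Sequential(
                nn.Flatten(),
                nn.Linear(1024 * 7 * 7, feat_dim), nn.ReLU(),
                nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            ) for _ in range(n_parts))
        # One 76-way part-state classifier per body part.
        self.state_heads = nn.ModuleList(
            nn.Linear(feat_dim, n_states) for _ in range(n_parts))
        # Scalar attention head per part, matching the attention vector label_att.
        self.att_heads = nn.ModuleList(
            nn.Linear(feat_dim, 1) for _ in range(n_parts))

    def forward(self, image, part_boxes):
        """image: (B, 3, H, W); part_boxes: list of B tensors, each (10, 4)."""
        fmap = self.backbone(image)                      # (B, 1024, H/16, W/16)
        feats, states, atts = [], [], []
        for i in range(len(self.mlps)):
            boxes_i = [b[i:i + 1] for b in part_boxes]   # i-th part box of each person
            roi = roi_align(fmap, boxes_i, output_size=7, spatial_scale=1.0 / 16)
            f = self.mlps[i](roi)                        # (B, 512) representation f_pi
            feats.append(f)
            states.append(self.state_heads[i](f))        # (B, 76) part-state scores
            atts.append(self.att_heads[i](f))            # (B, 1) part attention
        return feats, states, atts
```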
Step 2.2: train the model with the dataset constructed in step 1: the training data $I_{RGB}$, $b_{p1} \sim b_{p10}$ and the corresponding human body part state labels are input into the human body part behavior state recognition model, the loss function $L_{pasta}$ is computed from the output, and the model is iteratively trained with the gradient back-propagation algorithm.

The loss function $L_{pasta}$ is specifically: the loss function adopted for training the human body part behavior state recognition model, namely the cross entropy between the model output and the labels; it is omitted when the migration task carries no human body part state information.
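A sketch of one training iteration of step 2.2, reusing the `PartStateModel` sketch above; the multi-hot label format and binary cross entropy are assumptions:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, image, part_boxes, state_labels, att_labels):
    """Forward pass, L_pasta, then gradient back-propagation.

    state_labels: list of 10 tensors (B, 76); att_labels: (B, 10).
    """
    _, state_logits, att_logits = model(image, part_boxes)
    loss = torch.zeros((), device=image.device)
    for i, logits in enumerate(state_logits):
        # Part-state term and part-attention term for the i-th body part.
        loss = loss + F.binary_cross_entropy_with_logits(logits, state_labels[i])
        loss = loss + F.binary_cross_entropy_with_logits(
            att_logits[i].squeeze(1), att_labels[:, i])
    optimizer.zero_grad()
    loss.backward()   # iterative training via gradient back-propagation
    optimizer.step()
    return loss.item()
```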
Step 3: obtain the human body local fine-grained semantic representation.
Step 3.1: the public image dataset HICO-DET containing human behaviors is used as the transfer-task dataset, and the human bounding box $b_h$, the object bounding box $b_o$ (when the behavior is a human-object interaction) and the behavior label $label_{action}$ are acquired from it. Because this dataset also carries human body part behavior state labels, the corresponding $label_{pasta}$ and $label_{att}$ are obtained as well. The data are divided into a training set and a test set; the input information comprises the three-channel RGB image $I_{RGB}$ containing the human behavior and the bounding boxes of the person, the human body parts and the object (for human-object interaction behaviors): $b_h$, $b_o$, $b_p$.
Step 3.2: the data obtained in step 3.1 are input into the human body part behavior state recognition model trained in step 2, which outputs the human body local fine-grained visual semantic representations $f_{p_1} \sim f_{p_{10}}$, the recognition results $P_{pasta}$ of the human body part behavior states, and the estimate $P_{att}$ of the human body part attention vector; the visual feature of the local state, $f_{visual}$, is obtained by a final concatenation of these per-part outputs.

The human body local fine-grained visual semantic representation is the output of the last fully connected layer of each human body part behavior state classifier, a vector of length 512; in this embodiment $n_1$ denotes this output dimension.
In the overall training of step 2.2, the loss function is computed from the labels corresponding to the input information and the output of the network, and the neural network parameters are iteratively optimized with the gradient back-propagation algorithm; the loss function is

$L_{pasta} = \sum_{i=1}^{10} \left( L_{state}^{(i)} + L_{att}^{(i)} \right)$

where $L_{state}^{(i)}$ is the cross entropy loss for estimating the behavior state of the $i$-th human body part and $L_{att}^{(i)}$ is the cross entropy loss for estimating the attention of the $i$-th human body part.
The length of the visual information is 1024 in this embodiment.
Step 3.3: the local behavior state language features based on natural language understanding are generated and fused with the local state visual features $f_{visual}$ obtained in step 3.2 to produce the fine-grained behavior characterization vector. Specifically, the local states recognized by the local behavior state recognition unit are converted, using the model of J. Devlin et al. described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", into local state language features based on natural language word descriptions, $f_{language} \in \mathbb{R}^{n_2}$, where the language feature length $n_2$ depends on the selected language model. Once $f_{visual}$ and $f_{language}$ have been obtained, the two are fused: $f_{visual}$ and $f_{language}$ are concatenated to obtain $f_{pasta}$ (a sketch with a public BERT implementation follows).
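A sketch of this step with the Hugging Face `transformers` implementation of BERT; rendering the recognized part states as short English phrases and mean-pooling the token embeddings are illustrative assumptions (the patent only specifies that a BERT model produces $f_{language}$, which is concatenated with $f_{visual}$):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def language_feature(part_state_phrases):
    """Embed phrases such as ["right hand: hold something", "head: look at"]
    into a single f_language vector (n_2 = 768 for bert-base)."""
    enc = tokenizer(part_state_phrases, return_tensors="pt",
                    padding=True, truncation=True)
    out = bert(**enc).last_hidden_state     # (num_phrases, seq_len, 768)
    return out.mean(dim=(0, 1))             # mean-pool into one vector

def fuse(f_visual, f_language):
    """Concatenate the visual and language features into f_pasta."""
    return torch.cat([f_visual, f_language], dim=-1)
```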
Step 4: train the overall behavior inference model based on the human body local fine-grained semantic representation.
Step 4.1: construct the overall behavior inference model based on the human body local fine-grained semantic representation; the model comprises a two-layer 1024-dimensional multilayer perceptron with ReLU activation followed by a fully connected classifier, takes $f_{pasta}$ as input, and outputs the inferred scores of human behaviors (as sketched below).
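A sketch of this inference head under the reading above (two 1024-dimensional fully connected layers with ReLU plus a final classifier); the input dimension and number of behavior classes are placeholders:

```python
import torch.nn as nn

class PastaActionHead(nn.Module):
    """Two 1024-dimensional fully connected layers with ReLU, followed by a
    fully connected classifier mapping f_pasta to behavior scores S_pasta."""

    def __init__(self, in_dim, num_actions, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),   # classifier over behavior categories
        )

    def forward(self, f_pasta):
        return self.net(f_pasta)              # inferred human behavior scores
```

At test time its scores are added to the image-level branch, $S = S_{pasta} + S_{inst}$, as described in step 5 below.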
Step 4.2: the $f_{pasta}$ of the training set obtained in step 3 is input into the model to obtain the behavior detection score $S_{pasta}$, from which the loss function $L_{cls}^{pasta}$ is computed; the model is iteratively trained and updated with the gradient back-propagation algorithm.
Step 5: perform behavior classification based on human body local behavior semantic knowledge with the trained model: the $f_{pasta}$ of the test set obtained in step 3 is input into the model to obtain the output $S_{pasta}$, which is combined with the result $S_{inst}$ output by the method using only image-level features, giving $S = S_{pasta} + S_{inst}$ as the final detection result.
After combination, the result is a relative improvement of 29% over the score before combination.
As shown in fig. 2, the present embodiment further relates to a recognition system implementing the above method, comprising: an image feature extraction unit, a local state recognition unit, a local state language feature unit and a behavior reasoning unit, wherein: the image feature extraction unit extracts features from the input image and passes them to the connected local state recognition unit, which recognizes the local states and extracts their visual features; the local state language feature unit reads the recognition result of the local state recognition unit and converts it into language features; the local state recognition unit and the local state language feature unit pass the visual features and the language features, respectively, to the behavior reasoning unit for final behavior recognition.
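Structurally, the four units could be wired along these lines; the unit objects are stand-ins for the models sketched earlier, and the interfaces are assumptions:

```python
class RecognitionSystem:
    """Image features -> local states -> language features -> behavior reasoning."""

    def __init__(self, feature_extractor, local_state_unit, language_unit, reasoner):
        self.feature_extractor = feature_extractor  # image feature extraction unit
        self.local_state_unit = local_state_unit    # local state recognition unit
        self.language_unit = language_unit          # local state language feature unit
        self.reasoner = reasoner                    # behavior reasoning unit

    def classify(self, image, part_boxes):
        fmap = self.feature_extractor(image)
        f_visual, states = self.local_state_unit(fmap, part_boxes)
        f_language = self.language_unit(states)     # recognized states -> language features
        return self.reasoner(f_visual, f_language)  # final behavior recognition
```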
Preferably, the system captures the dynamic local changes of consecutive video frames through a video-based human body part tracking unit, thereby obtaining the local behavior states over a time period; in cooperation with the behavior reasoning unit, which accepts multi-frame input, it obtains the overall behavior recognition result over a period of time and can therefore be used for everyday video behavior recognition. Judging the local dynamic temporal states of the human body improves behavior recognition performance in videos, effectively raising accuracy by 4.2% on the large-scale public video behavior dataset AVA.
As shown in fig. 3, for the image-level human behavior classification task, with a batch size of 16 on a single Nvidia Titan X GPU, an initial learning rate of 1e-5 with cosine decay, and a stochastic gradient descent optimizer with momentum 0.9, after 80k training iterations and 20k fine-tuning iterations the method reaches 46.3 mAP on the HICO dataset, the current state of the art.
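Mapped onto standard PyTorch pieces, that configuration might look as follows; `PartStateModel`, `train_step` and `dataset` refer to the sketches and placeholders above, and reusing the 80k figure as the cosine schedule length is an assumption:

```python
import torch
from torch.utils.data import DataLoader

model = PartStateModel()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80_000)

step = 0
while step < 80_000:   # 80k training iterations (a further 20k for fine-tuning)
    for batch in DataLoader(dataset, batch_size=16, shuffle=True):
        train_step(model, optimizer, *batch)
        scheduler.step()   # cosine-decayed learning rate
        step += 1
        if step == 80_000:
            break
```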
Compared with the prior art, the method improves the performance metrics by introducing the recognition of human body part behavior states, avoiding the large gap of mapping directly from the image to the human behavior; meanwhile, local states can be shared across different overall behaviors, and this better transferability particularly improves behavior recognition under small-sample learning.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (9)

1. An image classification method based on human body local behavior semantic knowledge, characterized in that a human body part behavior state recognition model is established to obtain a human body local fine-grained semantic representation and model training is carried out; then, using natural language understanding, the visual information in the image to be detected is converted into language-based prior knowledge, the prior knowledge and the visual information are fused to generate a fine-grained behavior characterization vector, and the vector is transferred to computer vision behavior recognition tasks; finally, the overall behavior is inferred by combining the human body local fine-grained features, completing the behavior understanding process and obtaining the classification result.
2. The image classification method according to claim 1, wherein the human body part behavior state recognition model comprises: a pre-trained 50-layer residual convolutional neural network, ten 512-dimensional region-of-interest pooling layers, two-layer perceptrons with ReLU nonlinear activation, and ten human body part behavior state classifiers with 76-dimensional output.
3. The image classification method according to claim 1 or 2, characterized in that the model training uses a human body part behavior state training sample set, obtained as follows: on the image dataset containing human behaviors and their labels, the human body part behavior states of the people participating in the interaction are defined, finally yielding 76 distinct human body part states; based on these definitions, the human body part states of every person behavior instance in the image dataset are labeled, the result comprising two parts: the body part state label $label_{pasta}$ and the human body part attention vector $label_{att}$, which characterizes whether each part contributes to the behavior sample; two-dimensional human pose estimation is performed on the people in the training set, and from the estimates the bounding boxes $b_{p1} \sim b_{p10}$ of the ten parts of each person are generated, all of which are four-dimensional vectors $(x_1, y_1, x_2, y_2)$, where $(x_1, y_1)$ is the upper-left corner of the box and $(x_2, y_2)$ the lower-right corner.
4. The image classification method according to claim 1, wherein the visual information comprises: the public image set HICO-DET containing human behavior labels serves as the transfer-task dataset on which the trained human body part behavior state recognition model extracts the human body local fine-grained visual semantic representation and the estimate of the human body part attention vectors;

the language-based prior knowledge is: following a natural language understanding approach, the language representation vectors of each part name and of the human body are extracted with a pre-trained model of deep bidirectional Transformers for language understanding;

the fusion is: the language-based prior knowledge and the visual information are combined by concatenation to obtain the fine-grained behavior characterization vector.
5. The image classification method according to claim 1, characterized in that the computer vision behavior recognition tasks are: an overall behavior inference model based on the human body local fine-grained semantic representation is constructed, taking $f_{pasta}$ as input and deriving the inferred human behavior score $S_{pasta}$; the overall behavior inference model comprises: a hierarchical graph model, linear combination, a multilayer perceptron, a graph convolutional network, a sequence model, and tree-structured information propagation, wherein: the hierarchical graph model divides the human body parts by functional module, merges and summarizes them layer by layer, and performs behavior reasoning; linear combination, the multilayer perceptron, the graph convolutional network, the sequence model and tree-structured information propagation classify the fine-grained behavior characterization vector and infer human behavior using, respectively, a single fully connected layer, multiple fully connected layers, graph convolution, LSTM, and tree-structured operations.
6. The image classification method according to claim 1, characterized in that the training of the overall behavior inference model uses the loss function

$L = L_{pasta} + L_{cls}^{pasta} + L_{cls}^{inst}$

wherein: $L_{pasta}$ is the loss function adopted for training the human body part behavior state recognition model, namely the cross entropy between the model output and the labels, omitted when the migration task carries no human body part state information; $L_{cls}^{pasta}$ is the cross entropy computed from the behavior detection result obtained after feeding $f_{pasta}$ into the model; $L_{cls}^{inst}$ is the cross entropy computed by the conventional method, omitted when not combined with the conventional method.
7. The image classification method according to claim 1, characterized in that the inference of the overall behavior combining the human body local fine-grained features is: the behavior detection score output by the overall behavior inference model is combined with the behavior detection score $S_{inst}$ of a method that maps only image-level features to human behaviors; the combined output $S = S_{pasta} + S_{inst}$ gives the final detection result.
8. A recognition system for implementing the method of any one of claims 1 to 7, comprising: an image feature extraction unit, a local state recognition unit, a local state language feature unit and a behavior reasoning unit, wherein: the image feature extraction unit extracts features from the input image and passes them to the connected local state recognition unit, which recognizes the local states and extracts their visual features; the local state language feature unit reads the recognition result of the local state recognition unit and converts it into language features; the local state recognition unit and the local state language feature unit pass the visual features and the language features, respectively, to the behavior reasoning unit for final behavior recognition.
9. The recognition system of claim 8, further comprising a video-based human body part tracking unit that captures the dynamic local changes of consecutive video frames to obtain the local behavior states over a time period, the behavior reasoning unit receiving the multi-frame input and producing the overall behavior recognition result over a period of time, so that the system can be used for everyday video behavior recognition.
CN202010228189.5A 2020-03-26 2020-03-26 Behavior image classification method based on human body local semantic knowledge Active CN113449564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010228189.5A CN113449564B (en) 2020-03-26 2020-03-26 Behavior image classification method based on human body local semantic knowledge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010228189.5A CN113449564B (en) 2020-03-26 2020-03-26 Behavior image classification method based on human body local semantic knowledge

Publications (2)

Publication Number Publication Date
CN113449564A true CN113449564A (en) 2021-09-28
CN113449564B CN113449564B (en) 2022-09-06

Family

ID=77807763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010228189.5A Active CN113449564B (en) 2020-03-26 2020-03-26 Behavior image classification method based on human body local semantic knowledge

Country Status (1)

Country Link
CN (1) CN113449564B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115968087A (en) * 2023-03-16 2023-04-14 中建八局发展建设有限公司 Interactive light control device of exhibitions center
CN117197843A (en) * 2023-11-06 2023-12-08 中国科学院自动化研究所 Unsupervised human body part area determination method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942851A (en) * 2014-04-02 2014-07-23 北京中交慧联信息科技有限公司 Method and device for monitoring vehicle state and driving behavior
CN107578106A (en) * 2017-09-18 2018-01-12 University of Science and Technology of China (中国科学技术大学) A neural network natural language inference method fusing word semantic knowledge
CN108367442A (en) * 2016-02-25 2018-08-03 奥林巴斯株式会社 Effector system and its control method
CN108830334A (en) * 2018-06-25 2018-11-16 江西师范大学 A kind of fine granularity target-recognition method based on confrontation type transfer learning
CN109077704A (en) * 2018-07-06 2018-12-25 上海玄众医疗科技有限公司 A kind of infant nurses recognition methods and system
CN109783666A (en) * 2019-01-11 2019-05-21 中山大学 A kind of image scene map generation method based on iteration fining
CN110728203A (en) * 2019-09-23 2020-01-24 清华大学 Sign language translation video generation method and system based on deep learning
CN110750669A (en) * 2019-09-19 2020-02-04 深思考人工智能机器人科技(北京)有限公司 Method and system for generating image captions
CN110909736A (en) * 2019-11-12 2020-03-24 北京工业大学 Image description method based on long-short term memory model and target detection algorithm

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942851A (en) * 2014-04-02 2014-07-23 北京中交慧联信息科技有限公司 Method and device for monitoring vehicle state and driving behavior
CN108367442A (en) * 2016-02-25 2018-08-03 奥林巴斯株式会社 Effector system and its control method
CN107578106A (en) * 2017-09-18 2018-01-12 University of Science and Technology of China (中国科学技术大学) A neural network natural language inference method fusing word semantic knowledge
CN108830334A (en) * 2018-06-25 2018-11-16 江西师范大学 A kind of fine granularity target-recognition method based on confrontation type transfer learning
CN109077704A (en) * 2018-07-06 2018-12-25 上海玄众医疗科技有限公司 A kind of infant nurses recognition methods and system
CN109783666A (en) * 2019-01-11 2019-05-21 中山大学 A kind of image scene map generation method based on iteration fining
CN110750669A (en) * 2019-09-19 2020-02-04 深思考人工智能机器人科技(北京)有限公司 Method and system for generating image captions
CN110728203A (en) * 2019-09-23 2020-01-24 清华大学 Sign language translation video generation method and system based on deep learning
CN110909736A (en) * 2019-11-12 2020-03-24 北京工业大学 Image description method based on long-short term memory model and target detection algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Michalis Raptis et al.: "Poselet Key-framing: A Model for Human Activity Recognition", IEEE Xplore *
Lei Qing et al.: "Recent Advances in Human Behavior Recognition Research in Complex Scenes", Computer Science (计算机科学) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115968087A (en) * 2023-03-16 2023-04-14 中建八局发展建设有限公司 Interactive light control device of exhibitions center
CN117197843A (en) * 2023-11-06 2023-12-08 中国科学院自动化研究所 Unsupervised human body part area determination method and device
CN117197843B (en) * 2023-11-06 2024-02-02 中国科学院自动化研究所 Unsupervised human body part area determination method and device

Also Published As

Publication number Publication date
CN113449564B (en) 2022-09-06

Similar Documents

Publication Publication Date Title
Zhang et al. Empowering things with intelligence: a survey of the progress, challenges, and opportunities in artificial intelligence of things
Deng et al. MVF-Net: A multi-view fusion network for event-based object classification
CN109948475B (en) Human body action recognition method based on skeleton features and deep learning
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN110135249B (en) Human behavior identification method based on time attention mechanism and LSTM (least Square TM)
Zheng et al. Recent advances of deep learning for sign language recognition
KR101887637B1 (en) Robot system
CN113449564B (en) Behavior image classification method based on human body local semantic knowledge
CN114896434B (en) Hash code generation method and device based on center similarity learning
CN116524593A (en) Dynamic gesture recognition method, system, equipment and medium
Luqman An efficient two-stream network for isolated sign language recognition using accumulative video motion
He et al. Global and local fusion ensemble network for facial expression recognition
CN112800979B (en) Dynamic expression recognition method and system based on characterization flow embedded network
Musthafa et al. Real time Indian sign language recognition system
CN112949501A (en) Method for learning object availability from teaching video
CN117496567A (en) Facial expression recognition method and system based on feature enhancement
CN116662924A (en) Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism
CN112861848B (en) Visual relation detection method and system based on known action conditions
Zhao et al. Research on human behavior recognition in video based on 3DCCA
Saif et al. Aggressive action estimation: a comprehensive review on neural network based human segmentation and action recognition
KR101913140B1 (en) Apparatus and method for Optimizing Continuous Features in Industrial Surveillance using Big Data in the Internet of Things
Liu Improved convolutional neural networks for course teaching quality assessment
Mittel et al. Peri: Part aware emotion recognition in the wild
CN112784631A (en) Method for recognizing face emotion based on deep neural network
Nan et al. 3D RES-inception network transfer learning for multiple label crowd behavior recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant