CN113449564A - Behavior image classification method based on human body local semantic knowledge - Google Patents
- Publication number
- CN113449564A (application CN202010228189.5A)
- Authority
- CN
- China
- Prior art keywords
- behavior
- human body
- body part
- local
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
An image classification method based on human body local behavior semantic knowledge: a human body part behavior state recognition model that yields a local fine-grained semantic representation of the human body is established and trained; visual information in the image under test is then converted, via natural language understanding, into language-based prior knowledge, which is fused with the visual information to generate a fine-grained behavior characterization vector that is transferred to computer vision behavior recognition tasks; finally, the overall behavior is inferred by combining the local fine-grained human body features, completing the behavior understanding process and obtaining a classification result. The invention achieves a substantial recognition performance improvement on a number of complex behavior understanding tasks; moreover, a single pre-training supports repeated transfer to a variety of tasks, giving the method generality and flexibility.
Description
Technical Field
The invention relates to a technology in the field of image recognition and artificial intelligence, in particular to an image classification method based on human body local behavior semantic knowledge.
Background
Human behavior detection is an important branch of computer vision whose goal is to infer human behavior and interaction with the environment in an image or video. Behavior detection is widely applied in intelligent driving, security and robotics, is one of the artificial intelligence technologies most important to industry, and attracts growing attention. Machine learning studies computer algorithms that improve automatically through experience, generally obtaining, abstracting and summarizing key information and knowledge from large amounts of experience data; the artificial neural network is an important branch of machine learning and is now widely applied to artificial intelligence tasks. Existing image behavior detection methods infer a person's behavior directly from image-level features; because the modal gap between image-level features and human behavior is large, such methods easily run into a performance bottleneck.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an image classification method based on human body local behavior semantic knowledge that achieves a substantial recognition performance improvement on a variety of complex behavior understanding tasks; moreover, a single pre-training supports repeated transfer to a variety of tasks, giving the method generality and flexibility.
The invention is realized by the following technical scheme:
the invention relates to an image classification method based on human body local behavior semantic knowledge, which comprises the steps of establishing a human body part behavior state recognition model for obtaining human body local fine-grained semantic representation and carrying out model training; then, converting visual information in the image to be detected into language-based priori knowledge by using natural language understanding, fusing the priori knowledge and the visual information to generate a fine-grained behavior characterization vector, and transferring the fine-grained behavior characterization vector to a computer visual behavior and recognition task; and finally, reasoning the overall behavior by combining the local fine-grained characteristics of the human body to finish the behavior understanding process to obtain a classification result.
The human body part behavior state recognition model comprises: a pre-trained 50-layer residual convolutional neural network, ten 512-dimensional region-of-interest pooling layers, ten two-layer perceptrons with ReLU nonlinear activation layers, and ten human body part behavior state classifiers with 76-dimensional outputs.
The model training uses a human body part behavior state training sample set, obtained as follows: on an image data set containing human behaviors and their annotations (comprising the human bounding box b_h, the object bounding box b_o when the behavior is a human-object interaction, and the behavior label label_action), the human body part behavior states of the persons participating in the interaction are defined, yielding 76 distinct human body part states; based on these definitions, the body part states of every person behavior instance in the image data set are annotated, the result comprising two parts: the body part state label label_pasta and the human body part attention vector label_att, which characterizes whether each part contributes to the behavior sample; two-dimensional human pose estimation is then performed on the persons in the training set, and bounding boxes b_p1~b_p10 of ten parts of each person are generated from the estimate. All of the above bounding boxes are four-dimensional vectors (x_1, y_1, x_2, y_2), where (x_1, y_1) is the top-left corner of the box and (x_2, y_2) the bottom-right corner.
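The pose-to-part-box step can be sketched as follows. The grouping of keypoints into a specific part and the padding factor are assumptions for illustration only; the patent states merely that ten part boxes b_p1~b_p10 are generated from the 2-D pose estimate.

```python
# Sketch: derive one part bounding box (x1, y1, x2, y2) from a set of
# pose keypoints belonging to that part. Grouping and padding are
# illustrative assumptions, not the patented procedure.

def part_box(keypoints, pad=0.2):
    """keypoints: list of (x, y) pairs; returns a padded tight box."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
    w, h = x2 - x1, y2 - y1
    # pad the tight box so some context around the part is included
    return (x1 - pad * w, y1 - pad * h, x2 + pad * w, y2 + pad * h)

# e.g. a hypothetical "right hand" part from wrist and finger keypoints
box = part_box([(100, 200), (110, 215), (95, 210)])
```

Repeating this for each of the ten keypoint groups yields b_p1~b_p10 in the four-dimensional (x_1, y_1, x_2, y_2) format described above.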
The visual information is obtained as follows: the public image set HICO-DET, which contains human behavior labels, is used as the migration task data set, on which the trained human body part behavior state recognition model extracts the human body local fine-grained visual semantic representation and the estimate of the human body part attention vectors.
The language-based prior knowledge is: following a natural language understanding approach, the language representation vector of each human body part name is extracted with a BERT model (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding).
The fusion is: the language-based prior knowledge and the visual information are combined by concatenation to obtain the fine-grained behavior characterization vector.
The computer vision behavior recognition task is as follows: an overall behavior inference model based on the human body local fine-grained semantic representation is constructed, which takes f_pasta as input and derives an inferred score S_pasta of human behavior. The overall behavior inference model may be any of: a hierarchical graph model, a linear combination, a multilayer perceptron, a graph convolutional network, a sequence model, or tree-structured information propagation, wherein: the hierarchical graph model partitions the human body parts by functional module, merges and aggregates them layer by layer, and performs behavior reasoning; the linear combination, multilayer perceptron, graph convolutional network, sequence model and tree-structured propagation classify the fine-grained behavior characterization vector with a single fully connected layer, multiple fully connected layers, graph convolution, LSTM and tree-structured operations respectively, so as to infer the human behavior.
The loss function used to train the overall behavior inference model is L = L_pasta + L_cls^pasta + L_cls^inst, wherein: L_pasta is the loss function used to train the human body part behavior state recognition model, taken as the cross entropy between the model output and the labels, and omitted when the migration task carries no human body part state information; L_cls^pasta is the cross entropy computed from the behavior detection result obtained by feeding f_pasta into the model; L_cls^inst is the cross entropy computed by the conventional method, and is omitted when no conventional method is combined.
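A minimal runnable sketch of this composite loss, assuming plain unweighted summation of the terms; `cross_entropy` below is the textbook definition, not necessarily the exact implementation used in the patent.

```python
import math

def cross_entropy(probs, label):
    """Cross entropy between a predicted distribution and a class index."""
    return -math.log(probs[label])

def total_loss(l_pasta=None, l_cls_pasta=0.0, l_cls_inst=None):
    # L = L_pasta + L_cls^pasta + L_cls^inst; L_pasta is omitted when the
    # migration task has no part-state labels, L_cls^inst when no
    # conventional image-level branch is combined.
    return sum(t for t in (l_pasta, l_cls_pasta, l_cls_inst) if t is not None)

# e.g. a migration task without part-state labels and without a
# conventional branch reduces to L = L_cls^pasta alone
loss = total_loss(l_pasta=None, l_cls_pasta=0.7, l_cls_inst=None)
```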
The inference of the overall behavior by combining the local fine-grained human body features is: the behavior detection score output by the overall behavior inference model is combined with the behavior detection score S_inst output by a method that maps image-level features directly to human behaviors; the combined output S = S_pasta + S_inst gives the final detection result.
The invention further relates to a recognition system implementing the above method, comprising: an image feature extraction unit, a local state recognition unit, a local state language feature unit and a behavior reasoning unit, wherein: the image feature extraction unit extracts features from the input image and passes them to the connected local state recognition unit, which recognizes the local states and extracts their visual features; the local state language feature unit reads the recognition result of the local state recognition unit and converts it into language features; the local state recognition unit and the local state language feature unit pass the visual features and the language features, respectively, to the behavior reasoning unit for the final behavior recognition.
Technical effects
The invention addresses the overall problem that the large number of behavior categories hampers transfer learning: by learning human body local behavior semantic knowledge, which has fewer categories and transfers more easily, knowledge is shared across different behaviors, improving behavior recognition under small-sample conditions.
Compared with the prior art, the method markedly improves the accuracy of human behavior detection in images. By introducing human body local behavior semantic knowledge and combining visual and language information, it constructs a fine-grained human body local semantic representation, which typically yields an improvement of about 10% on common behavior understanding data sets; moreover, through transfer learning, the feature extraction model trained once can be applied to a variety of behavior understanding and recognition tasks, such as human-object interaction understanding and video or image behavior understanding.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the system of the present invention;
FIG. 3 is a schematic diagram illustrating the effect of the present invention.
Detailed Description
As shown in fig. 1, the present embodiment relates to an image classification method based on human body local behavior semantic knowledge, comprising the following steps:
step 1, constructing a data set: using the public image data set containing human behavior and obtaining the human bounding box bhObject bounding box bo(when the behavior is human-object interaction behavior) and behavior tag labelactionDefining the human body part behavior states of the people participating in the interaction, and finally obtaining 76 different human body part states; based on these definitions, the human body part status of each human behavior instance in the image dataset is labeled, and the following results are obtained: body part state labelpastaAnd a human body part attention vector label with a length of 10attCharacterizing whether each part contributes to the behavior sample; and carrying out two-dimensional human body posture estimation on the people in the training set, and generating a boundary box b of ten parts of each person according to the estimation resultp1~bp10。
The bounding boxes are all four-dimensional vectors (x)1,y1,x2,y2) The coordinate of the upper left corner of the bounding box is (x)1,y1) The coordinate of the lower right corner is (x)2,y2)。
Step 2: training a human body part behavior state recognition model.
Step 2.1: construct the human body part behavior state recognition model, comprising: a pre-trained 50-layer residual convolutional neural network, ten 512-dimensional region-of-interest pooling layers, ten two-layer perceptrons with ReLU nonlinear activation, and ten human body part behavior state classifiers with 76-dimensional outputs, wherein: the RGB three-channel color image I_RGB is fed into the residual convolutional neural network to obtain a 1024-channel feature map at 1/16 of the original resolution; the feature map and b_p1~b_p10 are fed into the region-of-interest pooling, producing ten features, one per human body part, which are then sent to the corresponding multilayer perceptrons and human body part behavior state classifiers to obtain P_pasta.
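The structure of step 2.1 can be sketched as below. This is an illustrative stand-in, not the patented implementation: a single stride-16 convolution replaces the pretrained ResNet-50 (1024-channel map at 1/16 resolution), and crop-plus-average replaces true region-of-interest pooling.

```python
import torch
from torch import nn

class PartStateModel(nn.Module):
    """Sketch of the part-state recognition model: a stride-16 conv stands in
    for the pretrained ResNet-50 backbone; each of the ten parts gets a
    two-layer 512-d perceptron ending in a 76-way state classifier."""
    def __init__(self, n_parts=10, n_states=76, feat_ch=1024):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_ch, kernel_size=16, stride=16)
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_ch, 512), nn.ReLU(),
                          nn.Linear(512, 512), nn.ReLU(),
                          nn.Linear(512, n_states))
            for _ in range(n_parts))

    def forward(self, image, part_boxes):
        fmap = self.backbone(image)                      # (1, C, H/16, W/16)
        scores = []
        for head, (x1, y1, x2, y2) in zip(self.heads, part_boxes):
            # crude RoI pooling: crop the 1/16-scale feature map, then average
            roi = fmap[:, :, int(y1) // 16:int(y2) // 16 + 1,
                             int(x1) // 16:int(x2) // 16 + 1]
            scores.append(head(roi.mean(dim=(2, 3))))
        return torch.stack(scores, dim=1)                # (1, n_parts, n_states)

model = PartStateModel()
image = torch.randn(1, 3, 224, 224)     # stand-in for I_RGB
boxes = [(0, 0, 64, 64)] * 10           # stand-ins for b_p1..b_p10
p_pasta = model(image, boxes)           # per-part state scores
```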
Step 2.2: train the model with the data set constructed in step 1: input the training data I_RGB, b_p1~b_p10 and the corresponding human body part state labels into the human body part behavior state recognition model, compute the loss function L_pasta from the output, and iteratively train the model with a gradient back-propagation algorithm.
The loss function L_pasta is the loss used to train the human body part behavior state recognition model, taken as the cross entropy between the model output and the labels; it is omitted when the migration task carries no human body part state information.
Step 3: obtaining the human body local fine-grained semantic representation.
Step 3.1: acquiring a human bounding box b in an open image data set HICO-DET containing human behaviors as a migration task data sethObject bounding box bo(when the behavior is human-object interaction behavior) and behavior tag labelaction. Because the data set has behavior state labels of human body parts, corresponding label is also obtainedpastaAnd labelattIt is divided into a training set and a test set as input information, namely a three-channel RGB image I comprising human behaviorsRGBAnd bounding boxes b of people, human body parts, objects (e.g. for human-object interaction behavior)h,bo,bp。
Step 3.2: inputting the data obtained in the step 3.1 into the human body part behavior state recognition model trained in the step 2, and outputting the data to represent the local fine-grained visual semantic meaning of the human bodyRecognition result of human body part behavior state And estimation of attention vectors of human body partsObtaining the visual characteristics of the local state through final splicingWherein:is a pair ofAndand (4) splicing.
The human body local fine-grained visual semantic representation is the output of the last fully connected layer of each human body part behavior state classifier; its length n_1, the output dimension, is 512 in this embodiment.
In the overall training of step 2.2, the loss function is computed from the labels corresponding to the input information and the network output, and the neural network parameters are iteratively optimized with a gradient back-propagation algorithm; the loss function is L_pasta = Σ_i (L_pasta^i + L_att^i), where L_pasta^i is the cross entropy loss for estimating the behavior state of the i-th human body part and L_att^i is the cross entropy loss for estimating the attention of the i-th human body part.
The length of the visual information is 1024 in this embodiment.
Step 3.3: local behavior state language features based on self-language understanding are generated and are combined with the local state visual features obtained in the step 3.2Fusing to generate fine-grained behavior characterization vectors: specifically, the local state identified by the local behavior state identification unit is converted into the local state language feature based on the natural language word description by using J Devrlin and the like, which are described in the document 'book of Pre-training of deep bidirectional transformations for language understanding' (Pre-training of deep bidirectional transformation for language understanding):n2is the length of the language feature vector, associated with the selected language model; then, obtainThen it is rightAndcarrying out fusion: will be provided withAndare spliced to obtain
Step 4: training the overall behavior inference model based on the human body local fine-grained semantic representation.
Step 4.1: constructing a whole behavior inference model based on human body local fine-grained semantic representation, wherein the modelThe type comprises two layers 102 of four-dimensional multi-layer perceptron with activation function ReLU and full connection layer classifier behind the perceptron and is represented by fpastaAs input, inferred scores of human behavior are output.
Step 4.2: f belonging to the training set and obtained in the step 3pastaInputting the data into a model to obtain a behavior detection score SpastaAnd calculating a loss function L therefrompastaThe model is iteratively trained and updated using a gradient back propagation algorithm.
Step 5: perform behavior classification based on human body local behavior semantic knowledge with the trained models: input the f_pasta of the test set obtained in step 3 into the model to obtain the output S_pasta, combine it with the result S_inst output by the method using only image-level features, and take S = S_pasta + S_inst as the final detection result.
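The inference head of step 4.1 and the late score fusion of step 5 can be sketched together as below. All dimensions are illustrative assumptions (600 corresponds to the HICO-DET interaction-category count), and the image-level score is a random stand-in.

```python
import torch
from torch import nn

n_in, n_actions = 1280, 600   # illustrative sizes, not from the patent text

# two-layer MLP with ReLU plus a fully connected classifier, as in step 4.1
head = nn.Sequential(nn.Linear(n_in, 1024), nn.ReLU(),
                     nn.Linear(1024, 1024), nn.ReLU(),
                     nn.Linear(1024, n_actions))

f_pasta = torch.randn(1, n_in)        # fine-grained behavior characterization vector
s_pasta = head(f_pasta)               # part-based behavior score S_pasta
s_inst = torch.randn(1, n_actions)    # stand-in for the image-level-only score S_inst
s_final = s_pasta + s_inst            # S = S_pasta + S_inst
```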
After combination, the result is a 29% relative improvement over the score before combination.
As shown in fig. 2, the present embodiment further relates to a recognition system implementing the above method, comprising: an image feature extraction unit, a local state recognition unit, a local state language feature unit and a behavior reasoning unit, wherein: the image feature extraction unit extracts features from the input image and passes them to the connected local state recognition unit, which recognizes the local states and extracts their visual features; the local state language feature unit reads the recognition result of the local state recognition unit and converts it into language features; the local state recognition unit and the local state language feature unit pass the visual features and the language features, respectively, to the behavior reasoning unit for the final behavior recognition.
Preferably, the system captures the dynamic local changes of consecutive video frames through a video-based human body part tracking unit, thereby obtaining the local behavior states over a time period; the behavior reasoning unit accepts the multi-frame input and produces the overall behavior recognition result for the period, so that the system can be used for everyday video behavior recognition. Judging the local dynamic temporal states of the human body improves behavior recognition in video, raising accuracy by 4.2% on the large-scale public video behavior data set AVA.
As shown in fig. 3, for the image-level human behavior classification task, with a batch size of 16 on a single Nvidia Titan X GPU, an initial learning rate of 1e-5 with cosine decay, and a stochastic gradient descent optimizer with momentum 0.9, after 80k training iterations and 20k fine-tuning iterations the method reaches 46.3 mAP on the HICO data set, the state of the art at the time.
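The training configuration above maps directly onto standard PyTorch components; in this sketch the tiny linear model is a placeholder, and only the optimizer and schedule settings come from the text.

```python
import torch
from torch import nn

model = nn.Linear(8, 2)   # placeholder network, not the patented model

# SGD with momentum 0.9 and initial learning rate 1e-5, as described above
opt = torch.optim.SGD(model.parameters(), lr=1e-5, momentum=0.9)
# cosine decay over the 80k training iterations mentioned in the text
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=80000)

for _ in range(3):        # a few illustrative iterations
    loss = model(torch.randn(16, 8)).pow(2).mean()   # batch size 16
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()          # learning rate follows the cosine curve
```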
Compared with the prior art, the performance indices of the method improve because the recognition of human body part behavior states is introduced, avoiding the large gap of mapping directly from the image to the human behavior; moreover, local states can be shared across different overall behaviors and transfer well, which particularly benefits behavior recognition under small-sample learning.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (9)
1. An image classification method based on human body local behavior semantic knowledge, characterized in that a human body part behavior state recognition model yielding a local fine-grained semantic representation of the human body is established and trained; visual information in the image under test is then converted, via natural language understanding, into language-based prior knowledge, which is fused with the visual information to generate a fine-grained behavior characterization vector that is transferred to computer vision behavior recognition tasks; finally, the overall behavior is inferred by combining the local fine-grained human body features, completing the behavior understanding process and obtaining a classification result.
2. The image classification method according to claim 1, wherein the human body part behavior state recognition model comprises: a pre-trained 50-layer residual convolutional neural network, ten 512-dimensional region-of-interest pooling layers, ten two-layer perceptrons with ReLU nonlinear activation layers, and ten human body part behavior state classifiers with 76-dimensional outputs.
3. The image classification method according to claim 1 or 2, wherein the model training uses a human body part behavior state training sample set obtained as follows: on the image data set containing human behaviors and their annotations, the human body part behavior states of the persons participating in the interaction are defined, yielding 76 distinct human body part states; based on these definitions, the body part states of every person behavior instance in the image data set are annotated, the result comprising two parts: the body part state label label_pasta and the human body part attention vector label_att, which characterizes whether each part contributes to the behavior sample; two-dimensional human pose estimation is performed on the persons in the training set, and bounding boxes b_p1~b_p10 of ten parts of each person are generated from the estimate; the bounding boxes are all four-dimensional vectors (x_1, y_1, x_2, y_2), where (x_1, y_1) is the top-left corner of the box and (x_2, y_2) the bottom-right corner.
4. The image classification method according to claim 1, wherein the visual information is obtained as follows: the public image set HICO-DET containing human behavior labels is used as the migration task data set, on which the trained human body part behavior state recognition model extracts the human body local fine-grained visual semantic representation and the estimate of the human body part attention vectors;
the language-based prior knowledge is: following a natural language understanding approach, the language representation vector of each human body part name is extracted with a pre-trained deep bidirectional Transformer language understanding model (BERT);
the fusion is: the language-based prior knowledge and the visual information are combined by concatenation to obtain the fine-grained behavior characterization vector.
5. The image classification method according to claim 1, wherein the computer vision behavior recognition task is: an overall behavior inference model based on the human body local fine-grained semantic representation is constructed, which takes f_pasta as input and derives an inferred score S_pasta of human behavior, the overall behavior inference model comprising: a hierarchical graph model, a linear combination, a multilayer perceptron, a graph convolutional network, a sequence model, or tree-structured information propagation, wherein: the hierarchical graph model partitions the human body parts by functional module, merges and aggregates them layer by layer, and performs behavior reasoning; the linear combination, multilayer perceptron, graph convolutional network, sequence model and tree-structured propagation classify the fine-grained behavior characterization vector with a single fully connected layer, multiple fully connected layers, graph convolution, LSTM and tree-structured operations respectively, so as to infer the human behavior.
6. The image classification method according to claim 1, wherein the training of the overall behavior inference model uses the loss function L = L_pasta + L_cls^pasta + L_cls^inst, wherein: L_pasta is the loss function used to train the human body part behavior state recognition model, taken as the cross entropy between the model output and the labels, and omitted when the migration task carries no human body part state information; L_cls^pasta is the cross entropy computed from the behavior detection result obtained by feeding f_pasta into the model; L_cls^inst is the cross entropy computed by the conventional method, and is omitted when no conventional method is combined.
7. The image classification method according to claim 1, wherein the inference of the overall behavior by combining the local fine-grained human body features is: the behavior detection score output by the overall behavior inference model is combined with the behavior detection score S_inst output by a method that maps image-level features directly to human behaviors; the combined output S = S_pasta + S_inst gives the final detection result.
8. An identification system for implementing the method of any one of claims 1 to 7, comprising: an image feature extraction unit, a local state recognition unit, a local state language feature unit and a behavior reasoning unit, wherein: the image feature extraction unit extracts features from the input image and passes them to the connected local state recognition unit, which recognizes the local states and extracts their visual features; the local state language feature unit reads the recognition result of the local state recognition unit and converts it into language features; the local state recognition unit and the local state language feature unit pass the visual features and the language features, respectively, to the behavior reasoning unit for the final behavior recognition.
9. The identification system of claim 8, further comprising a video-based human body part tracking unit that captures the dynamic local changes of consecutive video frames to obtain the local behavior states over a time period; the behavior reasoning unit accepts the multi-frame input and produces the overall behavior recognition result for the period, so that the system can be used for everyday video behavior recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010228189.5A CN113449564B (en) | 2020-03-26 | 2020-03-26 | Behavior image classification method based on human body local semantic knowledge |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113449564A true CN113449564A (en) | 2021-09-28 |
CN113449564B CN113449564B (en) | 2022-09-06 |
Family
ID=77807763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010228189.5A Active CN113449564B (en) | 2020-03-26 | 2020-03-26 | Behavior image classification method based on human body local semantic knowledge |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113449564B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115968087A (en) * | 2023-03-16 | 2023-04-14 | 中建八局发展建设有限公司 | Interactive light control device of exhibitions center |
CN117197843A (en) * | 2023-11-06 | 2023-12-08 | 中国科学院自动化研究所 | Unsupervised human body part area determination method and device |
Application Events
2020-03-26: Application filed (CN202010228189.5A); patent granted as CN113449564B, status Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942851A (en) * | 2014-04-02 | 2014-07-23 | Beijing Zhongjiao Huilian Information Technology Co., Ltd. | Method and device for monitoring vehicle state and driving behavior |
CN108367442A (en) * | 2016-02-25 | 2018-08-03 | Olympus Corporation | Effector system and its control method |
CN107578106A (en) * | 2017-09-18 | 2018-01-12 | University of Science and Technology of China | Neural network natural language inference method fusing word semantic knowledge |
CN108830334A (en) * | 2018-06-25 | 2018-11-16 | Jiangxi Normal University | Fine-grained object recognition method based on adversarial transfer learning |
CN109077704A (en) * | 2018-07-06 | 2018-12-25 | Shanghai Xuanzhong Medical Technology Co., Ltd. | Infant nursing recognition method and system |
CN109783666A (en) * | 2019-01-11 | 2019-05-21 | Sun Yat-sen University | Image scene graph generation method based on iterative refinement |
CN110750669A (en) * | 2019-09-19 | 2020-02-04 | iDeepWise Artificial Intelligence Robot Technology (Beijing) Co., Ltd. | Method and system for generating image captions |
CN110728203A (en) * | 2019-09-23 | 2020-01-24 | Tsinghua University | Sign language translation video generation method and system based on deep learning |
CN110909736A (en) * | 2019-11-12 | 2020-03-24 | Beijing University of Technology | Image captioning method based on long short-term memory model and object detection algorithm |
Non-Patent Citations (2)
Title |
---|
Michalis Raptis et al.: "Poselet Key-framing: A Model for Human Activity Recognition", IEEE Xplore * |
Lei Qing et al.: "New Advances in Human Action Recognition Research in Complex Scenes", Computer Science * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115968087A (en) * | 2023-03-16 | 2023-04-14 | China Construction Eighth Engineering Division Development & Construction Co., Ltd. | Interactive lighting control device for an exhibition center |
CN117197843A (en) * | 2023-11-06 | 2023-12-08 | Institute of Automation, Chinese Academy of Sciences | Unsupervised human body part region determination method and device |
CN117197843B (en) * | 2023-11-06 | 2024-02-02 | Institute of Automation, Chinese Academy of Sciences | Unsupervised human body part region determination method and device |
Also Published As
Publication number | Publication date |
---|---|
CN113449564B (en) | 2022-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Empowering things with intelligence: a survey of the progress, challenges, and opportunities in artificial intelligence of things | |
Deng et al. | MVF-Net: A multi-view fusion network for event-based object classification | |
CN109948475B (en) | Human body action recognition method based on skeleton features and deep learning | |
CN112800903B (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
CN110135249B (en) | Human behavior recognition method based on temporal attention mechanism and LSTM | |
Zheng et al. | Recent advances of deep learning for sign language recognition | |
KR101887637B1 (en) | Robot system | |
CN113449564B (en) | Behavior image classification method based on human body local semantic knowledge | |
CN114896434B (en) | Hash code generation method and device based on center similarity learning | |
CN116524593A (en) | Dynamic gesture recognition method, system, equipment and medium | |
Luqman | An efficient two-stream network for isolated sign language recognition using accumulative video motion | |
He et al. | Global and local fusion ensemble network for facial expression recognition | |
CN112800979B (en) | Dynamic expression recognition method and system based on characterization flow embedded network | |
Musthafa et al. | Real time Indian sign language recognition system | |
CN112949501A (en) | Method for learning object availability from teaching video | |
CN117496567A (en) | Facial expression recognition method and system based on feature enhancement | |
CN116662924A (en) | Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism | |
CN112861848B (en) | Visual relation detection method and system based on known action conditions | |
Zhao et al. | Research on human behavior recognition in video based on 3DCCA | |
Saif et al. | Aggressive action estimation: a comprehensive review on neural network based human segmentation and action recognition | |
KR101913140B1 (en) | Apparatus and method for Optimizing Continuous Features in Industrial Surveillance using Big Data in the Internet of Things | |
Liu | Improved convolutional neural networks for course teaching quality assessment | |
Mittel et al. | Peri: Part aware emotion recognition in the wild | |
CN112784631A (en) | Method for recognizing face emotion based on deep neural network | |
Nan et al. | 3D RES-inception network transfer learning for multiple label crowd behavior recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||