CN113392697A - Human body action recognition method based on bag-of-words model - Google Patents

Human body action recognition method based on bag-of-words model

Info

Publication number
CN113392697A
Authority
CN
China
Prior art keywords
time
histogram
space
human body
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110451802.4A
Other languages
Chinese (zh)
Other versions
CN113392697B (en)
Inventor
黄慧
李愈
马燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Normal University
Original Assignee
Shanghai Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Normal University
Priority to CN202110451802.4A
Publication of CN113392697A
Application granted
Publication of CN113392697B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a human body action recognition method based on a bag-of-words model, which comprises the following steps: collecting human body joint point information and preprocessing it; extracting the spatial and temporal features of each action sequence from the preprocessed joint point information, and dividing the data set into a training set and a test set; encoding the time features and space features of the training set separately, and counting the encoding results to obtain a joint frequency histogram of the training set; and then training a classifier with the training set and testing it with the test set to automatically recognize the action type. The method embeds time features into the action description, considers the time features and space features of an action separately and describes each independently, and provides a stable encoding method to construct stable time and space bags of words.

Description

Human body action recognition method based on bag-of-words model
Technical Field
The invention relates to the field of motion recognition, in particular to a human body motion recognition method based on a bag-of-words model.
Background
The bag-of-words model was first used in text classification and was later gradually applied to motion recognition. Traditional visual bag-of-words action recognition mainly comprises the following steps:
firstly, preprocessing data and detecting a moving target;
then, extracting the characteristics of the action;
and finally, classifying and identifying the action based on a standard action image library established with the visual bag of words.
Action recognition with the traditional visual bag of words has the following problems: 1. The traditional bag-of-words model ignores the temporal characteristics of an action: taking the histogram as the action's description vector only reflects how often each static posture occurs within the action, not the order in which the postures are executed, so recognition of reverse-order actions is poor. 2. The initial cluster centers of K-means clustering are selected randomly, so the clustering result, and hence the visual dictionary, is unstable, and multiple experiments are needed to obtain an accurate result.
Disclosure of Invention
In view of the above defects in the prior art, the technical problems to be solved by the present invention are that the traditional bag-of-words model ignores the temporal characteristics of actions, recognizes reverse-order actions poorly, and yields unstable recognition results. The invention provides a human body action recognition method based on a bag-of-words model with a new temporal feature descriptor that embeds time features into the action description. The time features and space features of an action are considered and described separately, and a stable encoding method is provided to solve the instability of visual dictionaries constructed by conventional clustering. Finally, the classifier is improved: different classifiers distinguish actions with large differences from actions with small differences.
In order to achieve the purpose, the invention provides a human body action recognition method based on a bag-of-words model, which comprises the following steps:
collecting human body joint point information, and preprocessing the collected human body joint point information;
extracting motion sequence spatial features and time features according to the preprocessed joint point information, and dividing a data set into a training set and a test set;
symmetrically expanding the training set data;
respectively coding the time features and the space features of the training set, and counting a time feature histogram and a space feature histogram of the training set according to a coding result;
respectively coding the time features and the space features of the test set, and counting a time feature histogram and a space feature histogram of the test set according to a coding result;
obtaining a joint frequency histogram of the training set according to the time characteristic histogram and the space characteristic histogram of the training set, and obtaining a joint frequency histogram of the test set according to the time characteristic histogram of the test set and the space characteristic histogram of the test set;
and then training the classifier by using the training set, and testing the classifier by using the test set to obtain the automatic identification of the action type.
Further, collecting human body joint point information, and preprocessing the collected human body joint point information, specifically comprising:
collecting human body joint point information by using a Kinect device;
and sequentially carrying out origin normalization, direction normalization and scale normalization on the collected human body joint point information.
Further, the processed three-dimensional coordinates of the joint points are used as the space feature descriptor;
for each frame, 28 space angles of the human joints are extracted, the inter-frame differences of the 28 angles with respect to the previous frame are calculated, and the inter-frame differences of these limb key angles of the joint points are used as the temporal feature descriptor.
Further, the data set is divided into a training set and a test set with a 50/50 split, and the training set data is expanded;
the data in the training set are symmetrically flipped: taking the trunk of the human body as the central axis, the three-dimensional coordinates of the left-side and right-side joint points are exchanged, and the symmetric version of each original action is added to the training set.
Further, each frame in the action sequence is regarded as a data point, time features and space features are extracted from each data point, the time features and the space features of the training set data are respectively clustered, and a time feature label and a space feature label are obtained from each data point.
Further, performing motion coding in a training set to obtain a temporal feature histogram and a spatial feature histogram of the training set, specifically comprising the following steps:
the training set comprises a plurality of action sequences, each action sequence comprises a plurality of posture frames, the time characteristics of all the frames in the training set are used as time domain characteristics, the space characteristics are used as space domain characteristics, the time domain and the space domain are respectively clustered by utilizing a hierarchical clustering method, each cluster after clustering is regarded as a visual word, and a time bag and a space bag are respectively obtained;
after each frame in the training set obtains its time label and space label, the time labels and space labels of each action sequence are counted separately to obtain the time feature histogram and space feature histogram of each action sequence in the training set.
Further, performing motion coding in the test set to obtain a temporal feature histogram and a spatial feature histogram of the test set, specifically including the following steps:
respectively calculating the average distance from the time characteristic of each frame of data in the test set to each cluster in the time word bag, taking the cluster type with the minimum average distance as the time characteristic label of the frame, and obtaining the space characteristic label in the same way;
after each frame in the test set obtains the time label and the space label, respectively counting the time label and the space label of each action sequence to obtain a time characteristic histogram and a space characteristic histogram of each action sequence in the test set.
Further, obtaining a spatiotemporal joint histogram of the training set according to the temporal feature histogram and the spatial feature histogram of the training set, and obtaining a spatiotemporal joint histogram of the test set according to the temporal feature histogram and the spatial feature histogram of the test set; and taking the space-time joint histogram of the action sequence as a final expression vector.
Further, a hierarchical classification method built from SVM classifiers is adopted to classify the actions: a first-layer classifier first sorts similar actions into large classes, a second-layer classifier then sorts them into small classes within each large class, and the final classification result is obtained.
Technical effects
According to the human body action recognition method based on the bag-of-words model, an effective time descriptor is constructed that reflects how each main angle changes over time within an action, improving recognition accuracy on reverse-order actions; separate descriptors are constructed for the time features and space features of actions, avoiding the situation where mixed spatio-temporal features cannot highlight the temporal and spatial differences between actions; stable time and space bags of words are constructed, enhancing the stability of the classification result; and a two-layer classifier solves the problem that a single-layer classifier cannot efficiently distinguish actions with large differences and actions with small differences at the same time, improving classification efficiency.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
FIG. 1 is a flow chart of a human body motion recognition method based on bag-of-words model according to a preferred embodiment of the present invention;
FIG. 2 is a diagram illustrating joint activities of a human body motion recognition method based on bag-of-words model according to a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating angles between major joints of a human motion recognition method based on bag-of-words model according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of the main activity parts of a human body motion recognition method based on bag-of-words model according to a preferred embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating angles between main moving parts and coordinate axes of a human body motion recognition method based on a bag-of-words model according to a preferred embodiment of the present invention;
FIG. 6 is a schematic diagram of joint markers of a human body motion recognition method based on bag-of-words model according to a preferred embodiment of the present invention;
FIG. 7 is a schematic diagram of gesture encoding of a human body motion recognition method based on bag-of-words model according to a preferred embodiment of the present invention;
FIG. 8 is a time feature histogram of a human motion recognition method based on bag-of-words model according to a preferred embodiment of the present invention;
fig. 9 is a spatial feature histogram of a human body motion recognition method based on a bag-of-words model according to a preferred embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular internal procedures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
As shown in fig. 1, the present invention provides a human body motion recognition method based on a bag-of-words model, comprising the following steps:
step 100, collecting human body joint point information, preprocessing the collected human body joint point information, and constructing a time characteristic descriptor and a space characteristic descriptor by using the processed data;
step 200, extracting motion sequence spatial features and time features according to the preprocessed joint point information, and dividing a data set into a training set and a test set:
A training set and a test set are obtained with a 50/50 split. To expand the training set, its data are symmetrically flipped: taking the trunk of the human body as the central axis, the three-dimensional coordinates of the left-side and right-side joint points are exchanged, and the symmetric version of each original action is added to the training set, as sketched below.
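A minimal sketch of this flip, assuming normalized (T, 20, 3) coordinate arrays and hypothetical left/right joint index pairs (the real Kinect indices depend on the SDK's joint ordering):

```python
import numpy as np

# Hypothetical left/right joint index pairs for a 20-joint Kinect skeleton;
# the actual indices depend on the Kinect SDK joint ordering.
LEFT_RIGHT_PAIRS = [(4, 8), (5, 9), (6, 10), (7, 11),        # arm joints
                    (12, 16), (13, 17), (14, 18), (15, 19)]  # leg joints

def mirror_sequence(seq):
    """Symmetric flip of an action sequence about the trunk's central axis.

    seq: (T, 20, 3) array of normalized joint coordinates in which the
    left-right hip line is parallel to the X axis, so mirroring negates
    the x coordinate and swaps the roles of left and right joints.
    """
    flipped = seq.copy()
    flipped[..., 0] *= -1.0                   # reflect across the Y-Z plane
    for left, right in LEFT_RIGHT_PAIRS:      # exchange left/right joint roles
        flipped[:, [left, right], :] = flipped[:, [right, left], :]
    return flipped
```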
Step 300, respectively coding the time features and the space features of the training set, and counting a time feature histogram and a space feature histogram of the training set according to a coding result;
step 400, respectively encoding the time features and the space features of the test set, and counting a time feature histogram and a space feature histogram of the test set according to an encoding result;
Step 500, obtaining a space-time joint histogram of the training set from the time feature histogram and space feature histogram of the training set, and a space-time joint histogram of the test set from the time feature histogram and space feature histogram of the test set;
Step 600, training the classifier with the training set and testing it with the test set to automatically recognize the action type.
Wherein, step 100 specifically comprises:
collecting human body joint point information by using a Kinect device;
carrying out origin normalization, direction normalization and scale normalization operation on the collected human body joint point information in sequence;
The three-dimensional coordinates of the joint points after origin normalization, direction normalization and scale normalization are used as the space feature descriptor. Specifically, origin normalization converts the spatial coordinates from the coordinate system with the Kinect camera as origin to a coordinate system with the hip center point as origin; direction normalization rotates the human skeleton so that the body faces the X axis; and scale normalization adjusts each subject's skeleton to the same size. Origin, direction and scale normalization are applied to the training set and the test set with identical steps.
28 space angles representative of human body motion are selected as the limb key angles, and the inter-frame differences of these limb key angles of the joint points are used as the time feature descriptor.
There are many ways to construct a temporal feature, for example:
1. taking the interframe difference value of the three-dimensional coordinates of the joint points as a time characteristic;
2. taking the direction angle and the elevation angle of the difference vector of two adjacent frames of the joint point as time characteristics;
3. taking each main movable limb as a vector, and using the inter-frame differences of the direction angle and elevation angle of the limb vector as the time feature.
The descriptor adopted by this method, as stated above, is the inter-frame difference of the 28 limb key angles; a minimal sketch follows.
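A sketch of the adopted temporal descriptor and the accompanying spatial descriptor (the per-frame angle matrix itself is computed as illustrated later in the embodiment):

```python
import numpy as np

def temporal_descriptor(angles):
    """Inter-frame differences of the 28 limb key angles.

    angles: (T, 28) array with one row of key angles per posture frame.
    Returns a (T-1, 28) array; row t holds angles[t+1] - angles[t].
    """
    return np.diff(angles, axis=0)

def spatial_descriptor(coords):
    """Normalized joint coordinates flattened per frame: (T, J, 3) -> (T, J*3)."""
    return coords.reshape(coords.shape[0], -1)
```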
Step 300-step 500 specifically includes:
Each action sequence is treated as an ordered arrangement of posture frames, and the training set contains multiple action sequences. The time features of all frames in the training set are taken as the time-domain features and the space features as the space-domain features; the time domain and space domain are clustered separately with a hierarchical clustering method, each resulting cluster is regarded as a visual word, a time bag and a space bag are obtained respectively, and the time feature histogram and space feature histogram of each action sequence in the training set are counted;
for each frame of data in the test set, the average distance from its time feature to each cluster in the time bag is calculated, and the cluster with the minimum average distance is taken as that frame's time feature label; the space feature label is obtained in the same way;
after each frame in the test set obtains its time label and space label, the time labels and space labels of each action sequence are counted separately to obtain the time feature histogram and space feature histogram of the test set;
the time feature histogram and space feature histogram are then combined into a joint frequency histogram, which serves as the final representation vector of the action sequence. To construct the joint histogram, the time features and space features are considered independently, a time-feature frequency histogram and a space-feature frequency histogram are built separately, and the two are concatenated. The joint histogram of each action sequence in the training set and test set is taken as its final representation vector.
Step 600, specifically, an SVM classifier is adopted to construct a hierarchical classification method to classify the actions, firstly, a first-layer classifier is used to classify the similar actions into large classes, then, a second-layer classifier is used to classify the small classes based on the large classes, and finally, a classification result is obtained.
There are also a number of ways to classify the actions, for example (a sketch of the first follows this list):
1. assigning different weights to different joint points according to their characteristics: during spatio-temporal feature extraction, the three-dimensional coordinates and joint-angle differences of different joint points are multiplied by different weight values to obtain time-weighted and space-weighted features, after which the subsequent steps proceed as before;
2. grading the joint points by their influence on the action type: joint points that play a key role in distinguishing action types are recorded as first-grade joint points and the rest as second-grade joint points; a first-layer SVM classifier is trained on the spatio-temporal joint frequency histogram of the first-grade joint points to sort actions into large classes, and a second-layer SVM classifier is then trained on the spatio-temporal joint frequency histogram of the second-grade joint points to sort them into small classes.
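A sketch of the first alternative, per-joint weighting (the weight vectors are placeholders; the patent does not prescribe their values):

```python
import numpy as np

def weight_features(coords, angle_diffs, joint_weights, angle_weights):
    """Scale spatial and temporal features by per-joint importance.

    coords:        (T, J, 3) normalized joint coordinates.
    angle_diffs:   (T-1, 28) inter-frame key-angle differences.
    joint_weights: (J,) weight per joint point (e.g. larger for wrists).
    angle_weights: (28,) weight per key angle.
    """
    spatial_weighted = coords * joint_weights[None, :, None]
    temporal_weighted = angle_diffs * angle_weights[None, :]
    return spatial_weighted, temporal_weighted
```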
The following will illustrate specific steps of a human body motion recognition method based on a bag-of-words model by taking a specific example:
the data adopted by the embodiment of the invention is derived from human body joint point information acquired by Kinect equipment, and the acquired joint point data needs to be preprocessed before motion characteristics are extracted. The preprocessing operation comprises origin point normalization, direction normalization and scale normalization. After the origin point normalization processing, the origin point of the human body skeleton is converted to the center of the hip, the connecting line of the left hip and the right hip is parallel to the X axis, and the length of each part of each human body skeleton is scaled to be the same as the length of the reference size. In the scale normalization, the embodiment proposes a self-defined reference size, that is, the average value of the corresponding parts of all the experimental subjects is used as the length of the reference size of the limb part. After preprocessing, dividing the data set into a training set and a test set, dividing by adopting a five-fifth strategy, symmetrically turning the actions of the training set, and bringing the symmetrical actions of the actions in the training set into the training set. And respectively performing feature extraction, action coding and combined frequency histogram construction in a training set and a test set, then training a classifier by using training set data, and testing by using test set data to realize automatic identification of action types. The method comprises the following concrete steps:
1. After data normalization, the normalized three-dimensional coordinates of the 20 joint points are used as the space feature descriptor;
the calculation formula of the coordinates of each joint point after the coordinate system conversion is as follows:
P̂_t^i = P_t^i - P_t^0

where P_t^i(x_t, y_t, z_t) is the original spatial position of joint point i in frame t, with x_t the abscissa, y_t the ordinate and z_t the distance of the point from the camera; P_t^0(x_t, y_t, z_t) is the spatial position of the hip center in frame t; and P̂_t^i is the spatial position of the joint point after origin normalization.
2. A schematic diagram of human joint movement is shown in fig. 2. According to the joint movement characteristics, the joint points with larger rotation amplitudes among the shoulder, elbow, wrist, hip and knee are selected as the main joint points of human movement, and the limbs supporting the movement of these joints are taken as the main movable limbs. Then, based on the selected main joint points and limb parts, 28 distinctive angles are constructed as the action key angles, divided into the following two types:
1) The four main movable joint points (left elbow, right elbow, left knee and right knee) form 4 angles with their adjacent joint points; the selected angles are shown in fig. 3.
2) The 8 main movable parts (the left and right upper arms, forearms, thighs and calves) each form included angles with the three coordinate axes, giving 24 angles. The 8 selected parts are shown in fig. 4; taking the right arm as an example, fig. 5 shows the angles formed between two main movable parts and the coordinate axes.
For convenience of description, the joint points are labeled with symbols; the labeling scheme is shown in fig. 6, and the 28 limb key angles expressed in this notation are listed in Table 1. The inter-frame differences of the 28 limb key angles serve as the time feature descriptor of each posture in the action sequence.
TABLE 1: the 28 limb key angles (rendered as images in the original publication and not reproduced here).
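A sketch of the two angle types (joint identities omitted; the 4 joint angles plus 8 limbs × 3 axes = 24 axis angles give the 28 key angles):

```python
import numpy as np

AXES = np.eye(3)   # unit vectors of the X, Y and Z coordinate axes

def joint_angle(a, b, c):
    """Angle at joint b between the bone vectors b->a and b->c (radians)."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def limb_axis_angles(p_start, p_end):
    """Included angles between the limb vector and the three coordinate axes."""
    v = p_end - p_start
    cos = AXES @ v / (np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))
```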
3. Each frame of an action sequence is regarded as a data point, and the time feature and space feature of each data point are obtained as described above. First, a stable hierarchical clustering method clusters the time features and space features of the training set data separately, giving each data point a time feature label and a space feature label. In the time domain and space domain of the training set, each resulting cluster is regarded as a visual word, yielding a time bag of words and a space bag of words respectively. Then, for each data point in the test set, the average distance between its time feature and each cluster in the time bag is computed, and the feature is encoded to the cluster with the minimum average distance; the space bag is encoded in the same way. Each static posture frame in the test set thus obtains a time label and a space label. The action encoding is illustrated in fig. 7; a sketch of the bag construction and encoding follows.
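The sketch below uses scikit-learn's agglomerative (hierarchical) clustering as a stand-in for the stable hierarchical clustering named above; this library choice is an assumption, since the patent does not name an implementation:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def build_bag(train_feats, n_words):
    """Cluster training-frame features (an (N, D) array) into visual words.

    Agglomerative clustering is deterministic, avoiding the unstable random
    initialization of K-means. Returns per-frame labels and the member
    features of each cluster (needed for average-distance encoding).
    """
    labels = AgglomerativeClustering(n_clusters=n_words).fit_predict(train_feats)
    clusters = [train_feats[labels == k] for k in range(n_words)]
    return labels, clusters

def encode_frame(feat, clusters):
    """Label a test frame with the word whose members are closest on average."""
    avg_dist = [np.linalg.norm(members - feat, axis=1).mean()
                for members in clusters]
    return int(np.argmin(avg_dist))
```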
4. The occurrence frequency of each code word in the time bag and the space bag is counted for the training set and test set respectively, and a time feature histogram and a space feature histogram are constructed for each action sequence, as shown in fig. 8 and 9. The two frequency histograms are then combined into the final representation vector of the action sequence (a sketch follows);
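A sketch of the histogram construction and concatenation (normalizing by sequence length is an assumption, added so that sequences of different durations remain comparable):

```python
import numpy as np

def sequence_histogram(frame_labels, n_words):
    """Frequency histogram of word labels over one action sequence."""
    hist = np.bincount(np.asarray(frame_labels), minlength=n_words).astype(float)
    return hist / max(hist.sum(), 1.0)

def joint_histogram(time_labels, space_labels, n_time_words, n_space_words):
    """Concatenate the time and space histograms into the final vector."""
    return np.concatenate([sequence_histogram(time_labels, n_time_words),
                           sequence_histogram(space_labels, n_space_words)])
```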
5. A hierarchical classification method built from SVM classifiers categorizes the actions. First, similar actions are grouped into the same large class and each action sequence is given a first-level class label; the sequences within each large class are then subdivided and given second-level class labels. The first-layer SVM classifier is trained with the joint frequency histograms of the training set as features and the first-level classes as labels. A second-layer classifier is trained within each large class, again with the joint frequency histograms as features and the second-level classes as labels. At test time, the first-layer classifier yields the first-level class of a test sequence, and the corresponding second-layer classifier then yields its specific class, as sketched below.
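A sketch of the two-layer classification with scikit-learn SVCs (the RBF kernel and the requirement that each large class contain at least two small classes are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

def train_hierarchical(X, coarse_y, fine_y):
    """First-layer SVM for large classes, one second-layer SVM per class.

    X: (N, D) joint frequency histograms; coarse_y / fine_y: (N,) labels.
    Assumes every coarse class contains at least two distinct fine classes.
    """
    top = SVC(kernel='rbf').fit(X, coarse_y)
    sub = {}
    for c in np.unique(coarse_y):
        mask = coarse_y == c                      # sequences in this large class
        sub[c] = SVC(kernel='rbf').fit(X[mask], fine_y[mask])
    return top, sub

def predict_hierarchical(top, sub, X):
    """Route each sample through its predicted large class's sub-classifier."""
    coarse = top.predict(X)
    return np.array([sub[c].predict(x[None, :])[0] for c, x in zip(coarse, X)])
```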
In summary, the human body action recognition method based on the bag-of-words model constructs an effective time descriptor that reflects how each main angle changes over time within an action, improving recognition accuracy on reverse-order actions; it constructs separate descriptors for the time features and space features of actions, avoiding the situation where mixed spatio-temporal features cannot highlight the temporal and spatial differences between actions; it constructs stable time and space bags of words, enhancing the stability of the classification result; and its two-layer classifier solves the problem that a single-layer classifier cannot efficiently distinguish actions with large differences and actions with small differences at the same time, improving classification efficiency.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (9)

1. A human body action recognition method based on a bag-of-words model comprises the following steps:
collecting human body joint point information, and preprocessing the collected human body joint point information;
extracting motion sequence spatial features and time features according to the preprocessed joint point information, and dividing a data set into a training set and a test set;
respectively coding the time features and the space features of the training set, and counting a time feature histogram and a space feature histogram of the training set according to a coding result;
respectively coding the time features and the space features of the test set, and counting a time feature histogram and a space feature histogram of the test set according to a coding result;
obtaining a joint frequency histogram of the training set according to the time characteristic histogram and the space characteristic histogram of the training set, and obtaining a joint frequency histogram of the test set according to the time characteristic histogram of the test set and the space characteristic histogram of the test set;
and then training the classifier by using the training set, and testing the classifier by using the test set to obtain the automatic identification of the action type.
2. The human body motion recognition method based on the bag-of-words model as claimed in claim 1, wherein collecting human body joint point information, and preprocessing the collected human body joint point information specifically comprises:
collecting human body joint point information by using a Kinect device;
and sequentially carrying out origin normalization, direction normalization and scale normalization on the collected human body joint point information.
3. The human body motion recognition method based on the bag-of-words model as claimed in claim 2, characterized in that the processed three-dimensional coordinates of the joint points are used as a spatial feature descriptor;
for each frame, 28 space angles of the human joints are extracted, the inter-frame differences of the 28 angles with respect to the previous frame are calculated, and the inter-frame differences of these limb key angles of the joint points are used as the temporal feature descriptor.
4. The human body motion recognition method based on the bag-of-words model as claimed in claim 3, characterized in that the data set is divided into a training set and a test set with a 50/50 split, and the training set data is expanded;
the data in the training set are symmetrically flipped: taking the trunk of the human body as the central axis, the three-dimensional coordinates of the left-side and right-side joint points are exchanged, and the symmetric version of each original action is added to the training set.
5. The bag-of-words model-based human motion recognition method as claimed in claim 4, wherein each frame in the motion sequence is regarded as a data point, and each data point extracts a temporal feature and a spatial feature; and clustering the time characteristic and the spatial characteristic of the training set data respectively, and obtaining a time characteristic label and a spatial characteristic label for each data point.
6. The human body motion recognition method based on the bag-of-words model as claimed in claim 5, wherein constructing the temporal feature histogram and the spatial feature histogram of the training set specifically comprises the following steps:
the training set comprises a plurality of action sequences, each action sequence comprises a plurality of posture frames, the time characteristics of all the frames in the training set are used as time domain characteristics, the space characteristics are used as space domain characteristics, the time domain and the space domain are respectively clustered by utilizing a hierarchical clustering method, each cluster after clustering is regarded as a visual word, and a time bag and a space bag are respectively obtained;
after each frame in the training set obtains its time label and space label, the time labels and space labels of each action sequence are counted separately to obtain the time feature histogram and space feature histogram of each action sequence in the training set.
7. The human body motion recognition method based on the bag-of-words model as claimed in claim 6, wherein obtaining the time feature histogram and the spatial feature histogram of the test set specifically comprises the following steps:
respectively calculating the average distance from the time characteristic of each frame of data in the test set to each cluster in the time word bag, taking the cluster type with the minimum average distance as the frame time characteristic label, and obtaining a space characteristic label in the same way;
after each frame in the test set obtains the time label and the space label, respectively counting the time label and the space label of each action sequence to obtain a time characteristic histogram and a space characteristic histogram of each action sequence in the test set.
8. The human body motion recognition method based on the bag-of-words model as claimed in claim 7, wherein a spatiotemporal joint histogram of the training set is obtained according to the temporal feature histogram and the spatial feature histogram of the training set, and a spatiotemporal joint histogram of the test set is obtained according to the temporal feature histogram and the spatial feature histogram of the test set; and taking the space-time joint histogram of the action sequence as a final expression vector.
9. The human body motion recognition method based on the bag-of-words model as claimed in claim 8, wherein a hierarchical classification method is constructed with SVM classifiers to classify the actions: a first-layer classifier first sorts similar actions into large classes, a second-layer classifier then sorts them into small classes within each large class, and the final classification result is obtained.
CN202110451802.4A 2021-04-26 2021-04-26 Human body action recognition method based on bag-of-words model Active CN113392697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110451802.4A CN113392697B (en) 2021-04-26 2021-04-26 Human body action recognition method based on bag-of-words model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110451802.4A CN113392697B (en) 2021-04-26 2021-04-26 Human body action recognition method based on bag-of-words model

Publications (2)

Publication Number Publication Date
CN113392697A 2021-09-14
CN113392697B CN113392697B (en) 2024-07-09

Family

ID=77617573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110451802.4A Active CN113392697B (en) 2021-04-26 2021-04-26 Human body action recognition method based on bag-of-words model

Country Status (1)

Country Link
CN (1) CN113392697B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226713A (en) * 2013-05-16 2013-07-31 中国科学院自动化研究所 Multi-view behavior recognition method
WO2014200437A1 (en) * 2013-06-12 2014-12-18 Agency For Science, Technology And Research Method and system for human motion recognition
US20160148391A1 (en) * 2013-06-12 2016-05-26 Agency For Science, Technology And Research Method and system for human motion recognition
CN105825240A (en) * 2016-04-07 2016-08-03 浙江工业大学 Behavior identification method based on AP cluster bag of words modeling
CN106056043A (en) * 2016-05-19 2016-10-26 中国科学院自动化研究所 Animal behavior identification method and apparatus based on transfer learning
CN106840166A (en) * 2017-02-15 2017-06-13 北京大学深圳研究生院 A kind of robot localization and air navigation aid based on bag of words woodlot model
CN107203745A (en) * 2017-05-11 2017-09-26 天津大学 A kind of across visual angle action identification method based on cross-domain study
CN109508684A (en) * 2018-11-21 2019-03-22 中山大学 A kind of method of Human bodys' response in video
CN110084211A (en) * 2019-04-30 2019-08-02 苏州大学 A kind of action identification method
CN111914798A (en) * 2020-08-17 2020-11-10 四川大学 Human body behavior identification method based on skeletal joint point data

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
HONG LIU等: "Sequential Bag-of-Words model for human action classification", 《CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY》, vol. 1, no. 2, 21 October 2016 (2016-10-21), pages 125 - 136 *
PARUL SHUKLA等: "Bag-of-Features based Activity Classification using Body-joints Data", 《IN PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS》, 31 December 2015 (2015-12-31), pages 314 - 322 *
XIAOJIANG PENG等: "Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice", 《COMPUTER VISION AND IMAGE UNDERSTANDING》, vol. 150, 23 March 2016 (2016-03-23), pages 109 - 125, XP029628346, DOI: 10.1016/j.cviu.2016.03.013 *
YU LI等: "Human Action Recognition Method Based on Bag-of-Words Model", 《2023 IEEE 11TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC)》, 1 February 2024 (2024-02-01), pages 1871 - 1879 *
李建新: "Behavior recognition algorithm based on the co-occurrence relationship of local features", Journal of Hefei University of Technology (Natural Science Edition), vol. 43, no. 11, 28 November 2020 (2020-11-28), pages 1500-1505 *
李愈 et al.: "Human body action recognition method based on the bag-of-words model", Computer Applications and Software, vol. 40, no. 11, 12 November 2023 (2023-11-12), pages 170-175 *
李盛楠: "Research on human behavior recognition based on Kinect and the bag-of-words model", China Masters' Theses Full-text Database: Information Science and Technology, no. 2020, 15 July 2020 (2020-07-15), pages 138-1087 *
柳似霖 et al.: "Key-frame selection method for human action recognition based on a local-feature bag-of-words model", Journal of Applied Optics, vol. 40, no. 2, 15 March 2019 (2019-03-15), pages 265-270 *
邵延华: "Research on human behavior recognition based on computer vision", China Doctoral Dissertations Full-text Database: Information Science and Technology, no. 2016, 15 January 2016 (2016-01-15), pages 138-154 *

Also Published As

Publication number Publication date
CN113392697B (en) 2024-07-09

Similar Documents

Publication Publication Date Title
CN108052896B (en) Human body behavior identification method based on convolutional neural network and support vector machine
Ribeiro et al. Human activity recognition from video: modeling, feature selection and classification architecture
Jalal et al. Human daily activity recognition with joints plus body features representation using Kinect sensor
D’Orazio et al. Recent trends in gesture recognition: how depth data has improved classical approaches
KR20180080081A (en) Method and system for robust face dectection in wild environment based on cnn
Potdar et al. A convolutional neural network based live object recognition system as blind aid
CN105469050B (en) Video behavior recognition methods based on local space time's feature description and pyramid words tree
Haque et al. Two-handed bangla sign language recognition using principal component analysis (PCA) and KNN algorithm
Chen et al. TriViews: A general framework to use 3D depth data effectively for action recognition
CN111914643A (en) Human body action recognition method based on skeleton key point detection
CN105095880A (en) LGBP encoding-based finger multi-modal feature fusion method
Waheed et al. A novel deep learning model for understanding two-person interactions using depth sensors
Chan et al. A 3-D-point-cloud system for human-pose estimation
CN103577804A (en) Abnormal human behavior identification method based on SIFT flow and hidden conditional random fields
Chakraborty et al. View-invariant human-body detection with extension to human action recognition using component-wise HMM of body parts
Batool et al. Fundamental recognition of ADL assessments using machine learning engineering
Wali et al. Incremental learning approach for events detection from large video dataset
Cho et al. Human action recognition system based on skeleton data
Hachaj et al. Human actions modelling and recognition in low-dimensional feature space
Ma et al. Sports competition assistant system based on fuzzy big data and health exercise recognition algorithm
CN113392697A (en) Human body action recognition method based on bag-of-words model
Ramanathan et al. Combining pose-invariant kinematic features and object context features for rgb-d action recognition
CN114943873A (en) Method and device for classifying abnormal behaviors of construction site personnel
Chen et al. A Human Activity Recognition Approach Based on Skeleton Extraction and Image Reconstruction
Li et al. Pedestrian detection based on clustered poselet models and hierarchical and–or grammar

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant