CN115147933B - Human body preset behavior identification method and device, equipment terminal and storage medium - Google Patents

Human body preset behavior identification method and device, equipment terminal and storage medium

Info

Publication number
CN115147933B
CN115147933B
Authority
CN
China
Prior art keywords
preset
human body
model
classification
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211059859.0A
Other languages
Chinese (zh)
Other versions
CN115147933A
Inventor
林家辉
周有喜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Core Computing Integrated Shenzhen Technology Co ltd
Original Assignee
Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Aishen Yingtong Information Technology Co Ltd filed Critical Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority to CN202211059859.0A priority Critical patent/CN115147933B/en
Publication of CN115147933A publication Critical patent/CN115147933A/en
Application granted granted Critical
Publication of CN115147933B publication Critical patent/CN115147933B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7747Organisation of the process, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/11Hand-related biometrics; Hand pose recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The identification method comprises: marking a first preset target area of each pedestrian image in a first pedestrian image set to obtain a marked first data set; performing detection training on the first preset target area of the pedestrian based on the first data set to generate a detection model of the first preset target area of the human body; extracting, from the marked first data set, the image and marking information corresponding to the first preset target area of each pedestrian image to obtain a second pedestrian image set; marking a second preset target area of each pedestrian image in the second pedestrian image set to obtain a second data set; performing classification detection training on the human body preset behaviors based on the second data set by using a structural re-parameterization model to generate a corresponding classification model; and constructing an identification model of the human body preset behaviors from the detection model and the classification model, thereby reducing the identification cost.

Description

Human body preset behavior identification method and device, equipment terminal and storage medium
Technical Field
The application relates to the field of image processing, in particular to a method and a device for recognizing human body preset behaviors, an equipment terminal and a storage medium.
Background
Smoking or making phone calls in violation of the rules at no-smoking sites is very likely to cause serious safety hazards, for example at gas stations or other sites where smoking and open flames are strictly prohibited.
At present, sensor-based smoke detection methods generally require a smoke sensor to be installed at each specific site, so the implementation cost is high.
Disclosure of Invention
In view of this, the present application provides a method for recognizing human body preset behaviors, which reduces the economic cost of the recognition process.
A method for recognizing preset behaviors of a human body comprises the following steps:
acquiring a first pedestrian image set containing preset behaviors;
marking a first preset target area of each pedestrian image in the first pedestrian image set to obtain a marked first data set, wherein the mark comprises human body preset behavior information corresponding to pedestrians;
based on the first data set, carrying out detection training on a first preset target area of the pedestrian to generate a detection model of the first preset target area of the human body;
extracting images corresponding to the first preset target area of each pedestrian image and the marking information in the marked first data set to obtain a second pedestrian image set;
marking a second preset target area of each pedestrian image in a second pedestrian image set to obtain a second data set, wherein the second preset target area is in the first preset target area;
based on the second data set, carrying out classification detection training on the human body preset behaviors by adopting a structural re-parameterization model to generate a classification model of the human body preset behaviors;
and constructing a recognition model of the preset behaviors of the human body according to the detection model and the classification model.
In one embodiment, the structural re-parameterization model includes a backbone network, a classification branch network and a detection branch network, and the second preset target area includes the mouth area and the hand area of the human body. The step of performing classification detection training on the human body preset behaviors by using the structural re-parameterization model based on the second data set to generate the classification model of the human body preset behaviors includes:
extracting information of a second preset target area in each pedestrian image based on a backbone network to obtain semantic information of a mouth area and a hand area of the pedestrian;
respectively sending the semantic information into a classification branch network and a detection branch network, outputting a corresponding preliminary classification result through the classification branch network, and outputting a corresponding detection result through the detection branch network;
calculating a first preset loss function according to the preliminary classification result to obtain a corresponding first preset loss function value;
calculating a second preset loss function according to the detection result to obtain a corresponding second preset loss function value;
weighting the first preset loss function value and the corresponding second preset loss function value to obtain a weighted total loss value;
and obtaining an optimized gradient according to the weighted total loss value, and updating the weights and biases until the weighted loss function converges, so as to generate the classification model of the human body preset behaviors.
In one embodiment, the step of constructing the recognition model of the human body preset behaviors according to the detection model and the classification model comprises the following steps:
removing a detection branch network in the classification model to obtain a target classification model;
and constructing a recognition model of the preset behaviors of the human body according to the detection model and the target classification model.
In one embodiment, the identification method further includes:
and carrying out classification and identification of preset human behaviors on the input pedestrian image through the identification model to obtain a corresponding identification result.
In one embodiment, the first preset loss function is:

$$L_{cls}(P_t) = -\alpha \, (1 - P_t)^{\gamma} \, \log(P_t)$$

where $P_t$ represents the probability value of a positive sample, $L_{cls}(P_t)$ represents the first preset loss function value corresponding to $P_t$, $\alpha$ is the adjustment factor for the loss of positive and negative samples, and $\gamma$ is the loss weight adjustment factor.
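This first preset loss matches the focal-loss form. A minimal PyTorch sketch follows, assuming that reconstruction; the parameter names alpha and gamma stand for the two adjustment factors and are illustrative, not taken from the patent.

```python
import torch

def focal_cls_loss(p_t: torch.Tensor, alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """First preset loss in focal-loss form: L_cls(P_t) = -alpha * (1 - P_t)^gamma * log(P_t).

    p_t:   probability assigned to the ground-truth class of each sample.
    alpha: adjustment factor for the loss of positive and negative samples.
    gamma: loss weight adjustment factor (down-weights easy samples).
    """
    eps = 1e-7  # guard against log(0)
    return -(alpha * (1.0 - p_t).pow(gamma) * (p_t + eps).log()).mean()
```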
In one embodiment, the second preset loss function is:

$$L_{det} = L_{reg} + L_{GIoU}$$

where $L_{det}$ is the second preset loss function value and $L_{reg}$ is the bounding box regression loss, computed from the output value $x_n$ and the true label value $y_n$ of each sample, with $n$ the total number of samples; $GIoU$ denotes the generalized intersection-over-union, and

$$GIoU = IoU - \frac{\lvert C \setminus (A \cup B) \rvert}{\lvert C \rvert}, \qquad L_{GIoU} = 1 - GIoU$$

is the generalized intersection-over-union loss function, where $A$ is the prediction box, $B$ is the real box, and $C$ is the minimum box containing $A$ and $B$.
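The GIoU term can be made concrete with the following PyTorch sketch for axis-aligned boxes in (x1, y1, x2, y2) format; the exact form of the regression term L_reg is not recoverable from the text, so only the GIoU loss is shown.

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Generalized-IoU loss L_GIoU = 1 - GIoU for [N, 4] boxes (x1, y1, x2, y2).

    GIoU = IoU - area(C minus the union of A and B) / area(C), where A is the
    prediction box, B the real box and C the minimum box containing A and B.
    """
    # intersection of A and B
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]

    area_a = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_b = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_a + area_b - inter
    iou = inter / union.clamp(min=1e-7)

    # minimum enclosing box C
    lt_c = torch.min(pred[:, :2], target[:, :2])
    rb_c = torch.max(pred[:, 2:], target[:, 2:])
    area_c = (rb_c - lt_c).clamp(min=0).prod(dim=1)

    giou = iou - (area_c - union) / area_c.clamp(min=1e-7)
    return (1.0 - giou).mean()
```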
In one embodiment, the weighting formula corresponding to the step of weighting the first preset loss function value and the corresponding second preset loss function value to obtain the weighted total loss value is:

$$Loss = L_{cls} + a \cdot L_{det}$$

where $L_{cls}$ represents the first preset loss function value, $L_{det}$ represents the corresponding second preset loss function value, $Loss$ represents the weighted total loss value, and $a$ is a dimension weight value.
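A minimal sketch of this weighting step, reusing the two loss sketches above; the default value of a is illustrative only.

```python
import torch

def total_loss(l_cls: torch.Tensor, l_det: torch.Tensor, a: float = 1.0) -> torch.Tensor:
    """Weighted total loss: Loss = L_cls + a * L_det.

    a is the dimension weight that brings the detection loss onto the same
    scale as the classification loss before summing.
    """
    return l_cls + a * l_det

# Illustrative usage with the sketches above:
# loss = total_loss(focal_cls_loss(p_t), giou_loss(pred_boxes, gt_boxes), a=0.5)
```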
In addition, an apparatus for recognizing human body preset behaviors is provided, which includes:
the image set acquisition unit is used for acquiring a first pedestrian image set containing preset behaviors;
the first marking unit is used for marking a first preset target area of each pedestrian image in the first pedestrian image set to obtain a marked first data set, and the mark comprises human body preset behavior information corresponding to pedestrians;
the detection model generation unit is used for carrying out detection training on a first preset target area of the pedestrian based on the first data set so as to generate a detection model of the first preset target area of the human body;
the image set generating unit is used for extracting the image corresponding to the first preset target area of each pedestrian image and the mark information in the marked first data set to obtain a second pedestrian image set;
the second marking unit is used for marking a second preset target area of each pedestrian image in a second pedestrian image set to obtain a second data set, wherein the second preset target area is in the first preset target area;
the classification model generation unit is used for performing classification detection training on the human body preset behaviors based on the second data set by adopting a structural re-parameterization model, so as to generate a classification model of the human body preset behaviors;
and the recognition model generating unit is used for constructing a recognition model of the preset behaviors of the human body according to the detection model and the classification model.
In addition, a device terminal is provided, which includes a processor and a memory, wherein the memory is used for storing a computer program and the processor runs the computer program to cause the device terminal to perform the above identification method.
Furthermore, a readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the above-mentioned identification method.
In the above method for recognizing human body preset behaviors, a first pedestrian image set containing the preset behaviors is acquired; a first preset target area of each pedestrian image in the first pedestrian image set is marked to obtain a marked first data set, the mark comprising the human body preset behavior information of the corresponding pedestrian; detection training is performed on the first preset target area of the pedestrian based on the first data set to generate a detection model of the first preset target area of the human body; the image and marking information corresponding to the first preset target area of each pedestrian image are extracted from the marked first data set to obtain a second pedestrian image set; a second preset target area, located within the first preset target area, is marked in each image of the second pedestrian image set to obtain a second data set; classification detection training of the human body preset behaviors is then performed based on the second data set using a structural re-parameterization model to generate a classification model of the human body preset behaviors; and a recognition model of the human body preset behaviors is constructed from the detection model and the classification model, which reduces the cost of recognizing the preset behaviors.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic flow chart of a method for recognizing a preset behavior of a human body according to an embodiment of the present application;
fig. 2 is a flowchart illustrating a method for generating a classification model of a preset behavior of a human body according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a network structure of a classification model provided in an embodiment of the present application;
FIG. 4 is a schematic flow chart of a method for obtaining a recognition model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network structure of a recognition model provided in an embodiment of the present application;
fig. 6 is a schematic flowchart of another method for recognizing a preset human behavior according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an apparatus for recognizing a human body preset behavior according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. The embodiments described below and their technical features may be combined with each other provided there is no conflict.
Generally, smoking and open flames are strictly forbidden at gas stations, dangerous goods depots and natural gas stations, and phone calls are generally not allowed in some places with special requirements, so real-time identification of human body preset behaviors such as smoking and making phone calls has high practical application value.
As shown in fig. 1, a method for recognizing a preset behavior of a human body is provided, which includes:
step S110, a first pedestrian image set containing preset behaviors of the human body is obtained.
The specific preset behaviors can be set according to the place; for example, at a gas station the human body preset behaviors include pedestrians smoking and making phone calls.
Through the camera device, the pedestrian image containing the human body preset behaviors can be acquired in real time so as to establish a first pedestrian image set.
Step S120, marking a first preset target area of each pedestrian image in the first pedestrian image set to obtain a marked first data set, wherein the mark comprises human body preset behavior information corresponding to pedestrians.
After the first pedestrian image set is obtained, the first preset target area of each pedestrian image in the first pedestrian image set needs to be further marked, for example, when the preset behavior of the human body includes smoking and calling behaviors of pedestrians, the upper half area of each pedestrian image can be marked, and the mark includes information of the behaviors of smoking and calling of pedestrians.
Step S130, performing detection training on a first preset target region of the pedestrian based on the first data set to generate a detection model of the first preset target region of the human body.
In order to train recognition of the human body preset behaviors accurately, detection training is usually performed on the image of a larger area corresponding to the preset behaviors. For example, when the human body preset behaviors include pedestrians smoking and making phone calls, detection training may be performed on the upper-body area of each pedestrian image to generate a detection model of the upper-body area of the human body.
In one embodiment, a NanoDet-Plus model is adopted to perform detection training on the first preset target area of the pedestrian to generate the detection model of the first preset target area of the human body.
Step S140, in the marked first data set, extracting an image corresponding to the first preset target region of each pedestrian image and the marking information to obtain a second pedestrian image set.
The marked first data set is processed by extracting the image corresponding to the first preset target area of each pedestrian image together with its marking information, yielding the second pedestrian image set and laying the foundation for subsequent human body preset behavior recognition.
In one embodiment, when extracting the image corresponding to the first preset target area of each pedestrian image and the marking information, the image within the outline of the pedestrian's upper body may be extracted and the marking information from step S120 retained, to obtain the second pedestrian image set.
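As an illustration of this extraction step, the sketch below crops the marked upper-body region out of each pedestrian image and carries the behavior label over to the new sample; the box format and field names are assumptions for the sketch, not taken from the patent.

```python
from dataclasses import dataclass
from PIL import Image

@dataclass
class MarkedSample:
    image_path: str
    upper_body_box: tuple   # (x1, y1, x2, y2): the first preset target area
    behavior_label: int     # e.g. 0 = none, 1 = smoking, 2 = phone call

def build_second_image_set(first_data_set: list, out_dir: str) -> list:
    """Crop the first preset target area (upper body) from every marked
    pedestrian image and retain its behavior mark, yielding the second set."""
    second_set = []
    for i, s in enumerate(first_data_set):
        crop = Image.open(s.image_path).crop(s.upper_body_box)  # keep upper body only
        crop_path = f"{out_dir}/crop_{i}.jpg"
        crop.save(crop_path)
        second_set.append({"image_path": crop_path, "label": s.behavior_label})
    return second_set
```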
Step S150, in the second pedestrian image set, marking a second preset target area of each pedestrian image to obtain a second data set, where the second preset target area is in the first preset target area.
In an embodiment, the first predetermined target area is an upper half of a human body, and the second predetermined target area may include a face area or a hand area of the human body.
And step S160, based on the second data set, carrying out classification detection training on the human body preset behaviors by adopting the structural re-parameterization model so as to generate a classification model of the human body preset behaviors.
After the second data set is obtained, the structural re-parameterization model can be used to perform classification detection training on the human body preset behaviors so as to generate the classification model of the human body preset behaviors; here the structural re-parameterization model is a RepVGG model.
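The idea behind structural re-parameterization can be illustrated with the PyTorch sketch below: during training a block sums a 3x3 convolution, a 1x1 convolution and an identity branch, and for deployment the three branches are fused algebraically into a single 3x3 convolution. This is a simplified sketch, not the patent's code; the real RepVGG block also folds per-branch batch normalization into the fused kernel, which is omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepBlock(nn.Module):
    """RepVGG-style block: multi-branch in training, single conv when deployed."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=True)
        self.fused = None  # set by reparameterize()

    def forward(self, x):
        if self.fused is not None:                        # deploy mode
            return F.relu(self.fused(x))
        return F.relu(self.conv3(x) + self.conv1(x) + x)  # training mode

    @torch.no_grad()
    def reparameterize(self):
        c = self.conv3.out_channels
        k = self.conv3.weight.clone()                # 3x3 branch kernel
        k += F.pad(self.conv1.weight, [1, 1, 1, 1])  # 1x1 kernel placed at the centre
        for i in range(c):                           # identity branch as a 3x3 kernel
            k[i, i, 1, 1] += 1.0
        fused = nn.Conv2d(c, c, 3, padding=1, bias=True)
        fused.weight.copy_(k)
        fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        self.fused = fused
```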
And S170, constructing an identification model of the preset behaviors of the human body according to the detection model and the classification model.
According to the above method for recognizing human body preset behaviors, a detection model of the first preset target area of the human body is first generated and used to detect the first preset target area in each pedestrian image, establishing the second data set; then, based on the second data set, a structural re-parameterization model is adopted to perform classification detection training on the human body preset behaviors and generate a classification model; finally, a recognition model of the human body preset behaviors is built from the detection model and the classification model, so that recognition can be completed with the recognition model at reduced economic cost.
In one embodiment, the structural re-parameterization model includes a backbone network, a classification branch network and a detection branch network, and the second preset target area includes the mouth area and the hand area of the human body; as shown in fig. 2, step S160 includes:
step S161, performing information extraction on the second preset target area in each pedestrian image based on the backbone network to obtain semantic information of the mouth area and the hand area of the pedestrian.
The backbone network may include a plurality of RepVGG modules and is used to extract information from the second preset target area in each pedestrian image to obtain semantic information of the mouth area and hand area of the pedestrian, which is usually deep semantic information.
In one example, the semantic information of the pedestrian's mouth area includes the mouth shape when smoking, the mouth opening width when smoking, and cigarette-end information at the mouth; the semantic information of the hand area includes the hand motion posture when smoking and cigarette-end information at the hand.
And step S162, respectively sending the semantic information into a classification branch network and a detection branch network, outputting a corresponding preliminary classification result through the classification branch network, and outputting a corresponding detection result through the detection branch network.
To achieve a good training effect, a detection branch network is introduced in the training stage to assist in training the classification branch network. After the semantic information is obtained, it is sent to the classification branch network and the detection branch network respectively; the classification branch network outputs the corresponding preliminary classification result, and the detection branch network outputs the corresponding detection result.
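A minimal sketch of this training-time topology follows, reusing the illustrative RepBlock above; the layer sizes, class count and head designs are assumptions for the sketch, not the patent's architecture.

```python
import torch.nn as nn

class BehaviorClassifier(nn.Module):
    """Shared backbone, classification branch, and an auxiliary detection
    branch that is used only to assist training."""

    def __init__(self, channels: int = 64, num_classes: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1),
            RepBlock(channels),
            RepBlock(channels),
        )
        self.cls_head = nn.Sequential(              # preliminary classification result
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, num_classes),
        )
        self.det_head = nn.Conv2d(channels, 4, 1)   # per-location box regression

    def forward(self, x):
        feat = self.backbone(x)                     # semantic info of mouth/hand areas
        cls_out = self.cls_head(feat)
        det_out = self.det_head(feat) if self.det_head is not None else None
        return cls_out, det_out
```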
Step S163, performing a first preset loss function calculation according to the preliminary classification result, to obtain a corresponding first preset loss function value.
And step S164, calculating a second preset loss function according to the detection result to obtain a corresponding second preset loss function value.
Step S165, weighting the first preset loss function value and the corresponding second preset loss function value to obtain a weighted total loss value.
In one example, the human body preset behaviors include smoking and phone-call behaviors. When the pedestrian image shows the front of the pedestrian, the backbone network and the classification branch network can obtain an accurate preliminary classification result. When the image shows the pedestrian's back, however, no mouth or hand area is detected and the preliminary classification result is hard to determine. In this case the detection branch network can be used to weight (i.e., increase) the classification loss of the classification branch network, so that the overall model tends toward the large-scale human observation that most pedestrians facing away from the camera are not smoking or making phone calls.
And step S166, obtaining an optimized gradient according to the weighted total loss value, and updating the weight and the bias until the weighted loss function is converged to generate a classification model of the preset behavior of the human body.
In this embodiment, the first preset loss function value and the corresponding second preset loss function value are weighted to obtain the weighted total loss value, and the detection branch network is used to help train the classification branch network by weighting (i.e., increasing) its classification loss. This makes the overall model tend toward the observed statistics, reduces redundant detection, and improves the detection efficiency of the whole classification model.
In one embodiment, the classification model is structured as shown in FIG. 3, and includes a backbone network 210, a classification branch network 220, and a detection branch network 230.
In one embodiment, as shown in fig. 4, step S170 includes:
step S171, removing the detection branch network in the classification model to obtain the target classification model.
And S172, constructing an identification model of the preset behaviors of the human body according to the detection model and the target classification model.
In this embodiment, when the actual recognition model is constructed and deployed, the detection branch network in the classification model is removed to obtain the target classification model, and the recognition model of the human body preset behaviors is then built from the detection model and the target classification model, improving efficiency while meeting practical application requirements.
In one embodiment, for behaviors such as smoking and phone calls, the cigarette end or the phone is a small target. Usually only the behavior itself needs to be recognized, without locating the cigarette or phone; the classification branch network alone can accomplish this, so the detection branch network in the classification model can be removed.
Further, after the detection branch network is removed, the memory occupied by the whole model is reduced and the inference time is shortened, enabling real-time operation on an embedded camera and overcoming the difficulty of deploying conventional target detection algorithms, such as YOLO-based detectors, directly on low-compute cameras for real-time detection.
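In code terms, pruning the auxiliary branch and fusing the re-parameterizable blocks for deployment could look like the short sketch below, continuing the illustrative BehaviorClassifier above.

```python
model = BehaviorClassifier()
# ... train with both branches using the weighted total loss ...

model.det_head = None                    # remove the detection branch network
for m in model.backbone.modules():
    if isinstance(m, RepBlock):
        m.reparameterize()               # fuse multi-branch blocks into single 3x3 convs
model.eval()                             # target classification model, ready to deploy
```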
In one embodiment, the structure of the recognition model is shown in FIG. 5, and includes a detection model 240, a backbone network 210, and a classification branch network 220.
In one embodiment, as shown in fig. 6, the above identification method further includes:
and S180, performing classification recognition of the preset human body behaviors on the input pedestrian image through the recognition model to obtain a corresponding recognition result.
The input pedestrian image is first detected by the detection model, which outputs the detection result of the first preset target area of the pedestrian; the detection result is then input into the target classification model, which classifies and recognizes the preset behaviors in the second preset target area to obtain the corresponding recognition result.
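Put together, the two-stage recognition flow described here could look like the following sketch; the detector helper, the preprocess transform and the box format are hypothetical stand-ins, not the patent's API.

```python
import torch
from PIL import Image

@torch.no_grad()
def recognize(image: Image.Image, detector, classifier, preprocess) -> list:
    """Two-stage recognition: detect the first preset target area (upper body),
    then classify the preset behavior inside it."""
    results = []
    for box in detector(image):                # boxes of first preset target areas
        crop = image.crop(box)                 # second-stage input
        logits, _ = classifier(preprocess(crop).unsqueeze(0))
        results.append({"box": box, "behavior": int(logits.argmax(dim=1))})
    return results
```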
In one embodiment, the first preset loss function is:

$$L_{cls}(P_t) = -\alpha \, (1 - P_t)^{\gamma} \, \log(P_t)$$

where $P_t$ represents the probability value of a positive sample, $L_{cls}(P_t)$ represents the first preset loss function value corresponding to $P_t$, $\alpha$ is the adjustment factor for the loss of positive and negative samples, and $\gamma$ is the loss weight adjustment factor.
In one embodiment, the second preset loss function is:

$$L_{det} = L_{reg} + L_{GIoU}$$

where $L_{det}$ is the second preset loss function value and $L_{reg}$ is the bounding box regression loss, computed from the output value $x_n$ and the true label value $y_n$ of each sample, with $n$ the total number of samples; $GIoU$ denotes the generalized intersection-over-union, and

$$GIoU = IoU - \frac{\lvert C \setminus (A \cup B) \rvert}{\lvert C \rvert}, \qquad L_{GIoU} = 1 - GIoU$$

is the generalized intersection-over-union loss function, where $A$ is the prediction box, $B$ is the real box, and $C$ is the minimum box containing $A$ and $B$.
In one embodiment, the weighting formula in the step of weighting the first preset loss function value and the corresponding second preset loss function value to obtain the weighted total loss value is:

$$Loss = L_{cls} + a \cdot L_{det}$$

where $L_{cls}$ represents the first preset loss function value, $L_{det}$ represents the corresponding second preset loss function value, $Loss$ represents the weighted total loss value, and $a$ is a dimension weight value.
In this embodiment, $a$ is a dimension weight value, so that the classification loss and the detection loss can be brought to the same dimension and balanced according to the actual task.
In addition, as shown in fig. 7, there is also provided an apparatus 300 for recognizing a preset behavior of a human body, including:
an image set acquisition unit 310, configured to acquire a first pedestrian image set including a preset behavior;
the first marking unit 320 is configured to mark a first preset target area of each pedestrian image in the first pedestrian image set to obtain a marked first data set, where the mark includes human body preset behavior information corresponding to a pedestrian;
the detection model generation unit 330 is configured to perform detection training on a first preset target region of a pedestrian based on the first data set to generate a detection model of the first preset target region of the human body;
the image set generating unit 340 is configured to extract, in the marked first data set, an image corresponding to the first preset target region of each pedestrian image and the marking information to obtain a second pedestrian image set;
a second marking unit 350, configured to mark, in a second pedestrian image set, a second preset target region of each pedestrian image to obtain a second data set, where the second preset target region is in the first preset target region;
a classification model generation unit 360, configured to perform classification detection training on the human body preset behaviors by using a structural re-parameterization model based on the second data set, so as to generate a classification model of the human body preset behaviors;
and a recognition model generation unit 370, configured to construct a recognition model of the human body preset behaviors according to the detection model and the classification model.
In addition, a device terminal is provided, which includes a processor and a memory, wherein the memory is used for storing a computer program and the processor runs the computer program to cause the device terminal to perform the above identification method.
Furthermore, a readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the above-mentioned identification method.
The division of the units in the recognition apparatus 300 is only for illustration, and in other embodiments, the recognition apparatus 300 may be divided into different units as needed to complete all or part of the functions of the recognition apparatus 300. For the specific limitations of the above-mentioned identification apparatus 300, reference may be made to the limitations of the above-mentioned methods, which are not described herein again.
The above description is only an embodiment of the present application and is not intended to limit its scope; all equivalent structures or equivalent flow transformations made using the contents of the specification and drawings, such as combinations of technical features between the embodiments or direct or indirect application to other related technical fields, fall within the scope of the present application.
In addition, structural elements having the same or similar characteristics may be identified by the same or different reference numerals. The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features; a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In this application, the word "for example" is used to mean "serving as an example, instance, or illustration. Any embodiment described herein as "for example" is not necessarily to be construed as preferred or advantageous over other embodiments. The previous description is provided to enable any person skilled in the art to make and use the present application. In the foregoing description, various details have been set forth for the purpose of explanation.
It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes are not shown in detail to avoid obscuring the description of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Claims (8)

1. A method for recognizing preset behaviors of a human body is characterized by comprising the following steps:
acquiring a first pedestrian image set containing the preset behaviors of the human body;
marking a first preset target area of each pedestrian image in the first pedestrian image set to obtain a marked first data set, wherein the mark comprises human body preset behavior information corresponding to pedestrians;
based on the first data set, carrying out detection training on a first preset target area of the pedestrian to generate a detection model of the first preset target area of the human body;
extracting images corresponding to the first preset target area of each pedestrian image and the marking information in the marked first data set to obtain a second pedestrian image set;
marking a second preset target area of each pedestrian image in the second pedestrian image set to obtain a second data set, wherein the second preset target area is in the first preset target area;
based on the second data set, carrying out classification detection training on the human body preset behaviors by adopting a structural re-parameterization model so as to generate a classification model of the human body preset behaviors;
constructing an identification model of the human body preset behavior according to the detection model and the classification model;
the structural re-parameterization model comprises a backbone network, a classification branch network and a detection branch network, the second preset target area comprises a mouth area of a human body and a hand area of the human body, and the step of performing classification detection training on the human body preset behaviors by adopting the structural re-parameterization model based on the second data set so as to generate the classification model of the human body preset behaviors comprises the following steps:
extracting information of a second preset target area in each pedestrian image based on the backbone network to obtain semantic information of a mouth area and a hand area of the pedestrian;
the semantic information is respectively sent to the classification branch network and the detection branch network, corresponding preliminary classification results are output through the classification branch network, and corresponding detection results are output through the detection branch network;
performing first preset loss function calculation according to the preliminary classification result to obtain a corresponding first preset loss function value;
calculating a second preset loss function according to the detection result to obtain a corresponding second preset loss function value;
weighting the first preset loss function value and the corresponding second preset loss function value to obtain a weighted total loss value;
obtaining an optimized gradient according to the weighted total loss value, and performing weight and bias updating until the weighted loss function converges to generate a classification model of the human body preset behavior;
the step of constructing the recognition model of the human body preset behavior according to the detection model and the classification model comprises the following steps:
removing the detection branch network in the classification model to obtain a target classification model;
and constructing an identification model of the preset human body behavior according to the detection model and the target classification model.
2. The identification method according to claim 1, further comprising:
and carrying out classification recognition of the preset human body behaviors on the input pedestrian image through the recognition model to obtain a corresponding recognition result.
3. The identification method according to claim 1, wherein the first preset loss function is:

$$L_{cls}(P_t) = -\alpha \, (1 - P_t)^{\gamma} \, \log(P_t)$$

wherein $P_t$ represents the probability value of a positive sample, $L_{cls}(P_t)$ represents the first preset loss function value corresponding to $P_t$, $\alpha$ is the adjustment factor for the loss of positive and negative samples, and $\gamma$ is the loss weight adjustment factor.
4. The identification method according to claim 1, wherein the second preset loss function is:

$$L_{det} = L_{reg} + L_{GIoU}$$

wherein $L_{det}$ is the second preset loss function value and $L_{reg}$ is the bounding box regression loss, computed from the output value $x_n$ and the true label value $y_n$ of each sample, with $n$ the total number of samples; $GIoU$ denotes the generalized intersection-over-union, and

$$GIoU = IoU - \frac{\lvert C \setminus (A \cup B) \rvert}{\lvert C \rvert}, \qquad L_{GIoU} = 1 - GIoU$$

is the generalized intersection-over-union loss function, wherein $A$ is the prediction box, $B$ is the real box, and $C$ is the minimum box containing $A$ and $B$.
5. The method according to claim 1, wherein the step of weighting the first preset loss function value and the corresponding second preset loss function value to obtain the weighted total loss value corresponds to the weighting formula:

$$Loss = L_{cls} + a \cdot L_{det}$$

wherein $L_{cls}$ represents the first preset loss function value, $L_{det}$ represents the corresponding second preset loss function value, $Loss$ represents the weighted total loss value, and $a$ is a dimension weight value.
6. An apparatus for recognizing human body preset behaviors, characterized by comprising:
the image set acquisition unit is used for acquiring a first pedestrian image set containing the preset human body behaviors;
the first marking unit is used for marking a first preset target area of each pedestrian image in the first pedestrian image set to obtain a marked first data set, and the mark comprises human body preset behavior information corresponding to pedestrians;
the detection model generation unit is used for carrying out detection training on a first preset target area of the pedestrian based on the first data set so as to generate a detection model of the first preset target area of the human body;
the image set generating unit is used for extracting the image corresponding to the first preset target area of each pedestrian image and the mark information in the marked first data set to obtain a second pedestrian image set;
a second marking unit, configured to mark, in the second pedestrian image set, a second preset target area of each pedestrian image to obtain a second data set, where the second preset target area is in the first preset target area;
a classification model generation unit, configured to perform classification detection training on the human body preset behaviors by using a structural re-parameterization model based on the second data set, so as to generate a classification model of the human body preset behaviors;
the structural re-parameterization model comprises a backbone network, a classification branch network and a detection branch network, the second preset target area comprises a mouth area of a human body and a hand area of the human body, and the classification model generation unit comprises:
a semantic information generation subunit, configured to extract information from the second preset target area in each pedestrian image based on the backbone network, so as to obtain semantic information of the mouth area and the hand area of the pedestrian;
a result output subunit, configured to send the semantic information into the classification branch network and the detection branch network respectively, output a corresponding preliminary classification result through the classification branch network, and output a corresponding detection result through the detection branch network;
a first function value calculation subunit, configured to calculate a first preset loss function according to the preliminary classification result to obtain a corresponding first preset loss function value;
a second function value calculation subunit, configured to calculate a second preset loss function according to the detection result to obtain a corresponding second preset loss function value;
a total loss value calculation subunit, configured to weight the first preset loss function value and the corresponding second preset loss function value to obtain a weighted total loss value;
a classification model generation subunit, configured to obtain an optimized gradient according to the weighted total loss value, and perform weight and bias updating until the weighted loss function converges to generate a classification model of the preset behavior of the human body;
and the recognition model generation unit is used for removing the detection branch network in the classification model to obtain a target classification model, and constructing the recognition model of the human body preset behavior according to the detection model and the target classification model.
7. A device terminal, characterized in that it comprises a processor and a memory for storing a computer program, the processor running the computer program to cause the device terminal to perform the identification method of any one of claims 1 to 5.
8. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the identification method of any one of claims 1 to 5.
CN202211059859.0A 2022-09-01 2022-09-01 Human body preset behavior identification method and device, equipment terminal and storage medium Active CN115147933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211059859.0A CN115147933B (en) 2022-09-01 2022-09-01 Human body preset behavior identification method and device, equipment terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211059859.0A CN115147933B (en) 2022-09-01 2022-09-01 Human body preset behavior identification method and device, equipment terminal and storage medium

Publications (2)

Publication Number Publication Date
CN115147933A CN115147933A (en) 2022-10-04
CN115147933B (en) 2023-01-17

Family

ID=83415963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211059859.0A Active CN115147933B (en) 2022-09-01 2022-09-01 Human body preset behavior identification method and device, equipment terminal and storage medium

Country Status (1)

Country Link
CN (1) CN115147933B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108615237B (en) * 2018-05-08 2021-09-07 上海商汤智能科技有限公司 Lung image processing method and image processing equipment
CN109446911B (en) * 2018-09-28 2021-08-06 北京陌上花科技有限公司 Image detection method and system
CN112115775B (en) * 2020-08-07 2024-06-07 北京工业大学 Smoke sucking behavior detection method based on computer vision under monitoring scene
CN113538390B (en) * 2021-07-23 2023-05-09 仲恺农业工程学院 Quick identification method for shaddock diseases and insect pests
CN114240878A (en) * 2021-12-16 2022-03-25 国网河南省电力公司电力科学研究院 Routing inspection scene-oriented insulator defect detection neural network construction and optimization method

Also Published As

Publication number Publication date
CN115147933A (en) 2022-10-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230712

Address after: 13C-18, Caihong Building, Caihong Xindu, No. 3002, Caitian South Road, Gangsha Community, Futian Street, Futian District, Shenzhen, Guangdong 518033

Patentee after: Core Computing Integrated (Shenzhen) Technology Co.,Ltd.

Address before: 518000 1001, building G3, TCL International e city, Shuguang community, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen Aishen Yingtong Information Technology Co.,Ltd.