CN116682141A - Multi-label pedestrian attribute identification method and medium based on multi-scale progressive perception

Info

Publication number: CN116682141A
Application number: CN202310657643.2A
Authority: CN (China)
Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 陈婷婷, 陈明明, 杨光, 林国凤, 张勤, 黄智财, 薛鹏辉, 郭泽扬
Current and original assignee: Xiamen Huaxia University
Application filed by: Xiamen Huaxia University

Classifications

    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Learning methods
    • G06V 10/40: Extraction of image or video features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • Y02T 10/40: Engine management systems


Abstract

The invention discloses a multi-label pedestrian attribute identification method based on multi-scale progressive perception, comprising the following steps: inputting a pedestrian image into a backbone network and extracting features through a plurality of residual convolution blocks of the backbone network to obtain attribute feature information; constructing and training a plurality of multi-scale progressive perception models; embedding the trained multi-scale progressive perception models into the backbone network, and feeding the attribute feature information output by each target residual convolution block into its corresponding multi-scale progressive perception model to obtain multi-scale feature information; processing the multi-scale feature information with a global average pooling layer and sending it to a first attribute prediction layer for attribute probability prediction; and using the first attribute prediction layer corresponding to each later multi-scale progressive perception model to progressively constrain the first attribute prediction layer corresponding to the preceding multi-scale progressive perception model. The invention also discloses a computer-readable storage medium. The method improves the feature robustness over the whole pedestrian attribute area.

Description

Multi-label pedestrian attribute identification method and medium based on multi-scale progressive perception
Technical Field
The invention relates to the technical field of pedestrian attribute identification, in particular to a multi-label pedestrian attribute identification method and medium based on multi-scale progressive perception.
Background
Pedestrian attribute recognition aims to identify multiple attributes (e.g., long hair, business wear, leather shoes, glasses, age, gender, etc.) in a single pedestrian image. With the rapid development of surveillance technology, a large number of monitoring systems have been deployed in public places. Pedestrian attribute identification, a technique for acquiring semantic attribute information about a specific target, has therefore attracted growing attention in recent years and has become a key technology in video surveillance applications. It is also increasingly the primary means of facilitating pedestrian re-identification and pedestrian retrieval research. However, despite many years of effort, pedestrian attribute recognition remains challenging, because pedestrian poses, viewpoints, illumination changes, imperfect pedestrian detection, occlusion, and similar factors all affect the recognition results.
In the past few years, many approaches have demonstrated their effectiveness on the pedestrian attribute recognition task. Unlike a conventional image classification task, where each image belongs to a single category, a pedestrian image typically carries multiple attribute labels that must be classified jointly. Pedestrian attribute recognition is therefore treated as a multi-label task: to predict the presence of a particular attribute, the region where that attribute appears must be located.
Existing methods use a backbone network to extract features, append a linear classification layer containing multiple binary classifiers, and predict the pedestrian attributes under the constraint of a binary cross-entropy loss function. However, these methods ignore that the appearance and position of the same attribute change with pedestrian pose, so that the attribute presents itself in different visual forms; a backbone that learns only global feature information cannot cope with such intra-class attribute variation. For example, when a person carries a backpack on the left side and faces the camera, the backpack appears on the left of the image; when the pedestrian faces away from the camera, it appears on the right. Moreover, the backbone applies the same convolution kernels across the whole image to extract features, so local pedestrian attribute regions are often overlooked. Because previous methods focus on global information and lack learning of local feature information, the extracted features cannot be robust to all attributes.
Disclosure of Invention
In view of the above, the present invention aims to provide a multi-label pedestrian attribute recognition method based on multi-scale progressive perception, which embeds a multi-scale progressive perception model into a backbone network for feature learning of local regions. The multi-scale progressive perception model proposed by the invention can be applied to various backbone networks to strengthen the learning of local attribute feature information by existing pedestrian attribute methods. In addition, for feature information at different scales, a dynamic aggregation strategy is used to combine the various features so as to improve the feature robustness over the whole pedestrian attribute area.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
the invention provides a multi-label pedestrian attribute identification method based on multi-scale progressive perception, which comprises the following steps:
step 1, inputting a pedestrian image into a backbone network, and performing feature extraction through a plurality of residual convolution blocks of the backbone network to obtain the attribute feature information output by each residual convolution block;
step 2, constructing a plurality of multi-scale progressive perception models, and training each multi-scale progressive perception model;
step 3, embedding a plurality of trained multi-scale progressive perception models into the backbone network, and inputting attribute characteristic information output by a target residual convolution block into the corresponding multi-scale progressive perception model to obtain multi-scale characteristic information;
step 4, after the multi-scale characteristic information is processed by the global average pooling layer, the multi-scale characteristic information is sent to the first attribute prediction layer to predict attribute probability;
and step 5, using the first attribute prediction layer corresponding to each later multi-scale progressive perception model to progressively constrain the first attribute prediction layer corresponding to the preceding multi-scale progressive perception model.
Further, the step 1 specifically includes:
step 11, the i-th pedestrian image x_i in the acquired pedestrian data set D is taken as the input of the backbone network; the pedestrian attribute label corresponding to x_i is defined as y_i ∈ {0,1}^M, where M is the number of pedestrian attribute categories, 0 indicates that a pedestrian attribute is absent, and 1 indicates that it is present;
step 12, the backbone network comprises l residual convolution blocks which are sequentially connected, the pedestrian image is used as the input of the 1 st residual convolution block, and the output of the current residual convolution block is used as the input of the next residual convolution block;
step 13, the pedestrian image x_i undergoes feature extraction through the residual convolution blocks of the backbone network to obtain the corresponding attribute feature information, expressed as:
F_l = B_l(x | θ_1, …, θ_l)    (1)
where F_l denotes the attribute feature information output by the l-th residual convolution block, B_l denotes the composition of the 1st through l-th residual convolution blocks in the backbone network, and θ_1, …, θ_l are the training parameters of the 1st through l-th residual convolution blocks.
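Equation (1) can be illustrated with a minimal PyTorch sketch. This is an illustration only, not the patent's code: the block structure, channel widths, and input size are assumptions (the embodiment itself uses ResNet50); the sketch only shows how the per-block outputs F_1 … F_l are collected.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block standing in for one backbone stage."""
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride=2, padding=1),
            nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout))
        self.skip = nn.Conv2d(cin, cout, 1, stride=2)  # match shape for the residual sum
    def forward(self, x):
        return torch.relu(self.conv(x) + self.skip(x))

class Backbone(nn.Module):
    """Chain of l residual blocks; forward returns every block's output F_1..F_l."""
    def __init__(self, channels=(3, 64, 128, 256, 512)):
        super().__init__()
        self.blocks = nn.ModuleList(
            ResidualBlock(cin, cout) for cin, cout in zip(channels, channels[1:]))
    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)          # output of block p becomes input of block p+1 (step 12)
            feats.append(x)
        return feats

feats = Backbone()(torch.randn(2, 3, 256, 128))  # a batch of 256x128 pedestrian crops
```

Each element of `feats` corresponds to one F_l of Eq. (1); later steps tap these intermediate outputs rather than only the final one.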
Further, the step 2 specifically includes:
step 21, constructing a plurality of multi-scale progressive perception models;
step 22, training each multi-scale progressive perception model by using a binary cross entropy as a loss function;
the expression of the loss function is:
wherein L is bce Representing a loss function, N and M representing data amounts, i representing numbers of pedestrian image sheets, j representing numbers of pedestrian attributes,representing the ith pedestrian image x i Sending the image x of the ith pedestrian into a multi-scale progressive perception model i The model of the jth pedestrian attribute predicts a probability value; y is i,j Jth pedestrian attribute tag value, ω, representing ith pedestrian image j Representing an imbalance suppression factor; log represents a logarithmic function, σ represents an activation function, e represents an index, r j Representing the positive sample proportion of the jth pedestrian attribute in the training set.
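The weighted binary cross-entropy described above can be written out in plain Python. This is a sketch under the stated definitions; the function names are our own.

```python
import math

def sigmoid(z):
    """Logistic activation sigma(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def weighted_bce(logits, labels, pos_ratio):
    """Weighted BCE of Eq. (2).

    logits, labels: N x M nested lists (raw scores and 0/1 labels);
    pos_ratio: length-M list of r_j, the positive-sample ratio per attribute.
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        for j, r in enumerate(pos_ratio):
            p = sigmoid(logits[i][j])
            y = labels[i][j]
            # imbalance suppression factor: e^(1-r_j) for positives, e^(r_j) for negatives
            omega = y * math.exp(1.0 - r) + (1.0 - y) * math.exp(r)
            total += omega * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return -total / n

# one image, one attribute, logit 0 -> sigma = 0.5, so loss = e^0.5 * ln 2
loss = weighted_bce([[0.0]], [[1]], [0.5])
```

Rare attributes (small r_j) thus receive a larger positive weight e^(1−r_j), counteracting label imbalance.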
Further, the step 3 specifically includes:
step 31, taking the 1st through (l−1)-th residual convolution blocks in the backbone network as target residual convolution blocks, and setting the number of multi-scale progressive perception models according to the number of target residual convolution blocks;
step 32, embedding a plurality of trained multi-scale progressive perception models into the backbone network;
step 33, the output F_p of the p-th target residual convolution block is input into the p-th multi-scale progressive perception model to obtain the p-th scale feature information, where p is a positive integer and 1 ≤ p ≤ l − 1.
Further, the step 33 specifically includes:
step 331, taking attribute characteristic information output by each target residual convolution block in the backbone network as input of a corresponding multi-scale progressive perception model;
step 332, in the multi-scale progressive perception model, sending the attribute characteristic information into a dimension-reducing convolution layer for dimension-reducing operation;
step 333, the dimension-reduced attribute feature information is fed into a plurality of branches, each of which applies a different convolution kernel to extract features at a different scale, yielding different first scale features;
step 334, the first scale feature extracted from the convolution kernel in each branch is subjected to a full connection layer to adjust the dimension of the first scale feature, and then nonlinear processing is performed through an activation function to obtain a second scale feature;
step 335, multiplying the second scale feature and the first scale feature in each branch as the output feature of the branch;
step 336, adding the output features of the multiple branches to obtain multi-scale feature information.
Further, the dimension-reducing convolution layer adopts a 1x1 convolution kernel.
Further, three branches are provided, each equipped with a different convolution kernel: a 3x3 convolution kernel, a 5x5 convolution kernel, and a 7x7 convolution kernel, respectively.
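Steps 331-336 together with these kernel choices can be sketched as a single PyTorch module. This is a hedged illustration: the reduced channel width and the exact wiring of the FC gate (here computed from globally pooled branch features) are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

def stacked3x3(ch, n):
    """n stacked 3x3 convs: n=1 -> 3x3, n=2 -> 5x5, n=3 -> 7x7 receptive field."""
    return nn.Sequential(*[nn.Conv2d(ch, ch, 3, padding=1) for _ in range(n)])

class MSPPBlock(nn.Module):
    """One multi-scale progressive perception module (sketch)."""
    def __init__(self, cin, cmid=64):
        super().__init__()
        self.reduce = nn.Conv2d(cin, cmid, 1)   # step 332: 1x1 dimension-reducing conv
        self.branches = nn.ModuleList(stacked3x3(cmid, n) for n in (1, 2, 3))
        self.gates = nn.ModuleList(nn.Linear(cmid, cmid) for _ in range(3))

    def forward(self, x):
        x = self.reduce(x)
        out = 0
        for branch, fc in zip(self.branches, self.gates):
            f1 = branch(x)                              # step 333: first scale feature
            g = torch.sigmoid(fc(f1.mean(dim=(2, 3))))  # step 334: FC + Sigmoid -> second scale feature
            out = out + f1 * g[:, :, None, None]        # steps 335-336: multiply, then sum branches
        return out

y = MSPPBlock(256)(torch.randn(2, 256, 16, 8))  # e.g. features from one backbone stage
```

Since each 3x3 conv uses padding 1, all branches preserve the spatial size, so their outputs can be summed directly into the multi-scale feature information.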
Further, in step 5, the first attribute prediction layer corresponding to each later multi-scale progressive perception model progressively constrains, via the L2 norm, the first attribute prediction layer corresponding to the preceding multi-scale progressive perception model.
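One way to realize this L2-norm progressive constraint, under the interpretation that only the earlier prediction layer is pulled toward the later one, is sketched below; the layer sizes and the detach choice are assumptions, not the patent's verbatim formulation.

```python
import torch
import torch.nn as nn

def progressive_l2(pred_layers):
    """Sum of L2 distances between adjacent first attribute prediction layers.

    pred_layers: list of nn.Linear, ordered from earlier to later perception models.
    The later layer is detached so the gradient only updates the earlier layer,
    aligning its optimization direction with the later model's.
    """
    penalty = torch.zeros(())
    for prev, nxt in zip(pred_layers, pred_layers[1:]):
        penalty = penalty + (prev.weight - nxt.weight.detach()).norm(p=2)
    return penalty

layers = [nn.Linear(64, 26) for _ in range(3)]  # e.g. M = 26 attributes (assumed)
reg = progressive_l2(layers)                    # add this to the training loss
```

The penalty would be added to the BCE loss during training; at inference it plays no role, which is consistent with the plug-and-play claim.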
Further, after the step 5, the method further includes:
and 6, after the attribute characteristic information output by the last residual convolution block is processed by the global average pooling layer, the attribute characteristic information is sent to a second attribute prediction layer, and the output result of the second attribute prediction layer is used as a final pedestrian attribute identification result.
The invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the multi-tag pedestrian attribute identification method based on multi-scale progressive perception as described above.
By adopting the technical scheme, compared with the prior art, the invention has the beneficial effects that:
the invention introduces a scale progressive perception model, which is embedded into a backbone network thereof and is used for learning the characteristics of a local area. The scale progressive perception model provided by the invention can be applied to various backbone networks to promote the characteristic information learning of the local attribute by the existing pedestrian attribute method. In addition, for the characteristic information of different scales, a dynamic aggregation strategy is used for combining various characteristics so as to improve the characteristic robustness of the whole pedestrian attribute area. In addition, the multi-scale progressive perception model provided by the invention is plug and play, no extra calculation cost is generated during reasoning, and experiments on a plurality of data sets prove that the proposed method can bring about remarkable performance improvement.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a multi-tag pedestrian attribute identification method based on multi-scale progressive sensing according to an embodiment of the present invention.
Fig. 2 is a frame diagram of a multi-scale progressive perception model provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of a computer readable storage medium according to an embodiment of the present invention.
The reference numerals in the figures illustrate:
the multi-scale progressive perception model comprises a residual convolution block 1, a multi-scale progressive perception model 2, a global average pooling layer 3, a first attribute prediction layer 4 and a second attribute prediction layer 5.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is specifically noted that the following examples are only for illustrating the present invention, but do not limit the scope of the present invention. Likewise, the following examples are only some, but not all, of the examples of the present invention, and all other examples, which a person of ordinary skill in the art would obtain without making any inventive effort, are within the scope of the present invention.
Referring to fig. 1 and 2, the multi-label pedestrian attribute identification method based on multi-scale progressive perception of the invention comprises the following steps:
step 1, inputting a pedestrian image into a backbone network (ResNet50, which in this embodiment comprises 4 residual convolution blocks 1), and performing feature extraction through the residual convolution blocks 1 of the backbone network to obtain the attribute feature information output by each residual convolution block 1;
in this embodiment, the step 1 specifically includes:
step 11, the i-th pedestrian image x_i in the acquired pedestrian data set D is taken as the input of the backbone network; the pedestrian attribute label corresponding to x_i is defined as y_i ∈ {0,1}^M, where M is the number of pedestrian attribute categories, 0 indicates that a pedestrian attribute is absent, and 1 indicates that it is present;
step 12, the backbone network comprises l residual convolution blocks 1 which are sequentially connected, the pedestrian image is used as the input of the 1 st residual convolution block 1, and the output of the current residual convolution block 1 is used as the input of the next residual convolution block 1;
step 13, the pedestrian image x_i undergoes feature extraction through the residual convolution blocks 1 of the backbone network to obtain the corresponding attribute feature information, expressed as:
F_l = B_l(x | θ_1, …, θ_l)    (1)
where F_l denotes the attribute feature information output by the l-th residual convolution block 1, B_l denotes the composition of the 1st through l-th residual convolution blocks 1 in the backbone network, and θ_1, …, θ_l are the training parameters of the 1st through l-th residual convolution blocks 1.
Step 2, constructing a plurality of multi-scale progressive perception models 2, and training each multi-scale progressive perception model 2;
in this embodiment, the step 2 specifically includes:
step 21, constructing a plurality of multi-scale progressive perception models 2;
step 22, the whole pedestrian attribute identification problem is regarded as a multi-label classification task, binary cross entropy (BCELoss) is adopted as a loss function, and each multi-scale progressive perception model 2 is trained through the loss function;
the expression of the loss function is:
wherein L is bce Representing a loss function, N and M representing data amounts, i representing numbers of pedestrian image sheets, j representing numbers of pedestrian attributes,representing the ith pedestrian image x i Feeding the image into a multi-scale progressive perception model 2, and aiming at an ith pedestrian image x i The model of the jth pedestrian attribute predicts a probability value; y is i,j Jth pedestrian attribute tag value, ω, representing ith pedestrian image j Representing an imbalance suppression factor; log represents a logarithmic function, σ represents an activation function, e represents an index, r j Representing the positive sample proportion of the jth pedestrian attribute in the training set.
Step 3, embedding a plurality of trained multi-scale progressive perception models 2 into the backbone network, and inputting attribute characteristic information output by the target residual convolution block 1 into the corresponding multi-scale progressive perception models 2 to obtain multi-scale characteristic information;
in this embodiment, the step 3 specifically includes:
step 31, taking the 1st through (l−1)-th residual convolution blocks 1 in the backbone network as target residual convolution blocks 1, and setting the number of multi-scale progressive perception models according to the number of target residual convolution blocks 1;
step 32, embedding a plurality of trained multi-scale progressive perception models 2 into the backbone network; for feature learning of a local region;
step 33, the output F_p of the p-th target residual convolution block 1 is input into the p-th multi-scale progressive perception model 2 to obtain the p-th scale feature information, where p is a positive integer and 1 ≤ p ≤ l − 1. This serves to extract features at different scales and push the network to learn local attribute region features.
In this embodiment, the step 33 specifically includes:
step 331, taking attribute characteristic information output by each target residual convolution block 1 in the backbone network as input of a corresponding multi-scale progressive perception model 2;
step 332, in the multi-scale progressive perception model 2, sending the attribute characteristic information into a dimension-reducing convolution layer for dimension-reducing operation; in this embodiment, the dimension-reducing convolution layer uses a 1×1 convolution kernel (conv), and the purpose of this step is to reduce the feature dimension to reduce the calculation amount.
Step 333, the dimension-reduced attribute feature information is fed into a plurality of branches, each of which applies a different convolution kernel to extract features at a different scale, yielding different first scale features;
in this embodiment, the branches are provided with 3, and 3 branches are respectively provided with 3 different convolution kernels, and respectively adopt: a 3x3 convolution kernel (left-most branch of fig. 2), a 5x5 convolution kernel (middle branch of fig. 2), and a 7x7 convolution kernel (right-most branch of fig. 2), wherein the 5x5 convolution kernel is constructed using two 3x3 convolution kernels and the 7x7 convolution kernel is constructed using three 3x3 convolution kernels. Two 3x3 convolution kernels are used for each of the 3 different convolution kernels in order to reduce the number of parameters to achieve light weight. The input features are respectively subjected to 3x3, 5x5 and 7x7 to extract features with different scales, so that regional features with different pedestrian attributes are covered, and the extracted features have the most characterization force.
Step 334, the first scale feature extracted from the convolution kernel in each branch is subjected to a full connection layer (FC layer) to adjust the dimension of the first scale feature, and then nonlinear processing (adding nonlinearity) is performed through an activation function (Sigmoid function) to obtain a second scale feature;
step 335, multiplying the second scale feature and the first scale feature in each branch as the output feature of the branch;
step 336, adding the output features of the multiple branches to obtain multi-scale feature information. The characterization of the local feature information is increased through the acquisition of the features with different scales, so that the robustness of the features is enriched.
Step 4, after the multi-scale characteristic information is processed by a global average pooling layer 3 (GAP), the multi-scale characteristic information is sent to a first attribute prediction layer 4 for attribute probability prediction;
and 5, performing progressive constraint on the first attribute prediction layer 4 corresponding to the previous multi-scale progressive perception model 2 by the first attribute prediction layer 4 corresponding to the next multi-scale progressive perception model 2. The purpose of this step is to further constrain the parametric training of the model.
In this embodiment, step 5 uses the L2 norm so that the first attribute prediction layer 4 corresponding to each later multi-scale progressive perception model 2 progressively constrains the first attribute prediction layer 4 corresponding to the preceding multi-scale progressive perception model 2. To prevent the parameters of the earlier convolution blocks from failing to be updated in time due to vanishing gradients during model training, the first attribute prediction layer 4 of the later multi-scale progressive perception model 2 is used to progressively constrain the first attribute prediction layer 4 of the preceding multi-scale progressive perception model 2, so that the parameter optimization direction of the preceding multi-scale progressive perception model 2 stays consistent with that of the later one. That is, the adjacent first attribute prediction layers 4 in fig. 1 are bound by the progressive constraint (L2 norm) shown in fig. 1, and training with this progressive constraint facilitates the parameter updates of the earlier residual convolution blocks 1.
In this embodiment, the step 5 further includes:
and 6, processing the attribute characteristic information output by the last residual convolution block 1 through the global average pooling layer 3, sending the processed attribute characteristic information into a second attribute prediction layer 5, and taking the output result of the second attribute prediction layer 5 as a final pedestrian attribute identification result.
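The inference path of step 6, which uses only the last block's features, global average pooling 3, and the second attribute prediction layer 5, can be sketched as follows; the feature shape and attribute count are illustrative assumptions. Because the multi-scale modules and first attribute prediction layers are bypassed here, the embedded models add no inference cost.

```python
import torch
import torch.nn as nn

classifier = nn.Linear(512, 26)            # second attribute prediction layer (M = 26 assumed)
feat = torch.randn(2, 512, 16, 8)          # attribute features from the last residual block
pooled = feat.mean(dim=(2, 3))             # global average pooling -> shape (2, 512)
probs = torch.sigmoid(classifier(pooled))  # per-attribute probabilities
preds = (probs > 0.5).int()                # final multi-label attribute decisions
```

Thresholding each sigmoid output independently yields the multi-label prediction vector y ∈ {0,1}^M described in step 11.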
As shown in fig. 3, an embodiment of the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the above-mentioned multi-tag pedestrian attribute identification method based on multi-scale progressive perception.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing description is only a partial embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent devices or equivalent processes using the descriptions and the drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the present invention.

Claims (10)

1. The multi-label pedestrian attribute identification method based on multi-scale progressive perception is characterized by comprising the following steps of:
step 1, inputting a pedestrian image into a backbone network, and performing feature extraction through a plurality of residual convolution blocks of the backbone network to obtain the attribute feature information output by each residual convolution block;
step 2, constructing a plurality of multi-scale progressive perception models, and training each multi-scale progressive perception model;
step 3, embedding a plurality of trained multi-scale progressive perception models into the backbone network, and inputting attribute characteristic information output by a target residual convolution block into the corresponding multi-scale progressive perception model to obtain multi-scale characteristic information;
step 4, after the multi-scale characteristic information is processed by the global average pooling layer, the multi-scale characteristic information is sent to the first attribute prediction layer to predict attribute probability;
step 5, the first attribute prediction layer corresponding to each subsequent multi-scale progressive perception model performs a progressive constraint on the first attribute prediction layer corresponding to the preceding multi-scale progressive perception model.
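The wiring recited in claim 1 — residual blocks feeding per-scale perception modules, each followed by pooling and an attribute prediction layer, with the deepest features producing the final result — can be sketched as follows. This is an illustrative skeleton only: the block, module, and layer internals are stand-in callables, and all function and parameter names here are hypothetical, not from the patent text.

```python
# Minimal sketch of the claim-1 pipeline, assuming a ResNet-style backbone
# of l sequential residual blocks. Internals are stubbed out as callables.

def run_pipeline(image, blocks, msp_modules, pred_layers, gap, final_pred):
    """blocks: list of residual-block callables (step 1);
    msp_modules: one multi-scale progressive perception module per
    target block (step 3); pred_layers: first attribute prediction
    layers (step 4); final_pred: prediction head on the deepest features."""
    feats = []
    x = image
    for block in blocks:                       # step 1: feature extraction
        x = block(x)
        feats.append(x)
    logits_per_scale = []
    for f, msp, pred in zip(feats[:-1], msp_modules, pred_layers):
        multi_scale = msp(f)                   # step 3: multi-scale features
        pooled = gap(multi_scale)              # step 4: global average pooling
        logits_per_scale.append(pred(pooled))  # first attribute prediction
    final = gap(feats[-1])                     # deepest features -> final head
    return logits_per_scale, final_pred(final)
```

In training, the per-scale predictions would additionally be tied together by the progressive constraint of step 5; at inference only the final head's output is kept.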
2. The multi-tag pedestrian attribute identification method based on multi-scale progressive sensing as set forth in claim 1, wherein the step 1 specifically includes:
step 11, the i-th pedestrian image x_i in the acquired pedestrian dataset D is taken as the input of the backbone network, and the pedestrian attribute label corresponding to the i-th pedestrian image x_i is defined as y_i ∈ {0,1}^M, wherein M represents the number of pedestrian attribute categories, 0 indicates that the pedestrian attribute is absent, and 1 indicates that the pedestrian attribute is present;
step 12, the backbone network comprises l residual convolution blocks which are sequentially connected, the pedestrian image is used as the input of the 1 st residual convolution block, and the output of the current residual convolution block is used as the input of the next residual convolution block;
step 13, the pedestrian image x_i undergoes feature extraction through the residual convolution blocks of the backbone network to obtain the corresponding attribute feature information, expressed as:

F_l = B_l(x | θ_1, …, θ_l) (1)

wherein F_l represents the attribute feature information output by the l-th residual convolution block; B_l represents the composition of the 1st through l-th residual convolution blocks in the backbone network, and θ_1, …, θ_l represent the training parameters of the 1st through l-th residual convolution blocks in the backbone network.
3. The multi-tag pedestrian attribute identification method based on multi-scale progressive perception according to claim 2, wherein the step 2 specifically includes:
step 21, constructing a plurality of multi-scale progressive perception models;
step 22, training each multi-scale progressive perception model by using a binary cross entropy as a loss function;
the expression of the loss function is:

L_bce = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{M} ω_j ( y_{i,j} log σ(p_{i,j}) + (1 − y_{i,j}) log(1 − σ(p_{i,j})) ) (2)

ω_j = y_{i,j} e^{1 − r_j} + (1 − y_{i,j}) e^{r_j} (3)

wherein L_bce represents the loss function, N represents the number of pedestrian images and M the number of pedestrian attributes, i indexes the pedestrian images and j indexes the pedestrian attributes; p_{i,j} represents the probability value predicted by the model for the j-th pedestrian attribute after the i-th pedestrian image x_i is sent into the multi-scale progressive perception model; y_{i,j} represents the j-th pedestrian attribute label value of the i-th pedestrian image, and ω_j represents the imbalance suppression factor; log represents the logarithmic function, σ represents the activation function, e represents the exponential, and r_j represents the positive-sample proportion of the j-th pedestrian attribute in the training set.
4. The multi-tag pedestrian attribute identification method based on multi-scale progressive perception of claim 3, wherein the step 3 specifically includes:
step 31, taking the 1st to the (l−1)-th residual convolution blocks in the backbone network as target residual convolution blocks, and setting the number of multi-scale progressive perception models according to the number of target residual convolution blocks;
step 32, embedding a plurality of trained multi-scale progressive perception models into the backbone network;
step 33, the output F_p of the p-th target residual convolution block is input into the p-th multi-scale progressive perception model to obtain the p-th scale feature information, wherein p is a positive integer with 1 ≤ p ≤ l−1.
5. The multi-tag pedestrian attribute identification method based on multi-scale progressive sensing of claim 4, wherein the step 33 specifically includes:
step 331, taking attribute characteristic information output by each target residual convolution block in the backbone network as input of a corresponding multi-scale progressive perception model;
step 332, in the multi-scale progressive perception model, sending the attribute characteristic information into a dimension-reducing convolution layer for dimension-reducing operation;
step 333, the dimension-reduced attribute feature information is fed into a plurality of branches, where different convolution kernels on the different branches extract features at different scales, yielding different first scale features;
step 334, the first scale feature extracted from the convolution kernel in each branch is subjected to a full connection layer to adjust the dimension of the first scale feature, and then nonlinear processing is performed through an activation function to obtain a second scale feature;
step 335, multiplying the second scale feature and the first scale feature in each branch as the output feature of the branch;
step 336, adding the output features of the multiple branches to obtain multi-scale feature information.
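Steps 331 through 336 describe a multi-branch module with a gated (squeeze-and-excitation-style) combination: reduce dimensions, run parallel branch convolutions, gate each branch's first scale feature through a fully connected layer and activation, multiply, and sum. A toy 1-D sketch of that wiring follows; the branch "convolutions" and the fully connected layer are stand-ins (elementwise scalings and dot products), since the claim fixes only the overall structure, and all names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def msp_module(feature, branch_convs, branch_fcs):
    """feature: toy 1-D feature vector; branch_convs: per-branch callables
    standing in for the 3x3/5x5/7x7 convolutions; branch_fcs: per-branch
    fully connected weight vectors used to produce the scalar gate."""
    reduced = [v * 0.5 for v in feature]           # step 332: 1x1 reduction stub
    outputs = []
    for conv, fc in zip(branch_convs, branch_fcs):
        first = conv(reduced)                      # step 333: first scale feature
        # step 334: FC adjusts the dimension, activation gives the gate
        gate = sigmoid(sum(w * v for w, v in zip(fc, first)))
        outputs.append([gate * v for v in first])  # step 335: multiply
    # step 336: element-wise sum over all branch outputs
    return [sum(vals) for vals in zip(*outputs)]
```

The multiply-then-sum pattern lets the module learn, per branch, how strongly each receptive-field scale should contribute to the fused multi-scale feature.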
6. The multi-tag pedestrian attribute identification method based on multi-scale progressive perception of claim 5 wherein the dimension-reduction convolution layer employs a 1x1 convolution kernel.
7. The multi-tag pedestrian attribute identification method based on multi-scale progressive sensing as claimed in claim 5, wherein 3 branches are provided, each of the 3 branches having a different convolution kernel: a 3x3 convolution kernel, a 5x5 convolution kernel, and a 7x7 convolution kernel, respectively.
8. The multi-tag pedestrian attribute recognition method based on multi-scale progressive sensing according to claim 1, wherein in the step 5, an L2 norm is used so that the first attribute prediction layer corresponding to each subsequent multi-scale progressive sensing model progressively constrains the first attribute prediction layer corresponding to the preceding multi-scale progressive sensing model.
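Claim 8's L2-norm progressive constraint can be sketched as an auxiliary regularization term. The sketch assumes the norm is taken over the difference between the prediction outputs (or, equivalently in spirit, the weights) of consecutive scales' first attribute prediction layers; that placement is an assumption, and the function names are illustrative.

```python
def l2_constraint(prev_pred, next_pred):
    """Squared L2 distance between the prediction vectors of two
    consecutive scales, added to the training loss as a penalty."""
    return sum((a - b) ** 2 for a, b in zip(prev_pred, next_pred))

def progressive_penalty(preds_per_scale):
    """Sum the constraint over consecutive scales (shallow -> deep), so
    each later prediction layer pulls the earlier one toward agreement."""
    return sum(l2_constraint(p, q)
               for p, q in zip(preds_per_scale, preds_per_scale[1:]))
```

During training this penalty would be scaled by a hyperparameter and added to the weighted cross-entropy loss, encouraging shallow-scale predictions to stay consistent with deeper, more discriminative ones.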
9. The multi-tag pedestrian attribute identification method based on multi-scale progressive perception of claim 1, further comprising, after step 5:
step 6, the attribute feature information output by the last residual convolution block is processed by the global average pooling layer and then sent to a second attribute prediction layer, and the output of the second attribute prediction layer is taken as the final pedestrian attribute recognition result.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the multi-tag pedestrian attribute identification method based on multi-scale progressive perception as claimed in any one of claims 1 to 9.
CN202310657643.2A 2023-06-05 2023-06-05 Multi-label pedestrian attribute identification method and medium based on multi-scale progressive perception Pending CN116682141A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310657643.2A CN116682141A (en) 2023-06-05 2023-06-05 Multi-label pedestrian attribute identification method and medium based on multi-scale progressive perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310657643.2A CN116682141A (en) 2023-06-05 2023-06-05 Multi-label pedestrian attribute identification method and medium based on multi-scale progressive perception

Publications (1)

Publication Number Publication Date
CN116682141A true CN116682141A (en) 2023-09-01

Family

ID=87781815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310657643.2A Pending CN116682141A (en) 2023-06-05 2023-06-05 Multi-label pedestrian attribute identification method and medium based on multi-scale progressive perception

Country Status (1)

Country Link
CN (1) CN116682141A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118115729A (en) * 2024-04-26 2024-05-31 齐鲁工业大学(山东省科学院) Image fake region identification method and system with multi-level and multi-scale feature interaction


Similar Documents

Publication Publication Date Title
EP3327583B1 (en) Method and device for searching a target in an image
Xu et al. No-reference/blind image quality assessment: a survey
JP2017062781A (en) Similarity-based detection of prominent objects using deep cnn pooling layers as features
Chen et al. A localization/verification scheme for finding text in images and video frames based on contrast independent features and machine learning methods
Liu et al. Smooth filtering identification based on convolutional neural networks
Cheng et al. Sparse representations based attribute learning for flower classification
Xiao et al. Local phase quantization plus: A principled method for embedding local phase quantization into fisher vector for blurred image recognition
Jain et al. An efficient image forgery detection using biorthogonal wavelet transform and improved relevance vector machine
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN116682141A (en) Multi-label pedestrian attribute identification method and medium based on multi-scale progressive perception
Defriani et al. Recognition of regional traditional house in Indonesia using Convolutional Neural Network (CNN) method
CN116310563A (en) Noble metal inventory management method and system
CN111814562A (en) Vehicle identification method, vehicle identification model training method and related device
Siddiqi Fruit-classification model resilience under adversarial attack
Yousaf et al. Patch-CNN: deep learning for logo detection and brand recognition
Sabeena et al. Convolutional block attention based network for copy-move image forgery detection
Paul et al. Dimensionality reduction of hyperspectral images: a data-driven approach for band selection
Chawla et al. Classification of computer generated images from photographic images using convolutional neural networks
CN116029760A (en) Message pushing method, device, computer equipment and storage medium
Turtinen et al. Contextual analysis of textured scene images.
CN112507912B (en) Method and device for identifying illegal pictures
CN112084371B (en) Movie multi-label classification method and device, electronic equipment and storage medium
CN116958615A (en) Picture identification method, device, equipment and medium
Cristin et al. Image forgery detection using supervised learning algorithm
Vijayan et al. Contextual background modeling using deep convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination