CN115909409A - Pedestrian attribute analysis method and device, storage medium and electronic equipment - Google Patents

Pedestrian attribute analysis method and device, storage medium and electronic equipment

Info

Publication number
CN115909409A
CN115909409A (application CN202211589015.7A)
Authority
CN
China
Prior art keywords
pedestrian
attribute
information
image
context information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211589015.7A
Other languages
Chinese (zh)
Inventor
Name withheld at the applicant's request (请求不公布姓名)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Ruiyan Technology Co ltd
Original Assignee
Chengdu Ruiyan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Ruiyan Technology Co ltd filed Critical Chengdu Ruiyan Technology Co ltd
Priority to CN202211589015.7A
Publication of CN115909409A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and provides a pedestrian attribute analysis method and device, a storage medium and an electronic device. The pedestrian attribute analysis method comprises the following steps: obtaining semantic context information of pedestrian attributes in a pedestrian image, wherein the semantic context information is obtained by analyzing the pedestrian image or its associated information and comprises at least one of individual attribute association, group attribute information and spatio-temporal constraint information; and executing a pedestrian attribute analysis task by utilizing the semantic context information, wherein the pedestrian attribute analysis task comprises at least one of a model training task, a model inference task and a post-processing task. When analyzing pedestrian attributes, the method uses semantic context information of the pedestrian attributes, such as individual attribute association, group attribute information and spatio-temporal constraint information, in the training, inference and post-processing stages, thereby significantly improving the accuracy of pedestrian attribute analysis.

Description

Pedestrian attribute analysis method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a pedestrian attribute analysis method and device, a storage medium and electronic equipment.
Background
Pedestrian attribute analysis refers to identifying various attributes of pedestrians in surveillance video using computer vision technology. The identifiable pedestrian attributes include gender, age, body orientation, top type, top color, lower-garment type, lower-garment color, shoe type, whether a coat is worn, whether a hat is worn, whether glasses are worn, whether a mask is worn, hair length, whether a shopping bag is carried, whether a handbag is carried, whether a shoulder bag is carried, whether a backpack is carried, whether a trolley case is pulled, whether a handcart is pushed, whether a mobile phone is held, whether an umbrella is held, umbrella color, occupation, and the like.
At present, the mainstream approach to pedestrian attribute analysis is based on deep learning. However, the acquisition scenes of surveillance video are diverse and complex, so the accuracy of pedestrian attribute analysis performed by existing methods is not high.
Disclosure of Invention
An object of the present invention is to provide a method and an apparatus for analyzing pedestrian attributes, a storage medium, and an electronic device, so as to solve the above technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
in a first aspect, an embodiment of the present application provides a method for analyzing a pedestrian attribute, including: obtaining semantic context information of pedestrian attributes in a pedestrian image; executing a pedestrian attribute analysis task by utilizing the semantic context information;
wherein the semantic context information is obtained by analyzing the pedestrian image or its associated information, and includes at least one of the following: individual attribute association, which includes statistical rules obeyed by the associations among multiple pedestrian attributes of an individual pedestrian, or a specific rule set to which those attributes conform; group attribute information, which includes statistical rules obeyed by at least one pedestrian attribute of a pedestrian group, or a specific rule set to which that attribute conforms; and spatio-temporal constraint information, which includes statistical rules obeyed by the association between the spatio-temporal information corresponding to the pedestrian image and at least one pedestrian attribute, or a specific rule set conformed to, wherein the spatio-temporal information includes at least one of time, space and scene;
the pedestrian attribute analysis task comprises at least one of the following tasks: a model training task, in which the pedestrian image and a pedestrian attribute label are input, and a neural network model for inferring pedestrian attributes is trained in combination with the semantic context information of the pedestrian image; a model inference task, in which the pedestrian image is input, and pedestrian attributes are inferred by a neural network model in combination with the semantic context information of the pedestrian image; and a post-processing task, in which an inference result of a neural network model for pedestrian attributes in the pedestrian image is input, and the inference result is corrected in combination with the semantic context information of the pedestrian image.
When analyzing the pedestrian attribute, the method uses the semantic context information of the pedestrian attribute such as individual attribute association, group attribute information, space-time constraint information and the like in stages of model training, model inference, post-processing of a model inference result and the like, thereby obviously improving the accuracy of analyzing the pedestrian attribute.
The reason is as follows: although the acquisition scenes of pedestrian images may be diverse and complex, this diversity and complexity generally affect only the image features extracted by the model, and do not significantly affect the semantic context information of the pedestrian attributes in the pedestrian image. In other words, the semantic context information has a certain stability in the statistical sense, so constraining the pedestrian attribute analysis process with the semantic context information yields a better analysis result.
It can be understood that the method can well meet the analysis requirement on the pedestrian attribute in video monitoring, and can be applied to other occasions except video monitoring.
In an implementation manner of the first aspect, the statistical rules in the individual attribute association include: an attribute association heat map among multiple pedestrian attributes of an individual pedestrian, where the heat map contains numerical values representing the degree of correlation between pedestrian attributes; or, correlation coefficients between multiple pedestrian attributes of an individual pedestrian; or, a conditional probability distribution obeyed between multiple pedestrian attributes of an individual pedestrian; or, a joint distribution obeyed between multiple pedestrian attributes of an individual pedestrian. The specific rule set in the individual attribute association includes: at least one rule describing that multiple pedestrian attributes are subject to correlations between the attributes.
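As a concrete illustration of the attribute association heat map, the sketch below (not from the patent; attribute names and the choice of Pearson correlation over binary labels are assumptions) builds a matrix whose entry (i, j) measures how strongly attribute i and attribute j co-occur:

```python
# Toy "attribute association heat map": pairwise Pearson correlation
# of binary attribute labels. ATTRS and the sample data are hypothetical.
from math import sqrt

ATTRS = ["male", "long_hair", "skirt"]

def correlation_heatmap(samples):
    """samples: list of dicts mapping attribute name -> 0/1 label."""
    n = len(samples)
    means = {a: sum(s[a] for s in samples) / n for a in ATTRS}
    heat = {}
    for a in ATTRS:
        for b in ATTRS:
            cov = sum((s[a] - means[a]) * (s[b] - means[b]) for s in samples) / n
            va = sum((s[a] - means[a]) ** 2 for s in samples) / n
            vb = sum((s[b] - means[b]) ** 2 for s in samples) / n
            heat[(a, b)] = cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0
    return heat

data = [
    {"male": 1, "long_hair": 0, "skirt": 0},
    {"male": 1, "long_hair": 0, "skirt": 0},
    {"male": 0, "long_hair": 1, "skirt": 1},
    {"male": 0, "long_hair": 1, "skirt": 0},
]
heat = correlation_heatmap(data)
# In this toy data, "male" and "long_hair" are perfectly anti-correlated.
```

A heat map of this kind can be visualized directly, or thresholded to derive the "specific rule set" mentioned above.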
In an implementation manner of the first aspect, the statistical rules in the group attribute information include: a statistical distribution obeyed by at least one pedestrian attribute of a pedestrian group; or, statistics of at least one pedestrian attribute of the pedestrian group, including at least one of mean, variance, covariance, maximum, minimum, higher-order moments, and higher-order cumulant spectra. The specific rule set in the group attribute information includes: at least one rule describing that a pedestrian attribute is constrained by the group form of that attribute itself.
In an implementation manner of the first aspect, the statistical rules in the spatio-temporal constraint information include: a statistical distribution obeyed by at least one pedestrian attribute, with the spatio-temporal information corresponding to the pedestrian image as a prior constraint; or, statistics of the at least one pedestrian attribute under that prior constraint, including at least one of mean, variance, covariance, maximum, minimum, higher-order moments, and higher-order cumulant spectra. The specific rule set in the spatio-temporal constraint information includes: at least one rule describing that pedestrian attributes are constrained by spatio-temporal information.
In the above three implementation manners, typical forms of individual attribute association, group attribute information, and spatiotemporal constraint information are respectively given. In summary, the semantic context information can be a statistical result or a solidified rule (e.g. a hard-coded rule), and the setting mode is very flexible.
In an implementation manner of the first aspect, the performing the model training task by using the semantic context information includes: extracting the image characteristics of the pedestrian image by using an attribute analysis network, and carrying out classification prediction on the pedestrian attributes in the pedestrian image according to the image characteristics to obtain an attribute prediction result; wherein, the attribute analysis network is a neural network model to be trained; calculating a classification loss according to a difference between the attribute prediction result and a pedestrian attribute label of the pedestrian image, and calculating a semantic loss according to the semantic context information; and calculating fusion loss according to the classification loss and the semantic loss, and updating parameters of the attribute analysis network according to the fusion loss.
When the attribute analysis network is trained, the traditional classification loss based on the image features is calculated, and the semantic loss based on the semantic context information is also calculated, so that when the trained model carries out classification prediction on the attributes of the pedestrians, the influence of the image features on attribute values from a data level can be considered, the influence of the semantic context information on the attribute values from a semantic level can be considered, and the accuracy of the model for analyzing the attributes of the pedestrians can be obviously improved.
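The fusion of the two losses can be sketched as follows (the patent does not fix the exact fusion rule; a weighted sum with a hypothetical coefficient `alpha`, and binary cross-entropy as the classification loss, are illustrative assumptions):

```python
# Fused loss sketch: classification loss (BCE over per-attribute
# predictions) plus a weighted semantic loss.
from math import log

def bce(pred, label, eps=1e-7):
    # binary cross-entropy for one attribute prediction
    p = min(max(pred, eps), 1 - eps)
    return -(label * log(p) + (1 - label) * log(1 - p))

def fused_loss(preds, labels, semantic_loss, alpha=0.5):
    cls = sum(bce(p, y) for p, y in zip(preds, labels)) / len(preds)
    return cls + alpha * semantic_loss

loss = fused_loss(preds=[0.9, 0.2], labels=[1, 0], semantic_loss=0.1)
```

In a real training loop, `loss` would then be backpropagated to update the attribute analysis network's parameters.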
In an implementation manner of the first aspect, the attribute analysis network includes a feature extraction network and an attribute classification network connected to the feature extraction network, and the extracting, by using the attribute analysis network, an image feature of the pedestrian image, and performing classification prediction on a pedestrian attribute in the pedestrian image according to the image feature to obtain an attribute prediction result includes: extracting image features of the pedestrian image by using the feature extraction network; based on the image features, carrying out classification prediction on the pedestrian attributes in the pedestrian image by using the attribute classification network to obtain the attribute prediction result; the semantic context information comprises individual attribute association, the attribute analysis network further comprises an attribute association extraction network connected with the feature extraction network, and the obtaining of the semantic context information of the pedestrian attribute in the pedestrian image comprises: and extracting individual attribute association of the pedestrian attribute in the pedestrian image by using the attribute association extraction network based on the image feature.
In the above implementation manner, the attribute analysis network includes two parts, namely a feature extraction network and an attribute classification network, the feature extraction network is configured to extract common image features, the attribute classification network may be a multi-branch network, each branch is configured to perform classification prediction on a pedestrian attribute, and each branch may further extract a proprietary image feature for a corresponding attribute. The individual attribute association can also be extracted by using an attribute association extraction network on the basis of common image features, which is equivalent to sharing the feature extraction network with the classification prediction part of the attributes.
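The wiring described above can be sketched structurally as follows; the stand-in callables are assumptions that keep the sketch runnable without a deep learning framework, and are not the patent's concrete layers:

```python
# Structural sketch: a shared feature extractor feeds both a
# multi-branch attribute classifier and an attribute association
# extraction head.
def feature_extractor(image):
    return [sum(image) / len(image)]  # toy shared feature

def attribute_branch(name):
    def branch(features):
        return (name, min(1.0, max(0.0, features[0])))  # toy score in [0, 1]
    return branch

def association_head(features):
    return {"assoc_strength": features[0]}  # toy association output

def attribute_analysis_network(image, attr_names):
    feats = feature_extractor(image)                      # shared features
    preds = dict(attribute_branch(a)(feats) for a in attr_names)
    assoc = association_head(feats)                       # shares the extractor
    return preds, assoc

preds, assoc = attribute_analysis_network([0.2, 0.4], ["hat", "glasses"])
```

The key design point is that the association head consumes the same shared features as the classification branches, so no second backbone is needed.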
In an implementation manner of the first aspect, the semantic context information includes individual attribute association, and the calculating semantic loss according to the semantic context information includes: calculating the average individual attribute correlation of all pedestrian images in the training batch in which the pedestrian images are located; calculating a first semantic loss of the semantic losses according to a difference of the individual attribute associations and the average individual attribute association.
The correlation between attributes should be statistically stable, so the individual attribute correlation in a single pedestrian image should tend to be consistent with the statistically-significant individual attribute correlation, which is the starting point for setting the first semantic loss.
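A minimal sketch of this first semantic loss (the 2x2 association matrices and the squared L2 distance are illustrative assumptions) penalizes the gap between each image's individual attribute association and the batch-average association:

```python
# First semantic loss sketch: distance between an image's association
# matrix and the average association over its training batch.
def average_association(batch):
    n = len(batch)
    rows, cols = len(batch[0]), len(batch[0][0])
    return [[sum(m[i][j] for m in batch) / n for j in range(cols)]
            for i in range(rows)]

def first_semantic_loss(assoc, avg):
    return sum((assoc[i][j] - avg[i][j]) ** 2
               for i in range(len(assoc)) for j in range(len(assoc[0])))

batch = [[[1.0, 0.2], [0.2, 1.0]],
         [[1.0, 0.4], [0.4, 1.0]]]
avg = average_association(batch)           # off-diagonal entries average to 0.3
loss = first_semantic_loss(batch[0], avg)  # distance of image 0 to the mean
```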
In an implementation manner of the first aspect, the obtaining semantic context information of pedestrian attributes in a pedestrian image includes: and obtaining the group attribute information of the pedestrian attributes in the pedestrian images by counting at least one pedestrian attribute in part or all of the pedestrian images in the training set.
The definition of the group attribute information includes a statistical rule (such as a statistical distribution, a statistic) obeyed by at least one pedestrian attribute of the pedestrian group, or a specific rule set accorded. Therefore, the pedestrian images in the training set are counted, a required statistical rule can be obtained, or a required rule set can be formed according to the statistical result.
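The counting step above can be sketched as follows; representing the group attribute information as a per-attribute frequency table is one possible form of the "statistical rule" (an assumption, not mandated by the patent):

```python
# Group attribute information sketch: per-attribute frequencies counted
# over (part of) a labelled training set. Attribute names are hypothetical.
def group_attribute_info(labelled_images):
    """labelled_images: list of dicts mapping attribute -> 0/1 label."""
    n = len(labelled_images)
    attrs = labelled_images[0].keys()
    return {a: sum(img[a] for img in labelled_images) / n for a in attrs}

train_labels = [
    {"short_sleeves": 1, "umbrella": 0},
    {"short_sleeves": 1, "umbrella": 1},
    {"short_sleeves": 1, "umbrella": 0},
    {"short_sleeves": 0, "umbrella": 0},
]
prior = group_attribute_info(train_labels)
```

A rule set could then be read off the table, e.g. "an attribute with frequency above 0.7 is expected for most pedestrians in this scene".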
In an implementation manner of the first aspect, the calculating semantic loss according to the semantic context information includes: calculating a group attribute prediction result which represents the pedestrian attribute in the pedestrian image predicted according to the group attribute information; calculating a second semantic loss of the semantic losses according to a difference between the attribute prediction result and the population attribute prediction result.
The individual attributes and the group attributes should be similar to each other to some extent, so that the attribute prediction results obtained based on the individual pedestrian images should tend to be consistent with the group attribute prediction results obtained based on the group attribute information, which is the starting point for setting the second semantic loss.
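A minimal sketch of the second semantic loss follows; using the group frequency directly as the group attribute prediction, and a mean squared difference, are illustrative assumptions:

```python
# Second semantic loss sketch: mean squared gap between the individual
# attribute prediction and the group-level prediction.
def second_semantic_loss(attr_pred, group_pred):
    return sum((attr_pred[a] - group_pred[a]) ** 2
               for a in attr_pred) / len(attr_pred)

attr_pred = {"short_sleeves": 0.9, "umbrella": 0.1}    # per-image prediction
group_pred = {"short_sleeves": 0.75, "umbrella": 0.25}  # from group info
loss = second_semantic_loss(attr_pred, group_pred)
```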
In an implementation manner of the first aspect, the obtaining semantic context information of pedestrian attributes in a pedestrian image includes: acquiring the spatiotemporal information corresponding to the pedestrian image from a camera or a monitoring system; and taking the spatiotemporal information as prior constraint, and obtaining the spatiotemporal constraint information of the pedestrian attributes in the pedestrian images by counting at least one pedestrian attribute in part of or all of the pedestrian images in the training set.
The definition of the spatiotemporal constraint information includes a statistical rule (such as statistical distribution, statistic) obeyed by the association between the spatiotemporal information corresponding to the pedestrian image and at least one pedestrian attribute, or a specific rule set accorded. Therefore, after the spatiotemporal information corresponding to the pedestrian image is obtained from the camera or the monitoring system, the spatiotemporal information is taken as prior constraint to count the pedestrian image in the training set, so that a required statistical rule can be obtained, or a required rule set can be formed according to a statistical result.
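Counting with the spatio-temporal information as a prior constraint amounts to keeping one statistic per spatio-temporal bucket, as in the sketch below (the bucket keys and attributes are illustrative assumptions):

```python
# Spatio-temporal constraint sketch: attribute frequencies conditioned
# on a (time, scene) bucket derived from camera metadata.
from collections import defaultdict

def spatiotemporal_stats(samples):
    """samples: list of (spacetime_key, {attribute: 0/1 label})."""
    buckets = defaultdict(list)
    for key, labels in samples:
        buckets[key].append(labels)
    return {key: {a: sum(l[a] for l in group) / len(group) for a in group[0]}
            for key, group in buckets.items()}

samples = [
    (("summer", "street"), {"coat": 0}),
    (("summer", "street"), {"coat": 0}),
    (("winter", "street"), {"coat": 1}),
    (("winter", "street"), {"coat": 1}),
]
stats = spatiotemporal_stats(samples)
# "coat" probability now depends on the spatio-temporal bucket.
```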
In an implementation manner of the first aspect, the semantic context information includes spatiotemporal information, and the calculating semantic loss according to the semantic context information includes: calculating a spatiotemporal attribute prediction result, wherein the spatiotemporal attribute prediction result represents the pedestrian attribute in the pedestrian image predicted according to the spatiotemporal constraint information; and calculating a third semantic loss in the semantic losses according to the difference between the attribute prediction result and the space-time attribute prediction result.
When analyzing the pedestrian attribute based on the pedestrian image, it is necessary to explore the information of the image itself as much as possible, and to make the obtained attribute prediction result as consistent as possible with the space-time attribute prediction result obtained based on the space-time constraint information, which is the starting point for setting the third semantic loss.
In one implementation of the first aspect, performing the model inference task using the semantic context information includes: extracting preliminary image features of the pedestrian image by using an attribute analysis network; the attribute analysis network is a trained neural network model, and the preliminary image features are used for carrying out classification prediction on pedestrian attributes in the pedestrian images; calculating the preliminary image features according to the semantic context information to obtain final image features; and carrying out classification prediction on the pedestrian attributes in the pedestrian image according to the final image features to obtain an attribute prediction result.
When inferring pedestrian attributes, the method first uses the attribute analysis network to extract preliminary image features of the pedestrian image, i.e., it considers the influence of image features on attribute values at the data level. It then uses the semantic context information to compute final image features from the preliminary image features, and finally obtains an attribute prediction result from the final image features, i.e., it also considers the influence of the semantic context information on attribute values at the semantic level.
Moreover, because the diversity and complexity of the pedestrian images in the captured scene generally only affect the image features extracted by the model, and do not significantly affect the semantic context information of the pedestrian attributes in the pedestrian images, the method is very suitable for pedestrian attribute analysis in the field of video monitoring, but is not limited to be applied in this field.
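The inference path can be sketched as below; the concrete fusion operator (a convex combination with a hypothetical weight) is an assumption, since the patent only requires that the preliminary features be recomputed with the semantic context:

```python
# Inference sketch: adjust preliminary data-level scores toward a
# semantic context prior, then classify.
def fuse(preliminary, context_prior, weight=0.3):
    # convex combination of data-level features and semantic-level prior
    return [(1 - weight) * f + weight * p
            for f, p in zip(preliminary, context_prior)]

def classify(features, threshold=0.5):
    return [1 if f >= threshold else 0 for f in features]

preliminary = [0.45, 0.55]   # borderline data-level scores
context = [0.9, 0.1]         # semantic context prior
final = classify(fuse(preliminary, context))
```

Note how a strong prior can flip a borderline decision that the data-level scores alone would have gotten wrong.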
In an implementation manner of the first aspect, the obtaining semantic context information of pedestrian attributes in a pedestrian image includes: obtaining the group attribute information of the pedestrian attribute in the pedestrian image by counting part or all of the pedestrian images in the training set and/or inferred part or all of the pedestrian images; or, the group attribute information of the pedestrian attribute in the pedestrian image is obtained by calculating the existing group attribute information obtained before; the inferred pedestrian images refer to pedestrian images which have been subjected to pedestrian attribute prediction, and the training set refers to a set of pedestrian images used for training the attribute analysis network.
In the model inference stage, the group attribute information can be acquired in multiple ways: it can be obtained by counting pedestrian images in the training set and/or inferred pedestrian images, or it can be calculated from previously obtained group attribute information, or the two can be combined. For example, initial group attribute information can be calculated from existing group attribute information and then continuously updated using statistics of the pedestrian attributes in inferred pedestrian images, so that the group attribute information follows the objective change rule of the group attributes and the accuracy of pedestrian attribute analysis is improved. In short, the acquisition method is very flexible.
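One simple way to keep group attribute information current at inference time is an exponential moving average over batch statistics; the update rule and rate below are assumptions for illustration:

```python
# Group-prior update sketch: blend existing group information with
# statistics of freshly inferred images.
def update_group_info(prior, batch_stats, rate=0.1):
    return {a: (1 - rate) * prior[a] + rate * batch_stats[a] for a in prior}

prior = {"short_sleeves": 0.2}        # e.g. carried over from spring
batch_stats = {"short_sleeves": 0.8}  # freshly inferred summer batch
prior = update_group_info(prior, batch_stats)
```

Repeated updates let the prior drift toward the season's actual attribute distribution without retraining.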
In an implementation manner of the first aspect, the performing the post-processing task by using the semantic context information includes: acquiring an attribute prediction result which is output by an attribute analysis network and aims at the attribute of the pedestrian in the pedestrian image; wherein, the attribute analysis network is a trained neural network model; and correcting the attribute prediction result according to the semantic context information to obtain a corrected attribute prediction result.
When performing post-processing on pedestrian attributes, the method first uses the attribute analysis network to obtain a preliminary attribute prediction result, i.e., it considers the influence of image features on attribute values at the data level. It then uses the semantic context information to correct that prediction result and obtain the final attribute prediction result, i.e., it also considers the influence of the semantic context information on attribute values at the semantic level. Because factors at both the data level and the semantic level are considered, the method predicts pedestrian attributes with higher accuracy.
In addition, the diversity and complexity of the pedestrian images in the acquired scene only influence the image characteristics extracted by the model generally, and the semantic context information of the pedestrian images is not influenced obviously, so that the method is very suitable for pedestrian attribute analysis in the field of video monitoring.
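The correction step can be sketched as score-level blending with a semantic prior followed by re-thresholding; the blending weight, and blending rather than hard rule overrides, are assumptions for illustration:

```python
# Post-processing sketch: correct the network's attribute scores with a
# semantic-context prior, then re-threshold.
def correct_predictions(scores, prior, weight=0.4, threshold=0.5):
    corrected = {a: (1 - weight) * scores[a] + weight * prior[a] for a in scores}
    return {a: (s, 1 if s >= threshold else 0) for a, s in corrected.items()}

scores = {"long_skirt": 0.55}  # network output for a pedestrian predicted "male"
prior = {"long_skirt": 0.05}   # individual attribute association prior
result = correct_predictions(scores, prior)
# The semantically implausible positive is flipped to negative.
```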
In a second aspect, an embodiment of the present application provides a pedestrian attribute analysis apparatus, including: the information acquisition module is used for acquiring semantic context information of pedestrian attributes in the pedestrian image; the attribute analysis module is used for executing a pedestrian attribute analysis task by utilizing the semantic context information;
wherein the semantic context information is obtained by analyzing the pedestrian image or its associated information, and includes at least one of the following: individual attribute association, which includes statistical rules obeyed by the associations among multiple pedestrian attributes of an individual pedestrian, or a specific rule set to which those attributes conform; group attribute information, which includes statistical rules obeyed by at least one pedestrian attribute of a pedestrian group, or a specific rule set to which that attribute conforms; and spatio-temporal constraint information, which includes statistical rules obeyed by the association between the spatio-temporal information corresponding to the pedestrian image and at least one pedestrian attribute, or a specific rule set conformed to, wherein the spatio-temporal information includes at least one of time, space and scene;
the pedestrian attribute analysis task comprises at least one of the following tasks: model training tasks: inputting the pedestrian image and the pedestrian attribute label, and training a neural network model for deducing the pedestrian attribute by combining the semantic context information of the pedestrian image; model inference tasks: inputting the pedestrian image, and deducing the pedestrian attribute by utilizing a neural network model in combination with the semantic context information of the pedestrian image; and (3) post-processing tasks: and inputting an inference result of a neural network model aiming at the pedestrian attribute in the pedestrian image, and correcting the inference result by combining semantic context information of the pedestrian image.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the method provided by the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a memory in which computer program instructions are stored, and a processor, where the computer program instructions are read and executed by the processor to perform the method provided by the first aspect or any one of the possible implementation manners of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 illustrates a flow of a first pedestrian property analysis method provided in an embodiment of the present application;
FIG. 2 illustrates a structure of an attribute association heat map provided by an embodiment of the present application;
fig. 3 shows a flow chart of a second pedestrian property analysis method provided in the embodiment of the present application;
FIG. 4 illustrates the operation of the pedestrian attribute analysis method of FIG. 3;
fig. 5 shows a flow chart of a third pedestrian property analysis method provided in the embodiment of the present application;
fig. 6 shows a flow chart of a fourth pedestrian property analysis method provided in the embodiment of the present application;
FIG. 7 illustrates the operation of the pedestrian attribute analysis method of FIG. 6;
fig. 8 shows a flow of a fifth pedestrian property analysis method provided in the embodiment of the present application;
FIG. 9 illustrates the operation of the pedestrian attribute analysis method of FIG. 8;
fig. 10 shows a structure of a pedestrian property analysis apparatus according to an embodiment of the present application;
fig. 11 shows a structure of an electronic device according to an embodiment of the present application.
Detailed Description
In the field of video monitoring, due to the fact that the acquisition scene of a monitoring video is diversified and complex, the accuracy of pedestrian attribute analysis performed by the existing method is low. The inventor has found that the existing method mainly has the following problems through long-term research:
(1) The method is easily influenced by environmental factors, so that the accuracy of pedestrian attribute prediction is obviously reduced. The influencing factors include various illumination changes, occlusion, complex background, various resolutions, various image qualities, and the like.
(2) The prior art analyzes a single attribute of a pedestrian without considering the correlations among multiple attributes of the pedestrian, resulting in low accuracy of pedestrian attribute analysis. For example, a "man" rarely wears a "long skirt", so if a certain "man" is predicted to be wearing a "long skirt", the result is likely wrong in practice, and the prior art cannot handle this problem.
(3) The prior art analyzes the attributes of a single pedestrian without considering the influence of group attributes on the attributes of the individual pedestrian, so the accuracy of pedestrian attribute analysis is not high. For example, if most pedestrians in a scene wear "short sleeves" but a certain pedestrian is predicted to wear "long sleeves", the prediction clearly deviates from the group attribute and is therefore likely wrong; the prior art cannot handle this problem either.
(4) The prior art performs attribute analysis based only on the pedestrian image, without considering the influence of temporal and spatial factors on the attributes of an individual pedestrian, so the accuracy of pedestrian attribute analysis is not high. For example, in summer in Chongqing the possibility of wearing a "longuette" is low, while in winter in the northeast the possibility of wearing an overcoat is high; likewise, at a supermarket entrance a "shopping cart" is very likely to be pushed, whereas in a "school" scene it is unlikely. The prior art fails to address predicted attribute values that deviate from the time, place, or scene (e.g., a pedestrian predicted to wear a "longuette" in midsummer).
In view of the above problems in the prior art, embodiments of the present application provide a method and an apparatus for analyzing a pedestrian attribute, a storage medium, and an electronic device, so as to introduce semantic context information of a pedestrian attribute in a pedestrian image into stages of model training, model inference, post-processing, and the like of pedestrian attribute analysis, so as to improve accuracy of pedestrian attribute analysis and improve video monitoring effect.
It should be noted that, in addition to the technical solutions newly proposed in the present application, the inventors' discovery of the above problems (1) to (4) and their analysis of the causes of these problems should also be regarded as contributions of the inventors to the present application, and should not be regarded as content already existing in the prior art.
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that like reference numbers and letters refer to like items in the following figures; thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element. The terms "first," "second," "third," and the like are used solely to distinguish one element from another and are not to be construed as indicating or implying relative importance.
Fig. 1 shows the flow of a first pedestrian attribute analysis method provided in an embodiment of the present application. The method of fig. 1 may be, but is not limited to being, performed by the electronic device shown in fig. 10, which will be described later with reference to fig. 10. Referring to fig. 1, the method includes:

Step S01: acquiring semantic context information of pedestrian attributes in a pedestrian image.
The pedestrian image refers to an image that may include pedestrians, for example, an image captured by a monitoring camera deployed on a road or at a gate. Of course, it is not excluded that some pedestrian images contain no pedestrians, but the following description mainly takes images that do contain pedestrians as an example.
The solution of the present application does not limit which pedestrian attributes in the pedestrian image are to be analyzed; for example, they may include one or more of the following attributes:
gender (male, female), age (20, 25, 30, etc.), body orientation (forward, sideways, backward, etc.), jacket type (long sleeves, short sleeves, etc.), jacket color (red, yellow, black, etc.), lower-garment type (pants, shorts, skirt, etc.), lower-garment color (red, yellow, black, etc.), shoe type (sneakers, leather shoes, etc.), shoe color (red, yellow, black, etc.), whether an overcoat is worn, whether a hat is worn, whether glasses are worn, whether a mask is worn, mask color (white, pink, black, etc.), hair length (bald, short hair, long hair, etc.), hair color (black, white, brown, etc.), whether a briefcase is carried, whether a handbag is carried, whether a single-shoulder bag is carried, whether a backpack is carried, bag color (green, blue, white, etc.), whether a trolley case is pulled, whether a cart is pushed, whether a mobile phone is in use, whether an umbrella is held, umbrella color, occupation (courier, health worker, etc.), whether a pet is led, whether a baby is held, etc.
The semantic context information of a pedestrian attribute can be defined as follows: it is information associated with the semantics of the pedestrian attribute and can influence the value of the attribute at the semantic level. By contrast, although image features can also affect the value of a pedestrian attribute, that influence arises at the data level and has no direct relationship with the semantics of the attribute.
Three kinds of semantic context information are listed below:
A. Individual attribute association
The individual attribute association includes a statistical rule followed by, or a specific rule set satisfied by, the associations between the various pedestrian attributes of an individual pedestrian.
The statistical rule in the individual attribute association may be a statistical distribution followed by, or a statistical quantity possessed by, the associations between the various pedestrian attributes of individual pedestrians. The information indicating the statistical distribution may include, but is not limited to, the type of the distribution (e.g., normal distribution, binomial distribution), its parameters (e.g., mean, variance), and the like. For example, the statistical distribution may be a joint distribution that the pedestrian attributes obey, and the statistical quantity may be a conditional probability, a correlation coefficient, or the like.
In some implementations, the statistical distribution followed by the associations between the various pedestrian attributes of individual pedestrians may be implemented as an attribute association heat map, which comprises numerical values representing the degree of correlation between every pair of pedestrian attributes. In this way, the individual attribute association is represented quantitatively, which facilitates correlation calculation.
Fig. 2 shows the structure of the attribute association heat map. Referring to fig. 2, the heat map can be regarded as a table whose rows and columns each represent the pedestrian attributes to be subjected to association analysis, such as "gender", "age", "hair style", "jacket type", and "jacket color". Each cell holds a numerical value representing the degree of correlation between the two pedestrian attributes corresponding to its row and column; for example, the degree of correlation between "gender" and "age" is 0.1, and the degree of correlation between "gender" and itself is 1. It can be understood that the attribute association heat map may include all the pedestrian attributes to be analyzed in the pedestrian image, or may include only some pedestrian attributes of interest, for example, pedestrian attributes that have a significant influence on the values of other attributes.
In addition to the attribute association heat map, the statistical distribution followed by the associations between the various pedestrian attributes of individual pedestrians may be implemented as a conditional probability distribution obeyed among those attributes, a joint distribution obeyed among those attributes, and the like.
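To make the heat-map idea concrete, the degree of correlation between two quantized attributes can be approximated by a Pearson correlation coefficient. The sketch below is illustrative only; the attribute encodings and sample values are invented, not taken from the application:

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two quantized attribute columns
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# toy quantized attribute table: one list per attribute, one entry per pedestrian
# (hypothetical encoding: gender 0/1, hair_length 0=bald/1=short/2=long, jacket_type 0/1)
samples = {
    "gender":      [0, 1, 0, 1, 1, 0],
    "hair_length": [2, 0, 2, 1, 0, 2],
    "jacket_type": [0, 1, 0, 1, 1, 0],
}
attrs = list(samples)
# the "heat map": one cell for every ordered pair of attributes
heatmap = {(a, b): round(pearson(samples[a], samples[b]), 2)
           for a in attrs for b in attrs}
```

As in fig. 2, each attribute correlates perfectly with itself (value 1), while the off-diagonal cells quantify cross-attribute association.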
The specific rule set in the individual attribute association includes at least one rule describing the correlation constraints among a plurality of pedestrian attributes. These rules may be solidified in a hard-coded manner in the pedestrian attribute analysis program. For example, if the gender obtained from the pedestrian attribute analysis result is "male" but the hairstyle is "long hair", the hairstyle can be automatically corrected to "short hair" according to a rule. For another example, if the occupation obtained from the pedestrian attribute analysis result is "doctor", the jacket color can be automatically predicted as "white" according to a rule. The rules in a specific rule set may be set freely, for example, determined based on expert experience, or summarized from statistical rules.
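A hard-coded individual rule set of the kind described above might look like the following sketch; the two rules are the examples from the text, while the attribute names and function are hypothetical, not a definitive implementation:

```python
def apply_individual_rules(attrs):
    """Correct one pedestrian's attribute dict with hard-coded association rules."""
    attrs = dict(attrs)  # leave the caller's dict untouched
    # rule: a "male" prediction combined with "long" hair is corrected to "short"
    if attrs.get("gender") == "male" and attrs.get("hair_length") == "long":
        attrs["hair_length"] = "short"
    # rule: occupation "doctor" implies a "white" jacket color
    if attrs.get("occupation") == "doctor":
        attrs["jacket_color"] = "white"
    return attrs
```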
B. Group attribute information
The group attribute information includes a statistical rule followed by, or a specific rule set satisfied by, at least one pedestrian attribute of the pedestrian group.
The statistical rule in the group attribute information may be a statistical distribution followed by, or a statistical quantity possessed by, at least one pedestrian attribute of the pedestrian group. The information indicating the statistical distribution may include, but is not limited to, the type of the distribution (e.g., normal distribution, binomial distribution), its parameters (e.g., mean, variance), and the like. For example, the statistical distribution may be the distribution obeyed by a single pedestrian attribute or a joint distribution obeyed by several pedestrian attributes; the statistical quantity may be a low-order statistic such as a mean, maximum, minimum, expectation, variance, covariance, or correlation coefficient, or a high-order statistic such as a high-order moment or a high-order cumulant spectrum.
For example, according to prior knowledge, the "age" attribute follows a normal distribution, so the corresponding group attribute information may include the mean and variance of that distribution; given the mean and variance, the probability density function of the normal distribution can be obtained.
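Fitting the normal distribution for the "age" attribute amounts to estimating its mean and variance from training labels, from which the probability density function follows; the age values below are invented for illustration:

```python
import math
import statistics

ages = [23, 25, 31, 28, 35, 24, 29, 27, 33, 26]  # toy "age" labels from a training set
mu = statistics.mean(ages)        # mean of the fitted normal distribution
sigma = statistics.pstdev(ages)   # population standard deviation

def normal_pdf(x, mu, sigma):
    # probability density function of the fitted normal distribution
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
```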
The specific rule set in the group attribute information includes at least one rule describing how a pedestrian attribute is constrained by the group-level behavior of that attribute. These rules may be solidified in a hard-coded manner in the pedestrian attribute analysis program. For example, if the pedestrian attribute analysis result indicates that 95% or more of the pedestrians in the pedestrian image have the jacket type "short sleeves", the attribute of a pedestrian whose jacket type is "long sleeves" may be automatically corrected to "short sleeves" according to a rule. The rules in a specific rule set may be set freely, for example, determined based on expert experience, or summarized from statistical rules.
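The 95% "short sleeves" rule just described can be sketched as a small post-correction. The threshold and attribute name come from the example; the function and data layout are hypothetical:

```python
from collections import Counter

def apply_group_rule(predictions, attr="jacket_type", dominant_ratio=0.95):
    """If >= dominant_ratio of pedestrians share one value of `attr`, overwrite the rest."""
    counts = Counter(p[attr] for p in predictions)
    value, n = counts.most_common(1)[0]
    if n / len(predictions) >= dominant_ratio:
        for p in predictions:
            p[attr] = value
    return predictions
```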
C. Spatio-temporal constraint information
The spatiotemporal constraint information includes a statistical rule followed by, or a specific rule set satisfied by, the association between the spatiotemporal information corresponding to the pedestrian image and at least one pedestrian attribute.
The spatiotemporal information corresponding to the pedestrian image includes at least one of time, space, and scene. The time in the spatiotemporal information may be the season, day of the week, or time of day (day or night) at which the pedestrian image was collected; the space may be concrete position information of the collection site, such as its longitude and latitude or region (e.g., province, city, South China, North China), or abstract position information, such as a roadway or a sidewalk; the scene may be the particular environment in which the pedestrian image was captured, such as a supermarket, hospital, school, indoors, or outdoors.
The statistical rule in the spatiotemporal constraint information may be a statistical distribution followed by at least one pedestrian attribute with the spatiotemporal information corresponding to the pedestrian image as a prior constraint, or a statistical quantity possessed by that distribution. The information indicating the statistical distribution may include, but is not limited to, the type of the distribution (e.g., normal distribution, binomial distribution), its parameters (e.g., mean, variance), and the like. For example, the statistical distribution may be a conditional distribution or a joint distribution that at least one pedestrian attribute obeys; the statistical quantity may be a low-order statistic such as a mean, maximum, minimum, expectation, variance, covariance, or correlation coefficient, or a high-order statistic such as a high-order moment or a high-order cumulant spectrum.
For example, the "age" attribute obeys a distribution P (θ), θ is a distribution parameter, and when the spatio-temporal constraint information is considered, the "age" attribute obeys a conditional distribution P (θ | s) or a joint distribution P (θ, s), where s represents a distribution parameter corresponding to the spatio-temporal information.
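As a concrete stand-in for such a conditional distribution P(θ|s), a discrete lookup table keyed by the spatiotemporal condition can be used; all probabilities below are invented for illustration:

```python
# hypothetical conditional distribution P(lower-garment type | season)
cond = {
    "summer": {"shorts": 0.5, "short_skirt": 0.3, "long_skirt": 0.1, "trousers": 0.1},
    "winter": {"shorts": 0.02, "short_skirt": 0.03, "long_skirt": 0.15, "trousers": 0.8},
}

def p(value, season):
    # probability of a lower-garment value given the season in the spatiotemporal information
    return cond[season].get(value, 0.0)
```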
The specific rule set in the spatiotemporal constraint information includes at least one rule describing how pedestrian attributes are constrained by spatiotemporal information. These rules may be solidified in a hard-coded manner in the pedestrian attribute analysis program. For example, if the lower-garment type obtained from the pedestrian attribute analysis result is "long skirt" while the time in the spatiotemporal information is "summer" and the space is "Chongqing", the lower-garment type may be automatically corrected to "short skirt" according to a rule. For another example, if the pedestrian attribute analysis result indicates that a pedestrian is "pushing a cart" while the scene in the spatiotemporal information is "school", the attribute of whether the pedestrian is pushing a cart can be automatically corrected to "no" according to a rule. The rules in a specific rule set may be set freely, for example, determined based on expert experience, or summarized from statistical rules.
It is understood that the semantic context information is not limited to the above three information items, and the semantic context information may include one or more of the above three information items, for example, in fig. 4, the semantic context information includes the individual attribute association, the group attribute information, and the spatio-temporal constraint information.
The semantic context information of the pedestrian attributes may be obtained by analyzing the pedestrian image or its related information. For example, it may be obtained by analyzing image features, or by performing statistics on the pedestrian attributes in the pedestrian image; the related information of the pedestrian image (for example, the above-mentioned spatiotemporal information) may also be taken into account in the statistics. Specific ways of obtaining the semantic context information will be further exemplified below.

Step S02: performing a pedestrian attribute analysis task by using the semantic context information.
The pedestrian attribute analysis task comprises at least one of the following tasks:
Model training task: inputting a pedestrian image and pedestrian attribute labels, and training a neural network model for inferring pedestrian attributes in combination with the semantic context information of the pedestrian image.
Model inference task: inputting a pedestrian image, and inferring the pedestrian attributes by using the neural network model in combination with the semantic context information of the pedestrian image.
Post-processing task: inputting the inference result produced by the neural network model for the pedestrian attributes in the pedestrian image, and correcting the inference result in combination with the semantic context information of the pedestrian image.
Specific examples of performing these pedestrian attribute analysis tasks will be given later and are not set forth here for the time being.
In summary, when analyzing the pedestrian attribute, the pedestrian attribute analysis method provided in the embodiment of the present application uses semantic context information of the pedestrian attribute, such as individual attribute association, group attribute information, and spatiotemporal constraint information, at stages of model training, model inference, post-processing of a model inference result, and the like, so that the accuracy of pedestrian attribute analysis can be significantly improved.
The reason is as follows: although pedestrian images may exhibit diversity and complexity in their acquisition scenes, this diversity and complexity generally affects only the image features extracted by the model and does not significantly affect the semantic context information of the attributes in the pedestrian image, because the semantic context information is either a statistical rule or a fixed rule, i.e., it has a certain stability in the statistical sense. Constraining the analysis process of pedestrian attributes with the semantic context information therefore yields better pedestrian attribute analysis results.
It can be understood that the method can well meet the requirements of pedestrian attribute analysis in video monitoring, and can also be applied to occasions other than video monitoring.
Next, it is briefly analyzed how the pedestrian attribute analysis method improves the aforementioned problems (1) to (4). It can be appreciated that using semantic context information for pedestrian attribute analysis may improve problem (2) if the semantic context information includes the individual attribute association, problem (3) if it includes the group attribute information, and problem (4) if it includes the spatiotemporal constraint information. As for problem (1), the semantic context information also improves it, because the illumination changes, occlusion, complex backgrounds, varied resolutions, and varied image qualities mentioned in problem (1) directly affect the picture content of the pedestrian image but have little influence on the semantic context information. For example, if the upper half of a certain pedestrian is occluded by other pedestrians, it is difficult to identify from the image alone whether the pedestrian wears an overcoat; but if the neural network model can identify that all the surrounding pedestrians wear overcoats, and the acquisition time of the pedestrian image is "winter", the pedestrian can be predicted to be wearing an overcoat with relatively high confidence, rather than "guessing" an attribute value without any basis. It can be understood that, in practice, the decision logic of the neural network model is embodied in the trained model parameters; the above analysis merely explains the prediction behavior of the model from an easy-to-understand point of view.
Fig. 3 shows the flow of a second pedestrian attribute analysis method provided in an embodiment of the present application; this flow may be regarded as an execution flow of the model training task mentioned in step S02, and fig. 4 shows the detailed working principle of the method. The method in fig. 3 may be executed by, but is not limited to, the electronic device shown in fig. 10, which is described later with reference to fig. 10. Referring to fig. 3, the method includes:
Step S110: extracting image features of the pedestrian image by using an attribute analysis network, and performing classification prediction on the pedestrian attributes in the pedestrian image according to the image features to obtain an attribute prediction result.
The pedestrian images in step S110 are pedestrian images in the training set; they are accompanied by pedestrian attribute labels, i.e., pre-annotated real attribute values, and are used for supervised model training. The attribute analysis network in step S110 is the neural network model to be trained, and its specific structure is not limited. The attribute analysis network treats the prediction of the attributes as a multi-label classification problem, one classification per attribute. For example, the gender attribute has two possible values, "male" and "female", and can therefore be treated as a binary classification problem whose classification labels include both "male" and "female"; the prediction result for the gender attribute can then be a confidence for each attribute value, e.g., "male"-0.9 and "female"-0.1.
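Per-value confidences like "male"-0.9 and "female"-0.1 are typically obtained by normalizing the raw outputs (logits) of a classification branch, for instance with a softmax; the logit values below are invented for illustration:

```python
import math

def softmax(logits):
    # turn raw branch outputs into confidences that sum to 1
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# hypothetical gender-branch logits: index 0 = "male", index 1 = "female"
conf = softmax([2.2, 0.0])
```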
Step S120: calculating a classification loss according to the difference between the attribute prediction result and the pedestrian attribute labels of the pedestrian image.
For a specific pedestrian in the pedestrian image, the attribute labels are fixed; for example, the label of the gender attribute is "male". Therefore, the loss produced by the attribute analysis network when predicting the gender attribute can be calculated by substituting the prediction result and the label of the gender attribute into a preset loss function (the form of the loss function is not limited); this loss represents, to some extent, the difference between the prediction result and the label. The classification loss in step S120 can be regarded as the sum of the losses produced by the attribute analysis network when predicting each pedestrian attribute.
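As a minimal sketch of this summation, a cross-entropy loss per attribute can be summed over all attributes; the confidence values and label indices below are invented, and the application does not fix the loss form:

```python
import math

def attr_ce_loss(pred_probs, label_idx):
    # cross-entropy for one attribute: negative log-confidence of the true value
    return -math.log(pred_probs[label_idx])

def classification_loss(per_attr_probs, labels):
    # classification loss of step S120: sum of the per-attribute losses
    return sum(attr_ce_loss(p, y) for p, y in zip(per_attr_probs, labels))

# two attributes: gender ([male, female], label "male") and
# jacket color ([red, yellow, black], label "yellow")
loss = classification_loss([[0.9, 0.1], [0.2, 0.7, 0.1]], [0, 1])
```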
Referring to fig. 4, in some implementations, the attribute analysis network includes a feature extraction network and an attribute classification network connected to the feature extraction network.
The feature extraction network is used to extract the image features of the pedestrian image; for example, it may be a convolutional neural network such as VGG or ResNet. The attribute classification network is configured to perform classification prediction on the pedestrian attributes in the pedestrian image based on the extracted image features and to output the attribute prediction result. For example, the attribute classification network may be a multi-branch network in which each branch performs classification prediction for one attribute and outputs a prediction result (a single-branch network may be used if only one pedestrian attribute is predicted); the image features output by the feature extraction network are shared by the branches. In some implementations, each branch may include a fully connected layer and a classifier (the fully connected layer may also be replaced with a 1×1 convolution), and, as required, several convolution layers may be added before the fully connected layer to extract, on the basis of the shared image features, image features specific to the pedestrian attribute handled by that branch.
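The shared-backbone, multi-branch idea can be sketched in a few lines of plain Python; the "backbone" and the branch weights below are trivial stand-ins invented purely for illustration, not the actual networks:

```python
def backbone(image):
    # stand-in feature extraction network: mean and max of the pixel values
    return [sum(image) / len(image), max(image)]

def branch(feature, weights, bias):
    # one attribute branch: a single fully connected layer over the shared feature
    return [sum(f * w for f, w in zip(feature, ws)) + b
            for ws, b in zip(weights, bias)]

image = [0.1, 0.4, 0.3, 0.8]            # toy "image"
feat = backbone(image)                  # shared image feature
# hypothetical gender branch (2 output logits) reusing the shared feature
gender_logits = branch(feat, [[1.0, -0.5], [-1.0, 0.5]], [0.0, 0.0])
```

In a real implementation the backbone would be a CNN such as VGG or ResNet and each branch a fully connected layer (or 1×1 convolution) plus classifier, as described above.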
Step S130: obtaining semantic context information of the pedestrian attributes in the pedestrian image, and calculating a semantic loss according to the semantic context information.
The definition of semantic context information has already been given above. The following describes three ways of obtaining semantic context information and the corresponding methods for calculating the semantic loss.
A. Individual attribute association
In some implementations, if the attribute analysis network includes a feature extraction network and an attribute classification network, an attribute association extraction network may be configured to obtain the individual attribute association; this attribute association extraction network is part of the attribute analysis network and is structurally connected to the feature extraction network. The attribute association extraction network takes the image features output by the feature extraction network as input and outputs the individual attribute association of the pedestrian image; its specific network structure is not limited. In fig. 4, the attribute association extraction network is not drawn explicitly, but the position of the block "individual attribute association" may be understood as the attribute association extraction network used to obtain the individual attribute association.
In these implementations, the attribute analysis network can be viewed as two parts: the first part comprises the feature extraction network and the attribute classification network and performs classification prediction of the pedestrian attributes in the pedestrian image; the second part comprises the feature extraction network and the attribute association extraction network and obtains the individual attribute association. It is easy to see that the two parts share the feature extraction network. On the one hand, this architecture saves computation; on the other hand, it enables the association relationships between attributes to constrain the attribute prediction result: the second part updates the parameters of the feature extraction network through the semantic loss, and the first part uses the image features extracted by that same network for pedestrian attribute prediction.
It can be appreciated that, in other implementations, the attribute association extraction network may also obtain the individual attribute association directly from the pedestrian image, in which case it is not connected to the feature extraction network.
Obviously, there are other ways to obtain the individual attribute association, for example, by performing statistics on the pedestrian images in the training set, or by directly reading a specific rule set defined for the individual attribute association.
In some implementations, considering that the associations between attributes should be statistically stable, the individual attribute association in a single pedestrian image should tend to be consistent with the statistical individual attribute association, so a first semantic loss corresponding to the individual attribute association can be calculated as follows:
First, the average individual attribute association of all pedestrian images in the training batch (e.g., a mini-batch) containing the pedestrian image is calculated; the average individual attribute association may be the mean of the individual attribute associations of all pedestrian images in the training batch (of course, the individual attribute association must be quantified so that a mean can be calculated; the aforementioned attribute association heat map is one such quantification). Then, the first semantic loss is calculated according to the difference between the individual attribute association and the average individual attribute association; the specific loss function is not limited. According to the meaning of this loss, model training with the first semantic loss gradually pulls the individual attribute association toward the average individual attribute association.
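Using the heat-map quantification, the first semantic loss can be sketched as the squared difference between one image's heat map and the batch-average heat map; the heat-map values below are invented, and the loss form is only one of the unrestricted choices:

```python
def avg_heatmap(heatmaps):
    # element-wise mean of the per-image attribute association heat maps in a batch
    n = len(heatmaps)
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    return [[sum(h[i][j] for h in heatmaps) / n for j in range(cols)]
            for i in range(rows)]

def first_semantic_loss(h, h_avg):
    # squared L2 difference between one image's heat map and the batch average
    return sum((a - b) ** 2
               for row, row_avg in zip(h, h_avg)
               for a, b in zip(row, row_avg))

# toy 2x2 heat maps for a batch of two pedestrian images
batch = [[[1.0, 0.2], [0.2, 1.0]],
         [[1.0, 0.6], [0.6, 1.0]]]
h_avg = avg_heatmap(batch)
loss = first_semantic_loss(batch[0], h_avg)
```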
B. Group attribute information
The group attribute information may be obtained by performing statistics on at least one pedestrian attribute in some or all of the pedestrian images in the training set, although other calculation methods are not excluded. For example, for the "age" attribute, its distribution type may be determined as a normal distribution according to prior knowledge, and the mean and variance of that distribution then determined from the statistics of all pedestrian images in the training set. Alternatively, the distribution type may not be specified in advance, and all pedestrian images in the training set may be counted to determine a distribution and its parameters (a distribution estimated by some methods may also be non-parametric).
Obviously, there are other ways to acquire the group attribute information, for example, directly reading a specific rule set for the group attribute information, and the like.
In some implementations, considering that individual attributes and group attributes should be somewhat similar, the attribute prediction result obtained from a single pedestrian image should tend to be consistent with the group attribute prediction result based on the group attribute information, so a second semantic loss corresponding to the group attribute information can be calculated as follows:
First, the attribute prediction result is represented quantitatively; for example, each attribute value is represented by a numerical value, so that the attribute prediction result can be represented as a vector.
Then, a group attribute prediction result, representing the pedestrian attributes in the pedestrian image as predicted from the group attribute information, is calculated and represented quantitatively. For example, if a certain pedestrian attribute obeys a distribution P(θ), P(θ) may be regarded as the group attribute information, and sampling P(θ) yields a specific attribute value, which represents that pedestrian attribute as predicted from the group attribute information. Similarly, after sampling the distribution obeyed by each pedestrian attribute, all pedestrian attributes predicted from the group attribute information can be obtained and then quantized into a vector.
Finally, the second semantic loss is calculated according to the difference between the quantized attribute prediction result and the group attribute prediction result; the specific calculation method is not limited, for example, the L2 distance between the two vectors may be calculated. According to the meaning of this loss, model training with the second semantic loss gradually pulls the attribute prediction result toward the group attribute prediction result.
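A minimal sketch of these three steps: quantize the prediction as a vector, sample a "group prediction" from fitted group distributions, and take the L2 distance. The encodings and distribution parameters below are invented:

```python
import math
import random

random.seed(0)  # reproducible sampling

# quantized prediction for one pedestrian
# (hypothetical encoding: gender 0/1, jacket type 0/1, age in years)
pred_vec = [1.0, 0.0, 27.0]

# group attribute prediction: sample each attribute from its fitted group
# distribution (Bernoulli for the binary attributes, Gaussian for age)
group_vec = [float(round(random.random())),
             float(round(random.random())),
             random.gauss(28.0, 3.0)]

def second_semantic_loss(u, v):
    # L2 distance between the quantized prediction and the group prediction
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

loss = second_semantic_loss(pred_vec, group_vec)
```

The third semantic loss below follows the same pattern, with the sampling done under the spatiotemporal prior instead.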
C. Spatio-temporal constraint information
The spatiotemporal information corresponding to the pedestrian image can come directly from the camera that collected the image, or from a platform or system with video collection or storage functions. For example, some monitoring systems label the videos collected by their cameras, and the labels contain spatiotemporal information, so the required spatiotemporal information can be obtained by accessing such a monitoring system. After the spatiotemporal information corresponding to the pedestrian image is obtained, statistics are performed on at least one pedestrian attribute in some or all of the pedestrian images in the training set, with the spatiotemporal information as a prior constraint, to obtain the spatiotemporal constraint information of the pedestrian attributes in the pedestrian image. The statistical method can refer to the statistics of the group attribute information.
Obviously, there are other ways to obtain spatiotemporal constraint information, such as directly reading a specific set of rules set for the spatiotemporal constraint information, etc.
When analyzing pedestrian attributes based on a pedestrian image, the information in the image should be exploited as much as possible, and the obtained attribute prediction result should be made as consistent as possible with the spatiotemporal attribute prediction result obtained from the spatiotemporal constraint information. Based on this motivation, a third semantic loss corresponding to the spatiotemporal constraint information may be calculated as follows:
First, the attribute prediction result is represented quantitatively; for example, each attribute value is represented by a numerical value, so that the attribute prediction result can be represented as a vector.
Then, a spatiotemporal attribute prediction result, representing the pedestrian attributes in the pedestrian image as predicted from the spatiotemporal constraint information, is calculated and represented quantitatively. For example, a hard-coded approach may be adopted: if the time is "winter", the location is "the northeast", and the scene is "outdoor", the value of the overcoat attribute is directly set to "wearing an overcoat". For another example, if a certain pedestrian attribute obeys a distribution P(θ), then, when the spatiotemporal constraint information is considered, the attribute can be considered to obey a conditional distribution P(θ|s) or a joint distribution P(θ, s), where s represents the distribution parameter corresponding to the spatiotemporal information; sampling P(θ|s) or P(θ, s) yields a specific attribute value, which represents that pedestrian attribute as predicted from the spatiotemporal constraint information. Similarly, after sampling the distribution obeyed by each pedestrian attribute, all pedestrian attributes predicted from the spatiotemporal constraint information can be obtained and then quantized into a vector.
Finally, the third semantic loss is calculated according to the difference between the quantized attribute prediction result and the spatiotemporal attribute prediction result. The specific calculation method is not limited; for example, the L2 distance between the two vectors can be calculated. According to the meaning of this loss, after the model is trained with the third semantic loss, the attribute prediction result gradually approaches the spatiotemporal attribute prediction result.
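As an illustrative sketch of the three steps above (the attribute names, the two-attribute vector layout, and the hard-coding scheme are assumptions made for this example, not part of the original method):

```python
import math

def quantize(prediction):
    """Step 1: represent each attribute value numerically so the whole
    attribute prediction result becomes a vector."""
    return [prediction["wears_coat"], prediction["long_hair"]]

def spatiotemporal_prediction(time, place, scene):
    """Step 2: hard-coded spatiotemporal prior -- in an outdoor winter
    scene in the northeast, 'wears a coat' is set directly to 1."""
    wears_coat = 1.0 if (time == "winter" and place == "northeast"
                         and scene == "outdoor") else 0.5
    long_hair = 0.5  # no spatiotemporal constraint on hairstyle here
    return [wears_coat, long_hair]

def third_semantic_loss(prediction, time, place, scene):
    """Step 3: L2 distance between the two quantized vectors."""
    v_pred = quantize(prediction)
    v_st = spatiotemporal_prediction(time, place, scene)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_pred, v_st)))

loss = third_semantic_loss({"wears_coat": 0.4, "long_hair": 0.5},
                           "winter", "northeast", "outdoor")
# training with this loss pulls the 'wears_coat' prediction toward 1
```

Sampling from a distribution P(θ|s) or P(θ, s), as described in the text, would replace the hard-coded values inside `spatiotemporal_prediction`; the L2 distance at the end is unchanged.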
It can be understood that one or more of the above three semantic losses may be calculated. For example, since the semantic context information in fig. 4 includes the individual attribute association, the group attribute information and the spatiotemporal constraint information, the semantic loss in that case may be the accumulation of the first semantic loss, the second semantic loss and the third semantic loss (either a direct sum or a weighted sum).
In addition, it should be noted that in the model training process, some attribute semantic information does not need to be repeatedly calculated. For example, once the group attribute information has been calculated, it only needs to be retrieved and used each time the second semantic loss is calculated, and does not need to be calculated again.
Step S140: and calculating fusion loss according to the classification loss and the semantic loss, and updating the parameters of the attribute analysis network according to the fusion loss.
In the implementation shown in fig. 2, the sum of the classification loss and the semantic loss (which may be a direct sum or a weighted sum) is taken as the fusion loss. After the fusion loss is obtained, the parameters of the attribute analysis network can be updated with a back-propagation algorithm according to the fusion loss; for the specific method, reference may be made to the prior art. It is also not excluded that in some implementations the semantic loss is directly taken as the fusion loss, in which case the step of calculating the classification loss in step S120 may be omitted.
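A minimal numeric sketch of the loss combination; the weights shown are illustrative hyperparameters, not values from the original:

```python
def semantic_loss(first, second, third, weights=(1.0, 1.0, 1.0)):
    """Accumulate the three semantic losses; all-ones weights give the
    direct sum, other weights give a weighted sum."""
    return sum(w * l for w, l in zip(weights, (first, second, third)))

def fusion_loss(classification, semantic, alpha=1.0):
    """Sum of the classification loss and the semantic loss; alpha is
    an optional weight on the semantic term (alpha = 1 is a direct sum)."""
    return classification + alpha * semantic

sem = semantic_loss(0.2, 0.3, 0.1)        # direct sum: 0.6
total = fusion_loss(0.5, sem, alpha=0.5)  # 0.5 + 0.5 * 0.6 = 0.8
```

The fusion loss computed this way is what the back-propagation algorithm would minimize when updating the network parameters.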
In actual training, batch training (for example, mini-batch training) is also often adopted; each time, the sum of the fusion losses of all pedestrian images in one batch is calculated and the model parameters are updated accordingly.
In summary, when the attribute analysis network is trained by the pedestrian attribute analysis method in fig. 3, not only the conventional classification loss based on the image features but also the semantic loss based on the semantic context information is calculated. Thus, when the trained model performs classification prediction on the pedestrian attributes, it considers not only the influence of the image features on the attribute values at the data level but also the influence of the semantic context information on the attribute values at the semantic level, so the accuracy of the model in pedestrian attribute analysis can be significantly improved.
It can be understood that the method can well meet the requirements of pedestrian attribute analysis in video monitoring, and can also be applied to occasions other than video monitoring.
Fig. 5 illustrates a third pedestrian attribute analysis method provided in the embodiment of the present application, and the method in fig. 5 may be, but is not limited to being, executed by the electronic device illustrated in fig. 10, which will be described later in detail with reference to fig. 10. Referring to fig. 5, the method includes:
step S210: and acquiring a pedestrian image to be analyzed.
Step S220: the method comprises the steps of extracting image features of a pedestrian image by using an attribute analysis network trained by a pedestrian attribute analysis method when a model training task is executed (namely the method in fig. 3 and various possible implementation modes thereof), and carrying out classification prediction on pedestrian attributes in the pedestrian image according to the image features to obtain an attribute prediction result.
The steps of the method in fig. 5 are similar to step S110 and will not be repeated. The attribute prediction result in step S220 may refer to the confidence of each attribute value, for example, "male"-0.9, "female"-0.1, or may refer to the attribute prediction result finally output to the user, for example, "male" (the attribute value with the highest confidence); this is not strictly limited in this application.
When the pedestrian attribute analysis method in fig. 5 performs classification prediction on the pedestrian attributes, it uses the attribute analysis network trained by the method in fig. 3, so the prediction accuracy is high, and the method is very suitable for pedestrian attribute analysis in the field of video monitoring.
Fig. 6 shows a flow of a fourth pedestrian attribute analysis method provided in the embodiment of the present application, where the flow may be regarded as an execution flow of the model inference task mentioned in step S02, fig. 7 shows a detailed working principle of the method, and the method in fig. 6 may be executed by, but is not limited to, the electronic device shown in fig. 10, which is described in detail later with reference to fig. 10. Referring to fig. 6, the method includes:
step S310: and extracting the preliminary image characteristics of the pedestrian image by using the attribute analysis network.
The attribute analysis network is a pre-trained neural network model; the training method may be the one shown in fig. 3 (or a possible implementation thereof), or a prior-art training method for such a network. Taking fig. 7 as an example, the attribute analysis network in step S310 may include a feature extraction network and an attribute classification network, whose functions are similar to those of the same-name networks in fig. 4. It should be noted, however, that the attribute classification network in fig. 4 outputs the attribute prediction result, while the attribute classification network in fig. 7 outputs the preliminary image features. The attribute classification network in fig. 7 can be understood as only the main part of the attribute classification network in fig. 4; for example, it may be obtained by removing the fully-connected layer, the classifier and similar structures at the end of the attribute classification network in fig. 4. The preliminary image features output by the attribute classification network in fig. 7 could also be used directly for classification prediction of the pedestrian attributes in the pedestrian image (for example, by feeding them directly into a fully-connected layer, a classifier and similar structures), but in the method in fig. 6 the preliminary image features are further processed first (step S320). If the attribute classification network in fig. 7 includes a plurality of branch networks, the preliminary image features may likewise comprise a plurality of features.
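The relationship between the two networks — the fig. 7 network being the fig. 4 network with its trailing fully-connected layer and classifier removed — can be sketched with stand-in layer callables (the layers below are toy placeholders, not the actual network):

```python
# Model the trained attribute classification network as an ordered list
# of layer callables; the individual layers here are illustrative.
def make_layers():
    backbone = lambda x: [v * 2 for v in x]      # stand-in for conv blocks
    fc = lambda x: [sum(x)]                       # trailing fully-connected layer
    classifier = lambda x: ["positive" if x[0] > 0 else "negative"]
    return [backbone, fc, classifier]

def run(layers, x):
    for layer in layers:
        x = layer(x)
    return x

layers = make_layers()
full_output = run(layers, [1.0, -0.5])          # attribute prediction (fig. 4 style)
truncated = layers[:-2]                          # drop fc + classifier (fig. 7 style)
preliminary_features = run(truncated, [1.0, -0.5])  # features for step S320
```

The truncated network outputs features that can either be fed straight into the removed head or, as in the method in fig. 6, be fused with semantic context information first.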
Step S320: and obtaining semantic context information of the pedestrian attributes in the pedestrian image, and calculating the preliminary image features according to the semantic context information to obtain final image features.
Semantic context information has already been explained in the introduction of the method in fig. 1; note that the manner of obtaining it in the inference stage may differ from that in the training stage.
For example, the semantic context information in fig. 7 includes individual attribute association, group attribute information and spatiotemporal constraint information. The individual attribute association used in the inference stage need not be acquired with an attribute association extraction network; for example, a fixed individual attribute association may be determined from the individual attribute associations obtained in the training stage (for example, by averaging them) and used directly in the inference stage, which also reflects the stability, in a statistical sense, of the association between attributes. In fig. 7, the box "individual attribute association" is not connected to the box "feature extraction network", indicating that the image features are not needed to obtain the individual attribute association. Alternatively, the individual attribute association may be formed into a fixed rule based on the individual attribute associations obtained during the training stage, i.e., the individual attribute association is provided in a hard-coded form.
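A fixed individual attribute association determined by averaging, as described above, could be sketched as follows (the 2×2 heatmaps over gender and hairstyle are toy values):

```python
def average_association(heatmaps):
    """Element-wise mean of the per-image attribute association
    heatmaps collected during training; the result serves as a fixed
    association in the inference stage."""
    n = len(heatmaps)
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    return [[sum(h[i][j] for h in heatmaps) / n for j in range(cols)]
            for i in range(rows)]

# toy heatmaps from two training images; entry [0][1] is the
# correlation between gender and hairstyle
fixed = average_association([
    [[1.0, 0.6], [0.6, 1.0]],
    [[1.0, 0.8], [0.8, 1.0]],
])
# fixed[0][1] is the averaged gender-hairstyle correlation (approx. 0.7)
```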
There are several ways to obtain the group attribute information used in the inference stage: for example, it may be obtained by statistics over some or all of the pedestrian images in the training set; or by statistics over some or all of the inferred pedestrian images (i.e., pedestrian images for which attribute prediction results have been obtained); or by statistics over some or all of the inferred pedestrian images together with some or all of the pedestrian images in the training set; or it may be calculated from existing group attribute information obtained previously, and so on. Of course, if the group attribute information is a specific rule set, it can simply be read directly. Since inference is a continuous process, inferred pedestrian images are generated continuously; in some implementations, after a new attribute prediction result is obtained (i.e., after step S330 is executed), the new prediction result may be added to the statistics of the group attribute information to update it. This enables the obtained group attribute information to follow the objective variation of the group attributes, thereby helping to improve the accuracy of pedestrian attribute analysis.
In particular, at the beginning of inference there are no inferred pedestrian images yet, so the group attribute information cannot be counted from inferred pedestrian images alone, and initial group attribute information must be obtained in another way. For example, the group attribute information used in the training stage may be used directly as the initial group attribute information of the inference stage; or the initial group attribute information of the inference stage may be calculated from group attribute information obtained by statistics from other sources. For example, if the pedestrian images to be analyzed in the inference stage are from city X, and group attribute information for city Y and city Z was obtained by statistics when the pedestrian images of those cities were inferred, then the initial group attribute information of city X can be obtained by averaging the group attribute information of city Y and city Z.
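The initialize-then-update scheme above can be sketched as a running mean per attribute (the city names and numbers are illustrative):

```python
class GroupAttributeStat:
    """Running mean of one group-level statistic, e.g. the fraction of
    pedestrians wearing a coat."""
    def __init__(self, initial_mean):
        self.mean = initial_mean   # prior, used until real data arrives
        self.count = 0

    def update(self, value):
        """Fold a newly inferred attribute value into the statistic
        (incremental mean; the prior is replaced once observations
        arrive -- a pseudo-count could be used to retain it)."""
        self.count += 1
        self.mean += (value - self.mean) / self.count

# initial group attribute information for city X: average of the
# statistics previously collected for city Y (0.6) and city Z (0.8)
city_x = GroupAttributeStat(initial_mean=(0.6 + 0.8) / 2)
for predicted in (1.0, 1.0, 0.0):   # three newly inferred images
    city_x.update(predicted)
# city_x.mean is now the mean of the three observations (2/3)
```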
For the spatiotemporal constraint information used in the inference stage, the spatiotemporal information corresponding to the pedestrian image can be acquired from the camera, platform or system that captured the pedestrian image, and the spatiotemporal constraint information is then obtained by counting the pedestrian attributes with the spatiotemporal information as a prior constraint; the counting may follow the statistical approach used for the group attribute information. Of course, if the spatiotemporal constraint information is a specific rule set, it may be read directly.
After the semantic context information is acquired, the preliminary image features can be operated on according to the semantic context information; the purpose is to fuse the semantic context information into the preliminary image features. The specific operation is not limited and may be, for example, a dot product, a cross product, a weighting operation, an L2 distance calculation, and the like. The image features obtained after the operation in step S320 is completed are referred to as the final image features.
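For instance, a simple weighting operation (one of the options listed above; the weights shown are hypothetical context-derived factors) could fuse the semantic context information into the preliminary image features like this:

```python
def fuse_features(preliminary, context_weights):
    """Element-wise weighting of the preliminary image features by
    factors derived from semantic context information; the result is
    taken as the final image features."""
    return [f * w for f, w in zip(preliminary, context_weights)]

preliminary = [0.5, -0.2, 1.0]       # features output by the network
context_weights = [1.2, 0.8, 1.0]    # hypothetical context-derived factors
final = fuse_features(preliminary, context_weights)
# final[0] is boosted, final[1] is damped, final[2] is unchanged
```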
Step S330: and carrying out classification prediction on the pedestrian attributes in the pedestrian image according to the final image features to obtain an attribute prediction result.
The final image features can be input directly into a fully-connected layer, a classifier and similar structures to obtain the attribute prediction result; of course, further feature extraction may also be performed on the final image features before the pedestrian attributes are predicted, which is not limited in this application.
In summary, when the pedestrian attribute analysis method in fig. 6 infers the pedestrian attributes, the attribute analysis network is first used to extract the preliminary image features of the pedestrian image, i.e., the influence of the image features on the attribute values at the data level is considered; the preliminary image features are then further corrected with the semantic context information to obtain the final image features, and the attribute prediction result is finally obtained from the final image features, i.e., the influence of the semantic context information on the attribute values at the semantic level is also considered.
Moreover, because the diversity and complexity of the scenes in which pedestrian images are captured generally affect only the image features extracted by the model and do not significantly affect the semantic context information of the pedestrian attributes in the pedestrian image, the method is very suitable for pedestrian attribute analysis in the field of video monitoring, although its application is not limited to this field.
Fig. 8 shows a flow of a fifth pedestrian attribute analysis method provided in the embodiment of the present application, which may be regarded as an execution flow of the post-processing task mentioned in step S02, fig. 9 shows a detailed working principle of the method, and the method in fig. 8 may be, but is not limited to be, executed by the electronic device shown in fig. 10, which is specifically described later with reference to fig. 10. Referring to fig. 8, the method includes:
step S410: and acquiring an attribute prediction result output by the attribute analysis network and aiming at the pedestrian attribute in the pedestrian image.
The attribute analysis network is a pre-trained neural network, and the training method may be the training method shown in fig. 3 (or a possible implementation manner thereof), or may be a training method for such a network in the prior art. Taking fig. 9 as an example, the attribute analysis network in step S410 may include a feature extraction network and an attribute classification network, which have functions similar to those of the same-name network in fig. 4 and will not be described again.
Step S410 covers at least two cases: one is to directly acquire an already obtained attribute prediction result (i.e., the method in fig. 8 is responsible only for post-processing, not for calculating the attribute prediction result), and the other is to calculate the attribute prediction result in a manner similar to step S110, which is not repeated here; the description below mainly takes the latter case as an example. It should be noted that only a preliminary attribute prediction result, not the final attribute prediction result, is obtained in step S410.
Step S420: and acquiring semantic context information of the pedestrian image, and correcting the attribute prediction result according to the semantic context information to obtain a corrected attribute prediction result.
The semantic context information is already explained when the method in fig. 1 is introduced, and the method for obtaining the semantic context information may refer to the explanation in step S320, and the explanation is not repeated.
After the semantic context information is acquired, the attribute prediction result can be corrected according to the semantic context information; the purpose is to adjust the values in the attribute prediction result according to the semantic context information. The specific correction method is not limited; for example, the attribute prediction result may be operated on by a dot product, a cross product, a weighting operation, an L2 distance calculation, and the like. The corrected attribute prediction result obtained after step S420 is completed may also be referred to as the final attribute prediction result. Since the attribute prediction result has already been obtained in step S410, the method in fig. 8 can be regarded as one used in a post-processing stage (the stage after results are obtained in the inference stage).
Further, each time a new final attribute prediction result is obtained (i.e., after step S420 is performed), the new final attribute prediction result may be added to the statistics of the group attribute information to update the group attribute information.
The following describes a modification process of the attribute prediction result, taking a case where the semantic context information includes individual attribute association as an example:
for example, in a simpler implementation, the association of individual attributes in the post-processing stage may be fixed to a hard-coded rule, such as if the gender obtained from the attribute prediction result is "male" and the hairstyle is "long hair", then the hairstyle is automatically modified to "short hair" according to the hard-coded rule.
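A sketch of such a hard-coded rule; the rule set shown is an illustrative assumption:

```python
# Each rule: if all condition attributes match the prediction, force
# the target attribute to the given value.
RULES = [
    ({"gender": "male", "hairstyle": "long hair"}, ("hairstyle", "short hair")),
]

def apply_rules(prediction):
    """Post-processing correction of an attribute prediction result by
    hard-coded individual-attribute-association rules."""
    corrected = dict(prediction)
    for conditions, (attribute, forced_value) in RULES:
        if all(corrected.get(k) == v for k, v in conditions.items()):
            corrected[attribute] = forced_value
    return corrected

result = apply_rules({"gender": "male", "hairstyle": "long hair"})
# result: {"gender": "male", "hairstyle": "short hair"}
```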
For another example, in a slightly more complex implementation, the attribute prediction results may be modified by a dot-product operation. Assume the attribute prediction result is:
Gender: "male" - 0.9, "female" - 0.1
Hairstyle: "long hair" - 0.6, "short hair" - 0.4
If the pedestrian attributes are given directly according to this attribute prediction result, they are "male" and "long hair". Now consider the individual attribute association: assume that the degree of correlation between the gender and hairstyle attributes in the attribute association thermodynamic diagram is 0.8. In line with the actual situation, this can be embodied as a correlation of 0.8 between "male" and "short hair" and, likewise, between "female" and "long hair".
Therefore, the dot product operation can be performed on the attribute prediction result according to the individual attribute association, which is specifically as follows:
P("long hair") = 0.6 − (0.9 × 0.8) / 2 = 0.6 − 0.36 = 0.24
P("short hair") = 0.4 + (0.9 × 0.8) / 2 = 0.4 + 0.36 = 0.76
where the 2 in the denominator represents a normalization factor. It can easily be seen from the above equations that, once the confidence of 0.9 for "male" and the correlation of 0.8 between "male" and "short hair" are taken into account, the predicted probability of "long hair" decreases while that of "short hair" increases. The corrected attribute prediction result is:
hairstyle: long hair-0.24 short hair-0.76
If the pedestrian attributes are predicted according to the corrected attribute prediction result, the given pedestrian attributes are "male" and "short hair". Although a male with long hair is not impossible in practice, such a prediction is more likely to be caused by inaccurate model prediction, so on the whole the corrected attribute prediction result is more accurate.
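The numbers in this worked example can be reproduced with the following sketch; the exact normalization scheme (shifting half of the product of the gender confidence and the correlation from "long hair" to "short hair") is an assumption made for illustration:

```python
def correct_hairstyle(p_male, p_long, correlation):
    """Shift probability mass between the two hairstyle values in
    proportion to the gender confidence and the gender-hairstyle
    correlation; the division by 2 acts as a normalization factor.
    Note: this particular formula is an illustrative assumption."""
    shift = p_male * correlation / 2        # 0.9 * 0.8 / 2 = 0.36
    return p_long - shift, (1 - p_long) + shift

p_long_hair, p_short_hair = correct_hairstyle(p_male=0.9, p_long=0.6,
                                              correlation=0.8)
# approximately (0.24, 0.76): the predicted hairstyle flips to "short hair"
```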
In summary, when the pedestrian attribute analysis method in fig. 8 post-processes the pedestrian attributes, the attribute analysis network is first used to obtain a preliminary attribute prediction result, i.e., the influence of the image features on the attribute values at the data level is considered; the preliminary attribute prediction result is then further corrected with the semantic context information to obtain the final attribute prediction result, i.e., the influence of the semantic context information on the attribute values at the semantic level is also considered.
Moreover, because the diversity and complexity of the scenes in which pedestrian images are captured generally affect only the image features extracted by the model and do not significantly affect the semantic context information of the pedestrian images, the method is very suitable for pedestrian attribute analysis in the field of video monitoring, although its application is not limited to this field.
Regarding the improvement effect of the pedestrian attribute analysis method in fig. 3, 5, 6, and 8 on the aforementioned problems (1) - (4), reference may be made to the analysis of the pedestrian attribute analysis method in fig. 1, and the description is not repeated here.
From the above, it can be seen that the key point of the present application is to use semantic context information for pedestrian attribute analysis. Depending on the stage at which the semantic context information is used, the present application has at least the following five typical forms:
(1) Used in the training phase, not used in the inference phase or the post-processing phase;
(2) Used in the training phase, not used in the inference phase, used in the post-processing phase;
(3) Used in the inference phase, not used in the training phase or the post-processing phase;
(4) Used in the training phase and the inference phase, not used in the post-processing phase;
(5) Not used in the training phase or the inference phase, used in the post-processing phase.
The above five forms can be obtained by combining the methods of fig. 3, 5, 6 and 8, and will not be described in detail.
Fig. 10 shows a structure of a pedestrian attribute analysis apparatus 500 according to an embodiment of the present application. Referring to fig. 10, the pedestrian property analysis apparatus 500 includes:
an information obtaining module 510, configured to obtain semantic context information of a pedestrian attribute in a pedestrian image;
an attribute analysis module 520, configured to perform a pedestrian attribute analysis task using the semantic context information;
wherein the semantic context information is obtained by analyzing the pedestrian image or its associated information, and includes at least one of the following: individual attribute association, which includes statistical rules obeyed by the associations among various pedestrian attributes of individual pedestrians, or a specific rule set that the individual pedestrian attributes conform to; group attribute information, which includes statistical rules obeyed by at least one pedestrian attribute of a pedestrian group, or a specific rule set that the pedestrian attribute conforms to; and spatiotemporal constraint information, which includes statistical rules obeyed by the association between the spatiotemporal information corresponding to the pedestrian image and at least one pedestrian attribute, or a specific rule set that this association conforms to, the spatiotemporal information including at least one of time, space and scene;
the pedestrian attribute analysis task comprises at least one of the following tasks: a model training task: inputting the pedestrian image and the pedestrian attribute label, and training a neural network model for inferring the pedestrian attributes in combination with the semantic context information of the pedestrian image; a model inference task: inputting the pedestrian image, and inferring the pedestrian attributes by using the neural network model in combination with the semantic context information of the pedestrian image; and a post-processing task: inputting an inference result of the neural network model for the pedestrian attributes in the pedestrian image, and correcting the inference result in combination with the semantic context information of the pedestrian image.
In one implementation of the pedestrian attribute analysis apparatus 500, the statistical rules in the individual attribute association include: an attribute association thermodynamic diagram among various pedestrian attributes of individual pedestrians, the attribute association thermodynamic diagram comprising numerical values representing the degree of correlation between every two pedestrian attributes; or, a correlation coefficient between various pedestrian attributes of individual pedestrians; or, a conditional probability distribution obeyed between various pedestrian attributes of individual pedestrians; or, a joint distribution obeyed between various pedestrian attributes of individual pedestrians. The specific rule set in the individual attribute association includes: at least one rule describing that a plurality of pedestrian attributes are subject to correlations between the attributes.
In one implementation of the pedestrian attribute analysis apparatus 500, the statistical rules in the group attribute information include: a statistical distribution to which at least one pedestrian attribute of a pedestrian group is subject; or, statistics of at least one pedestrian attribute of the pedestrian group, the statistics comprising at least one of a mean, a variance, a covariance, a maximum, a minimum, a high-order moment and a high-order cumulant spectrum. The specific rule set in the group attribute information includes: at least one rule describing that a pedestrian attribute is constrained at the group level by the attribute itself.
In one implementation of the pedestrian attribute analysis apparatus 500, the statistical rules in the spatiotemporal constraint information include: a statistical distribution obeyed by at least one pedestrian attribute with the spatiotemporal information corresponding to the pedestrian image as a prior constraint; or, statistics of the at least one pedestrian attribute, the statistics comprising at least one of a mean, a variance, a covariance, a maximum, a minimum, a high-order moment and a high-order cumulant spectrum. The specific rule set in the spatiotemporal constraint information includes: at least one rule describing that pedestrian attributes are constrained by spatiotemporal information.
In one implementation of the pedestrian attribute analysis apparatus 500, the pedestrian image is a training sample in a training set, and the attribute analysis module 520 performs the model training task by using the semantic context information, including: extracting image features of the pedestrian image by using an attribute analysis network, and performing classification prediction on pedestrian attributes in the pedestrian image according to the image features to obtain an attribute prediction result; wherein, the attribute analysis network is a neural network model to be trained; calculating a classification loss according to a difference between the attribute prediction result and a pedestrian attribute label of the pedestrian image, and calculating a semantic loss according to the semantic context information; and calculating fusion loss according to the classification loss and the semantic loss, and updating parameters of the attribute analysis network according to the fusion loss.
In one implementation manner of the pedestrian attribute analysis apparatus 500, the attribute analysis network includes a feature extraction network and an attribute classification network connected to the feature extraction network, and the attribute analysis module 520 extracts an image feature of the pedestrian image by using the attribute analysis network, and performs classification prediction on a pedestrian attribute in the pedestrian image according to the image feature to obtain an attribute prediction result, including: extracting image features of the pedestrian image by using the feature extraction network; based on the image features, carrying out classification prediction on the pedestrian attributes in the pedestrian image by using the attribute classification network to obtain the attribute prediction result; the semantic context information includes individual attribute association, the attribute analysis network further includes an attribute association extraction network connected to the feature extraction network, and the information obtaining module 510 obtains semantic context information of pedestrian attributes in a pedestrian image, including: and based on the image characteristics, extracting individual attribute association of the pedestrian attribute in the pedestrian image by using the attribute association extraction network.
In one implementation of the pedestrian attribute analysis apparatus 500, the semantic context information includes individual attribute association, and the attribute analysis module 520 calculates semantic loss according to the semantic context information, including: calculating the average individual attribute association of all pedestrian images in the training batch in which the pedestrian image is located; and calculating a first semantic loss of the semantic losses according to the difference between the individual attribute association and the average individual attribute association.
In one implementation manner of the pedestrian attribute analysis apparatus 500, the semantic context information includes group attribute information, and the information obtaining module 510 obtains the semantic context information of the pedestrian attribute in the pedestrian image, including: and obtaining the group attribute information of the pedestrian attributes in the pedestrian images by counting at least one pedestrian attribute in part or all of the pedestrian images in the training set.
In one implementation of the pedestrian attribute analysis apparatus 500, the semantic context information includes group attribute information, and the attribute analysis module 520 calculates semantic loss according to the semantic context information, including: calculating a group attribute prediction result which represents the pedestrian attribute in the pedestrian image predicted according to the group attribute information; and calculating a second semantic loss in the semantic losses according to the difference between the attribute prediction result and the group attribute prediction result.
In one implementation of the pedestrian attribute analysis apparatus 500, the semantic context information includes spatiotemporal constraint information, and the information obtaining module 510 obtains the semantic context information of the pedestrian attribute in the pedestrian image, including: acquiring the spatiotemporal information corresponding to the pedestrian image from a camera or a monitoring system; and taking the spatiotemporal information as prior constraint, and obtaining the spatiotemporal constraint information of the pedestrian attributes in the pedestrian images by counting at least one pedestrian attribute in part or all of the pedestrian images in the training set.
In one implementation of the pedestrian attribute analysis apparatus 500, the semantic context information includes spatiotemporal constraint information, and the attribute analysis module 520 calculates semantic loss according to the semantic context information, including: calculating a spatiotemporal attribute prediction result, wherein the spatiotemporal attribute prediction result represents the pedestrian attributes in the pedestrian image predicted according to the spatiotemporal constraint information; and calculating a third semantic loss in the semantic losses according to the difference between the attribute prediction result and the spatiotemporal attribute prediction result.
In one implementation of the pedestrian attribute analysis apparatus 500, the attribute analysis module 520 performs the model inference task using the semantic context information by: extracting preliminary image features of the pedestrian image with an attribute analysis network, wherein the attribute analysis network is a trained neural network model and the preliminary image features are usable for classification prediction of the pedestrian attributes in the pedestrian image; computing final image features from the preliminary image features according to the semantic context information; and performing classification prediction of the pedestrian attributes in the pedestrian image according to the final image features to obtain an attribute prediction result.
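The preliminary-to-final feature step can be sketched as below. The additive fusion and the toy sigmoid classification head are stand-ins, chosen only to make the data flow concrete; a deployed attribute analysis network would use its own learned fusion and classifier, and `alpha` is a hypothetical mixing weight:

```python
import numpy as np

def refine_features(prelim, context, alpha=0.1):
    """Fuse preliminary image features with a semantic-context vector to
    obtain final image features (simple additive fusion as a stand-in)."""
    prelim = np.asarray(prelim, dtype=np.float64)
    context = np.asarray(context, dtype=np.float64)
    return prelim + alpha * context

def classify(features, threshold=0.5):
    """Toy classification head: map each final feature to a probability
    with a sigmoid, then threshold into a binary attribute prediction."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(features, dtype=np.float64)))
    return (probs >= threshold).astype(int)

final = refine_features([2.0, -2.0], [0.0, 0.0])
preds = classify(final)
```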
In one implementation of the pedestrian attribute analysis apparatus 500, the semantic context information includes group attribute information, and the information obtaining module 510 obtains the semantic context information of the pedestrian attributes in the pedestrian image by: counting part or all of the pedestrian images in the training set and/or part or all of the inferred pedestrian images to obtain the group attribute information of the pedestrian attributes in the pedestrian image; or, calculating previously obtained group attribute information to obtain the group attribute information of the pedestrian attributes in the pedestrian image. The inferred pedestrian images are pedestrian images on which pedestrian attribute prediction has already been performed, and the training set is the set of pedestrian images used to train the attribute analysis network.
In one implementation of the pedestrian attribute analysis apparatus 500, the attribute analysis module 520 performs the post-processing task using the semantic context information by: acquiring an attribute prediction result, output by an attribute analysis network, for the pedestrian attributes in the pedestrian image, wherein the attribute analysis network is a trained neural network model; and correcting the attribute prediction result according to the semantic context information to obtain a corrected attribute prediction result.
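One simple correction scheme is sketched below: the network's probabilities are pulled toward group attribute frequencies (one form of semantic context) and re-thresholded. The convex mixing rule and the `weight` coefficient are illustrative assumptions, not the application's prescribed correction:

```python
import numpy as np

def correct_prediction(probs, group_freq, weight=0.3):
    """Post-processing sketch: mix the network's attribute probabilities
    with group attribute frequencies, then re-threshold.

    probs:      (A,) attribute probabilities from the attribute analysis net.
    group_freq: (A,) group attribute frequencies (semantic context).
    weight:     hypothetical mixing coefficient in [0, 1].
    Returns (corrected probabilities, corrected binary labels).
    """
    probs = np.asarray(probs, dtype=np.float64)
    group_freq = np.asarray(group_freq, dtype=np.float64)
    corrected = (1 - weight) * probs + weight * group_freq
    return corrected, (corrected >= 0.5).astype(int)

# Two borderline predictions get flipped toward the group statistics.
corrected, labels = correct_prediction([0.55, 0.45], [0.1, 0.9])
```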
The implementation principle and resulting technical effects of the pedestrian attribute analysis apparatus 500 provided in the embodiments of the present application have been introduced in the foregoing method embodiments; for brevity, where the apparatus embodiment does not mention a detail, reference may be made to the corresponding content in the method embodiments.
Fig. 11 shows a possible structure of an electronic device 600 provided in an embodiment of the present application. Referring to fig. 11, the electronic device 600 includes: a processor 610, a memory 620, and a communication interface 630, which are interconnected and in communication with each other via a communication bus 640 and/or other form of connection mechanism (not shown).
The memory 620 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor 610, and possibly other components, may access, read from, and/or write to the memory 620.
The processor 610 includes one or more processors (only one is shown), each of which may be an integrated circuit chip with signal processing capability. The processor 610 may be a general-purpose processor, including a Central Processing Unit (CPU), a Microcontroller Unit (MCU), a Network Processor (NP), or another conventional processor; it may also be a special-purpose processor, including a Graphics Processing Unit (GPU), a Neural-network Processing Unit (NPU), a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Moreover, when there are multiple processors 610, some of them may be general-purpose processors and the others special-purpose processors.
The communication interface 630 includes one or more interfaces (only one is shown) that can be used to communicate, directly or indirectly, with other devices for data exchange. The communication interface 630 may include interfaces for wired and/or wireless communication.
One or more computer program instructions may be stored in the memory 620 and may be read and executed by the processor 610 to implement the model training method and/or the pedestrian attribute analysis method provided by the embodiments of the present application.
It will be appreciated that the configuration shown in FIG. 11 is merely illustrative and that electronic device 600 may include more or fewer components than shown in FIG. 11 or have a different configuration than shown in FIG. 11. The components shown in fig. 11 may be implemented in hardware, software, or a combination thereof. The electronic device 600 may be a physical device, such as a PC, a laptop, a tablet, a cell phone, a server, an embedded device, etc., or may be a virtual device, such as a virtual machine, a virtualized container, etc. The electronic device 600 is not limited to a single device, and may be a combination of a plurality of devices or a cluster including a large number of devices.
The embodiments of the present application further provide a computer-readable storage medium storing computer program instructions which, when read and executed by a processor of a computer, perform the model training method and/or the pedestrian attribute analysis method provided in the embodiments of the present application. The computer-readable storage medium may be implemented as, for example, the memory 620 in the electronic device 600 in fig. 11.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (17)

1. A pedestrian attribute analysis method is characterized by comprising the following steps:
obtaining semantic context information of pedestrian attributes in a pedestrian image;
executing a pedestrian attribute analysis task by utilizing the semantic context information;
wherein the semantic context information is obtained by analyzing the pedestrian image or the associated information thereof, and includes at least one of the following information:
the individual attribute association comprises statistical rules obeyed by the association among various pedestrian attributes of the pedestrian individuals or a specific rule set which is met;
group attribute information including statistical rules to which at least one pedestrian attribute of a pedestrian group is obeyed, or a specific rule set to which the pedestrian attribute is accorded;
spatiotemporal constraint information, wherein the spatiotemporal constraint information comprises a statistical rule obeyed by the association between spatiotemporal information corresponding to the pedestrian image and at least one pedestrian attribute, or a specific rule set that the association conforms to, and the spatiotemporal information comprises at least one of time, space, and scene;
the pedestrian attribute analysis task comprises at least one of the following tasks:
model training tasks: inputting the pedestrian image and the pedestrian attribute label, and training a neural network model for deducing the pedestrian attribute by combining semantic context information of the pedestrian image;
model inference tasks: inputting the pedestrian image, and deducing the pedestrian attribute by utilizing a neural network model in combination with the semantic context information of the pedestrian image;
post-processing task: inputting an inference result of a neural network model for the pedestrian attribute in the pedestrian image, and correcting the inference result in combination with the semantic context information of the pedestrian image.
2. The pedestrian attribute analysis method of claim 1, wherein the statistical rules in the individual attribute association comprise:
the method comprises the following steps that an attribute association thermodynamic diagram among various pedestrian attributes of individual pedestrians is provided, and the attribute association thermodynamic diagram comprises numerical values representing the correlation degree of the pedestrian attributes;
or, a correlation coefficient between various pedestrian attributes of individual pedestrians;
or, a conditional probability distribution obeyed between various pedestrian attributes of individual pedestrians;
or, a joint distribution obeyed between various pedestrian attributes of individual pedestrians;
the particular rule set in the individual attribute association includes: at least one rule describing that a plurality of pedestrian attributes are constrained by the correlation between the attributes.
3. The pedestrian attribute analysis method according to claim 1, wherein the statistical rules in the group attribute information include:
a statistical distribution to which at least one pedestrian attribute of a pedestrian population is subject;
or, the at least one pedestrian attribute of the pedestrian population has statistics comprising: at least one of mean, variance, covariance, maximum, minimum, high order moment, high order cumulant spectrum;
the specific rule set in the group attribute information comprises: at least one rule describing that the pedestrian attributes are constrained by the population of the attributes themselves.
4. The pedestrian attribute analysis method of claim 1, wherein the statistical rules in the spatiotemporal constraint information include:
a statistical distribution obeyed by at least one pedestrian attribute, with the spatiotemporal information corresponding to the pedestrian image as a prior constraint;
alternatively, the at least one pedestrian attribute has statistics comprising: at least one of mean, variance, covariance, maximum, minimum, high order moment, high order cumulant spectrum;
the specific rule set in the spatiotemporal constraint information includes: at least one rule describing that pedestrian attributes are constrained by spatiotemporal information.
5. The pedestrian attribute analysis method of claim 1, wherein the pedestrian images are training samples in a training set, and the performing the model training task using the semantic context information comprises:
extracting image features of the pedestrian image by using an attribute analysis network, and performing classification prediction on pedestrian attributes in the pedestrian image according to the image features to obtain an attribute prediction result; wherein, the attribute analysis network is a neural network model to be trained;
calculating a classification loss according to a difference between the attribute prediction result and a pedestrian attribute label of the pedestrian image, and calculating a semantic loss according to the semantic context information;
and calculating fusion loss according to the classification loss and the semantic loss, and updating parameters of the attribute analysis network according to the fusion loss.
6. The pedestrian attribute analysis method according to claim 5, wherein the attribute analysis network comprises a feature extraction network and an attribute classification network connected to the feature extraction network, and the using the attribute analysis network extracts image features of the pedestrian image and performs classification prediction on the pedestrian attributes in the pedestrian image according to the image features to obtain an attribute prediction result, includes:
extracting image features of the pedestrian image by using the feature extraction network;
based on the image features, carrying out classification prediction on the pedestrian attributes in the pedestrian image by using the attribute classification network to obtain the attribute prediction result;
the semantic context information comprises individual attribute association, the attribute analysis network further comprises an attribute association extraction network connected with the feature extraction network, and the obtaining of the semantic context information of the pedestrian attribute in the pedestrian image comprises:
and based on the image characteristics, extracting individual attribute association of the pedestrian attribute in the pedestrian image by using the attribute association extraction network.
7. The pedestrian attribute analysis method according to claim 5 or 6, wherein the semantic context information includes individual attribute associations, and the calculating of the semantic loss from the semantic context information includes:
calculating the average individual attribute correlation of all the pedestrian images in the training batch where the pedestrian images are located;
calculating a first semantic loss of the semantic losses according to a difference of the individual attribute associations and the average individual attribute association.
8. The pedestrian attribute analysis method according to claim 5, wherein the semantic context information includes group attribute information, and the obtaining of the semantic context information of the pedestrian attribute in the pedestrian image includes:
and obtaining the group attribute information of the pedestrian attributes in the pedestrian images by counting at least one pedestrian attribute in part or all of the pedestrian images in the training set.
9. The pedestrian attribute analysis method according to claim 5 or 8, wherein the semantic context information includes group attribute information, and the calculating of the semantic loss from the semantic context information includes:
calculating a group attribute prediction result which represents the pedestrian attribute in the pedestrian image predicted according to the group attribute information;
calculating a second semantic loss of the semantic losses according to a difference between the attribute prediction result and the population attribute prediction result.
10. The pedestrian attribute analysis method according to claim 5, wherein the semantic context information includes spatiotemporal constraint information, and the obtaining of the semantic context information of the pedestrian attribute in the pedestrian image includes:
acquiring space-time information corresponding to the pedestrian image from a camera or a monitoring system;
and taking the spatiotemporal information as prior constraint, and obtaining the spatiotemporal constraint information of the pedestrian attributes in the pedestrian images by counting at least one pedestrian attribute in part of or all of the pedestrian images in the training set.
11. The pedestrian attribute analysis method according to claim 5 or 10, wherein the semantic context information includes spatiotemporal constraint information, and the calculating of the semantic loss from the semantic context information includes:
calculating a spatiotemporal attribute prediction result, wherein the spatiotemporal attribute prediction result represents the pedestrian attributes in the pedestrian image predicted according to the spatiotemporal constraint information;
and calculating a third semantic loss among the semantic losses according to the difference between the attribute prediction result and the spatiotemporal attribute prediction result.
12. The pedestrian attribute analysis method of claim 1, wherein performing the model inference task using the semantic context information comprises:
extracting preliminary image features of the pedestrian image by using an attribute analysis network; the attribute analysis network is a trained neural network model, and the preliminary image features are used for carrying out classification prediction on the pedestrian attributes in the pedestrian images;
calculating the preliminary image features according to the semantic context information to obtain final image features;
and carrying out classification prediction on the pedestrian attributes in the pedestrian image according to the final image features to obtain an attribute prediction result.
13. The pedestrian attribute analysis method according to claim 12, wherein the semantic context information includes group attribute information, and the obtaining of the semantic context information of the pedestrian attribute in the pedestrian image includes:
obtaining the group attribute information of the pedestrian attributes in the pedestrian image by counting part or all of the pedestrian images in the training set and/or part or all of the inferred pedestrian images; or,
obtaining the group attribute information of the pedestrian attributes in the pedestrian image by calculating previously obtained group attribute information;
the inferred pedestrian images refer to pedestrian images which have been subjected to pedestrian attribute prediction, and the training set refers to a set of pedestrian images used for training the attribute analysis network.
14. The pedestrian attribute analysis method of claim 1, wherein performing the post-processing task using the semantic context information comprises:
acquiring an attribute prediction result output by an attribute analysis network and aiming at the attribute of the pedestrian in the pedestrian image; wherein, the attribute analysis network is a trained neural network model;
and correcting the attribute prediction result according to the semantic context information to obtain a corrected attribute prediction result.
15. A pedestrian property analysis apparatus characterized by comprising:
the information acquisition module is used for acquiring semantic context information of pedestrian attributes in the pedestrian image;
the attribute analysis module is used for executing a pedestrian attribute analysis task by utilizing the semantic context information;
wherein the semantic context information is obtained by analyzing the pedestrian image or the associated information thereof, and includes at least one of the following information:
the individual attribute association comprises statistical rules obeyed by the association among various pedestrian attributes of the pedestrian individuals or a specific rule set which is met;
the group attribute information comprises statistical rules obeyed by at least one pedestrian attribute of the pedestrian group or a specific rule set which is met;
spatiotemporal constraint information, wherein the spatiotemporal constraint information comprises a statistical rule obeyed by the association between spatiotemporal information corresponding to the pedestrian image and at least one pedestrian attribute, or a specific rule set that the association conforms to, and the spatiotemporal information comprises at least one of time, space, and scene;
the pedestrian attribute analysis task comprises at least one of the following tasks:
model training tasks: inputting the pedestrian image and the pedestrian attribute label, and training a neural network model for deducing the pedestrian attribute by combining the semantic context information of the pedestrian image;
model inference tasks: inputting the pedestrian image, and deducing the pedestrian attribute by utilizing a neural network model in combination with the semantic context information of the pedestrian image;
post-processing task: inputting an inference result of a neural network model for the pedestrian attribute in the pedestrian image, and correcting the inference result in combination with the semantic context information of the pedestrian image.
16. A computer-readable storage medium, having stored thereon computer program instructions, which when read and executed by a processor, perform the method of any one of claims 1-14.
17. An electronic device comprising a memory and a processor, the memory having stored therein computer program instructions that, when read and executed by the processor, perform the method of any of claims 1-14.
CN202211589015.7A 2022-12-07 2022-12-07 Pedestrian attribute analysis method and device, storage medium and electronic equipment Pending CN115909409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211589015.7A CN115909409A (en) 2022-12-07 2022-12-07 Pedestrian attribute analysis method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211589015.7A CN115909409A (en) 2022-12-07 2022-12-07 Pedestrian attribute analysis method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115909409A true CN115909409A (en) 2023-04-04

Family

ID=86481309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211589015.7A Pending CN115909409A (en) 2022-12-07 2022-12-07 Pedestrian attribute analysis method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115909409A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116934557A (en) * 2023-09-15 2023-10-24 中关村科学城城市大脑股份有限公司 Behavior prediction information generation method, device, electronic equipment and readable medium
CN116934557B (en) * 2023-09-15 2023-12-01 中关村科学城城市大脑股份有限公司 Behavior prediction information generation method, device, electronic equipment and readable medium

Similar Documents

Publication Publication Date Title
CN110532996B (en) Video classification method, information processing method and server
US9940522B2 (en) Systems and methods for identifying activities and/or events in media contents based on object data and scene data
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN107818307B (en) Multi-label video event detection method based on LSTM network
CN110765863B (en) Target clustering method and system based on space-time constraint
Inoue et al. Sequential fuzzy cluster extraction by a graph spectral method
CN110956122A (en) Image processing method and device, processor, electronic device and storage medium
KR20210049717A (en) Image processing method and apparatus, processor, storage medium
Ge et al. Dynamic background estimation and complementary learning for pixel-wise foreground/background segmentation
US20210117687A1 (en) Image processing method, image processing device, and storage medium
Xu et al. A robust background initialization algorithm with superpixel motion detection
CN109389076B (en) Image segmentation method and device
Grigorev et al. Depth estimation from single monocular images using deep hybrid network
CN112668366A (en) Image recognition method, image recognition device, computer-readable storage medium and chip
CN116416416A (en) Training method of virtual fitting model, virtual fitting method and electronic equipment
CN115909409A (en) Pedestrian attribute analysis method and device, storage medium and electronic equipment
CN115359566A (en) Human behavior identification method, device and equipment based on key points and optical flow
Li et al. A deep learning framework for autonomous flame detection
CN111177460B (en) Method and device for extracting key frame
CN115346272A (en) Real-time tumble detection method based on depth image sequence
CN111291785A (en) Target detection method, device, equipment and storage medium
Qi et al. Saliency detection via joint modeling global shape and local consistency
Haq et al. Implementation of CNN for plant identification using UAV imagery
KR102323861B1 (en) System for selling clothing online
CN113903063A (en) Facial expression recognition method and system based on deep spatiotemporal network decision fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination