CN117152787A - Character clothing recognition method, device, equipment and readable storage medium - Google Patents

Character clothing recognition method, device, equipment and readable storage medium Download PDF

Info

Publication number
CN117152787A
CN117152787A CN202210542463.5A CN202210542463A CN117152787A CN 117152787 A CN117152787 A CN 117152787A CN 202210542463 A CN202210542463 A CN 202210542463A CN 117152787 A CN117152787 A CN 117152787A
Authority
CN
China
Prior art keywords
layer
identified
picture
clothing
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210542463.5A
Other languages
Chinese (zh)
Inventor
王�忠
古川南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Entropy Technology Co Ltd
Original Assignee
Entropy Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Entropy Technology Co Ltd filed Critical Entropy Technology Co Ltd
Priority to CN202210542463.5A priority Critical patent/CN117152787A/en
Publication of CN117152787A publication Critical patent/CN117152787A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a character clothing recognition method, device, equipment and readable storage medium. The method comprises: acquiring a picture of the person to be identified; acquiring a trained clothing recognition model comprising a feature extraction layer, a dense connection network and a feature analysis network; inputting the picture into the clothing recognition model, extracting features from the picture through the feature extraction layer to generate a first feature map, performing feature reuse on the first feature map through the dense connection network to generate a second feature map, and analyzing the second feature map through the feature analysis network to determine the clothing recognition result of the person to be identified. The method realizes feature reuse by concatenating features along the channel dimension, which achieves high recognition efficiency and accuracy with few parameters and little computational cost.

Description

Character clothing recognition method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of intelligent recognition, and more particularly, to a character clothing recognition method, apparatus, device, and readable storage medium.
Background
Character clothing recognition refers to identifying attributes of a target person from an image, which may include biological attributes such as the person's sex and age. With the spread of surveillance video, accurately and effectively exploiting the person information in video and mining clothing-related attributes is of great value. For example, when assisting public security authorities in searching surveillance ("Sky Eye") footage for a target person and determining that person's location, persons wearing the same clothing as the target can be picked out from the clothing recognition results of passers-by in each area, so that the target's position can be determined.
The traditional recognition models currently used for character clothing recognition are built by stacking a series of convolution layers and downsampling layers; recognizing biological attributes with such a model imposes a heavy computational load on the server, and both recognition efficiency and accuracy are low.
Based on the above, there is a need for a character clothing recognition scheme to reduce recognition calculation and improve recognition efficiency and recognition accuracy.
Disclosure of Invention
In view of this, the present application provides a character clothing recognition method, apparatus, device, and readable storage medium to improve recognition efficiency and recognition accuracy of biological properties of a target character.
A character garment identification method comprising:
acquiring a figure picture to be identified;
acquiring a garment identification model after training, wherein the garment identification model comprises a feature extraction layer, a dense connection network and a feature analysis network;
inputting the figure picture to be identified into the clothing identification model, carrying out feature extraction on the input figure picture to be identified through the feature extraction layer to generate a first feature map, carrying out feature multiplexing on the first feature map through the dense connection network to generate a second feature map, analyzing the second feature map through the feature analysis network, and determining the clothing identification result of the figure to be identified.
Preferably, the dense connection network is composed of a plurality of dense blocks and a plurality of transition layers that are densely connected, the feature maps within each layer being of the same size; the dense blocks perform batch normalization, activation and convolution according to a predefined pattern of concatenating inputs and outputs along the channel dimension, and the transition layers control the number of channels.
Preferably, the characteristic analysis network comprises a compression processing network, an index analysis network and a result determination layer;
analyzing the second feature map through the feature analysis network, and determining a clothing recognition result of the person to be recognized, wherein the process comprises the following steps:
Performing dimension reduction processing and dimension compression on the input second feature map through the compression processing network to generate a third feature map;
carrying out global average pooling processing and normalization index processing on the third characteristic diagram through the index analysis network to obtain clothing information and predicted index values corresponding to each piece of clothing information;
and determining a clothing recognition result of the person to be recognized according to the clothing information and the corresponding predicted index value by using the result determination layer.
Preferably, the feature extraction layer comprises a convolution layer and a maximum pooling layer;
the process for extracting the characteristics of the input character picture to be identified through the characteristic extraction layer comprises the following steps:
carrying out convolution operation on the input figure picture to be identified through the convolution layer, and extracting to obtain picture characteristic information;
and carrying out maximum pooling processing on the picture characteristic information through the maximum pooling layer to generate a first feature map.
Preferably, the exponential analysis network comprises a first convolution block, a maximum pooling layer, a second convolution block, a global average pooling layer, a third convolution block and a normalization processing layer which are sequentially connected.
Preferably, the obtaining the figure picture to be identified includes:
extracting a plurality of frames to be identified at different moments from the video to be identified;
detecting a person region to be identified on the frame to be identified;
and performing size transformation, mean removal and standard normalization processing on the frame to be identified based on the detected person region to be identified, and generating a person picture to be identified.
Preferably, the process of training the clothing recognition model comprises the following steps:
acquiring a training character picture, wherein the training character picture is marked with corresponding training clothes information;
inputting the training character picture into a preset basic recognition model to obtain a clothing recognition result of the training character, which is output by the basic recognition model;
training the basic recognition model with the objective that the clothing recognition result of the training character be consistent with the training clothing information labeled on the training character picture;
and when the basic recognition model meets preset training conditions, taking the basic recognition model after training as a clothing recognition model.
A character garment recognition device, comprising:
the image acquisition module is used for acquiring the image of the person to be identified;
a model acquisition module, used for acquiring a trained clothing recognition model, wherein the clothing recognition model comprises a feature extraction layer, a dense connection network and a feature analysis network;
The clothing recognition module is used for inputting the person picture to be recognized into the clothing recognition model, extracting features of the input person picture to be recognized through the feature extraction layer, generating a first feature map, carrying out feature multiplexing on the first feature map through the dense connection network, generating a second feature map, analyzing the second feature map through the feature analysis network, and determining clothing recognition results of the person to be recognized.
A character clothing recognition device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the character clothing identification method described above.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a character garment recognition method as described above.
According to the technical scheme, the character clothing recognition method, device, equipment and readable storage medium provided by the embodiments of the application obtain a picture of the person to be identified and a trained clothing recognition model, where the clothing recognition model comprises a feature extraction layer, a dense connection network and a feature analysis network. The picture of the person to be identified is input into the clothing recognition model; the feature extraction layer extracts features from the input picture to generate a first feature map, the dense connection network performs feature reuse on the first feature map to generate a second feature map, and the feature analysis network analyzes the second feature map, finally yielding the clothing recognition result of the person in the picture.
Because the clothing recognition model consists of the feature extraction layer, the dense connection network and the feature analysis network, and the dense connection network realizes feature reuse by concatenating features along the channel dimension (that is, the layers are interconnected and each layer receives feature maps from preceding layers as additional input), the model achieves high recognition efficiency and accuracy with few parameters and little computational cost, both higher than in the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a character clothing recognition method disclosed by the application;
FIG. 2 is a schematic diagram of a character clothing recognition model according to the present disclosure;
FIG. 3 is a schematic structural diagram of a feature extraction layer according to the present disclosure;
FIG. 4 is a schematic diagram of a dense connectivity network according to the present disclosure;
FIG. 5 is a schematic diagram of a feature analysis network according to the present disclosure;
FIG. 6 is a block diagram of a character garment recognition device according to the present application;
fig. 7 is a block diagram of a hardware configuration of the character clothing recognition apparatus disclosed in the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The technical solution of the present application is described below with reference to specific embodiments.
Fig. 1 is a flowchart of a method for identifying a person garment according to an embodiment of the present application, where the method for identifying a person garment may be applied to a device for identifying a person garment, as shown in fig. 1, and the method may include:
And S1, acquiring a figure picture to be identified.
Specifically, the acquired picture to be identified contains the target person whose clothing is to be recognized. Clothing here covers garments and all accessories, such as shoes, hats, glasses, clothes and backpacks. It should be understood that recognizing the target person's clothing may include determining whether the target person is wearing a given item, as well as recognizing characteristics of the worn clothing such as length, color and material, for example whether the target's trousers are long trousers or shorts, whether the glasses worn are black or white, and whether the jacket is wool or chiffon.
The character clothing recognition method provided by the application can assist the character searching work of the public security organization, for example, the public security organization can use the character clothing recognition method to perform character clothing recognition on the flowing personnel in the public area in the searching process of a certain target person so as to determine the character matched with the known target character clothing information, further determine the position information or the movement information of the target character and assist in realizing the searching of the target character.
When acquiring pictures of persons to be identified, notices must be posted in the specific public places and areas where the method is applied, so that people entering and leaving those places are informed that their pictures will be collected and included in the biological-attribute statistics of the area. Biological attributes are then identified lawfully and with user authorization, on which basis the character clothing recognition method is carried out.
The picture of the person to be identified may be a video frame taken from a video recorded by a camera, a photo captured from such a video frame, or an image of the target person whose clothing needs to be recognized obtained by other means.
It can be understood that when the video frame picture of the video recorded by the camera is obtained as the figure picture to be identified, the video frame picture can be a picture of a certain video frame of the video recorded in real time or pre-recorded in a certain public area or place. Multiple persons may simultaneously be present in the video frame, each of which may be a target person for which person apparel identification is desired.
The application can include the following steps in the process of obtaining the figure picture to be identified:
(1) and extracting a plurality of frames to be identified at different moments from the video to be identified.
(2) And detecting the character region to be identified on the frame to be identified.
Specifically, pictures can be captured and saved from the video to be identified at regular time intervals, i.e., a plurality of frames at different moments are extracted. The persons appearing in each frame are then detected, i.e., the frame is subjected to person-region detection, and the detected person regions are cropped and saved as pictures.
(3) And performing size transformation, mean removal and standard normalization processing on the frame to be identified based on the detected person region to be identified, and generating a person picture to be identified.
Specifically, the image is processed based on the detected person region to generate the picture of the person to be identified; this processing includes, but is not limited to, size transformation, mean removal and standard normalization. The aim is to remove factors in the image that would affect the precision and accuracy of subsequent processing, so that the resulting picture conforms to the expected input format better than the original image. This facilitates subsequent processing, reduces the amount of later computation, accelerates convergence and improves the reliability of the subsequent recognition.
The size transformation adjusts the image to the size expected by the network, for example by scaling with cv.resize(). Scaling generally computes length and width scale factors and then resizes to the target frame size by interpolation; cv.resize() can accurately convert the image to the target size using an interpolation method.
The mean removal subtracts, from each dimension of the image, the mean of that dimension, which improves fitting. In a neural network, when feature values are large, gradient dissipation tends to occur during back propagation, so parameter updates become small, fitting becomes difficult and the effect is poor. By subtracting the per-dimension mean, each dimension of the image is centered at zero, improving subsequent fitting.
The standard normalization process refers to a process of performing a series of standard processing transformations on the image to transform the image into a fixed standard form, and controlling the scale of each feature of the image within the same range, so that an optimal solution can be found conveniently, and the convergence efficiency is improved.
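As an illustration of the preprocessing steps described above (size transformation with interpolation, mean removal and standard normalization), the following Python/OpenCV sketch processes a detected person crop. The target size and the mean and standard-deviation values are assumptions chosen for the example; the patent itself only fixes example network input sizes such as 96×96.

```python
import cv2
import numpy as np

def preprocess_person_crop(crop_bgr, target_size=(96, 96),
                           mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Resize a detected person region, remove the per-channel mean, and normalize.

    target_size, mean and std are illustrative placeholders; the patent does not
    prescribe these values.
    """
    # Size transformation: interpolate the crop to the network input size.
    resized = cv2.resize(crop_bgr, target_size, interpolation=cv2.INTER_LINEAR)
    img = resized.astype(np.float32) / 255.0
    # Mean removal: center each channel at zero to ease fitting.
    img -= np.array(mean, dtype=np.float32)
    # Standard normalization: bring every channel to a comparable scale.
    img /= np.array(std, dtype=np.float32)
    # HWC -> CHW layout for a channels-first network input.
    return np.transpose(img, (2, 0, 1))
```

In practice the crop would come from the person-region detection of step (2) before being fed to the recognition model.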
In addition, in order to obtain a more accurate recognition result, the character picture to be recognized can be subjected to secondary processing, wherein the secondary processing includes, but is not limited to, sharpening processing and/or denoising processing on the image.
The sharpening process compensates the boundaries and contours in the image. Objects in an image are distinguished by differences in brightness, and these differences change sharply at boundaries; the aim of sharpening is therefore to highlight object details or enhance blurred details. Sharpening improves image clarity by emphasizing edges, the outline of the person, or certain linear target elements.
The denoising process reduces noise in the digital image. The pictures of persons to be identified are digital images, which can be contaminated by noise during acquisition and transmission; the common types are Gaussian noise and salt-and-pepper noise. Gaussian noise mainly arises in the camera sensor, while salt-and-pepper noise appears as alternating bright and dark spot noise introduced, for example, by image cropping. Such noise is a significant source of interference in the picture to be identified, and denoising improves the fidelity and accuracy of the picture.
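The optional secondary processing can be sketched in the same style. The kernel and filter sizes below are illustrative assumptions: Gaussian blurring is aimed at sensor noise, median filtering at salt-and-pepper noise, and a simple Laplacian-style kernel stands in for sharpening.

```python
import cv2
import numpy as np

def secondary_processing(img_bgr, sharpen=True, denoise=True):
    """Optional sharpening and denoising; parameter values are illustrative only."""
    out = img_bgr
    if denoise:
        # Gaussian blur attenuates sensor (Gaussian) noise;
        # a median filter suppresses salt-and-pepper noise.
        out = cv2.GaussianBlur(out, (3, 3), 0)
        out = cv2.medianBlur(out, 3)
    if sharpen:
        # Simple Laplacian-style sharpening kernel to emphasize edges and contours.
        kernel = np.array([[0, -1, 0],
                           [-1, 5, -1],
                           [0, -1, 0]], dtype=np.float32)
        out = cv2.filter2D(out, -1, kernel)
    return out
```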
And S2, acquiring a garment recognition model after training.
Specifically, the trained clothing recognition model comprises a feature extraction layer, a dense connection network and a feature analysis network. After obtaining the figure picture to be identified containing the target figure needing to be identified by the figure clothing, the figure picture to be identified can be input into a figure clothing identification model, the figure clothing identification model can analyze and identify the biological attribute of the target figure in the figure picture to be identified, and a corresponding figure clothing identification result is output.
Fig. 2 is a schematic structural diagram of the character clothing recognition model disclosed in the application; the model comprises a feature extraction layer A, a dense connection network B and a feature analysis network C. The dense connection network is a relatively deep convolutional neural network with few parameters, strengthened feature propagation and easy training, and it simultaneously alleviates the problems of gradient vanishing and model degradation. It contains a plurality of dense blocks; within any dense block, any two layers are directly connected, that is, the input of each layer is the union of the outputs of all preceding layers, and the feature map learned by a layer is passed directly as input to all subsequent layers in the block. Dense connection thus alleviates gradient vanishing, enhances feature propagation, encourages feature reuse and greatly reduces the number of parameters.
It can be understood that, in order to further shorten detection time and improve detection efficiency, the application may pre-train a preset basic recognition model on training character pictures labeled with the corresponding training clothing information, and take the trained basic recognition model as the clothing recognition model. When character clothing needs to be recognized, the model can then be used directly without spending a large amount of training time. The basic recognition model likewise comprises a feature extraction layer, a dense connection network and a feature analysis network; training aims to make the clothing recognition result for a sample character consistent with that character's labeled attributes, and during training the parameters of each network structure are continually adjusted and corrected.
Furthermore, the application uses cross entropy as the loss function for determining the loss value during model training. Cross entropy measures the difference between two probability distributions over the same random variable; in machine learning it expresses the gap between the true probability distribution and the predicted one. The smaller the cross entropy, i.e., the smaller the loss value, the better the model's predictions and the more accurate the analysis results. Using cross entropy as the loss function when training the basic recognition model also avoids problems such as gradient dispersion and the resulting slow-down of learning during gradient-descent computation.
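A minimal PyTorch sketch of using cross entropy as the training loss is given below; the linear head, class count and optimizer settings are assumptions for illustration and do not come from the patent.

```python
import torch
import torch.nn as nn

# Toy example: cross entropy between predicted attribute logits and labels.
# The layer size and class count (e.g. hat / no hat) are assumptions.
head = nn.Linear(64, 2)                           # stand-in for the model's final layer
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)

features = torch.randn(8, 64)                     # pretend pooled features for 8 crops
labels = torch.tensor([0, 1, 1, 0, 1, 0, 0, 1])   # annotated clothing attribute per crop

logits = head(features)
loss = nn.CrossEntropyLoss()(logits, labels)      # smaller value => closer to annotation
loss.backward()                                   # gradients for back-propagation
optimizer.step()
```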
And S3, inputting the figure picture to be identified into the clothing identification model, and determining the clothing identification result of the figure to be identified.
Specifically, the picture of the person to be identified is input into the character clothing recognition model for forward propagation, and the model analyzes the clothing of the target person in the picture: the feature extraction layer extracts features from the input picture to generate a first feature map, the dense connection network performs feature reuse on the first feature map to generate a second feature map, and the feature analysis network analyzes the second feature map, finally determining the clothing recognition result of the person to be identified.
According to the technical scheme, the character clothing recognition method, device, equipment and readable storage medium provided by the embodiments of the application obtain a picture of the person to be identified and a trained clothing recognition model, where the clothing recognition model comprises a feature extraction layer, a dense connection network and a feature analysis network. The picture of the person to be identified is input into the clothing recognition model; the feature extraction layer extracts features from the input picture to generate a first feature map, the dense connection network performs feature reuse on the first feature map to generate a second feature map, and the feature analysis network analyzes the second feature map, finally yielding the clothing recognition result of the person in the picture.
Because the clothing recognition model consists of the feature extraction layer, the dense connection network and the feature analysis network, and the dense connection network realizes feature reuse by concatenating features along the channel dimension (that is, the layers are interconnected and each layer receives feature maps from preceding layers as additional input), the model achieves high recognition efficiency and accuracy with few parameters and little computational cost, both higher than in the prior art.
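For orientation, the overall flow of steps S1 to S3 can be sketched as follows; the module names extractor, dense_net and analyser are hypothetical stand-ins for the feature extraction layer, dense connection network and feature analysis network, not identifiers from the patent.

```python
import torch

# Hedged, minimal sketch of the inference flow of steps S1-S3.
def recognise_clothing(picture_batch, extractor, dense_net, analyser):
    with torch.no_grad():
        first_map = extractor(picture_batch)   # feature extraction -> first feature map
        second_map = dense_net(first_map)      # channel-wise feature reuse -> second feature map
        probs = analyser(second_map)           # analysis -> prediction index per clothing class
    values, preds = probs.max(dim=1)           # best class and its prediction index value
    return [(int(p), float(v)) for p, v in zip(preds, values)]
```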
In some embodiments of the present application, the structural functions of the character clothing recognition model of the present application are specifically described, and the character clothing recognition model includes a feature extraction layer a, a dense connection network B, and a feature analysis network C. The feature extraction layer a, the dense connection network B, and the feature analysis network C are specifically described in order below with reference to fig. 3 to 5.
Feature extraction layer a:
alternatively, as shown in fig. 3, the feature extraction layer a includes a convolution layer A1 and a max-pooling layer A2.
On the basis, the process of extracting the characteristics of the input character picture to be identified through the characteristic extraction layer comprises the following steps:
(1) and carrying out convolution operation on the input figure picture to be identified through the convolution layer A1, and extracting to obtain picture characteristic information.
Specifically, the convolution layer A1 performs a convolution operation on the input picture of the person to be identified in order to extract its features. The parameters of the convolution layer A1 include the convolution kernel size, the stride and the padding, which together determine the size of the feature map output by the feature extraction layer. The kernel size may be any value smaller than the input image size; the larger the kernel, the more complex the input features that can be extracted. A preferred setting provided by the application is a 3×3 kernel with stride 2 and padding 1 for the convolution layer A1.
(2) And carrying out maximum pooling processing on the picture characteristic information through the maximum pooling layer A2 to generate a first characteristic diagram.
Specifically, the maximum pooling layer A2 performs maximum pooling processing on the picture feature information to generate a first feature map, and the maximum pooling processing makes the obtained feature data more sensitive to texture information by taking the maximum value of all pixel values of a pooling area, so that the statistical features after pooling have much lower dimensionality, and meanwhile, the training result is not easy to be overfitted. An alternative arrangement is for the pooling layer to have a convolution kernel size of 2 x 2 and a step size of 2.
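A hedged PyTorch sketch of feature extraction layer A with the settings just described (3×3 convolution, stride 2, padding 1, followed by 2×2 max pooling with stride 2) is shown below; the channel counts are assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtraction(nn.Module):
    """Sketch of feature extraction layer A: conv (3x3, stride 2, padding 1)
    followed by max pooling (2x2, stride 2). The channel counts are assumptions."""
    def __init__(self, in_ch=3, out_ch=24):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)  # A1
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)                          # A2

    def forward(self, x):
        return self.pool(self.conv(x))   # first feature map

# e.g. a 96x96 input (as in Table 1) becomes a 24x24 feature map: 96 -> 48 -> 24
x = torch.randn(1, 3, 96, 96)
print(FeatureExtraction()(x).shape)      # torch.Size([1, 24, 24, 24])
```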
Dense connectivity network B:
optionally, as shown in fig. 4, the dense connection network B is composed of a plurality of dense blocks and a plurality of transition layers, where the dense blocks are used for performing batch normalization, activation and convolution processing according to a predefined connection mode of input and output on a channel dimension, and the transition layers are used for controlling the number of channels.
Specifically, any dense block uses a "batch normalization, activation, convolution" structure and is composed of several conv_blocks, each with the same number of output channels; during the forward pass, the input and output of each block are concatenated along the channel dimension, which reduces the number of network parameters. Because every dense block increases the number of channels, using too many of them would make the model overly complex, so a transition layer is used after the dense blocks to control the channel count. The transition layer consists of a convolution layer and a max pooling layer: a 1×1 convolution reduces the number of channels, and a max pooling layer with stride 2 halves the height and width, further reducing model complexity.
An alternative arrangement, for any dense block, comprises six layers, respectively: the first layer is a convolution layer, the convolution kernel size is 1*1, the step length is 1, and the filling is 0; the second layer is a convolution layer, the convolution kernel size is 3*3, the step length is 1, and the filling is 1; the third layer is a concat layer, and the output result of the pooling processing layer and the output result of the convolution layer of the second layer are combined; the fourth layer is a convolution layer, the convolution kernel size is 1*1, the step length is 1, and the filling is 0; the fifth layer is a convolution layer, the convolution kernel size is 3*3, the step length is 1, and the filling is 1; the sixth layer is a concat layer, and the result output by the third layer concat layer and the result output by the fifth layer convolution layer are combined.
For any transition layer, two layers are included, respectively: the first layer is a convolution layer, the convolution kernel size is 1*1, the step length is 1, and the filling is 0; the second layer is the largest pooling layer, the convolution kernel size is 2 x 2, and the step size is 2.
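The six-layer dense block and two-layer transition layer described above can be sketched as follows. The growth rate (channels added per concat), the intermediate channel widths, and the exact placement of batch normalization and activation follow common dense-network practice and are assumptions rather than values taken from the patent.

```python
import torch
import torch.nn as nn

def bn_act_conv(in_ch, out_ch, k, s, p):
    # "batch normalization, activation, convolution" unit; the ordering is assumed.
    return nn.Sequential(nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
                         nn.Conv2d(in_ch, out_ch, k, stride=s, padding=p))

class DenseBlock(nn.Module):
    """Six-layer block: 1x1 conv, 3x3 conv, concat, 1x1 conv, 3x3 conv, concat.
    growth (channels added per concat) is an illustrative assumption."""
    def __init__(self, in_ch, growth=8):
        super().__init__()
        self.conv1 = bn_act_conv(in_ch, 4 * growth, 1, 1, 0)            # layer 1
        self.conv2 = bn_act_conv(4 * growth, growth, 3, 1, 1)           # layer 2
        self.conv3 = bn_act_conv(in_ch + growth, 4 * growth, 1, 1, 0)   # layer 4
        self.conv4 = bn_act_conv(4 * growth, growth, 3, 1, 1)           # layer 5

    def forward(self, x):
        y1 = torch.cat([x, self.conv2(self.conv1(x))], dim=1)           # layer 3: concat
        y2 = torch.cat([y1, self.conv4(self.conv3(y1))], dim=1)         # layer 6: concat
        return y2

class TransitionLayer(nn.Module):
    """1x1 conv to shrink the channel count, then 2x2 max pooling with stride 2."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = bn_act_conv(in_ch, out_ch, 1, 1, 0)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        return self.pool(self.conv(x))
```

Stacking such blocks and transition layers reproduces the pattern of Fig. 4, with each concat enlarging the channel dimension and each transition layer shrinking it again.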
Feature analysis network C:
alternatively, as shown in fig. 5, for the feature analysis network C, it may include: a compression processing network C1, an exponential analysis network C2, and a result determination layer C3.
On the basis, the process of analyzing the second feature map through the feature analysis network to determine the clothing recognition result of the person to be recognized specifically can include:
(1) And performing dimension reduction processing and dimension compression on the input second feature map through the compression processing network C1 to generate a third feature map.
Specifically, the compression processing network has two layers: the first is a convolution layer and the second a max pooling layer. The convolution layer uses a 1×1 kernel; when a feature map is convolved with a 1×1 kernel only the current pixel needs to be considered, not its neighbours, so a 1×1 convolution layer can adjust the number of channels of the feature map and linearly combine the pixels of different channels. This reduces the dimensionality of the feature map and cuts the number of parameters while allowing the feature-map depth to be controlled flexibly.
The maximum pooling layer performs further dimension reduction on the information extracted by the convolution layer, removes redundant information, compresses the features, reduces the calculated amount and the memory consumption, and can strengthen the invariance of the image features and increase the robustness in the aspects of image offset, rotation and the like.
In the application, a convolution layer and a maximum pooling layer of a compression processing network sequentially perform dimension reduction processing and dimension compression processing on the second feature map, so as to generate a third feature map.
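A short sketch of compression processing network C1 under these descriptions (a 1×1 convolution for channel reduction followed by max pooling for spatial compression) might look as follows; the channel counts are assumptions.

```python
import torch.nn as nn

class CompressionNetwork(nn.Module):
    """Sketch of C1: 1x1 convolution to reduce channels (dimension reduction),
    then max pooling to compress the spatial dimensions. Channel counts assumed."""
    def __init__(self, in_ch=64, out_ch=32):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # combine channels per pixel
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)       # discard redundant detail

    def forward(self, x):
        return self.pool(self.reduce(x))   # third feature map
```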
(2) And carrying out global average pooling processing and normalization index processing on the third characteristic diagram through the index analysis network C2 to obtain clothing information and predicted index values corresponding to each piece of clothing information.
Specifically, the exponential analysis network C2 includes four layers, which are a first convolution layer, a global average pooling layer, a second convolution layer, and a normalization processing layer, which are sequentially connected.
The convolution layers are all convolution layers of the convolution kernel 1*1 and are used for adjusting the channel number of the feature map, and the dimension reduction of the feature map can be realized by linearly combining pixel points on different channels. In the application, the convolution layer of the exponential analysis network is used for performing dimension reduction processing on the feature map.
The global average pooling layer computes the average of all pixels of the feature map for each output channel and represents that feature map by this single value. Performing global average pooling over the whole of each feature map thus yields one output per feature map; adopting it greatly reduces the number of network parameters and avoids overfitting, and each feature map effectively becomes one output feature representing its class. Global average pooling aggregates the spatial information, making the input more robust to spatial transformations; by strengthening the correspondence between feature maps and categories it keeps the convolutional structure simple, requires no extra parameters to optimize, and avoids overfitting.
The normalization processing layer uses the normalized exponential (softmax) function to "compress" a k-dimensional vector z of arbitrary real numbers into another k-dimensional real vector σ(z) whose elements lie in (0, 1) and sum to 1. The normalized exponential function is commonly used in multi-class classification to present the results as probabilities. In the application, the index analysis network applies global average pooling and normalized exponential processing to the third feature map, finally producing the predicted index value of each attribute, i.e., the probability that the corresponding predicted attribute result is correct.
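Following the four-layer description in this paragraph (1×1 convolution, global average pooling, 1×1 convolution, normalized exponential), a hedged sketch of index analysis network C2 is given below; the channel counts and number of clothing classes are assumptions.

```python
import torch
import torch.nn as nn

class IndexAnalysisNetwork(nn.Module):
    """Sketch of C2: 1x1 conv, global average pooling, 1x1 conv, then softmax
    normalization into prediction index values. num_classes is an assumption
    (e.g. 2 for "hat" / "no hat")."""
    def __init__(self, in_ch=32, mid_ch=32, num_classes=2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)             # one value per feature map
        self.conv2 = nn.Conv2d(mid_ch, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.gap(self.conv1(x))
        x = self.conv2(x).flatten(1)                   # (N, num_classes)
        return torch.softmax(x, dim=1)                 # values in (0, 1), summing to 1
```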
(3) And determining a clothing recognition result of the person to be recognized according to the clothing information and the corresponding predicted index value by utilizing the result determination layer C3.
Specifically, the result determining layer determines and returns the finally obtained character clothing recognition result of the target character by combining the preset prediction index filtering value and the prediction index values of the various clothing attributes. The application judges the character clothing recognition result of the final target character based on the predicted index value obtained by recognition and the preset predicted index filtering value. When the predictive index of the identification result of the person clothing identification model is higher, namely, is larger than a preset predictive index filtering value, the current clothing identification result of the target person can be used as the final clothing identification result output by the model, otherwise, the current clothing identification result of the target person is uncertain.
The prediction index filtering value is set from empirical values or from statistics over a large amount of data. In practice, different filtering values are preset for different clothing attribute items: the lower the filtering value, the lower the precision and the higher the recall, so in actual use the value can be set according to the application's requirements, i.e., the higher the required precision, the higher the threshold.
Take whether the person in the picture to be identified is wearing a hat as an example. According to the set prediction index filtering value, the filtered result is one of "wearing a hat", "not wearing a hat" and "uncertain". If the filtering value for hat wearing is set to 0.6, then when the prediction index of the clothing recognition result produced by the character clothing recognition model is greater than or equal to 0.6, the result finally output by the model is the result obtained from the analysis; for instance, if the analysis finds the target person wearing a hat with a corresponding prediction index of 0.7, the final output is that the target person is wearing a hat. When the prediction index is less than 0.6, the final output is "uncertain"; for instance, if the analysis finds that no hat is worn but the corresponding prediction index is only 0.55, the final output is "uncertain".
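A small sketch of the result determination step with the hat example and a filtering value of 0.6 follows; the label names and function signature are hypothetical.

```python
def determine_result(probabilities, labels=("hat", "no hat"), filter_value=0.6):
    """Keep the top prediction only if its index value reaches the filtering value.

    With filter_value = 0.6: [0.7, 0.3] -> ("hat", 0.7),
    while [0.45, 0.55] -> ("uncertain", 0.55).
    """
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] >= filter_value:
        return labels[best], probabilities[best]
    return "uncertain", probabilities[best]
```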
Tables 1 and 2 are exemplary network structure parameter tables of two alternative character clothing recognition models, wherein the character clothing recognition model corresponding to table 1 recognizes whether the target character wears the hat, and the character clothing recognition model corresponding to table 2 recognizes the cuff length of the target character.
TABLE 1
As shown in Table 1, which records the output feature map size and the relevant parameters of each layer, the dense connection network in this example has 4 dense blocks and 2 transition layers; the number of base channels of each dense block increases as the spatial size decreases, and the input size of this network, which identifies whether the target person wears a hat, is 96×96.
Each dense module contains 4 convolution layers and 2 concat layers. Each concat layer of a dense block combines the result of the convolution layer immediately preceding it within the block with the earlier result that was fed into that convolution branch, and outputs the combined result.
In the dense connection network shown in Table 1: the third-layer concat layer in dense block B1 combines the output of feature extraction layer A2 with the output of the second-layer convolution layer in B1, giving an output dimension of 32×24×24; the sixth-layer concat layer in B1 combines the output of the third-layer concat layer with the output of the fifth-layer convolution, giving 40 x 24. The third layer of dense block B3 is a concat layer that combines the output of the second-layer max pooling layer of transition layer B2 with the output of the second-layer convolution layer in B3, giving 32×24×24; the sixth-layer concat layer of B3 combines the output of the third-layer concat layer with the fifth-layer convolution output, giving 32 x 12. The third-layer concat layer in dense block B4 combines the output of the sixth-layer concat layer of B3 with the output of the second-layer convolution layer in B4, giving 32×24×24; the sixth-layer concat layer of B4 combines the output of the third-layer concat layer with the fifth-layer convolution output, giving 32 x 12. The third layer of dense block B6 is a concat layer that combines the output of the second-layer max pooling layer of transition layer B5 with the output of the second-layer convolution layer in B6, giving 56×6×6; the sixth-layer concat layer of B6 combines the output of the third-layer concat layer with the fifth-layer convolution output, giving 64 x 6.
TABLE 2
As shown in Table 2, which records the output feature map size and the relevant parameters of each layer, the dense connection network in this example has 4 dense blocks and 2 transition layers; the number of base channels of each dense block increases as the spatial size decreases, and the input size of this network, which recognizes the cuff length of the target person, is 160×128.
Each dense module contains 4 convolution layers and 2 concat layers. Each concat layer of a dense block combines the result of the convolution layer immediately preceding it within the block with the earlier result that was fed into that convolution branch, and outputs the combined result.
In the dense connection network shown in Table 2: the third-layer concat layer in dense block B1 combines the output of feature extraction layer A2 with the output of the second-layer convolution layer in B1, giving an output dimension of 32×40×32; the sixth-layer concat layer in B1 combines the output of the third-layer concat layer with the output of the fifth-layer convolution, giving 40 x 32. The third layer of dense block B3 is a concat layer that combines the output of the second-layer max pooling layer of transition layer B2 with the output of the second-layer convolution layer in B3, giving 40×20×16; the sixth-layer concat layer of B3 combines the output of the third-layer concat layer with the fifth-layer convolution output, giving 48 x 20 x 16. The third-layer concat layer in dense block B4 combines the output of the sixth-layer concat layer of B3 with the output of the second-layer convolution layer in B4, giving 56×20×16; the sixth-layer concat layer of B4 combines the output of the third-layer concat layer with the fifth-layer convolution output, giving 64 x 20 x 16. The third layer of dense block B6 is a concat layer that combines the output of the second-layer max pooling layer of transition layer B5 with the output of the second-layer convolution layer in B6, giving 56×10×8; the sixth-layer concat layer of B6 combines the output of the third-layer concat layer with the fifth-layer convolution output, giving 64 x 10 x 8.
The above embodiments are used for describing the process of the character clothing recognition method in the present application, and the process of training the clothing recognition model in the present application will be described below.
In some embodiments of the application, the process of training the apparel identification model may include:
the first step, a training character picture is obtained, and the training character picture is marked with corresponding training clothes information.
Specifically, the training character picture in the application can be a pre-configured image in the training character picture set, or can be a picture collected by the camera equipment or a picture obtained by capturing a video frame for preprocessing, and the picture of character clothing information is correspondingly marked, and the process can be completed by combining intelligent pre-marking and manual marking. The method comprises the steps of collecting video streams through camera equipment, intercepting and storing the video streams into pictures at regular intervals, detecting people appearing in the pictures, intercepting and storing single people into pictures, and labeling the single people with corresponding training clothes information.
And secondly, inputting the training character picture into a preset basic recognition model to obtain a clothing recognition result of the training character, which is output by the basic recognition model.
Thirdly, training the basic recognition model by taking the clothing recognition result of the training character as a target and the training clothing information marked by the training character picture.
Specifically, after a large number of training character pictures are obtained, they can be input into the preset basic recognition model for training, with cross entropy used as the loss function to determine the loss value. Training aims to make the clothing recognition result for each training character consistent with that character's labeled training clothing information, and during training the parameters of each network structure are continually adjusted and corrected.
In the present application, when training the basic recognition model, the process of continuously correcting each parameter of the basic recognition model by the training clothes information of the training characters in the training character picture mark may specifically include:
(1) randomly initialize the parameters of the basic recognition model, which may include the parameters of the loss function, the network structure parameters and so on; the loss function is mainly used for correcting the basic recognition model;
(2) compute the loss function of the basic recognition model using the training clothing information labeled for the training characters in the training character pictures, so as to obtain the loss value;
(3) differentiate the loss function to obtain the gradients and back-propagate them via the chain rule to obtain updated parameters;
(4) repeat steps 2 and 3, iteratively training the basic recognition model until the loss function no longer decreases and has converged, at which point the final parameters are obtained (a minimal sketch of this loop is given after the fourth step below);
(5) complete the parameter update of the basic recognition model with these final parameters.
And fourthly, when the basic recognition model meets preset training conditions, taking the trained basic recognition model as a clothes recognition model.
Specifically, after the training character pictures are input into the basic recognition model, the clothing recognition results produced by the model for the training characters are obtained. The model parameters are then updated with the objective that these results be consistent with the labeled training clothing information. When the basic recognition model meets the preset training conditions, the trained model can be used as the character clothing recognition model, able to recognize all of the training clothing information of the training characters in the pictures quickly and accurately.
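A minimal sketch of the training loop outlined in the steps above is given below; the optimizer, learning rate, stopping rule and data-loader interface are assumptions rather than the patent's prescription.

```python
import torch
import torch.nn as nn

def train_recognition_model(model, data_loader, max_epochs=50, patience=3, lr=0.01):
    """Illustrative training loop for the basic recognition model; the optimizer,
    learning rate and convergence test are assumptions, not the patent's own choices."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # (1) parameters start random
    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        total = 0.0
        for pictures, labels in data_loader:                 # labeled training pictures
            optimizer.zero_grad()
            loss = criterion(model(pictures), labels)        # (2) loss against annotations
            loss.backward()                                  # (3) gradients via the chain rule
            optimizer.step()
            total += loss.item()
        # (4) stop once the loss no longer decreases (simple convergence test).
        if total < best - 1e-4:
            best, stale = total, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return model                                             # (5) trained clothing model
```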
The basic recognition model adopts a dense connection network, improves the problem of information flow between different layers, strengthens feature propagation, encourages feature reuse, greatly reduces parameter quantity, does not need to learn redundant feature mapping, has higher training speed and higher training efficiency compared with the existing model, and has higher detection efficiency and detection precision of the model after training is completed.
The description of the character clothing recognition device provided by the embodiment of the application is provided below, and the character clothing recognition device described below and the character clothing recognition method described above can be referred to correspondingly.
Referring to fig. 6, fig. 6 is a block diagram illustrating a configuration of a character clothing recognition apparatus according to an embodiment of the present application.
As shown in fig. 6, the character clothing recognition device may include:
a picture obtaining module 110, configured to obtain a picture of a person to be identified;
a model acquisition module 120, configured to acquire a trained garment recognition model, where the garment recognition model includes a feature extraction layer, a dense connection network, and a feature analysis network;
the clothing recognition module 130 is configured to input the person image to be recognized into the clothing recognition model, perform feature extraction on the input person image to be recognized through the feature extraction layer, generate a first feature map, perform feature multiplexing on the first feature map through the dense connection network, generate a second feature map, analyze the second feature map through the feature analysis network, and determine a clothing recognition result of the person to be recognized.
According to the technical scheme, the character clothing recognition method, the device, the equipment and the readable storage medium provided by the embodiment of the application are used for obtaining the character picture to be recognized and training a finished clothing recognition model, wherein the clothing recognition model comprises a feature extraction layer, a dense connection network and a feature analysis network. Inputting the character picture to be identified into the clothing identification model, carrying out feature extraction on the input character picture to be identified through the feature extraction layer to generate a first feature picture, carrying out feature multiplexing on the first feature picture through the dense connection network to generate a second feature picture, and analyzing the second feature picture through the feature analysis network to finally obtain the clothing identification result of the character to be identified in the character picture to be identified.
Because the clothing recognition model consists of the feature extraction layer, the dense connection network and the feature analysis network, and the dense connection network concatenates features in the channel dimension to realize feature reuse (that is, all layers are connected with each other, each layer receiving the feature maps of the preceding layers as additional input), higher recognition efficiency and recognition accuracy can be achieved with fewer parameters and lower computation cost than in the prior art.
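By way of illustration only, the following PyTorch-style sketch shows how these three parts can be chained; the class name, argument names and submodule implementations are assumptions made for the example, not details taken from the embodiment.

    import torch.nn as nn

    class ApparelRecognitionModel(nn.Module):
        """Three-stage pipeline: feature extraction layer -> dense connection network -> feature analysis network."""
        def __init__(self, feature_extraction, dense_network, feature_analysis):
            super().__init__()
            self.feature_extraction = feature_extraction  # produces the first feature map
            self.dense_network = dense_network            # channel-wise feature reuse, produces the second feature map
            self.feature_analysis = feature_analysis      # turns the second feature map into the recognition result

        def forward(self, person_picture):
            first_feature_map = self.feature_extraction(person_picture)
            second_feature_map = self.dense_network(first_feature_map)
            return self.feature_analysis(second_feature_map)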
In some embodiments of the present application, the dense connection network of the character clothing recognition model may be composed of dense blocks and transition layers, wherein the dense blocks perform batch normalization, activation and convolution processing, with inputs and outputs connected in the channel dimension according to a predefined connection pattern, and the transition layers are used to control the number of channels.
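One possible reading of this structure, given here as a minimal PyTorch sketch rather than as the embodiment itself, is a DenseNet-style dense block in which every layer concatenates its output with all earlier feature maps along the channel dimension, followed by a transition layer that reduces the channel count; the growth rate, layer count and kernel sizes below are assumptions.

    import torch
    import torch.nn as nn

    class DenseLayer(nn.Module):
        """Batch normalization -> activation -> 3x3 convolution; the output is
        concatenated with the input along the channel dimension (feature reuse)."""
        def __init__(self, in_channels, growth_rate):
            super().__init__()
            self.norm = nn.BatchNorm2d(in_channels)
            self.act = nn.ReLU(inplace=True)
            self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

        def forward(self, x):
            new_features = self.conv(self.act(self.norm(x)))
            return torch.cat([x, new_features], dim=1)  # same spatial size, more channels

    class TransitionLayer(nn.Module):
        """1x1 convolution controls the number of channels; average pooling halves the spatial size."""
        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.norm = nn.BatchNorm2d(in_channels)
            self.act = nn.ReLU(inplace=True)
            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
            self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

        def forward(self, x):
            return self.pool(self.conv(self.act(self.norm(x))))

    def dense_block(in_channels, num_layers=4, growth_rate=32):
        """Stack several dense layers; every layer sees the feature maps of all previous layers."""
        layers, channels = [], in_channels
        for _ in range(num_layers):
            layers.append(DenseLayer(channels, growth_rate))
            channels += growth_rate
        return nn.Sequential(*layers), channels

A dense connection network can then be assembled by alternating such dense blocks and transition layers until the second feature map is produced.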
In some embodiments of the application, the feature analysis network of the character clothing recognition model may include a compression processing network, an index analysis network and a result determination layer;
the compression processing network can perform dimension reduction processing and dimension compression on the input second feature map to generate a third feature map;
the index analysis network can perform global average pooling processing and normalization index processing on the third feature map to obtain clothing information and a predicted index value corresponding to each piece of clothing information;
and the result determination layer can determine the clothing recognition result of the person to be recognized according to the clothing information and the corresponding predicted index value.
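As a purely numerical illustration of this head, the short example below reads the normalization index processing as a softmax over per-category scores and assumes, since the embodiment does not fix the exact decision rule, that the result determination layer simply selects the clothing information with the largest predicted index value; the labels and scores are made up.

    import math

    clothing_labels = ["work uniform", "casual wear", "protective suit"]  # hypothetical categories
    pooled_scores = [2.0, 0.5, 1.0]  # per-category values after global average pooling (made-up numbers)

    # Normalization index (softmax) processing: exponentiate and normalize so the values sum to 1.
    exp_scores = [math.exp(s) for s in pooled_scores]
    total = sum(exp_scores)
    predicted_index_values = [e / total for e in exp_scores]  # roughly [0.63, 0.14, 0.23]

    # Result determination layer (assumed rule): keep the label with the largest predicted index value.
    best = max(range(len(clothing_labels)), key=lambda i: predicted_index_values[i])
    print(clothing_labels[best], round(predicted_index_values[best], 2))  # work uniform 0.63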
In some embodiments of the application, the feature extraction layer of the character clothing recognition model may include a convolution layer and a max pooling layer;
the convolution layer can perform a convolution operation on the input figure picture to be identified to extract picture feature information;
and the max pooling layer can perform max pooling processing on the picture feature information to generate the first feature map.
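A minimal sketch of such a feature extraction layer, with the channel count, kernel sizes and strides chosen purely as assumptions for the example, could look as follows:

    import torch.nn as nn

    # Convolution over the input person picture followed by max pooling to produce the first feature map.
    feature_extraction_layer = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # convolution layer: extract picture feature information
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),      # max pooling layer: generate the first feature map
    )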
In some embodiments of the present application, the index analysis network of the character clothing recognition model may include a first convolution block, a max pooling layer, a second convolution block, a global average pooling layer, a third convolution block and a normalization processing layer connected in sequence.
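Read as a PyTorch sketch, and with the internal makeup of each convolution block, the channel counts and the number of clothing categories all being assumptions for illustration, this sequence could be written as:

    import torch.nn as nn

    def conv_block(in_channels, out_channels):
        # The embodiment does not specify what each "convolution block" contains;
        # convolution -> batch normalization -> ReLU is assumed here.
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    num_clothing_classes = 20  # assumed number of clothing categories

    index_analysis_network = nn.Sequential(
        conv_block(256, 256),                                 # first convolution block
        nn.MaxPool2d(kernel_size=2),                          # max pooling layer
        conv_block(256, 512),                                 # second convolution block
        nn.AdaptiveAvgPool2d(1),                              # global average pooling layer
        nn.Conv2d(512, num_clothing_classes, kernel_size=1),  # third convolution block (a plain 1x1 convolution here)
        nn.Flatten(),                                         # added only so the output is a flat score vector
        nn.Softmax(dim=1),                                    # normalization processing layer (softmax)
    )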
In some embodiments of the present application, the process by which the picture obtaining module obtains the picture of the person to be identified may include:
extracting a plurality of frames to be identified at different moments from the video to be identified;
detecting a person region to be identified on the frame to be identified;
and performing size transformation, mean removal and standard normalization processing on the frame to be identified based on the detected person region to be identified, and generating a person picture to be identified.
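For illustration, the sketch below shows one way these preprocessing steps could be implemented with OpenCV and NumPy; the number of sampled frames, the input size and the normalization statistics are common defaults assumed for the example, and the person detection box is assumed to come from a separate detector.

    import cv2
    import numpy as np

    def sample_frames(video_path, num_frames=5):
        """Extract frames to be identified at several different moments of the video."""
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for idx in np.linspace(0, max(total - 1, 0), num_frames, dtype=int):
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
            ok, frame = cap.read()
            if ok:
                frames.append(frame)
        cap.release()
        return frames

    def preprocess_person_region(frame, box, size=(224, 224),
                                 mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
        """Crop the detected person region, then apply size transformation,
        mean removal and standard normalization."""
        x1, y1, x2, y2 = box                                      # box from a separate person detector (assumed given)
        person = frame[y1:y2, x1:x2]
        person = cv2.resize(person, size)                         # size transformation
        person = person[:, :, ::-1].astype(np.float32) / 255.0    # BGR -> RGB, scale to [0, 1]
        person = (person - np.array(mean)) / np.array(std)        # mean removal and standard normalization
        return person.transpose(2, 0, 1)                          # HWC -> CHW, ready for the recognition model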
In some embodiments of the present application, the character clothing recognition device may further include a model generation module;
the process of training the clothing recognition model by the model generation module comprises the following steps:
acquiring a training character picture, wherein the training character picture is marked with corresponding training clothing information;
inputting the training character picture into a preset basic recognition model to obtain a clothing recognition result of the training character, which is output by the basic recognition model;
training the basic recognition model with the goal of making the clothing recognition result of the training character consistent with the training clothing information marked in the training character picture;
and when the basic recognition model meets preset training conditions, taking the basic recognition model after training as a clothing recognition model.
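By way of illustration, the sketch below shows what such a training procedure might look like in PyTorch; the cross-entropy loss, the Adam optimizer, the batch size and the fixed epoch count are all assumptions standing in for the unspecified preset training condition, and the basic recognition model is assumed to output raw per-category scores.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader

    def train_base_model(base_model, train_dataset, num_epochs=10, lr=1e-3, device="cpu"):
        """Train the preset basic recognition model on labeled training character pictures."""
        loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
        criterion = nn.CrossEntropyLoss()  # compares logits with the marked training clothing information
        optimizer = torch.optim.Adam(base_model.parameters(), lr=lr)
        base_model.to(device).train()
        for _ in range(num_epochs):        # the "preset training condition" is read here as a fixed epoch count
            for pictures, labels in loader:
                pictures, labels = pictures.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(base_model(pictures), labels)
                loss.backward()            # drive the recognition result toward the marked clothing information
                optimizer.step()
        return base_model                  # the trained model is then used as the clothing recognition model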
The character clothing recognition device provided by the embodiment of the application can be applied to character clothing recognition equipment. Fig. 7 shows a block diagram of the hardware structure of the character clothing recognition equipment; referring to fig. 7, the hardware structure may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, there is at least one of each of the processor 1, the communication interface 2, the memory 3 and the communication bus 4, and the processor 1, the communication interface 2 and the memory 3 communicate with one another through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application;
the memory 3 may comprise a high-speed RAM, and may further comprise a non-volatile memory, such as at least one magnetic disk memory;
wherein the memory stores a program, the processor is operable to invoke the program stored in the memory, and the program is operable to:
acquiring a figure picture to be identified;
acquiring a garment identification model after training, wherein the garment identification model comprises a feature extraction layer, a dense connection network and a feature analysis network;
inputting the figure picture to be identified into the clothing identification model, carrying out feature extraction on the input figure picture to be identified through the feature extraction layer to generate a first feature map, carrying out feature multiplexing on the first feature map through the dense connection network to generate a second feature map, analyzing the second feature map through the feature analysis network, and determining the clothing identification result of the figure to be identified.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
The embodiment of the present application also provides a readable storage medium storing a program adapted to be executed by a processor, the program being configured to:
acquiring a figure picture to be identified;
acquiring a garment identification model after training, wherein the garment identification model comprises a feature extraction layer, a dense connection network and a feature analysis network;
inputting the figure picture to be identified into the clothing identification model, carrying out feature extraction on the input figure picture to be identified through the feature extraction layer to generate a first feature map, carrying out feature multiplexing on the first feature map through the dense connection network to generate a second feature map, analyzing the second feature map through the feature analysis network, and determining the clothing identification result of the figure to be identified.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Finally, it is further noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts between the embodiments, reference may be made to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of character apparel identification comprising:
acquiring a figure picture to be identified;
acquiring a garment identification model after training, wherein the garment identification model comprises a feature extraction layer, a dense connection network and a feature analysis network;
inputting the figure picture to be identified into the clothing identification model, carrying out feature extraction on the input figure picture to be identified through the feature extraction layer to generate a first feature map, carrying out feature multiplexing on the first feature map through the dense connection network to generate a second feature map, analyzing the second feature map through the feature analysis network, and determining the clothing identification result of the figure to be identified.
2. The method according to claim 1, wherein the dense connection network is composed of dense blocks and transition layers, the dense blocks using dense connections in which each layer has the same feature map size, wherein the dense blocks are used for performing batch normalization, activation and convolution processing according to a predefined connection manner of inputs and outputs in the channel dimension, and the transition layers are used for controlling the number of channels.
3. The method of claim 1, wherein the feature analysis network comprises a compression processing network, an index analysis network, and a result determination layer;
analyzing the second feature map through the feature analysis network, and determining a clothing recognition result of the person to be recognized, wherein the process comprises the following steps:
performing dimension reduction processing and dimension compression on the input second feature map through the compression processing network to generate a third feature map;
carrying out global average pooling processing and normalization index processing on the third feature map through the index analysis network to obtain clothing information and a predicted index value corresponding to each piece of clothing information;
and determining a clothing recognition result of the person to be recognized according to the clothing information and the corresponding predicted index value by using the result determination layer.
4. The method of claim 1, wherein the feature extraction layer comprises a convolution layer and a max pooling layer;
the process of performing feature extraction on the input figure picture to be identified through the feature extraction layer comprises the following steps:
carrying out a convolution operation on the input figure picture to be identified through the convolution layer to extract picture feature information;
and carrying out max pooling processing on the picture feature information through the max pooling layer to generate the first feature map.
5. The method according to claim 3, wherein the index analysis network comprises a first convolution block, a max pooling layer, a second convolution block, a global average pooling layer, a third convolution block, and a normalization processing layer connected in sequence.
6. The method of claim 1, wherein the obtaining a picture of the person to be identified comprises:
extracting a plurality of frames to be identified at different moments from the video to be identified;
detecting a person region to be identified on the frame to be identified;
and performing size transformation, mean removal and standard normalization processing on the frame to be identified based on the detected person region to be identified, and generating a person picture to be identified.
7. The method of claim 1, wherein the process of training the apparel recognition model comprises:
acquiring a training character picture, wherein the training character picture is marked with corresponding training clothing information;
inputting the training character picture into a preset basic recognition model to obtain a clothing recognition result of the training character, which is output by the basic recognition model;
training the basic recognition model with the goal of making the clothing recognition result of the training character consistent with the training clothing information marked in the training character picture;
and when the basic recognition model meets preset training conditions, taking the basic recognition model after training as a clothing recognition model.
8. A character clothing recognition device, comprising:
the image acquisition module is used for acquiring the image of the person to be identified;
the system comprises a model acquisition module, a model analysis module and a model analysis module, wherein the model acquisition module is used for acquiring a garment identification model after training, and the garment identification model comprises a feature extraction layer, a dense connection network and a feature analysis network;
the clothing recognition module is used for inputting the person picture to be recognized into the clothing recognition model, extracting features of the input person picture to be recognized through the feature extraction layer, generating a first feature map, carrying out feature multiplexing on the first feature map through the dense connection network, generating a second feature map, analyzing the second feature map through the feature analysis network, and determining clothing recognition results of the person to be recognized.
9. A character clothing recognition device, comprising a memory and a processor;
the memory is used for storing programs;
the processor configured to execute the program to implement the steps of the character clothing recognition method of any one of claims 1-7.
10. A readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the character garment recognition method according to any one of claims 1-7.
CN202210542463.5A 2022-05-18 2022-05-18 Character clothing recognition method, device, equipment and readable storage medium Pending CN117152787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210542463.5A CN117152787A (en) 2022-05-18 2022-05-18 Character clothing recognition method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210542463.5A CN117152787A (en) 2022-05-18 2022-05-18 Character clothing recognition method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN117152787A true CN117152787A (en) 2023-12-01

Family

ID=88885393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210542463.5A Pending CN117152787A (en) 2022-05-18 2022-05-18 Character clothing recognition method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117152787A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437490A (en) * 2023-12-04 2024-01-23 深圳咔咔可洛信息技术有限公司 Clothing information processing method and device, electronic equipment and storage medium
CN117437490B (en) * 2023-12-04 2024-03-22 深圳咔咔可洛信息技术有限公司 Clothing information processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Kim et al. Salient region detection via high-dimensional color transform
CN104598908B (en) A kind of crops leaf diseases recognition methods
CN104835175B (en) Object detection method in a kind of nuclear environment of view-based access control model attention mechanism
KR20160143494A (en) Saliency information acquisition apparatus and saliency information acquisition method
JP7026456B2 (en) Image processing device, learning device, focus control device, exposure control device, image processing method, learning method, and program
CN111125416A (en) Image retrieval method based on multi-feature fusion
CN108647625A (en) A kind of expression recognition method and device
CN111080670B (en) Image extraction method, device, equipment and storage medium
JP4098021B2 (en) Scene identification method, apparatus, and program
US10866984B2 (en) Sketch-based image searching system using cell-orientation histograms and outline extraction based on medium-level features
CN110569782A (en) Target detection method based on deep learning
CN109725721B (en) Human eye positioning method and system for naked eye 3D display system
CN113963032A (en) Twin network structure target tracking method fusing target re-identification
CN112528939A (en) Quality evaluation method and device for face image
CN110363103B (en) Insect pest identification method and device, computer equipment and storage medium
CN114239754B (en) Pedestrian attribute identification method and system based on attribute feature learning decoupling
JP2023115104A (en) Image processing apparatus, image processing method, and program
CN117152787A (en) Character clothing recognition method, device, equipment and readable storage medium
Hu et al. Fast face detection based on skin color segmentation using single chrominance Cr
CN117197064A (en) Automatic non-contact eye red degree analysis method
CN116778533A (en) Palm print full region-of-interest image extraction method, device, equipment and medium
CN109509194B (en) Front human body image segmentation method and device under complex background
CN112348767A (en) Wood counting model based on object edge detection and feature matching
CN104484324B (en) A kind of pedestrian retrieval method of multi-model and fuzzy color
CN107220650B (en) Food image detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination