CN109145947B - Fashion women's dress image fine-grained classification method based on part detection and visual features - Google Patents


Info

Publication number
CN109145947B
CN109145947B (application CN201810784023.4A)
Authority
CN
China
Prior art keywords
fashion
model
image
feature
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810784023.4A
Other languages
Chinese (zh)
Other versions
CN109145947A (en)
Inventor
刘骊
吴苗苗
付晓东
黄青松
刘利军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201810784023.4A priority Critical patent/CN109145947B/en
Publication of CN109145947A publication Critical patent/CN109145947A/en
Application granted
Publication of CN109145947B publication Critical patent/CN109145947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a fine-grained classification method for fashion women's wear images based on part detection and visual features, and belongs to the field of computer vision and image applications. First, part detection of human body parts is performed on the input fashion women's wear image to be classified and on the images in the training set. Second, four bottom-layer features (HOG, LBP, color histogram, and edge operator) are extracted from the detected image to be classified and the detected training images, giving the feature-extracted images. Then the defined visual feature descriptors are matched with the four extracted bottom-layer features, and a fine-grained classifier model is trained by supervised learning with a multi-class SVM. Finally, the trained fine-grained classifier performs fine-grained classification on the feature-extracted fashion women's wear image and outputs the classification result. The detection and classification method adopted by the invention achieves high accuracy.

Description

Fashion women's dress image fine-grained classification method based on part detection and visual features
Technical Field
The invention relates to a fine-grained classification method for fashion women's wear images based on part detection and visual features, and belongs to the field of computer vision and image applications.
Background
Online shopping is hugely popular and is growing ever more widespread, global, and mobile, so fashion clothing classification has become an increasingly hot topic and is widely applied in e-commerce and related fields. Accordingly, many improved methods for fashion clothing classification have appeared, including the classical bag-of-words model, fashion clothing classification methods based on deep learning, and methods based on random forests, Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and the like. Most known methods target coarse-grained classification of fashion clothing images and lack analysis across similar style categories, so they cannot achieve finer or multi-level classification. Fashion women's wear comes in many styles, and unlike coarse-grained classification tasks, fine-grained classification of fashion women's wear images demands finer class granularity; the differences between styles are subtle, and different styles can be distinguished only by small local differences. In addition, fine-grained images have a low signal-to-noise ratio, and the information with sufficient discriminative power is confined to very small local regions. Therefore, finding and effectively exploiting useful local-region information to realize finer, more accurate, and more efficient fine-grained classification of fashion women's wear images has important theoretical significance and practical value. Among known methods, Berg proposed the part-based one-vs-one features POOF ("POOF: Part-Based One-vs-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation", 2013: 955-962); each feature can distinguish two different classes based on the appearance of a particular part of the object. Bossard ("Apparel Classification with Style", 2012: 321-335) provided a complete method for recognizing and classifying fashion clothing in natural scenes, whose key point is to adopt strongly discriminative learners based on random forest learning as decision nodes while extending the random forest into a transfer forest that can transfer to different domains. Cui ("Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning with Humans in the Loop", 2016: 1153-1162) proposed a general iterative framework for fine-grained classification based on deep metric learning that learns a low-dimensional feature embedding anchored in each class. Zhang ("Weakly Supervised Fine-Grained Categorization with Part-Based Image Representation", 2016, 25(4): 1713-1725) proposed a weakly supervised fine-grained categorization method based on a part-based image representation.
In summary, although many methods exist for classifying fashion clothing images, the diverse styles, varied textures and accessories, and the flexibility and deformability of clothing itself make classification and recognition very difficult. Known methods still have shortcomings and limitations: given the wide variety of shooting scenes and human poses, detecting the different regions of the human body is critical. For feature extraction and classification, most known methods rely on bottom-layer features such as color and texture, cannot make good use of local information, are limited in extracting the subtle differences in style and type among fashion garments, and can therefore only achieve coarse-grained classification of fashion clothing.
Disclosure of Invention
The invention provides a fine-grained classification method for fashion women's wear images based on part detection and visual features that adapts to body part detection under different poses and viewing-angle changes and meets the fine-grained classification needs of fashion women's wear images in electronic commerce.
The technical scheme of the invention is as follows. A fine-grained classification method for fashion women's wear images based on part detection and visual features comprises the following steps. Step 1: apply an improved DPM (Deformable Part Model) to the input training fashion women's wear image T and the fashion women's wear image I to be classified to detect human body parts under different poses and viewing angles. First, HOG (Histogram of Oriented Gradients) features are extracted from the training image T and the image to be classified I and normalized to obtain the DPM features. Second, the DPM human body detection model is adjusted according to human pose and viewing angle and divided into a root model and part models. Then the response scores of the root model and the part models are computed from the DPM features, the target hypothesis score is computed through a response transformation to obtain the optimal positions, and from these the comprehensive response score of each root position of the target is computed. Finally, the detection result is obtained.
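As an illustration of this step, the following minimal sketch (not code from the patent; cell and block sizes are assumptions, not values fixed by the patent) shows how block-normalized HOG features of the kind the DPM detector builds on can be computed with scikit-image:

```python
# Minimal sketch of the normalized HOG features on which the improved
# DPM detector builds. Parameters are illustrative assumptions.
from skimage import color, io
from skimage.feature import hog

def dpm_base_features(image_path):
    """Block-normalized HOG descriptor of a garment image."""
    img = io.imread(image_path)
    if img.ndim == 3:                       # drop alpha, convert to gray
        img = color.rgb2gray(img[..., :3])
    return hog(
        img,
        orientations=9,            # 9 unsigned orientation bins
        pixels_per_cell=(8, 8),    # HOG cell size typical of DPM
        cells_per_block=(2, 2),    # L2 normalization neighborhood
        block_norm='L2-Hys',
        feature_vector=True,
    )
```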
The improved DPM model consists of a root model and several part models. The object model with n parts is represented as an (n+2)-tuple (F_0, P_1, ..., P_i, ..., P_n, b), where F_0 is the root filter, P_i is the model of the i-th part, and b is the deviation (bias) coefficient. At scale level l_0, the response score of an anchor (x_0, y_0) is:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

where R_{0,l_0}(x_0, y_0) is the response score of the root model, v_i is a two-dimensional vector specifying the coordinates of the anchor position of the i-th filter relative to the root position (i.e., the standard position when no deformation occurs), \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) collects the transformed response scores of the n part models, and λ is the number of levels in the feature pyramid at which the feature map is computed at twice the resolution.

After the response scores are computed, the response of each part filter is transformed to take spatial uncertainty into account. The response transformation is:

D_{i,l}(x, y) = \max_{(dx, dy)} ( R_{i,l}(x + dx, y + dy) - d_i \cdot \phi_d(dx, dy) )

where (x, y) is the ideal position of the i-th part model in the scale layer, l is the level of the feature pyramid H, (dx, dy) is the offset relative to (x, y), R_{i,l}(x + dx, y + dy) is the matching score of the part model at (x + dx, y + dy), d_i \cdot \phi_d(dx, dy) is the score lost to the offset (dx, dy), \phi_d(dx, dy) = (dx, dy, dx^2, dy^2) are the deformation features, and d_i is the offset-loss coefficient, a parameter learned during model training; at model initialization d_i = (0, 0, 1, 1), i.e., the offset loss is the Euclidean distance between the offset position and the ideal position.

Each target hypothesis specifies the position of every filter of the model in the feature pyramid H: z = (p_0, ..., p_n), where p_i = (x_i, y_i, l_i) is the level and position coordinates of the i-th filter. The score of a target hypothesis is computed as:

score(z) = \sum_{i=0}^{n} F_i' \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b

where F_i' \cdot \phi(H, p_i) is the score of the i-th filter, \phi(H, p_i) is the feature vector from the feature pyramid H, F_i' is the vector obtained by concatenating the weight vectors of the i-th filter, and (dx_i, dy_i) = (x_i, y_i) - (2(x_0, y_0) + v_i) gives the displacement of the i-th filter's position relative to its anchor position. The optimal positions are obtained through the target hypothesis score, and from them the comprehensive response score of each root position is computed:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

The detection result is obtained by scoring multiple instances of the detection target with the comprehensive response score of each root position.
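The response transformation above admits a direct, if naive, implementation: for every ideal part position, search nearby offsets and trade the part filter's match score against the quadratic deformation cost. The sketch below is illustrative only (the search radius and array shapes are assumptions); production DPM implementations replace the brute-force scan with a generalized distance transform.

```python
# Illustrative brute-force sketch of
# D_{i,l}(x, y) = max_{dx,dy} ( R_{i,l}(x+dx, y+dy) - d_i . phi_d(dx, dy) ).
import numpy as np

def transform_response(R, d=(0.0, 0.0, 1.0, 1.0), radius=4):
    """R: 2-D part-filter response map; d: offset-loss coefficients,
    initialized to (0, 0, 1, 1) as the description states."""
    d = np.asarray(d, dtype=float)
    H, W = R.shape
    D = np.full(R.shape, -np.inf)
    for y in range(H):
        for x in range(W):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        phi = np.array([dx, dy, dx * dx, dy * dy], float)
                        D[y, x] = max(D[y, x], R[yy, xx] - d @ phi)
    return D
```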
Step 2: extract the four bottom-layer features (HOG, LBP (Local Binary Pattern), color histogram, and edge operator) from the detected training fashion women's wear image T′ and the detected image to be classified I′ respectively, obtaining the feature-extracted training image T″ and image to be classified I″.
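For concreteness, the following hedged sketch shows one plausible way to compute the LBP, color histogram, and Roberts edge-operator features named in this step with scikit-image and NumPy (HOG is sketched under Step 1); all bin counts and LBP parameters are assumptions, as the patent does not publish exact values.

```python
# Hedged sketch of the remaining three bottom-layer features of Step 2.
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.filters import roberts

def lbp_histogram(gray, P=8, R=1.0):
    """Uniform LBP codes pooled into a normalized histogram."""
    codes = local_binary_pattern(gray, P, R, method='uniform')
    n_bins = int(codes.max()) + 1
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins),
                           density=True)
    return hist

def color_histogram(rgb, bins=8):
    """Joint RGB histogram, flattened and L1-normalized."""
    hist, _ = np.histogramdd(rgb.reshape(-1, 3).astype(float),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / max(hist.sum(), 1.0)

def edge_feature(gray):
    """Roberts cross operator, the edge feature named in the claims."""
    return roberts(gray)
```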
Step 3: match the defined visual feature descriptors with the four extracted bottom-layer features, and train the fine-grained classifier model by multi-class SVM supervised learning. First, fashion women's wear is divided into upper-body and lower-body garments, with upper-body garments divided into 14 styles, lower-body garments into 6 styles, and full-body garments into 3 styles, and attributes are labeled according to the different attributes (such as collar, sleeve shape, color, and pattern). Second, the styles and attributes of the fashion women's wear images are described by the defined visual feature descriptors, which are divided into upper-body visual feature descriptors, lower-body visual feature descriptors, and global feature descriptors, and these are feature-matched with the four bottom-layer features extracted in Step 2. Finally, the feature-extracted training image T″ is trained by supervised learning with random forests and the multi-class SVM method to obtain the style and attribute fine-grained classifier.
Step 4: through the trained fine-grained classifier, perform fine-grained classification on the feature-extracted fashion women's wear image I″ and output the classification result of the fashion women's wear image.
The invention has the beneficial effects that:
1. Known detection methods for fashion clothing images mainly target ideal scenes and are limited by interference from shooting scenes, shooting poses, illumination, occlusion, and other factors. The invention adopts an improved DPM model for part detection based on human body parts and adapts better to detecting human body parts across different scenes, poses, and viewing-angle changes.
2. Known feature extraction methods are mostly based on color features and global features; the feature attributes are monotonous and cannot capture the important fine-grained local features and attributes. The invention defines visual feature descriptors, divided into upper-body visual feature descriptors, lower-body visual feature descriptors, and global feature descriptors, and feature-matches them with the four extracted bottom-layer features of the fashion women's wear images, improving the accuracy of visual feature extraction and representation.
3. The invention performs supervised learning separately on the different defined fashion clothing attributes, establishes a fine-grained classifier model for fashion women's wear images, realizes fine-grained classification by combining random forests with SVM, and outputs classification results of fashion women's wear images with higher accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram illustrating an example of a flow chart according to the present invention;
FIG. 3 is an exemplary diagram of bottom-layer feature extraction of a fashion suit-dress in accordance with the present invention;
FIG. 4 is a diagram of a fashion woman's clothing attribute in accordance with the present invention;
FIG. 5 is a diagram illustrating the classification effect of fashion women's wear according to the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
Example 1: as shown in FIGS. 1-2, a fine-grained classification method for fashion women's wear images based on part detection and visual features. First, part detection of human body parts is performed on the input fashion women's wear image to be classified and on the fashion women's wear images in the training set. Second, the four bottom-layer features (HOG, LBP, color histogram, and edge operator) are extracted from the detected image to be classified and the detected training images, giving the feature-extracted images. Then the defined visual feature descriptors are matched with the four extracted bottom-layer features, and the fine-grained classifier model is trained by supervised learning with random forests and a multi-class SVM. Finally, the trained fine-grained classifier performs fine-grained classification on the feature-extracted fashion women's wear image and outputs the classification result.
The method comprises the following specific steps:
Step 1: apply an improved DPM (Deformable Part Model) to the input training fashion women's wear image T and the fashion women's wear image I to be classified to detect human body parts under different poses and viewing angles. First, HOG features are extracted from T and I and normalized to obtain the DPM features; second, the DPM human body detection model is adjusted according to human pose and viewing angle and divided into a root model and part models; then the response scores of the root model and the part models are computed from the DPM features, the target hypothesis score is computed through the response transformation to obtain the optimal positions, and the comprehensive response score of each root position of the target is computed, finally yielding the detection result;
Step 2: extract the four bottom-layer features (HOG, LBP, color histogram, and edge operator) from the detected training image T′ and the detected image to be classified I′ respectively, obtaining the feature-extracted training image T″ and image to be classified I″;
Step 3: match the defined visual feature descriptors with the four extracted bottom-layer features, and train the fine-grained classifier model by multi-class SVM supervised learning. First, fashion women's wear is divided into upper-body and lower-body garments, with upper-body garments divided into 14 styles, lower-body garments into 6 styles, and full-body garments into 3 styles, and attributes are labeled according to the different attributes; second, the styles and attributes of the images are described by the defined visual feature descriptors and feature-matched with the four bottom-layer features extracted in Step 2; finally, the feature-extracted training image T″ is trained by supervised learning with random forests and the multi-class SVM method to obtain the style and attribute fine-grained classifier;
Step 4: through the trained fine-grained classifier, perform fine-grained classification on the feature-extracted fashion women's wear image I″ and output the classification result of the fashion women's wear image.
Example 2: the improved DPM model is composed of a root model and several part models. The object model with n parts is represented as an (n+2)-tuple (F_0, P_1, ..., P_i, ..., P_n, b), where F_0 is the root filter, P_i is the model of the i-th part, and b is the deviation (bias) coefficient. At scale level l_0, the response score of an anchor (x_0, y_0) is:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

where R_{0,l_0}(x_0, y_0) is the response score of the root model, v_i is a two-dimensional vector specifying the coordinates of the anchor position of the i-th filter relative to the root position (i.e., the standard position when no deformation occurs), \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) collects the transformed response scores of the n part models, and λ is the number of levels in the feature pyramid at which the feature map is computed at twice the resolution.

After the response scores are computed, the response of each part filter is transformed to take spatial uncertainty into account. The response transformation is:

D_{i,l}(x, y) = \max_{(dx, dy)} ( R_{i,l}(x + dx, y + dy) - d_i \cdot \phi_d(dx, dy) )

where (x, y) is the ideal position of the i-th part model in the scale layer, l is the level of the feature pyramid H, (dx, dy) is the offset relative to (x, y), R_{i,l}(x + dx, y + dy) is the matching score of the part model at (x + dx, y + dy), d_i \cdot \phi_d(dx, dy) is the score lost to the offset (dx, dy), \phi_d(dx, dy) = (dx, dy, dx^2, dy^2) are the deformation features, and d_i is the offset-loss coefficient, a parameter learned during model training; at model initialization d_i = (0, 0, 1, 1), i.e., the offset loss is the Euclidean distance between the offset position and the ideal position.

Each target hypothesis specifies the position of every filter of the model in the feature pyramid H: z = (p_0, ..., p_n), where p_i = (x_i, y_i, l_i) is the level and position coordinates of the i-th filter. The score of a target hypothesis is computed as:

score(z) = \sum_{i=0}^{n} F_i' \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b

where F_i' \cdot \phi(H, p_i) is the score of the i-th filter, \phi(H, p_i) is the feature vector from the feature pyramid H, F_i' is the vector obtained by concatenating the weight vectors of the i-th filter, and (dx_i, dy_i) = (x_i, y_i) - (2(x_0, y_0) + v_i) gives the displacement of the i-th filter's position relative to its anchor position. The optimal positions are obtained through the target hypothesis score, and from them the comprehensive response score of each root position is computed:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

The detection result is obtained by scoring multiple instances of the detection target with the comprehensive response score of each root position.
As shown in FIG. 3, the invention extracts the four bottom-layer features (HOG, LBP, color histogram, and edge operator) from the detected training fashion women's wear image T′ and the detected image to be classified I′ respectively, obtaining the feature-extracted training image T″ and image to be classified I″.
The features are then reduced in dimension by PCA. First, the mean of the feature vectors is computed along each dimension and subtracted from the feature values of that dimension. Then the covariance matrix and its eigenvectors and eigenvalues are computed, the eigenvectors are normalized to unit vectors, the high-dimensional eigenvectors are taken as principal components, and the corresponding eigenvectors are selected according to their eigenvalues. Finally, an appropriate principal-component coverage ratio is chosen, and relatively scattered feature points are discarded to increase overall reliability while keeping the information loss minimal. The retention ratio is usually set to 94%, which preserves the feature information as much as possible.
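A minimal sketch of this PCA step, assuming scikit-learn is acceptable as the implementation vehicle (the patent describes the computation but prescribes no library); passing a fraction as n_components makes scikit-learn keep the smallest number of components whose explained variance reaches that ratio, matching the 94% retention described above.

```python
# Sketch of the PCA dimensionality-reduction step (94% variance kept).
from sklearn.decomposition import PCA

def reduce_features(X):
    """X: (n_samples, n_features) matrix of concatenated features."""
    pca = PCA(n_components=0.94)   # keep components covering 94% variance
    return pca.fit_transform(X)
```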
As shown in Table 1, Table 2, and FIG. 4, the specific content of Step 3 is as follows: first, fashion women's wear is divided into upper-body and lower-body garments, with upper-body garments divided into 14 styles, lower-body garments into 6 styles, and full-body garments into 3 styles; attributes are then labeled according to the different attributes of the fashion women's wear (such as collar, sleeve shape, color, and pattern).
Table 1: Fashion women's wear style table (the table itself is rendered as an image in the original document)
Table 2: Fashion women's wear attribute table (the table itself is rendered as an image in the original document)
Next, as shown in Table 3, the styles and attributes of the fashion women's wear images are described by the defined visual feature descriptors. For the different styles and attributes, the method defines a series of visual feature descriptors divided into upper-body visual feature descriptors, lower-body visual feature descriptors, and global feature descriptors: the upper-body descriptors cover collar and sleeve shape (3 types), the lower-body descriptors cover length, wrinkles, and width (3 types), and the global descriptor is the pattern feature (1 type). The visual feature descriptors are then feature-matched with the four bottom-layer features extracted in Step 2, which improves the effectiveness of feature extraction.
Table 3: Fashion women's wear visual feature descriptor table (the table itself is rendered as an image in the original document)

In these descriptors, τ denotes the torso and A_τ the number of pixels on the torso τ; m denotes the number of detected collar corners, R_c the collar edge, and the standard position of the j-th detected collar corner enters the collar descriptors; D(I_k, I_g) is a color-distance measure between pixels of different colors I_k and I_g; n_A denotes the number of pixels in the detected arm region; in f_l, l_l denotes the lower-garment length, and the left-leg and right-leg lengths enter separately; in f_r = n_w / A_l, n_w denotes the number of wrinkle pixels of the lower garment and A_l the total number of detected lower-garment pixels; in f_t = n_v / A_l, n_v denotes the number of vertical-line pixels of the lower region; the widths of the three parts of the lower garment enter the width descriptor, and w_ω is the width of the waist region.
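As a small worked example, the two lower-body ratios defined above, f_r = n_w / A_l and f_t = n_v / A_l, could be computed from binary masks as below; the mask names are hypothetical stand-ins for the outputs of part detection, not identifiers from the patent.

```python
# Worked example of the wrinkle and vertical-line ratios.
import numpy as np

def lower_body_ratios(garment_mask, wrinkle_mask, vline_mask):
    A_l = np.count_nonzero(garment_mask)   # total lower-garment pixels
    n_w = np.count_nonzero(wrinkle_mask)   # wrinkle pixels
    n_v = np.count_nonzero(vline_mask)     # vertical-line pixels
    f_r = n_w / A_l                        # wrinkle percentage
    f_t = n_v / A_l                        # vertical-line percentage
    return f_r, f_t
```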
Finally, supervised learning is performed separately through a Random Forest (RF) algorithm and a multi-class SVM algorithm according to the different defined styles and attributes, and the fine-grained classifier model is established. A random forest is a set of T decision trees, where each tree is trained to maximize the information gain at each node split, quantified as:

IG(x, t) = H(x) - (|x_l| / |x|) H(x_l) - (|x_r| / |x|) H(x_r)

where H(x) is the entropy of the sample set x and t is the binary test dividing x into the subsets x_l and x_r. Class prediction is performed from the average leaf distribution

p(c | x) = (1 / T) \sum_{t=1}^{T} p_{l_t}(c | x)

with L = (l_1, ..., l_T) the leaf nodes reached on all trees. The invention uses a strong binary SVM as the discriminative learner for the split decision function t: if x ∈ R^d is a d-dimensional input vector and w is the trained SVM weight vector, an SVM node splits all samples with w^T x < 0 into the left child and all other samples into the right child. During training, several binary class partitions are generated randomly, and for each partition a linear SVM is trained on a randomly selected feature channel. Finally, the split that maximizes the information gain over the true labels is selected as the split function, yielding the trained fashion women's wear style fine-grained classifier.
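The entropy-based information gain above is standard; the sketch below (illustrative, not patent code) shows how the gain of a candidate binary split, such as the sign of an SVM node's decision value, would be evaluated.

```python
# Illustrative evaluation of the node-split information gain.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, left_mask):
    """Gain of splitting `labels` into left/right by a boolean mask,
    e.g. left_mask = (X @ w < 0) for an SVM split node."""
    n = len(labels)
    left, right = labels[left_mask], labels[~left_mask]
    gain = entropy(labels)
    if len(left):
        gain -= len(left) / n * entropy(left)
    if len(right):
        gain -= len(right) / n * entropy(right)
    return gain
```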
In addition, a one-vs-all method is applied in the multi-class SVM supervised learning to train each fine-grained attribute. According to the 47 defined fashion women's wear attributes, 47 binary classifiers are constructed, where the h-th classifier separates the h-th class from the remaining classes: during training it takes the h-th class in the training set as the positive class and all remaining classes as the negative class. For a sample x to be classified, the class of x is determined by voting. Suppose classifier h predicts on x: if the result is positive, classifier h concludes that x belongs to class h, and class h receives one vote; if the result is negative, x belongs to some class other than h, and every class except h receives one vote. Finally, the attribute class with the most votes is assigned to x. In this way the fine-grained classifier for fashion women's wear attributes is trained.
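A minimal sketch of this one-vs-all training and voting scheme, assuming scikit-learn's LinearSVC as the binary learner (the classifier choice and data shapes are assumptions; 47 is the attribute count given in the description):

```python
# Hedged sketch of one-vs-all training and voting over 47 attributes.
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_all(X, y, n_classes=47):
    """Train classifier h on class h as positive, all others negative."""
    return [LinearSVC().fit(X, (y == h).astype(int))
            for h in range(n_classes)]

def predict_by_voting(classifiers, x):
    """Positive prediction: one vote for class h; negative prediction:
    one vote for every class except h. The most-voted class wins."""
    votes = np.zeros(len(classifiers))
    for h, clf in enumerate(classifiers):
        if clf.predict(x.reshape(1, -1))[0] == 1:
            votes[h] += 1
        else:
            votes += 1
            votes[h] -= 1
    return int(np.argmax(votes))
```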
The feature-extracted fashion women's wear image T″ is then trained by supervised learning with the random forest and multi-class SVM methods to obtain the fine-grained classifier model of styles and attributes.
As shown in FIG. 5, the trained fine-grained classifier performs fine-grained classification of the feature-extracted fashion women's wear image I″ and outputs the classification result of the fashion women's wear image; the detection result is displayed as a detection box, and the styles and attributes are displayed in the classification result as separate labels.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the invention is not limited to these embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.

Claims (2)

1. A fine-grained classification method for fashion women's wear images based on part detection and visual features, characterized in that the method comprises the following steps:
step 1: applying an improved DPM (Deformable Part Model) to the input training fashion women's wear image T and the fashion women's wear image I to be classified to detect human body parts under different poses and viewing angles; firstly, extracting HOG features from T and I and normalizing them to obtain the DPM features; secondly, adjusting the DPM human body detection model according to human pose and viewing angle and dividing it into a root model and part models; then computing the response scores of the root model and the part models from the DPM features, computing the target hypothesis score through the response transformation to obtain the optimal positions, and computing the comprehensive response score of each root position of the target, finally obtaining the detection result;
step 2: extracting the four bottom-layer features (HOG, LBP, color histogram, and edge operator) from the detected training image T′ and the detected image to be classified I′ respectively, obtaining the feature-extracted training image T″ and image to be classified I″;
step 3: matching the defined visual feature descriptors with the four extracted bottom-layer features, and training the fine-grained classifier model by multi-class SVM supervised learning; firstly, dividing fashion women's wear into upper-body and lower-body garments, with upper-body garments divided into 14 styles, lower-body garments into 6 styles, and full-body garments into 3 styles, and labeling attributes according to the different attributes; secondly, describing the styles and attributes of the fashion women's wear images with the defined visual feature descriptors, and feature-matching the visual feature descriptors with the four bottom-layer features extracted in step 2; finally, training on the feature-extracted image T″ by supervised learning with random forests and the multi-class SVM method to obtain the style and attribute fine-grained classifier;
the visual feature descriptors in Step3 are divided into upper body visual feature descriptors, lower body visual feature descriptors and global feature descriptors, and are correspondingly matched with 4 bottom-layer features in Step2 in terms of features;
the upper body visual characteristic descriptor is used for describing the collar and sleeves, including the percentage of the corners on the edge of the collar
Figure FDA0003247674390000011
X variation of all corners on collar edge
Figure FDA0003247674390000012
Y variation of all corners on collar edge
Figure FDA0003247674390000013
Percentage of pixels in arm region
Figure FDA0003247674390000014
The four feature descriptors are matched with the features of the HOG and Roberts edge operators;
the lower body visual characteristic descriptor is used for describing length, folds and width, including the ratio of leg length to lower garment length
Figure FDA0003247674390000015
Percent of drape of under-garment area fr=(nw/Al) Percent of vertical line of the lower region ft=(nv/Al) Ratio of lower garment to waist area Width
Figure FDA0003247674390000021
Figure FDA0003247674390000022
The four feature descriptors are matched with the features of the HOG and Roberts edge operators;
the global feature descriptor is used for describing styles and comprises the density of corners in the area
Figure FDA0003247674390000023
Total saliency of color variance within a region
Figure FDA0003247674390000024
The density of corners in the region matches LBP features, and the overall significance of color variance in the region matches color histogram features;
wherein m represents the number of detected corners of the collar, RcThe edge of the collar is shown,
Figure FDA0003247674390000025
in
Figure FDA0003247674390000026
Indicating the standard position of the jth detected neck collar corner, nANumber of pixels representing detected arm region, τ representing torso, AτRepresenting the number of pixels, l, on the torso τlThe length of the lower garment is shown,
Figure FDA0003247674390000027
and
Figure FDA0003247674390000028
respectively representing the length of the left and right legs, nwRepresenting the number of under-packed wrinkled pixels, AlIndicates the total number of pixels detected by the bottom loading, nvIndicating the number of pixels of the underlying vertical line,
Figure FDA0003247674390000029
respectively the width of the three parts of the lower garment, wωIs the width of the lumbar region, D (I)k,Ig) Is a pixel of different color Ik,IgA color distance measure therebetween;
step 4: through the trained fine-grained classifier, performing fine-grained classification on the feature-extracted fashion women's wear image I″ and outputting the classification result of the fashion women's wear image.
2. The fine-grained classification method for fashion women's wear images based on part detection and visual features according to claim 1, characterized in that the improved DPM model in step 1 is composed of a root model and several part models; the object model with n parts is represented as an (n+2)-tuple (F_0, P_1, ..., P_i, ..., P_n, b), where F_0 is the root filter, P_i is the model of the i-th part, and b is the deviation (bias) coefficient; at scale level l_0, the response score of an anchor (x_0, y_0) is:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

where R_{0,l_0}(x_0, y_0) is the response score of the root model, v_i is a two-dimensional vector specifying the coordinates of the anchor position of the i-th filter relative to the root position, \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) collects the transformed response scores of the n part models, and λ is the number of levels in the feature pyramid at which the feature map is computed at twice the resolution;

after the response scores are computed, the response of each part filter is transformed to take spatial uncertainty into account; the response transformation is:

D_{i,l}(x, y) = \max_{(dx, dy)} ( R_{i,l}(x + dx, y + dy) - d_i \cdot \phi_d(dx, dy) )

where (x, y) is the ideal position of the i-th part model in the scale layer, l is the level of the feature pyramid H, (dx, dy) is the offset relative to (x, y), R_{i,l}(x + dx, y + dy) is the matching score of the part model at (x + dx, y + dy), d_i \cdot \phi_d(dx, dy) is the score lost to the offset (dx, dy), \phi_d(dx, dy) = (dx, dy, dx^2, dy^2) are the deformation features, and d_i is the offset-loss coefficient; at model initialization d_i = (0, 0, 1, 1), i.e., the offset loss is the Euclidean distance between the offset position and the ideal position;

each target hypothesis specifies the position of every filter of the model in the feature pyramid H: z = (p_0, ..., p_n), where p_i = (x_i, y_i, l_i) is the level and position coordinates of the i-th filter; the score of a target hypothesis is computed as:

score(z) = \sum_{i=0}^{n} F_i' \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b

where F_i' \cdot \phi(H, p_i) is the score of the i-th filter, \phi(H, p_i) is the feature vector from the feature pyramid H, F_i' is the vector obtained by concatenating the weight vectors of the i-th filter, and (dx_i, dy_i) = (x_i, y_i) - (2(x_0, y_0) + v_i) gives the displacement of the i-th filter's position relative to its anchor position; the optimal positions are obtained through the target hypothesis score, and from them the comprehensive response score of each root position is computed:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

and the detection result is obtained by scoring multiple instances of the detection target with the comprehensive response score of each root position.
CN201810784023.4A 2018-07-17 2018-07-17 Fashion women's dress image fine-grained classification method based on part detection and visual features Active CN109145947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810784023.4A CN109145947B (en) 2018-07-17 2018-07-17 Fashion women's dress image fine-grained classification method based on part detection and visual features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810784023.4A CN109145947B (en) 2018-07-17 2018-07-17 Fashion women's dress image fine-grained classification method based on part detection and visual features

Publications (2)

Publication Number Publication Date
CN109145947A CN109145947A (en) 2019-01-04
CN109145947B true CN109145947B (en) 2022-04-12

Family

ID=64800777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810784023.4A Active CN109145947B (en) 2018-07-17 2018-07-17 Fashion women's dress image fine-grained classification method based on part detection and visual features

Country Status (1)

Country Link
CN (1) CN109145947B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657584B1 (en) * 2019-01-31 2020-05-19 StradVision, Inc. Method and device for generating safe clothing patterns for rider of bike
CN110136100B (en) * 2019-04-16 2021-02-19 华南理工大学 Automatic classification method and device for CT slice images
CN110738233B (en) * 2019-08-28 2022-07-12 北京奇艺世纪科技有限公司 Model training method, data classification method, device, electronic equipment and storage medium
CN113869371A (en) * 2021-09-03 2021-12-31 深延科技(北京)有限公司 Model training method, clothing fine-grained segmentation method and related device


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819566A (en) * 2012-07-17 2012-12-12 杭州淘淘搜科技有限公司 Cross-catalogue indexing method for business images
WO2016168235A1 (en) * 2015-04-17 2016-10-20 Nec Laboratories America, Inc. Fine-grained image classification by exploring bipartite-graph labels
US9959480B1 (en) * 2015-06-03 2018-05-01 Amazon Technologies, Inc. Pixel-structural reference image feature extraction
CN104978762A (en) * 2015-07-13 2015-10-14 北京航空航天大学 Three-dimensional clothing model generating method and system
CN105069466A (en) * 2015-07-24 2015-11-18 成都市高博汇科信息科技有限公司 Pedestrian clothing color identification method based on digital image processing
CN105373783A (en) * 2015-11-17 2016-03-02 高新兴科技集团股份有限公司 Seat belt not-wearing detection method based on mixed multi-scale deformable component model
CN105488490A (en) * 2015-12-23 2016-04-13 天津天地伟业数码科技有限公司 Judge dressing detection method based on video
CN106022375A (en) * 2016-05-19 2016-10-12 东华大学 HU invariant moment and support vector machine-based garment style identification method
CN106021603A (en) * 2016-06-20 2016-10-12 昆明理工大学 Garment image retrieval method based on segmentation and feature matching
CN106203313A (en) * 2016-07-05 2016-12-07 昆明理工大学 The clothing classification of a kind of image content-based and recommendation method
CN106295693A (en) * 2016-08-05 2017-01-04 深圳云天励飞技术有限公司 A kind of image-recognizing method and device
CN107729908A (en) * 2016-08-10 2018-02-23 阿里巴巴集团控股有限公司 A kind of method for building up, the apparatus and system of machine learning classification model
CN107368832A (en) * 2017-07-26 2017-11-21 中国华戎科技集团有限公司 Target detection and sorting technique based on image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Part-Based and Feature Fusion Method for Clothing Classification; Pan Huo et al.; PCM 2016: Advances in Multimedia Information Processing; 2016-11-27; 231-241 *
Street-to-Shop: Cross-Scenario Clothing Retrieval via Parts Alignment and Auxiliary Set; Si Liu et al.; 2012 IEEE Conference on Computer Vision and Pattern Recognition; 2012-06-26; 3330-3337 *
Clothing image retrieval with joint segmentation and feature matching (in Chinese); Huang Dongyan et al.; Journal of Computer-Aided Design & Computer Graphics; 2017-06-30; vol. 29, no. 6; 1075-1084 *
A judgment optimization model for personalized clothing recommendation (in Chinese); Wang Anqi et al.; Computer Engineering and Applications; 2018-06-30; vol. 54, no. 11; 204-210, 229 *

Also Published As

Publication number Publication date
CN109145947A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
Sarfraz et al. Deep view-sensitive pedestrian attribute inference in an end-to-end model
CN109145947B (en) Fashion women's dress image fine-grained classification method based on part detection and visual features
Oquab et al. Is object localization for free?-weakly-supervised learning with convolutional neural networks
CN106682598B (en) Multi-pose face feature point detection method based on cascade regression
Simo-Serra et al. Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction
CN103942577B (en) Based on the personal identification method for establishing sample database and composite character certainly in video monitoring
Gall et al. Hough forests for object detection, tracking, and action recognition
CN103514456B (en) Image classification method and device based on compressed sensing multi-core learning
CN105488809B (en) Indoor scene semantic segmentation method based on RGBD descriptors
CN105069481B (en) Natural scene multiple labeling sorting technique based on spatial pyramid sparse coding
US20110142335A1 (en) Image Comparison System and Method
CN106022375B (en) A kind of clothes fashion recognition methods based on HU not bending moment and support vector machines
CN105893936B (en) A kind of Activity recognition method based on HOIRM and Local Feature Fusion
Hu et al. Exploring structural information and fusing multiple features for person re-identification
CN110334687A (en) A kind of pedestrian retrieval Enhancement Method based on pedestrian detection, attribute study and pedestrian's identification
CN109583482A (en) A kind of infrared human body target image identification method based on multiple features fusion Yu multicore transfer learning
Lee et al. Shape discovery from unlabeled image collections
CN106021603A (en) Garment image retrieval method based on segmentation and feature matching
CN109344872A (en) A kind of recognition methods of national costume image
CN106056132B (en) A kind of clothes fashion recognition methods based on Fourier descriptor and support vector machines
Huang et al. PBC: Polygon-based classifier for fine-grained categorization
Ren et al. Facial expression recognition based on AAM–SIFT and adaptive regional weighting
CN107092931B (en) Method for identifying dairy cow individuals
CN106897669A (en) A kind of pedestrian based on consistent iteration various visual angles transfer learning discrimination method again
CN109509191A (en) A kind of saliency object detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant