CN109145947B - Fashion women's dress image fine-grained classification method based on part detection and visual features - Google Patents


Info

Publication number
CN109145947B
CN109145947B (application CN201810784023.4A)
Authority
CN
China
Prior art keywords
fashion
model
image
feature
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810784023.4A
Other languages
Chinese (zh)
Other versions
CN109145947A (en)
Inventor
刘骊
吴苗苗
付晓东
黄青松
刘利军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201810784023.4A priority Critical patent/CN109145947B/en
Publication of CN109145947A publication Critical patent/CN109145947A/en
Application granted
Publication of CN109145947B publication Critical patent/CN109145947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a fine-grained classification method for fashion women's wear images based on part detection and visual features, and belongs to the field of computer vision and image applications. First, part detection of human body parts is performed on the input fashion women's wear image to be classified and on the images in the training set. Second, four bottom-layer features (HOG, LBP, color histogram, and edge operator) are extracted from the detected image to be classified and the detected training images, giving the feature-extracted images. Then the defined visual feature descriptors are matched with the four extracted bottom-layer features, and a fine-grained classifier model is trained by supervised learning with a multi-class SVM. Finally, the trained fine-grained classifier performs fine-grained classification on the feature-extracted fashion women's wear image and outputs the classification result. The detection and classification method adopted by the invention achieves high accuracy.

Description

Fashion women's dress image fine-grained classification method based on part detection and visual features
Technical Field
The invention relates to a fine-grained classification method for fashion women's wear images based on part detection and visual features, and belongs to the field of computer vision and image applications.
Background
Online shopping is hugely popular and is growing ever more widespread, global, and mobile, so fashion clothing classification has become an increasingly hot topic and is widely applied in e-commerce and related fields. Accordingly, many improved methods for fashion clothing classification have appeared, including the classical bag-of-words model, fashion clothing classification methods based on deep learning, and methods based on random forests, Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and the like. Most known methods target coarse-grained classification of fashion clothing images and lack analysis across similar style categories, so they cannot achieve finer or multi-level classification. Fashion women's wear comes in many styles, and unlike coarse-grained classification tasks, fine-grained classification of fashion women's wear images demands finer class granularity; the differences between styles are subtle, and different styles can be distinguished only by small local differences. In addition, fine-grained images have a low signal-to-noise ratio, and the information with sufficient discriminative power is confined to very small local regions. Therefore, finding and effectively exploiting useful local-region information to realize finer, more accurate, and more efficient fine-grained classification of fashion women's wear images has important theoretical significance and practical value. Among known methods, Berg proposed the part-based one-vs-one features POOF ("POOF: Part-Based One-vs-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation", 2013: 955-962); each feature can distinguish two different classes based on the appearance of a particular part of the object. Bossard ("Apparel Classification with Style", 2012: 321-335) provided a complete method for recognizing and classifying fashion clothing in natural scenes, whose key point is to adopt strongly discriminative learners based on random forest learning as decision nodes while extending the random forest into a transfer forest that can transfer to different domains. Cui ("Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning with Humans in the Loop", 2016: 1153-1162) proposed a general iterative framework for fine-grained classification based on deep metric learning that learns a low-dimensional feature embedding anchored in each class. Zhang ("Weakly Supervised Fine-Grained Categorization with Part-Based Image Representation", 2016, 25(4): 1713-1725) proposed a weakly supervised fine-grained categorization method based on a part-based image representation.
In summary, although many methods exist for classifying fashion clothing images, the diverse styles, varied textures and accessories, and the flexibility and deformability of clothing itself make classification and recognition very difficult. Known methods still have shortcomings and limitations: given the wide variety of shooting scenes and human poses, detecting the different regions of the human body is critical. For feature extraction and classification, most known methods rely on bottom-layer features such as color and texture, cannot make good use of local information, are limited in extracting the subtle differences in style and type among fashion garments, and can therefore only achieve coarse-grained classification of fashion clothing.
Disclosure of Invention
The invention provides a fine-grained classification method for fashion women's wear images based on part detection and visual features that adapts to body part detection under different poses and viewing-angle changes and meets the fine-grained classification needs of fashion women's wear images in electronic commerce.
The technical scheme of the invention is as follows. A fine-grained classification method for fashion women's wear images based on part detection and visual features comprises the following steps. Step 1: apply an improved DPM (Deformable Part Model) to the input training fashion women's wear image T and the fashion women's wear image I to be classified to detect human body parts under different poses and viewing angles. First, HOG (Histogram of Oriented Gradients) features are extracted from the training image T and the image to be classified I and normalized to obtain the DPM features. Second, the DPM human body detection model is adjusted according to human pose and viewing angle and divided into a root model and part models. Then the response scores of the root model and the part models are computed from the DPM features, the target hypothesis score is computed through a response transformation to obtain the optimal positions, and from these the comprehensive response score of each root position of the target is computed. Finally, the detection result is obtained.
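As an illustration of this step, the following minimal sketch (not code from the patent; cell and block sizes are assumptions, not values fixed by the patent) shows how block-normalized HOG features of the kind the DPM detector builds on can be computed with scikit-image:

```python
# Minimal sketch of the normalized HOG features on which the improved
# DPM detector builds. Parameters are illustrative assumptions.
from skimage import color, io
from skimage.feature import hog

def dpm_base_features(image_path):
    """Block-normalized HOG descriptor of a garment image."""
    img = io.imread(image_path)
    if img.ndim == 3:                       # drop alpha, convert to gray
        img = color.rgb2gray(img[..., :3])
    return hog(
        img,
        orientations=9,            # 9 unsigned orientation bins
        pixels_per_cell=(8, 8),    # HOG cell size typical of DPM
        cells_per_block=(2, 2),    # L2 normalization neighborhood
        block_norm='L2-Hys',
        feature_vector=True,
    )
```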
The improved DPM model consists of a root model and several part models. The object model with n parts is represented as an (n+2)-tuple (F_0, P_1, ..., P_i, ..., P_n, b), where F_0 is the root filter, P_i is the model of the i-th part, and b is the deviation (bias) coefficient. At scale level l_0, the response score of an anchor (x_0, y_0) is:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

where R_{0,l_0}(x_0, y_0) is the response score of the root model, v_i is a two-dimensional vector specifying the coordinates of the anchor position of the i-th filter relative to the root position (i.e., the standard position when no deformation occurs), \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) collects the transformed response scores of the n part models, and λ is the number of levels in the feature pyramid at which the feature map is computed at twice the resolution.

After the response scores are computed, the response of each part filter is transformed to take spatial uncertainty into account. The response transformation is:

D_{i,l}(x, y) = \max_{(dx, dy)} ( R_{i,l}(x + dx, y + dy) - d_i \cdot \phi_d(dx, dy) )

where (x, y) is the ideal position of the i-th part model in the scale layer, l is the level of the feature pyramid H, (dx, dy) is the offset relative to (x, y), R_{i,l}(x + dx, y + dy) is the matching score of the part model at (x + dx, y + dy), d_i \cdot \phi_d(dx, dy) is the score lost to the offset (dx, dy), \phi_d(dx, dy) = (dx, dy, dx^2, dy^2) are the deformation features, and d_i is the offset-loss coefficient, a parameter learned during model training; at model initialization d_i = (0, 0, 1, 1), i.e., the offset loss is the Euclidean distance between the offset position and the ideal position.

Each target hypothesis specifies the position of every filter of the model in the feature pyramid H: z = (p_0, ..., p_n), where p_i = (x_i, y_i, l_i) is the level and position coordinates of the i-th filter. The score of a target hypothesis is computed as:

score(z) = \sum_{i=0}^{n} F_i' \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b

where F_i' \cdot \phi(H, p_i) is the score of the i-th filter, \phi(H, p_i) is the feature vector from the feature pyramid H, F_i' is the vector obtained by concatenating the weight vectors of the i-th filter, and (dx_i, dy_i) = (x_i, y_i) - (2(x_0, y_0) + v_i) gives the displacement of the i-th filter's position relative to its anchor position. The optimal positions are obtained through the target hypothesis score, and from them the comprehensive response score of each root position is computed:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

The detection result is obtained by scoring multiple instances of the detection target with the comprehensive response score of each root position.
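The response transformation above admits a direct, if naive, implementation: for every ideal part position, search nearby offsets and trade the part filter's match score against the quadratic deformation cost. The sketch below is illustrative only (the search radius and array shapes are assumptions); production DPM implementations replace the brute-force scan with a generalized distance transform.

```python
# Illustrative brute-force sketch of
# D_{i,l}(x, y) = max_{dx,dy} ( R_{i,l}(x+dx, y+dy) - d_i . phi_d(dx, dy) ).
import numpy as np

def transform_response(R, d=(0.0, 0.0, 1.0, 1.0), radius=4):
    """R: 2-D part-filter response map; d: offset-loss coefficients,
    initialized to (0, 0, 1, 1) as the description states."""
    d = np.asarray(d, dtype=float)
    H, W = R.shape
    D = np.full(R.shape, -np.inf)
    for y in range(H):
        for x in range(W):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        phi = np.array([dx, dy, dx * dx, dy * dy], float)
                        D[y, x] = max(D[y, x], R[yy, xx] - d @ phi)
    return D
```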
Step 2: extract the four bottom-layer features (HOG, LBP (Local Binary Pattern), color histogram, and edge operator) from the detected training fashion women's wear image T′ and the detected image to be classified I′ respectively, obtaining the feature-extracted training image T″ and image to be classified I″.
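For concreteness, the following hedged sketch shows one plausible way to compute the LBP, color histogram, and Roberts edge-operator features named in this step with scikit-image and NumPy (HOG is sketched under Step 1); all bin counts and LBP parameters are assumptions, as the patent does not publish exact values.

```python
# Hedged sketch of the remaining three bottom-layer features of Step 2.
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.filters import roberts

def lbp_histogram(gray, P=8, R=1.0):
    """Uniform LBP codes pooled into a normalized histogram."""
    codes = local_binary_pattern(gray, P, R, method='uniform')
    n_bins = int(codes.max()) + 1
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins),
                           density=True)
    return hist

def color_histogram(rgb, bins=8):
    """Joint RGB histogram, flattened and L1-normalized."""
    hist, _ = np.histogramdd(rgb.reshape(-1, 3).astype(float),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / max(hist.sum(), 1.0)

def edge_feature(gray):
    """Roberts cross operator, the edge feature named in the claims."""
    return roberts(gray)
```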
Step 3: match the defined visual feature descriptors with the four extracted bottom-layer features, and train the fine-grained classifier model by multi-class SVM supervised learning. First, fashion women's wear is divided into upper-body and lower-body garments, with upper-body garments divided into 14 styles, lower-body garments into 6 styles, and full-body garments into 3 styles, and attributes are labeled according to the different attributes (such as collar, sleeve shape, color, and pattern). Second, the styles and attributes of the fashion women's wear images are described by the defined visual feature descriptors, which are divided into upper-body visual feature descriptors, lower-body visual feature descriptors, and global feature descriptors, and these are feature-matched with the four bottom-layer features extracted in Step 2. Finally, the feature-extracted training image T″ is trained by supervised learning with random forests and the multi-class SVM method to obtain the style and attribute fine-grained classifier.
Step 4: through the trained fine-grained classifier, perform fine-grained classification on the feature-extracted fashion women's wear image I″ and output the classification result of the fashion women's wear image.
The invention has the beneficial effects that:
1. Known detection methods for fashion clothing images mainly target ideal scenes and are limited by interference from shooting scenes, shooting poses, illumination, occlusion, and other factors. The invention adopts an improved DPM model for part detection based on human body parts and adapts better to detecting human body parts across different scenes, poses, and viewing-angle changes.
2. Known feature extraction methods are mostly based on color features and global features; the feature attributes are monotonous and cannot capture the important fine-grained local features and attributes. The invention defines visual feature descriptors, divided into upper-body visual feature descriptors, lower-body visual feature descriptors, and global feature descriptors, and feature-matches them with the four extracted bottom-layer features of the fashion women's wear images, improving the accuracy of visual feature extraction and representation.
3. The invention performs supervised learning separately on the different defined fashion clothing attributes, establishes a fine-grained classifier model for fashion women's wear images, realizes fine-grained classification by combining random forests with SVM, and outputs classification results of fashion women's wear images with higher accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram illustrating an example of a flow chart according to the present invention;
FIG. 3 is an exemplary diagram of bottom-layer feature extraction of a fashion suit-dress in accordance with the present invention;
FIG. 4 is a diagram of a fashion woman's clothing attribute in accordance with the present invention;
FIG. 5 is a diagram illustrating the classification effect of fashion women's wear according to the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
Example 1: as shown in FIGS. 1-2, a fine-grained classification method for fashion women's wear images based on part detection and visual features. First, part detection of human body parts is performed on the input fashion women's wear image to be classified and on the fashion women's wear images in the training set. Second, the four bottom-layer features (HOG, LBP, color histogram, and edge operator) are extracted from the detected image to be classified and the detected training images, giving the feature-extracted images. Then the defined visual feature descriptors are matched with the four extracted bottom-layer features, and the fine-grained classifier model is trained by supervised learning with random forests and a multi-class SVM. Finally, the trained fine-grained classifier performs fine-grained classification on the feature-extracted fashion women's wear image and outputs the classification result.
The method comprises the following specific steps:
Step 1: apply an improved DPM (Deformable Part Model) to the input training fashion women's wear image T and the fashion women's wear image I to be classified to detect human body parts under different poses and viewing angles. First, HOG features are extracted from T and I and normalized to obtain the DPM features; second, the DPM human body detection model is adjusted according to human pose and viewing angle and divided into a root model and part models; then the response scores of the root model and the part models are computed from the DPM features, the target hypothesis score is computed through the response transformation to obtain the optimal positions, and the comprehensive response score of each root position of the target is computed, finally yielding the detection result;
Step 2: extract the four bottom-layer features (HOG, LBP, color histogram, and edge operator) from the detected training image T′ and the detected image to be classified I′ respectively, obtaining the feature-extracted training image T″ and image to be classified I″;
Step 3: match the defined visual feature descriptors with the four extracted bottom-layer features, and train the fine-grained classifier model by multi-class SVM supervised learning. First, fashion women's wear is divided into upper-body and lower-body garments, with upper-body garments divided into 14 styles, lower-body garments into 6 styles, and full-body garments into 3 styles, and attributes are labeled according to the different attributes; second, the styles and attributes of the images are described by the defined visual feature descriptors and feature-matched with the four bottom-layer features extracted in Step 2; finally, the feature-extracted training image T″ is trained by supervised learning with random forests and the multi-class SVM method to obtain the style and attribute fine-grained classifier;
Step 4: through the trained fine-grained classifier, perform fine-grained classification on the feature-extracted fashion women's wear image I″ and output the classification result of the fashion women's wear image.
Example 2: the improved DPM model is composed of a root model and several part models. The object model with n parts is represented as an (n+2)-tuple (F_0, P_1, ..., P_i, ..., P_n, b), where F_0 is the root filter, P_i is the model of the i-th part, and b is the deviation (bias) coefficient. At scale level l_0, the response score of an anchor (x_0, y_0) is:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

where R_{0,l_0}(x_0, y_0) is the response score of the root model, v_i is a two-dimensional vector specifying the coordinates of the anchor position of the i-th filter relative to the root position (i.e., the standard position when no deformation occurs), \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) collects the transformed response scores of the n part models, and λ is the number of levels in the feature pyramid at which the feature map is computed at twice the resolution.

After the response scores are computed, the response of each part filter is transformed to take spatial uncertainty into account. The response transformation is:

D_{i,l}(x, y) = \max_{(dx, dy)} ( R_{i,l}(x + dx, y + dy) - d_i \cdot \phi_d(dx, dy) )

where (x, y) is the ideal position of the i-th part model in the scale layer, l is the level of the feature pyramid H, (dx, dy) is the offset relative to (x, y), R_{i,l}(x + dx, y + dy) is the matching score of the part model at (x + dx, y + dy), d_i \cdot \phi_d(dx, dy) is the score lost to the offset (dx, dy), \phi_d(dx, dy) = (dx, dy, dx^2, dy^2) are the deformation features, and d_i is the offset-loss coefficient, a parameter learned during model training; at model initialization d_i = (0, 0, 1, 1), i.e., the offset loss is the Euclidean distance between the offset position and the ideal position.

Each target hypothesis specifies the position of every filter of the model in the feature pyramid H: z = (p_0, ..., p_n), where p_i = (x_i, y_i, l_i) is the level and position coordinates of the i-th filter. The score of a target hypothesis is computed as:

score(z) = \sum_{i=0}^{n} F_i' \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b

where F_i' \cdot \phi(H, p_i) is the score of the i-th filter, \phi(H, p_i) is the feature vector from the feature pyramid H, F_i' is the vector obtained by concatenating the weight vectors of the i-th filter, and (dx_i, dy_i) = (x_i, y_i) - (2(x_0, y_0) + v_i) gives the displacement of the i-th filter's position relative to its anchor position. The optimal positions are obtained through the target hypothesis score, and from them the comprehensive response score of each root position is computed:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

The detection result is obtained by scoring multiple instances of the detection target with the comprehensive response score of each root position.
As shown in FIG. 3, the invention extracts the four bottom-layer features (HOG, LBP, color histogram, and edge operator) from the detected training fashion women's wear image T′ and the detected image to be classified I′ respectively, obtaining the feature-extracted training image T″ and image to be classified I″.
The features are then reduced in dimension by PCA. First, the mean of the feature vectors is computed along each dimension and subtracted from the feature values of that dimension. Then the covariance matrix and its eigenvectors and eigenvalues are computed, the eigenvectors are normalized to unit vectors, the high-dimensional eigenvectors are taken as principal components, and the corresponding eigenvectors are selected according to their eigenvalues. Finally, an appropriate principal-component coverage ratio is chosen, and relatively scattered feature points are discarded to increase overall reliability while keeping the information loss minimal. The retention ratio is usually set to 94%, which preserves the feature information as much as possible.
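A minimal sketch of this PCA step, assuming scikit-learn is acceptable as the implementation vehicle (the patent describes the computation but prescribes no library); passing a fraction as n_components makes scikit-learn keep the smallest number of components whose explained variance reaches that ratio, matching the 94% retention described above.

```python
# Sketch of the PCA dimensionality-reduction step (94% variance kept).
from sklearn.decomposition import PCA

def reduce_features(X):
    """X: (n_samples, n_features) matrix of concatenated features."""
    pca = PCA(n_components=0.94)   # keep components covering 94% variance
    return pca.fit_transform(X)
```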
As shown in Table 1, Table 2, and FIG. 4, the specific content of Step 3 is as follows: first, fashion women's wear is divided into upper-body and lower-body garments, with upper-body garments divided into 14 styles, lower-body garments into 6 styles, and full-body garments into 3 styles; attributes are then labeled according to the different attributes of the fashion women's wear (such as collar, sleeve shape, color, and pattern).
Table 1: Fashion women's wear style table (the table itself is rendered as an image in the original document)
Table 2: Fashion women's wear attribute table (the table itself is rendered as an image in the original document)
Next, as shown in Table 3, the styles and attributes of the fashion women's wear images are described by the defined visual feature descriptors. For the different styles and attributes, the method defines a series of visual feature descriptors divided into upper-body visual feature descriptors, lower-body visual feature descriptors, and global feature descriptors: the upper-body descriptors cover collar and sleeve shape (3 types), the lower-body descriptors cover length, wrinkles, and width (3 types), and the global descriptor is the pattern feature (1 type). The visual feature descriptors are then feature-matched with the four bottom-layer features extracted in Step 2, which improves the effectiveness of feature extraction.
Table 3: Fashion women's wear visual feature descriptor table (the table itself is rendered as an image in the original document)

In these descriptors, τ denotes the torso and A_τ the number of pixels on the torso τ; m denotes the number of detected collar corners, R_c the collar edge, and the standard position of the j-th detected collar corner enters the collar descriptors; D(I_k, I_g) is a color-distance measure between pixels of different colors I_k and I_g; n_A denotes the number of pixels in the detected arm region; in f_l, l_l denotes the lower-garment length, and the left-leg and right-leg lengths enter separately; in f_r = n_w / A_l, n_w denotes the number of wrinkle pixels of the lower garment and A_l the total number of detected lower-garment pixels; in f_t = n_v / A_l, n_v denotes the number of vertical-line pixels of the lower region; the widths of the three parts of the lower garment enter the width descriptor, and w_ω is the width of the waist region.
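As a small worked example, the two lower-body ratios defined above, f_r = n_w / A_l and f_t = n_v / A_l, could be computed from binary masks as below; the mask names are hypothetical stand-ins for the outputs of part detection, not identifiers from the patent.

```python
# Worked example of the wrinkle and vertical-line ratios.
import numpy as np

def lower_body_ratios(garment_mask, wrinkle_mask, vline_mask):
    A_l = np.count_nonzero(garment_mask)   # total lower-garment pixels
    n_w = np.count_nonzero(wrinkle_mask)   # wrinkle pixels
    n_v = np.count_nonzero(vline_mask)     # vertical-line pixels
    f_r = n_w / A_l                        # wrinkle percentage
    f_t = n_v / A_l                        # vertical-line percentage
    return f_r, f_t
```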
Finally, supervised learning is performed separately through a Random Forest (RF) algorithm and a multi-class SVM algorithm according to the different defined styles and attributes, and the fine-grained classifier model is established. A random forest is a set of T decision trees, where each tree is trained to maximize the information gain at each node split, quantified as:

IG(x, t) = H(x) - (|x_l| / |x|) H(x_l) - (|x_r| / |x|) H(x_r)

where H(x) is the entropy of the sample set x and t is the binary test dividing x into the subsets x_l and x_r. Class prediction is performed from the average leaf distribution

p(c | x) = (1 / T) \sum_{t=1}^{T} p_{l_t}(c | x)

with L = (l_1, ..., l_T) the leaf nodes reached on all trees. The invention uses a strong binary SVM as the discriminative learner for the split decision function t: if x ∈ R^d is a d-dimensional input vector and w is the trained SVM weight vector, an SVM node splits all samples with w^T x < 0 into the left child and all other samples into the right child. During training, several binary class partitions are generated randomly, and for each partition a linear SVM is trained on a randomly selected feature channel. Finally, the split that maximizes the information gain over the true labels is selected as the split function, yielding the trained fashion women's wear style fine-grained classifier.
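The entropy-based information gain above is standard; the sketch below (illustrative, not patent code) shows how the gain of a candidate binary split, such as the sign of an SVM node's decision value, would be evaluated.

```python
# Illustrative evaluation of the node-split information gain.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, left_mask):
    """Gain of splitting `labels` into left/right by a boolean mask,
    e.g. left_mask = (X @ w < 0) for an SVM split node."""
    n = len(labels)
    left, right = labels[left_mask], labels[~left_mask]
    gain = entropy(labels)
    if len(left):
        gain -= len(left) / n * entropy(left)
    if len(right):
        gain -= len(right) / n * entropy(right)
    return gain
```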
In addition, a one-vs-all method is applied in the multi-class SVM supervised learning to train each fine-grained attribute. According to the 47 defined fashion women's wear attributes, 47 binary classifiers are constructed, where the h-th classifier separates the h-th class from the remaining classes: during training it takes the h-th class in the training set as the positive class and all remaining classes as the negative class. For a sample x to be classified, the class of x is determined by voting. Suppose classifier h predicts on x: if the result is positive, classifier h concludes that x belongs to class h, and class h receives one vote; if the result is negative, x belongs to some class other than h, and every class except h receives one vote. Finally, the attribute class with the most votes is assigned to x. In this way the fine-grained classifier for fashion women's wear attributes is trained.
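A minimal sketch of this one-vs-all training and voting scheme, assuming scikit-learn's LinearSVC as the binary learner (the classifier choice and data shapes are assumptions; 47 is the attribute count given in the description):

```python
# Hedged sketch of one-vs-all training and voting over 47 attributes.
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_all(X, y, n_classes=47):
    """Train classifier h on class h as positive, all others negative."""
    return [LinearSVC().fit(X, (y == h).astype(int))
            for h in range(n_classes)]

def predict_by_voting(classifiers, x):
    """Positive prediction: one vote for class h; negative prediction:
    one vote for every class except h. The most-voted class wins."""
    votes = np.zeros(len(classifiers))
    for h, clf in enumerate(classifiers):
        if clf.predict(x.reshape(1, -1))[0] == 1:
            votes[h] += 1
        else:
            votes += 1
            votes[h] -= 1
    return int(np.argmax(votes))
```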
The feature-extracted fashion women's wear image T″ is then trained by supervised learning with the random forest and multi-class SVM methods to obtain the fine-grained classifier model of styles and attributes.
As shown in FIG. 5, the trained fine-grained classifier performs fine-grained classification of the feature-extracted fashion women's wear image I″ and outputs the classification result of the fashion women's wear image; the detection result is displayed as a detection box, and the styles and attributes are displayed in the classification result as separate labels.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the invention is not limited to these embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.

Claims (2)

1. A fine-grained classification method for fashion women's wear images based on part detection and visual features, characterized in that the method comprises the following steps:
step 1: applying an improved DPM (Deformable Part Model) to the input training fashion women's wear image T and the fashion women's wear image I to be classified to detect human body parts under different poses and viewing angles; firstly, extracting HOG features from T and I and normalizing them to obtain the DPM features; secondly, adjusting the DPM human body detection model according to human pose and viewing angle and dividing it into a root model and part models; then computing the response scores of the root model and the part models from the DPM features, computing the target hypothesis score through the response transformation to obtain the optimal positions, and computing the comprehensive response score of each root position of the target, finally obtaining the detection result;
step 2: extracting the four bottom-layer features (HOG, LBP, color histogram, and edge operator) from the detected training image T′ and the detected image to be classified I′ respectively, obtaining the feature-extracted training image T″ and image to be classified I″;
step 3: matching the defined visual feature descriptors with the four extracted bottom-layer features, and training the fine-grained classifier model by multi-class SVM supervised learning; firstly, dividing fashion women's wear into upper-body and lower-body garments, with upper-body garments divided into 14 styles, lower-body garments into 6 styles, and full-body garments into 3 styles, and labeling attributes according to the different attributes; secondly, describing the styles and attributes of the fashion women's wear images with the defined visual feature descriptors, and feature-matching the visual feature descriptors with the four bottom-layer features extracted in step 2; finally, training on the feature-extracted image T″ by supervised learning with random forests and the multi-class SVM method to obtain the style and attribute fine-grained classifier;
the visual feature descriptors in Step3 are divided into upper body visual feature descriptors, lower body visual feature descriptors and global feature descriptors, and are correspondingly matched with 4 bottom-layer features in Step2 in terms of features;
the upper body visual characteristic descriptor is used for describing the collar and sleeves, including the percentage of the corners on the edge of the collar
Figure FDA0003247674390000011
X variation of all corners on collar edge
Figure FDA0003247674390000012
Y variation of all corners on collar edge
Figure FDA0003247674390000013
Percentage of pixels in arm region
Figure FDA0003247674390000014
The four feature descriptors are matched with the features of the HOG and Roberts edge operators;
the lower body visual characteristic descriptor is used for describing length, folds and width, including the ratio of leg length to lower garment length
Figure FDA0003247674390000015
Percent of drape of under-garment area fr=(nw/Al) Percent of vertical line of the lower region ft=(nv/Al) Ratio of lower garment to waist area Width
Figure FDA0003247674390000021
Figure FDA0003247674390000022
The four feature descriptors are matched with the features of the HOG and Roberts edge operators;
the global feature descriptor is used for describing styles and comprises the density of corners in the area
Figure FDA0003247674390000023
Total saliency of color variance within a region
Figure FDA0003247674390000024
The density of corners in the region matches LBP features, and the overall significance of color variance in the region matches color histogram features;
wherein m represents the number of detected corners of the collar, RcThe edge of the collar is shown,
Figure FDA0003247674390000025
in
Figure FDA0003247674390000026
Indicating the standard position of the jth detected neck collar corner, nANumber of pixels representing detected arm region, τ representing torso, AτRepresenting the number of pixels, l, on the torso τlThe length of the lower garment is shown,
Figure FDA0003247674390000027
and
Figure FDA0003247674390000028
respectively representing the length of the left and right legs, nwRepresenting the number of under-packed wrinkled pixels, AlIndicates the total number of pixels detected by the bottom loading, nvIndicating the number of pixels of the underlying vertical line,
Figure FDA0003247674390000029
respectively the width of the three parts of the lower garment, wωIs the width of the lumbar region, D (I)k,Ig) Is a pixel of different color Ik,IgA color distance measure therebetween;
step 4: through the trained fine-grained classifier, performing fine-grained classification on the feature-extracted fashion women's wear image I″ and outputting the classification result of the fashion women's wear image.
2. The fine-grained classification method for fashion women's wear images based on part detection and visual features according to claim 1, characterized in that the improved DPM model in step 1 is composed of a root model and several part models; the object model with n parts is represented as an (n+2)-tuple (F_0, P_1, ..., P_i, ..., P_n, b), where F_0 is the root filter, P_i is the model of the i-th part, and b is the deviation (bias) coefficient; at scale level l_0, the response score of an anchor (x_0, y_0) is:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

where R_{0,l_0}(x_0, y_0) is the response score of the root model, v_i is a two-dimensional vector specifying the coordinates of the anchor position of the i-th filter relative to the root position, \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) collects the transformed response scores of the n part models, and λ is the number of levels in the feature pyramid at which the feature map is computed at twice the resolution;

after the response scores are computed, the response of each part filter is transformed to take spatial uncertainty into account; the response transformation is:

D_{i,l}(x, y) = \max_{(dx, dy)} ( R_{i,l}(x + dx, y + dy) - d_i \cdot \phi_d(dx, dy) )

where (x, y) is the ideal position of the i-th part model in the scale layer, l is the level of the feature pyramid H, (dx, dy) is the offset relative to (x, y), R_{i,l}(x + dx, y + dy) is the matching score of the part model at (x + dx, y + dy), d_i \cdot \phi_d(dx, dy) is the score lost to the offset (dx, dy), \phi_d(dx, dy) = (dx, dy, dx^2, dy^2) are the deformation features, and d_i is the offset-loss coefficient; at model initialization d_i = (0, 0, 1, 1), i.e., the offset loss is the Euclidean distance between the offset position and the ideal position;

each target hypothesis specifies the position of every filter of the model in the feature pyramid H: z = (p_0, ..., p_n), where p_i = (x_i, y_i, l_i) is the level and position coordinates of the i-th filter; the score of a target hypothesis is computed as:

score(z) = \sum_{i=0}^{n} F_i' \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b

where F_i' \cdot \phi(H, p_i) is the score of the i-th filter, \phi(H, p_i) is the feature vector from the feature pyramid H, F_i' is the vector obtained by concatenating the weight vectors of the i-th filter, and (dx_i, dy_i) = (x_i, y_i) - (2(x_0, y_0) + v_i) gives the displacement of the i-th filter's position relative to its anchor position; the optimal positions are obtained through the target hypothesis score, and from them the comprehensive response score of each root position is computed:

score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,l_0-\lambda}(2(x_0, y_0) + v_i) + b

and the detection result is obtained by scoring multiple instances of the detection target with the comprehensive response score of each root position.
CN201810784023.4A 2018-07-17 2018-07-17 Fashion women's dress image fine-grained classification method based on part detection and visual features Active CN109145947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810784023.4A CN109145947B (en) 2018-07-17 2018-07-17 Fashion women's dress image fine-grained classification method based on part detection and visual features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810784023.4A CN109145947B (en) 2018-07-17 2018-07-17 Fashion women's dress image fine-grained classification method based on part detection and visual features

Publications (2)

Publication Number Publication Date
CN109145947A CN109145947A (en) 2019-01-04
CN109145947B true CN109145947B (en) 2022-04-12

Family

ID=64800777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810784023.4A Active CN109145947B (en) 2018-07-17 2018-07-17 Fashion women's dress image fine-grained classification method based on part detection and visual features

Country Status (1)

Country Link
CN (1) CN109145947B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657584B1 (en) * 2019-01-31 2020-05-19 StradVision, Inc. Method and device for generating safe clothing patterns for rider of bike
CN110136100B (en) * 2019-04-16 2021-02-19 华南理工大学 Automatic classification method and device for CT slice images
CN110738233B (en) * 2019-08-28 2022-07-12 北京奇艺世纪科技有限公司 Model training method, data classification method, device, electronic equipment and storage medium
CN113869371A (en) * 2021-09-03 2021-12-31 深延科技(北京)有限公司 Model training method, clothing fine-grained segmentation method and related device


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819566A (en) * 2012-07-17 2012-12-12 杭州淘淘搜科技有限公司 Cross-catalogue indexing method for business images
WO2016168235A1 (en) * 2015-04-17 2016-10-20 Nec Laboratories America, Inc. Fine-grained image classification by exploring bipartite-graph labels
US9959480B1 (en) * 2015-06-03 2018-05-01 Amazon Technologies, Inc. Pixel-structural reference image feature extraction
CN104978762A (en) * 2015-07-13 2015-10-14 北京航空航天大学 Three-dimensional clothing model generating method and system
CN105069466A (en) * 2015-07-24 2015-11-18 成都市高博汇科信息科技有限公司 Pedestrian clothing color identification method based on digital image processing
CN105373783A (en) * 2015-11-17 2016-03-02 高新兴科技集团股份有限公司 Seat belt not-wearing detection method based on mixed multi-scale deformable component model
CN105488490A (en) * 2015-12-23 2016-04-13 天津天地伟业数码科技有限公司 Judge dressing detection method based on video
CN106022375A (en) * 2016-05-19 2016-10-12 东华大学 HU invariant moment and support vector machine-based garment style identification method
CN106021603A (en) * 2016-06-20 2016-10-12 昆明理工大学 Garment image retrieval method based on segmentation and feature matching
CN106203313A (en) * 2016-07-05 2016-12-07 昆明理工大学 The clothing classification of a kind of image content-based and recommendation method
CN106295693A (en) * 2016-08-05 2017-01-04 深圳云天励飞技术有限公司 A kind of image-recognizing method and device
CN107729908A (en) * 2016-08-10 2018-02-23 阿里巴巴集团控股有限公司 A kind of method for building up, the apparatus and system of machine learning classification model
CN107368832A (en) * 2017-07-26 2017-11-21 中国华戎科技集团有限公司 Target detection and sorting technique based on image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Part-Based and Feature Fusion Method for Clothing Classification; Pan Huo et al.; PCM 2016: Advances in Multimedia Information Processing; 2016-11-27; 231-241 *
Street-to-Shop: Cross-Scenario Clothing Retrieval via Parts Alignment and Auxiliary Set; Si Liu et al.; 2012 IEEE Conference on Computer Vision and Pattern Recognition; 2012-06-26; 3330-3337 *
Clothing image retrieval with joint segmentation and feature matching (in Chinese); Huang Dongyan et al.; Journal of Computer-Aided Design & Computer Graphics; 2017-06-30; vol. 29, no. 6; 1075-1084 *
A judgment optimization model for personalized clothing recommendation (in Chinese); Wang Anqi et al.; Computer Engineering and Applications; 2018-06-30; vol. 54, no. 11; 204-210, 229 *

Also Published As

Publication number Publication date
CN109145947A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
Sarfraz et al. Deep view-sensitive pedestrian attribute inference in an end-to-end model
CN109145947B (en) Fashion women's dress image fine-grained classification method based on part detection and visual features
Oquab et al. Is object localization for free?-weakly-supervised learning with convolutional neural networks
CN106682598B (en) Multi-pose face feature point detection method based on cascade regression
Simo-Serra et al. Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction
CN103942577B (en) Based on the personal identification method for establishing sample database and composite character certainly in video monitoring
Gall et al. Hough forests for object detection, tracking, and action recognition
CN103514456B (en) Image classification method and device based on compressed sensing multi-core learning
CN105488809B (en) Indoor scene semantic segmentation method based on RGBD descriptors
CN105069481B (en) Natural scene multiple labeling sorting technique based on spatial pyramid sparse coding
US20110142335A1 (en) Image Comparison System and Method
CN106022375B (en) A kind of clothes fashion recognition methods based on HU not bending moment and support vector machines
CN105893936B (en) A kind of Activity recognition method based on HOIRM and Local Feature Fusion
Hu et al. Exploring structural information and fusing multiple features for person re-identification
CN110334687A (en) A kind of pedestrian retrieval Enhancement Method based on pedestrian detection, attribute study and pedestrian's identification
CN109583482A (en) A kind of infrared human body target image identification method based on multiple features fusion Yu multicore transfer learning
Lee et al. Shape discovery from unlabeled image collections
CN106021603A (en) Garment image retrieval method based on segmentation and feature matching
CN109344872A (en) A kind of recognition methods of national costume image
CN106056132B (en) A kind of clothes fashion recognition methods based on Fourier descriptor and support vector machines
Huang et al. PBC: Polygon-based classifier for fine-grained categorization
Ren et al. Facial expression recognition based on AAM–SIFT and adaptive regional weighting
CN107092931B (en) Method for identifying dairy cow individuals
CN106897669A (en) A kind of pedestrian based on consistent iteration various visual angles transfer learning discrimination method again
CN109509191A (en) A kind of saliency object detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant