CN106874924B - Picture style identification method and device

Info

Publication number
CN106874924B
Authority
CN
China
Prior art keywords
picture
style
layer
probability
sample
Prior art date
Legal status
Active
Application number
CN201510922662.9A
Other languages
Chinese (zh)
Other versions
CN106874924A (en)
Inventor
石克阳
Current Assignee
Alibaba China Network Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510922662.9A
Publication of CN106874924A
Application granted
Publication of CN106874924B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2111 Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Physiology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a picture style identification method and device. The method comprises the following steps: obtaining a sample picture, and processing the sample picture in a preset manner to form a sample training set; initializing the parameters of a preset multi-target convolutional neural network, and training the sample pictures in the sample training set in the parameter-initialized multi-target convolutional neural network to obtain a picture style identification model; identifying a picture to be identified by using the picture style identification model, and acquiring a probability vector of the picture to be identified belonging to different style types, wherein the probability value of each style type in the probability vector ranges from 0 to 1; and identifying the style type of the picture to be identified according to a preset judgment rule and the probability vector. With the technical scheme provided by the embodiments of the application, the styles of commodity pictures can be identified automatically and accurately, the efficiency of picture style identification can be improved, the working intensity of operators can be reduced, and the user experience can be improved.

Description

Picture style identification method and device
Technical Field
The application belongs to the technical field of image information processing, and particularly relates to a picture style identification method and device.
Background
With the development of the internet consumption era, consumers can select their favorite commodities online, which brings great convenience to shopping. For example, a consumer can select a favorite commodity category through the commodity pictures displayed by an online merchant.
Generally, when purchasing goods online, consumers weigh a variety of conceptual factors, such as brand, price, color and style type, and these factors can generally be set manually by the merchant on the service operation platform. Among these factors, some, such as the brand, price and color of a garment, are easily defined and distinguished by relatively clear, normative boundaries. Other commodity concepts, such as clothing style, carry strong semantics and are heavily influenced by personal subjective judgment, so different merchants or consumers may deviate widely in how they define a specific clothing style. For example, some consumers may consider clothing that includes the letters "ROCK" to be street style, while others may consider clothing that includes elements such as rivets to be street style. For the clothing designer, various elements may be combined to form the final product, which may include street elements together with national or artistic elements. Analysis of practical application data shows that about 15% of consumers combine style keywords in their searches when purchasing clothing commodities, a proportion second only to the two factors of brand and clothing category. It can be seen that commodity style plays a very important role in the online shopping guidance of goods. However, when a merchant or a consumer fills in the style type of a commodity, large deviations often occur due to subjective factors, so concepts similar to commodity style type are often confused and ambiguous among merchants and consumers, affecting the merchants' style classification of commodities and the consumers' screening by commodity style, and reducing the commodity marketing effect. This not only affects the transaction conversion rate, but also degrades the user experience. Meanwhile, because of the wide variety of online commodities, the amount of information involved in processing commodity style types is large, consuming a great deal of operators' working time and labor.
In the prior art, commodity style is identified by manual subjective judgment, so identification results vary widely and it is difficult to achieve accurate, reasonable and uniform style identification. Meanwhile, manual judgment increases the operators' workload for commodity style recognition and reduces recognition efficiency.
Disclosure of Invention
The application aims to provide a picture style identification method and a picture style identification device, which can automatically and accurately identify the commodity style types of commodity pictures, improve the accuracy and efficiency of commodity style identification and reduce the working intensity of operators.
The application provides a picture style identification method and a picture style identification device, which are realized by the following steps:
a method of picture style identification, the method comprising:
obtaining a sample picture, and processing the sample picture according to a preset mode to form a sample training set;
initializing parameters of a preset multi-target convolutional neural network, and training sample pictures in the sample training set in the multi-target convolutional neural network after the parameters are initialized to obtain a picture style identification model;
identifying a picture to be identified by using the picture style identification model, and acquiring probability vectors of the picture to be identified belonging to different style types, wherein the value range of the probability value of each style type in the probability vectors is 0-1;
and identifying the style type of the picture to be identified according to a preset judgment rule and the probability vector.
A picture style identification apparatus, the apparatus comprising:
the training set construction module is used for acquiring a sample picture and processing the sample picture according to a preset mode to form a sample training set;
the sample training module is used for storing the set multi-target convolutional neural network, and is also used for initializing the parameters of the multi-target convolutional neural network and training the sample pictures in the sample training set in the parameter-initialized multi-target convolutional neural network to obtain a picture style identification model;
the picture identification module is used for identifying a picture to be identified by using the picture style identification model and acquiring a probability vector of the picture to be identified belonging to different style types, wherein the probability value of each style type in the probability vector ranges from 0 to 1;
and the style identification module is used for storing a preset picture style judgment rule and identifying the style type of the picture to be identified according to the judgment rule and the probability vector.
The picture style identification method and device of the application adopt a convolutional neural network configured around the picture information of commodities to identify commodity style. In a specific implementation, the sample pictures in a sample training set can be trained in a preset multi-target convolutional neural network with a specific network layer structure to obtain an identification model with picture style identification capability, which can then automatically identify the style type of a picture to be identified that requires style classification. The method provided by the application can realize automatic and rapid recognition of commodity style, reduce the working intensity of operators and improve recognition efficiency. The sample pictures in the sample training set can be normalized and data-expanded in advance, which improves the accuracy and reliability of style identification by the identification model. According to the application, a style judgment rule can be preset as required, and the style type of a commodity picture can be identified reasonably, effectively and accurately based on the probability vector output by the identification model. With the embodiments of the application, the accuracy of picture style identification can be greatly improved and the working intensity of operators reduced; accurate style shopping guidance can be provided for users and accurate commodity style classification for merchants, improving the user experience and increasing the commodity transaction conversion rate.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an embodiment of the picture style identification method provided by the present application;
FIG. 2 is a schematic diagram of an embodiment of the present application for cutting a sample picture of a garment;
FIG. 3 is a schematic diagram of a model structure of an embodiment of a pre-configured multi-target convolutional neural network provided in the present application;
FIG. 4 is a schematic diagram illustrating the visualization effect of 64 Gaussian convolution kernels learned by a first convolution layer in an application scenario of the present application;
FIG. 5 is a block diagram of an embodiment of a clothing style identification device according to the present application;
FIG. 6 is a schematic diagram of a model structure of an embodiment of a multi-target convolutional neural network in the sample training module provided in the present application;
FIG. 7 is a block diagram illustrating an exemplary embodiment of the style identification module provided herein;
FIG. 8 is a block diagram of an embodiment of the training set construction module provided herein;
FIG. 9 is a schematic block diagram of another embodiment of the training set construction module provided in this application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The following describes the method and apparatus for recognizing picture style in detail with reference to the accompanying drawings. Fig. 1 is a flowchart of a method according to an embodiment of a method for recognizing a style of a picture. Although the present application provides method operational steps or apparatus configurations as illustrated in the following examples or figures, more or fewer operational steps or module configurations may be included in the method or apparatus based on conventional or non-inventive efforts. In the case of steps or structures where there is no logically necessary cause-and-effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure provided in the embodiments of the present application. When the described method or module structure is implemented in an actual device or end product, it can be executed sequentially or executed in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or the method or module structure shown in the figures.
According to the method and the device, features can be extracted from the picture information of a commodity, and each style is then treated as one dimension and classified in combination with a classifier. The classifier can judge the possibility that the commodity picture belongs to each style type and obtain a corresponding probability value. The method can output the style types to which the commodity possibly belongs by combining the probability values obtained for each style, thereby identifying the style types of the commodity. The scheme is described below taking clothing commodity identification as an example: the technical scheme for identifying picture style can be used to identify the style types of clothing pictures such as men's wear, women's wear and underwear, and can also be applied to identifying the style types of commodity display pictures such as shoes, hats, boxes, bags and decoration styles. Specifically, as shown in fig. 1, an embodiment of the picture style identification method provided by the present application may include:
s1: and acquiring a sample picture, and processing the sample picture according to a preset mode to form a sample training set.
In the clothing style application scenario of this embodiment, sample pictures of clothing commodities can be acquired in advance through network search, shooting, or a stored database. In the process of constructing the sample training set, pictures of different garments can be sampled to obtain clothing sample pictures. Generally, the obtained clothing sample picture is rectangular, although the sample picture described in the present application does not exclude other polygonal shapes. In one embodiment, a non-rectangular sample picture may be preprocessed into a rectangular sample picture when implementing the scheme. The present application can be specifically explained with a one-piece dress application scenario embodiment: 11 different clothing style types can be defined, and 1500 clothing sample pictures can be collected for each style type. Each sample picture in this embodiment may include annotation data of the set picture style label.
Further, the sample pictures can be processed in a preset manner to form a sample training set. The specific processing can be set according to design requirements. In general, the preset manner processes the sample pictures to meet the data processing requirements of the convolutional neural network described below, such as converting the pictures into a format of specified size or color. In the application scenario of an embodiment of the present application, the preset manner may include normalizing the acquired clothing sample pictures according to a unified rule to form normalized clothing sample pictures in a predetermined picture format. Normalization can improve the efficiency and accuracy of data processing during subsequent clothing style identification. Data expansion processing can then be performed on the normalized clothing sample pictures to improve the accuracy and reliability of the convolutional neural network training result. The set of clothing sample pictures processed as above can be used as the sample training set described in the present application.
Normalization and data expansion of the clothing sample pictures can scale different clothing sample pictures to the same size, reducing the influence of size differences on determining the clothing picture style; the specific normalization scheme can be chosen according to the data processing requirements. Specifically, in an embodiment of the present application, processing the sample picture in the preset manner may include:

S101: converting the color information of the sample picture into RGB three-channel color information;

S102: scaling the short edge of the sample picture to a first preset value, and scaling the long edge by the same ratio, to form a first sample picture;

S103: cropping the first sample picture into a square sample picture whose side length is the first preset value, taking the intersection of the perpendicular bisectors of the long and short sides of the first sample picture as the center point.
In practical implementation, the sizes of randomly acquired sample pictures are usually inconsistent, so in this embodiment the acquired clothing sample pictures may be normalized. The main steps can include two processes: the first converts the color information of the clothing sample picture into RGB three-channel color information; the second scales the clothing sample picture uniformly into a first sample picture whose short side equals the first preset value and whose long side is scaled by the same ratio. For example, the first preset value may be set to 256 pixels in this embodiment. Assume the size of the clothing sample picture is [W, H], where W is the width and H is the height. If W > H, the height H can be scaled to 256 pixels with ratio H/256, and W is accordingly scaled to W/(H/256). If W < H, W is scaled to 256 pixels and H is scaled by the same ratio. For example, if the size of a clothing sample picture P1 is 800 × 1200 pixels, the first sample picture P1' formed by the above processing is 256 × 384 pixels.
The sample pictures described in the present application can be trained in the multi-target convolutional neural network set up for the application. In an embodiment, after the first sample picture is obtained, any non-square first sample picture may further be cropped into a square sample picture whose side length is the first preset value. Since the clothing subject in a clothing sample picture generally appears in the central area of the picture, the central area can be preserved during cropping; this retains the clothing subject information to the maximum extent and improves the accuracy of clothing style identification. The central area in this embodiment may be located by the intersection of the perpendicular bisectors of the long and short sides of the first sample picture. In a specific application scenario, for example:

if the clothing sample picture is scaled to size [256, H], i.e. H > 256 pixels, the top and bottom of the picture can be cropped symmetrically, preserving the central area and leaving the vertical edge H at 256 pixels; the upper and lower sides can each be cropped by (H-256)/2 pixels;

if the clothing sample picture is scaled to size [W, 256], i.e. W > 256 pixels, the left and right sides can be cropped symmetrically, preserving the central area and leaving the lateral edge W at 256 pixels; the left and right sides can each be cropped by (W-256)/2 pixels.
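The normalization described above can be sketched in a few lines of Python (a minimal illustration assuming the Pillow imaging library; the function and variable names are ours, not the patent's):

```python
from PIL import Image

FIRST_PRESET = 256  # first preset value, in pixels

def normalize_sample(path: str) -> Image.Image:
    # S101: convert the color information to RGB three-channel form.
    img = Image.open(path).convert("RGB")
    # S102: scale the short edge to the first preset value and the
    # long edge by the same ratio.
    w, h = img.size
    ratio = min(w, h) / FIRST_PRESET
    img = img.resize((round(w / ratio), round(h / ratio)))
    # S103: crop a square of side FIRST_PRESET around the center point,
    # i.e. the intersection of the perpendicular bisectors of the sides.
    w, h = img.size
    left, top = (w - FIRST_PRESET) // 2, (h - FIRST_PRESET) // 2
    return img.crop((left, top, left + FIRST_PRESET, top + FIRST_PRESET))
```

For the 800 × 1200 pixel picture P1 above, this yields the 256 × 384 pixel first sample picture and then the 256 × 256 pixel square crop.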
In an embodiment of the present application, data expansion may be performed on the normalized sample pictures. Specifically, the data volume of the originally acquired sample pictures can be expanded to a preset required amount in a certain way, which helps prevent overfitting during subsequent convolutional neural network processing and improves the reliability of its data processing. The application provides a processing method for data expansion; specifically, in an embodiment of the present application, performing data expansion after normalizing the sample picture may further include:

respectively cropping square sample pictures with side length equal to a second preset value, bounded at the upper left corner, the upper right corner, the lower left corner, the lower right corner, the middle left part, the middle right part, the middle upper part and the middle lower part of the square sample picture whose side length is the first preset value, and mirror-flipping each of them about one of its vertical edges to form new square sample pictures with side length equal to the second preset value.
The data expansion method provided by this embodiment can expand the original clothing sample picture data to 16 times the original data volume in the clothing style identification scenario. Specifically, as shown in fig. 2, which is a schematic diagram of the process of cutting a clothing sample picture, when the square clothing sample picture PA with normalized side length of 256 pixels is expanded, the 8 key parts of PA set in this embodiment, namely the upper left corner, the upper right corner, the lower left corner, the lower right corner, the middle left part, the middle right part, the middle upper part and the middle lower part, may be cropped, extracting 8 square sample pictures P01, P02, P03, P04, P05, P06, P07 and P08 with side length of 227 pixels. Each extracted 227 × 227 pixel square sample picture can then be mirror-flipped about its vertical edge to form 8 new square sample pictures P11, P12, P13, P14, P15, P16, P17 and P18 with side length of 227 pixels. Thus, the normalized clothing sample picture PA yields 16 square sample pictures P01-P08 and P11-P18 with side length of 227 pixels after data expansion. Each normalized clothing sample picture generates such a picture set after data expansion, and these picture sets form the sample training set of the present application.
It should be noted that the second preset value of 227 described in this embodiment may be set according to actual data processing requirements. Generally, the second preset value may be slightly smaller than the short-side length of the normalized clothing sample picture; for a square sample picture with side length of 256 pixels as in the above embodiment, the second preset value can be set to 227 pixels. In addition, the mirror flipping described in this embodiment may take either the left or the right vertical edge of the square sample picture with side length equal to the second preset value as its axis; for example, in the application scenario of this embodiment, the flipping may take the right vertical edge of the extracted 227-pixel square sample picture as the axis.
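Under the same assumptions, the 16-fold data expansion can be sketched as follows (the eight crop offsets follow from the 256-pixel and 227-pixel sizes of this example):

```python
from PIL import Image, ImageOps

SECOND_PRESET = 227  # second preset value, in pixels

def expand_sample(square: Image.Image) -> list:
    # square: a normalized picture of side FIRST_PRESET (256 pixels here).
    side = square.size[0]
    lo, mid = side - SECOND_PRESET, (side - SECOND_PRESET) // 2
    # Eight anchors: the four corners and the four mid-edge positions.
    offsets = [(0, 0), (lo, 0), (0, lo), (lo, lo),
               (0, mid), (lo, mid), (mid, 0), (mid, lo)]
    crops = [square.crop((x, y, x + SECOND_PRESET, y + SECOND_PRESET))
             for x, y in offsets]
    # Mirror each crop about its vertical edge: 8 crops + 8 mirrors = 16.
    return crops + [ImageOps.mirror(c) for c in crops]
```

Applied to the picture PA above, the returned list corresponds to P01-P08 followed by P11-P18.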
According to the method and the device, the sample picture can be acquired, and the sample training set can be obtained after normalization and data expansion processing are carried out on the acquired sample picture.
S2: initializing parameters of a preset multi-target convolutional neural network, and training sample pictures in the sample training set in the multi-target convolutional neural network to obtain a picture style recognition model.
The scenes faced by commodity style recognition in reality are usually complex, so a redesigned, customized multi-target convolutional neural network can be adopted for sample training to obtain a convolutional neural network model with style recognition capability. The convolutional neural network used in the present application may be trained with multiple targets. The model structure of the multi-target convolutional neural network can be preset, and the sample pictures in the sample training set can be trained in it to obtain a recognition model capable of recognizing the commodity style in pictures.
Generally, after the multi-target convolutional neural network model is determined, its parameters may be initialized. Specifically, in an embodiment of the present application, a fine-tuning method may be adopted to initialize the parameters of the multi-target convolutional neural network, which can effectively reduce the impact of insufficient model training caused by a small amount of labeled data and improve the accuracy and reliability of the picture style recognition of the present application. Generally, for the deep learning problem of style recognition, the labeled sample pictures acquired for each category may be insufficient, in particular insufficient for effective parameter convergence. Therefore, in an embodiment of the present application, a fine-tuning technique may be used to initialize the parameters of the preset multi-target convolutional neural network from an existing stable model. For a specific operation in one embodiment, reference may be made to patent application No. CN201510020689.9, "A method and apparatus for determining information of displayed pictures"; the initialized network contents may include all convolutional layers and all fully connected layers. After initialization is completed, training of the multi-target convolutional neural network can continue based on the existing 1500 sample pictures per style type, optionally after adaptively adjusting the initialization parameters, so that the parameters converge quickly and accurately to the values required for style recognition.
The preset model structure of the multi-target convolutional neural network can be designed and set according to the sample training requirements and the actual application scene. The present application provides a model structure of a multi-target convolutional neural network; specifically, in an embodiment of the present application, the preset multi-target convolutional neural network is configured to include:

three convolutional layers, two fully connected layers, three RELU layers, three Maxpooling layers, and a Softmax layer comprising at least two Softmax sublayers.
In the specific implementation of the multi-target convolutional neural network of this embodiment, a corresponding neural network layer structure can be set according to the processing requirements of the convolutional neural network and the design requirements of clothing style recognition. For example, in one embodiment, a RELU layer and a normalization layer may be connected after each convolutional layer, which can avoid overfitting during model training. In other embodiments, a Dropout layer may further be connected to the fully connected layer to improve the efficiency of model convergence. Of course, the actual deep convolutional neural network may add other network structures according to the application scenario requirements, for example a Norm layer.
In an embodiment of the present application, the Loss value (Loss) determined by each sublayer of the Softmax layer may be propagated backward (Back Propagation) and affect parameters of an upper layer. Specifically, in another embodiment of the image style identification method according to the present application, the preset multi-target convolutional neural network is configured to include:
and the Softmax sublayer of the Softmax layer transmits the judged loss value backwards to a fully-connected layer connected with the Softmax sublayer, and the fully-connected layer adjusts the parameters of the fully-connected layer correspondingly according to the received loss value.
Specifically, for example, if the Softmax-A sub-layer corresponds to a street style and the Softmax-B sub-layer corresponds to a literary style, the two discriminant functions will obtain two loss values, such as Loss-A and Loss-B. When propagated backward, both values affect the parameters of the preceding fully connected layer. That fully connected layer can adjust and optimize its parameters according to the loss values fed back by the Softmax sublayers, so that it describes the style of the clothing picture more accurately. Correspondingly, when the next clothing sample picture is trained, each Softmax sublayer can judge the clothing style type more accurately from the description information of the adjusted and optimized fully connected layer. Therefore, based on this structure and processing of the multi-target convolutional neural network, after a sufficient number of samples are used for training, the accuracy of picture style identification in the application can be greatly improved.
FIG. 3 is a schematic diagram of a model structure of an embodiment of the preset multi-target convolutional neural network provided in the present application. Specifically, as shown in fig. 3, in an embodiment of the present application, the preset multi-target convolutional neural network is configured to include:

a first Gaussian convolution layer comprising 64 convolution kernels; a first Maxpooling layer, a RELU layer and a normalization layer connected to the first Gaussian convolution layer; a second Gaussian convolution layer comprising 32 convolution kernels, connected to the first Maxpooling layer; a second Maxpooling layer, a RELU layer and a normalization layer connected to the second Gaussian convolution layer; a third Gaussian convolution layer comprising 16 convolution kernels, connected to the second Maxpooling layer; a third Maxpooling layer, a RELU layer and a normalization layer connected to the third Gaussian convolution layer; a first fully connected layer connected to the third Maxpooling layer; a second fully connected layer and a Dropout layer connected to the first fully connected layer; and a Softmax layer connected to the second fully connected layer, comprising N Softmax sublayers, where N ≥ 2.
As shown in fig. 3, the multi-target convolutional neural network in this embodiment may attach a Softmax layer to the last fully connected layer, where the Softmax layer may include a plurality of sub-layers, each corresponding to the judgment function of one clothing style type; for example, the Softmax-A sub-layer corresponds to a street style and the Softmax-B sub-layer corresponds to a literary style. The specific parameters of the judgment function in a Softmax sublayer can be confirmed and optimized during the training of the convolutional neural network samples. In this embodiment, every Softmax sublayer judges based on feature information from the same fully connected layer, and the judgments of the Softmax sublayers are independent of each other.
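For illustration, the layer structure of fig. 3 can be sketched in PyTorch as follows. This is an assumption-laden reconstruction, not the patent's implementation: the ordering of layers within each block, the convolution padding, and the modeling of each Softmax sublayer as a two-class head whose positive-class probability serves as the 0-1 style score are our choices.

```python
import torch
import torch.nn as nn

class MultiTargetStyleNet(nn.Module):
    def __init__(self, num_styles: int):
        super().__init__()
        def block(cin: int, cout: int) -> nn.Sequential:
            # A Gaussian convolution layer (5x5 learned kernels) followed by
            # Maxpooling (3x3 window, interval 2), RELU and normalization.
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=5, padding=2),
                nn.MaxPool2d(kernel_size=3, stride=2),
                nn.ReLU(),
                nn.LocalResponseNorm(size=5))
        self.features = nn.Sequential(block(3, 64), block(64, 32), block(32, 16))
        self.fc1 = nn.Linear(16 * 27 * 27, 128)  # 227x227 input -> 27x27 maps
        self.fc2 = nn.Sequential(nn.Linear(128, 128), nn.Dropout(p=0.5))
        # One Softmax sublayer per style type; the heads judge independently.
        self.heads = nn.ModuleList(nn.Linear(128, 2) for _ in range(num_styles))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.fc2(self.fc1(self.features(x).flatten(1)))
        # Each head's positive-class probability lies in [0, 1]; the values
        # are not forced to sum to 1 across styles.
        return torch.stack([torch.softmax(h(f), dim=1)[:, 1]
                            for h in self.heads], dim=1)
```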
Specifically, the Softmax layer used in this embodiment may be a nonlinear classifier, and classifier training may be performed with the feature vector output by the fully connected layer and the corresponding label. The whole Softmax processing typically includes three steps. First, the maximum value over all dimensions of the feature vector X output by the fully connected layer is computed and recorded as Max. Second, each dimension of X is converted into a number between 0 and 1 with the exponential function, that is, X[i] = exp(X[i] - Max). Third, all the converted values of X are summed and each value is normalized accordingly, that is, X[i] = X[i] / sum(X).
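The three steps can be written out directly (a small NumPy illustration):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    m = x.max()            # step 1: maximum over all dimensions
    e = np.exp(x - m)      # step 2: map each dimension into (0, 1]
    return e / e.sum()     # step 3: normalize by the sum of converted values
```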
The convolution process is usually a feature extraction method that can screen out the parts of an image that meet given conditions. In an embodiment of the present application, the convolutional layers in the preset multi-target convolutional neural network may be Gaussian convolution layers, which mainly convolve the output of the previous layer with a number of Gaussian convolution kernels. The parameters of the Gaussian convolution kernels in this embodiment can be obtained through learning. In one embodiment, three Gaussian convolution layers may be used, the Gaussian convolution kernels of each layer may be set to 5 × 5 pixels, and the convolution kernels of each Gaussian convolution layer process all pixels of the clothing sample picture through the computation. In general, in the principles of deep learning, data at the lower convolutional layers represents fine-grained features, while data at the higher convolutional layers represents abstract features. Therefore, in the convolutional layers of the multi-target convolutional neural network of an embodiment of the present application, the number of convolution kernels of an upper convolutional layer may be greater than that of a lower convolutional layer. In one specific application, for example, the three convolutional layers may include a first convolutional layer (the top convolutional layer) with 64 convolution kernels, a second convolutional layer with 32 convolution kernels and a third convolutional layer (the bottom convolutional layer) with 16 convolution kernels, the convolution kernels of all three layers being 5 × 5 pixels in size. Fig. 4 is a schematic diagram of the visualization effect of the 64 Gaussian convolution kernels learned by the first convolution layer.
The Maxpooling layer described in this embodiment may be used to down-sample the output of the previous convolutional layer, that is, to select the maximum value within a preset fixed-size sampling window as the value of the down-sampled point. For example, in one specific embodiment, the sampling windows of the Maxpooling layers may each be set to 3 × 3 pixels with a sampling interval of 2 pixels.
Generally, the neurons used here have a non-saturating nonlinear characteristic. The output of a conventional neuron is a saturating nonlinear function of its input x, i.e. f(x) = tanh(x), while the non-saturating characteristic gives the neuron the new functional relationship f(x) = max(0, x). In this embodiment, the RELU (Rectified Linear Unit, an activation function) layer may be used mainly to rectify the data result of the previous layer: all inputs of the previous layer that are less than 0 are changed to 0 before output, and outputs greater than 0 are unchanged. In this embodiment, using the RELU layer can improve the overall training efficiency of the multi-target convolutional neural network model.
In this embodiment of the application, each RELU layer in the multi-target convolutional neural network may be followed by a normalization layer, which can enhance the overall generalization performance of the multi-target convolutional neural network. Specifically, the normalization may be a local operation based on a local window around each pixel; the local window may have the same size as the convolution kernel of the convolutional layer, e.g. 5 × 5 pixels.
In this embodiment of the application, the fully connected layer serves as a connection layer between the nodes of the upper and lower layers, establishing connections between the node data obtained by those layers. For example, the output of the fully connected layer described in this embodiment may be a 128-dimensional matrix.
The Dropout layer in this embodiment may be used to improve the efficiency of model convergence; for example, the data of 50% of the output nodes of the previous layer may be randomly set to 0 to avoid overfitting.
According to the method and the device, no feature preprocessing of the sample pictures in the sample training set is needed in the model training process; a sample picture can be input into the multi-target convolutional neural network as one feature. Thus, each picture in the sample training set can be directly converted into a corresponding feature matrix [W, H, C] during training. Then, taking K sample pictures as a unit at a time, all pictures in the sample training set are fed into the multi-target convolutional neural network for learning. The multi-target convolutional neural network avoids the overall or local feature preprocessing of the clothing sample picture, such as prior background processing and interference-information processing; the whole sample picture can be used directly for training and recognition, which improves the efficiency of picture style recognition. K may be set according to data processing requirements and may generally take a value of 32 or 64.
Specifically, in the training process, stochastic gradient descent can be adopted for iterative learning of the multi-target convolutional neural network. Each iteration usually updates the parameters of each layer of the network, such as the weight values and bias values of the nodes in the network layers, until the parameter values converge to an optimal solution. The specific convergence condition may be set according to the data processing requirements; generally, the optimal convolutional neural network model meeting the design requirements can be obtained after about 150000 iterations of the multi-target convolutional neural network provided in this embodiment. The multi-target convolutional neural network obtained after sample training has a certain capability to identify and judge style types, and can therefore be used as the picture style identification model described herein.
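A sketch of one training step under these settings, reusing the MultiTargetStyleNet sketch above (the learning rate, and the use of a per-style binary cross-entropy in place of each two-class Softmax loss, are our assumptions):

```python
import torch
import torch.nn as nn

K = 32  # sample pictures taken as a unit per iteration (32 or 64)
model = MultiTargetStyleNet(num_styles=11)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
bce = nn.BCELoss()  # stands in for each Softmax sublayer's loss

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    # images: [K, 3, 227, 227]; labels: [K, num_styles], entry 1 if the
    # picture carries that style label, else 0.
    optimizer.zero_grad()
    probs = model(images)  # [K, num_styles], each value in [0, 1]
    # One loss per Softmax sublayer (Loss-A, Loss-B, ...); summing them and
    # calling backward() propagates each loss back to the shared fully
    # connected layers, which adjust their parameters accordingly.
    loss = sum(bce(probs[:, i], labels[:, i].float())
               for i in range(probs.size(1)))
    loss.backward()
    optimizer.step()
    return loss.item()
```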
The method and the device can initialize the parameters of the preset multi-target convolutional neural network, and then train the sample pictures in the sample training set in the multi-target convolutional neural network to obtain the picture style recognition model capable of judging the commodity style types in the pictures.
S3: and identifying the picture to be identified by using the picture style identification model, and acquiring probability vectors of the picture to be identified, wherein the probability value range of the probability value of each style type in the probability vectors is 0 to 1.
After the preset multi-target convolutional neural network has been trained on the clothing sample pictures processed in the specific manner above, the clothing style recognition model capable of judging picture style can be obtained. Then, for any input picture to be recognized, the trained picture style recognition model can predict and recognize it, so that an N-dimensional probability vector P = {P1, P2, …, PN} can be obtained. For any style type i in the picture style identification model, the corresponding probability value Pi in the N-dimensional probability vector can represent the probability that the current picture to be identified belongs to style type i. For example, in an application scenario, the 11-dimensional probability vector P = {0.70, 0.35, 0.98, 0.84, 0.69, 0.11, 0.20, 0.48, 0.97, 0.92} of the one-piece dress picture PA can be obtained through the clothing style identification model, expressing in turn the probabilities that the one-piece dress picture PA belongs to a literary style, a street style, a fresh style, a college style, a wild style, a punk style, a neutral style, a national style, a European-American style, a gentlewoman style and a rural style.
It should be noted that the output of the multi-target convolutional neural network provided in the present application, when the picture style determination is finally performed, consists of N values P = {P1, P2, …, PN}. For a conventional convolutional neural network, the probabilities of all N values sum to 1, that is:

P1 + P2 + … + PN = 1

By contrast, each picture style probability obtained by the preset multi-target convolutional neural network of the embodiment of the present application ranges over 0 to 1 independently, that is, Pi ∈ [0, 1] for each i ∈ [1, N]. The multi-target convolutional neural network can effectively improve the training effect of the sample pictures and the accuracy of style recognition for the picture to be recognized.
And identifying the picture to be identified by using the picture style identification model obtained by training, so as to obtain probability vectors of the picture to be identified belonging to different style types.
S4: and identifying the style type of the picture to be identified according to a preset judgment rule and the probability vector.
After obtaining the probability vector of the clothing picture to be identified belonging to different style types, this embodiment can identify the style types of the clothing picture based on the obtained probability vector according to preset judgment rules, and finally output the determined style types of the clothing picture to be identified. The specific judgment and output rules can be set according to the actual application scene and design requirements, and different designers can set different judgment rules based on the probability vector output by the preset multi-target convolutional neural network. For example, the style type corresponding to the highest probability value can be directly selected as the finally recognized style type, or the style types of the picture to be recognized can be determined after screening and comparing the probability values in a specific way. Other embodiments that identify and judge the style type of the picture to be identified based on the probability vector output by the multi-target convolutional neural network are within the scope of the present application.
Considering the practical application scenes of commodity style identification such as clothes, shoes and hats, the application provides a judgment and output mechanism for picture style types, in which the number of finally identified style types for each article may not exceed M. Specifically, in an embodiment of the present application, identifying the style type of the picture to be identified according to a preset judgment rule and the probability vector may include:

selecting the M largest probability values from the probability vector, where 1 ≤ M < N and N is the number of probability values in the probability vector; and,

if the M probability values are all larger than or equal to a first threshold, judging that the picture to be identified belongs to the style types corresponding to the M probability values;

if the M probability values are all smaller than the first threshold and the maximum of the M probability values is larger than or equal to a second threshold, judging that the picture to be identified belongs to the style type corresponding to that maximum;

if the M probability values are all smaller than the second threshold and the minimum of the M probability values is larger than or equal to a third threshold, judging that the picture to be identified belongs to the style types corresponding to the M probability values;

if the M probability values are all smaller than the second threshold, and among them at least one probability value is smaller than the third threshold and at least one is larger than or equal to the third threshold, judging that the picture to be identified belongs to the style types corresponding to the probability values larger than or equal to the third threshold;

if the M probability values are all smaller than the third threshold, judging that the picture to be recognized belongs to the style types corresponding to the M probability values.
The value of M and of the first, second and third thresholds in this embodiment may be set according to actual judgment and output requirements. In this application scenario of clothing style recognition, M generally represents the maximum number of styles to which the output picture to be recognized may belong. In this embodiment the first threshold may be set greater than the second threshold, and the second threshold greater than the third; the first threshold can be understood as the level above which the clothing picture to be recognized is considered highly likely to belong to a certain style. In this embodiment, even if the M probability values are all smaller than the third threshold, i.e. the probabilities that the clothing picture belongs to the corresponding style types are below the set third threshold, the style types corresponding to the M probability values are still output. This effectively ensures that style types are always automatically and effectively recognized and output for the picture to be recognized, safeguarding the user experience.
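The judgment rule can be sketched as a small function (thresholds t1 > t2 > t3 and M are parameters; all names are illustrative, and the ordering of the checks is our reading of the rule):

```python
def decide_styles(probs, style_names, m=3, t1=0.9, t2=0.5, t3=0.25):
    # Take the M style types with the largest probability values.
    top = sorted(zip(probs, style_names), reverse=True)[:m]
    values = [p for p, _ in top]
    if all(p >= t1 for p in values):
        return [name for _, name in top]   # all M styles
    if max(values) >= t2:
        return [top[0][1]]                 # only the most probable style
    if min(values) >= t3:
        return [name for _, name in top]   # all M styles
    kept = [name for p, name in top if p >= t3]
    # Even when every value falls below the third threshold, the M style
    # types are still output so that a result is always returned.
    return kept if kept else [name for _, name in top]
```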
The image style judging rule and the style output mechanism provided by the embodiment of the application can automatically, reasonably and effectively identify the clothing style according to the actual application characteristics of the clothing style and the output result of the multi-target convolutional neural network model with the specific structure set by the application. Compared with the prior art, the technical scheme provided by the embodiment of the application can greatly improve the style identification accuracy of the picture, not only provides accurate style shopping guide for users and accurate commodity style classification for merchants, but also can improve user experience and increase the commodity transaction conversion rate.
In another preferred embodiment of the present application, a setting scheme for a specific practical application of the picture style judgment and output rule is further provided. In this scheme, the number of finally identified style types for each picture can be kept at no more than 3, avoiding the interference or difficulty of selection that too many output results would cause the user. Meanwhile, based on actual scene applications such as clothes, shoes and hats, this embodiment provides value ranges for the first, second and third thresholds with which the style type of the picture to be recognized can be accurately identified and judged. Specifically, in another embodiment of the present application, the style type of the picture to be recognized may be identified with at least one of the following settings:
m takes a value of 3;
the first threshold value range includes: 0.9 to 0.95;
the second threshold value range includes: 0.4 to 0.6;
the third threshold value range includes: 0.2 to 0.3.
In a specific application scenario, for example, the 11-dimensional probability vector P = {0.70, 0.35, 0.98, 0.84, 0.69, 0.11, 0.20, 0.48, 0.97, 0.92} of the dress picture PA to be recognized is obtained by the clothing style recognition model. The three largest probability values Pa, Pb and Pc can be selected from the probability vector P, and the corresponding decision output logic may include:
s401: if the probability values of Pa, Pb and Pc are all larger than or equal to 0.9, judging that the clothes in the clothes picture to be identified belong to the style types corresponding to Pa, Pb and Pc. For example, when Pa, Pb and Pc are respectively 0.98, 0.97 and 0.92, fresh air, gentlewoman air and rural air corresponding to three probability values are output.
S402: if the probability values of the Pa, the Pb and the Pc are all smaller than 0.9, and the maximum probability value of the Pa, the Pb and the Pc is larger than or equal to 0.5, the clothes in the clothes picture to be identified are judged to belong to the style type corresponding to the maximum probability value of the Pa, the Pb and the Pc. For example, Pa, Pb, and Pc are 0.70, 0.84, and 0.69, respectively, and then output academy winds with probability values of 0.84.
S403: if the probability values of the Pa, the Pb and the Pc are all smaller than 0.5, and the minimum probability value of the Pa, the Pb and the Pc is larger than or equal to 0.25, judging that the clothes in the clothes picture to be identified belong to the style types corresponding to the Pa, the Pb and the Pc.
S404: if the probability values of the Pa, the Pb and the Pc are all smaller than 0.5, and at least one probability value of the Pa, the Pb and the Pc is smaller than 0.25 and at least one probability word is larger than or equal to 0.25, it is judged that the clothes in the clothes picture to be identified belong to the style type corresponding to the probability value of the Pa, the Pb and the Pc which is larger than or equal to 0.25. For example, when Pa, Pb, Pc are 0.35, 0.20, 0.48 respectively, street wind and European and American wind corresponding to 0.35 and 0.48 are output.
S405: if the probability values of Pa, Pb and Pc are all smaller than a third threshold value, the clothing in the clothing picture to be identified is judged to belong to the style type corresponding to Pa, Pb and Pc.
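Applying the sketch above to a hypothetical 11-dimensional probability vector (the values below are invented for illustration):

```python
styles = ["literary", "street", "fresh", "college", "wild", "punk",
          "neutral", "national", "European-American", "gentlewoman", "rural"]
probs = [0.70, 0.35, 0.98, 0.84, 0.69, 0.11, 0.20, 0.48, 0.55, 0.97, 0.92]

print(decide_styles(probs, styles))
# -> ['fresh', 'gentlewoman', 'rural']  (case S401: top three all >= 0.9)
```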
The application provides a method that can automatically identify the style of the commodity subject in a commodity picture, and for each specific commodity picture it can automatically output several (for example, at most 3) identified style types to which the commodity subject belongs. Compared with the prior art, the technical scheme provided by the embodiments of the application can greatly improve the accuracy of picture style identification, reduce the working intensity of operators, provide accurate style shopping guidance for users and accurate commodity style classification for merchants, improve the user experience and increase the commodity transaction conversion rate.
Based on the picture style identification method, the application provides a picture style identification device. Fig. 5 is a schematic block diagram of an embodiment of an image style recognition apparatus according to the present application, and as shown in fig. 5, the apparatus may include:
the training set constructing module 101 may be configured to obtain a sample picture, and process the sample picture according to a preset manner to form a sample training set;
the sample training module 102 may be configured to store the set multi-target convolutional neural network, and may also be configured to initialize the parameters of the multi-target convolutional neural network and train the sample pictures in the sample training set in the parameter-initialized multi-target convolutional neural network to obtain a picture style identification model;
the picture identification module 103 may be configured to identify a picture to be identified by using the picture style identification model, and acquire probability vectors of the picture to be identified belonging to different style types, where a value range of a probability value of each style type in the probability vectors is 0 to 1;
the style identification module 104 may be configured to store a preset picture style judgment rule, and identify a style type to which the picture to be identified belongs according to the judgment rule and the probability vector.
The multi-target convolutional neural network stored in the sample training module 102 may be specifically set according to the data processing requirements of different application scenarios. Fig. 6 is a schematic structural diagram of a model of an embodiment of the multi-target convolutional neural network in the sample training module 102 provided in the present application, and as shown in fig. 6, the multi-target convolutional neural network stored in the sample training module 102 is configured to include:
three convolutional layers, two fully connected layers, three RELU layers, three Maxpooling layers, and a Softmax layer comprising at least two Softmax sublayers.
Of course, as described in other embodiments, one or more layer structures may be added in the actual design of the multi-target convolutional neural network, such as an additional Maxpooling layer, RELU layer, normalization layer, Dropout layer, and the like.
In one embodiment, the preset multi-target convolutional neural network may be configured to include:
the Softmax sublayers of the Softmax layer transmit their computed loss values backwards to the fully connected layer connected to them, and the fully connected layer adjusts its parameters accordingly based on the received loss values.
In this way, the fully connected layer connected to the Softmax sublayers can adjust and optimize its parameters according to the loss values fed back by the Softmax sublayers, so that it describes the style of a picture more accurately. Correspondingly, when the next sample picture is trained, each Softmax sublayer can judge the picture style type more accurately based on the description information of the adjusted and optimized fully connected layer.
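As a minimal sketch of this loss feedback (an illustrative assumption; the patent specifies no code, and the layer widths and batch shapes here are invented), the following PyTorch fragment gives each style type its own sublayer whose cross-entropy loss propagates back into a shared fully connected layer:

    import torch
    import torch.nn as nn

    N_STYLES = 3                                                # number of Softmax sublayers
    shared_fc = nn.Sequential(nn.Linear(256, 128), nn.ReLU())   # shared fully connected layer
    sublayers = nn.ModuleList([nn.Linear(128, 2) for _ in range(N_STYLES)])
    criterion = nn.CrossEntropyLoss()                           # log-softmax + negative log-likelihood

    x = torch.randn(8, 256)                                     # a batch of picture descriptors
    y = torch.randint(0, 2, (8, N_STYLES))                      # belongs / does-not-belong label per style

    h = shared_fc(x)
    # Each sublayer judges its own loss; summing the losses and calling
    # backward() feeds every sublayer's loss back into the shared fully
    # connected layer, whose parameters the optimizer then adjusts.
    loss = sum(criterion(sub(h), y[:, i]) for i, sub in enumerate(sublayers))
    loss.backward()

Because all sublayer losses flow through the same shared layer, a single optimizer step adjusts that layer using the feedback of every Softmax sublayer at once.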
In another specific embodiment of the multi-target convolutional neural network stored in the sample training module 102, the preset multi-target convolutional neural network is configured to include:
a first Gaussian convolution layer comprising 64 convolution kernels; a first Maxpooling layer, a RELU layer and a normalization layer connected to the first Gaussian convolution layer; a second Gaussian convolution layer comprising 32 convolution kernels connected to the first Maxpooling layer; a second Maxpooling layer, a RELU layer and a normalization layer connected to the second Gaussian convolution layer; a third Gaussian convolution layer comprising 16 convolution kernels connected to the second Maxpooling layer; a third Maxpooling layer, a RELU layer and a normalization layer connected to the third Gaussian convolution layer; a first fully connected layer connected to the third Maxpooling layer; a second fully connected layer and a Dropout layer connected to the first fully connected layer; and a Softmax layer connected to the second fully connected layer and comprising N Softmax sublayers, wherein N is greater than or equal to 2.
In particular, reference may be made to the multi-target convolutional neural network structure shown in fig. 3.
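The following PyTorch sketch of this layer stack is an illustrative assumption: the kernel sizes, strides, 224x224 input and fully connected widths are hypothetical choices, and only the layer counts and ordering (64/32/16 convolution kernels, three Maxpooling-RELU-normalization groups, two fully connected layers, a Dropout layer, and N Softmax sublayers) follow the text. "Gaussian convolution layer" is read here as a convolution layer with Gaussian-initialized weights.

    import torch
    import torch.nn as nn

    class MultiTargetCNN(nn.Module):
        def __init__(self, n_styles=3):
            super().__init__()
            def block(c_in, c_out):
                conv = nn.Conv2d(c_in, c_out, kernel_size=5, padding=2)
                nn.init.normal_(conv.weight, std=0.01)        # Gaussian-initialized kernels
                return nn.Sequential(conv, nn.MaxPool2d(2), nn.ReLU(),
                                     nn.LocalResponseNorm(5)) # Maxpooling, RELU, normalization
            self.trunk = nn.Sequential(
                block(3, 64),                                 # first Gaussian convolution layer
                block(64, 32),                                # second Gaussian convolution layer
                block(32, 16),                                # third Gaussian convolution layer
                nn.Flatten(),
                nn.Linear(16 * 28 * 28, 256),                 # first fully connected layer
                nn.Linear(256, 128),                          # second fully connected layer
                nn.Dropout(0.5),                              # Dropout layer
            )
            self.softmax_sublayers = nn.ModuleList(
                [nn.Linear(128, 2) for _ in range(n_styles)]) # N Softmax sublayers

        def forward(self, x):
            h = self.trunk(x)
            # each sublayer outputs the probability that the picture belongs to its style
            return [torch.softmax(sub(h), dim=1)[:, 1] for sub in self.softmax_sublayers]

    probs = MultiTargetCNN()(torch.randn(1, 3, 224, 224))     # N per-style probabilities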
Fig. 7 is a schematic block diagram of an embodiment of the style identification module 104 provided in the present application. As shown in fig. 7, the style identification module 104 may include:
a probability value selecting module 1041, configured to select, from the probability vector, the first M probability values with the largest values, where M is greater than or equal to 1 and less than N, and N is the number of probability values in the probability vector;
the recognition result output module 1042 may be configured to recognize the style type to which the picture to be identified belongs using the following style judgment rules:
if the M probability values are all greater than or equal to a first threshold, it is judged that the picture to be identified belongs to the style types corresponding to the M probability values;
if the M probability values are all smaller than the first threshold and the maximum of the M probability values is greater than or equal to a second threshold, it is judged that the picture to be identified belongs to the style type corresponding to that maximum probability value;
if the M probability values are all smaller than the second threshold and the minimum of the M probability values is greater than or equal to a third threshold, it is judged that the picture to be identified belongs to the style types corresponding to the M probability values;
if the M probability values are all smaller than the second threshold, and at least one of them is smaller than the third threshold while at least one is greater than or equal to the third threshold, it is judged that the picture to be identified belongs to the style types corresponding to the probability values that are greater than or equal to the third threshold;
if the M probability values are all smaller than the third threshold, it is judged that the picture to be identified belongs to the style types corresponding to the M probability values.
Another embodiment of the picture style identification device of the present application provides specific value ranges for the parameters in the style identification module 104. Specifically, the style identification module 104 may identify the style type of the picture to be identified using at least one of the following settings:
M takes a value of 3;
the first threshold value range includes: 0.9 to 0.95;
the second threshold value range includes: 0.4 to 0.6;
the third threshold value range includes: 0.2 to 0.3.
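Under these settings, the judge_styles sketch given after steps S401 to S405 could be applied to the first M = 3 probability values as follows; the probability vector and the threshold picks within the stated ranges are illustrative only:

    probs = [0.85, 0.61, 0.48, 0.10]                   # probability vector, N = 4
    labels = ["fresh", "college", "street", "pastoral"]
    top = sorted(zip(probs, labels), reverse=True)[:3] # first M = 3 probability values
    result = judge_styles([p for p, _ in top], [lab for _, lab in top],
                          t1=0.92, t2=0.5, t3=0.25)    # -> ['fresh'] by the second rule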
Fig. 8 is a schematic block diagram of an embodiment of the training set building module 101 provided in this application, and as shown in fig. 8, the training set building module 101 may include:
a color information conversion module 1011, configured to convert color information of the sample picture into RGB three-channel color information;
a scaling module 1012, configured to scale the short edge of the sample picture to a first preset value, and scale the long edge of the sample picture according to the scaling of the short edge to form a first sample picture;
the central region cropping module 1013 may be configured to crop the first sample picture into a square sample picture whose side length is the first preset value, taking the intersection of the perpendicular bisectors of the long and short sides of the first sample picture (i.e., the picture center) as the center point.
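A minimal Pillow sketch of this normalization pipeline, assuming a first preset value of 256 pixels (an illustrative choice, not fixed by the patent):

    from PIL import Image

    FIRST_PRESET = 256                                 # assumed first preset value

    def normalize(path):
        img = Image.open(path).convert("RGB")          # RGB three-channel color information
        w, h = img.size
        scale = FIRST_PRESET / min(w, h)               # short edge -> first preset value
        img = img.resize((round(w * scale), round(h * scale)))  # long edge scaled in proportion
        w, h = img.size
        left, top = (w - FIRST_PRESET) // 2, (h - FIRST_PRESET) // 2  # crop about the center point
        return img.crop((left, top, left + FIRST_PRESET, top + FIRST_PRESET))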
Fig. 9 is a schematic block structure diagram of another embodiment of the training set constructing module 101 provided in this application, and as shown in fig. 9, the training set constructing module 101 may further include:
the first expansion module 1014 may be configured to crop, from the square sample picture whose side length is the first preset value, a square sample picture whose side length is a second preset value at each of the upper-left corner, the upper-right corner, the lower-left corner, the lower-right corner, the middle-left, the middle-right, the middle-top and the middle-bottom positions;
the second expansion module 1015 may be configured to mirror-flip each square sample picture whose side length is the second preset value about one of its vertical sides, forming a new square sample picture whose side length is the second preset value.
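A Pillow sketch of the two expansion steps, assuming a second preset value of 224 pixels inside a 256-pixel normalized square (both values illustrative): eight offset crops plus a mirror flip of each expand one sample picture into sixteen.

    from PIL import ImageOps

    FIRST_PRESET, SECOND_PRESET = 256, 224             # assumed preset values

    def expand(square):
        """square: a FIRST_PRESET x FIRST_PRESET normalized sample picture."""
        d = FIRST_PRESET - SECOND_PRESET               # total slack per axis
        m = d // 2
        offsets = [(0, 0), (d, 0), (0, d), (d, d),     # four corners
                   (0, m), (d, m), (m, 0), (m, d)]     # middle-left/right/top/bottom
        crops = [square.crop((x, y, x + SECOND_PRESET, y + SECOND_PRESET))
                 for x, y in offsets]
        return crops + [ImageOps.mirror(c) for c in crops]  # vertical-axis mirror flips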
In the above embodiments of the training set constructing module 101, sample picture normalization and data expansion may be performed: the sizes of different sample pictures are normalized to the same size, reducing the influence of differently sized pictures on style judgment. In one embodiment, since the commodity main body generally appears in the center area of the picture, the center area of the sample picture can be retained during cropping, preserving the commodity main body information to the maximum extent and improving the accuracy of style identification. The data expansion provided by this embodiment can expand the originally acquired sample pictures to the preset required data volume, which helps prevent overfitting during subsequent convolutional neural network training and improves the reliability of the data processing.
According to the picture style identification method and device of this application, the sample pictures in the sample training set can be trained in a preset multi-target convolutional neural network with a specific network structure to obtain an identification model with picture style identification capability, which can then automatically identify the style types of pictures to be classified. The method can realize automatic and rapid identification of commodity styles, reduce the working intensity of operators, and improve identification efficiency. The sample pictures in the sample training set can be normalized and expanded in advance, improving the accuracy and reliability of the identification model. A style judgment rule can be preset as required, so that the style types of commodity pictures can be identified reasonably, effectively and accurately from the probability vector output by the identification model. By using the embodiments of this application, picture style identification accuracy can be greatly improved, the working intensity of operators reduced, accurate style shopping guidance provided for users and accurate commodity style classification for merchants, user experience improved, and the commodity transaction conversion rate increased.
Although this application refers to descriptions of picture cropping, RGB channel color conversion, fine-tuning, convolutional neural network layer structures, convolution, loss feedback, picture information processing from the references, and the like, the application is not limited to information processing or neural network model structures that must fully conform to the referenced standards or manners. The above description of the embodiments is only an application of some embodiments of this application, and the embodiments may also be implemented, with slight modifications, on the basis of certain standards, models and methods. Of course, other non-inventive variations of the processing method steps described in the above embodiments that are consistent with this application can still implement the same application, and are not described again here.
Although the present application provides method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive effort. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only one. When an actual device or client product executes, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the methods shown in the embodiments or figures.
The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. The functionality of the modules may be implemented in the same one or more software and/or hardware implementations of the present application. Of course, a module that implements a certain function may be implemented by a plurality of sub-modules or sub-units in combination.
The methods, apparatuses or modules described herein may be implemented by computer-readable program code embedded in a controller implemented in any suitable manner; for example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing the various functions may also be regarded as structures within the hardware component; or the means for performing the functions may even be regarded as both software modules for performing the method and structures within the hardware component.
Some of the modules in the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus the necessary general hardware platform. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product or realized in a data migration process. The computer software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to perform the methods described in the embodiments, or parts of the embodiments, of the present application.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. All or portions of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the present application has been described by way of embodiments, those of ordinary skill in the art will appreciate that there are numerous variations and permutations of the present application that do not depart from its spirit, and it is intended that the appended claims encompass such variations and permutations.

Claims (16)

1. A picture style identification method is characterized by comprising the following steps:
obtaining a sample picture, and processing the sample picture according to a preset mode to form a sample training set;
initializing parameters of a preset multi-target convolutional neural network, and training sample pictures in the sample training set in the multi-target convolutional neural network after the parameters are initialized to obtain a picture style identification model;
identifying a picture to be identified by using the picture style identification model, and acquiring probability vectors of the picture to be identified belonging to different style types, wherein the value range of the probability value of each style type in the probability vector is 0 to 1, and the sum of the probability values of the respective style types in the probability vector is greater than 1;
identifying the style type of the picture to be identified according to a preset judgment rule and the probability vector, wherein the number of style types identified for the picture to be identified does not exceed M, M is greater than or equal to 1 and less than N, and N is the number of probability values in the probability vector.
2. The picture style identification method of claim 1, wherein the preset multi-target convolutional neural network is configured to include:
three convolutional layers, two fully-connected layers, three RELU layers, three Maxpooling layers and a Softmax layer comprising at least two Softmax sublayers; each of the at least two Softmax sublayers is provided with a judging function of a style type, and each Softmax sublayer is used for determining the probability that the picture to be identified belongs to the style type corresponding to the Softmax sublayer.
3. The picture style identification method of claim 2, wherein the preset multi-target convolutional neural network is configured to include:
and the Softmax sublayer of the Softmax layer transmits the judged loss value backwards to a fully-connected layer connected with the Softmax sublayer, and the fully-connected layer adjusts the parameters of the fully-connected layer correspondingly according to the received loss value.
4. The picture style identification method of claim 2, wherein the preset multi-target convolutional neural network is configured to include:
a first Gaussian convolution layer comprising 64 convolution kernels; a first Maxpooling layer, a RELU layer and a normalization layer connected to the first Gaussian convolution layer;
a second Gaussian convolution layer comprising 32 convolution kernels connected to the first Maxpooling layer; a second Maxpooling layer, a RELU layer and a normalization layer connected to the second Gaussian convolution layer;
a third Gaussian convolution layer comprising 16 convolution kernels connected to the second Maxpooling layer; a third Maxpooling layer, a RELU layer and a normalization layer connected to the third Gaussian convolution layer;
a first fully connected layer connected to the third Maxpooling layer;
a second fully connected layer and a Dropout layer connected to the first fully connected layer;
and a Softmax layer connected to the second fully connected layer and comprising N Softmax sublayers, wherein N is greater than or equal to 2.
5. The picture style identification method according to claim 1, wherein the identifying the style type of the picture to be identified according to the preset judgment rule and the probability vector comprises:
selecting the first M probability values with the largest values from the probability vector, wherein M is greater than or equal to 1 and less than N, and N is the number of probability values in the probability vector; and,
if the M probability values are all larger than or equal to a first threshold value, judging that the picture to be identified belongs to the style types corresponding to the M probability values;
if the M probability values are all smaller than the first threshold value and the maximum probability value in the M probability values is larger than or equal to a second threshold value, judging that the picture to be identified belongs to the style type corresponding to the maximum probability value in the M probability values;
if the M probability values are all smaller than a second threshold value and the minimum probability value of the M probability values is larger than or equal to a third threshold value, judging that the picture to be identified belongs to the style types corresponding to the M probability values;
if the M probability values are all smaller than a second threshold, and at least one probability value smaller than a third threshold and at least one probability value larger than or equal to the third threshold exist in the M probability values, judging that the picture to be identified belongs to a style type corresponding to the probability value larger than or equal to the third threshold in the M probability values;
if the M probability values are all smaller than a third threshold value, judging that the picture to be identified belongs to the style types corresponding to the M probability values.
6. The picture style identification method according to claim 5, wherein the style type of the picture to be identified is identified by at least one of the following setting modes:
M takes a value of 3;
the first threshold value range includes: 0.9 to 0.95;
the second threshold value range includes: 0.4 to 0.6;
the third threshold value range includes: 0.2 to 0.3.
7. The picture style identification method according to claim 1, wherein the processing the sample picture in the preset manner comprises:
converting the color information of the sample picture into RGB three-channel color information;
zooming the short edge of the sample picture to a first preset value, correspondingly zooming the long edge of the sample picture in the same proportion according to the zoom proportion of the short edge to form a first sample picture;
and cropping the first sample picture into a square sample picture whose side length is the first preset value, taking the intersection of the perpendicular bisectors of the long side and the short side of the first sample picture as the center point.
8. The picture style identification method of claim 7, wherein the method further comprises:
cropping, from the square sample picture whose side length is the first preset value, a square sample picture whose side length is a second preset value at each of the upper-left corner, the upper-right corner, the lower-left corner, the lower-right corner, the middle-left, the middle-right, the middle-top and the middle-bottom positions, and mirror-flipping each square sample picture whose side length is the second preset value about one of its vertical sides to form a new square sample picture whose side length is the second preset value.
9. An apparatus for recognizing a style of a picture, the apparatus comprising:
the training set construction module is used for acquiring a sample picture and processing the sample picture according to a preset mode to form a sample training set;
the sample training module is used for storing the preset multi-target convolutional neural network; it is also used for initializing parameters of the multi-target convolutional neural network and training the sample pictures in the sample training set in the parameter-initialized multi-target convolutional neural network to obtain a picture style identification model;
the picture identification module is used for identifying a picture to be identified by using the picture style identification model to acquire probability vectors of the picture to be identified belonging to different style types, wherein the value range of the probability value of each style type in the probability vector is 0 to 1, and the sum of the probability values of the respective style types in the probability vector is greater than 1;
the style identification module is used for storing a preset picture style judgment rule and identifying the style type of the picture to be identified according to the judgment rule and the probability vector, wherein the number of style types identified for the picture to be identified does not exceed M, M is greater than or equal to 1 and less than N, and N is the number of probability values in the probability vector.
10. The apparatus of claim 9, wherein the multi-objective convolutional neural network stored in the sample training module is configured to include:
three convolutional layers, two fully-connected layers, three RELU layers, three Maxpooling layers and a Softmax layer comprising at least two Softmax sublayers; each of the at least two Softmax sublayers is provided with a judging function of a style type, and each Softmax sublayer is used for determining the probability that the picture to be identified belongs to the style type corresponding to the Softmax sublayer.
11. The picture style recognition apparatus of claim 10, wherein the preset multi-target convolutional neural network is configured to include:
and the Softmax sublayer of the Softmax layer transmits the judged loss value backwards to a fully-connected layer connected with the Softmax sublayer, and the fully-connected layer adjusts the parameters of the fully-connected layer correspondingly according to the received loss value.
12. The picture style recognition apparatus of claim 10, wherein the preset multi-target convolutional neural network is configured to include:
a first Gaussian convolution layer comprising 64 convolution kernels; a first Maxpooling layer, a RELU layer and a normalization layer connected to the first Gaussian convolution layer; a second Gaussian convolution layer comprising 32 convolution kernels connected to the first Maxpooling layer; a second Maxpooling layer, a RELU layer and a normalization layer connected to the second Gaussian convolution layer; a third Gaussian convolution layer comprising 16 convolution kernels connected to the second Maxpooling layer; a third Maxpooling layer, a RELU layer and a normalization layer connected to the third Gaussian convolution layer; a first fully connected layer connected to the third Maxpooling layer; a second fully connected layer and a Dropout layer connected to the first fully connected layer; and a Softmax layer connected to the second fully connected layer and comprising N Softmax sublayers, wherein N is greater than or equal to 2.
13. The picture style recognition apparatus of claim 9, wherein the style recognition module comprises:
the probability value selection module is used for selecting the first M probability values with the maximum probability value from the probability vector, wherein M is more than or equal to 1 and is less than N, and N is the number of the probability values in the probability vector;
the recognition result output module is used for identifying the style type of the picture to be identified by adopting the following style judgment rules:
if the M probability values are all larger than or equal to a first threshold value, judging that the picture to be identified belongs to the style types corresponding to the M probability values;
if the M probability values are all smaller than the first threshold value and the maximum probability value in the M probability values is larger than or equal to a second threshold value, judging that the picture to be identified belongs to the style type corresponding to the maximum probability value in the M probability values;
if the M probability values are all smaller than a second threshold value and the minimum probability value of the M probability values is larger than or equal to a third threshold value, judging that the picture to be identified belongs to the style types corresponding to the M probability values;
if the M probability values are all smaller than a second threshold, and at least one probability value smaller than a third threshold and at least one probability value larger than or equal to the third threshold exist in the M probability values, judging that the picture to be identified belongs to a style type corresponding to the probability value larger than or equal to the third threshold in the M probability values;
if the M probability values are all smaller than a third threshold value, judging that the picture to be identified belongs to the style types corresponding to the M probability values.
14. The picture style recognition device of claim 13, wherein the style recognition module recognizes the style type of the picture to be recognized by at least one of the following setting modes:
M takes a value of 3;
the first threshold value range includes: 0.9 to 0.95;
the second threshold value range includes: 0.4 to 0.6;
the third threshold value range includes: 0.2 to 0.3.
15. The picture style recognition apparatus of claim 9, wherein the training set construction module comprises:
the color information conversion module is used for converting the color information of the sample picture into RGB three-channel color information;
the scaling module is used for scaling the short edge of the sample picture to a first preset value and scaling the long edge of the sample picture according to the scaling of the short edge in the same scale to form a first sample picture;
and the central region cropping module is used for cropping the first sample picture into a square sample picture whose side length is the first preset value, taking the intersection of the perpendicular bisectors of the long side and the short side of the first sample picture as the center point.
16. The picture style recognition apparatus of claim 15, wherein the training set construction module further comprises:
the first expansion module is used for cropping, from the square sample picture whose side length is the first preset value, a square sample picture whose side length is a second preset value at each of the upper-left corner, the upper-right corner, the lower-left corner, the lower-right corner, the middle-left, the middle-right, the middle-top and the middle-bottom positions;
and the second expansion module is used for mirror-flipping each square sample picture whose side length is the second preset value about one of its vertical sides to form a new square sample picture whose side length is the second preset value.
CN201510922662.9A 2015-12-14 2015-12-14 Picture style identification method and device Active CN106874924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510922662.9A CN106874924B (en) 2015-12-14 2015-12-14 Picture style identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510922662.9A CN106874924B (en) 2015-12-14 2015-12-14 Picture style identification method and device

Publications (2)

Publication Number Publication Date
CN106874924A CN106874924A (en) 2017-06-20
CN106874924B true CN106874924B (en) 2021-01-29

Family

ID=59178561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510922662.9A Active CN106874924B (en) 2015-12-14 2015-12-14 Picture style identification method and device

Country Status (1)

Country Link
CN (1) CN106874924B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341523A (en) * 2017-07-13 2017-11-10 浙江捷尚视觉科技股份有限公司 Express delivery list information identifying method and system based on deep learning
CN107832727A (en) * 2017-11-21 2018-03-23 深圳市未来媒体技术研究院 A kind of indoor mall shop feature extracting method
CN108229341B (en) * 2017-12-15 2021-08-06 北京市商汤科技开发有限公司 Classification method and device, electronic equipment and computer storage medium
CN108197574B (en) * 2018-01-04 2020-09-08 张永刚 Character style recognition method, terminal and computer readable storage medium
CN108491866B (en) * 2018-03-06 2022-09-13 平安科技(深圳)有限公司 Pornographic picture identification method, electronic device and readable storage medium
CN110580729B (en) * 2018-06-11 2022-12-09 阿里巴巴集团控股有限公司 Image color matching method and device and electronic equipment
CN111581414B (en) * 2019-02-18 2024-01-16 北京京东尚科信息技术有限公司 Method, device, equipment and storage medium for identifying, classifying and searching clothes
CN112308095A (en) * 2019-07-30 2021-02-02 顺丰科技有限公司 Picture preprocessing and model training method and device, server and storage medium
CN111339944A (en) * 2020-02-26 2020-06-26 广东三维家信息科技有限公司 Decoration style identification method and device and electronic equipment
CN111680760A (en) * 2020-06-16 2020-09-18 北京联合大学 Clothing style identification method and device, electronic equipment and storage medium
CN112015937B (en) * 2020-08-31 2024-01-19 核工业北京地质研究院 Picture geographic positioning method and system
CN112800510A (en) * 2020-12-25 2021-05-14 佛山欧神诺云商科技有限公司 Design scheme decoration style identification method of online design system, electronic equipment and storage medium
CN115658964B (en) * 2022-05-25 2023-07-18 腾讯科技(深圳)有限公司 Training method and device for pre-training model and somatosensory wind identification model


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150006444A1 (en) * 2013-06-28 2015-01-01 Denso Corporation Method and system for obtaining improved structure of a target neural network
CN104102919A (en) * 2014-07-14 2014-10-15 同济大学 Image classification method capable of effectively preventing convolutional neural network from being overfit
CN104090871A (en) * 2014-07-18 2014-10-08 百度在线网络技术(北京)有限公司 Picture translation method and system
CN104517122A (en) * 2014-12-12 2015-04-15 浙江大学 Image target recognition method based on optimized convolution architecture

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of Autonomous Clothing Style Classification Based on Cognitive Features; Zhong Xiaodong; China Master's Theses Full-text Database (Electronic Journal); 2012-07-31; Abstract; Chapter 3, Section 3.2 *

Also Published As

Publication number Publication date
CN106874924A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106874924B (en) Picture style identification method and device
CN106874296B (en) Method and device for identifying style of commodity
US11232318B2 (en) Methods and apparatuses for vehicle appearance feature recognition, methods and apparatuses for vehicle retrieval, storage medium, and electronic devices
WO2021078027A1 (en) Method and apparatus for constructing network structure optimizer, and computer-readable storage medium
CN108229341B (en) Classification method and device, electronic equipment and computer storage medium
Nawaz et al. AI-based object detection latest trends in remote sensing, multimedia and agriculture applications
CN107683469A (en) A kind of product classification method and device based on deep learning
US11887217B2 (en) Text editing of digital images
WO2019233077A1 (en) Ranking of business object
CN113536107B (en) Big data decision method and system based on block chain and cloud service center
Liang et al. A new image classification method based on modified condensed nearest neighbor and convolutional neural networks
US20230137533A1 (en) Data labeling method and apparatus, computing device, and storage medium
CN112381030B (en) Satellite optical remote sensing image target detection method based on feature fusion
CN114511576B (en) Image segmentation method and system of scale self-adaptive feature enhanced deep neural network
CN113657087B (en) Information matching method and device
Shang et al. Image spam classification based on convolutional neural network
CN107506792A (en) A kind of semi-supervised notable method for checking object
CN113642400A (en) Graph convolution action recognition method, device and equipment based on 2S-AGCN
CN113344000A (en) Certificate copying and recognizing method and device, computer equipment and storage medium
Huo et al. Semisupervised learning based on a novel iterative optimization model for saliency detection
CN113642481A (en) Recognition method, training method, device, electronic equipment and storage medium
Golts et al. Deep energy: task driven training of deep neural networks
CN111222546A (en) Multi-scale fusion food image classification model training and image classification method
US10643092B2 (en) Segmenting irregular shapes in images using deep region growing with an image pyramid
CN110472639B (en) Target extraction method based on significance prior information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211119

Address after: No. 699, Wangshang Road, Binjiang District, Hangzhou, Zhejiang

Patentee after: Alibaba (China) Network Technology Co.,Ltd.

Address before: Cayman Islands, Grand Cayman

Patentee before: ALIBABA GROUP HOLDING Ltd.
