CN110390350A - Hierarchical classification method based on a bilinear structure - Google Patents

Hierarchical classification method based on a bilinear structure

Info

Publication number
CN110390350A
CN110390350A (application CN201910548377.3A)
Authority
CN
China
Prior art keywords
module
convolution
network
volume
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910548377.3A
Other languages
Chinese (zh)
Other versions
CN110390350B (en)
Inventor
范建平
张翔
赵万青
罗迒哉
彭进业
张二磊
赵超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN201910548377.3A priority Critical patent/CN110390350B/en
Publication of CN110390350A publication Critical patent/CN110390350A/en
Application granted granted Critical
Publication of CN110390350B publication Critical patent/CN110390350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hierarchical classification method based on a bilinear structure. The method proposes a hierarchical classification network with a bilinear structure that exploits hierarchical category relationships to optimize the final classification result: the properties of deep convolutional networks are combined with prior knowledge of the category hierarchy, so that different layers of the network learn concepts understandable to humans. In addition, the invention couples this hierarchical network structure with a bilinear model to further improve classification performance. Combining the hierarchical network structure with the bilinear model allows targets of the same genus but different species to be distinguished effectively, thereby further improving target recognition.

Description

Hierarchical classification method based on a bilinear structure
Technical field
The present invention relates to the field of computer vision, and involves pattern recognition, image processing, and deep learning technology; in particular, it relates to a hierarchical classification method based on a bilinear structure.
Background technique
In recent years, deep convolutional neural networks have shown excellent performance in image classification and achieved remarkable results. A traditional CNN-based classification model is designed as a sequential network structure that produces a single prediction at the final output of the model, with no network branches at the output end, because traditional neural network structures treat all target classes equally. In real life, however, categories are related to one another. For example, an "orange" and a "table tennis ball" have similar contour structures, so distinguishing these two classes is much harder than distinguishing an "orange" from a "chair"; yet traditional deep learning treats these cases as equally separable.
Summary of the invention
The object of the present invention is to provide a hierarchical classification method based on a bilinear structure that combines a hierarchical structure with a bilinear model to further improve classification performance.
To achieve the above task, the invention adopts the following technical scheme:
A hierarchical classification method based on a bilinear structure, comprising the following steps:
Step 1: construct the bilinear-structured hierarchical classification network
The classification network comprises a hierarchical network and a bilinear network. The hierarchical network comprises a first to a fifth convolution module arranged in sequence, and the bilinear network comprises two convolutional neural networks arranged in parallel, wherein:
A first branch module is connected to the first convolution module. The first convolution module applies two convolutions and one max pooling operation to an input image carrying family-genus-species structured labels; the resulting feature map is fed on one hand into the second convolution module and on the other hand into the first branch module, where two convolution operations followed by a fully connected layer classify the family-level label of the input image;
A second branch module is connected to the second convolution module. The second convolution module applies two convolutions and one max pooling operation to the feature map from the first convolution module; the output feature map is fed on one hand into the third convolution module and on the other hand into the second branch module, where two convolution operations followed by a fully connected layer again classify the family-level label of the input image;
A third branch module is connected to the third convolution module. The third convolution module applies three convolutions and one max pooling operation to the feature map from the second convolution module; the output feature map is fed on one hand into the fourth convolution module and on the other hand into the third branch module, where two convolution operations followed by a fully connected layer classify the genus-level label of the input image;
A fourth branch module is connected to the fourth convolution module. The fourth convolution module applies three convolutions and one max pooling operation to the feature map from the third convolution module; the output feature map is fed on one hand into the fifth convolution module and on the other hand into the fourth branch module, where two convolution operations followed by a fully connected layer classify the species-level label of the input image;
The fifth convolution module applies two convolutions to the feature map from the fourth convolution module; its output is split into two paths, each connected to one of the two convolutional neural networks for feature extraction. The features extracted by the two convolutional neural networks are combined by an outer product, summed over all positions by sum pooling, and the resulting feature is passed through a square root and normalized before a final fully connected layer;
The outputs of the first to fourth branch modules and the output of the bilinear network yield five loss functions computed by cross entropy; these five loss functions are linearly combined, with a different weight assigned to each;
Step 2: train the bilinear-structured hierarchical classification network
The bilinear-structured hierarchical classification network is trained, using the weight distribution over the loss functions during training to optimize the final classification result, and the trained network model is saved for image classification.
The present invention has the following technical characteristics:
1. The network structure proposed by the present invention exploits hierarchical category relationships to optimize the final classification result; it makes use of the properties of deep convolutional networks and combines them with prior knowledge of the category hierarchy, so that different layers of the network learn concepts understandable to humans.
2. The present invention embeds the hierarchical structure information of image labels into the deep learning network, producing a design that can effectively recognize targets that share similar characteristics but belong to different classes.
3. By combining the hierarchical network structure with a bilinear model, the present invention can effectively distinguish targets of the same genus but different species, thereby further improving target recognition.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hierarchical network structure;
Fig. 2 is a schematic diagram of the tag tree corresponding to the hierarchical network;
Fig. 3 is a schematic diagram of the bilinear network structure;
Fig. 4 is a schematic diagram of the bilinear-structured hierarchical classification network proposed in the present invention.
Specific embodiment
In recent years, popular neural network architectures such as AlexNet, VggNet, and ResNet have used a single network structure that does not incorporate the structural characteristics of the image labels themselves. For large-scale fine-grained image classification tasks, a network that relies solely on this approach can hardly achieve ideal results, because in large-scale fine-grained image classification many targets of different classes share similar characteristics, whereas the structure of the labels themselves can effectively constrain such misclassification.
Therefore, this scheme realizes image classification with a hierarchical network architecture. This architecture requires the target classes to be organized hierarchically: for example, an "apple" belongs to the coarse class "fruit" while also belonging to the fine class "apple"; hierarchical classification is then used to optimize the final classification result. Such a hierarchical network structure can constrain errors on the fine classes, because the coarse class ("fruit") constrains the fine class ("orange") and reduces the chance that the fine class ("orange") is assigned to another class ("table tennis ball"). Furthermore, a bilinear model can capture more discriminative features and has the effect of extracting regional information; the present invention therefore combines the bilinear model with the hierarchical structure to further improve image recognition results.
It should be noted that the network structure proposed in this scheme targets data with a hierarchical structure (a family-genus-species structure) or with classes of different granularity (coarse and fine classes); such data can also be obtained by splitting a dataset with a clustering method.
1. Hierarchical classification network
The structure of the hierarchical classification network is shown in Fig. 1, and the corresponding tag tree is shown in Fig. 2. In the tag tree, the fine labels are the target classes; they appear as leaves and are grouped into coarse categories, which can be constructed by hand or generated by unsupervised methods.
The hierarchical network model uses existing CNN components as building blocks and constructs a network with internal output branches. The network shown at the bottom of Fig. 1 is a traditional convolutional network, while the middle part of Fig. 1 shows the output branch networks of the hierarchical network; each branch network produces a prediction at the appropriate level of the tag tree. A minimal example of such a tag tree is given below.
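As an illustration, the fine-to-coarse label mapping of the tag tree can be stored as simple dictionaries. The sketch below is a minimal Python example; all class names in it are hypothetical placeholders rather than the labels actually used with the datasets in this patent.

```python
# Minimal sketch of a three-level tag tree (family -> genus -> species).
# All class names below are hypothetical placeholders.

species_to_genus = {
    "species_a": "genus_1",
    "species_b": "genus_1",
    "species_c": "genus_2",
}

genus_to_family = {
    "genus_1": "family_x",
    "genus_2": "family_x",
}

def hierarchical_labels(species: str):
    """Return the (family, genus, species) label triple for one sample."""
    genus = species_to_genus[species]
    family = genus_to_family[genus]
    return family, genus, species

print(hierarchical_labels("species_b"))  # ('family_x', 'genus_1', 'species_b')
```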
The structure of hierarchical classification network is as follows:
The network input is a 224 × 224 × 3 image together with its label (a family-genus-species structured label). The network begins with two convolution layers of 64 3 × 3 filters applied to the input image, giving a 224 × 224 × 64 output; a maxpool operation then produces a 112 × 112 × 64 feature map, which is split into two paths named LA1 and LA2. The LA1 path passes through two convolution layers (3 × 3 kernels, 256 kernels each) followed by a fully connected layer; the number of outputs here is C1, the number of coarse (family) classes, so LA1 classifies the family label. The LA2 path passes through two convolution layers (3 × 3, 128 kernels) and a maxpool operation, producing a 56 × 56 × 128 feature map, which is again split into two paths, LB1 and LB2. LB1 passes through two convolution layers (3 × 3, 256 kernels) followed by a fully connected layer of size C1, the number of family classes, so LB1 again classifies the family label. LB2 passes through three convolution layers (3 × 3, 256 kernels) and a maxpool operation, producing a 28 × 28 × 256 feature map, which is split into LC1 and LC2. LC1 passes through two convolution layers (3 × 3, 1024 kernels) followed by a fully connected layer of size C2, the number of genus classes, so LC1 classifies the genus-level label. LC2 passes through three convolution layers (3 × 3, 256 kernels) and a maxpool operation, producing a 14 × 14 × 256 feature map, which is split into LD1 and LD2. LD1 passes through two convolution layers (3 × 3, 1024 kernels) followed by a fully connected layer of size C, the number of fine (species) classes, so LD1 classifies the species-level label. LD2 passes through three convolution layers (3 × 3, 512 kernels), then two further convolution layers (3 × 3, 4096 kernels), and finally a fully connected layer of size C, the number of species classes. Five loss functions are ultimately computed by cross entropy; we combine these five loss functions linearly and assign each a different weight to optimize the network. A minimal sketch of this branch structure is given below.
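The following PyTorch sketch illustrates the branch structure just described. Channel counts follow the text where stated; paddings, activations, the global pooling inside the branches, and the simplified fifth block are assumptions made for illustration, so this is a sketch of the idea rather than a faithful reimplementation of the patented network.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 conv + ReLU layers; padding=1 keeps the spatial size."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class Branch(nn.Module):
    """Side branch: a small conv stack, global pooling (an assumption) and an FC classifier."""
    def __init__(self, in_ch, mid_ch, n_convs, n_classes):
        super().__init__()
        self.convs = conv_block(in_ch, mid_ch, n_convs)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(mid_ch, n_classes)

    def forward(self, x):
        return self.fc(self.pool(self.convs(x)).flatten(1))

class HierarchicalNet(nn.Module):
    """Backbone blocks 1-5 with branches LA1, LB1, LC1, LD1 (cf. Fig. 1)."""
    def __init__(self, n_family, n_genus, n_species):
        super().__init__()
        pool = nn.MaxPool2d(2)
        self.block1 = nn.Sequential(conv_block(3, 64, 2), pool)     # 224 -> 112
        self.block2 = nn.Sequential(conv_block(64, 128, 2), pool)   # LA2: 112 -> 56
        self.block3 = nn.Sequential(conv_block(128, 256, 3), pool)  # LB2: 56 -> 28
        self.block4 = nn.Sequential(conv_block(256, 256, 3), pool)  # LC2: 28 -> 14
        self.block5 = nn.Sequential(                                 # LD2, simplified
            conv_block(256, 512, 3), pool, nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, n_species))
        self.branch1 = Branch(64, 256, 2, n_family)     # LA1: family
        self.branch2 = Branch(128, 256, 2, n_family)    # LB1: family
        self.branch3 = Branch(256, 1024, 2, n_genus)    # LC1: genus
        self.branch4 = Branch(256, 1024, 2, n_species)  # LD1: species

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        f3 = self.block3(f2)
        f4 = self.block4(f3)
        return (self.branch1(f1), self.branch2(f2),
                self.branch3(f3), self.branch4(f4), self.block5(f4))

# Example with placeholder class counts: 5 families, 12 genera, 51 species
net = HierarchicalNet(5, 12, 51)
outputs = net(torch.randn(1, 3, 224, 224))
print([o.shape for o in outputs])
```

The forward pass returns five outputs (two family-level, one genus-level and two species-level predictions), matching the five loss terms described later.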
2. Bilinear network
The bilinear network structure is shown in Fig. 3. As the structure of the bilinear classification model shows, two convolutional neural networks extract features from the image, the two sets of CNN features are combined by a bilinear pooling function, and the result is fed into a softmax layer for classification. The network contains two neural networks, A and B. The input image is resized to 448 × 448, and each of the two networks extracts features from it. At each image position the two networks each produce a 1 × 512 feature; the features A(l, I) and B(l, I) extracted by the two networks at position l of image I are combined by an outer product:
X(l, I) = A(l, I)^T B(l, I)
Then, using sum pooling, the bilinear features of all positions are summed:
Φ(I) = Σ_l X(l, I)
where l denotes the position. The resulting feature is then passed through an element-wise square root,
y = sign(Φ(I)) · √|Φ(I)|,
and finally normalized:
z = y / ||y||₂.
The normalized result is used as the feature of the image for classification. Compared with the features extracted by a single convolutional network, this bilinear feature achieves better classification results, because the two convolutional neural networks effectively play the roles of region detection and feature extraction, respectively. A minimal sketch of this bilinear pooling operation is given below.
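A minimal PyTorch sketch of this bilinear pooling step is shown below. It assumes two feature maps of shape (batch, channels, height, width) coming from the two CNN streams; the outer product, sum pooling, square root (applied in signed form to handle negative values) and L2 normalization follow the formulas above.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Bilinear pooling of two (batch, channels, H, W) feature maps."""
    b, c1, h, w = feat_a.shape
    c2 = feat_b.shape[1]
    a = feat_a.reshape(b, c1, h * w)   # (b, c1, L), L = number of positions
    bb = feat_b.reshape(b, c2, h * w)  # (b, c2, L)
    # Outer product at every position l, summed over positions (sum pooling):
    # Phi = sum_l A(l)^T B(l)
    phi = torch.bmm(a, bb.transpose(1, 2)).reshape(b, c1 * c2)
    # Square root (signed) and L2 normalization
    phi = torch.sign(phi) * torch.sqrt(phi.abs() + 1e-10)
    return F.normalize(phi, p=2, dim=1)

# Example: two 512-channel feature maps at 14 x 14 -> a 512*512-dimensional descriptor
fa, fb = torch.randn(2, 512, 14, 14), torch.randn(2, 512, 14, 14)
print(bilinear_pool(fa, fb).shape)  # torch.Size([2, 262144])
```

For two 512-channel feature maps this produces a 512 × 512 = 262144-dimensional descriptor, which is what the final fully connected layer consumes.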
3. The present invention proposes to combine the bilinear network model with the hierarchical network model to further improve the image classification accuracy. The proposed network structure is shown in Fig. 4, and the specific method is as follows:
Step 1: construct the bilinear-structured hierarchical classification network
The classification network comprises a hierarchical network and a bilinear network. The hierarchical network comprises a first to a fifth convolution module arranged in sequence, and the bilinear network comprises two convolutional neural networks arranged in parallel, wherein:
A first branch module LA1 is connected to the first convolution module. The first convolution module applies two convolutions with 64 3 × 3 filters to the input image, which carries family-genus-species structured labels, producing a 224 × 224 × 64 output that is then passed through a maxpool operation. The resulting feature map is fed on one hand into LA2 in the second convolution module and on the other hand into the first branch module LA1, which applies two convolution operations (3 × 3 kernels, 256 kernels) followed by a fully connected layer; the number of outputs here is C1, the number of coarse (family) classes, so LA1 classifies the family label of the input image.
A second branch module LB1 is connected to the second convolution module. The second convolution module LA2 applies two convolutions (3 × 3, 128 kernels) to the feature map from the first convolution module, followed by a maxpool operation, producing a 56 × 56 × 128 feature map. This feature map is fed on one hand into LB2 in the third convolution module and on the other hand into the second branch module LB1, which applies two convolution operations (3 × 3, 256 kernels) followed by a fully connected layer of size C1, the number of family classes; LB1 again classifies the family label.
A third branch module LC1 is connected to the third convolution module. The third convolution module LB2 applies three convolutions (3 × 3, 256 kernels) to the feature map from the second convolution module, followed by a maxpool operation, producing a 28 × 28 × 256 feature map. This feature map is fed on one hand into LC2 in the fourth convolution module and on the other hand into the third branch module LC1, which applies two convolution operations (3 × 3, 1024 kernels) followed by a fully connected layer of size C2, the number of genus classes; LC1 classifies the genus-level label.
A fourth branch module LD1 is connected to the fourth convolution module. The fourth convolution module LC2 applies three convolutions (3 × 3, 256 kernels) to the feature map from the third convolution module, followed by a maxpool operation, producing a 14 × 14 × 256 feature map. This feature map is fed on one hand into LD2 in the fifth convolution module and on the other hand into the fourth branch module LD1, which applies two convolution operations (3 × 3, 1024 kernels) followed by a fully connected layer of size C, the number of fine (species) classes; LD1 classifies the species-level label.
The fifth convolution module applies two convolutions (3 × 3, 512 kernels) to the feature map from the fourth convolution module; its output is split into two paths connected to the two convolutional neural networks LE1 and LE2 for feature extraction. LE1 and LE2 each apply one convolution layer (3 × 3, 512 kernels), yielding 14 × 14 × 512 feature maps. At each image position the two networks each produce a 1 × 512 feature, and the features extracted by the two networks at each position are combined by an outer product; the bilinear features of all positions are then summed by sum pooling, and the resulting feature is passed through a square root and normalized. This serves as the feature of the input image and is finally fed into a fully connected layer. A minimal sketch of this bilinear head is given below.
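The sketch below illustrates this bilinear head in PyTorch: two parallel 3 × 3 convolution streams (named le1 and le2 after LE1/LE2) followed by the outer-product pooling and a fully connected species classifier. Padding, activations and other details beyond those stated above are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearHead(nn.Module):
    """Two parallel conv streams (LE1/LE2), bilinear pooling, and a species classifier."""
    def __init__(self, in_ch: int, n_species: int):
        super().__init__()
        self.le1 = nn.Sequential(nn.Conv2d(in_ch, 512, 3, padding=1), nn.ReLU(inplace=True))
        self.le2 = nn.Sequential(nn.Conv2d(in_ch, 512, 3, padding=1), nn.ReLU(inplace=True))
        self.fc = nn.Linear(512 * 512, n_species)

    def forward(self, conv5_feat: torch.Tensor) -> torch.Tensor:
        fa = self.le1(conv5_feat).flatten(2)                    # (b, 512, L) positions
        fb = self.le2(conv5_feat).flatten(2)
        phi = torch.bmm(fa, fb.transpose(1, 2)).flatten(1)      # sum-pooled outer products
        phi = torch.sign(phi) * torch.sqrt(phi.abs() + 1e-10)   # square root
        phi = F.normalize(phi, p=2, dim=1)                      # L2 normalization
        return self.fc(phi)

# Example: 14 x 14 x 512 output of the fifth convolution module, 51 species (placeholder)
head = BilinearHead(512, 51)
print(head(torch.randn(1, 512, 14, 14)).shape)  # torch.Size([1, 51])
```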
The outputs of the first to fourth branch modules and the output of the bilinear network yield five cross-entropy loss functions loss1 to loss5; these five losses are linearly combined, with different weights w1 to w5 assigned to each, for example as sketched below.
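A minimal sketch of this weighted loss combination is shown below; the class counts, batch size and weight values are placeholders.

```python
import torch
import torch.nn.functional as F

def combined_loss(outputs, targets, weights):
    """Weighted sum of five cross-entropy losses (four branch outputs + bilinear output)."""
    losses = [F.cross_entropy(o, t) for o, t in zip(outputs, targets)]
    return sum(w * l for w, l in zip(weights, losses))

# Example with random data: 8 samples; placeholder class counts for
# (family, family, genus, species, species)
sizes = (5, 5, 12, 51, 51)
outputs = [torch.randn(8, n) for n in sizes]
targets = [torch.randint(0, n, (8,)) for n in sizes]
print(combined_loss(outputs, targets, [1.0, 0.0, 0.0, 0.0, 0.0]))
```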
Step 2: train the bilinear-structured hierarchical classification network
The bilinear-structured hierarchical classification network is trained, using the weight distribution over the loss functions during training to optimize the final classification result; the trained network model is saved for image classification.
The training strategy of the network optimizes the final classification result through the weight distribution over the loss functions, which is modified during training. For example, for a three-level hierarchical network structure, the initial weights would be [1, 0, 0], changed to [0, 1, 0] after 20 iterations and to [0, 0, 1] once the number of iterations reaches 50. The training strategy of the network is as follows:
(1) The initial weights are set to [1, 0, 0, 0, 0], which mainly trains the network of the loss1 module (first convolution module + first branch module) and optimizes the classification of the coarse (family) classes;
(2) After Num1 iterations the weights are changed to [0, 1, 0, 0, 0], which mainly trains the network of the loss2 module (second convolution module + second branch module) and optimizes the classification of the coarse (family) classes;
(3) After Num2 iterations the weights are changed to [0, 0, 1, 0, 0], which mainly trains the network of the loss3 module (third convolution module + third branch module) and optimizes the classification of the intermediate (genus) classes;
(4) After Num3 iterations the weights are changed to [0, 0, 0, 1, 0], which mainly trains the network of the loss4 module (fourth convolution module + fourth branch module) and optimizes the classification of the fine (species) classes;
(5) After Num4 iterations the weights are changed to [0, 0, 0, 0, 1], which trains the whole network and optimizes the classification of the fine (species) classes; a sketch of this staged weight schedule is given below.
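A minimal sketch of such a staged schedule is shown below; the iteration thresholds standing in for Num1 to Num4 are placeholders.

```python
def loss_weights(iteration, thresholds=(10, 20, 30, 40)):
    """Five-element loss-weight vector for the current iteration.

    The active loss moves from the shallowest branch toward the whole network
    each time the iteration count passes one of the thresholds (Num1..Num4).
    """
    stage = sum(iteration >= t for t in thresholds)  # 0..4
    weights = [0.0] * 5
    weights[stage] = 1.0
    return weights

for it in (0, 10, 20, 30, 40):
    print(it, loss_weights(it))
# 0  [1.0, 0.0, 0.0, 0.0, 0.0]
# 10 [0.0, 1.0, 0.0, 0.0, 0.0]
# 20 [0.0, 0.0, 1.0, 0.0, 0.0]  ... and so on
```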
Experimental section:
The experiments use three databases: CIFAR-10, CIFAR-100, and the "Orchid" plant database. The "Orchid" plant database, covering the orchid family, was collected by the inventors' team.
CIFAR-10: the CIFAR-10 dataset contains 10 target classes, with 50000 training images and 10000 test images, all 32 × 32 color images. The 10 fine classes of CIFAR-10 (airplane, ship, truck, automobile, bird, frog, dog, cat, deer, horse) are grouped into 7 coarse classes (sky, water, road, bird, reptile, pet, medium), and these 7 coarse classes are further grouped into 2 coarser categories (transport, animal).
CIFAR-100: the CIFAR-100 database contains 100 target classes with 600 images per class; the images are 32 × 32 color images. The 100 fine classes of CIFAR-100 are grouped into 20 coarse classes, and these 20 coarse classes are further grouped into 8 coarser categories.
"Orchid" plant database: 51 species of the orchid family were collected, with 32064 training images and 7894 test images in total. These orchid species are further grouped into 8 major classes. It is worth noting that some of the different orchid species have very similar contour structures, which makes this a challenging classification task.
1. Classification results of the hierarchical network structure
(1) Training strategy
The training strategy of the hierarchical network structure optimizes the final classification result through the weight distribution over the loss functions, which is modified during training. For example, for a three-level hierarchical network structure, the initial weights would be [1, 0, 0], changed to [0, 1, 0] after 20 iterations and to [0, 0, 1] once the number of iterations reaches 50.
The weight distributions of the loss functions on CIFAR-10 and CIFAR-100 are shown in Table 1, and the weight distribution on the "Orchid" dataset is shown in Table 2.
Table 1. Loss-function weights on the CIFAR-10 and CIFAR-100 datasets
Iteration    CIFAR-10 loss weights    CIFAR-100 loss weights
1            [1,0,0,0,0]              [1,0,0,0,0]
13           [0,1,0,0,0]              [0,1,0,0,0]
23           [0,0,1,0,0]              [0,0,1,0,0]
33           [0,0,0,1,0]              [0,0,0,1,0]
43           [0,0,0,0,1]              [0,0,0,0,1]
Table 2. Loss-function weights on the "Orchid" dataset
Iteration    Loss weights
1            [1,0,0,0]
15           [0,1,0,0]
50           [0,0,1,0]
100          [0,0,0,1]
(2) Experimental results
The experimental results are shown in Table 3, Table 4 and Table 5; they show that the proposed hierarchical network structure achieves good classification performance on all of the datasets.
Table 3. Recognition accuracy on the CIFAR-10 database
Network structure                 Recognition accuracy (%)
vgg16                             88.11
Hierarchical network structure    88.75
Table 4. Recognition accuracy on the CIFAR-100 database
Network structure                 Recognition accuracy (%)
vgg16                             62.97
Hierarchical network structure    64.57
Table 5. Recognition accuracy on the "Orchid" database
Network structure                 Recognition accuracy (%)
vgg16                             84.02
Hierarchical network structure    84.78
2. Classification results of the bilinear-structured hierarchical classification network
(1) Training strategy
The training strategy optimizes the final classification result through the weight distribution over the loss functions, which is modified during training; for example, for a three-level hierarchical network structure, the initial weights would be [1, 0, 0], changed to [0, 1, 0] after 20 iterations and to [0, 0, 1] once the number of iterations reaches 50. On the fifth module of the network, the bilinear structure is used to further optimize the classification performance.
(2) Experimental results
Vgg16, the bilinear network, and the bilinear-structured hierarchical network model were implemented on the "Orchid" database. The results are shown in Table 6: compared with the traditional VGG16 classification network, the bilinear-structured hierarchical network model proposed in this scheme effectively improves the classification performance.
Table 6. Recognition accuracy on the "Orchid" database
Network structure                     Recognition accuracy (%)
vgg16                                 84.02
Bilinear network                      89.4
Hierarchical structure + bilinear     91.1

Claims (1)

1. A hierarchical classification method based on a bilinear structure, characterized by comprising the following steps:
Step 1: construct the bilinear-structured hierarchical classification network
The classification network comprises a hierarchical network and a bilinear network; the hierarchical network comprises a first to a fifth convolution module arranged in sequence, and the bilinear network comprises two convolutional neural networks arranged in parallel, wherein:
A first branch module is connected to the first convolution module; the first convolution module applies two convolutions and one max pooling operation to an input image carrying family-genus-species structured labels, and the resulting feature map is fed on one hand into the second convolution module and on the other hand into the first branch module, where two convolution operations followed by a fully connected layer classify the family-level label of the input image;
A second branch module is connected to the second convolution module; the second convolution module applies two convolutions and one max pooling operation to the feature map from the first convolution module, and the output feature map is fed on one hand into the third convolution module and on the other hand into the second branch module, where two convolution operations followed by a fully connected layer again classify the family-level label of the input image;
A third branch module is connected to the third convolution module; the third convolution module applies three convolutions and one max pooling operation to the feature map from the second convolution module, and the output feature map is fed on one hand into the fourth convolution module and on the other hand into the third branch module, where two convolution operations followed by a fully connected layer classify the genus-level label of the input image;
A fourth branch module is connected to the fourth convolution module; the fourth convolution module applies three convolutions and one max pooling operation to the feature map from the third convolution module, and the output feature map is fed on one hand into the fifth convolution module and on the other hand into the fourth branch module, where two convolution operations followed by a fully connected layer classify the species-level label of the input image;
The fifth convolution module applies two convolutions to the feature map from the fourth convolution module; its output is split into two paths, each connected to one of the two convolutional neural networks for feature extraction; the features extracted by the two convolutional neural networks are combined by an outer product, summed over all positions by sum pooling, and the resulting feature is passed through a square root and normalized before a final fully connected layer;
The outputs of the first to fourth branch modules and the output of the bilinear network yield five loss functions computed by cross entropy; these five loss functions are linearly combined, with a different weight assigned to each;
Step 2: train the bilinear-structured hierarchical classification network
The bilinear-structured hierarchical classification network is trained, using the weight distribution over the loss functions during training to optimize the final classification result, and the trained network model is saved for image classification.
CN201910548377.3A 2019-06-24 2019-06-24 Hierarchical classification method based on bilinear structure Active CN110390350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910548377.3A CN110390350B (en) 2019-06-24 2019-06-24 Hierarchical classification method based on bilinear structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910548377.3A CN110390350B (en) 2019-06-24 2019-06-24 Hierarchical classification method based on bilinear structure

Publications (2)

Publication Number Publication Date
CN110390350A true CN110390350A (en) 2019-10-29
CN110390350B CN110390350B (en) 2021-06-15

Family

ID=68285927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910548377.3A Active CN110390350B (en) 2019-06-24 2019-06-24 Hierarchical classification method based on bilinear structure

Country Status (1)

Country Link
CN (1) CN110390350B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344808A1 (en) * 2016-05-28 2017-11-30 Samsung Electronics Co., Ltd. System and method for a unified architecture multi-task deep learning machine for object recognition
CN108171254A (en) * 2017-11-22 2018-06-15 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108229379A (en) * 2017-12-29 2018-06-29 广东欧珀移动通信有限公司 Image-recognizing method, device, computer equipment and storage medium
CN108776807A (en) * 2018-05-18 2018-11-09 复旦大学 It is a kind of based on can the double branch neural networks of skip floor image thickness grain-size classification method
CN108875827A (en) * 2018-06-15 2018-11-23 广州深域信息科技有限公司 A kind of method and system of fine granularity image classification
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN109165699A (en) * 2018-10-17 2019-01-08 中国科学技术大学 Fine granularity image classification method
CN109685115A (en) * 2018-11-30 2019-04-26 西北大学 A kind of the fine granularity conceptual model and learning method of bilinearity Fusion Features
CN109711481A (en) * 2019-01-02 2019-05-03 京东方科技集团股份有限公司 Neural network, correlation technique, medium and equipment for the identification of paintings multi-tag

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
GUANGJIAN ZHENG 等: "FINE-GRAINED IMAGE RECOGNITION VIA WEAKLY SUPERVISED CLICK DATA GUIDED BILINEAR CNN MODEL", 《PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME) 2017》 *
"Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics", 《INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING》 *
TSUNG-YU LIN 等: "Bilinear CNNs for Fine-grained Visual Recognition", 《ARXIV》 *
XINQI ZHU 等: "B-CNN: Branch Convolutional Neural Network for Hierarchical Classification", 《ARXIV》 *
YUNTAO LIU 等: "Visual Tree Convolutional Neural Network in Image Classification", 《2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR)》 *
冀中 等: "基于空间变换双线性网络的细粒度鱼类图像分类", 《天津大学学报(自然科学与工程技术版)》 *
沈海鸿 等: "分类错误指导的分层 B-CNN 模型用于细粒度分类", 《中国图象图形学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160356A (en) * 2020-01-02 2020-05-15 博奥生物集团有限公司 Image segmentation and classification method and device
CN113516990A (en) * 2020-04-10 2021-10-19 华为技术有限公司 Voice enhancement method, method for training neural network and related equipment
CN111507403A (en) * 2020-04-17 2020-08-07 腾讯科技(深圳)有限公司 Image classification method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110390350B (en) 2021-06-15

Similar Documents

Publication Publication Date Title
Veit et al. Learning from noisy large-scale datasets with minimal supervision
Jmour et al. Convolutional neural networks for image classification
Al-Saffar et al. Review of deep convolution neural network in image classification
CN107609601B (en) Ship target identification method based on multilayer convolutional neural network
Zahisham et al. Food recognition with resnet-50
Agrawal et al. Analyzing the performance of multilayer neural networks for object recognition
Yu et al. Leaf spot attention network for apple leaf disease identification
Oquab et al. Learning and transferring mid-level image representations using convolutional neural networks
Yang et al. Multi-scale recognition with DAG-CNNs
CN104182772B (en) A kind of gesture identification method based on deep learning
Tian et al. Multiple classifier combination for recognition of wheat leaf diseases
CN112446388A (en) Multi-category vegetable seedling identification method and system based on lightweight two-stage detection model
Yu et al. Apple leaf disease identification through region-of-interest-aware deep convolutional neural network
Xu et al. Maize diseases identification method based on multi-scale convolutional global pooling neural network
CN107016405A (en) A kind of insect image classification method based on classification prediction convolutional neural networks
CN103955702A (en) SAR image terrain classification method based on depth RBF network
CN110390350A (en) A kind of hierarchical classification method based on Bilinear Structure
Sharma et al. Implications of pooling strategies in convolutional neural networks: A deep insight
CN109241995B (en) Image identification method based on improved ArcFace loss function
CN105184298A (en) Image classification method through fast and locality-constrained low-rank coding process
CN104680173A (en) Scene classification method for remote sensing images
Hong et al. Automatic recognition of coal and gangue based on convolution neural network
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
Jagannathan et al. Efficient object detection and classification on low power embedded systems
Luan et al. Sunflower seed sorting based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant