CN110414336A - Deep complementary classifier pedestrian search method with triplet margin center loss - Google Patents


Info

Publication number
CN110414336A
Authority
CN
China
Prior art keywords
pedestrian
feature
image
frame
classifier
Prior art date
Legal status
Pending
Application number
CN201910542675.1A
Other languages
Chinese (zh)
Inventor
姚睿
高存远
夏士雄
赵佳琦
周勇
牛强
袁冠
张凤荣
陈朋朋
王重秋
Current Assignee
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date
Filing date
Publication date
Application filed by China University of Mining and Technology (CUMT)
Priority to CN201910542675.1A
Publication of CN110414336A
Legal status: Pending


Classifications

    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F18/241: Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06V10/44: Extraction of image or video features; Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V40/10: Recognition of human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep complementary classifier pedestrian search method with triplet margin center loss, belonging to the field of computer vision processing. In the person re-identification part, a triplet margin center loss is proposed: building on the center loss, which effectively reduces the feature variation of the same pedestrian, it introduces the idea of the triplet loss to effectively increase the feature differences between different pedestrians. By improving the two subtasks of pedestrian search, namely detection and re-identification, the overall performance of the pedestrian search model is raised. The invention can simultaneously perform pedestrian detection and re-identification on large-scale real-scene images and plays a significant role in security applications such as urban surveillance.

Description

Deep complementary classifier pedestrian search method with triplet margin center loss
Technical field
The invention belongs to the field of computer vision processing, and more particularly relates to a deep complementary classifier pedestrian search method with triplet margin center loss in the fields of object detection and object retrieval.
Background technique
The document Xiao, Tong, et al., "End-to-End Deep Learning for Person Search," arXiv:1604.01850 (2016), integrated pedestrian detection and person re-identification into an end-to-end person search framework for the first time. Current re-identification benchmarks and methods mainly match cropped pedestrian images, but real-world scenes are rarely so ideal: cropping pedestrian images consumes a large amount of time, and real-time constraints sometimes make cropped photos unavailable. Pedestrian search therefore does not require cropped pedestrian images: candidate pedestrian boxes are first detected with a pedestrian detection method, and the person with a specific identity is then retrieved with a re-identification method.
Other work addresses the big-data, small-sample problem of pedestrian search by using a generative adversarial network to add synthetic unlabeled pedestrians, which is especially effective when pedestrians in the scene pictures are sparse.
The human/non-human decision in pedestrian detection often suffers from missed and false detections.
Summary of the invention
In view of the above technical problems, this method makes efficient use of deep complementary classifiers so that the detector can discover complementary discriminative regions. Building on the center loss, which effectively reduces the feature variation of the same pedestrian, the idea of the triplet loss is introduced to effectively increase the feature differences between different pedestrians. The performance of the whole pedestrian search model is improved through its two subtasks, detection and re-identification.
In order to achieve the above technical purposes, the invention adopts the following technical scheme:
A deep complementary classifier pedestrian search method with triplet margin center loss, comprising the following steps:
(1) Before model training, a pedestrian generative adversarial network is trained on the original images, and the network is used to synthesize new pedestrians at arbitrary positions in the original images, generating new scene pictures and thereby generating and augmenting the training dataset of the pedestrian search network;
(2) In the training stage, feature information is first extracted from the entire input scene image by a convolutional neural network;
(3) In the pedestrian detection stage, a region proposal network (RPN) is set up and deep complementary classifiers are used to obtain the candidate regions in each video frame that are likely to contain pedestrian targets; the size and position of the pedestrian candidate regions are continually refined, and their feature information is extracted;
(4) After the feature maps of the detected pedestrian candidate regions are pooled to the same size, they are fed into the re-identification network for training; the triplet margin center loss and the online instance matching loss are jointly optimized to update the pedestrian features and improve the re-identification performance of the pedestrian search model;
(5) In the model testing and prediction stage, the trained pedestrian search model performs pedestrian detection on the input scene image; after pedestrian boxes are detected, feature-similarity matching and ranking against the target pedestrian image is carried out, and the candidate with the highest matching score is the pedestrian to be retrieved.
Step 1 specifically includes:
1.1. Filter the ground-truth pedestrian boxes in the pedestrian scene images, keeping only bounding boxes whose height and width are below certain fixed values, and crop fixed-size image blocks containing those pedestrians;
1.2. From the filtered pedestrian image blocks, select the pedestrian images showing a complete body, and cover the pedestrian box with random pixel noise of value 0 or 255, i.e. random black or white; the image blocks containing the noise region serve as the training set of the pedestrian generative adversarial network;
1.3. Train the pedestrian generative adversarial model so that the network learns a mapping from a black-and-white noise box to a concrete pedestrian image;
1.4. When generating pedestrian images, crop a fixed-size image block from the scene image in which a pedestrian is to be generated, cover an arbitrary position in it with a noise box of a certain height and width, and use it as the test set of the pedestrian generative adversarial model;
1.5. Using the mapping from noise box to concrete pedestrian image learned by the network, generate the pedestrian image, paste the block back into the original scene image, and thereby complete the data generation and augmentation of the pedestrian search dataset.
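The noise-box masking of sub-steps 1.1 to 1.4 can be sketched as follows. This is a minimal illustration with plain Python lists standing in for image arrays; the function name and the toy 8 x 8 block are assumptions for illustration, not the patent's implementation.

```python
import random

def mask_with_noise(block, top, left, height, width, seed=None):
    """Cover a region of an image block with random 0/255 (black/white) pixel
    noise, as used to build the training/test sets of the pedestrian GAN."""
    rng = random.Random(seed)
    out = [row[:] for row in block]  # copy so the original block is untouched
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[0]))):
            out[r][c] = rng.choice((0, 255))  # random black or white pixel
    return out

# A toy 8 x 8 grayscale "block" with constant value 128.
block = [[128] * 8 for _ in range(8)]
masked = mask_with_noise(block, top=1, left=2, height=4, width=3, seed=0)
```

In the patent the blocks are 256 x 256 and the masked region is an actual pedestrian box; the mechanism is the same.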
Step 3 is specifically as follows:
3.1. After the features of the entire scene image have been extracted in step 2, pedestrian candidate regions are detected with Faster R-CNN and the region proposal network (RPN); a classifier with a cross-entropy loss determines whether each anchor is a pedestrian, and a smooth L1 loss regresses the position and size of the bounding boxes;
3.2. Non-maximum suppression is then used to remove duplicate detections, and 128 candidate detection boxes are kept per image; the pedestrian box features are fed into a pooling layer to obtain 7 × 7 × 2048 feature maps, followed by one fully connected layer whose output is sent to three branch networks;
3.3. The first branch is the deep complementary classifier, which learns to make the human/non-human decision;
3.4. The second branch further refines the position and size of the pedestrian boxes;
3.5. The third branch is a 256-dimensional layer whose output is the L2-normalized feature;
3.6. In the first branch, two deep complementary classifiers are set up for the human/non-human decision. Classifier A, denoted f(θA), is used first to identify the most discriminative region and produce feature map FA; an erasing operation then removes this most discriminative part of the feature map. The erased feature map is fed to the complementary classifier B, denoted f(θB), which finds the complementary discriminative feature regions and produces feature map FB;
3.7. Feature maps FA and FB are supervised with the cross-entropy loss, and the model is jointly optimized together with the other losses.
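The erasing operation between the two complementary classifiers can be illustrated schematically: given classifier A's per-location discriminativeness scores, the strongest cell is erased before the feature map is handed to classifier B. In practice the erased region would be larger than a single cell; all names here are illustrative assumptions.

```python
def erase_most_discriminative(feature_map, scores):
    """Zero out the spatial cell with the highest discriminative score so a
    complementary classifier is forced to rely on the remaining regions.
    feature_map: H x W list of feature values; scores: H x W scores from A."""
    h = max(range(len(scores)), key=lambda r: max(scores[r]))
    w = max(range(len(scores[h])), key=lambda c: scores[h][c])
    erased = [row[:] for row in feature_map]
    erased[h][w] = 0.0  # erase the most discriminative cell
    return erased, (h, w)

scores = [[0.1, 0.9, 0.2],
          [0.3, 0.4, 0.8],
          [0.2, 0.1, 0.5]]
fmap = [[1.0] * 3 for _ in range(3)]
erased, loc = erase_most_discriminative(fmap, scores)  # loc is (0, 1)
```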
Step 4 specifically includes:
4.1. The features output by the re-identification feature branch are jointly trained with the online instance matching loss and the triplet margin center loss. During back-propagation, if the classification label of the target pedestrian is t, column t of the lookup table is updated with the formula Vt ← γVt + (1 - γ)x, so that the lookup table retains the varied features of the same target pedestrian across many poses and angles, where
Vt is the stored feature of the pedestrian whose classification label is t;
γ is the update weight, which can take any value in the interval (0, 1); this method uses γ = 0.5;
4.2. Pedestrian box features appearing in the scene pictures without an identity label are used as negative samples; they are also valuable for learning the feature representation. These unlabeled features are stored in a circular queue of size Q, represented by U ∈ R^{D×Q}, a D × Q matrix, where D is the pedestrian box feature dimension after L2 normalization and Q is the queue size, set according to the actual scene. Meanwhile, the cosine similarities U^T x between the unlabeled features U and each sample x in the mini-batch are computed; after each round of iteration, new feature vectors are pushed into the queue and outdated feature vectors are evicted, forming a circular process;
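The lookup-table update and the circular queue of unlabeled features can be sketched as follows. Class and function names are assumptions, and toy 2-dimensional vectors stand in for the L2-normalized D-dimensional features.

```python
import math
from collections import deque

def update_lookup(V_t, x, gamma=0.5):
    """Running update of lookup-table column t: V_t <- gamma*V_t + (1-gamma)*x."""
    return [gamma * v + (1.0 - gamma) * xi for v, xi in zip(V_t, x)]

class UnlabeledFeatureQueue:
    """Fixed-size circular queue U of L2-normalized unlabeled pedestrian
    features. New features evict the oldest; similarities U^T x reduce to
    dot products because all stored features are unit length."""
    def __init__(self, size):
        self.buf = deque(maxlen=size)  # oldest entries are evicted automatically

    @staticmethod
    def _normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    def push(self, feature):
        self.buf.append(self._normalize(feature))

    def similarities(self, x):
        x = self._normalize(x)
        return [sum(u_i * x_i for u_i, x_i in zip(u, x)) for u in self.buf]

q = UnlabeledFeatureQueue(size=2)
q.push([1.0, 0.0])
q.push([0.0, 1.0])
q.push([1.0, 1.0])  # queue is full, so the oldest feature [1, 0] is evicted
sims = q.similarities([0.0, 1.0])
```

`deque(maxlen=Q)` gives the evict-on-push behavior of the circular queue directly; a real implementation would store the features as matrix columns for batched similarity computation.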
The triplet margin center loss function shown in formula (6) is introduced to impose a constraint on the features that carry identity labels: model training is optimized by reducing intra-class variation and increasing inter-class variation. The triplet margin center loss is trained only on labeled pedestrian features, making the model minimize the feature variation within the same pedestrian and maximize the feature variation between different pedestrians,
where Xi ∈ R^d is the feature of pedestrian box i, belonging to the class with identity label yi;
c_yi is the center feature of the class with identity label yi;
c_yj is the center feature of the class with identity label yj;
m is the number of pedestrian classes.
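The text of formula (6) does not survive in this extraction. Using the symbols defined above, a triplet-style margin center loss that pulls each feature toward its own class center while pushing the nearest other class center at least a margin δ away is commonly written as below; this is a hedged reconstruction consistent with the surrounding description, not necessarily the patent's exact formula:

```latex
L_{tmc} = \sum_{i=1}^{N} \max\!\left( \lVert X_i - c_{y_i} \rVert_2^2 + \delta
          - \min_{\substack{1 \le j \le m \\ y_j \ne y_i}} \lVert X_i - c_{y_j} \rVert_2^2,\; 0 \right)
```

where N is the number of labeled pedestrian boxes in the batch and δ is the margin.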
Step 5 is specifically as follows:
Construct the test samples for pedestrian search and feed them into the trained deep complementary classifier pedestrian search network with triplet margin center loss. Pedestrian detection is performed on the input test scene sample images; the detected candidate pedestrian box positions are fed into the re-identification network to obtain their 256-dimensional pedestrian features. The target pedestrian image is then input and its 256-dimensional pedestrian feature is obtained in the same way. Feature-similarity matching between the target feature and the pedestrian box features is done with cosine similarity, and the identity label ranked as most likely is the identity retrieval result.
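The identity retrieval described above reduces to ranking detected boxes by cosine similarity against the target feature. A minimal sketch with 3-dimensional toy vectors standing in for the 256-dimensional features; names are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def retrieve(target_feature, box_features):
    """Rank detected pedestrian-box features by cosine similarity to the
    target pedestrian's feature; the top-ranked box is the retrieval result."""
    return sorted(range(len(box_features)),
                  key=lambda i: cosine(target_feature, box_features[i]),
                  reverse=True)

target = [1.0, 0.0, 0.0]
boxes = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
order = retrieve(target, boxes)  # box 1 is most similar and ranks first
```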
The beneficial effects of the present invention are:
First, a framework is proposed that uses a generative adversarial network to perform data augmentation on the original images, thereby improving the generalization of the pedestrian search model.
Second, in the pedestrian detection stage, a deep complementary classifier for pedestrian detection is proposed that performs human/non-human classification using complementary target regions, improving the overall performance of the model, making the model converge faster, and increasing the efficiency of pedestrian detection.
Third, the triplet margin center loss is combined with online instance matching so that, even when samples of a given labeled pedestrian are few, the intra-class feature gap is minimized and the inter-class feature gap is maximized. The features learned by the model are therefore more robust and can cope with the greater challenges of real-scene datasets.
Detailed description of the invention
Fig. 1 is the network flowchart of the deep complementary classifier pedestrian search network with triplet margin center loss of the present invention.
Fig. 2 is the network flowchart of the deep complementary classifier of the present invention.
Fig. 3 is the strategy diagram of the triplet margin center loss of the present invention.
Specific embodiment
In order to make the above objectives, features and advantages of the present invention clearer and easier to understand, the present invention is further described below through specific embodiments and the accompanying drawings.
The person search task is a typical big-data, small-sample problem, because each pedestrian appears in very few pictures. It is very difficult for a model to learn discriminative features for pedestrians with so few samples, and the model easily overfits the small number of pedestrian pictures. To suppress this overfitting and promote the development of pedestrian search in practical applications, the present invention proposes a deep complementary classifier pedestrian search method with triplet margin center loss. A generative adversarial network is used to synthesize new pedestrians at arbitrary positions in the original image data. Where pedestrian detection cannot accurately and efficiently detect pedestrian boxes and make the foreground/background classification, deep complementary classifiers are used to mine complementary salient information for discriminating pedestrians, further improving the accuracy of the pedestrian detection model.
In addition, this method combines the online instance matching loss with the triplet margin center loss function to better distinguish images of the same class from images of different classes, so that the pedestrian search network learns diverse and discriminative features, effectively mitigating the impact of datasets with few, poorly diverse images per class.
As shown in Fig. 1, the deep complementary classifier pedestrian search network with triplet margin center loss of the present invention includes the following steps:
1. Generating and augmenting pedestrian search scene image samples
(a) Screen the original pedestrian search scene pictures, select the bounding boxes whose pedestrian box height is less than 70 pixels and whose width is less than 25 pixels, crop 256 × 256 image blocks containing those pedestrian boxes, and then select from them the pedestrian images with a complete body.
(b) Cover the pedestrian box with random pixel noise of value 0 or 255, i.e. random black or white, and use the 256 × 256 image blocks as the training set of the pedestrian generative adversarial network;
(c) Select the scene images in which pedestrians are to be generated, again crop 256 × 256 image blocks, cover an arbitrary position with a noise box less than 70 pixels high and less than 25 pixels wide, and feed them into the trained pedestrian generative adversarial network; after the new synthetic pedestrian images are generated, paste the blocks back into the original scene images, completing the generation and augmentation of the pedestrian search scene image samples.
2. Pedestrian detection and feature learning
(d) An existing deep convolutional network, ResNet-50 or VGG-16, is used with a transfer learning strategy: network parameters pre-trained on the ImageNet dataset are imported as the initial training parameters of the deep network. After a series of convolutions, a 1024-channel feature map is output whose size is 1/16 of the input image resolution; this feature map is shared by the pedestrian detection and re-identification parts.
(e) Faster R-CNN is set on top of the feature map, with the region proposal network (RPN) responsible for detecting pedestrian boxes; the detected pedestrian box features enter a pooling layer.
(f) The re-identification network set after the pooling layer has three branches. The first branch is the deep complementary classifier, which learns to make the pedestrian/non-pedestrian decision.
(g) The second branch further refines the position and size of the pedestrian boxes.
(h) The third branch is responsible for saving, optimizing and updating the labeled pedestrian features during training, and for searching for the target pedestrian during model testing.
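The 1/16 downsampling of the shared backbone described in sub-step (d) can be sanity-checked with a tiny helper; floor division for non-divisible resolutions is an assumption, as in typical stride-16 backbones.

```python
def feature_map_shape(height, width, stride=16, channels=1024):
    """Shape of the shared backbone feature map: 1024 channels at 1/16 of
    the input resolution, as described for the ResNet-50/VGG-16 backbone."""
    return (channels, height // stride, width // stride)

shape = feature_map_shape(600, 800)  # e.g. a 600 x 800 scene image
```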
3. Pedestrian detection with deep complementary classifiers
(i) In the first branch, two deep complementary classifiers are set up for the human/non-human decision, with the network flow shown in Fig. 2. Classifier A, denoted f(θA), is used first to identify the most discriminative region and produce feature map FA; an erasing operation then removes this most discriminative part of the feature map. The erased feature map is fed to the complementary classifier B, denoted f(θB), which finds the complementary discriminative feature regions and produces feature map FB.
(j) Feature maps FA and FB are supervised with the cross-entropy loss, and the model is jointly optimized together with the other losses;
4. Joint training of person re-identification with the online instance matching loss and the triplet margin center loss
(k) The features output by the feature branch are jointly trained with the online instance matching loss and the triplet margin center loss, whose strategy is shown in Fig. 3. During back-propagation, if the classification label of the target pedestrian is t, column t of the lookup table is updated with the formula Vt ← γVt + (1 - γ)x, so that the lookup table retains the varied features of the same target pedestrian across many poses and angles, where Vt is the stored feature of the pedestrian with classification label t, and γ is the update weight, which can take any value in the interval (0, 1); this method uses γ = 0.5;
Pedestrian box features appearing in the scene pictures without an identity label are used as negative samples; they are also valuable for learning the feature representation. These unlabeled features are stored in a circular queue of size Q, represented by U ∈ R^{D×Q}, a D × Q matrix, where D is the pedestrian box feature dimension after L2 normalization and Q is the queue size, set according to the actual scene. Meanwhile, the cosine similarities U^T x between U and each sample x in the mini-batch are computed; after each round of iteration, new feature vectors are pushed into the queue and outdated feature vectors are evicted, forming a circular process;
The triplet margin center loss function shown in formula (6) is introduced to impose a constraint on the features that carry identity labels: model training is optimized by reducing intra-class variation and increasing inter-class variation. The triplet margin center loss is trained only on labeled pedestrian features, making the model minimize the feature variation within the same pedestrian and maximize the feature variation between different pedestrians,
where Xi ∈ R^d is the feature of pedestrian box i, belonging to the class with identity label yi; c_yi is the center feature of the class with identity label yi; c_yj is the center feature of the class with identity label yj; and m is the number of pedestrian classes;
5. Testing the pedestrian search model
Construct the test samples for pedestrian search and feed them into the trained deep complementary classifier pedestrian search network with triplet margin center loss. Pedestrian detection is performed on the input test scene sample images; the detected candidate pedestrian box positions are fed into the re-identification network to obtain their 256-dimensional pedestrian features. The target pedestrian image is then input and its 256-dimensional pedestrian feature is obtained in the same way. Feature-similarity matching between it and the pedestrian box features is done with cosine similarity, and the identity label ranked as most likely is the identity retrieval result.
Specific embodiment:
S1. Generate and augment the pedestrian search scene image samples;
S2. Pedestrian detection and feature learning, with deep complementary classifiers set up for pedestrian detection;
S3. Set up the joint training of the online instance matching loss and the triplet margin center loss for person re-identification;
S4. Test the pedestrian search model and make predictions.
Step S1 specifically includes the following sub-steps:
(a) Filter the ground-truth pedestrian boxes in the pedestrian scene images, keeping only the bounding boxes whose height is less than 70 pixels and whose width is less than 25 pixels, and crop 256 × 256 image blocks containing those pedestrians;
(b) From the filtered pedestrian image blocks, select the pedestrian images showing a complete body and cover the pedestrian box with random pixel noise of value 0 or 255, i.e. random black or white; the 256 × 256 image blocks containing the noise region serve as the training set of the pedestrian generative adversarial network;
(c) Train the pedestrian generative adversarial network; in 256 × 256 image blocks cropped from the pedestrian scene images, cover an arbitrary position with a noise box less than 70 pixels high and less than 25 pixels wide as the test set; use the trained network to generate pedestrian images of different postures, paste the blocks back into the original scene images, and complete the data generation and augmentation of the pedestrian search dataset.
Step S2 is specifically as follows:
Using VGG-16 or ResNet-50, a 1024-channel feature map is output whose size is 1/16 of the input image resolution. Faster R-CNN is applied on the feature map: the region proposal network (RPN) detects pedestrian boxes, a binary softmax classifier determines whether each anchor is a pedestrian, and a smooth L1 loss regresses the position and size of the bounding boxes;
Non-maximum suppression is then used to remove duplicate detections, and 128 candidate detection boxes are kept per image; the pedestrian box features are fed into a pooling layer to obtain 7 × 7 × 2048 feature maps, followed by one fully connected layer whose output is sent to three branch networks;
The first branch is the deep complementary classifier, which learns to make the human/non-human decision;
The second branch further refines the position and size of the pedestrian boxes;
The third branch is a 256-dimensional layer whose output is the L2-normalized feature;
(d) In the first branch, two deep complementary classifiers are set up for the human/non-human decision. Classifier A, denoted f(θA), is used first to identify the most discriminative region and produce feature map FA; an erasing operation then removes this most discriminative part of the feature map. The erased feature map is fed to the complementary classifier B, denoted f(θB), which finds the complementary discriminative feature regions and produces feature map FB;
(e) Feature maps FA and FB are supervised with the cross-entropy loss, and the model is jointly optimized together with the other losses.
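The duplicate-removal step above can be illustrated with a greedy non-maximum suppression sketch. Only the cap of 128 retained boxes comes from the text; the 0.5 IoU threshold and the (x1, y1, x2, y2) box format are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5, keep_max=128):
    """Greedy non-maximum suppression: keep the highest-scoring boxes, drop
    overlapping duplicates, and retain at most keep_max candidates, matching
    the 128 candidate boxes kept per image in the text."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
        if len(kept) == keep_max:
            break
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # box 1 overlaps box 0 heavily and is suppressed
```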
Step S3 is specifically included:
(f) characteristic pattern that second branch obtains utilizes the loss of online example match and triple edge center loss connection Training is closed, in back-propagating, if the tag along sort of target pedestrian is t, is just updated in inquiry table using following formula T column, enable inquiry table to save many attitude of same target pedestrian and the various feature V under anglet←γVt+ (1- γ) x, Wherein,
VtTag along sort is represented as pedestrian's feature of t, the weight that γ setting updates can be in section (0,1) interior value, this γ=0.5 is taken in method;
To pedestrian's frame feature of the not tag identity occurred in scene picture as negative sample, for the table of learning characteristic Up to be also it is of great value, these features without tag identity are saved by setting round-robin queue Q, with U ∈ RD×QIt indicates, D × Q ties up matrix, and D is pedestrian's frame characteristic dimension after L2 regularization, and Q is the size of round-robin queue, is arranged according to actual scene big It is small, while calculating the cosine similarity U in U and minimum batch between sample xTX, after each round iteration, by new feature Vector is pressed into queue, and rejects those out-of-date feature vectors, and the process of a circulation is presented;
It introduces triple edge center loss function shown in formula (6) and constraint is realized to the feature with tag identity, By reducing difference in class, increase class inherited loss Optimized model training, triple edge center loss function only trains tool There is pedestrian's feature of label, changes model minimization with the internal feature of a group traveling together, maximize the internal feature of different pedestrians Variation
Wherein, Xi∈RdThe feature of pedestrian's frame i is represented, it is to belong to people's identity label yiClass,Represent the person Part label yiThe central feature of class,Representative's identity label yjThe central feature of class, m indicate the number of pedestrian pedestrian's classification Amount.
Step S4 includes:
Construct the test samples for pedestrian search and feed them into the trained deep complementary classifier pedestrian search network with triplet margin center loss. Pedestrian detection is performed on the input test scene sample images; the detected candidate pedestrian box positions are fed into the re-identification network to obtain their 256-dimensional pedestrian features. The target pedestrian image is then input and its 256-dimensional pedestrian feature is obtained in the same way. Feature-similarity matching between it and the pedestrian box features is done with cosine similarity, and the identity label ranked as most likely is the identity retrieval result.

Claims (5)

1. A deep complementary classifier pedestrian search method with triplet margin center loss, characterized by comprising the following steps:
(1) Before model training, a pedestrian generative adversarial network is trained on the original images, and the network is used to synthesize new pedestrians at arbitrary positions in the original images, generating new scene pictures and thereby generating and augmenting the training dataset of the pedestrian search network;
(2) In the training stage, feature information is first extracted from the entire input scene image by a convolutional neural network;
(3) In the pedestrian detection stage, a region proposal network (RPN) is set up and deep complementary classifiers are used to obtain the candidate regions in each video frame that are likely to contain pedestrian targets; the size and position of the pedestrian candidate regions are continually refined, and their feature information is extracted;
(4) After the feature maps of the detected pedestrian candidate regions are pooled to the same size, they are fed into the re-identification network for training; the triplet margin center loss and the online instance matching loss are jointly optimized to update the pedestrian features and improve the re-identification performance of the pedestrian search model;
(5) In the model testing and prediction stage, the trained pedestrian search model performs pedestrian detection on the input scene image; after pedestrian boxes are detected, feature-similarity matching and ranking against the target pedestrian image is carried out, and the candidate with the highest matching score is the pedestrian to be retrieved.
2. The deep complementary-classifier pedestrian search method with triplet margin center loss according to claim 1, characterized in that step 1 specifically comprises:
1.1. filter the ground-truth pedestrian boxes in the pedestrian scene images, retaining only bounding boxes whose height and width are below a fixed threshold, and crop a fixed-size image block containing each such pedestrian;
1.2. from the filtered pedestrian image blocks, select the pedestrian images showing the complete body, and cover the pedestrian box with random pixel noise of value 0 or 255, i.e. random black or white; the image blocks containing this noise serve as the training set of the pedestrian generative adversarial network;
1.3. by training the pedestrian generative adversarial model, make the network learn a mapping from a black-and-white noise box to a specific pedestrian image;
1.4. when generating pedestrian images, crop a fixed-size image block from the scene image in which a pedestrian is to be generated, choose an arbitrary position in it, and cover that position with a noise box of a given height and width; these blocks serve as the test set of the pedestrian generative adversarial model;
1.5. apply the noise-box-to-pedestrian mapping learned by the network to generate the pedestrian image, then paste the block back into the original scene image, completing the data generation and augmentation of the pedestrian search dataset.
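The training-pair construction in steps 1.1 and 1.2 can be sketched as follows: a fixed-size crop around a pedestrian is paired with a copy whose box region is corrupted by random black-or-white noise, giving the GAN an (input, ground-truth) pair. The crop-centering rule, sizes, and names are assumptions for illustration.

```python
import numpy as np

def make_gan_pair(scene, box, crop_size=256):
    """scene: HxWx3 uint8 image; box: (x1, y1, x2, y2) pedestrian frame.
    Returns (corrupted_block, clean_block) as a GAN training pair."""
    x1, y1, x2, y2 = box
    h, w = scene.shape[:2]
    # centre a fixed-size crop on the pedestrian, clamped to the image border
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    left = int(np.clip(cx - crop_size // 2, 0, max(w - crop_size, 0)))
    top = int(np.clip(cy - crop_size // 2, 0, max(h - crop_size, 0)))
    target = scene[top:top + crop_size, left:left + crop_size].copy()
    # corrupt the pedestrian region with random black (0) or white (255) pixels
    corrupted = target.copy()
    bx1, by1 = max(x1 - left, 0), max(y1 - top, 0)
    bx2, by2 = min(x2 - left, crop_size), min(y2 - top, crop_size)
    noise = np.random.choice([0, 255], size=(by2 - by1, bx2 - bx1, 1)).astype(np.uint8)
    corrupted[by1:by2, bx1:bx2] = noise        # broadcasts over the 3 channels
    return corrupted, target
```

Step 1.4 is the same masking applied at an arbitrary position of a scene crop, with no clean target, so the trained generator can fill the noise box with a synthesized pedestrian.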
3. The deep complementary-classifier pedestrian search method with triplet margin center loss according to claim 1, characterized in that step 3 is specifically as follows:
3.1. from the feature information of the entire scene image extracted in step 2, detect pedestrian candidate regions with Faster R-CNN and the region proposal network RPN; a classifier with a cross-entropy loss function decides whether each anchor is a pedestrian, and a smooth L1 loss function regresses the position and size of the bounding box;
3.2. then use non-maximum suppression to delete duplicate detections and retain 128 candidate detection boxes per image; feed the pedestrian boxes into a pooling layer to obtain 7 × 7 × 2048 feature maps, followed by one fully connected layer feeding three branch networks;
3.3. the first branch is the deep complementary classifier, which is trained to make the person/non-person decision;
3.4. the second branch further refines the position and size of the pedestrian box;
3.5. the third branch is a 256-dimensional fully connected layer whose output is the L2-normalized feature;
3.6. in the first branch, two deep complementary classifiers are set up for the person/non-person decision: the first classifier A, denoted f(θ_A), identifies the most discriminative regions and produces feature map F^A; an erasing operation removes the most discriminative regions from the feature map, and the erased feature map is then supplied to the complementary classifier B, denoted f(θ_B), to discover the complementary discriminative regions, producing feature map F^B;
3.7. the feature maps F^A and F^B are supervised with the cross-entropy loss function, which is combined with the other losses to jointly optimize the model.
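The erasing operation in step 3.6 can be sketched as follows: the most discriminative activations found through classifier A are zeroed out before the map is handed to complementary classifier B. The thresholding rule (erasing the top fraction of peak activations) is an assumption for illustration, not the patent's exact rule.

```python
import numpy as np

def erase_discriminative(feature_map, frac=0.2):
    """feature_map: C x H x W activations; returns the erased copy fed to classifier B."""
    saliency = feature_map.max(axis=0)             # H x W map of peak activation per position
    thresh = np.quantile(saliency, 1.0 - frac)     # cut-off for the most discriminative part
    mask = saliency < thresh                       # False where classifier A is most confident
    return feature_map * mask[None, :, :]          # zero those positions in every channel
```

Classifier B, trained on the erased maps, is thereby forced to respond to the remaining (complementary) evidence of the pedestrian.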
4. The deep complementary-classifier pedestrian search method with triplet margin center loss according to claim 1, characterized in that step 4 specifically comprises:
4.1. the feature map obtained from the second branch is trained jointly with the online instance matching loss and the triplet margin center loss; during back-propagation, if the classification label of the target pedestrian is t, the t-th column of the lookup table is updated with the following formula, enabling the lookup table to preserve the features of the same target pedestrian under many poses and viewing angles: V_t ← γV_t + (1 − γ)x, where
V_t denotes the pedestrian feature whose classification label is t;
γ is the update weight, taking values in the interval (0, 1); this method takes γ = 0.5;
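A minimal numpy sketch of the lookup-table update V_t ← γV_t + (1 − γ)x in step 4.1. The table layout (one column per labeled identity) and the re-normalization of the column after the update are assumptions for illustration, following common online-instance-matching practice.

```python
import numpy as np

def update_lookup(V, t, x, gamma=0.5):
    """V: D x N lookup table of labeled pedestrian features (one column per identity);
    x: D-dimensional feature of a sample whose classification label is t."""
    V[:, t] = gamma * V[:, t] + (1.0 - gamma) * x   # exponential moving average of class t
    V[:, t] /= np.linalg.norm(V[:, t])              # keep the column L2-normalized (assumed)
    return V
```

With γ = 0.5 the stored feature blends the old entry and the new observation equally, so the column gradually accumulates the identity's appearance across poses and angles.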
4.2. the pedestrian-box features of unlabeled identities appearing in the scene pictures serve as negative samples; since they are also valuable for feature learning, these unlabeled features are stored in a circular queue Q, denoted U ∈ R^{D×Q}, a D × Q matrix, where D is the dimension of the L2-regularized pedestrian-box features and Q is the size of the circular queue, set according to the actual scene; meanwhile the cosine similarities U^T x between the unlabeled features U and the samples x in the mini-batch are computed; after each round of iteration, new feature vectors are pushed into the queue and outdated ones are discarded, forming a cyclic process;
the triplet margin center loss function shown in formula (6) is introduced to constrain the labeled features, optimizing the model training by reducing intra-class differences and enlarging inter-class differences; the triplet margin center loss is trained only on labeled pedestrian features, making the model minimize the internal feature variation of the same pedestrian and maximize the internal feature variation between different pedestrians:
L_tmc = Σ_{i=1}^{n} max( ||X_i − c_{y_i}||₂² + α − min_{1≤j≤m, j≠y_i} ||X_i − c_{y_j}||₂², 0 )   (6)
where,
X_i ∈ R^d denotes the feature of pedestrian box i, belonging to the class with identity label y_i;
c_{y_i} denotes the center feature of the class with identity label y_i;
c_{y_j} denotes the center feature of the class with identity label y_j;
m denotes the number of pedestrian identity classes;
n denotes the number of labeled samples and α the margin.
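The loss in formula (6) can be sketched in numpy as below. Treat it as an illustrative reading of the claim (a hinge with margin α between the squared distance to a sample's own class center and the nearest other class center), not the patent's exact formulation.

```python
import numpy as np

def triplet_margin_center_loss(X, labels, centers, alpha=1.0):
    """X: n x d pedestrian-box features; labels: n identity labels in [0, m);
    centers: m x d class center features c_y."""
    total = 0.0
    for x, y in zip(X, labels):
        d_pos = np.sum((x - centers[y]) ** 2)        # pull towards own center (intra-class)
        d_neg = min(np.sum((x - c) ** 2)             # nearest wrong center (inter-class)
                    for j, c in enumerate(centers) if j != y)
        total += max(d_pos + alpha - d_neg, 0.0)     # hinge with margin alpha
    return total / len(X)
```

The hinge is zero whenever a feature is at least α closer (in squared distance) to its own center than to any other center, which is exactly the reduced-intra-class, enlarged-inter-class behavior the claim describes.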
5. The deep complementary-classifier pedestrian search method with triplet margin center loss according to claim 1, characterized in that step 5 is specifically as follows:
construct the test samples for pedestrian search, and feed them into the trained deep complementary-classifier pedestrian search network with triplet margin center loss; perform pedestrian detection on the input test scene sample image; feed the detected candidate pedestrian box positions into the pedestrian re-identification network to obtain their 256-dimensional pedestrian features; then input the target pedestrian image to likewise obtain its 256-dimensional pedestrian feature; use cosine similarity to match it against the pedestrian-box features and rank out the most probable identity label, which is the result of the identity retrieval.
CN201910542675.1A 2019-06-21 2019-06-21 A kind of depth complementation classifier pedestrian's searching method of triple edge center loss Pending CN110414336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910542675.1A CN110414336A (en) 2019-06-21 2019-06-21 A kind of depth complementation classifier pedestrian's searching method of triple edge center loss


Publications (1)

Publication Number Publication Date
CN110414336A true CN110414336A (en) 2019-11-05

Family

ID=68359564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910542675.1A Pending CN110414336A (en) 2019-06-21 2019-06-21 A kind of depth complementation classifier pedestrian's searching method of triple edge center loss

Country Status (1)

Country Link
CN (1) CN110414336A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063776A (en) * 2018-08-07 2018-12-21 北京旷视科技有限公司 Image identifies network training method, device and image recognition methods and device again again
CN109146921A (en) * 2018-07-02 2019-01-04 华中科技大学 A kind of pedestrian target tracking based on deep learning


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TONG XIAO等: "End-to-End Deep Learning for Person Search", 《ARXIV》 *
XIAOLIN ZHANG et al.: "Adversarial Complementary Learning for Weakly Supervised Object Localization", 《COMPUTER VISION FOUNDATION》 *
HE QIONGHUA: "Research and Implementation of a Face Recognition Algorithm Based on Triplet-aware Center Loss", 《China Master's Theses Full-text Database》 *
ZHAO WENXUAN: "Person Re-identification under Intelligent Surveillance", 《China Master's Theses Full-text Database》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027397A (en) * 2019-11-14 2020-04-17 上海交通大学 Method, system, medium and device for detecting comprehensive characteristic target in intelligent monitoring network
CN111027397B (en) * 2019-11-14 2023-05-12 上海交通大学 Comprehensive feature target detection method, system, medium and equipment suitable for intelligent monitoring network
CN111062479A (en) * 2019-12-19 2020-04-24 北京迈格威科技有限公司 Model rapid upgrading method and device based on neural network
CN111062479B (en) * 2019-12-19 2024-01-23 北京迈格威科技有限公司 Neural network-based rapid model upgrading method and device
CN111340700A (en) * 2020-02-21 2020-06-26 北京中科虹霸科技有限公司 Model generation method, resolution improvement method, image identification method and device
CN111340700B (en) * 2020-02-21 2023-04-25 北京中科虹霸科技有限公司 Model generation method, resolution improvement method, image recognition method and device
CN113723188A (en) * 2021-07-28 2021-11-30 国网浙江省电力有限公司电力科学研究院 Dress uniform person identity verification method combining face and gait features

Similar Documents

Publication Publication Date Title
CN110956185B (en) Method for detecting image salient object
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
Zheng et al. Cross-regional oil palm tree counting and detection via a multi-level attention domain adaptation network
Zhang et al. Multi-task cascaded convolutional networks based intelligent fruit detection for designing automated robot
CN110414336A (en) A kind of depth complementation classifier pedestrian's searching method of triple edge center loss
CN105512640B (en) A kind of people flow rate statistical method based on video sequence
CN105844292B (en) A kind of image scene mask method based on condition random field and secondary dictionary learning
CN108830188A (en) Vehicle checking method based on deep learning
CN109948425A (en) A kind of perception of structure is from paying attention to and online example polymerize matched pedestrian's searching method and device
CN109614985A (en) A kind of object detection method based on intensive connection features pyramid network
CN105389562B (en) A kind of double optimization method of the monitor video pedestrian weight recognition result of space-time restriction
CN107346420A (en) Text detection localization method under a kind of natural scene based on deep learning
CN109871875B (en) Building change detection method based on deep learning
Liu et al. Super-pixel cloud detection using hierarchical fusion CNN
CN112488229B (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN110246141A (en) It is a kind of based on joint angle point pond vehicles in complex traffic scene under vehicle image partition method
CN109344842A (en) A kind of pedestrian's recognition methods again based on semantic region expression
Song et al. A hierarchical object detection method in large-scale optical remote sensing satellite imagery using saliency detection and CNN
Zhang et al. Guided attention in cnns for occluded pedestrian detection and re-identification
CN110956158A (en) Pedestrian shielding re-identification method based on teacher and student learning frame
CN109033944A (en) A kind of all-sky aurora image classification and crucial partial structurtes localization method and system
CN106874825A (en) The training method of Face datection, detection method and device
CN107463954A (en) A kind of template matches recognition methods for obscuring different spectrogram picture
CN110533100A (en) A method of CME detection and tracking is carried out based on machine learning
Ge et al. Coarse-to-fine foraminifera image segmentation through 3D and deep features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20191105