CN108875487A - Training of a pedestrian re-identification network and pedestrian re-identification based on it - Google Patents
- Publication number
- CN108875487A (application CN201710906719.5A)
- Authority
- CN
- China
- Prior art keywords
- network
- pedestrian
- loss
- training
- negative sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The present invention provides a method, apparatus, system, and storage medium for training a pedestrian re-identification network and for pedestrian re-identification based on it. The training method of the pedestrian re-identification network comprises: pre-training a baseline network using a classification loss; and fine-tuning the pre-trained baseline network with a joint classification loss and quintuplet loss to obtain the pedestrian re-identification network. By jointly training with a classification loss and a distance loss, the training method, apparatus, system, and storage medium according to embodiments of the present invention accelerate the training process and improve accuracy; in addition, using a quintuplet in the distance-loss stage significantly shortens the training time and further improves accuracy compared with the traditional triplet, improved triplet, and quadruplet methods.
Description
Technical field
The present invention relates to the field of pedestrian re-identification technology, and more specifically to the training of a pedestrian re-identification network and to a pedestrian re-identification method, apparatus, system, and storage medium based on it.
Background art
Pedestrian re-identification, also known as person re-identification, is a technology that uses computer vision to judge whether a specific pedestrian is present in an image or video sequence. Given a surveillance image of a pedestrian, it retrieves images of that pedestrian across devices. It is intended to make up for the limited field of view of currently fixed cameras, can be combined with pedestrian detection and pedestrian tracking technology, and can be widely applied to fields such as intelligent video surveillance and intelligent security.
Existing pedestrian re-identification methods can be divided into two approaches according to their training idea. The first treats each pedestrian as a class, converting pedestrian re-identification into an image classification problem. The second extracts a feature from each pedestrian picture and trains a feature-extraction network model by minimizing the distance between the picture features of the same person while maximizing the distance between the picture features of different pedestrians; current methods of this kind include the triplet, improved triplet, and quadruplet.
However, models trained with a classification loss rarely reach a high level of accuracy, and although models trained with a distance loss are usually more accurate than the former, their training time is very long.
Summary of the invention
In view of the above problems, the present invention proposes a training scheme for a pedestrian re-identification network that combines the advantages of the two approaches, accelerating the training process and improving accuracy through a joint classification loss and distance loss. The proposed training scheme for the pedestrian re-identification network is briefly described below; more details are given in the subsequent detailed description taken in conjunction with the accompanying drawings.
According to one aspect of the present invention, a training method of a pedestrian re-identification network is provided. The training method comprises: pre-training a baseline network using a classification loss; and fine-tuning the pre-trained baseline network with a joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
In one embodiment of the present invention, pre-training the baseline network using the classification loss comprises: inputting a sample picture into the baseline network; comparing the prediction vector output by the baseline network for the sample picture with the label vector of the sample picture to obtain the classification loss; adjusting the parameters of the baseline network based on the classification loss; and repeating the above steps until the classification accuracy and classification loss essentially no longer change.
In one embodiment of the present invention, the baseline network is a residual network.
In one embodiment of the present invention, a preprocessing operation is applied to the sample picture before it is input into the baseline network.
In one embodiment of the present invention, fine-tuning the pre-trained baseline network with the joint classification loss and quintuplet loss comprises: inputting the five sample pictures of a quintuplet according to a predetermined requirement and order; calculating the classification loss based on the prediction vector output by the baseline network for each sample picture; calculating the quintuplet loss based on the feature vectors output by the baseline network for the five sample pictures; and calculating a final loss based on the calculated classification loss and the calculated quintuplet loss to serve as the loss of the pedestrian re-identification network.
In one embodiment of the present invention, the calculated classification loss is the average of the classification losses of the five sample pictures.
In one embodiment of the present invention, the quintuplet loss is defined as:

l_qt = d(positive sample 1, positive sample 2) − d(negative sample 1, negative sample 21) + d(negative sample 21, negative sample 22) − d(negative sample 1, positive sample 2) + a

where l_qt is the quintuplet loss; positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are the five sample pictures, positive sample 1 and positive sample 2 being two different pictures of a first pedestrian, negative sample 1 a picture of a second pedestrian, and negative sample 21 and negative sample 22 two different pictures of a third pedestrian; d is the distance between the feature vectors of two pictures; and a is a constant parameter set as required.
In one embodiment of the present invention, the final loss is a weighted sum of the calculated classification loss and the calculated quintuplet loss.
According to another aspect of the present invention, a training apparatus of a pedestrian re-identification network is provided. The training apparatus comprises: a pre-training module for pre-training a baseline network using a classification loss; and a fine-tuning module for fine-tuning the pre-trained baseline network with a joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
In one embodiment of the present invention, the pre-training of the baseline network by the pre-training module further comprises: inputting sample pictures into the baseline network; comparing the prediction vector output by the baseline network for each sample picture with the label vector of that sample picture to obtain the classification loss; adjusting the parameters of the baseline network based on the classification loss; and repeating the above operations until the classification accuracy and classification loss essentially no longer change.
In one embodiment of the present invention, the baseline network is a residual network.
In one embodiment of the present invention, the pre-training module is further configured to apply a preprocessing operation to the sample pictures before they are input into the baseline network.
In one embodiment of the present invention, the fine-tuning of the pre-trained baseline network by the fine-tuning module comprises: inputting the five sample pictures of a quintuplet according to a predetermined requirement and order; calculating the classification loss based on the prediction vector output by the baseline network for each sample picture; calculating the quintuplet loss based on the feature vectors output by the baseline network for the five sample pictures; and calculating a final loss based on the calculated classification loss and the calculated quintuplet loss to serve as the loss of the pedestrian re-identification network.
In one embodiment of the present invention, the calculated classification loss is the average of the classification losses of the five sample pictures.
In one embodiment of the present invention, the quintuplet loss is defined as:

l_qt = d(positive sample 1, positive sample 2) − d(negative sample 1, negative sample 21) + d(negative sample 21, negative sample 22) − d(negative sample 1, positive sample 2) + a

where l_qt is the quintuplet loss; positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are the five sample pictures, positive sample 1 and positive sample 2 being two different pictures of a first pedestrian, negative sample 1 a picture of a second pedestrian, and negative sample 21 and negative sample 22 two different pictures of a third pedestrian; d is the distance between the feature vectors of two pictures; and a is a constant parameter set as required.
In one embodiment of the present invention, the final loss is a weighted sum of the calculated classification loss and the calculated quintuplet loss.
According to another aspect of the present invention, a pedestrian re-identification method is provided, which performs pedestrian re-identification using a pedestrian re-identification network trained by the training method of any of the above items.
According to another aspect of the present invention, a pedestrian re-identification apparatus is provided, which is used to implement the above pedestrian re-identification method.
According to another aspect of the present invention, a computing system is provided, comprising a storage device and a processor, the storage device storing a computer program to be run by the processor; when run by the processor, the computer program executes the training method of the pedestrian re-identification network of any of the above items, or executes the above pedestrian re-identification method.
According to another aspect of the present invention, a storage medium is provided, on which a computer program is stored; when run, the computer program executes the training method of the pedestrian re-identification network of any of the above items, or executes the above pedestrian re-identification method.
The training method, apparatus, system, and storage medium of a pedestrian re-identification network according to embodiments of the present invention train with a joint classification loss and distance loss, which accelerates the training process and improves accuracy; in addition, using a quintuplet in the distance-loss stage can significantly shorten the training time and further improve accuracy compared with the traditional triplet, improved triplet, and quadruplet methods.
Brief description of the drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the following more detailed description of embodiments of the present invention taken in conjunction with the accompanying drawings. The drawings provide a further understanding of the embodiments of the present invention, constitute a part of the specification, and together with the embodiments serve to explain the present invention; they are not to be construed as limiting the invention. In the drawings, identical reference labels generally denote identical components or steps.
Fig. 1 shows a schematic block diagram of an exemplary electronic device for implementing the training method, apparatus, system, and storage medium of a pedestrian re-identification network according to embodiments of the present invention;
Fig. 2 shows a schematic flow chart of a training method of a pedestrian re-identification network according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of baseline-network pre-training according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of fine-tuning after baseline-network pre-training according to an embodiment of the present invention;
Fig. 5 shows a schematic block diagram of a training apparatus of a pedestrian re-identification network according to an embodiment of the present invention; and
Fig. 6 shows a schematic block diagram of a training system of a pedestrian re-identification network according to an embodiment of the present invention.
Detailed description of embodiments
To make the objects, technical solutions, and advantages of the present invention more apparent, example embodiments according to the present invention are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention described herein, without creative labor, shall fall within the scope of the present invention.
First, an exemplary electronic device 100 for implementing the training method, apparatus, system, and storage medium of a pedestrian re-identification network according to embodiments of the present invention is described with reference to Fig. 1.
As shown in Fig. 1, the electronic device 100 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, and an image acquisition device 110, which are interconnected by a bus system 112 and/or another form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are only exemplary and not restrictive; the electronic device may have other components and structures as needed.
The processor 102 may be a central processing unit (CPU) or another form of processing unit with data-processing capability and/or instruction-execution capability, and may control other components of the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to realize the client functions (realized by the processor) of the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, etc.
The output device 108 can export various information (such as image or sound) to external (such as user), and
It may include one or more of display, loudspeaker etc..
The image acquisition device 110 may acquire images desired by the user (such as photos, videos, etc.) and store the acquired images in the storage device 104 for use by other components. The image acquisition device 110 may be a camera. It should be appreciated that the image acquisition device 110 is only an example; the electronic device 100 may omit it, in which case sample pictures may be acquired with another image acquisition device and sent to the electronic device 100.
Illustratively, the exemplary electronic device for implementing the training method, apparatus, system, and storage medium of a pedestrian re-identification network according to embodiments of the present invention may be implemented as a smartphone, a tablet computer, etc.
Next, a training method 200 of a pedestrian re-identification network according to an embodiment of the present invention is described with reference to Fig. 2. As shown in Fig. 2, the training method 200 of a pedestrian re-identification network may include the following steps:
In step S210, the baseline network is pre-trained using a classification loss.
In one embodiment, the network model may first be pre-trained with a classification loss. A few dozen training iterations are usually enough to make the network converge quickly, whereas a method based on a distance loss needs at least ten times the training time to reach the same performance, so pre-training the network model with a classification loss can greatly shorten the training time.
In one embodiment, the network model pre-trained with the classification loss is called the baseline network, and the fine-tuning step described later is applied to the baseline network after pre-training. Illustratively, the baseline network may be a residual network, for example a residual network (ResNet50) pre-trained on the large-scale image recognition challenge (ImageNet). When the baseline network is such a residual network, the sample pictures may first be preprocessed before being input into the baseline network for training.
For example, the sample picture may be resized to 224 × 224 pixels, the picture format converted to the BGR channel order, and the ImageNet mean of each channel subtracted from that channel, expressed as:

new B channel = original B channel − 104.00698793
new G channel = original G channel − 116.66876762
new R channel = original R channel − 122.67891434
The above preprocessing is only exemplary and not required. In other examples, another custom convolutional network or other suitable network may be used as the baseline network, and correspondingly, other suitable preprocessing may be applied before the sample pictures are input into that baseline network.
In one embodiment, pre-training the baseline network using the classification loss in step S210 may further comprise: inputting sample pictures into the baseline network; comparing the prediction vector output by the baseline network for each sample picture with the label vector of that sample picture to obtain the classification loss; adjusting the parameters of the baseline network based on the classification loss; and repeating the above steps until the classification accuracy and classification loss essentially no longer change.
Specifically, the structure and pre-training process of the baseline network can be further understood with reference to Fig. 3. As shown in Fig. 3, an input picture (such as a preprocessed sample picture) may be input into the baseline network (shown in Fig. 3 as the residual network ResNet50). After the normalized classification layer (Softmax), the baseline network outputs one prediction vector for each sample picture; the value of the i-th element of the prediction vector indicates the probability that the picture shows the i-th person (i = 1, 2, 3, ..., N, where N is a natural number), so the elements of this vector sum to 1.
Then, the prediction vector may be compared with the label vector of the sample picture (the annotated label, for example a manual label) to obtain the classification loss. The label vector is a one-hot vector: only one element is 1 and the other elements are 0, and the position of that 1 indicates which person it is, i.e. the identity (ID) information. The classification loss is the difference between the prediction vector output by the baseline network and the label vector (for example, using a cross-entropy loss). The classification loss may then be back-propagated through the baseline network to adjust the parameters of the baseline network.
One forward computation of the prediction vector plus one backward parameter update constitutes a complete iteration. Such iterations are repeated until the classification accuracy and classification loss essentially no longer change, and training then stops. This stage usually needs only a few dozen iterations for the network to converge quickly, so the training time can be greatly shortened.
Referring back to Fig. 2, the description of the training method 200 of a pedestrian re-identification network according to the embodiment of the present invention continues with the subsequent step.
In step S220, the pre-trained baseline network is fine-tuned with a joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
In one embodiment, a quintuplet refers to five sample pictures selected, in order and according to the following requirements, from a total of three different pedestrians:
(1) picture 1: the first picture of pedestrian 1, named positive sample 1;
(2) picture 2: the second picture of pedestrian 1, different from picture 1, named positive sample 2;
(3) picture 3: the first picture of pedestrian 2, named negative sample 1;
(4) picture 4: the first picture of pedestrian 3, named negative sample 21;
(5) picture 5: the second picture of pedestrian 3, different from picture 4, named negative sample 22.
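The selection rules above can be sketched as a sampling function (an illustrative sketch under the assumption that pictures are drawn at random; the patent does not prescribe a particular sampling strategy, and all names are our own):

```python
import random

def sample_quintuplet(images_by_id, rng=random):
    """Sample five pictures from three distinct pedestrians, in the order
    (positive 1, positive 2, negative 1, negative 21, negative 22).

    `images_by_id` maps a pedestrian identity to that pedestrian's pictures;
    the identities used as pedestrians 1 and 3 need at least two pictures each.
    """
    eligible = [pid for pid, imgs in images_by_id.items() if len(imgs) >= 2]
    ped1, ped3 = rng.sample(eligible, 2)                 # pedestrians 1 and 3
    ped2 = rng.choice([pid for pid in images_by_id if pid not in (ped1, ped3)])
    pos1, pos2 = rng.sample(images_by_id[ped1], 2)       # two different pictures of pedestrian 1
    neg1 = rng.choice(images_by_id[ped2])                # one picture of pedestrian 2
    neg21, neg22 = rng.sample(images_by_id[ped3], 2)     # two different pictures of pedestrian 3
    return pos1, pos2, neg1, neg21, neg22
```

Passing a seeded `random.Random` as `rng` makes the sampling reproducible across training runs.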
In one embodiment, the quintuplet loss can be defined by the formula:

l_qt = d(positive sample 1, positive sample 2) − d(negative sample 1, negative sample 21) + d(negative sample 21, negative sample 22) − d(negative sample 1, positive sample 2) + a

where l_qt is the quintuplet loss; positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are the above five sample pictures; positive sample 1 and positive sample 2 are two different pictures of the first pedestrian, negative sample 1 is a picture of the second pedestrian, and negative sample 21 and negative sample 22 are two different pictures of the third pedestrian; a is a constant parameter set as required (for example, it may be set to 2 or any other value chosen according to actual needs); d is the distance between the feature vectors of two pictures; for example, d(positive sample 1, positive sample 2) is the distance between the feature vectors of the first and second pictures of pedestrian 1, d(negative sample 1, negative sample 21) is the distance between the feature vectors of the first picture of pedestrian 2 and the first picture of pedestrian 3, and so on. In one example, d may denote the Euclidean distance. In other examples, the quintuplet loss may also be computed from other distances between feature vectors, such as the cosine distance or the Mahalanobis distance.
The distance between the feature vectors (also called image content feature vectors) of different pictures (for example, the 2-norm Euclidean distance) defines the similarity between the pictures. When the above sample pictures are input into the pre-trained baseline network, the fully connected layer Fc (also called the feature layer, as shown in Fig. 3) outputs a feature vector corresponding to each sample picture. Assuming the feature vectors of picture 1 and picture 2 extracted by the network are f1 and f2 respectively, the feature vectors may first be regularized (normalized); the regularization formula is:

f_n = f / |f|

where |f| denotes the modulus of the vector f. Assuming f_n1 and f_n2 denote the regularized versions of f1 and f2, the 2-norm Euclidean distance is then defined as:

d = |f_n1 − f_n2|

Based on this distance d, the quintuplet loss can be calculated.
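Putting the regularization, the 2-norm distance, and the quintuplet formula together, a minimal plain-Python sketch (names are our own; in practice the feature vectors come from the Fc layer of the baseline network) might look like:

```python
import math

def normalize(f):
    """Regularize a feature vector to unit length: f_n = f / |f|."""
    norm = math.sqrt(sum(x * x for x in f))
    return [x / norm for x in f]

def euclidean_distance(f1, f2):
    """2-norm Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def quintuplet_loss(pos1, pos2, neg1, neg21, neg22, a=2.0):
    """l_qt = d(pos1, pos2) - d(neg1, neg21) + d(neg21, neg22) - d(neg1, pos2) + a,
    with d computed on the normalized feature vectors."""
    def d(u, v):
        return euclidean_distance(normalize(u), normalize(v))
    return d(pos1, pos2) - d(neg1, neg21) + d(neg21, neg22) - d(neg1, pos2) + a
```

The loss shrinks as the two positive-pair distances shrink and the two cross-pedestrian distances grow, which matches the training objective described above.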
In one embodiment, fine-tuning the pre-trained baseline network in step S220 with the joint classification loss and quintuplet loss may further comprise: inputting the five sample pictures of a quintuplet according to the predetermined requirement and order; calculating the classification loss based on the prediction vector output by the baseline network for each sample picture; calculating the quintuplet loss based on the feature vectors output by the baseline network for the five sample pictures; and calculating a final loss based on the calculated classification loss and the calculated quintuplet loss to serve as the loss of the pedestrian re-identification network.
An example process of fine-tuning the pre-trained baseline network with the joint classification loss and quintuplet loss described above is given below with reference to Fig. 4.
As shown in Fig. 4, the sample pictures of the above quintuplet (including positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22) are input into the pre-trained baseline network. Here, to distinguish it from the baseline network of the pre-training stage of step S210, the baseline network of the pre-training stage is named the ID network (IDNet) and the network of the fine-tuning stage of step S220 is named the quintuplet-ID network (Quintuplet-IDNet); it should be understood, however, that the two stages actually use the same network structure. As shown in Fig. 4, positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are input into the Quintuplet-IDNet, i.e. into the IDNet.
After the above sample pictures are input into the Quintuplet-IDNet, for each sample picture the Fc layer outputs the corresponding feature vector and the Softmax layer outputs the corresponding prediction vector. For example, as shown in Fig. 4, the feature vectors corresponding to positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are feature 1, feature 2, feature 3, feature 4, and feature 5 respectively, and the corresponding prediction vectors are ID1, ID2, ID3, ID4, and ID5 respectively.
Then, the classification loss may be calculated based on the prediction vector of each sample picture, with a calculation similar to that described for step S210. Here, since five pictures are input, the final classification loss may be the average of the classification losses of the five pictures. The quintuplet loss may then be calculated from the feature vectors of the five sample pictures, with the calculation as described above.
Finally, the final loss may be calculated based on the calculated classification loss and the calculated quintuplet loss to serve as the loss of the final pedestrian re-identification network. Illustratively, the final loss is a weighted sum of the calculated classification loss and the calculated quintuplet loss, expressed as:

Loss = λ·l_ID + (1 − λ)·l_qt

where λ is a weight parameter in the range 0 to 1 that can be adjusted as desired. Illustratively, λ may be set to 0.5.
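The final loss computation can be sketched as follows (an illustrative helper with our own names, assuming the classification losses of the five quintuplet pictures are averaged as described above):

```python
def final_loss(classification_losses, l_qt, lam=0.5):
    """Loss = lam * l_ID + (1 - lam) * l_qt.

    `classification_losses` holds the per-picture classification losses of
    the five quintuplet pictures; l_ID is their average. `lam` is the
    weight parameter in [0, 1].
    """
    l_id = sum(classification_losses) / len(classification_losses)
    return lam * l_id + (1.0 - lam) * l_qt
```

With `lam=0.5` the two loss terms contribute equally, matching the illustrative setting in the text.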
The pre-trained IDNet is fine-tuned with the above joint classification loss and quintuplet loss, and the fine-tuned Quintuplet-IDNet serves as the final pedestrian re-identification network for pedestrian re-identification.
With the trained pedestrian re-identification network, when a picture to be queried (probe) and a set of pedestrian images to be searched (gallery) are input, the feature vector of each picture can be obtained through a forward pass of the trained pedestrian re-identification network; computing the distance between the feature vector of the probe picture and the feature vector of each picture in the gallery then yields a similarity ranking. When the minimum distance between the probe picture and the gallery is smaller than a set threshold, the corresponding picture in the gallery (i.e. the most similar picture) and the probe picture are considered to show the same pedestrian, and the pedestrian re-identification task is complete.
Based on the above description, the training method of a pedestrian re-identification network according to embodiments of the present invention trains with a joint classification loss and distance loss, so that the finally trained network has the advantages of both the classification-loss-based and distance-loss-based methods, accelerating the training process and improving accuracy. In addition, using a quintuplet in the distance-loss stage can shorten the training time by about half compared with the traditional triplet, improved triplet, and quadruplet methods, and can further reduce the intra-class distance and enlarge the inter-class distance, thereby further improving accuracy.
The training method of a pedestrian re-identification network according to embodiments of the present invention has been described above exemplarily. Illustratively, the training method of a pedestrian re-identification network according to embodiments of the present invention can be realized in a device, apparatus, or system with a memory and a processor.
In addition, pedestrian according to an embodiment of the present invention identifies that the training method processing speed of network is fast again, it can be convenient ground
It is deployed in the mobile devices such as smart phone, tablet computer, personal computer.Alternatively, pedestrian according to an embodiment of the present invention
Identify that the training method of network can also be deployed in server end (or cloud) again.Alternatively, row according to an embodiment of the present invention
People identifies that the training method of network can also be deployed at server end (or cloud) and personal terminal with being distributed again.
A training device of the pedestrian re-identification network provided according to another aspect of the present invention is described below with reference to Fig. 5. Fig. 5 shows a schematic block diagram of the training device 500 of the pedestrian re-identification network according to an embodiment of the present invention.
As shown in Fig. 5, the training device 500 includes a pre-training module 510 and a fine-tuning module 520. These modules can respectively carry out the steps/functions of the training method of the pedestrian re-identification network described above in conjunction with Fig. 2. Only the main functions of the modules of the training device 500 are described below; details already given above are omitted.
The pre-training module 510 is configured to pre-train a baseline network using classification loss. The fine-tuning module 520 is configured to fine-tune the pre-trained baseline network with the joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
In one embodiment, the pre-training module 510 pre-trains the network model with classification loss; because classification loss usually makes the network converge within a few dozen training iterations, pre-training with classification loss significantly shortens the training time.
In one embodiment, the network model pre-trained by the pre-training module 510 with classification loss is referred to as the baseline network, and the subsequent fine-tuning by the fine-tuning module 520 is performed on this pre-trained baseline network. Illustratively, the baseline network may be a residual network, for example a ResNet50 pre-trained for the large-scale image recognition challenge (ImageNet). When the baseline network is such a residual network, the pre-training module 510 may first preprocess the sample pictures before inputting them to the baseline network for training. In other examples, the pre-training module 510 may use another custom convolutional network or other suitable network as the baseline network, and accordingly apply the corresponding preprocessing procedure to the sample pictures before they are input to that baseline network.
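For illustration, a common preprocessing convention for an ImageNet-pretrained ResNet50 is channel-wise normalization after scaling to [0, 1]. The text leaves the exact preprocessing open, so the statistics and steps below are an assumed, typical choice rather than the patented procedure:

```python
import numpy as np

# Channel statistics commonly used with an ImageNet-pretrained ResNet50;
# these values are a conventional assumption, not specified by the text.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(picture_uint8):
    """Turn an HxWx3 uint8 sample picture into a normalized CxHxW array."""
    x = picture_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD         # per-channel normalize
    return np.transpose(x, (2, 0, 1))              # HWC -> CHW

picture = np.full((8, 8, 3), 255, dtype=np.uint8)  # all-white toy picture
out = preprocess(picture)
```

A different baseline network would use whatever preprocessing its own pre-training assumed.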
In one embodiment, pre-training the baseline network with classification loss by the pre-training module 510 may further include: inputting sample pictures to the baseline network; comparing the predicted vector output by the baseline network for each sample picture with the label vector of that sample picture to obtain the classification loss; adjusting the parameters of the baseline network based on the classification loss; and repeating the above operations until the classification accuracy and classification loss essentially stop changing. The structure of the baseline network and the pre-training process can be further understood with reference to Fig. 3 above; for brevity, they are not repeated here.
In one embodiment, a quintuplet (five-tuple) refers to five sample pictures selected, according to prescribed requirements and order, from three different pedestrians. In one embodiment, fine-tuning the pre-trained baseline network by the fine-tuning module 520 with the joint classification loss and quintuplet loss may further include: inputting the five sample pictures of a quintuplet according to the predetermined requirements and order; computing a classification loss based on the predicted vector output by the baseline network for each sample picture; computing a quintuplet loss based on the feature vectors output by the baseline network for the five sample pictures; and computing a final loss from the computed classification loss and the computed quintuplet loss to serve as the loss of the pedestrian re-identification network. The process of fine-tuning the pre-trained baseline network with the joint classification loss and quintuplet loss can be further understood with reference to Fig. 4 above; for brevity, it is not repeated here.
Based on the above description, the training device of the pedestrian re-identification network according to embodiments of the present invention trains jointly with classification loss and distance loss, so that the finally trained network enjoys the advantages of both the classification-loss-based and the distance-loss-based approaches, which accelerates the training process and improves accuracy. In addition, by using the quintuplet method in the distance-loss stage, the training time can be cut to roughly half of that of the traditional triplet, improved triplet, and quadruplet methods, a significant reduction, while intra-class distances are further tightened and inter-class distances further enlarged, which further improves accuracy.
Fig. 6 shows a schematic block diagram of a training system 600 of the pedestrian re-identification network according to an embodiment of the present invention. The training system 600 includes a storage device 610 and a processor 620.
The storage device 610 stores program code for carrying out the corresponding steps of the training method of the pedestrian re-identification network according to embodiments of the present invention. The processor 620 runs the program code stored in the storage device 610 to execute the corresponding steps of the training method according to embodiments of the present invention, and to realize the corresponding modules of the training device of the pedestrian re-identification network according to embodiments of the present invention.
In one embodiment, when the program code is run by the processor 620, it causes the training system 600 to perform the following steps: pre-training a baseline network using classification loss; and fine-tuning the pre-trained baseline network with the joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
In one embodiment, when the program code is run by the processor 620, the pre-training of the baseline network with classification loss performed by the training system 600 includes: inputting sample pictures to the baseline network; comparing the predicted vector output by the baseline network for each sample picture with the label vector of that sample picture to obtain the classification loss; adjusting the parameters of the baseline network based on the classification loss; and repeating the above steps until the classification accuracy and classification loss essentially stop changing.
In one embodiment, the baseline network is a residual network.
In one embodiment, a preprocessing operation is applied to the sample pictures before they are input to the baseline network.
In one embodiment, when the program code is run by the processor 620, the fine-tuning of the pre-trained baseline network with the joint classification loss and quintuplet loss performed by the training system 600 includes: inputting the five sample pictures of a quintuplet according to predetermined requirements and order; computing a classification loss based on the predicted vector output by the baseline network for each sample picture; computing a quintuplet loss based on the feature vectors output by the baseline network for the five sample pictures; and computing a final loss from the computed classification loss and the computed quintuplet loss to serve as the loss of the pedestrian re-identification network.
In one embodiment, the computed classification loss is the average of the classification losses of the five sample pictures.
In one embodiment, the quintuplet loss is defined as:
l_qt = d(positive sample 1, positive sample 2) - d(negative sample 1, negative sample 21) + d(negative sample 21, negative sample 22) - d(negative sample 1, positive sample 2) + a
wherein l_qt is the quintuplet loss; positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are the five sample pictures, positive sample 1 and positive sample 2 being two different pictures of a first pedestrian, negative sample 1 being a picture of a second pedestrian, and negative sample 21 and negative sample 22 being two different pictures of a third pedestrian; d is the distance between the feature vectors of two pictures; and a is a constant parameter set according to demand.
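As an illustration, the quintuplet loss defined here can be computed directly from five feature vectors. The Euclidean distance and the toy one-dimensional features below are assumptions for demonstration; the text only requires "the distance between the feature vectors of two pictures":

```python
import numpy as np

def d(u, v):
    """Distance between two feature vectors (Euclidean here, by assumption)."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))

def quintuplet_loss(pos1, pos2, neg1, neg21, neg22, a=1.0):
    # l_qt = d(pos1, pos2) - d(neg1, neg21) + d(neg21, neg22) - d(neg1, pos2) + a
    return d(pos1, pos2) - d(neg1, neg21) + d(neg21, neg22) - d(neg1, pos2) + a

# Toy 1-D features: pedestrian A near 0, pedestrian B near 10, pedestrian C near 20.
l = quintuplet_loss(pos1=[0.0], pos2=[0.2], neg1=[10.0], neg21=[20.0], neg22=[20.5])
```

Pulling the two positive samples together and pushing the pictures of different pedestrians apart drives this quantity down, which is the behavior the fine-tuning stage optimizes.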
In one embodiment, the final loss is a weighted sum of the computed classification loss and the computed quintuplet loss.
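The weighted sum can be sketched as follows; the averaging over the five pictures follows the preceding embodiment, while the weight values are illustrative assumptions, since the text only specifies "a weighted sum":

```python
def final_loss(cls_losses, quintuplet, w_cls=1.0, w_qt=1.0):
    """Weighted sum of the averaged classification loss of the five
    quintuplet pictures and the quintuplet loss; the weights w_cls and
    w_qt are illustrative, not prescribed by the text."""
    avg_cls = sum(cls_losses) / len(cls_losses)  # average over the 5 pictures
    return w_cls * avg_cls + w_qt * quintuplet

# Classification losses of the five quintuplet pictures plus a quintuplet loss.
total = final_loss([0.2, 0.4, 0.3, 0.1, 0.5], quintuplet=0.7)
```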
In addition, according to an embodiment of the present invention, a storage medium is provided on which program instructions are stored; when the program instructions are run by a computer or processor, they execute the corresponding steps of the training method of the pedestrian re-identification network of the embodiments of the present invention, and realize the corresponding modules of the training device of the pedestrian re-identification network according to embodiments of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage unit of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media; for example, one computer-readable storage medium contains computer-readable program code for pre-training a baseline network using classification loss, and another contains computer-readable program code for fine-tuning the pre-trained baseline network with the joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
In one embodiment, the computer program instructions, when run by a computer, may realize the functional modules of the training device of the pedestrian re-identification network according to embodiments of the present invention, and/or may execute the training method of the pedestrian re-identification network according to embodiments of the present invention.
In one embodiment, the computer program instructions, when run by a computer or processor, cause the computer or processor to perform the following steps: pre-training a baseline network using classification loss; and fine-tuning the pre-trained baseline network with the joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
In one embodiment, the computer program instructions, when run by a computer or processor, cause the pre-training of the baseline network with classification loss performed by the computer or processor to include: inputting sample pictures to the baseline network; comparing the predicted vector output by the baseline network for each sample picture with the label vector of that sample picture to obtain the classification loss; adjusting the parameters of the baseline network based on the classification loss; and repeating the above steps until the classification accuracy and classification loss essentially stop changing.
In one embodiment, the baseline network is a residual network.
In one embodiment, a preprocessing operation is applied to the sample pictures before they are input to the baseline network.
In one embodiment, the computer program instructions, when run by a computer or processor, cause the fine-tuning of the pre-trained baseline network with the joint classification loss and quintuplet loss to include: inputting the five sample pictures of a quintuplet according to predetermined requirements and order; computing a classification loss based on the predicted vector output by the baseline network for each sample picture; computing a quintuplet loss based on the feature vectors output by the baseline network for the five sample pictures; and computing a final loss from the computed classification loss and the computed quintuplet loss to serve as the loss of the pedestrian re-identification network.
In one embodiment, the computed classification loss is the average of the classification losses of the five sample pictures.
In one embodiment, the quintuplet loss is defined as:
l_qt = d(positive sample 1, positive sample 2) - d(negative sample 1, negative sample 21) + d(negative sample 21, negative sample 22) - d(negative sample 1, positive sample 2) + a
wherein l_qt is the quintuplet loss; positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are the five sample pictures, positive sample 1 and positive sample 2 being two different pictures of a first pedestrian, negative sample 1 being a picture of a second pedestrian, and negative sample 21 and negative sample 22 being two different pictures of a third pedestrian; d is the distance between the feature vectors of two pictures; and a is a constant parameter set according to demand.
In one embodiment, the final loss is a weighted sum of the computed classification loss and the computed quintuplet loss.
Each module in the training device of the pedestrian re-identification network according to embodiments of the present invention may be realized by the processor of an electronic device for training the pedestrian re-identification network running computer program instructions stored in a memory, or may be realized when the computer instructions stored in the computer-readable storage medium of a computer program product according to an embodiment of the present invention are run by a computer.
The training method, device, system, and storage medium of the pedestrian re-identification network according to embodiments of the present invention train jointly with classification loss and distance loss, so that the finally trained network enjoys the advantages of both the classification-loss-based and the distance-loss-based approaches, which accelerates the training process and improves accuracy. In addition, by using the quintuplet method in the distance-loss stage, the training time can be cut to roughly half of that of the traditional triplet, improved triplet, and quadruplet methods, a significant reduction, while intra-class distances are further tightened and inter-class distances further enlarged, which further improves accuracy.
The training method, device, system, and storage medium of the pedestrian re-identification network according to embodiments of the present invention have been described above by way of example. The present invention also provides a pedestrian re-identification method that performs pedestrian re-identification using a pedestrian re-identification network trained by the training method described above. The present invention also provides a pedestrian re-identification device for implementing the pedestrian re-identification method. The present invention also provides a pedestrian re-identification system comprising a storage device and a processor, the storage device storing a computer program run by the processor, the computer program executing the pedestrian re-identification method when run by the processor. The present invention also provides a storage medium storing a computer program that executes the pedestrian re-identification method when run. Those skilled in the art can understand the pedestrian re-identification method, device, system, and storage medium according to embodiments of the present invention on the basis of the aforementioned training method, device, system, and storage medium of the pedestrian re-identification network; for brevity, they are not repeated here.
Although the example embodiments have been described here with reference to the accompanying drawings, it should be understood that the above example embodiments are merely exemplary and are not intended to limit the scope of the present invention thereto. Those of ordinary skill in the art can make various changes and modifications therein without departing from the scope and spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as claimed in the appended claims.
Those of ordinary skill in the art may appreciate that the units and algorithm steps described in conjunction with the embodiments disclosed herein can be realized by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods for each specific application to achieve the described functions, but such implementation should not be considered as exceeding the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be realized in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and there may be other division manners in actual implementation; for instance, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed.
In the specification provided here, numerous specific details are set forth. It should be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the present invention and aid the understanding of one or more of the various inventive aspects, in the description of exemplary embodiments of the present invention the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the method of the invention should not be construed as reflecting the intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the corresponding claims reflect, the inventive point lies in that fewer than all features of a single disclosed embodiment can be used to solve the corresponding technical problem. Therefore, the claims following the specific embodiments are hereby expressly incorporated into those specific embodiments, with each claim standing on its own as a separate embodiment of the present invention.
It will be understood by those skilled in the art that, except where features are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
In addition, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to realize some or all of the functions of some modules according to embodiments of the present invention. The present invention may also be implemented as programs of devices (for example, computer programs and computer program products) for executing part or all of the methods described herein. Such programs realizing the present invention may be stored on computer-readable media, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present invention, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of multiple such elements. The present invention can be realized by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.
The above is only the specific embodiments of the present invention or an explanation thereof, and the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these should be covered by the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (20)
1. A training method of a pedestrian re-identification network, characterized in that the training method comprises:
pre-training a baseline network using classification loss; and
fine-tuning the pre-trained baseline network with a joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
2. The training method according to claim 1, characterized in that the pre-training of the baseline network using classification loss comprises:
inputting sample pictures to the baseline network;
comparing the predicted vector output by the baseline network for each sample picture with the label vector of the sample picture to obtain the classification loss;
adjusting parameters of the baseline network based on the classification loss; and
repeating the above steps until the classification accuracy and classification loss essentially stop changing.
3. The training method according to claim 2, characterized in that the baseline network is a residual network.
4. The training method according to claim 3, characterized in that a preprocessing operation is applied to the sample pictures before they are input to the baseline network.
5. The training method according to claim 1, characterized in that the fine-tuning of the pre-trained baseline network with the joint classification loss and quintuplet loss comprises:
inputting five sample pictures of a quintuplet according to predetermined requirements and order;
computing a classification loss based on the predicted vector output by the baseline network for each sample picture;
computing a quintuplet loss based on the feature vectors output by the baseline network for the five sample pictures; and
computing a final loss from the computed classification loss and the computed quintuplet loss to serve as the loss of the pedestrian re-identification network.
6. The training method according to claim 5, characterized in that the computed classification loss is the average of the classification losses of the five sample pictures.
7. The training method according to claim 5, characterized in that the quintuplet loss is defined as:
l_qt = d(positive sample 1, positive sample 2) - d(negative sample 1, negative sample 21) + d(negative sample 21, negative sample 22) - d(negative sample 1, positive sample 2) + a
wherein l_qt is the quintuplet loss; positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are the five sample pictures, positive sample 1 and positive sample 2 being two different pictures of a first pedestrian, negative sample 1 being a picture of a second pedestrian, and negative sample 21 and negative sample 22 being two different pictures of a third pedestrian; d is the distance between the feature vectors of two pictures; and a is a constant parameter set according to demand.
8. The training method according to any one of claims 5-7, characterized in that the final loss is a weighted sum of the computed classification loss and the computed quintuplet loss.
9. A training device of a pedestrian re-identification network, characterized in that the training device comprises:
a pre-training module for pre-training a baseline network using classification loss; and
a fine-tuning module for fine-tuning the pre-trained baseline network with the joint classification loss and quintuplet loss to obtain the pedestrian re-identification network.
10. The training device according to claim 9, characterized in that the pre-training of the baseline network by the pre-training module further comprises:
inputting sample pictures to the baseline network;
comparing the predicted vector output by the baseline network for each sample picture with the label vector of the sample picture to obtain the classification loss;
adjusting parameters of the baseline network based on the classification loss; and
repeating the above operations until the classification accuracy and classification loss essentially stop changing.
11. The training device according to claim 10, characterized in that the baseline network is a residual network.
12. The training device according to claim 11, characterized in that the pre-training module is further configured to apply a preprocessing operation to the sample pictures before inputting them to the baseline network.
13. The training device according to claim 9, characterized in that the fine-tuning of the pre-trained baseline network by the fine-tuning module comprises:
inputting five sample pictures of a quintuplet according to predetermined requirements and order;
computing a classification loss based on the predicted vector output by the baseline network for each sample picture;
computing a quintuplet loss based on the feature vectors output by the baseline network for the five sample pictures; and
computing a final loss from the computed classification loss and the computed quintuplet loss to serve as the loss of the pedestrian re-identification network.
14. The training device according to claim 13, characterized in that the computed classification loss is the average of the classification losses of the five sample pictures.
15. The training device according to claim 13, characterized in that the quintuplet loss is defined as:
l_qt = d(positive sample 1, positive sample 2) - d(negative sample 1, negative sample 21) + d(negative sample 21, negative sample 22) - d(negative sample 1, positive sample 2) + a
wherein l_qt is the quintuplet loss; positive sample 1, positive sample 2, negative sample 1, negative sample 21, and negative sample 22 are the five sample pictures, positive sample 1 and positive sample 2 being two different pictures of a first pedestrian, negative sample 1 being a picture of a second pedestrian, and negative sample 21 and negative sample 22 being two different pictures of a third pedestrian; d is the distance between the feature vectors of two pictures; and a is a constant parameter set according to demand.
16. The training device according to any one of claims 13-15, characterized in that the final loss is a weighted sum of the computed classification loss and the computed quintuplet loss.
17. A pedestrian re-identification method, characterized in that the pedestrian re-identification method performs pedestrian re-identification using a pedestrian re-identification network trained by the training method of the pedestrian re-identification network according to any one of claims 1-8.
18. A pedestrian re-identification device, characterized in that the pedestrian re-identification device is for implementing the pedestrian re-identification method according to claim 17.
19. A computing system, characterized in that the system comprises a storage device and a processor, the storage device storing a computer program run by the processor, the computer program, when run by the processor, executing the training method of the pedestrian re-identification network according to any one of claims 1-8 or executing the pedestrian re-identification method according to claim 17.
20. A storage medium, characterized in that a computer program is stored on the storage medium, the computer program, when run, executing the training method of the pedestrian re-identification network according to any one of claims 1-8 or executing the pedestrian re-identification method according to claim 17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710906719.5A CN108875487B (en) | 2017-09-29 | 2017-09-29 | Training of pedestrian re-recognition network and pedestrian re-recognition based on training |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108875487A true CN108875487A (en) | 2018-11-23 |
CN108875487B CN108875487B (en) | 2021-06-15 |
Family
ID=64325776
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710906719.5A Active CN108875487B (en) | 2017-09-29 | 2017-09-29 | Training of pedestrian re-recognition network and pedestrian re-recognition based on training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108875487B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105975959A (en) * | 2016-06-14 | 2016-09-28 | 广州视源电子科技股份有限公司 | Face feature extraction modeling and face recognition method and device based on neural network |
CN106203533A (en) * | 2016-07-26 | 2016-12-07 | 厦门大学 | Deep learning face verification method based on joint training |
CN106250870A (en) * | 2016-08-16 | 2016-12-21 | 电子科技大学 | Pedestrian re-identification method combining local and global similarity metric learning |
CN106778527A (en) * | 2016-11-28 | 2017-05-31 | 中通服公众信息产业股份有限公司 | Improved neural-network pedestrian re-identification method based on triplet loss |
WO2017123477A1 (en) * | 2016-01-11 | 2017-07-20 | Flir Systems, Inc. | Vehicle based radar upsampling |
CN107038448A (en) * | 2017-03-01 | 2017-08-11 | 中国科学院自动化研究所 | Target detection model building method |
CN107103281A (en) * | 2017-03-10 | 2017-08-29 | 中山大学 | Face recognition method based on aggregation loss metric learning |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321965A (en) * | 2019-07-10 | 2019-10-11 | 腾讯科技(深圳)有限公司 | The method and device that the training method of object weight identification model, object identify again |
CN110321965B (en) * | 2019-07-10 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Training method of object re-recognition model, and object re-recognition method and device |
CN110688888B (en) * | 2019-08-02 | 2022-08-05 | 杭州未名信科科技有限公司 | Pedestrian attribute identification method and system based on deep learning |
CN110688888A (en) * | 2019-08-02 | 2020-01-14 | 浙江省北大信息技术高等研究院 | Pedestrian attribute identification method and system based on deep learning |
CN110728216A (en) * | 2019-09-27 | 2020-01-24 | 西北工业大学 | Unsupervised pedestrian re-identification method based on pedestrian attribute adaptive learning |
CN111611846A (en) * | 2020-03-31 | 2020-09-01 | 北京迈格威科技有限公司 | Pedestrian re-identification method and device, electronic equipment and storage medium |
CN111597876A (en) * | 2020-04-01 | 2020-08-28 | 浙江工业大学 | Cross-modal pedestrian re-identification method based on difficult quintuple |
CN111738172A (en) * | 2020-06-24 | 2020-10-02 | 中国科学院自动化研究所 | Cross-domain target re-identification method based on feature adversarial learning and self-similarity clustering |
CN111738172B (en) * | 2020-06-24 | 2021-02-12 | 中国科学院自动化研究所 | Cross-domain target re-identification method based on feature adversarial learning and self-similarity clustering |
CN111814655A (en) * | 2020-07-03 | 2020-10-23 | 浙江大华技术股份有限公司 | Target re-identification method, network training method thereof and related device |
CN111814655B (en) * | 2020-07-03 | 2023-09-01 | 浙江大华技术股份有限公司 | Target re-identification method, network training method thereof and related device |
US11948391B2 (en) | 2020-11-25 | 2024-04-02 | Boe Technology Group Co., Ltd. | Model training method and apparatus, electronic device and readable storage medium |
CN114067356A (en) * | 2021-10-21 | 2022-02-18 | 电子科技大学 | Pedestrian re-identification method based on joint local guidance and attribute clustering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108875487A (en) | Pedestrian re-identification network training and pedestrian re-identification based thereon | |
CN108197532B (en) | Face recognition method and apparatus, and computer device | |
CN108875486A (en) | Object recognition method, apparatus, system and computer-readable medium | |
CN108875932A (en) | Image recognition method, apparatus and system, and storage medium | |
WO2018227800A1 (en) | Neural network training method and device | |
CN110309706A (en) | Face key point detection method and apparatus, computer device, and storage medium | |
CN106203305A (en) | Face liveness detection method and device | |
CN108337000A (en) | Automated method for conversion to a lower-precision data format | |
CN109086811A (en) | Multi-label image classification method and apparatus, and electronic device | |
CN110532884A (en) | Pedestrian re-identification method, apparatus and computer-readable storage medium | |
CN108229419A (en) | Method and apparatus for clustering images | |
CN109934173A (en) | Facial expression recognition method and apparatus, and electronic device | |
CN108875492A (en) | Face detection and key point localization method, apparatus, system and storage medium | |
CN106096028A (en) | Cultural relic retrieval method and apparatus based on image recognition | |
CN110009614A (en) | Method and apparatus for outputting information | |
CN113128478B (en) | Model training method, pedestrian analysis method, apparatus, device and storage medium | |
CN110210513A (en) | Data classification method, apparatus and terminal device | |
CN110163205A (en) | Image processing method, apparatus, medium and computing device | |
CN110298240A (en) | Vehicle user identification method, apparatus, system and storage medium | |
CN110347724A (en) | Abnormal behavior recognition method, apparatus, electronic device and medium | |
CN108875517A (en) | Video processing method, apparatus and system, and storage medium | |
CN107958247A (en) | Method and apparatus for facial image recognition | |
CN110516734A (en) | Image matching method, apparatus, device and storage medium | |
CN106778910A (en) | Deep learning system and method based on local training | |
CN110225368A (en) | Video positioning method and apparatus, and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||