CN109948561A - Method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network - Google Patents

Method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network

Info

Publication number
CN109948561A
CN109948561A (application CN201910227955.3A)
Authority
CN
China
Prior art keywords
image
video
feature
cross
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910227955.3A
Other languages
Chinese (zh)
Other versions
CN109948561B (en)
Inventor
荆晓远
张新玉
李森
黄鹤
姚永芳
訾璐
彭志平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Petrochemical Technology
Original Assignee
Guangdong University of Petrochemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Petrochemical Technology
Priority to CN201910227955.3A
Publication of CN109948561A
Application granted
Publication of CN109948561B
Legal status: Active
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of pedestrian re-identification and discloses a method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network. Features are extracted from the source-domain image and video datasets with an improved triplet network; a generative adversarial network is trained on the source-domain dataset and the target-domain training set; for each pedestrian image I_ti to be identified in the target training set, the trained generative adversarial network produces a deep feature; the Euclidean distance between the image's deep feature and the deep feature of each video in the target domain is computed; the video nearest to the query image is selected and assigned the same class label as the image. By eliminating the gap between images and videos with an unsupervised method, the invention greatly reduces labeling cost and improves the efficiency of pedestrian re-identification; unsupervised deep learning over images and videos of different modalities effectively improves the efficiency of cross-modal recognition.

Description

Method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network
Technical field
The invention belongs to the technical field of pedestrian re-identification and more particularly relates to a method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network, and specifically to an unsupervised image-to-video pedestrian re-identification method using a transfer network with cross-modal feature generation and target-information preservation.
Background technique
At present, the state of the art commonly used in the field is as follows:
Existing image-to-video pedestrian re-identification models rely on labeled datasets. Zhu et al. proposed a method that jointly learns a feature-mapping matrix and heterogeneous dictionaries, learning the mapping matrix together with heterogeneous image and video dictionaries. Zhang et al. proposed a similarity-learning neural network with temporal memory, comprising a feature-representation subnet and a similarity subnet: the former extracts image features with a convolutional neural network and temporal characteristics with a long short-term memory network, while the latter learns the distance metric. Wang et al. designed a point-to-set network, which first uses a k-nearest-neighbor triplet scheme as a denoiser and then feeds videos and images into a deep neural network to jointly learn a unified feature representation and a point-to-set distance metric.
Pedestrian re-identification is an important research problem in computer vision, both practical and challenging: it enables tracking and recognition of a target pedestrian and locating of missing persons without restriction in time or space. In recent years the technology has matured steadily. In image-to-video re-identification, a video of a pedestrian captured by a different device must be retrieved from a query image of that pedestrian. Because the images and videos to be matched come from different feature spaces, learning a mapping metric between them is considerably harder than metric learning within a single feature space, as in image-to-image or video-to-video matching.
Existing image-to-video pedestrian re-identification models are all built on a supervised framework and require large numbers of labeled image/video pairs to learn the mapping metric, which challenges their practical application. First, the source videos may come from cameras in cities, rural areas, or any other location, and the video samples those devices produce may carry no labels at all. Moreover, application scenarios such as suspect tracking and locating missing persons typically require fast retrieval of surveillance video from a given image, yet annotating large-scale samples incurs expensive labor and time costs. Performing image-to-video pedestrian re-identification with an unsupervised method is therefore of real practical significance.
In conclusion problem of the existing technology is:
(1) pedestrian's weight identification model of existing image to video is all based on the frame of supervision, and needs a large amount of It is measured with markd image/video to for learning mapping, and can not direct utilization measure on the video set of not tape label It practises and is matched.In actual conditions, source video sequence may be the picture pick-up device in city, rural area or any other place, these The video sample that equipment generates may not have any label.Monitor video is carried out further for the image for needing basis given The application scenarios of quick-searching are marked large-scale sample and need to pay expensive manpower and time cost, therefore have The pedestrian of the supervision application that recognition methods is realized on these video sets again is challenged.
(2) existing pedestrian weight learning method cannot handle the isomerism between data, these methods be used only cluster or The method of transfer learning, for solving the problems, such as that image is identified to the pedestrian between the isomorphisms data such as image, video to video again.And Pedestrian image and video are often by different character representations, for example, image is indicated using external appearance characteristic, it include more timing informations Video indicate often there is great wide gap between probe image and the feature of gallery video set by space-time characteristic, hinder The distance between image and video is hindered to measure and match.
Difficulties in solving the above technical problems:
How to transfer from labeled source-domain data so that metric learning can also be carried out between the images and videos of the target domain. Lacking target labels, metric learning usually cannot be performed directly; with a triplet loss, for example, each anchor sample needs a positive sample with the same label, pulled as close as possible, and a negative sample with a different label, pushed as far as possible, both found according to the data labels.
How to eliminate the gap between heterogeneous image and video data. In pedestrian re-identification, image data are described by appearance features while video data, which additionally carry temporal information, are described by spatio-temporal features; the two differ greatly in feature dimensionality and metric. The two kinds of data therefore need to be transformed into the same feature space, and the transformed target-domain data must still preserve the relative relationships of the original space before similarity can be computed.
Significance of solving the above technical problems:
The images and videos of the target domain usually carry no annotation, so metric learning cannot be applied to them directly. After transferring from the source-domain data, the labels of source-domain images and videos can drive metric learning, for example with a triplet loss; the learned feature-extraction network is then reused to extract features of the source and target domains separately. The resulting image and video features of the target domain likewise carry the corresponding distance relations and are more discriminative.
Once the heterogeneity between image features and video features is eliminated, the relationship between images and videos can be examined within a single feature space. Through the mapping into a common subspace, the unlabeled image/video data of the target domain become as similar as possible to the labeled image/video data of the source domain while retaining the structural characteristics of the target domain itself. The mapping usually takes the video features as the reference, because videos carry richer spatio-temporal information, so the loss of useful information after mapping is kept to a minimum.
Summary of the invention
In view of the problems of the prior art, the present invention provides a method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network.
The invention is realized as follows: an unsupervised image-to-video pedestrian re-identification method using a transfer network with cross-modal feature generation and target-information preservation comprises the following steps:
Step 1: extract features from the source-domain image and video dataset X = {(I_s, V_s)} with an improved triplet network.
Step 2: train a generative adversarial network on the source-domain dataset and the target-domain training set, while accounting for both cross-modal feature generation and target-information preservation.
Step 3: for a pedestrian image I_ti to be identified in the target training set, use the generative adversarial network trained in step 2 to generate the deep feature G(f_2d(I_ti)).
Step 4: compute the Euclidean distance between the image's deep feature G(f_2d(I_ti)) and the deep feature G(f_3d(V_tj)) of each video in the target domain.
Step 5: select the video nearest to the query image and assign it the same class label c as the image.
Further, in step 1, suppose the labeled source domain S = {(I_si, V_si), i = 1, ..., N_s} contains N_s image/video pairs, where I_si ∈ R^p is the i-th source-domain image and V_si ∈ R^q the corresponding source-domain video. Likewise, the unlabeled images and videos of the target domain T are denoted {I_ti} and {V_tj}, respectively. Since video features usually carry richer information than image features, the triplet network is constructed so that the distance from the video V^a containing the target pedestrian to the positive image I^p showing the same pedestrian is smaller than the distance to the negative image I^n showing any other pedestrian. The triplet loss is defined as

L_triplet = Σ_(V^a, I^p, I^n) max(0, ||f_3d(V^a) − f_2d(I^p)||² − ||f_3d(V^a) − f_2d(I^n)||² + m)   (1)

where V^a, I^p, I^n come from the source domain X, m is the margin, f_2d denotes the 2D image-feature-extraction subnet composed of 2D convolutional layers, and f_3d denotes the 3D video-feature-extraction subnet composed of 3D convolutional layers.
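For concreteness, f_2d and f_3d can be realized as small convolutional subnets, for example as sketched below in PyTorch; the layer counts, channel widths, and output dimension are illustrative placeholders, not values specified by the patent.

    import torch.nn as nn

    class Extractor2D(nn.Module):
        """f_2d: image-feature subnet built from 2D convolutional layers."""
        def __init__(self, dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))

        def forward(self, x):  # x: (batch, 3, H, W) pedestrian images
            return self.net(x)

    class Extractor3D(nn.Module):
        """f_3d: video-feature subnet built from 3D convolutional layers."""
        def __init__(self, dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(64, dim))

        def forward(self, x):  # x: (batch, 3, T, H, W) pedestrian tracklets
            return self.net(x)

Both subnets map into the same d-dimensional space, so the video-to-image distances in loss (1) are well defined.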
To make the model converge faster, "hard" triplets are preferred: given V^a, select the positive image I^p that maximizes ||f_3d(V^a) − f_2d(I^p)||² and the negative image I^n that minimizes ||f_3d(V^a) − f_2d(I^n)||². Concretely, an online triplet generator is used with a relatively large batch, and only the hardest samples (maximum positive distance, minimum negative distance) within each batch enter the loss.
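A minimal PyTorch sketch of loss (1) with the batch-hard mining just described follows; the margin value and tensor shapes are illustrative assumptions, not values fixed by the patent.

    import torch

    def batch_hard_triplet_loss(video_feat, image_feat, labels, margin=0.3):
        """Loss (1) with batch-hard mining: for each anchor video, take the
        farthest positive image and the nearest negative image in the batch.
        video_feat: (B, d) = f_3d(V); image_feat: (B, d) = f_2d(I);
        labels: (B,) pedestrian identities aligned across the two tensors."""
        dist = torch.cdist(video_feat, image_feat) ** 2        # (B, B) squared distances
        same_id = labels.unsqueeze(1) == labels.unsqueeze(0)   # (B, B) identity mask
        # Hardest positive: maximum distance among images of the same pedestrian.
        d_pos = dist.masked_fill(~same_id, float('-inf')).max(dim=1).values
        # Hardest negative: minimum distance among images of other pedestrians.
        d_neg = dist.masked_fill(same_id, float('inf')).min(dim=1).values
        return torch.relu(d_pos - d_neg + margin).mean()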
Further, in step 2, given a set of labeled pedestrian image/video pairs in the source domain S, the cross-modal subnet to be trained comprises: an extraction function f_2d for the 2D features of images, an extraction function f_3d for the 3D features of videos, a generator G for generating intermediate cross-modal features, and a discriminator D for distinguishing whether a feature comes from the source or the target domain. f_2d, f_3d, G and D are learned so that the overall cross-modal transfer loss L is minimized. L is defined as

L = L_GAN + α·L_cross-modal + β·L_target-preserving.   (2)
Here L_GAN drives the transfer network to migrate the unlabeled sample features of the target domain toward the source domain, so that the discriminator cannot distinguish source-domain features from the target-domain features generated by G. In this case the generator effectively converts target-domain features to the source domain, with the same distribution as the source. L_GAN is defined as

L_GAN = E_{X∼S}[log D(f(X))] + E_{Y∼T}[log(1 − D(G(f(Y))))]   (3)

where E_{X∼S} indicates that X comes from the source domain and E_{Y∼T} that Y comes from the target domain; D is a binary classifier over the generated features, with D(f(I_i)) and D(f(V_i)) the probabilities estimated for samples I_i ∈ S and V_i ∈ S; f is chosen as f_2d for image inputs and f_3d for video inputs.
L_cross-modal learns a common space in which cross-modal features are generated from the 2D and 3D features. Since the 3D features of pedestrian videos carry more information than the 2D features of images, such as spatio-temporal information, the generated feature vectors should resemble f_3d(V). L_cross-modal is defined as

L_cross-modal = E_{(I,V)∼S}[ ||G(f_2d(I)) − f_3d(V)||² + ||G(f_3d(V)) − f_3d(V)||² ]   (4)

where E_{(I,V)∼S} indicates that I and V come from the source domain.
L_target-preserving makes the transferred feature G(f_2d(I_t)) retain the 2D discriminative information of the target-domain image and, likewise, the generated G(f_3d(V_t)) retain the 3D discriminative information of the original video. This is necessary because the target data are unlabeled and the correspondence between images and videos is unknown. L_target-preserving is defined as

L_target-preserving = E_{(I,V)∼T}[ ||G(f_2d(I)) − f_2d(I)||² + ||G(f_3d(V)) − f_3d(V)||² ]   (5)

where E_{(I,V)∼T} indicates that I and V come from the target domain.
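Assembled in code, the three terms of equation (2) might look as in the following sketch, which assumes the discriminator D ends in a sigmoid and replaces the expectations in (3)-(5) with batch means; the patent leaves the G and D architectures abstract.

    import torch
    import torch.nn.functional as F

    def cmgtn_loss(f2d_Is, f3d_Vs, f2d_It, f3d_Vt, G, D, alpha, beta):
        """Overall transfer loss L = L_GAN + a*L_cross-modal + b*L_target-preserving.
        *_Is/*_Vs are source-domain features, *_It/*_Vt target-domain features."""
        # L_GAN (3): source features are "real"; generated target features are "fake".
        real = torch.cat([D(f2d_Is), D(f3d_Vs)])
        fake = torch.cat([D(G(f2d_It)), D(G(f3d_Vt))])
        l_gan = (F.binary_cross_entropy(real, torch.ones_like(real)) +
                 F.binary_cross_entropy(fake, torch.zeros_like(fake)))
        # L_cross-modal (4): on the source, pull generated features toward f_3d(V).
        l_cm = F.mse_loss(G(f2d_Is), f3d_Vs) + F.mse_loss(G(f3d_Vs), f3d_Vs)
        # L_target-preserving (5): on the target, keep generated features near the originals.
        l_tp = F.mse_loss(G(f2d_It), f2d_It) + F.mse_loss(G(f3d_Vt), f3d_Vt)
        return l_gan + alpha * l_cm + beta * l_tp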
The CMGTN, the proposed transfer network with cross-modal feature generation and target-information preservation, can be optimized by back-propagation; the network components f_2d, f_3d, G and D are updated as follows:
Input the training sets X = {(I_s, V_s)} and Y = {(I_t, V_t)} and the weight parameters α and β, and initialize the variables W_l, b_l of the GAN with the improved triplet network. Over M iterations, each iteration draws a batch of source- and target-domain data {(V_si, I_si) ∈ X, (V_tj, I_tj) ∈ Y}, updates the feature extractors f_2d and f_3d by gradient descent on equations (2)-(5), and updates the discriminator D by gradient descent on equation (2). Then, cycling over N samples, each pass again draws a batch of source- and target-domain data {(V_si, I_si) ∈ X, (V_tj, I_tj) ∈ Y} and updates the generator G using the modified discriminator D and equations (2)-(5).
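A minimal sketch of this alternating schedule follows; the data-loader iterators, optimizers, and constants M and N are assumed to be set up elsewhere, and cmgtn_loss refers to the sketch after equation (5).

    for m in range(M):
        (I_s, V_s), (I_t, V_t) = next(source_loader), next(target_loader)
        feats = (f2d(I_s), f3d(V_s), f2d(I_t), f3d(V_t))
        # Update the feature extractors and discriminator on the full loss (2)-(5).
        loss = cmgtn_loss(*feats, G, D, alpha, beta)
        opt_feat.zero_grad(); opt_D.zero_grad()
        loss.backward()
        opt_feat.step(); opt_D.step()
        for n in range(N):
            # Re-draw a batch and update the generator against the refreshed D.
            (I_s, V_s), (I_t, V_t) = next(source_loader), next(target_loader)
            feats = tuple(t.detach() for t in
                          (f2d(I_s), f3d(V_s), f2d(I_t), f3d(V_t)))
            g_loss = cmgtn_loss(*feats, G, D, alpha, beta)
            opt_G.zero_grad(); g_loss.backward(); opt_G.step()

In practice the adversarial term of the generator update is often flipped to its non-saturating form; the sketch keeps the single loss of equation (2) for fidelity to the text above.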
Further, in step 5, among the image/video pairs measured by Euclidean distance, the video with the smallest distance to the query image is selected as the image's match and given the same class label c as the image, under the constraint

j* = argmin_j ||G(f_2d(I_ti)) − G(f_3d(V_tj))||_2,

with the video V_tj* assigned the label c of I_ti.
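In code, step 5 reduces to a nearest-neighbor lookup in the transferred feature space; the shapes below are illustrative assumptions.

    import torch

    def assign_label(query_image_feat, gallery_video_feats):
        """Step 5: find the target-domain video nearest to the query image.
        query_image_feat: (d,) = G(f_2d(I_ti));
        gallery_video_feats: (N, d), row j = G(f_3d(V_tj)).
        Returns the index j* whose video receives the image's class label c."""
        dists = torch.norm(gallery_video_feats - query_image_feat, dim=1)
        return torch.argmin(dists).item()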
Another object of the present invention is to provide an unsupervised image-to-video pedestrian re-identification control system using the transfer network with cross-modal feature generation and target-information preservation.
Another object of the present invention is to provide a computer program implementing the described unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation.
Another object of the present invention is to provide an information data processing terminal implementing the described unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation.
Another object of the present invention is to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the described unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation.
In conclusion advantages of the present invention and good effect are as follows:
The present invention uses unsupervised method, and alleviating image, often label is a large amount of into pedestrian's weight identification mission of video Label cost is greatly saved in the predicament of missing.
The present invention utilizes transfer learning, and the metric learning of source domain is moved in aiming field, and fights network using generating, The generator learnt out can convert sample characteristics unlabelled in aiming field to source domain, and retain aiming field as far as possible Information.After migration, a large amount of unlabelled target numeric field datas can obtain feature similar with data marked in source domain, most The matching accuracy rate of image and video is promoted eventually.
The present invention considers existing Heterogeneous data between image and video features, with the video comprising more effective informations Sub-space learning is carried out based on feature, and the feature after conversion is able to maintain the structure in former domain, eliminates image and video Between wide gap, effectively increase cross-module identification efficiency.
Detailed description of the invention
Fig. 1 is a flowchart of the unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation, provided by an embodiment of the present invention.
Fig. 2 is the CMC curve with the DukeMTMC-reID dataset as the target domain, provided by an embodiment of the present invention.
Fig. 3 is the CMC curve with the MARS dataset as the target domain, provided by an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to embodiments. It should be understood that the specific embodiments described here merely illustrate the present invention and are not intended to limit it.
Existing image-to-video pedestrian re-identification models are all built on a supervised framework and require large numbers of labeled image/video pairs to learn the mapping metric, yet data in real scenes are usually unlabeled, which challenges the practical application of such models and their metric learning.
The source videos may come from cameras in cities, rural areas, or any other location, and the video samples those devices produce may carry no labels. Moreover, application scenarios such as suspect tracking and locating missing persons typically require fast retrieval of surveillance video from a given image, yet annotating large-scale samples incurs expensive labor and time costs.
To solve the above technical problems, the present invention is described in detail below with reference to the technical solution.
As shown in Fig. 1, the unsupervised image-to-video pedestrian re-identification method proposed by the present invention, using a transfer network with cross-modal feature generation and target-information preservation, comprises the following steps:
Step 1: suppose the labeled source domain S = {(I_si, V_si), i = 1, ..., N_s} contains N_s image/video pairs, where I_si ∈ R^p is the i-th source-domain image and V_si ∈ R^q the corresponding source-domain video. Likewise, the unlabeled images and videos of the target domain T are denoted {I_ti} and {V_tj}, respectively. Since video features usually carry richer information than image features, the triplet network is constructed so that the distance from the video V^a containing the target pedestrian to the positive image I^p showing the same pedestrian is smaller than the distance to the negative image I^n showing any other pedestrian; the triplet loss is defined as in equation (1), where V^a, I^p, I^n come from the source domain X, f_2d denotes the 2D image-feature-extraction subnet composed of 2D convolutional layers, and f_3d denotes the 3D video-feature-extraction subnet composed of 3D convolutional layers.
To make the model converge faster, "hard" triplets are preferred: given V^a, select the positive image I^p that maximizes ||f_3d(V^a) − f_2d(I^p)||² and the negative image I^n that minimizes ||f_3d(V^a) − f_2d(I^n)||². Concretely, an online triplet generator is used with a relatively large batch, and only the hardest samples within each batch enter the loss.
Step 2: train a generative adversarial network on the source-domain dataset and the target-domain training set. Given a set of labeled pedestrian image/video pairs in the source domain S, the cross-modal subnet to be trained comprises: an extraction function f_2d for the 2D features of images, an extraction function f_3d for the 3D features of videos, a generator G for generating intermediate cross-modal features, and a discriminator D for distinguishing whether a feature comes from the source or the target domain. f_2d, f_3d, G and D are learned so that the overall cross-modal transfer loss L is minimized:

L = L_GAN + α·L_cross-modal + β·L_target-preserving   (2)

Here L_GAN drives the transfer network to migrate the unlabeled sample features of the target domain toward the source domain, so that the discriminator cannot distinguish source-domain features from the target-domain features generated by G. In this case the generator effectively converts target-domain features to the source domain, with the same distribution as the source. L_GAN is defined as

L_GAN = E_{X∼S}[log D(f(X))] + E_{Y∼T}[log(1 − D(G(f(Y))))]   (3)

where E_{X∼S} indicates that X comes from the source domain and E_{Y∼T} that Y comes from the target domain; D is a binary classifier over the generated features, with D(f(I_i)) and D(f(V_i)) the probabilities estimated for samples I_i ∈ S and V_i ∈ S; f is chosen as f_2d for image inputs and f_3d for video inputs.
L_cross-modal learns a common space in which cross-modal features are generated from the 2D and 3D features. Since the 3D features of pedestrian videos carry more information than the 2D features of images, such as spatio-temporal information, the generated feature vectors should resemble f_3d(V). L_cross-modal is defined as

L_cross-modal = E_{(I,V)∼S}[ ||G(f_2d(I)) − f_3d(V)||² + ||G(f_3d(V)) − f_3d(V)||² ]   (4)

where E_{(I,V)∼S} indicates that I and V come from the source domain.
L_target-preserving makes the transferred feature G(f_2d(I_t)) retain the 2D discriminative information of the target-domain image and, likewise, the generated G(f_3d(V_t)) retain the 3D discriminative information of the original video. This is necessary because the target data are unlabeled and the correspondence between images and videos is unknown. L_target-preserving is defined as

L_target-preserving = E_{(I,V)∼T}[ ||G(f_2d(I)) − f_2d(I)||² + ||G(f_3d(V)) − f_3d(V)||² ]   (5)

where E_{(I,V)∼T} indicates that I and V come from the target domain.
The CMGTN can be optimized by back-propagation; the network components f_2d, f_3d, G and D are updated as follows:
Input the training sets X = {(I_s, V_s)} and Y = {(I_t, V_t)} and the weight parameters α and β, and initialize the variables W_l, b_l of the GAN with the improved triplet network. Over M iterations, each iteration draws a batch of source- and target-domain data {(V_si, I_si) ∈ X, (V_tj, I_tj) ∈ Y}, updates the feature extractors f_2d and f_3d by gradient descent on equations (2)-(5), and updates the discriminator D by gradient descent on equation (2). Then, cycling over N samples, each pass again draws a batch of source- and target-domain data {(V_si, I_si) ∈ X, (V_tj, I_tj) ∈ Y} and updates the generator G using the modified discriminator D and equations (2)-(5).
Step 3: for a pedestrian image I_ti to be identified in the target training set, generate the deep feature G(f_2d(I_ti)) with the generative adversarial network obtained in step 2.
Step 4: compute the Euclidean distance between the image feature G(f_2d(I_ti)) generated in step 3 and the deep feature G(f_3d(V_tj)) of each video in the target domain.
Step 5: among the image/video pairs measured by Euclidean distance, select the video with the smallest distance to the query image as the image's match and give it the same class label c as the image, under the constraint j* = argmin_j ||G(f_2d(I_ti)) − G(f_3d(V_tj))||_2.
In an embodiment, the present invention provides an unsupervised image-to-video pedestrian re-identification control system using the transfer network with cross-modal feature generation and target-information preservation.
The invention is further described below with reference to specific experiments.
To verify the effectiveness of applying the unsupervised method of the present invention to image-to-video pedestrian re-identification, the proposed CMGTN is compared with five other unsupervised pedestrian re-identification models: UCDTL, GRDL, CAMEL, DGM, and PUL. Since the existing unsupervised methods only apply to a single modality, the baselines are modified accordingly: JSTL features are used for their image feature extraction and IDE features for their videos. To compare against the improved triplet network, CMGTN+CM denotes feature extraction with the triplet network, while CMGTN+IV denotes extracting image and video features with JSTL and IDE, respectively.
The experiments are carried out on the MARS and DukeMTMC-reID datasets, and the results are evaluated with the cumulative match characteristic (CMC) curve and the rank-k matching rate. When training and testing on MARS, DukeMTMC-reID serves as the source-domain data; likewise, when training and testing on DukeMTMC-reID, MARS serves as the source-domain data.
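For reference, the rank-k matching rates that make up a CMC curve can be computed as sketched below, under the assumption that every query identity appears in the gallery; dataset-specific evaluation protocols are not reproduced here.

    import numpy as np

    def cmc_curve(dist_matrix, query_ids, gallery_ids, max_rank=20):
        """dist_matrix[i, j]: distance from query image i to gallery video j.
        Returns CMC values: CMC[k-1] is the fraction of queries whose correct
        identity appears among the k nearest gallery videos."""
        order = np.argsort(dist_matrix, axis=1)           # gallery ranked per query
        hits = gallery_ids[order] == query_ids[:, None]   # identity matches per rank
        first_hit = hits.argmax(axis=1)                   # rank of first correct match
        return np.array([(first_hit < k).mean() for k in range(1, max_rank + 1)])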
The CMC curve with DukeMTMC-reID as the target domain is shown in Fig. 2, and the CMC curve with MARS as the target domain in Fig. 3.
Analysis of the experimental results shows that the proposed CMGTN method achieves a higher matching rate than the five baseline methods at every rank. Taking the rank-1 matching rate as an example, on the DukeMTMC-reID dataset the CMGTN method raises the mean matching rate by at least 10.6% (= 28.6% − 18%).
The invention is further described below by comparing the methods in more detail on matching accuracy at ranks 1 through 20. The comparison results are as follows:
The experimental results show that on both datasets the proposed CMGTN method achieves higher matching rates than the existing clustering-based unsupervised methods and the existing transfer-learning-based methods: on the DukeMTMC-reID dataset it improves on the rank-20 index of the best existing method by up to 18.3% (53.2% − 34.9%), and on the MARS dataset it improves on the same index of the best existing method by up to 20.3% (65.3% − 45.0%).
The experiments also show that the feature-extraction network of the present method (CMGTN+CM) outperforms the existing JSTL image-feature and IDE video-feature methods on all indices; on the MARS dataset, the feature-extraction network proposed with the present method raises the rank-15 index by up to 16.1% (62.7% − 46.6%).
The present invention uses an unsupervised method and a transfer network with cross-modal feature generation and target-information preservation to learn from images and videos of multiple modalities, effectively improving the efficiency of cross-modal recognition.
The foregoing is merely a preferred embodiment of the present invention and does not limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. An unsupervised image-to-video pedestrian re-identification method using a transfer network with cross-modal feature generation and target-information preservation, characterized in that the method comprises the following steps:
Step 1: extracting features from the source-domain image and video dataset X = {(I_s, V_s)} with an improved triplet network;
Step 2: training a generative adversarial network on the source-domain dataset and the target-domain training set;
Step 3: for a pedestrian image I_ti to be identified in the target training set, generating the deep feature G(f_2d(I_ti)) with the generative adversarial network trained in step 2;
Step 4: computing the Euclidean distance between the image's deep feature G(f_2d(I_ti)) and the deep feature G(f_3d(V_tj)) of each video in the target domain;
Step 5: selecting the video nearest to the query image and assigning it the same class label c as the image.
2. The unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation of claim 1, characterized in that in step 1 the labeled source domain S = {(I_si, V_si), i = 1, ..., N_s} contains N_s image/video pairs (I_si, V_si), where I_si ∈ R^p is the i-th source-domain image and V_si ∈ R^q the corresponding source-domain video;
the unlabeled images and videos of the target domain T are denoted {I_ti} and {V_tj}, respectively;
the improved triplet network makes the distance from the video V^a containing the target pedestrian to the positive image I^p of the same pedestrian smaller than the distance to the negative image I^n of any other pedestrian; the triplet loss is

L_triplet = Σ_(V^a, I^p, I^n) max(0, ||f_3d(V^a) − f_2d(I^p)||² − ||f_3d(V^a) − f_2d(I^n)||² + m)

where V^a, I^p, I^n come from the source domain X, m is the margin, f_2d denotes the 2D image-feature-extraction subnet composed of 2D convolutional layers, and f_3d denotes the 3D video-feature-extraction subnet composed of 3D convolutional layers;
given V^a, the positive image I^p maximizing ||f_3d(V^a) − f_2d(I^p)||² and the negative image I^n minimizing ||f_3d(V^a) − f_2d(I^n)||² are selected, using an online triplet generator with a relatively large batch.
3. The unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation of claim 1, characterized in that in step 2, given a set of labeled pedestrian image/video pairs in the source domain S, the cross-modal subnet to be trained comprises: an extraction function f_2d for the 2D features of images, an extraction function f_3d for the 3D features of videos, a generator G for generating intermediate cross-modal features, and a discriminator D for distinguishing whether a feature comes from the source or the target domain; f_2d, f_3d, G and D are learned so that the overall cross-modal transfer loss L is minimized:

L = L_GAN + α·L_cross-modal + β·L_target-preserving

wherein L_GAN drives the transfer network to migrate the unlabeled sample features of the target domain toward the source domain so that the discriminator cannot distinguish source-domain features from the target-domain features generated by G; in this case the generator converts target-domain features to the source domain, with the same distribution as the source; L_GAN is

L_GAN = E_{X∼S}[log D(f(X))] + E_{Y∼T}[log(1 − D(G(f(Y))))]

where E_{X∼S} indicates that X comes from the source domain and E_{Y∼T} that Y comes from the target domain; D is a binary classifier over the generated features, with D(f(I_i)) and D(f(V_i)) the probabilities estimated for samples I_i ∈ S and V_i ∈ S; f is chosen as f_2d for images and f_3d for videos;
L_cross-modal learns a common space for generating cross-modal features from the 2D and 3D features; the generated feature vectors resemble f_3d(V); L_cross-modal is

L_cross-modal = E_{(I,V)∼S}[ ||G(f_2d(I)) − f_3d(V)||² + ||G(f_3d(V)) − f_3d(V)||² ]

where E_{(I,V)∼S} indicates that I and V come from the source domain;
L_target-preserving makes the transferred G(f_2d(I_t)) retain the 2D discriminative information of the target-domain image, and the generated G(f_3d(V_t)) retain the 3D discriminative information of the original video; L_target-preserving is

L_target-preserving = E_{(I,V)∼T}[ ||G(f_2d(I)) − f_2d(I)||² + ||G(f_3d(V)) − f_3d(V)||² ]

where E_{(I,V)∼T} indicates that I and V come from the target domain.
4. The unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation of claim 3, characterized in that the CMGTN is optimized by back-propagation, and the update process of the network components f_2d, f_3d, G and D comprises:
inputting the training sets X = {(I_s, V_s)} and Y = {(I_t, V_t)} and the weight parameters α and β, and initializing the variables W_l, b_l of the GAN with the improved triplet network; over M iterations, drawing at each iteration a batch of source- and target-domain data {(V_si, I_si) ∈ X, (V_tj, I_tj) ∈ Y}, updating the feature extractors f_2d and f_3d by gradient descent on the triplet loss of claim 2 and the losses L_GAN, L_cross-modal and L_target-preserving of claim 3, and updating the discriminator D by gradient descent on L_GAN; then, cycling over N samples, drawing each time a batch of source- and target-domain data {(V_si, I_si) ∈ X, (V_tj, I_tj) ∈ Y} and updating the generator G using the modified discriminator D and the same losses.
5. The unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation of claim 1, characterized in that in step 5, among the image/video pairs measured by Euclidean distance, the video with the smallest distance to the query image is selected as the image's match and given the same class label c as the image, under the constraint j* = argmin_j ||G(f_2d(I_ti)) − G(f_3d(V_tj))||_2.
6. An unsupervised image-to-video pedestrian re-identification control system using the transfer network with cross-modal feature generation and target-information preservation, implementing the unsupervised image-to-video pedestrian re-identification method of claim 1.
7. A traffic-information image/video monitoring device implementing the unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation of claim 1.
8. A computer program implementing the unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation of any one of claims 1 to 6.
9. An information data processing terminal implementing the unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation of any one of claims 1 to 6.
10. A computer-readable storage medium, comprising instructions which, when run on a computer, cause the computer to execute the unsupervised image-to-video pedestrian re-identification method using the transfer network with cross-modal feature generation and target-information preservation of any one of claims 1 to 6.
CN201910227955.3A 2019-03-25 2019-03-25 Method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network Active CN109948561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910227955.3A CN109948561B (en) 2019-03-25 2019-03-25 Method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910227955.3A CN109948561B (en) 2019-03-25 2019-03-25 Method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network

Publications (2)

Publication Number Publication Date
CN109948561A true CN109948561A (en) 2019-06-28
CN109948561B CN109948561B (en) 2019-11-08

Family

ID=67011442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910227955.3A Active CN109948561B (en) 2019-03-25 2019-03-25 Method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network

Country Status (1)

Country Link
CN (1) CN109948561B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399856A * 2019-07-31 2019-11-01 上海商汤临港智能科技有限公司 Feature extraction network training method, image processing method, device and equipment
CN110414462A * 2019-08-02 2019-11-05 中科人工智能创新技术研究院(青岛)有限公司 Unsupervised cross-domain pedestrian re-identification method and system
CN110413924A * 2019-07-18 2019-11-05 广东石油化工学院 Web page classification method based on semi-supervised multi-view learning
CN110728216A (en) * 2019-09-27 2020-01-24 西北工业大学 Unsupervised pedestrian re-identification method based on pedestrian attribute adaptive learning
CN111126360A (en) * 2019-11-15 2020-05-08 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN111257507A (en) * 2020-01-16 2020-06-09 清华大学合肥公共安全研究院 Gas concentration detection and accident early warning system based on unmanned aerial vehicle
CN111259812A (en) * 2020-01-17 2020-06-09 上海交通大学 Inland ship re-identification method and equipment based on transfer learning and storage medium
CN111611880A (en) * 2020-04-30 2020-09-01 杭州电子科技大学 Efficient pedestrian re-identification method based on unsupervised contrast learning of neural network
CN111881722A (en) * 2020-06-10 2020-11-03 广东芯盾微电子科技有限公司 Cross-age face recognition method, system, device and storage medium
CN112016682A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Video representation learning and pre-training method and device, electronic equipment and storage medium
CN112069929A (en) * 2020-08-20 2020-12-11 之江实验室 Unsupervised pedestrian re-identification method and device, electronic equipment and storage medium
CN112541453A (en) * 2020-12-18 2021-03-23 广州丰石科技有限公司 Luggage weight recognition model training and luggage weight recognition method
CN112560925A (en) * 2020-12-10 2021-03-26 中国科学院深圳先进技术研究院 Complex scene target detection data set construction method and system
CN112633071A (en) * 2020-11-30 2021-04-09 之江实验室 Pedestrian re-identification data domain adaptation method based on data style decoupling content migration
CN112861705A (en) * 2021-02-04 2021-05-28 东北林业大学 Cross-domain pedestrian re-identification method based on hybrid learning
CN112990120A (en) * 2021-04-25 2021-06-18 昆明理工大学 Cross-domain pedestrian re-identification method using camera style separation domain information
CN113313188A (en) * 2021-06-10 2021-08-27 四川大学 Cross-modal fusion target tracking method
CN114663802A (en) * 2022-02-28 2022-06-24 北京理工大学 Cross-modal video migration method of surveillance video based on characteristic space-time constraint
WO2022134104A1 (en) * 2020-12-25 2022-06-30 Alibaba Group Holding Limited Systems and methods for image-to-video re-identification
CN115482666A (en) * 2022-09-13 2022-12-16 杭州电子科技大学 Multi-graph convolution neural network traffic prediction method based on data fusion

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013191975A1 (en) * 2012-06-21 2013-12-27 Siemens Corporation Machine-learnt person re-identification
CN107133601A * 2017-05-13 2017-09-05 五邑大学 Pedestrian re-identification method based on generative adversarial network image super-resolution
CN107145900A * 2017-04-24 2017-09-08 清华大学 Pedestrian re-identification method based on consistency-constrained feature learning
CN108256439A * 2017-12-26 2018-07-06 北京大学 Pedestrian image generation method and system based on cycle generative adversarial network
CN109063776A * 2018-08-07 2018-12-21 北京旷视科技有限公司 Image re-identification network training method and device, and image re-identification method and device
CN109063607A * 2018-07-17 2018-12-21 北京迈格威科技有限公司 Method and device for determining a loss function for re-identification
CN109241816A * 2018-07-02 2019-01-18 北京交通大学 Image re-identification system based on label optimization and loss function determination method
WO2019016540A1 (en) * 2017-07-18 2019-01-24 Vision Semantics Limited Target re-identification

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013191975A1 (en) * 2012-06-21 2013-12-27 Siemens Corporation Machine-learnt person re-identification
CN107145900A * 2017-04-24 2017-09-08 清华大学 Pedestrian re-identification method based on consistency-constrained feature learning
CN107133601A * 2017-05-13 2017-09-05 五邑大学 Pedestrian re-identification method based on generative adversarial network image super-resolution
WO2019016540A1 (en) * 2017-07-18 2019-01-24 Vision Semantics Limited Target re-identification
CN108256439A * 2017-12-26 2018-07-06 北京大学 Pedestrian image generation method and system based on cycle generative adversarial network
CN109241816A * 2018-07-02 2019-01-18 北京交通大学 Image re-identification system based on label optimization and loss function determination method
CN109063607A * 2018-07-17 2018-12-21 北京迈格威科技有限公司 Method and device for determining a loss function for re-identification
CN109063776A * 2018-08-07 2018-12-21 北京旷视科技有限公司 Image re-identification network training method and device, and image re-identification method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FEI MA ET AL: "True-Color and Grayscale Video Person Re-Identification", Journal of LaTeX Class Files, 2018 *
XIAOKE ZHU ET AL: "Semi-Supervised Cross-View Projection-Based Dictionary Learning for Video-Based Person Re-Identification", IEEE Transactions on Circuits and Systems for Video Technology *
XINHONG MA ET AL: "Deep Multi-Modality Adversarial Networks for Unsupervised Domain Adaptation", IEEE Transactions on Multimedia *
张璐: "Research progress of cross-modal retrieval methods based on adversarial learning" (基于对抗学习的跨模态检索方法研究进展), 研究与开发 (Research and Development) *
朱繁: "A survey of deep-learning-based pedestrian re-identification" (基于深度学习的行人重识别研究综述), 南京师大学报 (Journal of Nanjing Normal University) *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413924A (en) * 2019-07-18 2019-11-05 广东石油化工学院 Web page classification method based on semi-supervised multi-view learning
CN110413924B (en) * 2019-07-18 2020-04-17 广东石油化工学院 Webpage classification method for semi-supervised multi-view learning
CN110399856B (en) * 2019-07-31 2021-09-14 上海商汤临港智能科技有限公司 Feature extraction network training method, image processing method, device and equipment
CN110399856A (en) * 2019-07-31 2019-11-01 上海商汤临港智能科技有限公司 Feature extraction network training method, image processing method, device and equipment
CN110414462A (en) * 2019-08-02 2019-11-05 中科人工智能创新技术研究院(青岛)有限公司 Unsupervised cross-domain pedestrian re-identification method and system
CN110728216A (en) * 2019-09-27 2020-01-24 西北工业大学 Unsupervised pedestrian re-identification method based on pedestrian attribute adaptive learning
CN111126360A (en) * 2019-11-15 2020-05-08 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN111126360B (en) * 2019-11-15 2023-03-24 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN111257507A (en) * 2020-01-16 2020-06-09 清华大学合肥公共安全研究院 Gas concentration detection and accident early warning system based on unmanned aerial vehicle
CN111259812B (en) * 2020-01-17 2023-04-18 上海交通大学 Inland ship re-identification method and equipment based on transfer learning and storage medium
CN111259812A (en) * 2020-01-17 2020-06-09 上海交通大学 Inland ship re-identification method and equipment based on transfer learning and storage medium
CN111611880B (en) * 2020-04-30 2023-06-20 杭州电子科技大学 Efficient pedestrian re-recognition method based on neural network unsupervised contrast learning
CN111611880A (en) * 2020-04-30 2020-09-01 杭州电子科技大学 Efficient pedestrian re-identification method based on unsupervised contrast learning of neural network
CN111881722A (en) * 2020-06-10 2020-11-03 广东芯盾微电子科技有限公司 Cross-age face recognition method, system, device and storage medium
CN111881722B (en) * 2020-06-10 2021-08-24 广东芯盾微电子科技有限公司 Cross-age face recognition method, system, device and storage medium
CN112016682B (en) * 2020-08-04 2024-01-26 杰创智能科技股份有限公司 Video characterization learning and pre-training method and device, electronic equipment and storage medium
CN112016682A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Video representation learning and pre-training method and device, electronic equipment and storage medium
CN112069929B (en) * 2020-08-20 2024-01-05 之江实验室 Unsupervised pedestrian re-identification method and device, electronic equipment and storage medium
CN112069929A (en) * 2020-08-20 2020-12-11 之江实验室 Unsupervised pedestrian re-identification method and device, electronic equipment and storage medium
CN112633071A (en) * 2020-11-30 2021-04-09 之江实验室 Pedestrian re-identification data domain adaptation method based on data style decoupling content migration
CN112560925A (en) * 2020-12-10 2021-03-26 中国科学院深圳先进技术研究院 Complex scene target detection data set construction method and system
CN112541453A (en) * 2020-12-18 2021-03-23 广州丰石科技有限公司 Luggage weight recognition model training and luggage weight recognition method
WO2022134104A1 (en) * 2020-12-25 2022-06-30 Alibaba Group Holding Limited Systems and methods for image-to-video re-identification
CN112861705A (en) * 2021-02-04 2021-05-28 东北林业大学 Cross-domain pedestrian re-identification method based on hybrid learning
CN112861705B (en) * 2021-02-04 2022-07-05 东北林业大学 Cross-domain pedestrian re-identification method based on hybrid learning
CN112990120A (en) * 2021-04-25 2021-06-18 昆明理工大学 Cross-domain pedestrian re-identification method using camera style separation domain information
CN112990120B (en) * 2021-04-25 2022-09-16 昆明理工大学 Cross-domain pedestrian re-identification method using camera style separation domain information
CN113313188B (en) * 2021-06-10 2022-04-12 四川大学 Cross-modal fusion target tracking method
CN113313188A (en) * 2021-06-10 2021-08-27 四川大学 Cross-modal fusion target tracking method
CN114663802A (en) * 2022-02-28 2022-06-24 北京理工大学 Cross-modal video migration method of surveillance video based on characteristic space-time constraint
CN114663802B (en) * 2022-02-28 2024-05-31 北京理工大学 Feature space-time constraint-based cross-modal video migration method for surveillance video
CN115482666A (en) * 2022-09-13 2022-12-16 杭州电子科技大学 Multi-graph convolution neural network traffic prediction method based on data fusion
CN115482666B (en) * 2022-09-13 2024-05-07 杭州电子科技大学 Multi-graph convolution neural network traffic prediction method based on data fusion

Also Published As

Publication number Publication date
CN109948561B (en) 2019-11-08

Similar Documents

Publication Publication Date Title
CN109948561B (en) Method and system for unsupervised image-to-video pedestrian re-identification based on a transfer network
Wu et al. Progressive learning for person re-identification with one example
CN111126360B (en) Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN111967294B (en) Unsupervised domain-adaptive pedestrian re-identification method
CN112308158B (en) Multi-source domain adaptive model and method based on partial feature alignment
CN112069929B (en) Unsupervised pedestrian re-identification method and device, electronic equipment and storage medium
CN108921051B (en) Pedestrian attribute recognition network and technique based on a recurrent neural network attention model
Lin et al. RSCM: Region selection and concurrency model for multi-class weather recognition
Stone et al. Autotagging facebook: Social network context improves photo annotation
CN108875816A (en) Active learning sample selection strategy fusing reliability and diversity criteria
CN109299707A (en) Unsupervised pedestrian re-identification method based on fuzzy deep clustering
CN110210335B (en) Training method, system and device for pedestrian re-recognition learning model
CN107368534B (en) Method for predicting social network user attributes
CN112819065B (en) Unsupervised pedestrian sample mining method and unsupervised pedestrian sample mining system based on multi-clustering information
CN108229347A (en) Method and apparatus for deep permutation with quasi-Gibbs structured sampling for person identification
CN106250925B (en) Zero-shot video classification method based on improved canonical correlation analysis
CN108427713A (en) Video summarization method and system for user-made videos
CN111967325A (en) Unsupervised cross-domain pedestrian re-identification method based on incremental optimization
CN110705591A (en) Heterogeneous transfer learning method based on optimal subspace learning
CN104966075B (en) Face recognition method and system based on two-dimensional discriminative features
CN111126464A (en) Image classification method based on unsupervised adversarial domain adaptation
Song et al. Robust label rectifying with consistent contrastive-learning for domain adaptive person re-identification
CN110889335B (en) Human skeleton double interaction behavior identification method based on multichannel space-time fusion network
CN115761408A (en) Knowledge distillation-based federal domain adaptation method and system
CN109214430A (en) Pedestrian re-identification method based on feature-space topology distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant