CN110046608B

CN110046608B - Leaf-occluded pedestrian re-identification method and system based on semi-coupled discriminative dictionary learning

Info

Publication number: CN110046608B
Application number: CN201910344098.5A
Authority: CN
Inventors: 荆晓远; 马飞; 朱小柯; 黄鹤; 姚永芳; 彭志平
Original assignee: Guangdong University of Petrochemical Technology
Current assignee: Guangdong University of Petrochemical Technology
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2020-01-07
Anticipated expiration: 2039-04-26
Also published as: CN110046608A

Abstract

The invention belongs to the technical field of image retrieval and discloses a leaf-occluded pedestrian re-identification method and system based on semi-coupled discriminative dictionary learning. Real branch-and-leaf occlusion is added to two public pedestrian re-identification data sets, so that each contains occluded videos and normal videos. Features are then extracted separately from the occluded videos and the normal videos. The extracted sample features are processed, a dictionary learning method is introduced, and projection matrices are learned from the occluded videos and the normal videos; a discriminative criterion is introduced to learn the dictionary pair. The method learns a semi-coupled mapping matrix and a discriminative dictionary pair: the semi-coupled mapping matrix compensates for the difference between occluded and normal videos, while the discriminative dictionary pair makes the same person more compact across cameras and separates different persons across cameras. Experimental results on the 2 public data sets demonstrate that the proposed method has better recognition performance.

Description

Leaf-occluded pedestrian re-identification method and system based on semi-coupled discriminative dictionary learning

Technical Field

The invention belongs to the technical field of image retrieval, and particularly relates to a leaf-occluded pedestrian re-identification method and system based on semi-coupled discriminative dictionary learning.

Background

Currently, the closest prior art is as follows:

Pedestrian re-identification plays an important role in video surveillance and smart cities and has been widely studied in recent years. Its purpose is to retrieve images of the same person from another image set. Most methods solve the problem in normal scenes to a certain extent, but actual scenes involve illumination changes, viewpoint changes and occlusion. In summer in particular, branches and leaves that are not trimmed in time can occlude a camera, so that a person passes an occluded camera during one period and an unoccluded camera during another. In this case, the probe video is captured by the occluded camera and the gallery video by the normal camera. Pedestrian re-identification in this scene is called video-based pedestrian re-identification under branch-and-leaf occlusion. Occlusion of a video or image causes the loss of useful information such as visual appearance features and spatio-temporal features; most existing methods are designed for normal scenes and cannot solve the problem well in occluded scenes.

However, video-based pedestrian re-identification under occlusion is a common and important application that has so far received little research attention. Although some pedestrian re-identification methods study light occlusion and partial body occlusion, they focus mainly on image-based identification. A video contains not only visual appearance information but also spatio-temporal information, which is very effective for video-based pedestrian re-identification.

In summary, the problems of the prior art are as follows:

(1) The prior art cannot exploit the visual and spatio-temporal information contained in a video, and cannot solve the problem of pedestrian re-identification under branch-and-leaf occlusion.

(2) No pedestrian re-identification data set exists for occluded scenes. The invention therefore re-creates 2 public data sets: real occlusion templates (real branches and leaves) are applied to PRID2011 and iLIDS-VID to simulate occlusion, forming 2 new data sets, LO-PRID2011 and LO-iLIDS-VID.

The difficulty of solving the technical problems is as follows:

Most methods solve the problem of pedestrian re-identification in normal scenes to a certain extent, but actual scenes involve illumination changes, viewpoint changes and occlusion. In summer in particular, branches and leaves that are not trimmed in time can occlude a camera, so that a person passes an occluded camera during one period and an unoccluded camera during another. In this case, the probe video is captured by the occluded camera and the gallery video by the normal camera.

The significance of solving the technical problems is as follows:

The invention refers to pedestrian re-identification in scenes occluded by branches and leaves as video-based pedestrian re-identification under branch-and-leaf occlusion. Occlusion of a video or image causes the loss of useful information such as visual appearance features and spatio-temporal features; most existing methods are designed for normal scenes and cannot solve the problem well in occluded scenes. To address these problems, the invention provides a leaf-occluded pedestrian re-identification method and system based on semi-coupled discriminative dictionary learning.

Disclosure of Invention

To address the problems in the prior art, the invention provides a leaf-occluded pedestrian re-identification method and system based on semi-coupled discriminative dictionary learning.

The invention is realized as follows. A leaf-occluded pedestrian re-identification method based on semi-coupled discriminative dictionary learning comprises the following:

First, data sets are collected: real branch-and-leaf occlusion is added to two public pedestrian re-identification data sets, so that they contain occluded videos and normal videos. Features are then extracted separately from the occluded videos and the normal videos. Next, the extracted sample features are processed: for the feature set of each person in the two cameras, an intra-set projection is learned. A dictionary learning method is introduced to learn a dictionary pair for the occluded and normal videos, and a projection matrix is learned from the occluded and normal videos. Finally, a discriminative criterion is introduced into the learning of the dictionary pair, which yields the leaf-occluded pedestrian re-identification algorithm based on semi-coupled discriminative dictionary learning. The invention addresses the problem of pedestrian re-identification under branch-and-leaf occlusion and proposes, for the first time, the leaf-occluded pedestrian re-identification technique based on semi-coupled discriminative dictionary learning (SCD²L). The technique learns a semi-coupled mapping matrix and a discriminative dictionary pair: the semi-coupled mapping matrix compensates for the difference between occluded and normal videos, while the discriminative dictionary pair makes the same person more compact across cameras and separates different persons across cameras. Experimental results on 2 public data sets show that the proposed method has better recognition performance.

Further, the leaf-occluded pedestrian re-identification method based on semi-coupled discriminative dictionary learning comprises the following steps:

step (1): collecting data and establishing a database, wherein the database consists of occluded videos and normal videos;

step (2): extracting features separately from the occluded videos and the normal videos;

step (3): designing a dictionary pair with a semi-coupled mapping;

step (4): learning the dictionary pair using a discriminative criterion;

step (5): learning a set projection matrix W for the occluded videos and a set projection matrix V for the unoccluded videos;

step (6): obtaining the overall objective function;

step (7): optimizing the objective function;

step (8): re-identifying pedestrians in the occluded videos.

Further, the specific method for collecting data and establishing the database in step (1) is as follows:

The invention addresses pedestrian re-identification between an occluded scene and a normal scene. Because no pedestrian re-identification data set exists for occluded scenes, the invention re-creates 2 public data sets: real occlusion templates (real branches and leaves) are applied to PRID2011 and iLIDS-VID to simulate occlusion, forming 2 new data sets, LO-PRID2011 and LO-iLIDS-VID. The PRID2011 data set contains 385 persons in camera A and 749 persons in camera B, of whom 200 persons appear in both views; each video consists of 5 to 675 image frames. To ensure an effective length for each gait cycle, the invention selects the 178 persons whose videos have more than 20 frames, randomly selects 89 pairs from them for training, and uses the remaining pairs for testing. The invention uses an occlusion template to add occlusion to PRID2011 and thereby generate the LO-PRID2011 data set. Similarly, iLIDS-VID contains 300 pedestrians in 2 cameras, with 23 to 192 image frames per video and 73 frames on average; the invention overlays the iLIDS-VID data set with an occlusion template to generate the LO-iLIDS-VID data set. The simulation process is similar to that used to generate the LO-PRID2011 data set. The resulting occluded data sets can be obtained from https: i// sites. ***. com/site/LODB.
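The overlay of an occlusion template onto video frames can be illustrated with a minimal sketch. The text does not describe the tooling that was used, so the function names, the boolean-mask representation of the leaf template, and the array shapes below are all assumptions made for illustration.

```python
import numpy as np

def overlay_occlusion(frame, template, mask):
    """Copy template pixels onto a frame wherever mask is True.

    frame, template: (H, W, 3) uint8 arrays; mask: (H, W) boolean array.
    All names are hypothetical; the source does not specify an implementation.
    """
    if frame.shape != template.shape or mask.shape != frame.shape[:2]:
        raise ValueError("frame, template and mask must share height and width")
    out = frame.copy()            # leave the original frame untouched
    out[mask] = template[mask]    # occluded pixels take the leaf colour
    return out

def occlude_video(frames, template, mask):
    """Apply the same occlusion template to every frame of one video."""
    return [overlay_occlusion(f, template, mask) for f in frames]

# Toy example: a uniform grey frame whose top half is covered by a black template.
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
template = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :] = True
occluded = overlay_occlusion(frame, template, mask)
```

Applying one fixed template to all frames of a video mimics a static camera partly covered by foliage; a moving template would be a different simulation.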

Further, in step (2), features are extracted separately from the occluded videos and the normal videos. For the normal videos, walking-cycle features are extracted by FEP (Wang et al 2014; Liu et al 2015). For the occluded videos, the method empirically selects 20 frames as one walking cycle, according to the average length of the walking cycles in the normal videos. For the normal videos, the invention extracts the STFV3D feature directly for each person.
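A fixed walking-cycle length of 20 frames implies a simple segmentation of each occluded video. The sketch below assumes consecutive, non-overlapping cycles and drops trailing frames that do not fill a whole cycle; both choices are assumptions, since the text only states the cycle length.

```python
def split_into_cycles(num_frames, cycle_len=20):
    """Return (start, end) frame index pairs for consecutive fixed-length cycles.

    Non-overlapping segmentation and dropping of the incomplete tail are
    assumptions; the source only fixes the cycle length at 20 frames.
    """
    return [(start, start + cycle_len)
            for start in range(0, num_frames - cycle_len + 1, cycle_len)]

# A video of 73 frames (the iLIDS-VID average) yields three full cycles.
cycles = split_into_cycles(73)
```

Under this segmentation a video shorter than 20 frames yields no cycle at all, which is consistent with keeping only persons whose videos exceed 20 frames.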

Further, in step (3), a dictionary pair with a semi-coupled mapping is designed. Different cameras capture the same person with different pose and viewpoint changes, but occluded and normal videos share similarities in visual appearance and spatio-temporal features. Relaxing the strong assumption that these two spaces are equal, the invention uses a semi-coupled mapping matrix to reflect the relationship between the corresponding features. To this end, the invention designs a dictionary pair with a semi-coupled mapping, as follows:

$\text{s.t.}\ A_i = D_O X_i;\ B_i = D_N Y_i$

where $E_{represent}(\cdot)$ is the sub-dictionary representation fidelity term, X and Y are the representation coefficients of A and B over $D_O$ and $D_N$, respectively, $E_{mapping}(\cdot)$ is the mapping fidelity term, whose objective is to find the relationship between the coding coefficients of A and B, and $\Phi(\cdot)$ is the coefficient mapping function of the occlusion and normal dictionary pair. $A = [A_1, A_2, \ldots, A_N]$ is the spatio-temporal feature set of the occluded training videos, $A_i$ is the video feature subset of the i-th person, $n_i$ is the number of walking cycles of the i-th person, and N is the number of occluded persons. $B = [B_1, B_2, \ldots, B_N]$ is the spatio-temporal feature set of the occlusion-free training videos, where $B_j$ is the spatio-temporal feature set of the j-th person, corresponding to $n_j$ gait cycles, and N is the number of persons in the normal videos.
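To make the roles of the two fidelity terms concrete, the sketch below evaluates them for given matrices. The squared Frobenius-norm forms and the linear mapping $\Phi(X) = PX$ are assumptions, since the objective itself is not reproduced in this text; the function name is hypothetical.

```python
import numpy as np

def semi_coupled_terms(A, B, D_O, D_N, X, Y, P):
    """Evaluate assumed forms of the two fidelity terms.

    Assumed (not taken from the source):
      representation fidelity: ||A - D_O X||_F^2 + ||B - D_N Y||_F^2
      mapping fidelity:        ||Y - P X||_F^2, i.e. a linear map Phi(X) = P X
    Columns of A, B are samples; columns of D_O, D_N are dictionary atoms.
    """
    represent = (np.linalg.norm(A - D_O @ X, "fro") ** 2
                 + np.linalg.norm(B - D_N @ Y, "fro") ** 2)
    mapping = np.linalg.norm(Y - P @ X, "fro") ** 2
    return represent, mapping

# Toy data built so that both terms vanish exactly.
rng = np.random.default_rng(0)
D_O = rng.standard_normal((6, 4))
D_N = rng.standard_normal((6, 4))
X = rng.standard_normal((4, 10))
P = rng.standard_normal((4, 4))
Y = P @ X
represent, mapping = semi_coupled_terms(D_O @ X, D_N @ Y, D_O, D_N, X, Y, P)
```

The two dictionaries are not forced to share codes here: only the codes are tied together, through P, which is what distinguishes a semi-coupled pair from a fully coupled one.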

Further, in step (4), the dictionary pair is learned using a discriminative criterion. To improve the discriminative fidelity of the sparse representation coefficients, the invention minimizes the distance between videos of the same person across cameras and maximizes the distance between different persons across the 2 cameras. To this end, a discriminative reconstruction error term is defined:

where S denotes the set of same-class pairs, D denotes the set of different-class pairs, and θ is a balance factor.
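One plausible reading of such a term is sketched below: squared distances between codes of same-person pairs are summed, and θ times the summed squared distances of different-person pairs is subtracted. This specific form is an assumption made for illustration, as the defining equation is not reproduced here; the function and argument names are hypothetical.

```python
import numpy as np

def discriminative_term(X, Y, labels_x, labels_y, theta):
    """Assumed discriminative term over cross-camera code pairs.

    X: (k, n_x) codes from the occluded camera, Y: (k, n_y) codes from the
    normal camera, one column per sample. Pairs with equal labels form S,
    all other pairs form D. The form sum_S d^2 - theta * sum_D d^2 is an
    assumption, not the source's equation.
    """
    diff = X[:, :, None] - Y[:, None, :]           # (k, n_x, n_y)
    sq_dist = (diff ** 2).sum(axis=0)              # pairwise squared distances
    same = np.asarray(labels_x)[:, None] == np.asarray(labels_y)[None, :]
    return sq_dist[same].sum() - theta * sq_dist[~same].sum()

# Two persons whose codes coincide across cameras and differ between persons.
X = np.array([[0.0, 1.0],
              [0.0, 0.0]])
Y = X.copy()
value = discriminative_term(X, Y, [0, 1], [0, 1], theta=0.5)
```

Minimizing a term of this shape pulls same-person codes together and pushes different-person codes apart, with θ controlling how strongly the second effect counts.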

Further, in step (5), a set projection matrix W for the occluded videos and a set projection matrix V for the unoccluded videos are learned. Because of occlusion and of differences in pose and viewpoint of the same person within the same video, the samples differ from one another to some degree. These differences scatter the features of the same person and make matching more difficult. Therefore, following a criterion similar to the Fisher criterion, the invention learns a subspace projection under which the samples have small intra-class divergence but large inter-class divergence.

The mapping matrix can establish the relationship between the occluded video features and the normal video features, and compensates to some extent for the loss of spatio-temporal feature information caused by occlusion. The intra-class divergence of the feature set in camera A can be expressed as:

The inter-class divergence of the feature set in camera A can be expressed as:

where $\mu_i$ is the mean vector of $A_i$ in camera A, and the other mean vector is that of the remaining samples in camera A (excluding the i-th person),

namely:

$N_{all}$ is the number of all samples in the same camera. The subspace projection in camera A can thus be expressed as:

where W is the subspace projection matrix of the feature sets in camera A; to some extent, it makes the feature set of the same person more compact by reducing intra-class variation. The projection for B is similar to that for A, so the subspace projection in camera B can be expressed as:

where V is the subspace projection matrix of the feature sets in camera B. By learning the projection matrices W and V, the variation inside a video can be reduced, and the final projections of A and B in the embedding space can be written as:
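A Fisher-style projection of this kind can be sketched as follows, assuming standard scatter matrices built from the definitions above (per-person mean, mean of all other samples) and a trace-difference criterion solved by eigendecomposition. The exact formulas are not reproduced in this text, so the weighting by $n_i$, the criterion, and the function names are assumptions.

```python
import numpy as np

def scatter_matrices(feature_sets):
    """Intra-class and inter-class scatter for one camera.

    feature_sets: list of (d, n_i) arrays, one per person (at least two).
    For person i, mu_i is the mean of that person's samples and mu_rest is
    the mean of all other samples in the camera.
    """
    all_feats = np.hstack(feature_sets)
    d, n_all = all_feats.shape
    total = all_feats.sum(axis=1)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for A_i in feature_sets:
        n_i = A_i.shape[1]
        mu_i = A_i.mean(axis=1)
        centered = A_i - mu_i[:, None]
        S_w += centered @ centered.T
        mu_rest = (total - A_i.sum(axis=1)) / (n_all - n_i)
        gap = (mu_i - mu_rest)[:, None]
        S_b += n_i * (gap @ gap.T)     # weighting by n_i is an assumption
    return S_w, S_b

def learn_projection(feature_sets, dim):
    """Orthonormal W maximizing trace(W^T (S_b - S_w) W) (assumed criterion)."""
    S_w, S_b = scatter_matrices(feature_sets)
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)   # symmetric matrix
    order = np.argsort(eigvals)[::-1][:dim]        # largest eigenvalues first
    return eigvecs[:, order]

rng = np.random.default_rng(0)
sets = [rng.standard_normal((6, 4)) + 3.0 * i for i in range(3)]
W = learn_projection(sets, dim=2)
```

Taking eigenvectors of a symmetric matrix gives orthonormal columns, so the result satisfies the orthogonality constraint $W^{T}W = I$ that appears with equation (9). The same routine applied to the feature sets of camera B would give V.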

Further, in step (6), the overall objective function is obtained. Taking into account the discriminative reconstruction error, the semi-coupled mapping of the sparse representation coefficients, and the intra-video projection embedding subspace terms, the goal of the invention is to minimize the objective function:

where γ, α, β, η and λ are regularization parameters that act as balance factors,

together with a regularization term that prevents overfitting. Equation (8) is converted into a dictionary pair learning and ridge regression problem, and the objective function of SCD²L can be expressed as:

$W^{T}W = I,\ V^{T}V = I \quad (9)$

The SCD²L of the invention jointly learns the semi-coupled projection matrix and the dictionary pair $D_O$ and $D_N$. The learned projection matrix can reconstruct the relationship between the occluded video and the normal video, and compensates to some extent for the loss of spatio-temporal feature information caused by occlusion.

Further, in step (7), the objective function is optimized. Equation (9) is not jointly convex in all variables, so the invention uses an alternating optimization strategy to solve for the unknown variables: each time one variable is updated, the other variables are fixed. Minimizing equation (9) can be divided into 4 sub-problems: updating the sparse codes of the training samples, updating the dictionary pair, updating the intra-video subspace projection matrices, and updating the mapping function of the sparse representation coefficients. First, the invention initializes the mapping matrix P to an identity matrix. There are many ways to initialize $D_O$ and $D_N$, such as random matrices and PCA bases. The invention initializes the dictionary pair as random matrices whose column vectors are normalized in the Frobenius norm, and initializes the sparse representation coefficients X and Y by solving equations (10) and (11):
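The initialization described above can be sketched as follows. Equations (10) and (11) are not reproduced in this text, so the coding step below uses an l2-regularized least-squares problem as a stand-in; that substitution, the parameter `lam`, and the function names are assumptions.

```python
import numpy as np

def init_dictionary(dim, n_atoms, rng):
    """Random dictionary whose columns (atoms) each have unit norm."""
    D = rng.standard_normal((dim, n_atoms))
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def init_codes(A, D, lam):
    """Stand-in for the initial coding step (assumption, not eq. (10)/(11)).

    Solves min_X ||A - D X||_F^2 + lam * ||X||_F^2 in closed form:
    X = (D^T D + lam I)^{-1} D^T A.
    """
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ A)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 12))        # toy occluded features, one column per sample
D_O = init_dictionary(8, 5, rng)
X = init_codes(A, D_O, lam=0.1)
P = np.eye(5)                           # mapping matrix starts as the identity
```

After this initialization, an alternating scheme would cycle through the four sub-problems, updating one block of variables while holding the others fixed.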

(1) Update W and V with the other variables fixed. Equation (9) can be rewritten as:

By setting the derivatives with respect to W and V to 0, respectively, equations (12) and (13) can be solved as follows:

(2) Update X and Y with the other variables fixed. Updating X first, equation (9) can be rewritten as:

where

denotes the subset of camera B that is correctly or incorrectly matched with $X_i$. Setting the derivative with respect to $X_i$ to 0 gives:

To update Y, similarly to equation (16), equation (9) can be rewritten as:

where

denotes the subset of camera A that is correctly or incorrectly matched with $Y_i$. Setting the derivative with respect to $Y_i$ to 0 gives:

(3) Update $D_O$ and $D_N$ with the other variables fixed. Equation (9) can be rewritten as:

The ADMM algorithm can be used to obtain the solutions of equations (20) and (21).

(4) Update the projection matrix P with the other variables fixed. Equation (9) can be rewritten as:

Setting the derivative with respect to P to 0 gives:

$P = (XX^{T} + (\lambda/\gamma) I)^{-1} (YX^{T}) \quad (23)$
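The update of P is a ridge regression, which the sketch below solves in closed form and checks numerically. Equation (22) is not reproduced in this text, so the sub-problem is assumed to be $\min_P \gamma\|Y - PX\|_F^2 + \lambda\|P\|_F^2$. Under that assumption the stationarity condition reads $P(XX^{T} + (\lambda/\gamma)I) = YX^{T}$; the order of the factors depends on how the mapping term is written.

```python
import numpy as np

def update_mapping(X, Y, lam, gamma):
    """Closed-form ridge update of the mapping matrix P.

    Assumed sub-problem: min_P gamma * ||Y - P X||_F^2 + lam * ||P||_F^2.
    X, Y: (k, n) code matrices with one column per sample.
    """
    k = X.shape[0]
    G = X @ X.T + (lam / gamma) * np.eye(k)   # symmetric positive definite
    # P G = Y X^T  <=>  G P^T = X Y^T, solved without forming an explicit inverse.
    return np.linalg.solve(G, X @ Y.T).T

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 30))
Y = rng.standard_normal((5, 30))
lam, gamma = 0.1, 2.0
P = update_mapping(X, Y, lam, gamma)
# Gradient of the assumed objective at P; it should vanish at the minimizer.
grad = -2.0 * gamma * (Y - P @ X) @ X.T + 2.0 * lam * P
```

The ratio λ/γ acts as the ridge strength: a larger value shrinks P toward zero, a smaller value lets P fit the codes more closely.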

Further, in step (8), pedestrian re-identification is performed on the occluded videos. With the learned dictionary pair $D_O$ and $D_N$, the mapping matrix P and the intra-video subspace projections W and V, robust and effective sparse representations of the test videos can be obtained. With the occluded video features as the probe set F and the normal video features as the gallery set G, the matching process is executed as follows:

(1) using the learned P, W and V, the representation coefficients f of the probe set over the occlusion dictionary are encoded by solving equation (10);

(2) the representation coefficients g of the gallery set over the normal dictionary are encoded by solving equation (11);

(3) the images in the gallery set that show the same person as in the probe set are identified. From the obtained sparse representation coefficients, the distances between the features of the gallery set and the probe set are calculated and sorted; the gallery image with the smallest distance is taken as the correct match for the probe, that is, the person in the probe set is matched to the corresponding person in the gallery set.
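The ranking in step (3) can be sketched as follows. The sketch assumes that probe coefficients are first mapped by P into the space of the gallery coefficients and that Euclidean distance is used; neither detail is spelled out in the text, and the function name is hypothetical.

```python
import numpy as np

def rank_gallery(F_codes, G_codes, P):
    """Rank gallery entries for each probe by ascending distance.

    F_codes: (k, n_probe) probe coefficients over the occlusion dictionary.
    G_codes: (k, n_gallery) gallery coefficients over the normal dictionary.
    Mapping probes by P and using Euclidean distance are assumptions.
    Returns an (n_probe, n_gallery) array of gallery indices, best match first.
    """
    mapped = P @ F_codes
    diff = mapped[:, :, None] - G_codes[:, None, :]
    sq_dist = (diff ** 2).sum(axis=0)
    return np.argsort(sq_dist, axis=1)

# Toy check: the gallery holds the mapped probes in shuffled order.
rng = np.random.default_rng(0)
F_codes = rng.standard_normal((4, 3))
P = rng.standard_normal((4, 4))
perm = [2, 0, 1]                        # gallery column j holds probe perm[j]
G_codes = (P @ F_codes)[:, perm]
ranking = rank_gallery(F_codes, G_codes, P)
```

In the toy data probe 0 sits at gallery index 1, probe 1 at index 2 and probe 2 at index 0, so the first column of `ranking` recovers that assignment.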

A further object of the invention is to provide a leaf-occluded pedestrian re-identification system based on semi-coupled discriminative dictionary learning that implements the above leaf-occluded pedestrian re-identification method.

A further object of the invention is to provide a traffic-road pedestrian image re-identification device that implements the above leaf-occluded pedestrian re-identification method based on semi-coupled discriminative dictionary learning.

In summary, the advantages and positive effects of the invention are:

To verify the advantage of the method, the proposed SCD²L method is compared with several state-of-the-art pedestrian re-identification methods: 2 feature learning methods (STFV3D and RFA-Net), 1 dictionary learning method (PHDL), and distance metric learning methods (RDC, TDL, KISSME and SI²DL). Experiments were performed on the two data sets LO-PRID2011 and LO-iLIDS-VID.

The experimental results are as follows:

Table 1 reports the Top-R matching rates on the LO-PRID2011 and LO-iLIDS-VID data sets.

Fig. 2 plots the Top-R matching rates on the LO-PRID2011 and LO-iLIDS-VID data sets.

TABLE 1

Table 1 and Fig. 2 show that the leaf-occluded pedestrian re-identification algorithm based on semi-coupled discriminative dictionary learning performs better than the comparison methods. Table 1 gives the detailed Top-R matching rates; more precisely, on the LO-iLIDS-VID data set the Rank-1 matching rate is improved by 5 percentage points (27.3% versus 22.3%) over the best comparison method. The comparison methods perform poorly on occluded-video pedestrian re-identification; the task may be even more challenging here because many persons in the original iLIDS-VID data set are already partially occluded in some image frames, for example by objects or other pedestrians. With the intra-video projection learned under the discriminative criterion, the feature set of each person becomes more compact and the feature sets of different persons are separated. Thus, the method of the invention performs better than the comparison methods.
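The Top-R matching rate used in this comparison can be computed from per-probe rankings as in the sketch below. It assumes one ranked list of gallery indices per probe and counts a probe as matched when its identity appears among the first R gallery entries; the function name and data layout are assumptions.

```python
def top_r_match_rate(rankings, probe_ids, gallery_ids, r):
    """Fraction of probes whose identity appears among the top-r gallery entries.

    rankings: for each probe, a list of gallery indices sorted best-first.
    probe_ids, gallery_ids: person identities of probes and gallery entries.
    """
    hits = 0
    for probe_id, order in zip(probe_ids, rankings):
        top_ids = [gallery_ids[j] for j in order[:r]]
        hits += probe_id in top_ids
    return hits / len(probe_ids)

# Toy example with three probes and three gallery entries.
rankings = [[0, 1, 2],   # probe 0: correct at rank 1
            [0, 1, 2],   # probe 1: correct at rank 2
            [2, 1, 0]]   # probe 2: correct at rank 1
probe_ids = [0, 1, 2]
gallery_ids = [0, 1, 2]
rank1 = top_r_match_rate(rankings, probe_ids, gallery_ids, r=1)
rank2 = top_r_match_rate(rankings, probe_ids, gallery_ids, r=2)
```

In the toy example two of three probes are matched at rank 1 and all three at rank 2. A difference between two such rates, as in 27.3% versus 22.3%, is a difference in percentage points.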

Further experimental results are shown in the figures. Fig. 3(a): Top-1 matching rates with and without the intra-video set projection matrices on the LO-PRID2011 and LO-iLIDS-VID data sets. Fig. 3(b): Top-1 matching rates for the semi-coupled projection matrix on the LO-PRID2011 and LO-iLIDS-VID data sets. Fig. 4(a): Top-1 matching rates for different dictionary sizes on the LO-PRID2011 data set. Fig. 4(b): convergence curve on the LO-PRID2011 data set.

The intra-video set projection terms W and V aim to separate the video sets of the same person from those of different persons. To evaluate their effect, Top-1 matching is performed on both the LO-PRID2011 and LO-iLIDS-VID data sets with W removed, with V removed, and with both removed; the resulting variants are called SCD²L-W, SCD²L-V and SCD²L-WV, respectively. The experimental results in Fig. 3(a) show that the matching rate decreases when either term is removed, and that performance drops significantly when both terms are removed. Therefore, the intra-video projection terms play an important role in feature-set-based re-identification.

To evaluate the effect of the semi-coupled projection matrix, Top-1 matching is performed on both the LO-PRID2011 and LO-iLIDS-VID data sets with the mapping matrix set to an identity matrix; this variant is called SCD²L-P. The experimental results in Fig. 3(b) show that the semi-coupled mapping term reflects the relationship between the sparse coefficients of occluded features and those of unoccluded features, compensates for the difference between features from the occluded camera and from the normal camera, and benefits person re-identification.

Equation (9) is a joint optimization problem, which the invention solves with an alternating iterative optimization algorithm. Fig. 4(b) shows the convergence curve of the algorithm on the LO-PRID2011 data set: the curve decreases rapidly and becomes stable after 19 iterations. The algorithm converges in fewer than 16 iterations on the LO-PRID2011 data set and in fewer than 19 iterations on the LO-iLIDS-VID data set.

Drawings

Fig. 1 is a flowchart of the leaf-occluded pedestrian re-identification method based on semi-coupled discriminative dictionary learning according to an embodiment of the present invention.

Fig. 2 is a graph of the Top-R matching rates on the LO-PRID2011 and LO-iLIDS-VID data sets according to an embodiment of the present invention.

Fig. 3 is a graph of Top-1 matching rates according to an embodiment of the present invention.

In the figure: (a) Top-1 matching rates with and without the intra-video set projection matrices on the LO-PRID2011 and LO-iLIDS-VID data sets; (b) Top-1 matching rates for the semi-coupled projection matrix on the LO-PRID2011 and LO-iLIDS-VID data sets.

Fig. 4 shows the dictionary-size and convergence graphs according to an embodiment of the present invention.

In the figure: (a) Top-1 matching rates for different dictionary sizes on the LO-PRID2011 data set; (b) convergence curve on the LO-PRID2011 data set.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The prior art cannot exploit the visual and spatio-temporal information contained in a video, and cannot solve the problem of pedestrian re-identification under branch-and-leaf occlusion.

To solve the above problems, the present invention will be described in detail with reference to specific embodiments.

The embodiment of the invention provides a leaf-occluded pedestrian re-identification method based on semi-coupled discriminative dictionary learning. First, data sets are collected: real branch-and-leaf occlusion is added to two public pedestrian re-identification data sets, so that they contain occluded videos and normal videos. Features are then extracted separately from the occluded videos and the normal videos. Next, the extracted sample features are processed: for the feature set of each person in the two cameras, an intra-set projection is learned. A dictionary learning method is introduced to learn a dictionary pair for the occluded and normal videos, and a projection matrix is learned from the occluded and normal videos. Finally, a discriminative criterion is introduced into the learning of the dictionary pair, which yields the leaf-occluded pedestrian re-identification algorithm based on semi-coupled discriminative dictionary learning. The invention addresses the problem of pedestrian re-identification under branch-and-leaf occlusion and proposes, for the first time, the leaf-occluded pedestrian re-identification technique based on semi-coupled discriminative dictionary learning (SCD²L). The technique learns a semi-coupled mapping matrix and a discriminative dictionary pair: the semi-coupled mapping matrix compensates for the difference between occluded and normal videos, while the discriminative dictionary pair makes the same person more compact across cameras and separates different persons across cameras. Experimental results on 2 public data sets show that the proposed method has better recognition performance.

As shown in Fig. 1, in the embodiment of the present invention, the leaf-occluded pedestrian re-identification method based on semi-coupled discriminative dictionary learning specifically includes the following steps:

Step (1): collect data and establish a database consisting of occluded videos and normal videos.

Step (2): extract features separately from the occluded videos and the normal videos.

Step (3): design a dictionary pair with a semi-coupled mapping.

Step (4): learn the dictionary pair using a discriminative criterion.

Step (5): learn a set projection matrix W for the occluded videos and a set projection matrix V for the unoccluded videos.

Step (6): obtain the overall objective function.

Step (7): optimize the objective function.

Step (8): re-identify pedestrians in the occluded videos.

As a preferred embodiment of the present invention, in step (1), the specific method for collecting data and establishing the database is as follows:

The technique addresses pedestrian re-identification between an occluded scene and a normal scene. Because no pedestrian re-identification data set exists for occluded scenes, the invention re-creates 2 public data sets: real occlusion templates (real branches and leaves) are applied to PRID2011 and iLIDS-VID to simulate occlusion, forming 2 new data sets, LO-PRID2011 and LO-iLIDS-VID. The PRID2011 data set contains 385 persons in camera A and 749 persons in camera B, of whom 200 persons appear in both views; each video consists of 5 to 675 image frames. To ensure an effective length for each gait cycle, the invention selects the 178 persons whose videos have more than 20 frames, randomly selects 89 pairs from them for training, and uses the remaining pairs for testing. The invention uses an occlusion template to add occlusion to PRID2011 and thereby generate the LO-PRID2011 data set. Similarly, iLIDS-VID contains 300 pedestrians in 2 cameras, with 23 to 192 image frames per video and 73 frames on average; the invention overlays the iLIDS-VID data set with an occlusion template to generate the LO-iLIDS-VID data set. The simulation process is similar to that used to generate the LO-PRID2011 data set. The resulting occluded data sets can be obtained from https: i// sites. ***. com/site/LODB.

In step (2), features are extracted separately from the occluded videos and the normal videos. For the normal videos, walking-cycle features are extracted by FEP (Wang et al 2014; Liu et al 2015). For the occluded videos, 20 frames are empirically selected as one walking cycle, according to the average length of the walking cycles in the normal videos. For the normal videos, the invention extracts the STFV3D feature directly for each person.

As a preferred embodiment of the present invention, in step (3), a dictionary pair with a semi-coupled mapping is designed. Different cameras capture the same person with different pose and viewpoint changes, but occluded and normal videos share similarities in visual appearance and spatio-temporal features. Relaxing the strong assumption that these two spaces are equal, the invention uses a semi-coupled mapping matrix to reflect the relationship between the corresponding features. To this end, the invention designs a dictionary pair with a semi-coupled mapping, as follows:

$\text{s.t.}\ A_i = D_O X_i;\ B_i = D_N Y_i$

$A_i$ is the video feature subset of the i-th person, $n_i$ is the number of walking cycles of the i-th person, and N is the number of occluded persons. $B = [B_1, B_2, \ldots, B_N]$ is the spatio-temporal feature set of the occlusion-free training videos, where $B_j$ is the spatio-temporal feature set of the j-th person, corresponding to $n_j$ gait cycles, and N is the number of persons in the normal videos.

The method for re-identifying the leaf-occluded pedestrian based on the semi-coupled identification dictionary learning is characterized in that in the step (4), a dictionary pair is learned by using an identification idea; in order to improve the discrimination fidelity of the sparse representation coefficients, the invention minimizes the distance of the same person among different cameras, and maximizes the distance of different persons among 2 cameras. To define a discriminative reconstruction error term:

The method for re-identifying the pedestrian with the occlusion based on the learning of the semi-coupled discriminative dictionary as claimed in claim 1 is characterized in that in the step (5), a set projection matrix W of the occlusion video and a set projection matrix V of the non-occlusion video are respectively learned; due to the shielding of the posture and the viewpoint of the same person in the same video and the difference of the posture and the viewpoint, certain difference exists among samples. These differences can scatter the features of the same person, making matching more difficult. Therefore, based on some criteria similar to the Fisher criterion, the invention considers learning a subspace projection, so that the sample has small intra-class divergence but large inter-class divergence.

The inter-class divergence of the feature set in camera A can be expressed as:

namely:

As a preferred embodiment of the present invention, in step (6), a total objective function is obtained. The total objective function takes into account the discriminative reconstruction error, the half-coupled mapping of the sparse representation coefficients, and the intra-video projection embedding subspace term. The goal of the invention is to minimize the objective function:

where γ, α, β, η, λ are regularization parameter balance factors,

W^T W = I, V^T V = I (9).

As a preferred embodiment of the present invention, in step (7), the objective function is optimized. Equation (9) is not jointly convex in all of its variables, so the present invention uses an alternating optimization strategy to solve for the unknown variables: each time one variable is updated, the other variables are held fixed. Minimizing equation (9) is divided into 4 sub-problems: updating the sparse codes of the training samples, updating the dictionary pair, updating the intra-video subspace projection matrices, and updating the sparse representation coefficient mapping function. First, the mapping matrix P is initialized to an identity matrix. D_O and D_N can be initialized in many ways, for example with random matrices or a PCA basis. The invention initializes the dictionary pair to random matrices with each column vector normalized under the Frobenius norm, and initializes the sparse representation coefficients X and Y by solving equations (10) and (11):
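The initialization of P, D_O and D_N described above can be sketched in NumPy as follows. This is only an illustration: the function names, the Gaussian random entries, and the reading of the column normalization as unit length are assumptions, and the coefficient initialization of equations (10) and (11) is left out.

```python
import numpy as np

def init_dictionary(dim, n_atoms, rng):
    """Random dictionary whose columns (atoms) have unit norm (assumed normalization)."""
    d = rng.standard_normal((dim, n_atoms))
    return d / np.linalg.norm(d, axis=0, keepdims=True)

def init_model(dim, n_atoms, seed=0):
    """Return (d_o, d_n, p): two random dictionaries and an identity mapping matrix."""
    rng = np.random.default_rng(seed)
    d_o = init_dictionary(dim, n_atoms, rng)  # dictionary for occluded videos
    d_n = init_dictionary(dim, n_atoms, rng)  # dictionary for ordinary videos
    p = np.eye(n_atoms)                       # mapping matrix starts as identity
    return d_o, d_n, p
```

Starting from an identity mapping means the two coefficient spaces are initially treated as equal, and the alternating updates then relax that assumption.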

W and V are updated with the other variables fixed. Equation (9) can be rewritten as:

X and Y are updated with the other variables fixed. To update X first, equation (9) can be rewritten as:

wherein

To update Y, similarly to equation (16), equation (9) can be rewritten as:

wherein

D_O and D_N are updated with the other variables fixed. Equation (9) can be rewritten as:

The projection matrix P is updated with the other variables fixed. Equation (9) can be rewritten as:

Setting the derivative with respect to P to 0 and solving, the present invention obtains:

P = (XX^T + (λ/γ)I)^{-1}(YX^T) (23).
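A closed-form update of this kind can be checked numerically. The sketch below assumes, since the sub-problem itself is not reproduced in the text, that the mapping term has the ridge-regression form γ‖Y − PX‖_F² + λ‖P‖_F²; under that assumed objective the stationarity condition is P(XX^T + (λ/γ)I) = YX^T, so the factor order in the code follows from the assumption. The function name is hypothetical.

```python
import numpy as np

def update_mapping(x, y, lam, gamma):
    """Minimize gamma * ||Y - P X||_F^2 + lam * ||P||_F^2 over P (assumed objective).

    x, y: (k, n) coefficient matrices. Returns P with shape (k, k).
    """
    k = x.shape[0]
    gram = x @ x.T + (lam / gamma) * np.eye(k)  # symmetric positive definite
    # Stationarity gives P @ gram = Y X^T; since gram is symmetric this is
    # gram @ P^T = X Y^T, solved without forming an explicit inverse.
    return np.linalg.solve(gram, x @ y.T).T
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a numerical-stability choice; the regularization weight λ/γ keeps the system well conditioned even when X has few columns.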

As a preferred embodiment of the present invention, in step (8), pedestrians in the occluded video are re-identified. With the learned dictionary pair D_O and D_N, the mapping matrix P, and the intra-video subspace projections W and V, a robust sparse representation and an effective sparse representation of the test video can be obtained, respectively. The occluded video features are used as the probe set F and the ordinary video features as the gallery set G, and the matching process is executed as follows:

In the embodiment of the present invention, Fig. 2 shows the Top R matching rates on the two data sets LO-PRID2011 and LO-iLIDS-VID.
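Ranking the gallery by distance to each probe and computing a Top R matching rate can be sketched as follows. The sketch assumes that the sparse representation coefficients of both sets are already available, that plain Euclidean distance is used, and that each probe has an identity label to compare against; the function names are hypothetical.

```python
import numpy as np

def rank_gallery(probe_codes, gallery_codes):
    """Sort gallery items by ascending distance to each probe.

    probe_codes: (k, n_probe); gallery_codes: (k, n_gallery).
    Returns an (n_probe, n_gallery) array of gallery indices.
    Assumption: Euclidean distance between coefficient vectors.
    """
    diff = probe_codes[:, :, None] - gallery_codes[:, None, :]
    dist = np.linalg.norm(diff, axis=0)
    return np.argsort(dist, axis=1)

def top_r_matching_rate(ranking, probe_ids, gallery_ids, r):
    """Fraction of probes whose true identity appears among the first r gallery items."""
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    top_ids = gallery_ids[ranking[:, :r]]             # identities of the r nearest items
    hits = (top_ids == probe_ids[:, None]).any(axis=1)
    return float(hits.mean())
```

Evaluating `top_r_matching_rate` for increasing r gives one point per rank, which is the kind of curve a Top R matching-rate plot reports.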

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A leaf-occluded pedestrian re-identification method based on semi-coupled discriminative dictionary learning, characterized in that the method comprises the following steps:

collecting a data set by adding real branch-and-leaf occlusion to a public pedestrian re-identification data set, the data set comprising occluded videos and ordinary videos; extracting the features of the occluded videos and of the ordinary videos respectively;

processing the extracted sample features, introducing a dictionary learning method, learning intra-class projections of the feature sets, learning a dictionary pair for the occluded video and the ordinary video, and learning a projection matrix from the occluded video and the ordinary video;

learning the dictionary pair by introducing a discriminative idea, and performing leaf-occluded pedestrian re-identification based on semi-coupled discriminative dictionary learning;

the leaf-occlusion pedestrian re-identification method based on the semi-coupling identification dictionary learning specifically comprises the following steps:

step (3): designing a dictionary pair with a half-coupling mapping;

step (4): learning the dictionary pair using a discriminative idea;

step (6): obtaining a total objective function;

step (7): optimizing the objective function;

step (8): re-identifying pedestrians in the occluded video;

in step (2), features of the occluded video and of the ordinary video are extracted respectively; walking-cycle features of the occluded video are extracted through FEP, and 20 frames are selected as one walking cycle of the occluded video according to the average duration of a walking cycle in the normal video; for the ordinary video, STFV3D features are extracted directly for each person;

in step (3), dictionary pairs with half-coupling mappings are designed, as represented below:

s.t. A_i = D_O X_i; B_i = D_N Y_i

wherein E_represent(·) is the representation fidelity term of the sub-dictionaries; X and Y are the representation coefficients of A and B over D_O and D_N, respectively; E_mapping(·) is the mapping fidelity term, whose purpose is to find the relationship between the coding coefficients of A and B; Φ(·) is the coefficient mapping function of the occluded and ordinary dictionary pair; A = [A₁, A₂, ..., A_N] is the spatio-temporal feature set of the occluded training videos, where A_i is the video feature subset of the i-th person, n_i is the number of walking cycles of the i-th person, and N is the number of occluded persons; B = [B₁, B₂, ..., B_N] is the spatio-temporal feature set of the unoccluded training videos.

2. The method for re-identifying the leaf-occluded pedestrian based on the semi-coupled discriminative dictionary learning of claim 1, wherein the specific method for collecting the data and establishing the database in the step (1) comprises the following steps:

two public data sets are rebuilt: a real occlusion template is applied to PRID2011 and iLIDS-VID to form 2 new data sets, LO-PRID2011 and LO-iLIDS-VID; 178 persons with more than 20 frames are selected, 89 pairs are randomly chosen from them for training, and the remaining pairs are used for testing; occlusion is added to PRID2011 using the occlusion template to generate the LO-PRID2011 data set; the iLIDS-VID data set is covered using the occlusion template to generate the LO-iLIDS-VID data set; finally, the generated occluded data sets are obtained.

3. The method for re-identifying the leaf-occluded pedestrian based on the learning of the semi-coupled discriminative dictionary of claim 1, wherein in step (4), the distance between samples of the same person across different cameras is minimized, and the distance between samples of different persons across the two cameras is maximized; a discriminative reconstruction error term is defined:

wherein S represents the same class, D represents different classes, and theta is a balance factor;

in the step (5), respectively learning a set projection matrix W of an occluded video and a set projection matrix V of an unoccluded video; learning a subspace projection to make the sample have small intra-class divergence but large inter-class divergence; the mapping matrix constructs the relationship between the occlusion video features and the common video features, and the intra-class divergence of the feature set in the camera A is expressed as:

the inter-class divergence of the feature set in camera A is expressed as:

wherein μ_i is the mean vector of A_i in camera A, μ is the mean vector of the other samples in camera A,

N_all refers to the number of all samples in the same camera; the subspace projection in camera A is represented as:

where W is the subspace projection matrix for the feature set in Camera A; the subspace projection in camera B is represented as:

V is the subspace projection matrix of the feature set in camera B; the final projection of A and B in the embedding space is written as:

4. The method for leaf-occluded pedestrian re-identification based on semi-coupled discriminative dictionary learning of claim 1, wherein in step (6), the objective is to minimize an objective function:

where γ, α, β, η, λ are regularization parameter balance factors,

is a regularization term to prevent overfitting; the objective function of SCD²L is expressed as:

W^T W = I, V^T V = I.

5. The method for re-identifying the leaf-occluded pedestrian based on the semi-coupled discriminative dictionary learning of claim 1, wherein in step (7), an alternating optimization strategy is adopted to solve for the unknown variables; each time one variable is updated, the other variables are fixed; minimizing the objective function of SCD²L is divided into 4 sub-problems: updating the sparse codes of the training samples, updating the dictionary pair, updating the intra-video subspace projection matrices, and updating the sparse representation coefficient mapping function; the mapping matrix P is initialized to an identity matrix; the dictionary pair is initialized to random matrices with each column vector normalized under the Frobenius norm, and the sparse representation coefficients X and Y are initialized by solving the following equations:

W and V are updated with the other variables fixed; the objective function of SCD²L is rewritten as:

setting the derivatives with respect to W and V to zero, respectively, and solving yields:

X and Y are updated with the other variables fixed; first, to update X, the objective function of SCD²L is rewritten as:

wherein

denotes the subset of camera B that is correctly or incorrectly matched with X_i; setting the derivative with respect to X_i to 0 yields the following equation:

to update Y, the objective function of SCD²L is rewritten as:

wherein

denotes the subset of camera A that is correctly or incorrectly matched with Y_i; setting the derivative with respect to Y_i to 0 and solving yields the following equation:

D_O and D_N are updated with the other variables fixed; the objective function of SCD²L is rewritten as:

the projection matrix P is updated with the other variables fixed; the objective function of SCD²L is rewritten as:

setting the derivative with respect to P to 0 and solving yields:

P = (XX^T + (λ/γ)I)^{-1}(YX^T).

6. The method for leaf-occluded pedestrian re-identification based on semi-coupled discriminative dictionary learning according to claim 1, wherein in step (8), by means of the learned dictionary pair D_O and D_N, the mapping matrix P, and the intra-video subspace projections W and V, a robust sparse representation and an effective sparse representation of the test video are obtained respectively; the occluded video features are taken as the probe set F and the ordinary video features as the gallery set G, and the matching process is executed as follows:

1) using the learned P, W, and V, solving the coding equation to encode the representation coefficients f of the probe set over the occlusion dictionary,

2) by solving the formula

encoding the representation coefficients g of the gallery set over the ordinary dictionary,

3) identifying, in the gallery set, the images of the same person as in the probe set: the distances between the gallery features and the probe features are calculated from the obtained sparse representation coefficients and then sorted, and the gallery image with the minimum distance is the image correctly matched with the probe.

7. A leaf-occluded pedestrian re-identification system based on semi-coupled discriminative dictionary learning, used for implementing the leaf-occluded pedestrian re-identification method based on semi-coupled discriminative dictionary learning of claim 1.

8. A traffic road pedestrian image re-recognition device for implementing the leaf-occlusion pedestrian re-recognition method based on semi-coupled identification dictionary learning of claim 1.