CN116777004B - Non-convex discrimination migration subspace learning method integrating distribution alignment information - Google Patents

Non-convex discrimination migration subspace learning method integrating distribution alignment information

Info

Publication number
CN116777004B
CN116777004B (application CN202310779007.7A)
Authority
CN
China
Prior art keywords
objective function
data
matrix
solution
source domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310779007.7A
Other languages
Chinese (zh)
Other versions
CN116777004A (en)
Inventor
罗廷金
刘玥瑛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202310779007.7A priority Critical patent/CN116777004B/en
Publication of CN116777004A publication Critical patent/CN116777004A/en
Application granted granted Critical
Publication of CN116777004B publication Critical patent/CN116777004B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a non-convex discrimination migration subspace learning method integrating distribution alignment information, which comprises the following steps: S1, constructing an objective function by truncating the square of the Frobenius norm; S2, solving the objective function iteratively with the IALM algorithm to obtain a projection matrix P; S3, projecting the source domain data and the target domain data into a common feature subspace through the projection matrix P. Compared with the trace norm, the invention adopts a non-convex regularization term that approximates the rank function more tightly as the proxy model: the low-rank constraint is better approximated by minimizing the k smallest singular values of the reconstruction matrix, and the class information of the data is aligned by minimizing the joint distribution difference between the source domain and the target domain, so that the source domain data reconstruct the target domain data better and classification performance is improved.

Description

Non-convex discrimination migration subspace learning method integrating distribution alignment information
Technical Field
The invention relates to the technical field of unsupervised learning, in particular to a non-convex discrimination migration subspace learning method integrating distribution alignment information.
Background
With the advent of the big data age, data in fields such as image recognition, natural language processing, and medical health often come from different domains and therefore follow different distributions. Meanwhile, massive data are generally unlabeled. How to classify these unlabeled data (i.e., target domain data) is therefore a very important issue that has drawn the attention of many scholars. Unsupervised domain adaptation provides a natural way to solve this problem by deeply mining the information in the labeled data (i.e., source domain data) to facilitate learning of the target classifier.
In unsupervised domain adaptation, conventional methods can be divided into two main types: distribution-adaptation methods and migration subspace learning methods. The first type reduces the difference between the two domains by jointly aligning the distributions of the source domain and the target domain. The difference between two domains can be measured by distances defined in various ways, among which the MMD criterion is widely adopted for its simplicity and solid theoretical basis. Many methods based on distribution adaptation exist; several classical ones are presented below. Transfer Component Analysis (TCA) addresses the mismatch between source and target distributions by mapping the data of the two domains into a high-dimensional reproducing kernel Hilbert space; in this space, the marginal distribution difference between the two is minimized while their respective internal properties are maximally preserved. On this basis, Joint Distribution Adaptation (JDA) considers both the marginal distribution difference and the class-conditional distribution difference of the data, achieving finer alignment of the two-domain distributions from the whole down to each class. Computing the class-conditional distribution difference requires labels for the target domain data, which is precisely the task to be solved; JDA therefore proposes the idea of pseudo labels and, considering their unreliability, adopts an iterative update mechanism to reduce their adverse effects. JDA, however, does not adjust the weights of the marginal and conditional distribution differences to the application scenario: when the data sets differ greatly, the marginal distribution difference matters more, while when the data sets are similar, the conditional distribution deserves more attention, so fixing both weights at a constant value can only degrade the performance of the algorithm. Following this idea, Wang et al. propose Balanced Distribution Adaptation (BDA); the same authors also propose a Weighted Balanced Distribution Adaptation (W-BDA) algorithm that addresses class imbalance in migration by adaptively changing the class weights. Furthermore, Zhang et al. argue that previous approaches compute a weighted sum of the marginal and class-conditional distribution differences of the two domains, whereas a more natural measure is to compute their joint probability distribution difference directly; they also note that previous approaches only increase the transferability between domains while ignoring the discriminability between different classes of data, which may reduce classification performance, and propose Joint Probability Distribution Adaptation (JPDA) to solve this problem. Wang et al. probe the nature of MMD through theoretical derivation and verify that minimizing the class-conditional distribution difference is equivalent to minimizing the intra-class distance between corresponding source and target classes while implicitly shrinking the data variance and inter-class distance of the two domains, which intuitively explains why feature discriminability is reduced under MMD.
Guided by this theory, Wang et al. propose a new discriminative MMD with two parallel strategies, alleviating the negative impact of MMD on feature discriminability.
Compared with the first type, methods based on migration subspace learning start from the geometric structure of the data, giving a more concise and effective representation, and can themselves be divided into two kinds. Methods represented by Sampled Geodesic Flow (SGF) and the Geodesic Flow Kernel (GFK) introduce manifold learning to preserve the data structure: each domain is regarded as a point on the Grassmann manifold, and the two points are connected by a geodesic path through d intermediate points, so that finding a suitable transformation for each step realizes the transformation from the source domain to the target domain. The other kind aligns statistical features of the source and target domain data to achieve knowledge migration, such as Subspace Alignment (SA), Subspace Distribution Alignment (SDA), and CORrelation ALignment (CORAL). These methods are susceptible to noise and outliers, so Shao et al. propose Low-rank Transfer Subspace Learning (LTSL). This method migrates the source and target domain data into a unified generalized subspace, linearly reconstructs the target domain samples from the source domain samples, and applies a low-rank constraint to the reconstruction matrix to preserve the global structure of the data; considering the influence of noise during migration, a noise term is introduced to improve robustness. Fang et al. then fit the labels with a least-squares term; considering the adverse effect of label noise on the model, the labels of the source domain data are relaxed, which gives the labels more freedom to adapt and enlarges the margin between different classes of data as much as possible. Discrimination transfer subspace learning (DTSL) combines the two, holding that better results are obtained when each target sample is linearly represented at reconstruction by only a few similar source domain samples rather than by all of them; a sparse constraint is therefore applied to the reconstruction matrix to describe the local structure of the data. The joint Feature Selection and Structure Preservation method (FSSP) and the Joint Low-Rank representation and Feature Selection (JLRFS) method further impose structured sparse constraints on the projection matrix, causing the model to select more critical features, and introduce a graph Laplacian term to better preserve the data structure.
However, these methods still have many limitations. First, methods based on migration subspace learning typically impose a low-rank constraint on the reconstruction matrix to preserve the global structure of the data. Since the rank-minimization problem is NP-hard, many scholars approximate the rank function with the trace norm. This approximation, however, is strongly affected by the largest singular value while the rank function is not, so its solution may deviate severely from the optimal one. Second, traditional migration subspace learning methods use only the feature information of the data to reconstruct the target samples and ignore the category information; if a target sample is linearly represented by source data with similar features but different classes, it becomes difficult to classify correctly. To solve these problems, the invention provides a non-convex discrimination migration subspace learning method integrating distribution alignment information.
Disclosure of Invention
The invention aims to provide a non-convex discrimination migration subspace learning method integrating distribution alignment information, so as to overcome the defects existing in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
The non-convex discrimination migration subspace learning method integrating distribution alignment information is used for image recognition and comprises the following steps:
S1, constructing an objective function by truncating the square of the Frobenius norm;
S2, solving the objective function iteratively through the IALM algorithm to obtain a projection matrix P;
S3, projecting the source domain data and the target domain data into a common feature subspace through the projection matrix P.
Further, the step S1 specifically includes:
s10, linearly representing the target domain data by the source domain data as follows:
P^T X_t = P^T X_s Z + E
where X_t and X_s are the feature matrices of the target domain data and the source domain data, respectively; n_t and n_s are the numbers of samples in the two domains; Z is the reconstruction matrix and E is the noise matrix.
S11, applying a low-rank constraint on Z to preserve the global structure of the data, applying the l_1 norm on Z to promote sparsity and preserve the local structure of the data, and at the same time applying a sparsity constraint on E, the objective function is obtained as follows:
where α, β, λ are regularization parameters;
s12, the square of the Frobenius norm is truncated to promote the first k minimum singular values to be 0 to approximate a low-rank condition, and an objective function is converted into:
where k=n s -rank(Z),Is a discriminant subspace learning function term that enhances the discriminant of data by fitting source domain labels in subspaces, using the relaxed labels y+=y s +B.sup.M to eliminate the influence of tag noise on the model, Y s For the single thermal encoding of source domain data, M is a non-negative matrix, and by Hadamard operator, matrix B is:
s13, applying structured sparsity on the projection matrix 1,2 The norm converts the objective function in step S12 into:
wherein, gamma is a balance factor;
s14, reducing the distance between a source domain and a target domain in the subspace by learning P, and obtaining the following formula based on a distribution self-adaption method:
where ω is the balance factor,the number of class c samples in the source domain and the target domain, x= [ X ] s ,X t ],M 0 ,M c The definition is as follows
S15, calculating the formula in step S14 with pseudo labels and iteratively refining the pseudo labels; the final objective function is:
further, the step S2 specifically includes:
s20, introducing matrixes Q and J, wherein the solution can be equivalently obtained into an objective function by minimizing the augmented Lagrangian function:
wherein Y is 1 ,Y 2 ,Y 3 Is Lagrangian multiplier, μ is penalty parameter;
s21, fixing matrices Q, Z, J, E and M, and rewriting the objective function formula of step S20 to:
by calculating the partial derivative of the objective function in step S21 to make it 0, a closed-form solution of the matrix P can be obtained:
wherein G is 1 =Y s +B⊙M,G 2 =X t -X s Z;
S22, fixing the matrices P, Z, J, E and M, the objective function for updating Q is:
which is solved with a contraction operator; the solution for Q can be expressed as:
where [A]_{:,i} denotes the i-th column of the matrix A;
s23, fixing matrices P, Q, J, E and M, updating the objective function of Z as:
due toIs NP-difficult, based on the Ky Fan theory, which is equivalent toThe objective function may be rewritten as:
and (3) carrying out iterative computation on F and Z until the objective function converges to obtain an optimal solution of Z, and then updating in the step S24, instead of only iterating F and Z once, enabling the partial derivative of the formula to be 0, so as to obtain a closed solution of Z as follows:
where, in calculating F, the following formula is optimized:
The optimal solution of F consists of the k left singular vectors corresponding to the k smallest singular values of Z, i.e. F = U_2, where U is the left singular vector matrix obtained by the SVD of Z and U = [U_1, U_2]. Since only FF^T is needed in computing the closed-form solution of Z, it is calculated by the following formula:
s24, fixing matrices P, Q, Z, E and M, updating the objective function of J as:
the iterative solution of J is:
where shrnk (x, c) =sign max (|x| -c, 0).
S25, fixing the matrices P, Q, Z, J and M, the objective function for updating E is:
The iterative solution of E is:
s26, fixing matrices P, Q, Z, J and E, updating the objective function of M to be
Let r=p T X s -Y s The objective function is decomposed into d×n s Sub-optimization problemWherein d, n s The number of rows and columns of the non-negative loose label matrix M are respectively, and the optimal solution is M ij =max(R ij B ij 0), the closed-form solution of the objective function is:
M * =max(R⊙B,0)
s27, updating Lagrangian multiplier Y 1 ,Y 2 ,Y 3 Penalty factor μ:
wherein ρ is learning rate, μ max Is the maximum value that mu can take.
Compared with the prior art, the invention has the following advantages: compared with the trace norm, the invention adopts a non-convex regularization term that approximates the rank function more tightly as the proxy model; the low-rank constraint is better approximated by minimizing the k smallest singular values of the reconstruction matrix, and the class information of the data is aligned by minimizing the joint distribution difference between the source domain and the target domain, so that the source domain data reconstruct the target domain data better and classification performance is improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is an iterative optimization solution diagram of the non-convex discrimination migration subspace learning method integrating distribution alignment information of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the protection scope of the invention is thereby defined more clearly.
Conventional migration subspace learning methods typically employ the trace norm to approximate the low-rank condition; however, the trace norm is too loose a relaxation, so the resulting solution may deviate severely from the ideal solution. In addition, conventional methods ignore the category information of the data during alignment, so classification performance is reduced. Therefore, the present embodiment proposes a non-convex discrimination migration subspace learning method integrating distribution alignment information to solve these problems. Specifically, the object of this embodiment is to learn a projection matrix P that projects the source domain and the target domain into one common subspace.
Referring to FIG. 1, this embodiment discloses a non-convex discrimination migration subspace learning method integrating distribution alignment information, which is used for image recognition and comprises the following steps:
step S1, constructing an objective function by cutting off the square of the Frobenius norm, wherein the method specifically comprises the following steps:
in step S10, since the two domain data are in the same manifold, the target domain data can be linearly represented by the source domain data, and the original formula is written as follows:
P^T X_t = P^T X_s Z + E (1)
where X_t and X_s are the feature matrices of the target domain data and the source domain data, respectively; n_t and n_s are the numbers of samples in the two domains; Z is the reconstruction matrix and E is the noise matrix.
Step S11, each target sample can be linearly represented by a combination of its neighboring source domain samples. A low-rank constraint is therefore imposed on the reconstruction matrix Z to preserve this property and the global structure of the data. At the same time, the reconstruction of each target sample needs only a few source domain samples to participate, so the l_1 norm is applied on Z to promote sparsity and preserve the local structure of the data, and the objective function is obtained as follows:
where α, β, λ are regularization parameters.
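For intuition only, the linear representation model of formula (1) can be sketched in a few lines of numpy; the sizes and random matrices below are hypothetical placeholders rather than values prescribed by the method:

import numpy as np

# Hypothetical sizes: m raw features, d subspace dimensions,
# n_s source samples, n_t target samples (placeholders only).
m, d, n_s, n_t = 100, 20, 50, 40
rng = np.random.default_rng(0)
X_s = rng.standard_normal((m, n_s))   # source-domain feature matrix
X_t = rng.standard_normal((m, n_t))   # target-domain feature matrix
P = rng.standard_normal((m, d))       # projection matrix to be learned
Z = rng.standard_normal((n_s, n_t))   # reconstruction matrix
E = P.T @ X_t - P.T @ X_s @ Z         # noise term, rearranged from formula (1)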
In step S12, since the trace norm is the tightest convex approximation of the rank function over the unit ball, many researchers adopt it as a proxy model for the low-rank condition. However, by the definition of the trace norm, it varies drastically when the largest singular value of the matrix changes significantly, even though the rank of the matrix does not change; the looseness of the trace norm can therefore cause the solution to deviate severely from the ideal solution. In this embodiment, the square of the Frobenius norm is truncated so that the k smallest singular values are driven to 0 to approximate the low-rank condition, eliminating the influence of the other, larger singular values on the proxy model, and the objective function (2) is converted into:
where k = n_s − rank(Z), the discriminant subspace learning function term enhances the discriminability of the data by fitting the source domain labels in the subspace, the relaxed labels Y* = Y_s + B⊙M are used to eliminate the influence of label noise on the model, Y_s is the one-hot label encoding of the source domain data, M is a non-negative matrix, ⊙ is the Hadamard operator, and the matrix B is:
step S13, additionally, the selected features may be redundant, thus requiring the application of a structured sparsity l to the projection matrix 1,2 The norm converts the objective function in step S12 into:
wherein γ is a balance factor.
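As a numerical illustration of the truncated square of the Frobenius norm used in step S12 (a sketch, assuming the regularizer evaluates to the sum of squares of the k smallest singular values of Z, as described above):

import numpy as np

def truncated_frob_sq(Z, k):
    # numpy returns singular values in descending order, so the last k
    # entries are the k smallest; driving them to 0 enforces the
    # low-rank condition rank(Z) <= n_s - k.
    if k == 0:
        return 0.0
    s = np.linalg.svd(Z, compute_uv=False)
    return float(np.sum(s[-k:] ** 2))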
In step S14, however, the conventional methods and the above formulas align the subspaces of the source domain and the target domain using only the feature information of the data, disregarding the influence of the class information on the model. This embodiment therefore exploits the semantic information of the different categories to reduce the distribution difference between the two domains, thereby improving the classification performance of the model. Specifically, this embodiment reduces the distance between the two domains in the subspace by learning P, and obtains the following formula based on the distribution adaptation method:
where ω is a balance factor, n_s^(c) and n_t^(c) are the numbers of class-c samples in the source domain and the target domain, X = [X_s, X_t], and M_0, M_c are defined as follows:
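A hedged sketch of one standard construction of M_0 and M_c (the JDA-style definition, stated here as an assumption, since the patent's own defining formulas are given as images):

import numpy as np

def mmd_matrices(y_s, y_t_pseudo, n_classes):
    # y_s: source labels; y_t_pseudo: target pseudo labels (integer arrays).
    y_s, y_t_pseudo = np.asarray(y_s), np.asarray(y_t_pseudo)
    n_s, n_t = len(y_s), len(y_t_pseudo)
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    M_0 = np.outer(e, e)                       # marginal-distribution MMD matrix
    M_c_list = []
    for c in range(n_classes):
        e_c = np.zeros(n_s + n_t)
        src, tgt = (y_s == c), (y_t_pseudo == c)
        if src.any():
            e_c[:n_s][src] = 1.0 / src.sum()   # class-c source samples
        if tgt.any():
            e_c[n_s:][tgt] = -1.0 / tgt.sum()  # class-c target samples
        M_c_list.append(np.outer(e_c, e_c))    # conditional MMD matrix for class c
    return M_0, M_c_list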
step S15, however, in the calculation of equation (5), since the target field is not labeled, it cannot be known whether the jth sample in the target field belongs to the c-th class, i.e.Unknown. Thus, a pseudo tag is employed for the calculation. Because the reliability of the pseudo tag is lower, the pseudo tag is iteratively refined to make +.>The final objective function is:
and S2, carrying out iterative solution on the objective function through an IALM algorithm to obtain a projection matrix P.
Specifically, the present embodiment designs the IALM algorithm to effectively solve the objective function (6). The step S2 specifically includes:
step S20, introducing matrices Q and J, the solution of which can be equivalently obtained by minimizing the augmented lagrangian function (7):
wherein Y is 1 ,Y 2 ,Y 3 Is Lagrangian multiplier, μ is penalty parameter;
step S21, fixing the matrices Q, Z, J, E and M, and rewriting the objective function formula of step S20 to:
by calculating the partial derivative of the objective function (8) to be 0, the closed-form solution of the matrix P can be obtained:
wherein G is 1 =Y s +B⊙M,G 2 =X t -X s Z;
Step S22, fixing the matrices P, Z, J, E and M, the objective function for updating Q is:
which is solved with a contraction operator; the solution for Q can be expressed as:
where [A]_{:,i} denotes the i-th column of the matrix A;
step S23, fixing the matrices P, Q, J, E and M, updating the objective function of Z as:
due toIs NP-difficult, based on the Ky Fan theory, which is equivalent toThe objective function may be rewritten as:
and (3) carrying out iterative computation on F and Z until the objective function converges to obtain an optimal solution of Z, and then updating in the step S24, instead of only iterating F and Z once, so that the partial derivative of the formula (13) is 0, and obtaining a closed solution of Z as follows:
where, in calculating F, the following formula is optimized:
The optimal solution of F consists of the k left singular vectors corresponding to the k smallest singular values of Z, i.e. F = U_2, where U is the left singular vector matrix obtained by the SVD of Z and U = [U_1, U_2]. Since only FF^T is needed in computing the closed-form solution of Z, it is calculated by the following formula:
step S24, fixing the matrices P, Q, Z, E and M, updating the objective function of J as:
the iterative solution of J is:
where shrnk (x, c) =sign max (|x| -c, 0).
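The shrink operator is the standard element-wise soft-thresholding; a minimal sketch:

import numpy as np

def shrink(X, c):
    # Element-wise soft-thresholding: sign(X) * max(|X| - c, 0).
    return np.sign(X) * np.maximum(np.abs(X) - c, 0.0)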
Step S25, fixing the matrices P, Q, Z, J and M, the objective function for updating E is:
The iterative solution of E is:
step S26, fixing matrices P, Q, Z, J and E, updating the objective function of M to
Let r=p T X s -Y s The objective function is decomposed into d×n s Sub-optimization problemWherein d, n s The number of rows and columns of the non-negative loose label matrix M are respectively, and the optimal solution is M ij =max(R ij B ij 0), the closed-form solution of the objective function is:
M * =max(R⊙B,0) (21)
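A sketch of this closed-form M-update; R and B stand for the matrices defined above and are passed in as arguments:

import numpy as np

def update_M(R, B):
    # Closed-form solution M* = max(R ⊙ B, 0): all d x n_s element-wise
    # sub-problems M_ij = max(R_ij * B_ij, 0) solved at once.
    return np.maximum(R * B, 0.0)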
step S27, updating Lagrangian multiplier Y 1 ,Y 2 ,Y 3 Penalty factor μ:
wherein ρ is learning rate, μ max Is the maximum value that mu can take.
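A sketch of the standard IALM multiplier and penalty updates, assuming the three constraint residuals res1, res2, res3 have been computed from the constraints introduced with Q and J; the values of rho and mu_max follow the initialization listed in the claims:

def update_multipliers(Y1, Y2, Y3, res1, res2, res3, mu, rho=1.1, mu_max=1e6):
    # Standard IALM step: Y_i <- Y_i + mu * residual_i, then mu grows
    # geometrically and is capped at mu_max.
    return (Y1 + mu * res1, Y2 + mu * res2, Y3 + mu * res3,
            min(rho * mu, mu_max))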
In summary, the optimization steps of the model are summarized in Table 1 below:
and S3, projecting the source domain data and the target domain data into a common characteristic subspace through a projection matrix P.
This embodiment also provides an improvement of the algorithm described above. When approximating the rank function, another variant is employed, which is constructed by truncating the trace norm. Compared with the truncated square of the Frobenius norm used above, this variant is closer to the rank function in its formal definition, so the approximation effect is better. The objective function can be obtained as follows:
Analysis shows that changing the proxy model of the low-rank condition only affects the computation of the reconstruction matrix Z, so the objective function transitions to:
in the same way as described above,it is NP-hard to solve. Fortunately, its equivalence can be converted to the following formula
Where r=rank (Z). While in optimizing trace norms, the present invention converts it to Tr (Z T DZ), iteratively solving. Wherein the method comprises the steps of Representing the current optimal solution. Thus, the objective function (24) can be converted into
When the matrices F and G are fixed, Z is updated, and the closed-form solution of Z is obtained as follows:
When Z is fixed, the optimal solutions of F and G consist of the left and right singular vectors corresponding to the r largest singular values of Z, respectively.
Compared with the trace norm, the invention adopts a non-convex regularization term that approximates the rank function more tightly as the proxy model; the low-rank constraint is better approximated by minimizing the k smallest singular values of the reconstruction matrix, and the class information of the data is aligned by minimizing the joint distribution difference between the source domain and the target domain, so that the source domain data reconstruct the target domain data better and classification performance is improved.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, various modifications or alterations may be made within the scope of the appended claims, and such modifications and alterations are intended to fall within the scope of the invention as described in the claims.

Claims (1)

1. The non-convex discrimination migration subspace learning method integrating distribution alignment information is used for image recognition and is characterized by comprising the following steps:
S1, constructing an objective function by truncating the square of the Frobenius norm;
S2, solving the objective function iteratively through the IALM algorithm to obtain a projection matrix P;
S3, projecting the source domain data and the target domain data into a common feature subspace through the projection matrix P;
the step S1 specifically includes:
s10, linearly representing the target domain data by the source domain data as follows:
P^T X_t = P^T X_s Z + E
where X_t and X_s are the feature matrices of the target domain data and the source domain data, respectively; n_t and n_s are the numbers of samples in the two domains; Z is the reconstruction matrix and E is the noise matrix;
s11, applying low-rank constraint on Z to maintain global structure of data, and applying l 1 The norm promotes the sparsity on Z to ensure the local structure of the data, and simultaneously applies sparsity constraint to E to obtain an objective function as follows:
wherein alpha, beta, lambda are regularization parameters;
s12, the square of the Frobenius norm is truncated to promote the first k minimum singular values to be 0 to approximate a low-rank condition, and an objective function is converted into:
where k=n s -rank(Z),Is a discriminant subspace learning function term that enhances the discriminant of data by fitting source domain labels in subspaces, using the relaxed labels y+=y s +B.sup.M to eliminate the influence of tag noise on the model, Y s For the single thermal encoding of source domain data, M is a non-negative matrix, and by Hadamard operator, matrix B is:
s13, applying structured sparsity on the projection matrix 1,2 The norm converts the objective function in step S12 into:
wherein, gamma is a balance factor;
s14, reducing the distance between a source domain and a target domain in the subspace by learning P, and obtaining the following formula based on a distribution self-adaption method:
where ω is the balance factor,the number of class c samples in the source domain and the target domain, x= [ X ] s ,X t ],M 0 ,M c The definition is as follows:
s15, calculating the formula in the step S14 by adopting the pseudo tag, and carrying out iterative refinement on the pseudo tag to enable the pseudo tag to be madeThe final objective function is:
the step S2 specifically includes:
s20, introducing matrixes Q and J, wherein the solution can be equivalently obtained into an objective function by minimizing the augmented Lagrangian function:
wherein Y is 1 ,Y 2 ,Y 3 Is Lagrangian multiplier, μ is penalty parameter;
initializing: P = P_0, Q = 0, M = 1, J = Z = 0, E = 0, Y_1 = 0, Y_2 = 0, Y_3 = 0, μ_0 = 0.05, μ_max = 10^6, ρ = 1.1, ε = 10^-6, and repeating the following steps until the objective function meets the convergence condition;
s21, fixing matrices Q, Z, J, E and M, and rewriting the objective function formula of step S20 to:
by calculating the partial derivative of the objective function in step S21 to make it 0, a closed-form solution of the matrix P can be obtained:
wherein G is 1 =Y s +B⊙M,G 2 =X t -X s Z;
S22, fixing the matrices P, Z, J, E and M, the objective function for updating Q is:
which is solved with a contraction operator; the solution for Q can be expressed as:
where [A]_{:,i} denotes the i-th column of the matrix A;
s23, fixing matrices P, Q, J, E and M, updating the objective function of Z as:
due toIs NP-difficult, which is equivalent to +.>The objective function may be rewritten as:
and (3) carrying out iterative computation on F and Z until the objective function converges to obtain an optimal solution of Z, and then updating in the step S24, instead of only iterating F and Z once, enabling the partial derivative of the formula to be 0, so as to obtain a closed solution of Z as follows:
where, in calculating F, the following formula is optimized:
The optimal solution of F consists of the k left singular vectors corresponding to the k smallest singular values of Z, i.e. F = U_2, where U is the left singular vector matrix obtained by the SVD of Z and U = [U_1, U_2]. Since only FF^T is needed in computing the closed-form solution of Z, it is calculated by the following formula:
s24, fixing matrices P, Q, Z, E and M, updating the objective function of J as:
the iterative solution of J is:
where shrnk (x, c) =sign max (|x| -c, 0);
s25, fixing the matrices P, Q, Z, J and M, updating the objective function of E as:
the iterative solution of E is:
s26, fixing matrices P, Q, Z, J and E, updating the objective function of M to be
Let r=p T X s -Y s The objective function is decomposed into d×n s Sub-optimization problemWherein d, n s The number of rows and columns of the non-negative loose label matrix M are respectively, and the optimal solution is M ij =max(R ij B ij 0), the closed-form solution of the objective function is:
M * =max(R⊙B,0)
s27, updating Lagrangian multiplier Y 1 ,Y 2 ,Y 3 Penalty factor μ:
wherein ρ is learning rate, μ max Is the maximum value that mu can take.
CN202310779007.7A 2023-06-29 2023-06-29 Non-convex discrimination migration subspace learning method integrating distribution alignment information Active CN116777004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310779007.7A CN116777004B (en) 2023-06-29 2023-06-29 Non-convex discrimination migration subspace learning method integrating distribution alignment information


Publications (2)

Publication Number Publication Date
CN116777004A (en) 2023-09-19
CN116777004B (en) 2024-02-06

Family

ID=87992694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310779007.7A Active CN116777004B (en) 2023-06-29 2023-06-29 Non-convex discrimination migration subspace learning method integrating distribution alignment information

Country Status (1)

Country Link
CN (1) CN116777004B (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8935308B2 (en) * 2012-01-20 2015-01-13 Mitsubishi Electric Research Laboratories, Inc. Method for recovering low-rank matrices and subspaces from data in high-dimensional matrices
US11534641B2 (en) * 2017-05-03 2022-12-27 Tyco Fire Products Lp Sectional fire protection for attic spaces
US20190102511A1 (en) * 2017-10-02 2019-04-04 Blackthorn Therapeutics, Inc. Methods and tools for detecting, diagnosing, predicting, prognosticating, or treating a neurobehavioral phenotype in a subject
US20220341725A1 (en) * 2021-04-26 2022-10-27 Quality Vision International Inc. Non-invasive alignment method and system for imager-illuminator optical measurement machines

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109348229A (en) * 2018-10-11 2019-02-15 武汉大学 Jpeg image mismatch steganalysis method based on the migration of heterogeneous characteristic subspace
CN110659663A (en) * 2019-03-22 2020-01-07 重庆大学 Unsupervised bidirectional reconstruction field self-adaption method
CN114548264A (en) * 2022-02-21 2022-05-27 重庆邮电大学 Unsupervised two-stage field self-adaption method
CN115861708A (en) * 2022-12-20 2023-03-28 重庆邮电大学 Low-rank sparse representation transfer learning method with adaptive graph diffusion
CN116127357A (en) * 2023-02-06 2023-05-16 徐州医科大学 Vibration signal domain adaptation diagnosis method based on Grassmann manifold subspace embedding
CN116701891A (en) * 2023-04-24 2023-09-05 中国人民解放军国防科技大学 Schatten-p norm-based low-rank discrimination migration subspace learning method

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
Feiping Nie et al.; Unsupervised Feature Selection With Constrained l2,0-Norm and Optimized Graph; IEEE Transactions on Neural Networks and Learning Systems; 2022; vol. 33, no. 4; pp. 1702-1713. *
Wang Yu et al.; Global convergence of ADMM in nonconvex nonsmooth optimization; Journal of Scientific Computing; vol. 78; pp. 29-63 *
Chang Wei et al.; New tight relaxations of rank minimization for multi-task learning; ACM International Conference on Information & Knowledge Management; pp. 2910-2914 *
Z. Lin et al.; Non-Convex Transfer Subspace Learning for Unsupervised Domain Adaptation; 2019 IEEE International Conference on Multimedia and Expo (ICME); pp. 1468-1473 *
Wei Wang et al.; Rethinking maximum mean discrepancy for visual domain adaptation; IEEE Transactions on Neural Networks and Learning Systems; vol. 34, no. 1; pp. 264-277 *
Xiao Ting et al.; Structure preservation and distribution alignment in discriminative transfer subspace learning; Neurocomputing; vol. 337; pp. 218-234 *
Liran Yang et al.; Unsupervised domain adaptation via re-weighted transfer subspace learning with inter-class sparsity; Knowledge-Based Systems; vol. 263; pp. 1-6 *
Yuan Kaifeng et al.; EEG emotion recognition method based on multi-source domain adaptation dictionary learning and sparse representation; Journal of Nanjing University of Information Science & Technology (Natural Science Edition); vol. 15, no. 4; pp. 412-418 *
Luo Tingjin; Research on sparse-optimization-based learning methods and their applications; China Doctoral Dissertations Full-text Database (Information Science and Technology), no. 2, 2020; I138-48 *
Tao Yang et al.; Semi-supervised domain adaptation method based on transfer subspace; Computer Engineering and Design; vol. 42, no. 8; pp. 2308-2315 *

Also Published As

Publication number Publication date
CN116777004A (en) 2023-09-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant