CN116431849A

CN116431849A - Lu Bangtu text retrieval method based on evidence learning

Info

Publication number: CN116431849A
Application number: CN202310369406.6A
Authority: CN
Inventors: 胡鹏; 秦阳; 李源; 彭德中; 彭玺
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-04-07
Filing date: 2023-04-07
Publication date: 2023-07-14
Anticipated expiration: 2043-04-07
Also published as: CN116431849B

Abstract

The invention discloses a robust image-text retrieval method based on evidence learning, which comprises the following steps: processing a training data set comprising the image and the corresponding text description to obtain a processed training data set; constructing a robust image-text retrieval model based on evidence learning according to the processed training data set; inputting a retrieval data mode into the robust image-text retrieval model, and calculating data similarity; the invention solves the problem of poor robustness of the image-text retrieval method by carrying out similarity sorting according to the calculated data similarity and outputting the image-text retrieval result.

Description

Lu Bangtu text retrieval method based on evidence learning

Technical Field

The invention relates to the field of cross-modal retrieval, in particular to a robust image-text retrieval method based on evidence learning.

Background

The existing cross-modal retrieval is mainly divided into two main types in terms of method, wherein the first type is real-value representation learning, and is characterized in that the other type directly learns the characteristics extracted from different modalities, and the second type is binary representation learning, and is characterized in that the characteristics extracted from different modalities are mapped into a Hamming binary space first, and then the characteristics are learned in the space.

However, the above method has the following problems: even if a large amount of complete and accurate data is used for training, when a plurality of different sentences are used for expressing one image or a plurality of different images are used for describing one sentence in the face of one-to-many problems, whether a plurality of search results given by the user meet the search requirements cannot be judged.

Disclosure of Invention

Aiming at the defects in the prior art, the robust image-text retrieval method based on evidence learning solves the problem of poor robustness of the image-text retrieval method.

In order to achieve the aim of the invention, the invention adopts the following technical scheme: the robust image-text retrieval method based on evidence learning comprises the following steps:

s1, processing a training data set comprising images and corresponding text descriptions to obtain a processed training data set;

s2, constructing a robust image-text retrieval model based on evidence learning according to the processed training data set;

s3, inputting a retrieval data mode into the robust image-text retrieval model, and calculating data similarity;

s4, performing similarity sorting according to the calculated data similarity, and outputting a graph and text retrieval result.

Further: the step S1 comprises the following sub-steps:

s11, determining the number K of image-text pairs in a training data set;

s12, converting all image data in the training data set into a three-channel RGB image to obtain a processed image data set;

s13, dividing each text segment in the training data set into words or phrases, deleting the prepositions, the conjunctions and the auxiliary words of the texts, converting each word into a digital number, recording the total length Z, and converting each text segment in the training data set into a Z-dimensional vector to obtain a processed text data set;

s14, taking the processed image data set and the processed text data set as the processed training data set.

Further: the step S2 comprises the following sub-steps:

s21, constructing an image pre-training network by using a Faster RCNN, inputting the processed image data set into the image pre-training network, and flattening the image data set into a one-dimensional vector to obtain an image matrix V', wherein the dimension of the image matrix V is K rows and P columns, and each row corresponds to a pre-training vector of an image;

s22, constructing a text pre-training network by using a Bi-GRU network, inputting the processed text data set into the image pre-training network, flattening the processed text data set into a one-dimensional vector to obtain a text matrix T ', wherein the dimension of the text matrix T' is K rows and Q columns, and each row corresponds to a pre-training vector of a text;

s23, constructing a network VSE which maps data of different modes to the same space by using VSE++, and setting the data to be output as a D-dimensional vector;

s24, inputting an image matrix V 'and a text matrix T' into a network VSE to obtain an image feature matrix V and a text feature matrix T corresponding to the image matrix V and the text matrix T, wherein V and T are K rows and D columns;

s25, calculating a similarity matrix S of the image feature matrix V and the text feature matrix T, and calculating an evidence matrix E;

s26, calculating a Dirichlet distribution parameter matrix alpha according to the evidence matrix E;

s27, calculating an uncertainty loss function L according to the Dirichlet distribution parameter matrix alpha _ce And a consistency loss function L _kl ；

S28, training uncertainty loss function L by adopting Adam optimizer _ce And a consistency loss function L _kl And (5) completing the construction of the image-text retrieval model.

Further: the calculation formulas of the similarity matrix S and the evidence matrix E in the step S25 are as follows:

wherein S is a similarity matrix, E is an evidence matrix, T is a transpose of the matrix, T E (0, 1) is a constant parameter, and T is a dot product symbol.

Further: the calculation formula of the dirichlet distribution parameter matrix α in step S26 is as follows:

α＝E+L

wherein, α is a dirichlet distribution parameter matrix, each of which is a dirichlet distribution parameter, L is a matrix with all 1 data elements, and the number of rows and columns is the same as that of the evidence matrix E.

Further: the uncertainty loss function L in the step S27 _ce And a consistency loss function L _kl The calculation formula of (2) is as follows:

wherein i and j are counting parameters, alpha _i For the ith row of the Dirichlet distribution parameter matrix alpha, alpha _ij Is Di Li KeThe lightning distribution parameter matrix alpha, row i, column j, ψ (i) is a dual gamma function, Γ (i) is a gamma function,

i row of the K-order identity matrix, +.H is Hadamard product, B ()'s beta function, +.>

Intermediate matrix->

O is a K-dimensional vector with 1 for each dimension; />

Is->

Sum of elements of each column,/->

Is->

Is (j) th column element->

Is->

Is the k-th column element of (c).

Further: the step S3 comprises the following sub-steps:

s31, inputting the number M of output results to be matched and the data mode to be searched into a Lu Bangtu text search model;

s32, the input data mode is the data to be matched of the data mode to be searched, and the similarity of all data in the rest data search library is calculated.

Further: the method of the step S4 is as follows: and carrying out similarity sorting according to the calculated data similarity, obtaining M matching results with highest similarity, outputting the matching results, and completing retrieval.

The beneficial effects of the invention are as follows:

1. compared with the prior art, the uncertainty of the prediction result can be captured;

2. compared with the prior art, the robustness of the image-text retrieval method is enhanced, and the image-text retrieval method has high reliability and accuracy.

Drawings

Fig. 1 is a flowchart of the image-text searching method according to the present invention.

Detailed Description

The following description of the embodiments of the present invention is provided to facilitate understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of the embodiments, and all the inventions which make use of the inventive concept are protected by the spirit and scope of the present invention as defined and defined in the appended claims to those skilled in the art.

As shown in fig. 1, in one embodiment of the present invention, a robust image-text retrieval method based on evidence learning is provided, which includes the following steps:

In this embodiment, the step S1 includes the following sub-steps:

s11, determining the number K of image-text pairs in a training data set;

In this embodiment, the step S2 includes the following sub-steps:

s25, calculating a similarity matrix S of the image feature matrix V and the text feature matrix T, and calculating an evidence matrix E, wherein the calculation formula is as follows:

wherein S is a similar matrix, E is an evidence matrix, T is the transpose of the matrix, T E (0, 1) is a constant parameter, and T is a dot product symbol;

s26, calculating a Dirichlet distribution parameter matrix alpha according to the evidence matrix E, wherein the calculation formula is as follows:

α＝E+L

wherein, alpha is a dirichlet allocation parameter matrix, each behavior dirichlet allocation parameter is a matrix with 1 data element and the row and column numbers are the same as the evidence matrix E;

s27, calculating an uncertainty loss function L according to the Dirichlet distribution parameter matrix alpha _ce And a consistency loss function L _kl The calculation formula is as follows:

wherein i and j are counting parameters, alpha _i For the ith row of the Dirichlet distribution parameter matrix alpha, alpha _ij For the dirichlet distribution parameter matrix alpha, row i, column j, ψ (i) is a dual gamma function, Γ (i) is a gamma function,

Intermediate matrix->

O is a K-dimensional vector with 1 for each dimension; />

Is->

Sum of elements of each column,/->

Is->

Is (j) th column element->

Is->

Is the k-th column element of (2);

In this embodiment, the step S3 includes the following sub-steps:

In this embodiment, the method in step S4 is as follows: and carrying out similarity sorting according to the calculated data similarity, obtaining M matching results with highest similarity, outputting the matching results, and completing retrieval.

In the description of the present invention, it should be understood that the terms "center," "thickness," "upper," "lower," "horizontal," "top," "bottom," "inner," "outer," "radial," and the like indicate or are based on the orientation or positional relationship shown in the drawings, merely to facilitate description of the present invention and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be configured and operated in a particular orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be interpreted as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defined as "first," "second," "third," or the like, may explicitly or implicitly include one or more such feature.

The invention provides a robust image-text retrieval method based on evidence learning, which solves the problem of poor robustness of the image-text retrieval method.

Claims

1. The Lu Bangtu text retrieval method based on evidence learning is characterized by comprising the following steps of:

2. The method for Lu Bangtu text retrieval based on evidence learning of claim 1, wherein said step S1 comprises the sub-steps of:

s11, determining the number K of image-text pairs in a training data set;

3. The method for Lu Bangtu text retrieval based on evidence learning of claim 2, wherein said step S2 comprises the sub-steps of:

4. The method for Lu Bangtu text retrieval based on evidence learning according to claim 3, wherein the calculation formulas of the similarity matrix S and the evidence matrix E in the step S25 are as follows:

5. The method for Lu Bangtu text retrieval based on evidence learning according to claim 4, wherein the dirichlet allocation parameter matrix α in step S26 is calculated as follows:

α＝E+L

6. The evidence learning-based Lu Bangtu text retrieval method according to claim 5, wherein the uncertainty loss function L in step S27 _ce And a consistency loss function L _kl The calculation formula of (2) is as follows:

Intermediate matrix->

O is a K-dimensional vector with 1 for each dimension;

is->

Sum of elements of each column,/->

Is->

Is (j) th column element->

Is->

Is the k-th column element of (c).

7. The method for Lu Bangtu text retrieval based on evidence learning of claim 6, wherein said step S3 comprises the sub-steps of:

8. The method for Lu Bangtu text retrieval based on evidence learning of claim 7, wherein the method of step S4 is as follows: and carrying out similarity sorting according to the calculated data similarity, obtaining M matching results with highest similarity, outputting the matching results, and completing retrieval.