CN116431849A - Robust image-text retrieval method based on evidence learning - Google Patents

Robust image-text retrieval method based on evidence learning

Info

Publication number
CN116431849A
CN116431849A
Authority
CN
China
Prior art keywords
matrix
text
image
data set
evidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310369406.6A
Other languages
Chinese (zh)
Other versions
CN116431849B (en)
Inventor
胡鹏
秦阳
李源
彭德中
彭玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202310369406.6A priority Critical patent/CN116431849B/en
Publication of CN116431849A publication Critical patent/CN116431849A/en
Application granted granted Critical
Publication of CN116431849B publication Critical patent/CN116431849B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a robust image-text retrieval method based on evidence learning, which comprises the following steps: processing a training data set comprising images and their corresponding text descriptions to obtain a processed training data set; constructing a robust image-text retrieval model based on evidence learning from the processed training data set; inputting the query data of the modality to be retrieved into the robust image-text retrieval model and calculating data similarities; and sorting by the calculated similarities and outputting the image-text retrieval result. The invention thereby solves the problem of the poor robustness of existing image-text retrieval methods.

Description

Robust image-text retrieval method based on evidence learning
Technical Field
The invention relates to the field of cross-modal retrieval, in particular to a robust image-text retrieval method based on evidence learning.
Background
Existing cross-modal retrieval methods fall mainly into two categories. The first is real-valued representation learning, characterized by learning directly on the features extracted from the different modalities. The second is binary representation learning, characterized by first mapping the features extracted from the different modalities into a binary Hamming space and then learning within that space.
However, the above methods share the following problem: even when trained on a large amount of complete and accurate data, they cannot, in the face of the one-to-many problem (several different sentences describing one image, or several different images depicting one sentence), judge whether the multiple retrieval results returned to the user actually meet the retrieval requirement.
Disclosure of Invention
Aiming at the above defects in the prior art, the present invention provides a robust image-text retrieval method based on evidence learning, which solves the problem of the poor robustness of existing image-text retrieval methods.
To achieve the above object, the invention adopts the following technical scheme: a robust image-text retrieval method based on evidence learning, comprising the following steps:
s1, processing a training data set comprising images and corresponding text descriptions to obtain a processed training data set;
s2, constructing a robust image-text retrieval model based on evidence learning according to the processed training data set;
s3, inputting the query data of the modality to be retrieved into the robust image-text retrieval model, and calculating data similarities;
s4, sorting by the calculated data similarities, and outputting the image-text retrieval result.
Further: the step S1 comprises the following sub-steps:
s11, determining the number K of image-text pairs in a training data set;
s12, converting all image data in the training data set into a three-channel RGB image to obtain a processed image data set;
s13, segmenting each text passage in the training data set into words or phrases, deleting prepositions, conjunctions and auxiliary words, converting each remaining word into a numeric index, recording the total length as Z, and converting each text passage in the training data set into a Z-dimensional vector to obtain a processed text data set;
s14, taking the processed image data set and the processed text data set as the processed training data set.
Further: the step S2 comprises the following sub-steps:
s21, constructing an image pre-training network with Faster R-CNN, inputting the processed image data set into the image pre-training network, and flattening each output into a one-dimensional vector to obtain an image matrix V', wherein V' has K rows and P columns and each row is the pre-training vector of one image;
s22, constructing a text pre-training network with a Bi-GRU network, inputting the processed text data set into the text pre-training network, and flattening each output into a one-dimensional vector to obtain a text matrix T', wherein T' has K rows and Q columns and each row is the pre-training vector of one text;
s23, constructing with VSE++ a network VSE that maps data of different modalities into the same space, and setting its output to be a D-dimensional vector;
s24, inputting the image matrix V' and the text matrix T' into the network VSE to obtain the corresponding image feature matrix V and text feature matrix T, wherein V and T each have K rows and D columns;
s25, calculating a similarity matrix S of the image feature matrix V and the text feature matrix T, and calculating an evidence matrix E;
s26, calculating a Dirichlet distribution parameter matrix α according to the evidence matrix E;
s27, calculating an uncertainty loss function L_ce and a consistency loss function L_kl according to the Dirichlet distribution parameter matrix α;
s28, training with the Adam optimizer to minimize the uncertainty loss function L_ce and the consistency loss function L_kl, completing the construction of the image-text retrieval model.
Further: the calculation formulas of the similarity matrix S and the evidence matrix E in the step S25 are as follows:
Figure BDA0004168144410000031
Figure BDA0004168144410000032
wherein S is a similarity matrix, E is an evidence matrix, T is a transpose of the matrix, T E (0, 1) is a constant parameter, and T is a dot product symbol.
Further: the calculation formula of the dirichlet distribution parameter matrix α in step S26 is as follows:
α=E+L
wherein, α is a dirichlet distribution parameter matrix, each of which is a dirichlet distribution parameter, L is a matrix with all 1 data elements, and the number of rows and columns is the same as that of the evidence matrix E.
Further: the uncertainty loss function L in the step S27 ce And a consistency loss function L kl The calculation formula of (2) is as follows:
Figure BDA0004168144410000033
Figure BDA0004168144410000034
Figure BDA0004168144410000035
Figure BDA0004168144410000036
wherein i and j are counting parameters, alpha i For the ith row of the Dirichlet distribution parameter matrix alpha, alpha ij Is Di Li KeThe lightning distribution parameter matrix alpha, row i, column j, ψ (i) is a dual gamma function, Γ (i) is a gamma function,
Figure BDA0004168144410000041
i row of the K-order identity matrix, +.H is Hadamard product, B ()'s beta function, +.>
Figure BDA0004168144410000042
Intermediate matrix->
Figure BDA0004168144410000043
O is a K-dimensional vector with 1 for each dimension; />
Figure BDA0004168144410000044
Is->
Figure BDA0004168144410000045
Sum of elements of each column,/->
Figure BDA0004168144410000046
Is->
Figure BDA0004168144410000047
Is (j) th column element->
Figure BDA0004168144410000048
Is->
Figure BDA0004168144410000049
Is the k-th column element of (c).
Further: the step S3 comprises the following sub-steps:
s31, inputting the number M of matching results to output, together with the query data of the modality to be retrieved, into the robust image-text retrieval model;
s32, taking the input data as the query of its modality, and calculating its similarity to all data in the retrieval library of the other modality.
Further: the method of step S4 is as follows: sorting by the calculated data similarities, taking the M matching results with the highest similarity, outputting those matching results, and completing the retrieval.
The beneficial effects of the invention are as follows:
1. Compared with the prior art, the method can capture the uncertainty of its prediction results;
2. Compared with the prior art, the method enhances the robustness of image-text retrieval, offering high reliability and accuracy.
Drawings
Fig. 1 is a flowchart of the image-text retrieval method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of these embodiments: for those of ordinary skill in the art, any invention that makes use of the inventive concept falls within the protection scope of the invention as defined by the appended claims.
As shown in fig. 1, in one embodiment of the present invention, a robust image-text retrieval method based on evidence learning is provided, which includes the following steps:
s1, processing a training data set comprising images and corresponding text descriptions to obtain a processed training data set;
s2, constructing a robust image-text retrieval model based on evidence learning according to the processed training data set;
s3, inputting the query data of the modality to be retrieved into the robust image-text retrieval model, and calculating data similarities;
s4, sorting by the calculated data similarities, and outputting the image-text retrieval result.
In this embodiment, the step S1 includes the following sub-steps:
s11, determining the number K of image-text pairs in a training data set;
s12, converting all image data in the training data set into a three-channel RGB image to obtain a processed image data set;
s13, segmenting each text passage in the training data set into words or phrases, deleting prepositions, conjunctions and auxiliary words, converting each remaining word into a numeric index, recording the total length as Z, and converting each text passage in the training data set into a Z-dimensional vector to obtain a processed text data set;
s14, taking the processed image data set and the processed text data set as the processed training data set.
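For illustration, the preprocessing of steps S12 and S13 can be sketched in Python as follows; the helper names, the stop-word list, the index 0 for out-of-vocabulary words, and the zero-padding convention are assumptions made for the example and are not specified by the patent.

```python
from PIL import Image

def preprocess_image(path):
    # S12: force every image into a three-channel RGB image
    return Image.open(path).convert("RGB")

def preprocess_text(sentence, vocab, stopwords, z):
    # S13: split the text into words, drop prepositions/conjunctions/auxiliary
    # words, map each remaining word to its numeric index, and pad to length Z
    words = [w for w in sentence.lower().split() if w not in stopwords]
    ids = [vocab.get(w, 0) for w in words]        # 0 = out-of-vocabulary (assumed)
    ids = ids[:z] + [0] * max(0, z - len(ids))    # zero-pad to a Z-dimensional vector
    return ids
```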
In this embodiment, the step S2 includes the following sub-steps:
s21, constructing an image pre-training network with Faster R-CNN, inputting the processed image data set into the image pre-training network, and flattening each output into a one-dimensional vector to obtain an image matrix V', wherein V' has K rows and P columns and each row is the pre-training vector of one image;
s22, constructing a text pre-training network with a Bi-GRU network, inputting the processed text data set into the text pre-training network, and flattening each output into a one-dimensional vector to obtain a text matrix T', wherein T' has K rows and Q columns and each row is the pre-training vector of one text;
s23, constructing with VSE++ a network VSE that maps data of different modalities into the same space, and setting its output to be a D-dimensional vector;
s24, inputting the image matrix V' and the text matrix T' into the network VSE to obtain the corresponding image feature matrix V and text feature matrix T, wherein V and T each have K rows and D columns;
s25, calculating a similarity matrix S of the image feature matrix V and the text feature matrix T, and calculating an evidence matrix E, with the following formulas:

$$S = V \cdot T^{\top}$$

$$E = \exp\left(S/\tau\right)$$

wherein S is the similarity matrix, E is the evidence matrix, ⊤ denotes matrix transposition, τ ∈ (0, 1) is a constant parameter, and · denotes the dot product;
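As an illustration of step S25, the following PyTorch sketch computes S and E for the K paired feature rows; the row-wise L2 normalization (cosine similarity) and the value τ = 0.1 are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def similarity_and_evidence(v, t, tau=0.1):
    # v: K x D image feature matrix V; t: K x D text feature matrix T
    v = F.normalize(v, dim=1)   # assumed: cosine similarity via row normalization
    t = F.normalize(t, dim=1)
    s = v @ t.T                 # S = V . T^T, a K x K similarity matrix
    e = torch.exp(s / tau)      # E = exp(S / tau), non-negative evidence
    return s, e
```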
s26, calculating a Dirichlet distribution parameter matrix α according to the evidence matrix E, with the following formula:

$$\alpha = E + L$$

wherein α is the Dirichlet distribution parameter matrix, each row of which is a set of Dirichlet distribution parameters, and L is an all-ones matrix with the same number of rows and columns as the evidence matrix E;
s27, calculating an uncertainty loss function L_ce and a consistency loss function L_kl according to the Dirichlet distribution parameter matrix α, with the following formulas:

$$L_{ce} = \sum_{i=1}^{K} \sum_{j=1}^{K} I_{ij}\left(\psi(S_{\alpha_i}) - \psi(\alpha_{ij})\right)$$

$$S_{\alpha_i} = \sum_{j=1}^{K} \alpha_{ij}$$

$$\tilde{\alpha}_i = I_i + (O - I_i) \odot \alpha_i$$

$$L_{kl} = \sum_{i=1}^{K}\left[-\log\left(B(\tilde{\alpha}_i)\,\Gamma(K)\right) + \sum_{j=1}^{K}\left(\tilde{\alpha}_{ij} - 1\right)\left(\psi(\tilde{\alpha}_{ij}) - \psi(S_{\tilde{\alpha}_i})\right)\right], \qquad B(\tilde{\alpha}_i) = \frac{\prod_{k=1}^{K}\Gamma(\tilde{\alpha}_{ik})}{\Gamma(S_{\tilde{\alpha}_i})}$$

wherein i, j and k are counting indices; α_i is the i-th row of the Dirichlet distribution parameter matrix α and α_ij is its element in row i, column j; ψ(·) is the digamma function and Γ(·) is the gamma function; I_i is the i-th row of the K-order identity matrix I; ⊙ is the Hadamard product; B(·) is the (multivariate) beta function; α̃ is the intermediate matrix whose i-th row is α̃_i; O is a K-dimensional vector with 1 in every dimension; S_{α̃_i} is the sum of the elements of the i-th row of α̃; and α̃_ij and α̃_ik are the j-th and k-th elements of α̃_i;
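The two losses of step S27 follow directly from the formulas above. The sketch below assumes, as in the definitions of L_ce and α̃, that the label matrix is the K-order identity (the i-th text matches the i-th image).

```python
import math
import torch

def evidential_losses(alpha):
    # alpha: K x K Dirichlet distribution parameter matrix (alpha = E + 1)
    k = alpha.size(0)
    eye = torch.eye(k, device=alpha.device)
    s_alpha = alpha.sum(dim=1, keepdim=True)          # row sums S_alpha_i

    # L_ce = sum_i sum_j I_ij * (psi(S_alpha_i) - psi(alpha_ij))
    l_ce = (eye * (torch.digamma(s_alpha) - torch.digamma(alpha))).sum()

    # alpha~_i = I_i + (O - I_i) (Hadamard) alpha_i: remove the target's evidence
    alpha_t = eye + (1.0 - eye) * alpha
    s_t = alpha_t.sum(dim=1)                          # row sums S_alpha~_i

    # L_kl = sum_i KL( Dirichlet(alpha~_i) || Dirichlet(1, ..., 1) )
    l_kl = (torch.lgamma(s_t) - math.lgamma(k)
            - torch.lgamma(alpha_t).sum(dim=1)
            + ((alpha_t - 1.0) * (torch.digamma(alpha_t)
                                  - torch.digamma(s_t.unsqueeze(1)))).sum(dim=1)).sum()
    return l_ce, l_kl
```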
s28, training with the Adam optimizer to minimize the uncertainty loss function L_ce and the consistency loss function L_kl, completing the construction of the image-text retrieval model.
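A hypothetical training step for S28 might then combine the two losses under the Adam optimizer, reusing the similarity_and_evidence and evidential_losses sketches above; the model object wrapping the Faster R-CNN/Bi-GRU/VSE networks, the learning rate, and the weight lambda_kl are illustrative assumptions, not values from the patent.

```python
import torch

# "model" is a hypothetical module returning the K x D feature matrices V and T
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

def train_step(images, texts, lambda_kl=1.0):
    v, t = model(images, texts)            # step S24: VSE features
    _, e = similarity_and_evidence(v, t)   # step S25
    alpha = e + 1.0                        # step S26: alpha = E + L
    l_ce, l_kl = evidential_losses(alpha)  # step S27
    loss = l_ce + lambda_kl * l_kl         # assumed weighting of the two losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```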
In this embodiment, the step S3 includes the following sub-steps:
s31, inputting the number M of matching results to output, together with the query data of the modality to be retrieved, into the robust image-text retrieval model;
s32, taking the input data as the query of its modality, and calculating its similarity to all data in the retrieval library of the other modality.
In this embodiment, the method of step S4 is as follows: sorting by the calculated data similarities, taking the M matching results with the highest similarity, outputting those matching results, and completing the retrieval.
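Steps S3 and S4 then reduce to scoring a query against the gallery of the other modality and keeping the top M results; a minimal sketch, assuming pre-computed VSE features:

```python
import torch
import torch.nn.functional as F

def retrieve(query_feature, gallery_features, m):
    # query_feature: 1 x D vector of the query modality;
    # gallery_features: N x D matrix of the other modality's retrieval library
    q = F.normalize(query_feature, dim=1)
    g = F.normalize(gallery_features, dim=1)
    scores = (q @ g.T).squeeze(0)     # similarity of the query to every gallery item
    top = torch.topk(scores, k=m)     # S4: similarity sorting, keep the top M
    return top.indices.tolist(), top.values.tolist()
```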
In the description of the present invention, it should be understood that the terms "center," "thickness," "upper," "lower," "horizontal," "top," "bottom," "inner," "outer," "radial," and the like indicate or are based on the orientation or positional relationship shown in the drawings, merely to facilitate description of the present invention and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be configured and operated in a particular orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be interpreted as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defined as "first," "second," "third," or the like, may explicitly or implicitly include one or more such feature.
The invention provides a robust image-text retrieval method based on evidence learning, which solves the problem of the poor robustness of existing image-text retrieval methods.

Claims (8)

1. A robust image-text retrieval method based on evidence learning, characterized by comprising the following steps:
s1, processing a training data set comprising images and corresponding text descriptions to obtain a processed training data set;
s2, constructing a robust image-text retrieval model based on evidence learning according to the processed training data set;
s3, inputting the query data of the modality to be retrieved into the robust image-text retrieval model, and calculating data similarities;
s4, sorting by the calculated data similarities, and outputting the image-text retrieval result.
2. The robust image-text retrieval method based on evidence learning according to claim 1, characterized in that said step S1 comprises the following sub-steps:
s11, determining the number K of image-text pairs in a training data set;
s12, converting all image data in the training data set into a three-channel RGB image to obtain a processed image data set;
s13, segmenting each text passage in the training data set into words or phrases, deleting prepositions, conjunctions and auxiliary words, converting each remaining word into a numeric index, recording the total length as Z, and converting each text passage in the training data set into a Z-dimensional vector to obtain a processed text data set;
s14, taking the processed image data set and the processed text data set as the processed training data set.
3. The robust image-text retrieval method based on evidence learning according to claim 2, characterized in that said step S2 comprises the following sub-steps:
s21, constructing an image pre-training network with Faster R-CNN, inputting the processed image data set into the image pre-training network, and flattening each output into a one-dimensional vector to obtain an image matrix V', wherein V' has K rows and P columns and each row is the pre-training vector of one image;
s22, constructing a text pre-training network with a Bi-GRU network, inputting the processed text data set into the text pre-training network, and flattening each output into a one-dimensional vector to obtain a text matrix T', wherein T' has K rows and Q columns and each row is the pre-training vector of one text;
s23, constructing with VSE++ a network VSE that maps data of different modalities into the same space, and setting its output to be a D-dimensional vector;
s24, inputting the image matrix V' and the text matrix T' into the network VSE to obtain the corresponding image feature matrix V and text feature matrix T, wherein V and T each have K rows and D columns;
s25, calculating a similarity matrix S of the image feature matrix V and the text feature matrix T, and calculating an evidence matrix E;
s26, calculating a Dirichlet distribution parameter matrix alpha according to the evidence matrix E;
s27, calculating an uncertainty loss function L_ce and a consistency loss function L_kl according to the Dirichlet distribution parameter matrix α;
s28, training with the Adam optimizer to minimize the uncertainty loss function L_ce and the consistency loss function L_kl, completing the construction of the image-text retrieval model.
4. The robust image-text retrieval method based on evidence learning according to claim 3, characterized in that the similarity matrix S and the evidence matrix E in step S25 are calculated as follows:

$$S = V \cdot T^{\top}$$

$$E = \exp\left(S/\tau\right)$$

wherein S is the similarity matrix, E is the evidence matrix, ⊤ denotes matrix transposition, τ ∈ (0, 1) is a constant parameter, and · denotes the dot product.
5. The robust image-text retrieval method based on evidence learning according to claim 4, characterized in that the Dirichlet distribution parameter matrix α in step S26 is calculated as follows:

$$\alpha = E + L$$

wherein α is the Dirichlet distribution parameter matrix, each row of which is a set of Dirichlet distribution parameters, and L is an all-ones matrix with the same number of rows and columns as the evidence matrix E.
6. The robust image-text retrieval method based on evidence learning according to claim 5, characterized in that the uncertainty loss function L_ce and the consistency loss function L_kl in step S27 are calculated as follows:

$$L_{ce} = \sum_{i=1}^{K} \sum_{j=1}^{K} I_{ij}\left(\psi(S_{\alpha_i}) - \psi(\alpha_{ij})\right)$$

$$S_{\alpha_i} = \sum_{j=1}^{K} \alpha_{ij}$$

$$\tilde{\alpha}_i = I_i + (O - I_i) \odot \alpha_i$$

$$L_{kl} = \sum_{i=1}^{K}\left[-\log\left(B(\tilde{\alpha}_i)\,\Gamma(K)\right) + \sum_{j=1}^{K}\left(\tilde{\alpha}_{ij} - 1\right)\left(\psi(\tilde{\alpha}_{ij}) - \psi(S_{\tilde{\alpha}_i})\right)\right], \qquad B(\tilde{\alpha}_i) = \frac{\prod_{k=1}^{K}\Gamma(\tilde{\alpha}_{ik})}{\Gamma(S_{\tilde{\alpha}_i})}$$

wherein i, j and k are counting indices; α_i is the i-th row of the Dirichlet distribution parameter matrix α and α_ij is its element in row i, column j; ψ(·) is the digamma function and Γ(·) is the gamma function; I_i is the i-th row of the K-order identity matrix I; ⊙ is the Hadamard product; B(·) is the (multivariate) beta function; α̃ is the intermediate matrix whose i-th row is α̃_i; O is a K-dimensional vector with 1 in every dimension; S_{α̃_i} is the sum of the elements of the i-th row of α̃; and α̃_ij and α̃_ik are the j-th and k-th elements of α̃_i.
7. The robust image-text retrieval method based on evidence learning according to claim 6, characterized in that said step S3 comprises the following sub-steps:
s31, inputting the number M of matching results to output, together with the query data of the modality to be retrieved, into the robust image-text retrieval model;
s32, taking the input data as the query of its modality, and calculating its similarity to all data in the retrieval library of the other modality.
8. The robust image-text retrieval method based on evidence learning according to claim 7, characterized in that the method of step S4 is as follows: sorting by the calculated data similarities, taking the M matching results with the highest similarity, outputting those matching results, and completing the retrieval.
CN202310369406.6A 2023-04-07 2023-04-07 Robust image-text retrieval method based on evidence learning Active CN116431849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310369406.6A CN116431849B (en) 2023-04-07 2023-04-07 Robust image-text retrieval method based on evidence learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310369406.6A CN116431849B (en) 2023-04-07 2023-04-07 Robust image-text retrieval method based on evidence learning

Publications (2)

Publication Number Publication Date
CN116431849A true CN116431849A (en) 2023-07-14
CN116431849B CN116431849B (en) 2024-01-02

Family

ID=87092083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310369406.6A Active CN116431849B (en) 2023-04-07 2023-04-07 Robust image-text retrieval method based on evidence learning

Country Status (1)

Country Link
CN (1) CN116431849B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200234086A1 (en) * 2019-01-22 2020-07-23 Honda Motor Co., Ltd. Systems for modeling uncertainty in multi-modal retrieval and methods thereof
US20210103814A1 (en) * 2019-10-06 2021-04-08 Massachusetts Institute Of Technology Information Robust Dirichlet Networks for Predictive Uncertainty Estimation
US20210117760A1 (en) * 2020-06-02 2021-04-22 Intel Corporation Methods and apparatus to obtain well-calibrated uncertainty in deep neural networks
CN112000818A (en) * 2020-07-10 2020-11-27 中国科学院信息工程研究所 Cross-media retrieval method and electronic device for texts and images
WO2022036616A1 (en) * 2020-08-20 2022-02-24 中山大学 Method and apparatus for generating inferential question on basis of low labeled resource
CN114372523A (en) * 2021-12-31 2022-04-19 北京航空航天大学 Binocular matching uncertainty estimation method based on evidence deep learning
CN114817596A (en) * 2022-04-14 2022-07-29 华侨大学 Cross-modal image-text retrieval method integrating semantic similarity embedding and metric learning
CN115033727A (en) * 2022-05-10 2022-09-09 中国科学技术大学 Image text matching method based on cross-modal confidence perception
CN114999006A (en) * 2022-05-20 2022-09-02 南京邮电大学 Multi-modal emotion analysis method, device and equipment based on uncertainty estimation
CN115221947A (en) * 2022-06-22 2022-10-21 北京邮电大学 Robust multi-mode active learning method based on pre-training language model
CN115455171A (en) * 2022-11-08 2022-12-09 苏州浪潮智能科技有限公司 Method, device, equipment and medium for mutual retrieval and model training of text videos

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG QIN: "Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval", 30th ACM International Conference on Multimedia, pages 1 - 4 *

Also Published As

Publication number Publication date
CN116431849B (en) 2024-01-02

Similar Documents

Publication Publication Date Title
US11775838B2 (en) Image captioning with weakly-supervised attention penalty
CN109147767B (en) Method, device, computer equipment and storage medium for recognizing numbers in voice
US10635949B2 (en) Latent embeddings for word images and their semantics
US8908961B2 (en) System and methods for arabic text recognition based on effective arabic text feature extraction
US8160402B2 (en) Document image processing apparatus
JP6003705B2 (en) Information processing apparatus and information processing program
KR20110028034A (en) Method and apparatus for searching a label
CN112434134B (en) Search model training method, device, terminal equipment and storage medium
CN114298035A (en) Text recognition desensitization method and system thereof
CN101493896A (en) Document image processing apparatus and method
Saluja et al. Error detection and corrections in Indic OCR using LSTMs
JP2019153293A (en) Text image processing using stroke-aware max-min pooling for ocr system employing artificial neural network
CN116610803A (en) Industrial chain excellent enterprise information management method and system based on big data
CN114724156B (en) Form identification method and device and electronic equipment
CN113159013A (en) Paragraph identification method and device based on machine learning, computer equipment and medium
CN115658934A (en) Image-text cross-modal retrieval method based on multi-class attention mechanism
CN108628826B (en) Candidate word evaluation method and device, computer equipment and storage medium
CN108694167B (en) Candidate word evaluation method, candidate word ordering method and device
CN116431849B (en) Lu Bangtu text retrieval method based on evidence learning
CN112270189A (en) Question type analysis node generation method, question type analysis node generation system and storage medium
CN116843175A (en) Contract term risk checking method, system, equipment and storage medium
EP4089568A1 (en) Cascade pooling for natural language document processing
CN115410185A (en) Method for extracting specific name and unit name attributes in multi-modal data
JP2019204146A (en) Data conversion apparatus, image processing apparatus and program
Wilkinson et al. Neural word search in historical manuscript collections

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant