CN117541882B - Instance-based multi-view visual fusion transductive zero-shot classification method - Google Patents

Instance-based multi-view visual fusion transductive zero-shot classification method

Info

Publication number
CN117541882B
Authority
CN
China
Prior art keywords
pictures
unseen
view
semantic
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410017127.8A
Other languages
Chinese (zh)
Other versions
CN117541882A (en)
Inventor
汤龙
赵靖涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202410017127.8A
Publication of CN117541882A
Application granted
Publication of CN117541882B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an instance-based multi-view visual fusion transductive zero-shot classification method, which comprises the following steps: extracting the multi-view visual features of the seen-class pictures; feeding the multi-view visual features and the semantic attributes of the seen-class pictures into a multi-view visual-semantic mapping model, and learning the transformation matrices at the different views with the alternating direction method of multipliers (ADMM); predicting the semantic projections of the unseen-class pictures with the learned transformation matrices; and further extracting the final semantic representations of the unseen-class pictures from the semantic projections and identifying the unseen-class pictures on that basis. The invention realizes the interactive sharing of visual information across different views through a single linear constraint, which simplifies the traditional multi-view information fusion model. Moreover, to further mine the visual-semantic associations hidden in the unseen classes, a self-supervised learning strategy is proposed that uses the consistency among the views to calibrate the semantics of the unseen-class pictures, which can greatly improve zero-shot classification performance.

Description

Instance-based multi-view visual fusion transductive zero-shot classification method
Technical Field
The invention relates to the technical field of image recognition, in particular to an instance-based multi-view visual fusion transductive zero-shot classification method.
Background
In recent years, zero-shot learning (ZSL) has received increasing attention. Unlike conventional pattern recognition, ZSL can recognize samples whose labels never appear during training: by exploiting the inherent associations of semantic attributes among categories, it builds a mapping between visual features and semantic attributes and uses it to classify samples of unseen categories. Most current ZSL methods use only a single visual feature representation; in many practical scenarios, however, visual feature representations from multiple views are available through different channels. For high-resolution images, different feature extractors (SIFT, SURF, PHOG, pre-trained deep networks, etc.) may be used to acquire features. Owing to the variability between views, instance-based multi-view visual data can provide a more comprehensive description than single-view data and, if utilized properly, is expected to greatly improve ZSL performance.
Disclosure of Invention
The invention aims to: provide an instance-based multi-view visual fusion transductive zero-shot classification method that improves the generalization performance of the zero-shot classifier, so as to recognize unseen-class pictures more accurately.
The technical scheme is as follows: the instance-based multi-view visual fusion transductive zero-shot classification method of the invention comprises the following steps:
(1) Extracting the multi-view visual features of the seen-class pictures and the unseen-class pictures;
(2) Feeding the multi-view visual features of the seen-class pictures and the corresponding class semantic attributes into a multi-view visual-semantic mapping model, and learning the transformation matrices at the different views with the alternating direction method of multipliers (ADMM);
(3) Predicting the semantic projections of the unseen-class pictures with the learned transformation matrices;
(4) Further extracting the final semantic representations of the unseen-class pictures from the semantic projections obtained in step (3) and identifying the unseen-class pictures.
Further, the step (1) is specifically as follows: visual features are extracted with ResNet and GoogLeNet pre-trained on the ImageNet database, yielding view A and view B respectively.
Further, the multi-view visual-semantic mapping model in the step (2) is expressed as an optimization problem over the transformation matrices W^(v), v = 1, ..., V, and auxiliary variable matrices, subject to the model's constraint conditions, where: X^(v) ∈ R^(d_v×n) denotes the feature matrix of the seen-class pictures at the v-th view, each column corresponding to one seen-class picture; S ∈ R^(m×n) denotes the class semantic attribute matrix of the seen-class pictures, each column corresponding to one seen-class picture; S̄ denotes the mean matrix of the seen-class semantic attributes, each column of which is the mean vector of all the seen-class semantic attributes; d_v is the dimension of the view feature at the v-th view; m is the dimension of the class semantic attributes; n is the number of seen-class pictures; λ1, ..., λ5 are hyper-parameters; and V is the number of views.
Further, the alternating direction method of multipliers in the step (2) is specifically as follows:
Initialization: initialize the optimization variable matrices and the Lagrange multipliers; set the iteration counter k = 0; choose the convergence thresholds ε1 and ε2 and the penalty parameters ρ1, ρ2 and ρ3.
At each iteration: obtain the first variable block by solving a linear matrix equation, Θ denoting an internal parameter of the ADMM; obtain the second variable block by solving its optimization subproblem; obtain the third variable block by solving its equation; update the fourth and fifth variable blocks by their closed-form formulas; and update the four Lagrange multipliers.
If the primal and dual residuals fall below the convergence thresholds ε1 and ε2, the iteration has converged; otherwise set k = k + 1 and continue the updates. The transformation matrices W^(v), v = 1, ..., V, obtained at convergence are the final result.
Further, the semantic projection of the unseen-class pictures at a single view obtained in the step (3) is S̃^(v) = W^(v) X_u^(v), where X_u^(v) ∈ R^(d_v×n_u) denotes the feature matrix of the unseen-class pictures at the v-th view, each column corresponding to one unseen-class picture, and n_u is the number of unseen-class pictures.
Further, the final semantic representations of the unseen-class pictures extracted in the step (4) are the solution of an optimization problem whose optimization variables are the final semantic representations at each view; the problem involves a diagonal matrix and a hyper-parameter, and the diagonal matrix is calculated from a block matrix assembled from the single-view semantic projections.
Further, the identification of the unseen-class pictures in the step (4) comprises: averaging the final semantic representations of the unseen-class pictures over the views, S̄_u = (1/V) Σ_{v=1}^{V} S_u^(v); and obtaining the class labels of the unseen-class pictures as l = argmax(A_u^T S̄_u), where argmax(·) returns the vector formed by the row index of the largest element of each column of the input matrix, A_u ∈ R^(m×c_u) is the unseen-class semantic attribute matrix, c_u is the number of unseen classes, and l contains the class labels of the identified unseen-class pictures.
The instance-based multi-view visual fusion transductive zero-shot recognition system of the invention comprises:
a data acquisition module for extracting the multi-view visual features of the seen-class pictures and the unseen-class pictures;
a model learning module for feeding the multi-view visual features of the seen-class pictures and the corresponding class semantic attributes into the multi-view visual-semantic mapping model, learning the transformation matrices at the different views with the alternating direction method of multipliers, predicting the semantic projections of the unseen-class pictures with the learned transformation matrices, and further extracting the final semantic representations of the unseen-class pictures from the semantic projections;
and a picture recognition module for classifying the extracted final semantic representations of the unseen-class pictures.
The apparatus of the invention comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when loaded into the processor, the computer program implements any of the instance-based multi-view visual fusion transductive zero-shot classification methods described above.
The beneficial effects are that: compared with the prior art, the invention has the following notable advantages. The multi-view visual features carry richer, fuller and more comprehensive information about the training samples, which effectively improves the generalization performance of the zero-shot classifier and enables more accurate recognition of unseen-class pictures. Compared with existing zero-shot learning methods, the classification accuracy on unseen-class pictures is improved to a large extent; the method is simple and efficient, and has good application prospects in related fields such as pattern recognition, data mining and computer vision.
Drawings
FIG. 1 is a flow chart of the present invention.
Description of the embodiments
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides an instance-based multi-view visual fusion transductive zero-shot classification method, which comprises the following steps:
(1) Extracting the multi-view visual features of the seen-class pictures and the unseen-class pictures; specifically: visual features are extracted with ResNet (fc9 layer, 2048 dimensions) and GoogLeNet (fc17 layer, 1024 dimensions) pre-trained on the ImageNet database, yielding view A and view B respectively.
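For concreteness, the sketch below shows one way to realize this two-view extraction with torchvision's ImageNet-pretrained models; ResNet-101 is an assumption (the filing does not name the ResNet variant), and the pooled penultimate features stand in for the fc9/fc17 taps.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Standard ImageNet preprocessing, shared by both views.
preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# View A: ResNet (2048-d features); view B: GoogLeNet (1024-d features).
resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()     # drop the classifier, keep pooled features
googlenet = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
googlenet.fc = torch.nn.Identity()
resnet.eval()
googlenet.eval()

@torch.no_grad()
def two_view_features(image_paths):
    batch = torch.stack([preprocess(Image.open(p).convert("RGB"))
                         for p in image_paths])
    x_a = resnet(batch).t()         # view A: (2048, n), one column per picture
    x_b = googlenet(batch).t()      # view B: (1024, n)
    return x_a, x_b
```

The transposes put the features in the column-per-picture layout X^(v) ∈ R^(d_v×n) that the mapping model consumes.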
(2) The multi-view visual features of the seen-class pictures and the corresponding class semantic attributes are fed into the multi-view visual-semantic mapping model, and the transformation matrices at the different views are learned with the alternating direction method of multipliers (ADMM). The multi-view visual-semantic mapping model is expressed as the optimization problem P1: minimize, over the transformation matrices W^(v), v = 1, ..., V, and auxiliary variable matrices, the sum of a loss term and a consistency term, subject to constraints (1.1)-(1.5). Here X^(v) ∈ R^(d_v×n) denotes the feature matrix of the seen-class pictures at the v-th view, each column corresponding to one seen-class picture; S ∈ R^(m×n) denotes the class semantic attribute matrix of the seen-class pictures, each column corresponding to one seen-class picture; S̄ denotes the mean matrix of the seen-class semantic attributes, each column of which is the mean vector of all the seen-class semantic attributes; d_v is the dimension of the view feature at the v-th view; m is the dimension of the class semantic attributes; n is the number of seen-class pictures; λ1, ..., λ5 are hyper-parameters; and V is the number of views. The consistency term keeps the prediction results of all views consistent on the seen-class samples; constraint (1.1) is the single linear constraint that realizes the interactive sharing of visual information across the different views; constraints (1.2)-(1.4) are used to construct a reconstructable subspace in the mapping; and constraint (1.5) is a non-negativity constraint. The inputs of problem P1 are X^(v), S and S̄; the solution variables are the transformation matrices and the auxiliary variable matrices.
For the optimization problem P1, the alternating direction method of multipliers is adopted for the solution, specifically as follows:
Input the training set data X^(v), v = 1, ..., V, S and S̄, and the hyper-parameters λ1, ..., λ5.
Initialization: initialize the optimization variable matrices and the Lagrange multipliers; set the iteration counter k = 0; choose the convergence thresholds ε1 and ε2 and the penalty parameters ρ1, ρ2 and ρ3.
At each iteration: obtain the first variable block by solving a linear matrix equation, Θ denoting an internal parameter of the ADMM; obtain the second variable block by solving its optimization subproblem; obtain the third variable block by solving its equation; update the fourth and fifth variable blocks by their closed-form formulas; and update the four Lagrange multipliers.
If the primal and dual residuals fall below the convergence thresholds ε1 and ε2, the iteration has converged; otherwise set k = k + 1 and continue the updates. The transformation matrices W^(v), v = 1, ..., V, obtained at convergence are the final result.
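The closed-form updates above are specific to problem P1 and appear as formula images in the original filing; the sketch below therefore illustrates the same solver pattern on a simplified stand-in problem, multi-view ridge regression with the cross-view consistency constraint W^(v)X^(v) = Z on the seen-class samples, solved by consensus ADMM. All names (admm_fit, Z, U, rho, lam) are illustrative, not the patent's notation.

```python
import numpy as np

def admm_fit(X_list, S, lam=1.0, rho=1.0, eps=1e-5, max_iter=500):
    """Consensus ADMM for: min_W sum_v ||W_v X_v - S||^2 + lam ||W_v||^2
    subject to W_v X_v = Z for all views v (a simplified stand-in for P1)."""
    m, n = S.shape
    V = len(X_list)
    W = [np.zeros((m, X.shape[0])) for X in X_list]
    Z = np.zeros((m, n))                       # consensus variable
    U = [np.zeros((m, n)) for _ in range(V)]   # scaled Lagrange multipliers
    for _ in range(max_iter):
        # W-step: closed form from the first-order optimality condition.
        for v, X in enumerate(X_list):
            rhs = (2.0 * S + rho * (Z - U[v])) @ X.T
            lhs = (2.0 + rho) * (X @ X.T) + 2.0 * lam * np.eye(X.shape[0])
            W[v] = rhs @ np.linalg.inv(lhs)
        # Z-step: average of the per-view projections plus multipliers.
        Z_old = Z
        Z = sum(W[v] @ X_list[v] + U[v] for v in range(V)) / V
        # Multiplier step and residual-based stopping test.
        r = [W[v] @ X_list[v] - Z for v in range(V)]
        for v in range(V):
            U[v] += r[v]
        primal = np.sqrt(sum(np.linalg.norm(rv) ** 2 for rv in r))
        dual = rho * np.sqrt(V) * np.linalg.norm(Z - Z_old)
        if primal < eps and dual < eps:
            break
    return W
```

Each W-step is the closed-form solution of a regularized least-squares subproblem; the consensus variable Z plays a role analogous to the shared prediction that the single linear constraint (1.1) enforces across views.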
(3) The semantic projections of the unseen-class pictures are predicted with the learned transformation matrices; the semantic projection at a single view is obtained as S̃^(v) = W^(v) X_u^(v), where X_u^(v) ∈ R^(d_v×n_u) denotes the feature matrix of the unseen-class pictures at the v-th view, each column corresponding to one unseen-class picture, and n_u is the number of unseen-class pictures.
(4) The final semantic representations of the unseen-class pictures are further extracted from the semantic projections obtained in step (3), and the unseen-class pictures are identified. The final semantic representations of the unseen-class pictures at each view are the optimization variables of an extraction problem that involves a diagonal matrix and a hyper-parameter; the diagonal matrix is in turn calculated from a block matrix assembled from the single-view semantic projections.
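The extraction problem itself is likewise given as formula images; as a hedged illustration of consistency-based calibration, the sketch below smooths the per-view projections over a neighborhood graph built from their cross-view average. This is a graph-regularized stand-in, not the patent's exact formulation; refine_semantics, gamma and k are illustrative names and hyper-parameters, though the diagonal degree matrix computed from an affinity matrix parallels the diagonal matrix computed from a block matrix described above.

```python
import numpy as np

def refine_semantics(proj_list, k=10, gamma=0.5):
    """Smooth per-view semantic projections S~^(v) of shape (m, n_u) over a
    kNN graph built on their cross-view average: a simplified calibration."""
    S_avg = sum(proj_list) / len(proj_list)
    n_u = S_avg.shape[1]
    # Cosine affinities between unseen pictures, kept only for k neighbors.
    Sn = S_avg / (np.linalg.norm(S_avg, axis=0, keepdims=True) + 1e-12)
    A = Sn.T @ Sn
    np.fill_diagonal(A, 0.0)
    non_neighbors = np.argsort(A, axis=1)[:, :-k]   # all but the k largest
    for i in range(n_u):
        A[i, non_neighbors[i]] = 0.0
    A = np.maximum(A, A.T)                          # symmetrize
    D = np.diag(A.sum(axis=1))                      # diagonal degree matrix
    L = D - A                                       # graph Laplacian
    # Closed form of min_F ||F - S~||^2 + gamma * tr(F L F^T) per view.
    Minv = np.linalg.inv(np.eye(n_u) + gamma * L)
    return [P @ Minv for P in proj_list]
```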
The identification of the unseen-class pictures comprises: averaging the final semantic representations of the unseen-class pictures over the views, S̄_u = (1/V) Σ_{v=1}^{V} S_u^(v); and obtaining the class labels of the unseen-class pictures as l = argmax(A_u^T S̄_u), where argmax(·) returns the vector formed by the row index of the largest element of each column of the input matrix, A_u ∈ R^(m×c_u) is the unseen-class semantic attribute matrix, c_u is the number of unseen classes, and l contains the class labels of the identified unseen-class pictures.
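Steps (3) and (4) can then be sketched end to end as below: project the unseen-class features through the learned W^(v), average across views, and label each picture by the most compatible unseen-class attribute column. The cosine normalization is an assumption here, since the patent's exact scoring formula is given as a formula image.

```python
import numpy as np

def classify_unseen(W_list, Xu_list, A_u):
    """W_list[v]: (m, d_v); Xu_list[v]: (d_v, n_u); A_u: (m, c_u).
    Returns one unseen-class index per picture."""
    proj = [W @ X for W, X in zip(W_list, Xu_list)]   # step (3): S~^(v)
    S_bar = sum(proj) / len(proj)                     # cross-view average
    # Cosine compatibility between class attributes and picture semantics.
    A = A_u / (np.linalg.norm(A_u, axis=0, keepdims=True) + 1e-12)
    S = S_bar / (np.linalg.norm(S_bar, axis=0, keepdims=True) + 1e-12)
    return np.argmax(A.T @ S, axis=0)                 # label per column
```

In a full pipeline, the refinement sketch above would be applied to proj before the averaging.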
To verify the effect and performance of the proposed method, comparison experiments were conducted on three classical zero-shot classification datasets: AwA, CUB and SUN. Table 1 lists the unseen-class recognition accuracies of several existing ZSL methods.
Table 1 Comparison of the recognition results of several methods
Compared with other methods, the instance-based multi-view visual fusion transductive zero-shot classification method of the invention makes full use of the feature information of the different views, has obvious advantages in generalization performance, and achieves a higher level of accuracy in recognizing unseen-class pictures.
The embodiment of the invention also provides an instance-based multi-view visual fusion transductive zero-shot recognition system, comprising:
a data acquisition module for extracting the multi-view visual features of the seen-class pictures and the unseen-class pictures;
a model learning module for feeding the multi-view visual features of the seen-class pictures and the corresponding class semantic attributes into the multi-view visual-semantic mapping model, learning the transformation matrices at the different views with the alternating direction method of multipliers, predicting the semantic projections of the unseen-class pictures with the learned transformation matrices, and further extracting the final semantic representations of the unseen-class pictures from the semantic projections;
and a picture recognition module for classifying the extracted final semantic representations of the unseen-class pictures.
The embodiment of the invention also provides an apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when loaded into the processor, the computer program implements any of the instance-based multi-view visual fusion transductive zero-shot classification methods described above.

Claims (7)

1. An instance-based multi-view visual fusion transductive zero-shot classification method, characterized by comprising the following steps:
(1) extracting the multi-view visual features of the seen-class pictures and the unseen-class pictures;
(2) feeding the multi-view visual features of the seen-class pictures and the corresponding class semantic attributes into a multi-view visual-semantic mapping model, and learning the transformation matrices at the different views with the alternating direction method of multipliers (ADMM); the ADMM is specifically as follows:
Initialization: initialize the optimization variable matrices and the Lagrange multipliers; set the iteration counter k = 0; choose the convergence thresholds ε1 and ε2 and the parameters ρ1, ρ2 and ρ3.
At each iteration: obtain the first variable block by solving a linear matrix equation, Θ denoting an internal parameter of the ADMM; obtain the second variable block by solving its optimization subproblem; obtain the third variable block by solving its equation; update the fourth and fifth variable blocks by their closed-form formulas; and update the four Lagrange multipliers.
If the primal and dual residuals fall below the convergence thresholds ε1 and ε2, the iteration has converged; otherwise set k = k + 1 and continue the updates; the transformation matrices W^(v), v = 1, ..., V, obtained at convergence are the final result;
(3) predicting the semantic projections of the unseen-class pictures with the learned transformation matrices;
(4) further extracting the final semantic representations of the unseen-class pictures from the semantic projections obtained in step (3) and identifying the unseen-class pictures.
2. The instance-based multi-view visual fusion transductive zero-shot classification method according to claim 1, characterized in that the step (1) is specifically as follows: visual features are extracted with ResNet and GoogLeNet pre-trained on the ImageNet database, yielding view A and view B respectively.
3. The instance-based multi-view visual fusion transductive zero-shot classification method according to claim 1, characterized in that the multi-view visual-semantic mapping model in the step (2) is expressed as an optimization problem over the transformation matrices W^(v), v = 1, ..., V, and auxiliary variable matrices, subject to the model's constraint conditions, where: X^(v) ∈ R^(d_v×n) denotes the feature matrix of the seen-class pictures at the v-th view, each column corresponding to one seen-class picture; S ∈ R^(m×n) denotes the class semantic attribute matrix of the seen-class pictures, each column corresponding to one seen-class picture; S̄ denotes the mean matrix of the seen-class semantic attributes, each column of which is the mean vector of all the seen-class semantic attributes; d_v is the dimension of the view feature at the v-th view; m is the dimension of the class semantic attributes; n is the number of seen-class pictures; λ1, ..., λ5 are hyper-parameters; and V is the number of views.
4. The instance-based multi-view visual fusion transductive zero-shot classification method according to claim 1, characterized in that the semantic projection of the unseen-class pictures at a single view obtained in the step (3) is S̃^(v) = W^(v) X_u^(v), where X_u^(v) ∈ R^(d_v×n_u) denotes the feature matrix of the unseen-class pictures at the v-th view, each column corresponding to one unseen-class picture, and n_u is the number of unseen-class pictures.
5. The instance-based multi-view visual fusion transductive zero-shot classification method according to claim 1, characterized in that the final semantic representations of the unseen-class pictures extracted in the step (4) are the solution of an optimization problem whose optimization variables are the final semantic representations at each view, the problem involving a diagonal matrix and a hyper-parameter.
6. The instance-based multi-view visual fusion transductive zero-shot classification method according to claim 1, characterized in that the diagonal matrix is calculated from a block matrix assembled from the single-view semantic projections.
7. The instance-based multi-view visual fusion transductive zero-shot classification method according to claim 1, characterized in that the identification of the unseen-class pictures in the step (4) comprises: averaging the final semantic representations of the unseen-class pictures over the views, S̄_u = (1/V) Σ_{v=1}^{V} S_u^(v); and obtaining the class labels of the unseen-class pictures as l = argmax(A_u^T S̄_u), wherein argmax(·) returns the vector formed by the row index of the largest element of each column of the input matrix, A_u ∈ R^(m×c_u) is the unseen-class semantic attribute matrix, c_u is the number of unseen classes, and l contains the class labels of the identified unseen-class pictures.
CN202410017127.8A 2024-01-05 2024-01-05 Instance-based multi-view visual fusion transductive zero-shot classification method Active CN117541882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410017127.8A CN117541882B (en) 2024-01-05 2024-01-05 Instance-based multi-view visual fusion transductive zero-shot classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410017127.8A CN117541882B (en) 2024-01-05 2024-01-05 Instance-based multi-view visual fusion transductive zero-shot classification method

Publications (2)

Publication Number Publication Date
CN117541882A (en) 2024-02-09
CN117541882B (en) 2024-04-19

Family

ID=89796173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410017127.8A Active CN117541882B (en) 2024-01-05 2024-01-05 Instance-based multi-view visual fusion transductive zero-shot classification method

Country Status (1)

Country Link
CN (1) CN117541882B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11694042B2 (en) * 2020-06-16 2023-07-04 Baidu Usa Llc Cross-lingual unsupervised classification with multi-view transfer learning
CN114037879A * 2021-10-22 2022-02-11 Beijing University of Technology Dictionary learning method and device for zero-shot recognition

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109643384A * 2016-08-16 2019-04-16 Nokia Technologies Ltd. Method and apparatus for zero-shot learning
CN110431565A * 2017-03-06 2019-11-08 Nokia Technologies Ltd. Transductive and/or adaptive maximum-margin zero-shot learning method and system
KR20200130759A * 2019-04-25 2020-11-20 Industry-Academic Cooperation Foundation, Yonsei University Zero-shot recognition apparatus for automatically generating undefined attribute information in a data set, and method thereof
CN111222471A * 2020-01-09 2020-06-02 University of Science and Technology of China Zero-shot training and related classification method based on a self-supervised domain-aware network
CN112801105A * 2021-01-22 2021-05-14 Zhejiang Lab Two-stage zero-shot image semantic segmentation method
CN113361646A * 2021-07-01 2021-09-07 University of Science and Technology of China Generalized zero-shot image recognition method and model based on semantic information retention
CN113902969A * 2021-10-12 2022-01-07 Xidian University Zero-shot SAR target recognition method fusing CNN and image similarity
KR20230078134A * 2021-11-26 2023-06-02 Industry-Academic Cooperation Foundation, Yonsei University Device and method for zero-shot semantic segmentation
CN115424096A * 2022-11-08 2022-12-02 Nanjing University of Information Science and Technology Multi-view zero-shot image recognition method
CN116433977A * 2023-04-18 2023-07-14 State Grid Smart Grid Research Institute Co., Ltd. Unknown-class image classification method and device, computer equipment and storage medium
CN117274726A * 2023-11-23 2023-12-22 Nanjing University of Information Science and Technology Picture classification method and system based on multi-view complementary labels

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A sharing multi-view feature selection method via Alternating Direction Method of Multipliers; Qiang Lin et al.; Neurocomputing, vol. 333, pp. 124-134; 2019-03-14 *
Zero-Shot Learning via Robust Latent Representation and Manifold Regularization; Min Meng et al.; IEEE Transactions on Image Processing, vol. 28, no. 4, pp. 1824-1836; 2019-04 *
Research on image classification based on zero-shot learning; Wang Xinjie; China Masters' Theses Full-text Database, Information Science and Technology; 2020-01-15; I138-2247 *

Also Published As

Publication number Publication date
CN117541882A (en) 2024-02-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant