CN117611924A - Plant leaf phenotype disease classification method based on image-text subspace joint learning - Google Patents

Plant leaf phenotype disease classification method based on image-text subspace joint learning

Info

Publication number
CN117611924A
Authority
CN
China
Prior art keywords
text
subspace
learning
mode
plant leaf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410067314.7A
Other languages
Chinese (zh)
Other versions
CN117611924B (en)
Inventor
王崎
张家伟
吴雪
王亚洲
高珍冉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou University
Original Assignee
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou University filed Critical Guizhou University
Priority to CN202410067314.7A
Publication of CN117611924A
Application granted
Publication of CN117611924B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/0455: Auto-encoder networks; Encoder-decoder networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/098: Distributed learning, e.g. federated learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a plant leaf phenotype disease classification method based on image-text subspace joint learning, which comprises the following steps: first, the image and text modality data are projected into a subspace shared across the modalities to learn the commonality of the two modalities; each modality is then projected into its own modality-specific subspace, and the corresponding feature representations are obtained. These feature representations provide an overall view of the multimodal data for feature fusion, enhancing the final classification. With the invention, which disease category of which plant a sample belongs to can be predicted more accurately from the given text and image; the plant leaf disease classification performance is good and a higher accuracy is obtained.

Description

Plant leaf phenotype disease classification method based on image-text subspace joint learning
Technical Field
The invention belongs to the technical field of intelligent processing of digital images, and particularly relates to a plant leaf phenotype disease classification method based on image-text subspace joint learning.
Background
In the traditional plant leaf phenotype disease classification task, diseases on leaves are mainly identified manually. This approach has serious limitations: the manifestations of different diseases on leaves can differ greatly, so accurate classification requires a domain expert, and the cost in time and effort is too high for the method to be applied successfully to large-scale classification tasks. In recent years, with the rapid development of natural language processing and computer vision, more and more researchers have begun to explore plant leaf phenotype disease classification methods that use text information to assist images. Disease-related keywords extracted from the text assist the classification of the images, improving classification accuracy and stability; the approach performs well even on tasks with demanding fine-grained requirements. In the plant leaf disease classification task, text-guided image classification can help the classifier identify multiple plants and multiple disease types more accurately. For example, if a leaf carries yellow-brown spots that are not clearly visible in the image, text guidance makes them easier for the classifier to identify. Moreover, during disease identification, text guidance helps the classifier understand the characteristics and manifestations of each disease more comprehensively, improving classification accuracy and robustness. For example, Chinese patent publication No. CN 115050014A, published on 13 September 2022, discloses a small-sample tomato disease recognition system and method based on image-text learning, comprising an image classification module, a text classification module and a joint classification module: the image classification module obtains a first prediction probability of the tomato disease type from tomato images; the text classification module obtains a second prediction probability of the tomato disease type from tomato text information; and the joint classification module combines the two prediction probabilities to output the disease category. Although a text modality assists image classification there, that recognition method targets disease classification of a single plant only, covers few classes, lacks generality, and ignores the correlations and distinctions between different modalities, so detailed features in some complex scenes are not used effectively.
In summary, for current multi-plant, multi-disease classification, neither the manual method nor single-modality image classification achieves high accuracy. The technique of using text to guide image classification therefore has broad application prospects in plant leaf phenotype disease classification and provides a more effective technical means for future health management in agriculture.
Disclosure of Invention
The invention aims to overcome the above defects and to provide a plant leaf phenotype disease classification method based on image-text subspace joint learning that has good plant leaf disease classification performance and higher accuracy.
The invention discloses a plant leaf phenotype disease classification method based on image-text subspace joint learning, which comprises the following steps:
Step 1, data preprocessing:
traversing the catalog of the data set, arranging the relative path and label information of each group of samples into one row and writing the rows into a csv file; reading back the csv file with the written information, sampling it, and dividing it into a test set and a training set at a ratio of 2:8, thereby obtaining the test.csv and train.csv files;
Step 2, constructing a network model:
in the feature extraction stage, for data of the text modality, performing self-attention encoding on the input with a BERT model from deep learning to learn contextual relations; after obtaining the BERT output, using the mask in the input to average the output along dimension 1, obtaining the utterance (corpus-level) representation u_t of the text input modality; for the visual modality information in the model, adopting a ViT model from deep learning to obtain the utterance-level representation u_v of the visual modality; the image size is set to 128, the size of each patch to 16, the number of categories to classify to 200, the dimension of each representation to 1024, the model depth to 6, the number of attention heads to 16, and the multi-layer perceptron dimension to 2048;
in the modality representation learning stage, using a framework to project the text and image modalities into two different subspaces, one of which is modality-invariant and the other modality-specific, by means of three self-encoders E_t, E_v and E_s: passing the text and visual representations u_t and u_v obtained in the feature extraction stage through the self-encoders E_t and E_v to obtain the text modality-specific representation h_t^p and the visual modality-specific representation h_v^p, and inputting the text information u_t and the visual information u_v into the shared self-encoder E_s to obtain the text modality-invariant representation h_t^c and the visual modality-invariant representation h_v^c; u_t and u_v are thereby projected into the two different subspaces to obtain modality-specific and modality-shared information, giving four vectors h_t^p, h_v^p, h_t^c and h_v^c that represent the specific and invariant representations of the text and visual modalities respectively; performing a series (stacking) operation on them along dimension dim=0 to obtain a new matrix M, and transforming this matrix M with a multi-head attention mechanism to obtain a new matrix h, the parameters of the multi-head attention mechanism being: input dimension 128, heads=2; finally, applying a fusion operation to the output h, the fusion network comprising the following layers: first, a fully connected layer with input dimension 512 and output dimension 384; then a dropout layer with drop_rate 0.5; then a ReLU activation layer; and finally a fully connected layer with input dimension 384 and output dimension 200, thereby obtaining the final classification result;
Step 3, model training and loss calculation:
inputting the train.csv file of the plant leaf disease data set obtained in step 1 into the network model constructed in step 2 and training the network model, wherein the optimizer is Adam and the loss function comprises: a similarity loss L_sim, a difference loss L_diff, and a cross-entropy loss L_task used for classification; the similarity loss L_sim is computed with the Central Moment Discrepancy (CMD): CMD measures the difference between two distributions by matching their order-wise moment differences, and its value becomes smaller as the similarity of the two distributions increases; let X and Y be bounded random samples drawn with respective probability distributions p and q on the interval [a, b]; then

CMD_K(X, Y) = (1/|b - a|) ||E(X) - E(Y)||_2 + Σ_{k=2}^{K} (1/|b - a|^k) ||C_k(X) - C_k(Y)||_2,

where E(X) = (1/|X|) Σ_{x∈X} x is the empirical expectation of the sample X and C_k(X) = E((x - E(X))^k) is the vector of its k-th order central moments, and the similarity loss is taken as L_sim = CMD_K(h_t^c, h_v^c); for the difference loss, during training, letting H_c^m and H_p^m denote the matrices whose rows are the hidden invariant and specific vectors of modality m ∈ {t, v}, the non-correlation constraint on the modality vectors is ||(H_c^m)^T H_p^m||_F^2, where ||·||_F denotes the Frobenius norm, and the loss is defined as

L_diff = Σ_{m ∈ {t,v}} ||(H_c^m)^T H_p^m||_F^2;

the cross-entropy loss L_task is used as the prediction loss function for the downstream task, with the formula:

L_task = -(1/N) Σ_{i=1}^{N} y_i · log(ŷ_i),

where N is the number of samples, y_i is the ground-truth label of sample i and ŷ_i is its predicted class distribution;
the total loss function is:

L = L_task + α·L_sim + β·L_diff,

where α and β are the weights of the similarity and difference regularization terms;
and step 4, training the network to obtain the best accuracy of 0.999257.
In the above plant leaf phenotype disease classification method based on image-text subspace joint learning, the feature extraction stage described in step 2 further applies dropout and emb_dropout of 0.1.
In the above plant leaf phenotype disease classification method based on image-text subspace joint learning, the three self-encoders E_t, E_v and E_s in step 2 are structurally identical, each comprising a fully connected layer with 128 inputs and outputs and a sigmoid activation layer; E_t processes the text information and E_v processes the visual information.
In the above plant leaf phenotype disease classification method based on image-text subspace joint learning, the method of projecting into the modality-invariant subspace in step 2 is: the corpus vector is processed through a fully connected layer with 128 inputs and outputs and a sigmoid activation layer.
In the above plant leaf phenotype disease classification method based on image-text subspace joint learning, the method of projecting into the modality-specific subspace in step 2 is: each corpus vector is first encoded by a fully connected layer with 128 output neurons, and the encoded result is then mapped to the range (0, 1) by a sigmoid activation function.
Compared with the prior art, the invention has clear beneficial effects, as can be seen from the technical solution: the invention uses a plant leaf disease data set photographed in real field conditions, with rich plant types and comprehensive disease types. The adopted framework projects each modality into two different subspaces. One subspace is modality-invariant; the cross-modal representations in it learn the commonality of the modalities and reduce the modality gap. The other subspace is modality-specific; it is private to each modality and captures its features. By training a network model with this framework, a model with good plant leaf disease classification performance is obtained, so that which disease category of which plant a sample belongs to can be predicted more accurately from the given text and image, with higher accuracy. Training the model on an image-text data set of real plant leaf diseases makes the invention more practical for agricultural applications and also supports research on agricultural diseases.
Drawings
FIG. 1 is a schematic diagram of a plant leaf phenotype disease classification network according to the present invention;
FIG. 2 is a graph showing the results of classifying plant leaf phenotype diseases according to the present invention.
Detailed Description
The following is a detailed description of the specific embodiments, structures, features and effects of the plant leaf phenotype disease classification method based on image-text subspace joint learning according to the present invention, with reference to FIG. 1 and FIG. 2.
The invention discloses a plant leaf phenotype disease classification method based on image-text subspace joint learning, which comprises the following steps:
Step 1, data preprocessing:
traversing the catalog of the data set, arranging the relative path and label information of each group of samples into one row and writing the rows into a csv file; reading back the csv file with the written information, sampling it, and dividing it at a ratio of 2:8 into a test set (used to evaluate model performance) and a training set (used to optimize the model parameters), thereby obtaining the test.csv and train.csv files.
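The following is a minimal preprocessing sketch, assuming a dataset/<label>/<sample> directory layout and the pandas and scikit-learn libraries; the layout, column names and random seed are illustrative assumptions rather than details fixed by the disclosure.

```python
import os

import pandas as pd
from sklearn.model_selection import train_test_split

# Walk the dataset directory and arrange each sample's relative path and
# label into one row, as described in step 1.
rows = []
for label in sorted(os.listdir("dataset")):
    class_dir = os.path.join("dataset", label)
    if not os.path.isdir(class_dir):
        continue
    for name in sorted(os.listdir(class_dir)):
        rows.append({"path": os.path.join(label, name), "label": label})

df = pd.DataFrame(rows)
df.to_csv("all.csv", index=False)

# Read the written file back, sample it, and split test:train at 2:8.
df = pd.read_csv("all.csv")
train_df, test_df = train_test_split(df, test_size=0.2, shuffle=True, random_state=42)
train_df.to_csv("train.csv", index=False)
test_df.to_csv("test.csv", index=False)
```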
Step 2, constructing a network model:
in the feature extraction stage, for text-modality data, a BERT model from deep learning performs self-attention encoding on the input and, by learning contextual relations, obtains a rich semantic representation. After the BERT output is obtained, the mask in the input is used to average the output along dimension 1, giving the utterance (corpus-level) representation u_t of the text input modality, which captures the characteristics and semantic information of the text data well. For the visual-modality information in the model, a ViT model from deep learning is adopted; using the ViT model, a representation of the image can be generated, giving the utterance-level representation u_v of the visual modality. To control the capacity of the ViT model, the relevant parameters are set as follows: the image size is 128, the size of each patch is 16, the number of categories to classify is 200, the dimension of each representation is 1024, the depth of the model is 6, the number of attention heads is 16, and the dimension of the multi-layer perceptron is 2048; dropout and emb_dropout of 0.1 are also applied to improve the generalization ability of the model. A minimal sketch of this stage is given below.
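This sketch assumes the HuggingFace transformers package for BERT and the vit-pytorch package for ViT; the bert-base-uncased checkpoint and the helper name text_utterance are assumptions, while the ViT parameters are those stated above.

```python
from transformers import BertModel, BertTokenizer
from vit_pytorch import ViT

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-uncased")

def text_utterance(texts):
    """Masked mean over the token dimension (dim=1) yields the
    utterance-level text representation u_t."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = bert(**enc).last_hidden_state                  # (B, L, 768)
    mask = enc["attention_mask"].unsqueeze(-1).float()   # (B, L, 1)
    return (out * mask).sum(dim=1) / mask.sum(dim=1)     # (B, 768)

# ViT configured with the parameters stated above; the utterance-level image
# representation u_v would be taken from the embedding preceding the
# classification head rather than from the 200-way logits.
vit = ViT(image_size=128, patch_size=16, num_classes=200, dim=1024,
          depth=6, heads=16, mlp_dim=2048, dropout=0.1, emb_dropout=0.1)
```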
In the modality representation learning stage, following the modality representation learning in FIG. 1, a framework is used to project the text and image modalities into two different subspaces. One subspace is modality-invariant: it aims to learn the commonality of the two modalities across modalities and reduce the differences between them. The other subspace is modality-specific and captures the features of each modality. Three self-encoders E_t, E_v and E_s are used, where E_t, E_v and E_s are structurally identical, each comprising a fully connected layer with 128 inputs and outputs and a sigmoid activation layer; E_t processes the text information, E_v processes the visual information, and E_s is a shared self-encoder that processes text and visual information simultaneously. The representations u_t and u_v of the text and visual modalities obtained in the feature extraction stage are passed through E_t and E_v to obtain the text modality-specific representation h_t^p and the visual modality-specific representation h_v^p. The text information u_t and the visual information u_v are also input to the shared self-encoder E_s to obtain the text modality-invariant representation h_t^c and the visual modality-invariant representation h_v^c (learning the shared representations in a common subspace with a distribution-similarity constraint helps minimize the heterogeneity gap, a desirable property for multimodal fusion). Thus u_t and u_v are projected into the two different subspaces to obtain modality-specific and modality-shared information. The method of projecting into the modality-specific subspace is: each corpus vector is first encoded by a fully connected layer with 128 output neurons, and the encoded result is mapped to the range (0, 1) by a sigmoid activation function; this operation effectively models and represents the modality-specific information. The method of projecting into the modality-invariant subspace (which processes text and visual information simultaneously) is: the corpus vector is processed through a fully connected layer with 128 inputs and outputs and a sigmoid activation layer; this operation helps minimize the heterogeneity gap between modalities so as to better capture their common features. Through this modality representation learning process, content can be described in more detail while the semantics are retained: the information of the text and image modalities is projected into different subspaces, the specificity and commonality of the modalities are captured simultaneously, and the differences between modalities are minimized, achieving better modality representation learning. The preceding steps yield four vectors h_t^p, h_v^p, h_t^c and h_v^c (each corpus vector is thus projected onto two different representations, one modality-invariant and one modality-specific, whose coexistence makes it possible to fuse the desired features effectively), representing the specific and invariant representations of the text and visual modalities respectively. To integrate them further, a series (stacking) operation is performed on them along dimension dim=0 to obtain a new matrix M, and this matrix M is transformed with a multi-head attention mechanism to obtain a new matrix h (in doing so, each representation can induce from the other representations the latent information that contributes synergistically to the overall task). A minimal sketch of the subspace projection is given below.
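The sketch assumes the utterance vectors have already been reduced to 128 dimensions (the disclosure fixes every encoder input and output at 128 but does not name the preceding projection); the names SubspaceEncoder and project_and_attend are hypothetical.

```python
import torch
import torch.nn as nn

class SubspaceEncoder(nn.Module):
    """One of the three structurally identical self-encoders: a fully
    connected 128 -> 128 layer followed by a sigmoid activation."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

E_t, E_v, E_s = SubspaceEncoder(), SubspaceEncoder(), SubspaceEncoder()
attn = nn.MultiheadAttention(embed_dim=128, num_heads=2)

def project_and_attend(u_t, u_v):
    # Modality-specific representations from the private encoders.
    h_t_p, h_v_p = E_t(u_t), E_v(u_v)
    # Modality-invariant representations from the shared encoder.
    h_t_c, h_v_c = E_s(u_t), E_s(u_v)
    # Series operation along dim=0: stack the four vectors into matrix M.
    M = torch.stack([h_t_p, h_v_p, h_t_c, h_v_c], dim=0)  # (4, B, 128)
    h, _ = attn(M, M, M)                                  # transformed matrix h
    return h, (h_t_p, h_v_p, h_t_c, h_v_c)
```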
The multi-head attention mechanism fully captures the relations between the different attention heads and extracts a richer feature representation. Its parameters are: input dimension 128, heads=2. Finally, a fusion operation is applied once to the output h. The fusion network comprises the following layers: first, a fully connected layer with input dimension 512 and output dimension 384; then a dropout layer with drop_rate 0.5 to reduce the risk of overfitting; then a ReLU activation layer to introduce non-linearity; and finally a fully connected layer with input dimension 384 and output dimension 200. The role of the fusion network is to further fuse and map the features obtained above to produce the final classification result, as sketched below.
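The fusion head follows the enumerated layers exactly; flattening the four attended 128-dimensional representations into one 512-dimensional vector is inferred from the stated dimensions rather than spelled out in the disclosure.

```python
import torch.nn as nn

# Fusion network exactly as enumerated: 512 -> 384, dropout 0.5, ReLU, 384 -> 200.
fusion = nn.Sequential(
    nn.Linear(512, 384),
    nn.Dropout(p=0.5),
    nn.ReLU(),
    nn.Linear(384, 200),
)

# Assumed flattening of h from (4, B, 128) to (B, 512) before the head:
# logits = fusion(h.permute(1, 0, 2).reshape(-1, 4 * 128))
```

The 200-way output matches the number of disease categories fixed in the feature extraction stage.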
Step 3, model training and loss calculation:
inputting the train.csv file of the plant leaf disease data set obtained in step 1 into the network model constructed in step 2 and training the network model, wherein the optimizer is Adam and the loss function comprises: a similarity loss L_sim, a difference loss L_diff, and a cross-entropy loss L_task used for classification. The similarity loss is introduced to reduce the discrepancy between the representations of each modality and helps unify the common cross-modal features in the shared subspace. L_sim is computed with the Central Moment Discrepancy (CMD): CMD measures the difference between two distributions by matching their order-wise moment differences, and its value becomes smaller as the similarity of the two distributions increases. Let X and Y be bounded random samples drawn with respective probability distributions p and q on the interval [a, b]; then

CMD_K(X, Y) = (1/|b - a|) ||E(X) - E(Y)||_2 + Σ_{k=2}^{K} (1/|b - a|^k) ||C_k(X) - C_k(Y)||_2,

where E(X) = (1/|X|) Σ_{x∈X} x is the empirical expectation of the sample X and C_k(X) = E((x - E(X))^k) is the vector of its k-th order central moments; the similarity loss is taken as L_sim = CMD_K(h_t^c, h_v^c). The difference loss L_diff ensures that the modality-invariant and modality-specific characterizations capture the input features from different angles. During training, let H_c^m and H_p^m denote the matrices whose rows are the hidden invariant and specific vectors of modality m ∈ {t, v}; the non-correlation constraint on the modality vectors is ||(H_c^m)^T H_p^m||_F^2, where ||·||_F denotes the Frobenius norm, and the loss is defined as

L_diff = Σ_{m ∈ {t,v}} ||(H_c^m)^T H_p^m||_F^2.

The cross-entropy loss L_task is used as the prediction loss function for the downstream task, with the formula:

L_task = -(1/N) Σ_{i=1}^{N} y_i · log(ŷ_i),

where N is the number of samples, y_i is the ground-truth label of sample i and ŷ_i is its predicted class distribution. A minimal sketch of the two regularizers is given below.
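Because the encoder outputs pass through a sigmoid, the samples are bounded in [0, 1] and the |b - a| scaling factors equal 1; the truncation order K = 5 is an assumed choice, not one stated in the disclosure.

```python
import torch

def cmd_loss(x, y, k_max=5):
    """Central Moment Discrepancy between two batches of representations.
    x, y: (B, D) tensors assumed bounded in [0, 1], so |b - a| = 1."""
    mx, my = x.mean(dim=0), y.mean(dim=0)
    loss = torch.norm(mx - my, p=2)            # first-moment (mean) term
    cx, cy = x - mx, y - my
    for k in range(2, k_max + 1):              # k-th order central moments
        loss = loss + torch.norm(cx.pow(k).mean(dim=0) - cy.pow(k).mean(dim=0), p=2)
    return loss

def diff_loss(h_c, h_p):
    """Squared Frobenius norm of H_c^T H_p, pushing the invariant and
    specific representations of one modality towards orthogonality."""
    return torch.norm(h_c.t() @ h_p, p="fro").pow(2)
```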
The total loss function is:

L = L_task + α·L_sim + β·L_diff,

where α and β are the weights of the similarity and difference regularization terms; a hedged sketch of one training step follows.
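Here model, the weights alpha and beta, and the learning rate are hypothetical, since the disclosure fixes only the Adam optimizer and the three loss terms; the sketch reuses cmd_loss and diff_loss from the previous block.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                          # L_task
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is an assumption

alpha, beta = 1.0, 0.1  # hypothetical weights; the disclosure gives no values

def train_step(texts, images, labels):
    # model is assumed to return the logits and the four subspace vectors
    # produced by project_and_attend in the earlier sketch.
    logits, (h_t_p, h_v_p, h_t_c, h_v_c) = model(texts, images)
    l_task = criterion(logits, labels)
    l_sim = cmd_loss(h_t_c, h_v_c)
    l_diff = diff_loss(h_t_c, h_t_p) + diff_loss(h_v_c, h_v_p)
    loss = l_task + alpha * l_sim + beta * l_diff
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```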
Step 4, the best accuracy obtained by training the network is 0.999257 (see the effect shown in the last row of FIG. 2).
The foregoing description is only a preferred embodiment of the present invention and is not intended to limit the invention in any way; any simple modification, equivalent change or variation made to the above embodiment according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims (5)

1. A plant leaf phenotype disease classification method based on image-text subspace joint learning, comprising the following steps:
Step 1, data preprocessing:
traversing the catalog of the data set, arranging the relative path and label information of each group of samples into one row and writing the rows into a csv file; reading back the csv file with the written information, sampling it, and dividing it into a test set and a training set at a ratio of 2:8, thereby obtaining the test.csv and train.csv files;
Step 2, constructing a network model:
in the feature extraction stage, for data of the text modality, performing self-attention encoding on the input with a BERT model from deep learning to learn contextual relations; after obtaining the BERT output, using the mask in the input to average the output along dimension 1, obtaining the utterance (corpus-level) representation u_t of the text input modality; for the visual modality information in the model, adopting a ViT model from deep learning to obtain the utterance-level representation u_v of the visual modality; the image size is set to 128, the size of each patch to 16, the number of categories to classify to 200, the dimension of each representation to 1024, the model depth to 6, the number of attention heads to 16, and the multi-layer perceptron dimension to 2048;
in the modality representation learning stage, using a framework to project the text and image modalities into two different subspaces, one of which is modality-invariant and the other modality-specific, by means of three self-encoders E_t, E_v and E_s: passing the text and visual representations u_t and u_v obtained in the feature extraction stage through the self-encoders E_t and E_v to obtain the text modality-specific representation h_t^p and the visual modality-specific representation h_v^p, and inputting the text information u_t and the visual information u_v into the shared self-encoder E_s to obtain the text modality-invariant representation h_t^c and the visual modality-invariant representation h_v^c; u_t and u_v are thereby projected into the two different subspaces to obtain modality-specific and modality-shared information, giving four vectors h_t^p, h_v^p, h_t^c and h_v^c that represent the specific and invariant representations of the text and visual modalities respectively; performing a series (stacking) operation on them along dimension dim=0 to obtain a new matrix M, and transforming this matrix M with a multi-head attention mechanism to obtain a new matrix h, the parameters of the multi-head attention mechanism being: input dimension 128, heads=2; finally, applying a fusion operation to the output h, the fusion network comprising the following layers: first, a fully connected layer with input dimension 512 and output dimension 384; then a dropout layer with drop_rate 0.5; then a ReLU activation layer; and finally a fully connected layer with input dimension 384 and output dimension 200, thereby obtaining the final classification result;
Step 3, model training and loss calculation:
inputting the train.csv file of the plant leaf disease data set obtained in step 1 into the network model constructed in step 2 and training the network model, wherein the optimizer is Adam and the loss function comprises: a similarity loss L_sim, a difference loss L_diff, and a cross-entropy loss L_task used for classification; the similarity loss L_sim is computed with the Central Moment Discrepancy (CMD): CMD measures the difference between two distributions by matching their order-wise moment differences, and its value becomes smaller as the similarity of the two distributions increases; let X and Y be bounded random samples drawn with respective probability distributions p and q on the interval [a, b]; then

CMD_K(X, Y) = (1/|b - a|) ||E(X) - E(Y)||_2 + Σ_{k=2}^{K} (1/|b - a|^k) ||C_k(X) - C_k(Y)||_2,

where E(X) = (1/|X|) Σ_{x∈X} x is the empirical expectation of the sample X and C_k(X) = E((x - E(X))^k) is the vector of its k-th order central moments, and the similarity loss is taken as L_sim = CMD_K(h_t^c, h_v^c); for the difference loss, during training, letting H_c^m and H_p^m denote the matrices whose rows are the hidden invariant and specific vectors of modality m ∈ {t, v}, the non-correlation constraint on the modality vectors is ||(H_c^m)^T H_p^m||_F^2, where ||·||_F denotes the Frobenius norm, and the loss is defined as

L_diff = Σ_{m ∈ {t,v}} ||(H_c^m)^T H_p^m||_F^2;

the cross-entropy loss L_task is used as the prediction loss function for the downstream task, with the formula:

L_task = -(1/N) Σ_{i=1}^{N} y_i · log(ŷ_i),

where N is the number of samples, y_i is the ground-truth label of sample i and ŷ_i is its predicted class distribution;
the total loss function is:

L = L_task + α·L_sim + β·L_diff,

where α and β are the weights of the similarity and difference regularization terms;
and step 4, training the network to obtain the best accuracy of 0.999257.
2. The plant leaf phenotype disease classification method based on image-text subspace joint learning of claim 1, wherein: the feature extraction stage described in step 2 further applies dropout and emb_dropout of 0.1.
3. The plant leaf phenotype disease classification method based on image-text subspace joint learning of claim 1, wherein: the three self-encoders E_t, E_v and E_s described in step 2 are structurally identical, each comprising a fully connected layer with 128 inputs and outputs and a sigmoid activation layer; E_t processes the text information and E_v processes the visual information.
4. The plant leaf phenotype disease classification method based on image-text subspace joint learning of claim 1, wherein in the modality representation learning stage in step 2, a framework is used to project the text and image modalities into two different subspaces, one of which is modality-invariant, and the method of projecting into the modality-invariant subspace is: the corpus vector is processed through a fully connected layer with 128 inputs and outputs and a sigmoid activation layer.
5. The plant leaf phenotype disease classification method based on image-text subspace joint learning of claim 1 or 4, wherein in the modality representation learning stage in step 2, a framework is used to project the text and image modalities into two different subspaces, one of which is modality-invariant and the other modality-specific, and the method of projecting into the modality-specific subspace is: each corpus vector is first encoded by a fully connected layer with 128 output neurons, and the encoded result is then mapped to the range (0, 1) by a sigmoid activation function.
CN202410067314.7A 2024-01-17 2024-01-17 Plant leaf phenotype disease classification method based on image-text subspace joint learning Active CN117611924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410067314.7A CN117611924B (en) 2024-01-17 2024-01-17 Plant leaf phenotype disease classification method based on image-text subspace joint learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410067314.7A CN117611924B (en) 2024-01-17 2024-01-17 Plant leaf phenotype disease classification method based on image-text subspace joint learning

Publications (2)

Publication Number Publication Date
CN117611924A 2024-02-27
CN117611924B 2024-04-09

Family

ID=89958138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410067314.7A Active 2024-01-17 2024-01-17 Plant leaf phenotype disease classification method based on image-text subspace joint learning

Country Status (1)

Country Link
CN (1) CN117611924B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018219198A1 (en) * 2017-06-02 2018-12-06 腾讯科技(深圳)有限公司 Man-machine interaction method and apparatus, and man-machine interaction terminal
WO2019148898A1 (en) * 2018-02-01 2019-08-08 北京大学深圳研究生院 Adversarial cross-media retrieving method based on restricted text space
US20200183989A1 (en) * 2018-12-10 2020-06-11 Ebay Inc. Generating app or web pages via extracting interest from images
CN114168784A (en) * 2021-12-10 2022-03-11 桂林电子科技大学 Layered supervision cross-modal image-text retrieval method
US20220245391A1 (en) * 2021-01-28 2022-08-04 Adobe Inc. Text-conditioned image search based on transformation, aggregation, and composition of visio-linguistic features
CN115048537A (en) * 2022-07-11 2022-09-13 河北农业大学 Disease recognition system based on image-text multi-mode collaborative representation
CN115050014A (en) * 2022-06-15 2022-09-13 河北农业大学 Small sample tomato disease identification system and method based on image text learning
CN115131627A (en) * 2022-07-01 2022-09-30 贵州大学 Construction and training method of lightweight plant disease and insect pest target detection model
WO2022257578A1 (en) * 2021-06-07 2022-12-15 京东科技信息技术有限公司 Method for recognizing text, and apparatus
CN115984842A (en) * 2023-02-13 2023-04-18 广州数说故事信息科技有限公司 Multi-mode-based video open tag extraction method
CN116258989A (en) * 2023-01-10 2023-06-13 南京邮电大学 Text and vision based space-time correlation type multi-modal emotion recognition method and system
US20230196633A1 (en) * 2021-07-09 2023-06-22 Nanjing University Of Posts And Telecommunications Method of image reconstruction for cross-modal communication system and device thereof
CN116824366A (en) * 2023-06-14 2023-09-29 天津商业大学 Crop disease identification method based on local selection and feature interaction
CN116842475A (en) * 2023-06-30 2023-10-03 东航技术应用研发中心有限公司 Fatigue driving detection method based on multi-mode information fusion
CN116843952A (en) * 2023-06-06 2023-10-03 中国海洋大学 Small sample learning classification method for fruit and vegetable disease identification
US20230342550A1 (en) * 2018-06-06 2023-10-26 Nippon Telegraph And Telephone Corporation Degree of difficulty estimating device, and degree of difficulty estimating model learning device, method, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Feng Xuguang: "Research on multimodal vegetable leaf disease identification based on images and text", Hebei Agricultural University thesis collection, 31 August 2023 (2023-08-31), pages 1-52 *
Li Yaoxi et al.: "CSNet: a count-supervised wheat ear counting method based on multi-scale MLP-Mixer", Abstracts of the 20th Annual Academic Conference of the Crop Science Society of China, 1 November 2023 (2023-11-01), page 434 *

Also Published As

Publication number Publication date
CN117611924B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
Wen et al. Ensemble of deep neural networks with probability-based fusion for facial expression recognition
CN112307995B (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
CN111898736A (en) Efficient pedestrian re-identification method based on attribute perception
CN112732921B (en) False user comment detection method and system
Hou et al. Distilling knowledge from object classification to aesthetics assessment
CN114936623B (en) Aspect-level emotion analysis method integrating multi-mode data
Oyewole et al. Product image classification using Eigen Colour feature with ensemble machine learning
Ocquaye et al. Dual exclusive attentive transfer for unsupervised deep convolutional domain adaptation in speech emotion recognition
Rao et al. Exploring deep learning techniques for kannada handwritten character recognition: A boon for digitization
Ye et al. A joint-training two-stage method for remote sensing image captioning
Ribeiro et al. Deep learning in digital marketing: brand detection and emotion recognition
Zhou et al. Semantic adaptation network for unsupervised domain adaptation
Wu et al. Sentimental visual captioning using multimodal transformer
Wang et al. R2-trans: Fine-grained visual categorization with redundancy reduction
Yao [Retracted] Application of Higher Education Management in Colleges and Universities by Deep Learning
CN113221680B (en) Text pedestrian retrieval method based on text dynamic guiding visual feature extraction
Alrowais et al. Modified earthworm optimization with deep learning assisted emotion recognition for human computer interface
Ma et al. Multi-scale cooperative multimodal transformers for multimodal sentiment analysis in videos
Hou et al. Confidence-guided self refinement for action prediction in untrimmed videos
Jadhav et al. Content based facial emotion recognition model using machine learning algorithm
CN117611924B (en) Plant leaf phenotype disease classification method based on graphic subspace joint learning
CN116958677A (en) Internet short video classification method based on multi-mode big data
Saffari et al. Low-rank sparse generative adversarial unsupervised domain adaptation for multi-target traffic scene semantic segmentation
CN114936279A (en) Unstructured chart data analysis method for collaborative manufacturing enterprise
Raut et al. Mood-Based Emotional Analysis for Music Recommendation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant