GB2616316A - Neural network training technique - Google Patents

Neural network training technique

Info

Publication number
GB2616316A
GB2616316A
Authority
GB
United Kingdom
Prior art keywords
neural network
dataset
image
processor
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2204314.5A
Other versions
GB202204314D0 (en)
Inventor
Hatamizadeh Ali
Xu Daguang
Wang Xiaosong
Tam Lickkong
Bhalodia Riddhish
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nvidia Corp
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nvidia Corp filed Critical Nvidia Corp
Priority claimed from PCT/US2022/018217 external-priority patent/WO2022187167A1/en
Publication of GB202204314D0 publication Critical patent/GB202204314D0/en
Publication of GB2616316A publication Critical patent/GB2616316A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G06N3/096 Transfer learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Train Traffic Observation, Control, And Security (AREA)

Abstract

Apparatuses, systems, and techniques to train a neural network to infer a condition based on an image. In at least one embodiment, a first portion of a neural network is trained to infer a condition from an image using a first dataset, and a second portion of the neural network is trained using a second dataset.
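The abstract describes training two portions of one neural network on two datasets. A minimal illustrative sketch of one way this could look, assuming a CLIP-style setup in which an image portion and a text portion are trained in parallel to encode paired inputs into a shared latent space (module names, dimensions, and the contrastive objective are assumptions for illustration, not taken from the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoPortionNetwork(nn.Module):
    """Hypothetical network with two trainable portions."""
    def __init__(self, image_dim=2048, text_dim=768, latent_dim=256):
        super().__init__()
        # First portion: encodes image features (first dataset).
        self.image_portion = nn.Linear(image_dim, latent_dim)
        # Second portion: encodes text features (second dataset).
        self.text_portion = nn.Linear(text_dim, latent_dim)

    def forward(self, image_feats, text_feats):
        # Both portions project into the same (shared) latent space.
        z_img = F.normalize(self.image_portion(image_feats), dim=-1)
        z_txt = F.normalize(self.text_portion(text_feats), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, temperature=0.07):
    # Paired image/text rows are positives; all other pairings are negatives.
    logits = z_img @ z_txt.t() / temperature
    targets = torch.arange(z_img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

model = TwoPortionNetwork()
z_img, z_txt = model(torch.randn(4, 2048), torch.randn(4, 768))
loss = contrastive_loss(z_img, z_txt)
```

Training both portions against this single loss updates them in parallel, which is one plausible reading of "trained in parallel to encode features ... to a shared latent space".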

Claims (31)

  1. A processor, comprising: one or more circuits to train a first portion of a neural network using a first dataset and a second portion of the neural network using a second dataset.
  2. The processor of claim 1, wherein the first and second portions of the neural network are trained in parallel, and wherein the second portion of the neural network is taught during training to provide a ground truth for training the first portion of the neural network.
  3. The processor of claim 1, wherein the first and second portions of the neural network are trained in parallel to encode features of the first and second datasets to a shared latent space.
  4. The processor of claim 1, wherein the first dataset comprises image data and the second dataset comprises textual descriptions of corresponding image data in the first dataset.
  5. The processor of claim 1, the neural network comprising a cross-attention encoder, wherein a query input to the cross-attention encoder comprises output from the second portion of the neural network, and wherein key and value input to the cross-attention encoder comprises output from the first portion of the neural network.
  6. The processor of claim 1, the neural network comprising a decoder to generate a saliency map based, at least in part, on output of a cross-attention encoder.
  7. The processor of claim 1, wherein the first dataset comprises an image and the second dataset comprises a textual document, and wherein output of the neural network comprises a classification of a condition depicted in the image and described in the textual document.
  8. The processor of claim 1, wherein output of the neural network comprises information identifying a condition depicted in an image.
  9. A system, comprising: one or more processors to train a first portion of a neural network using a first dataset and a second portion of the neural network using a second dataset.
  10. The system of claim 9, wherein the first and second portions of the neural network are trained in parallel, and wherein the second portion of the neural network is taught during training to provide information for training the first portion of the neural network.
  11. The system of claim 9, wherein the first and second portions of the neural network are trained in parallel to encode features of the first and second datasets to a shared latent space.
  12. The system of claim 9, wherein the first dataset comprises an image and the second dataset comprises a description of the image.
  13. The system of claim 9, wherein the neural network comprises a cross-attention encoder, wherein a query input to the cross-attention encoder comprises output from the second portion of the neural network, and wherein key and value input to the cross-attention encoder comprises output from the first portion of the neural network.
  14. The system of claim 9, the neural network comprising a decoder to generate information indicative of a region of an image.
  15. The system of claim 9, wherein output of the neural network comprises a classification of a condition depicted in an image.
  16. The system of claim 9, wherein the first dataset comprises a diagnostic image and the second dataset comprises a diagnostic report corresponding to the diagnostic image.
  17. A processor comprising: one or more circuits to use a neural network to infer information about a first dataset based, at least in part, on a second dataset.
  18. The processor of claim 17, wherein a first portion of the neural network is trained to encode features of image data in the first dataset and a second portion of the neural network is trained to encode features of textual data in the second dataset.
  19. The processor of claim 18, wherein the first portion of the neural network, and the second portion of the neural network, encode their respective inputs to a common latent space.
  20. The processor of claim 17, wherein the neural network is trained based, at least in part, on output of a cross-attention encoder using, as input to the cross-attention encoder, output of an image encoder and output of a language encoder.
  21. The processor of claim 17, wherein the first dataset comprises diagnostic images and the second dataset comprises diagnostic reports corresponding to the diagnostic images.
  22. The processor of claim 17, wherein the inferred information comprises information indicative of an area of interest in an image.
  23. The processor of claim 17, wherein a first portion of the neural network is trained to encode features of image data in the first dataset and a second portion of the neural network is trained to encode features of textual data in the second dataset, and wherein the first portion of the neural network, after training, is capable of inferring the information independently of the second portion.
  24. A method, comprising: training a neural network to diagnose a condition depicted in a diagnostic image, based at least in part on a first dataset comprising a set of diagnostic images and a second dataset comprising a set of diagnostic reports corresponding to diagnostic images in the set of diagnostic images.
  25. The method of claim 24, wherein a first portion of the neural network is trained in parallel with a second portion of the neural network, and wherein the second portion of the neural network is trained to encode features of the diagnostic reports.
  26. The method of claim 25, wherein the first and second portions of the neural network are trained to encode features of the first and second datasets to a shared latent space.
  27. The method of claim 24, further comprising: providing, as input to a cross-attention encoder, a query input comprising output from a language encoder, and key and value input comprising output from an image encoder.
  28. The method of claim 24, further comprising: training a language encoder of the neural network to encode features of the diagnostic reports to a latent space shared with output of an image encoder.
  29. The method of claim 24, further comprising: decoding output of an encoder to generate information summarizing the condition.
  30. The method of claim 24, wherein the neural network comprises a decoder to generate information indicative of a region in the diagnostic image that depicts the condition.
  31. The method of claim 24, wherein diagnosis of the condition comprises identifying one or more categories of conditions determined, by the neural network, to be associated with a region of the diagnostic image.
GB2204314.5A 2022-02-28 2022-02-28 Neural network training technique Pending GB2616316A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2022/018217 WO2022187167A1 (en) 2021-03-01 2022-02-28 Neural network training technique

Publications (2)

Publication Number Publication Date
GB202204314D0 (en) 2022-05-11
GB2616316A true GB2616316A (en) 2023-09-06

Family

ID=81449445

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2204314.5A Pending GB2616316A (en) 2022-02-28 2022-02-28 Neural network training technique

Country Status (1)

Country Link
GB (1) GB2616316A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180350459A1 (en) * 2017-06-05 2018-12-06 University Of Florida Research Foundation, Inc. Methods and apparatuses for implementing a semantically and visually interpretable medical diagnosis network
CN111985369A (en) * 2020-08-07 2020-11-24 西北工业大学 Course field multi-modal document classification method based on cross-modal attention convolution neural network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RIDDHISH BHALODIA ET AL., "Improving Pneumonia Localization via Cross-Attention on Medical Images and Reports", arXiv.org, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, (2021-10-06), the whole document *
WEI XI ET AL., "Multi-Modality Cross Attention Network for Image and Sentence Matching", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2020-06-13), pages 10938-10947, doi:10.1109/CVPR42600.2020.01095, [retrieved on 2020-08-03], pages 10938 and 10945, right-hand column, paragraph 1 *
XIAOSONG WANG ET AL., "TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays", arXiv.org, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, (2018-01-12), the whole document *

Also Published As

Publication number Publication date
GB202204314D0 (en) 2022-05-11

Similar Documents

Publication Publication Date Title
AU2019200270B2 (en) Concept mask: large-scale segmentation from semantic concepts
US10817521B2 (en) Near-real-time prediction, classification, and notification of events in natural language systems
US20190318099A1 (en) Using Gradients to Detect Backdoors in Neural Networks
US11640527B2 (en) Near-zero-cost differentially private deep learning with teacher ensembles
KR102011788B1 (en) Visual Question Answering Apparatus Using Hierarchical Visual Feature and Method Thereof
WO2021051497A1 (en) Pulmonary tuberculosis determination method and apparatus, computer device, and storage medium
CN112400187A (en) Knockout autoencoder for detecting anomalies in biomedical images
US20220230061A1 (en) Modality adaptive information retrieval
US11853706B2 (en) Generative language model for few-shot aspect-based sentiment analysis
US20200321101A1 (en) Rule out accuracy for detecting findings of interest in images
CN116468746B (en) Bidirectional copy-paste semi-supervised medical image segmentation method
US20230281390A1 (en) Systems and methods for enhanced review comprehension using domain-specific knowledgebases
JP2019511797A (en) INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM
CN113780365B (en) Sample generation method and device
CN116844731A (en) Disease classification method, disease classification device, electronic device, and storage medium
US11113466B1 (en) Generating sentiment analysis of content
Wang et al. SERR‐U‐Net: Squeeze‐and‐Excitation Residual and Recurrent Block‐Based U‐Net for Automatic Vessel Segmentation in Retinal Image
Patel et al. PTXNet: An extended UNet model based segmentation of pneumothorax from chest radiography images
GB2616316A (en) Neural network training technique
TWI742312B (en) Machine learning system, machine learning method and non-transitory computer readable medium for operating the same
US20220027688A1 (en) Image identification device, method for performing semantic segmentation, and storage medium
Geldenhuys et al. Deep learning approaches to landmark detection in tsetse wing images
CN112785001B (en) Artificial intelligence educational back-province robot for overcoming discrimination and prejudice
Ahn et al. MMRR: Unsupervised Anomaly Detection through Multi-Level Masking and Restoration with Refinement
Ishikawa et al. Saliency prediction based on object recognition and gaze analysis