GB2616316A - Neural network training technique - Google Patents

Neural network training technique

Info

Publication number
GB2616316A
GB2616316A
Authority
GB
United Kingdom
Prior art keywords
neural network
dataset
image
processor
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2204314.5A
Other versions
GB202204314D0 (en)
Inventor
Hatamizadeh Ali
Xu Daguang
Wang Xiaosong
Tam Lickkong
Bhalodia Riddhish
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nvidia Corp
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nvidia Corp filed Critical Nvidia Corp
Priority claimed from PCT/US2022/018217 external-priority patent/WO2022187167A1/en
Publication of GB202204314D0 publication Critical patent/GB202204314D0/en
Publication of GB2616316A publication Critical patent/GB2616316A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G06N3/096 Transfer learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Train Traffic Observation, Control, And Security (AREA)

Abstract

Apparatuses, systems, and techniques to train a neural network to infer a condition based on an image. In at least one embodiment, a first portion of a neural network is trained to infer a condition from an image using a first dataset, and a second portion of the neural network is trained using a second dataset.
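The abstract describes training two portions of one neural network on two datasets. A minimal illustrative sketch of one way this could look, assuming a CLIP-style setup in which an image portion and a text portion are trained in parallel to encode paired inputs into a shared latent space (module names, dimensions, and the contrastive objective are assumptions for illustration, not taken from the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoPortionNetwork(nn.Module):
    """Hypothetical network with two trainable portions."""
    def __init__(self, image_dim=2048, text_dim=768, latent_dim=256):
        super().__init__()
        # First portion: encodes image features (first dataset).
        self.image_portion = nn.Linear(image_dim, latent_dim)
        # Second portion: encodes text features (second dataset).
        self.text_portion = nn.Linear(text_dim, latent_dim)

    def forward(self, image_feats, text_feats):
        # Both portions project into the same (shared) latent space.
        z_img = F.normalize(self.image_portion(image_feats), dim=-1)
        z_txt = F.normalize(self.text_portion(text_feats), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, temperature=0.07):
    # Paired image/text rows are positives; all other pairings are negatives.
    logits = z_img @ z_txt.t() / temperature
    targets = torch.arange(z_img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

model = TwoPortionNetwork()
z_img, z_txt = model(torch.randn(4, 2048), torch.randn(4, 768))
loss = contrastive_loss(z_img, z_txt)
```

Training both portions against this single loss updates them in parallel, which is one plausible reading of "trained in parallel to encode features ... to a shared latent space".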

Claims (31)

  1. A processor, comprising: one or more circuits to train a first portion of a neural network using a first dataset and a second portion of the neural network using a second dataset.
  2. The processor of claim 1, wherein the first and second portions of the neural network are trained in parallel, and wherein the second portion of the neural network is taught during training to provide a ground truth for training the first portion of the neural network.
  3. The processor of claim 1, wherein the first and second portions of the neural network are trained in parallel to encode features of the first and second datasets to a shared latent space.
  4. The processor of claim 1, wherein the first dataset comprises image data and the second dataset comprises textual descriptions of corresponding image data in the first dataset.
  5. The processor of claim 1, the neural network comprising a cross-attention encoder, wherein a query input to the cross-attention encoder comprises output from the second portion of the neural network, and wherein key and value input to the cross-attention encoder comprises output from the first portion of the neural network.
  6. The processor of claim 1, the neural network comprising a decoder to generate a saliency map based, at least in part, on output of a cross-attention encoder.
  7. The processor of claim 1, wherein the first dataset comprises an image and the second dataset comprises a textual document, and wherein output of the neural network comprises a classification of a condition depicted in the image and described in the textual document.
  8. The processor of claim 1, wherein output of the neural network comprises information identifying a condition depicted in an image.
  9. A system, comprising: one or more processors to train a first portion of a neural network using a first dataset and a second portion of the neural network using a second dataset.
  10. The system of claim 9, wherein the first and second portions of the neural network are trained in parallel, and wherein the second portion of the neural network is taught during training to provide information for training the first portion of the neural network.
  11. The system of claim 9, wherein the first and second portions of the neural network are trained in parallel to encode features of the first and second datasets to a shared latent space.
  12. The system of claim 9, wherein the first dataset comprises an image and the second dataset comprises a description of the image.
  13. The system of claim 9, wherein the neural network comprises a cross-attention encoder, wherein a query input to the cross-attention encoder comprises output from the second portion of the neural network, and wherein key and value input to the cross-attention encoder comprises output from the first portion of the neural network.
  14. The system of claim 9, the neural network comprising a decoder to generate information indicative of a region of an image.
  15. The system of claim 9, wherein output of the neural network comprises a classification of a condition depicted in an image.
  16. The system of claim 9, wherein the first dataset comprises a diagnostic image and the second dataset comprises a diagnostic report corresponding to the diagnostic image.
  17. A processor comprising: one or more circuits to use a neural network to infer information about a first dataset based, at least in part, on a second dataset.
  18. The processor of claim 17, wherein a first portion of the neural network is trained to encode features of image data in the first dataset and a second portion of the neural network is trained to encode features of textual data in the second dataset.
  19. The processor of claim 18, wherein the first portion of the neural network, and the second portion of the neural network, encode their respective inputs to a common latent space.
  20. The processor of claim 17, wherein the neural network is trained based, at least in part, on output of a cross-attention encoder using, as input to the cross-attention encoder, output of an image encoder and output of a language encoder.
  21. The processor of claim 17, wherein the first dataset comprises diagnostic images and the second dataset comprises diagnostic reports corresponding to the diagnostic images.
  22. The processor of claim 17, wherein the inferred information comprises information indicative of an area of interest in an image.
  23. The processor of claim 17, wherein a first portion of the neural network is trained to encode features of image data in the first dataset and a second portion of the neural network is trained to encode features of textual data in the second dataset, and wherein the first portion of the neural network, after training, is capable of inferring the information independently of the second portion.
  24. A method, comprising: training a neural network to diagnose a condition depicted in a diagnostic image, based at least in part on a first dataset comprising a set of diagnostic images and a second dataset comprising a set of diagnostic reports corresponding to diagnostic images in the set of diagnostic images.
  25. The method of claim 24, wherein a first portion of the neural network is trained in parallel with a second portion of the neural network, and wherein the second portion of the neural network is trained to encode features of the diagnostic reports.
  26. The method of claim 25, wherein the first and second portions of the neural network are trained to encode features of the first and second datasets to a shared latent space.
  27. The method of claim 24, further comprising: providing, as input to a cross-attention encoder, a query input comprising output from a language encoder, and key and value input comprising output from an image encoder.
  28. The method of claim 24, further comprising: training a language encoder of the neural network to encode features of the diagnostic reports to a latent space shared with output of an image encoder.
  29. The method of claim 24, further comprising: decoding output of an encoder to generate information summarizing the condition.
  30. The method of claim 24, wherein the neural network comprises a decoder to generate information indicative of a region in the diagnostic image that depicts the condition.
  31. The method of claim 24, wherein diagnosis of the condition comprises identifying one or more categories of conditions determined, by the neural network, to be associated with a region of the diagnostic image.
GB2204314.5A 2022-02-28 2022-02-28 Neural network training technique Pending GB2616316A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2022/018217 WO2022187167A1 (en) 2021-03-01 2022-02-28 Neural network training technique

Publications (2)

Publication Number Publication Date
GB202204314D0 (en) 2022-05-11
GB2616316A true GB2616316A (en) 2023-09-06

Family

ID=81449445

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2204314.5A Pending GB2616316A (en) 2022-02-28 2022-02-28 Neural network training technique

Country Status (1)

Country Link
GB (1) GB2616316A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180350459A1 (en) * 2017-06-05 2018-12-06 University Of Florida Research Foundation, Inc. Methods and apparatuses for implementing a semantically and visually interpretable medical diagnosis network
CN111985369A (en) * 2020-08-07 2020-11-24 西北工业大学 Course field multi-modal document classification method based on cross-modal attention convolution neural network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RIDDHISH BHALODIA ET AL., "Improving Pneumonia Localization via Cross-Attention on Medical Images and Reports", arXiv.org, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, (2021-10-06), the whole document *
WEI XI ET AL., "Multi-Modality Cross Attention Network for Image and Sentence Matching", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2020-06-13), pages 10938-10947, doi:10.1109/CVPR42600.2020.01095, [retrieved on 2020-08-03], pages 10938 and 10945, right-hand column, paragraph 1 *
XIAOSONG WANG ET AL., "TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays", arXiv.org, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, (2018-01-12), the whole document *

Also Published As

Publication number Publication date
GB202204314D0 (en) 2022-05-11

Similar Documents

Publication Publication Date Title
AU2019200270B2 (en) Concept mask: large-scale segmentation from semantic concepts
US10817521B2 (en) Near-real-time prediction, classification, and notification of events in natural language systems
US20190318099A1 (en) Using Gradients to Detect Backdoors in Neural Networks
US11640527B2 (en) Near-zero-cost differentially private deep learning with teacher ensembles
KR102011788B1 (en) Visual Question Answering Apparatus Using Hierarchical Visual Feature and Method Thereof
WO2021051497A1 (en) Pulmonary tuberculosis determination method and apparatus, computer device, and storage medium
CN112400187A (en) Knockout autoencoder for detecting anomalies in biomedical images
US20220230061A1 (en) Modality adaptive information retrieval
US11853706B2 (en) Generative language model for few-shot aspect-based sentiment analysis
US20200321101A1 (en) Rule out accuracy for detecting findings of interest in images
CN116468746B (en) Bidirectional copy-paste semi-supervised medical image segmentation method
US20230281390A1 (en) Systems and methods for enhanced review comprehension using domain-specific knowledgebases
JP2019511797A (en) INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM
CN113780365B (en) Sample generation method and device
CN116844731A (en) Disease classification method, disease classification device, electronic device, and storage medium
US11113466B1 (en) Generating sentiment analysis of content
Wang et al. SERR‐U‐Net: Squeeze‐and‐Excitation Residual and Recurrent Block‐Based U‐Net for Automatic Vessel Segmentation in Retinal Image
Patel et al. PTXNet: An extended UNet model based segmentation of pneumothorax from chest radiography images
GB2616316A (en) Neural network training technique
TWI742312B (en) Machine learning system, machine learning method and non-transitory computer readable medium for operating the same
US20220027688A1 (en) Image identification device, method for performing semantic segmentation, and storage medium
Geldenhuys et al. Deep learning approaches to landmark detection in tsetse wing images
CN112785001B (en) Artificial intelligence educational back-province robot for overcoming discrimination and prejudice
Ahn et al. MMRR: Unsupervised Anomaly Detection through Multi-Level Masking and Restoration with Refinement
Ishikawa et al. Saliency prediction based on object recognition and gaze analysis