CN112085718B - NAFLD ultrasonic video diagnosis system based on twin attention network - Google Patents
- Publication number
- CN112085718B (application CN202010924390.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/0012 — Biomedical image inspection
- G06F18/2451 — Classification techniques relating to the decision surface: linear, e.g. hyperplane
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08 — Learning methods
- G06T2207/10016 — Video; Image sequence
- G06T2207/10132 — Ultrasound image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30056 — Liver; Hepatic
Abstract
The invention discloses a NAFLD ultrasonic video diagnosis system based on a twin attention network. The system consists of two structurally identical, weight-sharing twin attention subnetworks and a loss function. Each subnetwork comprises a dual-stream feature extraction module, a linear classification module and a context attention module, and the loss function combines binary cross-entropy loss (BCE), contrast similarity loss (CSL) and contrast difference loss (CDL). By adding the dual-stream feature extraction module to the twin attention network and introducing this loss function, the system achieves 90.56% accuracy, 88.26% specificity and 93.58% sensitivity, providing an efficient and feasible method for NAFLD ultrasound video diagnosis.
Description
Technical Field
The invention relates to the technical field of short video processing, in particular to a NAFLD ultrasonic video diagnosis system based on a twin attention network.
Background
Early screening for non-alcoholic fatty liver disease (NAFLD) helps patients avoid irreversible advanced liver disease, but manual diagnosis of NAFLD from ultrasound video requires physicians to review lengthy recordings, which is both cumbersome and time-consuming in clinical practice. Deep learning can therefore be used to automate NAFLD diagnosis from ultrasound video and improve diagnostic efficiency.
The main challenges in diagnosing NAFLD from ultrasound video are interference from irrelevant information and the poor feature representation caused by the low quality of ultrasound imaging itself.
Disclosure of Invention
In order to solve the problems, the invention provides a NAFLD ultrasonic video diagnosis system based on a twin attention network, so as to realize efficient automatic diagnosis of NAFLD.
The invention adopts the following technical scheme:
a NAFLD ultrasonic video diagnosis system based on a twin attention network is composed of two twin attention subnetworks which are identical in structure and share weight, and a loss function, wherein the twin attention subnetworks are composed of a double-current feature extraction module, a linear classification module and a context attention module, and the loss function is composed of binary cross entropy loss, contrast similarity loss and contrast difference loss.
Further, the dual-stream feature extraction module comprises a sharing module, a classification module and an attention module, and extracts separate features for classification and for attention.
Further, the sharing module extracts the low-level features shared by the classification module and the attention module; the classification module extracts high-level features used to generate the classification; the attention module extracts high-level features used to generate the attention.
Further, for a given video V = {I_t | t = 1, 2, …, T}, the dual-stream feature extraction module provides two feature representations for each frame: the representations of frame I_t are f_cls(I_t; θ_cls, θ) ∈ R^D and f_att(I_t; θ_att, θ) ∈ R^D, where θ denotes the shared parameters, θ_cls and θ_att denote the independent parameters of the classification module and the attention module respectively, I_t is the t-th frame of the video, and T is the number of frames.
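For illustration, the two-branch layout above can be sketched with stand-in linear layers (a minimal NumPy sketch; the dimensions, random weights and ReLU stand-ins are assumptions, not the patented network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T frames, flattened frames of size F,
# shared feature width H, branch feature width D.
T, F, H, D = 20, 64, 32, 16

W_shared = rng.standard_normal((F, H)) * 0.1  # theta: shared low-level extractor
W_cls = rng.standard_normal((H, D)) * 0.1     # theta_cls: classification branch
W_att = rng.standard_normal((H, D)) * 0.1     # theta_att: attention branch

def relu(x):
    return np.maximum(x, 0.0)

def dual_stream_features(frames):
    """Return (f_cls, f_att): two D-dimensional representations per frame."""
    shared = relu(frames @ W_shared)  # low-level features shared by both tasks
    return relu(shared @ W_cls), relu(shared @ W_att)

video = rng.standard_normal((T, F))  # stand-in for T ultrasound frames
f_cls, f_att = dual_stream_features(video)
```

The point of the two branches is that f_cls and f_att come from the same shared trunk but are free to specialize for their respective tasks.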
Further, the linear classification module uses a linear classifier to predict the probability that each frame belongs to NAFLD, providing a fine-grained reference for the final diagnosis.
Further, based on the feature f_cls extracted by the dual-stream feature extraction module, the linear classification module learns a linear mapping W ∈ R^{1×D} that converts the feature f_cls into a one-dimensional scalar W f_cls; a sigmoid function then normalizes this scalar to the interval (0, 1) to yield the final probability value, as follows:
p_t = σ(W f_cls(I_t) + b), where b is a constant term and σ denotes the sigmoid function.
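As a sketch, the per-frame probability reduces to a dot product and a sigmoid (the features, W and b here are random illustrative stand-ins, not the trained parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 20, 16
f_cls = rng.standard_normal((T, D))  # one classification feature vector per frame
W = rng.standard_normal(D) * 0.1     # the learned linear mapping W (here random)
b = 0.0                              # constant term

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

p = sigmoid(f_cls @ W + b)           # p_t = sigma(W f_cls + b) for every frame t
```

Each entry of p is a frame-level NAFLD probability, later reweighted by the attention distribution.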
Further, the contextual attention module scores the importance of each frame in conjunction with the context for highlighting the discriminative information on key frames and suppressing extraneous information for useless frames.
Further, from the feature vector f_att of each frame, the contextual attention module uses a Bi-LSTM to extract hidden-layer features that contain timing information:
h_t = [h_t^f ; h_t^b], where h_t^f = LSTM(f_att(I_t), h_{t-1}^f; θ^f) is the forward LSTM (t running from 1 to T) and h_t^b = LSTM(f_att(I_t), h_{t+1}^b; θ^b) is the backward LSTM (t running from T to 1). A fully connected layer then learns a linear mapping W_a ∈ R^{1×D/2} from feature to importance, e_t = W_a h_t, and the importance scores of all frames are normalized by the softmax function, as follows:
a_t = exp(e_t) / Σ_{k=1}^{T} exp(e_k)
further, at the end of the system, the classification probability of each frame is weighted and summed according to the attention distribution, and the obtained final probability value is used for representing the diagnosis result of the whole video, wherein the diagnosis result is represented as:
further, the mathematical expression of the loss function L is as follows:
L=LBCE+λ(LSSL+LCDL)
wherein λ is a scaling factor that controls the relative importance of binary cross-entropy loss (BCE), Contrast Similarity Loss (CSL), and Contrast Difference Loss (CDL);
the binary cross entropy loss is based on the prediction probability of each videoWith the true value y, the final loss function can be calculated as follows:
wherein N represents the video frequency in the training set;
the contrast similarity loss is used to represent the similarity of key frame portions between positive and negative sample pairs, and the feature of the key frame portion used for attention generation of each video can be represented as follows:
in addition, cosine similarity is used to measure the similarity between two feature vectors, which can be expressed as:
thus, the contrast similarity loss is calculated as follows:
where P represents the positive and negative sample pair logarithm in a batch.
The contrast difference loss represents the difference of the key-frame portions between positive and negative sample pairs. The classification-branch feature of the key-frame portion of each video can be represented as
F_cls = Σ_{t=1}^{T} a_t f_cls(I_t)
Thus, the contrast difference loss is calculated as follows:
L_CDL = (1/P) Σ_{i=1}^{P} (1 + sim(F_cls^{p,i}, F_cls^{n,i}))
after adopting the technical scheme, compared with the background technology, the invention has the following advantages:
the context attention network effectively solves the problem of irrelevant information interference by introducing an attention mechanism; the negative influence of low quality of ultrasound is relieved to a certain extent by combining time sequence information; the characteristics used for the classification module and the attention module are respectively extracted by adopting different branches in the double-flow characteristic extraction module, so that the expressiveness of the extracted characteristics is effectively improved, and the performance of the system is further improved, and the expressiveness of the system is further improved by combining the double-flow characteristic extraction module with a loss function (namely binary cross entropy loss, contrast similarity loss and contrast difference loss), so that the accuracy of 90.56%, the specificity of 88.26% and the sensitivity of 93.58% are finally obtained.
Drawings
Fig. 1 is a schematic diagram of the twin attention subnetwork structure of the NAFLD ultrasonic diagnostic system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example one
The invention discloses a NAFLD ultrasonic video diagnosis system based on a twin attention network. As shown in figure 1, the network consists of two structurally identical, weight-sharing twin attention subnetworks and a loss function; each twin attention subnetwork consists of a dual-stream feature extraction module a, a linear classification module b and a context attention module c, and the loss function consists of binary cross-entropy loss, contrast similarity loss and contrast difference loss.
The dual-stream feature extraction module a comprises a sharing module, a classification module and an attention module, and extracts separate features for classification and for attention.
The sharing module extracts the low-level features shared by the classification module and the attention module, establishing the correlation between the two tasks at the bottom layers while greatly reducing computational cost. The classification module and the attention module then extract the high-level features used to generate the classification and the attention respectively; at this point the features of the different branches are well suited to the requirements of their tasks, which further improves the effect of the subsequent modules.
For a given video V = {I_t | t = 1, 2, …, T}, the dual-stream feature extraction module a provides two feature representations for each frame: the representations of frame I_t are f_cls(I_t; θ_cls, θ) ∈ R^D and f_att(I_t; θ_att, θ) ∈ R^D, where θ denotes the shared parameters, θ_cls and θ_att denote the independent parameters of the classification module and the attention module respectively, I_t is the t-th frame of the video, and T is the number of frames.
The linear classification module b uses a linear classifier to predict the probability that each frame belongs to NAFLD, providing a fine-grained reference for the final diagnosis.
Based on the feature f_cls extracted by the dual-stream feature extraction module a, the linear classification module b learns a linear mapping W ∈ R^{1×D} that converts the feature f_cls into a one-dimensional scalar W f_cls; a sigmoid function normalizes this scalar to the interval (0, 1) to yield the final probability value, as follows:
p_t = σ(W f_cls(I_t) + b), where b is a constant term and σ denotes the sigmoid function.
The context attention module c scores the importance of each frame in conjunction with the context for highlighting the discriminative information on the key frames and suppressing extraneous information for the useless frames.
From the feature vector f_att of each frame, the contextual attention module c uses a Bi-LSTM to extract hidden-layer features that contain timing information:
h_t = [h_t^f ; h_t^b], where h_t^f = LSTM(f_att(I_t), h_{t-1}^f; θ^f) is the forward LSTM (t running from 1 to T) and h_t^b = LSTM(f_att(I_t), h_{t+1}^b; θ^b) is the backward LSTM (t running from T to 1). A fully connected layer then learns a linear mapping W_a ∈ R^{1×D/2} from feature to importance, e_t = W_a h_t, and the importance scores of all frames are normalized by the softmax function, as follows:
a_t = exp(e_t) / Σ_{k=1}^{T} exp(e_k)
at the end of the system, the classification probability of each frame is weighted and summed according to the attention distribution, and the obtained final probability value is used for representing the diagnosis result of the whole video, wherein the diagnosis result is represented as:
after the NAFLD ultrasonic video diagnosis system is constructed, a training process is started to the model, and loss functions used in the training process are divided into the following three parts: binary Cross Entropy Loss (BCEL), Contrast Similarity Loss (CSL), Contrast Dissimilarity Loss (CDL). The binary cross entropy loss acts on the final diagnosis result, the difference between the final diagnosis result and the actual value is measured, and each module is optimized; the contrast similarity loss measures the feature similarity of the key frame part between the positive sample pair and the negative sample pair, and the contrast difference loss measures the feature difference of the key frame part between the positive sample pair and the negative sample pair, so that the selection capability of the model on the key frame is promoted, and the expressiveness of the features is enhanced.
The mathematical expression of the loss function L is as follows:
L = L_BCE + λ(L_CSL + L_CDL)
where λ is a scale factor controlling the relative importance of the binary cross-entropy loss (BCE), the contrast similarity loss (CSL) and the contrast difference loss (CDL);
the binary cross entropy loss is based on the prediction probability of each videoWith the true value y, the final loss function can be calculated as follows:
wherein N represents the video frequency in the training set;
the contrast similarity loss is used to represent the similarity of key frame portions between positive and negative sample pairs, and the feature of the key frame portion used for attention generation of each video can be represented as follows:
in addition, cosine similarity is used to measure the similarity between two feature vectors, which can be expressed as:
thus, the contrast similarity loss is calculated as follows:
where P represents the positive and negative sample pair logarithm in a batch.
The contrast difference loss represents the difference of the key-frame portions between positive and negative sample pairs. The classification-branch feature of the key-frame portion of each video can be represented as
F_cls = Σ_{t=1}^{T} a_t f_cls(I_t)
Thus, the contrast difference loss is calculated as follows:
L_CDL = (1/P) Σ_{i=1}^{P} (1 + sim(F_cls^{p,i}, F_cls^{n,i}))
the data used for training consisted of 520 subjects' liver ultrasound videos, with 260 videos from NAFLD patients and an additional 260 videos belonging to normal samples. Since the input of the training phase is a pair of positive and negative samples, we need to ensure that the positive and negative samples have the same length, so we sample 20 frames of images at equal intervals for all videos. The original resolution of the video is 800 × 600, and the sampling frequency is 31 Hz. After video acquisition, 3 doctors with abundant experience carry out manual annotation at the same time, and the voting results of the 3 doctors are synthesized to finally judge whether the subject suffers from NAFLD.
The evaluation indexes adopted in the embodiment include accuracy, specificity, sensitivity and AUC values. The following results were obtained:
(1) Validity of the dual-stream feature extraction module
ResNet50 is used as the base network, and the twin attention network is improved on this basis: the original feature extraction module of the twin attention network is replaced by the dual-stream feature extraction module, and model performance before and after the replacement is compared to verify the module's superiority. The results are as follows:
TABLE 1 Dual-stream feature extraction module validation results
Method | Accuracy | Specificity | Sensitivity | AUC value
---|---|---|---|---
CAN (ResNet50) | 0.8736 | 0.8322 | 0.9358 | 0.9415
CAN (dual-stream feature extraction module) | 0.8868 | 0.8622 | 0.9207 | 0.9459
As the table shows, compared with the original ResNet50 base network, the dual-stream feature extraction module improves most indicators of the twin attention network. Specifically, the twin attention network using the dual-stream feature extraction module improves accuracy, specificity and AUC value by 1.32%, 3.00% and 0.44% respectively over the original twin attention network. This confirms that the classification and attention modules need different features, and that the two-branch structure of the dual-stream feature extraction module effectively improves the task-specific features and enhances model performance.
(2) Effectiveness of contrast difference loss and contrast similarity loss
The contrast difference loss and the contrast similarity loss measure, respectively, the difference and the similarity of the key-frame portions between positive and negative sample pairs. Given a positive/negative sample pair on top of the dual-stream feature extraction network, the contrast difference loss pushes the key-frame portions of the features required by the classification branch as far apart as possible, so the model can discriminate better; the contrast similarity loss pulls the key-frame portions of the features required by the attention branch as close together as possible, so the model can select key frames better.
In this embodiment, the contrast difference loss and the contrast similarity loss are introduced with different weights into the twin attention network equipped with the dual-stream feature extraction module, and compared against the network without these loss terms. The results are as follows:
table 2 CDL and CSL validity verification results
Method (lambda) | Rate of accuracy | Specificity of | Sensitivity of the composition | AUC value |
0(CAN+BFEM) | 0.8868 | 0.8622 | 0.9207 | 0.9459 |
0.2 | 0.8942 | 0.8700 | 0.9269 | 0.9473 |
0.4 | 0.9056 | 0.8826 | 0.9358 | 0.9521 |
0.6 | 0.8903 | 0.8690 | 0.9192 | 0.9402 |
As shown in table 2, at lower weights all indices improve after adding CDL and CSL, with the best results around λ = 0.4: accuracy, specificity, sensitivity and AUC value improve by 1.88%, 2.04%, 1.51% and 0.62% respectively compared with the group without CDL and CSL.
(3) The effectiveness of the NAFLD ultrasonic video diagnosis system of the invention
Compared with the common twin attention network (CAN), the NAFLD ultrasonic video diagnosis system (SAN) provided by the invention adds the dual-stream feature extraction module (BFEM) and introduces the newly designed loss function (binary cross-entropy loss, contrast difference loss and contrast similarity loss). Table 3 shows the comparison with the common twin attention network CAN.
TABLE 3 SAN superiority verification results
Method | Accuracy | Specificity | Sensitivity | AUC
---|---|---|---|---
CAN | 0.8736 | 0.8322 | 0.9358 | 0.9415
SAN | 0.9056 | 0.8826 | 0.9358 | 0.9521
As shown in table 3, SAN improves accuracy, specificity and AUC by 3.20%, 5.04% and 1.06% respectively compared with CAN, with the same sensitivity.
These results show that the BFEM in the NAFLD ultrasound video diagnosis system SAN provided by the invention effectively extracts the different features required for classification and attention, and that the newly designed CSL and CDL further constrain the feature distribution and enhance feature expressiveness. Combining the two finally yields an accuracy of 90.56%, a specificity of 88.26% and a sensitivity of 93.58%, proving the feasibility and effectiveness of SAN.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (6)
1. A NAFLD ultrasonic video diagnosis system based on a twin attention network, characterized in that: the system comprises two structurally identical, weight-sharing twin attention subnetworks and a loss function; each twin attention subnetwork comprises a dual-stream feature extraction module, a linear classification module and a context attention module, and the loss function comprises binary cross-entropy loss BCE, contrast similarity loss CSL and contrast difference loss CDL;
the dual-stream feature extraction module comprises a sharing module, a classification module and an attention module, and extracts separate features for classification and for attention; the sharing module extracts the low-level features shared by the classification module and the attention module; the classification module extracts high-level features used to generate the classification; the attention module extracts high-level features used to generate the attention;
the linear classification module predicts the probability that each frame belongs to the NAFLD by using a linear classifier, and provides fine-grained reference for final diagnosis;
the contextual attention module scores the importance of each frame in conjunction with the context for highlighting discriminative information on key frames and suppressing irrelevant information of useless frames.
2. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein:
for a given video V ═ It1, 2.. T }, the dual-stream feature extraction module providing two feature representations for each frame of the video, each frame ItAre respectively fcls(It;θcls,θ)∈RDAnd fatt(It;θatt,θ)∈RDWhere θ denotes a sharing parameter, θcls,θattIndependent parameters, I, representing the classification module and the attention module, respectivelytIs the T-th frame of the video, and T represents the frame number of the video.
3. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: based on the feature f_cls extracted by the dual-stream feature extraction module, the linear classification module learns a linear mapping W ∈ R^{1×D} that converts the feature f_cls into a one-dimensional scalar W f_cls; a sigmoid function normalizes this scalar to the interval (0, 1) to yield the final probability value, as follows:
p_t = σ(W f_cls(I_t) + b), where b is a constant term and σ denotes the sigmoid function.
4. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: from the feature vector f_att of each frame, the contextual attention module uses a Bi-LSTM to extract hidden-layer features containing timing information, h_t = [h_t^f ; h_t^b], where h_t^f = LSTM(f_att(I_t), h_{t-1}^f; θ^f) is the forward LSTM (t running from 1 to T) and h_t^b = LSTM(f_att(I_t), h_{t+1}^b; θ^b) is the backward LSTM (t running from T to 1); a fully connected layer then learns a linear mapping W_a ∈ R^{1×D/2} from feature to importance, e_t = W_a h_t, and the importance scores of all frames are normalized by the softmax function: a_t = exp(e_t) / Σ_{k=1}^{T} exp(e_k).
5. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: at the end of the system, the classification probabilities of the frames are weighted and summed according to the attention distribution, and the resulting final probability value p̂ = Σ_{t=1}^{T} a_t p_t represents the diagnosis of the whole video.
6. The NAFLD ultrasound video diagnostic system based on a twin attention network of claim 1, wherein: the mathematical expression of the loss function L is as follows:
L = L_BCE + λ(L_CSL + L_CDL);
where λ is a scale factor controlling the relative importance of the binary cross-entropy loss BCE, the contrast similarity loss CSL and the contrast difference loss CDL;
the binary cross-entropy loss is computed from the prediction probability p̂_i of each video and its true label y_i, as follows:
L_BCE = -(1/N) Σ_{i=1}^{N} [ y_i log p̂_i + (1 - y_i) log(1 - p̂_i) ]
where N is the number of videos in the training set;
the contrast similarity loss represents the similarity of the key-frame portions between positive and negative sample pairs; the attention-branch feature of the key-frame portion of each video can be represented as F_att = Σ_{t=1}^{T} a_t f_att(I_t);
in addition, cosine similarity is used to measure the similarity between two feature vectors: sim(u, v) = (u · v) / (‖u‖ ‖v‖);
the contrast similarity loss is calculated as L_CSL = (1/P) Σ_{i=1}^{P} (1 - sim(F_att^{p,i}, F_att^{n,i}));
where P is the number of positive/negative sample pairs in a batch;
the contrast difference loss represents the difference of the key-frame portions between positive and negative sample pairs; the classification-branch feature of the key-frame portion of each video can be represented as F_cls = Σ_{t=1}^{T} a_t f_cls(I_t);
the contrast difference loss is calculated as L_CDL = (1/P) Σ_{i=1}^{P} (1 + sim(F_cls^{p,i}, F_cls^{n,i})).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010924390.7A CN112085718B (en) | 2020-09-04 | 2020-09-04 | NAFLD ultrasonic video diagnosis system based on twin attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112085718A CN112085718A (en) | 2020-12-15 |
CN112085718B true CN112085718B (en) | 2022-05-10 |
Family
ID=73732599
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010924390.7A Active CN112085718B (en) | 2020-09-04 | 2020-09-04 | NAFLD ultrasonic video diagnosis system based on twin attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112085718B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110335290A (en) * | 2019-06-04 | 2019-10-15 | 大连理工大学 | Twin candidate region based on attention mechanism generates network target tracking method |
CN111291679A (en) * | 2020-02-06 | 2020-06-16 | 厦门大学 | Target specific response attention target tracking method based on twin network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111354017B (en) * | 2020-03-04 | 2023-05-05 | 江南大学 | Target tracking method based on twin neural network and parallel attention module |
CN111539316B (en) * | 2020-04-22 | 2023-05-05 | 中南大学 | High-resolution remote sensing image change detection method based on dual-attention twin network |
Also Published As
Publication number | Publication date |
---|---|
CN112085718A (en) | 2020-12-15 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |