KR20220134403A

KR20220134403A - Severity Quantification and Lesion Localization Method of Infectious Disease on CXR using Vision Transformer and Apparatus Therefor

Info

Publication number: KR20220134403A
Application number: KR1020210070757A
Authority: KR
Inventors: 예종철; 박상준; 김광현
Original assignee: 한국과학기술원
Priority date: 2021-03-26
Filing date: 2021-06-01
Publication date: 2022-10-05
Also published as: KR102558096B1

Abstract

Disclosed are a method for quantifying the severity of an infectious disease based on a vision transformer using chest radiographic image features, and an apparatus therefor. According to an embodiment of the present invention, the method comprises the steps of: receiving an input chest radiographic image; extracting a feature map from the received input chest radiographic image by using a pre-trained neural network; classifying a lesion in the input chest radiographic image by using a vision transformer, based on the extracted feature map; and quantifying the severity of the input chest radiographic image based on the extracted feature map and the classified lesion. The method can thus estimate an infectious disease such as COVID-19 and quantify its severity by using a vision transformer that utilizes chest radiographic image features.

Description

Method and apparatus for quantifying the severity of an infectious disease based on a vision transformer using chest radiographic image features {Severity Quantification and Lesion Localization Method of Infectious Disease on CXR using Vision Transformer and Apparatus Therefor}

The present invention relates to a technology for quantifying the severity of an infectious disease based on a vision transformer using chest radiographic image features, and more specifically to a method and apparatus capable of estimating an infectious disease, for example COVID-19, and quantifying its severity using a vision transformer that utilizes chest radiographic image features.

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, is a pandemic that had caused 2,526,007 deaths and 113,695,296 confirmed cases worldwide as of March 1, 2021. Faced with this unprecedented pandemic, public health systems have been challenged in many respects, including a severe shortage of medical resources, and many healthcare workers have been infected. Because of the high transmissibility and pathological characteristics of COVID-19, early screening for COVID-19 is becoming increasingly important to prevent further spread of the disease and to reduce the burden on healthcare systems.

Currently, real-time polymerase chain reaction (RT-PCR) is the gold standard for confirming COVID-19 because of its high sensitivity and specificity, but it takes several hours to obtain results. Since many patients with confirmed COVID-19 present radiographic findings of pneumonia, radiographic examination can be useful for rapid diagnosis. Chest computed tomography (CT) has excellent sensitivity and specificity for diagnosing COVID-19, but routine use of CT places a heavy burden on the healthcare system because of its higher cost and relatively longer scan time compared with chest radiography (CXR). There is therefore a practical advantage to using CXR as a primary screening tool during a global pandemic. Common CXR findings of COVID-19 include ground-glass opacities and patchy consolidation in the lower and peripheral lung zones. Although the sensitivity and specificity of COVID-19 diagnosis using CXR alone have been reported to be lower than those of CT or RT-PCR, CXR still has the potential to screen for COVID-19 quickly during patient triage, which helps prioritize patient care and relieve saturated healthcare systems in a pandemic.

Accordingly, many approaches that use deep learning to diagnose COVID-19 from CXR have been proposed, but they share the common problem of a limited amount of labeled COVID-19 data, which degrades their generalization ability. Reliable generalization performance on entirely new, unseen datasets is essential for real-world adoption of such a system.

The most common approach to this problem is to build an adversarially robust model with millions of training samples. However, because healthcare systems in many countries are saturated, it is difficult to build a well-curated dataset containing many labeled COVID-19 cases. Previous studies have tried to mitigate the problem by using transfer learning from other large datasets such as ImageNet, or by using weakly supervised learning and anomaly detection, but their performance is often suboptimal and their generalization ability is not guaranteed. In addition, COVID-19 typically involves both lungs with lower-zone predominance, so the model must extract features based on global manifestations of the disease.

The transformer, first introduced in the field of natural language processing (NLP), is a deep neural network based on a self-attention mechanism with a very large receptive field. After its remarkable results in NLP, its ability to model long-range dependencies within an image inspired the vision community to study its application to computer vision. The Vision Transformer (ViT) first showed how a transformer can completely replace standard convolution operations in a deep neural network while achieving state-of-the-art performance. However, because training a vision transformer from scratch requires an enormous amount of data, a hybrid model was also proposed that uses a conventional convolutional neural network, for example a ResNet backbone, to generate the initial feature embedding. A transformer trained on the feature corpus generated by the ResNet backbone can thus focus mainly on learning global attention. Empirical results show that the hybrid model performs better on small datasets.

Although these preliminary results are promising, the concern remains that the corpus generated by ResNet may not be the optimal input feature embedding for CXR-based diagnosis. Fortunately, several large public datasets for CXR classification were built before the COVID-19 outbreak. Among them, the CheXpert dataset consists of labeled abnormal observations, including low-level CXR features useful for diagnosing infectious diseases, such as opacity, consolidation, and edema. There is also an advanced CNN architecture that uses probabilistic class activation map (PCAM) pooling to explicitly exploit the benefits of class activation maps, improving both classification and localization of these low-level features.

Embodiments of the present invention provide a method and apparatus capable of estimating an infectious disease, for example COVID-19, and quantifying its severity using a vision transformer that utilizes chest radiographic image features.

Embodiments of the present invention provide a method and apparatus in which a model is trained to classify low-level features on a pre-built large public dataset, feature maps are obtained from the trained model, and the feature maps are combined using a vision transformer, so that even the degree of severity can be quantified in an image estimated to show COVID-19.

A vision transformer-based method for quantifying the severity of an infectious disease according to an embodiment of the present invention includes: receiving an input chest radiographic image; extracting a feature map from the received input chest radiographic image using a pre-trained neural network; classifying a lesion in the input chest radiographic image using a vision transformer based on the extracted feature map; and quantifying the severity of the input chest radiographic image based on the extracted feature map and the classified lesion.

The extracting of the feature map may include converting the received input chest radiographic image into a normalized image and then extracting the feature map from the normalized image using the neural network.

The extracting of the feature map may convert the input chest radiographic image into the normalized image by normalizing the width, height, and pixel value range of the image to predetermined ranges.
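The normalization described here can be illustrated with a minimal sketch. The text only states that width, height, and pixel value range are brought to predetermined ranges; the target size of 512, the [0, 1] intensity range, nearest-neighbour resampling, and all function names below are assumptions chosen for illustration, not the method of the invention.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize a 2D image to (out_h, out_w) with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def rescale_intensity(image: Image) -> Image:
    """Map pixel values linearly onto [0, 1] (min-max scaling)."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # a constant image maps to all zeros
    return [[(p - lo) / span for p in row] for row in image]


def normalize_cxr(image: Image, size: int = 512) -> Image:
    """Fix width and height to `size` (assumed) and pixel range to [0, 1] (assumed)."""
    return rescale_intensity(resize_nearest(image, size, size))
```

A real system would more likely use an imaging library with proper interpolation; the point of the sketch is only that every input ends up with the same shape and value range before feature extraction.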

The quantifying may include localizing the lesion based on the extracted feature map and the classified lesion.

The extracting of the feature map may extract, from the input chest radiographic image, the feature map containing the low-level features of pneumonia, consolidation, lung opacity, pleural effusion, cardiomegaly, edema, atelectasis, pneumothorax, support devices, and no finding.

The classifying of the lesion may classify the lesion by combining the information contained in the feature map and may estimate a final diagnosis using the lesion classification result.

The quantifying may quantify the severity of the input chest radiographic image and localize the position of the lesion, based on a combination of the information contained in the extracted feature map and on the classified lesion.

The quantifying may generate a lesion probability map based on the extracted feature map and the classified lesion, and may quantify the severity of the input chest radiographic image using the generated lesion probability map.

The quantifying may divide each of the left and right lung regions of the input chest radiographic image into three regions, assign the maximum lesion probability value to each divided region, and quantify the severity of the input chest radiographic image for each divided region using the assigned maximum lesion probability value.
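The six-region step can be sketched as follows. The text specifies three regions per lung and a maximum lesion probability per region; everything else here is an assumption made for illustration: the probability map is taken to be already cropped to the lungs, the two column halves stand in for the two lungs, the three regions are equal-height thirds, the 0.5 threshold is arbitrary, and summing the binary array into a 0 to 6 score is one plausible reading rather than a rule stated in the text.

```python
from typing import List, Tuple

ProbMap = List[List[float]]  # per-pixel lesion probabilities in [0, 1]


def zone_max_probabilities(prob_map: ProbMap) -> List[float]:
    """Return the maximum lesion probability in each of six zones.

    Assumes at least 3 rows and 2 columns. Zone order: first column half
    (upper, middle, lower), then second column half (upper, middle, lower).
    """
    h, w = len(prob_map), len(prob_map[0])
    row_bounds = [0, h // 3, 2 * h // 3, h]
    col_bounds = [0, w // 2, w]
    zones = []
    for side in range(2):
        for third in range(3):
            rows = prob_map[row_bounds[third]:row_bounds[third + 1]]
            zones.append(max(
                p
                for row in rows
                for p in row[col_bounds[side]:col_bounds[side + 1]]
            ))
    return zones


def severity_from_zones(zone_probs: List[float],
                        threshold: float = 0.5) -> Tuple[List[int], int]:
    """Binarize zone probabilities (assumed threshold) and sum them (assumed score)."""
    array = [1 if p >= threshold else 0 for p in zone_probs]
    return array, sum(array)
```

Taking the maximum rather than the mean makes a zone count as involved when any part of it is strongly suspicious, which fits a presence-or-absence label per zone.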

The neural network may be pre-trained on a large labeled first training dataset and then trained together with the vision transformer, in a supervised and weakly supervised manner, using a small second training dataset labeled for the lesion to be classified.

A vision transformer-based method for quantifying the severity of an infectious disease according to another embodiment of the present invention includes: converting an input chest radiographic image into a normalized image; extracting a feature map from the normalized image using a pre-trained neural network; classifying a lesion in the input chest radiographic image using a vision transformer based on the extracted feature map; and quantifying the severity of the input chest radiographic image based on the extracted feature map and the classified lesion.

A vision transformer-based apparatus for quantifying the severity of an infectious disease according to an embodiment of the present invention includes: a receiving unit that receives an input chest radiographic image; an extraction unit that extracts a feature map from the received input chest radiographic image using a pre-trained neural network; a classification unit that classifies a lesion in the input chest radiographic image using a vision transformer based on the extracted feature map; and a quantification unit that quantifies the severity of the input chest radiographic image based on the extracted feature map and the classified lesion.

The extraction unit may convert the received input chest radiographic image into a normalized image and then extract the feature map from the normalized image using the neural network.

The extraction unit may convert the input chest radiographic image into the normalized image by normalizing the width, height, and pixel value range of the image to predetermined ranges.

The quantification unit may localize the lesion based on the extracted feature map and the classified lesion.

The extraction unit may extract, from the input chest radiographic image, the feature map containing the low-level features of pneumonia, consolidation, lung opacity, pleural effusion, cardiomegaly, edema, atelectasis, pneumothorax, support devices, and no finding.

The classification unit may classify the lesion by combining the information contained in the feature map and may estimate a final diagnosis using the lesion classification result.

The quantification unit may quantify the severity of the input chest radiographic image and localize the position of the lesion, based on a combination of the information contained in the extracted feature map and on the classified lesion.

The quantification unit may generate a lesion probability map based on the extracted feature map and the classified lesion, and may quantify the severity of the input chest radiographic image using the generated lesion probability map.

The quantification unit may divide each of the left and right lung regions of the input chest radiographic image into three regions, assign the maximum lesion probability value to each divided region, and quantify the severity of the input chest radiographic image for each divided region using the assigned maximum lesion probability value.

According to embodiments of the present invention, an infectious disease, for example COVID-19, can be estimated and its severity quantified using a vision transformer that utilizes chest radiographic image features. Specifically, a model is trained to classify low-level features on a pre-built large public dataset, feature maps are obtained from the trained model, and the feature maps are combined using a vision transformer, so that even the degree of severity can be quantified in an image diagnosed as COVID-19.

According to embodiments of the present invention, severity quantification is enabled in addition to estimation. When used in infectious disease screening, the invention can therefore minimize the spread of the disease and allow medical resources to be allocated efficiently, and when applied to confirming treatment response and to follow-up, it can effectively assist clinicians. For example, by quantifying the severity of a patient diagnosed with an infectious disease such as COVID-19 from a plain radiograph, the present invention can also be useful in the follow-up and treatment planning of already diagnosed patients.

According to embodiments of the present invention, low-level features are extracted with a model trained on a large dataset, so degradation of generalization performance hardly occurs. The vision transformer trained on a small amount of labeled data only combines the low-level features to produce the final result, so it too is little affected by degraded generalization.

Chest X-ray (CXR) imaging is simpler and faster than other techniques for diagnosing infectious diseases, so it can be applied not only in the medical market but also in private facilities that lack specialized medical personnel. In environments that are densely populated yet short of specialized medical resources, such as airports, military facilities, and developing countries, using the present invention for infectious disease screening can effectively prevent the uncontrolled spread of an infectious disease in advance.

FIG. 1 is a flowchart of the operation of a vision transformer-based method for quantifying the severity of an infectious disease according to an embodiment of the present invention.
FIG. 2 is an example diagram for explaining the framework of the method of the present invention.
FIG. 3 is an example diagram of the structure of the feature embedding network and the structure of the vision transformer.
FIG. 4 is an example diagram for explaining the self-training method.
FIG. 5 is an example diagram comparing the localization performance on the BIMCV external test set between the model of the present invention and a DenseNet-121-based model.
FIG. 6 shows the configuration of a vision transformer-based apparatus for quantifying the severity of an infectious disease according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent from the embodiments described in detail below with reference to the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the present invention pertains is fully informed of the scope of the invention, and the present invention is defined only by the scope of the claims.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by a person of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive sense unless explicitly and specifically defined.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and repeated descriptions of the same components are omitted.

Quantifying the severity of COVID-19 and localizing lesions on CXR can also be useful in the follow-up and treatment planning of diagnosed patients. Deep learning-based severity quantification and localization usually benefit from pixel-level labeling, but building a dataset with such labels takes a great deal of time and labor. An array-based labeling scheme was therefore introduced, in which the chest is divided into six regions and each region is assigned 1 or 0 according to the presence or absence of COVID-19-related lesions. Deep learning-based approaches that quantify COVID-19 severity and localize lesions using datasets built with this labeling scheme have been proposed.
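To make the array-style label concrete, the following sketch encodes a reader's list of involved regions as six 0/1 values. The region names, their order, and the function name are hypothetical; the text only specifies six regions, each marked 1 or 0.

```python
from typing import Iterable, List

# Hypothetical region names and ordering, introduced for illustration only.
ZONES = (
    "right upper", "right middle", "right lower",
    "left upper", "left middle", "left lower",
)


def encode_severity_array(involved_zones: Iterable[str]) -> List[int]:
    """Encode the regions that contain lesions as a six-element 0/1 array."""
    involved = set(involved_zones)
    unknown = involved - set(ZONES)
    if unknown:
        raise ValueError(f"unknown zone names: {sorted(unknown)}")
    return [1 if zone in involved else 0 for zone in ZONES]


# A reader marks lesions in both lower regions.
label = encode_severity_array(["right lower", "left lower"])
```

Such a label needs only six decisions per image, compared with one decision per pixel for a segmentation mask, which is why it is much cheaper to collect.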

Developing artificial neural network technology for diagnosing infectious diseases from chest X-rays requires large-scale training data, and performance measures such as accuracy and sensitivity improve the more the network is trained on single-institution data that has undergone consistent pre-processing. During a global outbreak of an infectious disease, however, there are limits to building such a consistently curated multi-institution dataset. In practice, it is therefore necessary to train on pre-built datasets or on datasets obtained from various sources. A neural network trained this way, however, may suffer from degraded generalization and overfitting: it performs well only on the dataset used for training, and its performance drops sharply on an unseen dataset.

The gist of the embodiments of the present invention is to train a model to classify low-level features on a pre-built large public dataset, obtain feature maps from the trained model, and combine them using a vision transformer, thereby quantifying even the degree of severity in an image estimated (or diagnosed) to show an infectious disease, for example COVID-19.

Because the present invention extracts low-level features with a model trained on a large dataset, degradation of generalization performance hardly occurs, and because the vision transformer trained on a small amount of labeled data combines the low-level features to produce the final result, it is likewise little affected by degraded generalization. In other words, the present invention addresses the degradation of generalization performance on small training data, and it predicts a severity map using a model trained by weakly supervised learning from severity array labels, which can be collected quickly with little labor. The weakly supervised approach thus improves the prediction accuracy of the severity map even though it starts from simple severity array labels.

FIG. 1 is a flowchart of the operation of a vision transformer-based method for quantifying the severity of an infectious disease according to an embodiment of the present invention.

Referring to FIG. 1, the vision transformer-based method for quantifying the severity of an infectious disease according to an embodiment of the present invention includes receiving an input chest radiographic image (S110), extracting a feature map from the received input chest radiographic image using a pre-trained neural network (S120), classifying a lesion in the input chest radiographic image using a vision transformer based on the extracted feature map (S130), and quantifying the severity of the lesion, for example COVID-19, based on the extracted feature map and the classified lesion (S140).
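The order of steps S110 to S140 can be sketched as a simple composition of three callables. This is only a skeleton of the data flow: the function names are hypothetical, and the toy stand-ins at the bottom are placeholders so that the example runs, not the pre-trained network or vision transformer described in the text.

```python
from typing import Any, Callable, Dict, List

Image = List[List[float]]


def run_pipeline(
    image: Image,                                # S110: received input image
    extract_features: Callable[[Image], Any],    # S120: pre-trained network
    classify_lesion: Callable[[Any], str],       # S130: vision transformer
    quantify: Callable[[Any, str], int],         # S140: severity quantification
) -> Dict[str, Any]:
    """Run S120 to S140 in order on an image received in S110."""
    feature_map = extract_features(image)
    lesion = classify_lesion(feature_map)
    severity = quantify(feature_map, lesion)
    return {"lesion": lesion, "severity": severity}


# Toy stand-ins (assumptions for illustration, not real models).
def toy_extract(image: Image) -> List[float]:
    return [sum(row) / len(row) for row in image]  # mean intensity per row


def toy_classify(features: List[float]) -> str:
    return "abnormal" if max(features) > 0.5 else "normal"


def toy_quantify(features: List[float], lesion: str) -> int:
    if lesion != "abnormal":
        return 0
    return sum(1 for value in features if value > 0.5)


result = run_pipeline([[0.2, 0.4], [0.8, 1.0]],
                      toy_extract, toy_classify, toy_quantify)
```

Note that the quantification step receives both the feature map and the classification result, mirroring the dependency stated for S140.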

In step S110, an image diagnosed as showing an infectious disease, for example a CXR image diagnosed as COVID-19, may be received, or a CXR image to be assessed for the presence of an infectious disease may be received.

In step S120, the input chest radiographic image received in step S110 may be converted into a normalized image, and a feature map may then be extracted from the normalized image using a neural network, for example a backbone network or a feature embedding network.

Here, step S120 may convert the input chest radiographic image into a normalized image by normalizing the width, height, and pixel value range of the image to predetermined ranges. The method of the present invention may also omit the conversion into a normalized image where appropriate, which may be decided by the operator or individual providing the technology of the present invention.

The neural network in step S120 is pre-trained on a large labeled first training dataset and is then trained together with the Vision Transformer, in a supervised and weakly supervised manner, on a small second training dataset labeled for the lesion to be classified, for example COVID-19. It can thereby extract, from the input chest radiograph, a feature map for the lesion to be classified.

Here, step S120 may extract from the input chest radiograph the feature map containing low-level features for pneumonia, consolidation, lung opacity, pleural effusion, cardiomegaly, edema, atelectasis, pneumothorax, support devices, and no finding.

In step S130, lesions may be classified by combining the information contained in the feature map, and either a final diagnosis may be estimated from the lesion classification results or the most frequently classified lesion among those results may be taken as the final lesion. That is, the Vision Transformer may use the input feature map to generate the final features that are provided to the map head.

In step S140, the lesion may be quantified and localized based on the extracted feature map and the lesions classified by the Vision Transformer.

Here, step S140 may quantify the severity of the input chest radiograph and localize the lesion based on the combination of information contained in the extracted feature map and on the classified lesions.

For example, step S140 may generate a lesion probability map based on the extracted feature map and the classified lesions, and use the generated lesion probability map to quantify the severity of the input chest radiograph or of the lesion. Specifically, each of the left and right lung regions of the input chest radiograph is divided into three regions, the maximum lesion probability is assigned to each divided region, and the assigned maxima are used to quantify the severity of the input chest radiograph for each divided region.

The method of the present invention is described below with reference to FIGS. 2 to 5.

The model of the present invention is trained on frontal CXR images annotated with severity score arrays. Specifically, each of the left and right lungs is first divided vertically into three zones. The lower zone extends from the costophrenic sulcus to the inferior hilar markings, the middle zone from the inferior hilar markings to the superior hilar markings, and the upper zone from the superior hilar markings to the lung apex. Each zone is then divided into two along the horizontal direction, across the spine. A binary score of 0 or 1 is assigned to each region according to the absence or presence of opacity. The completed label therefore takes the form of a 3×2 array, and the global severity score, the sum of all its elements, ranges from 0 to 6.
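As a small illustration of the label format described above, the following sketch builds one 3×2 severity array and sums it into the global score. The example values, the row order (upper, middle, lower), the column order of the two lungs, and the helper name `global_severity` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical annotation: rows are the upper/middle/lower zones,
# columns are the two lungs; 1 marks a zone in which opacity is present.
severity_array = np.array([
    [0, 0],
    [1, 0],
    [1, 1],
])

def global_severity(array):
    """Sum the binary zone scores of a 3x2 label into a global score (0 to 6)."""
    array = np.asarray(array)
    if array.shape != (3, 2):
        raise ValueError("expected a 3x2 severity array")
    if not np.isin(array, (0, 1)).all():
        raise ValueError("zone scores must be 0 or 1")
    return int(array.sum())

score = global_severity(severity_array)
```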

The overall architecture of the model of the present invention is shown in FIG. 2A. First, the input CXR image is preprocessed and passed to a lung segmentation network (STGV2). The segmented lung image is fed to the feature embedding network, which is followed by the Vision Transformer. The final features generated by the Vision Transformer are provided to the map head, which produces a full COVID-19 probability map. ROI max pooling then estimates the 3×2 COVID-19 severity array as the final output.

An advantage of the model of the present invention is that the Transformer can exploit a low-level CXR feature corpus obtained from a feature embedding network trained to extract abnormal CXR features from large, well-curated, publicly available CXR datasets.

Pre-trained feature embedding network for the low-level feature corpus: This is the feature embedding network (or backbone network) for extracting the low-level CXR feature corpus from an image. As shown in FIG. 3A, the model applies probabilistic class activation map (PCAM) pooling to the output of a DenseNet-121-based feature extractor, explicitly exploiting the benefits of class activation maps to improve both classification and localization. The feature embedding network may be pre-trained on a large public CXR dataset to classify 10 radiological findings: pneumonia, consolidation, lung opacity, pleural effusion, cardiomegaly, edema, atelectasis, pneumothorax, support devices, and no finding. Specifically, the present invention may use the 16×16×1024 features taken before transition layer 3 of DenseNet-121.
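The specification does not spell out the PCAM pooling computation. The sketch below follows the commonly described formulation, in which a shared per-class linear layer scores every spatial position, the sigmoid of those scores is normalized into attention weights, and the attention-weighted feature is scored again by the same layer. Treat this formulation, the function name `pcam_pool`, and the parameter names as assumptions rather than the patent's definition.

```python
import numpy as np

def pcam_pool(features, weight, bias):
    """PCAM-style pooling for one finding (assumed formulation).

    features: (H, W, C) feature map from the backbone
    weight:   (C,) weights of a per-class linear layer shared across positions
    bias:     scalar bias of that layer
    Returns the image-level logit and the (H, W) probability map.
    """
    logits = features @ weight + bias              # per-position class logits
    prob_map = 1.0 / (1.0 + np.exp(-logits))       # per-position probabilities
    attn = prob_map / prob_map.sum()               # normalize into attention weights
    # Attention-weighted sum over the spatial positions gives a (C,) vector.
    pooled = np.tensordot(attn, features, axes=([0, 1], [0, 1]))
    image_logit = float(pooled @ weight + bias)
    return image_logit, prob_map
```

With a 16×16×1024 feature map as mentioned above, `prob_map` has shape 16×16 and can be read as a coarse localization map for the finding.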

As shown in FIG. 3A, the feature embedding network (backbone network) has several layers from which feature embeddings can be extracted, and the intermediate-level embedding taken before the PCAM operation contains the most useful information. Note, however, that the PCAM operation trained with specific low-level CXR features, for example cardiomegaly, lung opacity, edema, and consolidation, was essential for improving the accuracy of the intermediate-level feature embedding, because it guides the aligned features to yield optimal PCAM maps. In particular, using the pre-trained feature embedding network F, an input image, for example a segmented lung x ∈ R^{H×W×C}, may be encoded (or projected) into an intermediate feature map c ∈ R^{H'×W'×C'}. The C'-dimensional feature vector c at each of the H'×W' pixels is the feature vector projected at that pixel location; it serves as an encoded representation of the low-level features at that location and makes up the low-level CXR feature corpus, as expressed in Equations 1 and 2 below.

[Equation 1]

[Equation 2]

Vision Transformer: Like BERT (Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)), the Vision Transformer model is an encoder-only architecture, as shown in FIG. 3B. It adopts the ViT-B/16 architecture, and its input may be 16×16 patches. Because the Transformer encoder uses a constant latent vector size D, a 1×1 convolution kernel projects the encoded features c of dimension C' to c_p of dimension D. Similar to BERT's [class] token, a learnable embedding vector c_cls is prepended to the projected features c_p, so that the output of this [class] token at the final layer L, z_L^0, represents the diagnosis of the whole CXR image (= y) once a classification head is attached to z_L^0. In other words, the learnable vector c_cls, which plays the role of BERT's [class] token, is included in ViT training. The final-layer output of ViT-B/16 is used, however, with the output at the [class] token position excluded. In addition, a position embedding E_pos is added to the projected features c_p to encode the notion of sequential order, so that the positional information of the feature map is not lost. The Transformer encoder layers used in the model are identical to the standard Transformer encoder, each block consisting of multi-head self-attention (MSA), a multi-layer perceptron (MLP), layer normalization (LN), and residual connections. This procedure is expressed in Equation 3 below.

[Equation 3]

Here, L denotes the number of layers of the ViT. For example, L may be 12 for ViT-B/16.
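The encoder procedure described above can be sketched in NumPy as follows. This follows the standard pre-norm ViT encoder (projection, [class] token, position embedding, then L blocks of MSA and MLP with layer normalization and residual connections) and is not a reproduction of Equation 3. For brevity it omits biases and the learnable scale and shift of layer normalization, and it uses random weights. All function and parameter names (`vit_forward`, `init_params`, `proj`, `cls`, `pos`) are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def msa(x, p, heads):
    """Multi-head self-attention over a (tokens, D) sequence."""
    n, d = x.shape
    dh = d // heads
    split = lambda t: t.reshape(n, heads, dh).transpose(1, 0, 2)  # (heads, n, dh)
    q, k, v = split(x @ p["wq"]), split(x @ p["wk"]), split(x @ p["wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))       # (heads, n, n)
    return (attn @ v).transpose(1, 0, 2).reshape(n, d) @ p["wo"]

def encoder_block(z, p, heads):
    z = z + msa(layer_norm(z), p, heads)              # MSA with residual connection
    z = z + gelu(layer_norm(z) @ p["w1"]) @ p["w2"]   # MLP with residual connection
    return z

def init_params(rng, c_dim, d, n_tokens, layers, mlp_dim):
    w = lambda *shape: rng.normal(scale=0.02, size=shape)
    blocks = [dict(wq=w(d, d), wk=w(d, d), wv=w(d, d), wo=w(d, d),
                   w1=w(d, mlp_dim), w2=w(mlp_dim, d)) for _ in range(layers)]
    return dict(proj=w(c_dim, d), cls=w(1, d), pos=w(n_tokens + 1, d), blocks=blocks)

def vit_forward(c, params, heads=4):
    """c: (H', W', C') feature map. Returns ([class] output, per-position output)."""
    h, w, c_dim = c.shape
    # A 1x1 convolution is a linear map applied independently at each position.
    c_p = c.reshape(h * w, c_dim) @ params["proj"]
    z = np.concatenate([params["cls"], c_p], axis=0) + params["pos"]
    for p in params["blocks"]:
        z = encoder_block(z, p, heads)
    z = layer_norm(z)
    return z[0], z[1:].reshape(h, w, -1)
```

The first return value corresponds to the [class] token output used for diagnosis, and the second to the remaining final-layer outputs that are passed on to the map head. ViT-B/16 itself uses D = 768, 12 layers, 12 heads, and an MLP width of 3072; much smaller sizes are convenient when trying the sketch.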

Probability map generation and ROI max pooling: The map head, which takes the output of the ViT, consists of four upsizing convolution blocks and generates a map of the same size as the input. The detailed structure of the map head is shown in FIG. 2B. Multiplying the output of the map head by the lung mask m ∈ R^{H×W} yields the COVID-19 lesion probability map y ∈ R^{H×W}. As shown in FIG. 2A, ROI max pooling (RMP) may be used to convert the COVID-19 lesion map into the severity array a ∈ R^{3×2}; the lesion probability map and the severity array are expressed in Equation 4 below.

[Equation 4]

Specifically, the lungs are separated into right and left lungs by computing the connected components of the lung mask. Each of the left and right lungs is then divided into three regions by dividing lines placed at 5/12 and 2/3 of the distance between the highest and lowest positions of the lung mask. The maximum value within each of the six regions is assigned to the corresponding element of the 3×2 array. To optimize the model, a binary cross-entropy loss is computed between the predicted severity array and the label severity array. This line estimation and max pooling process may be the core of the weakly supervised learning scheme.
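The ROI max pooling and loss computation described above can be sketched as follows. Several points are assumptions: the two lung masks are taken as already separated (the connected-component step is assumed to happen upstream), the 5/12 and 2/3 fractions are measured downward from the top of each lung mask, and the names `roi_max_pool` and `bce_loss` are hypothetical.

```python
import numpy as np

def roi_max_pool(prob_map, lung_masks, cuts=(5 / 12, 2 / 3)):
    """Convert a lesion probability map into a 3x2 severity array.

    prob_map:   (H, W) lesion probabilities, already multiplied by the lung mask
    lung_masks: two boolean (H, W) masks, one per lung (assumed pre-separated)
    cuts:       fractions of the lung height, measured from the top (assumption)
    """
    severity = np.zeros((3, len(lung_masks)))
    for col, mask in enumerate(lung_masks):
        mask = np.asarray(mask, dtype=bool)
        rows = np.flatnonzero(mask.any(axis=1))
        top, bottom = rows[0], rows[-1] + 1            # bottom is exclusive
        height = bottom - top
        edges = [top,
                 top + int(round(height * cuts[0])),
                 top + int(round(height * cuts[1])),
                 bottom]
        for zone in range(3):
            band = np.zeros_like(mask)
            band[edges[zone]:edges[zone + 1]] = True
            region = mask & band
            if region.any():
                # The maximum probability inside the zone becomes its score.
                severity[zone, col] = prob_map[region].max()
    return severity

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted and label severity arrays."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())
```

Because only the 3×2 array is supervised while the model outputs a full-resolution map, the max over each zone is what lets the pixel-level map be learned from zone-level labels.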

Self-training on unlabeled datasets

In a pandemic, it is often difficult to collect enough severity labels, even though the labeling procedure is very simple. The present invention may therefore use self-training, which leverages a larger dataset without severity labels together with a smaller severity-labeled dataset to improve model performance. The self-training procedure is detailed in FIG. 4.

Referring to FIG. 4, in the first step a teacher network is trained on the labeled dataset. In the second step, a student network created as a copy of the current teacher is trained on the previous dataset augmented with a portion of the new unlabeled dataset. The student model is optimized with the true labels for labeled inputs and with pseudo-labels generated by the teacher network for unlabeled inputs. The student then becomes the new teacher and the process returns to the second step, so that the student model is updated continually as the procedure repeats.
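The teacher-student loop above can be illustrated with a toy model. In this sketch a two-parameter logistic classifier stands in for the actual network, and the number of rounds, the chunk size, and the use of soft pseudo-labels are assumptions chosen for illustration only. The names `train` and `self_train` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(weights, x, y, lr=0.5, steps=200):
    """Fit a logistic model by gradient descent, starting from a copy of weights."""
    w = weights.copy()
    for _ in range(steps):
        grad = x.T @ (sigmoid(x @ w) - y) / len(x)
        w -= lr * grad
    return w

def self_train(x_lab, y_lab, x_unlab, rounds=3, chunk=50):
    # Step 1: train the teacher on the labeled dataset only.
    teacher = train(np.zeros(x_lab.shape[1]), x_lab, y_lab)
    for r in range(rounds):
        # Step 2: add a further portion of the unlabeled data each round.
        new_x = x_unlab[: (r + 1) * chunk]
        pseudo = sigmoid(new_x @ teacher)          # pseudo-labels from the teacher
        x_all = np.vstack([x_lab, new_x])
        y_all = np.concatenate([y_lab, pseudo])    # true labels + pseudo-labels
        student = train(teacher, x_all, y_all)     # student starts as a teacher copy
        teacher = student                          # the student becomes the new teacher
    return teacher
```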

Table 1 below gives a quantitative comparison of severity quantification performance on the external test set from CNUH (Chungnam National University Hospital). As Table 1 shows, the model of the present invention outperforms the CNN-based model on most metrics and generalizes better. Here, MSE is the mean squared error, which may be used as the main metric for regression of the global severity score ranging from 0 to 6; MAE is the mean absolute error; CC is the correlation coefficient; R² is the coefficient of determination for the global score regression; and AUC is the average area under the ROC curve.
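For reference, the regression metrics named above can be computed as in the following sketch. It assumes that CC is the Pearson correlation coefficient and that R² is the usual coefficient of determination. AUC is left out because it is computed per zone from the binary labels. The function name `regression_metrics` and the example scores in the usage note are hypothetical, not results from Table 1.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, MAE, Pearson CC, and R^2 for global severity scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "CC": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "R2": 1.0 - ss_res / ss_tot,
    }
```

For instance, calling `regression_metrics([0, 2, 4, 6], [1, 2, 4, 5])` on these made-up scores gives an MSE and MAE of 0.5 and an R² of 0.9.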

FIG. 5 shows an example comparing the localization performance of the model of the present invention and a DenseNet-121-based model on the BIMCV external test set. As FIG. 5 shows, the model of the present invention (a) localizes abnormal regions in the CXR images more accurately than the DenseNet-121-based model (b).

As described above, the method according to an embodiment of the present invention can estimate an infectious disease, for example COVID-19, and quantify its severity using a Vision Transformer that exploits chest radiograph features. A model is first trained to classify low-level features on a pre-built large public dataset; feature maps are then obtained from the trained model and combined by the Vision Transformer, so that even the degree of severity can be quantified in images diagnosed with COVID-19.

Furthermore, because the method according to an embodiment of the present invention enables severity quantification as well as estimation, using it in infectious disease screening can be expected to minimize the spread of infection and support efficient allocation of medical resources, and applying it to the assessment of treatment response and to follow-up can effectively assist clinicians. For example, by quantifying the severity of patients diagnosed with an infectious disease such as COVID-19 from plain radiographs, the present invention can also be useful in follow-up and treatment planning for patients who have already been diagnosed.

Furthermore, in the method according to an embodiment of the present invention, the low-level features are extracted using a model trained on a large dataset, so generalization performance hardly degrades; and because the Vision Transformer trained on a small labeled dataset only combines the low-level features to produce the final result, it is likewise little affected by degraded generalization.

Furthermore, the method according to an embodiment of the present invention is not limited to the diagnosis and quantification of infectious diseases but can be used in any algorithm that derives high-level results by combining low-level features. By using self-training with a dataset lacking severity labels, a severity quantification model that shows clinical-expert-level performance can be developed from a small severity-labeled dataset.

One important advantage of the new ViT framework for severity quantification and lesion localization in the method of the present invention is that the global attention map of the Transformer can lead to a full lesion map, in which each pixel value directly represents the probability of a COVID-19 abnormality. In addition, self-training can make use of a large unlabeled dataset alongside the small severity-labeled dataset.

FIG. 6 shows the configuration of a Vision Transformer-based apparatus for quantifying the severity of an infectious disease according to an embodiment of the present invention, that is, a conceptual configuration of an apparatus that performs the method of FIGS. 1 to 5.

Referring to FIG. 6, the Vision Transformer-based apparatus 600 for quantifying the severity of an infectious disease according to an embodiment of the present invention includes a receiving unit 610, an extraction unit 620, a classification unit 630, and a quantification unit 640.

The receiving unit 610 receives an input chest radiograph.

The receiving unit 610 may receive either an image already diagnosed with an infectious disease, for example a CXR image diagnosed with COVID-19, or a CXR image to be assessed for infection.

The extraction unit 620 extracts a feature map from the received input chest radiograph using a pre-trained neural network.

The extraction unit 620 may convert the received input chest radiograph into a normalized image and then extract a feature map from the normalized image using a neural network, for example a backbone network or a feature embedding network.

The extraction unit 620 may convert the input chest radiograph into a normalized image by normalizing its width, height, and pixel value range to predetermined ranges.

The neural network in the extraction unit 620 is pre-trained on a large labeled first training dataset and is then trained together with the Vision Transformer, in a weakly supervised manner, on a small second training dataset labeled for the lesion to be classified, for example COVID-19. It can thereby extract, from the input chest radiograph, a feature map for the lesion to be classified.

Here, the extraction unit 620 may extract from the input chest radiograph the feature map containing low-level features for pneumonia, consolidation, lung opacity, pleural effusion, cardiomegaly, edema, atelectasis, pneumothorax, support devices, and no finding.

The classification unit 630 classifies lesions in the input chest radiograph using the Vision Transformer based on the extracted feature map.

The classification unit 630 may classify lesions by combining the information contained in the feature map, and may either estimate a final diagnosis from the lesion classification results or take the most frequently classified lesion among those results as the final lesion. That is, the Vision Transformer may use the input feature map to generate the final features that are provided to the map head.

The quantification unit 640 quantifies the severity of the lesion, for example COVID-19, based on the extracted feature map and the classified lesions.

Furthermore, the quantification unit 640 may quantify and localize the lesion based on the extracted feature map and the lesions classified by the Vision Transformer.

Here, the quantification unit 640 may quantify the severity of the input chest radiograph and localize the lesion based on the combination of information contained in the extracted feature map and on the classified lesions.

For example, the quantification unit 640 may generate a lesion probability map based on the extracted feature map and the classified lesions, and use the generated lesion probability map to quantify the severity of the input chest radiograph. Specifically, each of the left and right lung regions of the input chest radiograph is divided into three regions, the maximum lesion probability is assigned to each divided region, and the assigned maxima are used to quantify the severity of the lesion for each divided region.

Even where a description is omitted for the apparatus of FIG. 6, each component of FIG. 6 may include everything described with reference to FIGS. 1 to 5, as will be apparent to those skilled in the art.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims below.

Claims

A Vision Transformer-based method for quantifying the severity of an infectious disease, the method comprising:
receiving an input chest radiograph;
extracting a feature map from the received input chest radiograph using a pre-trained neural network;
classifying a lesion in the input chest radiograph using a Vision Transformer based on the extracted feature map; and
quantifying the severity of the input chest radiograph based on the extracted feature map and the classified lesion.

The method of claim 1,
wherein the extracting of the feature map comprises converting the received input chest radiographic image into a normalized image and then extracting the feature map from the normalized image using the neural network.

The method of claim 2,
wherein the extracting of the feature map comprises converting the input chest radiographic image into the normalized image by normalizing the width, height, and pixel value range of the input chest radiographic image to a predetermined range.
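A minimal sketch of such a normalization, assuming nearest-neighbour resampling and min-max scaling to [0, 1]. Both choices, the function name `normalize_image`, and the target size passed by the caller are assumptions made for illustration; the claim only requires that width, height, and pixel value range be brought to a predetermined range.

```python
from typing import List

Image = List[List[float]]


def normalize_image(image: Image, out_h: int, out_w: int) -> Image:
    """Resize to (out_h, out_w) and rescale pixel values to [0, 1]."""
    in_h, in_w = len(image), len(image[0])
    # Nearest-neighbour resampling (an assumption): each output pixel
    # copies the source pixel at the proportionally scaled index.
    resized = [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
    lo = min(min(row) for row in resized)
    hi = max(max(row) for row in resized)
    span = hi - lo
    if span == 0:
        # A constant image has no contrast to rescale; map it to zeros.
        return [[0.0 for _ in row] for row in resized]
    # Min-max scaling (an assumption) fixes the pixel value range.
    return [[(v - lo) / span for v in row] for row in resized]
```

For example, a 2x2 image with 12-bit values can be brought to a 4x4 grid with `normalize_image(image, 4, 4)`; every output value then lies in [0, 1] regardless of the input bit depth.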

The method of claim 1,
wherein the quantifying comprises localizing the lesion based on the extracted feature map and the classified lesion.

The method of claim 1,
wherein the extracting of the feature map comprises extracting, from the input chest radiographic image, the feature map containing low-level features of pneumonia, consolidation, lung opacity, pleural effusion, cardiomegaly, edema, atelectasis, pneumothorax, support devices, and no finding.

The method of claim 1,
wherein the classifying of the lesion comprises classifying the lesion by combining the information contained in the feature map, and estimating a final diagnosis using the lesion classification result.

The method of claim 4,
wherein the quantifying comprises quantifying the severity of the input chest radiographic image and localizing the position of the lesion, based on a combination of the information contained in the extracted feature map and on the classified lesion.

The method of claim 1,
wherein the quantifying comprises generating a lesion probability map based on the extracted feature map and the classified lesion, and quantifying the severity of the input chest radiographic image using the generated lesion probability map.

The method of claim 8,
wherein the quantifying comprises dividing each of the left and right lung regions of the input chest radiographic image into three regions, assigning a maximum lesion probability value to each of the divided regions, and quantifying the severity of the input chest radiographic image for each of the divided regions using the assigned maximum lesion probability values.
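The region-wise scoring described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the lesion probability map is assumed to be already cropped to the lungs, the two lungs are approximated by the two halves of the map's columns, each lung is split into three equal horizontal bands, and a region counts toward the score when its maximum probability reaches a hypothetical threshold of 0.5. The names `region_maxima` and `severity_score` are likewise hypothetical.

```python
from typing import List

ProbMap = List[List[float]]


def region_maxima(prob_map: ProbMap) -> List[List[float]]:
    """Return a 3x2 grid of the maximum lesion probability per region."""
    h, w = len(prob_map), len(prob_map[0])
    if h < 3 or w < 2:
        raise ValueError("probability map must be at least 3 rows by 2 columns")
    # Three horizontal bands per lung, two lungs side by side (assumption:
    # the map is cropped so that each half of the columns covers one lung).
    row_edges = [0, h // 3, 2 * h // 3, h]
    col_edges = [0, w // 2, w]
    grid = []
    for i in range(3):
        rows = prob_map[row_edges[i]:row_edges[i + 1]]
        grid.append([
            max(v for row in rows for v in row[col_edges[j]:col_edges[j + 1]])
            for j in range(2)
        ])
    return grid


def severity_score(prob_map: ProbMap, threshold: float = 0.5) -> int:
    """Count regions whose maximum probability reaches the threshold (0 to 6)."""
    grid = region_maxima(prob_map)
    return sum(1 for row in grid for p in row if p >= threshold)
```

Taking the maximum rather than the mean means a small but confident lesion still marks its region as involved. The thresholding into a 0 to 6 count is one possible way to turn the six maxima into a single score, chosen here only for illustration.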

The method of claim 1,
wherein the neural network is pre-trained on a large-scale labeled first training dataset and is then trained together with the vision transformer, by supervised and weakly supervised learning, using a small-scale second training dataset labeled for the lesion to be classified.

A vision transformer-based method for quantifying the severity of an infectious disease, the method comprising:
converting an input chest radiographic image into a normalized image;
extracting a feature map from the converted normalized image using a pre-trained neural network;
classifying a lesion in the input chest radiographic image using a vision transformer, based on the extracted feature map; and
quantifying the severity of the input chest radiographic image based on the extracted feature map and the classified lesion.

A vision transformer-based apparatus for quantifying the severity of an infectious disease, the apparatus comprising:
a receiving unit configured to receive an input chest radiographic image;
an extraction unit configured to extract a feature map from the received input chest radiographic image using a pre-trained neural network;
a classification unit configured to classify a lesion in the input chest radiographic image using a vision transformer, based on the extracted feature map; and
a quantification unit configured to quantify the severity of the input chest radiographic image based on the extracted feature map and the classified lesion.

The apparatus of claim 12,
wherein the extraction unit converts the received input chest radiographic image into a normalized image and then extracts the feature map from the normalized image using the neural network.

The apparatus of claim 13,
wherein the extraction unit converts the input chest radiographic image into the normalized image by normalizing the width, height, and pixel value range of the input chest radiographic image to a predetermined range.

The apparatus of claim 12,
wherein the quantification unit localizes the lesion based on the extracted feature map and the classified lesion.

The apparatus of claim 12,
wherein the extraction unit extracts, from the input chest radiographic image, the feature map containing low-level features of pneumonia, consolidation, lung opacity, pleural effusion, cardiomegaly, edema, atelectasis, pneumothorax, support devices, and no finding.

The apparatus of claim 12,
wherein the classification unit classifies the lesion by combining the information contained in the feature map, and estimates a final diagnosis using the lesion classification result.

The apparatus of claim 15,
wherein the quantification unit quantifies the severity of the input chest radiographic image and localizes the position of the lesion, based on a combination of the information contained in the extracted feature map and on the classified lesion.

The apparatus of claim 12,
wherein the quantification unit generates a lesion probability map based on the extracted feature map and the classified lesion, and quantifies the severity of the input chest radiographic image using the generated lesion probability map.

The apparatus of claim 19,
wherein the quantification unit divides each of the left and right lung regions of the input chest radiographic image into three regions, assigns a maximum lesion probability value to each of the divided regions, and quantifies the severity of the input chest radiographic image for each of the divided regions using the assigned maximum lesion probability values.

The apparatus of claim 12,
wherein the neural network is pre-trained on a large-scale labeled first training dataset and is then trained together with the vision transformer, by supervised and weakly supervised learning, using a small-scale second training dataset labeled for the lesion to be classified.