CN116188309A - OCT image enhancement method based on registration network and application - Google Patents

OCT image enhancement method based on registration network and application

Info

Publication number
CN116188309A
Authority
CN
China
Prior art keywords
image
block
registration
fixed
module
Prior art date
Legal status
Pending
Application number
CN202310158800.5A
Other languages
Chinese (zh)
Inventor
陈新建
谭志苇
石霏
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date: 2023-02-24
Filing date: 2023-02-24
Publication date: 2023-05-30
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202310158800.5A priority Critical patent/CN116188309A/en
Publication of CN116188309A publication Critical patent/CN116188309A/en
Pending legal-status Critical Current

Classifications

    • G06T 5/70: Image enhancement or restoration; Denoising; Smoothing
    • G06N 3/04: Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08: Computing arrangements based on biological models; Neural networks; Learning methods
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/337: Image registration using feature-based methods involving reference images or patches
    • G06T 9/002: Image coding using neural networks
    • G06T 2207/10101: Image acquisition modality; Optical tomography; Optical coherence tomography [OCT]
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20221: Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an OCT image enhancement method based on a registration network. Any one OCT image in a selected sample is taken as the fixed image, and the remaining OCT images are taken as moving images. The fixed image is input into a fixed-image encoding module, whose five fixed encoding blocks pass their encoded outputs to the corresponding decoding blocks; the moving image is input into a moving-image encoding module, whose five moving encoding blocks likewise pass their encoded outputs to the corresponding decoding blocks. The outputs of the third fixed encoding block and of the third moving encoding block are fed through a multi-head self-attention transformation module to the first decoding block of the decoding module. Each decoding block in the decoding module, based on the output of the previous decoding block and the outputs of the corresponding encoding blocks, restores the dimensions layer by layer from low resolution to high resolution and outputs to a multi-scale deformation field fusion module, from which a registered image is obtained. Repeating these steps yields a plurality of registered images, and the registered images are superposed and averaged with the fixed image to obtain a denoised image.

Description

OCT image enhancement method based on registration network and application
Technical Field
The invention relates to the technical field of OCT imaging, in particular to an OCT image enhancement method based on a registration network and application thereof.
Background
Optical coherence tomography (OCT) is a high-resolution, non-contact, non-invasive technique widely used for structural imaging of living biological tissue. Statistics show that more than half of OCT reports are in the ophthalmic field, for diagnosing various retinal diseases and the like. However, since OCT is based on low-coherence interferometry, the measurement signal is inevitably affected by speckle noise, resulting in poor OCT image quality.
In recent years, many methods have been proposed to obtain high-quality OCT images by suppressing speckle noise; they fall mainly into two categories. The first reduces speckle noise by post-processing a single two-dimensional OCT image, for example with a conventional algorithm such as BM3D, or with Edge-cGAN, an end-to-end conditional generative adversarial network that adds an edge loss term to the objective function to increase the model's sensitivity to edge details, enhance image contrast, and suppress speckle noise. The second category denoises by multi-frame superposition averaging, one of the effective means of reducing speckle noise; however, eye movement during acquisition and positional differences between frames easily produce artifacts after averaging, so the misalignment between images must first be corrected by a registration algorithm. Such methods therefore focus on inter-image registration, for example correcting axial offset with a regularized dynamic programming technique and finding the optimal registration position in the horizontal direction by computing the cross-correlation over a series of positions; or, with deep learning, the TransMorph model captures pixel relations with an encoder-decoder convolutional structure and forms the registration deformation field from the decoded features of the last layer.
Single-frame denoising reduces speckle noise by post-processing a single two-dimensional slice. Conventional single-frame methods such as the BM3D algorithm denoise by filtering, but have drawbacks: they are complex to implement and computationally inefficient. Among deep-learning single-frame methods, Edge-cGAN, which performs end-to-end image denoising with an improved conditional generative adversarial network (cGAN), achieves good performance but still has shortcomings: the limitation to a single slice causes information loss at positions covered by noise; the improved cGAN model is more complex and has too many hyper-parameters; and deep-learning single-frame denoising requires a denoising gold standard for training, itself obtained by averaging multiple images, so the trained network can only approach that gold standard and cannot surpass it.
If multiple noisy frames are directly averaged, positional differences between frames cause severe artifacts, so registration is needed first. For registration, traditional approaches that correct axial offset with regularized dynamic programming and find the optimal horizontal position by computing cross-correlations over a series of positions require complex cross-correlation calculations and incur a large computational cost. The deep-learning registration of the TransMorph model greatly improves computational efficiency and registration accuracy, but some intermediate encoding-decoding layers are omitted, and feature information is lost through errors such as interpolation.
Disclosure of Invention
Therefore, the invention aims to solve the technical problems of image feature loss and low accuracy in the registration step of multi-frame averaging denoising in the prior art.
In order to solve the technical problems, the invention provides an OCT image enhancement method based on a registration network, which comprises the following steps:
selecting any one OCT image in a certain sample of the training set as the fixed image, and taking the remaining OCT images in the sample as moving images;
inputting the fixed image into a fixed-image encoding module of the registration network, whose five fixed encoding blocks pass their encoded outputs to the corresponding decoding blocks;
inputting the moving image into a moving-image encoding module of the registration network, whose five moving encoding blocks pass their encoded outputs to the corresponding decoding blocks;
passing the output of the third fixed encoding block of the fixed-image encoding module and the output of the third moving encoding block of the moving-image encoding module through a multi-head self-attention transformation module, which applies a convolution operation and a flattening operation to the input fixed-image feature map and moving-image feature map respectively, splices and fuses the two feature maps by a concatenation operation, feeds the fused feature map together with the image position information into the transformer encoding unit, and, after layer normalization and a convolution-block operation, outputs to the first decoding block of the decoding module;
in the decoding module, each decoding block, based on the output of the previous decoding block and the outputs of the corresponding fixed and moving encoding blocks, restores the dimensions of the input feature map layer by layer from low resolution to high resolution and outputs to a multi-scale deformation field fusion module, to obtain a registered image of the fixed image;
repeatedly inputting the fixed image together with each not-yet-registered moving image into the registration network to obtain the remaining registered images of the fixed image;
and superposing and averaging the fixed image and the plurality of registered images to obtain a denoised image of the fixed image.
In one embodiment of the invention, the transformer encoding unit comprises a multi-head self-attention block and a multi-layer perceptron; the multi-head self-attention block comprises a plurality of self-attention blocks connected together channel-wise within the transformer encoding unit.
In one embodiment of the present invention, the fixed-image encoding module and the moving-image encoding module each comprise a first, a second, a third, a fourth, and a fifth encoding block sequentially connected in series along the forward propagation direction;
the first encoding block comprises two serially connected convolution units;
each convolution unit comprises a convolution block, a batch normalization layer, and a ReLU layer sequentially connected in series along the forward propagation direction;
the second and third encoding blocks each comprise a max-pooling unit and two convolution units sequentially connected in series along the forward propagation direction;
and the fourth and fifth encoding blocks each comprise a max-pooling unit.
In one embodiment of the present invention, the decoding module comprises a first decoding block, a second decoding block, a third decoding block, a fourth decoding block, and a fifth decoding block connected in series in order; each decoding block comprises a multi-scale deformation field fusion module, a spatial transformation module, and a convolution unit sequentially connected in series; the multi-scale deformation field fusion module is used for learning the details of global affine registration; the convolution unit comprises a convolution block, a batch normalization layer, and a ReLU layer sequentially connected in series; and the spatial transformation module is used for learning transformation parameters from the input features and applying the transformation parameters to the moving image for mapping to obtain a registered image.
In one embodiment of the present invention, the multi-scale deformation field fusion module in the first decoding block convolves the input features to obtain a first deformation field and upsamples it to obtain a first output deformation field;
the multi-scale deformation field fusion module in the nth decoding block convolves the input features to obtain an nth deformation field, upsamples the nth deformation field and the (n−1)th output deformation field respectively, and superimposes the upsampled results to obtain an nth output deformation field, where 2 ≤ n ≤ 5.
In one embodiment of the present invention, before selecting any one OCT image in a certain sample of the training set as the fixed image, the method comprises collecting OCT images of a plurality of retinal samples, resampling them, constructing an image dataset, and dividing the image dataset into a training set and a test set in a preset proportion.
In one embodiment of the present invention, acquiring the registered images of the fixed image comprises:
warping the moving image into registration with the fixed image by means of the spatial transformation module, according to the deformation field parameters output by the multi-scale deformation field fusion module, to obtain a registered image of the fixed image.
In one embodiment of the present invention, after obtaining the registered image of the fixed image, the method further comprises:
optimizing the registration network according to the loss function of the fixed image and the registered image;
selecting any one OCT image in a certain sample of the test set as the fixed image, taking the remaining OCT images in the sample as moving images, and repeatedly inputting the fixed image and the different moving images into the optimized registration network to obtain a plurality of registered images of the fixed image for that test-set sample;
and superposing and averaging the fixed image of the sample and the plurality of registered images to obtain a denoised image.
In one embodiment of the invention, the loss function of the fixed image and the registered image is:
L_joint = L_sim + α·L_smooth
L_sim = 1 − (1/K)·Σ_{k=1..K} [ 2·|l_f^k ∩ l_w^k| / ( |l_f^k| + |l_w^k| ) ]
L_smooth = Σ_{p∈φ} ‖∇μ(p)‖², where ∇μ(p) = ( ∂μ(p)/∂x , ∂μ(p)/∂y )
wherein L_sim constrains the similarity between the fixed image and the registered image, L_smooth constrains the local spatial variation of the deformation field, and α is a regularization parameter set to 10; l_f^k denotes the k-th layer label of the fixed-image layer segmentation map L_f, l_w^k denotes the k-th layer label of the registered-image layer segmentation map L_w, and K denotes the number of layers in the layer-segmentation labels; φ denotes the deformation field, p any point in the deformation field, μ(p) the value at coordinates (x, y) of point p in the deformation field, ∂μ(p)/∂x the gradient of p in the x direction of the deformation field, and ∂μ(p)/∂y the gradient of p in the y direction of the deformation field.
The embodiment of the invention also provides an application of the registration-network-based OCT image enhancement method in the field of retinal OCT image enhancement.
Compared with the prior art, the technical scheme of the invention has the following advantages:
according to the OCT image enhancement method based on the registration network, the registered images are utilized to carry out multiple average denoising, so that the problem of image information coverage in single denoising is effectively solved; compared with the traditional single-amplitude denoising method, the calculation efficiency is greatly improved; compared with a single denoising method based on deep learning, the method is not limited by the standard quality of denoising gold in training data. The invention combines the multi-head self-attention conversion module and the multi-scale deformation field fusion module to construct an image registration network; the multi-head self-attention conversion module is utilized to capture more relations between the fixed image and the moving image, so that the high-precision registration of the fixed image and the moving image is realized; the multi-scale deformation field fusion module is utilized to learn details of global affine registration by using a multi-resolution strategy, deformation fields with high resolution are output based on deformation fields and features of the previous layer, fine-granularity positioning is enhanced, volume overlapping between anatomical structures is enhanced, the number of folded voxels between registration deformation fields is reduced, and feature information is prevented from being lost, so that artifacts are reduced, images are enhanced, image registration with high registration precision is realized, and good image denoising effect is achieved.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings, in which
Fig. 1 is a framework diagram of the registration network provided by the present invention;
FIG. 2 is a block diagram of a multi-headed self-attention transformation module of the registration network provided by the present invention;
fig. 3 is a schematic diagram of a decoding block structure of a registration network according to the present invention;
FIG. 4 is a flowchart of a registration network-based retinal OCT image enhancement method provided by the present invention;
Fig. 5(a) shows an original image from the first test set; Figs. 5(b), (c), (d), (e), (f), (g), and (h) show the denoising results obtained with BM3D, KSVD, DP+HC, Edge-cGAN, Mini-cGAN, DHNet, and the method proposed in the embodiment of the present invention, respectively;
Fig. 6(a) shows an original image from the second test set; Figs. 6(b), (c), (d), (e), (f), (g), and (h) show the denoising results obtained with BM3D, KSVD, DP+HC, Edge-cGAN, Mini-cGAN, DHNet, and the method proposed in the embodiment of the present invention, respectively.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.
Referring to fig. 1, the method for enhancing OCT images based on a registration network provided by the present invention includes:
S1: selecting any one OCT image in a certain sample of the training set as the fixed image, and taking the remaining OCT images in the sample as moving images.
S2: inputting the fixed image into the fixed-image encoding module of the registration network, whose five fixed encoding blocks pass their encoded outputs to the corresponding decoding blocks.
The fixed-image encoding module comprises a first, a second, a third, a fourth, and a fifth fixed encoding block sequentially connected in series along the forward propagation direction.
The first fixed encoding block comprises two serially connected convolution units; each convolution unit comprises a convolution block, a batch normalization layer, and a ReLU layer sequentially connected in series along the forward propagation direction; the second and third fixed encoding blocks each comprise a max-pooling unit and two convolution units sequentially connected in series along the forward propagation direction; and the fourth and fifth fixed encoding blocks each comprise a max-pooling unit.
S3: inputting the moving image into the moving-image encoding module of the registration network, whose five moving encoding blocks pass their encoded outputs to the corresponding decoding blocks.
The moving-image encoding module comprises a first, a second, a third, a fourth, and a fifth moving encoding block sequentially connected in series along the forward propagation direction.
The first moving encoding block comprises two serially connected convolution units; the second and third moving encoding blocks each comprise a max-pooling unit and two convolution units sequentially connected in series along the forward propagation direction; and the fourth and fifth moving encoding blocks each comprise a max-pooling unit.
S4: passing the output of the third fixed encoding block of the fixed-image encoding module and the output of the third moving encoding block of the moving-image encoding module through the multi-head self-attention transformation module, which applies a convolution operation and a flattening operation to the input fixed-image feature map and moving-image feature map respectively, splices and fuses the two feature maps by a concatenation operation, feeds the fused feature map together with the image position information into the transformer encoding unit, and, after layer normalization and a convolution-block operation, outputs to the first decoding block of the decoding module.
Referring to fig. 2, in the multi-head self-attention transformation module provided by the embodiment of the invention, the feature maps of the fixed image and the moving image are each subjected to a convolution operation with a kernel size of 8 and to a flattening operation; the two feature maps are then fused by a concatenation operation, and the fused feature map, together with the position information of the images, is sent to the transformer encoding module, which comprises a multi-head self-attention block and a multi-layer perceptron; finally, the output of the multi-head self-attention transformation module is obtained through a normalization operation and a convolution operation.
The ability of the multi-head self-attention transformation module to model long-range dependencies comes mainly from the multi-head self-attention block, which consists of a plurality of self-attention (SA) blocks connected together channel-wise within the transformer encoding block; in the embodiment of the invention, the number of SA blocks is set to 12.
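A minimal PyTorch sketch of this MST module follows. The text fixes only the kernel size (8) and the 12 self-attention blocks, realized here as 12 heads of a single transformer encoder layer; the embedding width, the shared patch embedding, the token-grid size, and the un-flattening layout are assumptions:

import torch
import torch.nn as nn

class MultiHeadSelfAttentionTransform(nn.Module):
    def __init__(self, in_ch, feat_hw=(16, 16), embed_dim=384, num_heads=12):
        super().__init__()
        h, w = feat_hw  # token grid of each feature map after the stride-8 conv
        self.patch_embed = nn.Conv2d(in_ch, embed_dim, kernel_size=8, stride=8)
        self.pos_embed = nn.Parameter(torch.zeros(1, 2 * h * w, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)  # MSA block + MLP
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.norm = nn.LayerNorm(embed_dim)
        self.out_conv = nn.Conv2d(embed_dim, in_ch, kernel_size=3, padding=1)

    def forward(self, f_fixed, f_moving):
        # convolution (kernel 8) + flattening of each input feature map
        t_f = self.patch_embed(f_fixed).flatten(2).transpose(1, 2)   # B x N x C
        t_m = self.patch_embed(f_moving).flatten(2).transpose(1, 2)  # B x N x C
        # concatenation-based fusion plus image position information
        t = torch.cat([t_f, t_m], dim=1) + self.pos_embed
        t = self.norm(self.encoder(t))  # transformer encoding, then layer norm
        b, n, c = t.shape
        h = w = int((n // 2) ** 0.5)    # assumes square feature maps
        t = t.transpose(1, 2).reshape(b, c, 2 * h, w)  # un-flatten (layout assumed)
        return self.out_conv(t)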
S5: in the decoding module, each decoding block, based on the output of the previous decoding block and the outputs of the corresponding fixed and moving encoding blocks, restores the dimensions of the input feature map layer by layer from low resolution to high resolution and outputs to the multi-scale deformation field fusion module, to obtain a registered image of the fixed image.
Referring to fig. 3, the decoding module provided in the embodiment of the present invention corresponds to the fixed-image and moving-image encoding modules and comprises a first, a second, a third, a fourth, and a fifth decoding block connected in series in order, realizing dimension recovery from low resolution to high resolution. Each decoding block comprises a multi-scale deformation field fusion module, a spatial transformation module, and a convolution unit sequentially connected in series along the forward propagation direction. The spatial transformation module learns transformation parameters from the input features and applies them to the moving image: under the guidance of the deformation field, the moving-image features of the same size are warped into registration with the fixed image and mapped to a registered image. The multi-scale fusion module fuses the deformation-field information of multiple layers to strengthen fine-grained localization, and the convolution unit mainly fuses multi-semantic context information.
The multi-scale deformation field fusion module in the first decoding block convolves the input features to obtain a first deformation field and upsamples it to obtain the first output deformation field; the module in the nth decoding block convolves the input features to obtain the nth deformation field, upsamples the nth deformation field and the (n−1)th output deformation field respectively, and superimposes the upsampled results to obtain the nth output deformation field, where 2 ≤ n ≤ 5.
Specifically, the outputs of the fifth fixed encoding block and of the fifth moving encoding block are both input to the first decoding block, which combines them with the output of the multi-head self-attention transformation module to generate a first feature map that is input to the second decoding block; the second decoding block generates a second feature map from the outputs of the fourth fixed and moving encoding blocks and the output of the first decoding block; the third decoding block generates a third feature map from the outputs of the third fixed and moving encoding blocks and the output of the second decoding block; the fourth decoding block generates a fourth feature map from the outputs of the second fixed and moving encoding blocks and the output of the third decoding block; and the fifth decoding block generates a fifth feature map from the outputs of the first fixed and moving encoding blocks and the output of the fourth decoding block, and inputs it into the multi-scale deformation field fusion module, so that the module acquires the registered images of the fixed image according to the registration deformation field.
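The wiring just described can be summarized with the following sketch; `dec_blocks` stands for the five decoding blocks, each assumed here to return its feature map and output deformation field (the MsDFF and spatial-transformation sketches are given below), and spatial sizes are assumed to have been made compatible, e.g. by upsampling the MST output:

import torch

def decode(dec_blocks, fixed_feats, moving_feats, mst_out):
    # first decoding block: fifth encoder outputs combined with the MST output
    x, phi = dec_blocks[0](torch.cat([fixed_feats[4], moving_feats[4], mst_out],
                                     dim=1), phi_prev=None)
    # decoding blocks 2..5 consume the matching encoder outputs (4th..1st) plus
    # the previous block's feature map and output deformation field
    for n in range(1, 5):
        skip = torch.cat([fixed_feats[4 - n], moving_feats[4 - n]], dim=1)
        x, phi = dec_blocks[n](torch.cat([x, skip], dim=1), phi_prev=phi)
    return phi  # full-resolution registration deformation field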
S6: repeatedly inputting the fixed image together with each not-yet-registered moving image into the registration network to obtain the remaining registered images of the fixed image.
S7: superposing and averaging the fixed image and the plurality of registered images to obtain a denoised image of the fixed image.
In particular, referring to fig. 3, to further exploit the potential relationship between the fixed image and the image to be registered, a multi-scale deformation field fusion module is proposed. It uses a multi-resolution strategy to learn the details of global affine registration, taking the low-resolution, semantically strong features of the previous layer as the input F_in and producing a high-resolution deformation field φ_out and high-resolution features F_out; except for the first decoding block, the four remaining decoding blocks additionally take the deformation field φ_in of the previous layer as input.
Specifically, each decoding block applies to the input features F_in a convolution with kernel size 3 to obtain the deformation field φ_conv:
φ_conv = Conv_3×3(F_in)
For the first decoding block, the output deformation field φ_out is obtained by upsampling φ_conv:
φ_out = φ_up-conv = UpSample(φ_conv)
For the second, third, fourth, and fifth decoding blocks, φ_conv is upsampled to φ_up-conv, the previous layer's deformation field φ_in is also taken as input and upsampled to φ_up-in, and the module's output deformation field φ_out is their superposition:
φ_up-conv = UpSample(φ_conv), φ_up-in = UpSample(φ_in), φ_out = φ_up-conv + φ_up-in
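A direct PyTorch transcription of these formulas follows; the bilinear upsampling mode and the ×2 scale factor are assumptions, since the text specifies only "UpSample":

import torch.nn as nn
import torch.nn.functional as F

class MsDFF(nn.Module):
    # phi_conv = Conv3x3(F_in); phi_out = UpSample(phi_conv), plus
    # UpSample(phi_in) for decoding blocks 2..5.
    def __init__(self, in_ch):
        super().__init__()
        # 3x3 convolution predicting a 2-channel (x, y) displacement field
        self.flow = nn.Conv2d(in_ch, 2, kernel_size=3, padding=1)

    def forward(self, f_in, phi_prev=None):
        phi_conv = self.flow(f_in)
        phi_out = F.interpolate(phi_conv, scale_factor=2,
                                mode='bilinear', align_corners=True)  # phi_up-conv
        if phi_prev is not None:  # previous layer's deformation field phi_in
            phi_out = phi_out + F.interpolate(phi_prev, scale_factor=2,
                                              mode='bilinear', align_corners=True)
        return phi_out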
Spatial transformation module: to construct the warped image, a differentiable spatial transformation module based on the spatial transformer network is used. The module learns and regresses transformation parameters from the input features, applies the estimated parameters to the input features for mapping, and finally constructs the warped image through a sampler. Specifically, given the feature map to be registered and the deformation field parameters, it outputs the image to be registered warped according to those parameters. The module is model-independent and spatially transforms each pixel without any additional training supervision or modification of the optimization process.
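A minimal differentiable warping function in this spirit (2-D, displacements in pixels; the bilinear sampler and border padding are assumptions):

import torch
import torch.nn.functional as F

def spatial_transform(moving, phi):
    # Warp `moving` (B x C x H x W) by the displacement field `phi`
    # (B x 2 x H x W): build the identity grid, add the displacements,
    # normalize to [-1, 1], and resample with grid_sample.
    b, _, h, w = moving.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=moving.dtype),
                            torch.arange(w, dtype=moving.dtype), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).to(moving.device)
    new = grid + phi
    new_x = 2.0 * new[:, 0] / (w - 1) - 1.0
    new_y = 2.0 * new[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((new_x, new_y), dim=-1)  # B x H x W x 2
    return F.grid_sample(moving, sample_grid, mode='bilinear',
                         padding_mode='border', align_corners=True)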
Based on the above embodiment, in the embodiment of the present invention, step S1 is preceded by collecting OCT images of a plurality of retinal samples, resampling them, constructing an image dataset, and dividing it into a training set and a test set in a preset proportion. Step S6 is followed by optimizing the registration network according to the loss function of the fixed image and the registered image; selecting any one OCT image in a certain sample of the test set as the fixed image and taking the remaining OCT images of that sample as moving images; repeatedly inputting the fixed image and the different moving images into the optimized registration network to obtain the registered images of the fixed image for that test-set sample; and superposing and averaging the fixed image of the sample and all the registered images to obtain a denoised image.
Specifically, the image registration network is optimized by calculating the loss function between the fixed image and the registered moving image:
L_joint = L_sim + α·L_smooth
L_sim = 1 − (1/K)·Σ_{k=1..K} [ 2·|l_f^k ∩ l_w^k| / ( |l_f^k| + |l_w^k| ) ]
L_smooth = Σ_{p∈φ} ‖∇μ(p)‖², where ∇μ(p) = ( ∂μ(p)/∂x , ∂μ(p)/∂y )
wherein L_sim constrains the similarity between the fixed image and the registered image, L_smooth constrains the local spatial variation of the deformation field, and α is a regularization parameter set to 10; l_f^k denotes the k-th layer label of the fixed-image layer segmentation map L_f, l_w^k denotes the k-th layer label of the registered-image layer segmentation map L_w, and K denotes the number of layers in the layer-segmentation labels; φ denotes the deformation field, p any point in the deformation field, μ(p) the value at coordinates (x, y) of point p in the deformation field, ∂μ(p)/∂x the gradient of p in the x direction of the deformation field, and ∂μ(p)/∂y the gradient of p in the y direction of the deformation field.
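A PyTorch sketch of this loss, assuming the Dice form of L_sim over one-hot layer-segmentation maps (the exact similarity formula is implied by the layer-label symbols rather than spelled out in the extracted text) and finite differences for the deformation-field gradients:

import torch

def dice_similarity_loss(seg_fixed, seg_warped, eps=1e-5):
    # L_sim over one-hot layer labels (B x K x H x W): 1 - mean Dice
    inter = (seg_fixed * seg_warped).sum(dim=(2, 3))
    denom = seg_fixed.sum(dim=(2, 3)) + seg_warped.sum(dim=(2, 3))
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

def smoothness_loss(phi):
    # L_smooth: squared finite-difference gradients of the deformation field
    dx = phi[:, :, :, 1:] - phi[:, :, :, :-1]
    dy = phi[:, :, 1:, :] - phi[:, :, :-1, :]
    return (dx ** 2).mean() + (dy ** 2).mean()

def joint_loss(seg_fixed, seg_warped, phi, alpha=10.0):
    # L_joint = L_sim + alpha * L_smooth, with alpha = 10 as stated above
    return dice_similarity_loss(seg_fixed, seg_warped) + alpha * smoothness_loss(phi)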
According to the OCT image enhancement method based on the registration network provided by the invention, multi-frame averaging of the registered images effectively solves the problem of image information being covered in single-frame denoising; compared with conventional single-frame denoising methods, the computational efficiency is greatly improved; and compared with deep-learning single-frame denoising methods, the method is not limited by the quality of the denoising gold standard in the training data. The invention combines the multi-head self-attention transformation module and the multi-scale deformation field fusion module to construct an image registration network: the multi-head self-attention transformation module captures more of the relations between the fixed image and the moving image, realizing high-precision registration of the two, while the multi-scale deformation field fusion module learns the details of global affine registration with a multi-resolution strategy and outputs a high-resolution deformation field based on the deformation field and features of the previous layer, strengthening fine-grained localization, enhancing the volume overlap between anatomical structures, reducing the number of folded voxels in the registration deformation field, and avoiding loss of feature information, thereby reducing artifacts, enhancing the images, and realizing high-accuracy image registration when performing average denoising.
Based on the above embodiments, the invention also provides an application of the registration-network-based OCT image enhancement method in the field of retinal OCT image enhancement.
When enhancing a retinal OCT image, the fixed-layer segmentation map of the selected fixed image and the moving-layer segmentation map of the moving image are input into the image registration network, and the registration network is optimized according to the loss function of the fixed-layer and moving-layer segmentation maps. The segmentation layers of both maps comprise the nerve fiber layer, the ganglion cell layer, the outer plexiform layer, the outer nuclear layer, the layers from the ellipsoid zone to the retinal pigment epithelium, and the background region.
Referring to fig. 4, the registration-network-based retinal OCT image enhancement method according to an embodiment of the present invention proceeds as follows.
based on the above embodiment, in the present embodiment, data is acquired with five commercial OCT scanners, the OCT image is resampled to 1024×1024, and the data is divided proportionally into one training data set D tr With two test data sets D ts1 And D ts2 . Training data set D tr Is 248 eyes of image data collected from a BV1000 scanner in a single scan mode, and the data size for each eye is 1000×1024×20 (width×height×slice number). First test data set D ts1 Is combined with D tr 62 eyes of data were collected in the same manner. Second test data set D ts2 Is image data of a total of 48 eyes obtained in a regional scan mode by four scanners, including Topcon DRI-1, topcon 1000, topcon 2000, and Zeiss. The specific distributions of the training data set, the first test data set and the second test data set are shown in table 1.
Table 1: specific distribution of training data set, first test data set and second test data set
Figure BDA0004093429500000121
Online data augmentation was applied to the dataset, including horizontal flipping, horizontal translation within 2.5% of the width, vertical translation within 20% of the height, rotation within 15 degrees, random blurring, and local adaptive histogram equalization. Training and testing of the model were completed in a PyTorch environment on a single NVIDIA RTX 3060 GPU with 12 GB of memory. The model was trained by the back-propagation algorithm, with the Adam optimizer used to minimize the cost function; both the base learning rate and the weight decay were set to 0.00004. The batch size was set to 1 and the number of iterations (epochs) to 40000.
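The training configuration described above corresponds roughly to the following loop; `RegistrationNet`, `loader`, and the model's return values are illustrative placeholders, and `joint_loss` is the loss sketch given earlier:

import torch

model = RegistrationNet().cuda()  # hypothetical class standing in for the network above
optimizer = torch.optim.Adam(model.parameters(), lr=4e-5, weight_decay=4e-5)

step = 0
while step < 40000:  # 40000 iterations, batch size 1
    for fixed, moving, seg_fixed, seg_moving in loader:  # hypothetical data loader
        warped_seg, phi = model(fixed.cuda(), moving.cuda(), seg_moving.cuda())
        loss = joint_loss(seg_fixed.cuda(), warped_seg, phi, alpha=10.0)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= 40000:
            break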
To quantitatively evaluate the performance of the image registration network provided by the present invention, four common evaluation metrics are used: signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), speckle suppression index (SSI), and edge preservation index (EPI), defined as follows:
Signal-to-noise ratio:
SNR = 10·log10( σ_s² / σ_b² )
Contrast-to-noise ratio:
CNR = (1/S)·Σ_{i=1..S} 10·log10( |u_i − u_b| / sqrt( σ_i² + σ_b² ) )
Noise suppression index:
SSI = ( σ_d / u_d ) · ( u_o / σ_o )
Edge preservation index:
EPI = Σ_{i,j} | I_d(i+1, j) − I_d(i, j) | / Σ_{i,j} | I_0(i+1, j) − I_0(i, j) |
wherein σ_s and σ_b are the standard deviations of the signal and background regions, respectively; in CNR, S is the number of regions of interest, u_i and σ_i are the mean and standard deviation of the ith region of interest, and u_b is the mean of the background region; in SSI, the ratio σ/u measures the speckle intensity, with subscripts d and o denoting the denoised and original images; in EPI, I_0 and I_d represent the noisy image and the denoised image, respectively, and i and j are the coordinates in the longitudinal and transverse directions of the image.
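In NumPy, these metrics can be computed roughly as follows; the formulas are reconstructed in their standard forms from the symbol definitions above (the original formula images were not extracted), and the mask arguments selecting signal, background, and ROI pixels are illustrative:

import numpy as np

def snr(img, sig_mask, bg_mask):
    # SNR = 10*log10(sigma_s^2 / sigma_b^2)
    return 10.0 * np.log10(img[sig_mask].std() ** 2 / img[bg_mask].std() ** 2)

def cnr(img, roi_masks, bg_mask):
    # mean CNR over the S regions of interest
    u_b, s_b = img[bg_mask].mean(), img[bg_mask].std()
    vals = [10.0 * np.log10(abs(img[m].mean() - u_b)
                            / np.sqrt(img[m].std() ** 2 + s_b ** 2))
            for m in roi_masks]
    return float(np.mean(vals))

def ssi(denoised, noisy):
    # ratio of speckle indices sigma/mu; smaller means stronger suppression
    return (denoised.std() / denoised.mean()) / (noisy.std() / noisy.mean())

def epi(denoised, noisy):
    # ratio of longitudinal gradient magnitudes (edge preservation)
    return (np.abs(np.diff(denoised, axis=0)).sum()
            / np.abs(np.diff(noisy, axis=0)).sum())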
To test whether the invention helps reduce the speckle noise prominent in retinal OCT, experiments were run on the two test datasets. The network is first trained for registration: for each sample of the training set, any one image is selected as the fixed image and the other images of the sample serve as moving images, and the model is optimized and saved by computing the loss function. At test time, a fixed image is selected in each test sample, the whole sample is fed into the saved network structure, and with the saved network parameters the fixed image of each sample and the moving images registered to it are obtained; all these images are then superposed and averaged to obtain the final denoised image.
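This test-time procedure amounts to the following sketch; the model is assumed here to return the warped moving image, and the names are illustrative:

import torch

@torch.no_grad()
def denoise_sample(model, frames, fixed_idx=0):
    # Register every other frame of the sample to the fixed frame, then
    # superpose and average to obtain the denoised image.
    fixed = frames[fixed_idx]
    registered = [fixed]
    for i, moving in enumerate(frames):
        if i != fixed_idx:
            registered.append(model(fixed, moving))
    return torch.stack(registered).mean(dim=0)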
The performance of the proposed registration-network-based OCT image enhancement method is compared with other methods, including block-matching and 3D filtering (BM3D), KSVD, the multi-frame averaging denoising method based on dynamic programming and hill climbing (DP+HC), Edge-cGAN, Mini-cGAN, and DHNet; in these experiments the parameters of each method were set to achieve its best results.
Example denoising results are shown in fig. 5. On the first test dataset D_ts1, the registration process removes the motion artifacts produced during scanning and shooting by physiological factors such as tremor and drift and by the scanning mode of the system's optical path, so that multiple registered frames acquired at the same position can be averaged to obtain a denoised image. Compared with the original image, the visual quality is markedly improved, the overall structure and details of the retina are well preserved, and the best signal-to-noise ratio is obtained. On the quantitative metrics, the method raises the signal-to-noise ratio from 7.32 dB to 34.85 dB, and although the averages of CNR, SSI, and EPI do not reach the best values, they are very close to them. The image details show that the proposed method clearly renders information such as the separation behind the vitreous body and the outer retina, with sharper retinal boundaries. The test metric results are shown in Table 2:
table 2: test evaluation index of different denoising methods on first test data set
Method                      SNR(dB)       CNR(dB)       SSI          EPI
Original image              0.53±0.43     5.15±0.57     1.00±0.00    1.00±0.00
BM3D                        7.32±3.26     13.56±1.54    1.24±0.03    0.37±0.02
KSVD                        4.31±1.59     12.71±0.74    0.47±0.03    0.45±0.04
DP+HC                       13.60±4.09    10.78±2.15    0.17±0.13    1.17±0.06
Edge-cGAN                   17.89±4.33    12.45±1.20    0.11±0.01    0.73±0.06
Mini-cGAN                   21.64±1.73    13.47±0.90    0.09±0.01    1.03±0.05
DHNet                       14.76±6.40    12.40±1.64    0.12±0.01    0.95±0.07
The method of the invention 34.85±4.15    11.80±0.96    0.10±0.01    1.00±0.03
Example denoising results are shown in fig. 6. On the second test dataset D_ts2, the model, although trained on data collected by the device corresponding to the first test dataset D_ts1, still performs excellently. Visually, BM3D shows weaker contrast and blurred edges, which is also reflected in a lower EPI and the highest SSI; the KSVD images are more blurred and have the lowest SNR; the multi-frame averaging based on dynamic programming and hill climbing shows ghosting; Edge-cGAN, Mini-cGAN, and DHNet over-smooth the retinal layers, with adhesion between layers; our method performs best overall and again obtains the best signal-to-noise ratio. The test metric results are shown in Table 3:
table 3: test evaluation index of different denoising methods on second test data set
As the above results show, the registration-network-based OCT image enhancement method provided by the invention guides speckle-noise removal in retinal OCT images with a registration network (MsFTMorph) based on multi-scale fusion and an attention mechanism, achieves higher registration accuracy, and obtains the best signal-to-noise ratio. In detail, the proposed image registration network adds two new modules, the multi-head self-attention transformation module (MST) and the multi-scale deformation field fusion module (MsDFF), to capture global context information, enhance the volume overlap between anatomical structures, and reduce the number of folded voxels in the registration deformation field. Moreover, the experiments on speckle-noise suppression across datasets acquired by different OCT scanners with different acquisition modes show that the proposed method preserves more retinal structure information and obtains better contrast enhancement. Even on data of different retinal diseases, the invention performs better than the other denoising methods, demonstrating its robustness, generalization, and effectiveness for speckle denoising.
The invention combines, for the first time, a multi-head self-attention mechanism with a convolutional neural network for retinal OCT image registration, and for the first time proposes a registration method based on multi-scale fusion and an attention mechanism to guide speckle-noise removal in retinal OCT images; it is suitable for denoising optical coherence tomography (OCT) images and achieves a good denoising effect. The invention combines the multi-head self-attention transformation module and the multi-scale deformation field fusion module to construct an image registration network: the multi-head self-attention transformation module captures more of the relations between the fixed image and the moving image, realizing high-precision registration of the two, while the multi-scale deformation field fusion module learns the details of global affine registration with a multi-resolution strategy and outputs high-resolution deformation fields based on the deformation field and features of the previous layer, strengthening fine-grained localization, enhancing the volume overlap between anatomical structures, reducing the number of folded voxels in the registration deformation field, and avoiding loss of feature information, thereby reducing artifacts, enhancing the images, realizing high-accuracy image registration for average denoising, and showing good robustness, generalization, and effectiveness for speckle denoising.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications will be apparent to those of ordinary skill in the art in light of the foregoing description; it is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (10)

1. An OCT image enhancement method based on a registration network, comprising:
selecting any one OCT image in a certain sample of a training set as a fixed image, and taking the remaining OCT images in the sample as moving images;
inputting the fixed image into a fixed-image encoding module of the registration network, whose five fixed encoding blocks pass their encoded outputs to the corresponding decoding blocks;
inputting the moving image into a moving-image encoding module of the registration network, whose five moving encoding blocks pass their encoded outputs to the corresponding decoding blocks;
passing the output of the third fixed encoding block of the fixed-image encoding module and the output of the third moving encoding block of the moving-image encoding module through a multi-head self-attention transformation module, which applies a convolution operation and a flattening operation to the input fixed-image feature map and moving-image feature map respectively, splices and fuses the two feature maps by a concatenation operation, feeds the fused feature map together with the image position information into a transformer encoding unit, and, after layer normalization and a convolution-block operation, outputs to a first decoding block of a decoding module;
in the decoding module, each decoding block, based on the output of the previous decoding block and the outputs of the corresponding fixed and moving encoding blocks, restores the dimensions of the input feature map layer by layer from low resolution to high resolution and outputs to a multi-scale deformation field fusion module, to obtain a registered image of the fixed image;
repeatedly inputting the fixed image together with each not-yet-registered moving image into the registration network to obtain the remaining registered images of the fixed image;
and superposing and averaging the fixed image and the plurality of registered images to obtain a denoised image of the fixed image.
2. The registration network-based OCT image enhancement method of claim 1, wherein the transformer encoding unit comprises a multi-head self-attention block and a multi-layer perceptron; the multi-head self-attention block comprises a plurality of self-attention blocks connected together channel-wise within the transformer encoding unit.
3. The OCT image enhancement method based on a registration network of claim 1, wherein the fixed-image encoding module and the moving-image encoding module each comprise a first, a second, a third, a fourth, and a fifth encoding block sequentially connected in series along the forward propagation direction;
the first encoding block comprises two serially connected convolution units;
each convolution unit comprises a convolution block, a batch normalization layer, and a ReLU layer sequentially connected in series along the forward propagation direction;
the second and third encoding blocks each comprise a max-pooling unit and two convolution units sequentially connected in series along the forward propagation direction;
and the fourth and fifth encoding blocks each comprise a max-pooling unit.
4. The registration network-based OCT image enhancement method of claim 1, wherein the decoding module comprises a first decoding block, a second decoding block, a third decoding block, a fourth decoding block, and a fifth decoding block connected in series in order; each decoding block comprises a multi-scale deformation field fusion module, a spatial transformation module, and a convolution unit sequentially connected in series; the multi-scale deformation field fusion module is used for learning the details of global affine registration; the convolution unit comprises a convolution block, a batch normalization layer, and a ReLU layer sequentially connected in series; and the spatial transformation module is used for learning transformation parameters from the input features and applying the transformation parameters to the moving image for mapping to obtain a registered image.
5. The OCT image enhancement method of claim 4, wherein the multi-scale deformation field fusion module in the first decoding block convolves the input features to obtain a first deformation field, and upsamples the first deformation field to obtain a first output deformation field;
the multi-scale deformation field fusion module in the nth decoding block convolves the input features to obtain an nth deformation field, upsamples the nth deformation field and the (n−1)th output deformation field respectively, and superimposes the upsampled results to obtain an nth output deformation field, where 2 ≤ n ≤ 5.
6. The registration network-based OCT image enhancement method of claim 1, wherein before selecting any one OCT image in a certain sample of the training set as the fixed image, the method comprises collecting OCT images of a plurality of retinal samples, resampling them, constructing an image dataset, and dividing the image dataset into a training set and a test set in a preset proportion.
7. The registration network-based OCT image enhancement method of claim 1, wherein acquiring the registered images of the fixed image comprises:
warping the moving image into registration with the fixed image by means of the spatial transformation module, according to the deformation field parameters output by the multi-scale deformation field fusion module, to obtain a registered image of the fixed image.
8. The registration network-based OCT image enhancement method of claim 1, further comprising, after acquiring the registered image of the fixed image:
optimizing the registration network according to the loss function of the fixed image and the registered image;
selecting any one OCT image in a certain sample of the test set as the fixed image, taking the remaining OCT images in the sample as moving images, and repeatedly inputting the fixed image and the different moving images into the optimized registration network to obtain a plurality of registered images of the fixed image for that test-set sample;
and superposing and averaging the fixed image of the sample and the plurality of registered images to obtain a denoised image.
9. The registration network-based OCT image enhancement method of claim 8, wherein the loss function of the fixed image and the registered image is:
L_joint = L_sim + α·L_smooth
L_sim = 1 − (1/K)·Σ_{k=1..K} [ 2·|l_f^k ∩ l_w^k| / ( |l_f^k| + |l_w^k| ) ]
L_smooth = Σ_{p∈φ} ‖∇μ(p)‖², where ∇μ(p) = ( ∂μ(p)/∂x , ∂μ(p)/∂y )
wherein L_sim constrains the similarity between the fixed image and the registered image, L_smooth constrains the local spatial variation of the deformation field, and α is a regularization parameter set to 10; l_f^k denotes the k-th layer label of the fixed-image layer segmentation map L_f, l_w^k denotes the k-th layer label of the registered-image layer segmentation map L_w, and K denotes the number of layers in the layer-segmentation labels; φ denotes the deformation field, p any point in the deformation field, μ(p) the value at coordinates (x, y) of point p in the deformation field, ∂μ(p)/∂x the gradient of p in the x direction of the deformation field, and ∂μ(p)/∂y the gradient of p in the y direction of the deformation field.
10. Use of the registration network-based OCT image enhancement method of any one of claims 1 to 9 in the field of retinal OCT image enhancement.
CN202310158800.5A 2023-02-24 2023-02-24 OCT image enhancement method based on registration network and application Pending CN116188309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310158800.5A CN116188309A (en) 2023-02-24 2023-02-24 OCT image enhancement method based on registration network and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310158800.5A CN116188309A (en) 2023-02-24 2023-02-24 OCT image enhancement method based on registration network and application

Publications (1)

Publication Number Publication Date
CN116188309A (en) 2023-05-30

Family

ID=86445985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310158800.5A Pending CN116188309A (en) 2023-02-24 2023-02-24 OCT image enhancement method based on registration network and application

Country Status (1)

Country Link
CN (1) CN116188309A (en)

Similar Documents

Publication Publication Date Title
CN111932550B (en) 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning
CN115496771A (en) Brain tumor segmentation method based on brain three-dimensional MRI image design
CN113516586A (en) Low-dose CT image super-resolution denoising method and device
CN113160380A (en) Three-dimensional magnetic resonance image super-resolution reconstruction method, electronic device and storage medium
CN113160226A (en) Two-way guide network-based classification segmentation method and system for AMD lesion OCT image
CN111402174A (en) Single OCT B-scan image denoising method and device
CN117391955A (en) Convex set projection super-resolution reconstruction method based on multi-frame optical coherence tomography
CN114241077A (en) CT image resolution optimization method and device
CN114187181B (en) Dual-path lung CT image super-resolution method based on residual information refining
CN112037304A (en) Two-stage edge enhancement QSM reconstruction method based on SWI phase image
CN116563916A (en) Attention fusion-based cyclic face super-resolution method and system
Wu et al. Noise reduction for SD-OCT using a structure-preserving domain transfer approach
CN114092405A (en) Retina layer automatic segmentation method for macular edema OCT image
CN112562058B (en) Method for quickly establishing intracranial vascular simulation three-dimensional model based on transfer learning
CN116188309A (en) OCT image enhancement method based on registration network and application
CN116630154A (en) Deconvolution super-resolution reconstruction method and device for optical coherence tomography
CN115969400A (en) Apparatus for measuring area of eyeball protrusion
CN115456890A (en) Method for generating anti-medical CT image denoising based on multi-scale dual-domain discriminator
Tian et al. Retinal fundus image superresolution generated by optical coherence tomography based on a realistic mixed attention GAN
Saeedizadeh et al. A Device-Independent, Shape Preserving Retinal Optical Coherence Tomography Image Alignment Method Applying TV-Unet for RPE Layer Detection
Zhang et al. Deep residual network based medical image reconstruction
Vien et al. Moiré artifacts removal in screen-shot images via multiple domain learning
Timothy et al. Spectral bandwidth recovery of optical coherence tomography images using deep learning
Tan et al. A Multi-Scale Fusion and Transformer Based Registration Guided Speckle Noise Reduction for OCT Images
Afzal et al. A novel medical image fusion scheme using weighted sum of multi-scale fusion results

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination