CN116612092A - Microscope image definition evaluation method based on improved MobileViT network - Google Patents


Info

Publication number: CN116612092A
Application number: CN202310595908.0A
Authority: CN (China)
Prior art keywords: model, network, mobilevit, image, learning rate
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 周厚奎, 吴学程, 王陈燕
Current assignee: Zhejiang A&F University (ZAFU)
Original assignee: Zhejiang A&F University (ZAFU)
Application filed by Zhejiang A&F University (ZAFU); priority to CN202310595908.0A
Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06V 10/764: Image or video recognition using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition using neural networks
    • G06V 20/698: Microscopic objects, e.g. biological cells or cellular parts; matching; classification
    • G06T 2207/10056: Image acquisition modality: microscopic image
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Physics & Mathematics
  • General Physics & Mathematics
  • Evolutionary Computation
  • Multimedia
  • Computer Vision & Pattern Recognition
  • Health & Medical Sciences
  • General Health & Medical Sciences
  • Computing Systems
  • Databases & Information Systems
  • Artificial Intelligence
  • Medical Informatics
  • Software Systems
  • Quality & Reliability
  • Life Sciences & Earth Sciences
  • Biomedical Technology
  • Molecular Biology
  • Image Processing

Abstract

The invention discloses a sharpness evaluation method for microscope images, comprising the following steps: acquiring microscope images at different focus distances with a microscope, and applying data enhancement such as rotation and flipping to the images; adjusting the number of MV2 modules and MobileViT block modules in the model to improve its expressive capacity and accuracy; adjusting the model optimizer; adding a cosine annealing learning rate schedule to adjust the learning rate dynamically; adjusting the number of nodes in the model's fully connected layer to convert the classification model into a regression model; and introducing the mean absolute error and mean square error to measure the error between the predicted distance output by the model and the true distance of the image. The invention provides a microscope image sharpness evaluation algorithm based on the MobileViT network that can reliably predict the focus distance of a microscope image.

Description

Microscope image definition evaluation method based on improved MobileViT network
Technical Field
The present invention relates to a method for evaluating the sharpness of an image, and more particularly to a method for evaluating the sharpness of a microscopic image.
Background
About 78% of the information humans receive is obtained through the eyes, which can be regarded as a miniature optical imaging system. The quality of the imaged picture determines how efficiently information is acquired. As science has progressed, the various imaging principles have become better and better understood. To observe the world more effectively, many imaging systems, such as cameras, microscopes, and telescopes, have been designed and manufactured. For any type of optical imaging system, the most important question is how to obtain a clear image.
An unclear image is usually caused by one of two factors. The first lies with the photographed object itself: the scene may be poorly lit, or the object may be moving at the moment of capture. The second lies with the optical imaging system itself: the position of the imaging plane may have shifted, or the lens parameters may be set incorrectly. In the latter case, to make the imaged picture clearer one generally chooses to adjust the position of the imaging plane, which is what we commonly call focusing.
In the early stages of optical imaging systems, the only way to focus an image was the simplest manual focusing, which demands strong professional skill and proficiency from the operator, is highly subjective, takes a long time, and has low accuracy. In contrast, autofocus technology, in which electronic instruments replace the human eye for focus monitoring and control, largely resolves the long focusing times and low accuracy of manual focusing, and its focusing efficiency far exceeds that of manual focusing. Since the end of the 19th century, the increasingly complete theory of optical imaging systems and the continuous advance of industrial manufacturing have laid the foundation for many researchers to study autofocus technology.
As an important part of autofocus technology, the sharpness evaluation method plays a critical role in microscopic imaging. Conventional autofocus methods typically rely on manually designed features and rules, which struggle to accommodate a variety of complex scenes and changing conditions. Deep learning makes it possible to learn feature representations of microscopic image sharpness automatically from data, improving the robustness and accuracy of autofocusing.
Current deep learning sharpness evaluation methods mainly adopt convolutional neural networks (CNNs) as the basic architecture and have achieved remarkable results in image processing. However, because CNNs are relatively weak at capturing global context information, they cannot handle sequence data with long-range dependencies well and are therefore limited on certain tasks. In contrast, the Transformer architecture has performed exceptionally well in natural language processing and is widely used for its excellent ability to capture global context. For the sharpness evaluation task in image processing, however, applications of the Transformer architecture remain very limited, even blank. The invention therefore proposes a sharpness evaluation method based on an improved MobileViT lightweight network model; because the model adopts a Transformer architecture, it offers high flexibility and extensibility, and greatly reduces model complexity and computation while preserving model accuracy.
In summary, the contribution of the invention is to apply the Transformer architecture to the image sharpness evaluation task, providing a sharpness evaluation method based on an improved MobileViT lightweight network model with high flexibility and extensibility, which greatly reduces model complexity and computation while preserving model accuracy.
Disclosure of Invention
The invention aims to provide a MobileViT-network-based sharpness evaluation method for microscope images that can accurately predict the focus distance of a microscope image after model training.
A microscope image sharpness evaluation method based on a MobileViT network may comprise the following steps:
1. data acquisition and enhancement the dataset was photographed using a nikon Eclipse electric microscope, 0.75 NA, 20-fold objective. The samples used for training were 35 study-grade human pathological sections with hematoxylin and eosin staining (Omano OMSK-HP 50). The image was acquired using a 500 ten thousand pixel color camera (Pointgrey BFS-U3-51S 5C-C) with a pixel size of 3.45 [ mu ] m. In the acquisition process, microscope cell images of different focusing states are acquired by moving the sample to 41 different defocus positions (ranging from-10 [ mu ] m to +10 [ mu ] m, with a step size of 0.5 [ mu ] m). In most cases the range-10 μm to +10 μm is sufficient to cover most of the differently focused images. In addition, two different classes of samples were used as prediction sets, the first class of samples being likewise 697 stained tissue slides from hematoxylin and eosin stained research grade human pathology sections (Omano OMSK-HP 50), identical to the slides used in the training dataset (these slides were not used during the training process). The second type of sample is a total of 1312 samples of de-identified H & E skin tissue slides prepared by a separate clinical laboratory (the department of dermatology, university of ct). The first type of sample is referred to as a "same tissue slice" and the second type of sample is referred to as a "different tissue slice".
In addition, to address the limited number of image samples, the data enhancement strategy adopted by the invention comprises two modes, flipping and rotation. Flipping is either vertical or horizontal: vertical flipping swaps the top and bottom halves of the image about its horizontal central axis, and horizontal flipping likewise swaps the left and right halves about its vertical central axis. The rotation operation rotates the image by 90°, 180°, and 270° about the coordinate origin at its upper-left corner, taking the positive x-axis direction as the 0° angle. Through this series of augmentations, the original 13552 pictures were expanded to 72848 pictures, of which 65592 were used as training data and 7256 as validation data.
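The flip-and-rotate augmentation described above can be sketched as follows. This is a minimal NumPy illustration; the actual pipeline and image library are not specified in the patent, and the helper name is hypothetical:

```python
import numpy as np

def augment(img: np.ndarray) -> list:
    """Return the original image together with flipped and rotated variants,
    mirroring the augmentations described: vertical/horizontal flips and
    rotations by 90, 180, and 270 degrees."""
    return [
        img,
        np.flipud(img),      # vertical flip: swap top/bottom about the horizontal axis
        np.fliplr(img),      # horizontal flip: swap left/right about the vertical axis
        np.rot90(img, k=1),  # rotate 90 degrees
        np.rot90(img, k=2),  # rotate 180 degrees
        np.rot90(img, k=3),  # rotate 270 degrees
    ]

patch = np.arange(9).reshape(3, 3)
variants = augment(patch)
print(len(variants))  # 6 variants per input image
```

Applying these six variants per source image multiplies the dataset size, which is consistent with the expansion from 13552 to 72848 pictures reported above (the exact factor also depends on which variants were kept).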
2. Module count adjustment. The MobileViT network consists mainly of MV2 modules and MobileViT block modules; the relative numbers of the two modules affect the computation cost and expressive capacity of the final model, and the invention changes their relative proportion by adjusting the module counts of the two parts. Specifically, the number of MV2 modules in layer2 of the model is reduced from 2 to 1, and the number of MobileViT blocks in layer3 is increased from 2 to 4, improving the expressive capacity and accuracy of the model by increasing the number of MobileViT blocks.
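The module count adjustment can be expressed as a configuration change. The sketch below is illustrative only: the key names are assumptions based on the description, not taken from the patent's actual code:

```python
# Hypothetical per-layer block counts for the baseline MobileViT model.
baseline_config = {
    "layer2_mv2_blocks": 2,
    "layer3_mobilevit_blocks": 2,
}

# Adjustment described in the patent: fewer MV2 blocks in layer2,
# more MobileViT blocks in layer3.
improved_config = dict(
    baseline_config,
    layer2_mv2_blocks=1,
    layer3_mobilevit_blocks=4,
)

print(improved_config)  # {'layer2_mv2_blocks': 1, 'layer3_mobilevit_blocks': 4}
```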
3. Optimizer tuning. To further improve model performance, the invention tries different optimizers, including SGD, Adam, and AdamW. These are all commonly used optimizers; they differ mainly in their optimization algorithm and parameter update strategy. By comparing the effects of the different optimizers, the aim is to find an optimal training strategy that improves the training efficiency and generalization ability of the model. Finally, the trained model is used to predict new microscope images, and its generalization ability and practicality are checked by computing the mean absolute error and mean square error. The invention compares the behavior of different models and optimizers on predicting new images to determine the best model and training strategy.
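To illustrate the distinction the text later draws between Adam and AdamW, the sketch below implements a single AdamW-style parameter update in NumPy, with the weight decay decoupled from the gradient-based step. This is a simplified illustration, not the patent's training code, and the hyperparameter values are placeholders:

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    """One AdamW update: Adam moment estimates plus decoupled weight
    decay applied directly to the weights (not folded into the gradient)."""
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g    # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)        # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

w = np.array([1.0])
w_new, m, v = adamw_step(w, g=np.array([0.5]), m=np.zeros(1), v=np.zeros(1), t=1)
```

The `wd * w` term is the decoupled weight decay that distinguishes AdamW from Adam with L2 regularization.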
4. Cosine annealing learning rate schedule. To further improve the training of the model, a cosine annealing learning rate adjustment strategy is also introduced. This strategy helps the model converge quickly in the early stage of training and gradually lowers the learning rate in the later stage, so that the model learns the features in the data more carefully. Specifically, the invention sets the initial learning rate to a large value and then gradually decreases it over each period, using a cosine function to shape the variation of the learning rate. This schedule lets the model learn data features more efficiently and avoids overfitting late in training. The invention applies the cosine annealing schedule to the MobileViT-based sharpness evaluation method, and comparative experiments show that, for the same number of training epochs, the strategy helps the model obtain better sharpness evaluation results. The cosine annealing learning rate schedule therefore plays an important role in the method herein. The mathematical formula is expressed as follows:
η_t = η_min^i + (1/2)(η_max^i − η_min^i)(1 + cos((T_cur / T_i)π))

where i is the index of the learning rate adjustment period, η_max^i and η_min^i respectively represent the maximum and minimum learning rates, T_cur records the number of iterations elapsed since the last restart, and T_i is the total number of iterations in each adjustment period.
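The schedule can be sketched in a few lines of plain Python; the maximum and minimum learning rates below are placeholder values, not the patent's settings:

```python
import math

def cosine_annealing_lr(t_cur, t_max, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate: equals lr_max at t_cur = 0 and
    decays smoothly to lr_min at t_cur = t_max."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_max))

# Learning rate over one 100-iteration adjustment period.
lrs = [cosine_annealing_lr(t, 100) for t in range(101)]
```

The curve starts flat near `lr_max` (fast early convergence), falls steeply mid-period, and flattens again near `lr_min` (fine-grained learning late in training), which is the behavior the text describes.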
5. Regression network. To better adapt to the microscopic image sharpness evaluation task, the invention converts the original classification network into a regression network. In a classification network, the model outputs a probability for each class, and the focus distance of the image must be inferred from these probabilities. In a regression network, the model outputs a continuous real value that directly represents the focus distance of the image, which better matches the invention's actual requirements for sharpness. Specifically, when adapting the network structure, the invention replaces the last layer of the network with two fully connected layers using the ReLU activation function, followed by a fully connected layer with a single node whose output is the focus distance of the image. The mean square error (MSE) is used as the loss function to measure the difference between the predicted focus distance and the actual focus distance, and the network parameters are updated by backpropagation.
Converting the classification network into a regression network improves the adaptability of the model, reduces the computation and parameter count, and thereby improves the efficiency and practicality of the model. In experiments, the invention finds that this adjustment improves the performance of the sharpness evaluation method to some extent and yields more accurate and reliable results for sharpness evaluation of microscopic images.
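The regression head described in step 5 (two ReLU fully connected layers followed by a single-node output, trained with MSE) can be sketched as a plain NumPy forward pass. The layer widths are arbitrary assumptions for illustration:

```python
import numpy as np

def regression_head(features, w1, b1, w2, b2, w3, b3):
    """Two fully connected layers with ReLU, then a single-node linear
    layer whose scalar output is the predicted focus distance."""
    h = np.maximum(features @ w1 + b1, 0.0)  # FC + ReLU
    h = np.maximum(h @ w2 + b2, 0.0)         # FC + ReLU
    return h @ w3 + b3                       # FC with 1 output node

def mse_loss(pred, target):
    """Mean square error between predicted and true focus distances."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 8))                   # batch of 2 feature vectors
w1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
w2, b2 = rng.standard_normal((16, 16)), np.zeros(16)
w3, b3 = rng.standard_normal((16, 1)), np.zeros(1)
pred = regression_head(feats, w1, b1, w2, b2, w3, b3)  # shape (2, 1)
```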
6. Model prediction. The images used in prediction are 2448×2048, whereas the images used in training and validation are 224×224; during prediction, each image to be predicted is therefore cut into 90 sub-images of 224×224, the remainder is discarded, regression prediction is run on the 90 sub-images, and the mean absolute error and mean square error of each image are computed. Moreover, some areas of a microscope image may be empty, i.e., contain no extractable information, and the focus distances predicted there are likely to be outliers; if left unhandled, they would introduce a large error into the overall prediction result. Areas with little or no contrast are likewise prone to outliers. Taking the median of all predicted focus distances of an image as that image's predicted distance effectively avoids these problems. The mean absolute error and mean square error are expressed as follows:
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|,  MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

where y_i denotes the true focus distance and ŷ_i denotes the focus distance predicted by the model.
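The tiling and median aggregation of step 6 can be sketched as follows. A 2448×2048 image holds exactly 10×9 = 90 non-overlapping 224×224 tiles, matching the 90 sub-images mentioned above; the helper names are illustrative:

```python
import numpy as np

def tile_image(img, tile=224):
    """Cut non-overlapping tile x tile patches row by row from the
    top-left corner, discarding the right/bottom remainder."""
    h, w = img.shape[:2]
    return [img[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            for r in range(h // tile) for c in range(w // tile)]

def image_focus_distance(tile_predictions):
    """Median over per-tile predictions, suppressing outliers from
    empty or low-contrast regions of the slide."""
    return float(np.median(tile_predictions))

frame = np.zeros((2048, 2448))   # height x width of one prediction image
tiles = tile_image(frame)
print(len(tiles))  # 90
```

Using the median rather than the mean is what makes the per-image distance robust to the outlier tiles described above.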
7. Model comparison. Other deep learning models are compared with the improved MobileViT network model; the model parameters and other data of these networks are compared, and the same "same tissue sections" and "different tissue sections" prediction samples used in the experiments are predicted and evaluated respectively. Finally, with the predicted distances of all images in the prediction set as the ordinate and their true distances as the abscissa, error scatter plots are drawn for the different models.
The invention has the following characteristics:
1. The invention provides a sharpness evaluation method based on an improved MobileViT network that, after model training, can correctly predict the focus distance of a microscope image.
2. The system is simple to implement; the core method requires only a single computer, and once the system is running, microscope image focus distance prediction only requires loading the trained weights.
Drawings
FIG. 1 is a flow chart of the method according to the present invention
FIG. 2 is a diagram of the improved MobileViT network architecture of the present invention
FIG. 3 is a structural diagram of the MV2 module
FIG. 4 is a structural diagram of the MobileViT block module
FIG. 5 is an example of training image data augmentation
FIG. 6 is an example of a prediction image
FIG. 7 is an error scatter plot of the improved MobileViT network on the prediction images
FIG. 8 is an error scatter plot of the comparison models on the "same tissue sections" prediction images
FIG. 9 is an error scatter plot of the comparison models on the "different tissue sections" prediction images
Detailed Description
The invention will be further described with reference to the drawings and the specific embodiments.
The sharpness evaluation method based on an improved MobileViT network provided by the invention is described in detail below with reference to FIGS. 1 to 9:
A flowchart of the method according to the present invention is shown in FIG. 1. First, to address the limited number of image samples, the data enhancement strategy adopted by the invention comprises flipping and rotation: vertical flipping swaps the top and bottom halves of the image about its horizontal central axis, and horizontal flipping likewise swaps the left and right halves about its vertical central axis. Through this series of augmentations, the original 13552 pictures were expanded to 72848 pictures, of which 65592 were used as training data and 7256 as validation data. Images are fed into the modified MobileViT network in batches for regression training, with the input image size fixed at 224×224. The experiment trains on the input microscope image set via regression model prediction, sums and averages the training results for each image, and finally computes the average focus distance over all training images. In the prediction stage, the mean absolute error (MAE) and mean squared error (MSE) are introduced as evaluation indicators of model quality; the smaller the value, the closer the prediction is to the actual result. The mathematical formulas are expressed as follows:
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|,  MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

where y_i denotes the true focus distance and ŷ_i denotes the focus distance predicted by the model.
It should be noted that the images used in prediction are 2448×2048, while those used in training and validation are 224×224; during prediction, each image to be predicted is therefore cut into 90 sub-images of 224×224, the remainder is discarded, regression prediction is run on the 90 sub-images, and the mean absolute error and mean square error of each image are computed. Moreover, some areas of a microscope image may be empty, i.e., contain no extractable information, and the focus distances predicted there are likely to be outliers; if left unhandled, they would introduce a large error into the overall prediction result. Areas with little or no contrast are likewise prone to outliers. Taking the median of all predicted focus distances of an image as that image's predicted distance effectively avoids these problems. Finally, an error scatter plot is drawn.
FIG. 2 shows the structure of the improved MobileViT network of the invention, which consists mainly of MV2 modules and MobileViT block modules; the relative numbers of the two modules affect the computation cost and expressive capacity of the final model, and their relative proportion is changed by adjusting the module counts. Specifically, the number of MV2 modules in layer2 of the model is reduced from 2 to 1, and the number of MobileViT blocks in layer3 is increased from 2 to 4, improving the expressive capacity and accuracy of the model by increasing the number of MobileViT blocks.
As shown in FIG. 3, in the MV2 structure a 1×1 convolution kernel first raises the dimensionality of the image, deepening the channels; a depthwise separable convolution is then performed with a 3×3 kernel, and the dimensionality is reduced again with a 1×1 kernel. The activation function is set to ReLU6: inputs below 0 are set to 0, inputs in the interval 0 to 6 are passed unchanged, and inputs above 6 are clipped to 6. The specific expression is:

ReLU6(x) = min(max(0, x), 6)
as shown in fig. four, a diagram of a MobileViT Block is shown, in the MobileViT Block, a characteristic diagram with the number of high H and wide W channels being C is input first, local characterization (Local representations) is completed by convolving with a convolution kernel with the size of 3×3 and then adjusting the number of channels with a convolution kernel with the size of 1×1, then, after global characterization (global representations) is performed, L Transformer Block are unfolded and then the characteristic diagram is folded back, the number of channels is restored to be the same as the number of channels of the input characteristic diagram again with a convolution kernel with the size of 1×1, and then, two characteristic diagrams are subjected to a splicing operation by introducing shortcut branches and are fused with a convolution kernel with the size of 3×3.
To improve the robustness and generalization ability of the model, several data augmentation methods, including rotation and horizontal and vertical flipping, are applied to the training data; these methods increase the diversity and size of the dataset and effectively prevent overfitting. In addition, two different classes of samples were used as prediction sets. The first class comprises 697 stained tissue slides, likewise from hematoxylin-and-eosin-stained research-grade human pathology sections (Omano OMSK-HP50), identical to the slides used in the training dataset (these slides were not used during training). The second class comprises 1312 de-identified H&E skin tissue slides prepared by a separate clinical laboratory (the Department of Dermatology, University of CT). The first class is referred to as "same tissue sections" and the second class as "different tissue sections".
The error scatter plot of the improved MobileViT network on the prediction images, and the error scatter plots of the comparison models on the "same tissue sections" and "different tissue sections" prediction images, are shown in FIGS. 7, 8, and 9 and are used in the microscopic image sharpness evaluation experiments. First, to verify the training efficiency and generalization ability of different optimizers on the MobileViT network model used in the experiments, regression prediction was run with the prediction sets on network models trained with each optimizer, yielding the mean absolute error and mean square error under the different optimizers; the two classes of prediction samples, "same tissue sections" and "different tissue sections", were predicted and evaluated separately. The specific experimental results are shown in Table 1; all values are rounded to four decimal places.
According to Table 1, when the network model is optimized with different optimizers, the AdamW optimizer achieves the best results for the model used herein, indicating that AdamW is better suited to this task than SGD and Adam. AdamW adds a weight decay term on top of the Adam optimizer, which effectively controls overfitting of the model and handles L2 regularization better. The invention therefore adopts AdamW as the optimizer of the training model herein.
Table 1 Prediction results of the MobileViT network under different optimizers

            Same tissue sections      Different tissue sections
Optimizer   MAE       MSE             MAE       MSE
SGD         1.3894    2.1445          2.0206    6.4182
Adam        0.6643    0.5255          1.2169    1.8251
AdamW       0.4481    0.3118          0.8443    1.3299
Second, after selecting the model optimizer, the invention runs comparison experiments between the deep learning models AlexNet, VGG16, ResNet50, MobileNetV2, MobileNetV3, and ConvNeXt and the improved MobileViT network model adopted by the invention, comparing data such as the computation cost and parameter counts of these networks; likewise, prediction was carried out on the two classes of prediction samples, "same tissue sections" and "different tissue sections", used in the experiments. The specific experimental results are shown in Tables 2 and 3. Finally, with the predicted distances of all images in the prediction set as the ordinate and their true distances as the abscissa, error scatter plots are drawn for the different models.
Table 2 Comparison of results for "same tissue sections" under the AdamW optimizer

Model               FLOPs    Params   MAE      MSE
AlexNet             0.71B    61.1M    2.0755   4.4333
VGG16               154B     0.1G     1.9089   3.8790
ResNet50            4.14B    25.6M    0.5119   0.3430
MobileNetV2         0.33B    3.5M     1.3175   1.9603
MobileNetV3         0.66B    4.1M     1.1490   1.5785
ConvNeXt            0.38B    15.4M    1.1635   1.3871
Improved MobileViT  0.24B    3.6M     0.4481   0.3118
According to Table 2, under the AdamW optimizer, because the prediction images used were likewise stained tissue slides from hematoxylin-and-eosin-stained research-grade human pathology sections (Omano OMSK-HP50), both the mean absolute error and the mean square error are smaller than those obtained on the "different tissue sections" prediction images.
In addition, compared with ResNet50, the improved MobileViT network adopts a Transformer architecture, so it not only has stronger representational power and better generalization, but can also use the Transformer's attention mechanism to improve the network's feature extraction and position awareness, further improving model performance. MobileViT also reduces the computational complexity and memory footprint of the model through well-designed depthwise separable convolutions, attention mechanisms, and a modular structure, and uses techniques such as data augmentation and optimizer tuning during training to optimize network performance. These optimizations allow the improved MobileViT network of the invention to outperform the conventional ResNet50 network in both training speed and model performance. Thus, in Table 2, the improved MobileViT network model used herein achieves the minimum mean absolute error and mean square error, about 14.2% and 10% lower than those of the second-best ResNet50 network.
Table 3  Comparison of results for the "different tissue section" under the AdamW optimizer
Model               FLOPs   Params   MAE      MSE
AlexNet             0.71B   61.1M    3.6133   17.949
VGG16               154B    0.1G     2.9609   9.3219
ResNet50            4.14B   25.6M    0.7650   0.9912
MobileNetV2         0.33B   3.5M     2.0234   4.3142
MobileNetV3         0.66B   4.1M     1.9044   3.9488
ConvNeXt            0.38B   15.4M    1.2581   1.6388
Improved MobileViT  0.24B   3.6M     0.8443   1.3299
However, as shown in Table 3, the improved MobileViT network model proposed by the invention achieves only the second-lowest mean absolute error and mean squared error on the "different tissue section" predicted images. This is probably because the similarity between the predicted images and the training images is low, and MobileViT, as a lightweight network model, is somewhat weaker than ResNet50 in this case, so its mean absolute error and mean squared error do not reach the minimum. Nevertheless, they are only about 9.4% and 25.5% greater than those of the smallest-error ResNet50 network.
As shown in Figures 7 and 8, taking the predicted distances of all prediction-set images and their true distances as the ordinate and abscissa respectively, focusing-error scatter diagrams of the comparison models were drawn. Because the deviations between the predicted and actual results of the AlexNet and VGG16 networks were too large, these two networks are not shown in the scatter diagrams. The focusing-error scatter diagram of the improved MobileViT network is shown in Figure 6; it can be seen that the errors between the predicted and true distances of the improved MobileViT network used in the invention are essentially concentrated in the range of -0.5 μm to 0.5 μm, while the predicted distances of the other network models are slightly worse. This further indicates that the optimized fine-tuning applied to the MobileViT network improves its performance to a certain extent.
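A focusing-error scatter diagram of the kind described above can be sketched with matplotlib; the distances below are synthetic stand-ins for the prediction set (the real prediction data are not reproduced in the patent), and the ±0.3 μm noise level is an illustrative assumption:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
true_dist = rng.uniform(-10.0, 10.0, size=200)       # synthetic true defocus distances (um)
pred_dist = true_dist + rng.normal(0.0, 0.3, 200)    # synthetic predictions with small error

plt.figure(figsize=(5, 5))
plt.scatter(true_dist, pred_dist, s=8)
plt.plot([-10, 10], [-10, 10], "r--", label="ideal (error = 0)")
plt.xlabel("True distance (um)")
plt.ylabel("Predicted distance (um)")
plt.legend()
plt.savefig("focus_error_scatter.png")

errors = pred_dist - true_dist  # points near the diagonal have errors near zero
```

Points clustered tightly around the red diagonal correspond to the narrow error band reported for the improved network; a wider spread, as for the comparison models, shows up as scatter away from the line.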

Claims (7)

1. A microscope image definition evaluation method based on an improved MobileViT deep neural network, characterized by adopting image preprocessing, an improved MobileViT network, optimizer selection, and a cosine annealing learning rate adjustment strategy, adjusting the network to a regression output so as to evaluate the definition of a microscope image, and calculating the error between the output predicted distance and the true distance of the image.
2. The microscope image preprocessing method according to claim 1, wherein each image input to the network is subjected to size normalization, the image being scaled to 224 × 224 pixels; the image data are augmented by applying vertical mirroring, horizontal mirroring, and rotations of 90°, 180° and 270° to the divided images.
3. The improved MobileViT network according to claim 1, wherein the MobileViT network is composed of MV2 modules and MobileViT blocks; the number of each module affects the computational cost and expressive capacity of the final model, and their relative weight is changed by adjusting the ratio of the two module counts; specifically, the number of MV2 blocks in the second layer of the model is reduced from 2 to 1, and the number of MobileViT blocks in the third layer is increased from 2 to 4, so that the expressive capacity and accuracy of the model are improved by increasing the number of MobileViT blocks.
4. The optimizer selection according to claim 1, wherein, through comparison experiments with different optimizers, the AdamW optimizer is finally selected as the optimizer of the invention.
5. The cosine annealing learning rate adjustment strategy according to claim 1, wherein a cosine annealing learning rate adjustment strategy is employed to optimize the training effect of the model; the cosine annealing strategy is a cosine-function-based learning rate decay strategy that dynamically adjusts the learning rate during training, gradually reducing it from an initial value to a smaller value through the periodic variation of a cosine function, which effectively prevents the model from falling into a local optimum and improves its generalization performance; at the end of each cycle, the learning rate is reset to the initial value to maintain a certain randomness and diversity; when optimizing the objective function with an optimization algorithm, the learning rate should become smaller as the loss approaches its global minimum so that the model can settle as close as possible to that point, and cosine annealing reduces the learning rate through a cosine function: the cosine value first descends slowly as the number of iterations increases, then descends rapidly, and finally descends slowly again, a descent pattern that matches the desired learning rate behaviour in a computationally efficient way; the mathematical formula is expressed as

η_t = η_min^i + (1/2)(η_max^i − η_min^i)(1 + cos(T_cur/T_i · π)),

wherein i is the index value of the learning rate adjustment period, η_max^i and η_min^i respectively represent the maximum value and the minimum value of the learning rate, T_cur records the number of iterations elapsed since the last restart, and T_i is the total number of iterations per adjustment period.
6. The regression network model according to claim 1, wherein the standard MobileViT classification network is changed into a regression network by replacing the last layer of the network with two fully connected layers containing ReLU activation functions and adding a fully connected layer with a single node, thereby adjusting the network into a regression output network model.
7. The error calculation method according to claim 1, wherein the mean absolute error MAE and the mean squared error MSE are introduced as evaluation indexes; the error between the focusing distance predicted by the model on the microscope image and the true focusing distance of the image serves as the evaluation index of the method, and the smaller the error value, the closer the prediction result is to the actual result; the mathematical formulas are expressed as

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|,   MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²,

wherein y_i denotes the true focus distance and ŷ_i denotes the focus distance predicted by the model.
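The cosine annealing schedule of claim 5 can be sketched directly from its formula; the values η_max = 1e-3 and η_min = 1e-5 below are illustrative defaults, not taken from the patent:

```python
import math

def cosine_annealing_lr(t_cur, t_i, lr_max=1e-3, lr_min=1e-5):
    """Learning rate after t_cur of t_i iterations in the current period:
    lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t_cur / t_i))."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))

lr_start = cosine_annealing_lr(0, 100)    # start of a period: equals lr_max
lr_mid = cosine_annealing_lr(50, 100)     # midpoint: (lr_max + lr_min) / 2
lr_end = cosine_annealing_lr(100, 100)    # end of a period: equals lr_min
print(lr_start, lr_mid, lr_end)
```

Resetting `t_cur` to 0 at the end of each period reproduces the "warm restart" behaviour described in the claim, where the rate jumps back to the initial value.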
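The evaluation indexes of claim 7 reduce to a few lines of Python; the focus-distance lists below are illustrative values, not data from the patent:

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted focus distances."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error between true and predicted focus distances."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative focus distances in micrometres
y_true = [1.0, -2.0, 0.5, 3.0]
y_pred = [1.2, -1.5, 0.4, 2.8]
print(mae(y_true, y_pred))  # mean of |0.2|, |0.5|, |0.1|, |0.2| -> 0.25
print(mse(y_true, y_pred))  # mean of 0.04, 0.25, 0.01, 0.04 -> 0.085
```

Because MSE squares each deviation, it penalizes large focusing misses more heavily than MAE, which is why both are reported in Tables 2 and 3.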
CN202310595908.0A 2023-05-25 2023-05-25 Microscope image definition evaluation method based on improved MobileViT network Pending CN116612092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310595908.0A CN116612092A (en) 2023-05-25 2023-05-25 Microscope image definition evaluation method based on improved MobileViT network


Publications (1)

Publication Number Publication Date
CN116612092A true CN116612092A (en) 2023-08-18

Family

ID=87681439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310595908.0A Pending CN116612092A (en) 2023-05-25 2023-05-25 Microscope image definition evaluation method based on improved MobileViT network

Country Status (1)

Country Link
CN (1) CN116612092A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132595A (en) * 2023-10-25 2023-11-28 北京市肿瘤防治研究所 Intelligent light-weight processing method and system for DWI (discrete wavelet transform) images of rectal cancer and cervical cancer
CN117132595B (en) * 2023-10-25 2024-01-16 北京市肿瘤防治研究所 Intelligent light-weight processing method and system for DWI (discrete wavelet transform) images of rectal cancer and cervical cancer
CN117132646A (en) * 2023-10-26 2023-11-28 湖南自兴智慧医疗科技有限公司 Split-phase automatic focusing system based on deep learning
CN117132646B (en) * 2023-10-26 2024-01-05 湖南自兴智慧医疗科技有限公司 Split-phase automatic focusing system based on deep learning
CN117764940A (en) * 2023-12-19 2024-03-26 珠海圣美生物诊断技术有限公司 Microscope state detection method, device, computer equipment and storage medium
CN118102093A (en) * 2024-04-26 2024-05-28 深圳市泓嘉精密影像有限公司 Lens focusing method and system based on motor drive
CN118102093B (en) * 2024-04-26 2024-06-28 深圳市泓嘉精密影像有限公司 Lens focusing method and system based on motor drive

Similar Documents

Publication Publication Date Title
CN116612092A (en) Microscope image definition evaluation method based on improved MobileViT network
CN111007661B (en) Microscopic image automatic focusing method and device based on deep learning
CN109754017B (en) Hyperspectral image classification method based on separable three-dimensional residual error network and transfer learning
CN109873948B (en) Intelligent automatic focusing method and device for optical microscope and storage device
WO2019006221A1 (en) Generating high resolution images from low resolution images for semiconductor applications
CN111968138B (en) Medical image segmentation method based on 3D dynamic edge insensitivity loss function
CN111476266A (en) Non-equilibrium type leukocyte classification method based on transfer learning
CN113034411B (en) Road disease picture enhancement method coupling traditional method and depth convolution countermeasure generation network
CN112614072B (en) Image restoration method and device, image restoration equipment and storage medium
CN107329233B (en) A kind of droplet type PCR instrument Atomatic focusing method neural network based
CN113228096A (en) Optical correction by machine learning
JP2024526504A (en) Method, apparatus, device and computer program for training a lithography mask generation model
CN114092760A (en) Self-adaptive feature fusion method and system in convolutional neural network
CN112633257A (en) Potato disease identification method based on improved convolutional neural network
CN114926498B (en) Rapid target tracking method based on space-time constraint and leachable feature matching
CN114266898A (en) Liver cancer identification method based on improved EfficientNet
CN110879874A (en) Astronomical big data optical variation curve abnormity detection method
Zhang et al. Fish swarm window selection algorithm based on cell microscopic automatic focus
CN113744195A (en) Deep learning-based automatic detection method for hRPE cell microtubules
CN110785709B (en) Generating high resolution images from low resolution images for semiconductor applications
CN112689099B (en) Double-image-free high-dynamic-range imaging method and device for double-lens camera
CN116416468B (en) SAR target detection method based on neural architecture search
CN113052810B (en) Small medical image focus segmentation method suitable for mobile application
CN109102476B (en) Multispectral image defocusing fuzzy kernel estimation method based on circle of confusion fitting
CN112070660A (en) Full-slice digital imaging self-adaptive automatic focusing method based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination