CN115482160A - Tongue color correction method based on deep convolution neural network - Google Patents

Tongue color correction method based on deep convolution neural network

Info

Publication number
CN115482160A
CN115482160A (application CN202210948496.XA)
Authority
CN
China
Prior art keywords
neural network
data
color
color correction
tongue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210948496.XA
Other languages
Chinese (zh)
Inventor
贺宁波
李志平
陈占春
李治
程旭
李志娟
王倩倩
何平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bayes Health Technology Co ltd
Original Assignee
Shanghai Bayes Health Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bayes Health Technology Co ltd filed Critical Shanghai Bayes Health Technology Co ltd
Priority to CN202210948496.XA priority Critical patent/CN115482160A/en
Publication of CN115482160A publication Critical patent/CN115482160A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

A tongue color correction method based on a deep convolutional neural network relates to the field of medical image color correction and solves the problems of color deviation and color distortion that arise when tongue image data are collected with mobile equipment under non-standard light sources. Data collected under color-cast light sources are uniformly scaled. A deep convolutional neural network is trained on a data set collected under a standard light source, the network output is passed downstream to a per-pixel classification task, each pixel point is converted from RGB space to HSV space, the point in HSV space closest to the current pixel is computed, the closest pixel value is taken as the standard value, and finally the RGB value corresponding to that HSV value is computed to restore the color space, realizing color correction of the tongue image. The invention can restore the true color of the tongue image to the greatest extent.

Description

Tongue color correction method based on deep convolution neural network
Technical Field
The invention relates to the field of medical image color correction, in particular to a tongue color correction method based on a deep neural network.
Background
Tongue diagnosis is a distinctive and effective diagnostic method in traditional Chinese medicine and still plays an important role in clinical practice today. At present there are two main approaches to color correction: one is correction based on image analysis, such as mean white balance, perfect reflection, and the gray-world assumption; the other is a color correction method based on a deep fully convolutional network. The problems with these two correction approaches are as follows:
1. Neither method refers to a concrete standard, such as a standard color patch, during correction, so the final correction result cannot be evaluated against the pixel values of standard color patches;
2. Even when correction is performed with a fully convolutional network, the network structure often prevents the true color of each pixel from being estimated, and local exposure problems can even cause pixel-value overflow in some images.
3. Both methods correct the image as a whole and cannot apply a one-to-one mapping correction to each pixel point, so the correction accuracy is not ideal.
In conclusion, the novel tongue image correction method provided by the invention offers a useful reference for research that combines deep learning with tongue diagnosis.
Disclosure of Invention
The invention aims to solve the problem of color deviation that arises when tongue image data are collected with mobile equipment under non-standard light sources, and provides a tongue color correction method based on a deep neural network.
A tongue color correction method based on a deep neural network comprises the following steps:
step one, acquiring image data and dividing it into a pre-trained data set and a test set to be corrected;
step two, constructing a deep neural network model based on Unet for a pre-trained data set;
step three, transmitting the pre-trained data set into the neural network model constructed in the step two for training to obtain a trained neural network model;
step four, transferring the test set to be corrected into the neural network model trained in the step three for prediction to obtain a prediction result; the method specifically comprises the following steps:
step 4.1, mapping the output prediction result from RGB space into HSV space, computing the similarity distance between the predicted HSV pixel value and the standard color points in HSV space, and finding the standard pixel point closest to the predicted HSV pixel value;
step 4.2, taking the closest standard pixel point in HSV space as the standard pixel point for color correction, and finally converting from HSV space back to RGB space to realize color correction.
The invention has the beneficial effects that:
the tongue color correction method provided by the invention proposes a network structure based on Unet: first, in the encoding-decoding stage, a deep fully convolutional network extracts color-patch information close to the tongue color under a standard light source; then, at the output end, each pixel carrying the extracted tongue-image features is classified using the softmax multi-class idea, the pixels are mapped to a standard color space to compute a similarity distance, and the pixel value of the closest standard color patch is used as the corrected standard value. Combining the fully convolutional network with the standard-color-patch mapping idea allows the tongue color to be corrected to the greatest extent.
The method combines the advantages of several classical networks in its deep neural network structure, treats color correction as a multi-class prediction task, and selects the standard pixel point closest to the predicted classification result in HSV space as the color-correction pixel, thereby restoring the true color of the tongue image to the greatest extent.
Drawings
FIG. 1 is a flow chart of a tongue color correction method based on a deep neural network according to the present invention;
FIG. 2 is a schematic structural diagram of the deep neural network.
Detailed Description
The first embodiment is described with reference to fig. 1 and 2, and the tongue color correction method based on the deep neural network is implemented by the following steps:
step 1: image data is acquired.
Step 1.1: Data collection. The data set collected with mobile equipment under a standard light source is scaled to a uniform size and augmented, and serves as the pre-trained data set; the test data set to be corrected is collected with the mobile equipment under various color-cast light sources;
step 1.2: and data processing, namely performing unified size scaling on the image data, wherein the image distortion can be caused by a common scaling method, so that the image data is scaled by adopting a letterbox method. And then carrying out data augmentation on the zoomed data by methods such as turning, translation, rotation and the like. And carrying out scaling processing on the test data set to be corrected collected under the color cast light source in the same way, but not carrying out data enhancement. The specific pre-trained data set flipping method is as follows: horizontally and vertically turning; the rotating mode is as follows: the tongue picture is respectively rotated by 5 degrees, 10 degrees, 15 degrees, 30 degrees and 45 degrees to increase data.
Step 2: A deep convolutional neural network for training is constructed based on the characteristics of the Unet network.
Step 2.1: The whole deep neural network is built with the TensorFlow deep learning framework and consists of three parts: a backbone (downsampling) layer, a neck (depth-separable convolution) layer, and a head (upsampling) layer. The structure of the whole convolutional neural network is shown in FIG. 2.
The implementation of the backbone downsampling layer is as follows: feature fusion and extraction are performed with the NIN module from GoogLeNet Inception V1. The advantages of the NIN module are that large and small convolutions are connected in parallel within the same layer, which widens the network while increasing its adaptability to scale; convolution kernels with different receptive fields run in parallel, so the network can learn whichever information is useful to it; and the 1 x 1 convolution kernel fuses information across channels, reducing the number of channels and greatly reducing the parameter computation while achieving dimension reduction. In other words, the NIN structure fuses large and small objects in the feature map well and has good local abstraction ability. After each convolution, a ReLU activation is followed by batch normalization, which keeps the data distribution before and after the convolution as consistent as possible. The NIN idea is applied three times in succession, and a max pooling layer with stride 2 follows each NIN module to reduce the dimension and realize downsampling. The number of NIN modules is flexible and can be increased or decreased as required to control the network depth.
In this embodiment, to prevent gradient vanishing or explosion when parameter updates are back-propagated as the network deepens, the shortcut idea from residual networks is adopted. The convolution kernel size in the shortcut is set to 1 x 1, and the stride can be customized according to the input image size, so that an add operation can be performed at the output of every NIN module.
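A sketch of one backbone stage under the above description, assuming the Keras functional API; the branch filter counts are illustrative assumptions, since the text does not give exact widths.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn(x, filters, kernel):
    # Convolution -> ReLU -> batch normalization, as described above.
    x = layers.Conv2D(filters, kernel, padding="same", activation="relu")(x)
    return layers.BatchNormalization()(x)

def nin_stage(x, filters=64):
    """Inception/NIN-style stage: parallel 1x1, 3x3, 5x5 and pooling branches,
    a residual shortcut, then stride-2 max pooling for downsampling."""
    b1 = conv_bn(x, filters, 1)                               # 1x1 channel fusion
    b2 = conv_bn(conv_bn(x, filters, 1), filters, 3)          # 1x1 then 3x3
    b3 = conv_bn(conv_bn(x, filters, 1), filters, 5)          # 1x1 then 5x5
    b4 = conv_bn(layers.MaxPooling2D(3, strides=1, padding="same")(x), filters, 1)
    merged = layers.Concatenate()([b1, b2, b3, b4])
    # Shortcut: a 1x1 convolution matches the channel count so the add is valid.
    shortcut = layers.Conv2D(4 * filters, 1, padding="same")(x)
    merged = layers.Add()([merged, shortcut])
    return layers.MaxPooling2D(pool_size=2, strides=2)(merged)  # downsampling
```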
The implementation of the neck depth-separable convolutional layer is as follows: compared with ordinary convolution it requires fewer parameters, the key point being that depthwise-separable convolution separates the channel and spatial dimensions, which makes the deployed model lighter. The parameters of the three depthwise-separable convolutions are set as follows: the first layer has 256 convolution kernels of size 1 x 1, stride 1, two output channels per input channel in the depthwise convolution, and a dilation coefficient of 1; the second layer has 256 kernels of size 1 x 1, stride 1, two output channels per input channel, and a dilation coefficient of 2; the third layer has 256 kernels of size 1 x 1, stride 1, two output channels per input channel, and a dilation coefficient of 4. Finally, the input end and the output end are joined by a skip connection.
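A sketch of the neck with the parameter settings quoted above (256 filters, 1 x 1 kernels, stride 1, depth multiplier 2, dilation coefficients 1, 2 and 4, plus a skip connection); the 1 x 1 projection used to make the skip addition shape-compatible is an assumption.

```python
from tensorflow.keras import layers

def neck(x):
    """Three consecutive depthwise-separable convolutions with dilation 1, 2, 4,
    followed by a skip connection from the block input to its output."""
    y = x
    for dilation in (1, 2, 4):
        y = layers.SeparableConv2D(filters=256, kernel_size=1, strides=1,
                                   padding="same", depth_multiplier=2,
                                   dilation_rate=dilation, activation="relu")(y)
    skip = layers.Conv2D(256, 1, padding="same")(x)   # match channels for the add
    return layers.Add()([y, skip])
```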
The head upsampling layer specifically comprises three consecutive upsamplings with block_size set to 2. Compared with ordinary upsampling, this largely avoids an overly low resolution, because the main function of PixelShuffle is to recombine a low-resolution feature map and the multiple channels of the convolution output into a high-resolution feature map; it has therefore become an effective upsampling method for super-resolution problems.
In this embodiment, concatenation (concat) along the channel dimension is performed at each upsampling to fuse feature maps of different scales; this upsampling scheme is used to make the downstream classification task more accurate. The resulting feature map is then globally average-pooled, and finally a fully connected layer realizes the multi-class classification.
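A sketch of the head under the above description, where PixelShuffle is realized with tf.nn.depth_to_space (block_size 2), each upsampling result is concatenated with an encoder feature map of matching scale, and the output is globally average-pooled before a softmax layer. The 143-class output size anticipates the color reference chart used in step 4, and the spatial compatibility of the skip features is assumed.

```python
import tensorflow as tf
from tensorflow.keras import layers

def head(x, encoder_features, num_classes=143):
    """Three depth_to_space upsamplings; after each one, concatenate with the
    encoder feature map of the same scale, then pool and classify."""
    for skip in encoder_features:                     # ordered deepest to shallowest
        x = layers.Lambda(lambda t: tf.nn.depth_to_space(t, block_size=2))(x)
        x = layers.Concatenate()([x, skip])
    x = layers.GlobalAveragePooling2D()(x)
    return layers.Dense(num_classes, activation="softmax")(x)
```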
Step 3: The pre-trained data sets are fed into the constructed neural network model in batches for training. The concrete implementation is as follows:
the data set is divided into data with the proportion of 5; 1/5 of the training set is determined as a verification set, model parameters are selected through the verification set, and if overfitting (overfitting) occurs, training (training) can be terminated in advance; 1/6 is defined as a test set, and data pollution can be prevented when a final model test is selected through the test set; and verifying whether the model overlaps or not by using a cross-verification mode during training.
In this embodiment, the number of training epochs is set to 50, the batch size fed into the network model each time is set to 32, and the learning rate is dynamic, i.e. it first increases and then decreases as training proceeds, which makes the convergence process more flexible; the loss function is the cross-entropy loss.
In this embodiment, Adam is chosen as the gradient optimizer because it has great advantages in non-convex optimization: parameter updates are unaffected by rescaling of the gradient; the hyperparameters are easy to interpret and require little or no fine tuning; the update step size is approximately bounded; an annealing process (automatic adjustment of the learning rate) arises naturally; and it is suitable for unstable objective functions. Unlike the traditional gradient descent algorithm, which keeps a single learning rate for all weight updates and does not change it during training, Adam computes first- and second-moment estimates of the gradients to design independent adaptive learning rates for different parameters.
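A sketch of the training configuration described above (Adam, 50 epochs, batch size 32, cross-entropy loss, and a learning rate that first rises and then decays), assuming a Keras model built as in step 2 and the arrays from the split above; the warm-up length, peak learning rate, and early-stopping patience are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

def train(model, x_train, y_train, x_val, y_val,
          epochs=50, batch_size=32, peak_lr=1e-3, warmup=5):
    def schedule(epoch, lr):
        # Learning rate rises linearly for a few epochs, then decays (cosine).
        if epoch < warmup:
            return peak_lr * (epoch + 1) / warmup
        progress = (epoch - warmup) / max(1, epochs - warmup)
        return peak_lr * 0.5 * (1.0 + np.cos(np.pi * progress))

    model.compile(optimizer=tf.keras.optimizers.Adam(peak_lr),
                  loss="sparse_categorical_crossentropy",   # cross-entropy loss
                  metrics=["accuracy"])
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     epochs=epochs, batch_size=batch_size,
                     callbacks=[tf.keras.callbacks.LearningRateScheduler(schedule),
                                tf.keras.callbacks.EarlyStopping(patience=5)])
```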
The formula of the cross entropy loss function is specifically as follows:
L = -[y·log(ŷ) + (1 - y)·log(1 - ŷ)]
in the formula, ŷ is the predicted label and y is the true label.
Step 4: The data collected under the color-cast light source are fed into the trained model for prediction. The distance between each predicted pixel point and the standard pixel points is computed in HSV space, and the closest standard pixel point is taken as the color-correction value, finally realizing color correction of the tongue image. The concrete implementation is as follows:
the color cast data is used as data for correcting the color of the model, and only uniform scaling is carried out without data enhancement, so that other interference in the test process is avoided, and the color correction performance of the model can be better tested. The correction process of the tongue picture image comprises the following specific steps:
firstly, multi-classification is carried out according to the prediction result of the model, and each pixel point is classified according to 143 types of the color comparison table.
The 143 reference-chart color pixel points are then converted from RGB space to HSV space, determining the positions of the 143 color-class points in the three-dimensional HSV space.
For each pixel point, the model-predicted HSV value is compared with the reference colors in the three-dimensional HSV space to determine which color-chart class it is closest to; the closest color-class point is taken as the corrected standard pixel point, and the RGB value corresponding to that HSV value is finally computed to restore the color space, realizing color correction of the tongue image.
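A sketch of the per-pixel correction step: every predicted pixel is mapped to the nearest of the 143 reference-chart colors in HSV space, and that reference color's RGB value is written back. The rgb_to_hsv helper follows the conversion formulas below (a sketch of it is given after them), and chart_rgb, holding the 143 reference colors, is an assumed input.

```python
import numpy as np

def correct_tongue_image(pred_rgb, chart_rgb):
    """pred_rgb: H x W x 3 uint8 prediction; chart_rgb: 143 x 3 uint8 reference colors."""
    chart_hsv = np.array([rgb_to_hsv(c) for c in chart_rgb])                 # 143 x 3
    pixel_hsv = np.array([rgb_to_hsv(p) for p in pred_rgb.reshape(-1, 3)])   # N x 3
    # Distance from each predicted pixel to each reference color in HSV space.
    dist = np.linalg.norm(pixel_hsv[:, None, :] - chart_hsv[None, :, :], axis=-1)
    nearest = dist.argmin(axis=1)
    # Writing back the nearest reference color's RGB value is equivalent to
    # converting its HSV value back to RGB as described in the text.
    return chart_rgb[nearest].reshape(pred_rgb.shape)
```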
The formula for converting the RGB color space into the HSV color space is as follows:
R′=R/255
G′=G/255
B′=B/255
C_max = max(R', G', B')
C_min = min(R', G', B')
Δ = C_max - C_min
in the formula: R, G, and B are the red, green, and blue channels of the image; R', G', and B' are the corresponding normalized channels used for conversion to HSV color space; C_max and C_min are the maximum and minimum of the three channels; and Δ is the difference C_max - C_min.
Hue (H) calculation:
H = 0°, if Δ = 0
H = 60° × (((G' - B') / Δ) mod 6), if C_max = R'
H = 60° × ((B' - R') / Δ + 2), if C_max = G'
H = 60° × ((R' - G') / Δ + 4), if C_max = B'
saturation (S) calculation:
S = 0, if C_max = 0
S = Δ / C_max, if C_max ≠ 0
luminance Value (V) calculation:
V = C_max
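A direct Python transcription of the RGB-to-HSV formulas above (a sketch; H is returned in degrees, S and V in [0, 1]):

```python
def rgb_to_hsv(rgb):
    r, g, b = (float(c) / 255.0 for c in rgb)         # R', G', B'
    c_max, c_min = max(r, g, b), min(r, g, b)
    delta = c_max - c_min
    if delta == 0:
        h = 0.0
    elif c_max == r:
        h = 60.0 * (((g - b) / delta) % 6)
    elif c_max == g:
        h = 60.0 * ((b - r) / delta + 2)
    else:                                              # c_max == b
        h = 60.0 * ((r - g) / delta + 4)
    s = 0.0 if c_max == 0 else delta / c_max
    v = c_max
    return h, s, v
```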
the HSV-to-RGB formula is specifically realized as follows:
C=V×S
X=C×(1-|(H/60°)mod2-1|)
m=V-C
(R', G', B') = (C, X, 0), if 0° ≤ H < 60°
(R', G', B') = (X, C, 0), if 60° ≤ H < 120°
(R', G', B') = (0, C, X), if 120° ≤ H < 180°
(R', G', B') = (0, X, C), if 180° ≤ H < 240°
(R', G', B') = (X, 0, C), if 240° ≤ H < 300°
(R', G', B') = (C, 0, X), if 300° ≤ H < 360°
(R,G,B)=((R′+m)×255,(G′+m)×255,(B′+m)×255)
in the formula: v denotes luminance, S denotes saturation, H denotes hue, C denotes a product of luminance V and saturation S, X denotes a product of C and hue H, and m denotes a difference between luminance V and saturation S.
The spatial distance formula used is as follows, where the subscript p denotes the predicted value in HSV space and the subscript r denotes the standard value of the color class in HSV space:
d = √((H_p - H_r)² + (S_p - S_r)² + (V_p - V_r)²)
the above description is only a preferred embodiment of the present invention, and not intended to limit the present invention, the scope of the present invention is defined by the appended claims, and all structural changes that can be made by using the contents of the description and the drawings of the present invention are intended to be embraced therein.

Claims (5)

1. A tongue color correction method based on a deep neural network, characterized in that the method is realized by the following steps:
step one, acquiring image data and dividing it into a pre-trained data set and a test set to be corrected;
step two, constructing a deep neural network model based on Unet for a pre-trained data set;
step three, transmitting the pre-trained data set into the neural network model constructed in the step two for training to obtain a trained neural network model;
step four, the color cast data serving as a test set to be corrected is transmitted into the neural network model trained in the step three for prediction, and a prediction result is obtained; the method comprises the following specific steps:
step 4.1, mapping the output prediction result from RGB space into HSV space, computing the similarity distance between the predicted HSV pixel value and the standard color points in HSV space, and finding the standard pixel point closest to the predicted HSV pixel value;
step 4.2, taking the closest standard pixel point in HSV space as the standard pixel point for color correction, and finally converting from HSV space back to RGB space to realize color correction.
2. The tongue color correction method based on the deep neural network of claim 1, wherein: the specific process in the first step is as follows:
firstly, carrying out uniform-size scaling and data enhancement processing on a data set collected under a standard light source by adopting mobile equipment, and taking the data set as a pre-training data set;
collecting a test data set to be corrected under various color cast light sources by adopting mobile equipment;
step two, data processing: scaling the pre-trained data set to a uniform size by the letterbox method, and then augmenting the scaled data by flipping, translation and rotation;
scaling the test data to be corrected, collected under the various color-cast light sources, by the letterbox method;
the specific pre-trained data set flipping method is as follows: horizontally and vertically turning;
the specific rotation mode is as follows: data augmentation is performed by rotating the pre-trained data set by 5 °, 10 °, 15 °, 30 °, and 45 °, respectively.
3. The tongue color correction method based on the deep neural network of claim 1, wherein: in the second step, the specific process of constructing the deep neural network model based on the Unet is as follows:
step two, setting a neural network model from a down-sampling layer, a depth separable convolution layer and an up-sampling layer;
the down-sampling layer adopts NIN modules for multi-scale feature fusion for 4 times continuously, adopts short-circuit operation in a residual error network at the input end and the output end of each NIN module, and carries out addition operation when each NIN module outputs;
the depth separable convolution layer adopts three times of continuous depth separable convolution and simultaneously performs jump connection on the input end and the output end of the depth separable convolution layer;
the up-sampling layer adopts 3 times of continuous up-sampling, the up-sampling mode is completed through PixelShuffle, and splicing and stacking are carried out on the channel dimension after each up-sampling so as to realize the fusion of feature maps with different scales.
4. The tongue color correction method based on the deep neural network of claim 1, wherein: the concrete process of the third step is as follows:
dividing the data set into training data and test data at a ratio of 5:1;
wherein, 5/6 of the data is used as a training set, and the model parameters are updated through the training set; 1/5 of the training set is determined as a verification set, model parameters are selected through the verification set, and if overfitting occurs, training can be terminated in advance;
and 1/6 of the data is used as a test set, and a cross validation mode is used for verifying whether the model is over-fitted during training.
5. The tongue color correction method based on the deep neural network of claim 4, wherein: the number of training rounds epoch is set to be 50, the batch size of the network model transmitted into each time is set to be 32, the learning rate is set to be dynamic, and the loss function adopts a cross entropy loss function;
the formula of the cross entropy loss function is specifically as follows:
L = -[y·log(ŷ) + (1 - y)·log(1 - ŷ)]
in the formula, ŷ is the predicted label and y is the true label;
the gradient optimizer is chosen to be Adam.
CN202210948496.XA 2022-08-09 2022-08-09 Tongue color correction method based on deep convolution neural network Pending CN115482160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210948496.XA CN115482160A (en) 2022-08-09 2022-08-09 Tongue color correction method based on deep convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210948496.XA CN115482160A (en) 2022-08-09 2022-08-09 Tongue color correction method based on deep convolution neural network

Publications (1)

Publication Number Publication Date
CN115482160A true CN115482160A (en) 2022-12-16

Family

ID=84421970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210948496.XA Pending CN115482160A (en) 2022-08-09 2022-08-09 Tongue color correction method based on deep convolution neural network

Country Status (1)

Country Link
CN (1) CN115482160A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116433508A (en) * 2023-03-16 2023-07-14 湖北大学 Gray image coloring correction method based on Swin-Unet
CN116433508B (en) * 2023-03-16 2023-10-27 湖北大学 Gray image coloring correction method based on Swin-Unet
CN116593408A (en) * 2023-07-19 2023-08-15 四川亿欣新材料有限公司 Method for detecting chromaticity of heavy calcium carbonate powder
CN116593408B (en) * 2023-07-19 2023-10-17 四川亿欣新材料有限公司 Method for detecting chromaticity of heavy calcium carbonate powder

Similar Documents

Publication Publication Date Title
CN115482160A (en) Tongue color correction method based on deep convolution neural network
CN108182456B (en) Target detection model based on deep learning and training method thereof
CN111524135B (en) Method and system for detecting defects of tiny hardware fittings of power transmission line based on image enhancement
CN109190684B (en) SAR image sample generation method based on sketch and structure generation countermeasure network
WO2018161775A1 (en) Neural network model training method, device and storage medium for image processing
CN112734646B (en) Image super-resolution reconstruction method based on feature channel division
CN111612017B (en) Target detection method based on information enhancement
CN110929602A (en) Foundation cloud picture cloud shape identification method based on convolutional neural network
CN110675462A (en) Gray level image colorizing method based on convolutional neural network
CN111815665B (en) Single image crowd counting method based on depth information and scale perception information
CN109685716A (en) A kind of image super-resolution rebuilding method of the generation confrontation network based on Gauss encoder feedback
CN112801904B (en) Hybrid degraded image enhancement method based on convolutional neural network
CN105550649A (en) Extremely low resolution human face recognition method and system based on unity coupling local constraint expression
CN111931857B (en) MSCFF-based low-illumination target detection method
US12008779B2 (en) Disparity estimation optimization method based on upsampling and exact rematching
CN111461006B (en) Optical remote sensing image tower position detection method based on deep migration learning
CN113052775B (en) Image shadow removing method and device
CN113140019A (en) Method for generating text-generated image of confrontation network based on fusion compensation
CN117372898A (en) Unmanned aerial vehicle aerial image target detection method based on improved yolov8
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN115457258A (en) Foggy-day ship detection method based on image enhancement algorithm and improved YOLOv5
CN112818777B (en) Remote sensing image target detection method based on dense connection and feature enhancement
CN112766340B (en) Depth capsule network image classification method and system based on self-adaptive spatial mode
CN110348339B (en) Method for extracting handwritten document text lines based on case segmentation
CN112785517A (en) Image defogging method and device based on high-resolution representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination