LU500916B1 - Method and device for image classification based on improved stochastic gradient descent - Google Patents


Info

Publication number
LU500916B1
Authority
LU
Luxembourg
Prior art keywords
gradient descent
gradient
image
model
stochastic gradient
Prior art date
Application number
LU500916A
Other languages
French (fr)
Inventor
Qiang Wang
Original Assignee
Univ Shandong
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Shandong filed Critical Univ Shandong
Priority to LU500916A priority Critical patent/LU500916B1/en
Application granted granted Critical
Publication of LU500916B1 publication Critical patent/LU500916B1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method and device for image classification based on improved stochastic gradient descent. The method comprises: receiving image data to be classified, carrying out data preprocessing, and obtaining images of a preset size; extracting features from the preprocessed data according to a depth network model established by the improved stochastic gradient descent method; and using the extracted features to classify the received image data. The improved stochastic gradient descent method is adopted to train the parameters of the depth network model: a gain value is obtained by using the estimated value of the previous gradient and the observed value of the present gradient; the estimated value of the present gradient is obtained by using the gain value, the estimated and observed values of the previous gradient, and the observed value of the present gradient; the obtained gradient estimate is then used to update the model parameters, and the gradient descent method or stochastic gradient descent method is used to train the model parameters until the model meets a preset training termination criterion. The present disclosure can effectively reduce the oscillation of parameters during model training for image classification and increase the stability of the classification model.

Description

METHOD AND DEVICE FOR IMAGE CLASSIFICATION BASED ON IMPROVED STOCHASTIC GRADIENT DESCENT
TECHNICAL FIELD The present disclosure belongs to the technical field of image classification, in particular to a method and device for image classification based on improved stochastic gradient descent.
BACKGROUND Information of the Related Art part is merely disclosed to increase the understanding of the overall background of the present invention, but is not necessarily regarded as acknowledging or suggesting, in any form, that the information constitutes the prior art known to a person of ordinary skill in the art.
In recent years, image classification technology has developed rapidly and plays an important role in many fields, such as face recognition, handwriting recognition, autonomous driving and so on. A very important way to improve the performance of image classification is deep learning; models such as VGG and ResNet have greatly improved the accuracy over traditional classification algorithms. However, the optimization objective of a deep learning model is a non-convex function with many parameters, which makes the training process very difficult.
At present, the gradient descent or stochastic gradient descent algorithm is mainly used to solve the problem of model training in image classification. When the data set is small, the gradient descent method can be used to train the model parameters. However, as the data set grows, it becomes unrealistic to calculate the gradient over the entire data set, because the computational cost is too high and the training process becomes abnormally slow. Therefore, once the data set is sufficiently large, the algorithm is only a theoretical result and is almost infeasible in practice. In actual model training, the stochastic gradient descent algorithm is often used instead: several samples (say M of them) are stochastically selected from the overall training set, forming what is called a mini-batch, and the average gradient over the mini-batch is taken as the gradient estimate of the overall sample for training. This algorithm is extremely effective for training depth models and has promoted the further development of deep learning. However, the method also has its own defect: the model parameters do not converge. On some data sets the parameters even fluctuate severely at the initial stage of training, which seriously affects the stability of the model.
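The mini-batch procedure described above can be made concrete with a short sketch. The toy least-squares problem, variable names, and hyperparameters below are illustrative assumptions only and are not part of the disclosure; the sketch shows plain stochastic gradient descent averaging the gradient over a stochastically selected batch of M samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem standing in for a classification model.
N, d = 1000, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true                      # exact targets, so the optimum is w_true

def batch_gradient(w, Xb, yb):
    """Average gradient of 0.5 * ||Xb @ w - yb||^2 over one mini-batch."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(d)
lr, M = 0.1, 32                     # learning rate and mini-batch size M
for step in range(500):
    idx = rng.choice(N, size=M, replace=False)   # stochastically select a batch
    w -= lr * batch_gradient(w, X[idx], y[idx])  # plain SGD parameter update
```

Because the mini-batch gradient is only an estimate of the full gradient, the iterates fluctuate from batch to batch; this batch-to-batch noise is the oscillation the improved method targets.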
SUMMARY In view of the shortcomings of the prior art, one or more embodiments of the present disclosure provide a method and device for image classification based on improved stochastic gradient descent, which can effectively reduce the oscillation of parameters during model training for image classification and increase the stability of the classification model. According to one aspect of one or more embodiments of the present disclosure, a method for image classification based on improved stochastic gradient descent is provided. The method comprises: receiving image data to be classified, carrying out data preprocessing, and obtaining images of a preset size; extracting features from the preprocessed data according to a depth network model established by the improved stochastic gradient descent method; and using the extracted features to classify the received image data. Further, in the method, the data preprocessing is to crop the received image data, and the cropped image size is consistent with the crop size of the sample images used to establish the depth network model. Further, in the method, the specific steps of establishing the depth network model according to the improved stochastic gradient descent method include: receiving a training sample set of images, carrying out data preprocessing on the images of the sample set, and obtaining a training sample set of images of consistent size; and, according to the preprocessed training sample set, using the improved stochastic gradient descent method to train the parameters of the depth network model, thereby establishing the depth network model.
Further, in the method, the specific steps of using the improved stochastic gradient descent method to train the parameters of the depth network model include: obtaining a gain value by using the estimated value of the previous gradient and the observed value of the present gradient; using the obtained gain value, the estimated and observed values of the previous gradient, and the observed value of the present gradient to obtain the estimated value of the present gradient; and using the obtained gradient estimate to update the model parameters, using the gradient descent method or stochastic gradient descent method to train the model parameters until the model meets a preset training termination criterion.
Preferably, in the method, the obtained gradient estimate is used to calculate an updated velocity estimate, and the obtained velocity estimate is used to update the model parameters.
Further, in the method, the updated model parameter is the difference between the model parameter of the previous step and the product of the learning rate and the estimated value of the present gradient.
Further, in the method, a Softmax network mapping the extracted image features to image classes is established in advance, and the received image data to be classified is classified by using the extracted features based on the Softmax network.
According to one aspect of one or more embodiments of the present disclosure, a computer-readable storage medium is provided.
The computer-readable storage medium stores a plurality of instructions; the instructions are suitable for being loaded by a processor of a terminal device to execute the method for image classification based on improved stochastic gradient descent.
According to one aspect of one or more embodiments of the present disclosure, a terminal device is provided.
The terminal device includes a processor and a computer-readable storage medium; the processor is used to implement the instructions, and the computer-readable storage medium is used to store a plurality of instructions suitable for being loaded by the processor to execute the method for image classification based on improved stochastic gradient descent.
According to one aspect of one or more embodiments of the present disclosure, a device for image classification based on improved stochastic gradient descent is provided.
The device for image classification based on improved stochastic gradient descent is based on the above method and includes a data preprocessing module, a feature extraction module and an image classification module connected sequentially: the data preprocessing module is used for receiving image data to be classified and performing data preprocessing to obtain an image of a preset size; the feature extraction module is used for extracting features from the preprocessed data according to a depth network model established by the improved stochastic gradient descent method; and the image classification module is used to classify the received image data by using the extracted features.
Beneficial effects of the present disclosure: the present disclosure provides a method and device for image classification based on improved stochastic gradient descent, which is suitable for parameter training of an image classification model and effectively suppresses the severe parameter fluctuation that occurs during training of such a model. The depth network model is trained by the improved stochastic gradient descent algorithm, which estimates the gradient of each batch of stochastically selected data. Firstly, from the gradient estimate of the previous step and the observed value of the sample gradient of the present step, the gradient gain of the present step is obtained; secondly, the gradient estimate of the present step is calculated by using the obtained gain and the gradient estimate and observation of the previous step; finally, the update of the model parameters is obtained from the corresponding gradient estimate; the above process is repeated until the model reaches the termination criterion. The disclosure effectively reduces the parameter fluctuation of the model and increases the stability of the model.
BRIEF DESCRIPTION OF THE DRAWINGS The accompanying drawings constituting a part of the present invention are used to provide a further understanding of the present invention. The exemplary examples of the present invention and descriptions thereof are used to explain the present invention, and do not constitute an improper limitation of the present invention. FIG. 1 is a flowchart of a method for image classification based on improved stochastic gradient descent according to one or more embodiments; FIG. 2 is a flowchart of a parameter training algorithm according to one or more embodiments; FIG. 3 is a schematic diagram of a neural network parameter model according to one or more embodiments.
DETAILED DESCRIPTION The technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below in combination with the accompanying drawings in one or more embodiments of the present disclosure. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on one or more embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work belong to the protection scope of the present invention.
It should be noted that, the following detailed descriptions are all exemplary, and are intended to provide further descriptions of the present disclosure. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as those usually understood by a person of ordinary skill in the art to which the present disclosure belongs.
It should be noted that the terms used herein are merely used for describing specific implementations, and are not intended to limit exemplary implementations of the present disclosure. As used herein, the singular form is also intended to include the plural form unless the context clearly dictates otherwise. In addition, it should further be understood that, terms ‘comprise’ and/or ‘include’ used in this specification indicate that there are features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the flowchart and block diagram in the drawings illustrate the architecture, functions and operations of possible implementations of methods and systems according to various embodiments of the present disclosure. Each block in the flowchart or block diagram may represent a module, program segment, or part of code, which may include one or more executable instructions for realizing the logical functions specified in various embodiments. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order from that marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in reverse order, depending on the function involved. It should also be noted that each block in the flowchart and/or block diagram, as well as any combination of such blocks, may be implemented using a dedicated hardware-based system performing the specified functions or operations, or using a combination of dedicated hardware and computer instructions.
Without conflict, the embodiments and features in the embodiments of the present disclosure can be combined with each other. The present disclosure is further described below in combination with the accompanying drawings and embodiments.
The image classification algorithm plays a key role in many application scenarios and has greatly promoted the development of artificial intelligence. However, the gradient descent method used in the process of model training does not itself converge; it fluctuates greatly on some data sets and exhibits an oscillatory state, resulting in instability of the model.
In order to overcome this defect, according to one aspect of one or more embodiments of the present disclosure, a method for image classification based on improved stochastic gradient descent is provided. In the process of model training, the present invention does not directly use the gradient value of the stochastically selected samples on the batch; instead, it estimates the current gradient value based on the previous estimate and the newly observed sample gradient, uses the estimated new gradient as the gradient value of the model, and then uses the gradient descent method or stochastic gradient descent method for model training, which effectively reduces or eliminates the fluctuation of the parameters in the training process and increases the stability of the model. In addition, in order to solve the problem of slow convergence of the algorithm on some data sets, the present invention also provides an optimization measure.
As shown in FIG. 1, the method for image classification based on improved stochastic gradient descent, comprising:
Step (1): receiving image data to be classified, performing data preprocessing, and obtaining images of a preset size; the purpose of this step is to keep the size of all images consistent. Step (2): extracting features from the preprocessed data according to a depth network model established by the improved stochastic gradient descent method. Step (3): classifying the received image data to be classified by using the extracted features.
In the step (1) of the present embodiment, the data preprocessing is to crop the received image data, and the cropped image size is consistent with the crop size of the sample images used to establish the depth network model.
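A minimal sketch of this cropping step, assuming images are NumPy arrays of shape (H, W, C); the function name and the center-crop choice are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def center_crop(img, size):
    """Crop img (H, W, C) to a size x size square around the center.

    The crop size should match the crop size used on the training
    samples when the depth network model was established.
    """
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image smaller than crop size")
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

img = np.zeros((40, 60, 3))
print(center_crop(img, 32).shape)   # -> (32, 32, 3)
```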
In the whole flow, the feature extraction performed by the depth network model is an important link that determines the performance of the algorithm. Compared with the existing stochastic gradient descent algorithm, the present invention first calculates a gradient gain from the gradient updates and then, based on the gradient gain, re-estimates the update direction of the gradient. Compared with the existing gradient descent algorithm, the fluctuation of the parameters is gradually slowed, and the stability of the model is increased.
In the step (2) of the present embodiment, the specific steps of establishing the depth network model according to the improved stochastic gradient descent method include: Step (2-1): receiving a training sample set of images, carrying out data preprocessing on the images of the sample set, and obtaining a training sample set of images of consistent size; Step (2-2): according to the preprocessed training sample set, using the improved stochastic gradient descent method to train the parameters of the depth network model, thereby establishing the depth network model.
In the step (2-1) of the present embodiment, the data preprocessing is to crop the received image data so that the size of all images remains the same. In order to increase the sample size of the data, the same image can be randomly cropped several times while its label remains unchanged.
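The repeated random cropping described above can be sketched as follows (a hedged illustration; the function name is an assumption). Each crop keeps the label of the source image, so one image yields several training samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crops(img, size, n):
    """Return n random size x size crops of img (H, W, C)."""
    h, w = img.shape[:2]
    crops = []
    for _ in range(n):
        top = int(rng.integers(0, h - size + 1))    # random crop origin
        left = int(rng.integers(0, w - size + 1))
        crops.append(img[top:top + size, left:left + size])
    return np.stack(crops)

img = rng.normal(size=(40, 40, 3))
crops = random_crops(img, 32, 5)
print(crops.shape)   # -> (5, 32, 32, 3)
```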
For ease of expression, the gradient observed on each selected batch is recorded as $g_t = \frac{1}{m}\nabla_\theta \sum_i L(f(x^{(i)},\theta), y^{(i)})$, wherein $g_1, g_2, \dots, g_t$ is the sample gradient sequence, $L(x, y)$ represents the distance function, and $f(x,\theta)$ represents the depth model. As shown in FIG. 2, in the step (2-2) of the present embodiment, the specific steps of using the improved stochastic gradient descent method to train the parameters in the depth network model include: Step (2-2-1): calculating the gain value by using the estimated value of the previous gradient and the observed value of the present gradient; the gain obtained is $\gamma_{t+1} = \frac{\|\hat{g}_t - g_t\|}{\|\hat{g}_t - g_t\| + \|g_{t+1} - g_t\|}$. Step (2-2-2): using the obtained gain value, the estimated and observed values of the previous gradient, and the observed value of the present gradient to obtain the estimated value of the present gradient; the gradient estimate is $\hat{g}_{t+1} = \hat{g}_t + \gamma_{t+1}(g_{t+1} - \hat{g}_t)$, wherein $\hat{g}_1, \hat{g}_2, \dots, \hat{g}_t$ is the gradient estimation sequence. Step (2-2-3): using the obtained gradient estimate to update the model parameters, and using the gradient descent method or stochastic gradient descent method to train the model parameters; the parameter update is $\theta_{t+1} = \theta_t - \epsilon\,\hat{g}_{t+1}$, wherein $\theta_1, \theta_2, \dots, \theta_t$ is the parameter sequence and $\epsilon$ is the learning rate parameter.
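The three updates of steps (2-2-1) to (2-2-3) can be exercised numerically. The sketch below assumes the norm-ratio form of the gain given above; the input values are made-up illustrations:

```python
import numpy as np

def gain(g_est, g_prev, g_new, eps=1e-12):
    """Step (2-2-1): gamma = ||ghat_t - g_t|| / (||ghat_t - g_t|| + ||g_{t+1} - g_t||)."""
    a = np.linalg.norm(g_est - g_prev)
    b = np.linalg.norm(g_new - g_prev)
    return a / (a + b + eps)

def estimate(g_est, g_prev, g_new):
    """Step (2-2-2): ghat_{t+1} = ghat_t + gamma * (g_{t+1} - ghat_t)."""
    return g_est + gain(g_est, g_prev, g_new) * (g_new - g_est)

g_prev = np.array([1.0, 0.0])     # observed gradient g_t
g_est = np.array([0.8, 0.1])      # previous estimate ghat_t
g_new = np.array([1.2, -0.1])     # observed gradient g_{t+1}

g_next = estimate(g_est, g_prev, g_new)            # here gamma = 0.5, g_next = [1.0, 0.0]
theta_next = np.array([0.5, 0.5]) - 0.01 * g_next  # step (2-2-3), learning rate 0.01
```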
Preferably, in the method, updated velocity estimation is calculated by using the obtained gradient estimation value, and the model parameters are updated by using the obtained velocity estimation.
The calculated velocity update is $v_{t+1} = \alpha v_t - \epsilon\,\hat{g}_{t+1}$, and the calculated parameter update is $\theta_{t+1} = \theta_t + v_{t+1}$, wherein $v_1, v_2, \dots, v_t$ is the update velocity sequence,
$\theta_1, \theta_2, \dots, \theta_t$ is the update parameter sequence, $\alpha$ is the momentum parameter, and $\epsilon$ is the learning rate parameter.
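A one-step numeric sketch of this momentum variant (the values and names are illustrative assumptions):

```python
import numpy as np

def momentum_step(theta, v, g_est_next, alpha=0.9, lr=0.01):
    """v_{t+1} = alpha * v_t - lr * ghat_{t+1}; theta_{t+1} = theta_t + v_{t+1}."""
    v_next = alpha * v - lr * g_est_next
    return theta + v_next, v_next

theta, v = np.array([0.5, 0.5]), np.zeros(2)
theta, v = momentum_step(theta, v, np.array([1.0, 0.0]))
# starting from zero velocity, one step moves theta by -lr * ghat
```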
Step (2-2-4): repeating the above steps (2-2-1) to (2-2-3) until the model meets a preset training termination criterion.
The specific improved stochastic gradient descent algorithm is as follows:
Require: learning rate $\epsilon$, momentum parameter $\alpha$
Require: initial parameter $\theta_0$, initial velocity $v_0$
Require: initial gradient $g_0 = 0$ and initial gradient estimate $\hat{g}_0 = 0$
While (the termination criterion is not met) do
stochastically collect a small batch $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$ containing $m$ samples from the training set, with corresponding targets $y^{(i)}$;
calculate the gradient: $g_{t+1} = \frac{1}{m}\nabla_\theta \sum_i L(f(x^{(i)},\theta), y^{(i)})$;
calculate the gain: $\gamma_{t+1} = \frac{\|\hat{g}_t - g_t\|}{\|\hat{g}_t - g_t\| + \|g_{t+1} - g_t\|}$;
gradient estimate: $\hat{g}_{t+1} = \hat{g}_t + \gamma_{t+1}(g_{t+1} - \hat{g}_t)$;
parameter update: $\theta_{t+1} = \theta_t - \epsilon\,\hat{g}_{t+1}$
(optionally: velocity update: $v_{t+1} = \alpha v_t - \epsilon\,\hat{g}_{t+1}$; parameter update: $\theta_{t+1} = \theta_t + v_{t+1}$)
end while
For the feature extraction link, the following takes the neural network shown in FIG. 3 as an example to illustrate the parameter training in the model of the present embodiment: randomly selecting $m$ samples from the data set as a batch, and defining the corresponding optimization objective function as:
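Under the same reading of the gain formula, the while-loop above can be sketched end-to-end on a toy least-squares problem (the problem, hyperparameters and names are illustrative assumptions; the optional momentum branch is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data set standing in for the image training set.
N, d, m = 1000, 5, 32
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)   # noisy targets

def minibatch_grad(w):
    """Stochastically collect a batch of m samples and return its average gradient."""
    idx = rng.choice(N, size=m, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / m

w = np.zeros(d)
lr = 0.05                     # learning rate epsilon
g_prev = minibatch_grad(w)    # first observed gradient
g_est = np.zeros(d)           # initial gradient estimate ghat = 0
for step in range(2000):      # stand-in for "while termination criterion not met"
    g_new = minibatch_grad(w)
    a = np.linalg.norm(g_est - g_prev)
    b = np.linalg.norm(g_new - g_prev)
    gamma = a / (a + b + 1e-12)               # gain
    g_est = g_est + gamma * (g_new - g_est)   # filtered gradient estimate
    w = w - lr * g_est                        # parameter update
    g_prev = g_new
```

The gain shrinks the step toward the previous estimate when the new observation is far from the previous one, which is what damps the batch-to-batch oscillation.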
$J(W,b;x,y) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|f_{W,b}(x^{(i)}) - y^{(i)}\right\|^2$,
where the parameter $\theta$ is $(W,b)$. First of all, let all $\Delta W = 0$, $\Delta b = 0$, and calculate the gradient of the objective function $J(W,b;x,y)$ with respect to the parameters $W, b$. Assuming that the parameter $W$ is of $s \times t$ dimensions and $b$ is of $r$ dimensions, first calculate $g_{ij}^{(k)} = \frac{\partial}{\partial W_{ij}} J(W,b)$, $g_{b_j}^{(k)} = \frac{\partial}{\partial b_j} J(W,b)$ and keep them for the next step. Putting back the samples selected last time, randomly re-select $m$ data as a new batch, and calculate the gradients on the batch as follows: $g_{ij}^{(k+1)} = \frac{\partial}{\partial W_{ij}} J(W,b)$, $g_{b_j}^{(k+1)} = \frac{\partial}{\partial b_j} J(W,b)$. Calculate the gradient gain entry by entry using the gradient values of the two adjacent steps: $\gamma_{ij}^{(k+1)} = \frac{|\hat{g}_{ij}^{(k)} - g_{ij}^{(k)}|}{|\hat{g}_{ij}^{(k)} - g_{ij}^{(k)}| + |g_{ij}^{(k+1)} - g_{ij}^{(k)}|}$, and likewise for $b$. For all $i$ and $j$, update the gradient estimates as: $\hat{g}_{ij}^{(k+1)} = \hat{g}_{ij}^{(k)} + \gamma_{ij}^{(k+1)}(g_{ij}^{(k+1)} - \hat{g}_{ij}^{(k)})$, $\hat{g}_{b_j}^{(k+1)} = \hat{g}_{b_j}^{(k)} + \gamma_{b_j}^{(k+1)}(g_{b_j}^{(k+1)} - \hat{g}_{b_j}^{(k)})$. Finally, update the model parameters: $W_{ij} = W_{ij} - \epsilon\,\hat{g}_{ij}^{(k+1)}$, $b_j = b_j - \epsilon\,\hat{g}_{b_j}^{(k+1)}$. Repeating the above process layer by layer until the model reaches a certain termination condition.
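The entry-wise variant above differs from the vector version only in using absolute values per entry; a sketch for a single weight matrix W (all numbers are illustrative assumptions):

```python
import numpy as np

def entry_gain(g_est, g_prev, g_new, eps=1e-12):
    """Per-entry gain |ghat - g| / (|ghat - g| + |g_new - g|) for each W_ij."""
    a = np.abs(g_est - g_prev)
    b = np.abs(g_new - g_prev)
    return a / (a + b + eps)

W = np.ones((2, 3))
lr = 0.1
gW_prev = np.full((2, 3), 0.5)   # gradient observed on the previous batch
gW_est = np.full((2, 3), 0.4)    # previous per-entry gradient estimate
gW_new = np.full((2, 3), 0.6)    # gradient observed on the freshly drawn batch

gamma = entry_gain(gW_est, gW_prev, gW_new)   # 0.5 for every entry here
gW_est = gW_est + gamma * (gW_new - gW_est)   # updated estimate: 0.5 per entry
W = W - lr * gW_est                           # updated weights: 0.95 per entry
```

The bias vector b would be updated with the same three lines applied to its own gradients.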
In the step (3) of the present embodiment, a Softmax network mapping the extracted image features to the image classes is established in advance, and the received image data to be classified is classified by using the extracted features based on the Softmax network.
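A minimal sketch of such a Softmax classification head (the feature dimension, class count, and random weights are illustrative assumptions; a real head would be trained jointly with the network):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_features, n_classes = 8, 3
W = rng.normal(size=(n_features, n_classes))   # Softmax-layer weights (untrained here)
b = np.zeros(n_classes)

features = rng.normal(size=(4, n_features))    # features from the depth network
probs = softmax(features @ W + b)              # class probabilities per image
labels = probs.argmax(axis=1)                  # predicted class per image
```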
According to one aspect of one or more embodiments of the present disclosure, a computer-readable storage medium is provided.
The computer-readable storage medium stores a plurality of instructions; the instructions are suitable for being loaded by a processor of a terminal device to execute the method for image classification based on improved stochastic gradient descent.
According to one aspect of one or more embodiments of the present disclosure, a terminal device is provided.
The terminal device includes a processor and a computer-readable storage medium; the processor is used to implement the instructions, and the computer-readable storage medium is used to store a plurality of instructions suitable for being loaded by the processor to execute the method for image classification based on improved stochastic gradient descent.
These computer executable instructions, when running in a device, cause the device to perform a method or process described in accordance with various embodiments of the present disclosure.
In the present embodiment, the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for performing various aspects of the present disclosure. The computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, a mechanical coding device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the above. The computer-readable storage medium used herein is not to be interpreted as an instantaneous signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., an optical pulse through an optical fiber cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device through a network, such as the Internet, local area network, wide area network and/or wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. The network adapter card or network interface in each computing / processing device receives a computer-readable program instruction from the network and forwards the computer-readable program instruction for storage in a computer-readable storage medium in each computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as C++, and conventional procedural programming languages such as 'C' or similar programming languages. Computer-readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through an Internet service provider). In some embodiments, various aspects of the present disclosure may be implemented by personalizing an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), by utilizing the state information of computer-readable program instructions.
According to one aspect of one or more embodiments of the present disclosure, a device for image classification based on improved stochastic gradient descent is provided.
The device for image classification based on improved stochastic gradient descent is based on the above method and includes a data preprocessing module, a feature extraction module and an image classification module connected sequentially: the data preprocessing module is used for receiving image data to be classified and performing data preprocessing to obtain an image of a preset size; the feature extraction module is used for extracting features from the preprocessed data according to a depth network model established by the improved stochastic gradient descent method; and the image classification module is used to classify the received image data by using the extracted features.
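The three-module structure can be sketched as a sequential pipeline; the class names and the trivial stand-ins for the depth network and the Softmax head are purely illustrative assumptions:

```python
import numpy as np

class Preprocessor:
    """Data preprocessing module: center-crop to the preset size."""
    def __init__(self, size):
        self.size = size
    def __call__(self, img):
        h, w = img.shape[:2]
        top, left = (h - self.size) // 2, (w - self.size) // 2
        return img[top:top + self.size, left:left + self.size]

class FeatureExtractor:
    """Feature extraction module: trivial stand-in for the trained depth network."""
    def __call__(self, img):
        return img.mean(axis=(0, 1))   # per-channel mean as a dummy feature vector

class ImageClassifier:
    """Image classification module: trivial stand-in for the Softmax head."""
    def __call__(self, features):
        return int(features.argmax())

def classify(img, modules):
    for module in modules:   # the modules are connected sequentially
        img = module(img)
    return img

pipeline = [Preprocessor(32), FeatureExtractor(), ImageClassifier()]
label = classify(np.random.default_rng(0).normal(size=(40, 60, 3)), pipeline)
```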
It should be noted that although several modules or sub modules of the equipment are mentioned in the above detailed description, this division is only exemplary and not mandatory.
In fact, according to an embodiment of the present disclosure, the features and functions of the two or more modules described above may be embodied in one module.
On the contrary, the features and functions of one module described above can be further divided into multiple modules.
Beneficial effects of the present disclosure: the present disclosure provides a method and device for image classification based on improved stochastic gradient descent, which is suitable for parameter training of an image classification model and effectively suppresses the severe parameter fluctuation that occurs during training of such a model.
The depth network model is trained by the improved stochastic gradient descent algorithm to estimate the gradient and convergence rate of each batch of stochastically selected data.
Firstly, from the gradient estimate of the previous step and the observed value of the sample gradient of the present step, the gradient gain of the present step is obtained; secondly, the gradient estimate of the present step is calculated by using the obtained gain and the gradient estimate and observation of the previous step; finally, the update of the model parameters is obtained from the corresponding gradient estimate; the above process is repeated until the model reaches the termination criterion.
The disclosure effectively reduces the parameter fluctuation of the model and increases the stability of the model.
The foregoing descriptions are merely preferred embodiments of the present invention, but not intended to limit the present invention.
A person skilled in the art may make various alterations and variations to the present invention.
Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Therefore, the present invention will not be limited to these embodiments shown herein, but will conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for image classification based on improved stochastic gradient descent, comprising: receiving image data to be classified and carrying out data preprocessing to obtain images of a preset size; extracting features of the preprocessed data according to a depth network model established by the improved stochastic gradient descent method; and using the extracted features to classify the received image data to be classified.
2. The method according to claim 1, wherein the data preprocessing comprises cropping the received image data, the cropped image size being consistent with the sample image crop size used to establish the depth network model.
3. The method according to claim 1, wherein establishing the depth network model according to the improved stochastic gradient descent method comprises the specific steps of: receiving a training sample set of images and carrying out data preprocessing on the images of the sample set to obtain a training sample set of images of consistent size; and, according to the preprocessed training sample set, using the improved stochastic gradient descent method to train the parameters in the depth network model, thereby establishing the depth network model.
4. The method according to claim 3, wherein using the improved stochastic gradient descent method to train the parameters in the depth network model comprises the specific steps of: obtaining a gain value from the estimated value of the previous gradient and the observed value of the present gradient; using the obtained gain value, the estimated value and observed value of the previous gradient, and the observed value of the present gradient to obtain the estimated value of the present gradient; and using the obtained gradient estimate to update the model parameters, training the model parameters by the gradient descent method or the stochastic gradient descent method until the model meets a preset training termination criterion.
5. The method according to claim 4, wherein the updated model parameters are the difference between the model parameters of the previous step and the product of a learning rate and the estimated value of the present gradient.
6. The method according to claim 4, further comprising using the obtained gradient estimate to calculate an updated velocity estimate, and using the obtained velocity estimate to update the model parameters.
7. The method according to claim 1, further comprising establishing in advance a Softmax network between the extracted image features and the image classes, and classifying the received image data to be classified by using the extracted features based on the Softmax network.
8. A computer-readable storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor of a terminal device to execute the method for image classification based on improved stochastic gradient descent according to any one of claims 1-7.
9. A terminal device, comprising a processor and a computer-readable storage medium, the processor being used to implement each instruction, and the computer-readable storage medium being used to store a plurality of instructions, the instructions being suitable for being loaded by the processor to execute the method for image classification based on improved stochastic gradient descent according to any one of claims 1-7.
10. A device for image classification based on improved stochastic gradient descent, the device being based on a method for image classification based on improved stochastic gradient descent and comprising: a data preprocessing module, a feature extraction module and an image classification module connected sequentially; the data preprocessing module being used for receiving image data to be classified and performing data preprocessing to obtain an image of a preset size; the feature extraction module being used for feature extraction of the preprocessed data according to a depth network model established by the improved stochastic gradient descent method; and the image classification module being used to classify the received image data to be classified by using the extracted features.
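Taken together, claims 1, 2 and 7 describe a preprocess, feature-extraction and Softmax classification pipeline; a minimal sketch follows. The center crop, the `extract_features` callable standing in for the trained depth network, and the Softmax parameters `W` and `b` are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def preprocess(image, size=32):
    """Center-crop the image to the preset size used in training (claim 2)."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def softmax(z):
    """Numerically stable Softmax over class scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(image, extract_features, W, b):
    """Classify one image: preprocess, extract features with the trained
    network (a stand-in callable here), then apply the Softmax layer."""
    x = extract_features(preprocess(image))
    return int(np.argmax(softmax(W @ x + b)))
```

In this sketch the depth network would be trained beforehand with the gain-based stochastic gradient descent of claims 3 and 4; `classify` only runs the inference path of the device's three modules.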
LU500916A 2021-11-26 2021-11-26 Method and device for image classification based on improved stochastic gradient descent LU500916B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
LU500916A LU500916B1 (en) 2021-11-26 2021-11-26 Method and device for image classification based on improved stochastic gradient descent


Publications (1)

Publication Number Publication Date
LU500916B1 true LU500916B1 (en) 2022-05-27

Family

ID=81827379


Country Status (1)

Country Link
LU (1) LU500916B1 (en)


Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20220527