CN112183563A - Image recognition model generation method, storage medium and application server - Google Patents

Image recognition model generation method, storage medium and application server Download PDF

Info

Publication number
CN112183563A
CN112183563A (application CN201910593320.5A)
Authority
CN
China
Prior art keywords
image
hidden
sub
training
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910593320.5A
Other languages
Chinese (zh)
Inventor
俞大海 (Yu Dahai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Corp
TCL Research America Inc
Original Assignee
TCL Research America Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Research America Inc
Priority to CN201910593320.5A
Publication of CN112183563A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition model generation method, a storage medium and an application server. The method comprises the following steps: acquiring a training image set, wherein the training image set comprises a plurality of training images and a real label corresponding to each training image, and each training image is an image with a hidden sub-region and a non-hidden sub-region; using the training image set as the input of a preset convolutional neural network model and obtaining prediction labels through the preset convolutional neural network; and correcting the network parameters of the preset convolutional neural network according to the real labels and the prediction labels to obtain a trained image recognition model. Because the image recognition model generated in this embodiment uses training images with partially hidden regions as training samples, the accuracy of the image recognition model in recognizing occluded images is improved, and the robustness of the image recognition model is improved.

Description

Image recognition model generation method, storage medium and application server
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method for generating an image recognition model, a storage medium, and an application server.
Background
With the rapid development of artificial intelligence technology in the field of image vision, the demand for and application of image-based target detection tasks are increasing. Target detection technology predicts the content type and position in an image through a neural network model, whose position predictions are refined via back propagation during training. The neural network models adopted by current target detection technology are mainly obtained by deep learning on images annotated with the positions of target objects. However, for an image in which the target object is partially occluded, existing image recognition neural networks cannot accurately recognize the target object, which brings inconvenience to users.
Disclosure of Invention
The present invention provides an image recognition model generation method, a storage medium, and an application server, aiming to overcome the above-mentioned shortcomings of the prior art.
The technical scheme adopted by the invention is as follows:
a method of generating an image recognition model, comprising:
acquiring a training image set, wherein the training image set comprises a plurality of training images and a real label corresponding to each training image, each training image is an image with a hidden sub-region and a non-hidden sub-region, and the real label corresponding to each training image is used for representing an object type corresponding to the non-hidden sub-region in the training image;
inputting the training images in the training image set into a preset convolutional neural network model, and acquiring a prediction label corresponding to the training images output by the preset convolutional neural network model;
correcting the model parameters of the preset convolutional neural network model according to the real label corresponding to the training image and the prediction label corresponding to the training image; and continuing to execute the step of inputting the training images in the training image set into a preset convolutional neural network model until the training condition of the preset convolutional neural network model meets a preset condition, so as to obtain a trained image recognition model.
In the image recognition model generation method, each training image in the training image set is obtained by preprocessing an original image; the preprocessing for obtaining a training image from an original image is as follows:
dividing an original image into a first preset number of sub-regions;
determining a second preset number of sub-areas to be hidden in the first preset number of sub-areas; wherein the first preset number is greater than the second preset number;
and setting the pixel value of the pixel point in each sub-region to be hidden as the preset pixel value to obtain the training image with the second preset number of hidden sub-regions.
The method for generating the image recognition model, wherein the dividing the original image into the sub-regions of the first preset number specifically includes:
acquiring the image size of an original image;
determining the number of pixel points corresponding to the original image according to the size of the original image, and calculating a first preset number corresponding to the original image according to the number of the pixel points;
dividing the original image into the first preset number of sub-regions.
The method for generating the image recognition model, wherein the determining of the sub-regions to be hidden in the first preset number of the sub-regions specifically includes:
randomly selecting the sub-regions of the second preset number from the sub-regions of the first preset number;
and taking the selected sub-areas with the second preset number as sub-areas to be hidden so as to obtain the sub-areas to be hidden with the second preset number.
The generation method of the image recognition model, wherein the pixel value of the pixel point in each sub-region to be hidden is set to the preset pixel value, so as to obtain the training image with the second preset number of hidden sub-regions, specifically:
and reading the pixel value of each pixel point in each selected to-be-hidden sub-area, and replacing the pixel value of each pixel point in each to-be-hidden sub-area with a pixel value 0 to obtain the training image with the second preset number of hidden sub-areas.
In the image recognition model generation method, the correspondence between the second preset number and the first preset number is:

second preset number = RAND([0.4, 0.6]) × first preset number

wherein RAND() is a random function that returns a value in the given interval.
The method for generating the image recognition model, wherein the sequentially inputting the training images in the training image set into a preset convolutional neural network model and obtaining the prediction labels corresponding to the training images output by the preset convolutional neural network model, specifically comprises:
dividing the training image into a plurality of calculation areas according to a convolution kernel of the preset convolution neural network model, and acquiring an image state corresponding to each calculation area, wherein the image state at least comprises one of a completely hidden area, a partially hidden area and a completely non-hidden area;
respectively determining the calculation rules of each calculation region according to the image state corresponding to each calculation region, and respectively calculating the pixel value of each pixel point in each calculation region by adopting the calculation rules of each calculation region;
and obtaining a characteristic image which does not carry a hidden subarea and corresponds to the training image according to the pixel value of each pixel point in each calculation area, and identifying a prediction label corresponding to the training image according to the characteristic image.
In the image recognition model generation method, the calculation rules are:

y(p) = Σ_{i=1}^{K×K} w_i · x_i, if p ∈ view
y(p) = Σ_{i=1}^{K×K} w_i · v, if p ∈ inview
y(p) = Σ_{i ∈ non-hidden} w_i · x_i + Σ_{j ∈ hidden} w_j · v, if p ∈ partview

wherein w = {w_1, w_2, w_3, …, w_{K×K}} are the weights of the convolution filter, x = {x_1, x_2, x_3, …, x_{K×K}} are the pixel values covered by the filter in the training image, K is the size of the convolution kernel, v is the average value of all pixels in the training image, p_1, p_2 and p_3 denote calculation regions, view is a completely non-hidden region, inview is a completely hidden region, and partview is a partially hidden region.
An image recognition method applying an image recognition model as described in any one of the above, the image recognition method comprising:
acquiring an image to be recognized, and inputting the image to be recognized into the image recognition model;
and recognizing the image to be recognized through the image recognition model to obtain the object carried in the image to be recognized.
A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps in the method for generating an image recognition model as described in any above or the steps in the method for image recognition as described above.
An application server, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the method for generating an image recognition model as described in any of the above or the steps in the method for image recognition as described above.
Advantageous effects: compared with the prior art, the present invention provides an image recognition model generation method, a storage medium and an application server. Because the generated image recognition model uses training images with partially hidden regions as training samples, the accuracy of the image recognition model in recognizing occluded images is improved, and the robustness of the image recognition model is improved.
Drawings
FIG. 1 is a flow chart of a method for generating an image recognition model according to the present invention;
FIG. 2 is a schematic diagram of a training image in the method for generating an image recognition model according to the present invention;
FIG. 3 is a schematic diagram of an original image in the method for generating an image recognition model according to the present invention;
FIG. 4 is a flow chart of a preprocessing process in the image recognition model generation method provided by the present invention;
FIG. 5 is a schematic diagram of a preprocessing process in the image recognition model generation method provided by the present invention;
FIG. 6 is a schematic diagram of a training image divided into a first preset number of sub-regions in the method for generating an image recognition model according to the present invention;
FIG. 7 is a flowchart of step M10 in the preprocessing process of the method for generating an image recognition model according to the present invention;
FIG. 8 is a flowchart of step M20 in the preprocessing process of the method for generating an image recognition model according to the present invention;
FIG. 9 is a flowchart of step S20 in the method for generating an image recognition model according to the present invention;
FIG. 10 is a schematic diagram of a training image processing procedure in the image recognition model generation method provided by the present invention;
FIG. 11 is a schematic structural diagram of an embodiment of an application server provided in the present invention.
Detailed Description
The present invention provides a method for generating an image recognition model, a storage medium, and an application server, and in order to make the objects, technical solutions, and effects of the present invention clearer and clearer, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any combination of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention will be further explained by the description of the embodiments with reference to the drawings.
The embodiment provides a method for generating an image recognition model, as shown in fig. 1, the method includes:
s10, a training image set is obtained, the training image set comprises a plurality of training images and real labels corresponding to the training images, each training image is an image with a hidden sub-region and a non-hidden sub-region, and the real labels corresponding to the training images are used for representing object types corresponding to the non-hidden sub-regions in the training images.
Specifically, the real label is used to label the object type of the object carried by the non-hidden sub-region in the training image, where the object type may include an object name. For example, if the object carried by the non-hidden sub-region of the training image is a flower, then the real label includes a flower. Furthermore, the non-hidden sub-region may carry one or more objects, and accordingly there may be one or more real labels, each corresponding to an object carried by the non-hidden sub-region. For example, if the non-hidden sub-region carries a rose and a baby's breath, the real labels include a rose label and a baby's breath label, where the rose label corresponds to the rose and the baby's breath label corresponds to the baby's breath.
Further, the non-hidden sub-region is a region in which the training image carries image information, and the hidden sub-region is a region in which the image information of the training image is hidden, for example, a region displaying a single color, such as the gray region 10 in fig. 2. In addition, in this embodiment, the training images in the training samples may be obtained by preprocessing original images, where an original image may be captured by an imaging device (e.g., a camera) or obtained from the Internet (e.g., a picture downloaded from Baidu). For example, as shown in fig. 3, the original image is a picture of a rose against a starry sky taken by an image pickup device.
Further, in an implementation manner of this embodiment, as shown in fig. 4 and 5, a preprocessing manner of preprocessing the original image to obtain the training image may be:
m10, dividing the original image into a first preset number of sub-regions;
m20, determining a second preset number of sub-areas to be hidden in the first preset number of sub-areas; wherein the first preset number is greater than the second preset number;
m30, setting the pixel value of the pixel point in each sub-region to be hidden as the preset pixel value, and obtaining the training image with the second preset number of hidden sub-regions.
Specifically, in step M10, the original image may be divided equally into the first preset number of sub-regions, or divided into the first preset number of sub-regions of a preset shape. For example, as shown in fig. 6, each gray block in the diagram represents a preset square, and the image is divided into the first preset number of square sub-regions; the gray shading merely illustrates the sub-regions obtained by the division. Of course, in practical applications, the preset shape may also be rectangular or triangular. In addition, the original image is divided into the first preset number of sub-regions such that the sum of the areas of the first preset number of sub-regions equals the area of the original image, so that the original image can be reassembled by splicing the first preset number of sub-regions according to the dividing manner.
Further, in a preferred implementation manner of this embodiment, as shown in fig. 7, the dividing the original image into a first preset number of sub-regions may include the following steps:
m11, acquiring the image size of the original image;
m12, determining the number of pixel points corresponding to the original image according to the size of the original image, and calculating a first preset number corresponding to the original image according to the number of the pixel points;
m13, dividing the original image into the first preset number of sub-regions.
Specifically, the image size of the original image refers to its pixel size, including the pixel size in the length direction and the pixel size in the width direction. For example, if the original image is 600 × 400, the image size of the original image is 600 pixels × 400 pixels: the pixel size in the length direction is 600 pixels, denoted as the length of the original image, and the pixel size in the width direction is 400 pixels, denoted as the width of the original image.
Further, the number of pixel points corresponding to the original image is the total number of pixel points contained in the original image. After the image size of the original image is obtained, this total can be calculated from the length and width of the original image: the number of pixel points of the original image equals the product of the length and the width of the original image. For example, if the original image has a length of 600 and a width of 400, the number of pixel points corresponding to the original image is 600 × 400 = 240,000.
In addition, the corresponding relation between the first preset number and the number of the pixel points is preset, and after the number of the pixel points of the original image is obtained, the first preset number corresponding to the original image can be determined according to the corresponding relation, so that the speed of determining the first preset number corresponding to the original image is increased, and the efficiency of dividing the original image can be increased. In this embodiment, the correspondence between the first preset number and the number of the pixel points is preferably:
first preset number = S²

wherein n is the number of pixel points of the original image, and S is a positive integer selected according to n (for example, S = 7 when n is greater than 1000), so that S² equals the first preset number.
For example, suppose the number of pixel points of the original image is 2000. From the correspondence between the first preset number and the number of pixel points, since 2000 is greater than 1000, S = 7 is obtained, and the first preset number equals S² = 49. That is, an original image with 2000 pixel points is divided into 49 sub-regions.
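For illustration only, this correspondence can be sketched in Python as below; the function name is hypothetical, and since the text only specifies the behavior of the worked example (S = 7 when n is greater than 1000), the branch for smaller images is an assumption:

```python
def first_preset_number(num_pixels: int) -> int:
    """Map the pixel count n of the original image to the first preset
    number S * S. Only the S = 7 branch (n > 1000) is given in the text;
    the S = 5 branch for smaller images is an assumption."""
    s = 7 if num_pixels > 1000 else 5
    return s * s

# Worked example from the text: n = 2000 > 1000, so S = 7 and the
# original image is divided into 7 * 7 = 49 sub-regions.
assert first_preset_number(2000) == 49
```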
Further, after the first preset number is determined, the original image may be divided according to the first preset number by dividing it into m parts in the length direction and then n parts in the width direction, so as to obtain the first preset number of sub-regions. Preferably, the number of parts m in the length direction equals the number of parts n in the width direction, each being the square root S of the first preset number, so that m × n equals the first preset number. After the numbers of parts in the length and width directions are determined, the length and the width of each sub-region can be calculated from these numbers and the length and width of the original image. If the length and the width of the original image are both multiples of the corresponding number of parts, the quotient of the image length and the number of parts is taken as the length of a sub-region, and the quotient of the image width and the number of parts is taken as the width of a sub-region. If the length or the width of the image is not a multiple of the number of parts, a supplementary region is added in the length and/or width direction of the original image, with the pixel values of the added pixel points set to the preset pixel value, so that the length and the width of the supplemented image become multiples of the numbers of parts; then the quotient of the supplemented image length and the number of parts is taken as the sub-region length, and the quotient of the supplemented image width and the number of parts is taken as the sub-region width. Of course, it is worth noting that the number of sub-regions obtained by dividing the supplemented image is the first preset number.
Further, in step M20, each sub-region to be hidden is one of the first preset number of sub-regions obtained by dividing the original image, and the number of sub-regions to be hidden is smaller than the number of sub-regions obtained by the division; that is, the second preset number is smaller than the first preset number. Consequently, unselected sub-regions remain in the original image, so the training image obtained by preprocessing has non-hidden sub-regions in addition to hidden sub-regions and therefore still contains image information. Here, image information is information that can be used to determine an object in the training image; for example, it may include pixel points with different pixel values, through which object boundaries in the image can be identified. Therefore, when the training image is used as an input to the preset convolutional neural network, the network can derive a prediction label from the image information, avoiding the situation in which the network cannot recognize anything because the image is completely hidden.
In addition, the second preset number may be chosen at random on the premise that it is smaller than the first preset number; for example, the first preset number is 16 and the second preset number is 10. The second preset number may also be determined from the first preset number; for example, the second preset number is half of the first preset number. In this embodiment, the second preset number is selected from the first preset number according to the preset rule: second preset number = RAND([0.4, 0.6]) × first preset number, where RAND() is a random function. Any random function may be used, and RAND functions can be called directly in the Python API, MATLAB, or the Java API.
Further, in an implementation manner of this embodiment, after the second preset number is obtained, a second preset number of sub-regions may be randomly selected from the first preset number of sub-regions, and the selected sub-regions serve as the sub-regions to be hidden. Accordingly, as shown in fig. 8, step M20 of determining the second preset number of sub-regions to be hidden may include the following steps:
m21, randomly selecting sub-regions with a second preset number from the sub-regions with the first preset number, wherein the first preset number is larger than the second preset number;
and M22, taking each selected sub-region as a sub-region to be hidden, so as to obtain the second preset number of sub-regions to be hidden.
Specifically, each sub-region to be hidden is randomly selected from the first preset number of sub-regions, and any two of the selected sub-regions may or may not be adjacent to each other. In this embodiment, the selected sub-regions to be hidden include non-adjacent sub-regions, and even when all selected sub-regions are hidden, the non-hidden regions of the original image still carry image information. For example, as shown in fig. 2, the gray sub-regions 10 identify the selected sub-regions to be hidden, the non-gray sub-regions 20 are the non-hidden sub-regions, and the non-hidden sub-regions carry image information. In addition, after the second preset number of sub-regions are obtained, they are marked as sub-regions to be hidden. The marking may be done by setting a hidden mark in each selected sub-region, for example by adding the characters "hidden", or by recording the coordinate region of each selected sub-region in the image coordinate system of the original image, so that the regions to be hidden can be determined from the recorded coordinate regions. Of course, in practical applications, the labels of the first preset number of sub-regions may also be obtained after the second preset number of sub-regions are selected, so as to record the selected second preset number of sub-regions.
Further, in step M30, after the second preset number of sub-regions to be hidden are obtained, the pixel values of the pixel points in the sub-regions to be hidden are set to the preset pixel value. When the pixel values of all pixel points in a sub-region to be hidden equal the preset pixel value, the sub-region becomes a single-color block, which hides the image content of that region, thereby generating a training image. In this embodiment, the preset pixel value is preferably 0, so that the hidden regions do not influence the recognition result during the recognition process. Accordingly, setting the pixel value of the pixel points in each sub-region to be hidden to the preset pixel value to obtain the training image with the second preset number of hidden sub-regions specifically comprises: reading the pixel value of each pixel point in each selected sub-region to be hidden, and replacing the read pixel value of each pixel point in each sub-region to be hidden with the pixel value 0, to obtain the training image with the second preset number of hidden sub-regions.
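For illustration, the following Python sketch combines steps M10 to M30: it pads the original image with the preset pixel value 0 where its size is not a multiple of the grid, divides it into an S × S grid (first preset number = S²), randomly selects RAND([0.4, 0.6]) × S² sub-regions, and hides them by setting their pixel values to 0. The function name, the NumPy (H, W[, C]) array layout and the padding details are assumptions made for the sketch, not part of the disclosure:

```python
import math
import random

import numpy as np

def hide_patches(original: np.ndarray, s: int) -> np.ndarray:
    """Preprocess an original image into a training image (steps M10-M30)."""
    img = original.copy()
    h, w = img.shape[:2]
    # Supplementary region: pad height/width up to multiples of s with
    # the preset pixel value 0, as described above.
    ph, pw = math.ceil(h / s) * s, math.ceil(w / s) * s
    if (ph, pw) != (h, w):
        padded = np.zeros((ph, pw) + img.shape[2:], dtype=img.dtype)
        padded[:h, :w] = img
        img = padded
    bh, bw = ph // s, pw // s  # sub-region height and width

    first_preset = s * s
    # Second preset number = RAND([0.4, 0.6]) * first preset number.
    second_preset = int(round(random.uniform(0.4, 0.6) * first_preset))
    hidden = random.sample(range(first_preset), second_preset)

    for idx in hidden:  # set every pixel of each hidden sub-region to 0
        r, c = divmod(idx, s)
        img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw] = 0
    return img
```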
S20, inputting the training images in the training image set into a preset convolutional neural network model, and obtaining a prediction label corresponding to the training images output by the preset convolutional neural network model, wherein the prediction label is used for indicating that the preset convolutional neural network model identifies the object type corresponding to the non-hidden sub-region in the training images.
Specifically, the training image set is the input of the preset convolutional neural network and is used to train the preset convolutional neural network model. The prediction label is the object type, recognized by the preset convolutional neural network, corresponding to the non-hidden sub-region of a training image input into the network. The prediction label is compared with the real label of the corresponding training image in order to calculate the loss value of the preset convolutional neural network.
In addition, in the training process of the image recognition model, when the preset convolutional neural network model calculates the training image, the size of the calculated sub-region each time is the size of the convolutional kernel of the preset convolutional neural network model. Therefore, when the training image is processed through the preset convolutional neural network model, the training image can be divided into a plurality of calculation regions according to the size of the convolutional kernel of the preset convolutional neural network model, the pixel value of each calculation region is calculated according to the calculation rule corresponding to the image state of each calculation region obtained through division, and the prediction label of the object carried by the training image is identified according to the calculated pixel value of each calculation region. Correspondingly, as shown in fig. 9, the sequentially inputting the training images in the training image set into a preset convolutional neural network model and obtaining the prediction labels corresponding to the training images output by the preset convolutional neural network model may include the following steps:
s21, dividing the training image into a plurality of calculation areas according to the convolution kernel of the preset convolution neural network model, and acquiring the image state corresponding to each calculation area, wherein the image state at least comprises one of a completely hidden area, a partially hidden area and a completely non-hidden area;
s22, determining the calculation rules of each calculation region according to the image state corresponding to each calculation region, and calculating the pixel value of each pixel point in each calculation region by adopting the calculation rules of each calculation region;
and S23, obtaining a feature image which does not carry a hidden sub-region and corresponds to the training image according to the pixel value of each pixel point in each calculation region, and identifying a prediction label corresponding to the training image according to the feature image.
Specifically, the image state includes at least one of a completely hidden region, a partially hidden region, and a completely non-hidden region; accordingly, the calculation regions obtained by the division include at least one of a completely hidden calculation region, a partially hidden calculation region, and a completely non-hidden calculation region. For example, as shown in fig. 10, the calculation regions obtained by the division include all three. A completely hidden calculation region is one in which the image information is entirely hidden, a partially hidden calculation region is one in which part of the image information is hidden and part is not, and a completely non-hidden calculation region is one in which none of the image information is hidden. For example, the first calculation region 1 shown in fig. 10 is a completely hidden region, the second calculation region 2 is a partially hidden region, and the third calculation region 3 is a completely non-hidden region.
Further, the calculation regions are obtained by dividing the training image according to the convolution kernel of the preset convolutional neural network model, and the size of each calculation region is the same as that of the convolution kernel, so that the preset convolutional neural network model computes one calculation region at a time. Meanwhile, to prevent the pixel value 0 of the hidden regions from impairing the robustness of the preset convolutional neural network model, when each calculation region is computed, the pixel values contained in it can be updated according to a preset rule to obtain an updated feature image, and the prediction label of the object carried by the non-hidden regions of the training image is identified from the updated feature image. This improves the accuracy of prediction-label identification and speeds up the learning of the preset convolutional neural network model. In this embodiment, different calculation manners are set for calculation regions in different image states, and the correspondence between the image state and the calculation rule may be:
y(p) = Σ_{i=1}^{K×K} w_i · x_i, if p ∈ view
y(p) = Σ_{i=1}^{K×K} w_i · v, if p ∈ inview
y(p) = Σ_{i ∈ non-hidden} w_i · x_i + Σ_{j ∈ hidden} w_j · v, if p ∈ partview

wherein w = {w_1, w_2, w_3, …, w_{K×K}} are the weights of the convolution filter, x = {x_1, x_2, x_3, …, x_{K×K}} are the pixel values covered by the filter in the training image, K is the size of the convolution kernel, v is the average value of all pixels in the training image, p_1, p_2 and p_3 denote calculation regions, view is a completely non-hidden region, inview is a completely hidden region, and partview is a partially hidden region.
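A minimal single-channel Python sketch of this three-case rule follows; the explicit visibility mask, the stride-1/no-padding loop, and computing v from the non-hidden pixels are illustrative assumptions rather than the patented implementation:

```python
import numpy as np

def masked_conv2d(x: np.ndarray, mask: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply one K x K convolution filter following the three-case rule.

    x    : (H, W) training image whose hidden pixels were set to 0
    mask : (H, W) array with 1 for non-hidden pixels and 0 for hidden ones
    w    : (K, K) convolution filter weights
    """
    k = w.shape[0]
    v = x[mask == 1].mean()  # average pixel value v (assumed: non-hidden mean)
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + k, j:j + k]
            m = mask[i:i + k, j:j + k]
            if m.all():        # view: completely non-hidden calculation region
                out[i, j] = (w * patch).sum()
            elif not m.any():  # inview: completely hidden calculation region
                out[i, j] = (w * v).sum()
            else:              # partview: real pixels where visible, v elsewhere
                out[i, j] = (w * np.where(m == 1, patch, v)).sum()
    return out
```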
Of course, in practical applications, the preset convolutional neural network may also perform the recognition operation directly on the training image to recognize the object it carries. For example, after the training image is divided into a plurality of calculation regions according to the convolution kernel, the recognition operation is performed directly on the pixel values of the calculation regions to obtain the object carried by the training image, without first computing a feature image of the training image through the calculation rules.
S30, correcting the model parameters of the preset convolutional neural network model according to the real label corresponding to the training image and the prediction label corresponding to the training image; and continuing to execute the step of inputting the training images in the training image set into a preset convolutional neural network model until the training condition of the preset convolutional neural network model meets a preset condition, so as to obtain a trained image recognition model.
Specifically, the preset condition includes the loss value meeting a preset requirement or the number of training iterations reaching a preset number. The preset requirement may be determined according to the accuracy of the image recognition model, which is not described in detail here, and the preset number may be the maximum number of training iterations of the preset convolutional neural network, for example, 1000. Accordingly, after the preset convolutional neural network outputs a prediction label, the loss value of the network is calculated from the prediction label and the real label, and it is then judged whether the loss value meets the preset requirement. If the loss value meets the preset requirement, the training ends. If not, it is judged whether the number of training iterations of the preset convolutional neural network has reached the preset number; if not, the network parameters of the preset neural network are corrected according to the loss value; if the preset number has been reached, the training ends. In this way, whether training of the preset convolutional neural network is finished is judged by both the loss value and the number of iterations, which prevents the training from entering an infinite loop when the loss value cannot meet the preset requirement.
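As an illustration of the stopping logic of step S30, a training loop might look like the following Python (PyTorch) sketch; the optimizer, learning rate, loss threshold and the use of cross-entropy loss are assumptions, since the text does not fix them:

```python
import torch
import torch.nn as nn

def train(model, loader, max_iters=1000, loss_threshold=0.01, lr=1e-3):
    """Correct the model parameters until the loss meets the preset
    requirement or the preset number of iterations is reached, so that
    training cannot enter an infinite loop."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    iters = 0
    while iters < max_iters:
        for images, real_labels in loader:
            pred_labels = model(images)               # prediction labels
            loss = criterion(pred_labels, real_labels)
            if loss.item() < loss_threshold:          # preset requirement met
                return model
            optimizer.zero_grad()
            loss.backward()                           # correct network parameters
            optimizer.step()
            iters += 1
            if iters >= max_iters:                    # preset number reached
                break
    return model
```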
Based on the above method for generating an image recognition model, the present invention further provides an image recognition method, which applies the image recognition model according to the above embodiment, and the image recognition method includes:
acquiring an image to be recognized, and inputting the image to be recognized into the image recognition model;
and recognizing the image to be recognized through the image recognition model to obtain the object carried by the image to be recognized.
Specifically, after the image to be recognized is input into the image recognition model, the image recognition model divides the image to be recognized into a plurality of calculation regions, where the size of each calculation region is the same as the size of the convolution kernel of the image recognition model. After the calculation regions are obtained by the division, each calculation region is computed; for each one, its image state is acquired, the image state being one of a completely hidden region, a partially hidden region, and a completely non-hidden region. In addition, in this embodiment, after the image state of a calculation region is acquired, the calculation rule corresponding to that calculation region is determined according to the image state, and the correspondence between the image state and the calculation rule may be:
y(p) = Σ_{i=1}^{K×K} w_i · x_i, if p ∈ view
y(p) = Σ_{i=1}^{K×K} w_i · v, if p ∈ inview
y(p) = Σ_{i ∈ non-hidden} w_i · x_i + Σ_{j ∈ hidden} w_j · v, if p ∈ partview

wherein w = {w_1, w_2, w_3, …, w_{K×K}} are the weights of the convolution filter, x = {x_1, x_2, x_3, …, x_{K×K}} are the pixel values covered by the filter in the training image, K is the size of the convolution kernel, v is the average value of all pixels in the training image, p_1, p_2 and p_3 denote calculation regions, view is a completely non-hidden region, inview is a completely hidden region, and partview is a partially hidden region.
Further, after the calculation rule is determined according to the image state of each calculation region, the pixel values contained in each calculation region may be updated according to its corresponding calculation rule to obtain a feature image, and the object carried in the non-hidden regions is identified from the feature image, which improves the accuracy of object recognition. Of course, in one possible implementation, the image recognition model of this embodiment may instead be trained by computing prediction labels directly from the training images, rather than by first obtaining a feature image carrying no hidden sub-regions and then determining the prediction label from that feature image. Correspondingly, when the image to be recognized is input into the image recognition model, the model can perform the recognition operation directly on the image to be recognized to obtain the object it carries. For example, when an image to be recognized is acquired, it is input into the image recognition model; the model acquires the pixel values of the pixel points contained in the image to be recognized and recognizes the image according to the acquired pixel values to obtain the object carried by the image to be recognized. That is, it is not necessary to first obtain a feature image without hidden sub-regions from the image to be recognized and then determine the label from the feature image.
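For completeness, a short Python (PyTorch) sketch of the recognition step is given below; the single-image batch layout and argmax decoding of the object type are assumptions:

```python
import torch

def recognize(model, image: torch.Tensor) -> int:
    """Input an image to be recognized into the trained image recognition
    model and return the index of the most probable object type."""
    model.eval()
    with torch.no_grad():
        logits = model(image.unsqueeze(0))  # add a batch dimension
    return int(logits.argmax(dim=1))
```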
Based on the above image recognition model generation method, the present invention also provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the image recognition model generation method according to the above embodiment.
Based on the above image recognition model generation method, the present invention further provides an application server, as shown in fig. 11, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory)22, and may further include a communication Interface (Communications Interface)23 and a bus 24. The processor 20, the display 21, the memory 22 and the communication interface 23 can communicate with each other through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include high-speed random access memory and may also include non-volatile memory, for example various media capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; the memory may also be a transient storage medium.
In addition, the specific processes performed when the instructions in the storage medium are loaded and executed by the processor of the application server have been described in detail in the method above and are not repeated here.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for generating an image recognition model, comprising:
acquiring a training image set, wherein the training image set comprises a plurality of training images and a real label corresponding to each training image, each training image is an image with a hidden sub-region and a non-hidden sub-region, and the real label corresponding to each training image is used for representing an object type corresponding to the non-hidden sub-region in the training image;
inputting training images in the training image set into a preset convolutional neural network model, and acquiring a prediction label corresponding to the training images output by the preset convolutional neural network model, wherein the prediction label is used for representing that the preset convolutional neural network model identifies an object type corresponding to a non-hidden subregion in the training images;
correcting the model parameters of the preset convolutional neural network model according to the real label corresponding to the training image and the prediction label corresponding to the training image; and continuing to execute the step of inputting the training images in the training image set into a preset convolutional neural network model until the training condition of the preset convolutional neural network model meets a preset condition, so as to obtain a trained image recognition model.
2. The method for generating an image recognition model according to claim 1, wherein each training image in the training image set is obtained by preprocessing an original image; the preprocessing mode for obtaining the training image according to the original image is as follows:
dividing an original image into a first preset number of sub-regions;
determining a second preset number of sub-areas to be hidden in the first preset number of sub-areas; wherein the first preset number is greater than the second preset number;
and setting the pixel value of the pixel point in each sub-region to be hidden as the preset pixel value to obtain the training image with the second preset number of hidden sub-regions.
3. The method for generating an image recognition model according to claim 2, wherein the dividing the original image into a first preset number of sub-regions specifically comprises:
acquiring the image size of an original image;
determining the number of pixel points corresponding to the original image according to the size of the original image, and calculating a first preset number corresponding to the original image according to the number of the pixel points;
dividing the original image into the first preset number of sub-regions.
4. The method for generating an image recognition model according to claim 2, wherein the determining a second preset number of sub-regions to be hidden in the first preset number of sub-regions specifically includes:
randomly selecting the sub-regions of the second preset number from the sub-regions of the first preset number;
and taking the selected sub-areas with the second preset number as sub-areas to be hidden so as to obtain the sub-areas to be hidden with the second preset number.
5. The method for generating an image recognition model according to claim 2, wherein the step of setting the pixel value of the pixel point in each sub-region to be hidden as the preset pixel value to obtain the training image with the second preset number of hidden sub-regions comprises:
and reading the pixel value of each pixel point in each selected to-be-hidden sub-area, and replacing the pixel value of each pixel point in each to-be-hidden sub-area with a pixel value 0 to obtain the training image with the second preset number of hidden sub-areas.
6. The method for generating an image recognition model according to any one of claims 2 to 5, wherein the correspondence between the second preset number and the first preset number is:

second preset number = RAND([0.4, 0.6]) × first preset number

wherein RAND() is a random function.
7. The method for generating an image recognition model according to any one of claims 1 to 5, wherein the inputting the training images in the training image set into a preset convolutional neural network model and obtaining the prediction labels corresponding to the training images output by the preset convolutional neural network model specifically includes:
dividing the training image into a plurality of calculation areas according to a convolution kernel of the preset convolution neural network model, and acquiring an image state corresponding to each calculation area, wherein the image state at least comprises one of a completely hidden area, a partially hidden area and a completely non-hidden area;
respectively determining the calculation rules of each calculation region according to the image state corresponding to each calculation region, and respectively calculating the pixel value of each pixel point in each calculation region by adopting the calculation rules of each calculation region;
and obtaining a characteristic image which does not carry a hidden subarea and corresponds to the training image according to the pixel value of each pixel point in each calculation area, and identifying a prediction label corresponding to the training image according to the characteristic image.
8. The method for generating an image recognition model according to claim 7, wherein the calculation rule is:
y(p) = Σ_{i=1}^{K×K} w_i · x_i, if p ∈ view
y(p) = Σ_{i=1}^{K×K} w_i · v, if p ∈ inview
y(p) = Σ_{i ∈ non-hidden} w_i · x_i + Σ_{j ∈ hidden} w_j · v, if p ∈ partview

wherein w = {w_1, w_2, w_3, …, w_{K×K}} are the weights of the convolution filter, x = {x_1, x_2, x_3, …, x_{K×K}} are the pixel values covered by the filter in the training image, K is the size of the convolution kernel, v is the average value of all pixels in the training image, p_1, p_2 and p_3 denote calculation regions, view is a completely non-hidden region, inview is a completely hidden region, and partview is a partially hidden region.
9. An image recognition method, wherein an image recognition model generated by the method according to any one of claims 1 to 8 is applied, the image recognition method comprising:
acquiring an image to be recognized, and inputting the image to be recognized into the image recognition model;
and recognizing the image to be recognized through the image recognition model to obtain the object carried in the image to be recognized.
10. A computer readable storage medium storing one or more programs which are executable by one or more processors to perform the steps of the method for generating an image recognition model according to any one of claims 1 to 8 or the steps of the method for image recognition according to claim 9.
11. An application server, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the method for generating an image recognition model according to any one of claims 1 to 8 or the steps in the method for image recognition according to claim 9.
CN201910593320.5A 2019-07-01 2019-07-01 Image recognition model generation method, storage medium and application server Pending CN112183563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910593320.5A CN112183563A (en) 2019-07-01 2019-07-01 Image recognition model generation method, storage medium and application server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910593320.5A CN112183563A (en) 2019-07-01 2019-07-01 Image recognition model generation method, storage medium and application server

Publications (1)

Publication Number Publication Date
CN112183563A 2021-01-05

Family

ID=73914411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910593320.5A Pending CN112183563A (en) 2019-07-01 2019-07-01 Image recognition model generation method, storage medium and application server

Country Status (1)

Country Link
CN (1) CN112183563A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372920A (en) * 2023-09-21 2024-01-09 中山大学 Pool boiling process identification method, model training method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102037A (en) * 2018-06-04 2018-12-28 平安科技(深圳)有限公司 Chinese model training, Chinese image-recognizing method, device, equipment and medium
CN109359559A (en) * 2018-09-27 2019-02-19 天津师范大学 A kind of recognition methods again of the pedestrian based on dynamic barriers sample
CN109447981A (en) * 2018-11-12 2019-03-08 平安科技(深圳)有限公司 Image-recognizing method and Related product
CN109784255A (en) * 2019-01-07 2019-05-21 深圳市商汤科技有限公司 Neural network training method and device and recognition methods and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102037A (en) * 2018-06-04 2018-12-28 平安科技(深圳)有限公司 Chinese model training, Chinese image-recognizing method, device, equipment and medium
CN109359559A (en) * 2018-09-27 2019-02-19 天津师范大学 A kind of recognition methods again of the pedestrian based on dynamic barriers sample
CN109447981A (en) * 2018-11-12 2019-03-08 平安科技(深圳)有限公司 Image-recognizing method and Related product
CN109784255A (en) * 2019-01-07 2019-05-21 深圳市商汤科技有限公司 Neural network training method and device and recognition methods and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KRISHNA KUMAR SINGH ET AL.: "Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond", arXiv:1811.02545v1 [cs.CV], pages 1-14 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372920A (en) * 2023-09-21 2024-01-09 中山大学 Pool boiling process identification method, model training method, device and equipment

Similar Documents

Publication Publication Date Title
CN108122234B (en) Convolutional neural network training and video processing method and device and electronic equipment
CN110176027B (en) Video target tracking method, device, equipment and storage medium
CN110188760B (en) Image processing model training method, image processing method and electronic equipment
CN111667520B (en) Registration method and device for infrared image and visible light image and readable storage medium
CN108875931B (en) Neural network training and image processing method, device and system
CN109919971B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113724128B (en) Training sample expansion method
CN109685805B (en) Image segmentation method and device
CN111814820B (en) Image processing method and device
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
US20210125317A1 (en) Learning device, image generation device, learning method, image generation method, and program
CN108734712B (en) Background segmentation method and device and computer storage medium
CN112183563A (en) Image recognition model generation method, storage medium and application server
CN112396594A (en) Change detection model acquisition method and device, change detection method, computer device and readable storage medium
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN113012030A (en) Image splicing method, device and equipment
CN116091784A (en) Target tracking method, device and storage medium
CN110751163A (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN113034449B (en) Target detection model training method and device and communication equipment
CN112149745B (en) Method, device, equipment and storage medium for determining difficult example sample
CN114511702A (en) Remote sensing image segmentation method and system based on multi-scale weighted attention
CN111754518B (en) Image set expansion method and device and electronic equipment
CN109977937B (en) Image processing method, device and equipment
CN113391779A (en) Parameter adjusting method, device and equipment for paper-like screen
Jin et al. SPOID: a system to produce spot-the-difference puzzle images with difficulty

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination