CN111310800A - Image classification model generation method and device, computer equipment and storage medium


Info

Publication number: CN111310800A (application CN202010063515.1A; granted publication CN111310800B)
Authority: CN (China)
Prior art keywords: target object, classification model, image classification, image, loss function
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 张力文, 刘建光, 金子杰, 武小亮, 罗育浩, 潘浩
Current assignee: Tianyi Shilian Technology Co ltd
Original assignee: 21cn Corp ltd
Application filed by 21cn Corp ltd; priority to CN202010063515.1A


Classifications

    • G06F18/24 Pattern recognition; Analysing; Classification techniques
    • G06F18/214 Pattern recognition; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Computing arrangements based on biological models; Neural networks; Architecture; Combinations of networks
    • G06N3/08 Computing arrangements based on biological models; Neural networks; Learning methods


Abstract

The application relates to an image classification model generation method and apparatus, a computer device and a storage medium. A picture data set is acquired and input into an image classification model to be trained; a position detection unit determines the position of the target object in each picture, obtains a target object image and outputs it to an object recognition unit; the object recognition unit identifies a feature map corresponding to the target object image; a loss function value between the feature map and the target object image is determined through a preset weighted Euclidean loss function; and parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model are adjusted according to the loss function value until the loss function value between the recognized feature map and the target object image is smaller than a preset threshold, at which point training ends. The method accelerates the convergence of the loss function value, improves the generation efficiency of the image classification model, and improves the resulting model's ability to recognize low-pixel images.

Description

Image classification model generation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method and an apparatus for generating an image classification model, a computer device, and a storage medium.
Background
With the rapid development of computer vision technology, image classification technology has become mature and is applied to a plurality of life scenes.
The loss function is widely used to evaluate the training effect of an image classification model: the smaller the loss function value, the better the training effect and the higher the accuracy of the model's predictions.
However, when an image classification model is trained on low-pixel pictures, the model recognizes such pictures poorly and the loss function value converges slowly, so a usable image classification model is obtained only inefficiently.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image classification model generation method, apparatus, computer device and storage medium that address the above technical problems.
A method of generating an image classification model, the method comprising:
acquiring a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to an object recognition unit of the image classification model to be trained;
identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the ordinary Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, and ending training to obtain the trained image classification model.
In one embodiment, the loss function value between the feature map and the target object image may be calculated by:
[Formula image in the original: the preset weighted Euclidean loss L_sal(x, g)]
wherein L_sal is the preset weighted Euclidean loss function; L_sal(x, g) is the loss function value; x is the pixel value of the feature map and g is the pixel value of the target object image, with g_i > 0.5; i denotes the i-th pixel of the feature map; d is the total number of pixels in the feature map; and α is a fixed weight.
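The formula image itself did not survive extraction. A plausible form consistent with the variable definitions above, offered as an assumption rather than the patent's verbatim equation, gives foreground pixels (those with g_i > 0.5) an extra weight α:

$$ L_{sal}(x, g) = \frac{1}{d} \sum_{i=1}^{d} \bigl(1 + \alpha \, \mathbf{1}[g_i > 0.5]\bigr)\,(x_i - g_i)^2 $$

With α > 0, background pixels contribute with weight 1 and target-object pixels with weight 1 + α, which matches the stated aim of de-emphasizing the background.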
In one embodiment, the position identifier of the pre-labeled target object is a rectangular border containing the target object; the position identifier carries the width and height of the rectangular border and the position coordinates of at least two opposite corners.
In one embodiment, the target object image may be calculated by:
[Three formula images in the original: the Gaussian construction of the target object image M_g]
wherein M_g is the target object image; x and y are the two coordinate vectors of the rectangular border containing the target object; Cx_i and Cy_i are the coordinates of the center point of the rectangular border containing the target object; s is the training step size among the training parameters; T denotes matrix transposition; n is the number of rectangular borders containing target objects in the picture; v_xy = [x, y]^T is the center-position coordinate of the rectangular border containing the target object, with v_xy ∈ R_{B_i}, where R_{B_i} denotes the region of the picture occupied by rectangular border B_i, the i-th rectangular border in the picture; μ_i is the center-position coordinate of the feature map; Σ_i is the covariance matrix; and w_i and h_i are the width and height of the border containing the target object.
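The three formula images did not survive extraction either. A common construction consistent with the definitions above, again an assumption rather than the patent's verbatim equations, places one two-dimensional Gaussian on each rectangular border and sums them:

$$ M_g(v_{xy}) = \sum_{i=1}^{n} \exp\!\Bigl(-\tfrac{1}{2}\,(v_{xy} - \mu_i)^{T}\,\Sigma_i^{-1}\,(v_{xy} - \mu_i)\Bigr), \qquad \mu_i = [Cx_i,\ Cy_i]^{T}, \quad \Sigma_i = \operatorname{diag}\!\bigl((w_i/s)^2,\ (h_i/s)^2\bigr) $$

Under this reading, scaling the covariance by the training step size s controls how sharply each border is highlighted against its neighbors.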
In one embodiment, before the picture data set is input into the image classification model to be trained, the method further includes: adjusting the sizes of the pictures in the picture data set so that they are uniform.
In one embodiment, the image classification model to be trained is a convolutional neural network.
In one embodiment, after obtaining the trained image classification model, the method further includes:
inputting a test picture into the trained image classification model to obtain an image classification result returned by the trained image classification model; the image classification result contains category identification information; the test picture contains at least one target object and further carries a category identifier of the target object;
and if the category identification information in the image classification result is the same as the category identifier of the test picture and the matching degree reaches a preset matching threshold, determining that the image classification model is generated.
An image classification model generation apparatus, the apparatus comprising:
the data set acquisition module is used for acquiring a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
the position information determination module is used for inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to an object recognition unit of the image classification model to be trained;
the feature map acquisition module is used for identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information and category identification information of the target object;
the loss function value determination module is used for determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the ordinary Euclidean loss function;
and the model training module is used for adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, and ending training to obtain the trained image classification model.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to an object recognition unit of the image classification model to be trained;
identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the ordinary Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, and ending training to obtain the trained image classification model.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to an object recognition unit of the image classification model to be trained;
identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the ordinary Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, and ending training to obtain the trained image classification model.
With the image classification model generation method and apparatus, the computer device and the storage medium, a picture data set is acquired, wherein the picture data set comprises a plurality of pictures, each containing at least one target object together with a pre-labeled position identifier and category identifier; the picture data set is input into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to an object recognition unit of the image classification model to be trained; the object recognition unit identifies a feature map corresponding to the target object image, the feature map comprising feature identification information, position detection information and category identification information of the target object; a loss function value between the feature map and the target object image is determined based on this information through a preset weighted Euclidean loss function, which converges faster than the ordinary Euclidean loss function; and parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model are adjusted according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, whereupon training ends and the trained image classification model is obtained. In short, the position detection unit determines the target object image according to the position identifier of the target object in the pictures participating in training; the weighted Euclidean loss function compares the difference between the feature map recognized from the target object image and the target object image itself to obtain the loss function value; and the parameters to be trained in the image classification model are adjusted according to that value. This accelerates the convergence of the loss function value, improves the generation efficiency of the image classification model, and improves the resulting model's ability to recognize low-pixel images.
Drawings
FIG. 1 is a schematic flow chart diagram of a method for generating an image classification model according to an embodiment;
FIG. 2 is a diagram illustrating a picture in a picture dataset according to an embodiment;
FIG. 3 is a flowchart illustrating the steps for determining an image classification model to generate after obtaining a trained image classification model according to one embodiment;
FIG. 4 is a diagram illustrating an image classification result output by the image classification model in one embodiment;
FIG. 5 is a block diagram showing the configuration of an image classification model generation apparatus according to an embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In an embodiment, as shown in fig. 1, an image classification model generation method is provided. This embodiment is illustrated by applying the method to a server; it should be understood that the method may also be applied to a terminal, or to a system comprising a terminal and a server and implemented through interaction between the two. In this embodiment, the method includes the following steps:
Step S11, acquiring a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object.
The picture data set is a file set composed of a plurality of pictures; the format, size and so on of the picture files are not limited. Each picture should contain a target object of a specific category. As shown in fig. 2, the picture contains target objects of two categories, Person and Laptop. The target object is the target that the image classification model actually needs to recognize in a picture. The common labeling form encloses each target object in a rectangular border, one border per target object; the rectangular border is the target object's position identifier, its four corners carry coordinate information, and the position of the target object in the picture can be determined from the coordinates of any two opposite corners. It should be noted that the labeling format is not limited to rectangular borders and may be replaced by other forms or formats. The label above the rectangular border is the category identifier, commonly just called the "label".
Specifically, the target object categories required for training are determined and arranged in order, and the pictures containing those target objects are labeled. The image classification model acquires the labeled picture data set from a data source such as a database, either directly as a data package or picture by picture until all the data required for training have been acquired; the categories and quantities are not limited, but to ensure the training effect, the number of pictures for each category should be greater than 1500. The picture data set can also be extended mid-training as needed. After the image classification model obtains the picture data set, the integrity of the data set can be checked against the parameters required during model generation, for example by counting the pictures and checking for missing category or position identifiers.
In this step, acquiring the picture data set yields the pictures containing the target object images to be recognized, each picture carrying its position identifier and category identifier; checking the integrity of the picture data set secures the data foundation of model generation, avoids errors during training, and improves the generation efficiency of the image classification model.
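As an illustration of the integrity check described above, the sketch below validates a hypothetical annotation format; the field names and schema are assumptions, while the 1500-sample rule of thumb follows the description:

```python
# Minimal dataset-integrity check; the annotation schema (dict fields) is assumed.
from collections import Counter

MIN_SAMPLES_PER_CATEGORY = 1500  # per the recommendation above

def check_dataset(samples):
    """samples: list of dicts such as
    {"path": "img_001.jpg",
     "objects": [{"label": "Person", "corners": ((x1, y1), (x2, y2))}]}
    Returns a list of integrity errors (empty list means the set is usable)."""
    errors, counts = [], Counter()
    for s in samples:
        if not s.get("objects"):
            errors.append(f"{s['path']}: no target object labeled")
            continue
        for obj in s["objects"]:
            if "label" not in obj:
                errors.append(f"{s['path']}: missing category identifier")
            if "corners" not in obj:
                errors.append(f"{s['path']}: missing position identifier")
            counts[obj.get("label", "?")] += 1
    for label, n in counts.items():
        if n < MIN_SAMPLES_PER_CATEGORY:
            errors.append(f"category {label}: only {n} samples "
                          f"(< {MIN_SAMPLES_PER_CATEGORY})")
    return errors
```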
Step S12, inputting the picture data set into the image classification model to be trained, so that the position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains the target object image, and outputs it to the object recognition unit of the image classification model to be trained.
The image classification model is an operational model that, after training, recognizes an input image and outputs an image classification result; it may be a MobileNet network structure (a lightweight CNN for mobile or embedded devices) or the convolutional neural network VGG16 (Visual Geometry Group network). The position detection unit determines the position of the target object in the picture through the position identifier; as shown in fig. 2, the rectangular border is the position identifier, and the image content inside the border is the target object to be recognized.
Specifically, after the image classification model to be trained acquires the picture data, it calls the position detection unit to recognize the pre-labeled position identifiers, thereby determining the position of each target object to be recognized, and strips each target object from the picture to obtain a target object image; the image classification model then outputs the obtained target object images to the object recognition unit.
In this step, the position detection unit of the image classification model determines the target object image according to the position identifier of the target object in the pictures participating in training, separating the target object to be recognized from the picture. This reduces the amount of data the image classification model must recognize, avoids recognizing the wrong objects during training, and, thanks to the smaller data volume, improves the generation efficiency of the image classification model.
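A minimal sketch of this stripping step, assuming the position identifier is stored as two opposite corner coordinates of the rectangular border (as discussed later in this description):

```python
import numpy as np

def crop_target_object(picture: np.ndarray, corner_a, corner_b) -> np.ndarray:
    """Cut the target object out of the picture, given any two opposite
    corners (x, y) of its rectangular border."""
    (xa, ya), (xb, yb) = corner_a, corner_b
    x1, x2 = sorted((xa, xb))
    y1, y2 = sorted((ya, yb))
    return picture[y1:y2, x1:x2]  # image rows index y, columns index x
```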
Step S13, recognizing, by the object recognition unit, a feature map corresponding to the target object image; the feature map includes feature identification information, position detection information, and category identification information of the target object.
The object recognition unit analyzes the target object image and extracts its more salient feature points. The feature map is composed of a plurality of feature points; it carries feature identification information, in which the feature point data are stored, as well as the position detection information and category identification information corresponding to the target object image. The position detection information is the localization result obtained after the image classification model recognizes the picture's position identifier, and the category identification information is the target object category result obtained after the model recognizes the picture's category identifier.
Specifically, after the object recognition unit in the image classification model acquires the target object image to be recognized, it recognizes the image and generates a feature map by convolution over the image's features. It should be noted that the feature map must be updated continually for image classification, so the image classification model recognizes the target object image and updates the feature map many times, improving the model's accuracy. The object recognition unit carries different training parameter settings depending on the neural network model used, and the feature map is obtained after repeated operations with the continuously adjusted and updated training parameters.
In this step, the object recognition unit in the image classification model recognizes the acquired target object image to obtain the feature map; from the feature map, the training effect and progress of the image classification model can be judged and adjusted in time, improving the generation efficiency of the image classification model.
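A toy sketch of the convolution step that turns a target object image into a feature map; PyTorch is assumed here, since the patent names no framework, and the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# One convolution stage of a hypothetical object recognition unit.
conv_stage = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
)

target_object = torch.randn(1, 3, 64, 64)   # a dummy RGB target object image
feature_map = conv_stage(target_object)     # shape: (1, 16, 64, 64)
```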
Step S14, determining the loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through the preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the ordinary Euclidean loss function.
The loss function is commonly used to measure the predictive ability of a neural network model; the algorithms involved in neural networks basically depend on maximizing or minimizing an objective function, and the function being minimized is usually called the loss function. In generating the image classification model, the purpose of the loss function is to minimize the difference between the predicted image and the target object image; the process of gradually reducing this difference is convergence, and the convergence speed is tied to the model's training speed. Although the Euclidean loss function is widely used to measure the distance between pixels, it performs poorly when pixel values are mostly 0 or uniformly low, normalizing most pixels to 0. A weighted Euclidean loss function handles this case better: it assigns more weight to the target object image, de-emphasizing the background and resolving the indiscriminate normalization of pixels in low-pixel conditions.
Specifically, the image classification model compares the recognized feature map with the target object image using the weighted Euclidean loss function, combining the feature identification information, the position detection information and the category identification information and comparing their differences to obtain the loss function value. From this value the image classification model can evaluate the training effect and determine the direction, range and magnitude of training-parameter adjustments.
Comparing, with the weighted Euclidean loss function, the difference between the feature map recognized for the target object image and the target object image itself yields the loss function value. Because the weighted Euclidean loss function assigns more weight to the target object image, it de-emphasizes the background, resolves the indiscriminate normalization of pixels in low-pixel conditions, allows the training parameters to be adjusted better, and improves the resulting image classification model's ability to recognize low-pixel images.
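A NumPy sketch of a weighted Euclidean loss in this spirit; the exact weighting scheme is an assumption consistent with the description, not the patent's verbatim formula:

```python
import numpy as np

def weighted_euclidean_loss(x: np.ndarray, g: np.ndarray, alpha: float = 1.0) -> float:
    """x: predicted feature map pixels; g: target object image pixels.
    Foreground pixels (g > 0.5) receive the extra weight alpha, so a
    low-pixel background no longer dominates the average."""
    x, g = x.ravel(), g.ravel()
    weights = 1.0 + alpha * (g > 0.5)          # booleans promote to 0.0 / 1.0
    return float(np.mean(weights * (x - g) ** 2))
```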
Step S15, adjusting the parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than the preset threshold, and ending training to obtain the trained image classification model.
The parameters to be trained are the initial parameters of the neural network model before training; the training process continuously adjusts these training parameters according to the loss function value.
Specifically, the loss function value between the target object image and the recognized feature map is calculated based on the feature identification information, the position detection information and the category identification information, and the training parameters of the image classification model are updated and adjusted accordingly until the calculated loss function value reaches the preset threshold, at which point the training parameters are deemed suitable for image classification, that is, the training of the image classification model is complete.
Continuously adjusting the training parameters of the image classification model according to the loss function value, and stopping only when the model reaches the preset precision, ensures the model's accuracy; adjusting the parameters to be trained according to the loss function value accelerates the convergence of the loss function value, improves the generation efficiency of the image classification model, and at the same time improves the resulting model's ability to recognize low-pixel pictures.
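A schematic training loop of this kind (PyTorch assumed; the optimizer, learning rate and stopping threshold are illustrative, not specified by the patent):

```python
import torch

def train(model, loader, loss_fn, threshold=1e-3, lr=1e-3, max_epochs=100):
    """Adjust the parameters to be trained until the loss function value
    between the recognized feature map and the target object image falls
    below the preset threshold."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for target_image, _ in loader:
            feature_map = model(target_image)          # object recognition unit
            loss = loss_fn(feature_map, target_image)  # weighted Euclidean loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < threshold:
            break  # training complete: loss below the preset threshold
    return model
```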
In this image classification model generation method, the position detection unit determines the target object image according to the position identifier of the target object in the pictures participating in training; the weighted Euclidean loss function compares the difference between the feature map recognized from the target object image and the target object image itself to obtain the loss function value; and the parameters to be trained in the image classification model are adjusted according to that value. This accelerates the convergence of the loss function value, improves the generation efficiency of the image classification model, and improves the resulting model's ability to recognize low-pixel images. The detection of the target object's position identifier can be re-planned and designed on top of the MobileNet- and VGG16-based neural network layers, while the novel weighted Euclidean loss function makes the whole image classification model easier to converge.
In one embodiment, the loss function value between the feature map and the target object image may be calculated by:
[Formula image in the original: the preset weighted Euclidean loss L_sal(x, g), as given above]
wherein L_sal is the preset weighted Euclidean loss function; L_sal(x, g) is the loss function value; x is the pixel value of the feature map and g is the pixel value of the target object image, with g_i > 0.5; i denotes the i-th pixel of the feature map; d is the total number of pixels in the feature map; and α is a fixed weight.
In one embodiment, the position identifier of the pre-labeled target object is a rectangular border containing the target object; the position identifier carries the width and height of the rectangular border and the position coordinates of at least two opposite corners.
Rectangular borders are chosen because they label the target object quickly and economically: the basic attributes of the whole rectangle can be determined from the coordinates of any two opposite corners together with the width and height data, and the computation is fast. The target objects in each picture are labeled, the basic labeling unit being a border enclosing the whole object; pictures containing several target objects each are preferred.
In one embodiment, the target object image may be calculated by:
[Three formula images in the original: the Gaussian construction of the target object image M_g, as given above]
wherein M_g is the target object image; x and y are the two coordinate vectors of the rectangular border containing the target object; Cx_i and Cy_i are the coordinates of the center point of the rectangular border containing the target object; s is the training step size among the training parameters; T denotes matrix transposition; n is the number of rectangular borders containing target objects in the picture; v_xy = [x, y]^T is the center-position coordinate of the rectangular border containing the target object, with v_xy ∈ R_{B_i}, where R_{B_i} denotes the region of the picture occupied by rectangular border B_i, the i-th rectangular border in the picture; μ_i is the center-position coordinate of the feature map; Σ_i is the covariance matrix; and w_i and h_i are the width and height of the border containing the target object.
It should be noted that conventional object detection cannot cleanly separate the position-identifier borders of multiple labeled target objects; therefore, the rectangular borders corresponding to the multiple target objects in a picture are separated, and thereby highlighted, using a Gaussian distribution.
In one embodiment, before the picture data set is input into the image classification model to be trained, the method further includes: adjusting the sizes of the pictures in the picture data set so that they are uniform.
No limit is placed on picture sizes when the picture data set is supplied; the sizes are unified and normalized when the pictures are input into the image classification model for training. Likewise, given that neural network models are trained with gradient descent or back-propagation algorithms, the input features of the convolutional neural network need to be standardized: before the target object images to be trained are fed into the convolutional neural network, the input data must be normalized along the channel or time/frequency dimension.
Specifically, size information is preset in the image classification model, and every input picture usable for training is adjusted to it: a size adjustment unit reads the actual size of each picture in the picture data set, converts it according to the preset size information, and resizes the picture or target object image accordingly to obtain the adjusted picture or target object image.
By uniformly resizing the input picture data set so that all pictures share the same size, this embodiment improves the generation efficiency of the image classification model.
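A minimal sketch of the size unification and normalization step (Pillow assumed; the 224×224 target size is an illustrative choice, not specified by the patent):

```python
from PIL import Image
import numpy as np

TARGET_SIZE = (224, 224)  # preset size information; an illustrative choice

def load_uniform(path: str) -> np.ndarray:
    """Read a picture, resize it to the preset size, and normalize pixel
    values to [0, 1] per channel, as described above."""
    img = Image.open(path).convert("RGB").resize(TARGET_SIZE)
    return np.asarray(img, dtype=np.float32) / 255.0
```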
In one embodiment, the image classification model to be trained is a convolutional neural network.
A Convolutional Neural Network (CNN) is a class of feed-forward neural networks with a deep structure whose computations involve convolution; it is one of the representative algorithms of deep learning. CNNs can be built in imitation of the biological visual perception mechanism and support both supervised and unsupervised learning. The hidden layers of a convolutional neural network commonly comprise three structures: convolutional layers, pooling layers and fully connected layers. These are typically assembled in the order: input, convolutional layer, pooling layer, fully connected layer, output.
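A minimal convolutional network following the order just described (input, convolutional layer, pooling layer, fully connected layer, output); PyTorch assumed, with illustrative layer sizes matching the 224×224 input used in the resizing sketch earlier:

```python
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling layer
        )
        self.classifier = nn.Linear(16 * 112 * 112, num_classes)  # fully connected

    def forward(self, x):            # x: (batch, 3, 224, 224)
        x = self.features(x)         # (batch, 16, 112, 112)
        x = x.flatten(start_dim=1)
        return self.classifier(x)    # output: class scores
```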
In an embodiment, as shown in fig. 3, after the trained image classification model is obtained in step S15, the method further includes:
Step S31, inputting a test picture into the trained image classification model to obtain the image classification result returned by the trained image classification model; the image classification result contains category identification information; the test picture contains at least one target object and further carries a category identifier of the target object.
Step S32, if the category identification information in the image classification result is the same as the category identifier of the test picture and the matching degree reaches a preset matching threshold, determining that the image classification model is generated.
The image classification result output by the image classification model is shown in fig. 4: each recognized target object is enclosed in a rectangular border, and the labels "Cat" and "Dog" above the borders are the category identification information. Each piece of category identification information carries a matching degree, which represents the probability that the object recognized by the image classification model belongs to that category.
Specifically, after the training of the image classification model is completed, its classification precision needs to be checked: a picture containing at least one target object is input, the category of the target object being one of the categories used when the image classification model was generated; from the category identification information and matching degree output by the image classification model, it can be judged whether the model has been trained to the required precision threshold.
In this embodiment, the training effect of the image classification model is tested by inputting a test picture containing the target object, and whether the model can be used for image classification is judged from the test result, ensuring the quality of the generated image classification model.
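A sketch of this acceptance check; the result format (category label plus matching degree) is an assumption based on the description of fig. 4:

```python
def model_accepted(results, expected_label: str, match_threshold: float = 0.9) -> bool:
    """results: list of (category_label, matching_degree) pairs returned by
    the trained model for one test picture. The model is accepted when some
    detection carries the expected category identifier with a matching degree
    at or above the preset matching threshold."""
    return any(label == expected_label and score >= match_threshold
               for label, score in results)
```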
It should be understood that although the steps in the flowcharts of fig. 1 and 3 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence; unless explicitly stated otherwise, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in fig. 1 and 3 may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or in alternation with other steps, or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided an image classification model generation apparatus including: a data set acquisition module 51, a position information determination module 52, a feature map acquisition module 53, a loss function value determination module 54, and a model training module 55, wherein:
The data set acquisition module 51 is configured to acquire a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object.
The position information determination module 52 is configured to input the picture data set into the image classification model to be trained, so that the position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to the object recognition unit of the image classification model to be trained.
The feature map acquisition module 53 is configured to identify, by the object recognition unit, the feature map corresponding to the target object image; the feature map includes feature identification information, position detection information and category identification information of the target object.
The loss function value determination module 54 is configured to determine the loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through the preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the ordinary Euclidean loss function.
The model training module 55 is configured to adjust the parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than the preset threshold, ending training to obtain the trained image classification model.
In one embodiment, the image classification model generation apparatus further includes a size adjustment module configured to adjust the sizes of the pictures in the picture data set so that they are uniform.
In one embodiment, the image classification model generation apparatus further includes a test module configured to input a test picture into the trained image classification model to obtain an image classification result returned by the trained image classification model, wherein the image classification result contains category identification information and the test picture contains at least one target object together with its category identifier; and, if the category identification information in the image classification result is the same as the category identifier of the test picture and the matching degree reaches a preset matching threshold, to determine that the image classification model is generated.
In the above embodiments, the position detection unit determines the position of the target object according to the position identifier, and the weighted Euclidean loss function then compares the difference between the recognized feature map and the target object image to obtain the loss function value. This allows the training parameters to be adjusted better, strengthens the model's ability to recognize low-pixel images, accelerates the convergence of the loss function value, and improves the generation efficiency of the image classification model.
For specific limitations of the image classification model generation apparatus, reference may be made to the limitations of the image classification model generation method above, which are not repeated here. Each module in the image classification model generation apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded, in hardware form, in a processor of the computer device or be independent of it, or be stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory and a network interface connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory: the non-volatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database stores image classification model data. The network interface communicates with external terminals over a network connection. The computer program, when executed by the processor, implements the image classification model generation method.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of part of the structure relevant to the present application and does not limit the computer devices to which the present application may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to an object recognition unit of the image classification model to be trained;
identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the ordinary Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, and ending training to obtain the trained image classification model.
In one embodiment, the processor, when executing the computer program, further implements the step of adjusting the sizes of the pictures in the picture data set so that they are uniform.
In one embodiment, the processor, when executing the computer program, further implements the steps of: inputting a test picture into the trained image classification model to obtain an image classification result returned by the trained image classification model, wherein the image classification result contains category identification information and the test picture contains at least one target object together with its category identifier; and, if the category identification information in the image classification result is the same as the category identifier of the test picture and the matching degree reaches a preset matching threshold, determining that the image classification model is generated.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to an object recognition unit of the image classification model to be trained;
identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the ordinary Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, and ending training to obtain the trained image classification model.
In one embodiment, the computer program, when executed by the processor, further implements the step of adjusting the sizes of the pictures in the picture data set so that they are uniform.
In one embodiment, the computer program, when executed by the processor, further implements the steps of: inputting a test picture into the trained image classification model to obtain an image classification result returned by the trained image classification model, wherein the image classification result contains category identification information and the test picture contains at least one target object together with its category identifier; and, if the category identification information in the image classification result is the same as the category identifier of the test picture and the matching degree reaches a preset matching threshold, determining that the image classification model is generated.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. Any reference to memory, storage, database or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage and the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be regarded as within the scope of this specification.
The above examples express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for generating an image classification model, the method comprising:
acquiring a picture data set; the picture data set comprises a plurality of pictures, and each picture at least comprises a target object; the picture also comprises a position mark and a category mark of a target object which are marked in advance;
inputting the picture data set into an image classification model to be trained, enabling a position detection unit in the image classification model to determine the position of the target object in the picture according to the position identification, obtaining a target object image and outputting the target object image to an object recognition unit of the image classification model to be trained;
identifying a feature map corresponding to the target object image by the object identification unit; the feature map comprises feature identification information, position detection information and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted euclidean loss function; the convergence speed of the weighted Euclidean loss function is faster than that of the Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value, until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, at which point training is finished and the trained image classification model is obtained.
2. The method of claim 1, wherein the loss function value between the feature map and the target object image is calculated by:
L_sal(x, g) = (1/d) · Σ_{i=1}^{d} α_i · (x_i − g_i)², where α_i = α if g_i > 0.5 and α_i = 1 otherwise
wherein L_sal is the preset weighted Euclidean loss function; L_sal(x, g) is the loss function value; x_i is the value of the i-th pixel point of the feature map and g_i is the value of the corresponding pixel point of the target object image; i denotes the i-th pixel point of the feature map; d is the maximum number of pixel points of the feature map; and α is a fixed weight applied where g_i > 0.5.
3. The method of claim 1, wherein the pre-labeled position identification of the target object is a rectangular border containing the target object; the position identification carries the width and height of the rectangular border and the position coordinates of two diagonally opposite corners.
4. The method of claim 3, wherein the target object image is computed by:
[The formulas defining M_g, μ_i and Σ_i appear only as images (FDA0002375253280000021 to FDA0002375253280000023) in the original filing.]
wherein M_g is the target object image; x and y are the two vectors of the rectangular border where the target object is located; Cx_i and Cy_i are the coordinates of the center point of the rectangular border where the target object is located; s is the training step length among the training parameters; T denotes matrix transposition; N is the number of rectangular borders containing target objects in the picture; v_xy is the coordinate of the center position of the rectangular border of the target object, v_xy = [x, y]^T and v_xy ∈ R_{B_i}, where R_{B_i} denotes the area ratio of the rectangular border of the target object within the picture and B_i denotes the i-th rectangular border in the picture; μ_i is the center position coordinate of the feature map; Σ_i is a covariance matrix; and w_i and h_i respectively denote the width and height of the border where the target object is located.
5. The method of claim 1, further comprising, prior to inputting the picture data set into the image classification model to be trained: adjusting the sizes of the pictures in the picture data set so that all pictures in the picture data set have a uniform size.
6. The method of claim 1, wherein the image classification model to be trained is a convolutional neural network.
7. The method of claim 1, further comprising, after obtaining the trained image classification model:
inputting a test picture into the trained image classification model to obtain an image classification result returned by the trained image classification model; the image classification result contains category identification information; the test picture contains at least one target object and also carries the category identification of the target object;
and if the category identification information in the image classification result is the same as the category identification of the test picture and the matching degree reaches a preset matching threshold, determining that the image classification model has been generated.
8. An apparatus for generating an image classification model, the apparatus comprising:
a data set acquisition module, configured to acquire a picture data set; the picture data set comprises a plurality of pictures, each of which contains at least one target object; each picture further carries a pre-labeled position identification and a pre-labeled category identification of the target object;
a position information determination module, configured to input the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identification, obtains a target object image, and outputs the target object image to an object recognition unit of the image classification model to be trained;
a feature map acquisition module, configured to identify, through the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information and category identification information of the target object;
a loss function value determination module, configured to determine, through a preset weighted Euclidean loss function, a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information; the weighted Euclidean loss function converges faster than the standard Euclidean loss function;
and a model training module, configured to adjust parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value, until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, at which point training is finished and the trained image classification model is obtained.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
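To make the training procedure of claims 1 and 2 concrete, the following minimal sketch pairs a weighted Euclidean loss (weighting pixels whose ground-truth value g_i exceeds 0.5 by the fixed weight α, following the reconstruction in claim 2) with a loop that finishes once the loss value falls below the preset threshold. The PyTorch framing, the α and threshold values, and the function names are assumptions for illustration; the patent does not prescribe an implementation.

```python
# Sketch of the weighted Euclidean loss of claim 2 and the
# train-until-below-threshold loop of claim 1 (assumed reading;
# framework, constants and names are illustrative only).
import torch

ALPHA = 2.0            # assumed fixed weight α
LOSS_THRESHOLD = 0.01  # assumed preset threshold


def weighted_euclidean_loss(x, g, alpha=ALPHA):
    # x: feature-map pixel values, g: target-object-image pixel values.
    # Pixels with g_i > 0.5 are weighted by alpha; the rest by 1.
    weights = torch.where(g > 0.5, torch.full_like(g, alpha), torch.ones_like(g))
    return (weights * (x - g) ** 2).mean()


def train_until_converged(model, loader, optimizer):
    model.train()
    while True:
        for picture, target_image in loader:
            feature_map = model(picture)
            loss = weighted_euclidean_loss(feature_map, target_image)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Training finishes once the loss between the recognized
            # feature map and the target object image is below the threshold.
            if loss.item() < LOSS_THRESHOLD:
                return model
```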
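The formulas of claim 4 survive only as images in the source, but the variable definitions (box centers Cx_i and Cy_i, widths w_i, heights h_i, a per-box mean μ_i and a covariance matrix Σ_i) suggest a Gaussian map built over the labelled rectangular borders. The sketch below is one plausible reading under that assumption; the spread rule, the normalisation, and the `gaussian_target_map` helper are guesses, not the patent's published formulas.

```python
# One plausible reading of the M_g construction in claim 4: each labelled
# rectangular border contributes a 2-D Gaussian centred at (Cx_i, Cy_i)
# whose spread is derived from the border width w_i and height h_i.
import numpy as np


def gaussian_target_map(height, width, boxes):
    """boxes: list of (cx, cy, w, h) rectangular borders for the target objects."""
    ys, xs = np.mgrid[0:height, 0:width]
    m_g = np.zeros((height, width))
    for cx, cy, w, h in boxes:
        sigma_x, sigma_y = w / 2.0, h / 2.0  # assumed spread from border size
        m_g = np.maximum(
            m_g,
            np.exp(-(((xs - cx) ** 2) / (2 * sigma_x ** 2)
                     + ((ys - cy) ** 2) / (2 * sigma_y ** 2))),
        )
    return m_g
```

For instance, gaussian_target_map(480, 640, [(320, 240, 100, 80)]) would yield a 480×640 map peaking at the border's center and decaying toward its edges.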
CN202010063515.1A 2020-01-20 2020-01-20 Image classification model generation method, device, computer equipment and storage medium Active CN111310800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010063515.1A CN111310800B (en) 2020-01-20 2020-01-20 Image classification model generation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010063515.1A CN111310800B (en) 2020-01-20 2020-01-20 Image classification model generation method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111310800A true CN111310800A (en) 2020-06-19
CN111310800B CN111310800B (en) 2023-10-10

Family

ID=71158216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010063515.1A Active CN111310800B (en) 2020-01-20 2020-01-20 Image classification model generation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111310800B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613574A (en) * 2020-12-30 2021-04-06 清华大学 Training method of image classification model, image classification method and device
CN112926437A (en) * 2021-02-22 2021-06-08 深圳中科飞测科技股份有限公司 Detection method and device, detection equipment and storage medium
CN113095194A (en) * 2021-04-02 2021-07-09 北京车和家信息技术有限公司 Image classification method and device, storage medium and electronic equipment
CN113313129A (en) * 2021-06-22 2021-08-27 中国平安财产保险股份有限公司 Method, device and equipment for training disaster recognition model and storage medium
CN113744161A (en) * 2021-09-16 2021-12-03 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN114637868A (en) * 2022-02-23 2022-06-17 广州市玄武无线科技股份有限公司 Product data processing method and system applied to fast-moving industry

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808143A (en) * 2017-11-10 2018-03-16 西安电子科技大学 Dynamic gesture identification method based on computer vision
CN108920999A (en) * 2018-04-16 2018-11-30 深圳市深网视界科技有限公司 A kind of head angle prediction model training method, prediction technique, equipment and medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808143A (en) * 2017-11-10 2018-03-16 西安电子科技大学 Dynamic gesture identification method based on computer vision
CN108920999A (en) * 2018-04-16 2018-11-30 深圳市深网视界科技有限公司 A kind of head angle prediction model training method, prediction technique, equipment and medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613574A (en) * 2020-12-30 2021-04-06 清华大学 Training method of image classification model, image classification method and device
CN112613574B (en) * 2020-12-30 2022-07-19 清华大学 Training method of image classification model, image classification method and device
CN112926437A (en) * 2021-02-22 2021-06-08 深圳中科飞测科技股份有限公司 Detection method and device, detection equipment and storage medium
CN113095194A (en) * 2021-04-02 2021-07-09 北京车和家信息技术有限公司 Image classification method and device, storage medium and electronic equipment
CN113313129A (en) * 2021-06-22 2021-08-27 中国平安财产保险股份有限公司 Method, device and equipment for training disaster recognition model and storage medium
CN113313129B (en) * 2021-06-22 2024-04-05 中国平安财产保险股份有限公司 Training method, device, equipment and storage medium for disaster damage recognition model
CN113744161A (en) * 2021-09-16 2021-12-03 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN113744161B (en) * 2021-09-16 2024-03-29 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN114637868A (en) * 2022-02-23 2022-06-17 广州市玄武无线科技股份有限公司 Product data processing method and system applied to fast-moving industry

Also Published As

Publication number Publication date
CN111310800B (en) 2023-10-10

Similar Documents

Publication Publication Date Title
CN111310800B (en) Image classification model generation method, device, computer equipment and storage medium
US10936911B2 (en) Logo detection
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
CN109344727B (en) Identity card text information detection method and device, readable storage medium and terminal
WO2016138838A1 (en) Method and device for recognizing lip-reading based on projection extreme learning machine
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
EP3745309A1 (en) Training a generative adversarial network
CN111832581B (en) Lung feature recognition method and device, computer equipment and storage medium
CN111275685A (en) Method, device, equipment and medium for identifying copied image of identity document
CN111401387B (en) Abnormal sample construction method, device, computer equipment and storage medium
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN112613515A (en) Semantic segmentation method and device, computer equipment and storage medium
CN110705489B (en) Training method and device for target recognition network, computer equipment and storage medium
CN110827292B (en) Video instance segmentation method and device based on convolutional neural network
WO2023284608A1 (en) Character recognition model generating method and apparatus, computer device, and storage medium
CN113034514A (en) Sky region segmentation method and device, computer equipment and storage medium
CN112836682B (en) Method, device, computer equipment and storage medium for identifying object in video
CN116266387A (en) YOLOV4 image recognition algorithm and system based on re-parameterized residual error structure and coordinate attention mechanism
CN109978058A (en) Determine the method, apparatus, terminal and storage medium of image classification
WO2023066142A1 (en) Target detection method and apparatus for panoramic image, computer device and storage medium
CN109657083B (en) Method and device for establishing textile picture feature library
CN114677578A (en) Method and device for determining training sample data
CN115620083A (en) Model training method, face image quality evaluation method, device and medium
CN114241202A (en) Method and device for training dressing classification model and method and device for dressing classification
CN111428553B (en) Face pigment spot recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220113

Address after: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200040

Applicant after: Tianyi Digital Life Technology Co.,Ltd.

Address before: 1 / F and 2 / F, East Garden, Huatian International Plaza, 211 Longkou Middle Road, Tianhe District, Guangzhou, Guangdong 510630

Applicant before: Century Dragon Information Network Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240313

Address after: Unit 1, Building 1, China Telecom Zhejiang Innovation Park, No. 8 Xiqin Street, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 311100

Patentee after: Tianyi Shilian Technology Co.,Ltd.

Country or region after: China

Address before: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200040

Patentee before: Tianyi Digital Life Technology Co.,Ltd.

Country or region before: China