Disclosure of Invention
In view of the foregoing, it is desirable to provide an image classification model generation method, apparatus, computer device, and storage medium that solve the above technical problems.
A method of generating an image classification model, the method comprising:
acquiring a picture data set; the picture data set comprises a plurality of pictures, each picture containing at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs the target object image to an object recognition unit of the image classification model to be trained;
identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information, and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information, and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the standard Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value, until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, at which point training is finished and the trained image classification model is obtained.
In one embodiment, the loss function value between the feature map and the target object image may be calculated by:
wherein L_sal is the preset weighted Euclidean loss function; L_sal(x, g) is the loss function value; x is the pixel value of the feature map; g is the pixel value of the target object image, with the extra weight applied where g_i > 0.5; i denotes the i-th pixel of the feature map; d is the maximum number of pixels in the feature map; and α is a fixed weight.
In one embodiment, the pre-labeled position identifier of the target object is a rectangular border containing the target object; the position identifier carries the width and height of the rectangular border and the position coordinates of at least two opposite corners.
In one embodiment, the target object image may be calculated by:
wherein M_g is the target object image; x and y are the two coordinate vectors of the rectangular border where the target object is located; C_xi and C_yi are the coordinates of the center point of the rectangular border where the target object is located; s is the training step size among the training parameters; T denotes matrix transposition; n is the number of rectangular borders containing target objects in the picture; v_xy is the center-position coordinate of the rectangular border where the target object is located, with v_xy = [x, y]^T and v_xy ∈ R_Bi, where R_Bi represents the area ratio of the rectangular border of the target object within the picture and B_i represents the i-th rectangular border in the picture; μ_i is the center-position coordinate of the feature map; Σ_i is a covariance matrix; and w_i and h_i represent the width and height, respectively, of the border where the target object is located.
In one embodiment, before inputting the picture data set into an image classification model to be trained, the method further includes: and adjusting the sizes of the pictures in the picture data set to enable the sizes of the pictures in the picture data set to be uniform.
In one embodiment, the image classification model to be trained is a convolutional neural network.
In one embodiment, after obtaining the trained image classification model, the method further includes:
inputting a test picture into the trained image classification model to obtain an image classification result returned by the trained image classification model; the image classification result contains category identification information; the test picture contains at least one target object together with a category identifier of the target object;
and if the category identification information in the image classification result is the same as the category identifier of the test picture and the matching degree reaches a preset matching threshold, determining that the image classification model is successfully generated.
An image classification model generation apparatus, the apparatus comprising:
the data set acquisition module is used for acquiring a picture data set; the picture data set comprises a plurality of pictures, each picture containing at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
the position information determining module is used for inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs the target object image to an object recognition unit of the image classification model to be trained;
the feature map acquisition module is used for identifying, through the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information, and category identification information of the target object;
a loss function value determination module configured to determine a loss function value between the feature map and the target object image based on the feature identification information, the position detection information, and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the standard Euclidean loss function;
and the model training module is used for adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value, until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, at which point training is finished and the trained image classification model is obtained.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a picture data set; the picture data set comprises a plurality of pictures, each picture containing at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs the target object image to an object recognition unit of the image classification model to be trained;
identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information, and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information, and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the standard Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value, until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, at which point training is finished and the trained image classification model is obtained.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a picture data set; the picture data set comprises a plurality of pictures, each picture containing at least one target object; each picture further carries a pre-labeled position identifier and category identifier of the target object;
inputting the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs the target object image to an object recognition unit of the image classification model to be trained;
identifying, by the object recognition unit, a feature map corresponding to the target object image; the feature map comprises feature identification information, position detection information, and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information, and the category identification information through a preset weighted Euclidean loss function; the weighted Euclidean loss function converges faster than the standard Euclidean loss function;
and adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value, until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, at which point training is finished and the trained image classification model is obtained.
The image classification model generation method, apparatus, computer device, and storage medium acquire a picture data set, where the picture data set comprises a plurality of pictures, each picture containing at least one target object together with a pre-labeled position identifier and category identifier of the target object; input the picture data set into an image classification model to be trained, so that a position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to an object recognition unit of the image classification model to be trained; identify, by the object recognition unit, a feature map corresponding to the target object image, the feature map comprising feature identification information, position detection information, and category identification information of the target object; determine a loss function value between the feature map and the target object image based on this information through a preset weighted Euclidean loss function, which converges faster than the standard Euclidean loss function; and adjust parameters to be trained in the position detection unit and/or the object recognition unit according to the loss function value, until the loss function value between the recognized feature map and the target object image is smaller than a preset threshold, at which point training is finished and the trained image classification model is obtained.
In this method, the position detection unit determines the target object image according to the position identifier of the target object in each picture participating in training; the weighted Euclidean loss function compares the difference between the feature map recognized from the target object image and the target object image itself to obtain a loss function value; and the parameters to be trained in the image classification model are adjusted according to that value. This accelerates the convergence of the loss function value, improves the generation efficiency of the image classification model, and improves the ability of the resulting model to recognize low-pixel pictures.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In an embodiment, as shown in fig. 1, an image classification model generation method is provided, and this embodiment is illustrated by applying the method to a server, and it is to be understood that the method may also be applied to a terminal, and may also be applied to a system including a terminal and a server, and is implemented by interaction between the terminal and the server. In this embodiment, the method includes the steps of:
step S11, acquiring a picture data set; the picture data set comprises a plurality of pictures, and each picture at least comprises a target object; the picture also comprises a position mark and a category mark of the target object which are marked in advance.
The picture data set is a file set composed of a plurality of pictures; the format, size, and so on of the picture files are not limited. Each picture should contain a target object of a specific kind. As shown in fig. 2, the picture includes two categories of target objects, Person and Laptop. The target object is the target that the image classification model is actually required to identify in a picture. The common labeling form is to enclose each target object with a rectangular border, one border per target object; the rectangular border is the position identifier of the target object, its four corners carry coordinate information, and the position of the target object in the picture can be determined from the coordinates of any two opposite corners. It should be noted that the labeling format is not limited to a rectangular border and may be replaced by other forms or formats. The category identifier, commonly referred to as a "label", is marked above the rectangular border.
Specifically, the target object categories required for training are determined and arranged in sequence, and the pictures containing those target objects are labeled. The image classification model acquires the labeled picture data set from a data source such as a database; the data may be obtained directly as a data packet, or picture by picture until all the data required for training has been acquired. The categories and their number are not limited, but to ensure the effect of model training, the number of pictures corresponding to each category should be greater than 1500. Pictures may also be added to the data set in the middle of training as required. After the image classification model obtains the picture data set, the integrity of the data set can be checked against the parameters required in the model generation process, for example by checking the number of pictures and checking for missing category identifiers and missing position identifiers.
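As an illustrative sketch only (not part of the claimed method), the integrity checks described above, counting pictures per category and detecting missing category or position identifiers, might look as follows; the record layout and the name `check_dataset` are assumptions:

```python
MIN_PICTURES_PER_CATEGORY = 1500  # per the recommendation in the text above

def check_dataset(records):
    """Return a list of problems found in an annotated picture data set.

    Each record is assumed to be a dict like:
        {"file": "img_001.jpg", "category": "Person", "box": (x1, y1, x2, y2)}
    """
    problems = []
    per_category = {}
    for i, rec in enumerate(records):
        # Category-identifier missing check.
        if not rec.get("category"):
            problems.append((i, "missing category identifier"))
        # Position-identifier missing/malformed check (expects 4 corner coords).
        if not rec.get("box") or len(rec["box"]) != 4:
            problems.append((i, "missing or malformed position identifier"))
        cat = rec.get("category")
        per_category[cat] = per_category.get(cat, 0) + 1
    # Picture-count check per category.
    for cat, n in per_category.items():
        if cat and n < MIN_PICTURES_PER_CATEGORY:
            problems.append((cat, f"only {n} pictures; fewer than {MIN_PICTURES_PER_CATEGORY}"))
    return problems
```

In practice the same checks would run over the full labeled data set before training begins, and any flagged record would be relabeled or dropped.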
In this step, acquiring the picture data set yields a data set containing the target object images to be recognized, in which each picture carries a position identifier and a category identifier. Checking the integrity of the picture data set secures the data foundation of model generation, avoids errors during training, and improves the generation efficiency of the image classification model.
Step S12, the picture data set is input into the image classification model to be trained, so that the position detection unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtains a target object image, and outputs it to the object recognition unit of the image classification model to be trained.
The image classification model is an operation model that, once trained, recognizes an input image and outputs an image classification result; it may use a MobileNet structure (a lightweight CNN for mobile or embedded devices) or the convolutional neural network VGG16 (Visual Geometry Group network). The position detection unit determines the position of the target object in the picture through the position identifier; as shown in fig. 2, the rectangular border is the position identifier, and the image content inside the rectangular border is the target object to be recognized.
Specifically, after the image classification model to be trained acquires the picture data, it calls the position detection unit to recognize the pre-labeled position identifiers, determines the position of each target object to be recognized, and strips each target object out of the picture to obtain a target object image; the image classification model then outputs the obtained target object images to the object recognition unit.
The position detection unit of the image classification model determines the target object image according to the position identifier of the target object in each picture participating in training, separating the target object to be recognized from the rest of the picture. This reduces the amount of data the image classification model must recognize and avoids recognizing the wrong object during training; the smaller data volume improves the generation efficiency of the image classification model.
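The stripping of a target object out of its picture can be illustrated with a minimal sketch; representing a picture as a 2-D list of pixel rows and the position identifier by two opposite corners is an assumption for illustration, not the patent's implementation:

```python
def crop_target(picture, corner_a, corner_b):
    """Strip the target object out of a picture using the coordinates of any two
    opposite corners of its rectangular position identifier (cf. fig. 2)."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    # The two corners may arrive in any order; normalize to left/right, top/bottom.
    left, right = min(x1, x2), max(x1, x2)
    top, bottom = min(y1, y2), max(y1, y2)
    # Slice out exactly the rows and columns covered by the rectangular border.
    return [row[left:right] for row in picture[top:bottom]]
```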
Step S13, recognizing, by the object recognition unit, a feature map corresponding to the target object image; the feature map includes feature identification information, position detection information, and category identification information of the target object.
The object recognition unit analyzes the target object image and extracts its most salient feature points. The feature map is composed of a plurality of feature points and carries feature identification information, in which the feature point data are stored, as well as the position detection information and category identification information corresponding to the target object image. The position detection information is the localization result obtained after the image classification model recognizes the position identifier of the picture, and the category identification information is the target object category result obtained after the model recognizes the category identifier of the picture.
Specifically, after the object recognition unit in the image classification model acquires a target object image to be recognized, it recognizes the image and generates a feature map through convolution over the image's features. It should be noted that the feature map must be updated continuously for image classification, so the model recognizes the target object image and updates the feature map many times, improving its accuracy. The object recognition unit provides different training parameter settings depending on the neural network model used, and the feature map is obtained after multiple passes with continuously adjusted and updated training parameters.
In the step, an object identification unit in an image classification model identifies an acquired target object image to obtain a feature map; the training effect and progress of the image classification model can be judged through the characteristic diagram, timely adjustment is made, and the generation efficiency of the image classification model is improved.
Step S14 of determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information, and the category identification information by a preset weighted euclidean loss function; the weighted euclidean loss function converges faster than the euclidean loss function.
The loss function measures the predictive ability of a neural network model; the algorithms involved in neural networks generally depend on maximizing or minimizing an objective function, and the function being minimized is called the loss function. In generating the image classification model, the purpose of the loss function is to minimize the difference between the predicted image and the target object image; the process of gradually reducing this difference is convergence, and the convergence speed determines the training speed of the model. Although the Euclidean loss function is widely used to measure the distance between pixels, it performs poorly when most pixel values are zero or uniformly low, because it normalizes most pixels toward zero without distinction. A weighted Euclidean loss function is therefore used to handle this case better: it assigns more weight to the target object in the image, de-emphasizing the background and resolving the problem of indiscriminate normalization of pixels in low-pixel conditions.
Specifically, the image classification model compares the identified feature map and the target object image by using a weighted euclidean loss function, and obtains a loss function value by combining the feature identification information, the position detection information, and the category identification information and comparing the difference therebetween. The image classification model can evaluate the training effect according to the obtained loss function value, and determine the adjustment direction, adjustment range, adjustment value and the like of the training parameters.
The weighted Euclidean loss function compares the difference between the feature map recognized from the target object image and the target object image itself to obtain a loss function value. Because it assigns more weight to the target object and de-emphasizes the background, it resolves the indiscriminate normalization of pixels in low-pixel conditions, allows the training parameters to be adjusted more effectively, and improves the ability of the resulting image classification model to recognize low-pixel pictures.
Step S15, adjusting parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value, until the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, at which point training is finished and the trained image classification model is obtained.
The parameters to be trained are initial parameters of the neural network model before training, and the training process is a process of continuously adjusting the training parameters according to the loss function values.
Specifically, a loss function value between the target object image and the recognized feature map is calculated based on the feature identification information, the position detection information, and the category identification information, and the training parameters of the image classification model are updated and adjusted according to that value; once the calculated loss function value reaches the preset threshold, the training parameters are judged suitable for image classification, that is, training of the image classification model is complete.
The training parameters of the image classification model are adjusted continuously according to the loss function value, and training stops once the model reaches the preset precision, ensuring the model's accuracy. Adjusting the parameters to be trained according to the loss function value accelerates the convergence of the loss function, improves the generation efficiency of the image classification model, and improves the ability of the resulting model to recognize low-pixel pictures.
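The adjust-until-below-threshold loop of step S15 can be sketched generically; the gradient-descent update rule, the function names, and the learning rate here are illustrative assumptions, not details from the source:

```python
def train_until_converged(loss_fn, params, grad_fn, threshold=1e-3, lr=0.1, max_steps=10000):
    """Keep adjusting the parameters to be trained until the loss function value
    drops below the preset threshold (or a step budget is exhausted).

    loss_fn(params) -> scalar loss; grad_fn(params) -> list of gradients.
    """
    for _ in range(max_steps):
        loss = loss_fn(params)
        if loss < threshold:
            break  # preset precision reached: training is finished
        grads = grad_fn(params)
        # Assumed plain gradient-descent update of the parameters to be trained.
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, loss_fn(params)
```

Any optimizer (momentum, Adam, and so on) could replace the plain update; the stopping condition is the part the step above specifies.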
In this image classification model generation method, the position detection unit determines the target object image according to the position identifier of the target object in each picture participating in training; the weighted Euclidean loss function compares the difference between the feature map recognized from the target object image and the target object image itself to obtain a loss function value; and the parameters to be trained in the image classification model are adjusted according to that value, accelerating the convergence of the loss function, improving the generation efficiency of the image classification model, and improving the ability of the resulting model to recognize low-pixel pictures. Detection of the target object position identifier can be re-planned and designed on top of neural network layers based on MobileNets and VGG16, while the novel weighted Euclidean loss function makes the whole image classification model converge more easily.
In one embodiment, the loss function value between the feature map and the target object image may be calculated by:
wherein L_sal is the preset weighted Euclidean loss function; L_sal(x, g) is the loss function value; x is the pixel value of the feature map; g is the pixel value of the target object image, with the extra weight applied where g_i > 0.5; i denotes the i-th pixel of the feature map; d is the maximum number of pixels in the feature map; and α is a fixed weight.
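The exact formula is given as an image in the original filing and is not reproduced in the text; the following sketch therefore assumes one common form consistent with the definitions above, in which target pixels (g_i > 0.5) receive the fixed extra weight α and background pixels receive weight 1:

```python
def weighted_euclidean_loss(x, g, alpha=2.0):
    """Assumed form of the weighted Euclidean loss L_sal(x, g).

    x: pixel values of the feature map; g: pixel values of the target object
    image (same length d). Pixels belonging to the target (g_i > 0.5) are
    weighted by the fixed weight alpha, de-emphasizing the background.
    """
    d = len(x)
    total = 0.0
    for xi, gi in zip(x, g):
        w = alpha if gi > 0.5 else 1.0  # extra weight on target pixels
        total += w * (xi - gi) ** 2
    return total / d
```

With alpha > 1, errors on target pixels dominate the loss, which is what lets the background fade and keeps low-pixel targets from being normalized away.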
In one embodiment, the location identifier of the pre-labeled target object is a rectangular border containing the target object; the position identifier carries width and height information of the rectangular frame and position coordinates of at least two opposite angles.
Marking target objects with rectangular borders is fast and economical: the basic attributes of the whole rectangle can be determined from the coordinates of any two opposite corners together with its width and height, and the computation is fast. The target objects in each picture are labeled, the basic unit of labeling being a border enclosing the whole object; preferably, each selected picture contains a plurality of target objects.
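A minimal sketch of recovering the rectangle's basic attributes from two opposite corners (the function name is illustrative):

```python
def box_from_corners(corner_a, corner_b):
    """Recover width, height, and center of a rectangular border from the
    coordinates of any two opposite corners, as described above."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    width, height = abs(x2 - x1), abs(y2 - y1)
    # The center is the midpoint of the diagonal joining the two corners.
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    return width, height, center
```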
In one embodiment, the target object image may be calculated by:
wherein M_g is the target object image; x and y are the two coordinate vectors of the rectangular border where the target object is located; C_xi and C_yi are the coordinates of the center point of the rectangular border where the target object is located; s is the training step size among the training parameters; T denotes matrix transposition; n is the number of rectangular borders containing target objects in the picture; v_xy is the center-position coordinate of the rectangular border where the target object is located, with v_xy = [x, y]^T and v_xy ∈ R_Bi, where R_Bi represents the area ratio of the rectangular border of the target object within the picture and B_i represents the i-th rectangular border in the picture; μ_i is the center-position coordinate of the feature map; Σ_i is a covariance matrix; and w_i and h_i represent the width and height, respectively, of the border where the target object is located.
It should be noted that conventional object detection cannot cleanly separate the borders of the position identifiers when a plurality of target objects are labeled; therefore, a Gaussian distribution is used to separate and highlight the rectangular borders corresponding to the multiple target objects in the picture.
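The exact Gaussian-distribution formula appears as an image in the original filing; the following sketch assumes a common form in which each rectangular border (center C_xi, C_yi; size w_i, h_i) contributes a 2-D Gaussian centered on the box, with a diagonal covariance derived from the box size, so that target regions are highlighted over the background:

```python
import math

def gaussian_mask(width, height, boxes):
    """Build an M_g-like mask (list of rows) highlighting each target rectangle.

    boxes: iterable of (cx, cy, w, h) per rectangular border. Each pixel is
    weighted by exp(-0.5 * d), where d is a Mahalanobis-style distance from the
    box center under an assumed diagonal covariance (sigma = half the box size).
    """
    mask = [[0.0] * width for _ in range(height)]
    for cx, cy, w, h in boxes:
        sx, sy = w / 2.0, h / 2.0  # assumed standard deviations per axis
        for y in range(height):
            for x in range(width):
                d = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
                # Overlapping boxes keep the strongest response at each pixel.
                mask[y][x] = max(mask[y][x], math.exp(-0.5 * d))
    return mask
```

The mask peaks at 1 at each box center and decays smoothly outward, which is what separates adjacent borders instead of leaving them as hard-edged, overlapping rectangles.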
In one embodiment, before inputting the picture data set into the image classification model to be trained, the method further comprises: and adjusting the sizes of the pictures in the picture data set to enable the sizes of the pictures in the picture data set to be uniform.
The sizes of the pictures are not limited when the picture data set is collected, but the picture sizes are unified and normalized when they are input into the image classification model for training. Likewise, because the neural network model is trained with a gradient descent or backpropagation algorithm, the input features of the convolutional neural network need to be standardized: before the target object images to be trained enter the convolutional neural network, the input data should be normalized along the channel or time/frequency dimension.
Specifically, size information is preset in the image classification model, and all input pictures which can be used for training are adjusted according to the preset size information; the size adjusting unit reads the actual size of the picture in the picture data set, then carries out conversion according to preset size information, and carries out size adjustment on the picture or the target object image according to the conversion result to obtain the adjusted picture or the target object image.
By uniformly resizing the pictures in the input picture data set so that all pictures have a consistent size, the generation efficiency of the image classification model is improved.
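A minimal sketch of the uniform resizing step, using nearest-neighbor sampling over a picture represented as a 2-D list of pixel rows (the sampling method is an assumption; the text does not specify one):

```python
def resize_nearest(picture, out_w, out_h):
    """Resize a picture (2-D list of pixel rows) to the preset uniform size
    using nearest-neighbor sampling, so every picture entering training has
    the same shape."""
    in_h, in_w = len(picture), len(picture[0])
    return [
        # Map each output pixel back to the nearest source pixel.
        [picture[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```

A production pipeline would typically use a library resampler (bilinear or similar); the point here is only the shape normalization the text describes.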
In one embodiment, the image classification model to be trained is a convolutional neural network.
A convolutional neural network (CNN) is a class of feed-forward neural networks that contain convolution computations and have a deep structure, and is a representative algorithm of deep learning. It can be constructed by imitating the visual perception mechanism of living organisms and supports both supervised and unsupervised learning. The hidden layers of a convolutional neural network comprise three common structures: convolutional layers, pooling layers, and fully connected layers. These are typically arranged in the order: input, convolutional layer, pooling layer, fully connected layer, output.
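The convolution and pooling operations that make up those hidden layers can be illustrated with a small pure-Python sketch (single channel, no padding; the kernel and shapes are illustrative, not from the source):

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)
    of a single-channel image with a small kernel, both 2-D lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling: the pooling-layer step in the
    input -> convolution -> pooling -> fully connected order above."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]) - 1, 2)]
        for y in range(0, len(fmap) - 1, 2)
    ]
```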
In an embodiment, as shown in fig. 3, the step S15, after obtaining the trained image classification model, further includes:
step S31, inputting the test picture into the trained image classification model to obtain the image classification result returned by the trained image classification model; the image classification result contains category identification information; the test picture at least comprises a target object and also comprises a category identification of the target object;
step S32, if the category identification information in the image classification result is the same as the category identifier of the test picture and the matching degree reaches a preset matching threshold, determining that the image classification model has been generated.
The image classification result output by the image classification model is shown in fig. 4: each identified target object is enclosed by a rectangular box, and the labels "Cat" and "Dog" above the boxes are the category identification information; the category identification information is accompanied by a matching degree, which represents the probability that the object identified by the image classification model belongs to that category.
Specifically, after the training of the image classification model is completed, the classification precision of the image classification model needs to be checked. A picture containing at least one target object is input, where the category of the target object is one of the categories input when the image classification model was generated; according to the category identification information and the matching degree output by the image classification model, it can be judged whether the image classification model has been trained to the required precision threshold.
In this embodiment, the effect of training the image classification model is tested by inputting a test picture containing a target object, and whether the image classification model can be used for image classification is judged according to the test result, thereby ensuring the generation effect of the image classification model.
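The acceptance check of steps S31-S32 can be sketched as follows. The threshold value, the function name, and the result format are illustrative assumptions; the sketch only shows the decision rule: the returned category must match the test picture's label and its matching degree must reach the preset matching threshold.

```python
MATCH_THRESHOLD = 0.8  # preset matching threshold (assumed value)

def model_generated(results, expected_category, threshold=MATCH_THRESHOLD):
    """results: list of (category_identification, matching_degree) pairs
    returned by the trained model for one test picture."""
    return any(category == expected_category and degree >= threshold
               for category, degree in results)

# A test picture labelled "Cat", and a mock classification result as in fig. 4
result = [("Cat", 0.93), ("Dog", 0.41)]
ok = model_generated(result, "Cat")   # category matches and 0.93 >= 0.8
bad = model_generated(result, "Dog")  # "Dog" matching degree is below threshold
```

Only when the check passes is the image classification model considered generated; otherwise training or data collection would continue.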
It should be understood that although the steps in the flowcharts of figs. 1 and 3 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in figs. 1 and 3 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; the order of their execution is not necessarily sequential, and they may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided an image classification model generation apparatus including: a data set acquisition module 51, a location information determination module 52, a feature map acquisition module 53, a loss function value determination module 54, and a model training module 55, wherein:
a data set obtaining module 51, configured to obtain a picture data set; the picture data set comprises a plurality of pictures, and each picture at least comprises a target object; the picture also comprises a position mark and a category mark of the target object which are marked in advance.
And the position information determining module 52 is configured to input the picture data set into the image classification model to be trained, so that the position detecting unit in the image classification model determines the position of the target object in the picture according to the position identifier, obtain an image of the target object, and output the image of the target object to the object identifying unit of the image classification model to be trained.
A feature map acquisition module 53, configured to identify a feature map corresponding to the target object image by the object identification unit; the feature map includes feature identification information, position detection information, and category identification information of the target object.
A loss function value determination module 54 configured to determine a loss function value between the feature map and the target object image based on the feature identification information, the position detection information, and the category identification information by using a preset weighted euclidean loss function; the weighted euclidean loss function converges faster than the euclidean loss function.
And the model training module 55 is configured to adjust the parameters to be trained in the position detection unit and/or the object recognition unit of the image classification model according to the loss function value, and to end training when the loss function value between the feature map recognized by the image classification model and the target object image is smaller than a preset threshold, thereby obtaining the trained image classification model.
In one embodiment, the image classification model generation apparatus further includes a size adjustment module for adjusting the sizes of the pictures in the picture data set so that the sizes of the pictures in the picture data set are uniform.
In one embodiment, the image classification model generation device further includes a test module, configured to input a test picture into the trained image classification model, so as to obtain an image classification result returned by the trained image classification model; the image classification result contains category identification information; the test picture at least comprises a target object and also comprises a category identification of the target object; and if the category identification information in the image classification result is the same as the category identification of the test picture and the matching degree reaches a preset matching threshold, determining to generate an image classification model.
In this embodiment, the position of the target object is determined according to the position identification by the position detection unit, and the difference between the recognized feature map and the target object image is then measured by the weighted Euclidean loss function to obtain the loss function value, so that the training parameters can be adjusted more effectively; this enhances the model's ability to recognize low-pixel images, accelerates the convergence of the loss function value, and improves the generation efficiency of the image classification model.
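The general form of the weighted Euclidean loss can be sketched as below. The specific weighting scheme of the patent is not given in this section; the weights here are an assumption chosen to illustrate how up-weighting selected elements (for instance, low-pixel regions) enlarges their contribution to the loss relative to the plain Euclidean loss.

```python
# Plain Euclidean loss: 0.5 * sum_i (p_i - t_i)^2
def euclidean_loss(pred, target):
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))

# Weighted variant: 0.5 * sum_i w_i * (p_i - t_i)^2
def weighted_euclidean_loss(pred, target, weights):
    return 0.5 * sum(w * (p - t) ** 2
                     for p, t, w in zip(pred, target, weights))

feature_map = [0.2, 0.8, 0.5]  # values recognized from the feature map
target_img = [0.0, 1.0, 0.5]   # corresponding target object values
weights = [2.0, 2.0, 1.0]      # larger weights on the poorly fitted elements

plain = euclidean_loss(feature_map, target_img)              # 0.04
weighted = weighted_euclidean_loss(feature_map, target_img, weights)  # 0.08
```

Because the mismatched elements contribute a larger gradient under the weighted loss, parameter updates correct them faster, which is consistent with the faster convergence claimed for the weighted Euclidean loss.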
For specific limitations of the image classification model generation apparatus, reference may be made to the above limitations of the image classification model generation method, which are not repeated here. Each module in the image classification model generation apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. Each module may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing image classification model data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement an image classification model generation method.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a picture data set; the picture data set comprises a plurality of pictures, and each picture at least comprises a target object; the picture also comprises a position mark and a category mark of a target object which are marked in advance;
inputting the picture data set into an image classification model to be trained, enabling a position detection unit in the image classification model to determine the position of a target object in a picture according to a position identification, obtaining a target object image and outputting the target object image to an object recognition unit of the image classification model to be trained;
identifying a feature map corresponding to the target object image by an object identification unit; the feature map comprises feature identification information, position detection information and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted Euclidean loss function; the convergence speed of the weighted Euclidean loss function is faster than that of the Euclidean loss function;
and adjusting parameters to be trained in a position detection unit and/or an object recognition unit in the image classification model according to the loss function values until the loss function values corresponding to the characteristic diagram recognized by the image classification model and the target object image are smaller than a preset threshold value, and finishing training to obtain the trained image classification model.
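The training loop described in the steps above can be sketched as follows. The one-parameter "recognizer", the learning rate, and the threshold value are all toy assumptions; the sketch only shows the control flow: compute the loss between the recognized output and the target, adjust the parameter from the loss gradient, and stop once the loss falls below the preset threshold.

```python
LOSS_THRESHOLD = 1e-4  # preset threshold (assumed value)
LEARNING_RATE = 0.1

def recognize(param, x):
    return param * x  # stand-in for the detection/recognition units

def loss_fn(param, x, target):
    return (recognize(param, x) - target) ** 2

def train(x, target, param=0.0):
    # Adjust the parameter to be trained until the loss is below the threshold
    while loss_fn(param, x, target) >= LOSS_THRESHOLD:
        grad = 2 * (recognize(param, x) - target) * x  # dL/dparam
        param -= LEARNING_RATE * grad                  # gradient-descent step
    return param                                       # trained "model"

trained = train(x=1.0, target=3.0)
```

In the patented method the same loop runs over the position detection unit and/or the object recognition unit, with the weighted Euclidean loss in place of the squared error used here.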
In one embodiment, the processor, when executing the computer program, further performs the steps of: and adjusting the sizes of the pictures in the picture data set to enable the sizes of the pictures in the picture data set to be uniform.
In one embodiment, the processor, when executing the computer program, further performs the steps of: inputting the test picture into the trained image classification model to obtain an image classification result returned by the trained image classification model; the image classification result contains category identification information; the test picture at least comprises a target object and also comprises a category identification of the target object; and if the category identification information in the image classification result is the same as the category identification of the test picture and the matching degree reaches a preset matching threshold, determining to generate an image classification model.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor performs the steps of:
acquiring a picture data set; the picture data set comprises a plurality of pictures, and each picture at least comprises a target object; the picture also comprises a position mark and a category mark of a target object which are marked in advance;
inputting the picture data set into an image classification model to be trained, enabling a position detection unit in the image classification model to determine the position of a target object in a picture according to a position identification, obtaining a target object image and outputting the target object image to an object recognition unit of the image classification model to be trained;
identifying a feature map corresponding to the target object image by an object identification unit; the feature map comprises feature identification information, position detection information and category identification information of the target object;
determining a loss function value between the feature map and the target object image based on the feature identification information, the position detection information and the category identification information through a preset weighted Euclidean loss function; the convergence speed of the weighted Euclidean loss function is faster than that of the Euclidean loss function;
and adjusting parameters to be trained in a position detection unit and/or an object recognition unit in the image classification model according to the loss function values until the loss function values corresponding to the characteristic diagram recognized by the image classification model and the target object image are smaller than a preset threshold value, and finishing training to obtain the trained image classification model.
In one embodiment, the computer program when executed by the processor further performs the steps of: and adjusting the sizes of the pictures in the picture data set to enable the sizes of the pictures in the picture data set to be uniform.
In one embodiment, the computer program when executed by the processor further performs the steps of: inputting the test picture into the trained image classification model to obtain an image classification result returned by the trained image classification model; the image classification result contains category identification information; the test picture at least comprises a target object and also comprises a category identification of the target object; and if the category identification information in the image classification result is the same as the category identification of the test picture and the matching degree reaches a preset matching threshold, determining to generate an image classification model.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope of this specification.
The above embodiments only express several implementations of the present application, and their description is specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.