CN115272794A - Model training method, computer device, and storage medium - Google Patents

Model training method, computer device, and storage medium

Info

Publication number
CN115272794A
CN115272794A (application CN202210882475.2A)
Authority
CN
China
Prior art keywords
density map
image
target
sample image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210882475.2A
Other languages
Chinese (zh)
Inventor
冯威 (Feng Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huace Huihong Technology Co., Ltd.
Original Assignee
Shenzhen Huace Huihong Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huace Huihong Technology Co., Ltd.
Priority to CN202210882475.2A
Publication of CN115272794A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a model training method, a computer device, and a storage medium. The model training method comprises the following steps: acquiring an image to be processed; calling an object recognition model to perform density estimation processing on a target object contained in the image to be processed to obtain a target density map corresponding to the image to be processed, wherein the object recognition model is obtained by training based on a sample image and a reference density map corresponding to the sample image, and the reference density map is obtained by labeling the target object contained in the sample image; and determining, based on the target density map, the number of target objects contained in the image to be processed. By this method and device, the accuracy of model training can be improved, so that the number of target objects can be identified more accurately.

Description

Model training method, computer device, and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a model training method, a computer device, and a computer-readable storage medium.
Background
With the continuous development of computer technology, image recognition technology has spread into every aspect of daily life. For example, image recognition techniques are employed to identify the number of target objects in an image. Currently, an object detection technique is generally used to identify the target objects in the image and then determine the number of objects contained in the image according to the identification result.
Such methods are generally suitable only for scenes with low target-object density; if the target-object density in the image is high, the recognition effect may be poor, affecting the accuracy of image recognition.
Disclosure of Invention
The embodiments of the present application provide a model training method, a computer device, and a storage medium, which can improve the accuracy of model training so that the number of target objects can be identified more accurately.
In one aspect, an embodiment of the present application provides a model training method, where the method includes:
acquiring a sample image;
labeling a target object contained in the sample image to obtain a reference density map corresponding to the sample image;
calling a neural network model to carry out density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image;
and training the neural network model based on the reference density map and the prediction density map to obtain an object recognition model, wherein the object recognition model is used for recognizing the number of the target objects in the image.
In one aspect, an embodiment of the present application provides a model training method, where the method includes:
acquiring an image to be processed;
calling an object recognition model to perform density estimation processing on a target object contained in an image to be processed to obtain a target density map corresponding to the image to be processed, wherein the object recognition model is obtained by the model training method described above;
based on the target density map, the number of target objects contained in the image to be processed is determined.
In one aspect, an embodiment of the present application provides a model training apparatus, where the apparatus includes:
an acquisition unit configured to acquire a sample image;
the processing unit is used for performing labeling processing on a target object contained in the sample image to obtain a reference density map corresponding to the sample image;
the processing unit is also used for calling the neural network model to carry out density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image;
and the training unit is used for training the neural network model based on the reference density map and the prediction density map to obtain an object recognition model, and the object recognition model is used for recognizing the number of the target objects in the image.
In a possible implementation manner, the processing unit performs labeling processing on the target object to obtain a reference density map corresponding to the sample image, and is configured to perform the following operations:
marking the position of a boundary point of the target object by adopting an edge marking algorithm to obtain k boundary points corresponding to the target object, wherein k is a positive integer;
performing polygon fitting on k boundary points corresponding to the target object to obtain a polygon object for representing the shape of the target object;
and determining a corresponding reference density map of the sample image based on the polygonal object.
In one possible implementation, the polygonal object is an ellipse; the processing unit determines a reference density map corresponding to the sample image based on the polygonal object, and is used for executing the following operations:
based on the parameters of the ellipse, performing Gaussian distribution expression on the target object to obtain a Gaussian distribution expression, wherein the parameters of the ellipse comprise any one or more of the following items: the semi-major axis, the semi-minor axis, the center point, and the included angle between the semi-major axis and the horizontal direction;
and determining a reference density map corresponding to the sample image based on the Gaussian distribution expression.
In a possible implementation manner, the processing unit determines, based on the gaussian distribution expression, a reference density map corresponding to the sample image, and is configured to perform the following operations:
determining the density value of each pixel point in the polygonal object based on the Gaussian distribution expression;
and carrying out normalization processing on the density value of each pixel point in the polygonal object to obtain a reference density map corresponding to the sample image.
In a possible implementation manner, the training unit trains the neural network model based on the reference density map and the predicted density map, so as to obtain an object recognition model, and is configured to perform the following operations:
determining a network loss for a neural network model based on difference data between a reference density map and the predicted density map;
and carrying out iterative adjustment processing on the neural network model based on network loss, and taking the adjusted neural network model as an object recognition model if the adjusted neural network model meets the model convergence condition.
In one possible implementation, the training unit determines a network loss of the neural network model based on difference data between the reference density map and the predicted density map for performing the following operations:
acquiring pixel scale loss based on pixel value difference values between each pixel point in the reference density map and each pixel point in the predicted density map;
obtaining a count scale loss based on a number difference between the reference density map and the predicted density map for the target object;
and determining the network loss of the neural network model according to the pixel scale loss and the counting scale loss.
In one aspect, an embodiment of the present application provides a model training apparatus, where the apparatus includes:
the acquisition unit is used for acquiring an image to be processed;
the processing unit is used for calling an object recognition model to perform density estimation processing on a target object contained in the image to be processed to obtain a target density map corresponding to the image to be processed, and the object recognition model is obtained by the model training method;
and the determining unit is used for determining the number of the target objects contained in the image to be processed based on the target density map.
In a possible implementation manner, the determining unit determines the number of target objects contained in the image to be processed based on the target density map, and is used for executing the following operations:
acquiring density values corresponding to all pixel points in the target density map;
and performing summation operation on the density values corresponding to the pixel points to obtain the number of target objects contained in the image to be processed.
In one aspect, embodiments of the present application provide a computer device, which includes a memory and a processor, where the memory stores one or more computer programs, and when the computer programs are executed by the processor, the processor executes the above-mentioned model training method.
In one aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is read by a processor of a computer device and executed, the computer program causes the computer device to perform the above-mentioned model training method.
In one aspect, embodiments of the present application provide a computer program product or a computer program, where the computer program product or the computer program includes a computer program, and the computer program is stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, so that the computer device executes the model training method described above.
In the embodiment of the application, a sample image is obtained; a target object contained in the sample image is labeled to obtain a reference density map corresponding to the sample image; a neural network model is called to perform density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image; and the neural network model is trained based on the reference density map and the predicted density map to obtain an object recognition model, which is used for recognizing the number of target objects in an image. In this way, the target objects in the sample images can be labeled to obtain the reference density maps, and the object recognition model trained with these reference density maps can accurately produce density maps from which the number of target objects is determined. Compared with directly recognizing target objects in an image, this determines the number of target objects more accurately: the model training process is more accurate, and the number of target objects can be accurately identified based on the more accurate object recognition model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of an architecture of a model training system according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present disclosure;
fig. 3a is a scene schematic diagram of a sample image provided in an embodiment of the present application;
FIG. 3b is a schematic view of a scene of an edge annotation provided in an embodiment of the present application;
FIG. 4a is a schematic diagram of a reference density map provided in an embodiment of the present application;
FIG. 4b is a schematic illustration of another reference density map provided by an embodiment of the present application;
FIG. 5a is a schematic diagram of an object recognition model provided in an embodiment of the present application;
FIG. 5b is a schematic diagram of a network structure of an object recognition model according to an embodiment of the present application;
FIG. 6 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a scenario of a model application provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The embodiment of the application provides a model training scheme, the scheme can be used for training to obtain an object recognition model, the object recognition model can be used for recognizing the number of target objects in an image, and the accuracy of recognizing the number of the target objects can be improved based on a counting mode of a reference density map. The general principle of the scheme is as follows: acquiring a sample image; labeling a target object contained in the sample image to obtain a reference density map corresponding to the sample image; calling a neural network model to carry out density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image; and training the neural network model based on the reference density map and the prediction density map to obtain an object recognition model, wherein the object recognition model is used for recognizing the number of target objects in the image.
Therefore, the target objects in the sample images can be labeled to obtain the corresponding reference density maps, and the object recognition model trained with these reference density maps can accurately produce density maps from which the number of target objects is determined. Compared with directly recognizing target objects in an image, this determines the number of target objects more accurately: the model training process is more accurate, and the number of target objects can be accurately identified based on the more accurate object recognition model.
Next, the above-mentioned model training scheme is described in conjunction with the technical terms related to the present application:
1. artificial intelligence:
Artificial Intelligence (AI) technology is a comprehensive discipline covering a wide range of fields, spanning both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, large-model training technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly comprise computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.
In a possible implementation manner, the model training scheme provided by the embodiment of the present application may be combined with machine learning techniques belonging to the field of artificial intelligence. Specifically, the neural network model can be trained by a machine learning technique to obtain an object recognition model. The trained object recognition model can then be applied in various machine learning scenarios, such as object recognition and object counting, so that the number of target objects can be identified. Machine Learning (ML) is a multi-domain interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and rule learning.
2. Cloud technology:
cloud computing (cloud computing) is a computing model that distributes computing tasks over a pool of resources formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.
In a possible implementation manner, when the model training scheme of the application is executed, the neural network model is called to perform density estimation processing on the sample image to obtain the corresponding predicted density map. This process involves large-scale calculation and requires substantial computing power and storage space. A computer device can therefore obtain sufficient computing power and storage space through cloud computing technology, and then execute the specific process of determining the predicted density map corresponding to the sample image.
3. Block chains:
the Blockchain (Blockchain) is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. The blockchain is essentially a decentralized database, which is a string of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, which is used for verifying the validity (anti-counterfeiting) of the information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The model training scheme can be combined with a block chain technology, for example, data such as a sample image, a reference density map and a prediction density map can be uploaded to a block chain for storage, and the data on the block chain can be guaranteed not to be tampered easily, so that the safety of the model training process is guaranteed.
It should be noted that the detailed description of the present application involves data related to object information (such as identifiers, names, etc.). When the above embodiments of the present application are applied to specific products or technologies, permission or consent of the object must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an architecture of a model training system according to an embodiment of the present disclosure. The architecture diagram of the model training system comprises: server 140 and a terminal device cluster, where the terminal device cluster may include: terminal device 110, terminal device 120, terminal device 130, and the like. The terminal device cluster and the server 140 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
The server 140 shown in fig. 1 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.
The terminal device 110, the terminal device 120, the terminal device 130, and the like shown in fig. 1 may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Mobile Internet Device (MID), a vehicle-mounted device, a roadside device, an aircraft, a wearable device, such as a smart watch, a smart bracelet, a pedometer, and the like, and may be an intelligent device having a model training function.
In a possible implementation manner, taking the case where the terminal device 110 and the server 140 jointly execute the above-mentioned model training scheme as an example, the specific operations performed by each are described separately. The terminal device 110 may acquire the sample image and then send it to the server 140. The server 140 may perform labeling processing on the target object included in the sample image to obtain a reference density map corresponding to the sample image. The server 140 may then call the neural network model to perform density estimation processing on the sample image, obtaining the corresponding predicted density map. Next, the server 140 may train the neural network model based on the reference density map and the predicted density map to obtain an object recognition model. Finally, the server 140 may send the object recognition model to the terminal device 110, so that the terminal device 110 can call the object recognition model to recognize the number of target objects in an image.
It should be understood that the above is merely illustrative of specific operations performed by terminal device 110 and server 140. In another possible implementation manner, the target object included in the sample image is labeled to obtain a reference density map corresponding to the sample image, which may also be executed by the terminal device 110; therefore, the terminal device 110 sends the reference density map corresponding to the sample image to the server 140, so that the server 140 calls the neural network model to perform density estimation processing on the sample image, and a predicted density map corresponding to the sample image is obtained. In another possible implementation manner, the above-mentioned model training scheme may be executed by the server 140 or the terminal device 110 in the model training system alone, which is not specifically limited in this embodiment of the present application.
In a possible implementation manner, the model training system provided in the embodiment of the present application may be deployed on a blockchain, for example, the server 140 and each terminal device (terminal device 110, terminal device 120, terminal device 130, and the like) included in the terminal device cluster may be used as a node device of the blockchain to jointly form a blockchain network. Therefore, the model training process can be executed on the block chain, so that fairness of the model training process can be guaranteed, meanwhile, the model training process can have traceability, and safety of the model training process is improved.
It is to be understood that the system architecture diagram described in the embodiment of the present application serves to illustrate the technical solution of the embodiment more clearly and does not constitute a limitation on it. As those of ordinary skill in the art know, with the evolution of system architectures and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application remains applicable to similar technical problems.
Based on the above analysis, the model training method of the present application is described below with reference to fig. 2. Referring to fig. 2, fig. 2 is a schematic flowchart of a model training method according to an embodiment of the present disclosure. The model training method may be performed by the computer device mentioned above, and the computer device may be a terminal device or a server. Referring to fig. 2, the model training method may include steps S201 to S204:
s201: a sample image is acquired.
In an embodiment of the application, the sample image may comprise at least one target object. The types of the target object may include, but are not limited to: animal, human, flower, grass, etc.; the type of the target object is not particularly limited in the examples of the present application. For convenience of explanation, in the following embodiments the type of the target object is taken to be an animal, for example: cats, dogs, pigs, rabbits, etc. It should be understood that the sample images mentioned in the embodiments of the present application contain a relatively large number of target objects, which facilitates subsequently deriving the corresponding density maps from these sample images.
S202: and labeling the target object contained in the sample image to obtain a reference density map corresponding to the sample image.
In a possible implementation manner, the performing, by a computer device, a labeling process on the target object to obtain a reference density map corresponding to the sample image may include: marking the position of a boundary point of the target object by adopting an edge marking algorithm to obtain k boundary points corresponding to the target object, wherein k is a positive integer; and determining a reference density map corresponding to the sample image according to the k boundary points corresponding to the target object.
For example, please refer to fig. 3a, which is a scene schematic diagram of a sample image according to an embodiment of the present disclosure. As shown in fig. 3a, if the target object included in the sample image is a pig S100, the positions of the boundary points of the pig S100 in the pig image may be labeled based on an edge labeling algorithm, obtaining k boundary points corresponding to the pig S100. It should be noted that the number k of boundary points labeled on each pig may be set in a customized manner according to the shape used to approximate the target object: for example, if the target object is approximated by an ellipse, k may be greater than or equal to 5; if by a hexagon, k may be greater than or equal to 6; and if by a triangle, k may be greater than or equal to 3.
In a possible implementation manner, the determining, by the computer device, a reference density map corresponding to the sample image according to k boundary points corresponding to the target object may include: and performing polygon fitting on the k boundary points corresponding to the target object to obtain a polygon object for representing the shape of the target object. Based on the polygonal object, carrying out Gaussian distribution expression on the target object to obtain a Gaussian distribution expression; and determining a reference density map corresponding to the sample image based on the Gaussian distribution expression.
Specifically, the above-mentioned polygon objects may include, but are not limited to: ellipses, rectangles, circles, triangles, quadrilaterals, pentagons, etc. For example, assuming the target object is a pig, its corresponding polygon object may be an ellipse. Then, for each pig in the pig image, ellipse fitting is performed on its labeled boundary points to obtain the parameters of an ellipse representing that pig's shape. The parameters of the ellipse may include, but are not limited to: the center point coordinates (xc, yc), the semi-major axis length (Ra), the semi-minor axis length (Rb), and the tilt angle (θ). Specifically, the ellipse parameters for each pig can be solved by the least-squares method from the position coordinates of that pig's labeled boundary points.
For example, please refer to fig. 3b, which is a schematic view of a scene of edge annotation provided in the embodiment of the present application. Taking k = 5, as shown in fig. 3b, each pig in the pig image of fig. 3a can be labeled using the edge labeling method to obtain 5 boundary points per pig, and ellipse fitting on each pig's boundary points yields a number of ellipse objects, such as S001, S002, S003, S004, and S005.
In this approach, the edges of the target object are labeled first, and the corresponding polygon object is then fitted from the edge-labeling result. Since the polygon object expresses the shape of the target object, a shape that is convenient for a computer device to process (such as an ellipse) can approximately represent a real-world target object (such as a pig), reducing the complexity of image processing and improving image processing efficiency.
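To make the fitting step concrete, the sketch below fits an ellipse to k labeled boundary points with a least-squares solver. The patent specifies only the least-squares method, not a particular library; OpenCV's fitEllipse is used here as an assumed stand-in, and the mapping of its angle convention onto the patent's tilt angle θ is likewise an assumption.

```python
# Hedged sketch: least-squares ellipse fit to labeled boundary points.
import numpy as np
import cv2

def fit_ellipse(boundary_points):
    """boundary_points: (k, 2) array of labeled (x, y) positions, k >= 5."""
    pts = np.asarray(boundary_points, dtype=np.float32)
    (xc, yc), (d1, d2), angle_deg = cv2.fitEllipse(pts)
    # cv2 returns full axis lengths of a rotated rect; convert to semi-axes
    # Ra >= Rb as in the patent (axis ordering assumed here).
    Ra, Rb = max(d1, d2) / 2.0, min(d1, d2) / 2.0
    theta = np.deg2rad(angle_deg)   # tilt angle, radians
    return xc, yc, Ra, Rb, theta

# e.g., 5 boundary points labeled on one pig (illustrative values)
xc, yc, Ra, Rb, theta = fit_ellipse([(10, 20), (30, 12), (52, 25), (40, 44), (15, 40)])
```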
Further, when the polygon object is an ellipse, the computer device performs Gaussian distribution expression on the target object based on the polygon object (ellipse) as follows: based on the parameters of the ellipse, perform Gaussian distribution expression on the target object to obtain a Gaussian distribution expression, wherein the parameters of the ellipse comprise any one or more of the following items: the semi-major axis, the semi-minor axis, the center point, and the included angle between the semi-major axis and the horizontal direction.
For example, since the sum of the values of a Gaussian distribution within 3σ reaches 99.7% of the sum over all ranges, the fitted ellipse locating each pig can be approximately regarded as the 3σ boundary of its Gaussian distribution in the embodiment of the present application. The relationship between the parameters of the Gaussian distribution and the parameters of the fitted ellipse is given by the following formulas 1.1-1.5:
$$G(\mathbf{x}) = \frac{1}{2\pi\sqrt{|\Sigma|}}\exp\!\left(-\frac{1}{2}(\mathbf{x}-\mu)^{T}\Sigma^{-1}(\mathbf{x}-\mu)\right) \quad (1.1)$$

$$\mu = (x_c', y_c') = (x_c/8,\; y_c/8) \quad (1.2)$$

$$\sigma_1 = R_a'/3 = R_a/24 \quad (1.3)$$

$$\sigma_2 = R_b'/3 = R_b/24 \quad (1.4)$$

$$\Sigma = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}^{T} \quad (1.5)$$
In the above formulas 1.1 to 1.5, $(x_c, y_c)$ is the center point of the ellipse, $(R_a, R_b)$ are its semi-major and semi-minor axes, θ is the clockwise angle between the semi-major axis and the horizontal direction, and G is the standard Gaussian distribution expression. In this way, after the target object in the sample image is labeled, it is fitted to a polygon object that represents the object's shape, and a Gaussian distribution expression is then formed from the parameters of the polygon object (such as the parameters of an ellipse). Since a polygon object expressing the shape of the target object can be obtained by polygon fitting, the Gaussian distribution expression determined from the polygon parameters can reflect the density attributes of the target object, so the corresponding density map can be determined more accurately from a more accurate Gaussian distribution expression.
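A direct transcription of formulas (1.2)-(1.5) might look like the following. The 1/8 scale factor is read as the downsampling from the 768×768 input to the 96×96 density map in Table 1 below, and the covariance construction in (1.5) is the standard rotated-Gaussian form assumed from context.

```python
# Sketch of (1.2)-(1.5): Gaussian parameters from the fitted ellipse.
import numpy as np

def gaussian_params_from_ellipse(xc, yc, Ra, Rb, theta):
    mu = np.array([xc / 8.0, yc / 8.0])    # (1.2): map to density-map scale
    sigma1 = Ra / 24.0                     # (1.3): (Ra/8) / 3
    sigma2 = Rb / 24.0                     # (1.4): (Rb/8) / 3
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # (1.5): covariance of the rotated anisotropic Gaussian (assumed form)
    cov = R @ np.diag([sigma1**2, sigma2**2]) @ R.T
    return mu, cov
```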
In one possible implementation, the determining, by the computer device, a reference density map corresponding to the sample image based on the gaussian distribution expression may include: determining the density value of each pixel point in the polygonal object based on the Gaussian distribution expression; and carrying out normalization processing on the density value of each pixel point in the polygonal object to obtain a reference density map corresponding to the sample image.
Specifically, after μ, σ₁, σ₂, and Σᵢ are obtained from the above formulas, the position coordinates of each pixel point in the ellipse are substituted into formula (1.1), giving the density value corresponding to each pixel point in the ellipse.
For example, the position coordinates of any pixel point in the ellipse can be expressed as $(x_0, y_0)$, from which the pixel's position vector $\mathbf{x} = (x_0, y_0)^T$ can be formed. A vector is a quantity having both magnitude and direction: from the position coordinates $(x_0, y_0)$, the magnitude $\sqrt{x_0^2 + y_0^2}$ of $\mathbf{x}$ can be calculated, as can its direction. The vector $\mathbf{x}$ of each pixel point is obtained in this way. Next, substituting the vector $\mathbf{x}$ of any pixel point, together with μ, σ₁, σ₂, and Σᵢ, into formula (1.1) gives the density value corresponding to that pixel point in the ellipse:

$$D_i(\mathbf{x}) = \frac{1}{2\pi\sqrt{|\Sigma_i|}}\exp\!\left(-\frac{1}{2}(\mathbf{x}-\mu_i)^{T}\Sigma_i^{-1}(\mathbf{x}-\mu_i)\right)$$
According to the 3σ principle, the calculation range of the Gaussian distribution expression can simply be limited to $[x_c - 3\sigma_{11},\; x_c + 3\sigma_{11}]$ in the horizontal direction and $[y_c - 3\sigma_{22},\; y_c + 3\sigma_{22}]$ in the vertical direction.
Finally, to ensure that the density values within each ellipse sum to 1, the density values of the pixel points in the polygon object are normalized, so that the reference density map corresponding to the sample image is given by formula 2:

$$D_i'(\mathbf{x}) = \frac{D_i(\mathbf{x})}{Z_i} \quad (2)$$

where $Z_i$ is the sum of the density values computed over all pixel points.
Normalizing the density values in this way also simplifies the image processing flow, reduces computational complexity, and improves processing efficiency.
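Putting these steps together, a minimal sketch of reference-density-map generation follows. It evaluates the Gaussian over the whole grid rather than only the 3σ box described above (a simplification; the constant prefactor is irrelevant since each object's mass is renormalized per formula (2)), and the grid size and names are illustrative assumptions.

```python
# Sketch: accumulate one unit-mass Gaussian per labeled object.
import numpy as np

def render_reference_density(objects, h=96, w=96):
    """objects: list of (mu, cov) pairs from gaussian_params_from_ellipse."""
    density = np.zeros((h, w), dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs, ys], axis=-1).astype(np.float64)   # (h, w, 2) as (x, y)
    for mu, cov in objects:
        inv = np.linalg.inv(cov)
        d = grid - mu
        g = np.exp(-0.5 * np.einsum('hwi,ij,hwj->hw', d, inv, d))
        g /= g.sum()        # Z_i: each object contributes exactly 1 count
        density += g
    return density
```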
For example, please refer to fig. 4a, fig. 4a is a schematic diagram of a reference density map provided in the present embodiment. As shown in fig. 4a, it can be seen from the reference density map calculated in the above manner that the density value corresponding to the pixel point inside the ellipse is greater than the density value corresponding to the pixel point outside the ellipse. Further, please refer to fig. 4b, wherein fig. 4b is a schematic diagram of another reference density map provided in the embodiment of the present application. The reference density map shown in fig. 4b is a density map generated by a conventional gaussian kernel convolution method, which has the following principle: only the position of the target objects needs to be considered and the size of the gaussian kernel depends on the distance between the target objects. By comparing the reference density map shown in fig. 4a (obtained in the embodiment of the present application) with the reference density map shown in fig. 4b (obtained in the conventional manner), it can be seen that the region corresponding to the target object in the reference density map obtained in the present application is clearer than the region corresponding to the target object in the reference density map obtained in the conventional manner, so that the accuracy of identifying the number of target objects based on the reference density map is improved.
S203: and calling the neural network model to carry out density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image.
In the embodiment of the present application, the neural network model may include, but is not limited to: Convolutional Neural Network (CNN) models, Deep Neural Network (DNN) models, and other network models capable of density estimation processing; the embodiment of the present application does not specifically limit the model structure of the neural network model.
For example, in the embodiment of the present application, a deep neural network (DNN) model may be used to perform density estimation processing on a sample image to obtain the corresponding predicted density map. Referring to fig. 5a, fig. 5a is a schematic diagram of an object recognition model according to an embodiment of the present disclosure. As shown in fig. 5a, the input of the object recognition model is a sample image; after model processing (such as convolution, global average pooling, and normalization), a predicted density map corresponding to the sample image is output.
Further, the object recognition model mentioned in the embodiment of the present application is a multi-branch network structure formed based on a plurality of selective kernel convolutions, bottleneck layers and residual blocks. The selective kernel convolution is used for realizing the self-adaptive adjustment of the convolution kernel size; the bottleneck layer and the residual block are used for reducing network parameters and improving the network characteristic expression capability of the object recognition model.
Specifically, existing reference-density-map regression networks and their advantages were comprehensively considered, improved, and fused. Please refer to fig. 5b, a schematic diagram of a network structure of an object recognition model provided in an embodiment of the present application. As shown in fig. 5b, the multi-branch network structure based on selective kernel convolution (SK conv) in the present application realizes adaptive adjustment of the convolution kernel size, as in SK_1, SK_2, and SK_3. Meanwhile, multi-layer feature fusion is realized by upsampling and stacking deep features in order to exploit high-level features. That is, the features output by the SK_2 module are concatenated with the upsampled features output by the SK_3 module to obtain high-level features, which are subsequently fed into the following network modules (for example, the subsequent bottleneck module). In addition, the network design adopts commonly used structures such as the bottleneck (hourglass-shaped structure: 3×3 conv9, 3×3 conv10, 1×1 conv11, 1×1 conv12) and the residual block, thereby reducing the network parameters and improving the expressive capacity of the network features.
Still further, after the sample image is input to the object recognition model for density estimation, the size of each network module is as shown in table 1 below:
Table 1: Output size of each network module in the object recognition model

| Layer Name // Stride       | Output Size  |
| Input image                | B×3×768×768  |
| 7×7 conv1 // 2             | B×16×384×384 |
| Max pooling // 2           | B×16×192×192 |
| SK_1 // 1                  | B×32×192×192 |
| SK_2 // 2                  | B×64×96×96   |
| SK_3 // 2                  | B×128×48×48  |
| 3×3 transposed conv1 // 2  | B×64×96×96   |
| Concat (splicing)          | B×128×96×96  |
| 3×3 conv9 // 1             | B×64×96×96   |
| 3×3 conv10 // 1            | B×64×96×96   |
| 1×1 conv11 // 1            | B×512×96×96  |
| 1×1 conv12 // 1            | B×1×96×96    |
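For illustration, a compressed PyTorch sketch of the Table 1 topology follows. The internals of the SK blocks are not spelled out in the text, so the two-branch selective-kernel convolution below (3×3 and 5×5 branches fused by learned channel attention) and the omission of the residual connections are assumptions.

```python
# Hedged sketch of the Table 1 topology, not the patent's exact network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SKBlock(nn.Module):
    """Minimal selective-kernel conv: two kernel sizes, attention-fused."""
    def __init__(self, cin, cout, stride=1, r=8):
        super().__init__()
        self.b3 = nn.Conv2d(cin, cout, 3, stride, padding=1)
        self.b5 = nn.Conv2d(cin, cout, 5, stride, padding=2)
        hidden = max(cout // r, 8)
        self.fc = nn.Sequential(nn.Linear(cout, hidden), nn.ReLU(),
                                nn.Linear(hidden, 2 * cout))
    def forward(self, x):
        u3, u5 = self.b3(x), self.b5(x)
        s = (u3 + u5).mean(dim=(2, 3))                      # global avg pool
        a = self.fc(s).view(-1, 2, u3.shape[1], 1, 1).softmax(dim=1)
        return a[:, 0] * u3 + a[:, 1] * u5                  # kernel-size attention

class DensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 7, stride=2, padding=3)    # 768 -> 384
        self.pool = nn.MaxPool2d(2)                             # -> 192
        self.sk1 = SKBlock(16, 32, 1)                           # 192
        self.sk2 = SKBlock(32, 64, 2)                           # -> 96
        self.sk3 = SKBlock(64, 128, 2)                          # -> 48
        self.up = nn.ConvTranspose2d(128, 64, 3, stride=2,
                                     padding=1, output_padding=1)  # 48 -> 96
        self.head = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),        # conv9
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),         # conv10
            nn.Conv2d(64, 512, 1), nn.ReLU(),                   # conv11
            nn.Conv2d(512, 1, 1))                               # conv12 -> density
    def forward(self, x):
        x = self.pool(F.relu(self.stem(x)))
        f2 = self.sk2(self.sk1(x))
        f3 = self.sk3(f2)
        fused = torch.cat([f2, self.up(f3)], dim=1)   # multi-layer feature fusion
        return self.head(fused)                       # B x 1 x 96 x 96
```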
In this way, an improved convolutional neural network model is adopted as the model structure of the object recognition model: the multi-branch network structure based on selective kernel convolution realizes adaptive adjustment of the convolution kernel size, and multi-layer feature fusion is realized by upsampling and stacking deep features to exploit high-level features, so that the object recognition model in this scheme can generate a more accurate density map.
S204: and training the neural network model based on the reference density map and the prediction density map to obtain an object recognition model, wherein the object recognition model is used for recognizing the number of target objects in the image.
In one possible implementation, the training, by the computer device, the neural network model based on the reference density map and the predicted density map to obtain the object recognition model may include: determining a network loss for the neural network model based on difference data between the reference density map and the predicted density map; and carrying out iterative adjustment processing on the neural network model based on network loss, and taking the adjusted neural network model as an object recognition model if the adjusted neural network model meets the model convergence condition.
The model convergence condition may be: the number of training iterations of the neural network model reaches a preset training threshold, for example, 100; or the error between the predicted density map and the reference density map for each sample image is smaller than an error threshold; or the change between the predicted density maps obtained for each sample image in two adjacent training rounds is smaller than a change threshold.
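An illustrative training loop matching these convergence conditions is sketched below; the optimizer, learning rate, and thresholds are assumptions not fixed by the text.

```python
# Sketch: iterative adjustment with iteration-budget and error-threshold stops.
import torch

def train(model, loader, loss_fn, max_epochs=100, err_threshold=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(max_epochs):                 # iteration-count condition
        total = 0.0
        for images, ref_density in loader:
            pred = model(images).squeeze(1)         # B x 96 x 96
            loss = loss_fn(pred, ref_density)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        if total / len(loader) < err_threshold:     # error-based condition
            break
    return model   # the converged model serves as the object recognition model
```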
Specifically, the computer device determining a network loss of the neural network model based on difference data between the reference density map and the predicted density map may include: obtaining pixel scale loss based on pixel value difference values between each pixel point in the reference density map and each pixel point in the prediction density map; obtaining a count scale loss based on a number difference for the target object between the reference density map and the predicted density map; and determining the network loss of the neural network model according to the pixel scale loss and the counting scale loss.
For example, the expression for calculating the network loss is shown in equations 3.1-3.3 below:
$$l_{pixel} = \operatorname{mean}\!\left(\left(D(x,y)-\hat{D}(x,y)\right)^{2}\right) \quad (3.1)$$

$$l_{count} = \left|\sum_{x,y} D(x,y) - \sum_{x,y}\hat{D}(x,y)\right| \quad (3.2)$$

$$loss = l_{pixel} + \lambda\, l_{count} \quad (3.3)$$

where $l_{pixel}$ is the pixel-scale loss, characterizing the difference between the two maps (reference density map and predicted density map) at each pixel position; $l_{count}$ is the count-scale loss, characterizing the difference in counts between the two maps; $D(x,y)$ and $\hat{D}(x,y)$ denote the density values of the pixel points in the predicted density map and the reference density map, respectively; and mean denotes the average. In addition, the weighting coefficient $\lambda$ between the pixel-scale loss and the count-scale loss may be set in a user-defined manner, which is not specifically limited in the embodiment of the present application.
It can be understood that the network loss simultaneously considers the pixel-scale loss and the global count-scale loss, constraining both the positions and the number of the pigs. This ensures that the neural network model achieves a small counting error while keeping the predicted density map close to the reference density map, improving the accuracy of the trained object recognition model so that the number of target objects in an image can subsequently be identified accurately.
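Read directly, equations (3.1)-(3.3) translate into a few lines of code; the exact normalization of the count term (here a plain absolute difference, averaged over the batch) is an assumption.

```python
# Sketch of the combined pixel-scale + count-scale loss.
import torch

def density_loss(pred, ref, lam=0.1):
    l_pixel = torch.mean((pred - ref) ** 2)                     # (3.1)
    l_count = torch.abs(pred.sum(dim=(-2, -1))
                        - ref.sum(dim=(-2, -1))).mean()         # (3.2)
    return l_pixel + lam * l_count                              # (3.3)
```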
In the embodiment of the application, a sample image is obtained; a target object contained in the sample image is labeled to obtain a reference density map corresponding to the sample image; a neural network model is called to perform density estimation processing on the sample image to obtain a predicted density map; and the neural network model is trained based on the reference density map and the predicted density map to obtain an object recognition model for recognizing the number of target objects in an image. In this way, target objects in sample images can be labeled to obtain reference density maps, and the object recognition model trained on these reference density maps can accurately produce density maps from which the number of target objects is determined. Compared with directly recognizing target objects in an image, this determines the number of target objects more accurately, improving the accuracy of model training. Meanwhile, the multi-branch network structure based on selective kernel convolution realizes adaptive adjustment of the convolution kernel size, and multi-layer feature fusion is realized by upsampling and stacking deep features to exploit high-level features, so that the object recognition model in this scheme can generate a more accurate density map, improving the accuracy of determining the number of target objects.
Referring to fig. 6, fig. 6 is a schematic flowchart of a model training method according to an embodiment of the present disclosure. As shown in fig. 6, the model training method may be performed by the computer device mentioned above, and the computer device may be a terminal device or a server.
Referring to fig. 6, the model training method may include steps S601 to S603:
s601: and acquiring an image to be processed.
In an embodiment of the application, the image to be processed may comprise at least one target object. The types of the target object may include, but are not limited to: animal, human, flower, grass, etc.; the type of the target object is not particularly limited in the examples of the present application. For convenience of explanation, in the following embodiments the type of the target object is taken to be an animal, for example: cats, dogs, pigs, rabbits, etc.
S602: and calling an object recognition model to perform density estimation processing on a target object contained in the image to be processed to obtain a target density map corresponding to the image to be processed, wherein the object recognition model is obtained by training a neural network model based on the sample image and a reference density map corresponding to the sample image, and the reference density map is obtained by labeling the target object contained in the sample image.
Specifically, the computer device may invoke the trained object recognition model, and perform density estimation processing on a target object included in the image to be processed, thereby obtaining a target density map corresponding to the image to be processed.
As can be seen from the above, the object recognition model in the embodiment of the present application adopts a multi-branch network structure based on selective kernel convolution, achieving adaptive adjustment of the convolution kernel size; meanwhile, multi-layer feature fusion is achieved by upsampling and stacking deep features to exploit high-level features, so that the object recognition model in this scheme can generate a more accurate density map.
S603: based on the target density map, the number of target objects contained in the image to be processed is determined.
In this embodiment of the application, the density map may be used to reflect the density value of each pixel, that is, the density map may include the density value of each pixel. In one possible implementation, the computer device determines the number of target objects contained in the image to be processed based on the target density map, as shown in formula (4):
$$N = \sum_{x} D(x) \quad (4)$$
where $D(x)$ represents the density value corresponding to each pixel point in the target density map. It can be understood from the foregoing embodiment that the sum of the density values of the pixel points within each polygon object (for example, each ellipse) in the density map is 1; therefore, the result of summing the density values of all pixel points in the target density map can be regarded as the number of ellipses, which in turn is the number of target objects.
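In code, formula (4) is a single reduction over the density map:

```python
# Count = total mass of the density map (each object contributed unit mass).
def count_objects(density_map):
    return float(density_map.sum())   # e.g., 13.75 -> about 14 objects
```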
According to the above description, the determination of the number of target objects in the image to be processed according to the embodiment of the present application is illustrated below by referring to an example. Referring to fig. 7, fig. 7 is a scene schematic diagram of a model application according to an embodiment of the present disclosure. As shown in fig. 7, for example, the input image to be processed is as shown in a picture 701a in fig. 7, the actual number of the target objects determined from the density map shown in a picture 701b may be 14, and the predicted number of the target objects determined from the density map shown in a picture 701c may be 13.75; for another example, the input image to be processed is as shown in a picture 702a in fig. 7, the actual number of the target objects determined according to the density map shown in the picture 702b may be 23, and the predicted number of the target objects determined according to the density map shown in the picture 702c may be 23.36; for example, as shown in a picture 703a in fig. 7, the input image to be processed may have a true number of target objects determined from the density map shown in the picture 703b of 42, and a predicted number of target objects determined from the density map shown in the picture 703c of 41.15.
In one possible implementation, the computer device may further determine, based on the target density map, position data of the target objects contained in the image to be processed. Specifically, the computer device may segment the target density map to obtain a plurality of density tiles; then cluster the target objects within each density tile to obtain a clustering result for each tile; and finally determine, based on the clustering result of each density tile, the position data of the target objects contained in the image to be processed.
Next, a process of how to obtain position data of the target object based on the target density map will be described in detail:
(1) Segment the target density map to obtain a plurality of density tiles.
Specifically, the computer device may perform segmentation processing on the target density map to obtain a plurality of density tiles, which may include: performing threshold segmentation processing on the target density map. Thresholding is a segmentation technique based on image regions; its principle is to classify image pixels into several classes. For example, thresholding techniques may include, but are not limited to: Otsu (maximum inter-class variance) threshold segmentation, adaptive threshold segmentation, maximum-entropy threshold segmentation, iterative threshold segmentation, and the like.
In the embodiment of the application, based on the threshold segmentation technique, the pixel points in the target density map can be divided into a plurality of mutually disconnected density tiles; it can be understood that the areas of these density tiles may be the same or different. Obtaining the density tiles through threshold segmentation and then processing each density tile as a unit reduces the complexity of the subsequent clustering processing and improves image processing efficiency.
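As a hedged sketch of this step, the following uses a fixed global threshold and connected-component labeling to split the density map into disconnected tiles; the threshold value and the use of scipy.ndimage are illustrative choices, since Otsu, adaptive, maximum entropy, or iterative thresholding could be substituted:

```python
import numpy as np
from scipy import ndimage

def split_into_density_tiles(density_map: np.ndarray, thresh: float = 1e-4):
    # Pixels above the threshold are foreground; each connected foreground
    # region becomes one density tile (the tiles are mutually disconnected).
    foreground = density_map > thresh
    labeled, num_tiles = ndimage.label(foreground)
    return [labeled == i for i in range(1, num_tiles + 1)]  # boolean masks
```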
(2) Perform clustering processing on the target objects in each density tile to obtain a clustering result of each density tile.
Specifically, in the embodiment of the present application, each density tile contains at least one target object. The computer device may perform clustering processing on the target objects in each density tile based on a clustering algorithm, so as to obtain a clustering result of each density tile. For example, the clustering algorithm may include, but is not limited to: the K-Means clustering algorithm, the mean-shift clustering algorithm, the DBSCAN clustering algorithm (Density-Based Spatial Clustering of Applications with Noise), and hierarchical clustering algorithms. The clustering algorithm in the embodiment of the application may be the K-Means clustering algorithm.
When the K-Means clustering algorithm is used to cluster the target objects in each density tile, the clustering precision is ensured by partitioning the distribution area of the target objects and providing reliable initial cluster points: N initial cluster points, where N is the number of target objects in the density tile, are selected from the pixel points with the largest density values. The clustering process is therefore biased toward positions with higher density values, which ensures higher precision of the clustered positions.
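A sketch of this seeding strategy, assuming scikit-learn's K-Means; rounding the tile's density mass to estimate N and seeding with the top-N density pixels follow the description above, but the exact details are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def locate_objects_in_tile(density_map: np.ndarray, tile_mask: np.ndarray):
    # Number of objects in this tile, estimated from its density mass.
    n = max(1, int(round(density_map[tile_mask].sum())))
    ys, xs = np.nonzero(tile_mask)
    coords = np.stack([ys, xs], axis=1).astype(np.float64)
    weights = density_map[ys, xs]
    # Seed K-Means with the n pixel points of largest density value, so the
    # clustering is biased toward positions with higher density.
    init = coords[np.argsort(weights)[-n:]]
    km = KMeans(n_clusters=n, init=init, n_init=1).fit(coords, sample_weight=weights)
    return km.cluster_centers_  # (row, col) position of each target object
```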
(3) Determine the position data of the target objects contained in the image to be processed based on the clustering result of each density tile.
It can be understood that the clustering result of each density tile is obtained in the above manner, and the clustering result of any density tile may include the position data (e.g., position coordinates) of its cluster center points. Specifically, the computer device determining the position data of the target objects based on the clustering result of each density tile may include: taking the position data of the cluster center points in each density tile as the position data of the target objects contained in the image to be processed.
In the embodiment of the present application, the object recognition model trained in the foregoing embodiment may be used to recognize the number of target objects in an image. Because the improved object recognition model constrains both the positions and the number of the target objects through the network loss, the counting error for the target objects can be reduced, and the accuracy of the recognized number of target objects can be improved.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure. The model training apparatus 800 can be applied to the computer device (e.g., a terminal device or a server) in the foregoing embodiments. The model training apparatus 800 may be a computer program (including program code) running on the computer device; for example, the model training apparatus 800 is application software. The model training apparatus 800 may be used to perform the corresponding steps in the model training method provided in the embodiments of the present application. The model training apparatus 800 may include:
an acquisition unit 801 for acquiring a sample image;
the processing unit 802 is configured to perform labeling processing on a target object included in the sample image to obtain a reference density map corresponding to the sample image;
the processing unit 802 is further configured to invoke the neural network model to perform density estimation processing on the sample image, so as to obtain a predicted density map corresponding to the sample image;
the training unit 803 is configured to train the neural network model based on the reference density map and the predicted density map to obtain an object recognition model, where the object recognition model is used to recognize the number of target objects in the image.
In a possible implementation manner, when performing labeling processing on the target object to obtain the reference density map corresponding to the sample image, the processing unit 802 is configured to perform the following operations:
marking the boundary point positions of the target object by adopting an edge marking algorithm to obtain k boundary points corresponding to the target object, wherein k is a positive integer;
performing polygon fitting on the k boundary points corresponding to the target object to obtain a polygonal object for representing the shape of the target object;
and determining a reference density map corresponding to the sample image based on the polygonal object.
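Where the polygonal object is an ellipse (as in the implementation below), the fit can be sketched with OpenCV's cv2.fitEllipse; the patent does not name a specific fitting routine, so this choice is an assumption:

```python
import numpy as np
import cv2

def fit_ellipse_to_boundary(boundary_points: np.ndarray):
    # boundary_points: the k annotated boundary points, shape (k, 2);
    # cv2.fitEllipse requires k >= 5.
    pts = boundary_points.astype(np.float32).reshape(-1, 1, 2)
    (cx, cy), (w, h), angle_deg = cv2.fitEllipse(pts)
    # OpenCV returns full axis lengths; convert to semi-axes.
    return (cx, cy), (w / 2.0, h / 2.0), angle_deg
```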
In one possible implementation, the polygonal object is an ellipse; when determining the reference density map corresponding to the sample image based on the polygonal object, the processing unit 802 is configured to perform the following operations:
performing Gaussian distribution expression on the target object based on the parameters of the ellipse to obtain a Gaussian distribution expression, wherein the parameters of the ellipse comprise any one or more of the following: the semi-major axis, the semi-minor axis, the center point, and the included angle between the semi-major axis and the horizontal direction;
and determining a reference density map corresponding to the sample image based on the Gaussian distribution expression.
In one possible implementation, when determining the reference density map corresponding to the sample image based on the Gaussian distribution expression, the processing unit 802 is configured to perform the following operations:
determining the density value of each pixel point in the polygonal object based on the Gaussian distribution expression;
and normalizing the density values of the pixel points in the polygonal object to obtain the reference density map corresponding to the sample image.
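A sketch of rendering one object's contribution to the reference density map as an elliptical Gaussian and normalizing it to sum to 1; mapping the semi-axes directly to the Gaussian standard deviations is an assumption:

```python
import numpy as np

def elliptical_gaussian_density(shape, center, semi_axes, angle_deg):
    h, w = shape
    cx, cy = center
    a, b = semi_axes  # semi-major and semi-minor axes
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Rotate coordinates into the ellipse's principal-axis frame.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    g = np.exp(-0.5 * ((u / a) ** 2 + (v / b) ** 2))
    return g / g.sum()  # normalization: each object carries a density mass of 1

# The reference density map is the sum of one such term per target object.
```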
In a possible implementation manner, when training the neural network model based on the reference density map and the predicted density map to obtain the object recognition model, the training unit 803 is configured to perform the following operations:
determining a network loss of the neural network model based on difference data between the reference density map and the predicted density map;
and performing iterative adjustment processing on the neural network model based on the network loss, and taking the adjusted neural network model as the object recognition model if the adjusted neural network model meets a model convergence condition.
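A minimal training-loop sketch of this iterative adjustment, in PyTorch; the optimizer, the learning rate, and the loss-change tolerance used as the model convergence condition are all illustrative assumptions:

```python
import torch

def train_object_recognition_model(model, loader, loss_fn,
                                   max_epochs: int = 100, tol: float = 1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    prev_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for images, ref_maps in loader:          # sample images + reference density maps
            pred_maps = model(images)            # predicted density maps
            loss = loss_fn(pred_maps, ref_maps)  # network loss (see the sketch below)
            opt.zero_grad()
            loss.backward()
            opt.step()
            epoch_loss += loss.item()
        if abs(prev_loss - epoch_loss) < tol:    # one possible convergence condition
            break
        prev_loss = epoch_loss
    return model  # the adjusted model serves as the object recognition model
```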
In one possible implementation, when determining the network loss of the neural network model based on the difference data between the reference density map and the predicted density map, the training unit 803 is configured to perform the following operations:
acquiring a pixel-scale loss based on the pixel value differences between the pixel points in the reference density map and the corresponding pixel points in the predicted density map;
acquiring a count-scale loss based on the difference in the number of target objects between the reference density map and the predicted density map;
and determining the network loss of the neural network model according to the pixel-scale loss and the count-scale loss.
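A sketch of combining the two scales of loss in PyTorch; using MSE for the pixel-scale term, L1 for the count-scale term, and a simple weighted sum are assumptions, since the patent specifies the two scales but not the exact loss functions:

```python
import torch
import torch.nn.functional as F

def density_network_loss(pred: torch.Tensor, ref: torch.Tensor,
                         count_weight: float = 1.0) -> torch.Tensor:
    # Pixel-scale loss: per-pixel difference between the two density maps.
    pixel_loss = F.mse_loss(pred, ref)
    # Count-scale loss: difference between the object counts implied by the
    # two maps (each map's count is the sum of its density values).
    count_loss = (pred.sum(dim=(-2, -1)) - ref.sum(dim=(-2, -1))).abs().mean()
    return pixel_loss + count_weight * count_loss
```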
In the embodiment of the application, a sample image is acquired; a target object contained in the sample image is labeled to obtain a reference density map corresponding to the sample image; a neural network model is called to perform density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image; and the neural network model is trained based on the reference density map and the predicted density map to obtain an object recognition model for recognizing the number of target objects in an image. In this way, the target objects in the sample image are labeled to obtain the reference density map, and the object recognition model is trained with the reference density map, so that the model learns to predict density maps accurately and the number of target objects is determined from the predicted density map. Compared with directly recognizing the target objects in the image, this makes the model training process more accurate, and the number of target objects can therefore be recognized more accurately by the resulting object recognition model.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure. The model training apparatus 900 can be applied to the computer device (e.g., a terminal device or a server) in the foregoing embodiments. The model training apparatus 900 may be a computer program (including program code) running on the computer device; for example, the model training apparatus 900 is application software. The model training apparatus 900 may be used to perform the corresponding steps in the model training method provided in the embodiments of the present application. The model training apparatus 900 may include:
an acquiring unit 901 configured to acquire an image to be processed;
the processing unit 902 is configured to invoke an object recognition model to perform density estimation processing on a target object included in the image to be processed, so as to obtain a target density map corresponding to the image to be processed, where the object recognition model is obtained by the above-mentioned model training method;
a determining unit 903, configured to determine the number of target objects included in the image to be processed based on the target density map.
In one possible implementation, when determining the number of target objects contained in the image to be processed based on the target density map, the determining unit 903 is configured to perform the following operations:
acquiring the density values corresponding to the pixel points in the target density map;
and performing a summation operation on the density values corresponding to the pixel points to obtain the number of target objects contained in the image to be processed.
In the embodiment of the present application, the object recognition model trained in the foregoing embodiment may be used to recognize the number of target objects in an image. Because the improved object recognition model is adopted and the positions and the number of the target objects are constrained by the network loss, the counting error for the target objects can be reduced, and the accuracy of the recognized number of target objects can be improved.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure. The computer device 1000 is configured to perform the steps performed by the computer device (terminal device or server) in the foregoing method embodiment, and the computer device 1000 includes: one or more processors 1010; one or more input devices 1020, one or more output devices 1030, and a memory 1040. The processor 1010, input device 1020, output device 1030, and memory 1040 are connected by a bus 1050. The memory 1040 is used for storing a computer program comprising program instructions, and the processor 1010 is used for calling the program instructions stored in the memory 1040 to execute the following operations:
acquiring a sample image;
labeling a target object contained in the sample image to obtain a reference density map corresponding to the sample image;
calling a neural network model to carry out density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image;
and training the neural network model based on the reference density map and the predicted density map to obtain an object recognition model, wherein the object recognition model is used for recognizing the number of target objects in the image.
In a possible implementation manner, when performing labeling processing on the target object to obtain the reference density map corresponding to the sample image, the processor 1010 is configured to perform the following operations:
marking the boundary point positions of the target object by adopting an edge marking algorithm to obtain k boundary points corresponding to the target object, wherein k is a positive integer;
performing polygon fitting on the k boundary points corresponding to the target object to obtain a polygonal object for representing the shape of the target object;
and determining a reference density map corresponding to the sample image based on the polygonal object.
In one possible implementation, the polygonal object is an ellipse; when determining the reference density map corresponding to the sample image based on the polygonal object, the processor 1010 is configured to perform the following operations:
performing Gaussian distribution expression on the target object based on the parameters of the ellipse to obtain a Gaussian distribution expression, wherein the parameters of the ellipse comprise any one or more of the following: the semi-major axis, the semi-minor axis, the center point, and the included angle between the semi-major axis and the horizontal direction;
and determining a reference density map corresponding to the sample image based on the Gaussian distribution expression.
In one possible implementation, when determining the reference density map corresponding to the sample image based on the Gaussian distribution expression, the processor 1010 is configured to perform the following operations:
determining the density value of each pixel point in the polygonal object based on the Gaussian distribution expression;
and normalizing the density values of the pixel points in the polygonal object to obtain the reference density map corresponding to the sample image.
In one possible implementation, when training the neural network model based on the reference density map and the predicted density map to obtain the object recognition model, the processor 1010 is configured to perform the following operations:
determining a network loss of the neural network model based on difference data between the reference density map and the predicted density map;
and performing iterative adjustment processing on the neural network model based on the network loss, and taking the adjusted neural network model as the object recognition model if the adjusted neural network model meets a model convergence condition.
In one possible implementation, when determining the network loss of the neural network model based on the difference data between the reference density map and the predicted density map, the processor 1010 is configured to perform the following operations:
acquiring a pixel-scale loss based on the pixel value differences between the pixel points in the reference density map and the corresponding pixel points in the predicted density map;
acquiring a count-scale loss based on the difference in the number of target objects between the reference density map and the predicted density map;
and determining the network loss of the neural network model according to the pixel-scale loss and the count-scale loss.
In another possible implementation, the processor 1010 is configured to call the program instructions stored in the memory 1040 to further perform the following operations:
acquiring an image to be processed;
calling an object recognition model to perform density estimation processing on the target object contained in the image to be processed to obtain a target density map corresponding to the image to be processed, wherein the object recognition model is obtained by the above model training method;
and determining the number of target objects contained in the image to be processed based on the target density map.
In one possible implementation, when determining the number of target objects contained in the image to be processed based on the target density map, the processor 1010 is configured to perform the following operations:
acquiring the density values corresponding to the pixel points in the target density map;
and performing a summation operation on the density values corresponding to the pixel points to obtain the number of target objects contained in the image to be processed.
In the embodiment of the application, a sample image is acquired; a target object contained in the sample image is labeled to obtain a reference density map corresponding to the sample image; a neural network model is called to perform density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image; and the neural network model is trained based on the reference density map and the predicted density map to obtain an object recognition model for recognizing the number of target objects in an image. In this way, the target objects in the sample image are labeled to obtain the reference density map, and the object recognition model is trained with the reference density map, so that the model learns to predict density maps accurately and the number of target objects is determined from the predicted density map. Compared with directly recognizing the target objects in the image, this makes the model training process more accurate, and the number of target objects can therefore be recognized more accurately by the resulting object recognition model.
It should be further noted that an embodiment of the present application also provides a computer storage medium, where a computer program is stored in the computer storage medium, and the computer program includes program instructions; when a processor executes the program instructions, the methods in the foregoing corresponding embodiments can be performed, and details are therefore not repeated here. For technical details not disclosed in the computer storage medium embodiments of the present application, reference is made to the descriptions of the method embodiments of the present application. By way of example, the program instructions may be deployed to be executed on one computer device, or on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network.
According to an aspect of the application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and executes the computer instruction, so that the computer device can perform the method in the foregoing embodiments, and therefore, the detailed description is omitted here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present application and is not intended to limit the scope of the present application; therefore, equivalent variations and modifications made in accordance with the claims of the present application still fall within the scope of the present application.

Claims (10)

1. A method of model training, comprising:
acquiring a sample image;
labeling a target object contained in the sample image to obtain a reference density map corresponding to the sample image;
calling a neural network model to carry out density estimation processing on the sample image to obtain a predicted density map corresponding to the sample image;
training the neural network model based on the reference density map and the predicted density map to obtain an object recognition model, wherein the object recognition model is used for recognizing the number of target objects in the image.
2. The method of claim 1, wherein the labeling the target object to obtain a reference density map corresponding to the sample image comprises:
marking the boundary point positions of the target object by adopting an edge marking algorithm to obtain k boundary points corresponding to the target object, wherein k is a positive integer;
performing polygon fitting on the k boundary points corresponding to the target object to obtain a polygonal object for representing the shape of the target object;
and determining a reference density map corresponding to the sample image based on the polygonal object.
3. The method of claim 2, wherein the polygonal object is an ellipse, and the determining a reference density map corresponding to the sample image based on the polygonal object comprises:
performing Gaussian distribution expression on the target object based on parameters of the ellipse to obtain a Gaussian distribution expression, wherein the parameters of the ellipse comprise any one or more of the following: a semi-major axis, a semi-minor axis, a center point, and an included angle between the semi-major axis and the horizontal direction;
and determining a reference density map corresponding to the sample image based on the Gaussian distribution expression.
4. The method of claim 3, wherein the determining a reference density map corresponding to the sample image based on the Gaussian distribution expression comprises:
determining the density value of each pixel point in the polygonal object based on the Gaussian distribution expression;
and normalizing the density values of the pixel points in the polygonal object to obtain the reference density map corresponding to the sample image.
5. The method of claim 1, wherein the training the neural network model based on the reference density map and the predicted density map to obtain an object recognition model comprises:
determining a network loss of the neural network model based on difference data between the reference density map and the predicted density map;
and performing iterative adjustment processing on the neural network model based on the network loss, and taking the adjusted neural network model as the object recognition model if the adjusted neural network model meets a model convergence condition.
6. The method of claim 5, wherein the determining a network loss of the neural network model based on difference data between the reference density map and the predicted density map comprises:
acquiring a pixel-scale loss based on the pixel value differences between the pixel points in the reference density map and the corresponding pixel points in the predicted density map;
acquiring a count-scale loss based on the difference in the number of target objects between the reference density map and the predicted density map;
and determining the network loss of the neural network model according to the pixel-scale loss and the count-scale loss.
7. A method of model training, comprising:
acquiring an image to be processed;
calling an object recognition model to perform density estimation processing on a target object contained in the image to be processed to obtain a target density map corresponding to the image to be processed, wherein the object recognition model is obtained by the model training method according to any one of claims 1-6;
and determining the number of target objects contained in the image to be processed based on the target density map.
8. The method of claim 7, wherein the determining the number of target objects contained in the image to be processed based on the target density map comprises:
acquiring density values corresponding to all pixel points in the target density map;
and performing a summation operation on the density values corresponding to the pixel points to obtain the number of the target objects contained in the image to be processed.
9. A computer device, comprising:
a processor adapted to execute a computer program;
a computer-readable storage medium, in which a computer program is stored which, when executed by the processor, carries out the method according to any one of claims 1-6 or 7-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1-6 or 7-8.
CN202210882475.2A 2022-07-25 2022-07-25 Model training method, computer device, and storage medium Pending CN115272794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210882475.2A CN115272794A (en) 2022-07-25 2022-07-25 Model training method, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210882475.2A CN115272794A (en) 2022-07-25 2022-07-25 Model training method, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CN115272794A true CN115272794A (en) 2022-11-01

Family

ID=83769513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210882475.2A Pending CN115272794A (en) 2022-07-25 2022-07-25 Model training method, computer device, and storage medium

Country Status (1)

Country Link
CN (1) CN115272794A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907009A (en) * 2023-02-10 2023-04-04 北京百度网讯科技有限公司 Migration method, device, equipment and medium for automatic driving perception model
CN115907009B (en) * 2023-02-10 2023-05-26 北京百度网讯科技有限公司 Migration method, device, equipment and medium of automatic driving perception model
CN118155142A (en) * 2024-05-09 2024-06-07 浙江大华技术股份有限公司 Object density recognition method and event recognition method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 2601, 2602, 2603, 2606, Zhongzhou building, No. 3088, Jintian Road, Gangxia community, Futian street, Futian District, Shenzhen, Guangdong 518000

Applicant after: Shenzhen Xiaoyudian Digital Technology Co.,Ltd.

Address before: 2601, 2602, 2603, 2606, Zhongzhou building, No. 3088, Jintian Road, Gangxia community, Futian street, Futian District, Shenzhen, Guangdong 518000

Applicant before: Shenzhen Huace Huihong Technology Co.,Ltd.