CN116091785A - Identification model generation method and device, electronic equipment and storage medium - Google Patents

Identification model generation method and device, electronic equipment and storage medium

Info

Publication number
CN116091785A
Authority
CN
China
Prior art keywords
image
target
data set
images
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310031651.6A
Other languages
Chinese (zh)
Inventor
吕文玉
黄奎
倪烽
***
党青青
刘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310031651.6A priority Critical patent/CN116091785A/en
Publication of CN116091785A publication Critical patent/CN116091785A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/16Image acquisition using multiple overlapping images; Image stitching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/72Data preparation, e.g. statistical preprocessing of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a method and apparatus for generating a recognition model, an electronic device and a storage medium, and relates to the field of artificial intelligence, in particular to the field of computer vision. The specific implementation scheme is as follows: acquiring an image data set, performing anchor-free regression processing on the image data set, and determining target frames of a plurality of images contained in the image data set, wherein the target frames identify the areas of objects in the corresponding images; determining proportion distribution data corresponding to the image data set based on the target frames of the plurality of images, wherein the proportion distribution data characterize the distribution, within the image data set, of the proportion of image area occupied by the target frames in the corresponding images; cropping the image data set according to the proportion distribution data to obtain a cropped image data set; and training an initial recognition model based on the cropped image data set to obtain a target recognition model, wherein the initial recognition model is at least used for multi-scale feature extraction of objects in the cropped image data set.

Description

Identification model generation method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the field of computer vision, and specifically relates to a method and apparatus for generating a recognition model, an electronic device and a storage medium.
Background
Target detection is an important research direction in the field of computer vision and the basis of other complex visual tasks. As the foundation of image understanding and computer vision, target detection underpins higher-level visual tasks such as segmentation, scene understanding, object tracking, image description, and event detection. Small target detection, a difficult sub-problem of target detection, aims to accurately detect small objects with few visual features in an image, e.g., objects below 32×32 pixels. Because a small target occupies only a small fraction of the image's pixels, it covers little area and carries little information, which makes its detection accuracy hard to improve. Real scenes contain large numbers of small targets to be detected, so small target detection has wide application prospects.
At present, existing small target detection models cannot balance recognition accuracy and operation efficiency in small target detection: some detection models have a simple implementation flow but generally low accuracy, while others improve detection accuracy at the cost of a complex implementation process.
Disclosure of Invention
The disclosure provides a generation method and device of an identification model, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method for generating a recognition model, including: acquiring an image data set, performing anchor-free regression processing on the image data set, and determining target frames of a plurality of images contained in the image data set, wherein the target frames identify the areas of objects in the corresponding images; determining proportion distribution data corresponding to the image data set based on the target frames of the plurality of images, wherein the proportion distribution data characterize the distribution, within the image data set, of the proportion of image area occupied by the target frames in the corresponding images; cropping the image data set according to the proportion distribution data to obtain a cropped image data set; and training an initial recognition model based on the cropped image data set to obtain a target recognition model, wherein the initial recognition model is at least used for multi-scale feature extraction of objects in the cropped image data set.
According to another aspect of the present disclosure, there is also provided an apparatus for generating a recognition model, including: an acquisition module, configured to acquire an image data set, perform anchor-free regression processing on the image data set, and determine target frames of a plurality of images contained in the image data set, wherein the target frames identify the areas of objects in the corresponding images; a distribution determining module, configured to determine proportion distribution data corresponding to the image data set based on the target frames of the plurality of images, wherein the proportion distribution data characterize the distribution, within the image data set, of the proportion of image area occupied by the target frames in the corresponding images; a cropping module, configured to crop the image data set according to the proportion distribution data to obtain a cropped image data set; and a model training module, configured to train an initial recognition model based on the cropped image data set to obtain a target recognition model, wherein the initial recognition model is at least used for multi-scale feature extraction of objects in the cropped image data set.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating the identification model.
According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the above-described generation method of the identification model.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements a method of generating an identification model according to the above.
As can be seen from the above, in the present disclosure, processing the image data set with anchor-free regression improves the accuracy of locating the target frame of an image area and thus the accuracy with which the recognition model recognizes targets; moreover, anchor-free regression requires no manual labeling of areas in each image, which improves the efficiency of the recognition model. In addition, cropping the image data set according to the proportion distribution data of the target frames improves the recognition accuracy for small targets; and since multi-scale feature extraction is performed on objects in the image data set during training, the recognition accuracy of the recognition model can be further improved from the multi-scale features. Finally, the scheme provided by the disclosure trains the model on the cropped image data set, whose targets are small targets; small targets can thus be recognized without increasing the computing consumption of the computer system, achieving both recognition accuracy and computing efficiency.
Therefore, the scheme provided by the disclosure achieves the purpose of recognizing small targets, improves the detection accuracy and operation efficiency for small targets, and solves the problem in the related art that recognition accuracy and operation efficiency cannot both be achieved when recognizing small targets.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of generating an identification model according to an embodiment of the present disclosure;
FIG. 2 is a training schematic of a target recognition model according to an embodiment of the present disclosure;
FIG. 3 is a network schematic diagram of an object recognition model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a generation apparatus of an identification model according to an embodiment of the present disclosure;
fig. 5 is a schematic block diagram of an electronic device used to implement an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of users involved all conform to the provisions of relevant laws and regulations and do not violate public order and good customs.
Example 1
According to an aspect of the present disclosure, there is provided a method for generating a recognition model, which may be executed in a server or a terminal device; in this embodiment, the terminal device is taken as the execution subject for explanation. Fig. 1 is a flowchart of the method for generating a recognition model according to the present embodiment. As shown in fig. 1, the method includes:
step S102, an image data set is obtained, anchor-free frame regression processing is carried out on the image data set, and target frames of a plurality of images contained in the image data set are determined, wherein the target frames are used for identifying areas of objects in corresponding images.
In step S102, the image data set is composed of a plurality of images, at least some of which contain a small target object, i.e., an object whose number of pixels in the image is less than a preset number; for example, a small target object may be an object of 32×32 pixels or smaller.
In addition, in step S102, each target frame corresponds to one object, and one image may contain only one object or a plurality of objects. Compared with the prior art, which determines the target frame in each image with anchor frames, the anchor-free regression processing method for determining the target frame corresponding to each image avoids the manually designed hyper-parameters and complex computation brought by anchor frames, and can therefore improve both the recognition accuracy and the recognition efficiency of the target recognition model.
Step S104, determining the corresponding proportion distribution data of the image data set based on the target frames of the plurality of images.
In step S104, the proportion distribution data characterize the distribution, within the image data set, of the proportion of image area occupied by the target frames in the corresponding images. Optionally, the proportion distribution data may include first-class area proportions and second-class area proportions, where a first-class area proportion is an area proportion less than or equal to a target threshold, and a second-class area proportion is an area proportion greater than the target threshold; that is, first-class area proportions characterize small target objects, and second-class area proportions characterize non-small target objects. The area of the region where an object is located may be determined by the area of the target frame corresponding to the object.
It should be noted that the proportion distribution data characterize the number proportion of small target objects in the current image. By analyzing the proportion distribution data, it is determined whether to crop the current image, and the initial recognition model is then trained based on the cropped image data set, which improves the accuracy with which the target recognition model recognizes small target objects; moreover, the recognition process is simple, which improves the recognition efficiency for small target objects.
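The area-proportion statistics described above can be sketched in a few lines. This is an illustrative reconstruction, not code from the patent: the (x1, y1, x2, y2) box format, the 1% default threshold, and the function names are all assumptions.

```python
def box_area_ratio(box, image_size):
    """Fraction of the image area covered by one target frame.

    `box` is assumed to be (x1, y1, x2, y2); `image_size` is (width, height).
    """
    x1, y1, x2, y2 = box
    w, h = image_size
    return ((x2 - x1) * (y2 - y1)) / (w * h)

def proportion_distribution(boxes, image_size, threshold=0.01):
    """Count first-class (small, ratio <= threshold) and second-class
    (non-small, ratio > threshold) target frames for one image.

    The 1% threshold is a placeholder for the patent's unspecified target
    threshold.
    """
    ratios = [box_area_ratio(b, image_size) for b in boxes]
    small = sum(r <= threshold for r in ratios)
    return {"small": small, "non_small": len(ratios) - small}
```

For a 1000×1000 image, a 20×20 box yields a ratio of 0.0004 (first class), while a 700×500 box yields 0.35 (second class).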
Step S106, cropping the image data set according to the proportion distribution data to obtain a cropped image data set.
In step S106, after determining the proportion distribution data of each image, the terminal device may crop an image whose proportion distribution data satisfies a certain condition; for example, the terminal device crops images containing a larger number of small target objects, while not cropping images containing a smaller number of small target objects. On the one hand, cropping images that contain many small target objects and training the initial recognition model on the cropped image data set improves the accuracy with which the target recognition model recognizes small target objects; on the other hand, not cropping images that contain few small target objects avoids the low recognition accuracy that cropping large target objects would cause, further improving the accuracy with which the target recognition model recognizes target objects.
Optionally, in the process of cropping the image data set according to the proportion distribution data, the terminal device may crop the current image according to the target frames in the current image so that each target frame remains complete within one of the sub-images obtained from the current image; that is, the integrity of the target frame is maintained during cropping, which further ensures the integrity of the small target object. In addition, a sub-image may contain only one target frame, or it may contain a plurality of target frames.
Step S108, training an initial recognition model based on the cropped image data set to obtain a target recognition model, wherein the initial recognition model is at least used for multi-scale feature extraction of objects in the cropped image data set.
In step S108, the initial recognition model includes at least a P2 feature layer and a transmission layer; by adding these two layers, multi-scale feature extraction can be performed on objects in the cropped image data set, so that the recognition accuracy of the target recognition model can be improved from the multi-scale features.
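The patent names a P2 feature layer without detailing it. For context, the sketch below shows the standard FPN-style level-assignment rule used by many detectors, under which extending the pyramid down to a high-resolution P2 level gives very small boxes a level of their own instead of squeezing them into P3. The formula and constants follow the common FPN convention and are not taken from this disclosure.

```python
import math

def fpn_level(box_w, box_h, k0=4, lo=2, hi=5):
    """Standard FPN level assignment: k = k0 + floor(log2(sqrt(w*h)/224)),
    clamped to [lo, hi]. With lo=2 (i.e., a P2 layer present), a 32x32 box
    is routed to the highest-resolution level P2; a 224x224 box stays on P4.
    """
    k = k0 + math.floor(math.log2(math.sqrt(box_w * box_h) / 224))
    return max(lo, min(hi, k))
```

Under this rule, `fpn_level(32, 32)` returns 2 and `fpn_level(224, 224)` returns 4, illustrating why a P2 level benefits small targets.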
It should be noted that the trained target recognition model may be applied in fields such as automatic driving, smart medical treatment, defect detection, and aerial image analysis; for example, the target recognition model may be used to recognize small target objects in images of industrial equipment so as to detect whether the equipment is qualified.
Based on the schemes defined in steps S102 to S108, it can be seen that the present disclosure adopts a cropping approach to the image data set: after the image data set is acquired, anchor-free regression processing is performed on it, the target frames of the plurality of images it contains are determined, and the proportion distribution data corresponding to the image data set are determined based on those target frames; the image data set is then cropped according to the proportion distribution data to obtain a cropped image data set; finally, an initial recognition model is trained based on the cropped image data set to obtain a target recognition model, wherein the initial recognition model is at least used for multi-scale feature extraction of objects in the cropped image data set.
It is easy to note that in the present disclosure, processing the image data set with anchor-free regression improves the accuracy of locating the target frame of an image area and thus the accuracy with which the recognition model recognizes targets; moreover, anchor-free regression requires no manual labeling of areas in each image, which improves the recognition efficiency of the target recognition model. In addition, cropping the image data set according to the proportion distribution data of the target frames improves the recognition accuracy for small targets, and multi-scale feature extraction on objects in the image data set during training further improves the recognition accuracy of the target recognition model. Finally, the scheme trains the model on the cropped image data set, whose targets are small targets; small targets can thus be recognized without increasing the computing consumption of the computer system, achieving both recognition accuracy and computing efficiency.
Therefore, the scheme provided by the disclosure achieves the purpose of recognizing small targets, improves the detection accuracy and operation efficiency for small targets, and solves the problem in the related art that recognition accuracy and operation efficiency cannot both be achieved when recognizing small targets.
Example 2
According to an aspect of the present disclosure, a method for generating a recognition model is also provided. In this embodiment, each step mentioned in Embodiment 1 is explained in detail with reference to the flowchart of the method for generating a recognition model provided in fig. 1.
In an alternative embodiment, after the terminal device acquires the image data set, step S102 may be executed: anchor-free regression processing is performed on the image data set, the target frames of the plurality of images contained in the image data set are determined, and whether to crop each image is then determined based on its target frames.
Specifically, the terminal device predicts an initial anchor frame range corresponding to each image based on the positions, within the corresponding images, of the objects in the plurality of images of the image data set; it then performs statistics on the initial anchor frame ranges to obtain anchor frame size distribution information corresponding to the image data set, adjusts the initial anchor frame ranges based on the anchor frame size distribution information to obtain a target anchor frame range, and finally determines the target frame corresponding to each image based on the target anchor frame range.
Optionally, the terminal device first identifies the object in each image and its position in the image, predicts the object anchor frame corresponding to each object centered on that position, and then determines the initial anchor frame range corresponding to the current image from the object anchor frames, where the current image is any image in the image data set, and the initial anchor frame range indicates that most (e.g., 98%) of the objects in the current image can be covered by anchor frames within that range. After obtaining the initial anchor frame range of each image, the terminal device computes the anchor frame size distribution of the initial anchor frame ranges over the whole image data set, adjusts the initial anchor frame ranges according to the size distribution while adding negative coefficient weight values (for example, -2, -1, etc.), calculates the expected values corresponding to the target anchor frame range through the negative coefficient weight values, and determines the target frame of each image based on the expected values.
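The statistics step might look roughly like the sketch below, which covers only the coverage-range part of the description (a size range covering about 98% of object sizes, mirroring the "most (e.g., 98%)" figure in the text). The negative coefficient weights and expected-value computation are described too briefly here to reconstruct, so they are omitted; the function name and percentile handling are assumptions.

```python
def coverage_range(sizes, coverage=0.98):
    """Return (lo, hi) so that roughly `coverage` of the object sizes
    (e.g., sqrt of box areas) fall inside [lo, hi].

    A simple symmetric-percentile sketch: trim (1 - coverage)/2 of the
    sorted sizes from each end.
    """
    s = sorted(sizes)
    margin = (1.0 - coverage) / 2.0
    lo_i = int(len(s) * margin)
    hi_i = min(len(s) - 1, int(len(s) * (1.0 - margin)))
    return s[lo_i], s[hi_i]
```

For example, on the sizes 0..99 the returned range keeps about 98 of the 100 values, trimming roughly one outlier from each end.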
It should be noted that performing anchor-free regression processing on the image data set and determining the target frame of each image means the size of the target frame can be dynamically adjusted, so that small target objects can be detected accurately, effectively improving the recognition accuracy for small target objects.
In an alternative embodiment, as shown in fig. 1, after determining the target frame of each image, the terminal device performs steps S104 and S106: it determines the proportion distribution data based on the target frames of each image and crops the image data set according to the proportion distribution data to obtain a cropped image data set.
Specifically, according to the proportion distribution data, the terminal device counts, for each image, the number of first-class area proportions and the number of second-class area proportions; it determines first-class images from the image data set and crops them to obtain cropped first-class images; it determines second-class images from the image data set and does not crop them; and it generates the cropped image data set based on the cropped first-class images and the second-class images.
Here, a first-class area proportion is an area proportion less than or equal to the target threshold, and a second-class area proportion is an area proportion greater than the target threshold; a first-class image is an image in which the number of first-class area proportions is greater than or equal to the number of second-class area proportions, i.e., an image with many small target objects, while a second-class image is an image with few small target objects.
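The first-class/second-class split above can be sketched as a simple counting rule. The dict-based bookkeeping and function name below are illustrative assumptions; the comparison rule (small count at least equal to non-small count) follows the definition in the text.

```python
def split_dataset(images):
    """Partition image indices into first-class (to be cropped) and
    second-class (kept whole) sets.

    `images` is assumed to be a list of per-image count dicts like
    {"small": int, "non_small": int}, e.g., the output of a
    proportion-distribution pass over each image's target frames.
    """
    first, second = [], []
    for i, counts in enumerate(images):
        if counts["small"] >= counts["non_small"]:
            first.append(i)   # many small targets: crop this image
        else:
            second.append(i)  # few small targets: keep it whole
    return first, second
```

An image with 3 small and 1 non-small target frame is first-class; one with 0 small and 2 non-small frames is second-class.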
Optionally, fig. 2 shows a training schematic of an optional target recognition model. In fig. 2, the terminal device first detects the proportion distribution data of the sizes of the target frames in the image data set, determines the number of small target objects contained in each image according to the proportion distribution data, and then decides, according to that number, whether to crop the current image. When the current image contains many small target objects, it is cropped and the resulting sub-images are used to train the target recognition model; when the current image contains few small target objects, it is not cropped and can be used directly for training the target recognition model.
It should be noted that cropping the image data set according to the proportion distribution data of the target frames improves the recognition accuracy for small targets: model training is performed on the cropped image data set, whose targets are small targets, so small-target recognition is achieved without increasing the computing consumption of the computer system, and recognition accuracy and computing efficiency are both achieved.
Further, in the process of cropping the first-class images, the terminal device crops each first-class image into a plurality of sub-images based on the target anchor frame range, and generates the cropped first-class images based on the plurality of sub-images corresponding to each image.
Optionally, the terminal device may crop the current image according to the target anchor frame range so that each target frame is presented completely on one sub-image; that is, the integrity of the target frame is maintained during cropping, which in turn ensures the integrity of the small target object.
In practical application, it should be noted that whether several target frames are divided into one sub-image, or each sub-image contains one target frame, may also be determined according to the distribution information of the target frames. For example, when the target frame distribution is dense, a plurality of target frames may be divided into one sub-image; when the target frame distribution is sparse, each target frame may be partitioned into a separate sub-image.
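One possible way to keep a target frame whole while cropping is to center a fixed-size window on the frame and clamp it to the image bounds, as in this sketch. The 256×256 window size, box format, and function names are assumptions, and the sketch presumes each target frame is smaller than the window (which holds for small targets).

```python
def crop_window(box, image_size, window=(256, 256)):
    """Compute a crop window centered on `box` and clamped to the image,
    so the target frame stays entirely inside its sub-image."""
    x1, y1, x2, y2 = box
    iw, ih = image_size
    ww, wh = window
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    left = int(min(max(cx - ww / 2, 0), iw - ww))
    top = int(min(max(cy - wh / 2, 0), ih - wh))
    return left, top, left + ww, top + wh

def contains(window, box):
    """Check that a target frame lies fully inside a crop window."""
    l, t, r, b = window
    x1, y1, x2, y2 = box
    return l <= x1 and t <= y1 and x2 <= r and y2 <= b
```

For a dense cluster of target frames, the same idea extends to centering one window on the cluster's bounding box instead of on each frame separately.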
In addition, the image data set based on the image cutting processing is used for training the initial recognition model, so that the recognition precision of the target recognition model for recognizing the small target objects can be improved; according to the method and the device, the image containing a small number of small target objects is not subjected to image cutting processing, the problem that the accuracy of large target object identification is low due to image cutting processing of large target objects can be avoided, and the accuracy of target object identification by the target identification model can be further improved.
Further, as shown in fig. 1, after performing the image cutting processing on the image dataset, the terminal device trains the initial recognition model based on the image dataset after the image cutting processing, so as to obtain the target recognition model. After the target recognition model is obtained, the terminal device may also adjust the target recognition model in order to improve its recognition accuracy.
Specifically, after obtaining a first recognition result of the target recognition model recognizing the first type of image after image cutting processing and a second recognition result of the target recognition model recognizing the second type of image, the terminal device determines, based on the first recognition result, the recognition accuracy of the target recognition model on the first type of image, obtaining a first recognition accuracy; determines, based on the second recognition result, the recognition accuracy of the target recognition model on the second type of image, obtaining a second recognition accuracy; and then adjusts parameters of the target recognition model based on the first recognition accuracy and/or the second recognition accuracy.
Optionally, as shown in fig. 2, the terminal device performs recognition in different manners for different types of images, and performs parameter adjustment on the target recognition model in combination with recognition results corresponding to the different types of images, so that the target recognition model can accurately recognize the different types of images.
Further, as shown in fig. 2, for an image containing a large number of small target objects, the terminal device needs to perform jigsaw processing when checking the accuracy of the target recognition model. Specifically, the terminal device performs jigsaw processing on the first recognition results corresponding to the plurality of sub-images to obtain a spliced image, and adjusts parameters of the target recognition model based on the similarity between the spliced image and the original image, where the original image is the image in the first type of images from which the spliced sub-images were cut.
Optionally, as shown in fig. 2, for an image containing a large number of small target objects, the terminal device splices the sub-images recognized by the target recognition model to obtain a spliced image, and compares the similarity between the spliced image and the original image to determine the recognition accuracy of the target recognition model for small target objects. If the similarity between the spliced image and the original image is greater than a preset similarity, the recognition accuracy of the target recognition model for small target objects is high, and no parameter adjustment of the target recognition model is needed; otherwise, it indicates that the recognition accuracy for small target objects is low, and parameters of the target recognition model need to be adjusted.
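As a hedged sketch of this check, assuming the per-sub-image recognition results are arrays that can be concatenated back on a rows-by-columns grid, and using a simple pixel-agreement ratio as the similarity measure (the disclosure does not specify a particular measure):

```python
import numpy as np

def needs_parameter_adjustment(sub_results, grid_shape, original,
                               preset_similarity=0.9):
    """Splice per-sub-image results back into one image on a
    rows x cols grid, compare with the original, and return True when
    the similarity does not exceed the preset value (model should be
    adjusted)."""
    rows, cols = grid_shape
    # stitch each row of sub-results horizontally, then stack rows
    spliced = np.concatenate(
        [np.concatenate(sub_results[r * cols:(r + 1) * cols], axis=1)
         for r in range(rows)],
        axis=0)
    similarity = float((spliced == original).mean())
    return similarity <= preset_similarity
```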
In addition, for an image containing a small number of small target objects, the terminal device directly determines whether to perform parameter adjustment on the target recognition model according to the recognition accuracy of the target recognition model on the image.
In addition, the terminal device may also adjust parameters of the target recognition model by combining recognition accuracy of the two types of images, for example, the terminal device may set weight values for the two types of images, perform weighted summation on the recognition accuracy of the two types of images, and determine whether to adjust the parameters of the target recognition model according to the weighted summation result.
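A minimal sketch of this weighted combination follows; the weight values and the decision threshold are illustrative assumptions, not values given in the disclosure:

```python
def should_adjust(first_accuracy, second_accuracy,
                  first_weight=0.6, second_weight=0.4, threshold=0.85):
    """Weighted summation of the two per-type recognition accuracies;
    the target recognition model's parameters are adjusted when the
    combined score falls below the threshold."""
    combined = first_weight * first_accuracy + second_weight * second_accuracy
    return combined < threshold
```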
It should be noted that, by adjusting parameters of the target recognition model, the recognition accuracy of the target recognition model is improved.
As can be seen from the above, in the scheme provided by the present disclosure, by performing image cutting processing on the image dataset containing a large number of small target objects, training the target recognition model with the cut sub-images, and adjusting parameters of the target recognition model in a jigsaw manner, the recognition accuracy of the target recognition model for small target objects can be improved, and recognition efficiency is improved as well. In addition, the scheme can extract multi-scale features, greatly improving the detection accuracy for small target objects. Further, the scheme optimizes the regression detection frame, provides a way of determining the regression range, and introduces a center prior frame (predicting the initial anchor frame range), greatly improving the matching accuracy for small target objects.
In addition, in Matlab tests on the VisDrone standard dataset, the scheme provided by the present disclosure achieves 38.29 mAP: compared with PP-YOLOE, detection accuracy improves by 1.16; compared with TPH-YOLOv5, it improves by 2.09. On the COCO dataset, compared with PP-YOLOE_plus, mAP, AP50 and AP-small improve by 0.1, 0.3 and 1.9, respectively.
Example 3
According to an aspect of the present disclosure, an application scenario of the generation method of the recognition model is also provided. Specifically, after training the initial recognition model based on the image dataset after image cutting processing to obtain the target recognition model, the terminal device performs feature extraction on an image to be recognized through a feature layer in the target recognition model to obtain first data features; performs multi-scale information extraction on the first data features through a transmission layer in the target recognition model to obtain second data features; and then recognizes the object to be recognized in the image to be recognized based on the second data features to obtain a target recognition result.
Optionally, fig. 3 shows the feature layer and the transmission layer in the target recognition model. As can be seen from fig. 3, a P2 feature layer is added among the feature layers; this layer retains more features of small target objects, ensuring the recognition accuracy of the target recognition model for objects to be recognized. In addition, for the output of the last layer of the backbone network, a transmission layer is introduced to further process the features extracted by the P2 feature layer and extract more multi-scale information, improving the overall recognition effect of the target recognition model.
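The benefit of the extra P2 level can be illustrated with the pyramid strides commonly used in detection backbones (assumed values here; the disclosure does not list them): a small target keeps a usable feature-map footprint at stride 4 that all but vanishes at stride 32.

```python
# typical pyramid strides in detection backbones (assumed values)
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}

def footprint(object_pixels, level):
    """Cells per side that an object of the given pixel size occupies
    on a pyramid level's feature map (at least one cell)."""
    return max(1, object_pixels // STRIDES[level])
```

For a 32-pixel object this gives an 8x8 footprint on P2 but only 1x1 on P5, which is why adding P2 retains more small-target features.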
Alternatively, the object to be identified in the image to be identified may be a small target object, for example, an object whose pixel size is smaller than 32 pixels × 32 pixels. Taking fig. 3 as an example, after obtaining the image to be identified, the terminal device inputs it into the feature layer in the target identification model so as to extract more features of the small target object; the transmission layer then performs multi-scale information extraction on the extracted features to obtain the multi-scale information (namely, the second data features). Finally, after feature pyramid processing, prediction is performed on the output of each pyramid level to obtain the identification result.
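The small-target criterion mentioned here (pixel size under 32 × 32) can be written as a simple check; treating it as an area test follows the common COCO-style convention, which is an assumption on our part:

```python
def is_small_target(box, max_side=32):
    """True when the box (x1, y1, x2, y2) covers fewer than
    max_side * max_side pixels, i.e. it counts as a small target."""
    width, height = box[2] - box[0], box[3] - box[1]
    return width * height < max_side * max_side
```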
Example 4
According to an aspect of the present disclosure, there is also provided a generating apparatus of an identification model, wherein fig. 4 is a schematic diagram of the generating apparatus of the identification model. As shown in fig. 4, the apparatus includes: an acquisition module 401, a distribution determination module 403, an image cutting module 405, and a model training module 407.
The acquisition module 401 is configured to acquire an image dataset, perform anchor-free regression processing on the image dataset, and determine target frames of a plurality of images contained in the image dataset, where a target frame identifies the area of an object in the corresponding image; the distribution determining module 403 is configured to determine, based on the target frames of the plurality of images, the area-ratio distribution data corresponding to the image dataset, where the area-ratio distribution data characterizes the distribution of the area ratio that the target frames occupy in their corresponding images; the image cutting module 405 is configured to perform image cutting processing on the image dataset according to the area-ratio distribution data, so as to obtain an image dataset after image cutting processing; and the model training module 407 is configured to train an initial recognition model based on the image dataset after image cutting processing to obtain a target recognition model, where the initial recognition model is at least used for multi-scale feature extraction of objects in the image dataset after image cutting processing.
It should be noted that the above-mentioned acquisition module 401, distribution determining module 403, image cutting module 405 and model training module 407 correspond to steps S102 to S108 of the above embodiment; the four modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of the above embodiment.
Optionally, the acquiring module includes: the system comprises a prediction module, a statistical prediction module, an anchor frame adjustment module and a first determination module. The prediction module is used for predicting an initial anchor frame range corresponding to each image based on the positions of the objects in the images in the image data set in the corresponding images; the statistical prediction module is used for counting the initial anchor frame range to obtain anchor frame size distribution information corresponding to the image data set; the anchor frame adjusting module is used for adjusting the initial anchor frame range based on the anchor frame size distribution information to obtain a target anchor frame range; and the first determining module is used for determining the target frame corresponding to each image based on the target anchor frame range.
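The statistics-then-adjust step performed by these modules might look like the following sketch, where the predicted initial anchor sizes are reduced to a target range by percentile clipping. The percentile rule is our assumption; the disclosure only states that the range is adjusted from the anchor frame size distribution.

```python
import numpy as np

def target_anchor_range(initial_anchor_sizes, low_pct=5, high_pct=95):
    """Gather size statistics over all predicted initial anchor sizes
    and return a clipped (min, max) target anchor frame range."""
    sizes = np.asarray(initial_anchor_sizes, dtype=float)
    return (float(np.percentile(sizes, low_pct)),
            float(np.percentile(sizes, high_pct)))
```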
Optionally, the image cutting module includes: a first statistics module, a first processing module and a second processing module. The first statistics module is used for counting, for each image according to the ratio distribution data, the number corresponding to the first type of area ratio and the number corresponding to the second type of area ratio, where the first type of area ratio characterizes an area ratio smaller than or equal to a target threshold, and the second type of area ratio characterizes an area ratio greater than the target threshold; the first processing module is used for determining the first type of images from the image dataset and performing image cutting processing on them to obtain the first type of images after image cutting processing, where a first-type image is one in which the number corresponding to the first type of area ratio is greater than or equal to the number corresponding to the second type of area ratio; and the second processing module is used for determining the second type of images from the image dataset, performing no image cutting processing on them, and generating the image dataset after image cutting processing based on the first type of images after image cutting processing and the second type of images.
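The counting-and-splitting rule implemented by these modules can be sketched as follows, with an illustrative 1% area-ratio threshold (the disclosure leaves the target threshold open):

```python
def split_by_area_ratio(images, threshold=0.01):
    """Each entry of `images` is (image_id, image_area, boxes). An
    image goes to the first type (to be cut) when its count of small
    area ratios (<= threshold) is at least its count of large ones."""
    first_type, second_type = [], []
    for image_id, image_area, boxes in images:
        ratios = [(b[2] - b[0]) * (b[3] - b[1]) / image_area for b in boxes]
        n_small = sum(r <= threshold for r in ratios)
        n_large = len(ratios) - n_small
        (first_type if n_small >= n_large else second_type).append(image_id)
    return first_type, second_type
```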
Optionally, the first processing module includes: the system comprises a first graph cutting module and a generating module. The first image cutting module is used for cutting each image in the first type of images into a plurality of sub-images based on the target anchor frame range; and the generation module is used for generating a first type image after the image cutting processing based on the plurality of sub-images corresponding to each image.
Optionally, the generating device of the identification model further includes: a result acquisition module, a first recognition module, a second recognition module and a first adjustment module. The result acquisition module is used for acquiring a first recognition result of the target recognition model recognizing the first type of image after image cutting processing and a second recognition result of the target recognition model recognizing the second type of image; the first recognition module is used for determining, based on the first recognition result, the recognition accuracy of the target recognition model for the first type of image, obtaining a first recognition accuracy; the second recognition module is used for determining, based on the second recognition result, the recognition accuracy of the target recognition model for the second type of image, obtaining a second recognition accuracy; and the first adjustment module is used for adjusting parameters of the target recognition model based on the first recognition accuracy and/or the second recognition accuracy.
Optionally, the first adjustment module includes: a jigsaw module and a second adjusting module. The jigsaw module is used for performing jigsaw processing on the first recognition results corresponding to the plurality of sub-images to obtain a spliced image; and the second adjusting module is used for adjusting parameters of the target recognition model based on the similarity between the spliced image and the original image, where the original image is the image in the first type of images from which the spliced sub-images were cut.
Optionally, the generating device of the identification model further includes: a feature extraction module, an information extraction module and a third recognition module. The feature extraction module is used for, after the initial recognition model is trained based on the image dataset after image cutting processing to obtain the target recognition model, extracting features of an image to be recognized through a feature layer in the target recognition model to obtain first data features; the information extraction module is used for performing multi-scale information extraction on the first data features through a transmission layer in the target recognition model to obtain second data features; and the third recognition module is used for recognizing the object to be recognized in the image to be recognized based on the second data features to obtain a target recognition result.
Example 5
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 includes a computing unit 501 that can perform various suitable actions and processes according to a computer program stored in a Read-Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502 and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, for example, the generation method of the recognition model. For example, in some embodiments, the generation method of the recognition model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the generation method of the recognition model described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the generation method of the recognition model in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (11)

1. A method of generating an identification model, comprising:
acquiring an image data set, carrying out anchor-free frame regression processing on the image data set, and determining target frames of a plurality of images contained in the image data set, wherein the target frames are used for identifying areas of objects in corresponding images;
determining corresponding proportion distribution data of the image data set based on target frames of the plurality of images, wherein the proportion distribution data represent distribution information of area proportion of the target frames in the corresponding images in the image data set;
performing image cutting processing on the image data set according to the proportion distribution data to obtain an image data set subjected to image cutting processing;
training an initial recognition model based on the image dataset after image cutting processing to obtain a target recognition model, wherein the initial recognition model is at least used for carrying out multi-scale feature extraction on objects in the image dataset after image cutting processing.
2. The method of claim 1, wherein performing anchor-free regression processing on the image dataset to determine target frames for a plurality of images contained in the image dataset comprises:
predicting an initial anchor frame range corresponding to each image based on the positions of objects in the images in the image data set in the corresponding images;
counting the initial anchor frame range to obtain anchor frame size distribution information corresponding to the image dataset;
adjusting the initial anchor frame range based on the anchor frame size distribution information to obtain a target anchor frame range;
and determining the target frame corresponding to each image based on the target anchor frame range.
3. The method of claim 2, wherein performing image cutting processing on the image data set according to the proportion distribution data to obtain an image data set subjected to image cutting processing comprises:
counting the corresponding number of the first type of area ratio and the corresponding number of the second type of area ratio of each image according to the proportion distribution data, wherein the first type of area ratio represents an area ratio smaller than or equal to a target threshold value, and the second type of area ratio represents an area ratio greater than the target threshold value;
determining a first type of image from the image data set, and performing image cutting processing on the first type of image to obtain a first type of image subjected to image cutting processing, wherein the first type of image is an image with the number corresponding to the first type of area ratio being greater than or equal to the number corresponding to the second type of area ratio;
and determining a second type of image from the image data set, performing no image cutting processing on the second type of image, and generating the image data set after image cutting processing based on the first type of image after image cutting processing and the second type of image.
4. A method according to claim 3, wherein performing image cutting processing on the first type of image to obtain the first type of image after image cutting processing comprises:
dividing each image in the first type of images into a plurality of sub-images based on the target anchor frame range;
and generating the first type of image after the image cutting processing based on the plurality of sub-images corresponding to each image.
5. The method of claim 4, wherein after training an initial recognition model based on the image data set after image cutting processing to obtain a target recognition model, the method further comprises:
acquiring a first recognition result of the target recognition model for recognizing the first type of image after the image cutting processing and a second recognition result of the target recognition model for recognizing the second type of image;
determining the recognition accuracy of the target recognition model for recognizing the first type of image based on the first recognition result to obtain first recognition accuracy;
determining the recognition accuracy of the target recognition model for recognizing the second type of image based on the second recognition result to obtain second recognition accuracy;
parameters of the target recognition model are adjusted based on the first recognition accuracy and/or the second recognition accuracy.
6. The method of claim 5, wherein adjusting parameters of the object recognition model based on the first recognition accuracy comprises:
performing jigsaw processing on the first recognition results corresponding to the plurality of sub-images to obtain spliced images;
and adjusting parameters of the target recognition model based on the similarity between the spliced image and an original image, wherein the original image is an image before the image is cut corresponding to the spliced image in the first type of image.
7. The method of claim 1, wherein after training an initial recognition model based on the image data set after image cutting processing to obtain a target recognition model, the method further comprises:
extracting features of the image to be identified through a feature layer in the target identification model to obtain first data features;
carrying out multi-scale information extraction on the first data features through a transmission layer in the target recognition model to obtain second data features;
and identifying the object to be identified in the image to be identified based on the second data characteristic to obtain a target identification result.
8. A generation apparatus of an identification model, comprising:
the acquisition module is used for acquiring an image data set, carrying out anchor-free frame regression processing on the image data set, and determining target frames of a plurality of images contained in the image data set, wherein the target frames are used for identifying the areas of objects in the corresponding images;
the distribution determining module is used for determining the corresponding proportion distribution data of the image data set based on the target frames of the plurality of images, wherein the proportion distribution data represent the distribution information of the area proportion of the target frames in the corresponding images in the image data set;
the image cutting module is used for performing image cutting processing on the image data set according to the proportion distribution data to obtain an image data set subjected to image cutting processing;
the model training module is used for training an initial recognition model based on the image data set after the image cutting processing to obtain a target recognition model, wherein the initial recognition model is at least used for carrying out multi-scale feature extraction on objects in the image data set after the image cutting processing.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating the identification model of any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the generation method of the recognition model according to any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when executed by a processor, implements a method of generating an identification model according to any one of claims 1 to 7.
CN202310031651.6A 2023-01-10 2023-01-10 Identification model generation method and device, electronic equipment and storage medium Pending CN116091785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310031651.6A CN116091785A (en) 2023-01-10 2023-01-10 Identification model generation method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116091785A true CN116091785A (en) 2023-05-09

Family

ID=86211608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310031651.6A Pending CN116091785A (en) 2023-01-10 2023-01-10 Identification model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116091785A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination