CN116994085A - Image sample screening method, model training method, device and computer equipment

Info

Publication number
CN116994085A
Authority
CN
China
Prior art keywords
image sample
image
target
sample set
model
Prior art date
Legal status
Pending
Application number
CN202310771636.5A
Other languages
Chinese (zh)
Inventor
杨艳鑫
郑影
李志涛
王湾湾
杨恒
王杨俊杰
Current Assignee
Zhongdian Jinxin Software Co Ltd
Original Assignee
Zhongdian Jinxin Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhongdian Jinxin Software Co Ltd
Priority to CN202310771636.5A
Publication of CN116994085A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771 Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image sample screening method, a model training method, an apparatus, a computer device, a storage medium and a computer program product. The method comprises the following steps: acquiring a first image sample set; inputting the first image sample set into an image processing model, and performing feature extraction on each first image sample in the first image sample set through the image processing model to obtain the scale features corresponding to each first image sample; performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample; and screening the first image sample set based on the target descriptors to obtain a second image sample set, where the second image sample set is used, after labeling, to generate model training samples. By adopting the method, the efficiency of acquiring the target image sample set can be improved.

Description

Image sample screening method, model training method, device and computer equipment
Technical Field
The present application relates to the field of computer vision, and in particular, to an image sample screening method, a model training method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
With the development of deep learning, large numbers of labeled image samples have markedly improved both the accuracy and the speed with which image recognition models recognize images under test.
In the prior art, a large number of unlabeled image samples are collected through a camera; these unlabeled samples are first screened with an active learning method to obtain screened image samples, which are then manually labeled to produce the labeled image samples used by the image recognition model.
However, applying the active learning method directly to the large number of unlabeled image samples collected by the camera makes obtaining image training samples inefficient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image sample screening method, a model training method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product.
In a first aspect, the present application provides a method for screening an image sample. The method comprises the following steps:
Acquiring a first image sample set;
inputting the first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through the image processing model to obtain scale features corresponding to each first image sample;
performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
screening the first image sample set based on the target descriptor to obtain a second image sample set; the second set of image samples is used to generate model training samples.
In one embodiment, the inputting the first image sample set into an image processing model, and performing feature extraction on each first image sample in the first image sample set through the image processing model to obtain a scale feature corresponding to each first image sample, includes:
inputting the first image sample set into an image processing model, and extracting the characteristics of each first image sample in the first image sample set through a backbone network and a characteristic fusion layer of the image processing model to obtain a plurality of scale characteristics corresponding to each first image sample.
In one embodiment, the feature stitching is performed on the scale feature corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample, including:
performing dimension reduction processing on a plurality of scale features corresponding to each first image sample to obtain a plurality of first descriptors corresponding to each first image sample;
and splicing the plurality of first descriptors corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample.
In one embodiment, the backbone network and feature fusion layer of the image processing model are connected to a global pooling layer; the performing dimension reduction processing on the plurality of scale features corresponding to each first image sample to obtain a plurality of first descriptors corresponding to each first image sample includes:
and carrying out global pooling on the plurality of scale features according to a global pooling layer of the image processing model aiming at the plurality of scale features corresponding to each first image sample to obtain first descriptors corresponding to the plurality of scale features corresponding to each first image sample.
In one embodiment, the screening the first image sample set based on the target descriptor to obtain a second image sample set includes:
Screening the first image sample set according to a preset similarity calculation algorithm and the target descriptor to obtain a second image sample set;
the method further comprises the steps of:
screening the second image sample set based on an active learning algorithm to obtain a target image sample set;
and labeling the target image sample set to obtain a model training sample.
In one embodiment, the filtering the first image sample set based on the target descriptor according to a preset similarity calculation algorithm to obtain a second image sample set includes:
traversing the first image sample set, and taking any one of the first image samples as a reference image sample;
calculating the similarity between the target descriptor of the reference image sample and the target descriptors of other image samples in the first image sample set based on a similarity calculation algorithm;
and eliminating the other image samples whose similarity is greater than a similarity threshold, together with their target descriptors, to obtain a second image sample set.
In a second aspect, the application further provides a model training method. The method comprises the following steps:
Obtaining a model training sample; the model training sample is obtained based on the target image sample set; the target image sample set is obtained by the method of the first aspect;
and inputting the model training sample into a target model, and carrying out model training on the target model to obtain a trained target model.
In a third aspect, the application further provides an image sample screening device. The device comprises:
the first acquisition module is used for acquiring a first image sample set;
the feature extraction module is used for inputting the first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through the image processing model to obtain scale features corresponding to each first image sample;
the feature stitching module is used for performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
the first screening module is used for screening the first image sample set based on the target descriptor to obtain a second image sample set; the second set of image samples is used to generate model training samples.
In one embodiment, the feature extraction module is specifically configured to:
inputting the first image sample set into an image processing model, and extracting the characteristics of each first image sample in the first image sample set through a backbone network and a characteristic fusion layer of the image processing model to obtain a plurality of scale characteristics corresponding to each first image sample.
In one embodiment, the feature stitching module is specifically configured to:
performing dimension reduction processing on a plurality of scale features corresponding to each first image sample to obtain a plurality of first descriptors corresponding to each first image sample;
and splicing the plurality of first descriptors corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample.
In one embodiment, the feature stitching module is specifically configured to:
and carrying out global pooling on the plurality of scale features according to a global pooling layer of the image processing model aiming at the plurality of scale features corresponding to each first image sample to obtain first descriptors corresponding to the plurality of scale features corresponding to each first image sample.
In one embodiment, the first screening module is specifically configured to:
screening the first image sample set according to a preset similarity calculation algorithm and the target descriptor to obtain a second image sample set;
the apparatus further comprises:
the second screening module is used for screening the second image sample set based on an active learning algorithm to obtain a target image sample set;
and labeling the target image sample set to obtain a model training sample.
In one embodiment, the first screening module is specifically configured to:
traversing the first image sample set, and taking any one of the first image samples as a reference image sample;
calculating the similarity between the target descriptor of the reference image sample and the target descriptors of other image samples in the first image sample set based on a similarity calculation algorithm;
and eliminating the other image samples whose similarity is greater than a similarity threshold, together with their target descriptors, to obtain a second image sample set.
In a fourth aspect, the application further provides a model training device. The device comprises:
The second acquisition module is used for acquiring a model training sample; the model training sample is obtained based on the target image sample set; the target image sample set is obtained by the method of the first aspect;
and the training module is used for inputting the model training sample into a target model, and performing model training on the target model to obtain a trained target model.
In a fifth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a first image sample set;
inputting the first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through the image processing model to obtain scale features corresponding to each first image sample;
performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
screening the first image sample set based on the target descriptor to obtain a second image sample set; the second set of image samples is used to generate model training samples.
In a sixth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
obtaining a model training sample; the model training sample is obtained based on the target image sample set; the target image sample set is obtained by the method of the first aspect;
and inputting the model training sample into a target model, and carrying out model training on the target model to obtain a trained target model.
In a seventh aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a first image sample set;
inputting the first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through the image processing model to obtain scale features corresponding to each first image sample;
performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
Screening the first image sample set based on the target descriptor to obtain a second image sample set; the second set of image samples is used to generate model training samples.
In an eighth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
obtaining a model training sample; the model training sample is obtained based on the target image sample set; the target image sample set is obtained by the method of the first aspect;
and inputting the model training sample into a target model, and carrying out model training on the target model to obtain a trained target model.
In a ninth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring a first image sample set;
inputting the first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through the image processing model to obtain scale features corresponding to each first image sample;
Performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
screening the first image sample set based on the target descriptor to obtain a second image sample set; the second set of image samples is used to generate model training samples.
In a tenth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
obtaining a model training sample; the model training sample is obtained based on the target image sample set; the target image sample set is obtained by the method of the first aspect;
and inputting the model training sample into a target model, and carrying out model training on the target model to obtain a trained target model.
According to the image sample screening method, the model training method, the apparatus, the computer device, the storage medium and the computer program product, feature extraction is performed on the first image sample set to obtain the scale features corresponding to each first image sample, and these scale features are fused to obtain a discriminative target descriptor for each first image sample in the first image sample set. The first image sample set is then screened based on the discriminative target descriptors and the active learning algorithm to obtain the second image sample set, which improves the efficiency of obtaining the target image sample set.
Drawings
FIG. 1 is a diagram of an application environment of an image sample screening method in one embodiment;
FIG. 2 is a flow diagram of a method for obtaining a target descriptor in one embodiment;
FIG. 3 is a flow diagram of a max-pooling step in one embodiment;
FIG. 4 is a flow chart of a method for determining a target image sample in one embodiment;
FIG. 5 is a flowchart of a method for screening a first image sample based on a preset similarity calculation algorithm and a target descriptor in an embodiment;
FIG. 6 is a flow diagram of a model training method in one embodiment;
FIG. 7 is a flowchart of an example of an image sample screening method in one embodiment;
FIG. 8 is a block diagram showing the construction of an image sample screening apparatus according to an embodiment;
FIG. 9 is a block diagram of a model training device in one embodiment;
fig. 10 is an internal structural view of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, an image sample screening method is provided. The method is described here as applied to a terminal by way of illustration; it is understood that the method may also be applied to a server, or to a system including the terminal and the server and implemented through interaction between the terminal and the server.
In this embodiment, the method includes the steps of:
step 102, a first set of image samples is acquired.
Wherein the first image sample set comprises a plurality of first image samples.
in the embodiment of the application, the terminal acquires images through the camera, acquires a plentiful and large number of non-marked images, and constructs a first image sample set according to the plentiful and non-marked images acquired by the camera. Optionally, the terminal may periodically perform frame extraction on the video data in the video data collected by the camera, for example, frame extraction is performed every second in the video data, so as to obtain a non-labeled image and construct a first image sample set.
Optionally, the terminal may perform data cleaning or data enhancement processing on the unlabeled images, so as to improve the quality of the first image sample set.
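Purely as an illustration of the frame-extraction step above, the following sketch uses OpenCV to sample one frame per second from camera video; the function name and the fallback FPS value are assumptions, not part of the application.

```python
import cv2

def build_first_sample_set(video_path: str, interval_s: float = 1.0) -> list:
    """Extract roughly one frame per interval_s seconds as unlabeled samples."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # assume 25 fps if unreported
    step = max(1, int(round(fps * interval_s)))
    samples, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:        # keep one frame per interval
            samples.append(frame)  # unlabeled first image sample
        idx += 1
    cap.release()
    return samples
```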
Step 104, inputting the first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through the image processing model to obtain scale features corresponding to each first image sample.
In the embodiment of the application, the image processing model may perform image processing based on the model structure of a YOLO model (You Only Look Once, an object detection model), or based on other object detection models, for example an R-CNN model (Region-based Convolutional Neural Network) or an SSD model (Single Shot MultiBox Detector).
In the embodiment of the application, an image processing model is taken as a YOLO model for illustration, a terminal inputs first image samples into the image processing model, and feature extraction is carried out on each first image sample according to the image processing model to obtain a plurality of scale features corresponding to each first image sample.
Optionally, the terminal may process a plurality of first image samples simultaneously, or process one first image sample at a time. Taking single-sample processing as an example, the terminal inputs the first image sample set into the YOLO model and performs feature extraction on one first image sample at a time until feature extraction has been completed for every first image sample in the first image sample set, obtaining the plurality of scale features corresponding to each first image sample.
Optionally, before screening the first image sample set, the terminal may preliminarily train the target detection model or image classification model used to construct the image processing model on a small number of labeled training samples corresponding to the image processing task type, to obtain an image processing model that has completed preliminary training.
And 106, performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample.
In the embodiment of the application, the terminal first performs dimension reduction processing on the scale features corresponding to each first image sample, and then performs feature stitching on the dimension-reduced scale features to obtain the target descriptor corresponding to each first image sample. Scale features of different dimensions contain semantic information and position information at different levels, and stitching them yields a target descriptor that can describe, at a finer granularity, the semantic information and position information contained in the corresponding first image sample.
And step 108, screening the first image sample set based on the target descriptor to obtain a second image sample set.
The second image sample set is used for generating a model training sample after labeling.
In the embodiment of the application, the terminal first calculates the similarity between first image samples based on their target descriptors, then screens the first image samples in the first image sample set based on the pairwise similarities, determines the second image samples, and forms the second image sample set from them.
In the image sample screening method, feature extraction is performed on the first image sample set to obtain the scale features corresponding to each first image sample, and these scale features are fused to obtain a discriminative target descriptor for each first image sample in the first image sample set. The first image sample set is then screened based on the discriminative target descriptors and the active learning algorithm to obtain the second image sample set, which improves the efficiency of obtaining the target image sample set.
In one embodiment, in step 104, the first image sample set is input into an image processing model, and feature extraction is performed on each first image sample in the first image sample set by using the image processing model to obtain a scale feature corresponding to each first image sample, including:
Inputting the first image sample set into an image processing model, and extracting the characteristics of each first image sample in the first image sample set through a backbone network and a characteristic fusion layer of the image processing model to obtain a plurality of scale characteristics corresponding to each first image sample.
In the embodiment of the application, the image processing model can be built on different computer vision model structures, whose feature fusion structures produce outputs at different scales. Taking the YOLO model as an example, the model includes a backbone and a neck. The terminal performs feature extraction on each first image sample through the backbone; the backbone comprises a multi-layer convolutional neural network, so the terminal initially obtains multi-level feature expressions of the first image sample. The terminal then performs feature fusion on these multi-level feature expressions through the neck to obtain a plurality of scale features for each first image sample. The embodiment of the application is described with three scale features per first image sample, whose dimensions are (N, C1, H1, W1), (N, C2, H2, W2) and (N, C3, H3, W3), where N is the batch size of the YOLO model (a parameter representing the number of first image samples processed by the YOLO model at the same time); C1, C2 and C3 are the channel dimensions of the convolution layers corresponding to the three scale features; H1, H2 and H3 are the heights of the first image sample's feature maps after passing through the corresponding convolution layers; and W1, W2 and W3 are the corresponding widths.
In this embodiment, feature extraction is performed on the first image samples by the image processing model, so that the scale features of each first image sample can be obtained. These scale features reflect the feature differences between first image samples, so the terminal can determine the target descriptor from scale features with feature discrimination, and then screen the first image sample set based on the discriminative target descriptors and the active learning algorithm, improving the efficiency of obtaining the target image sample set.
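As a minimal sketch of the backbone-plus-neck extraction described above, assuming a PyTorch YOLO-style model that exposes `backbone` and `neck` modules returning three fused feature maps (real YOLO implementations structure this differently):

```python
import torch

@torch.no_grad()
def extract_scale_features(model, batch: torch.Tensor):
    """batch: (N, 3, H, W). Returns the three scale features per sample."""
    feats = model.backbone(batch)   # multi-level feature expressions
    p3, p4, p5 = model.neck(feats)  # fused maps: (N, C1, H1, W1), (N, C2, H2, W2), (N, C3, H3, W3)
    return p3, p4, p5
```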
In one embodiment, as shown in fig. 2, in step 106, feature stitching is performed on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample, including:
step 202, performing dimension reduction processing on a plurality of scale features corresponding to each first image sample to obtain a plurality of first descriptors corresponding to each first image sample.
In the embodiment of the application, the terminal performs dimension reduction processing on a plurality of scale features corresponding to each first image sample according to the image processing model, specifically, the terminal averages or takes the maximum value of each scale feature to obtain a plurality of one-dimensional first descriptors corresponding to the first image sample.
And 204, splicing the first descriptors corresponding to each first image sample to obtain the target descriptor corresponding to each first image sample.
In the embodiment of the present application, the terminal splices the first descriptors corresponding to each first image sample according to a specific splicing method; for example, the terminal concatenates the plurality of first descriptors corresponding to each first image sample to obtain the target descriptor corresponding to that sample. Taking three first descriptors per first image sample as an example, the dimension of the target descriptor is (N, C1 + C2 + C3), where N is the number of first image samples processed simultaneously by the image processing model, and C1, C2 and C3 are the channel dimensions of the convolution layers corresponding to the three first descriptors.
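A sketch of this pooling-then-concatenation step under the dimensions above, using PyTorch; the helper name and the switch between max and average pooling are illustrative:

```python
import torch
import torch.nn.functional as F

def target_descriptor(scale_feats, mode: str = "max") -> torch.Tensor:
    """Pool each (N, Ci, Hi, Wi) map to (N, Ci), then concatenate to (N, C1+C2+C3)."""
    pool = F.adaptive_max_pool2d if mode == "max" else F.adaptive_avg_pool2d
    descs = [pool(f, 1).flatten(1) for f in scale_feats]  # one-dimensional first descriptors
    return torch.cat(descs, dim=1)                        # target descriptor
```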
In this embodiment, the different first descriptors corresponding to each first image sample carry features at different scales, and features at different scales reflect semantic information and position information at different levels. Stitching the plurality of first descriptors corresponding to a first image sample yields the target descriptor corresponding to that sample, which can describe, at a finer granularity, the feature information contained in the currently input first image sample.
In one embodiment, the dimension reduction processing is performed on the plurality of scale features corresponding to each first image sample in step 202, to obtain a plurality of first descriptors corresponding to each first image sample, including:
and carrying out global pooling on the plurality of scale features according to a global pooling layer of the image processing model aiming at the plurality of scale features corresponding to each first image sample to obtain first descriptors corresponding to the plurality of scale features corresponding to each first image sample.
Wherein the backbone network and feature fusion layer of the image processing model are connected to a global pooling layer.
In the embodiment of the present application, the terminal pools the plurality of scale features corresponding to each first image sample through the global pooling layer of the image processing model to obtain the first descriptors corresponding to those scale features; for example, a first image sample may correspond to three first descriptors, with dimensions (N, C1), (N, C2) and (N, C3) respectively.
Optionally, the global pooling layer may be either an average pooling layer or a maximum pooling layer, selected according to the image processing task type. If the task type is image classification, the global pooling layer may be an average pooling layer: the average pooling operation averages the global area of each scale feature as the value at the corresponding position of the output feature map, reducing the spatial dimensions of the scale features and thereby the number of parameters. For example, a local region within the global area of an input scale feature may correspond to background information of the image, and insignificant low-frequency features within that region can be suppressed by the averaging operation.
If the task type is image detection, the global pooling layer may be a maximum pooling layer. As shown in fig. 3, the maximum pooling operation takes the maximum value over the global area of each scale feature as the first descriptor corresponding to that scale feature; selecting the maximum preserves the most significant feature of the global area and suppresses unimportant ones, improving the robustness and representativeness of each scale feature. For example, a local region within the global area of an input scale feature may correspond to a feature of an object, and the most representative feature within that region is preserved by the max pooling operation.
In this embodiment, pooling the plurality of scale features corresponding to a first image sample through the global pooling layer extracts the maximum or average of the activation values within a region of each scale feature, which provides a certain invariance to small changes in the input data. A one-dimensional first descriptor corresponding to each scale feature is thus obtained while remaining robust to noise contained in the input scale features.
In one embodiment, as shown in fig. 4, the filtering the first image sample set based on the target descriptor in step 108 to obtain a second image sample set includes:
Step 402, screening the first image sample set according to a preset similarity calculation algorithm and the target descriptor to obtain a second image sample set.
In the embodiment of the application, a terminal determines a preset similarity calculation algorithm according to the task type of image processing, calculates the similarity between first image samples according to the target descriptor corresponding to the first image samples, screens a first image sample set based on the similarity of every two first image samples, and filters the first image samples which do not meet the similarity condition to obtain a second image sample set.
Optionally, when screening the first image sample set, the terminal may apply an acceleration algorithm, such as a k-d tree, to improve screening efficiency.
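One way such a k-d tree acceleration could look, sketched with SciPy (an assumption; the application does not name a library, and k-d trees lose efficiency as descriptor dimensionality grows):

```python
import numpy as np
from scipy.spatial import cKDTree

def near_duplicate_pairs(descriptors: np.ndarray, radius: float):
    """descriptors: (num_samples, dim). Returns index pairs closer than radius (L2)."""
    tree = cKDTree(descriptors)
    return tree.query_pairs(r=radius)  # candidate near-duplicate pairs
```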
In one embodiment, the method further comprises:
step 404, screening the second image sample set based on the active learning algorithm to obtain a target image sample set.
The active learning algorithm comprises an active learning algorithm constructed based on a reference model such as FCOS (Fully Convolutional One-Stage Object Detection, a single-stage target detection algorithm based on a full convolution network), faster R-CNN (a target detection algorithm) and the like, for example, an active learning algorithm such as uncertainty sampling, maximum interval sampling, information entropy sampling and the like, and the active learning algorithm is not limited.
In the embodiment of the application, after the terminal preliminarily screens the first image sample set to obtain the second image sample set, the second image sample set is fed to the active learning algorithm. The active learning method ranks the second image samples by information content based on a preset measurement index, then selects the top preset number of second image samples in the ranking, completing a finer screening of the second image sample set and obtaining the target image sample set for labeling.
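A minimal sketch of one of the cited strategies, information-entropy sampling: rank second image samples by the predictive entropy of a reference model and keep the top preset number. The probability matrix is assumed to come from such a reference model; the function name is illustrative.

```python
import numpy as np

def entropy_sampling(probs: np.ndarray, k: int) -> np.ndarray:
    """probs: (num_samples, num_classes) softmax outputs. Returns top-k indices."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)  # higher = more uncertain
    return np.argsort(-entropy)[:k]                       # most informative samples
```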
And 406, labeling the target image sample set to obtain a model training sample.
In the embodiment of the application, the terminal labels the target image sample set with an automatic labeling model or algorithm to obtain model training samples. Specifically, the terminal may automatically label the target image sample set based on predefined rules or templates; for example, for targets with a specific shape or color, automatic labeling can be implemented with rules or templates, and the labeling effect can be improved by continuously optimizing them. The terminal may also analyze the target image sample set with image processing techniques, determine the targets present in each sample image, and label them automatically. For example, a target may be separated out by an image segmentation algorithm and then labeled automatically using morphological analysis, perimeter, area and similar cues.
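As a hedged illustration of the segmentation-plus-morphology idea above, assuming OpenCV 4.x; the Otsu thresholding scheme and the area rule are placeholders for whatever rules or templates a concrete task defines:

```python
import cv2

def auto_label(image, min_area: float = 100.0):
    """Segment candidate targets and keep boxes passing a simple area rule."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < min_area:                # drop small noise regions
            continue
        x, y, w, h = cv2.boundingRect(c)
        boxes.append((x, y, w, h, area))   # box plus area as a crude label cue
    return boxes
```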
In this embodiment, screening the first image sample set with the discriminative target descriptors and the preset similarity calculation algorithm preliminarily filters out first image samples with high mutual similarity, yielding the second image sample set. The active learning algorithm then performs finer screening on the second image sample set; this improves the efficiency with which the active learning algorithm screens image samples while still obtaining a target image sample set that incrementally improves the training effect of the target model.
In one embodiment, as shown in fig. 5, in step 402, the filtering the first image sample set according to the preset similarity calculation algorithm and the target descriptor to obtain a second image sample set includes:
step 502, traversing the first image sample set, and taking any first image sample as a reference image sample.
In the embodiment of the application, the terminal traverses each first image sample in the first image sample set, and any one first image sample is selected from a plurality of first image samples to be used as a reference image sample.
Step 504, based on a similarity calculation algorithm, calculates a similarity of the target descriptor of the reference image sample with the target descriptors of other image samples in the first image sample set.
In the embodiment of the present application, the preset similarity calculation algorithm may be the Euclidean distance (L2 norm) algorithm, the cosine similarity algorithm, or another similarity calculation algorithm; the terminal calculates the similarity between the target descriptor of the reference image sample and the target descriptors of the other image samples in the first image sample set according to the preset algorithm. The L2 norm algorithm suits relatively dense target descriptors, while the cosine similarity algorithm suits relatively sparse ones, and the terminal may select the preset similarity calculation algorithm according to the image processing task type.
Step 506, eliminating the other image samples whose similarity is greater than the similarity threshold, together with their target descriptors, to obtain the second image sample set.
In the embodiment of the application, the terminal traverses, in a cyclic manner, the similarities between the target descriptor of the reference image sample and the target descriptors of the other image samples. If the similarity between another image sample's target descriptor and the reference image sample's target descriptor is greater than the similarity threshold, the features of that image sample are close to those of the reference image sample, so the image sample and its target descriptor are eliminated, yielding the second image sample set. Specifically, the other image samples are eliminated during the cyclic traversal: once the similarities between the current reference image sample's target descriptor and the other target descriptors have been calculated, the image samples whose similarity exceeds the threshold can be eliminated directly together with their target descriptors.
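A sketch of this cyclic-traversal elimination using cosine similarity; the function name and threshold semantics are assumptions consistent with the description above:

```python
import numpy as np

def screen_by_similarity(descs: np.ndarray, paths: list, thresh: float) -> list:
    """Greedy traversal: each kept sample eliminates later samples whose
    cosine similarity to it exceeds thresh; survivors form the second set."""
    normed = descs / (np.linalg.norm(descs, axis=1, keepdims=True) + 1e-12)
    keep = np.ones(len(paths), dtype=bool)
    for i in range(len(paths)):
        if not keep[i]:
            continue                        # already eliminated, never a reference
        sims = normed[i + 1:] @ normed[i]   # similarity to remaining samples
        keep[i + 1:] &= sims <= thresh      # eliminate near duplicates directly
    return [p for p, k in zip(paths, keep) if k]
```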
In an alternative embodiment, the terminal may obtain the second image sample set from a sample path list of the first image samples, an N x K scale feature matrix, the preset similarity calculation algorithm and the similarity threshold. For example, the terminal traverses each scale feature in the scale feature matrix and calculates, according to the similarity calculation algorithm, the similarity between the scale feature corresponding to the reference image sample and the scale features corresponding to the other image samples; the terminal compares each similarity with the similarity threshold, marks the positions whose similarity exceeds the threshold as True and the others as False; based on these mask values, the terminal removes from the scale feature matrix the scale features that do not meet the similarity requirement, obtaining a new scale feature matrix; these steps are repeated until the last scale feature has been traversed; finally, the terminal obtains the second image sample set based on the mask values and the sample path list.
In this embodiment, the pairwise similarities between first image samples are computed by traversal according to the preset similarity calculation algorithm, and the image samples whose similarity exceeds the threshold are eliminated directly together with their target descriptors. This avoids repeatedly computing similarities between first image samples, reduces the complexity of the similarity computation, and improves the efficiency of screening the first image sample set.
In one embodiment, as shown in fig. 6, there is provided a training method of an image detection model, the method comprising:
step 602, obtaining a model training sample.
The model training sample is obtained based on a target image sample set; the target image sample set is obtained by any one of the embodiments of the image sample screening method.
In the embodiment of the application, the terminal outputs the target image sample set on the display interface, responds to the labeling operation of a user on each target image sample in the target image sample set, obtains the labeled target image sample set, and takes the labeled target image sample set as a model training sample for model training after the terminal obtains the labeled target image sample set.
Step 604, inputting a model training sample into the target model, and performing model training on the target model to obtain a trained target model.
The target model may be a target detection model or an image classification model for constructing an image processing model.
In the embodiment of the application, the terminal inputs the model training samples into the target model, compares the output of the target model with the annotation information of the training samples, and computes the loss function of the target model on the training samples. The loss is back-propagated through the network parameters of the target model and the parameter gradients are computed. The terminal then updates and optimizes the network parameters with an optimization algorithm such as stochastic gradient descent, repeatedly computing the loss function and updating the parameters until the target model meets a preset iteration condition, for example convergence or a preset number of iterations, at which point training stops and the trained target model is obtained.
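A generic sketch of this loop in PyTorch; it assumes, for brevity, a model whose forward pass returns the loss when given images and labels, which is one common convention for detection models but not mandated by the application:

```python
import torch

def train_target_model(model, loader, epochs: int = 50, lr: float = 1e-2):
    """SGD training loop over labeled model training samples."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            loss = model(images, labels)  # compare output with annotations
            opt.zero_grad()
            loss.backward()               # back-propagate, compute gradients
            opt.step()                    # update network parameters
    return model
```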
Optionally, the terminal may perform model training on the target model to obtain a trained target model, or iteratively upgrade an already-trained target model.
In this embodiment, the target model is trained on model training samples obtained by the image sample screening method, which avoids the subjectivity of manually screening the first image sample set. Screening the first image sample set with the image processing model, based on scale features, yields a diverse target image sample set for the target model while improving the efficiency of image sample screening.
In one embodiment, as shown in fig. 7, an example of an image sample screening method is provided, the method comprising:
step 701, selecting an image detection model required by a specific task;
step 702, inputting a first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through a backbone network and a feature fusion layer of the image processing model to obtain a plurality of scale features corresponding to each first image sample;
step 703, performing global pooling on the plurality of scale features according to the global pooling layer of the image processing model for the plurality of scale features corresponding to each first image sample to obtain a first descriptor corresponding to the plurality of scale features corresponding to each first image sample;
Step 704, stitching the plurality of first descriptors corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
step 705, traversing the first image sample set, taking any first image sample as a reference image sample, and calculating the similarity between the target descriptor of the reference image sample and the target descriptors of other image samples in the first image sample set based on a similarity calculation algorithm;
step 706, eliminating the other image samples whose similarity is greater than the similarity threshold, together with their target descriptors, to obtain a second image sample set;
step 707, screening the second image sample set based on the active learning algorithm to obtain a target image sample set.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their order is likewise not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps or sub-steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an image sample screening device for realizing the above-mentioned image sample screening method, and a model training device for realizing the above-mentioned model training method. The implementation schemes of the image sample screening device and the model training device for solving the problems are similar to those described in the above method, so the specific limitations in the embodiments of the image sample screening device and the model training device provided below can be referred to the above limitations of the image sample screening method and the model training method, and are not repeated herein.
In one embodiment, as shown in fig. 8, there is provided an image sample screening apparatus 800 comprising: a first obtaining module 801, a feature extracting module 802, a feature stitching module 803 and a first screening module 804, wherein:
a first obtaining module 801, configured to obtain a first image sample set;
the feature extraction module 802 is configured to input the first image sample set into an image processing model, and perform feature extraction on each first image sample in the first image sample set through the image processing model to obtain a scale feature corresponding to each first image sample;
The feature stitching module 803 is configured to perform feature stitching on the scale feature corresponding to each first image sample, so as to obtain a target descriptor corresponding to each first image sample;
a first screening module 804, configured to screen the first image sample set based on the target descriptor to obtain a second image sample set; the second image sample set is used for generating a model training sample after labeling.
In one embodiment, the feature extraction module 802 is specifically configured to:
inputting the first image sample set into an image processing model, and extracting the characteristics of each first image sample in the first image sample set through a backbone network and a characteristic fusion layer of the image processing model to obtain a plurality of scale characteristics corresponding to each first image sample.
In one embodiment, the feature stitching module 803 is specifically configured to:
performing dimension reduction processing on a plurality of scale features corresponding to each first image sample to obtain a plurality of first descriptors corresponding to each first image sample;
and splicing the plurality of first descriptors corresponding to each first image sample to obtain the target descriptor corresponding to each first image sample.
In one embodiment, the feature stitching module is specifically configured to:
And carrying out global pooling on the plurality of scale features according to a global pooling layer of the image processing model aiming at the plurality of scale features corresponding to each first image sample to obtain first descriptors corresponding to the plurality of scale features corresponding to each first image sample.
In one embodiment, the first screening module 804 is specifically configured to:
screening the first image sample set according to a preset similarity calculation algorithm and a target descriptor to obtain a second image sample set;
the apparatus 800 further comprises:
the second screening module is used for screening the second image sample set based on the active learning algorithm to obtain a target image sample set;
and labeling the target image sample set to obtain a model training sample.
In one embodiment, the first screening module 804 is specifically configured to:
traversing the first image sample set, and taking any first image sample as a reference image sample;
calculating the similarity between the target descriptor of the reference image sample and the target descriptors of other image samples in the first image sample set based on a similarity calculation algorithm;
and eliminating the other image samples whose similarity is greater than the similarity threshold, together with their target descriptors, to obtain a second image sample set.
In one embodiment, as shown in fig. 9, there is provided a model training apparatus 900 comprising: a second acquisition module 901 and a training module 902, wherein:
a second obtaining module 901, configured to obtain a model training sample; the model training sample is obtained based on the labeling of the target image sample set; the target image sample set is obtained by any one of the embodiments of the image sample screening method;
the training module 902 is configured to input a model training sample to a target model, and perform model training on the target model to obtain a trained target model.
The respective modules of the above-described image sample screening apparatus may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
The various modules in the model training apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing image sample data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by the processor implements an image sample screening method, or the computer program when executed by the processor implements a model training method.
Those skilled in the art will appreciate that the structure shown in FIG. 10 is merely a block diagram of part of the structure relevant to the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring a first image sample set;
inputting the first image sample set into an image processing model, and performing feature extraction on each first image sample in the first image sample set through the image processing model to obtain the scale features corresponding to each first image sample;
performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
screening the first image sample set based on the target descriptor to obtain a second image sample set; the second image sample set is used, after labeling, to generate model training samples.
In one embodiment, the processor when executing the computer program further performs the steps of:
inputting the first image sample set into the image processing model, and performing feature extraction on each first image sample in the first image sample set through a backbone network and a feature fusion layer of the image processing model to obtain a plurality of scale features corresponding to each first image sample.
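As an illustrative, non-limiting sketch of this step, the snippet below reads feature maps from three stages of a torchvision ResNet-50 via `create_feature_extractor`; the embodiment does not name a backbone, and the snippet omits the feature fusion layer (the raw stages stand in for its outputs), so the stage choice and names are assumptions.

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Three intermediate stages stand in for the backbone's multi-scale outputs;
# a full implementation would also pass them through the feature fusion layer.
backbone = resnet50(weights=None).eval()
extractor = create_feature_extractor(
    backbone, return_nodes={"layer2": "c3", "layer3": "c4", "layer4": "c5"})

def extract_scale_features(image_batch):
    """Return the plurality of scale features for a batch of first image samples."""
    with torch.no_grad():
        feats = extractor(image_batch)   # dict: name -> (N, C, H, W) tensor
    return list(feats.values())

scales = extract_scale_features(torch.randn(2, 3, 224, 224))
print([tuple(f.shape) for f in scales])  # spatial resolution decreases per scale
```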
In one embodiment, the processor when executing the computer program further performs the steps of:
performing dimension reduction on the plurality of scale features corresponding to each first image sample to obtain a plurality of first descriptors corresponding to each first image sample;
and stitching the plurality of first descriptors corresponding to each first image sample to obtain the target descriptor corresponding to each first image sample.
In one embodiment, the processor when executing the computer program further performs the steps of:
for the plurality of scale features corresponding to each first image sample, performing global pooling on the scale features through a global pooling layer of the image processing model to obtain the first descriptors corresponding to the plurality of scale features of each first image sample.
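A minimal sketch of the two steps above, assuming global average pooling as the global pooling layer (the embodiment does not fix the pooling type): each (N, C, H, W) scale feature is reduced to an (N, C) first descriptor, and the first descriptors are stitched into the target descriptor.

```python
import torch

def build_target_descriptor(scale_features):
    """scale_features: list of (N, C_i, H_i, W_i) tensors for one batch of samples."""
    firsts = [f.mean(dim=(2, 3)) for f in scale_features]  # global pooling -> (N, C_i) first descriptors
    return torch.cat(firsts, dim=1)                        # feature stitching -> (N, sum C_i) target descriptors
```

Combined with the extractor sketched earlier, `build_target_descriptor(extract_scale_features(batch))` yields one target descriptor per first image sample.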
In one embodiment, the processor when executing the computer program further performs the steps of:
screening the first image sample set according to a preset similarity calculation algorithm and the target descriptors to obtain a second image sample set;
screening the second image sample set based on an active learning algorithm to obtain a target image sample set;
and labeling the target image sample set to obtain a model training sample.
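The embodiment leaves the active learning algorithm open; as one plausible, non-limiting choice, the sketch below uses least-confidence uncertainty sampling, with `predict_proba` standing in for the current model's per-class probability output — both the criterion and the helper are assumptions.

```python
import numpy as np

def active_learning_select(second_set, predict_proba, budget=100):
    """Keep the `budget` samples the current model is least confident about."""
    probs = np.stack([predict_proba(img) for img in second_set])  # (N, num_classes)
    uncertainty = 1.0 - probs.max(axis=1)                         # least-confidence score
    ranked = np.argsort(-uncertainty)                             # most uncertain first
    return [second_set[i] for i in ranked[:budget]]               # the target image sample set
```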
In one embodiment, the processor when executing the computer program further performs the steps of:
traversing the first image sample set, taking each first image sample in turn as a reference image sample;
calculating, based on the similarity calculation algorithm, the similarity between the target descriptor of the reference image sample and the target descriptors of the other image samples in the first image sample set;
and eliminating the other image samples whose similarity is greater than a similarity threshold, together with their target descriptors, to obtain a second image sample set.
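Putting the pieces together, an end-to-end sketch of the screening flow under the same assumptions as the step sketches above; `first_set` is an (N, 3, H, W) tensor of unlabeled image samples, and the helpers are the illustrative ones defined earlier in this section.

```python
scales = extract_scale_features(first_set)        # multi-scale features per sample
descriptors = build_target_descriptor(scales)     # (N, D) target descriptors
kept_indices = screen_by_similarity(              # descriptor-based de-duplication
    list(range(len(first_set))), descriptors.numpy(), threshold=0.9)
# kept_indices identifies the second image sample set; an active learning pass
# (see the earlier sketch) would further reduce it to the target image sample set.
```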
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
obtaining a model training sample; the model training sample is obtained by labeling the target image sample set; the target image sample set is obtained by any one of the embodiments of the image sample screening method described above;
and inputting the model training sample into a target model, and performing model training on the target model to obtain a trained target model.
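As an illustrative sketch of the training step, assuming a PyTorch classification model and a DataLoader over the labeled target image sample set; the loss, optimizer, learning rate, and epoch count are example choices, not specified by the embodiment.

```python
import torch
from torch import nn

def train_target_model(model, loader, epochs=10, lr=1e-3, device="cpu"):
    """Fit the target model on the labeled model training samples."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:                # one batch of model training samples
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # classification loss
            loss.backward()
            optimizer.step()
    return model                                     # the trained target model
```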
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties.
Those skilled in the art will appreciate that all or part of the processes in the above method embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in the present application may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor involved in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic device, or a data processing logic device based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above examples merely express several embodiments of the present application, which are described in relative detail, but they should not be construed as limiting the scope of the application. It should be noted that several variations and improvements can be made by those skilled in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method of screening an image sample, the method comprising:
acquiring a first image sample set;
inputting the first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through the image processing model to obtain scale features corresponding to each first image sample;
performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
screening the first image sample set based on the target descriptor to obtain a second image sample set; the second image sample set is used to generate model training samples.
2. The method according to claim 1, wherein the inputting the first image sample set into an image processing model and performing feature extraction on each first image sample in the first image sample set through the image processing model to obtain the scale features corresponding to each first image sample includes:
inputting the first image sample set into the image processing model, and performing feature extraction on each first image sample in the first image sample set through a backbone network and a feature fusion layer of the image processing model to obtain a plurality of scale features corresponding to each first image sample.
3. The method according to claim 1, wherein the performing feature stitching on the scale features corresponding to each first image sample to obtain the target descriptor corresponding to each first image sample includes:
performing dimension reduction on the plurality of scale features corresponding to each first image sample to obtain a plurality of first descriptors corresponding to each first image sample;
and stitching the plurality of first descriptors corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample.
4. The method according to claim 3, wherein the backbone network and the feature fusion layer of the image processing model are connected to a global pooling layer; and the performing dimension reduction on the plurality of scale features corresponding to each first image sample to obtain a plurality of first descriptors corresponding to each first image sample includes:
for the plurality of scale features corresponding to each first image sample, performing global pooling on the scale features through the global pooling layer of the image processing model to obtain the first descriptors corresponding to the plurality of scale features of each first image sample.
5. The method of claim 1, wherein the screening the first image sample set based on the target descriptor to obtain a second image sample set comprises:
screening the first image sample set according to a preset similarity calculation algorithm and the target descriptor to obtain a second image sample set;
the method further comprises the steps of:
screening the second image sample set based on an active learning algorithm to obtain a target image sample set;
and labeling the target image sample set to obtain a model training sample.
6. The method of claim 5, wherein the screening the first image sample set based on the target descriptor according to a preset similarity calculation algorithm to obtain a second image sample set includes:
traversing the first image sample set, taking each of the first image samples in turn as a reference image sample;
calculating the similarity between the target descriptor of the reference image sample and the target descriptors of other image samples in the first image sample set based on a similarity calculation algorithm;
and eliminating the other image samples whose similarity is greater than a similarity threshold, together with the target descriptors of the other image samples, to obtain a second image sample set.
7. A method of model training, the method comprising:
obtaining a model training sample; the model training sample is obtained based on the target image sample set; the target image sample set is obtained by the method of any one of claims 1 to 6;
and inputting the model training sample into a target model, and carrying out model training on the target model to obtain a trained target model.
8. An image sample screening apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a first image sample set;
the feature extraction module is used for inputting the first image sample set into an image processing model, and extracting features of each first image sample in the first image sample set through the image processing model to obtain scale features corresponding to each first image sample;
the feature stitching module is used for performing feature stitching on the scale features corresponding to each first image sample to obtain a target descriptor corresponding to each first image sample;
the first screening module is used for screening the first image sample set based on the target descriptor to obtain a second image sample set; the second image sample set is used to generate model training samples.
9. A model training apparatus, the apparatus comprising:
the second acquisition module is used for acquiring a model training sample; the model training sample is obtained based on the target image sample set; the target image sample set is obtained by the method of any one of claims 1 to 6;
and the training module is used for inputting the model training sample into a target model, and carrying out model training on the target model to obtain a trained target model.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
CN202310771636.5A 2023-06-27 2023-06-27 Image sample screening method, model training method, device and computer equipment Pending CN116994085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310771636.5A CN116994085A (en) 2023-06-27 2023-06-27 Image sample screening method, model training method, device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310771636.5A CN116994085A (en) 2023-06-27 2023-06-27 Image sample screening method, model training method, device and computer equipment

Publications (1)

Publication Number Publication Date
CN116994085A true CN116994085A (en) 2023-11-03

Family

ID=88525613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310771636.5A Pending CN116994085A (en) 2023-06-27 2023-06-27 Image sample screening method, model training method, device and computer equipment

Country Status (1)

Country Link
CN (1) CN116994085A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191452A (en) * 2018-09-12 2019-01-11 南京大学 A kind of abdominal cavity CT image peritonaeum transfer automark method based on Active Learning
CN111310846A (en) * 2020-02-28 2020-06-19 平安科技(深圳)有限公司 Method, device, storage medium and server for selecting sample image
CN111798435A (en) * 2020-07-08 2020-10-20 国网山东省电力公司东营供电公司 Image processing method, and method and system for monitoring invasion of engineering vehicle into power transmission line
CN112633314A (en) * 2020-10-15 2021-04-09 浙江工业大学 Active learning source tracing attack method based on multi-layer sampling
CN114332556A (en) * 2021-11-19 2022-04-12 腾讯科技(深圳)有限公司 Training sample screening method and device, computer equipment and storage medium
CN114064897A (en) * 2021-11-22 2022-02-18 重庆邮电大学 Emotion text data labeling method, device and system and electronic equipment
CN114898182A (en) * 2022-05-24 2022-08-12 智慧互通科技股份有限公司 Image data screening method and system based on target detection learning algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAN KAN et al.: "A Method for Detecting Fake Users in Social Networks Based on Double-Layer Sampling Active Learning", Acta Automatica Sinica, vol. 43, no. 3, 31 March 2017 (2017-03-31) *

Similar Documents

Publication Publication Date Title
Bergmann et al. The MVTec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection
Landrieu et al. A structured regularization framework for spatially smoothing semantic labelings of 3D point clouds
CN109697434B (en) Behavior recognition method and device and storage medium
WO2021227366A1 (en) Method for automatically and accurately detecting plurality of small targets
CN111768432A (en) Moving target segmentation method and system based on twin deep neural network
CN111368636B (en) Object classification method, device, computer equipment and storage medium
WO2023131301A1 (en) Digestive system pathology image recognition method and system, and computer storage medium
CN112580458B (en) Facial expression recognition method, device, equipment and storage medium
CN111292377B (en) Target detection method, device, computer equipment and storage medium
CN113239818B (en) Table cross-modal information extraction method based on segmentation and graph convolution neural network
CN112613349B (en) Time sequence action detection method and device based on deep hybrid convolutional neural network
Gao et al. Counting dense objects in remote sensing images
CN115410059B (en) Remote sensing image part supervision change detection method and device based on contrast loss
CN111507288A (en) Image detection method, image detection device, computer equipment and storage medium
CN116071309A (en) Method, device, equipment and storage medium for detecting sound scanning defect of component
CN113284122B (en) Roll paper packaging defect detection method and device based on deep learning and storage medium
US20200342287A1 (en) Selective performance of deterministic computations for neural networks
Xiao et al. Self-explanatory deep salient object detection
Pierce et al. Reducing annotation times: Semantic segmentation of coral reef survey images
CN113343920A (en) Method and device for classifying face recognition photos, electronic equipment and storage medium
CN113112479A (en) Progressive target detection method and device based on key block extraction
CN117058554A (en) Power equipment target detection method, model training method and device
CN116894974A (en) Image classification method, device, computer equipment and storage medium thereof
CN112465847A (en) Edge detection method, device and equipment based on clear boundary prediction
CN113762005A (en) Method, device, equipment and medium for training feature selection model and classifying objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination