CN114155366B - Dynamic cabinet image recognition model training method and device, electronic equipment and medium

Info

Publication number
CN114155366B
CN114155366B
Authority
CN
China
Prior art keywords
image
dynamic cabinet
dynamic
feature vector
article
Prior art date
Legal status
Active
Application number
CN202210115424.7A
Other languages
Chinese (zh)
Other versions
CN114155366A (en)
Inventor
邓博洋
程杨武
Current Assignee
Wuhan Qinggouyun Technology Co ltd
Original Assignee
Beijing Missfresh Ecommerce Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Missfresh Ecommerce Co Ltd
Priority to CN202210115424.7A
Publication of CN114155366A
Application granted
Publication of CN114155366B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The embodiment of the disclosure discloses a dynamic cabinet image recognition model training method and device, electronic equipment and a medium. One embodiment of the method comprises: acquiring a first dynamic cabinet image; inputting the first dynamic cabinet image into a first image feature extraction network trained in advance to obtain a first image feature vector; determining at least one first detection frame corresponding to the first dynamic cabinet image; generating a third dynamic cabinet image according to the first dynamic cabinet image, the first image feature vector, a pre-stored second dynamic cabinet image set, a pre-stored first article image set and at least one first detection frame; and taking the third dynamic cabinet image as a training image sample, and training a dynamic cabinet image recognition model by using a machine learning model training method to obtain the trained dynamic cabinet image recognition model. The embodiment can generate the dynamic cabinet image recognition model with more accurate recognition effect.

Description

Dynamic cabinet image recognition model training method and device, electronic equipment and medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a dynamic cabinet image recognition model training method and device, electronic equipment and a medium.
Background
Currently, target detection models are widely used in various fields. In the training process of a target detection model, there are often too few training image samples for the model. To generate more training image samples, the following approach is generally adopted: generating more training image samples by means of data enhancement (e.g., image rotation, image shifting, image scaling, etc.).
However, when the training image samples are generated in the above manner, the following technical problems often exist:
firstly, the training image samples generated by data enhancement lack diversity and cannot provide more feature information for subsequent training of the target detection model, so the prediction accuracy of the subsequently trained target detection model is low.
Second, the feature extraction network that extracts features from the first dynamic cabinet image often requires labels for the first dynamic cabinet image. In practice, a large number of first dynamic cabinet images are often required to train a target detection model. Labeling a large number of dynamic cabinet images wastes a large amount of time, and mislabeling of the dynamic cabinet images also occurs.
Thirdly, with image cropping and similar-image stitching in data enhancement, the generated training image samples carry more feature information. However, the accuracy of identifying similar images is low.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a dynamic cabinet image recognition model training method, apparatus, electronic device, and computer readable medium to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a dynamic cabinet image recognition model training method, including: acquiring a first dynamic cabinet image; inputting the first dynamic cabinet image into a first image feature extraction network trained in advance to obtain a first image feature vector; determining at least one first detection frame corresponding to the first dynamic cabinet image; generating a third dynamic cabinet image according to the first dynamic cabinet image, the first image feature vector, a prestored second dynamic cabinet image set, a prestored first article image set and the at least one first detection frame, wherein the second dynamic cabinet image set has a corresponding second detection frame set group, and an image corresponding to each second detection frame in the second detection frame set group is the first article image set; and taking the third dynamic cabinet image as a training image sample, and training a dynamic cabinet image recognition model by using a machine learning model training method to obtain the trained dynamic cabinet image recognition model.
In a second aspect, some embodiments of the present disclosure provide a dynamic cabinet image recognition model training apparatus, including: an acquisition unit configured to acquire a first dynamic cabinet image; the input unit is configured to input the first dynamic cabinet image to a first image feature extraction network trained in advance to obtain a first image feature vector; a determining unit configured to determine at least one first detection frame corresponding to the first dynamic cabinet image; a generating unit configured to generate a third dynamic cabinet image according to the first dynamic cabinet image, the first image feature vector, a prestored second dynamic cabinet image set, a prestored first item image set and the at least one first detection frame, wherein the second dynamic cabinet image set has a corresponding second detection frame set group, and an image corresponding to each second detection frame in the second detection frame set group is the first item image set; and the training unit is configured to train the dynamic cabinet image recognition model by using the third dynamic cabinet image as a training image sample and using a machine learning model training method to obtain the trained dynamic cabinet image recognition model.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, where the program when executed by a processor implements a method as described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following beneficial effects: the dynamic cabinet image recognition model training method of some embodiments of the present disclosure can generate a dynamic cabinet image recognition model with a more accurate recognition effect. Specifically, the reasons why a more accurate dynamic cabinet image recognition model cannot otherwise be generated are as follows: firstly, the training image samples generated by data enhancement lack diversity and cannot provide more feature information for subsequent training of the target detection model, so the prediction accuracy of the subsequently trained target detection model is low. Second, the feature extraction network that extracts features from the first dynamic cabinet image often requires labels for the first dynamic cabinet image. In practice, a large number of first dynamic cabinet images are often needed to train a target detection model. Labeling a large number of dynamic cabinet images wastes a large amount of time, and mislabeling of the dynamic cabinet images also occurs. Based on this, the dynamic cabinet image recognition model training method of some embodiments of the present disclosure may first acquire a first dynamic cabinet image. Then, the first dynamic cabinet image is input to a first image feature extraction network trained in advance to obtain a first image feature vector. Here, the obtained first image feature vector is used for subsequently generating a third dynamic cabinet image. Similarly, at least one first detection frame corresponding to the first dynamic cabinet image is determined for subsequently generating the third dynamic cabinet image. Furthermore, a third dynamic cabinet image including more image feature information may be generated according to the first dynamic cabinet image, the first image feature vector, the second dynamic cabinet image set stored in advance, the first article image set stored in advance, and the at least one first detection frame. Finally, the third dynamic cabinet image is taken as a training image sample, and the dynamic cabinet image recognition model is trained by a machine learning model training method, so that the obtained trained dynamic cabinet image recognition model is more accurate for subsequent dynamic cabinet image recognition.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of one application scenario of a dynamic cabinet image recognition model training method, according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of some embodiments of a dynamic cabinet image recognition model training method according to the present disclosure;
FIG. 3 is a flow diagram of further embodiments of a dynamic cabinet image recognition model training method according to the present disclosure;
FIG. 4 is a schematic structural diagram of some embodiments of a dynamic cabinet image recognition model training apparatus according to the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of an application scenario of a dynamic cabinet image recognition model training method according to some embodiments of the present disclosure.
In the application scenario of fig. 1, the electronic device 101 may first acquire a first dynamic cabinet image 102. Then, the electronic device 101 may input the first dynamic cabinet image 102 into a first image feature extraction network 103 trained in advance to obtain a first image feature vector 104. Furthermore, the electronic device 101 may determine at least one first detection frame 105 corresponding to the first dynamic cabinet image 102. In the application scenario, the at least one first detection frame 105 includes: a first detection frame 1051, a first detection frame 1052 and a first detection frame 1053. Next, the electronic device 101 may generate a third dynamic cabinet image 108 based on the first dynamic cabinet image 102, the first image feature vector 104, a second dynamic cabinet image set 106 stored in advance, a first article image set 107 stored in advance, and the at least one first detection frame 105, wherein the second dynamic cabinet image set 106 has a corresponding second detection frame set group, and the image corresponding to each second detection frame in the second detection frame set group is the first article image set 107. In this application scenario, the second dynamic cabinet image set 106 includes: a second dynamic cabinet image 1061, a second dynamic cabinet image 1062, and a second dynamic cabinet image 1063. The first article image set 107 includes: a first article image 1071 corresponding to the second dynamic cabinet image 1061, a first article image 1072 corresponding to the second dynamic cabinet image 1062, and a first article image 1073 corresponding to the second dynamic cabinet image 1063. Finally, the electronic device 101 may train a dynamic cabinet image recognition model 109 using the third dynamic cabinet image 108 as a training image sample by a machine learning model training method, obtaining a trained dynamic cabinet image recognition model 110.
The electronic device 101 may be hardware or software. When the electronic device is hardware, the electronic device may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or may be implemented as a single server or a single terminal device. When the electronic device is embodied as software, it may be installed in the above-listed hardware devices. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices, as desired for implementation.
With continued reference to fig. 2, a flow 200 of some embodiments of a dynamic cabinet image recognition model training method according to the present disclosure is shown. The dynamic cabinet image recognition model training method comprises the following steps:
step 201, a first dynamic cabinet image is obtained.
In some embodiments, an executing subject (e.g., the electronic device shown in fig. 1) of the dynamic cabinet image recognition model training method may acquire the first dynamic cabinet image through a wired connection manner or a wireless connection manner.
Optionally, the first dynamic cabinet image is a preprocessed image. The first dynamic cabinet image can also be an image of the dynamic cabinet shot by the target camera.
Step 202, inputting the first dynamic cabinet image to a first image feature extraction network trained in advance to obtain a first image feature vector.
In some embodiments, the executing subject may input the first dynamic cabinet image into a first image feature extraction network trained in advance to obtain a first image feature vector. The first image feature extraction network may be a neural network that extracts image features. The first image feature vector may represent image feature information of the first dynamic cabinet image. As an example, the first image feature extraction network may be, but is not limited to, one of: a Convolutional Neural Network (CNN) or a Residual Network (ResNet).
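By way of illustration only, the following is a minimal sketch of this step. It assumes a torchvision ResNet backbone as the first image feature extraction network, which is an assumption rather than an architecture fixed by the disclosure (the text only names CNN/ResNet as examples); the function name and image path are likewise illustrative.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier head, keep the 512-d embedding
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_first_image_feature_vector(path: str) -> torch.Tensor:
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(image).squeeze(0)  # the first image feature vector
```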
In some optional implementations of some embodiments, the first image feature extraction network is trained by:
in the first step, a training image sample is obtained.
And secondly, performing data enhancement on the training image samples for multiple times to obtain a plurality of data-enhanced image samples.
As an example, the executing entity may respectively rotate, crop, zoom, and shift the training image sample to obtain four data-enhanced image samples.
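A brief sketch of this data enhancement step, assuming torchvision transforms; all parameter values are illustrative assumptions, not values from the disclosure.

```python
# Four data-enhanced views of one training image sample: rotate, crop,
# zoom (via a random resized crop) and shift (via a random translation).
import torchvision.transforms as T

rotate = T.RandomRotation(degrees=30)
crop = T.RandomCrop(size=(200, 200), pad_if_needed=True)
zoom = T.RandomResizedCrop(size=(224, 224), scale=(0.6, 1.0))
shift = T.RandomAffine(degrees=0, translate=(0.2, 0.2))

def four_enhanced_views(sample):
    # sample is a PIL image; returns the four data-enhanced image samples
    return [augment(sample) for augment in (rotate, crop, zoom, shift)]
```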
And thirdly, inputting the plurality of data-enhanced image samples into a first initial image feature extraction network to output a plurality of first feature vectors. The first initial image feature extraction network may be a first image feature extraction network after network parameters are initialized.
And fourthly, generating a first loss value according to the plurality of first feature vectors and a preset loss function.
As an example, the execution body may input a plurality of first feature vectors to the preset loss function to generate the first loss value.
For example, suppose the number of the plurality of first feature vectors is 2, corresponding to two images: a first image I1 and a second image I2. The first feature vector corresponding to the first image I1 is v1, and the first feature vector corresponding to the second image I2 is v2. The preset loss function is then evaluated on v1 and v2. [In the original publication, the formula of the preset loss function is given only as an embedded equation image.]
and fifthly, inputting the plurality of first feature vectors into a second initial image feature extraction network to obtain a plurality of second feature vectors. The second initial image feature extraction network may be a second image feature extraction network after network parameters are initialized. The second image feature extraction network may be a neural network for further extracting feature information of the training image sample. For example, the second image feature extraction network may be, but is not limited to, one of: convolutional neural networks, residual error networks.
And a sixth step of generating a second loss value according to the plurality of second feature vectors and the loss function.
As an example, the execution body may input a plurality of second feature vectors to the loss function, respectively, to output the second loss value.
And seventhly, inputting the plurality of second feature vectors into a third initial image feature extraction network to obtain a plurality of third feature vectors. The third initial image feature extraction network may be a third image feature extraction network after network parameter initialization. The third image feature extraction network may be a neural network for further extracting feature information of the training image sample. For example, the third image feature extraction network may be, but is not limited to, one of: convolutional neural networks, residual error networks.
And an eighth step of generating a third loss value based on the plurality of third feature vectors and the loss function.
As an example, the execution body may input the plurality of third feature vectors to the loss function to output the third loss value.
A ninth step of generating a fourth loss value based on the first loss value, the second loss value, and the third loss value.
As an example, the execution body may add the first loss value, the second loss value, and the third loss value to generate the fourth loss value.
Tenth, in response to determining that the fourth loss value is greater than the target value, training each parameter in the first initial image feature extraction network, the second initial image feature extraction network, and the third initial image feature extraction network. The target value may be a preset value.
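The ten steps above can be condensed into the following sketch. Because the preset loss function appears in the original only as an equation image, a negative cosine similarity between the vectors of the first two views is assumed here as a stand-in; net1, net2 and net3 stand for the first, second and third initial image feature extraction networks.

```python
import torch.nn.functional as F

def pairwise_loss(vectors):
    # assumed stand-in for the preset loss function: negative cosine
    # similarity between the first two views' feature vectors
    v1, v2 = vectors[0], vectors[1]
    return -F.cosine_similarity(v1, v2, dim=-1).mean()

def train_step(net1, net2, net3, optimizer, views, target_value):
    feats1 = [net1(v) for v in views]   # third step: first feature vectors
    loss1 = pairwise_loss(feats1)       # fourth step: first loss value
    feats2 = [net2(f) for f in feats1]  # fifth step: second feature vectors
    loss2 = pairwise_loss(feats2)       # sixth step: second loss value
    feats3 = [net3(f) for f in feats2]  # seventh step: third feature vectors
    loss3 = pairwise_loss(feats3)       # eighth step: third loss value
    loss4 = loss1 + loss2 + loss3       # ninth step: fourth loss value
    if loss4.item() > target_value:     # tenth step: keep training
        optimizer.zero_grad()
        loss4.backward()
        optimizer.step()
    return loss4.item()
```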
As an inventive point of the embodiments of the present disclosure, this addresses the technical problem mentioned in the background art that "the feature extraction network for extracting the first dynamic cabinet image often needs labels for the first dynamic cabinet image; in real life, a large number of first dynamic cabinet images are often needed to train a target detection model; labeling a large number of dynamic cabinet images wastes a large amount of time, and mislabeling of the dynamic cabinet images occurs". Based on this, to avoid the need to label the first dynamic cabinet image, the present disclosure introduces multi-level feature extraction, applied by multiple image feature extraction networks to the enhanced versions of the training image samples. Then, each loss value is generated by comparing the multi-level feature information through a preset loss function. Finally, based on each loss value, the first image feature extraction network can be trained without needing labels for the first dynamic cabinet image, while the accuracy of the first image feature extraction network is still ensured.
Step 203, determining at least one first detection frame corresponding to the first dynamic cabinet image.
In some embodiments, the execution subject may determine at least one first detection frame corresponding to the first dynamic cabinet image. The shape of the first detection frame may be various shapes set in advance. For example, it may be rectangular.
As an example, the execution subject may input the first moving cabinet image to a pre-trained detection box determination network to output at least one first detection box.
Step 204, generating a third dynamic cabinet image according to the first dynamic cabinet image, the first image feature vector, a pre-stored second dynamic cabinet image set, a pre-stored first item image set and the at least one first detection frame.
In some embodiments, the execution subject may generate a third dynamic cabinet image in various ways according to the first dynamic cabinet image, the first image feature vector, a second dynamic cabinet image set stored in advance, a first article image set stored in advance, and the at least one first detection frame. The second dynamic cabinet image set has a corresponding second detection frame set group, and the image corresponding to each second detection frame in the second detection frame set group is the first article image set.
Optionally, the generating a third dynamic cabinet image according to the first dynamic cabinet image, the first image feature vector, the second dynamic cabinet image set stored in advance, the first item image set stored in advance, and the at least one first detection frame includes:
and firstly, inputting each second dynamic cabinet image in the second dynamic cabinet image set to a pre-trained first image feature extraction network to output a second image feature vector set.
And secondly, inputting the first dynamic cabinet image into a pre-trained similar image generation network to output at least one first similar image. The similar image generation network may be a Generative Adversarial Network (GAN).
And thirdly, inputting the at least one first similar image into a pre-trained first image feature extraction network to output a third image feature vector set.
And fourthly, determining a first cosine distance between each second image feature vector in the second image feature vector set and the first image feature vector to obtain a first cosine distance set.
And fifthly, screening out, from the first cosine distance set, a third number of first cosine distances whose cosine distance rank satisfies the target condition as first target cosine distances, to obtain a first target cosine distance set. The third number may be preset. The target condition may be that the cosine distance value ranks within the first third-number values when sorted by magnitude.
And sixthly, screening out, from the second image feature vector set, a second image feature vector subset corresponding to the first target cosine distance set.
And seventhly, determining a third cosine distance between each second image feature vector in the second image feature vector subset and each third image feature vector, to obtain a third cosine distance set.
And eighthly, screening a fourth number of third cosine distances with the cosine distance rank satisfying the target condition from the third cosine distance set, and taking the fourth number of third cosine distances as third target cosine distances to obtain a third target cosine distance set. Wherein the fourth number may be preset.
And ninthly, determining a second dynamic cabinet image corresponding to each third target cosine distance in the third target cosine distance set as a second target dynamic cabinet image to obtain a second target dynamic cabinet image set.
And a tenth step of determining at least one first article image corresponding to each second target dynamic cabinet image in the second target dynamic cabinet image set.
And an eleventh step of splicing the first dynamic cabinet image and the at least one first similar image to obtain a spliced image.
And a twelfth step of blending the at least one first article image corresponding to each second target dynamic cabinet image into the spliced image, to obtain a blended image serving as a third dynamic cabinet image. The intersection-over-union between the at least one first detection frame and the second detection frame subset group in the third dynamic cabinet image is less than 0.3, and the images corresponding to the second detection frame subset group are the at least one first article image corresponding to each second target dynamic cabinet image.
This solves the technical problem mentioned in the background art, namely that "with image cropping and similar-image stitching in data enhancement, the generated training image samples carry more feature information; however, the accuracy of identifying similar images is low". A factor leading to the low accuracy of identifying similar images is often the following: determining, only by computing cosine distances, the at least one image in the second dynamic cabinet image set that is most similar to the first dynamic cabinet image suffers from low accuracy. Based on this, the present disclosure introduces generating at least one similar image by using a similar image generation network, such that the generated at least one similar image is similar to the first dynamic cabinet image. Therefore, the image obtained by splicing the first dynamic cabinet image and the at least one similar image carries more information about the articles corresponding to the first dynamic cabinet image. In addition, by determining at least one first article image corresponding to each second target dynamic cabinet image, feature information possibly related to the articles corresponding to the first dynamic cabinet image can be further merged into the blended image, so that the feature information of the articles included in the obtained blended image is richer. On this basis, the dynamic cabinet image recognition model is trained with the blended images, so that the dynamic cabinet image recognition model obtained by subsequent training is more accurate.
Step 205, taking the third dynamic cabinet image as a training image sample, and training a dynamic cabinet image recognition model by using a machine learning model training method to obtain a trained dynamic cabinet image recognition model.
In some embodiments, the executing entity may train the dynamic cabinet image recognition model by using the third dynamic cabinet image as a training image sample and using a machine learning model training method, so as to obtain a trained dynamic cabinet image recognition model.
The above embodiments of the present disclosure have the following beneficial effects: the dynamic cabinet image recognition model training method of some embodiments of the present disclosure can generate a dynamic cabinet image recognition model with a more accurate recognition effect. Specifically, the reasons why a more accurate dynamic cabinet image recognition model cannot otherwise be generated are as follows: firstly, the training image samples generated by data enhancement lack diversity and cannot provide more feature information for subsequent training of the target detection model, so the prediction accuracy of the subsequently trained target detection model is low. Second, the feature extraction network that extracts features from the first dynamic cabinet image often requires labels for the first dynamic cabinet image. In practice, a large number of first dynamic cabinet images are often needed to train a target detection model. Labeling a large number of dynamic cabinet images wastes a large amount of time, and mislabeling of the dynamic cabinet images also occurs. Based on this, the dynamic cabinet image recognition model training method of some embodiments of the present disclosure may first acquire a first dynamic cabinet image. Then, the first dynamic cabinet image is input to a first image feature extraction network trained in advance to obtain a first image feature vector. Here, the obtained first image feature vector is used for subsequently generating a third dynamic cabinet image. Similarly, at least one first detection frame corresponding to the first dynamic cabinet image is determined for subsequently generating the third dynamic cabinet image. Furthermore, a third dynamic cabinet image including more image feature information may be generated according to the first dynamic cabinet image, the first image feature vector, the second dynamic cabinet image set stored in advance, the first article image set stored in advance, and the at least one first detection frame. Finally, the third dynamic cabinet image is taken as a training image sample, and the dynamic cabinet image recognition model is trained by a machine learning model training method, so that the obtained trained dynamic cabinet image recognition model is more accurate for subsequent dynamic cabinet image recognition.
With further reference to FIG. 3, a flow 300 of further embodiments of a dynamic cabinet image recognition model training method according to the present disclosure is shown. The dynamic cabinet image recognition model training method comprises the following steps:
step 301, a first dynamic cabinet image is obtained.
Step 302, inputting the first dynamic cabinet image to a first image feature extraction network trained in advance to obtain a first image feature vector.
Step 303, determining at least one first detection frame corresponding to the first dynamic cabinet image.
Step 304, inputting each second dynamic cabinet image in the second dynamic cabinet image set to the pre-trained first image feature extraction network to output a second image feature vector, so as to obtain a second image feature vector set.
In some embodiments, an executing subject (e.g., the electronic device shown in fig. 1) may input each of the second dynamic cabinet images in the second dynamic cabinet image set to the pre-trained first image feature extraction network to output a second image feature vector, resulting in a second image feature vector set.
Step 305, determining a first cosine distance between each second image feature vector in the second image feature vector set and the first image feature vector to obtain a first cosine distance set.
In some embodiments, the executing entity may determine a first cosine distance between each second image feature vector in the second image feature vector set and the first image feature vector, to obtain a first cosine distance set.
Step 306, a first number of first cosine distances meeting the predetermined condition are screened out from the first cosine distance set to serve as first target cosine distances, and a first target cosine distance set is obtained.
In some embodiments, the execution subject may select a first number of first cosine distances satisfying a predetermined condition from the first cosine distance set as first target cosine distances to obtain a first target cosine distance set.
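A small sketch of steps 305 and 306 follows, assuming that "meeting the predetermined condition" means having the smallest cosine distances, i.e., the second dynamic cabinet images most similar to the first one; this reading, and the names used, are assumptions.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def screen_first_targets(second_vectors, first_vector, first_number):
    # step 305: the first cosine distance set;
    # step 306: keep the first_number smallest as first target cosine distances
    distances = np.array([cosine_distance(v, first_vector) for v in second_vectors])
    kept_indices = np.argsort(distances)[:first_number]
    return kept_indices, distances[kept_indices]
```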
Step 307, a second dynamic cabinet image subset corresponding to the first target cosine distance set is selected from the second dynamic cabinet image set.
In some embodiments, the execution subject may screen out, from the second dynamic cabinet image set, a second dynamic cabinet image subset corresponding to the first target cosine distance set.
Step 308, stitching the first dynamic cabinet image and each second dynamic cabinet image in the second dynamic cabinet image subset in a predetermined manner to obtain a first stitched image.
In some embodiments, the executing entity may stitch the first dynamic cabinet image and each second dynamic cabinet image in the second dynamic cabinet image subset according to a predetermined manner to obtain a first stitched image.
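The disclosure does not spell out the predetermined stitching manner; the sketch below assumes simple horizontal concatenation after height alignment, using OpenCV.

```python
import cv2
import numpy as np

def stitch_images(first_image, second_images):
    # resize every second dynamic cabinet image to the height of the first
    # image, then concatenate side by side into the first stitched image
    h = first_image.shape[0]
    panels = [first_image]
    for img in second_images:
        w = max(1, round(img.shape[1] * h / img.shape[0]))
        panels.append(cv2.resize(img, (w, h)))
    return np.hstack(panels)
```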
Step 309, generating the third dynamic cabinet image according to the first stitched image, the first item image set, and the at least one first detection frame.
In some embodiments, the execution subject may generate the third dynamic cabinet image according to the first stitched image, the first item image set, and the at least one first detection frame.
In some optional implementations of some embodiments, the generating the third dynamic cabinet image according to the first stitched image, the first item image set, and the at least one first detection frame may include:
a first step of executing, for each first detection frame of the at least one first detection frame, the following first image generation step:
the first substep, confirm the correspondent second article picture of the above-mentioned first detection frame.
And a second substep of determining a third image feature vector corresponding to the second item image.
As an example, the executing entity may input the second image to a first image feature extraction network trained in advance, so as to obtain a second image feature vector.
And a third substep, determining a fourth image feature vector corresponding to each first article image in the first article image set to obtain a fourth image feature vector set.
As an example, the executing entity may input each first item image in the first item image set to a first image feature extraction network trained in advance to output a fourth image feature vector, resulting in a fourth image feature vector set.
And a fourth substep of determining a second cosine distance between each fourth image feature vector in the fourth image feature vector set and the third image feature vector to obtain a second cosine distance set.
And a fifth substep of screening out, from the second cosine distance set, a second number of second cosine distances satisfying the predetermined condition as second target cosine distances, to obtain a second target cosine distance set. The second number may be preset. The predetermined condition may be that a second cosine distance lies at a target quantile position by magnitude within the second cosine distance set. For example, the predetermined condition may select the largest second cosine distance in the second cosine distance set, the second cosine distance at the 3/4 quantile, and the second cosine distance at the 1/2 quantile.
And a sixth substep of determining a first article image subset corresponding to the second target cosine distance set.
And secondly, generating the third dynamic cabinet image according to the obtained first article image subset group, the at least one first detection frame and the first spliced image.
As an example, the executing body may first determine, in the first item image subset group, the first item images whose item information is the same as that of the at least one first detection frame, to obtain at least one first item image. Finally, the at least one first item image and the first stitched image are spliced to obtain the third dynamic cabinet image.
Optionally, the generating the third dynamic cabinet image according to the obtained first item image subset group, the at least one first detection frame, and the first stitched image may include:
firstly, screening a target number of first article images from the first article image subset group to obtain a screened first article image set. Wherein the target number may be preset. For example, it may be 12.
And secondly, performing data enhancement on each first article image in the screened first article image set to obtain each data-enhanced first article image.
And thirdly, blending each data-enhanced first item image into the first stitched image to obtain a blended image, wherein the intersection-over-union (IoU) between the detection frame corresponding to each data-enhanced first item image in the blended image and the at least one first detection frame is smaller than a target threshold value. The target threshold may be preset. For example, it may be 0.2.
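A sketch of the intersection-over-union check used above; boxes are assumed to be (x1, y1, x2, y2) pixel coordinates, and the 0.2 threshold is the illustrative value from the text.

```python
def iou(box_a, box_b):
    # intersection-over-union of two (x1, y1, x2, y2) boxes
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def placement_ok(new_box, first_detection_frames, target_threshold=0.2):
    # a pasted item image is accepted only if it overlaps every first
    # detection frame by less than the target threshold
    return all(iou(new_box, frame) < target_threshold for frame in first_detection_frames)
```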
Optionally, the blending the first article image after the data enhancement into the first stitched image to obtain a blended image includes:
for each of the data-enhanced first item images, the executing entity may perform a merging step of:
the first sub-step is to perform first gaussian blurring processing on the sub-image in the first range in the first article image after the data enhancement, and perform second gaussian blurring processing on the sub-image in the second range in the first article image to obtain a processed first article image. Wherein, the first Gaussian blur processing and the second Gaussian blur processing can be realized by adopting a larger convolution kernel.
And a second substep of performing image fusion on the processed first article image and the first spliced image to obtain a second fused image.
As an example, the executing body may first multiply each pixel in the processed first item image within a predetermined boundary range by the parameter a to obtain a first multiplied image. Then, the execution subject may multiply each pixel in the first stitched image within a predetermined boundary range by the parameter 1-a to obtain a second multiplied image. And finally, performing pixel addition fusion in a corresponding preset boundary range on the first multiplied image and the second multiplied image to obtain the second fused image.
And a third substep of performing bilinear interpolation processing on the sub-image in a third range of the second fused image, and taking the processed image as the blended image.
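The three merging substeps can be sketched as follows with OpenCV. The two blur ranges (top and bottom quarters here), the kernel sizes, the blending parameter a, and the choice of the whole paste region as the third range are all illustrative assumptions.

```python
import cv2
import numpy as np

def merge_item(stitched, item, x, y, a=0.7):
    h, w = item.shape[:2]
    item = item.copy()
    # first substep: two Gaussian blurs with relatively large kernels
    item[: h // 4] = cv2.GaussianBlur(item[: h // 4], (21, 21), 0)
    item[3 * h // 4 :] = cv2.GaussianBlur(item[3 * h // 4 :], (31, 31), 0)
    # second substep: alpha-blend item * a with the stitched image * (1 - a)
    roi = stitched[y : y + h, x : x + w].astype(np.float32)
    fused = a * item.astype(np.float32) + (1.0 - a) * roi
    # third substep: bilinear down/up resampling to smooth the fused sub-image
    small = cv2.resize(fused, (w // 2, h // 2), interpolation=cv2.INTER_LINEAR)
    fused = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
    out = stitched.copy()
    out[y : y + h, x : x + w] = fused.astype(stitched.dtype)
    return out
```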
Step 310, taking the third dynamic cabinet image as a training image sample, and training a dynamic cabinet image recognition model by using a machine learning model training method to obtain the trained dynamic cabinet image recognition model.
In some embodiments, the specific implementation of steps 301-303 and 310 and the technical effect thereof can refer to steps 201-203 and 205 in the embodiment corresponding to fig. 2, and are not described herein again.
As can be seen from fig. 3, the specific steps of generating the third dynamic cabinet image are more highlighted by the flow 300 of the dynamic cabinet image recognition model training method in some embodiments corresponding to fig. 3 than by the description of some embodiments corresponding to fig. 2. Therefore, the solutions described in the embodiments may generate a third dynamic cabinet image including more image feature information by using the first dynamic cabinet image, the first image feature vector, the second dynamic cabinet image set, the first item image set, and the at least one first detection frame, so that a dynamic cabinet image recognition model trained subsequently using the third dynamic cabinet image as a training sample is more accurate.
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a dynamic cabinet image recognition model training apparatus, which correspond to the method embodiments shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 4, a dynamic cabinet image recognition model training apparatus 400 includes: an acquisition unit 401, an input unit 402, a determination unit 403, a generation unit 404, and a training unit 405. The acquiring unit 401 is configured to acquire a first dynamic cabinet image; the input unit 402 is configured to input the first dynamic cabinet image into a first image feature extraction network trained in advance to obtain a first image feature vector; the determining unit 403 is configured to determine at least one first detection frame corresponding to the first dynamic cabinet image; the generating unit 404 is configured to generate a third dynamic cabinet image according to the first dynamic cabinet image, the first image feature vector, a prestored second dynamic cabinet image set, a prestored first article image set, and the at least one first detection frame, wherein the second dynamic cabinet image set has a corresponding second detection frame set group, and the image corresponding to each second detection frame in the second detection frame set group is the first article image set; the training unit 405 is configured to train a dynamic cabinet image recognition model by using the third dynamic cabinet image as a training image sample and using a machine learning model training method, to obtain a trained dynamic cabinet image recognition model.
It will be understood that the elements described in the apparatus 400 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 400 and the units included therein, and will not be described herein again.
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., the electronic device of fig. 1) 500 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a first dynamic cabinet image; inputting the first dynamic cabinet image into a first image feature extraction network trained in advance to obtain a first image feature vector; determining at least one first detection frame corresponding to the first dynamic cabinet image; generating a third dynamic cabinet image according to the first dynamic cabinet image, the first image feature vector, a prestored second dynamic cabinet image set, a prestored first article image set and the at least one first detection frame, wherein the second dynamic cabinet image set has a corresponding second detection frame set group, and an image corresponding to each second detection frame in the second detection frame set group is the first article image set; and taking the third dynamic cabinet image as a training image sample, and training a dynamic cabinet image recognition model by using a machine learning model training method to obtain the trained dynamic cabinet image recognition model.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, an input unit, a determination unit, a generation unit, and a training unit. Where the names of the units do not in some cases constitute a limitation of the unit itself, for example, the acquisition unit may also be described as a "unit acquiring a first dynamic cabinet image".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only of preferred embodiments of the present disclosure and an illustration of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combinations of the above features, and also covers other technical solutions formed by arbitrarily combining the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure.

Claims (8)

1. A dynamic cabinet image recognition model training method comprises the following steps:
acquiring a first dynamic cabinet image;
inputting the first dynamic cabinet image to a first image feature extraction network trained in advance to obtain a first image feature vector;
determining at least one first detection frame corresponding to the first dynamic cabinet image;
inputting each second dynamic cabinet image in a second dynamic cabinet image set stored in advance into the first image feature extraction network trained in advance to output a second image feature vector, to obtain a second image feature vector set, wherein the second dynamic cabinet image set has a corresponding second detection frame set group, and the image corresponding to each second detection frame in the second detection frame set group is a first article image set stored in advance;
determining a first cosine distance between each second image feature vector in the second image feature vector set and the first image feature vector to obtain a first cosine distance set;
screening, from the first cosine distance set, a first number of first cosine distances meeting a preset condition as first target cosine distances, to obtain a first target cosine distance set;
screening out a second dynamic cabinet image subset corresponding to the first target cosine distance set from the second dynamic cabinet image set;
stitching the first dynamic cabinet image and each second dynamic cabinet image in the second dynamic cabinet image subset in a predetermined manner to obtain a first stitched image;
generating a third dynamic cabinet image according to the first stitched image, the first article image set and the at least one first detection frame;
and taking the third dynamic cabinet image as a training image sample and training a dynamic cabinet image recognition model by using a machine learning model training method, to obtain a trained dynamic cabinet image recognition model.
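For orientation only — the following sketches are illustrative and form no part of the claims — the cosine-distance screening and stitching of claim 1 could look roughly as below. The helper names, the use of the k smallest distances as the preset condition, and the horizontal stitching layout are assumptions, not details taken from the patent:

import numpy as np
import cv2  # assumed available for resizing

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # 1 - cosine similarity between two image feature vectors
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def screen_second_images(first_vec, second_vecs, second_images, k=3):
    # Keep the k second dynamic cabinet images whose feature vectors are
    # closest to the first image feature vector (assumed preset condition).
    dists = [cosine_distance(first_vec, v) for v in second_vecs]
    keep = np.argsort(dists)[:k]
    return [second_images[i] for i in keep]

def stitch(first_image, second_subset):
    # Stitch images side by side after resizing to a common height
    # (one possible "predetermined manner").
    images = [first_image, *second_subset]
    h = min(img.shape[0] for img in images)
    resized = [cv2.resize(img, (int(img.shape[1] * h / img.shape[0]), h))
               for img in images]
    return np.hstack(resized)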
2. The method of claim 1, wherein the generating a third dynamic cabinet image according to the first stitched image, the first article image set and the at least one first detection frame comprises:
for each first detection frame of the at least one first detection frame, executing a first image generation step:
determining a second article image corresponding to the first detection frame;
determining a third image feature vector corresponding to the second article image;
determining a fourth image feature vector corresponding to each first article image in the first article image set to obtain a fourth image feature vector set;
determining a second cosine distance between each fourth image feature vector in the fourth image feature vector set and the third image feature vector to obtain a second cosine distance set;
screening a second number of second cosine distances meeting the preset condition from the second cosine distance set to serve as second target cosine distances to obtain a second target cosine distance set;
determining a first article image subset corresponding to the second target cosine distance set;
and generating the third dynamic cabinet image according to the obtained first article image subset group, the at least one first detection frame and the first stitched image.
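Illustratively again, the first image generation step of claim 2 crops each first detection frame, embeds the crop to obtain the third image feature vector, and keeps the first article images whose fourth image feature vectors are nearest in cosine distance; the crop format and the top-k rule below are assumptions:

import numpy as np

def crop_frame(image, frame):
    # Extract the second article image inside a detection frame (x1, y1, x2, y2).
    x1, y1, x2, y2 = frame
    return image[y1:y2, x1:x2]

def nearest_article_images(third_vec, fourth_vecs, article_images, k=5):
    # fourth_vecs: one fourth image feature vector per first article image,
    # stacked as an (n, d) matrix; third_vec: the (d,) vector of the crop.
    sims = fourth_vecs @ third_vec / (
        np.linalg.norm(fourth_vecs, axis=1) * np.linalg.norm(third_vec))
    keep = np.argsort(1.0 - sims)[:k]  # k smallest second cosine distances
    return [article_images[i] for i in keep]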
3. The method of claim 2, wherein the generating the third dynamic cabinet image according to the obtained first article image subset group, the at least one first detection frame and the first stitched image comprises:
screening a target number of first article images from the first article image subset group to obtain a screened first article image set;
performing data enhancement on each first article image in the screened first article image set to obtain each data-enhanced first article image;
merging each data-enhanced first article image into the first stitched image to obtain a merged image, wherein intersection-over-union (IoU) information between the detection frame corresponding to each data-enhanced first article image in the merged image and the at least one first detection frame is smaller than a target threshold value;
and determining the merged image as the third dynamic cabinet image.
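A minimal sketch of the overlap constraint in claim 3, reading the intersection-over-union information as standard IoU and assuming pasted positions are drawn at random until one satisfies the target threshold; the retry loop and threshold value are illustrative:

import random

def iou(a, b):
    # Intersection over union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def sample_placement(item_wh, canvas_wh, first_frames, thr=0.1, tries=50):
    # Place an enhanced first article image so that its frame overlaps
    # every first detection frame by less than the target threshold.
    w, h = item_wh
    for _ in range(tries):
        x = random.randint(0, canvas_wh[0] - w)
        y = random.randint(0, canvas_wh[1] - h)
        frame = (x, y, x + w, y + h)
        if all(iou(frame, f) < thr for f in first_frames):
            return frame
    return None  # no admissible placement found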
4. The method of claim 3, wherein the merging each data-enhanced first article image into the first stitched image to obtain a merged image comprises:
for each data-enhanced first article image of the respective data-enhanced first article images, performing a merging step:
performing first Gaussian blur processing on a sub-image within a first range of the data-enhanced first article image, and performing second Gaussian blur processing on a sub-image within a second range of the first article image, to obtain a processed first article image;
performing image fusion on the processed first article image and the first stitched image to obtain a second fused image;
and performing bilinear interpolation processing on a sub-image within a third range of the second fused image, the resulting processed image serving as the merged image.
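The merging step of claim 4 might be realized as follows, reading the first range as an edge band, the second range as the interior of the pasted article image, and the third range as a seam region around the paste; the kernel sizes, band width and the down/up-sampling used for the bilinear interpolation are all illustrative choices:

import cv2
import numpy as np

def merge_article(article, canvas, frame, band=4, pad=6):
    # article must already be resized to the frame (x1, y1, x2, y2).
    x1, y1, x2, y2 = frame
    # First Gaussian blur: stronger, applied to the edge band.
    edge = cv2.GaussianBlur(article, (9, 9), 0)
    # Second Gaussian blur: lighter, applied to the interior.
    interior = cv2.GaussianBlur(article, (3, 3), 0)
    mask = np.zeros(article.shape[:2], dtype=np.float32)
    cv2.rectangle(mask, (band, band),
                  (article.shape[1] - band - 1, article.shape[0] - band - 1),
                  1.0, -1)
    mask = mask[..., None]
    processed = interior * mask + edge * (1.0 - mask)
    # Image fusion: paste the processed article into the stitched canvas.
    canvas[y1:y2, x1:x2] = processed.astype(canvas.dtype)
    # Bilinear interpolation over a seam region around the paste.
    ry1, ry2 = max(0, y1 - pad), min(canvas.shape[0], y2 + pad)
    rx1, rx2 = max(0, x1 - pad), min(canvas.shape[1], x2 + pad)
    region = np.ascontiguousarray(canvas[ry1:ry2, rx1:rx2])
    small = cv2.resize(region, None, fx=0.5, fy=0.5,
                       interpolation=cv2.INTER_LINEAR)
    canvas[ry1:ry2, rx1:rx2] = cv2.resize(
        small, (rx2 - rx1, ry2 - ry1), interpolation=cv2.INTER_LINEAR)
    return canvas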
5. The method according to any one of claims 1-4, wherein the first image feature extraction network is trained by:
acquiring a training image sample;
performing data enhancement on the training image sample multiple times to obtain a plurality of data-enhanced image samples;
inputting the plurality of data-enhanced image samples into a first initial image feature extraction network to output a plurality of first feature vectors;
generating a first loss value according to the plurality of first feature vectors and a preset loss function;
inputting the plurality of first feature vectors into a second initial image feature extraction network to obtain a plurality of second feature vectors;
generating a second loss value according to the plurality of second feature vectors and the loss function;
inputting the plurality of second feature vectors into a third initial image feature extraction network to obtain a plurality of third feature vectors;
generating a third loss value according to the plurality of third feature vectors and the loss function;
generating a fourth loss value according to the first loss value, the second loss value and the third loss value;
in response to determining that the fourth loss value is greater than a target value, training respective parameters in the first initial image feature extraction network, the second initial image feature extraction network, and the third initial image feature extraction network.
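For claim 5, a schematic training loop with three cascaded feature extraction networks, written in PyTorch; the tiny linear networks, the variance-style loss and the target value are placeholders for the patent's unspecified architectures and preset loss function:

import torch
import torch.nn as nn

# Placeholder cascaded networks; the real networks are unspecified in the claim.
net1 = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
net2 = nn.Linear(256, 128)
net3 = nn.Linear(128, 64)
opt = torch.optim.SGD(
    [*net1.parameters(), *net2.parameters(), *net3.parameters()], lr=1e-3)

def preset_loss(vecs):
    # Placeholder loss: pull the feature vectors of the augmented views
    # of one training image sample toward their mean.
    return ((vecs - vecs.mean(dim=0, keepdim=True)) ** 2).mean()

def train_step(augmented_views, target_value=0.01):
    v1 = net1(augmented_views)  # plurality of first feature vectors
    v2 = net2(v1)               # plurality of second feature vectors
    v3 = net3(v2)               # plurality of third feature vectors
    # First, second and third loss values summed into the fourth loss value.
    loss = preset_loss(v1) + preset_loss(v2) + preset_loss(v3)
    if loss.item() > target_value:  # train only while above the target value
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()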
6. A dynamic cabinet image recognition model training device, comprising:
an acquisition unit configured to acquire a first dynamic cabinet image;
an input unit configured to input the first dynamic cabinet image into a pre-trained first image feature extraction network to obtain a first image feature vector;
a first determining unit configured to determine at least one first detection frame corresponding to the first dynamic cabinet image;
the input unit further configured to input each second dynamic cabinet image in a pre-stored second dynamic cabinet image set into the pre-trained first image feature extraction network to output a second image feature vector, so as to obtain a second image feature vector set, wherein the second dynamic cabinet image set has a corresponding group of second detection frame sets, and the images corresponding to the second detection frames in the group of second detection frame sets constitute a pre-stored first article image set;
a second determining unit configured to determine a first cosine distance between each second image feature vector in the second image feature vector set and the first image feature vector, resulting in a first cosine distance set;
a first screening unit configured to screen out, from the first cosine distance set, a first number of first cosine distances meeting a preset condition as first target cosine distances, to obtain a first target cosine distance set;
a second screening unit configured to screen out, from the second dynamic cabinet image set, a second dynamic cabinet image subset corresponding to the first target cosine distance set;
a stitching unit configured to stitch the first dynamic cabinet image and each second dynamic cabinet image in the second dynamic cabinet image subset in a predetermined manner to obtain a first stitched image;
a generating unit configured to generate a third dynamic cabinet image according to the first stitched image, the first article image set and the at least one first detection frame;
and a training unit configured to take the third dynamic cabinet image as a training image sample and train a dynamic cabinet image recognition model by using a machine learning model training method, to obtain a trained dynamic cabinet image recognition model.
7. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
8. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-5.
CN202210115424.7A 2022-02-07 2022-02-07 Dynamic cabinet image recognition model training method and device, electronic equipment and medium Active CN114155366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210115424.7A CN114155366B (en) 2022-02-07 2022-02-07 Dynamic cabinet image recognition model training method and device, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN114155366A CN114155366A (en) 2022-03-08
CN114155366B (en) 2022-05-20

Family

ID=80449960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210115424.7A Active CN114155366B (en) 2022-02-07 2022-02-07 Dynamic cabinet image recognition model training method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114155366B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222757A (en) * 2019-05-31 2019-09-10 华北电力大学(保定) Based on insulator image pattern extending method, the system for generating confrontation network
CN111079785A (en) * 2019-11-11 2020-04-28 深圳云天励飞技术有限公司 Image identification method and device and terminal equipment
EP3839824A2 (en) * 2020-06-08 2021-06-23 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating image, device, medium and program
CN113657406A (en) * 2021-07-13 2021-11-16 北京旷视科技有限公司 Model training and feature extraction method and device, electronic equipment and storage medium
CN113724128A (en) * 2020-05-25 2021-11-30 Tcl科技集团股份有限公司 Method for expanding training sample
CN113808061A (en) * 2019-04-28 2021-12-17 深圳市商汤科技有限公司 Image processing method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of a Few-Shot Recognition Algorithm Based on Metric Learning and Data Augmentation; Ma Bing; China Master's Theses Full-text Database, Information Science and Technology; 2022-01-15 (No. 01); full text *


Similar Documents

Publication Publication Date Title
CN108710885B (en) Target object detection method and device
CN110675409A (en) Image processing method and device, electronic equipment and storage medium
CN111507408B (en) Image processing method and device, electronic equipment and storage medium
CN110310299B (en) Method and apparatus for training optical flow network, and method and apparatus for processing image
CN112381717A (en) Image processing method, model training method, device, medium, and apparatus
CN111757100B (en) Method and device for determining camera motion variation, electronic equipment and medium
CN112330788A (en) Image processing method, image processing device, readable medium and electronic equipment
CN113688928B (en) Image matching method and device, electronic equipment and computer readable medium
CN114399814A (en) Deep learning-based obstruction removal and three-dimensional reconstruction method
CN112116700B (en) Monocular view-based three-dimensional reconstruction method and device
CN117894038A (en) Method and device for generating object gesture in image
CN117114306A (en) Information generation method, apparatus, electronic device and computer readable medium
CN114155366B (en) Dynamic cabinet image recognition model training method and device, electronic equipment and medium
GB2623399A (en) System, devices and/or processes for image anti-aliasing
CN109635926B (en) Attention feature acquisition method and device for neural network and storage medium
CN111369475A (en) Method and apparatus for processing video
CN116309137A (en) Multi-view image deblurring method, device and system and electronic medium
CN111784726A (en) Image matting method and device
CN115757933A (en) Recommendation information generation method, device, equipment, medium and program product
CN112381184B (en) Image detection method, image detection device, electronic equipment and computer readable medium
CN115170395A (en) Panoramic image stitching method, panoramic image stitching device, electronic equipment, panoramic image stitching medium and program product
CN114419298A (en) Virtual object generation method, device, equipment and storage medium
CN114399590A (en) Face occlusion removal and three-dimensional model generation method based on face analysis graph
CN113255812A (en) Video frame detection method and device and electronic equipment
CN115841151B (en) Model training method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240326

Address after: No.13, 3rd floor, building 1, No.1, Tidu street, Qingyang District, Chengdu, Sichuan 610000

Patentee after: Chengdu yishenrui Technology Co.,Ltd.

Country or region after: China

Address before: 100102 room 076, no.1-302, 3 / F, commercial building, No.9 Wangjing street, Chaoyang District, Beijing

Patentee before: BEIJING MISSFRESH E-COMMERCE Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20240407

Address after: 430000, Room 02, 9th Floor, Building 28, Shenzhou Digital Wuhan Science and Technology Park, No. 7 Financial Port 1st Road, Wuhan Donghu New Technology Development Zone, Wuhan, Hubei Province (Free Trade Zone Wuhan Area)

Patentee after: WUHAN QINGGOUYUN TECHNOLOGY CO.,LTD.

Country or region after: China

Address before: No.13, 3rd floor, building 1, No.1, Tidu street, Qingyang District, Chengdu, Sichuan 610000

Patentee before: Chengdu yishenrui Technology Co.,Ltd.

Country or region before: China
