CN116912518B - Image multi-scale feature processing method and device - Google Patents


Info

Publication number
CN116912518B
Authority
CN
China
Prior art keywords
image
feature vector
vector matrix
scale
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311168251.6A
Other languages
Chinese (zh)
Other versions
CN116912518A (en)
Inventor
蒋召
周靖宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202311168251.6A
Publication of CN116912518A
Application granted
Publication of CN116912518B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the technical field of image processing, and provides a method and an apparatus for multi-scale feature processing of an image. The method includes the following steps: processing the feature map of the image to obtain a first scale feature vector matrix of the image; processing the first fused feature vector matrix of the image to obtain a second scale feature vector matrix of the image; processing the second fused feature vector matrix of the image to obtain a third scale feature vector matrix of the image; determining a multi-scale feature vector matrix of the image according to the first, second and third scale feature vector matrices of the image; and locating the target object from the at least one re-identified object based on the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-identified object. The present disclosure obtains the target object based on a multi-scale feature vector matrix, and the target object obtained in this way is more accurate.

Description

Image multi-scale feature processing method and device
Technical Field
The disclosure relates to the technical field of image processing, and in particular to a method and an apparatus for multi-scale feature processing of an image.
Background
With the rapid development of internet technology, re-identification tasks based on image models make it possible to locate a target object within large amounts of image data. Conventional re-identification algorithms, however, do not account for scale-dependent feature changes during recognition: when a human observes a specific object, the whole object is observed first and then its local details, and those local details have high resolution and are therefore helpful for re-identification. To improve precision, current re-identification algorithms introduce attention mechanisms into the recognition network, but attention focused on only a single scale limits the useful features the network can learn. As a result, the accuracy of the re-identification results is low, actual needs cannot be met, and user experience suffers.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method, an apparatus, an electronic device and a computer-readable storage medium for multi-scale feature processing of an image, so as to solve the technical problem in the prior art that the accuracy of re-identification task results is not high, actual needs cannot be satisfied, and user experience is affected.
In a first aspect of embodiments of the present disclosure, a method for processing multi-scale features of an image is provided, the method comprising: acquiring a feature map of an image, wherein the image comprises an image of a specific object and at least one image of a re-identified object; processing the feature map of the image to obtain a first scale feature vector matrix of the image; processing a first fused feature vector matrix of the image to obtain a second scale feature vector matrix of the image, wherein the first fused feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and a feature map of the image; processing a second fused feature vector matrix of the image to obtain a third-scale feature vector matrix of the image, wherein the second fused feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and the first fused feature vector matrix of the image; determining a multi-scale feature vector matrix of the image according to the first-scale feature vector matrix of the image, the second-scale feature vector matrix of the image and the third-scale feature vector matrix of the image; and acquiring a multi-scale feature vector matrix of the image of the specific object and a multi-scale feature vector matrix of the image of the at least one re-identification object in a cyclic manner, and positioning the target object from the at least one re-identification object according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-identification object.
In a second aspect of embodiments of the present disclosure, there is provided an apparatus for multi-scale feature processing of an image, the apparatus comprising: an acquisition module configured to acquire a feature map of an image, where the image includes an image of a specific object and at least one image of a re-identified object; a first processing module configured to process the feature map of the image to obtain a first scale feature vector matrix of the image; a second processing module configured to process a first fused feature vector matrix of the image to obtain a second scale feature vector matrix of the image, the first fused feature vector matrix being determined based on the feature vector matrix output by the bottleneck layer and the feature map of the image; a third processing module configured to process a second fused feature vector matrix of the image to obtain a third scale feature vector matrix of the image, the second fused feature vector matrix being determined based on the feature vector matrix output by the bottleneck layer and the first fused feature vector matrix of the image; a determining module configured to determine a multi-scale feature vector matrix of the image according to the first, second and third scale feature vector matrices of the image; and a circulation module configured to acquire, in a cyclic manner, a multi-scale feature vector matrix of the image of the specific object and a multi-scale feature vector matrix of the image of the at least one re-identified object, and to locate the target object from the at least one re-identified object according to those matrices.
In a third aspect of the disclosed embodiments, an electronic device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiment of the disclosure has the beneficial effects that: the embodiment of the disclosure can process the feature map of the image to obtain a first scale feature vector matrix of the image; processing the first fusion feature vector matrix of the image to obtain a second scale feature vector matrix of the image; processing the second fusion feature vector matrix of the image to obtain a third scale feature vector matrix of the image; determining a multi-scale feature vector matrix of the image according to the first-scale feature vector matrix of the image, the second-scale feature vector matrix of the image and the third-scale feature vector matrix of the image; the target object is located from the at least one re-identified object based on the multi-scale feature vector matrix of the image of the particular object and the multi-scale feature vector matrix of the image of the at least one re-identified object, such that the target object can be acquired based on the multi-scale feature vector matrix, the target object acquired in this way being more accurate.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of a method for multi-scale feature processing of an image provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of steps provided by an embodiment of the present disclosure for locating a target object from at least one re-identified object;
FIG. 4 is a schematic structural view of an image multi-scale feature processing apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
It should be noted that, the user information (including, but not limited to, terminal device information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
The user can interact with the server 105 through the network 104 using the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or transmit image data, or the like. The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices with display screens including, but not limited to, smartphones, tablet computers, portable computers, desktop computers, and the like.
The server 105 may be a server providing various services. For example, the server 105 may acquire an image of a specific object and at least one image of a re-identified object from the first terminal device 101 (or the second terminal device 102 or the third terminal device 103), and acquire a feature map of the image, the image including the image of the specific object and the at least one image of the re-identified object; process the feature map of the image to obtain a first scale feature vector matrix of the image; process a first fused feature vector matrix of the image to obtain a second scale feature vector matrix of the image, the first fused feature vector matrix being determined based on the feature vector matrix output by the bottleneck layer and the feature map of the image; process a second fused feature vector matrix of the image to obtain a third scale feature vector matrix of the image, the second fused feature vector matrix being determined based on the feature vector matrix output by the bottleneck layer and the first fused feature vector matrix of the image; determine a multi-scale feature vector matrix of the image according to the first, second and third scale feature vector matrices of the image; and, in a cyclic manner, acquire a multi-scale feature vector matrix of the image of the specific object and a multi-scale feature vector matrix of the image of the at least one re-identified object, and locate the target object from the at least one re-identified object according to those matrices. The target object can thus be acquired based on the multi-scale feature vector matrix, and the target object acquired in this manner is more accurate.
In some embodiments, the multi-scale feature processing method of an image provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly the multi-scale feature processing apparatus is generally disposed in the server 105. In other embodiments, some terminal devices may have functionality similar to a server's and perform the method themselves; the method is therefore not limited to execution on the server side.
Methods and apparatus for multi-scale feature processing of images according to embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Fig. 2 is a flow chart of a method for processing multi-scale features of an image according to an embodiment of the present disclosure. The method provided by the embodiments of the present disclosure may be performed by any electronic device having computer processing capabilities, for example, the electronic device may be a server as shown in fig. 1.
As shown in fig. 2, the multi-scale feature processing method of the image includes steps S210 to S260.
In step S210, a feature map of an image is acquired, the image including an image of a specific object and an image of at least one re-identified object.
Step S220, processing the feature map of the image to obtain a first scale feature vector matrix of the image.
In step S230, the first fused feature vector matrix of the image is processed to obtain a second scale feature vector matrix of the image, where the first fused feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and the feature map of the image.
In step S240, a second fused feature vector matrix of the image is processed to obtain a third scale feature vector matrix of the image, where the second fused feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and the first fused feature vector matrix of the image.
In step S250, a multi-scale feature vector matrix of the image is determined from the first-scale feature vector matrix of the image, the second-scale feature vector matrix of the image, and the third-scale feature vector matrix of the image.
In step S260, a multi-scale feature vector matrix of the image of the specific object and a multi-scale feature vector matrix of the image of the at least one re-identified object are acquired in a cyclic manner, and the target object is located from the at least one re-identified object according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-identified object.
The method processes the feature map of the image to obtain a first scale feature vector matrix of the image, processes the first fused feature vector matrix of the image to obtain a second scale feature vector matrix, and processes the second fused feature vector matrix of the image to obtain a third scale feature vector matrix. It then determines a multi-scale feature vector matrix of the image from the first, second and third scale feature vector matrices, and locates the target object from the at least one re-identified object according to the multi-scale feature vector matrix of the image of the specific object and that of the image of the at least one re-identified object. The target object can therefore be obtained based on the multi-scale feature vector matrix, and the target object obtained in this way is more accurate.
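The flow above (pool, fuse by element-wise addition, pool again, then combine the scales) can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the bottleneck layer is stood in for by a single 1×1 convolution (the channel-mixing matrix `w`), and all shapes and weights are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def gap(fmap):
    # Global average pooling: (C, H, W) -> (C,) feature vector.
    return fmap.mean(axis=(1, 2))

def bottleneck_1x1(fmap, w):
    # Simplified stand-in for the bottleneck layer: a single 1x1
    # convolution (channel-mixing matrix), so the output keeps the input
    # shape and can be added element-wise to the feature map.
    C, H, W = fmap.shape
    return (w @ fmap.reshape(C, H * W)).reshape(C, H, W)

C, H, W = 8, 4, 4
fmap = rng.normal(size=(C, H, W))      # feature map from the backbone
w = rng.normal(size=(C, C)) * 0.1      # assumed 1x1-conv weights

scale1 = gap(fmap)                      # first scale feature vector
fused1 = bottleneck_1x1(fmap, w) + fmap # first fused feature map
scale2 = gap(fused1)                    # second scale feature vector
fused2 = bottleneck_1x1(fused1, w) + fused1  # second fused feature map
scale3 = gap(fused2)                    # third scale feature vector

multi_scale = scale1 + scale2 + scale3  # multi-scale feature vector
```

Each image (the specific object's and each re-identified object's) would be pushed through the same steps before comparison.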
In some embodiments of the present disclosure, the image of the specific object may be an image of a real object, while the images of the at least one re-identified object may include both images containing that real object and images that do not. For example, an object identical to a real object is located from a set of images based on an image of that real object: the image of the real object is the image of the specific object, and the image set contains the images of the at least one re-identified object. In a missing-child scenario, an image of the child (the image of the specific object) and each frame of multimedia data recorded in the area where the child went missing (the images of the at least one re-identified object) can be acquired, and the image recognition model of this application can analyze them in terms of multi-scale features, so that the child can be located quickly and accurately among the frames of the multimedia data.
In some embodiments of the present disclosure, the image recognition model may include a plurality of scale processing modules for extracting global features capable of representing the object structure from the feature map of the image, and for extracting local features capable of representing the object structure from different fused features. In this embodiment, each scale processing module includes a global average pooling layer for extracting the global and local features, where the global and local features represent features at different scales. The different fused features may be a first fused feature vector matrix obtained by adding the feature vector matrix output by the bottleneck layer to the feature map of the image, and a second fused feature vector matrix obtained by adding the feature vector matrix output by the bottleneck layer to the first fused feature vector matrix.
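A global average pooling layer of the kind used in each scale processing module simply averages each channel's spatial plane; a minimal NumPy sketch, with an assumed (batch, channels, height, width) layout:

```python
import numpy as np

# A feature map of shape (N, C, H, W): 2 images, 3 channels, 2x2 spatial.
fmap = np.arange(2 * 3 * 2 * 2, dtype=float).reshape(2, 3, 2, 2)

# Global average pooling collapses each channel's H x W plane to one
# number, yielding a C-dimensional feature vector per image.
pooled = fmap.mean(axis=(2, 3))  # shape (2, 3)
```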
In some embodiments of the present disclosure, before acquiring the feature map of the image, the method further includes: acquiring training samples, where each training sample includes a first historical image, a second historical image and a third historical image, the first and second historical images being the same and the third being different from the first; inputting a training sample into the image recognition model, and extracting the feature map of each historical image in the training sample through a backbone network in the image recognition model; inputting the feature map of each historical image to a first global average pooling layer and pooling it to obtain a first scale feature vector matrix of each historical image, inputting that matrix to a first fully connected layer for processing, and calculating the classification loss and triplet loss at the first scale from the output of the first fully connected layer; inputting the feature map of each historical image to the bottleneck layer for processing, adding the feature vector matrix of each image output by the bottleneck layer to the feature map of each historical image to obtain a first fused feature vector matrix of each historical image, and inputting the first fused feature vector matrix of each historical image to a second global average pooling layer and pooling it to obtain a second scale feature vector matrix of each historical image; inputting the second scale feature vector matrix of each historical image to a second fully connected layer for processing, and calculating the classification loss and triplet loss at the second scale from the output of the second fully connected layer; adding the feature vector matrix of each image output by the bottleneck layer to the first fused feature vector matrix of each historical image to obtain a second fused feature vector matrix of each image, and inputting the second fused feature vector matrix of each historical image to a third global average pooling layer and pooling it to obtain a third scale feature vector matrix of each historical image; inputting the third scale feature vector matrix of each historical image to a third fully connected layer for processing, and calculating the classification loss and triplet loss at the third scale from the output of the third fully connected layer; determining the total loss across the multiple scales from the classification and triplet losses at the first, second and third scales; and iterating cyclically, stopping training once the image recognition model converges. In this way, the parameters of the backbone network and the bottleneck layer can be updated by back-propagating the total loss, and training stops once these parameters are stable.
Based on the foregoing embodiments, the total loss at multiple scales is determined from the classification loss and the triplet loss at the first scale, those at the second scale, and those at the third scale. For example, the classification and triplet losses at the three scales can simply be summed to give the total multi-scale loss.
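A hedged sketch of how the per-scale losses might be combined, assuming softmax cross-entropy as the classification loss and a margin-based triplet loss as the "ternary" loss; the margin value 0.3 and all tensors here are illustrative assumptions, not values from the patent:

```python
import numpy as np

def cross_entropy(logits, label):
    # Classification loss for one sample: softmax cross-entropy.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Triplet loss: pull the anchor toward the positive (same identity,
    # like the first/second historical images) and push it away from the
    # negative (the third historical image), up to a margin.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# One (logits, features) set per scale; summed to the total loss.
rng = np.random.default_rng(1)
total = 0.0
for _ in range(3):  # first, second and third scales
    logits = rng.normal(size=5)                # assumed 5 classes
    anchor, pos, neg = rng.normal(size=(3, 8)) # assumed 8-dim features
    total += cross_entropy(logits, label=2) + triplet_loss(anchor, pos, neg)
```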
In some embodiments of the present disclosure, processing the feature map of the image to obtain a first scale feature vector matrix of the image includes: inputting the feature map of the image to a first global average pooling layer; and processing the feature map of the image through the first global average pooling layer to obtain the first scale feature vector matrix of the image. In this way, the initial global features of the image can be obtained.
In some embodiments of the present disclosure, the method further comprises: inputting the feature map of the image to the bottleneck layer, and processing it through the bottleneck layer to obtain the feature vector matrix output by the bottleneck layer; and adding the feature vector matrix output by the bottleneck layer to the feature map of the image to obtain the first fused feature vector matrix of the image. Local features of the image are enhanced in this way, so that the local features subsequently extracted from the first fused feature vector matrix are more robust and more discriminative.
In some embodiments of the present disclosure, the bottleneck layer includes a first convolution layer, a second convolution layer and a third convolution layer, where the first and third convolution layers have the same kernel size and the second convolution layer's kernel size differs from the third's. Processing the feature map of the image through the bottleneck layer to obtain the feature vector matrix output by the bottleneck layer includes the following steps: inputting the feature map of the image into the first convolution layer and convolving it to obtain the feature vector matrix output by the first convolution layer; inputting that matrix into the second convolution layer and convolving it to obtain the feature vector matrix output by the second convolution layer; and inputting that matrix into the third convolution layer and convolving it to obtain the feature vector matrix output by the third convolution layer. In this application, local features in the feature map may be enhanced by the three serially connected convolution layers in the bottleneck layer, whose kernels may be 1×1, 3×3 and 1×1, respectively.
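The three serially connected convolution layers can be sketched with a naive NumPy convolution. The channel widths and weights below are assumptions for illustration, and "same" padding is assumed so the bottleneck's output keeps the input's spatial size and can be added element-wise to the feature map:

```python
import numpy as np

def conv2d(x, w):
    # Naive "same"-padded 2-D convolution: x is (C_in, H, W),
    # w is (C_out, C_in, k, k) with odd k. Loops keep it readable.
    c_out, c_in, k, _ = w.shape
    _, H, W = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for dy in range(k):
                for dx in range(k):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + H, dx:dx + W]
    return out

def bottleneck(x, w1, w2, w3):
    # 1x1 -> 3x3 -> 1x1, matching the kernel sizes named in the text.
    return conv2d(conv2d(conv2d(x, w1), w2), w3)

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 4, 4))             # feature map (C, H, W)
w1 = rng.normal(size=(4, 8, 1, 1)) * 0.1   # 1x1: squeeze channels
w2 = rng.normal(size=(4, 4, 3, 3)) * 0.1   # 3x3: spatial mixing
w3 = rng.normal(size=(8, 4, 1, 1)) * 0.1   # 1x1: restore channels
fused = bottleneck(x, w1, w2, w3) + x      # first fused feature map
```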
In some embodiments of the present disclosure, processing the first fused feature vector matrix of the image to obtain a second scale feature vector matrix of the image includes: inputting the first fused feature vector matrix of the image to a second global average pooling layer, and pooling it through the second global average pooling layer to obtain the second scale feature vector matrix of the image. The second scale feature vector matrix obtained in this way better captures the local information of the image structure.
In some embodiments of the present disclosure, processing the second fused feature vector matrix of the image to obtain a third scale feature vector matrix of the image includes: inputting the second fused feature vector matrix of the image into a third global average pooling layer, and pooling it through the third global average pooling layer to obtain the third scale feature vector matrix of the image. In this embodiment, the feature vector matrix output by the bottleneck layer is added to the first fused feature vector matrix to obtain the second fused feature vector matrix, so that local features are further enhanced and robust, discriminative features can be extracted from the second fused feature vector matrix through the third global average pooling layer.
Based on the foregoing embodiment, the first scale feature vector matrix, the second scale feature vector matrix, and the third scale feature vector matrix of the image of the specific object are added to obtain the multi-scale feature vector matrix of the image of the specific object; likewise, the first, second, and third scale feature vector matrices of the image of the at least one re-identification object are added to obtain the multi-scale feature vector matrix of the image of the at least one re-identification object. The multi-scale feature vector matrix obtained in this way contains both global and local features of the specific object image or the re-identification image. Based on the similarity between the two, the target object can be quickly and accurately located from the at least one re-identification object. In this embodiment, the specific object is the same as the target object.
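The addition step above is a plain element-wise sum of the three scale vectors. A tiny sketch with illustrative three-element vectors (real vectors come from the pooling layers and are much longer):

```python
import numpy as np

# illustrative three-element scale vectors; real ones are produced by the pooling layers
v1 = np.array([0.2, 0.5, 0.1])   # first scale
v2 = np.array([0.1, 0.3, 0.4])   # second scale
v3 = np.array([0.3, 0.1, 0.2])   # third scale
multi_scale = v1 + v2 + v3       # element-wise addition of the three scales
print(multi_scale)  # [0.6 0.9 0.7]
```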
Fig. 3 is a flowchart illustrating steps for locating a target object from at least one re-identified object provided by an embodiment of the present disclosure.
As shown in fig. 3, in the above step S260, "positioning the target object from the at least one re-recognition object based on the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-recognition object" may be specifically implemented through step S310 and step S320.
In step S310, a similarity between the specific object and each re-identified object is determined according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-identified object.
In step S320, a target object is located from at least one re-recognition object according to the degree of similarity between the specific object and each re-recognition object.
According to the above method, the target object can be located from the at least one re-identification object according to the similarity between the specific object and each re-identification object. In this way, the target object can be located quickly and accurately, which meets the actual requirements of the application scenario and improves the experience of using the image recognition model.
In some embodiments, the similarity between the specific object and each re-identification object may be calculated from the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-identification object, using the Euclidean distance formula or the cosine similarity formula.
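Both similarity measures mentioned above can be sketched on flattened multi-scale vectors. The query/gallery vectors here are made-up toy values; the "locate" step simply ranks candidates by cosine similarity (equivalently, smallest Euclidean distance could be used).

```python
import numpy as np

def euclidean_dist(a, b):
    # straight-line distance between two multi-scale vectors
    return float(np.linalg.norm(a - b))

def cosine_sim(a, b):
    # cosine of the angle between two multi-scale vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([1.0, 0.0, 1.0])       # multi-scale vector of the specific object (toy values)
gallery = [np.array([0.9, 0.1, 1.1]),   # re-identification candidates (toy values)
           np.array([-1.0, 2.0, 0.0])]
# locate the target: the candidate with the highest cosine similarity to the query
best = max(range(len(gallery)), key=lambda i: cosine_sim(query, gallery[i]))
print(best)  # 0
```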
The following are device embodiments of the present disclosure that may be used to perform the method embodiments of the present disclosure. The image multi-scale feature processing apparatus described below and the image multi-scale feature processing method described above correspond to each other and may be cross-referenced. For details not disclosed in the device embodiments of the present disclosure, please refer to the method embodiments of the present disclosure.
Fig. 4 is a schematic structural diagram of an image multi-scale feature processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the multi-scale feature processing apparatus 400 of the image includes an acquisition module 410, a first processing module 420, a second processing module 430, a third processing module 440, a determination module 450, and a loop module 460.
Specifically, the acquiring module 410 is configured to acquire a feature map of an image, where the image includes an image of a specific object and at least one image of a re-identified object.
The first processing module 420 is configured to process the feature map of the image to obtain a first scale feature vector matrix of the image.
The second processing module 430 is configured to process the first fused feature vector matrix of the image to obtain a second scale feature vector matrix of the image, where the first fused feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and the feature map of the image.
The third processing module 440 is configured to process a second fused feature vector matrix of the image to obtain a third scale feature vector matrix of the image, where the second fused feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and the first fused feature vector matrix of the image.
A determining module 450, configured to determine a multi-scale feature vector matrix of the image according to the first scale feature vector matrix of the image, the second scale feature vector matrix of the image, and the third scale feature vector matrix of the image.
The loop module 460 is configured to obtain, in a loop manner, a multi-scale feature vector matrix of the image of the specific object and a multi-scale feature vector matrix of the image of the at least one re-recognition object, and locate the target object from the at least one re-recognition object according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-recognition object.
The multi-scale feature processing apparatus 400 of the image can process the feature map of the image to obtain the first scale feature vector matrix of the image, process the first fused feature vector matrix of the image to obtain the second scale feature vector matrix of the image, and process the second fused feature vector matrix of the image to obtain the third scale feature vector matrix of the image. It can then determine the multi-scale feature vector matrix of the image from the first, second, and third scale feature vector matrices of the image, and locate the target object from the at least one re-recognition object according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-recognition object. The target object is thus acquired based on the multi-scale feature vector matrix, and a target object acquired in this way is more accurate.
In some embodiments of the present disclosure, the first processing module 420 is configured to: inputting the feature map of the image to a first global averaging pooling layer; and processing the feature map of the image through a first global average pooling layer to obtain a first scale feature vector matrix of the image.
In some embodiments of the present disclosure, the multi-scale feature processing apparatus 400 of the image is further configured to: input the feature map of the image to a bottleneck layer, and process the feature map of the image through the bottleneck layer to obtain the feature vector matrix output by the bottleneck layer; and add the feature vector matrix output by the bottleneck layer to the feature map of the image to obtain the first fused feature vector matrix of the image.
In some embodiments of the present disclosure, the bottleneck layer includes a first convolution layer, a second convolution layer, and a third convolution layer, where the first convolution layer and the third convolution layer have the same convolution kernel, and the second convolution layer has a different convolution kernel from the third convolution layer. Processing the feature map of the image through the bottleneck layer to obtain the feature vector matrix output by the bottleneck layer includes the following steps: inputting the feature map of the image into the first convolution layer, and convolving it through the first convolution layer to obtain the feature vector matrix output by the first convolution layer; inputting the feature vector matrix output by the first convolution layer into the second convolution layer, and convolving it through the second convolution layer to obtain the feature vector matrix output by the second convolution layer; and inputting the feature vector matrix output by the second convolution layer into the third convolution layer, and convolving it through the third convolution layer to obtain the feature vector matrix output by the third convolution layer.
In some embodiments of the present disclosure, the second processing module 430 is configured to: inputting the first fused feature vector matrix of the image into a second global average pooling layer, and pooling the first fused feature vector matrix of the image through the second global average pooling layer to obtain a second scale feature vector matrix of the image;
processing the second fused feature vector matrix of the image to obtain a third scale feature vector matrix of the image comprises: and inputting the second fused feature vector matrix of the image into a third global average pooling layer, and pooling the second fused feature vector matrix of the image through the third global average pooling layer to obtain a third scale feature vector matrix of the image.
In some embodiments of the present disclosure, locating the target object from the at least one re-identified object based on the multi-scale feature vector matrix of the image of the particular object and the multi-scale feature vector matrix of the image of the at least one re-identified object comprises: determining the similarity between the specific object and each re-recognition object according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of at least one re-recognition object; the target object is located from at least one of the re-identified objects based on the similarity between the particular object and each of the re-identified objects.
In some embodiments of the present disclosure, prior to acquiring the feature map of the image, the multi-scale feature processing apparatus 400 of the image is further configured to: acquire training samples, where each training sample includes a first historical image, a second historical image, and a third historical image, the first historical image being the same as the second historical image and the third historical image being different from the first historical image; input the training sample into an image recognition model, and extract the feature map of each historical image in the training sample through a backbone network in the image recognition model; input the feature map of each historical image to a first global average pooling layer, pool it through the first global average pooling layer to obtain a first scale feature vector matrix of each historical image, input the first scale feature vector matrix of each historical image to a first full-connection layer, process it through the first full-connection layer, and calculate the classification loss and the ternary loss at the first scale according to the output of the first full-connection layer; input the feature map of each historical image to the bottleneck layer, process it through the bottleneck layer, add the feature vector matrix of each image output by the bottleneck layer to the feature map of the corresponding historical image to obtain a first fused feature vector matrix of each historical image, input the first fused feature vector matrix of each historical image to a second global average pooling layer, and pool it through the second global average pooling layer to obtain a second scale feature vector matrix of each historical image; input the second scale feature vector matrix of each historical image to a second full-connection layer, process it through the second full-connection layer, and calculate the classification loss and the ternary loss at the second scale according to the output of the second full-connection layer; add the feature vector matrix of each image output by the bottleneck layer to the first fused feature vector matrix of each historical image to obtain a second fused feature vector matrix of each image, input the second fused feature vector matrix of each historical image to a third global average pooling layer, and pool it through the third global average pooling layer to obtain a third scale feature vector matrix of each historical image; input the third scale feature vector matrix of each historical image to a third full-connection layer, process it through the third full-connection layer, and calculate the classification loss and the ternary loss at the third scale according to the output of the third full-connection layer; determine the total loss across the multiple scales from the classification loss and the ternary loss at the first scale, the classification loss and the ternary loss at the second scale, and the classification loss and the ternary loss at the third scale; and update the parameters of the backbone network and the bottleneck layer by back-propagating the total loss in a cyclic iteration manner, stopping training when the image recognition model converges.
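The per-scale training objective described above combines a classification loss with a ternary loss (the translation's term for the standard triplet loss) and sums the three scales into the total loss. A minimal NumPy sketch, where the margin of 0.3, the 10-class logits, and the 64-dimensional embeddings are illustrative assumptions not taken from the text:

```python
import numpy as np

def cross_entropy(logits, label):
    # softmax cross-entropy for one sample (numerically stable log-sum-exp)
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

def triplet_loss(anchor, positive, negative, margin=0.3):
    # anchor/positive come from the same identity, negative from a different one
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_ap - d_an + margin))

def scale_loss(logits, label, a, p, n):
    # classification loss + ternary (triplet) loss at one scale
    return cross_entropy(logits, label) + triplet_loss(a, p, n)

rng = np.random.default_rng(2)
total = 0.0
for _ in range(3):  # first, second, and third scales
    logits = rng.standard_normal(10)                    # output of that scale's full-connection layer
    a, p, n = (rng.standard_normal(64) for _ in range(3))  # scale vectors of the three historical images
    total += scale_loss(logits, 3, a, p, n)
print(total > 0.0)  # True: total loss drives the backward parameter update
```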
Fig. 5 is a schematic diagram of an electronic device 5 provided by an embodiment of the present disclosure. As shown in fig. 5, the electronic device 5 of this embodiment includes: a processor 501, a memory 502, and a computer program 503 stored in the memory 502 and executable on the processor 501. The steps of the various method embodiments described above are implemented by the processor 501 when executing the computer program 503. Alternatively, the processor 501, when executing the computer program 503, performs the functions of the modules in the above-described apparatus embodiments.
The electronic device 5 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The electronic device 5 may include, but is not limited to, the processor 501 and the memory 502. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 5 and does not limit it; the electronic device 5 may include more or fewer components than shown, or different components.
The processor 501 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 502 may be an internal storage unit of the electronic device 5, for example, a hard disk or a memory of the electronic device 5. The memory 502 may also be an external storage device of the electronic device 5, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like provided on the electronic device 5. The memory 502 may also include both internal storage units and external storage devices of the electronic device 5. The memory 502 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the method of the above-described embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are merely for illustrating the technical solution of the present disclosure, and are not limiting thereof; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included in the scope of the present disclosure.

Claims (9)

1. A method of multi-scale feature processing of an image, the method comprising:
acquiring a feature map of an image, wherein the image comprises an image of a specific object and at least one image of a re-identified object;
processing the feature map of the image to obtain a first scale feature vector matrix of the image;
processing a first fusion feature vector matrix of the image to obtain a second scale feature vector matrix of the image, wherein the first fusion feature vector matrix of the image is determined based on a feature vector matrix output by a bottleneck layer and a feature map of the image;
Processing a second fusion feature vector matrix of the image to obtain a third-scale feature vector matrix of the image, wherein the second fusion feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and the first fusion feature vector matrix of the image;
determining a multi-scale feature vector matrix of the image according to the first-scale feature vector matrix of the image, the second-scale feature vector matrix of the image and the third-scale feature vector matrix of the image;
acquiring a multi-scale feature vector matrix of the image of the specific object and a multi-scale feature vector matrix of the image of at least one re-identification object in a cyclic manner, and positioning a target object from at least one re-identification object according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of at least one re-identification object;
before acquiring the feature map of the image, the method further comprises:
acquiring training samples, wherein each training sample comprises a first historical image, a second historical image and a third historical image, the first historical image and the second historical image are the same, and the third historical image is different from the first historical image;
Inputting the training sample into an image recognition model, and extracting feature images of each historical image in the training sample through a backbone network in the image recognition model;
inputting the feature images of each historical image to a first global average pooling layer, pooling the feature images of each historical image through the first global average pooling layer to obtain a first scale feature vector matrix of each historical image, inputting the first scale feature vector matrix of each historical image to a first full-connection layer, processing the first scale feature vector matrix of each historical image through the first full-connection layer, and calculating classification loss and ternary loss under the first scale according to the output result of the first full-connection layer;
inputting the feature images of each historical image to the bottleneck layer, processing the feature images of each image through the bottleneck layer, correspondingly adding the feature vector matrix of each image output by the bottleneck layer and the feature images of each historical image to obtain a first fused feature vector matrix of each historical image, inputting the first fused feature vector matrix of each historical image to a second global average pooling layer, and pooling the first fused feature vector matrix of each historical image through the second global average pooling layer to obtain a second scale feature vector matrix of each historical image;
Inputting a second scale feature vector matrix of each historical image to a second full-connection layer, processing the second scale feature vector matrix of each historical image through the second full-connection layer, and calculating classification loss and ternary loss under a second scale according to a result output by the second full-connection layer;
adding the feature vector of each image output by the bottleneck layer with the first fusion feature vector matrix of each historical image to obtain a second fusion feature vector matrix of each image, inputting the second fusion feature vector matrix of each historical image to a third global average pooling layer, and pooling the second fusion feature vector matrix of each historical image through the third global average pooling layer to obtain a third scale feature vector matrix of each historical image;
inputting a third-scale feature vector matrix of each historical image to a third full-connection layer, processing the third-scale feature vector matrix of each historical image through the third full-connection layer, and calculating classification loss and ternary loss under a third scale according to the output result of the third full-connection layer;
determining total loss at multiple scales according to the classification loss and the ternary loss at the first scale, the classification loss and the ternary loss at the second scale and the classification loss and the ternary loss at the third scale;
And reversely updating parameters of the backbone network and the bottleneck layer based on the total loss in a cyclic iteration manner, stopping training when the image recognition model converges.
2. The method of claim 1, wherein processing the feature map of the image to obtain a first scale feature vector matrix of the image comprises:
inputting a feature map of the image to a first global averaging pooling layer;
and processing the feature map of the image through the first global average pooling layer to obtain a first scale feature vector matrix of the image.
3. The method according to claim 1, wherein the method further comprises:
inputting the feature map of the image to the bottleneck layer, and processing the feature map of the image through the bottleneck layer to obtain a feature vector matrix output by the bottleneck layer;
and adding the feature vector matrix output by the bottleneck layer and the feature map of the image to obtain a first fusion feature vector matrix of the image.
4. The method of claim 3, wherein the bottleneck layer comprises a first convolution layer, a second convolution layer, and a third convolution layer, wherein the first convolution layer has the same convolution kernel as the third convolution layer, and wherein the second convolution layer has a different convolution kernel than the third convolution layer;
Processing the feature map of the image through the bottleneck layer to obtain a feature vector matrix output by the bottleneck layer, wherein the feature vector matrix comprises the following components:
inputting the feature image of the image into the first convolution layer, and carrying out convolution processing on the feature image of the image through the first convolution layer to obtain a feature vector matrix output by the first convolution layer;
inputting the eigenvector matrix output by the first convolution layer to the second convolution layer, and carrying out convolution processing on the eigenvector matrix output by the first convolution layer through the second convolution layer to obtain the eigenvector matrix output by the second convolution layer;
and inputting the eigenvector matrix output by the second convolution layer to the third convolution layer, and carrying out convolution processing on the eigenvector matrix output by the second convolution layer through the third convolution layer to obtain the eigenvector matrix output by the third convolution layer.
5. The method of claim 1, wherein processing the first fused feature vector matrix of the image to obtain a second scale feature vector matrix of the image comprises: inputting the first fusion feature vector matrix of the image to a second global average pooling layer, and pooling the first fusion feature vector matrix of the image through the second global average pooling layer to obtain a second scale feature vector matrix of the image;
Processing the second fused feature vector matrix of the image to obtain a third scale feature vector matrix of the image comprises: and inputting the second fused feature vector matrix of the image to a third global average pooling layer, and pooling the second fused feature vector matrix of the image through the third global average pooling layer to obtain a third scale feature vector matrix of the image.
6. The method of claim 1, wherein locating a target object from at least one of the re-identified objects based on the multi-scale feature vector matrix of the image of the particular object and the multi-scale feature vector matrix of the image of the at least one of the re-identified objects comprises:
determining the similarity between the specific object and each re-identified object according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of at least one re-identified object;
and locating a target object from at least one re-identification object according to the similarity between the specific object and each re-identification object.
7. A multi-scale feature processing apparatus for an image, the apparatus comprising:
An acquisition module for acquiring a feature map of an image, the image comprising an image of a specific object and an image of at least one re-identified object;
the first processing module is used for processing the feature map of the image to obtain a first scale feature vector matrix of the image;
the second processing module is used for processing the first fusion feature vector matrix of the image to obtain a second scale feature vector matrix of the image, and the first fusion feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and the feature map of the image;
the third processing module is used for processing the second fusion feature vector matrix of the image to obtain a third scale feature vector matrix of the image, and the second fusion feature vector matrix of the image is determined based on the feature vector matrix output by the bottleneck layer and the first fusion feature vector matrix of the image;
a determining module, configured to determine a multi-scale feature vector matrix of the image according to a first scale feature vector matrix of the image, a second scale feature vector matrix of the image, and a third scale feature vector matrix of the image;
The circulation module is used for acquiring a multi-scale feature vector matrix of the image of the specific object and a multi-scale feature vector matrix of the image of the at least one re-identification object in a circulation mode, and positioning a target object from the at least one re-identification object according to the multi-scale feature vector matrix of the image of the specific object and the multi-scale feature vector matrix of the image of the at least one re-identification object;
the multi-scale feature processing device of the image is further configured to, prior to acquiring the feature map of the image:
acquiring training samples, wherein each training sample comprises a first historical image, a second historical image and a third historical image, the first historical image and the second historical image are the same, and the third historical image is different from the first historical image;
inputting the training sample into an image recognition model, and extracting feature images of each historical image in the training sample through a backbone network in the image recognition model;
inputting the feature images of each historical image to a first global average pooling layer, pooling the feature images of each historical image through the first global average pooling layer to obtain a first scale feature vector matrix of each historical image, inputting the first scale feature vector matrix of each historical image to a first full-connection layer, processing the first scale feature vector matrix of each historical image through the first full-connection layer, and calculating classification loss and ternary loss under the first scale according to the output result of the first full-connection layer;
Inputting the feature images of each historical image to the bottleneck layer, processing the feature images of each image through the bottleneck layer, correspondingly adding the feature vector matrix of each image output by the bottleneck layer and the feature images of each historical image to obtain a first fused feature vector matrix of each historical image, inputting the first fused feature vector matrix of each historical image to a second global average pooling layer, and pooling the first fused feature vector matrix of each historical image through the second global average pooling layer to obtain a second scale feature vector matrix of each historical image;
inputting a second scale feature vector matrix of each historical image to a second full-connection layer, processing the second scale feature vector matrix of each historical image through the second full-connection layer, and calculating classification loss and ternary loss under a second scale according to a result output by the second full-connection layer;
adding the feature vector of each image output by the bottleneck layer with the first fusion feature vector matrix of each historical image to obtain a second fusion feature vector matrix of each image, inputting the second fusion feature vector matrix of each historical image to a third global average pooling layer, and pooling the second fusion feature vector matrix of each historical image through the third global average pooling layer to obtain a third scale feature vector matrix of each historical image;
Inputting a third-scale feature vector matrix of each historical image to a third full-connection layer, processing the third-scale feature vector matrix of each historical image through the third full-connection layer, and calculating classification loss and ternary loss under a third scale according to the output result of the third full-connection layer;
determining total loss at multiple scales according to the classification loss and the ternary loss at the first scale, the classification loss and the ternary loss at the second scale and the classification loss and the ternary loss at the third scale;
and reversely updating parameters of the backbone network and the bottleneck layer based on total loss in a cyclic iteration mode, and stopping training until the image recognition model converges.
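For orientation, the training step in the claim above can be sketched in PyTorch roughly as follows. The tiny convolutional backbone and bottleneck, the feature dimension, the class count, and the triplet margin are all illustrative assumptions (the patent does not specify these modules), but the loss wiring follows the three-scale scheme of the claim: raw feature map, bottleneck output added to the feature map, and bottleneck output added to the first fusion, each pooled and classified separately.

```python
import torch
import torch.nn as nn

class MultiScaleNet(nn.Module):
    """Sketch of the backbone + bottleneck + three-scale heads in the claim."""

    def __init__(self, dim=16, num_classes=5):
        super().__init__()
        # Stand-in for the backbone network (a real system would use e.g. a ResNet).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Stand-in for the bottleneck layer (reduce then restore channels).
        self.bottleneck = nn.Sequential(
            nn.Conv2d(dim, dim // 4, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(dim // 4, dim, kernel_size=1),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)  # shared shape for all three pooling layers
        # One fully connected classification head per scale.
        self.fc = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(3))

    def forward(self, x):
        f = self.backbone(x)          # feature map of the image
        b = self.bottleneck(f)        # bottleneck output
        fused1 = b + f                # first fused feature vector matrix
        fused2 = b + fused1           # second fused feature vector matrix
        # Global average pooling at each scale -> scale feature vectors.
        vecs = [self.gap(t).flatten(1) for t in (f, fused1, fused2)]
        logits = [fc(v) for fc, v in zip(self.fc, vecs)]
        return vecs, logits

def multi_scale_loss(model, anchor, positive, negative, labels):
    """Classification + triplet loss at each of the three scales, summed."""
    ce = nn.CrossEntropyLoss()
    tri = nn.TripletMarginLoss(margin=0.3)  # margin is an assumed value
    va, logits = model(anchor)
    vp, _ = model(positive)   # "second historical image" (same identity as anchor)
    vn, _ = model(negative)   # "third historical image" (different identity)
    return sum(ce(logits[s], labels) + tri(va[s], vp[s], vn[s]) for s in range(3))
```

Calling `multi_scale_loss(...).backward()` inside a standard optimizer loop then performs the claimed backward update of the backbone and bottleneck parameters from the total loss.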
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
CN202311168251.6A 2023-09-12 2023-09-12 Image multi-scale feature processing method and device Active CN116912518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311168251.6A CN116912518B (en) 2023-09-12 2023-09-12 Image multi-scale feature processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311168251.6A CN116912518B (en) 2023-09-12 2023-09-12 Image multi-scale feature processing method and device

Publications (2)

Publication Number Publication Date
CN116912518A CN116912518A (en) 2023-10-20
CN116912518B true CN116912518B (en) 2024-01-05

Family

ID=88351411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311168251.6A Active CN116912518B (en) 2023-09-12 2023-09-12 Image multi-scale feature processing method and device

Country Status (1)

Country Link
CN (1) CN116912518B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416270A (en) * 2018-02-06 2018-08-17 Nanjing University of Information Science and Technology A traffic sign recognition method based on multi-attribute joint features
CN111696110A (en) * 2020-06-04 2020-09-22 Shandong University Scene segmentation method and system
WO2021178632A1 (en) * 2020-03-04 2021-09-10 The Trustees Of The University Of Pennsylvania Deep learning network for the analysis of body tissue composition on body-torso-wide ct images
CN114202740A (en) * 2021-12-07 2022-03-18 Ningbo Research Institute of Dalian University of Technology Pedestrian re-identification method based on multi-scale feature fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416270A (en) * 2018-02-06 2018-08-17 Nanjing University of Information Science and Technology A traffic sign recognition method based on multi-attribute joint features
WO2021178632A1 (en) * 2020-03-04 2021-09-10 The Trustees Of The University Of Pennsylvania Deep learning network for the analysis of body tissue composition on body-torso-wide ct images
CN111696110A (en) * 2020-06-04 2020-09-22 Shandong University Scene segmentation method and system
CN114202740A (en) * 2021-12-07 2022-03-18 Ningbo Research Institute of Dalian University of Technology Pedestrian re-identification method based on multi-scale feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-scale analysis of the European airspace using network community detection; GURTNER G et al; PLoS One; Vol. 9, No. 5; pp. 17-31 *
A texture feature extraction algorithm based on multi-scale gray-level co-occurrence matrices; WANG Min et al; Chinese Journal of Liquid Crystals and Displays; No. 10; pp. 967-972 *

Also Published As

Publication number Publication date
CN116912518A (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN108280477B (en) Method and apparatus for clustering images
CN109858445B (en) Method and apparatus for generating a model
US20210200971A1 (en) Image processing method and apparatus
CN111831855B (en) Method, apparatus, electronic device, and medium for matching videos
CN108509994B (en) Method and device for clustering character images
CN114494709A (en) Feature extraction model generation method, image feature extraction method and device
CN114330565A (en) Face recognition method and device
CN110097004B (en) Facial expression recognition method and device
CN116524206B (en) Target image identification method and device
CN117894038A (en) Method and device for generating object gesture in image
CN116612500B (en) Pedestrian re-recognition model training method and device
CN116109907B (en) Target detection method, target detection device, electronic equipment and storage medium
CN116912518B (en) Image multi-scale feature processing method and device
CN116958626A (en) Image classification model training, image classification method and device and electronic equipment
CN113255812A (en) Video frame detection method and device and electronic equipment
CN116630639B (en) Object image identification method and device
CN117474037B (en) Knowledge distillation method and device based on space distance alignment
CN116911954B (en) Method and device for recommending items based on interests and popularity
CN116912631B (en) Target identification method, device, electronic equipment and storage medium
CN117392260B (en) Image generation method and device
CN116501993B (en) House source data recommendation method and device
US20230298326A1 (en) Image augmentation method, electronic device and readable storage medium
CN117423047A (en) Counting method and device based on characteristic images, electronic equipment and storage medium
CN116485823A (en) Non-contact weight measurement method and system for live pigs, electronic equipment and storage medium
CN117238008A (en) Age regression method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant