CN110598717B - Image feature extraction method and device and electronic equipment - Google Patents

Image feature extraction method and device and electronic equipment

Info

Publication number
CN110598717B
Authority
CN
China
Prior art keywords
image
sub
processed
images
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910873505.1A
Other languages
Chinese (zh)
Other versions
CN110598717A (en)
Inventor
郭梓超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201910873505.1A priority Critical patent/CN110598717B/en
Publication of CN110598717A publication Critical patent/CN110598717A/en
Application granted granted Critical
Publication of CN110598717B publication Critical patent/CN110598717B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image feature extraction method, an image feature extraction device, and an electronic device, relating to the technical field of image processing. The method comprises the following steps: acquiring an image to be processed, where the image to be processed comprises an original image to be processed or a feature image to be processed; segmenting the image to be processed to obtain a plurality of sub-images; generating a target weight corresponding to each sub-image based on the image to be processed and a plurality of preset initial weights; for each sub-image, performing a convolution operation on the sub-image with its corresponding target weight to obtain a feature image of the sub-image; and generating a feature image of the image to be processed based on the feature images of the sub-images. The invention can effectively improve upon existing image feature extraction approaches.

Description

Image feature extraction method and device and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image feature extraction method and apparatus, and an electronic device.
Background
Image feature extraction is a key step in computer vision and image processing; features must be extracted from images in many application scenarios, such as pedestrian detection and vehicle re-identification. Existing approaches mainly extract features from an image to be processed through a trained convolutional neural network with fixed weights. In a specific implementation, a weight is usually preset before training, updated and optimized according to a loss function during training, and, once training of the convolutional neural network is finished, used directly and unchanged to extract features from the original image.
However, the inventor finds that a convolutional neural network that applies the same fixed weights to every input image to be processed has difficulty extracting the feature information of each image well.
Disclosure of Invention
In view of the above, the present invention provides an image feature extraction method, an image feature extraction device, and an electronic device, which effectively improve upon conventional image feature extraction.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
in a first aspect, an embodiment of the present invention provides an image feature extraction method, where the method includes: acquiring an image to be processed with features to be extracted; the image to be processed comprises an original image to be processed or a characteristic image to be processed; segmenting an image to be processed to obtain a plurality of sub-images; generating target weights corresponding to the sub-images based on the image to be processed and a plurality of preset initial weights; for each sub-image, performing convolution operation on the sub-image by adopting the target weight corresponding to the sub-image to obtain a characteristic image of the sub-image; and generating a characteristic image of the image to be processed based on the characteristic image of each sub-image.
Further, the step of generating a target weight corresponding to each sub-image based on the image to be processed and a plurality of preset initial weights includes: generating a plurality of linear weighting coefficients based on the image to be processed, the plurality of sub-images and a plurality of preset initial weights; and generating a target weight corresponding to each sub-image based on the linear weighting coefficient and a plurality of preset initial weights.
Further, the step of generating a plurality of linear weighting coefficients based on the image to be processed, the plurality of sub-images and a plurality of preset initial weights includes: performing dimensionality reduction operation on an image to be processed to obtain image information subjected to dimensionality reduction; constructing a full connection layer based on the number of the sub-images and the number of the initial weights; and calculating the image information after the dimension reduction through the full connection layer to generate a plurality of linear weighting coefficients.
Further, the step of generating a target weight corresponding to each sub-image based on the linear weighting coefficient and a plurality of preset initial weights includes: generating a target weight corresponding to each sub-image according to a preset linear weighting formula, a linear weighting coefficient and a plurality of preset initial weights; wherein, the preset linear weighting formula is as follows:
$$W_{nS} = \sum_{i=1}^{N} \alpha_i w_i$$

where $W_{nS}$ is the target weight corresponding to the S-th sub-image of the n-th image to be processed, $w_i$ is the i-th initial weight, $N$ is the number of initial weights, and $\alpha_i$ is the i-th linear weighting coefficient corresponding to the S-th sub-image of the n-th image to be processed.
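As an informal illustration (not part of the patent text), the linear combination above can be sketched in a few lines of NumPy. The function name and tensor shapes below are assumptions made for the sketch; the patent only specifies the weighted sum itself:

```python
import numpy as np

def target_weight(alphas, initial_weights):
    """Compute the target weight for one sub-image as the linear
    combination W_nS = sum_i alpha_i * w_i.
    `alphas` has shape (N,); `initial_weights` has shape
    (N, o, c, kh, kw), matching the weight dimensions described later
    in the description. Names are illustrative, not from the patent."""
    # Contract the coefficient axis against the first axis of the weights.
    return np.tensordot(alphas, initial_weights, axes=(0, 0))

# Example: N = 3 initial 1x1 kernels with o = c = 1
w = np.arange(3, dtype=float).reshape(3, 1, 1, 1, 1)  # w_0=0, w_1=1, w_2=2
a = np.array([0.2, 0.3, 0.5])
W = target_weight(a, w)
print(W.shape)               # (1, 1, 1, 1)
print(round(float(W), 6))    # 1.3, i.e. 0.2*0 + 0.3*1 + 0.5*2
```

The result has the same (o, c, kh, kw) shape as each initial weight, so it can be used directly as a convolution kernel for its sub-image.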
Further, the step of segmenting the image to be processed includes: segmenting an image to be processed according to a preset segmentation rule; the preset segmentation rule comprises a uniform segmentation rule and/or a non-uniform segmentation rule.
Further, the step of generating a feature image of the image to be processed based on the feature image of each sub-image includes: and splicing the characteristic images of the sub-images according to a preset segmentation rule to obtain the characteristic image of the image to be processed.
Further, the initial weight is obtained based on an optimization algorithm; the optimization algorithm comprises one or more of stochastic gradient descent (SGD), batch gradient descent (BGD), the Adam optimization algorithm, and root mean square propagation (RMSProp).
In a second aspect, an embodiment of the present invention provides an apparatus for extracting an image feature, where the apparatus includes: the image acquisition module is used for acquiring an image to be processed with features to be extracted; the image to be processed comprises an original image to be processed or a characteristic image to be processed; the image segmentation module is used for segmenting the image to be processed to obtain a plurality of sub-images; the weight generation module is used for generating target weights corresponding to the sub-images based on the image to be processed and a plurality of preset initial weights; the convolution operation module is used for performing convolution operation on each sub-image by adopting the target weight corresponding to the sub-image to obtain a characteristic image of the sub-image; and the characteristic image generation module is used for generating a characteristic image of the image to be processed based on the characteristic image of each sub-image.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method as in the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the method in the first aspect.
The embodiments of the invention provide an image feature extraction method, an image feature extraction device, and an electronic device. The acquired image to be processed is first segmented into a plurality of sub-images; target weights corresponding to the sub-images are then generated based on the image to be processed and a plurality of preset initial weights; finally, each sub-image is convolved with its corresponding target weight to obtain the feature image of that sub-image, and the feature image of the image to be processed is generated from the feature images of the sub-images. Compared with the prior art, in which a fixed weight is used no matter which image to be processed undergoes feature extraction, the method of this embodiment generates a corresponding target weight for each sub-image of the image to be processed and extracts features based on those target weights, finally obtaining the features of the image to be processed.
On the one hand, each sub-image corresponds to its own target weight, and that weight is strongly adapted to the sub-image. Convolving a sub-image with a weight adapted to it effectively improves the completeness and accuracy of its feature extraction, and improving the extraction effect of every sub-image improves the feature extraction effect of the whole image to be processed. On the other hand, the target weights are generated from the image to be processed and the initial weights; that is, they depend on the input, and different input images yield correspondingly different target weights. Extracting features with weights tied to the image to be processed therefore extracts its features in a more targeted manner, further improving the feature extraction effect.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practicing the techniques of the disclosure.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image feature extraction method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a segmentation method of an image to be processed according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an embodiment of the present invention for extracting image features;
fig. 5 shows a block diagram of an image feature extraction apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The inventor finds in the research process that when the image is subjected to feature extraction in the prior art, the weights of the convolutional neural network are predefined fixed weights, and the following problems can exist: (1) for one image to be processed, the fixed weight is adopted to carry out global feature extraction, and the difference of different areas in the whole image cannot be considered; (2) the fixed weight is adopted for feature extraction of different images to be processed, and the difference of different images cannot be considered. All the above problems can cause that the existing feature extraction method is difficult to accurately and completely acquire the feature information of the image. Based on this, in order to improve at least one of the above problems, embodiments of the present invention provide an image feature extraction method, an image feature extraction device, and an electronic device, which can effectively improve accuracy of image feature extraction and integrity of extracted features. The technology can be applied to various tasks of image feature extraction through a convolutional neural network, such as a face recognition task, a pedestrian detection task and the like. For ease of understanding, the following detailed description will discuss embodiments of the present invention.
Embodiment one:
first, an exemplary electronic device 100 for implementing the image feature extraction method and apparatus according to the embodiment of the present invention is described with reference to fig. 1.
As shown in fig. 1, an electronic device 100 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other type of connection mechanism (not shown). It should be noted that the components and configuration of the electronic device 100 shown in FIG. 1 are exemplary only, and not limiting, and that the electronic device may have other components and configurations as desired.
The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The storage 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. On which one or more computer program instructions may be stored that may be executed by processor 102 to implement client-side functionality (implemented by the processor) and/or other desired functionality in embodiments of the invention described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image capture device 110 may take images (e.g., photographs, videos, etc.) desired by the user and store the taken images in the storage device 104 for use by other components.
Exemplary electronic devices for implementing the image feature extraction method and apparatus according to the embodiments of the present invention may be implemented on smart terminals such as smart phones, tablet computers, and the like.
Embodiment two:
referring to a flowchart of an image feature extraction method shown in fig. 2, the method specifically includes the following steps:
Step S202: acquire an image to be processed whose features are to be extracted; the image to be processed comprises an original image to be processed or a feature image to be processed. The original image to be processed may be an initial image, such as an RGB image, obtained by capture with an image acquisition device, network download, local storage, or manual upload. The feature image to be processed may be a feature image obtained by extracting features from an initial image in advance with an existing feature extraction algorithm, such as the HOG (Histogram of Oriented Gradients) algorithm or the LBP (Local Binary Pattern) algorithm, or it may be a next-layer feature map obtained by performing a convolution operation on the initial image or on an intermediate feature map. In practical applications, the image to be processed may be expressed as a multi-dimensional tensor.
And step S204, segmenting the image to be processed to obtain a plurality of sub-images. In this embodiment, the image to be processed may be subjected to uniform segmentation or non-uniform segmentation, and the image to be processed is segmented into at least two local regions, where each local region is a sub-image.
Step S206, generating target weights corresponding to the sub-images based on the image to be processed and a plurality of preset initial weights. The preset initial weights may be fixed weights manually set based on a conventional work experience, or may be weights obtained by optimizing the manually set fixed weights.
And S208, performing convolution operation on each sub-image by adopting the target weight corresponding to the sub-image to obtain the characteristic image of the sub-image.
It will be appreciated that for each sub-image, when the sub-image is convolved with its corresponding target weight, the target weight is shared by multiple sliding windows of the sub-image; the target weights shared between different sub-images are different.
Step S210, generating a characteristic image of the image to be processed based on the characteristic image of each sub-image. The feature images of the sub-images may be recombined according to the manner in which the image to be processed was segmented into sub-images, thereby generating the feature image of the image to be processed.
In practical applications, compared with a conventional feature extraction operation (such as a convolution operation) that applies one fixed weight to every image to be processed, the process provided in this embodiment obtains different target weights for different images to be processed; that is, the weights vary with the input image, so steps S202 to S210 as a whole can be regarded as a dynamic convolution operation. Moreover, each sub-image obtained by segmenting the image to be processed shares one target weight within itself, and different sub-images use different target weights, so the steps can also be regarded as a locally shared convolution operation. Taken together, steps S202 to S210 form a dynamic, locally shared convolution operation. A convolutional neural network may perform this operation repeatedly: for example, after the feature image of the image to be processed is obtained in step S210, it can be taken as a new image to be processed, and steps S202 to S210 repeated on it for further feature extraction.
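A minimal sketch of this dynamic, locally shared convolution (steps S202 to S210) in NumPy follows. The 2x2 uniform split, the random untrained fully connected layer, the softmax normalization of the coefficients, and all names are illustrative assumptions for the sketch, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_single(x, w):
    """Naive 'valid' 2-D convolution of one (c, h, w) map with an
    (o, c, kh, kw) kernel; slow but enough for a sketch."""
    o, c, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((o, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = np.tensordot(w, x[:, i:i+kh, j:j+kw], axes=3)
    return out

def dynamic_local_conv(x, init_weights):
    """Steps S202-S210 in miniature: split a (c, h, w) map into a 2x2
    uniform grid of sub-images, derive per-sub-image mixing coefficients
    from the spatially pooled input through a stand-in (random,
    untrained) fully connected layer, blend the N initial weights into
    one target weight per sub-image, convolve each sub-image with its
    own target weight, and stitch the feature maps back together."""
    n_init = init_weights.shape[0]
    n_sub = 4
    pooled = x.mean(axis=(1, 2))                   # dimension reduction to (c,)
    fc = rng.standard_normal((n_sub * n_init, pooled.shape[0]))
    logits = (fc @ pooled).reshape(n_sub, n_init)
    alphas = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    h2, w2 = x.shape[1] // 2, x.shape[2] // 2
    subs = [x[:, :h2, :w2], x[:, :h2, w2:], x[:, h2:, :w2], x[:, h2:, w2:]]
    feats = [conv2d_single(s, np.tensordot(a, init_weights, axes=(0, 0)))
             for s, a in zip(subs, alphas)]
    top = np.concatenate(feats[:2], axis=2)
    bottom = np.concatenate(feats[2:], axis=2)
    return np.concatenate([top, bottom], axis=1)

x = rng.standard_normal((3, 8, 8))         # one 3-channel 8x8 input map
w0 = rng.standard_normal((2, 4, 3, 1, 1))  # N=2 initial weights, 1x1 kernels
y = dynamic_local_conv(x, w0)
print(y.shape)  # (4, 8, 8): 1x1 kernels preserve each 4x4 sub-image's size
```

Each call derives fresh coefficients from the input, so different inputs yield different target weights, while every sliding window within one sub-image shares that sub-image's single target weight.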
The image feature extraction method provided by the embodiment of the invention first segments the acquired image to be processed into a plurality of sub-images; then generates target weights corresponding to the sub-images based on the image to be processed and a plurality of preset initial weights; and finally convolves each sub-image with its corresponding target weight to obtain the feature image of that sub-image, so that the feature image of the image to be processed can be generated from the feature images of the sub-images. Compared with the prior art, in which a fixed weight is used no matter which image to be processed undergoes feature extraction, the method of this embodiment generates a corresponding target weight for each sub-image and extracts features based on those target weights, finally obtaining the features of the image to be processed.
On the one hand, each sub-image corresponds to its own target weight, and that weight is strongly adapted to the sub-image. Convolving a sub-image with a weight adapted to it effectively improves the completeness and accuracy of its feature extraction, and improving the extraction effect of every sub-image improves the feature extraction effect of the whole image to be processed. On the other hand, the target weights are generated from the image to be processed and the initial weights; that is, they depend on the input, and different input images yield correspondingly different target weights. Extracting features with weights tied to the image to be processed therefore extracts its features in a more targeted manner, further improving the feature extraction effect.
When the image to be processed is segmented in the step S204, the image to be processed may be segmented according to a preset segmentation rule; the preset segmentation rule comprises a uniform segmentation rule and/or a non-uniform segmentation rule. For ease of understanding, examples of the segmentation modes of the uniform segmentation rule and the non-uniform segmentation rule are given below respectively:
the example of the segmentation mode according to the uniform segmentation rule is as follows: uniformly dividing the image to be processed into a plurality of local areas along the transverse direction and/or the longitudinal direction, wherein each local area is a sub-image; such as referring to the schematic diagram of the segmentation mode of the image to be processed shown in fig. 3, a mode of uniformly segmenting the image to be processed into four sub-images along the transverse direction and the longitudinal direction is shown.
The first example is a segmentation mode according to a non-uniform segmentation rule: and randomly segmenting the image to be processed, wherein each local area obtained by segmentation is a sub-image.
The second example of the segmentation mode according to the non-uniform segmentation rule is as follows: the image to be processed is segmented according to the characteristics of the image to be processed, for example, the image to be processed is a face image, the face image can be segmented into a plurality of local areas such as an eye area, a nose area, a mouth area and the like according to the distribution of key points of the face, and each local area is a sub-image.
Of course, the above is only an exemplary illustration of segmenting the image to be processed, and in practical applications, other segmenting manners may also be included, which is not limited herein.
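A minimal sketch of the uniform segmentation rule, assuming the spatial sizes divide evenly into the grid; the function name and grid parameters are illustrative assumptions:

```python
import numpy as np

def split_uniform(img, rows, cols):
    """Uniformly segment an (h, w) or (c, h, w) array into rows*cols
    local regions along the transverse and longitudinal directions;
    each region is one sub-image."""
    h, w = img.shape[-2], img.shape[-1]
    sh, sw = h // rows, w // cols
    return [img[..., r*sh:(r+1)*sh, c*sw:(c+1)*sw]
            for r in range(rows) for c in range(cols)]

# A 4x4 toy image split into four 2x2 sub-images, as in Fig. 3
img = np.arange(16).reshape(4, 4)
subs = split_uniform(img, 2, 2)
print(len(subs))          # 4
print(subs[0].tolist())   # [[0, 1], [4, 5]] -- the upper-left region
```

A non-uniform rule would replace the evenly spaced slices with arbitrary (e.g. keypoint-derived) region boundaries; the stitching in step S210 then simply reverses whichever slicing was used.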
To distinguish the different sub-images obtained by segmentation, the sub-images may be numbered sequentially; in practical applications, S = 1, 2, 3, … may be used to represent the ordinal numbers of the sub-images. The ordinal numbers can be assigned in a variety of ways, such as the following two:
the first method is as follows: firstly, determining the subimages arranged at the designated positions as the starting points of the serial numbers, and setting the ordinal number of the subimage as S to be 1; the designated position is usually a vertex position of the image to be processed, such as an upper left corner position, a lower left corner position, and the like of the image to be processed. Then, numbering the sub-images in a sequence number from the starting point according to a preset numbering direction to determine the sequence number of each sub-image; the numbering direction is, for example, from left to right, from top to bottom, and the ordinal numbers of the sub-images shown in fig. 3 are numbered in the numbering direction from left to right and then from top to bottom. The numbering mode is simple, and the method is suitable for scenes with relatively orderly sub-image arrangement, such as sub-images obtained after the images to be processed are segmented according to the uniform segmentation rule.
Mode two: first, the position coordinates of a preset key point of each sub-image are acquired; the preset key point is, for example, a vertex or the center point of the sub-image. The sub-images are then sorted based on these position coordinates. In a specific implementation, the sub-images can be sorted from small to large (or large to small) by the abscissa of the position coordinate, and sub-images with the same abscissa then sorted from small to large (or large to small) by the ordinate; of course, sorting by ordinate first and then by abscissa also works. Finally, the sub-images are given ordinal numbers according to the sorting result. This mode assigns ordinal numbers well under a wide variety of sub-image arrangements and is highly general.
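Mode two can be sketched in plain Python. The sketch sorts by ordinate first and then abscissa (one of the orders the text permits), with top-left corners standing in for the preset key points; all names are illustrative:

```python
def number_subimages(keypoints):
    """Assign ordinal numbers S = 1, 2, 3, ... to sub-images by sorting
    their key-point coordinates: ascending ordinate (top to bottom)
    first, then ascending abscissa (left to right). `keypoints` is a
    list of (x, y) points, one per sub-image; returns a mapping from
    the sub-image's original index to its ordinal number S."""
    order = sorted(range(len(keypoints)),
                   key=lambda i: (keypoints[i][1], keypoints[i][0]))
    return {idx: s + 1 for s, idx in enumerate(order)}

# Four sub-images of a 2x2 grid, given by their top-left corners (x, y)
corners = [(2, 0), (0, 0), (0, 2), (2, 2)]
numbers = number_subimages(corners)
print(numbers)  # {1: 1, 0: 2, 2: 3, 3: 4}
```

The sub-image whose key point is at (0, 0) gets S = 1, matching the left-to-right, top-to-bottom numbering of Fig. 3 for a uniform grid.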
Based on the image to be processed and the sub-images obtained by segmentation, the embodiment further provides a specific implementation manner for generating the target weights corresponding to the sub-images based on the image to be processed and a plurality of preset initial weights, which may be executed with reference to the following first step and second step:
the method comprises the following steps of firstly, generating a plurality of linear weighting coefficients based on an image to be processed, a plurality of sub-images and a plurality of preset initial weights.
And secondly, generating target weights corresponding to the sub-images based on the linear weighting coefficients and a plurality of preset initial weights.
For ease of understanding, the first and second steps are described separately below.
The embodiment first provides a method for generating a plurality of linear weighting coefficients applied in the first step, which can specifically refer to the following steps:
assuming that the image to be processed is a feature image X to be processed, the feature image X to be processed may be represented by a tensor with a dimension (n, c, h, w); wherein n represents the number of the characteristic images X to be processed; c represents the channel number of the input feature image of the feature extraction network, wherein the feature extraction network is a convolution neural network used for extracting the features of the feature image X to be processed, and can be ResNet34, VGGNet (visual Geometry Group network) and the like; h represents the height of the characteristic image X; w represents the width of the feature image X.
The dimension of each initial weight is (o, c, kh, kw); wherein o represents the number of channels of the output feature image of the feature extraction network, c represents the number of channels of the input feature image of the feature extraction network, kh represents the height of the convolution kernel, and kw represents the width of the convolution kernel.
The initial weight may be a fixed weight manually set based on a general work experience, or may be a weight obtained by optimizing the manually set fixed weight. When the initial weight is a weight obtained by optimizing a fixed weight set by a human, the initial weight may be obtained by referring to:
firstly, carrying out convolution operation on a sample image by adopting a fixed weight set by people to obtain a characteristic extraction result of the sample image; wherein the sample image comprises a plurality of different images. And then reversely adjusting the fixed weight through a preset optimization algorithm based on the feature extraction result of the sample image to obtain an optimized weight, and determining the optimized weight as an initial weight. The optimization algorithm comprises one or more of the following: SGD (Stochastic Gradient, random steepest Descent), BGD (Batch Gradient, Batch Gradient Descent), Adam optimization algorithm, root mean square error algorithm RMSProp. Of course, in practical applications, other optimization algorithms, such as an MBGD (Mini-Batch Gradient Descent) algorithm, may also be used, and are not limited herein.
For ease of understanding, this embodiment describes the method for obtaining the initial weight by taking the BGD algorithm as an example. For each manually preset fixed weight, the procedure is as follows: determine the fixed weight as an initial weight of the convolutional neural network; construct a computation graph for the convolutional neural network, and compute the gradient of the initial weight of the convolutional layer through the back-propagation algorithm based on the computation graph; then iteratively update the initial weight according to the BGD algorithm, the gradient of the initial weight, and the learning rate, until a preset number of iterations is reached or the loss computed on the feature extraction result of the sample image reaches a preset threshold, and take the updated weight as the optimized initial weight.
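As a minimal sketch of the iterative update described above (assuming a toy one-parameter quadratic loss rather than the patent's actual convolutional network), the gradient-descent loop might look like:

```python
# Toy batch-gradient-descent sketch: the loss (w - target)^2 and its gradient
# are hypothetical stand-ins for the network loss on the sample images.
def optimize_weight(w, lr=0.1, num_iters=100, target=3.0):
    for _ in range(num_iters):
        grad = 2.0 * (w - target)  # gradient of the toy loss
        w = w - lr * grad          # update step: move against the gradient
    return w

w_opt = optimize_weight(0.0)  # the optimized weight converges toward the target
```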
Based on the to-be-processed image (i.e., the to-be-processed feature image X) and the initial weight described above, the manner of generating the plurality of linear weighting coefficients in the first step may include the following steps (1) to (3):
(1) Perform a dimension reduction operation on the image to be processed to obtain dimension-reduced image information. In a specific implementation, the values of each image to be processed over the (h, w) dimensions may first be obtained and summed along the height and width; these sums are then averaged, and the resulting spatial average replaces the individual values in the (h, w) dimensions. In this case the h and w dimensions of all images to be processed are treated identically, and the influence of differences in the (h, w) dimensions on the images can be ignored, so the dimension of the image to be processed is reduced and image information of dimension (n, c) is obtained.
(2) Construct a fully connected layer based on the number of sub-images and the number of initial weights. The constructed fully connected layer has input dimension c and output dimension (S, N), where S is the number of sub-images and N is the number of initial weights.
(3) Process the dimension-reduced image information through the fully connected layer to generate a plurality of linear weighting coefficients. Specifically, linear weighting coefficients of dimension (n, S, N) are generated by matrix-multiplying the image information of dimension (n, c) with the fully connected layer of input dimension c and output dimension (S, N). It can be understood that, for the n images to be processed, each image to be processed corresponds to S sub-images and (S, N) linear weighting coefficients.
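Steps (1) to (3) above can be sketched as follows (a non-authoritative illustration; the random fully connected weights and toy shapes are assumptions for demonstration only):

```python
import numpy as np

n, c, h, w = 2, 3, 8, 8
S, N = 4, 2                             # number of sub-images and of initial weights

X = np.random.randn(n, c, h, w)         # images to be processed
pooled = X.mean(axis=(2, 3))            # (1) dimension reduction over (h, w): shape (n, c)
fc = np.random.randn(c, S * N)          # (2) fully connected layer: input c, output S*N
alpha = (pooled @ fc).reshape(n, S, N)  # (3) linear weighting coefficients, dimension (n, S, N)

print(alpha.shape)  # (2, 4, 2)
```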
Based on the generated linear weighting coefficients, the embodiment further provides a method for generating a target weight applied to each sub-image in the second step, which may specifically refer to the following steps:
Considering that the weights actually applied in feature extraction are usually weights optimized during the training of the convolutional neural network, the second step may, in a specific implementation, refer to the following steps 1 to 3:
step 1, generating a target weight corresponding to each sub-image based on a linear weighting coefficient and a plurality of preset initial weights.
Specifically, the target weight corresponding to each sub-image may be generated according to a preset linear weighting formula, a linear weighting coefficient, and a plurality of preset initial weights; wherein, the preset linear weighting formula is as follows:
W_ns = Σ_{i=1}^{N} α_i · w_i
where W_ns is the target weight corresponding to the Sth sub-image of the nth image to be processed, w_i is the ith initial weight, N is the number of initial weights, and α_i is the ith linear weighting coefficient corresponding to the Sth sub-image of the nth image to be processed. The preset linear weighting formula makes visually explicit the dependence of the target weight W_ns on the image to be processed and on the sub-image, including: the input image to be processed changes dynamically, the images to be processed being distinguished by the sequence number n = 1, 2, 3, …, and the target weight W_ns changes dynamically with the sequence number n; and further: for each image to be processed, the sub-images are distinguished by the sequence number S = 1, 2, 3, …, and the target weight W_ns changes dynamically with the sequence number S.
For ease of understanding the above target weight W_ns, reference may be made to the schematic diagram of extracting image features shown in fig. 4. Fig. 4 shows an image to be processed X_1 with n = 1; the image to be processed X_1 is split into 4 sub-images X_1S, namely X_11, X_12, X_13 and X_14; the number of initial weights is N = 2, namely w_1 and w_2; and the linear weighting coefficients α of dimension (n, S, N) comprise: α(1,1,1), α(1,2,1), α(1,3,1), α(1,4,1), α(1,1,2), α(1,2,2), α(1,3,2) and α(1,4,2).
The following target weight W_ns corresponding to each sub-image can be obtained according to the preset linear weighting formula:
W_11 = α(1,1,1)·w_1 + α(1,1,2)·w_2, W_12 = α(1,2,1)·w_1 + α(1,2,2)·w_2, W_13 = α(1,3,1)·w_1 + α(1,3,2)·w_2, and W_14 = α(1,4,1)·w_1 + α(1,4,2)·w_2.
The correspondence between the target weight W_ns and the sub-image is determined according to the value of the sub-image sequence number S; for example, the sub-image X_11 with S = 1 corresponds to the target weight W_11.
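The fig. 4 example can be sketched numerically as follows (random toy values; only the shapes and the linear combination mirror the preset linear weighting formula, and all concrete numbers are assumptions):

```python
import numpy as np

o, c, kh, kw = 4, 3, 3, 3
S, N = 4, 2
w_init = np.random.randn(N, o, c, kh, kw)  # initial weights w_1, w_2
alpha = np.random.randn(S, N)              # coefficients alpha(1, s, i) for the n = 1 image

# W_ns = sum_i alpha_i * w_i, computed for every sub-image s at once
target_W = np.tensordot(alpha, w_init, axes=(1, 0))  # shape (S, o, c, kh, kw)

# e.g. the target weight for sub-image X_11 (s = 1):
W_11 = alpha[0, 0] * w_init[0] + alpha[0, 1] * w_init[1]
```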
After the target weight corresponding to each sub-image is determined in the above manner, referring to fig. 4, the target weight is used to perform a convolution operation on the corresponding sub-image according to the convolution operation formula Y = WX.
Specifically, the target weight W_11 is used to perform a convolution operation on sub-image X_11 to obtain the feature image Y_11 of sub-image X_11; the target weight W_12 is used to perform a convolution operation on sub-image X_12 to obtain the feature image Y_12 of sub-image X_12; the target weight W_13 is used to perform a convolution operation on sub-image X_13 to obtain the feature image Y_13 of sub-image X_13; and the target weight W_14 is used to perform a convolution operation on sub-image X_14 to obtain the feature image Y_14 of sub-image X_14.
Based on the feature images Y_11, Y_12, Y_13 and Y_14 of the sub-images, the feature image Y of the image to be processed X is generated. In this embodiment, the feature images of the sub-images may be spliced according to the preset segmentation rule to obtain the feature image Y of the image to be processed. The feature image corresponding to any image to be processed can be represented as follows:
Y_n = (W_{n,S=1}·X_{S=1}, W_{n,S=2}·X_{S=2}, …, W_{n,S=S}·X_{S=S})
where Y_n is the feature image of the nth image to be processed, and the parentheses ( ) indicate that the S split sub-images are combined according to the preset splitting rule. The feature image Y_n has dimension (n, o, oh, ow), where o is the number of channels of the output feature image of the feature extraction network, oh is the height of the feature image Y_n, and ow is the width of the feature image Y_n. Finally, all the Y_n together form the output feature images of the n input images to be processed.
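A minimal end-to-end sketch of the split / per-sub-image convolution / splice pipeline described above (assuming, for brevity, 1×1 convolutions so that spatial sizes are preserved, and a uniform four-quadrant segmentation; all values are toy assumptions, not the patent's actual network):

```python
import numpy as np

c, o, h, w = 3, 4, 8, 8
X = np.random.randn(c, h, w)  # one image to be processed

# Split into 4 equal quadrants (uniform segmentation rule)
subs = [X[:, :4, :4], X[:, :4, 4:], X[:, 4:, :4], X[:, 4:, 4:]]

# One 1x1 target weight per sub-image (channel mixing only)
Ws = [np.random.randn(o, c) for _ in range(4)]

# Y = W * X for each sub-image: a 1x1 convolution is a matrix product over channels
Ys = [np.tensordot(W_s, sub, axes=(1, 0)) for W_s, sub in zip(Ws, subs)]

# Splice the sub-feature-images back according to the segmentation rule
top = np.concatenate([Ys[0], Ys[1]], axis=2)
bottom = np.concatenate([Ys[2], Ys[3]], axis=2)
Y = np.concatenate([top, bottom], axis=1)

print(Y.shape)  # (4, 8, 8)
```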
In summary, compared with the prior art, in which a fixed weight is adopted regardless of the image whose features are being extracted, the image feature extraction method provided in the above embodiment can generate a corresponding target weight for each sub-image of the image to be processed, and then perform feature extraction based on the target weights of the sub-images, finally obtaining the features of the image to be processed. On the one hand, each sub-image corresponds to its own target weight, and the target weight and the sub-image have a strongly adaptive relationship; performing the convolution operation on a sub-image with a target weight adapted to that sub-image can effectively improve the completeness and accuracy of the feature extraction of the sub-image and the feature extraction effect for each sub-image, thereby comprehensively improving the feature extraction effect for the whole image to be processed. On the other hand, the target weight is generated based on the image to be processed and the initial weights; that is, the target weight depends on the input image to be processed, and different input images yield correspondingly different target weights. Performing feature extraction with a target weight related to the image to be processed therefore extracts the features of the image in a more targeted manner, further improving the feature extraction effect.
Example three:
as to the image feature extraction method provided in the second embodiment, an embodiment of the present invention provides an image feature extraction device, and referring to a block diagram of a structure of an image feature extraction device shown in fig. 5, the device includes:
an image obtaining module 502, configured to obtain an image to be processed with features to be extracted; the image to be processed comprises an original image to be processed or a characteristic image to be processed;
the image segmentation module 504 is configured to segment an image to be processed to obtain a plurality of sub-images;
a weight generating module 506, configured to generate a target weight corresponding to each sub-image based on the image to be processed and a plurality of preset initial weights;
a convolution operation module 508, configured to perform convolution operation on each sub-image by using the target weight corresponding to the sub-image to obtain a feature image of the sub-image;
a feature image generating module 510, configured to generate a feature image of the image to be processed based on the feature image of each sub-image.
The image feature extraction device provided by the embodiment of the present invention segments the acquired image to be processed into a plurality of sub-images; then generates a target weight corresponding to each sub-image based on the image to be processed and a plurality of preset initial weights; and finally performs a convolution operation on each sub-image using the target weight corresponding to that sub-image to obtain the feature image of the sub-image, thereby generating the feature image of the image to be processed based on the feature images of the sub-images. Compared with the prior art, in which a fixed weight is adopted regardless of the image whose features are being extracted, the device provided by this embodiment can generate a corresponding target weight for each sub-image of the image to be processed, and then perform feature extraction based on the target weights of the sub-images, finally obtaining the features of the image to be processed.
On the one hand, each sub-image corresponds to its own target weight, and the target weight and the sub-image have a strongly adaptive relationship; performing the convolution operation on a sub-image with a target weight adapted to that sub-image can effectively improve the completeness and accuracy of the feature extraction of the sub-image and the feature extraction effect for each sub-image, thereby comprehensively improving the feature extraction effect for the whole image to be processed. On the other hand, the target weight is generated based on the image to be processed and the initial weights; that is, the target weight depends on the input image to be processed, and different input images yield correspondingly different target weights. Performing feature extraction with a target weight related to the image to be processed therefore extracts the features of the image in a more targeted manner, further improving the feature extraction effect.
In some embodiments, the weight generation module 506 is further configured to: generating a plurality of linear weighting coefficients based on the image to be processed, the plurality of sub-images and a plurality of preset initial weights; and generating a target weight corresponding to each sub-image based on the linear weighting coefficient and a plurality of preset initial weights.
In some embodiments, the weight generation module 506 is further configured to: performing dimensionality reduction operation on an image to be processed to obtain image information subjected to dimensionality reduction; constructing a full connection layer based on the number of the sub-images and the number of the initial weights; and calculating the image information after the dimension reduction through the full connection layer to generate a plurality of linear weighting coefficients.
In some embodiments, the weight generation module 506 is further configured to: generating a target weight corresponding to each sub-image according to a preset linear weighting formula, the linear weighting coefficients and a plurality of preset initial weights; wherein the preset linear weighting formula is:
W_ns = Σ_{i=1}^{N} α_i · w_i
where W_ns is the target weight corresponding to the Sth sub-image of the nth image to be processed, w_i is the ith initial weight, N is the number of initial weights, and α_i is the ith linear weighting coefficient corresponding to the Sth sub-image of the nth image to be processed.
In some embodiments, the image segmentation module 504 is further configured to segment the image to be processed according to a preset segmentation rule; the preset segmentation rules comprise uniform segmentation rules and/or non-uniform segmentation rules.
In some embodiments, the feature image generation module 510 is further configured to: and splicing the characteristic images of the sub-images according to a preset segmentation rule to obtain the characteristic image of the image to be processed.
In some embodiments, the initial weights are obtained based on an optimization algorithm; the optimization algorithm comprises one or more of stochastic gradient descent (SGD), batch gradient descent (BGD), the Adam optimization algorithm, and RMSProp.
The device provided in this embodiment has the same implementation principle and the same technical effects as those of the foregoing embodiment, and for the sake of brief description, reference may be made to corresponding contents in the foregoing embodiment two for parts of this embodiment that are not mentioned.
Example four:
based on the foregoing embodiments, the present embodiment provides an electronic device, including: a processor and a storage device; the storage device stores a computer program, and the computer program, when executed by the processor, executes any one of the image feature extraction methods provided in embodiment two.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic device described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Further, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processing device, the computer program performs the steps of any one of the methods provided in the second embodiment.
The computer program product of the image feature extraction method and device and of the electronic device provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the methods described in the foregoing method embodiments. For specific implementation, reference may be made to the method embodiments, which are not repeated here.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. An image feature extraction method, characterized by comprising:
acquiring an image to be processed; the image to be processed comprises an original image to be processed or a characteristic image to be processed;
segmenting the image to be processed to obtain a plurality of sub-images;
generating a target weight corresponding to each sub-image based on the image to be processed and a plurality of preset initial weights;
for each sub-image, performing convolution operation on the sub-image by adopting the target weight corresponding to the sub-image to obtain a characteristic image of the sub-image;
generating a characteristic image of the image to be processed based on the characteristic image of each sub-image;
the step of generating a target weight corresponding to each of the sub-images based on the image to be processed and a plurality of preset initial weights includes:
generating a plurality of linear weighting coefficients based on the image to be processed, the plurality of sub-images and a plurality of preset initial weights;
generating a target weight corresponding to each of the sub-images based on the linear weighting coefficient and the preset plurality of initial weights;
the step of generating a plurality of linear weighting coefficients based on the image to be processed, the plurality of sub-images and a plurality of preset initial weights comprises:
performing dimensionality reduction operation on the image to be processed to obtain dimensionality-reduced image information;
constructing a fully connected layer based on the number of sub-images and the number of initial weights;
and calculating the image information after the dimensionality reduction through the full connection layer to generate a plurality of linear weighting coefficients.
2. The method of claim 1, wherein the step of generating a target weight corresponding to each of the sub-images based on the linear weighting coefficients and a plurality of preset initial weights comprises:
generating a target weight corresponding to each sub-image according to a preset linear weighting formula, the linear weighting coefficients and a plurality of preset initial weights; wherein the preset linear weighting formula is:
W_ns = Σ_{i=1}^{N} α_i · w_i
wherein W_ns is the target weight corresponding to the Sth sub-image of the nth image to be processed, w_i is the ith initial weight, N is the number of initial weights, and α_i is the ith linear weighting coefficient corresponding to the Sth sub-image of the nth image to be processed.
3. The method according to claim 1, wherein the step of segmenting the image to be processed comprises:
segmenting the image to be processed according to a preset segmentation rule; the preset segmentation rules comprise uniform segmentation rules and/or non-uniform segmentation rules.
4. The method according to claim 3, wherein the step of generating the feature image of the image to be processed based on the feature image of each of the sub-images comprises:
and splicing the characteristic images of the sub-images according to the preset segmentation rule to obtain the characteristic image of the image to be processed.
5. The method of claim 1, wherein the initial weights are obtained based on an optimization algorithm; the optimization algorithm comprises one or more of stochastic gradient descent SGD, batch gradient descent BGD, the Adam optimization algorithm, and the root mean square propagation algorithm RMSProp.
6. An apparatus for extracting image features, the apparatus comprising:
the image acquisition module is used for acquiring an image to be processed; the image to be processed comprises an original image to be processed or a characteristic image to be processed;
the image segmentation module is used for segmenting the image to be processed to obtain a plurality of sub-images;
the weight generation module is used for generating target weights corresponding to the sub-images based on the image to be processed and a plurality of preset initial weights;
the convolution operation module is used for performing convolution operation on each sub-image by adopting the target weight corresponding to the sub-image to obtain a characteristic image of the sub-image;
the characteristic image generation module is used for generating a characteristic image of the image to be processed based on the characteristic image of each sub-image;
the weight generation module is further to: generating a plurality of linear weighting coefficients based on the image to be processed, the plurality of sub-images and a plurality of preset initial weights; generating a target weight corresponding to each sub-image based on the linear weighting coefficient and a plurality of preset initial weights;
the weight generation module is further to: performing dimensionality reduction on an image to be processed to obtain image information subjected to dimensionality reduction; constructing a full connection layer based on the number of the sub-images and the number of the initial weights; and calculating the image information after the dimension reduction through the full connection layer to generate a plurality of linear weighting coefficients.
7. An electronic device, comprising: a processor and a storage device;
the storage device has stored thereon a computer program which, when executed by the processor, performs the method of any of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the claims 1 to 5.
CN201910873505.1A 2019-09-12 2019-09-12 Image feature extraction method and device and electronic equipment Active CN110598717B (en)

