CN113033334B - Image processing method, image processing device, electronic device and medium

Image processing method, image processing device, electronic device and medium

Info

Publication number
CN113033334B
CN113033334B · Application CN202110247358.4A
Authority
CN
China
Prior art keywords
mask
target
masks
target object
sample
Prior art date
Legal status
Active
Application number
CN202110247358.4A
Other languages
Chinese (zh)
Other versions
CN113033334A (en)
Inventor
王诗吟
周强
Current Assignee
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202110247358.4A
Publication of CN113033334A
Application granted
Publication of CN113033334B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques with a fixed number of clusters, e.g. K-means clustering
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The disclosure provides an image processing method, an image processing device, an electronic device and a medium. The method comprises: acquiring a first mask of the visible region of a target object in an original image; obtaining a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses; stacking the original image with the first mask and inputting the result into a target neural network model to obtain a first feature map; and stacking the first feature map with the plurality of pose feature masks to obtain a target complete mask of the target object.

Description

Image processing method, image processing device, electronic device and medium
Technical Field
The disclosure relates to the technical field of image processing, and in particular to an image processing method, an image processing device, an electronic device and a medium.
Background
In the field of image processing, it is often necessary to complete target objects that are partially occluded in an image. The target object may be a human body or another object; for example, if part of a human body in an image is occluded by an occluder, the occluded part must be completed through a series of image processing operations to obtain the complete human body. In general, completing a target object requires first predicting the target complete mask of the target object.
In the prior art, the original image is typically input into an instance segmentation network to obtain a mask of the visible region of the target object, and a neural network model then derives the target complete mask of the target object from the original image and the visible-region mask.
However, the target complete mask obtained in this way is often of low accuracy.
Disclosure of Invention
To solve, or at least partially solve, the above technical problems, the present disclosure provides an image processing method, an image processing device, an electronic device and a medium.
A first aspect of the present disclosure provides an image processing method, the method including:
acquiring a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder;
obtaining a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein different second masks correspond to different poses of the target object;
stacking the original image with the first mask and inputting the result into a target neural network model to obtain a first feature map, wherein the target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample contains a target object sample and a partial region of the target object sample is occluded by an occluder; and
stacking the first feature map with the plurality of pose feature masks to obtain a target complete mask of the target object.
Optionally, the obtaining a plurality of pose feature masks according to the first mask and the plurality of second masks includes:
obtaining similarities between the first mask and each of the plurality of second masks; and
multiplying each of the plurality of similarities by the corresponding second mask to obtain the plurality of pose feature masks.
Optionally, before obtaining the plurality of pose feature masks according to the first mask and the plurality of second masks, the method further includes:
acquiring a plurality of mask samples of the target object;
performing cluster analysis on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets; and
for each cluster set, obtaining a second mask corresponding to the cluster set according to the mask samples in the cluster set.
Optionally, the obtaining a second mask corresponding to the cluster set according to the mask samples in the cluster set includes:
randomly selecting one mask sample from the cluster set as the second mask corresponding to the cluster set; or
averaging all mask samples in the cluster set and taking the result as the second mask corresponding to the cluster set.
Optionally, the performing cluster analysis on the plurality of mask samples according to the similarity between mask samples to obtain a plurality of cluster sets includes:
obtaining a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples, wherein each element value in the target similarity matrix represents the similarity between the corresponding two mask samples; and
performing cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain the plurality of cluster sets.
Optionally, the obtaining a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples includes:
vectorizing the mask samples to obtain mask sample vectors; and
obtaining the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
Optionally, the obtaining the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors includes:
obtaining a first similarity matrix according to the cosine distance between any two mask sample vectors and/or a second similarity matrix according to the Euclidean distance between any two mask sample vectors; and
linearly weighting the first similarity matrix and the second similarity matrix to obtain the target similarity matrix, or taking the first similarity matrix as the target similarity matrix, or taking the second similarity matrix as the target similarity matrix.
Optionally, before performing cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets, the method further includes:
obtaining, by traversing the target similarity matrix, mask sample pairs whose similarity is greater than or equal to a preset threshold; and
taking the mask sample pairs as the mask samples for cluster analysis.
A second aspect of the present disclosure provides an image processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder; and
a processing module, configured to obtain a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses of the target object;
the processing module is further configured to stack the original image with the first mask and input the result into a target neural network model to obtain a first feature map, wherein the target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample contains a target object sample and a partial region of the target object sample is occluded by an occluder; and
the processing module is further configured to perform stacking processing according to the first feature map and the pose feature masks to obtain a target complete mask of the target object.
Optionally, the processing module is specifically configured to obtain similarities between the first mask and each of the plurality of second masks, and to multiply each of the plurality of similarities by the corresponding second mask to obtain the plurality of pose feature masks.
Optionally, the acquisition module is further configured to acquire a plurality of mask samples of the target object; perform cluster analysis on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets; and, for each cluster set, obtain a second mask corresponding to the cluster set according to the mask samples in the cluster set.
Optionally, the acquisition module is specifically configured to randomly select one mask sample from the cluster set as the second mask corresponding to the cluster set, or to average all mask samples in the cluster set and take the result as the second mask corresponding to the cluster set.
Optionally, the acquisition module is specifically configured to obtain a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples, wherein each element value in the target similarity matrix represents the similarity between the corresponding two mask samples, and to perform cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets.
Optionally, the acquisition module is specifically configured to vectorize the mask samples to obtain mask sample vectors, and to obtain the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
Optionally, the acquisition module is specifically configured to obtain a first similarity matrix according to the cosine distance between any two mask sample vectors and/or a second similarity matrix according to the Euclidean distance between any two mask sample vectors, and to linearly weight the first similarity matrix and the second similarity matrix to obtain the target similarity matrix, or take the first similarity matrix as the target similarity matrix, or take the second similarity matrix as the target similarity matrix.
Optionally, the acquisition module is further configured to obtain, by traversing the target similarity matrix, mask sample pairs whose similarity is greater than or equal to a preset threshold, and to take the mask sample pairs as the mask samples for cluster analysis.
A third aspect of the present disclosure provides an electronic device, comprising: a processor for executing a computer program stored in a memory, which when executed by the processor implements the steps of the method of the first aspect.
A fourth aspect of the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
A fifth aspect of the present disclosure provides a computer program product which, when run on a computer, causes the computer to perform the image processing method of the first aspect.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have the following advantages:
The method acquires a first mask of the visible region of a target object in an original image; obtains a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses; stacks the original image with the first mask and inputs the result into a target neural network model to obtain a first feature map; and performs stacking processing on the first feature map and the plurality of pose feature masks to obtain a target complete mask of the target object. Because masks of typical poses of the target object are referenced when predicting its complete mask, the accuracy of the obtained target complete mask is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of an image processing system provided by the present disclosure;
FIG. 2 is a schematic flow chart of an image processing method provided by the present disclosure;
FIG. 3 is a schematic flow chart of another image processing method provided by the present disclosure;
FIG. 4 is a schematic structural diagram of an image processing apparatus provided by the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure; however, the present disclosure may be practiced otherwise than as described herein. It will be apparent that the embodiments in the specification are only some, but not all, of the embodiments of the disclosure.
The original image of the present disclosure contains a target object, and a partial region of the target object is occluded by an occluder. For convenience of description, the region of the target object occluded by the occluder is referred to as the invisible region of the target object, and the region of the target object not occluded by the occluder is referred to as the visible region of the target object. The target object may be a human body, an animal or another object, which the present disclosure does not limit.
In completing the invisible region of the target object in the original image, the mask of the invisible region must be predicted, and the target object is then completed based on that mask, enabling image processing tasks such as target tracking, target detection and image segmentation.
The mask of the invisible region of the target object is typically obtained as the difference between the target complete mask of the target object and the mask of its visible region; the accuracy of the target complete mask therefore directly affects the accuracy of the invisible-region mask.
To improve the accuracy of the obtained target complete mask, the present disclosure generates masks of various typical poses of the target object and references these typical-pose masks when predicting the target complete mask, thereby improving its accuracy.
The target object of the present disclosure may be a human body, an animal or another object. The following embodiments take the human body as an example; other target objects are handled similarly and are not described in detail.
The image processing method of the present disclosure is performed by an electronic device. The electronic device may be a tablet computer, a mobile phone (such as a folding-screen phone or a large-screen phone), a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a smart television, a smart screen, a high-definition television, a 4K television, a smart speaker, a smart projector, an Internet of Things (IoT) device, or the like; the disclosure does not limit the specific type of the electronic device.
Fig. 1 is a schematic diagram of an image processing system provided by the present disclosure. As shown in FIG. 1, the system includes:
The target neural network model 101, which may comprise an hourglass network. The input to the target neural network model 101 is the result of stacking the original image 102 with the first mask 103 of the visible region of the target object, and its output is the first feature map 104 corresponding to the target object. A plurality of pose feature masks 106 are obtained according to the first mask 103 and a plurality of second masks (masks of typical poses) 105. Stacking processing is performed on the first feature map 104 and the plurality of pose feature masks 106, and the stacked result is passed through a multi-layer convolutional network (also called a segmentation head or task head) 107 to obtain the target complete mask 108 corresponding to the target object. Because masks of typical poses of the target object are combined in predicting its target complete mask, the accuracy of the obtained target complete mask is improved.
Fig. 2 is a schematic flow chart of an image processing method provided by the present disclosure. As shown in FIG. 2, the method of this embodiment is as follows:
S201: a first mask of a visible region of a target object in an original image is acquired.
The original image contains the target object, and a partial region of the target object is occluded by an occluder. The visible region of the target object is the region of the target object displayed in the original image. The original image typically contains a background, the target object and an occluder.
Taking FIG. 1 as an example, the target object is a human body and the occluder is grass: the legs and feet of the human body are occluded by the grass, while the head, the upper body and part of the legs form the visible region of the human body.
Optionally, the first mask of the visible region of the target object in the original image may be acquired through an instance segmentation network. The values in the first mask are binary, i.e. they obey a 0-1 distribution.
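As an illustration only (the disclosure does not prescribe a particular segmentation network), the first mask could be obtained with an off-the-shelf instance segmentation model such as torchvision's Mask R-CNN. The following Python sketch is a non-authoritative example; the person class index (1 in COCO), the 0.5 score and binarization thresholds, and the choice of the top-scoring instance are all assumptions:

```python
import torch
import torchvision

# Off-the-shelf instance segmentation network pre-trained on COCO (assumption:
# any instance segmentation network producing per-instance masks would do).
seg_net = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
seg_net.eval()

def first_mask(image: torch.Tensor) -> torch.Tensor:
    """Return a binary (H, W) mask of the most confident person instance.

    `image` is a float tensor of shape (3, H, W) with values in [0, 1].
    """
    with torch.no_grad():
        out = seg_net([image])[0]
    person = (out["labels"] == 1) & (out["scores"] > 0.5)   # COCO class 1 = person
    if not person.any():
        return torch.zeros(image.shape[1:], dtype=torch.float32)
    soft = out["masks"][person][0, 0]     # detections are sorted by score
    return (soft > 0.5).float()           # values obey a 0-1 distribution
```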
S203: A plurality of pose feature masks are obtained according to the first mask and a plurality of second masks.
The second masks correspond one-to-one with poses of the target object, and different second masks correspond to different poses of the target object.
A second mask is the mask of a typical pose, obtained by processing a large number of mask samples; a typical pose is a pose the target object assumes with high probability. This embodiment takes typical human poses as an example: there may be, for instance, 128 typical human poses, such as standing, lying, jumping, leaning forward and leaning backward.
Optionally, the similarities between the first mask and the plurality of second masks may be obtained, and each similarity may be multiplied by the corresponding second mask to obtain the plurality of pose feature masks.
For example, suppose there are 128 second masks and let M_i denote the i-th second mask, where i is an integer greater than or equal to 1; the second masks can then be written as [M_1, M_2, M_3, ..., M_128]. Let α_i denote the similarity between the first mask and the i-th second mask, so the similarities between the first mask and the 128 second masks are [α_1, α_2, α_3, ..., α_128]. The 128 pose feature masks are then [α_1·M_1, α_2·M_2, α_3·M_3, ..., α_128·M_128].
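A minimal Python sketch of S203, assuming the second masks have already been resampled to the size of the first mask; cosine similarity between flattened masks is used here as one possible similarity measure, consistent with the similarity measures discussed later:

```python
import torch
import torch.nn.functional as F

def pose_feature_masks(first_mask: torch.Tensor,
                       second_masks: torch.Tensor) -> torch.Tensor:
    """first_mask: (H, W); second_masks: (K, H, W), e.g. K = 128.

    Returns the K pose feature masks alpha_i * M_i, where alpha_i is the
    similarity between the first mask and the i-th second mask M_i.
    """
    v = first_mask.flatten().unsqueeze(0)        # (1, H*W)
    M = second_masks.flatten(start_dim=1)        # (K, H*W)
    alpha = F.cosine_similarity(v, M, dim=1)     # (K,) similarities
    return alpha[:, None, None] * second_masks   # broadcast alpha_i over M_i
```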
S205: The original image and the first mask are stacked and input into a target neural network model to obtain a first feature map.
The target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks. Each original image sample contains a target object sample, and a partial region of the target object sample is occluded by an occluder. The reference complete mask of a target object sample is an accurate, ground-truth complete mask of that sample.
Specifically, each original image sample is stacked with the mask of its visible region and used as input to the neural network model, and a target complete mask of the target object sample is obtained from the model's output together with the plurality of second masks. The reference complete mask of the target object sample serves as the supervision signal: a loss function is computed from the reference complete mask and the predicted target complete mask, and the parameters of the neural network model are adjusted according to the loss until it meets a preset requirement, i.e. until the accuracy of the predicted target complete mask meets the requirement. The model is then considered converged, and the converged model is taken as the target neural network model.
The mask of the visible region corresponding to an original image sample can be obtained through an instance segmentation network. An existing instance segmentation network can be used; that is, its parameters are not adjusted while training the target neural network model, so the instance segmentation network itself is not trained further.
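A hedged sketch of this training procedure follows; `model` (the backbone plus segmentation head of FIG. 1), `loader` and `second_masks` are assumed to exist, the model is assumed to output probabilities, and the binary cross-entropy loss and Adam optimizer with lr=1e-4 are illustrative choices, not prescribed by the disclosure:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.BCELoss()   # loss between predicted and reference masks

for image, visible_mask, reference_mask in loader:    # original image samples
    # Stack each original image sample with the mask of its visible region
    # and feed the result, together with the second masks, through the model.
    stacked = torch.cat([image, visible_mask], dim=1)
    predicted_mask = model(stacked, second_masks)     # target complete mask
    loss = criterion(predicted_mask, reference_mask)  # supervision signal
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # adjust parameters until the loss meets the requirement
```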
S207: Stacking processing is performed according to the first feature map and the pose feature masks to obtain a target complete mask of the target object.
After the pose feature masks are obtained, they are resampled to the same size as the first feature map and stacked with it; the stacked result is then passed through a multi-layer convolutional network (also called a segmentation head or task head) to obtain the target complete mask of the target object. This improves the accuracy of the obtained target complete mask.
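Putting S205 and S207 together, a minimal inference sketch might read as follows, reusing pose_feature_masks from the S203 sketch above; `backbone` (for example an hourglass network) and `head` (the multi-layer convolutional segmentation head) are assumed modules, and `head` is assumed to accept C + K input channels:

```python
import torch
import torch.nn.functional as F

def predict_complete_mask(image, first_mask, second_masks, backbone, head):
    """image: (1, 3, H, W); first_mask: (1, 1, H, W); second_masks: (K, H, W)."""
    # S205: stack the original image with the first mask, run the backbone.
    feat = backbone(torch.cat([image, first_mask], dim=1))    # (1, C, Hf, Wf)

    # S203: pose feature masks, then resample to the feature map's size (S207).
    pfm = pose_feature_masks(first_mask[0, 0], second_masks)  # (K, H, W)
    pfm = F.interpolate(pfm.unsqueeze(0), size=feat.shape[-2:],
                        mode="bilinear", align_corners=False) # (1, K, Hf, Wf)

    # S207: stack the feature map with the pose feature masks and apply the
    # multi-layer convolutional segmentation head.
    return head(torch.cat([feat, pfm], dim=1))                # (1, 1, Hf, Wf)
```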
In summary, this embodiment acquires a first mask of the visible region of a target object in an original image; obtains a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses; stacks the original image with the first mask and inputs the result into a target neural network model to obtain a first feature map; and performs stacking processing according to the first feature map and the plurality of pose feature masks to obtain a target complete mask of the target object. Because masks of typical poses are referenced during prediction, the accuracy of the target complete mask is improved.
Fig. 3 is a schematic flow chart of another image processing method provided by the present disclosure. On the basis of the embodiment shown in FIG. 2, the method may further include, before S203:
S2021: A plurality of mask samples of the target object are acquired.
First, a set of images containing various poses of the target object is acquired, in which no target object is occluded by an occluder. The mask of the target object is then extracted from each image, for example by matting, yielding the mask samples of the target object.
Optionally, after the mask samples are obtained, all mask samples may be resized to the same size by interpolation, which removes irrelevant background interference from the subsequent cluster analysis.
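For instance, the resizing could be done with bilinear interpolation followed by re-binarization; the 256x256 target size below is an arbitrary illustrative choice:

```python
import torch
import torch.nn.functional as F

def normalize_size(mask: torch.Tensor, size=(256, 256)) -> torch.Tensor:
    """Resize a binary (H, W) mask sample to a common size by interpolation."""
    resized = F.interpolate(mask[None, None].float(), size=size,
                            mode="bilinear", align_corners=False)
    return (resized[0, 0] > 0.5).float()   # keep the mask binary after resizing
```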
S2022: Cluster analysis is performed on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets.
Cluster analysis of the plurality of mask samples includes, but is not limited to, the following possible implementation.
First, a target similarity matrix is obtained according to the cosine distance and/or the Euclidean distance between any two mask samples.
Specifically, each mask sample may be vectorized to obtain a mask sample vector, and the target similarity matrix is obtained according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
Optionally, the target similarity matrix may be obtained from the cosine distances alone, or from the Euclidean distances alone, in which case each element value of the target similarity matrix is the reciprocal of the corresponding Euclidean distance. Alternatively, it may be obtained from both: a first similarity matrix is computed from the cosine distances between any two mask sample vectors, a second similarity matrix is computed from the Euclidean distances (with element values equal to the reciprocals of the corresponding Euclidean distances), and the two matrices are linearly weighted to obtain the target similarity matrix. Each element value in the target similarity matrix represents the similarity between the corresponding two mask samples.
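A sketch of this construction, assuming the mask samples are already vectorized into the rows of a matrix X; the weight lam of the linear combination is an assumption (lam = 1 or lam = 0 recovers the cosine-only or Euclidean-only variants):

```python
import torch
import torch.nn.functional as F

def target_similarity_matrix(X: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """X: (N, D) matrix whose rows are vectorized mask samples.

    Returns an (N, N) matrix: lam * (cosine similarity) + (1 - lam) *
    (reciprocal of the Euclidean distance), i.e. a linear weighting of the
    first and second similarity matrices.
    """
    Xn = F.normalize(X, dim=1)
    S_cos = Xn @ Xn.T                      # first similarity matrix (cosine)
    dist = torch.cdist(X, X)               # pairwise Euclidean distances
    S_euc = 1.0 / (dist + 1e-6)            # second similarity matrix (1e-6
    return lam * S_cos + (1.0 - lam) * S_euc   # avoids division by zero)
```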
Second, cluster analysis is performed on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets.
Optionally, the mask samples may be clustered with the k-means clustering algorithm, which converges when the cluster center points no longer fluctuate, yielding a plurality of cluster sets. Each cluster set can represent a class of relatively similar poses of the target object.
Before clustering with k-means, mask sample pairs whose similarity is greater than or equal to a preset threshold may be obtained by traversing the target similarity matrix, and only these mask samples are used for the cluster analysis; this further improves clustering efficiency.
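A sketch combining the threshold pre-filter with k-means clustering; scikit-learn's KMeans is one possible implementation, and the threshold value of 0.7 and k = 128 are illustrative assumptions:

```python
import torch
from sklearn.cluster import KMeans

def cluster_mask_samples(X: torch.Tensor, S: torch.Tensor,
                         k: int = 128, threshold: float = 0.7):
    """X: (N, D) mask sample vectors; S: (N, N) target similarity matrix.

    Keeps only samples that appear in at least one pair whose similarity is
    greater than or equal to the threshold, then clusters them into k sets.
    """
    S = S.clone().fill_diagonal_(0)                 # ignore self-similarity
    keep = (S >= threshold).any(dim=1)              # traverse the matrix
    kept = X[keep]
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(kept.numpy())
    return kept, torch.as_tensor(labels)            # k cluster sets
```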
S2023: For each cluster set, a second mask corresponding to the cluster set is obtained according to the mask samples in the cluster set.
Optionally, the second mask corresponding to a cluster set may be obtained in either of the following manners, as shown in the sketch after this list.
One possible implementation:
randomly select one mask sample from the cluster set as the second mask corresponding to the cluster set.
Since the mask samples in each cluster set are relatively similar, a randomly selected mask sample can serve as the second mask of the cluster set.
Another possible implementation:
average all mask samples in the cluster set and take the result as the second mask corresponding to the cluster set.
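Both options, as a sketch; the mask samples of a cluster set are assumed to be stacked along dimension 0 of a tensor:

```python
import torch

def second_mask_random(cluster: torch.Tensor) -> torch.Tensor:
    """Randomly select one (H, W) mask sample from an (n, H, W) cluster set."""
    idx = torch.randint(len(cluster), (1,)).item()
    return cluster[idx]

def second_mask_mean(cluster: torch.Tensor) -> torch.Tensor:
    """Average all mask samples in the cluster set and use the result."""
    return cluster.float().mean(dim=0)
```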
In this embodiment, a plurality of mask samples are acquired and clustered into a plurality of cluster sets according to the similarity between them; for each cluster set, a second mask is derived from the mask samples in that set. Masks of a plurality of typical poses of the target object are thereby obtained.
Optionally, a preset number of the resulting second masks, for example 128, may be selected for use as required.
Fig. 4 is a schematic structural diagram of an image processing apparatus provided by the present disclosure. As shown in FIG. 4, the apparatus of this embodiment includes an acquisition module 401 and a processing module 402, wherein:
the acquisition module 401 is configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder;
the processing module 402 is configured to obtain a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses of the target object;
the processing module 402 is further configured to stack the original image with the first mask and input the result into a target neural network model to obtain a first feature map, wherein the target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample contains a target object sample and a partial region of the target object sample is occluded by an occluder; and
the processing module 402 is further configured to perform stacking processing according to the first feature map and the pose feature masks to obtain a target complete mask of the target object.
Optionally, the processing module 402 is specifically configured to obtain similarities between the first mask and each of the plurality of second masks, and to multiply each of the plurality of similarities by the corresponding second mask to obtain the plurality of pose feature masks.
Optionally, the acquisition module 401 is further configured to acquire a plurality of mask samples of the target object; perform cluster analysis on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets; and, for each cluster set, obtain a second mask corresponding to the cluster set according to the mask samples in the cluster set.
Optionally, the acquisition module 401 is specifically configured to randomly select one mask sample from the cluster set as the second mask corresponding to the cluster set, or to average all mask samples in the cluster set and take the result as the second mask corresponding to the cluster set.
Optionally, the acquisition module 401 is specifically configured to obtain a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples, wherein each element value in the target similarity matrix represents the similarity between the corresponding two mask samples, and to perform cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets.
Optionally, the acquisition module 401 is specifically configured to vectorize the mask samples to obtain mask sample vectors, and to obtain the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
Optionally, the acquisition module 401 is specifically configured to obtain a first similarity matrix according to the cosine distance between any two mask sample vectors and/or a second similarity matrix according to the Euclidean distance between any two mask sample vectors, and to linearly weight the first similarity matrix and the second similarity matrix to obtain the target similarity matrix, or take the first similarity matrix as the target similarity matrix, or take the second similarity matrix as the target similarity matrix.
Optionally, the acquisition module 401 is further configured to obtain, by traversing the target similarity matrix, mask sample pairs whose similarity is greater than or equal to a preset threshold, and to take the mask sample pairs as the mask samples for cluster analysis.
The apparatus of this embodiment can be used to execute the technical solutions of the above method embodiments; its implementation principle and technical effect are similar and are not repeated here.
The present disclosure also provides an electronic device, comprising: a processor configured to execute a computer program stored in a memory, the computer program, when executed by the processor, implementing the steps of the above method embodiments. Notably, the processor may be a graphics processing unit (GPU); that is, the program of the present disclosure may run entirely on the GPU. For example, PyTorch with the Compute Unified Device Architecture (CUDA) may be employed.
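For example, a minimal sketch (assuming a CUDA-capable GPU and an already constructed `model`):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)   # run the entire pipeline on the GPU via CUDA
```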
The present disclosure also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method embodiments.
The present disclosure also provides a computer program product which, when run on a computer, causes the computer to perform the steps of the method embodiments described above.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising" and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. An image processing method, the method comprising:
acquiring a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder;
obtaining a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein different second masks correspond to different poses of the target object;
stacking the original image with the first mask and inputting the result into a target neural network model to obtain a first feature map, wherein the target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample contains a target object sample and a partial region of the target object sample is occluded by an occluder; and
stacking the first feature map with the plurality of pose feature masks to obtain a target complete mask of the target object.
2. The method of claim 1, wherein the obtaining a plurality of pose feature masks according to the first mask and the plurality of second masks comprises:
obtaining similarities between the first mask and each of the plurality of second masks; and
multiplying each of the plurality of similarities by the corresponding second mask to obtain the plurality of pose feature masks.
3. The method according to claim 1 or 2, further comprising, before obtaining the plurality of pose feature masks according to the first mask and the plurality of second masks:
acquiring a plurality of mask samples of the target object;
performing cluster analysis on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets; and
for each cluster set, obtaining a second mask corresponding to the cluster set according to the mask samples in the cluster set.
4. The method according to claim 3, wherein the obtaining a second mask corresponding to the cluster set according to the mask samples in the cluster set comprises:
randomly selecting one mask sample from the cluster set as the second mask corresponding to the cluster set; or
averaging all mask samples in the cluster set and taking the result as the second mask corresponding to the cluster set.
5. The method according to claim 3, wherein the performing cluster analysis on the plurality of mask samples according to the similarity between mask samples to obtain a plurality of cluster sets comprises:
obtaining a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples, wherein each element value in the target similarity matrix represents the similarity between the corresponding two mask samples; and
performing cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain the plurality of cluster sets.
6. The method according to claim 5, wherein the obtaining a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples comprises:
vectorizing the mask samples to obtain mask sample vectors; and
obtaining the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
7. The method according to claim 6, wherein the obtaining the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors comprises:
obtaining a first similarity matrix according to the cosine distance between any two mask sample vectors and/or a second similarity matrix according to the Euclidean distance between any two mask sample vectors; and
linearly weighting the first similarity matrix and the second similarity matrix to obtain the target similarity matrix, or taking the first similarity matrix as the target similarity matrix, or taking the second similarity matrix as the target similarity matrix.
8. The method of claim 5, further comprising, before performing cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets:
obtaining, by traversing the target similarity matrix, mask sample pairs whose similarity is greater than or equal to a preset threshold; and
taking the mask sample pairs as the mask samples for cluster analysis.
9. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder; and
a processing module, configured to obtain a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses of the target object;
wherein the processing module is further configured to stack the original image with the first mask and input the result into a target neural network model to obtain a first feature map, the target neural network model being trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample containing a target object sample whose partial region is occluded by an occluder; and
the processing module is further configured to perform stacking processing according to the first feature map and the pose feature masks to obtain a target complete mask of the target object.
10. An electronic device, comprising: a processor configured to execute a computer program stored in a memory, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1-8.
11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-8.
CN202110247358.4A · Filed 2021-03-05 · Priority 2021-03-05 · Image processing method, image processing device, electronic device and medium · Active · Granted as CN113033334B

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
CN202110247358.4A · 2021-03-05 · 2021-03-05 · Image processing method, image processing device, electronic device and medium

Publications (2)

Publication Number · Publication Date
CN113033334A (en) · 2021-06-25
CN113033334B · 2024-07-02

Family

Family ID: 76468473

Family Applications (1)

Application Number · Title · Priority Date · Filing Date
CN202110247358.4A · Image processing method, image processing device, electronic device and medium · 2021-03-05 · 2021-03-05 · Active · CN113033334B (en)

Country Status (1)

Country · Link
CN (1) · CN113033334B (en)




Legal Events

Code · Title
PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant