CN113033334B - Image processing method, image processing device, electronic device and medium

Image processing method, image processing device, electronic device and medium

Info

Publication number
CN113033334B
CN113033334B · Application CN202110247358.4A
Authority
CN
China
Prior art keywords
mask
target
masks
target object
sample
Prior art date
Legal status
Active
Application number
CN202110247358.4A
Other languages
Chinese (zh)
Other versions
CN113033334A (en)
Inventor
王诗吟
周强
Current Assignee
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202110247358.4A
Publication of CN113033334A
Application granted
Publication of CN113033334B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques with a fixed number of clusters, e.g. K-means clustering
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The disclosure provides an image processing method, an image processing device, an electronic device and a medium. The method comprises: acquiring a first mask of the visible region of a target object in an original image; obtaining a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses; stacking the original image with the first mask and inputting the result into a target neural network model to obtain a first feature map; and stacking the first feature map with the plurality of pose feature masks to obtain a target complete mask of the target object.

Description

Image processing method, image processing device, electronic device and medium
Technical Field
The disclosure relates to the technical field of image processing, and in particular to an image processing method, an image processing device, an electronic device and a medium.
Background
In the field of image processing, it is often necessary to complete target objects that are partially occluded in an image. The target object may be a human body or another object; for example, if part of a human body in an image is occluded by an occluder, the occluded part must be completed through a series of image processing operations to obtain the complete human body. In general, completing a target object requires first predicting the target complete mask of the target object.
In the prior art, the original image is typically input into an instance segmentation network to obtain a mask of the visible region of the target object, and a neural network model then derives the target complete mask of the target object from the original image and the visible-region mask.
However, the target complete mask obtained in this way is often of low accuracy.
Disclosure of Invention
To solve, or at least partially solve, the above technical problems, the present disclosure provides an image processing method, an image processing device, an electronic device and a medium.
A first aspect of the present disclosure provides an image processing method, the method including:
acquiring a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder;
obtaining a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein different second masks correspond to different poses of the target object;
stacking the original image with the first mask and inputting the result into a target neural network model to obtain a first feature map, wherein the target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample contains a target object sample and a partial region of the target object sample is occluded by an occluder; and
stacking the first feature map with the plurality of pose feature masks to obtain a target complete mask of the target object.
Optionally, the obtaining a plurality of pose feature masks according to the first mask and the plurality of second masks includes:
obtaining similarities between the first mask and each of the plurality of second masks; and
multiplying each of the plurality of similarities by the corresponding second mask to obtain the plurality of pose feature masks.
Optionally, before obtaining the plurality of pose feature masks according to the first mask and the plurality of second masks, the method further includes:
acquiring a plurality of mask samples of the target object;
performing cluster analysis on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets; and
for each cluster set, obtaining a second mask corresponding to the cluster set according to the mask samples in the cluster set.
Optionally, the obtaining a second mask corresponding to the cluster set according to the mask samples in the cluster set includes:
randomly selecting one mask sample from the cluster set as the second mask corresponding to the cluster set; or
averaging all mask samples in the cluster set and taking the result as the second mask corresponding to the cluster set.
Optionally, the performing cluster analysis on the plurality of mask samples according to the similarity between mask samples to obtain a plurality of cluster sets includes:
obtaining a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples, wherein each element value in the target similarity matrix represents the similarity between the corresponding two mask samples; and
performing cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain the plurality of cluster sets.
Optionally, the obtaining a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples includes:
vectorizing the mask samples to obtain mask sample vectors; and
obtaining the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
Optionally, the obtaining the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors includes:
obtaining a first similarity matrix according to the cosine distance between any two mask sample vectors and/or a second similarity matrix according to the Euclidean distance between any two mask sample vectors; and
linearly weighting the first similarity matrix and the second similarity matrix to obtain the target similarity matrix, or taking the first similarity matrix as the target similarity matrix, or taking the second similarity matrix as the target similarity matrix.
Optionally, before performing cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets, the method further includes:
obtaining, by traversing the target similarity matrix, mask sample pairs whose similarity is greater than or equal to a preset threshold; and
taking the mask sample pairs as the mask samples for cluster analysis.
A second aspect of the present disclosure provides an image processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder; and
a processing module, configured to obtain a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses of the target object;
the processing module is further configured to stack the original image with the first mask and input the result into a target neural network model to obtain a first feature map, wherein the target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample contains a target object sample and a partial region of the target object sample is occluded by an occluder; and
the processing module is further configured to perform stacking processing according to the first feature map and the pose feature masks to obtain a target complete mask of the target object.
Optionally, the processing module is specifically configured to obtain similarities between the first mask and each of the plurality of second masks, and to multiply each of the plurality of similarities by the corresponding second mask to obtain the plurality of pose feature masks.
Optionally, the acquisition module is further configured to acquire a plurality of mask samples of the target object; perform cluster analysis on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets; and, for each cluster set, obtain a second mask corresponding to the cluster set according to the mask samples in the cluster set.
Optionally, the acquisition module is specifically configured to randomly select one mask sample from the cluster set as the second mask corresponding to the cluster set, or to average all mask samples in the cluster set and take the result as the second mask corresponding to the cluster set.
Optionally, the acquisition module is specifically configured to obtain a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples, wherein each element value in the target similarity matrix represents the similarity between the corresponding two mask samples, and to perform cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets.
Optionally, the acquisition module is specifically configured to vectorize the mask samples to obtain mask sample vectors, and to obtain the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
Optionally, the acquisition module is specifically configured to obtain a first similarity matrix according to the cosine distance between any two mask sample vectors and/or a second similarity matrix according to the Euclidean distance between any two mask sample vectors, and to linearly weight the first similarity matrix and the second similarity matrix to obtain the target similarity matrix, or take the first similarity matrix as the target similarity matrix, or take the second similarity matrix as the target similarity matrix.
Optionally, the acquisition module is further configured to obtain, by traversing the target similarity matrix, mask sample pairs whose similarity is greater than or equal to a preset threshold, and to take the mask sample pairs as the mask samples for cluster analysis.
A third aspect of the present disclosure provides an electronic device, comprising: a processor for executing a computer program stored in a memory, which when executed by the processor implements the steps of the method of the first aspect.
A fourth aspect of the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
A fifth aspect of the present disclosure provides a computer program product which, when run on a computer, causes the computer to perform the image processing method of the first aspect.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have the following advantages:
The method acquires a first mask of the visible region of a target object in an original image; obtains a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses; stacks the original image with the first mask and inputs the result into a target neural network model to obtain a first feature map; and performs stacking processing on the first feature map and the plurality of pose feature masks to obtain a target complete mask of the target object. Because masks of typical poses of the target object are referenced when predicting its complete mask, the accuracy of the obtained target complete mask is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of an image processing system provided by the present disclosure;
FIG. 2 is a schematic flow chart of an image processing method provided by the present disclosure;
FIG. 3 is a schematic flow chart of another image processing method provided by the present disclosure;
FIG. 4 is a schematic structural diagram of an image processing apparatus provided by the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure; however, the present disclosure may be practiced otherwise than as described herein. It will be apparent that the embodiments in the specification are only some, but not all, of the embodiments of the disclosure.
The original image of the present disclosure contains a target object, and a partial region of the target object is occluded by an occluder. For convenience of description, the region of the target object occluded by the occluder is referred to as the invisible region of the target object, and the region of the target object not occluded by the occluder is referred to as the visible region of the target object. The target object may be a human body, an animal or another object, which the present disclosure does not limit.
In completing the invisible region of the target object in the original image, the mask of the invisible region must be predicted, and the target object is then completed based on that mask, enabling image processing tasks such as target tracking, target detection and image segmentation.
The mask of the invisible region of the target object is typically obtained as the difference between the target complete mask of the target object and the mask of its visible region; the accuracy of the target complete mask therefore directly affects the accuracy of the invisible-region mask.
To improve the accuracy of the obtained target complete mask, the present disclosure generates masks of various typical poses of the target object and references these typical-pose masks when predicting the target complete mask, thereby improving its accuracy.
The target object of the present disclosure may be a human body, an animal or another object. The following embodiments take the human body as an example; other target objects are handled similarly and are not described in detail.
The image processing method of the present disclosure is performed by an electronic device. The electronic device may be a tablet computer, a mobile phone (such as a folding-screen phone or a large-screen phone), a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a smart television, a smart screen, a high-definition television, a 4K television, a smart speaker, a smart projector, an Internet of Things (IoT) device, or the like; the disclosure does not limit the specific type of the electronic device.
Fig. 1 is a schematic diagram of an image processing system provided by the present disclosure. As shown in FIG. 1, the system includes:
The target neural network model 101, which may comprise an hourglass network. The input to the target neural network model 101 is the result of stacking the original image 102 with the first mask 103 of the visible region of the target object, and its output is the first feature map 104 corresponding to the target object. A plurality of pose feature masks 106 are obtained according to the first mask 103 and a plurality of second masks (masks of typical poses) 105. Stacking processing is performed on the first feature map 104 and the plurality of pose feature masks 106, and the stacked result is passed through a multi-layer convolutional network (also called a segmentation head or task head) 107 to obtain the target complete mask 108 corresponding to the target object. Because masks of typical poses of the target object are combined in predicting its target complete mask, the accuracy of the obtained target complete mask is improved.
Fig. 2 is a schematic flow chart of an image processing method provided by the present disclosure. As shown in FIG. 2, the method of this embodiment is as follows:
S201: a first mask of a visible region of a target object in an original image is acquired.
The original image contains the target object, and a partial region of the target object is occluded by an occluder. The visible region of the target object is the region of the target object displayed in the original image. The original image typically contains a background, the target object and an occluder.
Taking FIG. 1 as an example, the target object is a human body and the occluder is grass: the legs and feet of the human body are occluded by the grass, while the head, the upper body and part of the legs form the visible region of the human body.
Optionally, the first mask of the visible region of the target object in the original image may be acquired through an instance segmentation network. The values in the first mask are binary, i.e. they obey a 0-1 distribution.
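As an illustration only (the disclosure does not prescribe a particular segmentation network), the first mask could be obtained with an off-the-shelf instance segmentation model such as torchvision's Mask R-CNN. The following Python sketch is a non-authoritative example; the person class index (1 in COCO), the 0.5 score and binarization thresholds, and the choice of the top-scoring instance are all assumptions:

```python
import torch
import torchvision

# Off-the-shelf instance segmentation network pre-trained on COCO (assumption:
# any instance segmentation network producing per-instance masks would do).
seg_net = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
seg_net.eval()

def first_mask(image: torch.Tensor) -> torch.Tensor:
    """Return a binary (H, W) mask of the most confident person instance.

    `image` is a float tensor of shape (3, H, W) with values in [0, 1].
    """
    with torch.no_grad():
        out = seg_net([image])[0]
    person = (out["labels"] == 1) & (out["scores"] > 0.5)   # COCO class 1 = person
    if not person.any():
        return torch.zeros(image.shape[1:], dtype=torch.float32)
    soft = out["masks"][person][0, 0]     # detections are sorted by score
    return (soft > 0.5).float()           # values obey a 0-1 distribution
```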
S203: A plurality of pose feature masks are obtained according to the first mask and a plurality of second masks.
The second masks correspond one-to-one with poses of the target object, and different second masks correspond to different poses of the target object.
A second mask is the mask of a typical pose, obtained by processing a large number of mask samples; a typical pose is a pose the target object assumes with high probability. This embodiment takes typical human poses as an example: there may be, for instance, 128 typical human poses, such as standing, lying, jumping, leaning forward and leaning backward.
Optionally, the similarities between the first mask and the plurality of second masks may be obtained, and each similarity may be multiplied by the corresponding second mask to obtain the plurality of pose feature masks.
For example, suppose there are 128 second masks and let M_i denote the i-th second mask, where i is an integer greater than or equal to 1; the second masks can then be written as [M_1, M_2, M_3, ..., M_128]. Let α_i denote the similarity between the first mask and the i-th second mask, so the similarities between the first mask and the 128 second masks are [α_1, α_2, α_3, ..., α_128]. The 128 pose feature masks are then [α_1·M_1, α_2·M_2, α_3·M_3, ..., α_128·M_128].
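A minimal Python sketch of S203, assuming the second masks have already been resampled to the size of the first mask; cosine similarity between flattened masks is used here as one possible similarity measure, consistent with the similarity measures discussed later:

```python
import torch
import torch.nn.functional as F

def pose_feature_masks(first_mask: torch.Tensor,
                       second_masks: torch.Tensor) -> torch.Tensor:
    """first_mask: (H, W); second_masks: (K, H, W), e.g. K = 128.

    Returns the K pose feature masks alpha_i * M_i, where alpha_i is the
    similarity between the first mask and the i-th second mask M_i.
    """
    v = first_mask.flatten().unsqueeze(0)        # (1, H*W)
    M = second_masks.flatten(start_dim=1)        # (K, H*W)
    alpha = F.cosine_similarity(v, M, dim=1)     # (K,) similarities
    return alpha[:, None, None] * second_masks   # broadcast alpha_i over M_i
```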
S205: The original image and the first mask are stacked and input into a target neural network model to obtain a first feature map.
The target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks. Each original image sample contains a target object sample, and a partial region of the target object sample is occluded by an occluder. The reference complete mask of a target object sample is an accurate, ground-truth complete mask of that sample.
Specifically, each original image sample is stacked with the mask of its visible region and used as input to the neural network model, and a target complete mask of the target object sample is obtained from the model's output together with the plurality of second masks. The reference complete mask of the target object sample serves as the supervision signal: a loss function is computed from the reference complete mask and the predicted target complete mask, and the parameters of the neural network model are adjusted according to the loss until it meets a preset requirement, i.e. until the accuracy of the predicted target complete mask meets the requirement. The model is then considered converged, and the converged model is taken as the target neural network model.
The mask of the visible region corresponding to an original image sample can be obtained through an instance segmentation network. An existing instance segmentation network can be used; that is, its parameters are not adjusted while training the target neural network model, so the instance segmentation network itself is not trained further.
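A hedged sketch of this training procedure follows; `model` (the backbone plus segmentation head of FIG. 1), `loader` and `second_masks` are assumed to exist, the model is assumed to output probabilities, and the binary cross-entropy loss and Adam optimizer with lr=1e-4 are illustrative choices, not prescribed by the disclosure:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.BCELoss()   # loss between predicted and reference masks

for image, visible_mask, reference_mask in loader:    # original image samples
    # Stack each original image sample with the mask of its visible region
    # and feed the result, together with the second masks, through the model.
    stacked = torch.cat([image, visible_mask], dim=1)
    predicted_mask = model(stacked, second_masks)     # target complete mask
    loss = criterion(predicted_mask, reference_mask)  # supervision signal
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # adjust parameters until the loss meets the requirement
```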
S207: Stacking processing is performed according to the first feature map and the pose feature masks to obtain a target complete mask of the target object.
After the pose feature masks are obtained, they are resampled to the same size as the first feature map and stacked with it; the stacked result is then passed through a multi-layer convolutional network (also called a segmentation head or task head) to obtain the target complete mask of the target object. This improves the accuracy of the obtained target complete mask.
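Putting S205 and S207 together, a minimal inference sketch might read as follows, reusing pose_feature_masks from the S203 sketch above; `backbone` (for example an hourglass network) and `head` (the multi-layer convolutional segmentation head) are assumed modules, and `head` is assumed to accept C + K input channels:

```python
import torch
import torch.nn.functional as F

def predict_complete_mask(image, first_mask, second_masks, backbone, head):
    """image: (1, 3, H, W); first_mask: (1, 1, H, W); second_masks: (K, H, W)."""
    # S205: stack the original image with the first mask, run the backbone.
    feat = backbone(torch.cat([image, first_mask], dim=1))    # (1, C, Hf, Wf)

    # S203: pose feature masks, then resample to the feature map's size (S207).
    pfm = pose_feature_masks(first_mask[0, 0], second_masks)  # (K, H, W)
    pfm = F.interpolate(pfm.unsqueeze(0), size=feat.shape[-2:],
                        mode="bilinear", align_corners=False) # (1, K, Hf, Wf)

    # S207: stack the feature map with the pose feature masks and apply the
    # multi-layer convolutional segmentation head.
    return head(torch.cat([feat, pfm], dim=1))                # (1, 1, Hf, Wf)
```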
In summary, this embodiment acquires a first mask of the visible region of a target object in an original image; obtains a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses; stacks the original image with the first mask and inputs the result into a target neural network model to obtain a first feature map; and performs stacking processing according to the first feature map and the plurality of pose feature masks to obtain a target complete mask of the target object. Because masks of typical poses are referenced during prediction, the accuracy of the target complete mask is improved.
Fig. 3 is a schematic flow chart of another image processing method provided by the present disclosure. On the basis of the embodiment shown in FIG. 2, the method may further include, before S203:
S2021: A plurality of mask samples of the target object are acquired.
First, a set of images containing various poses of the target object is acquired, in which no target object is occluded by an occluder. The mask of the target object is then extracted from each image, for example by matting, yielding the mask samples of the target object.
Optionally, after the mask samples are obtained, all mask samples may be resized to the same size by interpolation, which removes irrelevant background interference from the subsequent cluster analysis.
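For instance, the resizing could be done with bilinear interpolation followed by re-binarization; the 256x256 target size below is an arbitrary illustrative choice:

```python
import torch
import torch.nn.functional as F

def normalize_size(mask: torch.Tensor, size=(256, 256)) -> torch.Tensor:
    """Resize a binary (H, W) mask sample to a common size by interpolation."""
    resized = F.interpolate(mask[None, None].float(), size=size,
                            mode="bilinear", align_corners=False)
    return (resized[0, 0] > 0.5).float()   # keep the mask binary after resizing
```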
S2022: Cluster analysis is performed on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets.
Cluster analysis of the plurality of mask samples includes, but is not limited to, the following possible implementation.
First, a target similarity matrix is obtained according to the cosine distance and/or the Euclidean distance between any two mask samples.
Specifically, each mask sample may be vectorized to obtain a mask sample vector, and the target similarity matrix is obtained according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
Optionally, the target similarity matrix may be obtained from the cosine distances alone, or from the Euclidean distances alone, in which case each element value of the target similarity matrix is the reciprocal of the corresponding Euclidean distance. Alternatively, it may be obtained from both: a first similarity matrix is computed from the cosine distances between any two mask sample vectors, a second similarity matrix is computed from the Euclidean distances (with element values equal to the reciprocals of the corresponding Euclidean distances), and the two matrices are linearly weighted to obtain the target similarity matrix. Each element value in the target similarity matrix represents the similarity between the corresponding two mask samples.
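A sketch of this construction, assuming the mask samples are already vectorized into the rows of a matrix X; the weight lam of the linear combination is an assumption (lam = 1 or lam = 0 recovers the cosine-only or Euclidean-only variants):

```python
import torch
import torch.nn.functional as F

def target_similarity_matrix(X: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """X: (N, D) matrix whose rows are vectorized mask samples.

    Returns an (N, N) matrix: lam * (cosine similarity) + (1 - lam) *
    (reciprocal of the Euclidean distance), i.e. a linear weighting of the
    first and second similarity matrices.
    """
    Xn = F.normalize(X, dim=1)
    S_cos = Xn @ Xn.T                      # first similarity matrix (cosine)
    dist = torch.cdist(X, X)               # pairwise Euclidean distances
    S_euc = 1.0 / (dist + 1e-6)            # second similarity matrix (1e-6
    return lam * S_cos + (1.0 - lam) * S_euc   # avoids division by zero)
```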
Second, cluster analysis is performed on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets.
Optionally, the mask samples may be clustered with the k-means clustering algorithm, which converges when the cluster center points no longer fluctuate, yielding a plurality of cluster sets. Each cluster set can represent a class of relatively similar poses of the target object.
Before clustering with k-means, mask sample pairs whose similarity is greater than or equal to a preset threshold may be obtained by traversing the target similarity matrix, and only these mask samples are used for the cluster analysis; this further improves clustering efficiency.
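A sketch combining the threshold pre-filter with k-means clustering; scikit-learn's KMeans is one possible implementation, and the threshold value of 0.7 and k = 128 are illustrative assumptions:

```python
import torch
from sklearn.cluster import KMeans

def cluster_mask_samples(X: torch.Tensor, S: torch.Tensor,
                         k: int = 128, threshold: float = 0.7):
    """X: (N, D) mask sample vectors; S: (N, N) target similarity matrix.

    Keeps only samples that appear in at least one pair whose similarity is
    greater than or equal to the threshold, then clusters them into k sets.
    """
    S = S.clone().fill_diagonal_(0)                 # ignore self-similarity
    keep = (S >= threshold).any(dim=1)              # traverse the matrix
    kept = X[keep]
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(kept.numpy())
    return kept, torch.as_tensor(labels)            # k cluster sets
```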
S2023: For each cluster set, a second mask corresponding to the cluster set is obtained according to the mask samples in the cluster set.
Optionally, the second mask corresponding to a cluster set may be obtained in either of the following manners, as shown in the sketch after this list.
One possible implementation:
randomly select one mask sample from the cluster set as the second mask corresponding to the cluster set.
Since the mask samples in each cluster set are relatively similar, a randomly selected mask sample can serve as the second mask of the cluster set.
Another possible implementation:
average all mask samples in the cluster set and take the result as the second mask corresponding to the cluster set.
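Both options, as a sketch; the mask samples of a cluster set are assumed to be stacked along dimension 0 of a tensor:

```python
import torch

def second_mask_random(cluster: torch.Tensor) -> torch.Tensor:
    """Randomly select one (H, W) mask sample from an (n, H, W) cluster set."""
    idx = torch.randint(len(cluster), (1,)).item()
    return cluster[idx]

def second_mask_mean(cluster: torch.Tensor) -> torch.Tensor:
    """Average all mask samples in the cluster set and use the result."""
    return cluster.float().mean(dim=0)
```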
In this embodiment, a plurality of mask samples are acquired and clustered into a plurality of cluster sets according to the similarity between them; for each cluster set, a second mask is derived from the mask samples in that set. Masks of a plurality of typical poses of the target object are thereby obtained.
Optionally, a preset number of the resulting second masks, for example 128, may be selected for use as required.
Fig. 4 is a schematic structural diagram of an image processing apparatus provided by the present disclosure. As shown in FIG. 4, the apparatus of this embodiment includes an acquisition module 401 and a processing module 402, wherein:
the acquisition module 401 is configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder;
the processing module 402 is configured to obtain a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses of the target object;
the processing module 402 is further configured to stack the original image with the first mask and input the result into a target neural network model to obtain a first feature map, wherein the target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample contains a target object sample and a partial region of the target object sample is occluded by an occluder; and
the processing module 402 is further configured to perform stacking processing according to the first feature map and the pose feature masks to obtain a target complete mask of the target object.
Optionally, the processing module 402 is specifically configured to obtain similarities between the first mask and each of the plurality of second masks, and to multiply each of the plurality of similarities by the corresponding second mask to obtain the plurality of pose feature masks.
Optionally, the acquisition module 401 is further configured to acquire a plurality of mask samples of the target object; perform cluster analysis on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets; and, for each cluster set, obtain a second mask corresponding to the cluster set according to the mask samples in the cluster set.
Optionally, the acquisition module 401 is specifically configured to randomly select one mask sample from the cluster set as the second mask corresponding to the cluster set, or to average all mask samples in the cluster set and take the result as the second mask corresponding to the cluster set.
Optionally, the acquisition module 401 is specifically configured to obtain a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples, wherein each element value in the target similarity matrix represents the similarity between the corresponding two mask samples, and to perform cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets.
Optionally, the acquisition module 401 is specifically configured to vectorize the mask samples to obtain mask sample vectors, and to obtain the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
Optionally, the acquisition module 401 is specifically configured to obtain a first similarity matrix according to the cosine distance between any two mask sample vectors and/or a second similarity matrix according to the Euclidean distance between any two mask sample vectors, and to linearly weight the first similarity matrix and the second similarity matrix to obtain the target similarity matrix, or take the first similarity matrix as the target similarity matrix, or take the second similarity matrix as the target similarity matrix.
Optionally, the acquisition module 401 is further configured to obtain, by traversing the target similarity matrix, mask sample pairs whose similarity is greater than or equal to a preset threshold, and to take the mask sample pairs as the mask samples for cluster analysis.
The apparatus of this embodiment can be used to execute the technical solutions of the above method embodiments; its implementation principle and technical effect are similar and are not repeated here.
The present disclosure also provides an electronic device, comprising: a processor configured to execute a computer program stored in a memory, the computer program, when executed by the processor, implementing the steps of the above method embodiments. Notably, the processor may be a graphics processing unit (GPU); that is, the program of the present disclosure may run entirely on the GPU. For example, PyTorch with the Compute Unified Device Architecture (CUDA) may be employed.
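For example, a minimal sketch (assuming a CUDA-capable GPU and an already constructed `model`):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)   # run the entire pipeline on the GPU via CUDA
```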
The present disclosure also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method embodiments.
The present disclosure also provides a computer program product which, when run on a computer, causes the computer to perform the steps of the method embodiments described above.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising" and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. An image processing method, the method comprising:
acquiring a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder;
obtaining a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein different second masks correspond to different poses of the target object;
stacking the original image with the first mask and inputting the result into a target neural network model to obtain a first feature map, wherein the target neural network model is trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample contains a target object sample and a partial region of the target object sample is occluded by an occluder; and
stacking the first feature map with the plurality of pose feature masks to obtain a target complete mask of the target object.
2. The method of claim 1, wherein the obtaining a plurality of pose feature masks according to the first mask and the plurality of second masks comprises:
obtaining similarities between the first mask and each of the plurality of second masks; and
multiplying each of the plurality of similarities by the corresponding second mask to obtain the plurality of pose feature masks.
3. The method according to claim 1 or 2, further comprising, before obtaining the plurality of pose feature masks according to the first mask and the plurality of second masks:
acquiring a plurality of mask samples of the target object;
performing cluster analysis on the mask samples according to the similarity between mask samples to obtain a plurality of cluster sets; and
for each cluster set, obtaining a second mask corresponding to the cluster set according to the mask samples in the cluster set.
4. The method according to claim 3, wherein the obtaining a second mask corresponding to the cluster set according to the mask samples in the cluster set comprises:
randomly selecting one mask sample from the cluster set as the second mask corresponding to the cluster set; or
averaging all mask samples in the cluster set and taking the result as the second mask corresponding to the cluster set.
5. The method according to claim 3, wherein the performing cluster analysis on the plurality of mask samples according to the similarity between mask samples to obtain a plurality of cluster sets comprises:
obtaining a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples, wherein each element value in the target similarity matrix represents the similarity between the corresponding two mask samples; and
performing cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain the plurality of cluster sets.
6. The method according to claim 5, wherein the obtaining a target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask samples comprises:
vectorizing the mask samples to obtain mask sample vectors; and
obtaining the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors.
7. The method according to claim 6, wherein the obtaining the target similarity matrix according to the cosine distance and/or the Euclidean distance between any two mask sample vectors comprises:
obtaining a first similarity matrix according to the cosine distance between any two mask sample vectors and/or a second similarity matrix according to the Euclidean distance between any two mask sample vectors; and
linearly weighting the first similarity matrix and the second similarity matrix to obtain the target similarity matrix, or taking the first similarity matrix as the target similarity matrix, or taking the second similarity matrix as the target similarity matrix.
8. The method of claim 5, further comprising, before performing cluster analysis on the plurality of mask samples according to the target similarity matrix to obtain a plurality of cluster sets:
obtaining, by traversing the target similarity matrix, mask sample pairs whose similarity is greater than or equal to a preset threshold; and
taking the mask sample pairs as the mask samples for cluster analysis.
9. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object and a partial region of the target object is occluded by an occluder; and
a processing module, configured to obtain a plurality of pose feature masks according to the first mask and a plurality of second masks, wherein the second masks correspond one-to-one with poses of the target object and different second masks correspond to different poses of the target object;
wherein the processing module is further configured to stack the original image with the first mask and input the result into a target neural network model to obtain a first feature map, the target neural network model being trained based on original image samples, reference complete masks of target object samples and the plurality of second masks, each original image sample containing a target object sample whose partial region is occluded by an occluder; and
the processing module is further configured to perform stacking processing according to the first feature map and the pose feature masks to obtain a target complete mask of the target object.
10. An electronic device, comprising: a processor configured to execute a computer program stored in a memory, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1-8.
11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-8.
CN202110247358.4A · Filed 2021-03-05 · Priority 2021-03-05 · Image processing method, image processing device, electronic device and medium · Active · Granted as CN113033334B

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
CN202110247358.4A · 2021-03-05 · 2021-03-05 · Image processing method, image processing device, electronic device and medium

Publications (2)

Publication Number · Publication Date
CN113033334A (en) · 2021-06-25
CN113033334B · 2024-07-02

Family

Family ID: 76468473

Family Applications (1)

Application Number · Title · Priority Date · Filing Date
CN202110247358.4A · Image processing method, image processing device, electronic device and medium · 2021-03-05 · 2021-03-05 · Active · CN113033334B (en)

Country Status (1)

Country · Link
CN (1) · CN113033334B (en)




Legal Events

Code · Title
PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant