CN111246113B - Image processing method, device, equipment and storage medium

Info

Publication number: CN111246113B
Application number: CN202010147639.8A
Authority: CN
Inventors: 罗彤; 李亚乾; 蒋燚
Original assignee: Shanghai Jinsheng Communication Technology Co ltd; Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Shanghai Jinsheng Communication Technology Co ltd; Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2020-03-05
Filing date: 2020-03-05
Publication date: 2022-03-18
Anticipated expiration: 2040-03-05
Also published as: CN111246113A

Abstract

The application discloses an image processing method, apparatus, device and storage medium, and belongs to the technical field of image processing. The method comprises: acquiring a non-ideal person image, wherein the person in the non-ideal person image has a non-ideal human body posture; acquiring an ideal person image, wherein the person in the ideal person image has an ideal human body posture; and correcting the human body posture of the person in the non-ideal person image according to the ideal person image, wherein the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold. The technical solution provided by the embodiments of the application can improve the efficiency of capturing person images.

Description

Image processing method, device, equipment and storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a storage medium.

Background

At present, capturing person images is increasingly common in daily life, and an aesthetically pleasing human body posture (which may also be referred to as a photographing pose) can improve the overall effect of a person image.

In the related art, after a person image is captured, the subject or the photographer can view the captured image. If the human body posture in the person image is not aesthetically pleasing, the subject adjusts his or her posture and the photographer captures the subject again, and this is repeated until the human body posture in the captured person image is pleasing.

However, this approach is cumbersome and makes capturing person images inefficient.

Disclosure of Invention

Based on this, the embodiment of the application provides an image processing method, an image processing device, an image processing apparatus and a storage medium, which can improve the shooting efficiency of the person image.

In a first aspect, an image processing method is provided, which includes:

acquiring a non-ideal person image, wherein the person in the non-ideal person image has a non-ideal human body posture; acquiring an ideal person image, wherein the person in the ideal person image has an ideal human body posture; and correcting the human body posture of the person in the non-ideal person image according to the ideal person image, wherein the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold.

In a second aspect, there is provided an image processing apparatus comprising:

a first acquisition module, configured to acquire a non-ideal person image, wherein the person in the non-ideal person image has a non-ideal human body posture;

a second acquisition module, configured to acquire an ideal person image, wherein the person in the ideal person image has an ideal human body posture; and

a correction module, configured to correct the human body posture of the person in the non-ideal person image according to the ideal person image, wherein the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold.

In a third aspect, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the image processing method according to the first aspect above.

In a fourth aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the image processing method according to the first aspect above.

The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:

A non-ideal person image and an ideal person image are acquired, wherein the person in the non-ideal person image has a non-ideal human body posture and the person in the ideal person image has an ideal human body posture. The human body posture of the person in the non-ideal person image is then corrected according to the ideal person image, so that the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold. Thus, when the human body posture in a captured person image is not aesthetically pleasing, that is, not ideal, it can be corrected directly according to the ideal person image so that the corrected posture is close to the ideal posture. Because the ideal human body posture is itself a pleasing posture, the correction improves the attractiveness of the posture in the captured person image. As a result, the subject does not need to adjust his or her posture repeatedly and the photographer does not need to shoot repeatedly, which improves the efficiency of capturing person images.

Drawings

FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a flowchart of an image processing method provided by an embodiment of the present application;

FIG. 3 is a flowchart of a method for correcting the human body posture of a person in a non-ideal person image provided by an embodiment of the present application;

FIG. 4 is a schematic network structure diagram of a key point identification network provided by an embodiment of the present application;

FIG. 5 is a schematic network structure diagram of the 1st second recognition subnetwork provided by an embodiment of the present application;

FIG. 6 is a schematic network structure diagram of the kth second recognition subnetwork provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of an STN network structure provided by an embodiment of the present application;

FIG. 8 is a block diagram of an image processing apparatus provided by an embodiment of the present application;

FIG. 9 is a block diagram of a correction module provided by an embodiment of the present application;

FIG. 10 is a block diagram of a computer device provided by an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

In the following, a brief description will be given of an implementation environment related to the image processing method provided in the embodiment of the present application.

Fig. 1 is a schematic diagram of an implementation environment related to an image processing method provided in an embodiment of the present application, and as shown in fig. 1, the implementation environment may include a server 101 and a terminal 102, and the server 101 and the terminal 102 may communicate with each other through a wired network or a wireless network.

The terminal 102 may be a smartphone, a tablet computer, a wearable device, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), an e-book reader, or a vehicle-mounted device. The server 101 may be a single server or a server cluster comprising a plurality of servers.

In the implementation environment shown in fig. 1, the terminal 102 may send a non-ideal person image and an ideal person image to the server 101. The person in the non-ideal person image has a non-ideal human body posture, which refers to an undesirable or unaesthetic human body posture; optionally, it may be a human body posture that the user is not satisfied with. The person in the ideal person image has an ideal human body posture; optionally, it may be a human body posture that the user wishes to pose in. The server 101 may correct the human body posture of the person in the non-ideal person image using the ideal person image.

Of course, in some possible implementations, the implementation environment related to the image processing method provided by the embodiment of the present application may only include the terminal 102.

In the case where the implementation environment includes only the terminal 102, the terminal 102 may perform correction processing of the human body posture of the person in the non-ideal person image using the ideal person image directly after acquiring the non-ideal person image and the ideal person image.

Please refer to fig. 2, which shows a flowchart of an image processing method provided in an embodiment of the present application. The image processing method may be applied to the server 101 or the terminal 102. The embodiment of the present application is described only by taking application to the terminal 102 as an example; the technical process when the method is applied to the server 101 is the same and is not repeated here. As shown in fig. 2, the image processing method may include the following steps:

Step 201, the terminal acquires a non-ideal person image.

In one embodiment of the present application, if the terminal detects a correction instruction for a person image after capturing the person image, the terminal may take the captured person image as the non-ideal person image.

Optionally, after the terminal captures the person image, the captured person image may be displayed in an image display interface, and the terminal may receive a correction instruction for the person image based on the image display interface.

In a possible implementation manner, a correction option may be set in the image presentation interface, and when a trigger operation on the correction option is detected, the terminal may receive a correction instruction for the person image.

In another possible implementation manner, when the terminal detects a preset type of touch operation in the image display interface, the terminal may receive a correction instruction for the person image, where the touch operation may be a double-click operation, a single-click operation, or a sliding operation.

Step 202, the terminal acquires an ideal person image.

In one embodiment of the present application, the terminal may determine the ideal person image from among a plurality of candidate ideal person images according to a selection instruction of the user.

In practical applications, the terminal may store a plurality of candidate ideal person images in advance, or the terminal may request a plurality of candidate ideal person images from the server, where the human body posture of the person in each candidate ideal person image is an ideal human body posture. The user may select one of the candidate ideal person images as the ideal person image, and the terminal may then correct the human body posture of the person in the non-ideal person image based on the ideal person image selected by the user.

Step 203, the terminal corrects the human body posture of the person in the non-ideal person image according to the ideal person image.

Wherein, the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold value, in other words, the corrected human body posture is close to the ideal human body posture.
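The threshold condition above can be illustrated with a minimal sketch. The text does not specify how the difference between two postures is measured, so the metric used here (mean Euclidean distance between corresponding key points in normalized coordinates), the function names, and the sample values are all assumptions made for illustration only.

```python
import math

def pose_difference(pose_a, pose_b):
    """Mean Euclidean distance between corresponding key points.

    This metric is an assumption of the sketch; the source only states that
    a difference is compared against a preset threshold.
    """
    if pose_a.keys() != pose_b.keys():
        raise ValueError("poses must contain the same key point types")
    distances = [math.dist(pose_a[name], pose_b[name]) for name in pose_a]
    return sum(distances) / len(distances)

def is_close_enough(corrected, ideal, threshold):
    """True when the corrected posture is within the preset difference threshold."""
    return pose_difference(corrected, ideal) < threshold

# Hypothetical key points in normalized image coordinates (x, y).
ideal = {
    "left_shoulder": (0.30, 0.20),
    "left_elbow": (0.35, 0.40),
    "left_wrist": (0.45, 0.55),
}
corrected = {
    "left_shoulder": (0.30, 0.21),
    "left_elbow": (0.36, 0.40),
    "left_wrist": (0.45, 0.56),
}
accepted = is_close_enough(corrected, ideal, threshold=0.05)
```

Any other distance between postures could be substituted for `pose_difference` without changing the threshold test itself.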

In the image processing method provided by the embodiments of the application, a non-ideal person image and an ideal person image are acquired, wherein the person in the non-ideal person image has a non-ideal human body posture and the person in the ideal person image has an ideal human body posture. The human body posture of the person in the non-ideal person image is then corrected according to the ideal person image, so that the difference between the corrected human body posture and the ideal human body posture is smaller than the preset difference threshold. Thus, when the human body posture in a captured person image is not aesthetically pleasing, that is, not ideal, it can be corrected directly according to the ideal person image so that the corrected posture is close to the ideal posture. Because the ideal human body posture is itself a pleasing posture, the correction improves the attractiveness of the posture in the captured person image. As a result, the subject does not need to adjust his or her posture repeatedly and the photographer does not need to shoot repeatedly, which improves the efficiency of capturing person images.

Referring to fig. 3, on the basis of the above embodiment, an embodiment of the present application provides a method for performing a correction process on a human posture of a person in a non-ideal person image according to an ideal person image, where the method may include the following steps:

Step 2031, the terminal performs human skeleton key point identification on the non-ideal person image and the ideal person image respectively, to obtain a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.

Optionally, in an embodiment of the present application, the terminal may perform human skeleton keypoint identification on the non-ideal person image and the ideal person image by using a neural network, and in order to simplify the description, in the process of describing the human skeleton keypoint identification, the non-ideal person image and the ideal person image are collectively referred to as a target person image. The technical process of identifying the key points of the human skeleton of the target person image by using the neural network can comprise the following steps A and B.

Step A, the terminal inputs the target person image into a key point identification network to obtain a key point probability map set and a limb direction vector map set output by the key point identification network.

In the following, the embodiment of the present application will describe a keypoint probability map set and a limb direction vector map set separately.

I. The key point probability map set.

The key point probability map set comprises a plurality of key point probability maps which are in one-to-one correspondence with different types of human skeleton key points, and each key point probability map comprises a probability value used for indicating the probability that the corresponding type of human skeleton key point is located at the position point of the probability value.

For example, suppose there are 3 types of human skeleton key points: a left elbow key point, a left wrist key point, and a left shoulder key point. In practical applications there are far more than 3 types; only 3 are used here to simplify the explanation.

In the case of 3 kinds of human skeleton key points in total, the key point probability map set described above includes 3 key point probability maps, each of which corresponds to one kind of human skeleton key points.

The keypoint probability map is essentially a matrix, each matrix element in the matrix is a probability value, the position of each matrix element (i.e., the probability value) in the matrix can be regarded as the position point of the matrix element in the keypoint probability map, and the position point has a mapping relationship with one or more pixels in the target person image.

As described above, the probability value included in the key point probability map is used to indicate the probability that the corresponding kind of human bone key point is located at the position of the probability value, and taking the key point probability map corresponding to the left-elbow key point as an example, the probability value included in the key point probability map is used to indicate the probability that the left-elbow key point is located at the position of the probability value.

II. The limb direction vector map set.

The limb direction vector map set includes a plurality of limb direction vector maps in one-to-one correspondence with different types of limbs. Each limb direction vector map includes vector values, and each vector value indicates the direction of the corresponding type of limb at the position point where the vector value is located.

It should be noted that a limb in the embodiments of the present application is not a limb in the narrow sense, but refers to the human body region between two associated human skeleton key points. For example, the human body region between the left eye key point and the right eye key point is one limb, and the human body region between the neck key point and the left shoulder key point is another limb.

For example, suppose there are 2 types of limbs: a left forearm limb and a left upper arm limb, where the left forearm limb is the human body region between the left elbow key point and the left wrist key point, and the left upper arm limb is the human body region between the left elbow key point and the left shoulder key point. In practical applications there are far more than 2 types of limbs; only 2 are used here to simplify the description.

In the case of 2 types of limbs in total, the limb direction vector map set described above includes 2 limb direction vector maps, each corresponding to one type of limb.

A limb direction vector map is essentially a matrix in which each matrix element is a vector value. The position of each matrix element (that is, each vector value) in the matrix can be regarded as the position point of that element in the limb direction vector map, and the position point has a mapping relationship with one or more pixels in the target person image.

Optionally, the vector value may be a two-dimensional vector value, which may be represented by (x, y). In general, when a vector value in a limb direction vector map is (0, 0), there is no limb of the type corresponding to that limb direction vector map at the position point where the vector value is located.

As described above, each vector value in a limb direction vector map indicates the direction of the corresponding type of limb at the position point where the vector value is located. Taking the limb direction vector map corresponding to the left forearm limb as an example, each vector value in that map indicates the direction of the left forearm limb at the position point where the vector value is located.
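As a rough illustration of the two kinds of output described above, the following sketch represents each key point probability map as a matrix of probability values and each limb direction vector map as a matrix of two-dimensional vector values. The map size, the dictionary layout, and the names are hypothetical and chosen only to mirror the 3-key-point, 2-limb example.

```python
H, W = 4, 5  # hypothetical map height and width

def empty_probability_map(height, width):
    """Matrix of probability values, one per position point."""
    return [[0.0] * width for _ in range(height)]

def empty_vector_map(height, width):
    """Matrix of 2-D vector values; (0.0, 0.0) means no limb at that point."""
    return [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]

# One probability map per key point type, one vector map per limb type.
keypoint_maps = {
    name: empty_probability_map(H, W)
    for name in ("left_shoulder", "left_elbow", "left_wrist")
}
limb_maps = {
    name: empty_vector_map(H, W)
    for name in ("left_forearm", "left_upper_arm")
}

# The left elbow is probably at row 2, column 1.
keypoint_maps["left_elbow"][2][1] = 0.9
# The left forearm passes through row 2, column 2, pointing along +x.
limb_maps["left_forearm"][2][2] = (1.0, 0.0)

def limb_present(vector_map, row, col):
    """A (0, 0) vector value indicates that no limb of this type is present."""
    return vector_map[row][col] != (0.0, 0.0)
```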

After describing the set of the keypoint probability map and the set of the limb direction vector map, the embodiment of the present application will briefly describe the network structure of the keypoint identification network.

Referring to fig. 4, the key point identification network may include a first recognition subnetwork w1 and n cascaded second recognition subnetworks w2, where n is a positive integer greater than 1.

The first recognition subnetwork w1 is used for feature extraction of the target person image and outputting a feature map. The feature map is essentially a matrix whose matrix elements are the features of the image of the target person extracted by the first recognition subnetwork w 1.

Optionally, the first recognition subnetwork w1 may be a Convolutional Neural Network (CNN); for example, the first recognition subnetwork w1 may be a MobileNetV2 network.

The input to the 1 st of the n second identification subnetworks w2 may be: the feature map output by the first recognition subnetwork w1, the 1 st second recognition subnetwork may perform recognition computation on the feature map, and output a 1 st candidate keypoint probability map set and a 1 st candidate limb direction vector map set, and optionally, the recognition computation described above may be convolution computation.

The input to the kth second recognition subnetwork of the n second recognition subnetworks w2 may be: the feature map output by the first recognition subnetwork w1, the k-1 st candidate keypoint probability map set and the k-1 st candidate limb direction vector map set, and the kth second recognition subnetwork may perform recognition computation on the feature map output by the first recognition subnetwork w1, the k-1 st candidate keypoint probability map set and the k-1 st candidate limb direction vector map set, and output the kth candidate keypoint probability map set and the kth candidate limb direction vector map set, where k is a positive integer greater than 1 and less than or equal to n, and optionally, the recognition computation described above may be convolution computation.

Alternatively, the second recognition subnetwork w2 can also be a convolutional neural network.

Fig. 5 is a schematic diagram of the network structure of an exemplary 1st second recognition subnetwork. As shown in fig. 5, the 1st second recognition subnetwork includes two branches. The input of both branches is the feature map output by the first recognition subnetwork w1, and the outputs of the two branches are the 1st candidate key point probability map set and the 1st candidate limb direction vector map set, respectively. Each branch includes three 3 × 3 convolutional layers and two 1 × 1 convolutional layers.

Fig. 6 is a schematic diagram of the network structure of an exemplary kth second recognition subnetwork. As shown in fig. 6, the kth second recognition subnetwork includes two branches. The input of both branches is the feature map output by the first recognition subnetwork w1, the (k-1)th candidate key point probability map set, and the (k-1)th candidate limb direction vector map set, and the outputs of the two branches are the kth candidate key point probability map set and the kth candidate limb direction vector map set, respectively. Each branch includes five 7 × 7 convolutional layers and two 1 × 1 convolutional layers.

Next, the embodiment of the present application will briefly describe the technical process of step a in conjunction with the network structure of the key point identification network.

The terminal inputs the target person image into the first recognition subnetwork to obtain the feature map that the first recognition subnetwork outputs after performing feature extraction on the target person image. The terminal then inputs the feature map into the n cascaded second recognition subnetworks. The ith second recognition subnetwork performs recognition computation on the ith input and outputs the ith candidate key point probability map set and the ith candidate limb direction vector map set. The nth candidate key point probability map set and the nth candidate limb direction vector map set, output by the nth second recognition subnetwork, are taken as the key point probability map set and the limb direction vector map set finally output by the key point identification network.

When i = 1, the ith input is the feature map output by the first recognition subnetwork. When 1 < i ≤ n, the ith input consists of the feature map output by the first recognition subnetwork, the (i-1)th candidate key point probability map set, and the (i-1)th candidate limb direction vector map set.
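The data flow of step A can be sketched as follows. This is only an illustration of how the inputs and outputs are chained: the subnetworks are passed in as plain callables, and the toy stand-ins at the bottom are hypothetical placeholders, not real convolutional networks.

```python
def run_keypoint_network(image, first_subnetwork, second_subnetworks):
    """Chain the first recognition subnetwork and n cascaded second subnetworks.

    second_subnetworks[0] takes only the feature map; every later stage takes
    the feature map plus the previous stage's two candidate map sets.
    """
    if len(second_subnetworks) < 2:
        raise ValueError("n must be a positive integer greater than 1")
    feature_map = first_subnetwork(image)
    prob_maps, vector_maps = second_subnetworks[0](feature_map)
    for stage in second_subnetworks[1:]:
        prob_maps, vector_maps = stage(feature_map, prob_maps, vector_maps)
    # The outputs of the nth stage are the final outputs of the network.
    return prob_maps, vector_maps

# Toy stand-ins (hypothetical) so the data flow can be executed end to end.
def toy_first(image):
    return sum(image)

def toy_stage_1(feature_map):
    return feature_map, -feature_map

def toy_stage_k(feature_map, prev_prob, prev_vec):
    return prev_prob + feature_map, prev_vec - feature_map

result = run_keypoint_network(
    [1, 2, 3], toy_first, [toy_stage_1, toy_stage_k, toy_stage_k]
)
```

Note that every stage after the first receives the same feature map again, so later stages refine the earlier candidates rather than replacing the image features.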

Step B, the terminal identifies the human skeleton key points of the target person image according to the key point probability map set and the limb direction vector map set.

Wherein, step B may comprise the following substeps:

Substep b1, for each type of human skeleton key point, the terminal determines a plurality of candidate position points in the key point probability map corresponding to that type of human skeleton key point.

The probability value at each candidate position point is the largest among the probability values at the position points adjacent to it; in other words, each candidate position point is a local maximum of the key point probability map.

Optionally, the terminal may perform maximum pooling operation on the key point probability map to obtain a pooled probability map, where the pooled probability map includes a plurality of pooled probability values, and the pooled probability values correspond to a plurality of probability values included in the key point probability map in a one-to-one manner. The terminal may then determine a target probability value from the keypoint probability map, the target probability value being equal to the corresponding pooling probability value. Then, the terminal may determine the location point where the target probability value is located as a candidate location point corresponding to the human skeleton key point.
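The max-pooling comparison described above can be sketched in plain Python as follows. The window radius, the `min_prob` cutoff (used here to discard flat low-probability regions), and the sample map are assumptions of this sketch and are not specified in the text.

```python
def max_pool_same(prob_map, radius=1):
    """Max pooling with stride 1, so the output has the same size as the input."""
    height, width = len(prob_map), len(prob_map[0])
    pooled = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                prob_map[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            pooled[r][c] = max(window)
    return pooled

def candidate_points(prob_map, radius=1, min_prob=0.1):
    """Position points whose probability equals the pooled value (local maxima)."""
    pooled = max_pool_same(prob_map, radius)
    return [
        (r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
        if p == pooled[r][c] and p >= min_prob
    ]

left_elbow_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.0],
    [0.0, 0.0, 0.3, 0.8, 0.2],
]
peaks = candidate_points(left_elbow_map, radius=1, min_prob=0.5)
```

Two candidates appear here because the map contains two separate local maxima, which is what happens when more than one left elbow is visible in the image.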

Substep b2, the terminal acquires a plurality of position point sets.

Each position point set comprises m candidate position points, the types of the human skeleton key points corresponding to the candidate position points in each position point set are different, and m is the number of the types of the human skeleton key points.

As in the above example, assuming that there are 3 kinds of human skeletal keypoints, where the 3 kinds of human skeletal keypoints are the left elbow keypoint, the left wrist keypoint, and the left shoulder keypoint, respectively, each position point set may include 3 candidate position points, where the 3 candidate position points correspond to the left elbow keypoint, the left wrist keypoint, and the left shoulder keypoint, respectively.
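One straightforward way to build such position point sets is to take one candidate position point per key point type in every possible combination. The sketch below does this with a Cartesian product; the function name and the candidate coordinates are hypothetical, and exhaustive enumeration is an assumption made for clarity rather than a statement of how the embodiment enumerates sets.

```python
from itertools import product

def position_point_sets(candidates_by_type):
    """Every combination that picks exactly one candidate per key point type."""
    types = sorted(candidates_by_type)
    combos = product(*(candidates_by_type[t] for t in types))
    return [dict(zip(types, combo)) for combo in combos]

# Hypothetical candidates as (row, col) position points.
candidates = {
    "left_shoulder": [(1, 1)],
    "left_elbow": [(3, 2), (3, 6)],
    "left_wrist": [(5, 3), (5, 7)],
}
sets = position_point_sets(candidates)
```

With 1, 2 and 2 candidates for the three types, this yields 1 × 2 × 2 = 4 position point sets, each containing m = 3 candidate position points.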

Substep b3, the terminal determines a target position point set from the plurality of position point sets according to the limb direction vector map set, and determines the human skeleton key points in the target person image according to the candidate position points included in the target position point set.

1. For each position point set, the terminal determines a connecting line between each candidate position point included in the position point set as a candidate limb to obtain a candidate limb set.

The candidate limb set is denoted Z. Each element of Z is the line connecting the candidate position point numbered x and the candidate position point numbered y in the position point set, that is, a candidate limb. Here j₁ and j₂ denote the types of human skeleton key points corresponding to the two candidate position points, x and y denote the numbers of the candidate position points in the position point set, and m is the number of candidate position points in the position point set.
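One possible way to write the candidate limb set in symbols is sketched below. The element symbol z and the exact index layout are assumptions, since the original expression is not reproduced in this text; Z, j₁, j₂, x, y and m keep the meanings given above, with j₁ the key point type of candidate x and j₂ the key point type of candidate y.

```latex
Z = \left\{\, z_{j_1 j_2}^{xy} \;:\; x, y \in \{1, \dots, m\},\ x \neq y \,\right\}
```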

2. For each candidate limb of each position point set, the terminal determines, according to the limb type of the candidate limb, the target limb direction vector map corresponding to the candidate limb from the limb direction vector map set, and calculates the confidence of the candidate limb according to the vector values in the target limb direction vector map.

In practical application, the key point probability map and the limb direction vector map have the same size, so that the position points in the key point probability map and the position points in the limb direction vector map have a one-to-one correspondence relationship. Based on this one-to-one correspondence, the terminal may calculate a confidence level for the candidate limb. The confidence of the candidate limb is used for indicating the probability that the candidate position points at the two ends of the candidate limb belong to the same person.

The confidence of the candidate limb is calculated from the following quantities.

P(u) is the first coordinate, that is, the coordinate in the key point probability map of a position point interpolated between the candidate position points at the two ends of the candidate limb:

$$P(u) = (1 - u)\, r_x + u\, r_y$$

where u is generally sampled at uniform intervals over [0, 1], r_x is the coordinate in the key point probability map of the candidate position point at one end of the candidate limb, and r_y is the coordinate in the key point probability map of the candidate position point at the other end of the candidate limb.

L_c(P(u)) is the vector value at the position point corresponding to the first coordinate in the target limb direction vector map, d_x is the vector value at the position point in the target limb direction vector map corresponding to the candidate position point at one end of the candidate limb, and d_y is the vector value at the position point in the target limb direction vector map corresponding to the candidate position point at the other end of the candidate limb.
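A commonly used form for this kind of confidence is a line integral of the limb direction field along the candidate limb. The expression below is a sketch in that style and is not necessarily the formula of the embodiment: the symbol E for the confidence and the normalized direction term built from d_x and d_y are assumptions, while L_c, P(u), r_x and r_y are used as defined above.

```latex
E = \int_{0}^{1} L_c\big(P(u)\big) \cdot
    \frac{d_y - d_x}{\lVert d_y - d_x \rVert_2} \, du,
\qquad
P(u) = (1 - u)\, r_x + u\, r_y
```

Because u is sampled at uniform intervals over [0, 1], such an integral would be approximated by averaging the integrand over the sampled values of u.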

3. For each position point set, the terminal calculates the confidence of the position point set according to the confidence of each candidate limb of the position point set.

Optionally, the terminal may superimpose the confidence degrees of each candidate limb of the position point set to obtain the confidence degree of the position point set.
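Items 2 and 3 can be sketched together as follows. The sketch samples the target limb direction vector map along the candidate limb and averages the dot product with the limb's unit direction, then sums limb confidences into a set confidence. Computing the unit direction from the endpoint coordinates r_x and r_y, the nearest-position-point lookup, the (x, y) coordinate order, and the sample count are all assumptions of this sketch.

```python
import math

def limb_confidence(vector_map, r_x, r_y, num_samples=10):
    """Average alignment of sampled vector values with the limb direction.

    r_x and r_y are (x, y) coordinates of the two end points; vector_map is
    indexed as vector_map[row][col], that is, vector_map[y][x].
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    dx, dy = r_y[0] - r_x[0], r_y[1] - r_x[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    unit = (dx / length, dy / length)
    total = 0.0
    for k in range(num_samples):
        u = k / (num_samples - 1)            # uniform samples over [0, 1]
        px = (1 - u) * r_x[0] + u * r_y[0]   # P(u), x component
        py = (1 - u) * r_x[1] + u * r_y[1]   # P(u), y component
        vx, vy = vector_map[round(py)][round(px)]
        total += vx * unit[0] + vy * unit[1]
    return total / num_samples

def set_confidence(limb_confidences):
    """Superimpose (sum) the confidences of all candidate limbs in a set."""
    return sum(limb_confidences)

# Hypothetical 3 x 5 vector map: a limb runs along row 1, pointing along +x.
vector_map = [
    [(0.0, 0.0)] * 5,
    [(1.0, 0.0)] * 5,
    [(0.0, 0.0)] * 5,
]
forearm = limb_confidence(vector_map, (0, 1), (4, 1))    # lies on the limb
upper_arm = limb_confidence(vector_map, (0, 0), (0, 2))  # crosses the limb
total = set_confidence([forearm, upper_arm])
```

A candidate limb that follows the direction field scores close to 1, while one that cuts across it scores close to 0, which is what lets the confidence separate key points belonging to the same person from those belonging to different people.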

4. The terminal determines the target position point set from the plurality of position point sets according to the confidence of each position point set.

Optionally, the terminal may determine, as the target location point set, a location point set with the highest confidence in the plurality of location point sets.

The confidence of a position point set is obtained from the confidences of its candidate limbs, as described above. The process of determining the target position point set is a process of finding the maximum confidence; in the embodiment of the present application, the maximum confidence can be found by using the Hungarian algorithm.
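For a small number of position point sets, the maximum can also be found by direct comparison. The sketch below uses that exhaustive search purely for illustration; it is a stand-in for, not an implementation of, the Hungarian algorithm mentioned above, and the function name and sample values are hypothetical.

```python
def select_target_set(position_point_sets, confidences):
    """Return the position point set with the highest confidence."""
    if not position_point_sets or len(position_point_sets) != len(confidences):
        raise ValueError("need one confidence per position point set")
    best_index = max(range(len(confidences)), key=confidences.__getitem__)
    return position_point_sets[best_index]

# Hypothetical sets (labelled by name only) and their confidences.
candidate_sets = ["set_a", "set_b", "set_c"]
set_confidences = [0.2, 1.7, 0.9]
target_set = select_target_set(candidate_sets, set_confidences)
```

Exhaustive search grows quickly with the number of candidates, which is the practical reason an assignment method such as the Hungarian algorithm is preferred.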

After the target position point set is obtained, the terminal can determine the pixel corresponding to each candidate position point in the target position point set from the target person image according to the corresponding relation between the position point in the key point probability map and the pixel in the target person image, and the determined pixel is used as a human skeleton key point.

Step 2032, the terminal corrects the human body posture of the person in the non-ideal person image according to the plurality of human skeleton key points included in the non-ideal person image and the plurality of human skeleton key points included in the ideal person image.

Optionally, the terminal may input the plurality of human skeleton key points included in the non-ideal person image, the plurality of human skeleton key points included in the ideal person image, and the non-ideal person image into a Spatial Transformer Network (STN), and correct the human body posture of the person in the non-ideal person image through the STN.

Please refer to fig. 7, which is a diagram illustrating an exemplary STN network structure. As shown in fig. 7, the STN includes a local Network (english: localization Network), a lattice generator (english: Grid generator), and a Sampler (english: Sampler).

The local network is a parameter predictor which can be a multilayer neural network, the input of the local network is a plurality of human skeleton key points included by the non-ideal character image, a plurality of human skeleton key points included by the ideal character image and the non-ideal character image, and the output of the local network is a set of grid generator parameters.

The grid generator is essentially a coordinate mapper of which the grid generator parameters output by the local network are parameters, and can output an image coordinate mapping relationship between the non-ideal character image and a target image to be output, wherein the human body posture of the character in the target image is the human body posture after correction processing. In other words, for each pixel in the non-ideal person image, the mesh generator may map it into the target image.

The coordinate mapping output by the grid generator may be expressed mathematically as a relation among three quantities: the image coordinates of any pixel in the non-ideal character image, the image coordinates of that pixel after mapping to the target image, and the grid generator parameters.
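As an illustration only, if the coordinate mapping is assumed to be an affine transformation (a common choice for an STN, but an assumption not stated in this text), it could be written as below. Here $(x, y)$ denotes the image coordinates of a pixel in the non-ideal character image, $(x', y')$ denotes its coordinates after mapping to the target image, and $\theta_{11}, \dots, \theta_{23}$ denote the grid generator parameters; all of these symbols are hypothetical.

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix}
\theta_{11} & \theta_{12} & \theta_{13} \\
\theta_{21} & \theta_{22} & \theta_{23}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
```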

The sampler can convert the coordinates of each pixel point in the non-ideal character image by utilizing the image coordinate mapping relation output by the grid generator so as to obtain the target image, wherein the target image is the result of the human body posture correction processing.
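The interplay of grid generator and sampler can be sketched in plain Python as follows. The sketch makes several assumptions that are not stated in this text: the grid generator parameters form a 2x3 affine matrix, the grid is generated backwards (for every target pixel, the source coordinates to read from, which avoids holes in the output), and the sampler uses bilinear interpolation. All names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
Theta = List[List[float]]  # assumed form: 2x3 affine grid generator parameters
Grid = List[List[Tuple[float, float]]]


def generate_grid(theta: Theta, height: int, width: int) -> Grid:
    """For each target pixel (x, y), compute the source coordinates to read."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            src_x = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            src_y = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((src_x, src_y))
        grid.append(row)
    return grid


def sample_bilinear(image: Image, grid: Grid) -> Image:
    """Read the source image at the grid coordinates (0.0 outside the image)."""
    h, w = len(image), len(image[0])
    out = []
    for grid_row in grid:
        out_row = []
        for src_x, src_y in grid_row:
            if not (0 <= src_x <= w - 1 and 0 <= src_y <= h - 1):
                out_row.append(0.0)
                continue
            x0, y0 = int(src_x), int(src_y)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = src_x - x0, src_y - y0
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            out_row.append(top * (1 - fy) + bottom * fy)
        out.append(out_row)
    return out


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shift_right = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]  # target x reads source x - 1

same = sample_bilinear(image, generate_grid(identity, 3, 3))
shifted = sample_bilinear(image, generate_grid(shift_right, 3, 3))
```

In a trained STN the parameters would come from the local network rather than being written by hand as they are here.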

In the following, the embodiments of the present application will briefly describe the training process of STN:

1. A sufficient number of pairs of character images are captured, wherein each pair includes a character image whose human body posture has not been corrected and a character image whose human body posture has been corrected. (In practice, it is only necessary to ensure that the shooting place and the shooting content of the two character images are the same; either image of the pair can then be used as the character image whose human body posture has not been corrected, which reduces the amount of data to be captured.)

2. Human skeleton key point identification is performed on each captured character image to obtain the human skeleton key points included in each image.

3. The STN is trained using, as inputs, the uncorrected-posture character image of each pair, the human skeleton key points of that image, and the human skeleton key points of the corrected-posture character image of the pair, and using the corrected-posture character image as the ground-truth output.
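The way one captured pair can serve in both directions, as noted in step 1, can be sketched as follows. The data classes and file names are hypothetical stand-ins (a name takes the place of the pixel data), and the sketch only assembles training samples; it does not train a network.

```python
from dataclasses import dataclass
from typing import List, Tuple

Keypoints = List[Tuple[float, float]]


@dataclass
class PersonImage:
    name: str            # stand-in for the pixel data
    keypoints: Keypoints  # human skeleton key points from step 2


@dataclass
class TrainingSample:
    input_image: str                # image whose posture is to be corrected
    input_keypoints: Keypoints      # key points of that image
    reference_keypoints: Keypoints  # key points of the other image of the pair
    expected_output: str            # ground-truth output image


def build_samples(pairs: List[Tuple[PersonImage, PersonImage]]) -> List[TrainingSample]:
    """Use each pair in both directions, so one pair yields two samples."""
    samples = []
    for first, second in pairs:
        for source, target in ((first, second), (second, first)):
            samples.append(
                TrainingSample(source.name, source.keypoints,
                               target.keypoints, target.name)
            )
    return samples


pose_a = PersonImage("a.jpg", [(10.0, 20.0), (12.0, 40.0)])
pose_b = PersonImage("b.jpg", [(11.0, 20.0), (18.0, 38.0)])
samples = build_samples([(pose_a, pose_b)])
```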

According to the embodiment of the application, the non-ideal human body posture is corrected according to the key points of the human skeleton, so that large-amplitude limb deviation and small-amplitude posture deviation can be adjusted simultaneously, and the human body posture obtained after correction is natural.

Referring to fig. 8, a block diagram of an image processing apparatus 400 according to an embodiment of the present application is shown, where the image processing apparatus 400 may be configured in the server 101 or the terminal 102 shown in fig. 1. As shown in fig. 8, the image processing apparatus 400 may include: a first obtaining module 401, a second obtaining module 402 and a correction module 403.

The first obtaining module 401 is configured to obtain a non-ideal human image, where a human in the non-ideal human image has a non-ideal human posture.

The second obtaining module 402 is configured to obtain an ideal human image, where a human in the ideal human image has an ideal human posture.

The correction module 403 is configured to perform correction processing on the human body posture of the person in the non-ideal person image according to the ideal person image; and the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold value.
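How the difference between two postures is measured is not specified here. As one possible illustration, the sketch below assumes the difference is the mean Euclidean distance between corresponding human skeleton key points and compares it with a preset threshold; the function names, coordinates, and threshold values are all hypothetical.

```python
import math
from typing import List, Tuple

Keypoints = List[Tuple[float, float]]


def pose_difference(pose: Keypoints, ideal: Keypoints) -> float:
    """Mean Euclidean distance between corresponding key points (assumed metric)."""
    if len(pose) != len(ideal) or not pose:
        raise ValueError("poses must be non-empty and have the same key points")
    return sum(math.dist(p, q) for p, q in zip(pose, ideal)) / len(pose)


def meets_threshold(pose: Keypoints, ideal: Keypoints, threshold: float) -> bool:
    """True if the difference is smaller than the preset difference threshold."""
    return pose_difference(pose, ideal) < threshold


ideal_pose = [(0.0, 0.0), (3.0, 4.0)]
corrected_pose = [(3.0, 4.0), (3.0, 4.0)]
difference = pose_difference(corrected_pose, ideal_pose)
```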

Referring to fig. 9, optionally, in an embodiment of the present application, the correction module 403 may include an identification sub-module 4031 and a correction sub-module 4032.

The identification submodule 4031 is configured to perform human skeleton key point identification on the non-ideal person image and the ideal person image respectively to obtain a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.

The correction sub-module 4032 is configured to correct the human posture of the person in the non-ideal person image according to a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.

In an embodiment of the present application, the identification submodule 4031 is specifically configured to:

inputting the target character image into a key point recognition network to obtain a key point probability graph set and a limb direction vector graph set output by the key point recognition network, the target person image is the non-ideal person image or the ideal person image, the key point probability map set comprises a plurality of key point probability maps corresponding to different kinds of human skeleton key points in a one-to-one mode, each key point probability map comprises a probability value used for indicating the probability that the human skeleton key point of the corresponding kind is located at the position point of the probability value, the set of limb direction vector images comprises a plurality of limb direction vector images corresponding to different kinds of limbs one to one, the limb is a human body region between associated human body bone key points, and each limb direction vector diagram comprises vector values for indicating the direction of the corresponding kind of limb at the position point of the vector values; and identifying the human skeleton key points of the target character image according to the key point probability graph set and the limb direction vector graph set.

In an embodiment of the present application, the keypoint identification network includes a first identification subnetwork and n cascaded second identification subnetworks, where n is a positive integer greater than 1, and the identification submodule 4031 is specifically configured to:

inputting the target person image into the first recognition sub-network to obtain a feature map output by the first recognition sub-network after feature extraction is carried out on the target person image; inputting the feature map into the n second recognition sub-networks, performing recognition calculation on an ith input map through the ith second recognition sub-network, and outputting an ith candidate keypoint probability map set and an ith candidate limb direction vector map set, wherein when i is 1, the ith input map is the feature map, and when i is more than 1 and less than or equal to n, the ith input map is the feature map, the ith-1 candidate keypoint probability map set and the ith-1 candidate limb direction vector map set; and taking the nth candidate key point probability map set and the nth candidate limb direction vector map set output by the nth second identification subnetwork as the key point probability map set and the limb direction vector map set output by the key point identification network respectively.
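The data flow through the cascaded second recognition sub-networks can be sketched as follows. Each sub-network is represented by a hypothetical callable; the toy stage used here only records how many inputs it received and is not a real network.

```python
from typing import Any, Callable, List, Sequence, Tuple

Stage = Callable[[Tuple[Any, ...]], Tuple[Any, Any]]


def run_cascade(feature_map: Any, stages: Sequence[Stage]) -> Tuple[Any, Any]:
    """Feed the feature map through n cascaded second recognition sub-networks.

    Stage 1 receives only the feature map; stage i > 1 receives the feature map
    plus the candidate map sets output by stage i - 1. The outputs of the last
    stage are returned as the outputs of the whole network.
    """
    keypoint_maps, limb_maps = None, None
    for i, stage in enumerate(stages, start=1):
        if i == 1:
            stage_input = (feature_map,)
        else:
            stage_input = (feature_map, keypoint_maps, limb_maps)
        keypoint_maps, limb_maps = stage(stage_input)
    return keypoint_maps, limb_maps


seen_input_sizes: List[int] = []


def toy_stage(stage_input: Tuple[Any, ...]) -> Tuple[str, str]:
    """Hypothetical stand-in for a sub-network: records its number of inputs."""
    seen_input_sizes.append(len(stage_input))
    return "keypoint probability maps", "limb direction vector maps"


outputs = run_cascade("feature map", [toy_stage, toy_stage, toy_stage])
```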

for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in a key point probability graph corresponding to the human skeleton key point; acquiring a plurality of position point sets, wherein each position point set comprises m candidate position points, the types of human skeleton key points corresponding to the candidate position points in each position point set are different, and m is the number of the types of the human skeleton key points; and determining a target position point set from the plurality of position point sets according to the limb direction vector diagram set, and determining human skeleton key points in the target human image according to each candidate position point included in the target position point set.

performing maximum pooling operation on the key point probability map to obtain a pooled probability map, wherein the pooled probability map comprises a plurality of pooled probability values, and the pooled probability values correspond to a plurality of probability values included in the key point probability map in a one-to-one mode; determining a target probability value from the keypoint probability map, the target probability value being equal to the corresponding pooling probability value; and determining the position point where the target probability value is located as a candidate position point corresponding to the human skeleton key point.
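The max-pooling step can be sketched in plain Python as follows. The pooling uses stride 1 with edge handling that keeps the output the same size as the input, so the pooled values correspond one-to-one to the original values. The 3x3 window and the `min_prob` floor, which keeps flat zero regions from being reported as candidates, are assumptions added for this sketch.

```python
from typing import List, Tuple

ProbMap = List[List[float]]


def max_pool_same(prob_map: ProbMap, k: int = 3) -> ProbMap:
    """Stride-1 max pooling whose output has the same size as the input."""
    h, w = len(prob_map), len(prob_map[0])
    r = k // 2
    return [
        [
            max(
                prob_map[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def candidate_points(prob_map: ProbMap, k: int = 3,
                     min_prob: float = 0.1) -> List[Tuple[int, int]]:
    """Position points whose probability equals the pooled value (local maxima)."""
    pooled = max_pool_same(prob_map, k)
    return [
        (y, x)
        for y, row in enumerate(prob_map)
        for x, value in enumerate(row)
        if value == pooled[y][x] and value >= min_prob
    ]


prob_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.7],
    [0.0, 0.0, 0.0, 0.1, 0.2],
]
peaks = candidate_points(prob_map)
```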

for each position point set, determining a connecting line between each candidate position point included in the position point set as a candidate limb; for each candidate limb of each position point set, determining a target limb direction vector diagram corresponding to the candidate limb from the limb direction vector diagram set according to the limb type of the candidate limb, and calculating the confidence coefficient of the candidate limb according to the vector value in the target limb direction vector diagram, wherein the confidence coefficient is used for indicating the probability that the candidate position points at the two ends of the candidate limb belong to the same person; for each position point set, calculating the confidence coefficient of the position point set according to the confidence coefficient of each candidate limb of the position point set; and determining the target position point set from the plurality of position point sets according to the confidence degree of each position point set.
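How the confidence of a candidate limb is computed from the vector values is not detailed in this passage. One possible sketch, assuming the confidence is the average dot product between the limb's unit direction and the vectors sampled along the connecting line, is shown below; the function name, the sample count, and the toy vector map are hypothetical.

```python
import math
from typing import List, Tuple

VectorMap = List[List[Tuple[float, float]]]  # vec_map[y][x] = (vx, vy)


def limb_confidence(start: Tuple[int, int], end: Tuple[int, int],
                    vec_map: VectorMap, samples: int = 5) -> float:
    """Average alignment between the limb direction and the sampled vectors.

    start and end are (x, y) candidate position points; samples must be >= 2.
    """
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return 0.0
    ux, uy = (x1 - x0) / length, (y1 - y0) / length
    total = 0.0
    for s in range(samples):
        t = s / (samples - 1)
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        vx, vy = vec_map[y][x]
        total += vx * ux + vy * uy
    return total / samples


# Toy 5x5 vector map: a horizontal limb lies along row 2, pointing in +x.
vec_map = [[(1.0, 0.0) if y == 2 else (0.0, 0.0) for _ in range(5)]
           for y in range(5)]
aligned = limb_confidence((0, 2), (4, 2), vec_map)
diagonal = limb_confidence((0, 0), (4, 4), vec_map)
```

A candidate limb that follows the direction field scores close to 1, while one that merely crosses it scores much lower, which is what allows the position point sets to be ranked.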

superimposing the confidences of the candidate limbs of the position point set to obtain the confidence of the position point set.

determining the position point set with the highest confidence among the plurality of position point sets as the target position point set.

In an embodiment of the present application, the correction sub-module 4032 is specifically configured to:

inputting a plurality of human skeleton key points included in the non-ideal character image, a plurality of human skeleton key points included in the ideal character image and the non-ideal character image into a Spatial Transformer Network (STN), and correcting the human posture of the character in the non-ideal character image through the STN.

In an embodiment of the present application, the STN includes a local network, a grid generator and a sampler, and the correction sub-module 4032 is specifically configured to:

obtaining grid generator parameters according to a plurality of human skeleton key points included by the non-ideal character image, a plurality of human skeleton key points included by the ideal character image and the non-ideal character image through the local network; obtaining an image coordinate mapping relation between the non-ideal character image and a target image to be output by using the grid generator parameters through the grid generator, wherein the human body posture of the character in the target image is the human body posture after the correction processing; and converting the coordinates of each pixel point in the non-ideal character image by using the image coordinate mapping relation through the sampler to obtain the target image.

The image processing apparatus provided in the embodiment of the present application can implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.

For specific limitations of the image processing apparatus, reference may be made to the above limitations of the image processing method, which are not described herein again. The respective modules in the image processing apparatus described above may be wholly or partially implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the terminal in hardware form, or can be stored in a memory in the terminal in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment of the present application, a computer device is provided, and the computer device may be a terminal or a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor and a memory connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The computer program is executed by a processor to implement an image processing method provided by the embodiment of the application.

Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment of the present application, there is provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the following steps when executing the computer program:

In one embodiment of the application, the processor when executing the computer program further performs the steps of: respectively carrying out human skeleton key point identification on the non-ideal character image and the ideal character image to obtain a plurality of human skeleton key points included in the non-ideal character image and a plurality of human skeleton key points included in the ideal character image; and correcting the human posture of the person in the non-ideal person image according to a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.

In one embodiment of the application, the processor when executing the computer program further performs the steps of: inputting the target character image into a key point recognition network to obtain a key point probability graph set and a limb direction vector graph set output by the key point recognition network, the target person image is the non-ideal person image or the ideal person image, the key point probability map set comprises a plurality of key point probability maps corresponding to different kinds of human skeleton key points in a one-to-one mode, each key point probability map comprises a probability value used for indicating the probability that the human skeleton key point of the corresponding kind is located at the position point of the probability value, the set of limb direction vector images comprises a plurality of limb direction vector images corresponding to different kinds of limbs one to one, the limb is a human body region between associated human body bone key points, and each limb direction vector diagram comprises vector values for indicating the direction of the corresponding kind of limb at the position point of the vector values; and identifying the human skeleton key points of the target character image according to the key point probability graph set and the limb direction vector graph set.

The keypoint identification network comprises a first identification subnetwork and a cascade of n second identification subnetworks, n being a positive integer greater than 1, and in one embodiment of the application, the processor, when executing the computer program, further implements the steps of: inputting the target person image into the first recognition sub-network to obtain a feature map output by the first recognition sub-network after feature extraction is carried out on the target person image; inputting the feature map into the n second recognition sub-networks, performing recognition calculation on an ith input map through the ith second recognition sub-network, and outputting an ith candidate keypoint probability map set and an ith candidate limb direction vector map set, wherein when i is 1, the ith input map is the feature map, and when i is more than 1 and less than or equal to n, the ith input map is the feature map, the ith-1 candidate keypoint probability map set and the ith-1 candidate limb direction vector map set; and taking the nth candidate key point probability map set and the nth candidate limb direction vector map set output by the nth second identification subnetwork as the key point probability map set and the limb direction vector map set output by the key point identification network respectively.

In one embodiment of the application, the processor when executing the computer program further performs the steps of: for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in a key point probability graph corresponding to the human skeleton key point; acquiring a plurality of position point sets, wherein each position point set comprises m candidate position points, the types of human skeleton key points corresponding to the candidate position points in each position point set are different, and m is the number of the types of the human skeleton key points; and determining a target position point set from the plurality of position point sets according to the limb direction vector diagram set, and determining human skeleton key points in the target human image according to each candidate position point included in the target position point set.

In one embodiment of the application, the processor when executing the computer program further performs the steps of: performing maximum pooling operation on the key point probability map to obtain a pooled probability map, wherein the pooled probability map comprises a plurality of pooled probability values, and the pooled probability values correspond to a plurality of probability values included in the key point probability map in a one-to-one mode; determining a target probability value from the keypoint probability map, the target probability value being equal to the corresponding pooling probability value; and determining the position point where the target probability value is located as a candidate position point corresponding to the human skeleton key point.

In one embodiment of the application, the processor when executing the computer program further performs the steps of: for each position point set, determining a connecting line between each candidate position point included in the position point set as a candidate limb; for each candidate limb of each position point set, determining a target limb direction vector diagram corresponding to the candidate limb from the limb direction vector diagram set according to the limb type of the candidate limb, and calculating the confidence coefficient of the candidate limb according to the vector value in the target limb direction vector diagram, wherein the confidence coefficient is used for indicating the probability that the candidate position points at the two ends of the candidate limb belong to the same person; for each position point set, calculating the confidence coefficient of the position point set according to the confidence coefficient of each candidate limb of the position point set; and determining the target position point set from the plurality of position point sets according to the confidence degree of each position point set.

In one embodiment of the application, the processor when executing the computer program further performs the steps of: superimposing the confidences of the candidate limbs of the position point set to obtain the confidence of the position point set.

In one embodiment of the application, the processor when executing the computer program further performs the steps of: determining the position point set with the highest confidence among the plurality of position point sets as the target position point set.

In one embodiment of the application, the processor when executing the computer program further performs the steps of: inputting a plurality of human skeleton key points included in the non-ideal character image, a plurality of human skeleton key points included in the ideal character image and the non-ideal character image into a Spatial Transformer Network (STN), and correcting the human posture of the character in the non-ideal character image through the STN.

The STN comprises a local network, a grid generator and a sampler, and in one embodiment of the application, the processor when executing the computer program further performs the steps of: obtaining, through the local network, grid generator parameters according to a plurality of human skeleton key points included in the non-ideal character image, a plurality of human skeleton key points included in the ideal character image and the non-ideal character image; obtaining, through the grid generator and by using the grid generator parameters, an image coordinate mapping relation between the non-ideal character image and a target image to be output, wherein the human body posture of the character in the target image is the human body posture after the correction processing; and converting, through the sampler and by using the image coordinate mapping relation, the coordinates of each pixel point in the non-ideal character image to obtain the target image.

The implementation principle and technical effect of the computer device provided by the embodiment of the present application are similar to those of the method embodiment described above, and are not described herein again.

In an embodiment of the application, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of:

In one embodiment of the application, the computer program when executed by the processor further performs the steps of: respectively carrying out human skeleton key point identification on the non-ideal character image and the ideal character image to obtain a plurality of human skeleton key points included in the non-ideal character image and a plurality of human skeleton key points included in the ideal character image; and correcting the human posture of the person in the non-ideal person image according to a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.

In one embodiment of the application, the computer program when executed by the processor further performs the steps of: inputting the target character image into a key point recognition network to obtain a key point probability graph set and a limb direction vector graph set output by the key point recognition network, the target person image is the non-ideal person image or the ideal person image, the key point probability map set comprises a plurality of key point probability maps corresponding to different kinds of human skeleton key points in a one-to-one mode, each key point probability map comprises a probability value used for indicating the probability that the human skeleton key point of the corresponding kind is located at the position point of the probability value, the set of limb direction vector images comprises a plurality of limb direction vector images corresponding to different kinds of limbs one to one, the limb is a human body region between associated human body bone key points, and each limb direction vector diagram comprises vector values for indicating the direction of the corresponding kind of limb at the position point of the vector values; and identifying the human skeleton key points of the target character image according to the key point probability graph set and the limb direction vector graph set.

The keypoint identification network comprises a first identification subnetwork and a cascade of n second identification subnetworks, n being a positive integer greater than 1, and in one embodiment of the application, the computer program when executed by the processor further performs the steps of: inputting the target person image into the first recognition sub-network to obtain a feature map output by the first recognition sub-network after feature extraction is carried out on the target person image; inputting the feature map into the n second recognition sub-networks, performing recognition calculation on an ith input map through the ith second recognition sub-network, and outputting an ith candidate keypoint probability map set and an ith candidate limb direction vector map set, wherein when i is 1, the ith input map is the feature map, and when i is more than 1 and less than or equal to n, the ith input map is the feature map, the ith-1 candidate keypoint probability map set and the ith-1 candidate limb direction vector map set; and taking the nth candidate key point probability map set and the nth candidate limb direction vector map set output by the nth second identification subnetwork as the key point probability map set and the limb direction vector map set output by the key point identification network respectively.

In one embodiment of the application, the computer program when executed by the processor further performs the steps of: for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in a key point probability graph corresponding to the human skeleton key point; acquiring a plurality of position point sets, wherein each position point set comprises m candidate position points, the types of human skeleton key points corresponding to the candidate position points in each position point set are different, and m is the number of the types of the human skeleton key points; and determining a target position point set from the plurality of position point sets according to the limb direction vector diagram set, and determining human skeleton key points in the target human image according to each candidate position point included in the target position point set.

In one embodiment of the application, the computer program when executed by the processor further performs the steps of: performing maximum pooling operation on the key point probability map to obtain a pooled probability map, wherein the pooled probability map comprises a plurality of pooled probability values, and the pooled probability values correspond to a plurality of probability values included in the key point probability map in a one-to-one mode; determining a target probability value from the keypoint probability map, the target probability value being equal to the corresponding pooling probability value; and determining the position point where the target probability value is located as a candidate position point corresponding to the human skeleton key point.

In one embodiment of the application, the computer program when executed by the processor further performs the steps of: for each position point set, determining a connecting line between each candidate position point included in the position point set as a candidate limb; for each candidate limb of each position point set, determining a target limb direction vector diagram corresponding to the candidate limb from the limb direction vector diagram set according to the limb type of the candidate limb, and calculating the confidence coefficient of the candidate limb according to the vector value in the target limb direction vector diagram, wherein the confidence coefficient is used for indicating the probability that the candidate position points at the two ends of the candidate limb belong to the same person; for each position point set, calculating the confidence coefficient of the position point set according to the confidence coefficient of each candidate limb of the position point set; and determining the target position point set from the plurality of position point sets according to the confidence degree of each position point set.

In one embodiment of the application, the computer program when executed by the processor further performs the steps of: superimposing the confidences of the candidate limbs of the position point set to obtain the confidence of the position point set.

In one embodiment of the application, the computer program when executed by the processor further performs the steps of: determining the position point set with the highest confidence among the plurality of position point sets as the target position point set.

In one embodiment of the application, the computer program when executed by the processor further performs the steps of: inputting a plurality of human skeleton key points included in the non-ideal character image, a plurality of human skeleton key points included in the ideal character image and the non-ideal character image into a Spatial Transformer Network (STN), and correcting the human posture of the character in the non-ideal character image through the STN.

The STN comprises a local network, a grid generator and a sampler, and in one embodiment of the application, the computer program when executed by the processor further performs the steps of: obtaining, through the local network, grid generator parameters according to a plurality of human skeleton key points included in the non-ideal character image, a plurality of human skeleton key points included in the ideal character image and the non-ideal character image; obtaining, through the grid generator and by using the grid generator parameters, an image coordinate mapping relation between the non-ideal character image and a target image to be output, wherein the human body posture of the character in the target image is the human body posture after the correction processing; and converting, through the sampler and by using the image coordinate mapping relation, the coordinates of each pixel point in the non-ideal character image to obtain the target image.

The implementation principle and technical effect of the computer-readable storage medium provided by this embodiment are similar to those of the above-described method embodiment, and are not described herein again.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The embodiments described above express only several implementations of the present application. Although their description is relatively specific and detailed, it should not be construed as limiting the scope of the claims. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent shall be subject to the appended claims.

Claims

1. An image processing method, characterized in that the method comprises:

acquiring a non-ideal character image, wherein characters in the non-ideal character image have non-ideal human body postures;

acquiring an ideal character image, wherein characters in the ideal character image have ideal human body postures;

respectively carrying out human skeleton key point identification on the non-ideal character image and the ideal character image to obtain a plurality of human skeleton key points included in the non-ideal character image and a plurality of human skeleton key points included in the ideal character image;

inputting the plurality of human skeleton key points included in the non-ideal character image, the plurality of human skeleton key points included in the ideal character image and the non-ideal character image into a Spatial Transformer Network (STN), and correcting the human body posture of the character in the non-ideal character image through the STN, wherein the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold value.

2. The method of claim 1, wherein respectively carrying out human skeleton key point identification on the non-ideal character image and the ideal character image comprises:

inputting a target character image into a key point recognition network to obtain a key point probability map set and a limb direction vector map set output by the key point recognition network, wherein the target character image is the non-ideal character image or the ideal character image; the key point probability map set comprises a plurality of key point probability maps in one-to-one correspondence with different kinds of human skeleton key points, and each probability value in a key point probability map indicates the probability that a human skeleton key point of the corresponding kind is located at the position point of that probability value; the limb direction vector map set comprises a plurality of limb direction vector maps in one-to-one correspondence with different kinds of limbs, a limb being the human body region between associated human skeleton key points, and each vector value in a limb direction vector map indicates the direction of the limb of the corresponding kind at the position point of that vector value;

and identifying the human skeleton key points of the target character image according to the key point probability map set and the limb direction vector map set.

3. The method of claim 2, wherein the key point recognition network comprises a first recognition sub-network and n cascaded second recognition sub-networks, n being a positive integer greater than 1, and wherein inputting the target character image into the key point recognition network to obtain the key point probability map set and the limb direction vector map set output by the key point recognition network comprises:

inputting the target character image into the first recognition sub-network to obtain a feature map output after the first recognition sub-network performs feature extraction on the target character image;

inputting the feature map into the n second recognition sub-networks, performing recognition calculation on an ith input map through the ith second recognition sub-network, and outputting an ith candidate key point probability map set and an ith candidate limb direction vector map set, wherein when i is equal to 1, the ith input map is the feature map, and when i is greater than 1 and less than or equal to n, the ith input map comprises the feature map, the (i-1)th candidate key point probability map set and the (i-1)th candidate limb direction vector map set;

and taking the nth candidate key point probability map set and the nth candidate limb direction vector map set output by the nth second recognition sub-network as the key point probability map set and the limb direction vector map set output by the key point recognition network, respectively.
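The data flow recited in claim 3 can be illustrated with a minimal Python sketch. The function name `run_keypoint_network` and the representation of the sub-networks as plain callables are hypothetical and chosen only for illustration; the actual structure of the sub-networks is not specified by this sketch.

```python
def run_keypoint_network(image, first_subnet, second_subnets):
    """Illustrative data flow only: one feature extractor, then n cascaded stages.

    first_subnet and second_subnets are hypothetical callables standing in
    for the first and second recognition sub-networks.
    """
    if len(second_subnets) < 2:
        raise ValueError("n must be a positive integer greater than 1")

    feature_map = first_subnet(image)

    # Stage i = 1 receives the feature map only.
    stage_input = [feature_map]
    for subnet in second_subnets:
        prob_maps, limb_maps = subnet(stage_input)
        # Stage i > 1 receives the feature map plus the candidate sets
        # produced by stage i - 1.
        stage_input = [feature_map, prob_maps, limb_maps]

    # The candidate sets of the n-th stage are the network outputs.
    return prob_maps, limb_maps
```

Passing the feature map to every stage, rather than only to the first, lets each later stage refine the previous candidates against the original image features.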

4. The method of claim 2, wherein identifying the human skeleton key points of the target character image according to the key point probability map set and the limb direction vector map set comprises:

for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in the key point probability map corresponding to the human skeleton key point;

acquiring a plurality of position point sets, wherein each position point set comprises m candidate position points, the types of human skeleton key points corresponding to the candidate position points in each position point set are different, and m is the number of the types of the human skeleton key points;

and determining a target position point set from the plurality of position point sets according to the limb direction vector map set, and determining the human skeleton key points in the target character image according to the candidate position points included in the target position point set.
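As an illustration of how the position point sets of claim 4 might be formed, the following Python sketch takes one candidate per kind of key point in every possible combination. The function name, the dictionary layout and the key point names in the usage note are assumptions made for this example; exhaustive enumeration is shown only for clarity, and a practical implementation may prune combinations.

```python
from itertools import product


def enumerate_position_point_sets(candidates_by_kind):
    """Build every set that holds exactly one candidate per key point kind.

    candidates_by_kind: hypothetical dict mapping a key point kind to a
    list of (x, y) candidate position points.
    """
    kinds = sorted(candidates_by_kind)
    candidate_lists = [candidates_by_kind[kind] for kind in kinds]
    # Each combination has m entries, where m is the number of kinds.
    return [dict(zip(kinds, combo)) for combo in product(*candidate_lists)]
```

For example, with one candidate for a "neck" key point and two candidates for a "left_shoulder" key point, the function returns two position point sets, each containing m = 2 candidate position points.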

5. The method of claim 4, wherein determining a plurality of candidate position points corresponding to the human skeleton key point in the key point probability map corresponding to the human skeleton key point comprises:

performing maximum pooling operation on the key point probability map to obtain a pooled probability map, wherein the pooled probability map comprises a plurality of pooled probability values, and the pooled probability values are in one-to-one correspondence with a plurality of probability values included in the key point probability map;

determining a target probability value from the keypoint probability map, the target probability value being equal to a corresponding pooling probability value;

and determining the position point where the target probability value is located as a candidate position point corresponding to the human skeleton key point.
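The candidate search of claim 5 can be sketched with NumPy as follows. The 3x3 pooling window, the stride of 1 and the `min_prob` floor are assumptions introduced for this example: the floor is not part of the claim and is added here only so that flat zero-valued regions, which trivially equal their pooled value, are not reported as candidates.

```python
import numpy as np


def candidate_points(prob_map, kernel=3, min_prob=0.1):
    """Return (row, col) positions whose probability equals its pooled value."""
    prob_map = np.asarray(prob_map, dtype=float)
    pad = kernel // 2
    padded = np.pad(prob_map, pad, mode="constant", constant_values=-np.inf)

    # Stride-1 max pooling with padding: one pooled value per input position.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kernel, kernel))
    pooled = windows.max(axis=(2, 3))

    # A position is a local maximum when its value survives the pooling.
    peaks = (prob_map == pooled) & (prob_map >= min_prob)
    return [tuple(int(v) for v in rc) for rc in np.argwhere(peaks)]
```

Comparing the map with its pooled version finds all local maxima in a single vectorized pass, which is why the pooled probability values must correspond one-to-one with the original probability values.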

6. The method according to claim 4, wherein determining the target position point set from the plurality of position point sets according to the limb direction vector map set comprises:

for each position point set, determining a connecting line between candidate position points included in the position point set as a candidate limb;

for each candidate limb of each position point set, determining a target limb direction vector map corresponding to the candidate limb from the limb direction vector map set according to the limb kind of the candidate limb, and calculating the confidence of the candidate limb according to the vector values in the target limb direction vector map, wherein the confidence indicates the probability that the candidate position points at the two ends of the candidate limb belong to the same person;

for each position point set, calculating the confidence of the position point set according to the confidence of each candidate limb of the position point set;

determining the target position point set from the plurality of position point sets according to the confidence of each position point set.

7. The method of claim 6, wherein calculating the confidence of the position point set according to the confidence of each candidate limb of the position point set comprises:

8. The method of claim 7, wherein determining the target position point set from the plurality of position point sets according to the confidence of each position point set comprises:

and determining the position point set with the highest confidence coefficient in the plurality of position point sets as the target position point set.
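The selection described in claims 6 to 8 can be sketched as follows. This is an illustration under stated assumptions, not the claimed formula: the limb confidence is computed here as the average dot product between the vector values sampled along the candidate limb and the unit direction of the limb, and the per-set aggregation is left as a parameter `aggregate` that defaults to a plain sum. All function and parameter names are hypothetical.

```python
import numpy as np


def limb_confidence(vector_map, start, end, samples=10):
    """Assumed scoring rule: mean alignment of the vector map with the limb.

    vector_map: array of shape (H, W, 2) holding an (x, y) direction per position.
    start, end: (x, y) candidate position points at the two ends of the limb.
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    length = np.linalg.norm(end - start)
    if length == 0:
        return 0.0
    unit = (end - start) / length
    scores = []
    for t in np.linspace(0.0, 1.0, samples):
        x, y = np.rint(start + t * (end - start)).astype(int)
        scores.append(float(np.dot(vector_map[y, x], unit)))
    return float(np.mean(scores))


def select_target_set(point_sets, limbs, vector_maps, aggregate=sum):
    """Pick the position point set with the highest aggregated confidence.

    point_sets: list of dicts, key point kind -> (x, y).
    limbs: list of (limb_kind, kind_a, kind_b) tuples.
    vector_maps: dict, limb_kind -> limb direction vector map.
    aggregate: stand-in for the set-level confidence rule (assumption).
    """
    def set_confidence(point_set):
        return aggregate(
            limb_confidence(vector_maps[kind], point_set[a], point_set[b])
            for kind, a, b in limbs
        )
    return max(point_sets, key=set_confidence)
```

A candidate limb that runs along the direction stored in the vector map scores close to 1, while a limb perpendicular to it scores close to 0, so position point sets that pair key points of different people tend to receive a low confidence.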

9. The method of claim 1, wherein the STN comprises a local network, a grid generator and a sampler, and wherein correcting the human body posture of the character in the non-ideal character image through the STN comprises:

obtaining, through the local network, grid generator parameters according to the plurality of human skeleton key points included in the non-ideal character image, the plurality of human skeleton key points included in the ideal character image, and the non-ideal character image;

obtaining, through the grid generator and using the grid generator parameters, an image coordinate mapping relation between the non-ideal character image and a target image to be output, wherein the human body posture of the character in the target image is the human body posture after the correction processing;

and converting, through the sampler and using the image coordinate mapping relation, the coordinates of each pixel point in the non-ideal character image to obtain the target image.
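The roles of the grid generator and the sampler in claim 9 can be illustrated with a minimal NumPy sketch. Several points are assumptions made for this example: the grid generator parameters are taken to be a 2x3 affine matrix `theta`, coordinates are normalized to [-1, 1], the sampler uses bilinear interpolation on a single-channel image, and the mapping runs from each target pixel back to the source image. The local network that would produce `theta` from the key points and the image is not modeled.

```python
import numpy as np


def affine_grid(theta, height, width):
    """Grid generator: source (x, y) coordinates for every target pixel."""
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    target = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return target @ np.asarray(theta, dtype=float).T         # (H, W, 2)


def bilinear_sample(image, grid):
    """Sampler: read the source image at the grid coordinates."""
    h, w = image.shape
    # Convert normalized coordinates back to pixel coordinates.
    x = (grid[..., 0] + 1.0) * (w - 1) / 2.0
    y = (grid[..., 1] + 1.0) * (h - 1) / 2.0
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    # Clipping the weights replicates the border for out-of-range points.
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    top = image[y0, x0] * (1 - wx) + image[y0, x0 + 1] * wx
    bottom = image[y0 + 1, x0] * (1 - wx) + image[y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bottom * wy
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the target image reproduces the source image, and with `[[-1, 0, 0], [0, 1, 0]]` it is mirrored horizontally. Mapping from target pixels back to the source ensures that every pixel of the target image receives a value.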

10. An image processing apparatus, characterized in that the apparatus comprises:

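a first acquisition module, used for acquiring a non-ideal character image, wherein characters in the non-ideal character image have non-ideal human body postures;

a second acquisition module, used for acquiring an ideal character image, wherein characters in the ideal character image have ideal human body postures;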

a correction module, used for respectively carrying out human skeleton key point identification on the non-ideal character image and the ideal character image to obtain a plurality of human skeleton key points included in the non-ideal character image and a plurality of human skeleton key points included in the ideal character image; and inputting the plurality of human skeleton key points included in the non-ideal character image, the plurality of human skeleton key points included in the ideal character image and the non-ideal character image into a Spatial Transformer Network (STN), and correcting the human body posture of the character in the non-ideal character image through the STN, wherein the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold value.

11. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the image processing method of any one of claims 1 to 9.

12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the image processing method according to any one of claims 1 to 9.