
CN112734632A - Image processing method, image processing device, electronic equipment and readable storage medium

Info

Publication number: CN112734632A
Application number: CN202110009523.2A
Authority: CN
Inventors: 李益永; 黄秋实; 孙准; 井雪; 项伟
Original assignee: Bigo Technology Pte Ltd
Current assignee: Bigo Technology Pte Ltd
Priority date: 2021-01-05
Filing date: 2021-01-05
Publication date: 2021-04-30
Anticipated expiration: 2041-01-05
Also published as: WO2022148379A1; CN112734632B

Abstract

The invention provides an image processing method and an image processing device. The image processing method comprises the following steps: acquiring an image to be migrated and a reference image, wherein the image to be migrated includes a target object whose pose is to be converted and the reference image includes a reference object presenting a reference pose; acquiring a first key feature of the target object and a second key feature of the reference object; determining a pose migration matrix according to the first key feature and the second key feature; acquiring an initial image; and determining a target composite image according to the pose migration matrix, the image to be migrated and the initial image. In the embodiment of the invention, the target composite image is obtained without collecting a large number of training samples to train a model, which reduces the complexity of image migration. Because the initial image is acquired and the pose migration matrix, the image to be migrated and the initial image are used to migrate the whole image to be migrated, the details of the image to be migrated can be displayed in the target composite image and are not omitted.

Description

Image processing method, image processing device, electronic equipment and readable storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a readable storage medium.

Background

Pose migration processes an image A so that the person P in image A takes on the pose of the person H in another image B, yielding a composite image C.

At present, to implement pose migration, multiple images A, multiple images B and multiple images C are used as training samples to train an image migration model; a new image A and a new image B are then processed by the image migration model to obtain a new composite image C.

With this approach, a large number of training samples must be prepared to train the image migration model, so training is cumbersome. Moreover, when the image migration model is used and the clothing shapes of the people in the two images differ greatly, the person in the composite image C cannot retain the details of the person P in the original image A, and the shape of the person differs greatly between composite images C at different viewing angles and poses. In addition, only part of the person may be migrated, and the remaining parts must be processed again to complete the migration, which makes the migration process cumbersome.

Disclosure of Invention

In view of this, the present invention provides an image processing method, which solves the problems of tedious migration process and incomplete migration to some extent.

A first aspect of an embodiment of the present invention provides an image processing method, where the method includes:

acquiring an image to be migrated and a reference image; the image to be migrated comprises: a target object of which the posture is to be converted; the reference image comprises: a reference object presenting a reference pose;

acquiring a first key feature of the target object and a second key feature of the reference object;

determining a pose migration matrix according to the first key feature and the second key feature;

acquiring an initial image;

and determining a target composite image according to the pose migration matrix, the image to be migrated and the initial image.

A second aspect of embodiments of the present invention provides an image processing apparatus, including:

the first acquisition module is used for acquiring an image to be migrated and a reference image; the image to be migrated comprises: a target object of which the posture is to be converted; the reference image comprises: a reference object presenting a reference pose;

the second acquisition module is used for acquiring the first key feature of the target object and the second key feature of the reference object;

the first determining module is used for determining a pose migration matrix according to the first key feature and the second key feature;

the third acquisition module is used for acquiring an initial image;

and the second determining module is used for determining a target composite image according to the pose migration matrix, the image to be migrated and the initial image.

A third aspect of embodiments of the present invention provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.

A fourth aspect of embodiments of the present invention provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.

In the embodiment of the invention, an image to be migrated and a reference image are acquired, wherein the image to be migrated comprises a target object whose pose is to be converted and the reference image comprises a reference object presenting a reference pose; a first key feature of the target object and a second key feature of the reference object are acquired; a pose migration matrix is determined according to the first key feature and the second key feature; an initial image is acquired; and a target composite image is determined according to the pose migration matrix, the image to be migrated and the initial image. The target composite image is thus obtained without collecting a large number of training samples to train a model, which reduces the complexity of image migration. Because the initial image is acquired and the pose migration matrix, the image to be migrated and the initial image are used to migrate the whole image to be migrated, the details of the image to be migrated can be displayed in the target composite image and are not omitted.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flowchart illustrating steps of an image processing method according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating an image processing method according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating another image processing method according to an embodiment of the present invention;

FIG. 4 is a block diagram of an image processing apparatus according to an embodiment of the present invention;

FIG. 5 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

Referring to fig. 1, a flowchart illustrating steps of an image processing method according to an embodiment of the present invention is shown, where the image processing method specifically includes the following steps:

step 101, acquiring an image to be migrated and a reference image; the image to be migrated comprises: a target object of which the posture is to be converted; the reference image comprises: a reference object presenting a reference pose.

The image to be migrated is an $m_1 \times n_1 \times 3$ image, where $m_1$ is the width of the image to be migrated, $n_1$ is its height, and 3 indicates that the image to be migrated is an RGB image. The reference image is an $m_2 \times n_2 \times 3$ image, where $m_2$ is the width of the reference image, $n_2$ is its height, and 3 indicates that the reference image is an RGB image.

In the embodiment of the present invention, the target object and the reference object generally refer to human objects in an image. Referring to FIG. 2, image A is the image to be migrated and image B is the reference image; the image A to be migrated includes a target object P, and the reference image B includes a reference object H.

In the embodiment of the present invention, the user may select the image to be migrated and the reference image from the image memory according to the requirement, or may capture the image at any time, which is not limited to this.

In addition, a user can select a section of video as a reference video, each frame image in the reference video is used as a reference image, and then the image to be migrated is processed based on each frame of reference image.

Step 102, obtaining a first key feature of the target object and a second key feature of the reference object.

In the embodiment of the present invention, the image to be migrated is represented by a dimensional vector: $x(i \times m \times n + j \times m + k) = x(j, k, i)$, where $1 \le i \le 3$, $1 \le j \le n$ and $1 \le k \le m$. The reference image is likewise represented by a dimensional vector: $y(i \times m \times n + j \times m + k) = y(j, k, i)$, with the same index ranges.

Specifically, each pixel point in the image to be migrated can represent the position thereof by a dimensional vector or a coordinate, and each pixel point of the reference image can also represent the position thereof by a dimensional vector or a coordinate. For example: an image has 10 rows by 10 columns of pixel points, and the coordinates of the pixel point p in the 5 th row and the 5 th column are expressed as (5, 5); this pixel point p is denoted p (45) by a one-dimensional vector.
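The flattening described above can be illustrated with a short NumPy sketch. It assumes zero-based indices and an array laid out as (rows, columns, channels); the function name `flatten_channel_major` is hypothetical and does not come from the source.

```python
import numpy as np

def flatten_channel_major(image):
    """Flatten an (n, m, 3) image so that entry i*m*n + j*m + k holds image[j, k, i].

    Assumption: zero-based indices, n rows (height) and m columns (width).
    """
    n, m, c = image.shape
    # Put the channel axis first, then flatten each channel row by row.
    return image.transpose(2, 0, 1).reshape(c * n * m)

n, m = 4, 5
image = np.arange(n * m * 3).reshape(n, m, 3)
x = flatten_channel_major(image)

j, k, i = 2, 3, 1  # row, column, channel
assert x[i * m * n + j * m + k] == image[j, k, i]
```

With this layout, each colour channel occupies one contiguous block of `m * n` entries in the vector.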

In the embodiment of the invention, the first key feature refers to coordinates of a plurality of feature points capable of marking the posture of the target object; for example, the first key feature may be coordinates of various joints of the target object; each joint includes: shoulder joint, elbow joint, wrist joint, carpometacarpal joint, hip joint, knee joint, ankle joint, etc. Furthermore, the first key feature may also be coordinates of a main part of the human body, for example, a part characterizing the head pose, including: eyes, nose tip, temple and chin tip; a portion characterizing a pose of an arm, comprising: shoulder, elbow and carpometacarpal joints; a portion characterizing hand pose, comprising: the knuckles and fingertips of the individual fingers; a site characterizing leg pose comprising: hip joint, knee joint, ankle joint.

In an embodiment of the present invention, the first key feature is a preset key feature in the target object; the second key features correspond to the first key features one to one.

Specifically, once the first key feature is obtained, the second key feature may be obtained according to it, the second key feature corresponding to the first key feature. For example, the first key feature includes the coordinates of the shoulder joint, elbow joint, radial wrist joint, carpometacarpal joint, hip joint, knee joint and ankle joint of the target object in the image to be migrated, and the second key feature includes the coordinates of the same joints of the reference object in the reference image.

In the embodiment of the present invention, if the target image only includes the face image, that is, only the face pose is migrated, the first key feature is set as the coordinates of each feature point of the human face, for example: eyes, nose, eyebrows, ears, mouth, etc.

In the embodiment of the present invention, the user may select the part to be migrated as needed, for example, when only the face is migrated, only the first key feature of the face is selected, and when only the body is migrated, only the first key feature of the body is selected.

Step 103, determining a pose migration matrix according to the first key feature and the second key feature.

In this embodiment of the present invention, the step 103 includes: determining coordinate values of each first key feature and each second key feature;

and determining the pose migration matrix according to the coordinate values of the first key features and the coordinate values of the second key features, wherein the pose migration matrix is used for converting the coordinate value of each first key feature into the coordinate value of the corresponding second key feature.

In the embodiment of the present invention, the pose migration matrix is the matrix required to transfer the coordinates of the first key features to the coordinates of the second key features. For example, the first key features include the temple at coordinates (a, b) and the shoulder joint at (c, d), and the second key features include the temple at (m, n) and the shoulder joint at (o, p). The temple of the first key features is then mapped to (m, n), the shoulder joint of the first key features is mapped to (o, p), the elbow joint of the first key features is mapped to (q, r), and so on. The pose migration matrix W is obtained from these coordinate correspondences [formula not reproduced in the source text]; when the first key features include a plurality of features (3 or more), the pose migration matrix W may be obtained in this manner.

In the embodiment of the present invention, if the coordinates of the first key features are $P_x$ and the coordinates of the second key features are $P_y$, then $W = W[P_x, P_y]$ is the pose migration matrix required for transforming from the first key features to the second key features.

After the pose migration matrix W is determined, all the pixel points of the image to be migrated can be transferred by using the pose migration matrix W.
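The formula for W is not available in this text, so the following is only a sketch of one way such a matrix could be fitted. It assumes W is a single 2D affine transform estimated by least squares from three or more matched key points; the helper names `estimate_affine` and `apply_affine` and the sample coordinates are hypothetical.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix that maps src_pts onto dst_pts.

    Assumption: the pose migration is modelled as one affine map;
    the source does not state the actual form of W.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 3:
        raise ValueError("need at least 3 matching key points")
    # Homogeneous coordinates: each row is (x, y, 1).
    src_h = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve src_h @ M = dst in the least-squares sense; M has shape (3, 2).
    M, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return M.T  # shape (2, 3): linear part plus translation column

def apply_affine(W, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ W[:, :2].T + W[:, 2]

# Hypothetical first key features (target object) and second key features
# (reference object), here related by a scale of 2 and a shift of (5, -3).
first = [(10.0, 20.0), (30.0, 40.0), (50.0, 25.0), (20.0, 60.0)]
second = [(2 * px + 5, 2 * py - 3) for px, py in first]

W = estimate_affine(first, second)
assert np.allclose(apply_affine(W, first), second)
```

A single affine map cannot bend limbs independently, so a practical system would likely need a richer, per-part or per-pixel transform; the sketch only shows how matched coordinates can determine a matrix.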

Step 104, acquiring an initial image.

In the embodiment of the present invention, the initial image is the image that must be supplied as input to the preset manner used in the subsequent step of obtaining the target composite image.

In an embodiment of the present invention, step 104 includes: inputting the pose migration matrix and the image to be migrated into an initial network model to obtain the initial image.

In this embodiment of the present invention, the initial network model may be a model trained on data samples, where the data samples include: a plurality of image samples to be migrated, pose migration matrix samples for converting the image samples to be migrated to the reference image samples, and a plurality of target composite image samples. The initial network model is obtained by training with these data samples; the pose migration matrix and the image to be migrated are then input into the trained initial network model to obtain the initial image. The initial image obtained in this way is an initial composite image of the target object in the image to be migrated in the pose of the reference object, but it omits details and cannot fully present all the features of the image to be migrated. After the subsequent steps are executed, the details of the image to be migrated can be completely supplemented.

In addition, the operating principle of the initial network model may also be $Z_0 = Wx$, where $Z_0$ is the dimensional vector of the initial image, W is the pose migration matrix, and x is the dimensional vector of the image to be migrated. The initial image obtained in this way has some features of the image to be migrated, but is not yet clear, and not all pixel points of the image to be migrated have been migrated. Using this initial image as the basis of the subsequent calculation can improve the migration quality of the image to be migrated.

Optionally, step 104 includes: taking a preset image whose dimensional vector is zero as the initial image. The preset image can be stored in a memory and called when the image to be migrated is processed.

In the embodiment of the present invention, the value of the dimensional vector corresponding to the initial image may also be assigned to zero, so as to perform subsequent calculation.
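The two ways of obtaining the initial image described above (a linear map of the image to be migrated, or an all-zero vector) can be sketched as follows. The function name `initial_image` is hypothetical, and W is treated as a small dense matrix purely for illustration; for real image sizes a dense matrix acting on the flattened image would be impractically large.

```python
import numpy as np

def initial_image(x, W=None):
    """Return Z0 = W x when W is given, otherwise an all-zero vector like x."""
    x = np.asarray(x, dtype=float)
    if W is None:
        return np.zeros_like(x)
    return np.asarray(W, dtype=float) @ x

x = np.array([1.0, 2.0, 3.0])     # toy "image" vector
W = np.eye(3)[[2, 0, 1]]          # toy matrix that reorders the entries

z0_zero = initial_image(x)        # zero initial image
z0_linear = initial_image(x, W)   # Z0 = W x
```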

Step 105, determining a target composite image according to the pose migration matrix, the image to be migrated and the initial image.

In an embodiment of the present invention, step 105 includes: obtaining an intermediate composite image according to a preset manner, the pose migration matrix, the image to be migrated and the initial image; and taking the intermediate composite image as a new initial image, and cyclically executing, for a preset number of times, the step of obtaining an intermediate composite image according to the preset manner, the pose migration matrix, the image to be migrated and the initial image, to obtain the target composite image.

In the embodiment of the present invention, let $F(Z, P_x, P_y)$ denote the dimensional vector of the target composite image in which the target object in the image to be migrated has been migrated from its own pose to the pose of the reference object. It is then required that $\min \sum \lVert F[Z, P_x, P_y] - x \rVert$ approach 0, in which case the details of the image to be migrated are all present in the target composite image, where x is the dimensional vector of the image to be migrated. The steps for driving $\min \sum \lVert F[Z, P_x, P_y] - x \rVert$ toward 0 are as follows:

1) The objective $\min \sum \lVert F[Z, P_x, P_y] - x \rVert$ is optimized to obtain [formula not reproduced in the source text].

2) Let $A = (W[P_x, P_y])^T W[P_x, P_y]$ and $b = (W[P_x, P_y])^T x$; inverse problem modeling is then carried out on [formula not reproduced in the source text] to solve the system of equations $AZ = b$.

3) Set the solution accuracy $e = 0.0000001$, $r_0 = b - AZ_0$ and $p_0 = r_0$. If the value of $r_0$ is greater than $e$, then

$r_k = r_{k-1} + \alpha_{k-1} A p_{k-1}$; $p_k = r_k + \beta_{k-1} p_{k-1}$;

where $A = W^T W$; $p_0 = r_0$; $r_0 = b - AZ_0$; $b = W^T x$.

4) Rearranging the formulas gives $Z_{k+1} = f(b, A, Z_k)$. It can be seen that the target composite image $Z_{k+1}$ depends on the pose migration matrix W, the image x to be migrated, and the initial image $Z_k$.
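The link between the objective in step 1) and the system $AZ = b$ in step 2) can be sketched as follows. This is an assumption rather than something the text states explicitly: the migrated image is modelled linearly as $F[Z, P_x, P_y] = WZ$ and the norm is taken to be the squared Euclidean norm. Setting the gradient of that objective to zero gives the normal equations, which match the definitions of A and b in step 2):

$$
\begin{aligned}
J(Z) &= \lVert W Z - x \rVert_2^2, \\
\nabla_Z J(Z) &= 2\, W^{T} (W Z - x) = 0 \\
\Longrightarrow \quad W^{T} W \, Z &= W^{T} x
\quad \Longleftrightarrow \quad A Z = b .
\end{aligned}
$$

Because $A = W^T W$ is symmetric and positive semi-definite, an iterative scheme of the conjugate-gradient type, which the recurrences in step 3) resemble, is a natural choice for solving the system.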

In the embodiment of the present invention, as can be seen from the foregoing steps 1) to 4), the preset manner in the embodiment of the present invention is specifically as follows:

$Z_{k+1} = Z_k + \alpha_k p_k$;

where $Z_{k+1}$ is a first dimensional vector of the intermediate composite image; $Z_k$ is a second dimensional vector of the initial image; and W is the pose migration matrix;

$r_k = r_{k-1} + \alpha_{k-1} A p_{k-1}$; $p_k = r_k + \beta_{k-1} p_{k-1}$;

where $A = W^T W$; $p_0 = r_0$; $r_0 = b - AZ_0$; $b = W^T x$; x is a third dimensional vector of the image to be migrated, and $Z_0$ is an initial fourth dimensional vector of the initial image obtained from the initial network model.

In the embodiment of the present invention, the preset manner is to use the above formula $Z_{k+1} = Z_k + \alpha_k p_k$ to obtain the target composite image.

In an embodiment of the invention, the pose migration matrix W is the $W[P_x, P_y]$ described above.
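The update $Z_{k+1} = Z_k + \alpha_k p_k$ with residual and direction recurrences has the shape of the conjugate gradient method applied to $AZ = b$. The sketch below is a textbook conjugate gradient loop, not necessarily the exact procedure of the embodiment. The definitions of $\alpha_k$ and $\beta_k$ are not given in this text, so the standard ones are assumed, and the residual update uses the textbook minus sign, whereas the recurrence printed above shows a plus sign. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def conjugate_gradient_normal(W, x, Z0, tol=1e-7, max_iter=100):
    """Solve (W^T W) Z = W^T x by conjugate gradients, starting from Z0."""
    W = np.asarray(W, dtype=float)
    A = W.T @ W
    b = W.T @ np.asarray(x, dtype=float)
    Z = np.array(Z0, dtype=float)
    r = b - A @ Z              # r0 = b - A Z0
    p = r.copy()               # p0 = r0
    for _ in range(max_iter):
        rr = r @ r
        if np.sqrt(rr) <= tol:         # stop once the residual is below e
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)          # assumed textbook step size
        Z = Z + alpha * p              # Z_{k+1} = Z_k + alpha_k p_k
        r = r - alpha * Ap             # textbook residual update (minus sign)
        beta = (r @ r) / rr            # assumed textbook beta
        p = r + beta * p               # p_{k+1} = r_{k+1} + beta_k p_k
    return Z

# Toy example: recover a known vector from x = W @ truth.
W_toy = np.array([[2.0, 0.0, 1.0],
                  [0.0, 3.0, 0.0],
                  [1.0, 0.0, 2.0]])
truth = np.array([1.0, 2.0, 3.0])
x_toy = W_toy @ truth
Z = conjugate_gradient_normal(W_toy, x_toy, Z0=np.zeros(3))
assert np.allclose(Z, truth, atol=1e-5)
```

Starting from a better initial image $Z_0$ (for example one produced by the initial network model) would generally reduce the number of iterations needed to reach the tolerance.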

Specifically, for example, the initial image obtained as described above is taken as $Z_0$. The pose migration matrix W, the image x to be migrated and the initial image $Z_0$ are input, for the first time, into the formula corresponding to the preset manner to obtain $Z_1 = Z_0 + \alpha_0 p_0$,

where $r_0 = b - AZ_0$ and $p_0 = r_0$. Then $\alpha_0 = 1/A = 1/(W^T W)$ and $p_0 = b - AZ_0 = W^T x - W^T W \cdot Z_0$, so that $Z_1 = Z_0 + (1/(W^T W)) \cdot (W^T x - W^T W \cdot Z_0) = x/W - Z_0$. Finally, $Z_1 = x/W - Z_0$ is obtained, where $Z_1$ is the dimensional vector of the intermediate composite image obtained by inputting the pose migration matrix W, the image x to be migrated and the initial image $Z_0$ into the formula for the first time, and $Z_0$ is the dimensional vector of the acquired initial image.

The intermediate composite image $Z_1$ obtained above is taken as a new initial image, and the pose migration matrix W, the image x to be migrated and the new initial image $Z_1$ are input, for the second time, into the formula corresponding to the preset manner to obtain $Z_2 = Z_1 + \alpha_1 p_1$, where

$r_1 = r_0 + \alpha_0 A p_0$; $p_1 = r_1 + \beta_0 p_0$;

and $A = W^T W$; $p_0 = r_0$; $r_0 = b - AZ_0$; $b = W^T x$; thereby obtaining $Z_2$.

In the embodiment of the present invention, the dimensional vector of the image may be a one-dimensional vector, a two-dimensional vector, or a three-dimensional vector, which is not limited herein.

In the embodiment of the invention, the preset number of times is greater than or equal to 2; when the preset number of times is 2, the final target composite image is $Z_2$. When the details of the final target composite image $Z_2$ are not clear enough, the step of taking the intermediate composite image as a new initial image and obtaining an intermediate composite image according to the preset manner, the pose migration matrix, the image to be migrated and the initial image can be executed cyclically for a further preset number of times, until the target composite image satisfies the user.

In the embodiment of the invention, there may be a plurality of frames of reference images, and the reference images have a time sequence; then, after step 105, the method further includes: arranging the plurality of frames of target composite images according to the time sequence to obtain a target composite video.

In the embodiment of the invention, the method further comprises the step of inputting the target synthetic image into the completion model to obtain a final synthetic image. The completion model is used for completing the missing part in the target synthetic image. For example, when the image to be migrated is input by the user as an image lacking a human face or lacking a part of a limb, the completion model completes the missing part.

Specifically, the completion model can be obtained by training with a large number of images as training samples; for example, the completion model is trained using photos taken from behind (no face), photos without legs and photos without arms, together with the corresponding full-body photos, as training samples.

The plurality of frames of reference images with a time sequence form a reference video. The user can select and upload the reference video, which includes a plurality of frames of reference images having a corresponding time sequence. The server or the electronic device sequentially executes steps 101 to 105 on the image to be migrated and each frame of image in the reference video, finally obtaining a plurality of frames of target composite images; arranging these frames according to the time sequence yields the final target composite video.

In the embodiment of the present invention, the method further includes: identifying each frame of image in a reference video, and selecting an image comprising a human body object as a reference image; taking an image not containing a human body object as a transition image; and finally arranging the multi-frame target synthetic image and the transition image according to the time sequence to obtain the target synthetic video.
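The frame-by-frame video procedure described above can be sketched as a simple loop. The callables `contains_person` and `migrate` are hypothetical placeholders for a human detector and for steps 101 to 105 respectively; neither is specified in the source, and the toy example uses strings in place of images.

```python
def build_target_video(frames, contains_person, migrate):
    """Build the target video from (timestamp, frame) pairs of a reference video.

    Frames that contain a person are used as reference images and migrated;
    the other frames are kept unchanged as transition images.
    """
    output = []
    for timestamp, frame in frames:
        if contains_person(frame):
            output.append((timestamp, migrate(frame)))   # steps 101 to 105
        else:
            output.append((timestamp, frame))            # transition image
    # Arrange all frames according to the time sequence.
    output.sort(key=lambda item: item[0])
    return [image for _, image in output]

# Toy stand-ins: frames whose label starts with "P" contain a person.
frames = [(2, "b"), (0, "P-a"), (1, "P-c")]
video = build_target_video(
    frames,
    contains_person=lambda f: f.startswith("P"),
    migrate=lambda f: f.upper(),
)
assert video == ["P-A", "P-C", "b"]
```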

The reference video includes a dance motion or other motions, which is not limited herein.

In an embodiment of the present invention, step 105 includes: extracting the target object in the image to be migrated; determining a synthetic object according to the pose migration matrix, the target object and the initial image; and compositing the background of the reference image with the synthetic object to obtain the target composite image.

In the embodiment of the present invention, only the target object in the image to be migrated is migrated, and the background of the image to be migrated is not migrated. The human body object obtained after migrating the target object is the synthetic object, which is then composited with the background of the reference image. Referring to FIG. 3, after the target object in the image A to be migrated is migrated, the background of the reference image B is used to obtain the target composite image C.
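The final compositing step can be sketched with a boolean mask. This assumes the synthetic object, its mask and the reference background have already been brought to the same size; how the mask is obtained (for example by segmentation) is not specified in the source, and the function name `composite` is hypothetical.

```python
import numpy as np

def composite(synth_object, mask, background):
    """Paste synth_object over background wherever mask is True.

    synth_object, background: (h, w, 3) arrays; mask: (h, w) boolean array.
    """
    mask3 = np.asarray(mask, dtype=bool)[..., None]  # broadcast over channels
    return np.where(mask3, synth_object, background)

obj = np.full((2, 2, 3), 9)            # migrated target object (toy values)
bg = np.zeros((2, 2, 3), dtype=int)    # background of the reference image
mask = np.array([[True, False],
                 [False, True]])
out = composite(obj, mask, bg)
```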

In the embodiment of the present invention, referring to fig. 2, the whole image a to be migrated may also be migrated to obtain a target composite image C.

Fig. 4 is a block diagram of an image processing apparatus according to an embodiment of the present invention, and as shown in the drawing, the apparatus may include:

the third acquisition module is used for acquiring an initial image;

The image processing device provided by the embodiment of the invention is provided with the corresponding functional module for executing the image processing method, can execute the image processing method provided by the embodiment of the invention and can achieve the same beneficial effects.

In another embodiment provided by the present invention, there is also provided an electronic device, which may include a processor, a memory, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the processes of the image processing method embodiment are implemented and the same technical effect can be achieved; details are not repeated here. For example, as shown in FIG. 5, the electronic device may specifically include: a processor 301, a storage device 302, a display screen 303 with a touch function, an input device 304, an output device 305, and a communication device 306. The number of processors 301 in the electronic device may be one or more, and one processor 301 is taken as an example in FIG. 5. The processor 301, the storage device 302, the display screen 303, the input device 304, the output device 305 and the communication device 306 of the electronic device may be connected by a bus or by other means.

In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which has instructions stored therein, and when the instructions are executed on a computer, the instructions cause the computer to execute the image processing method described in any of the above embodiments.

In yet another embodiment, the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the image processing method described in any of the above embodiments.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. An image processing method, characterized in that the method comprises:

acquiring an initial image;

2. The method of claim 1, wherein said acquiring an initial image comprises:

inputting the pose migration matrix and the image to be migrated into an initial network model to obtain the initial image.

3. The method of claim 1, wherein said acquiring an initial image comprises:

and taking a preset image with a zero dimensional vector as the initial image.

4. The method of claim 1, wherein determining a target composite image from the pose migration matrix, the image to be migrated, and the initial image comprises:

obtaining an intermediate composite image according to a preset manner, the pose migration matrix, the image to be migrated and the initial image;

and taking the intermediate composite image as a new initial image, and cyclically executing, for a preset number of times, the step of obtaining an intermediate composite image according to the preset manner, the pose migration matrix, the image to be migrated and the initial image, to obtain the target composite image.

5. The method according to claim 4, wherein the predetermined manner is as follows:

$Z_{k+1} = Z_k + \alpha_k p_k$;

$r_k = r_{k-1} + \alpha_{k-1} A p_{k-1}$; $p_k = r_k + \beta_{k-1} p_{k-1}$;

6. The method of claim 1, wherein the first key feature is a preset key feature in the target object; the second key features correspond to the first key features one to one.

7. The method of claim 6, wherein determining a pose migration matrix from the first and second key features comprises:

determining coordinate values of each first key feature and each second key feature;

8. The method of claim 7, wherein there are a plurality of frames of reference images, and the reference images have a time sequence;

after determining the target composite image according to the pose migration matrix, the image to be migrated, and the initial image, the method further includes:

arranging the plurality of frames of target composite images according to the time sequence to obtain a target composite video.

9. The method of claim 1, wherein determining a target composite image from the pose migration matrix, the image to be migrated, and the initial image comprises:

extracting a target object in the image to be migrated;

determining a synthetic object according to the pose migration matrix, the target object and the initial image;

and synthesizing the background of the reference image and the synthetic object to obtain the target synthetic image.

10. An image processing apparatus, characterized in that the apparatus comprises:

the third acquisition module is used for acquiring an initial image;

11. An electronic device, comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, which when executed by the processor, implement the steps of the method of any one of claims 1-9.

12. A readable storage medium, characterized in that it stores thereon a program or instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1-9.