CN111680623A - Attitude conversion method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
CN111680623A
Authority
CN
China
Prior art keywords
feature
target
image
apparent
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010507406.4A
Other languages
Chinese (zh)
Other versions
CN111680623B (en)
Inventor
黄思羽 (Siyu Huang)
熊昊一 (Haoyi Xiong)
窦德景 (Dejing Dou)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010507406.4A priority Critical patent/CN111680623B/en
Publication of CN111680623A publication Critical patent/CN111680623A/en
Application granted granted Critical
Publication of CN111680623B publication Critical patent/CN111680623B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a pose conversion method and apparatus, an electronic device, and a storage medium, relating to the technical field of processing images with deep learning neural networks. The specific implementation scheme is as follows: feature extraction is performed on an image to be processed to obtain a target appearance feature of a target body presented in the image, the target appearance feature being obtained by decoupling the appearance feature of the target body in the image from its pose feature; a target pose feature is acquired, indicating the pose in which the target body is to be shown; and pose conversion is performed on the target body in the image based on the target appearance feature and the target pose feature to obtain a target image in which the target body is shown in the target pose, thereby realizing the generation of a high-definition, high-quality, realistic target image in a specified pose.

Description

Attitude conversion method and apparatus, electronic device, and storage medium
Technical Field
The present application relates to the field of artificial intelligence, in particular to image processing, and especially to the technical field of processing images with deep learning neural networks.
Background
Arbitrary pose conversion refers to generating a new model image, given a model image and a target pose supplied by a user, such that the model in the generated image is consistent in identity and appearance with the model in the given image while its pose matches the given target pose. In practical applications, pose conversion of model images has significant value and broad prospects. However, conventional image processing techniques cannot handle the task of generating high-definition model images; how to generate a high-definition, high-quality, realistic model image in a specified target pose has therefore become an urgent problem to be solved.
Disclosure of Invention
The present disclosure provides a pose conversion method and apparatus, an electronic device, and a storage medium.
According to an aspect of the present disclosure, there is provided a pose conversion method, including:
performing feature extraction on an image to be processed to obtain a target appearance feature of a target body presented in the image to be processed, wherein the target appearance feature is obtained by decoupling the appearance feature of the target body in the image to be processed from its pose feature;
acquiring a target pose feature, the target pose feature indicating the pose in which the target body is to be shown; and
performing pose conversion on the target body in the image to be processed based on the target appearance feature and the target pose feature to obtain a target image, wherein the target body in the target image is shown in the target pose feature.
According to another aspect of the present disclosure, there is provided a pose conversion apparatus, including:
a feature extraction unit, configured to perform feature extraction on an image to be processed to obtain a target appearance feature of a target body presented in the image to be processed, wherein the target appearance feature is obtained by decoupling the appearance feature of the target body in the image to be processed from its pose feature;
a pose feature acquisition unit, configured to acquire a target pose feature, the target pose feature indicating the pose in which the target body is to be shown; and
a pose conversion unit, configured to perform pose conversion on the target body in the image to be processed based on the target appearance feature and the target pose feature to obtain a target image, wherein the target body in the target image is shown in the target pose feature.
According to another aspect of the present disclosure, there is provided an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method described above.
In this way, the present application can perform feature extraction on the image to be processed to obtain the target appearance feature of the target body presented in the image, the target appearance feature being obtained by decoupling the appearance feature of the target body from its pose feature. The pose conversion process is therefore sufficiently robust, controllable, and interpretable; at the same time, the target image obtained by converting the target body in the image to be processed from its initial pose to the specified target pose is high-definition, high-quality, and realistic.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic flow chart of an implementation of a pose conversion method according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating the pose conversion flow in a specific scenario according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of generating a realistic model image in a specific scenario according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating the normalization operation in a specific scenario according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a pose conversion apparatus for implementing an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device for implementing an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding, and such details are to be considered merely exemplary. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
The scheme of the present application aims to generate a high-definition, high-quality, realistic image of a target body, such as a model image, in a specified pose (i.e., a target pose feature). Specifically, the present disclosure provides a pose conversion method, as shown in FIG. 1, including:
Step S101: performing feature extraction on an image to be processed to obtain a target appearance feature of a target body presented in the image, where the target appearance feature is obtained by decoupling the appearance feature of the target body in the image from its pose feature.
Step S102: acquiring a target pose feature, the target pose feature indicating the pose in which the target body is to be shown.
Step S103: performing pose conversion on the target body in the image to be processed based on the target appearance feature and the target pose feature to obtain a target image, where the target body in the target image is shown in the target pose.
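Structurally, the three steps amount to the following minimal sketch. The decoupling and normalization details, which the later examples elaborate, are omitted here; the layer choices, channel counts, and the 17-keypoint heatmap convention are illustrative assumptions, not specified by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the patent's "preset neural networks" (assumptions only).
encoder = nn.Sequential(nn.Conv2d(3 + 17, 64, 3, stride=2, padding=1), nn.ReLU())
decoder = nn.ConvTranspose2d(64 + 17, 3, 4, stride=2, padding=1)

image = torch.randn(1, 3, 256, 256)       # image to be processed
src_pose = torch.randn(1, 17, 256, 256)   # initial pose feature: source keypoint heatmaps (S101)
tgt_pose = torch.randn(1, 17, 256, 256)   # target pose feature (S102)

appearance = encoder(torch.cat([image, src_pose], dim=1))          # S101: appearance feature
tgt_small = F.interpolate(tgt_pose, size=appearance.shape[-2:])    # match spatial resolution
target_image = decoder(torch.cat([appearance, tgt_small], dim=1))  # S103: image in target pose
```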
In practical applications, the target body may be a human body, an animal body, or the like. Accordingly, pose conversion in the scheme of the present application refers to converting the target body in the image to be processed, such as a human-body model, from an initial pose to a target pose, thereby providing image conversion support for model picture generation, clothing-aided design, or clothing sales scenarios and improving user experience.
Here, because the present application can perform feature extraction on the image to be processed to obtain the target appearance feature of the target body presented in the image, and the target appearance feature is obtained by decoupling the appearance feature of the target body from its pose feature, the pose conversion process is sufficiently robust, controllable, and interpretable; at the same time, the target image obtained by converting the target body from its initial pose to the specified target pose is high-definition, high-quality, and realistic.
In a specific example of the scheme of the present application, the target appearance feature may be obtained as follows. First, feature extraction is performed on the image to be processed to obtain an initial pose feature that corresponds to the target body and contains pose keypoint information; here, the initial pose feature is the pose feature of the target body in the image to be processed, such as a heatmap of the initial pose keypoints. Then, the pose keypoint information in the initial pose feature and the image to be processed are input into a preset neural network, and feature processing is performed through a convolutional layer of the preset neural network to obtain a first intermediate appearance feature and a first intermediate pose feature. The appearance feature and the pose feature of the target body in the image to be processed are then decoupled based on at least the first intermediate appearance feature and the first intermediate pose feature to obtain the target appearance feature. In this way, the preset neural network decouples the appearance feature of the target body from its pose feature, laying a foundation for a pose conversion process that is sufficiently robust, controllable, and interpretable, and for subsequently obtaining a high-definition, high-quality, realistic target image.
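The patent does not give a recipe for building these keypoint heatmaps; a common construction, shown here purely as an assumption, renders one 2-D Gaussian per keypoint:

```python
import numpy as np

def keypoints_to_heatmaps(keypoints, height, width, sigma=6.0):
    """Render one Gaussian heatmap per (x, y) keypoint; a common construction,
    assumed here since the patent does not specify the exact recipe."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = []
    for (kx, ky) in keypoints:
        d2 = (xs - kx) ** 2 + (ys - ky) ** 2
        maps.append(np.exp(-d2 / (2.0 * sigma ** 2)))
    return np.stack(maps)  # shape: (num_keypoints, height, width)

# e.g. 3 keypoints on a 256x256 image
heatmaps = keypoints_to_heatmaps([(128, 60), (100, 120), (156, 120)], 256, 256)
```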
In a specific example of the scheme of the present application, after the first intermediate appearance feature and the first intermediate pose feature are obtained, the appearance feature and the pose feature of the target body in the image to be processed may be decoupled by performing a product operation on the two, for example an element-by-element multiplication of the first intermediate appearance feature and the first intermediate pose feature. This provides a simple decoupling approach that improves processing efficiency while leaving the pose conversion effect unchanged.
In a specific example of the scheme of the present application, to improve the decoupling effect, the target appearance feature may also be obtained as follows after the first intermediate appearance feature and the first intermediate pose feature are obtained: feature processing is performed on the first intermediate pose feature with a preset activation function, and a product operation is performed on the feature-processed first intermediate pose feature and the first intermediate appearance feature, so as to decouple the appearance feature and the pose feature of the target body in the image to be processed. This improves the decoupling effect, laying a foundation for a pose conversion process that is sufficiently robust, controllable, and interpretable, and for subsequently obtaining a high-definition, high-quality, realistic target image.
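A minimal sketch of this gated decoupling step, assuming (as in the encoder example later in the description) that the preset activation function is a Sigmoid and that the product is element-wise:

```python
import torch

I1 = torch.randn(1, 64, 128, 128)  # first intermediate appearance feature
P1 = torch.randn(1, 64, 128, 128)  # first intermediate pose feature

# The activated pose feature acts as a soft (0, 1) mask that gates the
# appearance feature element by element, suppressing pose information.
I_decoupled = torch.sigmoid(P1) * I1
```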
In a specific example of the scheme of the present application, to further improve the decoupling effect, the intermediate appearance and pose features may be processed by a further convolutional layer. Specifically, the method further includes: inputting the first intermediate appearance feature and the first intermediate pose feature after the product operation into a preset neural network, and performing feature processing through a convolutional layer of the preset neural network to obtain a second intermediate appearance feature and a second intermediate pose feature; and decoupling the appearance feature and the pose feature of the target body in the image to be processed based on at least the second intermediate appearance feature and the second intermediate pose feature. In practical applications, decoupling based on the second intermediate features proceeds in the same way as for the first intermediate features: for example, the second intermediate appearance feature and the second intermediate pose feature are multiplied, or the second intermediate pose feature is first processed with a preset activation function and then multiplied with the second intermediate appearance feature. Repeating this operation increases the degree of decoupling, laying a foundation for a sufficiently robust, controllable, and interpretable pose conversion process and for a high-definition, high-quality, realistic target image. Of course, in practical applications the number of iterations over the intermediate appearance and pose features can be set according to the desired decoupling effect; the scheme of the present application does not limit the number of iterations.
The process of obtaining the target appearance feature is described in further detail below with reference to a specific scene. Taking a human-body model as the target body, the step of obtaining the target appearance feature may be implemented by a human-body appearance encoder, which, with the assistance of the model's initial pose (i.e., the initial pose feature), can effectively extract the human-body appearance feature (i.e., the target appearance feature) from the model image. Here, for example, the human-body appearance encoder includes L encoder blocks (Encode Blocks), where L is a positive integer greater than or equal to 1. A source model image I_S input by the user is encoded by the L encoder blocks into a low-dimensional feature variable I_S^L containing the human-body appearance feature, yielding the target appearance feature. As shown in FIG. 2, the left half of FIG. 2 depicts the flow by which the human-body appearance encoder obtains the human-body appearance feature I_S^L. The encoder includes L encoder blocks, Encode Block-1, Encode Block-2, ..., Encode Block-l, ..., Encode Block-L, where l is a positive integer with 2 ≤ l ≤ L. The input of the first encoder block (Encode Block-1) is the source model image I_S and the heatmap P_S of the model's pose keypoints in I_S. The input of each subsequent encoder block (e.g., Encode Block-l) is the intermediate appearance feature I_S^(l-1) and the intermediate pose feature P_S^(l-1) output by the previous (i.e., the (l-1)-th) encoder block. Within each encoder block, the intermediate appearance feature I_S^(l-1) and the intermediate pose feature P_S^(l-1) are each first processed by a convolutional layer of the preset neural network; the convolved intermediate pose feature is then passed through a Sigmoid activation function and multiplied element by element with the convolved I_S^(l-1), yielding the outputs I_S^(l) and P_S^(l) of the current encoder block (Encode Block-l), until I_S^L is output.
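Putting the pieces together, the following is one possible sketch of such an appearance encoder; the channel widths, kernel sizes, downsampling factors, and the 17-keypoint convention are illustrative assumptions, not values given by the patent:

```python
import torch
import torch.nn as nn

class EncodeBlock(nn.Module):
    """One Encode Block: convolve both streams, then gate the appearance
    stream with the Sigmoid-activated pose stream (element-wise product)."""
    def __init__(self, app_in, pose_in, out_ch):
        super().__init__()
        self.conv_app = nn.Conv2d(app_in, out_ch, 3, stride=2, padding=1)
        self.conv_pose = nn.Conv2d(pose_in, out_ch, 3, stride=2, padding=1)

    def forward(self, app, pose):
        app, pose = self.conv_app(app), self.conv_pose(pose)
        return torch.sigmoid(pose) * app, pose   # I_S^(l), P_S^(l)

class AppearanceEncoder(nn.Module):
    """Stack of L blocks: block 1 takes (I_S, P_S); each later block takes the
    previous block's intermediate appearance and pose features."""
    def __init__(self, img_ch=3, pose_ch=17, width=64, depth=4):
        super().__init__()
        blocks, a, p = [], img_ch, pose_ch
        for l in range(depth):
            out_ch = width * 2 ** l
            blocks.append(EncodeBlock(a, p, out_ch))
            a = p = out_ch
        self.blocks = nn.ModuleList(blocks)

    def forward(self, image, heatmaps):
        app, pose = image, heatmaps
        for block in self.blocks:
            app, pose = block(app, pose)
        return app   # target appearance feature I_S^L

encoder = AppearanceEncoder()
I_S_L = encoder(torch.randn(1, 3, 256, 256), torch.randn(1, 17, 256, 256))
```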
In a specific example of the present application, pose conversion may be performed to obtain the target image as follows: the target appearance feature and the pose keypoint information indicated by the target pose feature (such as a target pose keypoint heatmap) are spliced together and input into a preset neural network; after feature processing through a deconvolution layer of the preset neural network, a normalization operation is performed with the target appearance feature, and this continues until the target image is obtained. The target image is the image obtained by performing pose conversion on the target body in the image to be processed, and the target body in the target image has the target pose feature. Because the target appearance feature obtained by the present application is an appearance feature decoupled from the pose feature, the pose conversion process is sufficiently robust, controllable, and interpretable, and the resulting target image is high-definition, high-quality, and realistic.
In a specific example of the present application, to improve the quality of the target image, the method further includes, before the target image is obtained: splicing the third intermediate appearance feature obtained from the normalization operation with the pose keypoint information indicated by the target pose feature, inputting the spliced result into the preset neural network, performing feature processing through the deconvolution layer of the preset neural network, and performing the normalization operation with the target appearance feature again, and so on in sequence until a target image whose quality meets a preset condition is obtained. In this way, a simple iterative process improves the quality of the target image, and the resulting target image is high-definition, high-quality, and realistic.
The process of obtaining the target image is explained in further detail below with reference to a specific scene, again taking a human-body model as the target body. The step of obtaining the target image may be implemented by an appearance-pose decoupling generator (hereinafter, the generator). Specifically, the generator includes L generator blocks (Generator Blocks), Generator Block-1, Generator Block-2, ..., Generator Block-l, ..., Generator Block-L, where L is a positive integer greater than or equal to 1 and l is a positive integer with 2 ≤ l ≤ L. Using the target appearance feature I_S^L output by the human-body appearance encoder in the example above, the generator progressively decodes, through the L generator blocks, a realistic model image that conforms to the target pose P_t. It should be clear that the number of generator blocks in the generator and the number of encoder blocks in the human-body appearance encoder may be the same or different; the two are unrelated, and this example is merely illustrative and does not limit either number. In practical applications, the number of generator blocks and the number of encoder blocks may be set based on the processing effect, which is not limited by the present application.
Further, as shown in FIG. 2, the right half of FIG. 2 depicts the flow of the appearance-pose decoupling generator. Specifically, the input of the first generator block is the target appearance feature I_S^L output by the human-body appearance encoder and the target pose keypoint heatmap P_t. The input of each subsequent generator block (e.g., Generator Block-l) is the target appearance feature I_S^L, the target pose keypoint heatmap P_t, and the intermediate appearance feature I_f^(l-1) output by the previous (i.e., the (l-1)-th) generator block. Within each generator block, the intermediate appearance feature I_f^(l-1) is first spliced with the target pose keypoint heatmap P_t; the spliced result is processed by a deconvolution layer of the preset neural network and then undergoes an Adaptive Patch Normalization (AdaPN) operation with the target appearance feature I_S^L to obtain I_f^(l), until I_f^L is output, yielding a realistic model image that conforms to the target pose P_t.
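A structural sketch of one such generator block follows, together with the `adapn` function sketched after the AdaPN formula below. The kernel sizes, upsampling factor, and the 1×1 projection used to align channel widths are assumptions; the patent does not specify them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneratorBlock(nn.Module):
    """One Generator Block: splice the running feature with the target pose
    keypoint heatmaps, deconvolve, then re-inject the appearance statistics
    via adaptive patch normalization (AdaPN)."""
    def __init__(self, feat_in, pose_ch, app_ch, feat_out):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(feat_in + pose_ch, feat_out,
                                         4, stride=2, padding=1)
        # 1x1 projection so I_S^L matches this block's channel width --
        # an assumption; the patent does not say how widths are aligned.
        self.proj = nn.Conv2d(app_ch, feat_out, 1)

    def forward(self, feat, tgt_heatmaps, appearance):
        heat = F.interpolate(tgt_heatmaps, size=feat.shape[-2:])  # align resolutions
        x = self.deconv(torch.cat([feat, heat], dim=1))           # splice + deconvolve
        return adapn(x, self.proj(appearance))                    # AdaPN against I_S^L
```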
As shown in FIG. 3, this example can take the target appearance feature I_S^L obtained by encoding the source model image, together with the target pose feature, and apply the adaptive patch normalization operation to obtain a realistic model image under the target pose feature.
Here, as shown in FIG. 4, unlike a conventional normalization operation, the scheme of the present application can apply adaptive patch normalization to part of the content of an image, which lays a foundation for obtaining a high-definition, high-quality, realistic target image. Specifically, let C, H, and W denote the number of channels, the height, and the width of a feature map, respectively. The operation performs normalization and de-normalization within specific channels and regions of the feature map, and can be expressed as:
AdaPN(x_{c,i,j}, y) = y^w_{c,i,j} · (x_{c,i,j} − β_{c,i,j}) / γ_{c,i,j} + y^b_{c,i,j}
where x denotes the intermediate appearance feature I_f^(l-1), y denotes the target appearance feature I_S^L, c is the index of the feature channel in the intermediate or target appearance feature, and (i, j) are the center coordinates of a feature patch on the feature map, bounded by the height and width of the feature map. β_{c,i,j} and γ_{c,i,j} denote the mean and standard deviation of the patch x_{c,i,j} in x, and y^w_{c,i,j} and y^b_{c,i,j} denote the standard deviation and mean of the patch y_{c,i,j} in y. The adaptive patch normalization in this example thus applies mean-variance normalization to the patch x_{c,i,j} of the intermediate appearance feature, multiplies the result by the standard deviation y^w_{c,i,j} of the corresponding patch y_{c,i,j} of the target appearance feature, and finally adds the mean y^b_{c,i,j} of that patch.
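Under the notation above, the following sketch estimates the per-patch statistics with sliding average pooling; the patch size, the interpolation step, and the reading of γ as a standard deviation are all assumptions:

```python
import torch
import torch.nn.functional as F

def adapn(x, y, patch=8, eps=1e-5):
    """Adaptive Patch Normalization (sketch): normalize each patch of x by its
    own mean/std (beta, gamma), then rescale and shift with the matching patch
    statistics (y^w, y^b) of the target appearance feature y."""
    if y.shape[-2:] != x.shape[-2:]:
        y = F.interpolate(y, size=x.shape[-2:])  # align resolutions (assumption)

    def local_stats(t):
        # sliding-window mean and std over patch x patch neighborhoods
        mean = F.avg_pool2d(t, patch, stride=1, padding=patch // 2,
                            count_include_pad=False)[..., :t.shape[-2], :t.shape[-1]]
        sq = F.avg_pool2d(t * t, patch, stride=1, padding=patch // 2,
                          count_include_pad=False)[..., :t.shape[-2], :t.shape[-1]]
        return mean, torch.sqrt((sq - mean ** 2).clamp_min(0) + eps)

    beta, gamma = local_stats(x)   # per-patch mean/std of the intermediate feature
    y_b, y_w = local_stats(y)      # per-patch mean/std of the appearance feature
    return y_w * (x - beta) / gamma + y_b

out = adapn(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 16, 16))
```

Compared with AdaIN-style normalization, which uses one mean/std pair per channel, the patch-wise statistics let different regions of the appearance feature modulate different regions of the generated feature, which is what allows the operation to act on part of the image content.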
In this way, the human-body appearance feature and the human-body pose feature are effectively decoupled during pose conversion, so that the conversion process is sufficiently robust, controllable, and interpretable, and the scheme can support the generation of high-definition, realistic model images. Moreover, the present application further provides a pose conversion apparatus, as shown in FIG. 5, including:
a feature extraction unit 501, configured to perform feature extraction on an image to be processed to obtain a target appearance feature of a target body presented in the image to be processed, where the target appearance feature is obtained by decoupling the appearance feature of the target body in the image to be processed from its pose feature;
a pose feature acquisition unit 502, configured to acquire a target pose feature, the target pose feature indicating the pose in which the target body is to be shown;
a pose conversion unit 503, configured to perform pose conversion on the target body in the image to be processed based on the target appearance feature and the target pose feature to obtain a target image, where the target body in the target image is shown in the target pose feature.
In a specific example of the present disclosure, the feature extraction unit 501 includes:
an initial pose extraction subunit, configured to perform feature extraction on the image to be processed to obtain an initial pose feature that corresponds to the target body and contains pose keypoint information;
a neural network processing subunit, configured to input the pose keypoint information in the initial pose feature and the image to be processed into a preset neural network, and to perform feature processing through a convolutional layer of the preset neural network to obtain a first intermediate appearance feature and a first intermediate pose feature;
a decoupling subunit, configured to decouple the appearance feature and the pose feature of the target body in the image to be processed based on at least the first intermediate appearance feature and the first intermediate pose feature to obtain the target appearance feature.
In a specific example of the scheme of the present application, the decoupling subunit is further configured to perform a product operation on the first intermediate appearance feature and the first intermediate pose feature, so as to decouple the appearance feature and the pose feature of the target body in the image to be processed.
In a specific example of the scheme of the present application, the decoupling subunit is further configured to perform feature processing on the first intermediate pose feature with a preset activation function, and to perform a product operation on the feature-processed first intermediate pose feature and the first intermediate appearance feature, so as to decouple the appearance feature and the pose feature of the target body in the image to be processed.
In a specific example of the present application, the neural network processing subunit is further configured to input the first intermediate appearance feature and the first intermediate pose feature after the product operation into a preset neural network, and to perform feature processing through a convolutional layer of the preset neural network to obtain a second intermediate appearance feature and a second intermediate pose feature;
the decoupling subunit is then configured to decouple the appearance feature and the pose feature of the target body in the image to be processed based on at least the second intermediate appearance feature and the second intermediate pose feature.
In a specific example of the present disclosure, the pose conversion unit 503 is further configured to:
splice the target appearance feature with the pose keypoint information indicated by the target pose feature, input the spliced result into a preset neural network, perform feature processing through a deconvolution layer of the preset neural network, and perform a normalization operation with the target appearance feature, until the target image is obtained.
In a specific example of the present application, the pose conversion unit 503 is further configured to:
splice the third intermediate appearance feature obtained from the normalization operation with the pose keypoint information indicated by the target pose feature, input the spliced result into the preset neural network, perform feature processing through the deconvolution layer of the preset neural network, and perform the normalization operation with the target appearance feature again.
It should be noted here that the description of the apparatus embodiments is similar to the description of the method embodiments and has the same beneficial effects, and is therefore not repeated. For technical details not disclosed in the apparatus embodiments of the present application, reference is made to the description of the method embodiments; for brevity, they are not described again here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in FIG. 6, a block diagram of an electronic device for the pose conversion method according to an embodiment of the present application is illustrated. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in FIG. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In FIG. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer-readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the pose conversion method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the pose conversion method provided herein.
As a non-transitory computer-readable storage medium, the memory 602 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the pose conversion method in the embodiments of the present application (for example, the feature extraction unit 501, the pose feature acquisition unit 502, and the pose conversion unit 503 shown in FIG. 5). By running the non-transitory software programs, instructions, and modules stored in the memory 602, the processor 601 executes the various functional applications and data processing of the server, that is, implements the pose conversion method of the above method embodiments.
The memory 602 may include a program storage area and a data storage area, where the program storage area may store an operating system and application programs required for at least one function, and the data storage area may store data created through use of the electronic device for the pose conversion method, and the like. Further, the memory 602 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 602 may optionally include memory located remotely from the processor 601, and such remote memories may be connected over a network to the electronic device for the pose conversion method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the pose conversion method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus or in other ways; in FIG. 6, connection by a bus is taken as an example.
The input device 603 may receive input numeric or character information and generate key-signal inputs related to user settings and function control of the electronic device, for example a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, or joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
In this way, the present application can perform feature extraction on the image to be processed to obtain the target appearance feature of the target body presented in the image, the target appearance feature being obtained by decoupling the appearance feature of the target body from its pose feature, so that the pose conversion process is sufficiently robust, controllable, and interpretable; at the same time, the target image obtained by converting the target body in the image to be processed from its initial pose to the specified target pose is high-definition, high-quality, and realistic.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A pose conversion method, comprising:
performing feature extraction on an image to be processed to obtain a target appearance feature of a target body presented in the image to be processed, wherein the target appearance feature is obtained by decoupling the appearance feature of the target body in the image to be processed from its pose feature;
acquiring a target pose feature, the target pose feature indicating the pose in which the target body is to be shown; and
performing pose conversion on the target body in the image to be processed based on the target appearance feature and the target pose feature to obtain a target image, wherein the target body in the target image is shown in the target pose feature.
2. The method according to claim 1, wherein the performing feature extraction on an image to be processed to obtain a target appearance feature of a target body presented in the image to be processed comprises:
performing feature extraction on the image to be processed to obtain an initial pose feature that corresponds to the target body and contains pose keypoint information;
inputting the pose keypoint information in the initial pose feature and the image to be processed into a preset neural network, and performing feature processing through a convolutional layer of the preset neural network to obtain a first intermediate appearance feature and a first intermediate pose feature; and
decoupling the appearance feature and the pose feature of the target body in the image to be processed based on at least the first intermediate appearance feature and the first intermediate pose feature to obtain the target appearance feature.
3. The method of claim 2, wherein the decoupling the appearance feature and the pose feature of the target body in the image to be processed based on at least the first intermediate appearance feature and the first intermediate pose feature comprises:
performing a product operation on the first intermediate appearance feature and the first intermediate pose feature to decouple the appearance feature and the pose feature of the target body in the image to be processed.
4. The method of claim 2, wherein the decoupling the appearance feature and the pose feature of the target body in the image to be processed based on at least the first intermediate appearance feature and the first intermediate pose feature comprises:
performing feature processing on the first intermediate pose feature with a preset activation function, and performing a product operation on the feature-processed first intermediate pose feature and the first intermediate appearance feature, so as to decouple the appearance feature and the pose feature of the target body in the image to be processed.
5. The method of claim 3 or 4, further comprising:
inputting the first intermediate appearance feature and the first intermediate pose feature after the product operation into a preset neural network, and performing feature processing through a convolutional layer of the preset neural network to obtain a second intermediate appearance feature and a second intermediate pose feature; and
decoupling the appearance feature and the pose feature of the target body in the image to be processed based on at least the second intermediate appearance feature and the second intermediate pose feature.
6. The method of claim 1, wherein the performing pose conversion on the target body in the image to be processed based on the target appearance feature and the target pose feature to obtain a target image comprises:
splicing the target appearance feature with the pose keypoint information indicated by the target pose feature, inputting the spliced result into a preset neural network, performing feature processing through a deconvolution layer of the preset neural network, and performing a normalization operation with the target appearance feature, until the target image is obtained.
7. The method of claim 6, further comprising, before the target image is obtained:
splicing the third intermediate appearance feature obtained from the normalization operation with the pose keypoint information indicated by the target pose feature, inputting the spliced result into the preset neural network, performing feature processing through the deconvolution layer of the preset neural network, and performing the normalization operation with the target appearance feature again.
8. A pose conversion apparatus, comprising:
a feature extraction unit, configured to perform feature extraction on an image to be processed to obtain a target appearance feature of a target body presented in the image to be processed, wherein the target appearance feature is obtained by decoupling the appearance feature of the target body in the image to be processed from its pose feature;
a pose feature acquisition unit, configured to acquire a target pose feature, the target pose feature indicating the pose in which the target body is to be shown; and
a pose conversion unit, configured to perform pose conversion on the target body in the image to be processed based on the target appearance feature and the target pose feature to obtain a target image, wherein the target body in the target image is shown in the target pose feature.
9. The apparatus of claim 8, wherein the feature extraction unit comprises:
an initial pose extraction subunit, configured to perform feature extraction on the image to be processed to obtain an initial pose feature that corresponds to the target body and contains pose keypoint information;
a neural network processing subunit, configured to input the pose keypoint information in the initial pose feature and the image to be processed into a preset neural network, and to perform feature processing through a convolutional layer of the preset neural network to obtain a first intermediate appearance feature and a first intermediate pose feature; and
a decoupling subunit, configured to decouple the appearance feature and the pose feature of the target body in the image to be processed based on at least the first intermediate appearance feature and the first intermediate pose feature to obtain the target appearance feature.
10. The apparatus of claim 9, wherein the decoupling subunit is further configured to perform a product operation on the first intermediate appearance feature and the first intermediate pose feature to decouple the appearance feature and the pose feature of the target body in the image to be processed.
11. The apparatus of claim 9, wherein the decoupling subunit is further configured to perform feature processing on the first intermediate pose feature with a preset activation function, and to perform a product operation on the feature-processed first intermediate pose feature and the first intermediate appearance feature to decouple the appearance feature and the pose feature of the target body in the image to be processed.
12. The apparatus of claim 10 or 11, wherein
the neural network processing subunit is further configured to input the first intermediate appearance feature and the first intermediate pose feature after the product operation into a preset neural network, and to perform feature processing through a convolutional layer of the preset neural network to obtain a second intermediate appearance feature and a second intermediate pose feature; and
the decoupling subunit is configured to decouple the appearance feature and the pose feature of the target body in the image to be processed based on at least the second intermediate appearance feature and the second intermediate pose feature.
13. The apparatus of claim 8, wherein the pose conversion unit is further configured to:
splice the target appearance feature with the pose keypoint information indicated by the target pose feature, input the spliced result into a preset neural network, perform feature processing through a deconvolution layer of the preset neural network, and perform a normalization operation with the target appearance feature, until the target image is obtained.
14. The apparatus of claim 13, wherein the pose conversion unit is further configured to:
splice the third intermediate appearance feature obtained from the normalization operation with the pose keypoint information indicated by the target pose feature, input the spliced result into the preset neural network, perform feature processing through the deconvolution layer of the preset neural network, and perform the normalization operation with the target appearance feature again.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202010507406.4A 2020-06-05 2020-06-05 Gesture conversion method and device, electronic equipment and storage medium Active CN111680623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010507406.4A CN111680623B (en) 2020-06-05 2020-06-05 Gesture conversion method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010507406.4A CN111680623B (en) 2020-06-05 2020-06-05 Gesture conversion method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111680623A true CN111680623A (en) 2020-09-18
CN111680623B CN111680623B (en) 2023-04-21

Family

ID=72454300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010507406.4A Active CN111680623B (en) 2020-06-05 2020-06-05 Gesture conversion method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111680623B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290733A (en) * 2023-11-27 2023-12-26 浙江华创视讯科技有限公司 Gesture sample generation method, model training method, device and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130121409A1 (en) * 2011-09-09 2013-05-16 Lubomir D. Bourdev Methods and Apparatus for Face Fitting and Editing Applications
CN110472462A (en) * 2018-05-11 2019-11-19 北京三星通信技术研究有限公司 Attitude estimation method, the processing method based on Attitude estimation and electronic equipment
CN110503689A (en) * 2019-08-30 2019-11-26 清华大学 Attitude prediction method, model training method and device
CN110827342A (en) * 2019-10-21 2020-02-21 中国科学院自动化研究所 Three-dimensional human body model reconstruction method, storage device and control device
CN111191622A (en) * 2020-01-03 2020-05-22 华南师范大学 Posture recognition method and system based on thermodynamic diagram and offset vector and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130121409A1 (en) * 2011-09-09 2013-05-16 Lubomir D. Bourdev Methods and Apparatus for Face Fitting and Editing Applications
CN110472462A (en) * 2018-05-11 2019-11-19 北京三星通信技术研究有限公司 Attitude estimation method, the processing method based on Attitude estimation and electronic equipment
CN110503689A (en) * 2019-08-30 2019-11-26 清华大学 Attitude prediction method, model training method and device
CN110827342A (en) * 2019-10-21 2020-02-21 中国科学院自动化研究所 Three-dimensional human body model reconstruction method, storage device and control device
CN111191622A (en) * 2020-01-03 2020-05-22 华南师范大学 Posture recognition method and system based on thermodynamic diagram and offset vector and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290733A (en) * 2023-11-27 2023-12-26 浙江华创视讯科技有限公司 Gesture sample generation method, model training method, device and readable storage medium
CN117290733B (en) * 2023-11-27 2024-03-12 浙江华创视讯科技有限公司 Gesture sample generation method, model training method, device and readable storage medium

Also Published As

Publication number Publication date
CN111680623B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
CN111182254B (en) Video processing method, device, equipment and storage medium
CN111652828B (en) Face image generation method, device, equipment and medium
US20210241498A1 (en) Method and device for processing image, related electronic device and storage medium
JP7135143B2 (en) Methods, apparatus, electronic devices and computer readable storage media for building keypoint learning models
CN111294665B (en) Video generation method and device, electronic equipment and readable storage medium
CN111860167B (en) Face fusion model acquisition method, face fusion model acquisition device and storage medium
CN110806865B (en) Animation generation method, device, equipment and computer readable storage medium
CN111340905B (en) Image stylization method, device, equipment and medium
CN111368137A (en) Video generation method and device, electronic equipment and readable storage medium
JP7308235B2 (en) Image generation method and device, electronic device, storage medium and computer program
CN111968203B (en) Animation driving method, device, electronic equipment and storage medium
US20230186583A1 (en) Method and device for processing virtual digital human, and model training method and device
CN111861955A (en) Method and device for constructing image editing model
US11641446B2 (en) Method for video frame interpolation, and electronic device
CN111291218B (en) Video fusion method, device, electronic equipment and readable storage medium
CN111539897A (en) Method and apparatus for generating image conversion model
CN112184851B (en) Image editing method, network training method, related device and electronic equipment
JP7393388B2 (en) Face editing method, device, electronic device and readable storage medium
US11983849B2 (en) Image filling method and apparatus, device, and storage medium
CN111768467B (en) Image filling method, device, equipment and storage medium
CN111967297A (en) Semantic segmentation method and device for image, electronic equipment and medium
CN111523467B (en) Face tracking method and device
CN112562045B (en) Method, apparatus, device and storage medium for generating model and generating 3D animation
CN111680623B (en) Gesture conversion method and device, electronic equipment and storage medium
CN113240780B (en) Method and device for generating animation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant