CN116309983B - Training method and generating method and device of virtual character model and electronic equipment

Training method and generating method and device of virtual character model and electronic equipment

Info

Publication number
CN116309983B
Authority
CN
China
Prior art keywords
target
image
target point
style
gesture
Prior art date
Legal status
Active
Application number
CN202310028021.3A
Other languages
Chinese (zh)
Other versions
CN116309983A (en)
Inventor
张雨蒙
叶晓青
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310028021.3A priority Critical patent/CN116309983B/en
Publication of CN116309983A publication Critical patent/CN116309983A/en
Application granted granted Critical
Publication of CN116309983B publication Critical patent/CN116309983B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/005 - General purpose rendering architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Graphics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure provides a training method, a generation method, an apparatus, and electronic equipment for a virtual character model, and relates to the technical field of artificial intelligence, in particular to the technical fields of augmented reality (AR), virtual reality (VR), computer vision, deep learning, and the like; it can be applied to scenes such as the metaverse and virtual digital humans. The method comprises the following steps: determining conditional style body features according to a conditional image and a conditional gesture through a style body feature extraction network; transforming target points in a target gesture through a gesture control network to obtain target point coordinates, and determining style feature vectors corresponding to the target point coordinates according to the target point coordinates and the conditional style body features; constructing a target neural radiation field according to the target point coordinates, the style feature vectors, and the observation directions through a neural radiation field network, and rendering through the target neural radiation field to obtain a rendered image; and training the model according to the truth image and the rendered image. With this technical scheme, image rendering quality can be improved.

Description

Training method and generating method and device of virtual character model and electronic equipment
Technical Field
The disclosure relates to the field of computers, in particular to the technical field of artificial intelligence, specifically to the technical fields of augmented reality (AR), virtual reality (VR), computer vision, deep learning, and the like, and can be applied to scenes such as the metaverse and virtual digital humans. In particular, it relates to a training method, a generation method, an apparatus, and electronic equipment for a virtual character model.
Background
With the development of computer technology and network technology, image rendering technology, as well as neural radiation field (Neural Radiance Field, NeRF) technology, which improves image rendering by integrating it with neural networks, has developed rapidly.
Virtual humans are virtual characters with a digitized appearance; they depend on display devices for their presence and have the appearance of a person, a person's behavior (such as talking and gestures), and a person's thoughts. How to improve the rendering quality of virtual characters based on neural radiation fields is therefore important.
Disclosure of Invention
The disclosure provides a training method, a generating method, a device and electronic equipment for a virtual character model.
According to an aspect of the present disclosure, there is provided a training method of a virtual character generating model, the virtual character generating model to be trained including a style body feature extraction network, a gesture control network, and a neural radiation field network; the method comprises the following steps:
Determining the condition style body characteristics under the condition posture according to the condition image and the condition posture corresponding to the condition image through the style body characteristic extraction network;
transforming the target point in the target gesture through the gesture control network to obtain transformed target point coordinates, and determining a style feature vector corresponding to the target point coordinates according to the target point coordinates and the conditional style body features;
constructing a target nerve radiation field according to the target point coordinates, the style feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through the nerve radiation field network, and rendering through the target nerve radiation field to obtain a rendered image;
and training the virtual character generation model according to the true value image corresponding to the target gesture and the rendering image.
According to another aspect of the present disclosure, there is provided a virtual character generating method including:
acquiring a target nerve radiation field; the target nerve radiation field is constructed in the process of training the virtual character generating model by adopting the training method of the virtual character generating model disclosed by any embodiment of the disclosure;
Rendering is carried out through the target nerve radiation field, and a virtual character image is obtained.
According to still another aspect of the present disclosure, there is provided a training apparatus of a virtual character generating model, the virtual character generating model to be trained including a style body feature extraction network, a posture control network, and a neural radiation field network; the device comprises:
the style body extraction module is used for determining the condition style body characteristics under the condition posture according to the condition image and the condition posture corresponding to the condition image through the style body characteristic extraction network;
the gesture control module is used for transforming the target point in the target gesture through the gesture control network to obtain transformed target point coordinates, and determining a style feature vector corresponding to the target point coordinates according to the target point coordinates and the conditional style body features;
the nerve radiation field module is used for constructing a target nerve radiation field according to the target point coordinates, the style feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through the nerve radiation field network, and rendering through the target nerve radiation field to obtain a rendered image;
And the model training module is used for training the virtual character generating model according to the true image corresponding to the target gesture and the rendering image.
According to still another aspect of the present disclosure, there is provided a virtual character generating apparatus including:
the target radiation field module is used for acquiring a target nerve radiation field; the target nerve radiation field is constructed in the process of training the virtual character generating model by adopting the training method of the virtual character generating model disclosed by any embodiment of the disclosure;
and the virtual character generating module is used for rendering through the target nerve radiation field to obtain a virtual character image.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided by any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method provided by any of the embodiments of the present disclosure.
According to the technology disclosed by the invention, the image rendering quality can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1a is a flow chart of a training method for virtual character generation models provided in accordance with an embodiment of the present disclosure;
FIG. 1b is a schematic diagram of a virtual character generation model provided in accordance with an embodiment of the present disclosure;
FIG. 2a is a flow chart of another method of training a virtual character generation model provided in accordance with an embodiment of the present disclosure;
FIG. 2b is a schematic diagram of another virtual character generation model provided in accordance with an embodiment of the present disclosure;
FIG. 2c is a schematic diagram of a style body feature extraction network provided in accordance with an embodiment of the present disclosure;
FIG. 3a is a flow chart of another method of training a virtual character generation model provided in accordance with an embodiment of the present disclosure;
FIG. 3b is a schematic diagram of another virtual character generation model provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow chart of a method of virtual character generation provided in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a training apparatus for virtual character generation models provided in accordance with an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a virtual character generating apparatus provided according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing a training method of a virtual character generation model or a virtual character generation method according to an embodiment of the present disclosure.
Detailed Description
FIG. 1a is a flow chart of a training method for virtual character generation models provided in accordance with an embodiment of the present disclosure. The method is suitable for training the virtual character generation model based on the nerve radiation field. The method may be performed by a training apparatus for virtual character generation models, which may be implemented in software and/or hardware, and may be integrated into an electronic device. The virtual character generation model to be trained comprises a style body feature extraction network, a gesture control network and a nerve radiation field network. As shown in fig. 1a, the training method of the virtual character generation model of the present embodiment may include:
s101, determining the condition style body characteristics under the condition posture according to the condition image and the condition posture corresponding to the condition image through a style body characteristic extraction network;
S102, transforming a target point in a target gesture through a gesture control network to obtain transformed target point coordinates, and determining a style feature vector corresponding to the target point coordinates according to the target point coordinates and the conditional style body features;
s103, constructing a target nerve radiation field according to the target point coordinates, the style feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through a nerve radiation field network, and rendering through the target nerve radiation field to obtain a rendered image;
and S104, training the virtual character generation model according to the true value image corresponding to the target gesture and the rendering image.
Fig. 1b is a schematic structural diagram of a virtual character generation model provided according to an embodiment of the present disclosure. Referring to fig. 1b, the virtual character generation model to be trained includes a style body feature extraction network 11, a gesture control network 12, and a neural radiation field network 13. The style body feature extraction network 11, which may also be referred to as the style body feature extraction branch, takes as input the conditional image 111 and the conditional gesture 112 corresponding to the conditional image 111; it extracts features from the conditional image 111 to form feature codes and uses the conditional gesture 112 to obtain the conditional style body feature 113 under the conditional gesture 112. The style body feature 113 characterizes the style attributes of a character, such as the wearing style, hairstyle, and skin style; if the style body feature of any area changes, the generated virtual character changes accordingly. The gesture control network 12, which may also be referred to as the gesture control branch, controls the gesture of the generated image. Its input is the target gesture; it transforms the target point in the target gesture to obtain the coordinates of the target point transformed into another gesture, and determines the style feature vector 131 corresponding to the target point coordinates based on the conditional style body feature 113. The radiation field function is then trained with the style feature vector 131 corresponding to the target point coordinates as a condition, to obtain the target nerve radiation field.
Specifically, the conditional image and the conditional posture corresponding to the conditional image are input into the style body feature extraction network in the virtual character generation model to obtain the conditional style body feature under the conditional posture. The target gesture is input into the gesture control network, which transforms the target point in the target gesture into the conditional gesture or an intermediate gesture, and the target point coordinates under the conditional gesture or the intermediate gesture are taken as the transformed target point coordinates. The style feature vector corresponding to the target point coordinates is then extracted from the conditional style body feature under the conditional posture according to the target point coordinates; this style feature vector is used as a condition of the neural radiation field function and is fed into the function together with the target point coordinates to obtain the color (RGB) and volume density (sigma) corresponding to the target point coordinates, thereby obtaining the target nerve radiation field.
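As a rough illustration of this forward pass, the following PyTorch-style sketch wires the three networks together; all class, method, and variable names are hypothetical and only indicate the data flow described above, not the actual implementation of the disclosure.

```python
# A minimal structural sketch, assuming suitable sub-networks are supplied by the caller.
import torch
import torch.nn as nn

class VirtualCharacterModel(nn.Module):
    def __init__(self, style_net: nn.Module, pose_net: nn.Module, nerf_net: nn.Module):
        super().__init__()
        self.style_net = style_net   # style body feature extraction network
        self.pose_net = pose_net     # gesture (pose) control network
        self.nerf_net = nerf_net     # neural radiation field network

    def forward(self, cond_image, cond_pose, target_pose, sample_points, view_dirs):
        # 1) conditional style body features under the conditional gesture
        style_volume = self.style_net(cond_image, cond_pose)        # e.g. (B, C, D, H, W)
        # 2) transform sampled target points and look up their style feature vectors
        p_tp = self.pose_net.transform(sample_points, target_pose)  # transformed coordinates
        f_tp = self.pose_net.sample_features(style_volume, p_tp)    # per-point style vectors
        # 3) conditioned radiance field: colour and volume density per point
        rgb, sigma = self.nerf_net(p_tp, view_dirs, f_tp)
        return rgb, sigma
```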
In the disclosed embodiment, the input of the neural radiation field function includes not only the target point coordinates p_tp and the observation direction d corresponding to the target point coordinates, but also the style feature vector f_tp corresponding to the target point coordinates; the output of the function is the color value (R, G, B) and the volume density σ corresponding to the target point coordinates. The process is shown in formula (1):

(RGB, σ) = RF(f_tp, p_tp, d),    (1)

where RF is the radiation field function and d is the observation direction corresponding to the target point coordinates.
When virtual character rendering is performed through the target nerve radiation field, a set of rays is emitted according to the target camera parameters. A series of points are sampled on each ray and fed into the radiation field function to obtain their colors and volume densities; the colors and volume densities along each ray are integrated to obtain the pixel value of that ray in the rendered image, and the color values of the different rays are combined into the final rendered image. In other words, the different ray directions corresponding to the image to be rendered are fed into the radiation field function to obtain the rendered image. By inputting the style feature vector corresponding to the target point coordinates into the radiation field function as a condition, the nerve radiation field can learn the character style in the conditional image, such as the dressing style, hairstyle, and skin style, so that the virtual character image rendered by the target nerve radiation field is close to the character style in the conditional image, and the image rendering quality can be improved.
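For reference, a minimal sketch of this ray-sampling and integration step, using standard NeRF-style alpha compositing; the field function, ray generation, and constants below are illustrative stand-ins rather than the disclosure's actual rendering code.

```python
import torch

def render_rays(field_fn, rays_o, rays_d, near=0.0, far=1.0, n_samples=64):
    # rays_o, rays_d: (N, 3) ray origins and (normalised) directions from the camera parameters
    t = torch.linspace(near, far, n_samples)                            # (S,) depths along each ray
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]    # (N, S, 3) sample points
    dirs = rays_d[:, None, :].expand_as(pts)                            # viewing direction per sample
    rgb, sigma = field_fn(pts, dirs)                                    # (N, S, 3), (N, S)

    # Numerical integration of colour and density along each ray (alpha compositing).
    delta = t[1:] - t[:-1]
    delta = torch.cat([delta, delta[-1:]])                              # (S,)
    alpha = 1.0 - torch.exp(-sigma * delta)                             # opacity of each segment
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                                             # contribution of each sample
    return (weights[..., None] * rgb).sum(dim=1)                        # (N, 3) pixel colours

# Example with a dummy field: constant colour, constant density.
dummy_field = lambda p, d: (torch.ones(*p.shape[:-1], 3) * 0.5, torch.ones(p.shape[:-1]))
pixels = render_rays(dummy_field, torch.zeros(4, 3), torch.tensor([[0.0, 0.0, 1.0]]).repeat(4, 1))
```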
In the embodiment of the present disclosure, a person image corresponding to the target pose may be taken as the truth image. For example, in a sequence of person image frames, a first image and the first pose corresponding to the first image are taken as the conditional image and the conditional pose, respectively, and a second image and the second pose corresponding to the second image are taken as the truth image and the target pose, respectively. Specifically, the truth image and the rendered image can be compared, and the virtual character generation model is trained according to the comparison result; that is, the style body feature extraction network, the gesture control network, and the nerve radiation field network in the virtual character generation model are trained so that the rendered image approaches the truth image, thereby improving the image rendering quality.
According to the technical scheme provided by the embodiment of the disclosure, the style body characteristics under the condition posture are determined through the style body characteristic extraction network, the posture of the target point in the target posture is transformed through the posture control network to obtain transformed target point coordinates, the style characteristic vector corresponding to the target point coordinates is determined according to the target point coordinates and the condition style body characteristics, and the style characteristic vector corresponding to the target point coordinates is used as a condition input radiation field function, so that the nerve radiation field can learn the character style in the condition image, and the image rendering quality can be improved.
FIG. 2a is a flow chart of another method of training a virtual character generation model, provided in accordance with an embodiment of the present disclosure. In an embodiment of the present disclosure, a virtual character generation model to be trained includes a style body feature extraction network, a gesture control network, and a neural radiation field network; the style body feature extraction network comprises a first feature extraction sub-network, a second feature extraction sub-network and a feature fusion sub-network. Determining, by the style body feature extraction network, a conditional style body feature under the conditional posture according to the conditional image and the conditional posture corresponding to the conditional image, further comprising: carrying out semantic segmentation on the conditional image to obtain an image area, and obtaining the feature of the conditional image according to the image area and the conditional image through a first feature extraction sub-network; obtaining gesture features under the conditional gestures according to the conditional gestures through the second feature extraction sub-network; and obtaining the condition style body characteristics under the condition posture according to the condition image characteristics and the posture characteristics under the condition posture through the characteristic fusion sub-network.
Referring to fig. 2a, the training method of the virtual character generation model of the present embodiment may include:
s201, carrying out semantic segmentation on the conditional image to obtain an image area, and obtaining conditional image features according to the image area and the conditional image through the first feature extraction sub-network;
s202, obtaining gesture features under the condition gesture according to the condition gesture through the second feature extraction sub-network;
s203, obtaining the condition style body characteristics under the condition posture according to the condition image characteristics and the posture characteristics under the condition posture through the characteristic fusion sub-network;
s204, transforming the target point in the target gesture through the gesture control network to obtain transformed target point coordinates, and determining a style feature vector corresponding to the target point coordinates according to the target point coordinates and the conditional style body features;
s205, constructing a target nerve radiation field according to the target point coordinates, the style feature vectors corresponding to the target point coordinates and the observation directions corresponding to the target point coordinates through the nerve radiation field network, and rendering through the target nerve radiation field to obtain a rendered image;
And S206, training the virtual character generating model according to the true value image corresponding to the target gesture and the rendering image.
FIG. 2b is a schematic diagram of another virtual character generation model provided in accordance with an embodiment of the present disclosure; fig. 2c is a schematic structural diagram of a style body feature extraction network according to an embodiment of the present disclosure, and in combination with fig. 2b and fig. 2c, the style body feature extraction network includes a first feature extraction sub-network 211, a second feature extraction sub-network 212, and a feature fusion sub-network 213.
In the embodiment of the disclosure, human body semantic segmentation is performed on the conditional image to obtain an image region in the conditional image. It should be noted that, in the embodiment of the present disclosure, the number of categories of the semantic division result, that is, the number of image areas is not particularly limited, for example, the conditional image may be divided into M-class image areas, and the M-class image areas may be combined into N-class image areas according to the requirement, so as to set the granularity of the area control. Wherein M and N are positive integers, and N is smaller than M. The image area is used to characterize a local area style of the character, such as upper body dressing style, leg dressing style, shoe style, hairstyle style, skin style, etc.
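A small sketch of how such region masks might be prepared is given below; the parsing labels, the grouping of M classes into N regions, and the helper names are assumptions for illustration only.

```python
import torch

def build_region_inputs(cond_image, parsing, merge_groups):
    # cond_image: (3, H, W); parsing: (H, W) integer labels from a human-parsing model
    # merge_groups: list of label lists, e.g. [[1, 2], [5, 6, 7], ...] -> N merged regions
    region_inputs = []
    for labels in merge_groups:
        mask = torch.zeros_like(parsing, dtype=cond_image.dtype)
        for lbl in labels:
            mask = mask + (parsing == lbl).to(cond_image.dtype)
        region_inputs.append(cond_image * mask)        # keep only this region's pixels
    return torch.stack(region_inputs)                  # (N, 3, H, W), one masked image per region

# Hypothetical grouping of M=6 parsing labels into N=3 regions (hair, upper body, lower body).
parsing = torch.randint(0, 6, (256, 256))
regions = build_region_inputs(torch.rand(3, 256, 256), parsing, [[1], [2, 3], [4, 5]])
```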
Referring to fig. 2c, the conditional image is multiplied by the image areas corresponding to the respective categories, and the multiplication results are input into the first feature extraction sub-network 211 to obtain the conditional image feature. The conditional gesture is input into the second feature extraction sub-network 212 to obtain the gesture feature under the conditional gesture. The conditional image feature and the gesture feature under the conditional gesture are then input into the feature fusion sub-network 213 to obtain the conditional style body feature under the conditional gesture, as shown in formula (2):

V_p = γ(f_s) · f_p + δ(f_s),    (2)

where f_s and f_p are, in order, the conditional image feature and the gesture feature under the conditional gesture; V_p is the conditional style body feature output by the feature fusion sub-network; γ(·) and δ(·) may adopt a convolutional network structure; and the conditional style body feature V_p has the same size as the conditional image feature f_s. It should be noted that, in the embodiments of the present disclosure, the network structures of the first feature extraction sub-network and the second feature extraction sub-network are not particularly limited; for example, a VGG structure or a ResNet (deep residual network) structure may be used. The conditional style body feature is obtained by fusing the conditional image feature and the gesture feature under the conditional gesture, so that it contains both the conditional gesture information and the character style information in the conditional image; the style feature vector corresponding to the target coordinate point is determined based on the conditional style body feature and is used as the condition of the radiation field function, so that the rendering quality of the target nerve radiation field can be improved.
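The modulation in formula (2) can be sketched as follows; realising γ(·) and δ(·) as 1x1 convolutions over a 3D feature volume is an assumption for illustration, not the disclosure's specified architecture.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, img_channels=64, pose_channels=64):
        super().__init__()
        # gamma(.) and delta(.) realised as small convolutions over the image feature f_s
        self.gamma = nn.Conv3d(img_channels, pose_channels, kernel_size=1)
        self.delta = nn.Conv3d(img_channels, pose_channels, kernel_size=1)

    def forward(self, f_s, f_p):
        # f_s: conditional image feature, f_p: gesture feature, both (B, C, D, H, W)
        return self.gamma(f_s) * f_p + self.delta(f_s)   # V_p, same size as f_s

fusion = FeatureFusion()
V_p = fusion(torch.rand(1, 64, 8, 16, 16), torch.rand(1, 64, 8, 16, 16))
```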
The conditional image features are obtained according to the image areas in the conditional image, and then a new virtual character image can be generated by editing any image area, so that the color and the volume density of the nerve radiation field are guided to be generated by using the style volume features based on the human body semantic area, and the local attribute of the virtual character is controlled. That is, through image region decoupling, single-model multi-character rendering and human body local attribute editing are realized, and based on one virtual character generating model, not only can a plurality of characters be rendered, but also the editing of the local attribute of the character is supported, so that the generating efficiency of the virtual character is greatly improved.
According to the technical scheme provided by the embodiment of the disclosure, the condition image features are obtained according to the image region in the condition image, the style feature vector corresponding to the target point coordinate in the target posture is determined by combining the condition image features and the condition style body features, the style feature vector corresponding to the target point coordinate is used as a condition input radiation field function to obtain the target nerve radiation field, and the image rendering is carried out through the target nerve radiation field, namely, the single-model multi-character rendering and the human body local attribute editing are realized through image region decoupling, and the generation efficiency of the virtual character is greatly improved.
In an alternative embodiment, the training the virtual character generating model according to the truth image corresponding to the target gesture and the rendering image includes: comparing the pixel value in the truth image corresponding to the target gesture with the pixel value in the rendering image, and training the virtual character generating model according to the pixel comparison result; and determining the global similarity between the truth image corresponding to the target gesture and the rendered image, and training the virtual character generation model according to the global similarity.
In an embodiment of the disclosure, pixel value comparison may be performed on the truth image and the rendered image, and a first loss function may be constructed according to the pixel value comparison result, and the virtual character generation model may be trained according to the first loss function, with reference to the following formula (3):
L_l1 = L1(I_g, I_t),    (3)

where L_l1 is the first loss function, L1(·) is a function that computes the pixel error between two images, and I_g and I_t are the rendered image and the truth image, respectively.
In the embodiment of the disclosure, the global similarity between the truth image corresponding to the target gesture and the rendered image is also determined, and the virtual character generation model is trained according to the global similarity.
The virtual character generation model is trained by combining the local pixel errors between the two images and the global similarity between the two images, so that the local pixels of the two images are continuously close to each other in the training process, the overall characteristics of the two images are continuously close to each other, and the image rendering quality is further improved.
In an alternative embodiment, determining the global similarity between the truth image corresponding to the target pose and the rendered image, and training the virtual character generation model according to the global similarity, includes: determining a first global similarity between the truth image corresponding to the target gesture and the rendered image through an adversarial network; determining a second global similarity between the truth image corresponding to the target gesture and the rendered image through a perceptual network; and training the virtual character generation model according to the first global similarity and/or the second global similarity.
In conjunction with formulas (4) and (5) below, the first global similarity between the truth image and the rendered image may be determined through the adversarial network, and the second global similarity between the truth image and the rendered image may be determined through the perceptual network; the virtual character generation model may then be trained using at least one of the first global similarity and the second global similarity.
L_gan = E[log D(I_t)] + E[log(1 - D(I_g))],    (4)

L_pert = |φ_j(I_t) - φ_j(I_g)|,    (5)

where E[·] denotes the expectation, D(·) denotes the discriminator function of the adversarial network, L_gan denotes the adversarial loss function, φ_j(·) denotes the output of the perceptual network, and L_pert denotes the perceptual loss function. The adversarial network and the perceptual network are trained in advance; the embodiments of the present disclosure do not specifically limit their network structures or training methods. The first global similarity and the second global similarity between the truth image and the rendered image are determined through the adversarial network and the perceptual network, respectively; the adversarial loss function is built from the first global similarity and the perceptual loss function from the second global similarity, and at least one of the adversarial loss function and the perceptual loss function is used to train the virtual character generation model, so that the distance between the truth image and the rendered image is reduced as a whole, and the image rendering quality is improved.
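A compact sketch of how the three loss terms of formulas (3)-(5) could be computed is given below; the discriminator D and the feature network phi are dummy stand-ins, since their structures are not specified here.

```python
import torch
import torch.nn.functional as F

def training_losses(I_g, I_t, D, phi):
    # I_g: rendered image, I_t: truth image, both (B, 3, H, W) in [0, 1]
    l_pix = F.l1_loss(I_g, I_t)                                   # formula (3): pixel error
    eps = 1e-6
    l_gan = torch.log(D(I_t) + eps).mean() + torch.log(1.0 - D(I_g) + eps).mean()  # formula (4)
    l_pert = (phi(I_t) - phi(I_g)).abs().mean()                   # formula (5): perceptual error
    return l_pix, l_gan, l_pert

# Dummy discriminator and feature extractor just to make the sketch executable.
D = lambda x: torch.sigmoid(x.mean(dim=(1, 2, 3))).unsqueeze(1)
phi = lambda x: F.avg_pool2d(x, kernel_size=4)
losses = training_losses(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64), D, phi)
```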
FIG. 3a is a flow chart of training of yet another virtual character generation model provided in accordance with an embodiment of the present disclosure. This embodiment is an alternative to the embodiments described above. The virtual character generation model to be trained comprises a style body feature extraction network, a gesture control network and a nerve radiation field network. Referring to fig. 3a, training of the virtual character generation model of the present embodiment may include:
S301, determining the condition style body characteristics under the condition posture according to the condition image and the condition posture corresponding to the condition image through the style body characteristic extraction network;
s302, through the style body feature extraction network, the conditional style body features are transformed into the middle posture to obtain transformed style body features;
s303, transforming the target point in the target gesture to the middle gesture through the gesture control network to obtain transformed target point coordinates;
s304, determining a style feature vector corresponding to the target point coordinates according to the transformed target point coordinates and the transformed style body features in the middle posture;
s305, constructing a target nerve radiation field according to the target point coordinates, the style feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through the nerve radiation field network, and rendering through the target nerve radiation field to obtain a rendered image;
and S306, training the virtual character generating model according to the true value image corresponding to the target gesture and the rendering image.
Referring to fig. 3b, any one of the gestures other than the conditional gesture and the target gesture may be taken as an intermediate gesture, the transformation style body feature is obtained by transforming the conditional style body feature under the conditional gesture to the intermediate gesture, and the target point in the target gesture is transformed to the intermediate gesture to obtain the target point coordinates transformed to the intermediate gesture; and extracting a style feature vector corresponding to the target point coordinates from the transformed style body features in the intermediate posture according to the target point coordinates transformed in the intermediate posture, and performing radiation field function training by taking the style feature vector as a condition. By taking the intermediate posture as an intermediary between the conditional posture and the target posture, the posture unification between the coordinates of the target point and the style feature vectors corresponding to the coordinates of the target point is realized, so that the training of the radiation field function is realized.
For ease of computation, a T-pose or a bind pose may be used as the intermediate pose, i.e., the target point in the target pose and the style body features under the conditional pose are unified into the T-pose, in which the training of the radiation field function is performed.
Optionally, the transforming, through the gesture control network, the target point in the target gesture to the intermediate gesture to obtain transformed target point coordinates includes: rigidly transforming a target point in a target posture to an intermediate posture through the posture control network to obtain a new coordinate in the target posture; inputting the new coordinates under the target posture into a non-rigid transformation sub-network in the posture control network to perform non-rigid transformation to obtain non-rigid corrected coordinates; and obtaining transformed target point coordinates according to the new coordinates in the target posture and the non-rigid corrected coordinates.
Specifically, the rotation matrix and the translation matrix between the target pose and the intermediate pose can be obtained from the conversion between the two poses; the coordinates of the target point in the target pose, where the target point is any point in the target pose, are transformed into the intermediate pose using the following formula (6) to obtain the new coordinates.
p' = R_st @ p + T_st,    (6)

where R_st and T_st are, in order, the rotation matrix and the translation matrix between the target pose and the intermediate pose, @ is the rotation operation, and p and p' are, in order, the original coordinates of the target point in the target pose and the new coordinates transformed into the intermediate pose.
Referring to fig. 3b, the coordinates of the target point transformed into the intermediate pose may be input into the non-rigid transformation sub-network in the pose control branch for non-rigid transformation, to obtain the non-rigidly corrected coordinates of the target point; the transformed target point coordinates are then obtained from the new coordinates and the non-rigid corrected coordinates using the following formula (7).
p_tp = p' + Δp,    (7)

where p', Δp, and p_tp are, in order, the new coordinates of the target point transformed into the intermediate pose, the non-rigidly transformed correction coordinates, and the transformed target point coordinates.
Because local attributes of a person such as clothes and hair deform non-rigidly, non-rigid transformation is introduced on top of the rigid transformation to simulate the non-rigid deformation of the human body; this further improves the accuracy of the transformed target point coordinates and thus the rendering quality of the radiation field function.
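The rigid transform of formula (6) followed by the non-rigid correction of formula (7) can be sketched as below; the small MLP standing in for the non-rigid transformation sub-network is an assumption for illustration.

```python
import torch
import torch.nn as nn

class NonRigidCorrection(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, p_prime):
        return self.mlp(p_prime)            # delta_p, a per-point non-rigid correction

def transform_target_points(p, R_st, T_st, non_rigid):
    # p: (N, 3) target points in the target pose; R_st: (3, 3) rotation; T_st: (3,) translation
    p_prime = p @ R_st.T + T_st             # formula (6): rigid transform to the intermediate pose
    delta_p = non_rigid(p_prime)            # non-rigid correction from the sub-network
    return p_prime + delta_p                # formula (7): transformed target point coordinates

p_tp = transform_target_points(torch.rand(128, 3), torch.eye(3), torch.zeros(3), NonRigidCorrection())
```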
In an alternative embodiment, said transforming the conditional style body feature into the intermediate pose to obtain a transformed style body feature comprises: determining optical flow information of the key point movement according to the conditional gesture and the target gesture; and transforming the conditional style body feature into the intermediate pose according to the optical flow information of the key point movement to obtain the transformed style body feature.
Specifically, based on an optical flow method, the optical flow information of the movement of the character's key points is determined from the conditional gesture and the target gesture. Optical flow is a concept used in detecting the motion of objects in the field of view; it describes the movement of an observed object, surface, or edge caused by motion relative to the observer. Optical flow is the instantaneous velocity of the pixel movement of a spatially moving object on the observation imaging plane, and optical flow methods use temporal variations and correlations of pixel intensity data to determine the movement of the respective pixel positions. That is, based on the optical flow method, the velocity vectors of the key point movements are determined. The conditional style body feature under the conditional gesture is then deformed according to the optical flow information of the key point movement to obtain the transformed style body feature under the intermediate gesture. Through this optical flow deformation, the conditional style body feature under the conditional gesture is transformed into the intermediate gesture, which facilitates the subsequent construction of the nerve radiation field under the intermediate gesture.
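One common way to realise such an optical-flow deformation is to warp the feature map with grid_sample, as in the following sketch; treating the style body feature as a 2D map and the flow handling shown here are simplifying assumptions, not the disclosure's actual procedure.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(feat, flow):
    # feat: (B, C, H, W) conditional style feature; flow: (B, 2, H, W) displacement in pixels
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    base = torch.stack((xs, ys), dim=0).unsqueeze(0).expand(B, -1, -1, -1)   # (B, 2, H, W)
    coords = base + flow                                                      # target sampling positions
    # normalise to [-1, 1] as required by grid_sample, then sample
    coords_x = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                          # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

warped = warp_by_flow(torch.rand(1, 64, 32, 32), torch.zeros(1, 2, 32, 32))
```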
In an optional implementation manner, the determining the style feature vector corresponding to the target point coordinate according to the transformed target point coordinate and the transformed style body feature in the middle gesture includes: and interpolating the transformed target point coordinates based on the transformed style body characteristics under the middle gesture to obtain style characteristic vectors corresponding to the target point coordinates.
Specifically, the style feature vector corresponding to the target point coordinates can be obtained by interpolating the transformed style body feature under the intermediate gesture at the target point coordinates under the intermediate gesture through a grid_sample interpolation operation, and this style feature vector is used as the condition of the radiation field function. Interpolating the style body feature at the target point coordinates to obtain the corresponding style feature vector as the condition of the radiation field function, compared with inputting only the local pixel values at the target point coordinates, allows the construction of the target nerve radiation field to focus on the character style at the target point, thereby improving the image rendering quality. Moreover, if the transformed style body features of the whole conditional image were all input into the radiation field function, the amount of input information would be too large to focus on the local style at the target point. Therefore, inputting the style feature vector corresponding to the target point coordinates into the radiation field function focuses on the character style at the target point while limiting the focus range to the vicinity of the target point without losing grasp of the details, which can further improve the image rendering quality.
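The grid_sample interpolation of a per-point style feature vector can be sketched as follows; the 3D volume layout and the assumption that the target point coordinates are already normalised to [-1, 1] are illustrative.

```python
import torch
import torch.nn.functional as F

def sample_style_vectors(style_volume, points):
    # style_volume: (1, C, D, H, W) transformed style body feature under the intermediate pose
    # points: (N, 3) transformed target point coordinates, normalised to [-1, 1] (x, y, z order)
    grid = points.view(1, 1, 1, -1, 3)                               # (1, 1, 1, N, 3) sampling grid
    sampled = F.grid_sample(style_volume, grid, align_corners=True)  # (1, C, 1, 1, N)
    return sampled.reshape(style_volume.shape[1], -1).t()            # (N, C) style vector per point

f_tp = sample_style_vectors(torch.rand(1, 32, 16, 16, 16), torch.rand(512, 3) * 2 - 1)
```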
According to the technical scheme provided by the embodiment of the disclosure, the target point in the target posture is transformed to the middle posture through rigid transformation and non-rigid transformation to obtain the target point coordinate, and the style body characteristic under the condition posture is transformed to the middle posture through optical flow calculation to obtain the transformed style body characteristic; under the target posture, the style body characteristic interpolation is carried out on the target point coordinates to obtain style characteristic vectors corresponding to the target point coordinates, and the style characteristic vectors corresponding to the target point coordinates are used as the conditions of the radiation field function, so that the unification of the posture between the target point coordinates and the style body characteristics is realized, the character style near the target point can be focused, the quality of the target nerve radiation field is improved, and the image rendering quality is improved.
Fig. 4 is a flowchart of a virtual character generation method provided according to an embodiment of the present disclosure. The method is applicable to the case of generating an image of a virtual character based on a neural radiation field. The method may be performed by a virtual character generation apparatus, which may be implemented in software and/or hardware and may be integrated into an electronic device. As shown in fig. 4, the virtual character generation method of the present embodiment may include:
S401, acquiring a target nerve radiation field;
the target nerve radiation field is constructed in the process of training the virtual character generating model by adopting the training method of the virtual character generating model disclosed by any embodiment of the disclosure;
and S402, rendering through the target nerve radiation field to obtain a virtual character image.
According to the technical scheme provided by the embodiment of the disclosure, the style feature vector corresponding to the target point coordinate is introduced as a condition in the virtual character generation model training process, so that the quality of the target nerve radiation field is improved, and the image quality of the virtual character can be improved by generating the virtual character image through the target nerve radiation field.
In an alternative embodiment, the method further comprises: acquiring a new image area obtained by replacing at least one image area in the conditional image through a style body feature extraction network in the virtual character generation model, and determining new condition style body features under the condition posture according to the new image area and the condition posture corresponding to the condition image; transforming target points in the target gestures through a gesture control network in the virtual character generation model to obtain transformed target point coordinates, and determining new grid feature vectors corresponding to the target point coordinates according to the target point coordinates and the new condition style body features; and constructing a new target nerve radiation field according to the target point coordinates, the new grid feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through a nerve radiation field network in the virtual character generation model, and rendering through the new target nerve radiation field to obtain a new virtual character image.
The image areas are the segmentation results of human body semantic segmentation performed on the conditional image. At least one image area is replaced to obtain a new image area; the new image area and the original, unreplaced image areas form a new segmentation result of the conditional image. The conditional image is multiplied by each image area of the new segmentation result, and the multiplication results are input into the first feature extraction sub-network to obtain new conditional image features. For example, at least one of the upper-body clothing area, the shoe area, the hairstyle area, the skin area, and the like can be replaced to obtain new conditional image features, thereby facilitating the subsequent generation of a new, locally edited virtual character image.
According to the embodiment of the disclosure, a new conditional style body feature under the conditional gesture is obtained from the new conditional image features and the gesture features under the conditional gesture through the style body feature extraction network, and the new conditional style body feature under the conditional gesture is transformed into the intermediate gesture to obtain a new transformed style body feature. Through the gesture control network, the target point coordinates under the intermediate gesture are interpolated based on the new transformed style body feature to obtain new style feature vectors corresponding to the target point coordinates; the new style feature vectors corresponding to the target point coordinates are input into the radiation field function to construct a new target nerve radiation field, and rendering is performed through the new target nerve radiation field to obtain a new virtual character image. By replacing image areas in the conditional image, that is, by performing local attribute editing on the conditional image, multi-character rendering and human body local attribute editing based on a single model are realized, which improves the generation efficiency and flexibility of virtual character images.
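As an illustration of this region-replacement editing, the following sketch swaps the pixels of one merged semantic region before the conditional image is re-fed to the style body feature extraction network; the region indices and helper names are assumptions for illustration only.

```python
import torch

def edit_region(cond_image, parsing, merge_groups, region_idx, replacement_image):
    # Replace the pixels of one merged region (e.g. the upper-body clothing) with pixels taken
    # from another character's image, keeping all other regions unchanged.
    mask = torch.zeros_like(parsing, dtype=cond_image.dtype)
    for lbl in merge_groups[region_idx]:
        mask = mask + (parsing == lbl).to(cond_image.dtype)
    edited = cond_image * (1.0 - mask) + replacement_image * mask
    return edited   # fed back through the style body feature extraction network for re-rendering

edited = edit_region(torch.rand(3, 256, 256), torch.randint(0, 6, (256, 256)),
                     [[1], [2, 3], [4, 5]], region_idx=1, replacement_image=torch.rand(3, 256, 256))
```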
Fig. 5 is a schematic structural diagram of a training apparatus for a virtual character generation model according to an embodiment of the present disclosure. The embodiment is suitable for training the virtual character generation model based on the nerve radiation field. The apparatus may be implemented in software and/or hardware and may be integrated into an electronic device. The virtual character generation model to be trained comprises a style body feature extraction network, a gesture control network, and a nerve radiation field network. As shown in fig. 5, the training apparatus 500 of the virtual character generation model of the present embodiment may include:
the style body extraction module 510 is configured to determine, through the style body feature extraction network, a condition style body feature under a condition pose according to a condition image and a condition pose corresponding to the condition image;
the gesture control module 520 is configured to transform, through the gesture control network, a target point in a target gesture to obtain transformed target point coordinates, and determine a style feature vector corresponding to the target point coordinates according to the target point coordinates and the conditional style body feature;
The neural radiation field module 530 is configured to construct a target neural radiation field according to the target point coordinates, the style feature vector corresponding to the target point coordinates, and the observation direction corresponding to the target point coordinates through the neural radiation field network, and render the target neural radiation field to obtain a rendered image;
the model training module 540 is configured to train the virtual character generating model according to the truth image corresponding to the target gesture and the rendered image.
In an alternative embodiment, the style body feature extraction network includes a first feature extraction sub-network, a second feature extraction sub-network, and a feature fusion sub-network;
the style body extraction module 510 includes:
the conditional image feature unit is used for carrying out semantic segmentation on the conditional image to obtain an image area, and obtaining conditional image features according to the image area and the conditional image through the first feature extraction sub-network;
a gesture feature unit, configured to obtain, through the second feature extraction sub-network, a gesture feature under a conditional gesture according to the conditional gesture;
and the conditional style body unit is used for obtaining the conditional style body characteristics under the conditional posture according to the conditional image characteristics and the posture characteristics under the conditional posture through the characteristic fusion sub-network.
In an alternative embodiment, the style body extraction module 510 further includes a style body transformation unit:
the style body transformation unit is used for transforming the conditional style body characteristics to the middle posture through the style body characteristic extraction network to obtain transformed style body characteristics;
wherein the gesture control module 520 comprises:
a target point transformation unit, configured to transform a target point in a target gesture to an intermediate gesture through the gesture control network, to obtain transformed target point coordinates;
and the style characteristic vector unit is used for determining a style characteristic vector corresponding to the target point coordinate according to the transformed target point coordinate and the transformed style body characteristic under the middle gesture.
In an alternative embodiment, the target point transforming unit comprises:
a rigidity transformation subunit, configured to rigidly transform, through the gesture control network, a target point in a target gesture to an intermediate gesture, to obtain a new coordinate in the target gesture;
a non-rigid correction subunit, configured to input the new coordinate under the target gesture into a non-rigid transformation sub-network in the gesture control network to perform non-rigid transformation to obtain a non-rigid correction coordinate;
And the target point coordinate subunit is used for obtaining the transformed target point coordinate according to the new coordinate under the target posture and the non-rigid correction coordinate.
In an alternative embodiment, the style body transforming unit includes:
an optical flow subunit, configured to determine optical flow information of the movement of the key point according to the condition gesture and the target gesture;
and the style body transformation subunit is used for transforming the conditional style body characteristics into the intermediate posture according to the optical flow information of the key point movement to obtain transformation style body characteristics.
In an alternative embodiment, the style feature vector unit is specifically configured to:
and interpolating the transformed target point coordinates based on the transformed style body characteristics under the middle gesture to obtain style characteristic vectors corresponding to the target point coordinates.
In an alternative embodiment, the model training module 540 includes:
the pixel training unit is used for comparing the pixel value in the true image corresponding to the target gesture with the pixel value in the rendered image and training the virtual character generation model according to the pixel comparison result;
and the similarity training unit is used for determining the global similarity between the truth image corresponding to the target gesture and the rendering image, and training the virtual character generation model according to the global similarity.
In an alternative embodiment, the similarity training unit includes:
a first global subunit, configured to determine, through an adversarial network, a first global similarity between the truth image corresponding to the target gesture and the rendered image;
a second global subunit, configured to determine, through a perceptual network, a second global similarity between the truth image corresponding to the target gesture and the rendered image;
and the similarity training subunit is used for training the virtual character generation model according to the first global similarity and/or the second global similarity.
According to the technical scheme, the human body attribute rendered by the nerve radiation field is controlled by utilizing the characteristics extracted by the subareas, so that the local editing of the virtual character is realized; the dual control of the gesture and the appearance of the virtual character is realized by utilizing the dual-branch network structure, and single-model multi-character rendering is supported.
Fig. 6 is a schematic structural view of a virtual character generating apparatus provided according to an embodiment of the present disclosure; the present embodiment is applicable to the case of generating a virtual character image based on a neural radiation field. The apparatus may be implemented in software and/or hardware and may be integrated into an electronic device. As shown in fig. 6, the avatar generating apparatus 600 of the present embodiment may include:
A target radiation field module 610 for acquiring a target neural radiation field; the target nerve radiation field is constructed in the process of training the virtual character generating model by adopting the training method of the virtual character generating model disclosed by any embodiment of the disclosure;
the virtual character generating module 620 is configured to render through the target neural radiation field to obtain a virtual character image.
In an alternative embodiment, the virtual character generating apparatus 600 further includes a new character rendering module, where the new character rendering module includes:
the new condition style body unit is used for acquiring a new image area obtained by replacing at least one image area in the condition image through a style body feature extraction network in the virtual character generation model, and determining new condition style body features under the condition posture according to the new image area and the condition posture corresponding to the condition image;
the new grid feature unit is used for transforming the target point in the target gesture through the gesture control network in the virtual character generation model to obtain transformed target point coordinates, and determining new grid feature vectors corresponding to the target point coordinates according to the target point coordinates and the new condition style body features;
And the new character rendering unit is used for constructing a new target nerve radiation field according to the target point coordinates, the new grid feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through a nerve radiation field network in the virtual character generation model, and rendering through the new target nerve radiation field to obtain a new virtual character image.
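A minimal sketch of the region replacement that drives this local editing is given below, assuming a per-pixel semantic segmentation label map for the condition image; the `extract_style_volume` helper named in the trailing comment is a hypothetical stand-in for the style body feature extraction network.

```python
import torch

def replace_region(cond_image, seg_mask, new_region_image, part_id):
    """Swap one semantic region (e.g. an upper-clothes area) of the condition image.

    cond_image:       (3, H, W) original condition image
    seg_mask:         (H, W) integer part labels from semantic segmentation
    new_region_image: (3, H, W) image providing the replacement appearance
    part_id:          label of the region to replace
    """
    mask = (seg_mask == part_id).unsqueeze(0)            # (1, H, W), broadcasts over channels
    return torch.where(mask, new_region_image, cond_image)

# The edited condition image then goes through the same pipeline as before:
# new_volume = extract_style_volume(new_cond_image, cond_pose)   # hypothetical helper
# followed by pose control, radiation-field construction and rendering.
```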
According to the above technical scheme, the human body attributes rendered by the nerve radiation field are controlled by features extracted per image region, so that local editing of the virtual character is realized; dual control of the gesture and appearance of the virtual character is realized through the dual-branch network structure, and rendering of multiple characters with a single model is supported.
In the technical scheme of the present disclosure, the collection, storage, and use of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 is a block diagram of an electronic device for implementing the training method of a virtual character generation model or the virtual character generation method according to an embodiment of the present disclosure. Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, the training method of the virtual character generation model or the virtual character generation method. For example, in some embodiments, the training method of the virtual character generation model or the virtual character generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the training method of the virtual character generation model or the virtual character generation method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the training method of the virtual character generation model or the virtual character generation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
Artificial intelligence is the discipline that studies how to make a computer simulate certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning), and it covers both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning technology, big data processing technology, knowledge graph technology, and the like.
Cloud computing refers to a technical system in which an elastically extensible pool of shared physical or virtual resources is accessed through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and may be deployed and managed in an on-demand, self-service manner. Through cloud computing technology, efficient and powerful data processing capability can be provided for technical applications such as artificial intelligence and blockchain, as well as for model training.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (23)

1. A training method of a virtual character generating model comprises a style body characteristic extraction network, a gesture control network and a nerve radiation field network; the method comprises the following steps:
determining the condition style body characteristics under the condition posture according to the condition image and the condition posture corresponding to the condition image through the style body characteristic extraction network; wherein the style body features are used for characterizing style attributes of the virtual characters;
transforming the target point in the target gesture through the gesture control network to obtain transformed target point coordinates, and determining a style feature vector corresponding to the target point coordinates according to the target point coordinates and the conditional style body features;
constructing a target nerve radiation field according to the target point coordinates, the style feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through the nerve radiation field network, and rendering through the target nerve radiation field to obtain a rendered image;
And training the virtual character generation model according to the truth image corresponding to the target gesture and the rendered image.
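For orientation, the sketch below strings the three networks of claim 1 into one training iteration. The module interfaces (`style_net`, `pose_net.transform`, `pose_net.sample_style`, `nerf_net.render`) and the `compute_loss` placeholder are illustrative assumptions rather than the claimed implementation.

```python
import torch

def train_step(style_net, pose_net, nerf_net, cond_image, cond_pose,
               target_pose, target_points, view_dirs, truth_image,
               compute_loss, optimizer):
    # 1) Condition branch: style body features under the condition gesture.
    cond_style = style_net(cond_image, cond_pose)

    # 2) Pose-control branch: transform target points, look up per-point style vectors.
    xyz = pose_net.transform(target_points, target_pose)
    style_vec = pose_net.sample_style(xyz, cond_style)

    # 3) Radiation-field branch: per-point color/density, then volume rendering.
    rendered = nerf_net.render(xyz, style_vec, view_dirs)

    # Pixel-level and global-similarity supervision against the truth image.
    loss = compute_loss(rendered, truth_image)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```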
2. The method of claim 1, wherein the style body feature extraction network comprises a first feature extraction sub-network, a second feature extraction sub-network, and a feature fusion sub-network;
the step of determining the condition style body characteristics under the condition posture according to the condition image and the condition posture corresponding to the condition image through the style body characteristic extraction network comprises the following steps:
carrying out semantic segmentation on the conditional image to obtain an image area, and obtaining conditional image features according to the image area and the conditional image through the first feature extraction sub-network;
obtaining gesture features under the conditional gestures according to the conditional gestures through the second feature extraction sub-network;
and obtaining the condition style body characteristics under the condition posture according to the condition image characteristics and the posture characteristics under the condition posture through the characteristic fusion sub-network.
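A hedged sketch of this two-branch extraction follows: a convolutional first sub-network over the condition image and its semantic regions, an MLP second sub-network over the condition gesture, and a 1x1 convolution as the feature fusion sub-network. All layer sizes, the one-hot region encoding, and the choice of concatenation for fusion are illustrative assumptions rather than the claimed architecture.

```python
import torch
import torch.nn as nn

class StyleBodyFeatureNet(nn.Module):
    def __init__(self, n_parts=8, n_joints=24, feat_dim=64):
        super().__init__()
        # First sub-network: condition image plus its semantic regions -> image features.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3 + n_parts, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        # Second sub-network: condition gesture (per-joint parameters) -> gesture features.
        self.pose_branch = nn.Sequential(
            nn.Linear(n_joints * 3, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        # Fusion sub-network: combine both into the condition style body features.
        self.fusion = nn.Conv2d(2 * feat_dim, feat_dim, 1)

    def forward(self, cond_image, part_masks, cond_pose):
        # cond_image: (B, 3, H, W); part_masks: (B, n_parts, H, W); cond_pose: (B, n_joints, 3)
        img_feat = self.image_branch(torch.cat([cond_image, part_masks], dim=1))
        pose_feat = self.pose_branch(cond_pose.flatten(1))            # (B, feat_dim)
        pose_feat = pose_feat[:, :, None, None].expand_as(img_feat)   # broadcast spatially
        return self.fusion(torch.cat([img_feat, pose_feat], dim=1))
```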
3. The method of claim 1, the method further comprising:
through the style body feature extraction network, the conditional style body features are transformed into the middle posture to obtain transformed style body features;
The method for determining the style feature vector corresponding to the target point coordinate according to the target point coordinate and the condition style body feature comprises the following steps:
transforming the target point in the target gesture to the middle gesture through the gesture control network to obtain transformed target point coordinates;
and determining a style characteristic vector corresponding to the target point coordinates according to the transformed target point coordinates and the transformed style body characteristics under the middle posture.
4. A method according to claim 3, wherein said transforming, via the gesture control network, the target point in the target gesture to an intermediate gesture, resulting in transformed target point coordinates, comprises:
rigidly transforming a target point in a target posture to an intermediate posture through the posture control network to obtain a new coordinate in the target posture;
inputting the new coordinates under the target posture into a non-rigid transformation sub-network in the posture control network to perform non-rigid transformation to obtain non-rigid corrected coordinates;
and obtaining transformed target point coordinates according to the new coordinates in the target posture and the non-rigid corrected coordinates.
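One plausible realization of this two-step transform is a linear-blend-skinning style rigid step driven by per-joint transforms, followed by an MLP that predicts a non-rigid correction offset added to the rigid result. The skinning weights, the joint-transform representation, and the network sizes below are assumptions for exposition, not the claimed construction.

```python
import torch
import torch.nn as nn

class NonRigidNet(nn.Module):
    def __init__(self, pose_dim=72):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 + pose_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 3))

    def forward(self, pts, pose):
        # pts: (N, 3) rigidly transformed points; pose: (1, pose_dim) target gesture parameters.
        pose = pose.expand(pts.shape[0], -1)
        return self.mlp(torch.cat([pts, pose], dim=1))       # per-point non-rigid offset

def transform_target_points(pts, skin_weights, joint_transforms, pose, non_rigid_net):
    """pts: (N, 3) target points in the target gesture; skin_weights: (N, J);
    joint_transforms: (J, 4, 4) rigid transforms from target to intermediate gesture."""
    homo = torch.cat([pts, torch.ones_like(pts[:, :1])], dim=1)        # (N, 4)
    per_joint = torch.einsum('jab,nb->nja', joint_transforms, homo)    # (N, J, 4)
    rigid = (skin_weights[..., None] * per_joint).sum(dim=1)[:, :3]    # new coordinates
    correction = non_rigid_net(rigid, pose)                            # non-rigid correction
    return rigid + correction                                          # transformed coordinates
```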
5. A method according to claim 3, wherein said transforming the conditional style body feature into the intermediate pose results in a transformed style body feature, comprising:
determining optical flow information of the key point movement according to the condition gesture and the target gesture;
and according to the optical flow information of the key point movement, transforming the conditional style body characteristic into the intermediate posture to obtain a transformed style body characteristic.
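This warping can be sketched as displacing each location of the feature volume by the flow of its nearest key point and resampling the conditional style body features. The nearest-key-point flow approximation, the tensor shapes, and the coordinate normalization below are illustrative assumptions, not the claimed flow estimator.

```python
import torch
import torch.nn.functional as F

def warp_style_volume(cond_volume, cond_keypoints, target_keypoints, grid):
    """Warp conditional style body features toward the intermediate posture.

    cond_volume:      (1, C, D, H, W) features extracted under the condition gesture
    cond_keypoints:   (K, 3) body key points under the condition gesture, in [-1, 1] coords
    target_keypoints: (K, 3) the same key points under the target gesture
    grid:             (D, H, W, 3) regular sampling grid of the volume, in [-1, 1]
    """
    # Per-key-point displacement ("optical flow" of the key points), target back to condition.
    flow_per_kp = cond_keypoints - target_keypoints              # (K, 3)

    # Assign every grid location the flow of its nearest target-gesture key point.
    dist = torch.cdist(grid.reshape(-1, 3), target_keypoints)    # (D*H*W, K)
    nearest = dist.argmin(dim=1)                                  # (D*H*W,)
    flow = flow_per_kp[nearest].reshape(*grid.shape)              # (D, H, W, 3)

    # Sample the conditional volume at the displaced locations -> warped features.
    sample_grid = (grid + flow).unsqueeze(0)                      # (1, D, H, W, 3)
    return F.grid_sample(cond_volume, sample_grid, align_corners=True)
```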
6. A method according to claim 3, wherein the determining a style feature vector corresponding to the target point coordinates from the transformed target point coordinates and the transformed style body feature in the intermediate pose comprises:
and interpolating the transformed target point coordinates based on the transformed style body characteristics under the middle gesture to obtain style characteristic vectors corresponding to the target point coordinates.
7. The method of any of claims 1-6, wherein the training the virtual character generation model from the truth image and the rendered image corresponding to a target pose comprises:
comparing the pixel value in the truth image corresponding to the target gesture with the pixel value in the rendered image, and training the virtual character generation model according to the pixel comparison result;
And determining the global similarity between the truth image corresponding to the target gesture and the rendered image, and training the virtual character generation model according to the global similarity.
8. The method of claim 7, wherein the determining global similarity between the truth image and the rendered image for the target pose and training the virtual character generation model according to the global similarity comprises:
determining a first global similarity between a truth image corresponding to a target gesture and the rendered image through a countermeasure network;
determining a second global similarity between the truth image corresponding to the target gesture and the rendered image through a perception network;
and training the virtual character generating model according to the first global similarity and/or the second global similarity.
9. A virtual character generation method, comprising:
acquiring a target nerve radiation field; the target neural radiation field is constructed during training of the virtual character generation model by the method of any one of claims 1-8;
rendering is carried out through the target nerve radiation field, and a virtual character image is obtained.
10. The method of claim 9, the method further comprising:
acquiring a new image area obtained by replacing at least one image area in the conditional image through a style body feature extraction network in the virtual character generation model, and determining new condition style body features under the condition posture according to the new image area and the condition posture corresponding to the condition image;
transforming target points in the target gestures through a gesture control network in the virtual character generation model to obtain transformed target point coordinates, and determining new grid feature vectors corresponding to the target point coordinates according to the target point coordinates and the new condition style body features;
and constructing a new target nerve radiation field according to the target point coordinates, the new grid feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through a nerve radiation field network in the virtual character generation model, and rendering through the new target nerve radiation field to obtain a new virtual character image.
11. A training device of a virtual character generating model comprises a style body characteristic extraction network, a gesture control network and a nerve radiation field network; the device comprises:
The style body extraction module is used for determining the condition style body characteristics under the condition posture according to the condition image and the condition posture corresponding to the condition image through the style body characteristic extraction network; wherein the style body features are used for characterizing style attributes of the virtual characters;
the gesture control module is used for transforming the target point in the target gesture through the gesture control network to obtain transformed target point coordinates, and determining a style feature vector corresponding to the target point coordinates according to the target point coordinates and the conditional style body features;
the nerve radiation field module is used for constructing a target nerve radiation field according to the target point coordinates, the style feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through the nerve radiation field network, and rendering through the target nerve radiation field to obtain a rendered image;
and the model training module is used for training the virtual character generation model according to the truth image corresponding to the target gesture and the rendered image.
12. The apparatus of claim 11, wherein the style body feature extraction network comprises a first feature extraction sub-network, a second feature extraction sub-network, and a feature fusion sub-network;
The style body extraction module comprises:
the conditional image feature unit is used for carrying out semantic segmentation on the conditional image to obtain an image area, and obtaining conditional image features according to the image area and the conditional image through the first feature extraction sub-network;
a gesture feature unit, configured to obtain, through the second feature extraction sub-network, a gesture feature under a conditional gesture according to the conditional gesture;
and the conditional style body unit is used for obtaining the conditional style body characteristics under the conditional posture according to the conditional image characteristics and the posture characteristics under the conditional posture through the characteristic fusion sub-network.
13. The apparatus of claim 11, the style volume extraction module further comprising a style volume transformation unit:
the style body transformation unit is used for transforming the conditional style body characteristics to the middle posture through the style body characteristic extraction network to obtain transformed style body characteristics;
wherein the gesture control module comprises:
a target point transformation unit, configured to transform a target point in a target gesture to an intermediate gesture through the gesture control network, to obtain transformed target point coordinates;
And the style characteristic vector unit is used for determining a style characteristic vector corresponding to the target point coordinate according to the transformed target point coordinate and the transformed style body characteristic under the middle gesture.
14. The apparatus of claim 13, wherein the target point transformation unit comprises:
a rigidity transformation subunit, configured to rigidly transform, through the gesture control network, a target point in a target gesture to an intermediate gesture, to obtain a new coordinate in the target gesture;
a non-rigid correction subunit, configured to input the new coordinate under the target gesture into a non-rigid transformation sub-network in the gesture control network to perform non-rigid transformation to obtain a non-rigid correction coordinate;
and the target point coordinate subunit is used for obtaining the transformed target point coordinate according to the new coordinate under the target posture and the non-rigid correction coordinate.
15. The apparatus of claim 13, wherein the style body transformation unit comprises:
an optical flow subunit, configured to determine optical flow information of the movement of the key point according to the condition gesture and the target gesture;
and the style body transformation subunit is used for transforming the conditional style body characteristics into the intermediate posture according to the optical flow information of the key point movement to obtain transformed style body characteristics.
16. The apparatus of claim 13, wherein the style feature vector unit is specifically configured to:
and interpolating the transformed target point coordinates based on the transformed style body characteristics under the middle gesture to obtain style characteristic vectors corresponding to the target point coordinates.
17. The apparatus of any of claims 11-16, wherein the model training module comprises:
the pixel training unit is used for comparing the pixel values in the truth image corresponding to the target gesture with the pixel values in the rendered image, and training the virtual character generation model according to the pixel comparison result;
and the similarity training unit is used for determining the global similarity between the truth image corresponding to the target gesture and the rendered image, and training the virtual character generation model according to the global similarity.
18. The apparatus of claim 17, wherein the similarity training unit comprises:
a first global subunit, configured to determine, through a countermeasure network, a first global similarity between a truth image corresponding to a target gesture and the rendered image;
a second global subunit, configured to determine, through a perception network, a second global similarity between a truth image corresponding to a target gesture and the rendered image;
And the similarity training subunit is used for training the virtual character generation model according to the first global similarity and/or the second global similarity.
19. A virtual character generating apparatus comprising:
the target radiation field module is used for acquiring a target nerve radiation field; the target neural radiation field is constructed during training of the virtual character generation model by the method of any one of claims 1-8;
and the virtual character generating module is used for rendering through the target nerve radiation field to obtain a virtual character image.
20. The apparatus of claim 19, the apparatus further comprising a new character rendering module, the new character rendering module comprising:
the new condition style body unit is used for acquiring a new image area obtained by replacing at least one image area in the condition image through a style body feature extraction network in the virtual character generation model, and determining new condition style body features under the condition posture according to the new image area and the condition posture corresponding to the condition image;
the new grid feature unit is used for transforming the target point in the target gesture through the gesture control network in the virtual character generation model to obtain transformed target point coordinates, and determining new grid feature vectors corresponding to the target point coordinates according to the target point coordinates and the new condition style body features;
And the new character rendering unit is used for constructing a new target nerve radiation field according to the target point coordinates, the new grid feature vector corresponding to the target point coordinates and the observation direction corresponding to the target point coordinates through a nerve radiation field network in the virtual character generation model, and rendering through the new target nerve radiation field to obtain a new virtual character image.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-10.
CN202310028021.3A 2023-01-09 2023-01-09 Training method and generating method and device of virtual character model and electronic equipment Active CN116309983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310028021.3A CN116309983B (en) 2023-01-09 2023-01-09 Training method and generating method and device of virtual character model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310028021.3A CN116309983B (en) 2023-01-09 2023-01-09 Training method and generating method and device of virtual character model and electronic equipment

Publications (2)

Publication Number Publication Date
CN116309983A CN116309983A (en) 2023-06-23
CN116309983B true CN116309983B (en) 2024-04-09

Family

ID=86831308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310028021.3A Active CN116309983B (en) 2023-01-09 2023-01-09 Training method and generating method and device of virtual character model and electronic equipment

Country Status (1)

Country Link
CN (1) CN116309983B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116977525B (en) * 2023-07-31 2024-03-01 之江实验室 Image rendering method and device, storage medium and electronic equipment
CN117876550B (en) * 2024-03-11 2024-05-14 国网电商科技有限公司 Virtual digital person rendering method, system and terminal equipment based on big data

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022104178A1 (en) * 2020-11-16 2022-05-19 Google Llc Inverting neural radiance fields for pose estimation
WO2022104299A1 (en) * 2020-11-16 2022-05-19 Google Llc Deformable neural radiance fields
CN113099208A (en) * 2021-03-31 2021-07-09 清华大学 Method and device for generating dynamic human body free viewpoint video based on nerve radiation field
CN113688907A (en) * 2021-08-25 2021-11-23 北京百度网讯科技有限公司 Model training method, video processing method, device, equipment and storage medium
CN114511662A (en) * 2022-01-28 2022-05-17 北京百度网讯科技有限公司 Method and device for rendering image, electronic equipment and storage medium
CN114648613A (en) * 2022-05-18 2022-06-21 杭州像衍科技有限公司 Three-dimensional head model reconstruction method and device based on deformable nerve radiation field
CN114820906A (en) * 2022-06-24 2022-07-29 北京百度网讯科技有限公司 Image rendering method and device, electronic equipment and storage medium
CN115409937A (en) * 2022-08-19 2022-11-29 中国人民解放军战略支援部队信息工程大学 Facial video expression migration model construction method based on integrated nerve radiation field and expression migration method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HumanNerf: Free-viewpoint Rendering of Moving People from Monocular Video; Chuang-Yi Weng et al.; arXiv cs.CV; entire document *
NeuralActor: Neural Free-view Synthesis of Human Actors with Pose Control; Lingjie Liu et al.; arXiv cs.CV; entire document *

Also Published As

Publication number Publication date
CN116309983A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
US11734851B2 (en) Face key point detection method and apparatus, storage medium, and electronic device
JP7135125B2 (en) Near-infrared image generation method, near-infrared image generation device, generation network training method, generation network training device, electronic device, storage medium, and computer program
CN116309983B (en) Training method and generating method and device of virtual character model and electronic equipment
US20230419592A1 (en) Method and apparatus for training a three-dimensional face reconstruction model and method and apparatus for generating a three-dimensional face image
CN113409430B (en) Drivable three-dimensional character generation method, drivable three-dimensional character generation device, electronic equipment and storage medium
CN112581573B (en) Avatar driving method, apparatus, device, medium, and program product
CN114723888B (en) Three-dimensional hair model generation method, device, equipment, storage medium and product
CN114092759A (en) Training method and device of image recognition model, electronic equipment and storage medium
KR20220011078A (en) Active interaction method, device, electronic equipment and readable storage medium
US20230401799A1 (en) Augmented reality method and related device
CN114187624A (en) Image generation method, image generation device, electronic equipment and storage medium
CN115147265A (en) Virtual image generation method and device, electronic equipment and storage medium
CN117274491A (en) Training method, device, equipment and medium for three-dimensional reconstruction model
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113379877B (en) Face video generation method and device, electronic equipment and storage medium
CN114049290A (en) Image processing method, device, equipment and storage medium
CN112669431B (en) Image processing method, apparatus, device, storage medium, and program product
CN112562043B (en) Image processing method and device and electronic equipment
CN113658035A (en) Face transformation method, device, equipment, storage medium and product
CN115222895B (en) Image generation method, device, equipment and storage medium
CN113052962A (en) Model training method, information output method, device, equipment and storage medium
CN112714337A (en) Video processing method and device, electronic equipment and storage medium
CN116402914A (en) Method, device and product for determining stylized image generation model
CN113781653B (en) Object model generation method and device, electronic equipment and storage medium
CN116052288A (en) Living body detection model training method, living body detection device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant