CN110035271B - Fidelity image generation method and device and electronic equipment

Info

Publication number
CN110035271B
Authority
CN
China
Prior art keywords
target object
images
fidelity
map
input information
Prior art date
Legal status
Active
Application number
CN201910216551.4A
Other languages
Chinese (zh)
Other versions
CN110035271A (en)
Inventor
郭冠军
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910216551.4A
Publication of CN110035271A
Application granted
Publication of CN110035271B

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/275 - Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 - Server components or server architectures
    • H04N21/218 - Source of audio or video content, e.g. local disk arrays
    • H04N21/21805 - Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present disclosure provide a fidelity image generation method, a fidelity image generation apparatus and an electronic device, belonging to the technical field of data processing. The method comprises the following steps: acquiring a plurality of images containing a target object, one or more continuous actions of the target object being determinable based on the plurality of images; acquiring a texture map of a specific area on the target object and a shape constraint map of a specific element in the plurality of images; constructing a reconstruction model of the target object based on the texture map, the shape constraint map, and two-dimensional image information of the plurality of images; generating, using the reconstruction model, a fidelity image that matches input information of the target object, the fidelity image including one or more predicted actions that match the input information. The processing scheme of the present application improves the realism of the generated images.

Description

Fidelity image generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for generating a fidelity image, and an electronic device.
Background
With the development of network technology, artificial intelligence is increasingly applied in network scenarios. As a specific application requirement, more and more network environments use virtual characters for interaction; for example, a virtual anchor is provided in live webcasting to deliver an anthropomorphic broadcast of the live content and to provide necessary guidance for the live session, which enhances the sense of presence and interactivity of the live broadcast and improves the live broadcast effect.
Expression simulation (e.g., mouth-shape motion simulation) is one of the artificial intelligence technologies. At present, expression simulation drives the facial expressions of characters mainly through text-driven, natural-speech-driven, and audio-video hybrid modeling methods. For example, a Text-to-Speech (TTS) engine typically converts input text into a corresponding phoneme sequence, phoneme durations and a corresponding speech waveform, then selects corresponding model units from a model library, and finally presents the speech and facial expression actions corresponding to the input text through smoothing and a corresponding synchronization algorithm.
Expression simulation in the prior art tends to be monotonous and even distorted; the performance appears robotic, and the fidelity of the expression actions falls far short of the expressions of a real person.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide a method and an apparatus for generating a fidelity image, and an electronic device, which at least partially solve the problems in the prior art.
In a first aspect, an embodiment of the present disclosure provides a fidelity image generation method, including:
acquiring a plurality of images containing a target object, one or more continuous actions of the target object being determinable based on the plurality of images;
acquiring a texture map of a specific area on the target object and a shape constraint map of a specific element in the plurality of images;
constructing a reconstructed model of the target object based on the texture map, the shape constraint map, and two-dimensional image information of the plurality of images;
generating, using the reconstruction model, a fidelity image that matches input information of the target object, the fidelity image including one or more predicted actions that match the input information.
According to a specific implementation manner of the embodiment of the present disclosure, the acquiring a plurality of images including a target object includes:
performing video capture of the target object with a camera device to obtain a video file containing a plurality of video frames;
and selecting part or all of the video frames from the video file to form a plurality of images containing the target object.
According to a specific implementation manner of the embodiment of the present disclosure, the acquiring a plurality of images including a target object includes:
setting broadcast samples of different styles for a target object;
acquiring sample videos of the target object for the broadcast samples of different styles;
a plurality of images including a target object are acquired from the sample video.
According to a specific implementation manner of the embodiment of the present disclosure, the obtaining a texture map of a specific region on the target object and a shape constraint map of a specific element in the plurality of images includes:
performing 3D reconstruction on a specific area of the target object to obtain a 3D area object;
acquiring a three-dimensional grid of the 3D area object, wherein the three-dimensional grid comprises a preset coordinate value;
determining a texture map for the particular region based on pixel values at different three-dimensional grid coordinates.
According to a specific implementation manner of the embodiment of the present disclosure, the obtaining a texture map of a specific region on the target object and a shape constraint map of a specific element in the plurality of images further includes:
performing keypoint detection for a specific element in the plurality of images, resulting in a plurality of keypoints related to the specific element;
forming a shape constraint graph describing the particular element based on the plurality of keypoints.
According to a specific implementation manner of the embodiment of the present disclosure, the constructing a reconstruction model of the target object based on the texture map, the shape constraint map, and the two-dimensional image information of the plurality of images includes:
setting a convolutional neural network for training the reconstruction model, and training images containing the target object by using the convolutional neural network, wherein the number of nodes of the last layer of the convolutional neural network is consistent with that of the input layer.
According to a specific implementation manner of the embodiment of the present disclosure, the training, by using the convolutional neural network, an image including the target object includes:
measuring a prediction error by using a mean square error function, wherein the prediction error describes the difference between an output image frame and a manually captured frame;
and reducing the prediction error by adopting a back propagation function.
According to a specific implementation manner of the embodiment of the present disclosure, the generating a fidelity image matched with the input information of the target object by using the reconstruction model includes:
acquiring input information aiming at the target object, and analyzing the input information to obtain a first analysis result;
performing model quantization on the first analysis result to obtain a target object motion quantization vector;
generating a plurality of fidelity images matched to the motion quantization vector.
According to a specific implementation manner of the embodiment of the present disclosure, the generating a plurality of fidelity images matched with the motion quantization vectors includes:
taking the texture map as a fixed input of the fidelity image;
determining a motion constraint value for the particular element based on element values in the motion quantization vector;
and predicting a plurality of fidelity images matched with the input information through continuous motion constraint values and the fixed texture maps.
In a second aspect, an embodiment of the present disclosure provides a fidelity image generating apparatus, including:
an acquisition module for acquiring a plurality of images containing a target object, one or more continuous actions of the target object being determinable based on the plurality of images;
an obtaining module, configured to determine, in the plurality of images, a texture map of a specific region on the target object and a shape constraint map of a specific element;
a construction module for constructing a reconstructed model of the target object based on the texture map, the shape constraint map, and two-dimensional image information of the plurality of images;
a generating module for generating a fidelity image matched with the input information of the target object by using the reconstruction model, wherein the fidelity image comprises one or more predicted actions matched with the input information.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a fidelity image according to any of the first aspect or any implementation manner of the first aspect.
In a fourth aspect, the disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the fidelity image generation method in the first aspect or any implementation manner of the first aspect.
In a fifth aspect, the disclosed embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the fidelity image generation method in the first aspect or any of the implementations of the first aspect.
The fidelity image generation scheme in the disclosed embodiments comprises acquiring a plurality of images containing a target object, one or more continuous actions of the target object being determinable based on the plurality of images; acquiring a texture map of a specific area on the target object and a shape constraint map of a specific element in the plurality of images; constructing a reconstruction model of the target object based on the texture map, the shape constraint map, and two-dimensional image information of the plurality of images; and generating, using the reconstruction model, a fidelity image that matches input information of the target object, the fidelity image including one or more predicted actions that match the input information. With this processing scheme, an animated image matched with the input information can be realistically simulated, improving user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present disclosure, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram illustrating a process of generating a fidelity image according to an embodiment of the disclosure;
fig. 2 is a schematic diagram illustrating another process for generating a fidelity image according to an embodiment of the disclosure;
fig. 3 is a schematic diagram illustrating another process for generating a fidelity image according to an embodiment of the disclosure;
fig. 4 is a schematic view of another fidelity image generation process provided in the embodiments of the present disclosure;
fig. 5 is a schematic view of another fidelity image generation process provided in the embodiments of the present disclosure;
fig. 6 is a schematic structural diagram of a fidelity image generating apparatus provided in the embodiment of the disclosure;
fig. 7 is a schematic view of an electronic device provided in an embodiment of the disclosure.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the disclosure provides a fidelity image generation method. The fidelity image generation method provided by the embodiment can be executed by a computing device, the computing device can be implemented as software, or implemented as a combination of software and hardware, and the computing device can be integrated in a server, a terminal device and the like.
Referring to fig. 1, a fidelity image generation method provided by the embodiment of the present disclosure includes the following steps:
s101, a plurality of images containing a target object are acquired, and one or more continuous motions of the target object can be determined based on the plurality of images.
The action and expression of the target object are contents to be simulated and predicted by the scheme of the disclosure, and as an example, the target object may be a real person capable of performing network broadcasting, or may be another object having an information dissemination function, such as a television program host, a news broadcaster, a teacher giving lessons, and the like.
The target object is usually a person with a broadcasting function. Since such a person usually has a certain degree of public recognition, a large cost is usually incurred when there is a huge amount of content that requires the target object to broadcast, including voice and/or video actions. Meanwhile, for a live-type program, a target object generally cannot appear in multiple live rooms (or multiple live channels) at the same time. If an effect in which the anchor appears in multiple rooms simultaneously is desired, it is often difficult to achieve by live broadcasting with a real person.
For this reason, a video of the target object (e.g., an anchor) needs to be captured in advance by a video recording device such as a camera, and the target object's broadcast records for different contents are collected through the video. For example, the target object's hosting of a live room may be recorded, and the target object's broadcast of a news segment may also be recorded.
The video collected for the target object comprises a plurality of frame images, and a plurality of images containing one or more continuous motions of the target object can be selected from the frame images of the video to form an image set. By training on the image set, the actions and expressions of the target object for different input contents can be predicted and simulated.
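As an illustrative sketch (not part of the disclosure itself), selecting frames from a captured video might look like the following; the OpenCV calls are standard, while the sampling stride and file name are assumptions:

```python
import cv2  # OpenCV for video decoding

def select_frames(video_path, stride=2):
    """Select a subset of video frames containing the target object's continuous actions."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % stride == 0:  # keep every stride-th frame as a training image
            frames.append(frame)
        index += 1
    capture.release()
    return frames

images = select_frames("anchor_broadcast.mp4", stride=2)
```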
S102, acquiring a texture map of a specific area on the target object and a shape constraint map of a specific element from the plurality of images.
Having acquired a plurality of images (e.g., video frames) associated with the target object, constituent objects on the target object may be selected to model the target object. To improve the efficiency of the modeling, certain regions that are not too highly recognizable to the user (e.g., facial regions) and certain elements that are highly recognizable to the user (e.g., mouth, eyes, etc.) may be selected for modeling.
Specifically, a texture map (e.g., face texture) of a specific region of the target object and key points (e.g., eye, mouth, etc. key points of five sense organs) of specific elements in the plurality of images are acquired, so that a shape constraint map of the target object texture map and the specific elements is formed.
The texture of the specific region may be obtained by a 3D reconstruction method; for example, a three-dimensional face mesh is obtained by a 3D face reconstruction method, and the face pixel values corresponding to all three-dimensional mesh points constitute the facial texture of the target object (e.g., the anchor). The 3D face reconstruction itself can be implemented with existing techniques.
The shape constraint map of a specific element can be obtained by keypoint detection. Taking the eyes and mouth as an example, the eye and mouth keypoints are obtained with an existing face keypoint detection algorithm. A closed eye/mouth region is formed by connecting the keypoints around the eye/mouth, respectively. The pupil area of the eye is filled with blue, the rest of the eye is filled with white, and the closed mouth region is filled with red. The color-filled images of the closed regions formed by the keypoints of the specific elements constitute the shape constraint map of those elements.
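A hedged illustration of how such a shape constraint map could be rasterized from detected keypoints; the keypoint layout, canvas size and BGR color values are assumptions of this sketch rather than requirements of the disclosure:

```python
import numpy as np
import cv2

def shape_constraint_map(eye_pts, pupil_pts, mouth_pts, size=(256, 256)):
    """Rasterize closed eye/pupil/mouth regions from detected keypoints (pixel coordinates)."""
    canvas = np.zeros((size[1], size[0], 3), dtype=np.uint8)
    # OpenCV uses BGR channel order: eye region white, pupil blue, mouth red.
    cv2.fillPoly(canvas, [np.asarray(eye_pts, dtype=np.int32)], color=(255, 255, 255))
    cv2.fillPoly(canvas, [np.asarray(pupil_pts, dtype=np.int32)], color=(255, 0, 0))
    cv2.fillPoly(canvas, [np.asarray(mouth_pts, dtype=np.int32)], color=(0, 0, 255))
    return canvas
```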
S103, constructing a reconstruction model of the target object based on the texture map, the shape constraint map and the two-dimensional image information of the plurality of images.
After the texture map and the shape constraint map are obtained, a plurality of images for generating the texture map and the shape constraint map can be combined, and a reconstruction model for a target object can be trained and constructed through the set convolution neural network.
The convolutional neural network structure may contain several convolutional layers, pooling layers, fully connected layers, and a classifier. The number of nodes of the last (output) layer of the convolutional neural network is the same as that of the input layer, so that video frames of the generated target object image can be output directly.
In the process of training the convolutional neural network, a mean square error function is used to measure the prediction error, i.e., the difference between the predicted target object image frame and the manually captured target object image frame; this difference is reduced by back propagation.
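The disclosure does not fix a concrete network, so the following is only a minimal sketch, assuming a PyTorch encoder-decoder whose output resolution matches the input, with a single training step using mean square error and back propagation; the channel counts and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ReconstructionNet(nn.Module):
    """Toy reconstruction model whose output has the same spatial size as its input."""
    def __init__(self, in_channels=6):  # texture map + shape constraint map stacked (assumption)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # pooling layer halves the resolution
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),  # back to a 3-channel image frame
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ReconstructionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()  # mean square error as the prediction error

inputs = torch.rand(4, 6, 256, 256)    # placeholder texture + constraint inputs
captured = torch.rand(4, 3, 256, 256)  # placeholder manually captured target object frames

loss = criterion(model(inputs), captured)  # difference between predicted and captured frames
loss.backward()                            # back propagation reduces the prediction error
optimizer.step()
```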
And S104, generating a fidelity image matched with the input information of the target object by using the reconstruction model, wherein the fidelity image comprises one or more predicted actions matched with the input information.
After the reconstruction model is built, various actions and expressions of the target object can be predicted in the form of video animation using the reconstruction model. Specifically, a video file containing the target object's motions and expressions may be generated by generating fidelity images; a fidelity image may be a full frame or a key frame of the video file, and the collection of such images contains one or more predicted motions that match the input information.
The input information may take a variety of forms; for example, it may be text or audio. After data analysis, the input information is converted into parameters matched with the texture map and the shape constraint map, and finally generation of the fidelity image is completed by calling the texture map and the shape constraint map with the reconstruction model obtained after training.
In the prediction stage, given the texture map of the specific region of the target object and the shape constraints of the specific elements, the image information of a two-dimensional anchor broadcast image is predicted with the trained reconstruction model, and continuous anchor broadcast images are predicted by taking the shape constraints of consecutive specific elements and the texture of the fixed specific region as input.
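A non-authoritative sketch of this prediction stage, reusing the hypothetical ReconstructionNet from the training sketch above and assuming the shape constraint maps have already been derived from the input information:

```python
import torch

def predict_broadcast_frames(model, texture_map, constraint_maps):
    """Predict a continuous sequence of fidelity frames from a fixed texture map (CxHxW)
    and a per-frame sequence of shape constraint maps (each CxHxW)."""
    model.eval()
    frames = []
    with torch.no_grad():
        for constraint in constraint_maps:
            # The fixed texture is stacked with the current shape constraint as model input.
            model_input = torch.cat([texture_map, constraint], dim=0).unsqueeze(0)
            frames.append(model(model_input).squeeze(0))
    return frames
```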
In the process of implementing step S101, referring to fig. 2, according to a specific implementation manner of the embodiment of the present disclosure, acquiring a plurality of images including a target object may include the following steps:
s201, video acquisition is carried out on the target object by adopting the camera equipment, and a video file containing a plurality of video frames is obtained.
The target object is usually a person with a broadcasting function. Since such a person usually has a certain degree of public recognition, a large cost is usually incurred when there is a huge amount of content that requires the target object to broadcast, including voice and/or video actions. Meanwhile, for live programs, a target object cannot appear in multiple live rooms (or multiple live channels) at the same time, and an effect in which the anchor appears in multiple rooms simultaneously is usually difficult to achieve by live broadcasting with a real person.
For this reason, it is necessary to capture a video of a target object (e.g., a main broadcast) by a video recording device such as a video camera in advance, and capture a broadcast record of the target object for different contents by the video. For example, a live room host of the target object may be recorded, and a broadcast record of the target object for a news segment may also be recorded.
S202, selecting partial or all video frames from the video file to form a plurality of images containing the target object.
The video collected for the target object comprises a plurality of frame images, and a plurality of images comprising one or more continuous motions of the target object can be selected from the video to form an image set. By training the image set, the action and expression of the target object aiming at different input contents can be predicted and simulated.
As another implementation manner of step S101, referring to fig. 3, according to a specific implementation manner of the embodiment of the present disclosure, the acquiring a plurality of images including a target object may further include steps S301 to S303:
s301, setting broadcast samples of different styles aiming at the target object.
In order to more comprehensively acquire various actions and expressions of the target object, different types of broadcast samples can be preset. For example, the broadcast sample may contain different emotions such as happy, sad, angry, etc., thereby obtaining a more comprehensive training sample.
And S302, acquiring a sample video of the target object aiming at the broadcast samples of different styles.
By carrying out video sampling on the target object, sample videos of the target object aiming at the different styles of broadcast samples can be obtained.
S303, acquiring a plurality of images including the target object from the sample video.
According to actual needs, a plurality of images containing the target object can be selected from a plurality of video frames in the sample video, the plurality of images can be part or all of the video frames in the sample video, and key frames can be selected from all of the sample video as the plurality of images.
In the process of implementing step S102, according to a specific implementation manner of the embodiment of the present disclosure, referring to fig. 4, acquiring a texture map of a specific region on the target object and a shape constraint map of a specific element in the plurality of images may include:
s401, performing 3D reconstruction on the specific area of the target object to obtain a 3D area object.
Having acquired a plurality of images (e.g., video frames) associated with the target object, constituent objects on the target object may be selected to model the target object. To improve the efficiency of the modeling, certain regions that are not too highly recognizable to the user (e.g., facial regions) and certain elements that are highly recognizable to the user (e.g., mouth, eyes, etc.) may be selected for modeling.
S402, obtaining a three-dimensional grid of the 3D area object, wherein the three-dimensional grid comprises a preset coordinate value.
The specific position of the 3D region object is described by a three-dimensional grid, and preset coordinate values are set for the three-dimensional grid; for example, the grid can be described by plane two-dimensional coordinates together with a spatial height coordinate.
And S403, determining a texture map of the specific area based on pixel values on different three-dimensional grid coordinates.
The pixel values at different three-dimensional grid coordinates may be connected together to form a grid plane that forms a texture map of the particular area.
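A rough illustration of this step, assuming a reconstructed mesh whose vertices have already been projected to pixel coordinates in the source image (the projection itself is outside the scope of this sketch):

```python
import numpy as np

def texture_from_mesh(image, projected_vertices):
    """Sample the source image at each projected mesh vertex (x, y) to build the texture
    of the specific region; vertices falling outside the frame are clamped to the border."""
    h, w = image.shape[:2]
    xs = np.clip(projected_vertices[:, 0].round().astype(int), 0, w - 1)
    ys = np.clip(projected_vertices[:, 1].round().astype(int), 0, h - 1)
    return image[ys, xs]  # one pixel value (e.g., BGR) per three-dimensional grid point
```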
Through the implementation of steps S401 to S403, the texture map of the specific area can be formed faster, and the efficiency of forming the texture map is improved.
In the process executed in step S104, as a specific implementation manner, referring to fig. 5, generating a fidelity image matched with the input information of the target object by using a reconstruction model may include the following steps:
s501, acquiring input information aiming at the target object, and analyzing the input information to obtain a first analysis result.
The input information may be in a variety of ways, for example, the input information may be in the form of text or audio. And converting the input information into a first analysis result after data analysis, wherein the first analysis result comprises parameters matched with the texture map and the shape constraint map, and finally completing generation of a guarantee image by calling the texture map and the shape constraint map by using the reconstruction model obtained after training.
And S502, performing model quantization on the first analysis result to obtain a target object motion quantization vector.
The first analysis result includes a motion amplitude parameter for a specific element on the target object, and taking the mouth as an example, the motion amplitude can be quantized to 1 when the mouth is fully opened, the motion amplitude can be quantized to 0 when the mouth is fully closed, and by quantizing a value between 0 and 1, an intermediate state of the mouth between full opening and full closing can be described.
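For instance, a mouth-opening amplitude in [0, 1] could be derived from mouth keypoints roughly as follows; the choice of keypoints and the normalization by mouth width are assumptions of this sketch:

```python
import numpy as np

def mouth_open_amplitude(upper_lip, lower_lip, mouth_left, mouth_right):
    """Quantize mouth opening to [0, 1]: 0 means fully closed, 1 means fully open.
    Each argument is an (x, y) keypoint; the lip gap is normalized by the mouth width."""
    gap = np.linalg.norm(np.subtract(lower_lip, upper_lip))
    width = np.linalg.norm(np.subtract(mouth_right, mouth_left))
    return float(np.clip(gap / (width + 1e-6), 0.0, 1.0))
```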
S503, a plurality of fidelity images matched with the motion quantization vectors are generated.
The motion quantization vector can describe the motion amplitude of a specific element on the target object through a sequence of fidelity images; fidelity images of the specific motion element with different motion amplitudes are spliced together in sequence to form a prediction result containing different motions of the target object.
Specifically, generating a plurality of fidelity images matched with the motion quantization vectors may include steps S5031 to S5033:
s5031, using the texture map as a fixed input of the fidelity image.
Since the user's perception is relatively insensitive to the texture map, the texture map can be used as a fixed input of the predicted target object while forming the fidelity images, that is, the texture map remains unchanged across the fidelity images.
S5032, determining a motion constraint value of the specific element based on the element value in the motion quantization vector.
The element values in the motion quantization vector describe the motion amplitude of the specific element in each fidelity image; fidelity images of the specific motion element with different motion amplitudes are spliced together in sequence to form a prediction result containing different motions of the target object.
S5033, predicting a plurality of fidelity images matching the input information by using the continuous motion constraint value and the fixed texture map.
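The predicted frames can then be spliced into the broadcast video file mentioned earlier. A hedged sketch, assuming OpenCV and frames already converted to 8-bit BGR arrays (for example from the hypothetical predict_broadcast_frames above):

```python
import cv2

def write_broadcast_video(frames, path="predicted_broadcast.mp4", fps=25):
    """Splice the predicted fidelity frames (HxWx3 uint8 BGR arrays) into a video file."""
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)
    writer.release()
```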
Corresponding to the above method embodiment, and referring to fig. 6, the embodiment of the present disclosure further discloses a fidelity image generating apparatus 60, comprising:
an acquisition module 601 for acquiring a plurality of images containing a target object, based on which one or more consecutive movements of the target object can be determined.
The action and expression of the target object are contents to be simulated and predicted by the scheme of the disclosure, and as an example, the target object may be a real person capable of performing network broadcasting, or may be another object having an information dissemination function, such as a television program host, a news broadcaster, a teacher giving lessons, and the like.
The target object is usually a person with a broadcasting function. Since such a person usually has a certain degree of public recognition, a large cost is usually incurred when there is a huge amount of content that requires the target object to broadcast, including voice and/or video actions. Meanwhile, for live programs, a target object cannot appear in multiple live rooms (or multiple live channels) at the same time, and an effect in which the anchor appears in multiple rooms simultaneously is usually difficult to achieve by live broadcasting with a real person.
For this reason, it is necessary to capture a video of a target object (e.g., a main broadcast) by a video recording device such as a video camera in advance, and capture a broadcast record of the target object for different contents by the video. For example, a live room host of the target object may be recorded, and a broadcast record of the target object for a news segment may also be recorded.
The video collected for the target object comprises a plurality of frame images, and a plurality of images comprising one or more continuous motions of the target object can be selected from the video to form an image set. By training the image set, the action and expression of the target object aiming at different input contents can be predicted and simulated.
An obtaining module 602, configured to determine, in the plurality of images, a texture map of a specific region on the target object and a shape constraint map of a specific element.
Having acquired a plurality of images (e.g., video frames) associated with the target object, constituent objects on the target object may be selected to model the target object. To improve the efficiency of the modeling, certain regions that are not too highly recognizable to the user (e.g., facial regions) and certain elements that are highly recognizable to the user (e.g., mouth, eyes, etc.) may be selected for modeling.
Specifically, a texture map (e.g., face texture) of a specific region of the target object and key points (e.g., eye, mouth, etc. key points of five sense organs) of specific elements in the plurality of images are acquired, so that a shape constraint map of the target object texture map and the specific elements is formed.
The texture of the specific region may be obtained by a 3D reconstruction method, for example, a human face three-dimensional mesh is obtained by a 3D human face reconstruction method, and the human face pixel values corresponding to all three-dimensional mesh points constitute the facial texture of the target object (e.g., anchor). Wherein, the 3D face reconstruction can be realized by adopting the existing technology.
The shape constraint graph of the specific element can be realized by adopting a key point detection mode, taking eyes and mouth as an example, the eye and mouth key points are obtained by the existing face key point detection algorithm. The eye/mouth closure area is formed by connecting key points around the eye/mouth, respectively. The pupil area of the eye is filled in blue, the rest of the eye is filled in white, and the mouth-closing area is filled in red. The image of the closed region formed by the key points of the specific element after being filled with color forms the shape constraint graph of the specific element.
A building module 603 configured to build a reconstructed model of the target object based on the texture map, the shape constraint map, and the two-dimensional image information of the plurality of images.
After the texture map and the shape constraint map are obtained, a plurality of images for generating the texture map and the shape constraint map can be combined, and a reconstruction model for a target object can be trained and constructed through the set convolution neural network.
Specifically, the convolutional neural network structure may include several convolutional layers, pooling layers, fully-connected layers, and classifiers. The number of nodes of the output layer and the input layer of the last layer of the convolutional neural network structure is the same, so that the video frame generating the target object image can be directly output.
In the process of training the convolutional neural network, a mean square error function is used to measure the prediction error, i.e., the difference between the predicted target object image frame and the manually captured target object image frame; this difference is reduced by back propagation.
A generating module 604 for generating a fidelity image matched to the input information of the target object using the reconstruction model, the fidelity image comprising one or more predicted actions matched to the input information.
After the reconstruction model is set, various actions and expressions of the target object in the video can be predicted by utilizing the reconstruction model in a video animation mode. Specifically, a video file containing the target object motion and expression may be generated by generating a fidelity image, which may be a full frame or a key frame of the video file, containing a collection of multiple images of one or more predicted motions that match the input information.
The input information may take a variety of forms; for example, it may be text or audio. After data analysis, the input information is converted into parameters matched with the texture map and the shape constraint map, and finally generation of the fidelity image is completed by calling the texture map and the shape constraint map with the reconstruction model obtained after training.
In the prediction stage, a texture map of a specific area of a target object and shape constraints of specific elements can be given, image information of a two-dimensional anchor broadcast image is predicted by using a trained reconstruction model, and a continuous anchor broadcast image is predicted by taking the shape constraints of continuous specific elements and the textures of fixed specific areas as input.
The apparatus shown in fig. 6 may correspondingly execute the content in the above method embodiment, and details of the part not described in detail in this embodiment refer to the content described in the above method embodiment, which is not described again here.
Referring to fig. 7, an embodiment of the present disclosure also provides an electronic device 70, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a fidelity image of the method embodiments described above.
The disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the fidelity image generation method of the foregoing method embodiments.
The disclosed embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the fidelity image generation method of the aforementioned method embodiments.
Referring now to FIG. 7, a schematic diagram of an electronic device 70 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 70 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 70 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, or the like; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 70 to communicate wirelessly or by wire with other devices to exchange data. While the figures illustrate an electronic device 70 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A fidelity image generation method, comprising:
acquiring a plurality of images containing a target object, one or more continuous actions of the target object being determinable based on the plurality of images;
acquiring a texture map of a specific area on the target object and a shape constraint map of a specific element in the plurality of images;
constructing a reconstructed model of the target object based on the texture map, the shape constraint map, and two-dimensional image information of the plurality of images;
generating a fidelity image matched with input information of the target object by using the reconstruction model, wherein the fidelity image comprises one or more predicted actions matched with the input information;
wherein the generating a fidelity image matched with the input information of the target object by using the reconstruction model comprises:
acquiring input information aiming at the target object, and analyzing the input information to obtain a first analysis result;
performing model quantization on the first analysis result to obtain a target object motion quantization vector;
generating a plurality of fidelity images matched to the motion quantization vector.
2. The method of claim 1, wherein said acquiring a plurality of images containing a target object comprises:
performing video capture of the target object with a camera device to obtain a video file containing a plurality of video frames;
and selecting part or all of the video frames from the video file to form a plurality of images containing the target object.
3. The method of claim 1, wherein said acquiring a plurality of images containing a target object comprises:
setting broadcast samples of different styles for a target object;
acquiring sample videos of the target object for the broadcast samples of different styles;
a plurality of images including a target object are acquired from the sample video.
4. The method according to claim 1, wherein the obtaining a texture map of a specific region and a shape constraint map of a specific element on the target object in the plurality of images comprises:
performing 3D reconstruction on the specific area of the target object to obtain a 3D area object;
acquiring a three-dimensional grid of the 3D area object, wherein the three-dimensional grid comprises a preset coordinate value;
determining a texture map for the particular region based on pixel values at different three-dimensional grid coordinates.
5. The method according to claim 4, wherein said obtaining a texture map of a specific region and a shape constraint map of a specific element on the target object in the plurality of images further comprises:
performing keypoint detection for a specific element in the plurality of images, resulting in a plurality of keypoints related to the specific element;
forming a shape constraint graph describing the particular element based on the plurality of keypoints.
6. The method of claim 1, wherein constructing the reconstructed model of the target object based on the texture map, the shape constraint map, and two-dimensional image information of the plurality of images comprises:
setting a convolutional neural network for training the reconstruction model, and training images containing the target object by using the convolutional neural network, wherein the number of nodes of the last layer of the convolutional neural network is consistent with that of the input layer.
7. The method of claim 6, wherein said training an image containing said target object with said convolutional neural network comprises:
measuring a prediction error by using a mean square error function, wherein the prediction error describes the difference between an output image frame and a manually captured frame;
and reducing the prediction error by adopting a back propagation function.
8. The method of claim 1, wherein the generating a plurality of fidelity images that match the motion quantization vector comprises:
taking the texture map as a fixed input of the fidelity image;
determining a motion constraint value for the particular element based on element values in the motion quantization vector;
and predicting a plurality of fidelity images matched with the input information through continuous motion constraint values and the fixed texture maps.
9. A fidelity image generation apparatus, comprising:
an acquisition module for acquiring a plurality of images containing a target object, one or more continuous actions of the target object being determinable based on the plurality of images;
an obtaining module, configured to determine, in the plurality of images, a texture map of a specific region on the target object and a shape constraint map of a specific element;
a construction module for constructing a reconstructed model of the target object based on the texture map, the shape constraint map, and two-dimensional image information of the plurality of images;
a generation module for generating a fidelity image matched with the input information of the target object by using the reconstruction model, wherein the fidelity image comprises one or more predicted actions matched with the input information;
wherein the generating a fidelity image matched with the input information of the target object by using the reconstruction model comprises:
acquiring input information aiming at the target object, and analyzing the input information to obtain a first analysis result;
performing model quantization on the first analysis result to obtain a target object motion quantization vector;
generating a plurality of fidelity images matched to the motion quantization vector.
10. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the fidelity image generation method of any of the preceding claims 1-8.
11. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the fidelity image generation method of any of the preceding claims 1-8.
CN201910216551.4A 2019-03-21 2019-03-21 Fidelity image generation method and device and electronic equipment Active CN110035271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910216551.4A CN110035271B (en) 2019-03-21 2019-03-21 Fidelity image generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910216551.4A CN110035271B (en) 2019-03-21 2019-03-21 Fidelity image generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110035271A (en) 2019-07-19
CN110035271B (en) 2020-06-02

Family

ID=67236469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910216551.4A Active CN110035271B (en) 2019-03-21 2019-03-21 Fidelity image generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110035271B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532891B (en) * 2019-08-05 2022-04-05 北京地平线机器人技术研发有限公司 Target object state identification method, device, medium and equipment
CN111368137A (en) * 2020-02-12 2020-07-03 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium
CN111294665B (en) * 2020-02-12 2021-07-20 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium
CN114125492B (en) * 2022-01-24 2022-07-15 阿里巴巴(中国)有限公司 Live content generation method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106652025A (en) * 2016-12-20 2017-05-10 五邑大学 Three-dimensional face modeling method and three-dimensional face modeling printing device based on video streaming and face multi-attribute matching
CN106651978A (en) * 2016-10-10 2017-05-10 讯飞智元信息科技有限公司 Face image prediction method and system
CN107463888A (en) * 2017-07-21 2017-12-12 竹间智能科技(上海)有限公司 Face mood analysis method and system based on multi-task learning and deep learning
CN107977511A (en) * 2017-11-30 2018-05-01 浙江传媒学院 A kind of industrial design material high-fidelity real-time emulation algorithm based on deep learning
CN108229239A (en) * 2016-12-09 2018-06-29 武汉斗鱼网络科技有限公司 A kind of method and device of image procossing
CN108280883A (en) * 2018-02-07 2018-07-13 北京市商汤科技开发有限公司 It deforms the generation of special efficacy program file packet and deforms special efficacy generation method and device
CN108961369A (en) * 2018-07-11 2018-12-07 厦门幻世网络科技有限公司 The method and apparatus for generating 3D animation
CN109255830A (en) * 2018-08-31 2019-01-22 百度在线网络技术(北京)有限公司 Three-dimensional facial reconstruction method and device
CN109325437A (en) * 2018-09-17 2019-02-12 北京旷视科技有限公司 Image processing method, device and system
CN109344693A (en) * 2018-08-13 2019-02-15 华南理工大学 A kind of face multizone fusion expression recognition method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5786259B2 (en) * 2011-08-09 2015-09-30 インテル・コーポレーション Parameterized 3D face generation
GB2510200B (en) * 2013-01-29 2017-05-10 Toshiba Res Europe Ltd A computer generated head

Also Published As

Publication number Publication date
CN110035271A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
CN110035271B (en) Fidelity image generation method and device and electronic equipment
CN110047121B (en) End-to-end animation generation method and device and electronic equipment
CN110189394B (en) Mouth shape generation method and device and electronic equipment
CN110047119B (en) Animation generation method and device comprising dynamic background and electronic equipment
WO2021004247A1 (en) Method and apparatus for generating video cover and electronic device
KR102346046B1 (en) 3d virtual figure mouth shape control method and device
CN112492380B (en) Sound effect adjusting method, device, equipment and storage medium
CN110072047B (en) Image deformation control method and device and hardware device
KR20220148915A (en) Audio processing methods, apparatus, readable media and electronic devices
CN110288532B (en) Method, apparatus, device and computer readable storage medium for generating whole body image
CN110060324B (en) Image rendering method and device and electronic equipment
CN114693876A (en) Digital human generation method, device, storage medium and electronic equipment
WO2023231918A1 (en) Image processing method and apparatus, and electronic device and storage medium
CN109977925B (en) Expression determination method and device and electronic equipment
CN112734631A (en) Video image face changing method, device, equipment and medium based on fine adjustment model
CN109816791B (en) Method and apparatus for generating information
WO2020077912A1 (en) Image processing method, device, and hardware device
CN112070888B (en) Image generation method, device, equipment and computer readable medium
CN114090817A (en) Dynamic construction method and device of face feature database and storage medium
CN114049403A (en) Multi-angle three-dimensional face reconstruction method and device and storage medium
CN114677738A (en) MV recording method, MV recording device, electronic equipment and computer readable storage medium
CN111696041B (en) Image processing method and device and electronic equipment
US11876843B2 (en) Method, apparatus, medium and electronic device for generating round-table video conference
CN111586261B (en) Target video processing method and device and electronic equipment
CN114339356B (en) Video recording method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.