CN111353069A - Character scene video generation method, system, device and storage medium

Character scene video generation method, system, device and storage medium

Info

Publication number
CN111353069A
Authority
CN
China
Prior art keywords
image
sample
generation
label
video
Prior art date
Legal status
Pending
Application number
CN202010079892.4A
Other languages
Chinese (zh)
Inventor
李�权
叶俊杰
王伦基
黄桂芳
任勇
韩蓝青
Current Assignee
CYAGEN BIOSCIENCES (GUANGZHOU) Inc
Research Institute Of Tsinghua Pearl River Delta
Original Assignee
CYAGEN BIOSCIENCES (GUANGZHOU) Inc
Research Institute Of Tsinghua Pearl River Delta
Priority date
Filing date
Publication date
Application filed by CYAGEN BIOSCIENCES (GUANGZHOU) Inc, Research Institute Of Tsinghua Pearl River Delta filed Critical CYAGEN BIOSCIENCES (GUANGZHOU) Inc
Priority to CN202010079892.4A priority Critical patent/CN111353069A/en
Publication of CN111353069A publication Critical patent/CN111353069A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 - Querying
    • G06F16/738 - Presentation of query results
    • G06F16/739 - Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content, the detected or recognised objects being people
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a character scene video generation method, system, device and storage medium. A generative adversarial network model is trained, and a label image carrying constraint conditions is input into the trained model, so that a realistic person picture corresponding to those constraints can be output. The constraints guide the model to generate a real image that matches them, so the generated content can be controlled at a fine level and more controllable high-definition images can be generated. New constraints can be added as new generation requirements arise in subsequent use, allowing the generated content to be extended more richly as needed; and since no real person has to record each video, the method achieves higher production efficiency and richer forms of extension. The invention is widely applicable in the field of computer technology.

Description

Character scene video generation method, system, device and storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to a character scene video generation method, system, device and storage medium.
Background
With the continuous development of virtual reality and augmented reality technology, more and more three-dimensional models are used in shared applications, where three-dimensional scenes constructed from these models are widely applied in many fields. To a great extent this provides users with richer visual enjoyment and improves the user experience.
Most existing methods for synthesizing character images adopt computer graphics (CG) techniques: through modules such as modeling, compositing, materials and rendering, an object model is first built up block by block, then the different parts are texture-mapped and rendered to achieve a more realistic effect, and finally the model is fused with the real environment. Every step demands substantial effort from professionals, each image must be finely processed, the overall production time is long, and the labor cost is high, so the demands of high quality and high efficiency cannot be met at the same time. Likewise, in the existing mode of producing video content from a script, a real person must record every video, which consumes a large amount of time and results in low working efficiency.
Disclosure of Invention
In order to solve at least one of the above problems, the present invention provides a method, a system, an apparatus and a storage medium for generating a character scene video.
The technical scheme adopted by the invention is as follows. In one aspect, an embodiment of the present invention provides a character scene video generation method, comprising:
acquiring a first image, the first image being a label image carrying constraint conditions, wherein the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
receiving and processing the first image with a trained generative adversarial network (GAN) model to output a second image, the second image being a real image corresponding to the constraint conditions;
acquiring a voice signal;
and combining the second image with the voice signal to generate a character scene video.
Further, the method includes training the generative adversarial network model, comprising:
constructing a training set, the training set consisting of person image samples, person video samples and label samples, wherein the label samples are obtained by extracting key points and masks from the person image samples and the person video samples;
and acquiring the training set to train the generative adversarial network model.
Further, the method comprises testing the trained generative adversarial network model, comprising:
modifying a label sample;
feeding the modified label sample to the generative adversarial network model;
and detecting whether the model outputs the image and/or video corresponding to the modified label.
Further, the step of modifying the label sample specifically comprises:
extracting key points and masks from the person image samples and person video samples to obtain the label sample;
and changing the key-point coordinate locations and the mask shape to modify the label sample.
Further, the generative adversarial network model comprises a generation network and a discrimination network;
the generation network is used to receive the first image and generate the second image;
and the discrimination network is used to judge the degree of reality of the second image.
Further, the generation network comprises a plurality of sub-networks, including a first sub-network and a second sub-network;
the first sub-network is used to generate an image containing global information;
and the second sub-network is used to perform local detail enhancement on the image generated by the first sub-network, so as to output an image containing local detail features.
Further, the step in which the discrimination network judges the degree of reality of the second image specifically comprises:
cropping the second image into a plurality of images of different scales;
discriminating on the images of different scales with a multi-scale discriminator to obtain a plurality of discrimination result values;
calculating the average of the plurality of discrimination result values;
and judging the degree of reality of the second image according to the calculated average.
In another aspect, an embodiment of the invention provides a character scene video generation system comprising a test module and a training module.
The test module is used for:
acquiring a first image, the first image being a label image carrying constraint conditions, wherein the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
receiving and processing the first image with a trained generative adversarial network model to output a second image, the second image being a real image corresponding to the constraint conditions;
acquiring a voice signal;
and combining the second image with the voice signal to generate a character scene video.
The training module is used to train the generative adversarial network model through the following process:
constructing a training set, the training set consisting of person image samples, person video samples and label samples, wherein the label samples are obtained by extracting key points and masks from the person image samples and the person video samples;
acquiring the training set to train the generative adversarial network model;
and detecting whether the model outputs the image and/or video corresponding to a given label.
In another aspect, an embodiment of the present invention provides a character scene video generation apparatus comprising a processor and a memory, wherein
the memory is used to store program instructions;
and the processor is used to read the program instructions in the memory and execute the character scene video generation method according to those instructions.
In another aspect, embodiments of the present invention also include a computer-readable storage medium, wherein
the computer-readable storage medium stores a computer program which, when executed by a processor, performs the character scene video generation method of the embodiments.
The invention has the following beneficial effects. By training a generative adversarial network model and feeding the trained model a label image carrying constraint conditions, a realistic person picture corresponding to those constraints can be output. The constraints guide the model to generate a real image that matches them, so the generated content can be controlled at a fine level and more controllable high-definition images can be generated. New constraints can be added as new generation requirements arise in subsequent use, so that the generated content can be extended more richly as needed; and since no real person has to record each video, the method achieves higher production efficiency and richer forms of extension.
Drawings
Fig. 1 is a flowchart of a method for generating a character scene video according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a character scene video generation system according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a character scene video generation apparatus according to an embodiment of the present invention.
Detailed Description
Fig. 1 is a flowchart of a character scene video generation method according to an embodiment of the present invention. As shown in fig. 1, the method comprises the following steps:
S1, acquiring a first image, the first image being a label image carrying constraint conditions, wherein the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
S2, receiving and processing the first image with a trained generative adversarial network model to output a second image, the second image being a real image corresponding to the constraint conditions;
S3, acquiring a voice signal;
and S4, combining the second image with the voice signal to generate a character scene video.
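As a concrete illustration of step S4, the following minimal sketch muxes a sequence of generated frames with the acquired voice signal using the ffmpeg command-line tool; the file names, frame rate and codec settings are illustrative assumptions, not values given by this embodiment.

```python
import subprocess

def combine_frames_with_speech(frame_pattern="frames/frame_%04d.png",
                               audio_path="speech.wav",
                               out_path="character_scene.mp4",
                               fps=25):
    """Mux the generated image sequence (second images) with the voice signal."""
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", str(fps), "-i", frame_pattern,  # video: generated frames
        "-i", audio_path,                             # audio: acquired voice signal
        "-c:v", "libx264", "-pix_fmt", "yuv420p",     # widely playable encoding
        "-c:a", "aac",
        "-shortest",                                  # stop at the shorter stream
        out_path,
    ], check=True)

combine_frames_with_speech()
```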
In the present embodiment, the conversion of the constraint-carrying label image into a real image matching those constraints is performed mainly by a trained generative adversarial network (GAN) model. The constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background. For example, a face-contour condition can guide the trained model to generate a lifelike face at the corresponding position of the contour, a clothing-contour condition can guide it to generate the corresponding upper body and clothing at the corresponding position, and a human-body key-point condition can guide it to generate a real human body of the corresponding height at the corresponding position.
In this embodiment, acquiring the first image, that is, acquiring the label image carrying the constraint conditions, specifically comprises the following process:
extracting key points and masks from a character scene image or video to construct the label image. For example, to acquire a label image with a face-contour condition, a key-point detection method is applied to the character scene image or video and the detected key points are connected, which yields a label image carrying the face-contour constraint. Similarly, to acquire a label image with a clothing-contour condition, an image segmentation method is used to segment the clothing in the character scene image or video and obtain the mask of the clothing and/or tie, which yields a label image carrying the clothing-contour constraint.
In this embodiment, the training process of the generative adversarial network model comprises the following steps:
P1, constructing a training set, the training set consisting of person image samples, person video samples and label samples, wherein the label samples are obtained by extracting key points and masks from the person image samples and the person video samples;
P2, acquiring the training set to train the generative adversarial network model.
In this embodiment, after the generative adversarial network model has been trained, it is also tested; this process specifically comprises the following steps:
D1, modifying a label sample;
D2, feeding the modified label sample to the generative adversarial network model;
D3, detecting whether the model outputs the image and/or video corresponding to the modified label.
In this embodiment, key points and masks are extracted from the person image samples and person video samples to obtain the label samples;
by changing the key-point coordinate locations and the mask shape, the label samples can be modified.
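A small sketch of this label modification, assuming the key points are an (N, 2) array of (x, y) coordinates and the mask is a binary array; the 40-pixel shift and the 1.1 scale factor are arbitrary illustrative values, not parameters given by this embodiment.

```python
import numpy as np

def modify_label(keypoints: np.ndarray, mask: np.ndarray):
    """Return a modified label sample: shifted key points and a rescaled mask."""
    # Change the key-point coordinate locations, e.g. move the skeleton 40 px right.
    moved = keypoints + np.array([40, 0])

    # Change the mask shape by scaling it by 1.1 about its centroid.
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    scaled = np.zeros_like(mask)
    new_ys = np.clip(((ys - cy) * 1.1 + cy).astype(int), 0, mask.shape[0] - 1)
    new_xs = np.clip(((xs - cx) * 1.1 + cx).astype(int), 0, mask.shape[1] - 1)
    scaled[new_ys, new_xs] = 1
    return moved, scaled
```

The modified label is then fed to the trained model, and the output is checked for a person whose pose and clothing follow the edited key points and mask.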
In this embodiment, the generative adversarial network model comprises a generation network and a discrimination network: the generation network receives the first image and generates the second image, and the discrimination network judges the degree of reality of the second image. That is, after the model receives a label image carrying constraint conditions as input, the generation network produces a real image corresponding to those constraints; for example, when an image carrying a face contour is input, the generation network generates a lifelike face at the corresponding position of the contour.
In this embodiment, the generation network comprises a plurality of sub-networks, including a first sub-network and a second sub-network. That is, the generation network G can be split into two sub-networks G = {G1, G2}, where G1 is an end-to-end network with a U-net structure used to generate a lower-resolution image (e.g. 1024x512) containing global information, and G2 takes the output of G1 and performs local detail enhancement to output a high-resolution image (e.g. 2048x1024). By analogy, if an even higher-definition image needs to be generated, it suffices to add a further detail-enhancement generation network (e.g. G = {G1, G2, G3}).
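The following PyTorch sketch shows the G = {G1, G2} split in miniature; layer counts, widths and the toy fusion of G1's output with the high-resolution label are placeholder choices in the spirit of pix2pixHD, not the exact networks of this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalGenerator(nn.Module):
    """G1: end-to-end U-net-style network producing the global, low-res image."""
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up = nn.Sequential(
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 3, 3, padding=1), nn.Tanh())

    def forward(self, label_lr):
        return self.up(self.down(label_lr))

class LocalEnhancer(nn.Module):
    """G2: enhances G1's output with local detail at twice the resolution."""
    def __init__(self, feat=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(inplace=True))
        self.decode = nn.Sequential(nn.Conv2d(feat, 3, 3, padding=1), nn.Tanh())

    def forward(self, label_hr, coarse):
        coarse_up = F.interpolate(coarse, scale_factor=2, mode="bilinear",
                                  align_corners=False)
        # fuse high-resolution label features with the upsampled global image
        return self.decode(self.encode(label_hr) + self.encode(coarse_up))

label_hr = torch.randn(1, 3, 256, 512)   # stands in for a 2048x1024 label image
label_lr = F.avg_pool2d(label_hr, 2)     # stands in for the 1024x512 input to G1
g1, g2 = GlobalGenerator(), LocalEnhancer()
image_hr = g2(label_hr, g1(label_lr))    # detail-enhanced high-resolution output
```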
As an optional specific implementation, the step in which the discrimination network judges the degree of reality of the second image specifically comprises:
cropping the second image into a plurality of images of different scales;
discriminating on the images of different scales with a multi-scale discriminator to obtain a plurality of discrimination result values;
calculating the average of the plurality of discrimination result values;
and judging the degree of reality of the second image according to the calculated average.
In this embodiment, the second image, i.e. the image output by the generation network, is cropped into 3 images of different scales; the discrimination network D uses a multi-scale discriminator to produce discrimination values at the three image scales, and finally the patch discrimination result values of the three scales are merged into an average value. The three scales of the discrimination network are the original size, 1/2 size and 1/4 size.
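A simplified sketch of this multi-scale discrimination follows; it uses average pooling to produce the 1/2 and 1/4 scales and a deliberately shallow patch discriminator, so it illustrates the averaging scheme rather than the exact network of this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDiscriminator(nn.Module):
    """Outputs a map of per-patch real/fake scores (toy depth for brevity)."""
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat, 1, 4, stride=2, padding=1))

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """One patch discriminator per scale: original, 1/2 and 1/4 size."""
    def __init__(self, num_scales=3):
        super().__init__()
        self.scales = nn.ModuleList(PatchDiscriminator() for _ in range(num_scales))

    def forward(self, img):
        scores = []
        for k, d in enumerate(self.scales):
            scaled = F.avg_pool2d(img, 2 ** k) if k > 0 else img  # 1, 1/2, 1/4 size
            scores.append(d(scaled).mean())   # average over the patch score map
        return torch.stack(scores).mean()     # merge the three scale values

d = MultiScaleDiscriminator()
degree_of_reality = d(torch.randn(1, 3, 256, 256))  # single averaged score
```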
In this embodiment, the high-definition character scene video is generated based on a pix2pixHD network, following the conditional-GAN idea. pix2pixHD adds a feature-matching technique: the feature maps of every layer of the discrimination network except the output layer are taken for feature matching, and with the feature-matching loss term added, the loss function of pix2pixHD is:
$$\min_{G}\Bigl(\Bigl(\max_{D_1,D_2,D_3}\sum_{k=1,2,3}\mathcal{L}_{\mathrm{GAN}}(G,D_k)\Bigr)+\lambda\sum_{k=1,2,3}\mathcal{L}_{\mathrm{FM}}(G,D_k)\Bigr)$$
the formula is divided into GAN loss and Feature matching loss, a network D is judged in the GAN loss to continuously and iteratively maximize an objective function, and a network G is generated to continuously and iteratively minimize the GAN loss and Feature matching loss so as to ensure that a clearer and more detailed image is generated.
In summary, the character scene video generation method of this embodiment has the following advantages:
a generative adversarial network model is trained, and a label image carrying constraint conditions is input into the trained model, so that a realistic person picture corresponding to those constraints can be output. The constraints guide the model to generate a real image that matches them, so the generated content can be controlled at a fine level and more controllable high-definition images can be generated. New constraints can be added as new generation requirements arise in subsequent use, so that the generated content can be extended more richly as needed; and since no real person has to record each video, the method achieves higher production efficiency and richer forms of extension.
Referring to fig. 2, an embodiment of the present invention further provides a character scene video generation system comprising a test module and a training module.
The test module is used for:
acquiring a first image, the first image being a label image carrying constraint conditions, wherein the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
receiving and processing the first image with a trained generative adversarial network model to output a second image, the second image being a real image corresponding to the constraint conditions;
acquiring a voice signal;
and combining the second image with the voice signal to generate a character scene video.
The training module is used to train the generative adversarial network model through the following process:
constructing a training set, the training set consisting of person image samples, person video samples and label samples, wherein the label samples are obtained by extracting key points and masks from the person image samples and the person video samples;
acquiring the training set to train the generative adversarial network model;
and detecting whether the model outputs the image and/or video corresponding to a given label.
The test module and the training module each refer to a hardware module, a software module, or a combination of hardware and software having the corresponding function; different modules may share the same hardware or software components.
The character scene video generation system may be a server or a personal computer. By writing the character scene video generation method as a computer program and loading that program onto the server or personal computer, running the system achieves the same technical effect as the method itself.
Fig. 3 is a schematic structural diagram of a character scene video generation apparatus according to an embodiment of the present invention. Referring to fig. 3, the apparatus 60 may include a processor 601 and a memory 602, wherein:
the memory 602 is used to store program instructions;
the processor 601 is used to read the program instructions in the memory 602 and, according to those instructions, execute the character scene video generation method of the embodiments shown above.
The memory may also be produced separately and used to store the computer program corresponding to the character scene video generation method. When such a memory is connected to a processor, the stored computer program is read out and executed by the processor, so that the character scene video generation method is implemented and the technical effect of the embodiments is achieved.
The present embodiment also includes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the character scene video generation method of the embodiments shown above.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly fixed or connected to the other feature or indirectly fixed or connected to the other feature. Furthermore, the descriptions of upper, lower, left, right, etc. used in the present disclosure are only relative to the mutual positional relationship of the constituent parts of the present disclosure in the drawings. As used in this disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, unless defined otherwise, all technical and scientific terms used in this example have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this embodiment, the term "and/or" includes any combination of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language ("e.g.," such as "or the like") provided with this embodiment is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, operations of processes described in this embodiment can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described in this embodiment (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described in this embodiment includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described in the present embodiment to convert the input data to generate output data that is stored to a non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
The above description is only a preferred embodiment of the present invention, and the present invention is not limited to the above embodiment, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention as long as the technical effects of the present invention are achieved by the same means. The invention is capable of other modifications and variations in its technical solution and/or its implementation, within the scope of protection of the invention.

Claims (10)

1. A character scene video generation method, comprising:
acquiring a first image, the first image being a label image carrying constraint conditions, wherein the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
receiving and processing the first image with a trained generative adversarial network model to output a second image, the second image being a real image corresponding to the constraint conditions;
acquiring a voice signal;
and combining the second image with the voice signal to generate a character scene video.
2. The character scene video generation method of claim 1, further comprising training the generative adversarial network model, comprising:
constructing a training set, the training set consisting of person image samples, person video samples and label samples, wherein the label samples are obtained by extracting key points and masks from the person image samples and the person video samples;
and acquiring the training set to train the generative adversarial network model.
3. The character scene video generation method of claim 2, further comprising testing the generative adversarial network model, comprising:
modifying a label sample;
feeding the modified label sample to the generative adversarial network model;
and detecting whether the model outputs the image and/or video corresponding to the modified label.
4. The character scene video generation method of claim 3, wherein the step of modifying the label sample specifically comprises:
extracting key points and masks from the person image samples and person video samples to obtain the label sample;
and changing the key-point coordinate locations and the mask shape to modify the label sample.
5. The character scene video generation method of claim 3, wherein the generative adversarial network model comprises a generation network and a discrimination network;
the generation network is used to receive the first image and generate the second image;
and the discrimination network is used to judge the degree of reality of the second image.
6. The character scene video generation method of claim 5, wherein the generation network comprises a plurality of sub-networks, including a first sub-network and a second sub-network;
the first sub-network is used to generate an image containing global information;
and the second sub-network is used to perform local detail enhancement on the image generated by the first sub-network, so as to output an image containing local detail features.
7. The character scene video generation method of claim 5, wherein the step in which the discrimination network judges the degree of reality of the second image specifically comprises:
cropping the second image into a plurality of images of different scales;
discriminating on the images of different scales with a multi-scale discriminator to obtain a plurality of discrimination result values;
calculating the average of the plurality of discrimination result values;
and judging the degree of reality of the second image according to the calculated average.
8. A character scene video generation system, comprising a test module and a training module;
the test module is used for:
acquiring a first image, the first image being a label image carrying constraint conditions, wherein the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
receiving and processing the first image with a trained generative adversarial network model to output a second image, the second image being a real image corresponding to the constraint conditions;
acquiring a voice signal;
and combining the second image with the voice signal to generate a character scene video;
the training module is used to train the generative adversarial network model through the following process:
constructing a training set, the training set consisting of person image samples, person video samples and label samples, wherein the label samples are obtained by extracting key points and masks from the person image samples and the person video samples;
acquiring the training set to train the generative adversarial network model;
and detecting whether the model outputs the image and/or video corresponding to a given label.
9. A character scene video generation apparatus, comprising a processor and a memory, wherein
the memory is used to store program instructions;
and the processor is used to read the program instructions in the memory and execute the character scene video generation method of any one of claims 1 to 7 according to those instructions.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the character scene video generation method of any one of claims 1 to 7.
CN202010079892.4A 2020-02-04 2020-02-04 Character scene video generation method, system, device and storage medium Pending CN111353069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010079892.4A CN111353069A (en) 2020-02-04 2020-02-04 Character scene video generation method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010079892.4A CN111353069A (en) 2020-02-04 2020-02-04 Character scene video generation method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN111353069A (en) 2020-06-30

Family

ID=71195684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010079892.4A Pending CN111353069A (en) 2020-02-04 2020-02-04 Character scene video generation method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN111353069A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679502A (en) * 2017-10-12 2018-02-09 南京行者易智能交通科技有限公司 A kind of Population size estimation method based on the segmentation of deep learning image, semantic
CN108205659A (en) * 2017-11-30 2018-06-26 深圳市深网视界科技有限公司 Face occluder removes and its method, equipment and the medium of model construction
CN109377448A (en) * 2018-05-20 2019-02-22 北京工业大学 A kind of facial image restorative procedure based on generation confrontation network
CN109635745A (en) * 2018-12-13 2019-04-16 广东工业大学 A method of Multi-angle human face image is generated based on confrontation network model is generated
CN109819313A (en) * 2019-01-10 2019-05-28 腾讯科技(深圳)有限公司 Method for processing video frequency, device and storage medium
CN110008832A (en) * 2019-02-27 2019-07-12 西安电子科技大学 Based on deep learning character image automatic division method, information data processing terminal
CN110059217A (en) * 2019-04-29 2019-07-26 广西师范大学 A kind of image text cross-media retrieval method of two-level network
CN110349081A (en) * 2019-06-17 2019-10-18 达闼科技(北京)有限公司 Generation method, device, storage medium and the electronic equipment of image

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270651A (en) * 2020-10-15 2021-01-26 西安工程大学 Image restoration method for generating countermeasure network based on multi-scale discrimination
CN112270651B (en) * 2020-10-15 2023-12-15 西安工程大学 Image restoration method for generating countermeasure network based on multi-scale discrimination
CN112329932A (en) * 2020-10-30 2021-02-05 深圳市优必选科技股份有限公司 Training method and device for generating countermeasure network and terminal equipment
CN112734657A (en) * 2020-12-28 2021-04-30 杨文龙 Cloud group photo method and device based on artificial intelligence and three-dimensional model and storage medium
CN112734657B (en) * 2020-12-28 2023-04-07 杨文龙 Cloud group photo method and device based on artificial intelligence and three-dimensional model and storage medium

Similar Documents

Publication Publication Date Title
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
CN110610453B (en) Image processing method and device and computer readable storage medium
KR102304674B1 (en) Facial expression synthesis method and apparatus, electronic device, and storage medium
US8861800B2 (en) Rapid 3D face reconstruction from a 2D image and methods using such rapid 3D face reconstruction
JP5645079B2 (en) Image processing apparatus and method, program, and recording medium
CN100407798C (en) Three-dimensional geometric mode building system and method
TWI484444B (en) Non-transitory computer readable medium, electronic device, and computer system for face feature vector construction
CN111710036B (en) Method, device, equipment and storage medium for constructing three-dimensional face model
CN109753885A (en) A kind of object detection method, device and pedestrian detection method, system
US8207987B2 (en) Method and apparatus for producing digital cartoons
CN111353069A (en) Character scene video generation method, system, device and storage medium
CN111046763A (en) Portrait cartoon method and device
CN106068537A (en) For the method and apparatus processing image
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
CN111291674A (en) Method, system, device and medium for extracting expression and action of virtual character
CN111667005A (en) Human body interaction system adopting RGBD visual sensing
JP6052533B2 (en) Feature amount extraction apparatus and feature amount extraction method
Liu et al. Stereo video object segmentation using stereoscopic foreground trajectories
JP2017033556A (en) Image processing method and electronic apparatus
KR101305725B1 (en) Augmented reality of logo recognition and the mrthod
CN114862716A (en) Image enhancement method, device and equipment for face image and storage medium
CN116862920A (en) Portrait segmentation method, device, equipment and medium
CN116228850A (en) Object posture estimation method, device, electronic equipment and readable storage medium
CN111368853A (en) Label construction method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200630