CN111353069A - Character scene video generation method, system, device and storage medium - Google Patents
- Publication number
- CN111353069A CN111353069A CN202010079892.4A CN202010079892A CN111353069A CN 111353069 A CN111353069 A CN 111353069A CN 202010079892 A CN202010079892 A CN 202010079892A CN 111353069 A CN111353069 A CN 111353069A
- Authority
- CN
- China
- Prior art keywords
- image
- sample
- generation
- label
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 48
- 238000012549 training Methods 0.000 claims abstract description 35
- 238000004590 computer program Methods 0.000 claims description 12
- 230000008569 process Effects 0.000 claims description 8
- 238000012545 processing Methods 0.000 claims description 7
- 238000012360 testing method Methods 0.000 claims description 7
- 230000003042 antagonistic effect Effects 0.000 claims description 3
- 238000004519 manufacturing process Methods 0.000 abstract description 4
- 230000006870 function Effects 0.000 description 5
- 230000000694 effects Effects 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 230000009471 action Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000009877 rendering Methods 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 239000000470 constituent Substances 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000003709 image segmentation Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 239000000126 substance Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000001308 synthesis method Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
- G06F16/739—Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
- G06F16/784—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a character scene video generation method, system, device and storage medium. A generative adversarial network (GAN) model is trained, and a label image carrying constraint conditions is input into the trained model, so that a realistic person picture corresponding to those constraints is output. The constraints guide the model to generate a real image that matches them, allowing finer control over the generated content and producing a more controllable high-definition image. New constraints can be added as new generation requirements arise in subsequent use, so the generated content can be extended on demand; and since no real person needs to record each video, the method offers higher production efficiency and richer forms of extension. The invention is widely applicable in the field of computer technology.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method, a system, a device and a storage medium for generating a character scene video.
Background
With the continuous development of virtual reality and augmented reality technology, more and more three-dimensional models are used to build three-dimensional scenes for shared applications. Such scenes are widely used in many fields, can provide users with richer visual experiences, and improve the user experience.
Most existing person-image synthesis methods use computer graphics (CG): through modules such as modeling, compositing, materials and rendering, an object model is first built piece by piece, different parts are then texture-mapped and rendered to achieve a more realistic effect, and finally the model is fused with a real environment. Every step demands considerable effort from professionals, each image must be finely processed, the overall production time is long and the labor cost is high, so high quality and high efficiency cannot be achieved at the same time. Likewise, in the existing mode of producing video content from a script, a real person must record every video, which takes a great deal of time and keeps working efficiency low.
Disclosure of Invention
In order to solve at least one of the above problems, the present invention provides a method, a system, an apparatus and a storage medium for generating a character scene video.
The technical scheme adopted by the invention is as follows: in one aspect, an embodiment of the present invention provides a character scene video generation method, including:
acquiring a first image, wherein the first image is a label image carrying constraint conditions, and the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
receiving and processing the first image with a trained generative adversarial network (GAN) model to output a second image, wherein the second image is a real image corresponding to the constraint conditions;
acquiring a voice signal;
and combining the second image with the voice signal to generate the character scene video.
Further, the method also includes training the generative adversarial network model, including:
constructing a training set, wherein the training set consists of person image samples, person video samples and label samples, and the label samples are obtained by extracting key points and masks from the person image samples and person video samples;
acquiring the training set and training the generative adversarial network model on it.
Further, the method also includes testing the generative adversarial network model, including:
modifying a label sample;
feeding the modified label sample to the generative adversarial network model;
detecting whether the generative adversarial network model outputs an image and/or video corresponding to the modified label.
Further, the step of modifying the label sample specifically includes:
extracting key points and masks from the person image samples and person video samples to obtain a label sample;
changing the key-point coordinate positions and the mask shapes to modify the label sample.
Further, the generative adversarial network model comprises a generator network and a discriminator network;
the generator network is used to receive the first image and generate the second image;
and the discriminator network is used to judge the authenticity of the second image.
Further, the generator network comprises a plurality of sub-networks, including a first sub-network and a second sub-network;
the first sub-network is used to generate an image containing global information;
the second sub-network is used to enhance local detail in the image generated by the first sub-network, so as to output an image containing local detail features.
Further, the step in which the discriminator network judges the authenticity of the second image specifically includes:
cropping the second image into a plurality of images of different scales;
discriminating on the images of different scales with a multi-scale discriminator to obtain a plurality of discrimination result values;
calculating the average of the discrimination result values;
and judging the authenticity of the second image according to the calculated average.
In another aspect, the embodiment of the present invention further provides a character scene video generation system, comprising a test module and a training module;
the test module is used for:
acquiring a first image, wherein the first image is a label image carrying constraint conditions, and the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
receiving and processing the first image with a trained generative adversarial network model to output a second image, wherein the second image is a real image corresponding to the constraint conditions;
acquiring a voice signal;
combining the second image with the voice signal to generate the character scene video;
the training module is used to train the generative adversarial network model through the following process:
constructing a training set, wherein the training set consists of person image samples, person video samples and label samples, and the label samples are obtained by extracting key points and masks from the person image samples and person video samples;
acquiring the training set and training the generative adversarial network model on it;
and detecting whether the generative adversarial network model outputs an image and/or video corresponding to the label.
In another aspect, an embodiment of the present invention further includes a character scene video generating apparatus, including a processor and a memory, wherein,
the memory is to store program instructions;
the processor is used to read the program instructions in the memory and execute the character scene video generation method according to those instructions.
In another aspect, embodiments of the present invention also include a computer-readable storage medium, wherein,
the computer-readable storage medium stores a computer program which, when executed by a processor, performs the character scene video generation method of the embodiments.
The invention has the following beneficial effects: according to the embodiments of the invention, a generative adversarial network model is trained, and a label image carrying constraint conditions is input into the trained model, so that a realistic person picture corresponding to the constraints can be output. The constraints guide the model to generate a real image matching them, so the generated content can be controlled more finely and a more controllable high-definition image can be produced. New constraints can be added as new generation requirements arise in subsequent use, so the generated content can be extended on demand; and since no real person needs to record each video, the method offers higher production efficiency and richer forms of extension.
Drawings
Fig. 1 is a flowchart of a method for generating a character scene video according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a character scene video generation system according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a character scene video generation apparatus according to an embodiment of the present invention.
Detailed Description
Fig. 1 is a flowchart of a character scene video generation method according to an embodiment of the present invention. As shown in fig. 1, the method includes the following steps:
S1, acquiring a first image, wherein the first image is a label image carrying constraint conditions, the constraint conditions including a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
S2, receiving and processing the first image with a trained generative adversarial network model to output a second image, wherein the second image is a real image corresponding to the constraint conditions;
S3, acquiring a voice signal;
and S4, combining the second image with the voice signal to generate the character scene video.
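A minimal skeleton of steps S1-S4 can be sketched as follows. The helper names are hypothetical (the patent does not name concrete APIs); `gan_model` stands in for the trained generator, and a real pipeline would hand the timestamped frames plus the audio track to a muxer such as ffmpeg.

```python
import math

def generate_video(label_images, gan_model, audio_duration_s, fps=25):
    """S1/S2: run each constraint-carrying label image through the trained
    GAN to obtain realistic frames; S3/S4: pair the generated frames with
    the speech signal by timestamp so that a muxer can interleave the
    audio track.  Returns a list of (timestamp_seconds, frame) pairs."""
    frames = [gan_model(lbl) for lbl in label_images]   # S2: label -> image
    needed = math.ceil(audio_duration_s * fps)          # frames to cover audio
    if len(frames) < needed:
        raise ValueError("not enough generated frames for the audio")
    return [(i / fps, frames[i]) for i in range(needed)]
```

For example, two seconds of speech at 25 fps requires 50 generated frames, each paired with its playback timestamp.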
In this embodiment, the conversion of the constraint-carrying label image into the corresponding real image is mainly performed with a trained generative adversarial network (GAN) model. The constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background. For example, a face-contour constraint guides the trained model to generate a vivid face at the corresponding contour position; a clothing-contour constraint guides it to generate the corresponding upper body and clothing at the corresponding position; and a human-body key-point constraint guides it to generate a real human body of the corresponding height at the corresponding position.
In this embodiment, acquiring the first image, that is, acquiring the label image carrying constraint conditions, specifically includes the following process:
extracting key points and masks from a character scene image or video to construct the label image. For example, to obtain a label image with a face-contour condition, a key-point detection method detects key points in the person scene image or video and connects them, generating a label image with the face-contour constraint; similarly, to obtain a label image with a clothing-contour condition, an image segmentation method segments the clothing in the scene image or video and acquires the mask of the clothing and/or tie, yielding a label image with the clothing-contour constraint.
In this embodiment, the training process of the generative adversarial network model includes the following steps:
P1, constructing a training set, wherein the training set consists of person image samples, person video samples and label samples, the label samples being obtained by extracting key points and masks from the person image samples and person video samples;
P2, acquiring the training set and training the generative adversarial network model on it.
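Step P1 amounts to pairing every person image and every video frame with the label sample extracted from it. A sketch, where `extract_label` is a hypothetical stand-in for the key-point/mask extractor described above; step P2 would then train the GAN on the resulting (label, image) pairs.

```python
def build_training_set(person_images, person_videos, extract_label):
    """P1: build (label_sample, real_sample) pairs from person image
    samples and person video samples (each video is an iterable of
    frames)."""
    pairs = [(extract_label(img), img) for img in person_images]
    for video in person_videos:
        pairs.extend((extract_label(frame), frame) for frame in video)
    return pairs
```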
In this embodiment, after the generative adversarial network model is trained, it is also tested; the process specifically includes the following steps:
D1, modifying a label sample;
D2, feeding the modified label sample to the generative adversarial network model;
D3, detecting whether the generative adversarial network model outputs an image and/or video corresponding to the modified label.
In this embodiment, key points and masks are extracted from the person image samples and person video samples to obtain a label sample;
by changing the key-point coordinate positions and the mask shapes, the label sample can be modified.
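A sketch of the modification in step D1 (hypothetical helper; the patent only states that coordinate positions and mask shapes are changed): shifting the key points and the mask by the same offset yields a modified label sample, and the test in D3 then checks whether the trained GAN outputs an image matching it.

```python
def modify_label_sample(keypoints, mask_pixels, dy=0, dx=0):
    """D1: translate key-point coordinates (row, col) and the set of
    mask pixels by (dy, dx) to produce a modified label sample."""
    moved_kps = [(y + dy, x + dx) for (y, x) in keypoints]
    moved_mask = {(y + dy, x + dx) for (y, x) in mask_pixels}
    return moved_kps, moved_mask
```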
In this embodiment, the generative adversarial network model comprises a generator network and a discriminator network; the generator network is used to receive the first image and generate the second image, and the discriminator network is used to judge the authenticity of the second image. That is, after the model receives a label image carrying constraint conditions as input, the generator network generates a real image corresponding to those constraints; for example, when an image with a face contour is input, the generator network generates a vivid face at the corresponding contour position.
In this embodiment, the generator network includes a plurality of sub-networks: the generator G may be split into two sub-networks G = {G1, G2}, where G1 is an end-to-end network with a U-net structure used to generate a lower-resolution image (e.g., 1024x512) containing global information, and G2 takes G1's output and performs local detail enhancement to output a high-resolution image (e.g., 2048x1024). By analogy, if a still higher-definition image is needed, only a further detail-enhancement generator network need be added (e.g., G = {G1, G2, G3}).
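The coarse-to-fine composition can be illustrated with toy stand-ins for G1 and G2 (assumptions: the real networks are the U-net global generator and the pix2pixHD-style local enhancer, not these placeholders): G1 works at half resolution, and G2 upsamples its output back to full resolution and adds a detail term.

```python
import numpy as np

def g1(label):
    # placeholder for the U-net global generator: half-resolution output
    return label[::2, ::2].astype(float)

def g2(label, coarse):
    # placeholder local enhancer: upsample G1's output to full resolution
    # (nearest neighbour) and add a residual detail term from the label
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)
    return up + 0.1 * label

def generate(label):
    """G = {G1, G2}: G1 yields the global low-resolution image,
    G2 refines it to full resolution."""
    return g2(label, g1(label))
```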
As an optional specific implementation, the step in which the discriminator network judges the authenticity of the second image specifically includes:
cropping the second image into a plurality of images of different scales;
discriminating on the images of different scales with a multi-scale discriminator to obtain a plurality of discrimination result values;
calculating the average of the discrimination result values;
and judging the authenticity of the second image according to the calculated average.
In this embodiment, the second image, i.e. the image output by the generator network, is cropped into 3 images of different scales; the discriminator network D adopts a multi-scale discriminator to produce discrimination values at the three image scales, and the patch discrimination results of the three scales are finally merged into an average value. The three scales of the discriminator network are: the original image size, 1/2 size and 1/4 size.
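The multi-scale averaging can be sketched as follows, with `discriminator` a placeholder scoring function (the real model would use three patch discriminators, one per scale):

```python
import numpy as np

def downsample(img, factor):
    # strided subsampling as a stand-in for average-pool downsampling
    return img[::factor, ::factor]

def multi_scale_score(img, discriminator):
    """Score the generated image at the original, 1/2 and 1/4 scales
    and merge the per-scale discrimination values into their average."""
    scores = [discriminator(downsample(img, f)) for f in (1, 2, 4)]
    return float(np.mean(scores))
```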
In this embodiment, a high-definition character scene video is generated based on the pix2pixHD network and the idea of a conditional GAN. pix2pixHD adds a feature-matching technique: the feature maps of all layers of the discriminator network (except the output layer) are used for feature matching, and after the feature-matching loss is added, the loss function of pix2pixHD is as follows:
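The formula itself did not survive in this text; the published pix2pixHD objective, which matches the description that follows and is reproduced here as an assumption about the intended formula, is:

```latex
\min_{G}\left(\left(\max_{D_1,D_2,D_3}\sum_{k=1,2,3}\mathcal{L}_{\mathrm{GAN}}(G,D_k)\right)
 + \lambda\sum_{k=1,2,3}\mathcal{L}_{\mathrm{FM}}(G,D_k)\right),
\qquad
\mathcal{L}_{\mathrm{FM}}(G,D_k)=
 \mathbb{E}_{(\mathbf{s},\mathbf{x})}\sum_{i=1}^{T}\frac{1}{N_i}
 \left\|D_k^{(i)}(\mathbf{s},\mathbf{x})-D_k^{(i)}\bigl(\mathbf{s},G(\mathbf{s})\bigr)\right\|_1
```

where s is the label image, x the corresponding real image, D_k the discriminator at scale k, D_k^{(i)} its i-th layer feature map, T the number of layers, N_i the number of elements in layer i, and λ balances the two terms.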
the formula is divided into GAN loss and Feature matching loss, a network D is judged in the GAN loss to continuously and iteratively maximize an objective function, and a network G is generated to continuously and iteratively minimize the GAN loss and Feature matching loss so as to ensure that a clearer and more detailed image is generated.
In summary, the character scene video generation method of this embodiment has the following advantages:
a generative adversarial network model is trained, and a label image carrying constraint conditions is input into the trained model, so that a realistic person picture corresponding to the constraints is output; the constraints guide the model to generate a real image matching them, so the generated content can be controlled more finely and a more controllable high-definition image can be produced. New constraints can be added as new generation requirements arise in subsequent use, so the generated content can be extended on demand; and since no real person needs to record each video, the method offers higher production efficiency and richer forms of extension.
Referring to fig. 2, the embodiment of the present invention further provides a character scene video generation system, comprising a test module and a training module;
the test module is used for:
acquiring a first image, wherein the first image is a label image carrying constraint conditions, and the constraint conditions include a face contour, a human-body key-point skeleton, a human-body contour, a head contour and a background;
receiving and processing the first image with a trained generative adversarial network model to output a second image, wherein the second image is a real image corresponding to the constraint conditions;
acquiring a voice signal;
combining the second image with the voice signal to generate the character scene video;
the training module is used to train the generative adversarial network model through the following process:
constructing a training set, wherein the training set consists of person image samples, person video samples and label samples, and the label samples are obtained by extracting key points and masks from the person image samples and person video samples;
acquiring the training set and training the generative adversarial network model on it;
and detecting whether the generative adversarial network model outputs an image and/or video corresponding to the label.
The test module and the training module respectively refer to a hardware module, a software module or a combination of the hardware module and the software module with the same function. Different modules may share the same hardware or software elements.
The character scene video generation system may be a server or a personal computer; by writing the character scene video generation method into a computer program and loading it into the server or personal computer, the resulting system achieves, when run, the same technical effect as the method itself.
Fig. 3 is a schematic structural diagram of a character scene video generation apparatus according to an embodiment of the present invention. Referring to fig. 3, the apparatus 60 may include a processor 601 and a memory 602, wherein:
the memory 602 is used to store program instructions;
the processor 601 is configured to read the program instructions in the memory 602 and execute the character scene video generation method of the embodiment shown above according to those instructions.
The memory may also be produced separately and used to store the computer program corresponding to the character scene video generation method. When the memory is connected to the processor, the stored computer program is read out and executed by the processor, so that the character scene video generation method is implemented and the technical effects of the embodiment are achieved.
This embodiment also includes a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the character scene video generation method shown in the embodiment.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly fixed or connected to the other feature or indirectly fixed or connected to the other feature. Furthermore, the descriptions of upper, lower, left, right, etc. used in the present disclosure are only relative to the mutual positional relationship of the constituent parts of the present disclosure in the drawings. As used in this disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, unless defined otherwise, all technical and scientific terms used in this example have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this embodiment, the term "and/or" includes any combination of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language ("e.g.," such as "or the like") provided with this embodiment is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, operations of processes described in this embodiment can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described in this embodiment (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described in this embodiment includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described in this embodiment, converting the input data to generate output data that is stored in non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
The above description is only a preferred embodiment of the present invention, and the invention is not limited to this embodiment. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention, as long as they achieve the technical effects of the invention by the same means, fall within its scope of protection. The technical solution and/or implementation of the invention may be modified and varied in other ways within the scope of protection of the invention.
Claims (10)
1. A method for generating a character scene video, comprising:
acquiring a first image, wherein the first image is a label image with constraints, and the constraints comprise a face contour, a human-body key-point skeleton, a body contour, a head contour, and a background;
receiving and processing the first image with a trained generative adversarial network model to output a second image, wherein the second image is a realistic image corresponding to the constraints;
acquiring a voice signal;
and combining the second image with the voice signal to generate a character scene video.
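As a rough illustration only (not the patented implementation), the flow of claim 1 — a constraint label image in, a generated frame out, then pairing with the voice signal — can be sketched with placeholder functions. Every name here, including `generate_scene_video` and the toy generator, is a hypothetical stand-in:

```python
# Hypothetical sketch of the claimed generation flow. All names are
# illustrative stand-ins, not the patent's actual implementation.

def generate_scene_video(label_image, voice_signal, generator):
    """Run a trained generator on the constraint label image (first image),
    then pair the generated realistic frame (second image) with the voice
    signal to form one character-scene clip."""
    frame = generator(label_image)
    return {"frames": [frame], "audio": voice_signal}

# Toy stand-in for the generative adversarial network: invert pixel values.
toy_generator = lambda img: [[255 - px for px in row] for row in img]

label = [[0, 255], [255, 0]]          # a 2x2 "label image"
clip = generate_scene_video(label, voice_signal=[0.1, -0.2],
                            generator=toy_generator)
```

In practice the generator would be a trained network and the final step would mux the frame sequence with the audio track into a video container.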
2. The method of claim 1, further comprising training the generative adversarial network model, which comprises:
constructing a training set, wherein the training set consists of person image samples, person video samples, and label samples, the label samples being obtained by extracting key points and masks from the person image samples and the person video samples;
acquiring the training set to train the generative adversarial network model.
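The training-set construction described above can be sketched as pairing each sample with an extracted label. Real key-point and mask extraction would use pose-estimation and segmentation models; the toy extractors below are assumptions for demonstration only:

```python
# Illustrative-only sketch of building (sample, label) training pairs.
# The "extractors" are toys: a brightness-threshold mask and the brightest
# pixel as a stand-in key point.

def make_label(sample, threshold=128):
    """Derive a label sample: a binary foreground mask plus the coordinate
    of the brightest pixel as a stand-in 'key point'."""
    mask = [[1 if px >= threshold else 0 for px in row] for row in sample]
    flat = [(px, (r, c)) for r, row in enumerate(sample)
                         for c, px in enumerate(row)]
    keypoint = max(flat)[1]
    return {"mask": mask, "keypoint": keypoint}

def build_training_set(samples):
    """Pair each person-image sample with its extracted label sample."""
    return [(s, make_label(s)) for s in samples]

pairs = build_training_set([[[0, 200], [50, 255]]])
```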
3. The method of claim 2, further comprising testing the generative adversarial network model, which comprises:
modifying the label sample;
feeding the modified label sample to the generative adversarial network model;
detecting whether the generative adversarial network model outputs the image and/or video corresponding to the modified label.
4. The method of claim 3, wherein the step of modifying the label sample comprises:
extracting key points and masks from the person image samples and the person video samples to obtain the label sample;
changing the key-point coordinate positions and the mask shapes to modify the label sample.
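A minimal sketch of the two modifications named in the claim — moving key-point coordinates and changing a mask's shape — under assumed helper names:

```python
# Toy label-sample modifications: translate key points and dilate a mask.
# Helper names are illustrative assumptions, not from the patent.

def shift_keypoints(keypoints, dx, dy):
    """Translate every key-point coordinate by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in keypoints]

def dilate_mask(mask):
    """Grow a binary mask by one pixel in the 4-neighbourhood,
    changing its shape."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= rr < h and 0 <= cc < w:
                        out[rr][cc] = 1
    return out
```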
5. The character scene video generation method of claim 3, wherein the generative adversarial network model comprises a generation network and a discrimination network;
the generation network is configured to receive the first image and generate the second image;
the discrimination network is configured to judge the realism of the second image.
6. The character scene video generation method of claim 5, wherein the generation network comprises a plurality of sub-networks, including a first sub-network and a second sub-network;
the first sub-network is configured to generate an image containing global information;
the second sub-network is configured to perform local detail enhancement on the image generated by the first sub-network, so as to output an image containing local detail features.
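The coarse-to-fine idea above — a global pass followed by local detail enhancement — can be illustrated with a deliberately tiny toy. A real model would use convolutional sub-networks; here the "global" pass is just the image mean and the "local" pass adds back a weighted share of the residual detail. All names are assumptions:

```python
# Toy coarse-to-fine sketch: global structure first, local detail second.

def global_subnet(img):
    """First sub-network: keep only global structure (the image mean)."""
    flat = [px for row in img for px in row]
    mean = sum(flat) / len(flat)
    return [[mean for _ in row] for row in img]

def local_enhancer(img, coarse, weight=0.5):
    """Second sub-network: enhance the coarse output with local detail,
    i.e. a weighted share of the residual between input and coarse pass."""
    return [[c + weight * (i - c) for i, c in zip(ri, rc)]
            for ri, rc in zip(img, coarse)]

coarse = global_subnet([[0, 2], [4, 6]])
enhanced = local_enhancer([[0, 2], [4, 6]], coarse)
```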
7. The method as claimed in claim 5, wherein the step of judging the realism of the second image by the discrimination network comprises:
cropping the second image into a plurality of images of different scales;
discriminating on the images of different scales with a multi-scale discriminator to obtain a plurality of discrimination result values;
calculating an average of the plurality of discrimination result values;
and judging the realism of the second image according to the calculated average.
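The multi-scale averaging step can be sketched as follows. Note the claim crops the image to different scales; for brevity this toy uses average-pooling instead, and the discriminator is a stand-in that returns the normalised mean pixel value — both are assumptions, not the patented method:

```python
# Assumed sketch of multi-scale discrimination: score the image at several
# scales and average the result values.

def downscale(img, factor):
    """Average-pool the image by `factor` to obtain a coarser scale."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(w)] for r in range(h)]

def multi_scale_score(img, discriminator, factors=(1, 2)):
    """Average the discriminator's result values over all scales."""
    scores = [discriminator(downscale(img, f)) for f in factors]
    return sum(scores) / len(scores)

# Toy discriminator: normalised mean pixel value in [0, 1].
toy_d = lambda im: sum(px for row in im for px in row) / (len(im) * len(im[0]) * 255)
score = multi_scale_score([[255, 255], [255, 255]], toy_d)
```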
8. A character scene video generation system is characterized by comprising a test module and a training module;
the test module is used for:
acquiring a first image, wherein the first image is a label image with constraints, and the constraints comprise a face contour, a human-body key-point skeleton, a body contour, a head contour, and a background;
receiving and processing the first image with a trained generative adversarial network model to output a second image, wherein the second image is a realistic image corresponding to the constraints;
acquiring a voice signal;
combining the second image with the voice signal to generate a character scene video;
the training module is configured to train the generative adversarial network model through the following process:
constructing a training set, wherein the training set consists of person image samples, person video samples, and label samples, the label samples being obtained by extracting key points and masks from the person image samples and the person video samples;
acquiring the training set to train the generative adversarial network model;
detecting whether the generative adversarial network model outputs the image and/or video corresponding to the label.
9. A character scene video generating apparatus, comprising a processor and a memory, wherein,
the memory is to store program instructions;
the processor is configured to read the program instructions in the memory and, according to those instructions, execute the character scene video generation method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, performs the character scene video generation method as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010079892.4A CN111353069A (en) | 2020-02-04 | 2020-02-04 | Character scene video generation method, system, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111353069A true CN111353069A (en) | 2020-06-30 |
Family
ID=71195684
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010079892.4A Pending CN111353069A (en) | 2020-02-04 | 2020-02-04 | Character scene video generation method, system, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111353069A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679502A (en) * | 2017-10-12 | 2018-02-09 | 南京行者易智能交通科技有限公司 | A kind of Population size estimation method based on the segmentation of deep learning image, semantic |
CN108205659A (en) * | 2017-11-30 | 2018-06-26 | 深圳市深网视界科技有限公司 | Face occluder removes and its method, equipment and the medium of model construction |
CN109377448A (en) * | 2018-05-20 | 2019-02-22 | 北京工业大学 | A kind of facial image restorative procedure based on generation confrontation network |
CN109635745A (en) * | 2018-12-13 | 2019-04-16 | 广东工业大学 | A method of Multi-angle human face image is generated based on confrontation network model is generated |
CN109819313A (en) * | 2019-01-10 | 2019-05-28 | 腾讯科技(深圳)有限公司 | Method for processing video frequency, device and storage medium |
CN110008832A (en) * | 2019-02-27 | 2019-07-12 | 西安电子科技大学 | Based on deep learning character image automatic division method, information data processing terminal |
CN110059217A (en) * | 2019-04-29 | 2019-07-26 | 广西师范大学 | A kind of image text cross-media retrieval method of two-level network |
CN110349081A (en) * | 2019-06-17 | 2019-10-18 | 达闼科技(北京)有限公司 | Generation method, device, storage medium and the electronic equipment of image |
2020-02-04: application CN202010079892.4A filed in China; patent status: pending.
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112270651A (en) * | 2020-10-15 | 2021-01-26 | 西安工程大学 | Image restoration method for generating countermeasure network based on multi-scale discrimination |
CN112270651B (en) * | 2020-10-15 | 2023-12-15 | 西安工程大学 | Image restoration method for generating countermeasure network based on multi-scale discrimination |
CN112329932A (en) * | 2020-10-30 | 2021-02-05 | 深圳市优必选科技股份有限公司 | Training method and device for generating countermeasure network and terminal equipment |
CN112734657A (en) * | 2020-12-28 | 2021-04-30 | 杨文龙 | Cloud group photo method and device based on artificial intelligence and three-dimensional model and storage medium |
CN112734657B (en) * | 2020-12-28 | 2023-04-07 | 杨文龙 | Cloud group photo method and device based on artificial intelligence and three-dimensional model and storage medium |
Similar Documents
Publication | Title | |
---|---|---|
CN109359538B (en) | Training method of convolutional neural network, gesture recognition method, device and equipment | |
CN111243093B (en) | Three-dimensional face grid generation method, device, equipment and storage medium | |
CN110610453B (en) | Image processing method and device and computer readable storage medium | |
KR102304674B1 (en) | Facial expression synthesis method and apparatus, electronic device, and storage medium | |
US8861800B2 (en) | Rapid 3D face reconstruction from a 2D image and methods using such rapid 3D face reconstruction | |
JP5645079B2 (en) | Image processing apparatus and method, program, and recording medium | |
CN100407798C (en) | Three-dimensional geometric mode building system and method | |
TWI484444B (en) | Non-transitory computer readable medium, electronic device, and computer system for face feature vector construction | |
CN111710036B (en) | Method, device, equipment and storage medium for constructing three-dimensional face model | |
CN109753885A (en) | A kind of object detection method, device and pedestrian detection method, system | |
US8207987B2 (en) | Method and apparatus for producing digital cartoons | |
CN111353069A (en) | Character scene video generation method, system, device and storage medium | |
CN111046763A (en) | Portrait cartoon method and device | |
CN106068537A (en) | For the method and apparatus processing image | |
CN112132739A (en) | 3D reconstruction and human face posture normalization method, device, storage medium and equipment | |
CN111291674A (en) | Method, system, device and medium for extracting expression and action of virtual character | |
CN111667005A (en) | Human body interaction system adopting RGBD visual sensing | |
JP6052533B2 (en) | Feature amount extraction apparatus and feature amount extraction method | |
Liu et al. | Stereo video object segmentation using stereoscopic foreground trajectories | |
JP2017033556A (en) | Image processing method and electronic apparatus | |
KR101305725B1 (en) | Augmented reality of logo recognition and the mrthod | |
CN114862716A (en) | Image enhancement method, device and equipment for face image and storage medium | |
CN116862920A (en) | Portrait segmentation method, device, equipment and medium | |
CN116228850A (en) | Object posture estimation method, device, electronic equipment and readable storage medium | |
CN111368853A (en) | Label construction method, system, device and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200630 |