CN111696029A - Virtual image video generation method and device, computer equipment and storage medium


Info

Publication number
CN111696029A
CN111696029A (application CN202010444226.6A)
Authority
CN
China
Prior art keywords
picture
face
person
target
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010444226.6A
Other languages
Chinese (zh)
Other versions
CN111696029B (en)
Inventor
南海顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiye Cultural Technology Co ltd
Shenzhen Lian Intellectual Property Service Center
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202010444226.6A
Publication of CN111696029A
Application granted
Publication of CN111696029B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 3/04: Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
    • G06F 18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06Q 30/01: Commerce; customer relationship services
    • G06T 5/70: Image enhancement or restoration; denoising; smoothing
    • G06T 7/11: Image analysis; segmentation; region-based segmentation
    • G06V 40/161: Human faces; detection; localisation; normalisation
    • G06V 40/171: Human faces; local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06V 40/174: Human faces; facial expression recognition
    • G06T 2207/10016: Image acquisition modality; video; image sequence
    • G06T 2207/20182: Image enhancement details; noise reduction or smoothing in the temporal domain; spatio-temporal filtering
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and provides an avatar video generation method. The avatar video generation method includes: determining a first person and a second person used for generating virtual face pictures, and determining their weights; acquiring a preset number of face picture pairs of the first person and the second person; generating a candidate virtual face picture for each face picture pair by using a picture generation model according to the weights of the first person and the second person; extracting each frame of target picture in the target video; determining, from the candidate virtual face pictures, a target virtual face picture whose expression is similar to that of the target picture; performing face replacement on the target picture according to the target virtual face picture to obtain a face replacement picture; and synthesizing the face replacement picture into the target video to obtain the avatar video. The invention can obtain a high-quality avatar video. The invention also relates to blockchain technology: the face picture pairs are stored in a blockchain.

Description

Virtual image video generation method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence image processing, and in particular to an avatar video generation method, an avatar video generation device, a computer device, and a storage medium.
Background
For systems that must provide customer-facing services, human agents are usually required to provide the services through live video. Traditional human agents work at a low efficiency that is difficult to improve, and their operating cost is relatively high. Replacing the human agent's live video with a virtual video can solve these problems while improving service quality and customer satisfaction. In current industry practice, however, a virtual video is generally obtained by performing face replacement on a video with real face pictures; in practical applications it is often difficult to obtain a sufficient number of real face pictures with rich expressions, and a virtual video generated directly by face replacement from real face pictures is not ideal. How to obtain a satisfactory virtual video has therefore become an urgent problem.
Disclosure of Invention
In view of the above, there is a need for an avatar video generation method, apparatus, computer device, and storage medium that can obtain high quality avatar video.
A first aspect of the present application provides an avatar video generating method, the method including:
determining a first person and a second person for generating a virtual face picture, and determining the weight of the first person and the second person;
acquiring a preset number of face image pairs, wherein each face image pair comprises a first face image of a first person and a second face image of a second person;
generating a candidate virtual face picture for each face picture pair by using a picture generation model according to the weights of the first person and the second person;
extracting each frame of target picture in the target video;
determining a target virtual face picture with similar expression to the target picture from the candidate virtual face pictures;
carrying out face replacement on the target picture according to the target virtual face picture to obtain a face replacement picture;
and synthesizing the face replacement picture into the target video to obtain an avatar video.
In another possible implementation manner, the determining a first person and a second person used for generating a virtual face picture, and determining weights of the first person and the second person includes:
generating a person and weight setting interface;
receiving, from the person and weight setting interface, two persons and the weights of the two persons as set by a user;
generating a sample virtual face picture by using the picture generation model according to the sample face pictures of the two persons and the weights of the two persons;
and if the user saves the set two persons and the weights of the two persons, determining the two persons as the first person and the second person, and determining the weights of the two persons as the weights of the first person and the second person.
In another possible implementation manner, the face picture pairs are stored in a blockchain, and the acquiring of the preset number of face picture pairs includes:
(a1) acquiring a first video of the first person and a second video of the second person;
(a2) extracting a first picture of the first person from the first video and a second picture of the second person from the second video;
(a3) segmenting the first face picture from the first picture and segmenting the second face picture from the second picture;
(a4) combining the first face picture and the second face picture into the face picture pair;
(a5) judging whether the number of the face image pairs reaches the preset number or not;
(a6) and if the number of the face picture pairs does not reach the preset number, returning to the step (a 2).
In another possible implementation manner, the acquiring of the preset number of pairs of face pictures includes:
(b1) acquiring a first video of the first person and a second video of the second person, wherein the first person in the first video and the second person in the second video each speak the same passage over the same duration;
(b2) extracting a first picture of the first person from the first video frame by frame, and extracting a second picture of the second person from the second video frame by frame;
(b3) segmenting the first face picture from the first picture and segmenting the second face picture from the second picture;
(b4) combining the first face picture and the second face picture into the face picture pair;
(b5) judging whether the number of the face image pairs reaches the preset number or not;
(b6) and if the number of the face picture pairs does not reach the preset number, returning to the step (b 2).
In another possible implementation manner, the determining, from the candidate virtual face pictures, a target virtual face picture with a similar expression to the target picture includes:
segmenting a target face picture from the target picture;
detecting a first face feature point from the target face picture;
detecting a second face feature point from each candidate virtual face picture;
performing feature matching between the second face feature points of each candidate virtual face picture and the first face feature points;
and determining the candidate virtual face picture corresponding to the second face feature point with the highest matching degree as the target virtual face picture.
In another possible implementation manner, the performing, according to the target virtual face picture, face replacement on the target picture includes:
adjusting a second face characteristic point of the target virtual face picture according to the first face characteristic point;
and replacing the human face area in the target picture with the adjusted target virtual human face picture.
In another possible implementation manner, before the performing the face replacement on the target picture according to the target virtual face picture, the method further includes:
and adjusting the brightness of the target virtual human face picture.
A second aspect of the present application provides an avatar video generating apparatus, the apparatus comprising:
the system comprises a first determining module, a second determining module and a display module, wherein the first determining module is used for determining a first person and a second person used for generating a virtual face picture, and determining the weights of the first person and the second person;
the acquisition module is used for acquiring a preset number of face image pairs, and each face image pair comprises a first face image of the first person and a second face image of the second person;
the generating module is used for generating a candidate virtual face picture for each face picture pair by using a picture generating model according to the weights of the first person and the second person;
the extraction module is used for extracting each frame of target picture in the target video;
the second determining module is used for determining a target virtual face picture with similar expression to the target picture from the candidate virtual face pictures;
the replacing module is used for carrying out face replacement on the target picture according to the target virtual face picture to obtain a face replacement picture;
and the synthesis module is used for synthesizing the face replacement picture into the target video to obtain the virtual image video.
A third aspect of the present application provides a computer device comprising a processor for implementing the avatar video generation method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the avatar video generation method.
The method comprises: determining a first person and a second person for generating virtual face pictures, and determining the weight of the first person and the weight of the second person; acquiring a preset number of face picture pairs, wherein each face picture pair comprises a first face picture of the first person and a second face picture of the second person; generating a candidate virtual face picture for each face picture pair by using a picture generation model according to the weights of the first person and the second person; extracting each frame of target picture in the target video; determining, from the candidate virtual face pictures, a target virtual face picture whose expression is similar to that of the target picture; performing face replacement on the target picture according to the target virtual face picture to obtain a face replacement picture; and synthesizing the face replacement picture into the target video to obtain an avatar video. The method performs face replacement on the target video with multiple virtual face pictures, each frame of target picture in the target video corresponding to one virtual face picture. Because all virtual face pictures used for face replacement are generated from the face pictures of fixed persons (namely the first person and the second person) according to fixed weights, the generated virtual face pictures differ in expression but present the same avatar, and a high-quality avatar video is obtained.
Drawings
Fig. 1 is a flowchart of an avatar video generation method according to an embodiment of the present invention.
Fig. 2 is a block diagram of an avatar video generating apparatus according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. The described embodiments are merely some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the avatar video generation method of the present invention is applied to one or more computer devices. A computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
Example one
Fig. 1 is a flowchart of an avatar video generation method according to an embodiment of the present invention. The avatar video generation method is applied to computer equipment. The virtual image video generation method uses a virtual face picture to carry out face replacement on a target video to obtain a virtual image video.
As shown in fig. 1, the avatar video generating method includes:
101, determining a first person and a second person for generating a virtual face picture, and determining the weight of the first person and the second person.
In one embodiment, the determining a first person and a second person for generating a virtual face picture includes:
generating a person and weight setting interface;
receiving, from the person and weight setting interface, two persons and the weights of the two persons as set by a user;
generating a sample virtual face picture by using a picture generation model according to the sample face pictures of the two persons and the weights of the two persons;
and if the user saves the set two persons and the weights of the two persons, determining the two persons as the first person and the second person, and determining the weights of the two persons as the weights of the first person and the second person.
If the user discards the set two persons and their weights, the persons and/or weights reset by the user are received from the person and weight setting interface, and a new sample virtual face picture is generated with the picture generation model according to the sample face pictures and weights of the reset persons. The user may iteratively reset one or both persons and/or their weights until a satisfactory sample virtual face picture is obtained.
The person and weight setting interface may include a person setting area and a weight setting area; the user sets the persons in the person setting area and the weights in the weight setting area. The person setting area may include options such as country (e.g., China, South Korea, the United States, the United Kingdom), sex, and age of the person.
The two persons set by the user are real persons, so generating the sample virtual face picture with the picture generation model means generating a virtual face picture based on real face pictures.
The first person and the second person each correspond to a weight, and the sum of the weight of the first person and the weight of the second person equals 1. For example, if the weight of the first person is W1 and the weight of the second person is W2, then W1 + W2 = 1.
The weights are used for performing linear combination on the picture codes of the first person and the second person when generating a virtual face picture (such as a sample virtual face picture), so as to obtain a mixed code of the first person and the second person.
In one embodiment, the sample virtual face picture is generated using the StyleGAN model, which can generate virtual face pictures of good quality. When a sample virtual face picture is generated for two persons (denoted person A and person B) with the StyleGAN model, the sample face picture of person A is first encoded to obtain the sample picture code Da of person A, and the sample face picture of person B is encoded (for example, with a ResNet model) to obtain the sample picture code Db of person B. The sample picture code Da of person A and the sample picture code Db of person B are linearly combined according to the weight Wa of person A and the weight Wb of person B (namely Da·Wa + Db·Wb) to obtain a mixed code of person A and person B, and the mixed code is input into the StyleGAN model for picture generation, yielding the sample virtual face picture of persons A and B. If the weight Wa of person A and/or the weight Wb of person B is changed, a different sample virtual face picture is obtained.
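As a concrete illustration of this weighted mixing, a minimal sketch follows. The encode and generate handles are assumptions standing in for the ResNet encoder and the StyleGAN generator named above; the patent specifies no concrete API:

    import numpy as np

    def blend_sample_faces(face_a, face_b, w_a, w_b, encode, generate):
        # encode: assumed handle mapping a face picture to its picture code
        # (the patent mentions a ResNet model for this step).
        # generate: assumed handle mapping a code to a picture (StyleGAN).
        assert abs(w_a + w_b - 1.0) < 1e-6      # the two weights must sum to 1
        d_a = encode(face_a)                    # sample picture code Da of person A
        d_b = encode(face_b)                    # sample picture code Db of person B
        mixed = w_a * np.asarray(d_a) + w_b * np.asarray(d_b)  # Da*Wa + Db*Wb
        return generate(mixed)                  # sample virtual face picture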
In other embodiments, other picture generation models (e.g., a GAN model, a VAE model, or a Glow model) may be used to generate the sample virtual face picture.
In another embodiment, the first person and the second person and their weights may be selected randomly.
102, obtaining a preset number of face image pairs, wherein each face image pair comprises a first face image of the first person and a second face image of the second person.
In an embodiment, the acquiring a preset number of pairs of face pictures includes:
(a1) acquiring a first video of the first person and a second video of the second person;
(a2) extracting a first picture of the first person from the first video and a second picture of the second person from the second video;
(a3) segmenting the first face picture from the first picture and segmenting the second face picture from the second picture;
(a4) combining the first face picture and the second face picture into the face picture pair;
(a5) judging whether the number of the face image pairs reaches the preset number or not;
(a6) and if the number of the face picture pairs does not reach the preset number, returning to the step (a 2).
And if the number of the face image pairs reaches the preset number, ending the process.
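A minimal sketch of steps (a1) through (a6) follows. It assumes OpenCV for video reading (the patent names no video library) and a crop_face helper standing in for the Dlib-based face segmentation; a sketch of crop_face follows the Dlib note further below:

    import cv2

    def collect_face_pairs(first_video_path, second_video_path, preset_number, crop_face):
        cap1 = cv2.VideoCapture(first_video_path)   # (a1) first video, first person
        cap2 = cv2.VideoCapture(second_video_path)  # (a1) second video, second person
        pairs = []
        while len(pairs) < preset_number:           # (a5)/(a6) loop until enough pairs
            ok1, pic1 = cap1.read()                 # (a2) extract a first picture
            ok2, pic2 = cap2.read()                 # (a2) extract a second picture
            if not (ok1 and ok2):
                break                               # a video ran out of frames
            face1 = crop_face(pic1)                 # (a3) segment the first face picture
            face2 = crop_face(pic2)                 # (a3) segment the second face picture
            if face1 is not None and face2 is not None:
                pairs.append((face1, face2))        # (a4) combine into a face picture pair
        cap1.release()
        cap2.release()
        return pairs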
In a specific embodiment, the acquiring the preset number of pairs of face pictures includes:
(b1) acquiring a first video of the first person and a second video of the second person, wherein the first person in the first video and the second person in the second video each speak the same passage over the same duration;
(b2) extracting a first picture of the first person from the first video frame by frame, and extracting a second picture of the second person from the second video frame by frame;
(b3) segmenting the first face picture from the first picture and segmenting the second face picture from the second picture;
(b4) combining the first face picture and the second face picture into the face picture pair;
(b5) judging whether the number of the face image pairs reaches the preset number or not;
(b6) and if the number of the face picture pairs does not reach the preset number, returning to the step (b 2).
And if the number of the face image pairs reaches the preset number, ending the process.
For example, videos are recorded of the first person and the second person, each speaking the same passage for the same duration (e.g., one minute), yielding the first video and the second video. The first frame is extracted from each video, a first face picture is segmented from the first video's frame and a second face picture from the second video's frame, and the two are combined into the first face picture pair; the second frames are extracted and processed in the same way to form the second face picture pair; and so on.
A first face picture may be segmented from the first picture and a second face picture may be segmented from the second picture using a face recognition model (e.g., a face recognition model of a Dlib library).
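The patent names only a face recognition model of the Dlib library; one concrete stand-in is Dlib's frontal face detector, sketched here as the crop_face helper used in the pairing loop above:

    import dlib

    detector = dlib.get_frontal_face_detector()  # Dlib's built-in face detector

    def crop_face(picture):
        # Return the first detected face region, or None if no face is found.
        rects = detector(picture, 1)  # upsample once to catch smaller faces
        if len(rects) == 0:
            return None
        r = rects[0]
        top, left = max(r.top(), 0), max(r.left(), 0)
        return picture[top:r.bottom(), left:r.right()]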
In this embodiment of acquiring the preset number of face picture pairs, the first person and the second person in the first video and the second video each speak the same passage over the same duration, and the face picture pairs are obtained by extracting, segmenting, and combining pictures from the two videos frame by frame. The first person and the second person in each face picture pair obtained in this way have similar expressions, so more stable virtual face pictures can be generated from the face picture pairs.
In another embodiment, a preset number of pictures may be arbitrarily taken for the first person and the second person, and the taken pictures are subjected to face segmentation and combination to obtain the preset number of face picture pairs. It should be emphasized that, in order to further ensure the privacy and security of the face image pair, the face image pair may also be stored in a node of a block chain.
103, generating a candidate virtual face picture for each face picture pair by using a picture generation model according to the weight of the first person and the second person.
In one embodiment, a candidate virtual face picture is generated for each face picture pair using the StyleGAN model. When a candidate virtual face picture is generated for a face picture pair (a first face picture and a second face picture) with the StyleGAN model, the first face picture is encoded to obtain the picture code D1 of the first person, and the second face picture is encoded (for example, with a ResNet model) to obtain the picture code D2 of the second person. The picture code D1 of the first person and the picture code D2 of the second person are linearly combined according to the weight W1 of the first person and the weight W2 of the second person (namely D1·W1 + D2·W2) to obtain a mixed code of the first person and the second person, and the mixed code is input into the StyleGAN model for picture generation, yielding the candidate virtual face picture of that face picture pair.
In other embodiments, other picture generation models (e.g., a GAN model, a VAE model, or a Glow model) may be used to generate a candidate virtual face picture for each face picture pair.
And 104, extracting each frame of target picture in the target video.
The target video is a character video to be replaced. And extracting the target video according to frames to obtain each frame of target picture in the target video.
105, determining a target virtual face picture with similar expression to the target picture from the candidate virtual face pictures.
In an embodiment, the determining, from the candidate virtual face pictures, a target virtual face picture with a similar expression to the target picture includes:
segmenting a target face picture from the target picture;
detecting a first face feature point from the target face picture;
detecting a second face feature point from each candidate virtual face picture;
performing feature matching between the second face feature points of each candidate virtual face picture and the first face feature points;
and determining the candidate virtual face picture corresponding to the second face feature point with the highest matching degree as the target virtual face picture.
In an embodiment, the first and second facial feature points may include mouth feature points. The mouth characteristic points of the target face picture and the mouth characteristic points of each candidate virtual face picture can be detected, and the matching degree of the mouth characteristic points of each candidate virtual face picture and the mouth characteristic points of the target face picture is calculated. And if the matching degree of the mouth characteristic points in one candidate virtual face picture and the mouth characteristic points of the target face picture is the highest, determining the candidate virtual face picture as the target virtual face picture.
The distance (e.g., Euclidean distance or cosine distance) between the second face feature points and the first face feature points may be calculated, and the matching degree between them determined from the distance: the larger the distance, the lower the matching degree; the smaller the distance, the higher the matching degree.
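A minimal sketch of this matching step, assuming the feature points (e.g., mouth points) have already been detected as coordinate arrays:

    import numpy as np

    def pick_target_virtual_face(first_points, candidate_points_list):
        # first_points: (N, 2) feature points detected from the target face picture.
        # candidate_points_list: one (N, 2) array per candidate virtual face picture.
        # Smaller Euclidean distance means a higher matching degree.
        distances = [np.linalg.norm(np.asarray(pts) - np.asarray(first_points))
                     for pts in candidate_points_list]
        return int(np.argmin(distances))  # index of the target virtual face picture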
In another embodiment, the first and second facial feature points may include mouth feature points and eye feature points.
The target face picture may be segmented from the target picture using a face recognition model (e.g., a face recognition model of a Dlib library).
And 106, carrying out face replacement on the target picture according to the target virtual face picture to obtain a face replacement picture.
In an embodiment, the performing face replacement on the target picture according to the target virtual face picture includes:
adjusting a second face characteristic point of the target virtual face picture according to the first face characteristic point;
and replacing the human face area in the target picture with the adjusted target virtual human face picture.
The second face feature points of the target virtual face picture are adjusted in order to improve their matching degree with the first face feature points, thereby improving the expression similarity between the target virtual face picture and the target picture.
In an embodiment, adjusting the second face feature points of the target virtual face picture according to the first face feature points includes adjusting mouth feature points and eye feature points of the target virtual face picture.
For adjusting the second face feature point of the target virtual face picture according to the first face feature point, reference may be made to the prior art, and details thereof are not repeated here.
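Since the patent defers the adjustment itself to prior art, the sketch below shows only one common realization: warp the target virtual face picture so its feature points align with the target's, then overwrite the face region. In practice a blended composite (e.g., seamless cloning) would typically replace the hard paste used here:

    import cv2
    import numpy as np

    def replace_face(target_picture, virtual_face, first_points, second_points):
        # Estimate a similarity transform taking the virtual face's feature
        # points (second) onto the target face's feature points (first).
        m, _ = cv2.estimateAffinePartial2D(
            np.asarray(second_points, np.float32),
            np.asarray(first_points, np.float32))
        h, w = target_picture.shape[:2]
        warped = cv2.warpAffine(virtual_face, m, (w, h))
        mask = warped.sum(axis=2) > 0        # crude mask of the warped face pixels
        out = target_picture.copy()
        out[mask] = warped[mask]             # replace the face region
        return out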
And 107, synthesizing the face replacement picture into the target video to obtain an avatar video.
And carrying out face replacement on each frame of target picture, and synthesizing the face replacement picture of each frame of target picture into the target video to obtain the virtual image video. The face in the avatar video is generated based on the real face picture.
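A minimal sketch of the synthesis step, assuming OpenCV and that the face replacement pictures are kept in frame order:

    import cv2

    def synthesize_avatar_video(face_replacement_frames, output_path, fps):
        h, w = face_replacement_frames[0].shape[:2]
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        writer = cv2.VideoWriter(output_path, fourcc, fps, (w, h))
        for frame in face_replacement_frames:
            writer.write(frame)  # one face replacement picture per target frame
        writer.release()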
The avatar video generation method performs face replacement on the target video with multiple virtual face pictures to obtain an avatar video, each frame of target picture in the target video corresponding to one virtual face picture. Because all virtual face pictures used for face replacement are generated from the face pictures of fixed persons (namely the first person and the second person) according to fixed weights, the generated virtual face pictures differ in expression but present the same avatar, and a high-quality avatar video is obtained.
In an embodiment, before the performing the face replacement on the target picture according to the target virtual face picture, the method further includes:
and adjusting the brightness of the target virtual human face picture.
The target virtual face picture may have uneven skin color, for example, the left face is brighter than the right face, which may affect the fidelity of the face replacement result. And adjusting the brightness of the target virtual human face picture to enable the skin color of the target virtual human face picture to be uniform.
In an embodiment, the brightness adjustment of the target virtual face picture includes:
(a) Affine transformation is carried out between the left face region and the right face region of the target virtual face picture to obtain a plurality of pixel pairs, each pixel pair comprising a pixel point in the left face region and the corresponding pixel point in the right face region. A pixel pair may be written as (a(x_a, y_a), b(x_b, y_b)), where a(x_a, y_a) is a pixel point in the left face region and b(x_b, y_b) is the pixel point in the right face region corresponding to a(x_a, y_a).
(b) The pixel values of each pixel pair of the target virtual face picture are updated using a brightness smoothing formula, in which

α = abs(y_a − y_middle)
β = abs(y_b − y_middle)

where y_middle is the horizontal coordinate of the face centerline, α and β are the horizontal distances of points a and b from the centerline, P_a and P_b denote the original pixel values of a(x_a, y_a) and b(x_b, y_b), and P_a′ and P_b′ denote their updated pixel values. [The two update equations themselves appear in the source only as embedded formula images and are not recoverable from the text.]
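Because the update equations are not recoverable, the following sketch shows only one plausible reading of the smoothing step: each pixel pair is pulled toward a common value weighted by the distances α and β. The function name and the exact blend are assumptions, not the patent's formula:

    import numpy as np

    def smooth_pixel_pair(p_a, p_b, y_a, y_b, y_middle):
        # alpha/beta: horizontal distances of points a and b from the face
        # centerline, as defined in the patent.
        alpha = abs(y_a - y_middle)
        beta = abs(y_b - y_middle)
        if alpha + beta == 0:
            return p_a, p_b  # both points lie on the centerline; nothing to blend
        # ASSUMPTION: blend the pair toward a distance-weighted average so the
        # brightness of the left and right face becomes uniform. The patent's
        # actual update equations were embedded as images and not recoverable.
        avg = (beta * np.asarray(p_a, float) + alpha * np.asarray(p_b, float)) / (alpha + beta)
        return avg, avg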
Example two
Fig. 2 is a structural diagram of an avatar video generating apparatus according to a second embodiment of the present invention. The avatar video generating apparatus 20 is applied to a computer device. The avatar video generating device 20 performs face replacement on the target video by using the virtual face picture to obtain an avatar video.
As shown in fig. 2, the avatar video generating apparatus 20 may include a first determining module 201, an obtaining module 202, a generating module 203, an extracting module 204, a second determining module 205, a replacing module 206, and a synthesizing module 207.
The first determining module 201 is configured to determine a first person and a second person for generating a virtual face picture, and determine weights of the first person and the second person.
In one embodiment, the determining a first person and a second person for generating a virtual face picture includes:
generating a person and weight setting interface;
receiving, from the person and weight setting interface, two persons and the weights of the two persons as set by a user;
generating a sample virtual face picture by using a picture generation model according to the sample face pictures of the two persons and the weights of the two persons;
and if the user saves the set two persons and the weights of the two persons, determining the two persons as the first person and the second person, and determining the weights of the two persons as the weights of the first person and the second person.
If the user discards the set two persons and their weights, the persons and/or weights reset by the user are received from the person and weight setting interface, and a new sample virtual face picture is generated with the picture generation model according to the sample face pictures and weights of the reset persons. The user may iteratively reset one or both persons and/or their weights until a satisfactory sample virtual face picture is obtained.
The person and weight setting interface may include a person setting area and a weight setting area; the user sets the persons in the person setting area and the weights in the weight setting area. The person setting area may include options such as country (e.g., China, South Korea, the United States, the United Kingdom), sex, and age of the person.
The two persons set by the user are real persons, so generating the sample virtual face picture with the picture generation model means generating a virtual face picture based on real face pictures.
The first person and the second person each correspond to a weight, and the sum of the weight of the first person and the weight of the second person equals 1. For example, if the weight of the first person is W1 and the weight of the second person is W2, then W1 + W2 = 1.
The weights are used for performing linear combination on the picture codes of the first person and the second person when generating a virtual face picture (such as a sample virtual face picture), so as to obtain a mixed code of the first person and the second person.
In one embodiment, the sample virtual face picture is generated using the StyleGAN model, which can generate virtual face pictures of good quality. When a sample virtual face picture is generated for two persons (denoted person A and person B) with the StyleGAN model, the sample face picture of person A is first encoded to obtain the sample picture code Da of person A, and the sample face picture of person B is encoded (for example, with a ResNet model) to obtain the sample picture code Db of person B. The sample picture code Da of person A and the sample picture code Db of person B are linearly combined according to the weight Wa of person A and the weight Wb of person B (namely Da·Wa + Db·Wb) to obtain a mixed code of person A and person B, and the mixed code is input into the StyleGAN model for picture generation, yielding the sample virtual face picture of persons A and B. If the weight Wa of person A and/or the weight Wb of person B is changed, a different sample virtual face picture is obtained.
In other embodiments, other picture generation models (e.g., a GAN model, a VAE model, or a Glow model) may be used to generate the sample virtual face picture.
In another embodiment, the first person and the second person and their weights may be selected randomly.
The obtaining module 202 is configured to obtain a preset number of face image pairs, where each face image pair includes a first face image of the first person and a second face image of the second person.
In an embodiment, the acquiring a preset number of pairs of face pictures includes:
(a1) acquiring a first video of the first person and a second video of the second person;
(a2) extracting a first picture of the first person from the first video and a second picture of the second person from the second video;
(a3) segmenting the first face picture from the first picture and segmenting the second face picture from the second picture;
(a4) combining the first face picture and the second face picture into the face picture pair;
(a5) judging whether the number of the face image pairs reaches the preset number or not;
(a6) and if the number of the face picture pairs does not reach the preset number, returning to the step (a 2).
And if the number of the face image pairs reaches the preset number, ending the process.
In a specific embodiment, the acquiring the preset number of pairs of face pictures includes:
(b1) acquiring a first video of the first person and a second video of the second person, wherein the first person in the first video and the second person in the second video each speak the same passage over the same duration;
(b2) extracting a first picture of the first person from the first video frame by frame, and extracting a second picture of the second person from the second video frame by frame;
(b3) segmenting the first face picture from the first picture and segmenting the second face picture from the second picture;
(b4) combining the first face picture and the second face picture into the face picture pair;
(b5) judging whether the number of the face image pairs reaches the preset number or not;
(b6) and if the number of the face picture pairs does not reach the preset number, returning to the step (b 2).
And if the number of the face image pairs reaches the preset number, ending the process.
For example, videos are recorded of the first person and the second person, each speaking the same passage for the same duration (e.g., one minute), yielding the first video and the second video. The first frame is extracted from each video, a first face picture is segmented from the first video's frame and a second face picture from the second video's frame, and the two are combined into the first face picture pair; the second frames are extracted and processed in the same way to form the second face picture pair; and so on.
A first face picture may be segmented from the first picture and a second face picture may be segmented from the second picture using a face recognition model (e.g., a face recognition model of a Dlib library).
In this embodiment of acquiring the preset number of face picture pairs, the first person and the second person in the first video and the second video each speak the same passage over the same duration, and the face picture pairs are obtained by extracting, segmenting, and combining pictures from the two videos frame by frame. The first person and the second person in each face picture pair obtained in this way have similar expressions, so more stable virtual face pictures can be generated from the face picture pairs.
In another embodiment, a preset number of pictures may be arbitrarily taken for the first person and the second person, and the taken pictures are subjected to face segmentation and combination to obtain the preset number of face picture pairs. It should be emphasized that, in order to further ensure the privacy and security of the face image pair, the face image pair may also be stored in a node of a block chain.
A generating module 203, configured to generate a candidate virtual face picture for each face picture pair by using a picture generation model according to the weights of the first person and the second person.
In one embodiment, a candidate virtual face picture is generated for each face picture pair using the StyleGAN model. When a candidate virtual face picture is generated for a face picture pair (a first face picture and a second face picture) with the StyleGAN model, the first face picture is encoded to obtain the picture code D1 of the first person, and the second face picture is encoded (for example, with a ResNet model) to obtain the picture code D2 of the second person. The picture code D1 of the first person and the picture code D2 of the second person are linearly combined according to the weight W1 of the first person and the weight W2 of the second person (namely D1·W1 + D2·W2) to obtain a mixed code of the first person and the second person, and the mixed code is input into the StyleGAN model for picture generation, yielding the candidate virtual face picture of that face picture pair.
In other embodiments, other picture generation models (e.g., a GAN model, a VAE model, or a Glow model) may be used to generate a candidate virtual face picture for each face picture pair.
An extracting module 204, configured to extract each frame of target picture in the target video.
The target video is a character video to be replaced. And extracting the target video according to frames to obtain each frame of target picture in the target video.
A second determining module 205, configured to determine, from the candidate virtual face pictures, a target virtual face picture with a similar expression as the target picture.
In an embodiment, the determining, from the candidate virtual face pictures, a target virtual face picture with a similar expression to the target picture includes:
segmenting a target face picture from the target picture;
detecting a first face feature point from the target face picture;
detecting a second face feature point from each candidate virtual face picture;
performing feature matching between the second face feature points of each candidate virtual face picture and the first face feature points;
and determining the candidate virtual face picture corresponding to the second face feature point with the highest matching degree as the target virtual face picture.
In an embodiment, the first and second facial feature points may include mouth feature points. The mouth characteristic points of the target face picture and the mouth characteristic points of each candidate virtual face picture can be detected, and the matching degree of the mouth characteristic points of each candidate virtual face picture and the mouth characteristic points of the target face picture is calculated. And if the matching degree of the mouth characteristic points in one candidate virtual face picture and the mouth characteristic points of the target face picture is the highest, determining the candidate virtual face picture as the target virtual face picture.
The distance (e.g., Euclidean distance or cosine distance) between the second face feature points and the first face feature points may be calculated, and the matching degree between them determined from the distance: the larger the distance, the lower the matching degree; the smaller the distance, the higher the matching degree.
In another embodiment, the first and second facial feature points may include mouth feature points and eye feature points.
The target face picture may be segmented from the target picture using a face recognition model (e.g., a face recognition model of a Dlib library).
And the replacing module 206 is configured to perform face replacement on the target picture according to the target virtual face picture to obtain a face replacement picture.
In an embodiment, the performing face replacement on the target picture according to the target virtual face picture includes:
adjusting a second face characteristic point of the target virtual face picture according to the first face characteristic point;
and replacing the human face area in the target picture with the adjusted target virtual human face picture.
The second face feature points of the target virtual face picture are adjusted in order to improve their matching degree with the first face feature points, thereby improving the expression similarity between the target virtual face picture and the target picture.
In an embodiment, adjusting the second face feature points of the target virtual face picture according to the first face feature points includes adjusting mouth feature points and eye feature points of the target virtual face picture.
For adjusting the second face feature point of the target virtual face picture according to the first face feature point, reference may be made to the prior art, and details thereof are not repeated here.
And the synthesizing module 207 is used for synthesizing the face replacing picture into the target video to obtain an avatar video.
And carrying out face replacement on each frame of target picture, and synthesizing the face replacement picture of each frame of target picture into the target video to obtain the virtual image video. The face in the avatar video is generated based on the real face picture.
The avatar video generating device 20 performs face replacement on the target video with multiple virtual face pictures to obtain an avatar video, each frame of target picture in the target video corresponding to one virtual face picture. Because all virtual face pictures used for face replacement are generated from the face pictures of fixed persons (namely the first person and the second person) according to fixed weights, the generated virtual face pictures differ in expression but present the same avatar, and a high-quality avatar video is obtained.
In one embodiment, the avatar video generating apparatus 20 further includes:
and the adjusting module is used for adjusting the brightness of the target virtual face picture before the target picture is subjected to face replacement according to the target virtual face picture.
The target virtual face picture may have uneven skin color, for example, the left face is brighter than the right face, which may affect the fidelity of the face replacement result. And adjusting the brightness of the target virtual human face picture to enable the skin color of the target virtual human face picture to be uniform.
In an embodiment, the brightness adjustment of the target virtual face picture includes:
(a) Affine transformation is carried out between the left face region and the right face region of the target virtual face picture to obtain a plurality of pixel pairs, each pixel pair comprising a pixel point in the left face region and the corresponding pixel point in the right face region. A pixel pair may be written as (a(x_a, y_a), b(x_b, y_b)), where a(x_a, y_a) is a pixel point in the left face region and b(x_b, y_b) is the pixel point in the right face region corresponding to a(x_a, y_a).
(b) The pixel values of each pixel pair of the target virtual face picture are updated using a brightness smoothing formula, in which

α = abs(y_a − y_middle)
β = abs(y_b − y_middle)

where y_middle is the horizontal coordinate of the face centerline, α and β are the horizontal distances of points a and b from the centerline, P_a and P_b denote the original pixel values of a(x_a, y_a) and b(x_b, y_b), and P_a′ and P_b′ denote their updated pixel values. [The two update equations themselves appear in the source only as embedded formula images and are not recoverable from the text.]
EXAMPLE III
The present embodiment provides a storage medium storing a computer program which, when executed by a processor, implements the steps in the above avatar video generation method embodiments, such as steps 101 to 107 shown in fig. 1.
Alternatively, when executed by a processor, the computer program implements the functions of the modules in the above device embodiments, such as modules 201 to 207 in fig. 2.
Example four
Fig. 3 is a schematic diagram of a computer device according to a fourth embodiment of the present invention. The computer device 30 comprises a memory 301, a processor 302, and a computer program 303, such as an avatar video generation program, stored in the memory 301 and executable on the processor 302. When executing the computer program 303, the processor 302 implements the steps in the above avatar video generation method embodiments, such as steps 101 to 107 shown in fig. 1. Alternatively, when executed by the processor, the computer program implements the functions of the modules in the above device embodiments, such as modules 201 to 207 in fig. 2.
Illustratively, the computer program 303 may be partitioned into one or more modules that are stored in the memory 301 and executed by the processor 302 to perform the present method. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 303 in the computer device 30. For example, the computer program 303 may be divided into the first determining module 201, the obtaining module 202, the generating module 203, the extracting module 204, the second determining module 205, the replacing module 206, and the synthesizing module 207 in fig. 2, and specific functions of each module are described in embodiment two.
The computer device 30 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. Those skilled in the art will appreciate that fig. 3 is merely an example of the computer device 30 and does not constitute a limitation of it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 30 may also include input and output devices, network access devices, buses, and the like.
The processor 302 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 302 is the control center of the computer device 30 and connects the various parts of the whole computer device 30 through various interfaces and lines.
The memory 301 may be used to store the computer program 303, and the processor 302 implements the various functions of the computer device 30 by running or executing the computer program or modules stored in the memory 301 and calling data stored in the memory 301. The memory 301 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function and an image playing function); the data storage area may store data created according to the use of the computer device 30. Further, the memory 301 may include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If implemented in the form of software functional modules and sold or used as independent products, the modules integrated in the computer device 30 may be stored in a storage medium. Based on such understanding, all or part of the flow of the methods in the embodiments of the present invention may also be realized by a computer program, which may be stored in a storage medium and, when executed by a processor, instructs the related hardware to implement the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM). It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, legislation and patent practice provide that computer-readable media do not include electrical carrier signals and telecommunications signals.
In the embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the modules is only a logical functional division, and other divisions may be adopted in actual implementation.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
An integrated module implemented in the form of a software functional module may be stored in a storage medium. The software functional module is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute part of the steps of the methods according to the embodiments of the present invention.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each of which contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
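As an illustration only (the patent uses the blockchain merely as a storage substrate for the face picture pairs of claim 3), a toy hash chain showing the "each block commits to the previous one" property described above; all names are hypothetical.

```python
import hashlib
import json
import time

def make_block(transactions: list, prev_hash: str) -> dict:
    """A toy block whose hash covers its transaction batch and the previous
    block's hash, so tampering with any block breaks every later link."""
    block = {"timestamp": time.time(),
             "prev_hash": prev_hash,
             "transactions": transactions}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block(["store face picture pair #1"], prev_hash="0" * 64)
block_1 = make_block(["store face picture pair #2"], prev_hash=genesis["hash"])
```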
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is to be understood that the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. A plurality of modules or means recited in the system claims may also be implemented by one module or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them; although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope.

Claims (10)

1. A method for avatar video generation, the method comprising:
determining a first person and a second person for generating a virtual face picture, and determining the weight of the first person and the second person;
acquiring a preset number of face image pairs, wherein each face image pair comprises a first face image of a first person and a second face image of a second person;
generating a candidate virtual face picture for each face picture pair by using a picture generation model according to the weights of the first person and the second person;
extracting each frame of target picture in the target video;
determining, from the candidate virtual face pictures, a target virtual face picture whose expression is similar to that of the target picture;
carrying out face replacement on the target picture according to the target virtual face picture to obtain a face replacement picture;
and synthesizing the face replacement picture into the target video to obtain an avatar video.
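As a non-normative reading aid, the seven steps of claim 1 wire together roughly as below; every helper is a hypothetical stub (the claim prescribes steps, not an implementation), typed with Ellipsis bodies so the skeleton still parses.

```python
from typing import Any, List, Tuple

# Hypothetical stubs for the claimed steps; a real system would back
# generate_virtual_face with a trained picture generation model.
def collect_face_pairs(p1: Any, p2: Any, n: int) -> List[Tuple[Any, Any]]: ...
def generate_virtual_face(f1: Any, f2: Any, w1: float, w2: float) -> Any: ...
def extract_frames(video: Any) -> List[Any]: ...
def match_expression(frame: Any, candidates: List[Any]) -> Any: ...
def replace_face(frame: Any, face: Any) -> Any: ...
def synthesize_video(frames: List[Any], source_video: Any) -> Any: ...

def generate_avatar_video(first_person, second_person, w1, w2,
                          target_video, preset_number):
    """Sketch of claim 1: pairs -> candidates -> per-frame match/replace -> video."""
    pairs = collect_face_pairs(first_person, second_person, preset_number)
    candidates = [generate_virtual_face(a, b, w1, w2) for a, b in pairs]
    out = []
    for frame in extract_frames(target_video):
        best = match_expression(frame, candidates)  # most similar expression
        out.append(replace_face(frame, best))
    return synthesize_video(out, target_video)
```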
2. The avatar video generation method of claim 1, wherein said determining a first person and a second person for generating a virtual face picture and determining the weights of said first person and said second person comprises:
generating a person and weight setting interface;
receiving, from the person and weight setting interface, two persons set by a user and the weights of the two persons;
generating a sample virtual face picture by using the picture generation model according to the sample face pictures of the two persons and the weights of the two persons;
and if the user saves the set two persons and their weights, determining the two persons as the first person and the second person, and determining their weights as the weights of the first person and the second person.
3. The avatar video generation method of claim 1, wherein said face picture pairs are stored in a blockchain, and said obtaining a preset number of face picture pairs comprises:
(a1) acquiring a first video of the first person and a second video of the second person;
(a2) extracting a first picture of the first person from the first video and a second picture of the second person from the second video;
(a3) segmenting the first face picture from the first picture and segmenting the second face picture from the second picture;
(a4) combining the first face picture and the second face picture into the face picture pair;
(a5) judging whether the number of the face picture pairs reaches the preset number;
(a6) and if the number of the face picture pairs does not reach the preset number, returning to the step (a 2).
4. The avatar video generation method of claim 1, wherein said obtaining a preset number of face picture pairs comprises:
(b1) acquiring a first video of the first person and a second video of the second person, wherein in the first video and the second video the first person and the second person respectively speak the same dialogue over the same duration;
(b2) extracting a first picture of the first person from the first video by frame, and extracting a second picture of the second person from the second video by frame;
(b3) segmenting the first face picture from the first picture and segmenting the second face picture from the second picture;
(b4) combining the first face picture and the second face picture into the face picture pair;
(b5) judging whether the number of the face picture pairs reaches the preset number;
(b6) and if the number of the face picture pairs does not reach the preset number, returning to the step (b 2).
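Steps (b1) to (b6) amount to a frame-by-frame collection loop; a sketch under the assumption of an OpenCV Haar-cascade face detector (the patent does not name a detector) follows.

```python
import cv2

def collect_face_pairs(video_a: str, video_b: str, preset_number: int):
    """Read both videos frame by frame, crop one detected face from each,
    and pair the crops until the preset number is reached (steps b1-b6).
    The Haar cascade is an assumed detector choice."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap_a, cap_b = cv2.VideoCapture(video_a), cv2.VideoCapture(video_b)
    pairs = []
    while len(pairs) < preset_number:
        ok_a, frame_a = cap_a.read()
        ok_b, frame_b = cap_b.read()
        if not (ok_a and ok_b):
            break  # a video ran out before the preset number was reached
        faces_a = detector.detectMultiScale(
            cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY))
        faces_b = detector.detectMultiScale(
            cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY))
        if len(faces_a) and len(faces_b):
            xa, ya, wa, ha = faces_a[0]
            xb, yb, wb, hb = faces_b[0]
            pairs.append((frame_a[ya:ya + ha, xa:xa + wa],
                          frame_b[yb:yb + hb, xb:xb + wb]))
    cap_a.release()
    cap_b.release()
    return pairs
```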
5. The avatar video generation method of claim 1, wherein said determining, from said candidate virtual face pictures, a target virtual face picture whose expression is similar to that of said target picture comprises:
segmenting a target face picture from the target picture;
detecting a first face feature point from the target face picture;
detecting a second face feature point from each candidate virtual face picture;
performing feature matching between the second face feature points of each candidate virtual face picture and the first face feature points;
and determining the candidate virtual face picture corresponding to the second face feature point with the highest matching degree as the target virtual face picture.
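Claim 5 leaves the matching criterion open; one plausible sketch compares translation- and scale-normalised landmark sets by Euclidean distance (the criterion is an assumption, not fixed by the claim).

```python
import numpy as np

def match_expression(target_landmarks: np.ndarray,
                     candidate_landmarks: list) -> int:
    """Return the index of the candidate whose normalised facial landmark
    layout is closest to the target's; Euclidean distance after removing
    translation and scale is an assumed stand-in for 'feature matching'."""
    def normalise(pts: np.ndarray) -> np.ndarray:
        pts = pts - pts.mean(axis=0)               # remove translation
        return pts / (np.linalg.norm(pts) + 1e-9)  # remove scale
    t = normalise(np.asarray(target_landmarks, dtype=np.float64))
    dists = [np.linalg.norm(t - normalise(np.asarray(c, dtype=np.float64)))
             for c in candidate_landmarks]
    return int(np.argmin(dists))  # highest matching degree = smallest distance
```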
6. The avatar video generation method of claim 1, wherein said performing face replacement on said target picture according to said target virtual face picture comprises:
adjusting the second face feature points of the target virtual face picture according to the first face feature points;
and replacing the human face area in the target picture with the adjusted target virtual human face picture.
7. The avatar video generation method of any one of claims 1 to 6, wherein, before the face replacement is performed on said target picture according to said target virtual face picture, the method further comprises:
and adjusting the brightness of the target virtual human face picture.
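Claims 6 and 7 do not fix the geometric or photometric transforms; a sketch assuming an OpenCV similarity warp between the two feature point sets and a mean-brightness scaling follows.

```python
import cv2
import numpy as np

def adjust_and_replace(frame: np.ndarray, face_box: tuple,
                       virtual_face: np.ndarray,
                       virtual_pts: np.ndarray,
                       target_pts: np.ndarray) -> np.ndarray:
    """Warp the virtual face so its feature points align with the target
    frame's, roughly match the face region's mean brightness, and paste
    the result over the detected box. The similarity transform and the
    brightness scaling are assumed choices, not prescribed by the claims."""
    x, y, w, h = face_box
    M, _ = cv2.estimateAffinePartial2D(virtual_pts.astype(np.float32),
                                       target_pts.astype(np.float32))
    warped = cv2.warpAffine(virtual_face, M,
                            (frame.shape[1], frame.shape[0]))
    patch = warped[y:y + h, x:x + w].astype(np.float32)
    region = frame[y:y + h, x:x + w].astype(np.float32)
    patch *= (region.mean() + 1e-6) / (patch.mean() + 1e-6)  # brightness match
    frame[y:y + h, x:x + w] = np.clip(patch, 0, 255).astype(frame.dtype)
    return frame
```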
8. An avatar video generating apparatus, the apparatus comprising:
the first determining module is used for determining a first person and a second person for generating a virtual face picture, and determining the weights of the first person and the second person;
the acquisition module is used for acquiring a preset number of face image pairs, and each face image pair comprises a first face image of the first person and a second face image of the second person;
the generating module is used for generating a candidate virtual face picture for each face picture pair by using a picture generating model according to the weights of the first person and the second person;
the extraction module is used for extracting each frame of target picture in the target video;
the second determining module is used for determining, from the candidate virtual face pictures, a target virtual face picture whose expression is similar to that of the target picture;
the replacing module is used for carrying out face replacement on the target picture according to the target virtual face picture to obtain a face replacement picture;
and the synthesis module is used for synthesizing the face replacement picture into the target video to obtain the virtual image video.
9. A computer device characterized in that the computer device includes a processor for executing a computer program stored in a memory to implement the avatar video generation method of any one of claims 1 to 7.
10. A storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the avatar video generation method of any of claims 1-7.
CN202010444226.6A 2020-05-22 2020-05-22 Virtual image video generation method, device, computer equipment and storage medium Active CN111696029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010444226.6A CN111696029B (en) 2020-05-22 2020-05-22 Virtual image video generation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010444226.6A CN111696029B (en) 2020-05-22 2020-05-22 Virtual image video generation method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111696029A true CN111696029A (en) 2020-09-22
CN111696029B CN111696029B (en) 2023-08-01

Family

ID=72477431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010444226.6A Active CN111696029B (en) 2020-05-22 2020-05-22 Virtual image video generation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111696029B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160343168A1 (en) * 2015-05-20 2016-11-24 Daqri, Llc Virtual personification for augmented reality system
CN105118082A (en) * 2015-07-30 2015-12-02 科大讯飞股份有限公司 Personalized video generation method and system
CN107734267A (en) * 2017-09-11 2018-02-23 广东欧珀移动通信有限公司 Image processing method and device
US20190172252A1 (en) * 2017-12-01 2019-06-06 Koninklijke Kpn N.V. Selecting an Omnidirectional Image for Display
CN109598223A (en) * 2018-11-26 2019-04-09 北京洛必达科技有限公司 Method and apparatus based on video acquisition target person
CN112017141A (en) * 2020-09-14 2020-12-01 北京百度网讯科技有限公司 Video data processing method and device
CN114419204A (en) * 2020-10-12 2022-04-29 深圳市声希科技有限公司 Video generation method, device, equipment and storage medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750186A (en) * 2021-01-19 2021-05-04 深圳追一科技有限公司 Virtual image switching method and device, electronic equipment and storage medium
CN112750186B (en) * 2021-01-19 2024-02-23 深圳追一科技有限公司 Virtual image switching method, device, electronic equipment and storage medium
CN113559503A (en) * 2021-06-30 2021-10-29 上海掌门科技有限公司 Video generation method, device and computer readable medium
CN113559503B (en) * 2021-06-30 2024-03-12 上海掌门科技有限公司 Video generation method, device and computer readable medium
CN113643412A (en) * 2021-07-14 2021-11-12 北京百度网讯科技有限公司 Virtual image generation method and device, electronic equipment and storage medium
CN113643412B (en) * 2021-07-14 2022-07-22 北京百度网讯科技有限公司 Virtual image generation method and device, electronic equipment and storage medium
US11823306B2 (en) 2021-07-14 2023-11-21 Beijing Baidu Netcom Science Technology Co., Ltd. Virtual image generation method and apparatus, electronic device and storage medium
CN114187392A (en) * 2021-10-29 2022-03-15 北京百度网讯科技有限公司 Virtual even image generation method and device and electronic equipment
CN114187392B (en) * 2021-10-29 2024-04-19 北京百度网讯科技有限公司 Virtual even image generation method and device and electronic equipment
CN114007099A (en) * 2021-11-04 2022-02-01 北京搜狗科技发展有限公司 Video processing method and device for video processing
US11816174B2 (en) 2022-03-29 2023-11-14 Ebay Inc. Enhanced search with morphed images
CN114969785A (en) * 2022-05-27 2022-08-30 哈尔滨工业大学(深圳) Carrier-free image steganography method based on reversible neural network

Also Published As

Publication number Publication date
CN111696029B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN111696029B (en) Virtual image video generation method, device, computer equipment and storage medium
CN107578017B (en) Method and apparatus for generating image
CN106682632B (en) Method and device for processing face image
CN111679949A (en) Anomaly detection method based on equipment index data and related equipment
CN113287118A (en) System and method for face reproduction
CN112258269B (en) Virtual fitting method and device based on 2D image
CN109657554A (en) A kind of image-recognizing method based on micro- expression, device and relevant device
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
US11475608B2 (en) Face image generation with pose and expression control
CN111047509B (en) Image special effect processing method, device and terminal
CN109766925B (en) Feature fusion method and device, electronic equipment and storage medium
CN111814620A (en) Face image quality evaluation model establishing method, optimization method, medium and device
CN108053410A (en) Moving Object Segmentation method and device
CN111680544B (en) Face recognition method, device, system, equipment and medium
CN113470684A (en) Audio noise reduction method, device, equipment and storage medium
CN113689436A (en) Image semantic segmentation method, device, equipment and storage medium
CN111460893A (en) Face feature vector dynamic adjustment method and related equipment
CN114222179A (en) Virtual image video synthesis method and equipment
CN112669244A (en) Face image enhancement method and device, computer equipment and readable storage medium
CN110059739B (en) Image synthesis method, image synthesis device, electronic equipment and computer-readable storage medium
CN116205723A (en) Artificial intelligence-based face tag risk detection method and related equipment
CN113435357B (en) Voice broadcasting method, device, equipment and storage medium
CN110874567B (en) Color value judging method and device, electronic equipment and storage medium
CN112652325A (en) Remote voice adjusting method based on artificial intelligence and related equipment
CN112528140A (en) Information recommendation method, device, equipment, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230705

Address after: 5B11B, Floor 5, Building 1, Yard 13, Big Bell Temple, Haidian District, Beijing

Applicant after: Beijing Zhiye Cultural Technology Co.,Ltd.

Address before: 518000 Room 202, block B, aerospace micromotor building, No.7, Langshan No.2 Road, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen LIAN intellectual property service center

Effective date of registration: 20230705

Address after: 518000 Room 202, block B, aerospace micromotor building, No.7, Langshan No.2 Road, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen LIAN intellectual property service center

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: PING AN PUHUI ENTERPRISE MANAGEMENT Co.,Ltd.

GR01 Patent grant