CN114363712B - AI digital person video generation method, device and equipment based on templated editing - Google Patents


Info

Publication number
CN114363712B
CN114363712B (application CN202210039411.6A)
Authority
CN
China
Prior art keywords
video
broadcasting
digital
template
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210039411.6A
Other languages
Chinese (zh)
Other versions
CN114363712A (en)
Inventor
刘玉婷
丁淑华
刘子健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dlp Digital Technology Co ltd
Shenzhen Dillop Intelligent Technology Co ltd
Original Assignee
Shenzhen Dlp Digital Technology Co ltd
Shenzhen Dillop Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Dlp Digital Technology Co ltd, Shenzhen Dillop Intelligent Technology Co ltd filed Critical Shenzhen Dlp Digital Technology Co ltd
Priority to CN202210039411.6A
Publication of CN114363712A
Application granted
Publication of CN114363712B
Legal status: Active (current)
Anticipated expiration


Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an AI digital human video generation method, device and equipment based on templated editing. The method comprises: collecting audio and video data of a target object and performing AI model training to generate an AI digital persona package; performing templated graphics-and-video editing and previewing on a 3D template and the AI digital persona package to produce a broadcasting list containing broadcast items; combining each broadcast item containing a blue-background preview video with text content to generate a broadcast item manuscript; performing AI inference according to the AI digital portrait model and each broadcast item manuscript to generate a blue-background AI digital human video and control instructions; and matting the blue-background AI digital human video and then rendering and compositing it with the corresponding 3D template to obtain a composite video. The invention belongs to the technical field of artificial intelligence. By generating composite video containing AI digital human video through templated editing, AI digital human video content can be produced rapidly and in batches merely by modifying template parameters, greatly improving video generation efficiency.

Description

AI digital person video generation method, device and equipment based on templated editing
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an AI digital human video generation method, device and equipment based on templated editing.
Background
The operation flow of existing AI virtual anchor and virtual avatar products is: anchor video acquisition -> data processing -> model training -> avatar output. In the production output stage, video is generated and output based on the trained anchor avatar according to the input manuscript and voice, with optional anchor emotion, background picture or video, standing or sitting posture, and so on. Such products can only drive text, voice and facial expression, with no body movement; they can only composite and output against a picture or pre-made video as background and cannot composite three-dimensional graphics-package content; nor can they perform playout control of three-dimensional graphics-package content during video generation. Therefore, existing video generation methods based on AI virtual anchors suffer from insufficient flexibility and low video generation efficiency.
Disclosure of Invention
The embodiments of the invention provide an AI digital human video generation method, device, equipment and medium based on templated editing, aiming to solve the problem that prior-art methods cannot generate AI digital human video efficiently.
In a first aspect, an embodiment of the present invention provides a template editing-based AI digital human video generation method, where the method includes:
generating an AI digital persona package corresponding to audio and video data of a target object acquired in a blue box according to a preset AI model, wherein the AI digital persona package comprises a plurality of combined personas corresponding to each target object and a blue background preview video corresponding to each combined persona; the AI digital persona package may include a 2D AI digital persona data package and/or a 3D AI digital persona data package;
performing templated image-text video editing and previewing on the 3D template and each blue background preview video in the AI digital human image package to manufacture a broadcasting list formed by broadcasting items corresponding to each blue background preview video;
combining a play item containing a blue background preview video with text content to generate a play item manuscript corresponding to each play item, wherein the text content comprises text, emotion and limb actions;
exporting changeable/replaceable contents in the broadcasting item and the broadcasting item manuscript as template parameters so as to convert the broadcasting list into a broadcasting template;
performing AI inference according to each broadcast item manuscript in the broadcasting list, or each broadcast item manuscript in the broadcasting list corresponding to the broadcast template, to generate a corresponding blue-background AI digital human video and control instruction, or to generate corresponding facial expression data, skeleton model data and control instruction; after matting the blue-background AI digital human video corresponding to each broadcast item manuscript, rendering and compositing it with the 3D template corresponding to that manuscript to obtain a composite video corresponding to each broadcast item; or, using the facial expression data and skeleton model data corresponding to each broadcast item manuscript to drive the AI digital human 3D model, which is rendered and composited with the 3D template corresponding to that manuscript to obtain a composite video corresponding to each broadcast item; the control instruction is used to perform playout control of the broadcast content when the composite video is generated.
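The broadcast-item and manuscript structures of the first aspect, and the export of changeable/replaceable content as template parameters, can be sketched as follows. This is a minimal illustrative sketch only: all class, field and function names (`BroadcastItem`, `Manuscript`, `to_broadcast_template`, etc.) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class BroadcastItem:
    item_id: str
    template_id: str          # associated 3D template
    preview_video: str        # blue-background preview video file
    play_mode: str = "auto"   # auto / manual / timed / hotkey / vr-handle

@dataclass
class Manuscript:
    item_id: str
    text: str
    emotion: str = "neutral"              # emotion marker for the text
    actions: list = field(default_factory=list)  # body-action markers

def to_broadcast_template(items, manuscripts, replaceable_fields):
    """Export the changeable/replaceable content of a broadcasting list as
    template parameters, turning the list into a reusable broadcast template."""
    params = {}
    for m in manuscripts:
        params[m.item_id] = {f: getattr(m, f)
                             for f in replaceable_fields if hasattr(m, f)}
    return {"items": [i.item_id for i in items], "parameters": params}
```

With such a template, producing a new video batch reduces to supplying new values for the exported parameters rather than re-editing the list.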
In a second aspect, an embodiment of the present invention provides an AI digital person video generating apparatus based on templated editing, including:
the image package generating unit is used for generating an AI digital person image package corresponding to the audio and video data of the target object acquired in the blue box according to a preset AI model, wherein the AI digital person image package comprises a plurality of combined images corresponding to each target object and a blue background preview video corresponding to each combined image; the AI digital persona package may include a 2D AI digital persona data package and/or a 3D AI digital persona data package;
The broadcasting list generation unit is used for carrying out templatized image-text video editing and previewing on the 3D template and each blue background preview video in the AI digital human image package so as to manufacture a broadcasting list formed by broadcasting items corresponding to each blue background preview video;
a broadcast item manuscript generation unit for combining broadcast items containing blue background preview video with text content to generate broadcast item manuscripts corresponding to each broadcast item, wherein the text content comprises text, emotion and limb actions;
a broadcast template acquisition unit for exporting the broadcast item and the changeable/replaceable content in the broadcast item manuscript as template parameters so as to convert the broadcast list into a broadcast template;
the synthetic video generating unit is used for carrying out AI reasoning according to each broadcasting item manuscript in the broadcasting list or each broadcasting item manuscript in the broadcasting list corresponding to the broadcasting template so as to generate a corresponding blue background AI digital human video and a control instruction or generate corresponding facial expression data, skeleton model data and a control instruction; after carrying out image matting on the blue background AI digital human video corresponding to each broadcasting item manuscript, carrying out rendering synthesis on the 3D template corresponding to each broadcasting item manuscript to obtain a synthesized video corresponding to each broadcasting item; or, the facial expression data and the skeleton model data corresponding to each broadcasting item manuscript are used for driving the AI digital person 3D model to be rendered and synthesized with the 3D template corresponding to the broadcasting item manuscript, so that a synthesized video corresponding to each broadcasting item is obtained; the control instruction is used for performing play control on the broadcast content when the synthesized video is generated.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for generating AI digital human video based on templated editing according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the method for generating AI digital human video based on templated editing according to the first aspect.
The embodiments of the invention provide an AI digital human video generation method, device and equipment based on templated editing. The method collects audio and video data of a target object and performs AI model training to generate an AI digital persona package; performs templated graphics-and-video editing and previewing on a 3D template and the AI digital persona package to produce a broadcasting list containing broadcast items; combines each broadcast item containing a blue-background preview video with text content to generate a broadcast item manuscript; performs AI inference according to the AI digital portrait model and each broadcast item manuscript to generate a blue-background AI digital human video and control instructions; and mats the blue-background AI digital human video and then renders and composites it with the corresponding 3D template to obtain a composite video. In this way, composite video containing AI digital human video is generated by templated editing, and AI digital human video content can be produced rapidly and in batches merely by modifying template parameters, greatly improving the generation efficiency of AI digital human video content.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an AI digital human video generation method based on templated editing provided by the embodiment of the invention;
FIG. 2 is a schematic sub-flowchart of a template editing-based AI digital human video generation method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of another sub-flowchart of the template editing-based AI digital human video generation method according to an embodiment of the invention;
FIG. 4 is a schematic diagram of another sub-flowchart of the template editing-based AI digital human video generation method according to an embodiment of the invention;
FIG. 5 is a schematic diagram of still another sub-flowchart of the template editing-based AI digital human video generation method according to an embodiment of the invention;
FIG. 6 is a schematic block diagram of an AI digital human video generation apparatus based on templated editing provided by an embodiment of the invention;
Fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, fig. 1 is a schematic flow chart of a templated-editing-based AI digital human video generation method according to an embodiment of the present invention. The method is applied to a user terminal or a management server and is executed through application software installed in the user terminal or the management server. The user terminal may be a terminal device such as a desktop computer, notebook computer, tablet computer or mobile phone used to execute the method so as to generate AI anchor video content by templated editing; the management server is a server side executing the method to generate AI anchor video content by templated editing, such as a server built in an enterprise or government department. As shown in fig. 1, the method includes steps S110 to S150.
S110, generating an AI digital personage package corresponding to the audio and video data of the target object acquired in the blue box according to a preset AI model, wherein the AI digital personage package comprises a plurality of combined personages corresponding to each target object and a blue background preview video corresponding to each combined personage.
Specifically, the audio and video data of each target object can be collected in a blue box, the blue box is an external scene of video shooting, the blue box can be red, green, blue or other single colors, an AI model can be trained based on the characteristics of the audio and video data of the target object, and an AI digital personage package is generated according to the AI digital personage model obtained through training.
Specifically, the video and audio data of the target object are collected, the AI digital human figure of the target object is generated based on the AI digital human technology platform, and the training process for the 2D AI digital human figure data packet comprises the following steps: acquiring video and audio of a target object in a blue box, and generating a 2D AI digital human figure of the target object based on an AI digital human technology platform; collecting corresponding target object blue box video matting parameters; generating an AI digital portrait preview video; and combining the target object ID numbers to generate a 2D AI digital human figure data packet. The training process for the 3D AI digital portrait data packet includes: shooting a target object video at multiple angles or 3D scanning the target object and collecting the audio of the target object, and generating a 3D AI digital person image of the target object based on an AI digital person technology platform; generating an AI digital portrait preview video; and combining the target object ID numbers to generate an AI digital portrait data packet. The AI digital personage package of the target object can be formed by training a group of AI digital personage data packages generated for the target object, and can be stored in an AI digital personage library, so that the AI digital personage package is convenient to search and use based on ID numbers.
The specific types of the AI digital persona comprise a 2D true man and a 3D true man, the AI digital persona package can comprise a 2D AI digital persona data package and/or a 3D AI digital persona data package, wherein the 2D AI digital persona data package comprises but is not limited to a 2D AI digital persona, a digital persona ID, a corresponding blue box video matting parameter, a preview video, and the 3D AI digital persona data package comprises but is not limited to a 3D AI digital persona, a digital persona ID and a preview video.
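The contents of the 2D and 3D persona data packages listed above, and their ID-based lookup in a persona library, might be organized as follows. All function, field and class names here are hypothetical illustrations, not from the patent.

```python
def make_2d_package(persona_id, persona, matting_params, preview_video):
    """2D package: persona, ID, blue-box matting parameters, preview video."""
    return {"type": "2D", "id": persona_id, "persona": persona,
            "matting_params": matting_params, "preview": preview_video}

def make_3d_package(persona_id, persona, preview_video):
    """3D package: persona, ID, preview video (no matting parameters)."""
    return {"type": "3D", "id": persona_id, "persona": persona,
            "preview": preview_video}

class PersonaLibrary:
    """Stores a target object's persona data packages for lookup by ID number,
    mirroring the AI digital persona library described in the text."""
    def __init__(self):
        self._packages = {}

    def add(self, package):
        self._packages[package["id"]] = package

    def find(self, persona_id):
        return self._packages.get(persona_id)
```

The ID field is what makes "search and use based on ID numbers" possible once packages are stored in the library.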
In one embodiment, as shown in FIG. 2, step S110 includes sub-steps S111 and S112.
S111, training the AI model by using the audio and video data of the target object acquired in the blue box to obtain AI digital portrait models corresponding to each combined image; s112, respectively generating blue background preview videos corresponding to each combined image according to the AI digital portrait model, wherein each combined image is obtained by combining gestures, limb actions and expressions.
Specifically, each target object may correspond to a plurality of combined personas, where a combined persona is formed by combining a specific posture, a specific body action and a specific expression; for example, the posture may be sitting or standing, the body action may be a raised-hand indication or a nod, and the expression may be smiling, serious, and the like. For example, the target object (a real person) may record losslessly compressed video in a blue box in segments, with a video resolution of, optionally, 1280x720P, 1920x1080P or 3840x2160P, where the head region requires a resolution of not less than 192 x 192 pixels; videos are recorded for combinations of postures such as standing and sitting, expressions such as smiling and surprise, and hand actions such as a raised-hand indication.
Each combined persona corresponds to one segment of audio and video. The image features corresponding to each segment can be extracted, and the image features and audio features of the same segment are input into the AI model for training: the image and audio features serve as input, the corresponding audio and video serve as the training target, and the parameter values in the AI model are feedback-adjusted using the loss values between the video frames and audio output during training and the training target. In this way the AI model is trained, and each combined persona of each target object can be trained into a corresponding AI digital portrait model. Each AI digital portrait model can then generate a corresponding blue-background preview video, that is, a segment of video frames and audio output by the AI digital portrait model against a blue background serves as the preview video of that AI digital persona; for example, the preview video duration may be 10, 15 or 30 seconds. A persona label is added to each blue-background preview video in the AI digital persona package; the label identifies the blue-background preview video and facilitates quick lookup.
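The feedback-adjustment loop described above (features in, loss against the recorded target, parameter update) can be illustrated with a deliberately tiny linear model. The patent does not specify the model architecture, loss, or optimizer, so everything below is an assumption made purely for illustration.

```python
import numpy as np

def train_step(W, img_feat, audio_feat, target, lr=0.01):
    """One feedback step: image and audio features of the same segment are
    concatenated as input, the recorded segment acts as the training target,
    and the parameters W are adjusted from the loss between output and target."""
    x = np.concatenate([img_feat, audio_feat])
    pred = W @ x                                       # model output
    err = pred - target
    loss = float(np.mean(err ** 2))                    # loss vs. training target
    W = W - lr * (2.0 / err.size) * np.outer(err, x)   # feedback adjustment
    return W, loss
```

Iterating `train_step` over the recorded segments drives the loss down; one model is trained per combined persona in this fashion.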
A three-dimensional avatar of the target object can also be constructed by shooting video of the target object from multiple angles for three-dimensional modeling, or by three-dimensional scanning. With a directly constructed three-dimensional avatar, during AI inference, in addition to generating a blue-background AI digital human video, facial expression data and skeleton data can be generated directly and sent to the compositing unit to drive the AI digital human 3D model for video rendering and compositing.
And S120, carrying out templated image-text video editing and previewing on the 3D template and each blue background preview video in the AI digital human figure package so as to manufacture a broadcasting list formed by broadcasting items corresponding to each blue background preview video.
A 3D template and an AI digital portrait preview video are selected for graphics-and-video editing and previewing, parameters of the various graphic and text elements are set, and a broadcasting list containing broadcast items is produced.
In one embodiment, as shown in FIG. 3, step S120 includes sub-steps S121 and S122.
S121, according to the image identifications in the 3D template, obtaining blue background preview videos of combined images corresponding to each image identification in the AI digital person image package as scene elements to replace the image identifications.
S122, after the blue background preview video added into the 3D template is subjected to image matting to generate a video frame sequence with an Alpha channel, the video frame sequence is combined with the three-dimensional virtual element in the 3D template, and templated text video editing and previewing are performed to manufacture broadcasting items corresponding to each target object.
Specifically, based on templated graphics editing tools such as a VR editor and graphics column packaging, each AI digital persona can be used as an element in the three-dimensional scene, with the blue-background preview video corresponding to each AI digital persona serving as its stand-in. After matting, a video frame sequence with an Alpha channel is generated and integrated with the various three-dimensional virtual elements in the 3D template, such as three-dimensional scenes, three-dimensional objects, combined animations, three-dimensional simulations, Internet-of-Things information access, fused-media information access, big-data acquisition, arbitrary algorithm control, AI driving, visual display of graphic and chart data, PPT, pictures, videos and text. Templated graphics-package video editing and previewing are then performed to produce broadcast items, and a broadcasting list (or broadcast template) is formed by combining multiple broadcast items. The playout modes of broadcast items in the broadcasting list include automatic, manual, timed, sequential, hotkey-triggered and VR-handle-triggered playout, and a broadcast item can be played alone or in combination with other broadcast items.
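The matting step, which turns a blue-background frame into a frame with an Alpha channel, can be sketched as a simple colour-distance key. The patent does not specify the keying algorithm; the distance metric, tolerance value and function name below are assumptions for illustration.

```python
import numpy as np

def blue_key_to_rgba(frame, key=(0, 0, 255), tol=60):
    """Add an Alpha channel to an RGB frame: pixels within `tol` of the
    blue key colour become fully transparent, everything else stays opaque."""
    diff = np.linalg.norm(frame.astype(np.int16) - np.asarray(key, np.int16),
                          axis=-1)
    alpha = np.where(diff < tol, 0, 255).astype(np.uint8)
    return np.dstack([frame, alpha])  # (H, W, 4) RGBA frame
```

Applying this per frame yields the Alpha-channel video frame sequence that is composited with the 3D template's virtual elements.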
In this embodiment, the research object refers to an object displayed by the 3D template; its constituent elements include, but are not limited to: three-dimensional models (e.g., models in .obj/.fbx/.3ds/.ac/.stl/.wrl/.igs format), three-dimensional animations, special effects, fused-media access, algorithms, databases, data access, text, pictures, graphics, video, and the like. A three-dimensional model can be obtained by directly importing an existing research object or by creating a new one. When importing an existing object, it can be imported from external modeling software or selected directly in the VR editor; when creating a new research object, it can be built directly in the VR editor according to actual requirements, or built in external modeling software associated with the VR editor and then transferred to the VR editor.
The three-dimensional model may also be obtained through display interfaces of external devices and various display terminals selected by the user, imported directly from a historical template in the database, or created as needed. One or several research objects can be 3D-modeled in any of these ways, and the resulting three-dimensional model is read by the template animation designer.
In this embodiment, the 3D template is used to present the process of change of the attributes and/or state values of the research object, which includes, but is not limited to, a three-dimensional model. A 3D template is a set of attribute/state changes of one or more research objects: based on the various 3D template elements, it generates attribute associations, attribute simulations and combined animations according to specific logical relations and state-change requirements, so that externally associated parts need not be realized inside the template but only need to be displayed according to attribute/state values and input/output classification.
A 3D template can be used alone or inside other 3D templates, either imported directly or imported and modified through the VR editor. Modifications include but are not limited to the following: the research object can be replaced with any object; different logical-relation triggers can be employed; a mechanism model can be modified, including but not limited to physical formulas, mathematical functions and biochemical characteristics; and the association of attributes (both internal and external) can be altered, so that the data source on which the object's attribute-change process is based is not limited to the internally configured attribute associations of the 3D template, but may be any data structure (e.g., stock, weather data, sensor data, fused-media access such as microblog, WeChat, SMS, self-media and websites), any external program/algorithm, and so on.
When the 3D template is used, a plurality of attributes and/or state values need to be adjusted and modified in real time, so that the related attributes need to be associated when the 3D template is manufactured, and the corresponding real-time adjustment and modification can be realized when the 3D template is used. Corresponding to the need for real-time attribute and/or state value adjustment modification, the association manner of the corresponding attribute can be divided into the following two types: the association inside the attribute can directly input and adjust and modify the attribute and/or the state value, and can also calculate and assign the adjustment and modification through the function; the attribute is externally associated, and the attribute and/or state value is updated in real time by associating the attribute with any external data structure, any external program/algorithm and the like.
The first method of internal attribute association is to directly input a numerical value, adjusting and modifying the corresponding attribute and/or state value. The second method is assignment computed by a function: a custom script function is added to the attribute, including but not limited to y1 = sin x, y2 = 2x², and the like; when x = 1, y1 = sin 1 and y2 = 2 are computed by the functions, and the result is assigned to the attribute parameter to realize custom adjustment of the attribute value. External attribute association associates the corresponding attribute and/or state value with any data structure, any program/algorithm, and the like. Any data structure includes but is not limited to: stock, weather data, sensor data, and fused-media access such as microblog, WeChat, SMS, self-media and websites; association with any program/algorithm includes but is not limited to programs implemented in any programming language that can read and modify any data in the 3D template.
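The two internal-association methods, direct numeric input and function-computed assignment, can be illustrated as follows. The `Attribute` class is a hypothetical sketch; the example reproduces the y1 = sin x and y2 = 2x² functions from the text, evaluated at x = 1.

```python
import math

class Attribute:
    """A template attribute value, settable directly or via a script function."""
    def __init__(self, value=0.0):
        self.value = value

    def set(self, value):
        """Method 1: direct numeric input of the attribute/state value."""
        self.value = value

    def assign(self, func, x):
        """Method 2: compute the value with a custom script function."""
        self.value = func(x)

y1, y2 = Attribute(), Attribute()
y1.assign(math.sin, 1)               # y1 = sin 1
y2.assign(lambda x: 2 * x ** 2, 1)   # y2 = 2·1² = 2
```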
External attribute association achieves real-time updating by associating with arbitrary external data structures. By reserving interfaces for external data-structure access and access modules for various data sources, a user can associate the attributes and/or state values of the 3D template with external data sources without programming, including but not limited to real-time association with text files, Excel documents, ODBC data, SQL data and the like, which can then be updated into the relevant content of the 3D template in real time. For example: weather data can be connected to the relevant attributes of a model to complete a visual display of weather conditions for virtual simulation; fused-media information such as microblog, WeChat and website information can be connected to achieve real-time information fusion; and motion-capture data can be connected in real time to support skeleton-animation editing and realize biological virtual simulation.
External attribute association also allows an external program/algorithm to control the change of an attribute and/or state value. For example, an external flight-control algorithm controls the flight attitude and propulsion-system attributes of an aircraft model in real time, realizing aircraft simulation; a flow-field simulation control algorithm controls the flow-field parameters in the 3D template, realizing test simulation of the aerodynamic profile of a new-energy automobile, and the like.
In addition, external attribute association works in both directions: an external data structure or external program/algorithm can adjust and modify the attributes and/or state values of the 3D template, and the attributes and/or state values of the 3D template can be output to external data structures and external algorithms, so that the 3D template can be used to output to, or control, external data and/or external algorithms.
S130, combining each broadcast item containing a blue background preview video with text content to generate a broadcast item manuscript corresponding to each broadcast item, where the text content includes text, emotion, and limb actions.
According to the object image labels of the blue background preview video in a broadcast item, matching text content can be retrieved and combined with that item. For example, the text content of a weather forecast is matched with the combined image corresponding to standing, a raised-hand gesture, and a smile, and can then be combined with the corresponding broadcast item. The text content includes text, emotions, and limb actions, where the emotions and limb actions serve as identifiers attached to the text. Based on a templated image-text editing tool such as a VR editor or image-text column packaging, the corresponding text and identifiers (emotion, action, etc.) are entered for each edited item containing the AI digital person model in the broadcast items, generating the broadcast item manuscripts.
S140, exporting the changeable/replaceable content in the broadcast items and broadcast item manuscripts as template parameters, so as to convert the play list into a play-out template.
The content that needs to be changed or replaced when generating a video is exported as template parameters, packaging the play list into an easy-to-use play-out template. Videos with different content can then be produced merely by modifying the template parameters, avoiding the complex editing and production process of building a play list from scratch.
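The parameter export and substitution described above can be sketched as simple string templating. The field names (`script_text`, `presenter`, and so on) are hypothetical; the patent does not specify a parameter schema.

```python
# Minimal sketch of converting a play list into a play-out template:
# changeable fields are exported as named parameters, and new videos are
# produced by substituting parameter values; fixed content is untouched.
playout_item = {
    "scene": "studio_3d_template",
    "script_text": "{headline}",        # exported as a template parameter
    "presenter": "{presenter_id}",      # exported as a template parameter
    "background_music": "default.mp3",  # fixed content, not exported
}

def apply_template(template, params):
    """Fill the exported parameters to produce a concrete broadcast item."""
    return {k: (v.format(**params) if isinstance(v, str) else v)
            for k, v in template.items()}

item = apply_template(playout_item, {
    "headline": "Tomorrow will be sunny with light winds.",
    "presenter_id": "anchor_01",
})
```

A different video is produced simply by calling `apply_template` with different parameter values, which is the effect the play-out template is designed to achieve.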
S150, performing AI reasoning according to each broadcast item manuscript in the play list, or in the play list corresponding to a play-out template, to generate the corresponding blue background AI digital person video and control instructions, or to generate the corresponding facial expression data, skeleton model data, and control instructions; after matting the blue background AI digital person video corresponding to each broadcast item manuscript, rendering and compositing it with the 3D template corresponding to that manuscript to obtain the composite video for each broadcast item; or, using the facial expression data and skeleton model data corresponding to each broadcast item manuscript to drive the AI digital person 3D model, which is rendered and composited with the 3D template corresponding to the manuscript to obtain the composite video for each broadcast item. The control instructions are used for play control of the broadcast content when generating the composite video.
For broadcast items in the play list that contain AI digital person images, the corresponding manuscript content is entered and sent to the AI reasoning unit, which generates AI digital person data by inference. For a play-out template, the corresponding manuscript content is modified through the input template parameters and then sent to the AI reasoning unit for inference. The AI digital person data contains: 1) blue background AI digital person video, or AI digital person 3D expression data and AI digital person 3D limb-motion data; 2) voice data corresponding to the manuscript text; and 3) play control instructions. The manuscript content contains text, emotion, limb actions, control instructions, and the like.
Specifically, the broadcast item manuscripts can be pushed to an AI reasoning server to generate the blue background AI digital person videos and control instructions. According to the text, emotion, and limb actions in a manuscript, video segments matching the mouth shape, emotion, and limb actions of the text can be inferred from the AI digital person image library and combined into a blue background AI digital person video. Keywords in the text are matched and analyzed to derive the corresponding control instructions, which are used for play control of PPT annotation, animation playback, text display, or video playback; keywords corresponding to control instructions include "play", "next page", "pause", and the like.
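The keyword matching step can be sketched as a lookup table scanned against the manuscript text. The keyword table and instruction names below are assumptions for illustration; the patent only names "play", "next page", and "pause" as example keywords.

```python
# Hedged sketch of keyword matching: scan the manuscript text for trigger
# keywords and emit the corresponding control instructions for PPT
# annotation, animation, text display, or video playback.
KEYWORD_INSTRUCTIONS = {
    "play": "PLAY_VIDEO",
    "next page": "PPT_NEXT_PAGE",
    "pause": "PAUSE_PLAYBACK",
}

def extract_control_instructions(text):
    """Return (keyword, instruction) pairs found in the manuscript text."""
    found = []
    lowered = text.lower()
    for keyword, instruction in KEYWORD_INSTRUCTIONS.items():
        if keyword in lowered:
            found.append((keyword, instruction))
    return found

instructions = extract_control_instructions(
    "Please turn to the next page and pause while the chart is explained.")
```

A production system would presumably use more robust NLP matching than substring search, but the mapping from manuscript keywords to play-control instructions is the same idea.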
Alternatively, AI reasoning is performed on each broadcast item manuscript to generate the corresponding facial expression data, skeleton model data, and control instructions. Specifically, from the text and emotion in the manuscript, facial expression data matching the mouth shape and emotion of the text can be inferred from the AI digital person image library; facial expression data records the change characteristics of the AI digital person's face, from which the facial video of the AI digital person can be reconstructed. From the limb actions in the manuscript, skeleton model data matching those actions can be inferred from the image library; skeleton model data records the limb-motion change characteristics of the AI digital person, from which the limb-action video of the AI digital person can be reconstructed. Keywords in the text are matched and analyzed to derive the corresponding control instructions, which are used for play control of PPT annotation, animation playback, text display, or video playback.
The rendering synthesis unit receives the AI digital person data generated by the AI reasoning unit together with the corresponding play list or play-out template, and renders and composites them to obtain the AI digital person composite video. The control instructions are used for play control of the broadcast content when compositing the video, and the voice data is output synchronously with the AI digital person composite video.
The rendering server obtains the blue background AI digital person video corresponding to a broadcast item manuscript from the AI reasoning server, performs matting, and then renders and composites the matte with various three-dimensional virtual elements to output the final composite video. It also obtains the control instructions corresponding to the manuscript from the AI reasoning server, used to control the broadcast content (annotation, animation playback, text display, video playback, etc.) and thus realize interaction between the AI digital person and the broadcast content. For example, if a control instruction comprises instruction information and a corresponding instruction time point, then when playback reaches that time point the corresponding instruction information is triggered and executed, realizing play control of the broadcast content through control instructions.
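The instruction-time-point mechanism above can be sketched as a playback tick that fires each not-yet-executed instruction once its time point is reached. The class and function names are illustrative assumptions.

```python
# Sketch of a control instruction carrying instruction information plus an
# instruction time point: during playback, each instruction is triggered
# exactly once, when the playback clock reaches its time point.
class ControlInstruction:
    def __init__(self, time_point_s, info):
        self.time_point_s = time_point_s
        self.info = info
        self.executed = False

def tick(instructions, playback_time_s, executed_log):
    """Trigger every not-yet-executed instruction whose time has come."""
    for ins in instructions:
        if not ins.executed and playback_time_s >= ins.time_point_s:
            ins.executed = True
            executed_log.append(ins.info)

timeline = [
    ControlInstruction(2.0, "PPT_NEXT_PAGE"),
    ControlInstruction(5.5, "PLAY_VIDEO"),
]
log = []
tick(timeline, 3.0, log)   # only the 2.0 s instruction has been reached
tick(timeline, 6.0, log)   # now the 5.5 s instruction fires as well
```

The `executed` flag guards against re-triggering when the playback clock is polled repeatedly, which keeps broadcast control deterministic.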
Specifically, the facial expression data and skeleton model data corresponding to each broadcast item manuscript can drive the AI digital person 3D model, which is then rendered and composited with the 3D template corresponding to the manuscript to obtain the composite video for each broadcast item.
That is, AI rendering is performed on the facial expression data based on the AI digital person 3D model to obtain the corresponding facial video, and on the skeleton model data to obtain the corresponding limb-action video; other items such as hair and clothing remain consistent with the original content of the AI digital person 3D model. Speech synthesis is performed from the text of the broadcast item manuscript and the audio features in the AI digital person 3D model to obtain the spoken text. Combining the above information, the AI digital person video corresponding to the facial expression data and skeleton model can be rendered, and the composite video is generated from it.
In one embodiment, as shown in FIG. 4, step S150 includes sub-steps S151, S152, and S153.
S151, matting the blue background AI digital person video corresponding to each broadcast item manuscript to obtain a video matte.
The blue background AI digital person video can be keyed: the blue background is removed, leaving the video containing the AI digital person image. The resulting video matte can then serve as an element in video composition.
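The keying step can be sketched as a simple chroma-key pass over each frame: pixels near the key colour become fully transparent, everything else stays opaque, producing the alpha channel used downstream. The pure-blue key colour and hard threshold are simplifying assumptions; production matting is considerably more sophisticated (soft edges, spill suppression).

```python
# Minimal chroma-key sketch of blue-background matting: pixels close to the
# key colour get alpha 0 (transparent), others get alpha 255 (opaque).
def key_out_blue(frame, threshold=100):
    """frame: list of rows of (r, g, b); returns rows of (r, g, b, a)."""
    keyed = []
    for row in frame:
        out_row = []
        for r, g, b in row:
            # Crude Euclidean distance from a pure-blue key colour (0, 0, 255)
            distance = (r ** 2 + g ** 2 + (255 - b) ** 2) ** 0.5
            alpha = 0 if distance < threshold else 255
            out_row.append((r, g, b, alpha))
        keyed.append(out_row)
    return keyed

frame = [[(10, 20, 250), (200, 180, 160)]]  # one blue pixel, one skin tone
matted = key_out_blue(frame)
```

The blue pixel is keyed out (alpha 0) while the foreground pixel remains opaque, so only the AI digital person survives when the matte is composited over the 3D template.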
S152, adjusting the 3D template and configuring parameters.
In one embodiment, as shown in fig. 5, step S152 includes sub-steps S1521, S1522, and S1523.
S1521, adjusting the attributes and/or state values of a study object through internally set relations, to update them in real time; and/or adjusting through the connection relations between 3D templates, to update the attributes and/or state values of the study object in real time; and/or adjusting based on the connection relations between the attributes and/or state values of the study object and external data and/or external algorithms, to update them in real time.
When adjusting the 3D template, corresponding trigger conditions can be set. Trigger conditions cover the triggering of processes such as VR simulation, the start and end of a VR animation, and changes of intermediate states. The ways of setting trigger conditions include, but are not limited to, at least one of: a timeline-ordered trigger mode, an event trigger mode, and a condition trigger mode.
The timeline-ordered trigger mode triggers VR animations, VR simulations, and the like in sequence at times set by the editor. For example, if the edit specifies that the car starts and drives off at 1 minute 30 seconds on the corresponding time axis, then at exactly 1 minute 30 seconds the car starts and begins running. The event trigger mode triggers a VR animation or VR simulation on the premise that a certain event occurs: when the event occurs, the animation or simulation is triggered. For example, after a left mouse click, a fan starts to rotate, or the sky darkens, a strong wind rises, and the trees begin to sway. The condition trigger mode triggers a VR animation or VR simulation on the premise that a certain condition is satisfied. For example, at standard atmospheric pressure, boiling begins when the water-temperature attribute value is greater than or equal to 100 °C.
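The three trigger modes can be sketched as three predicates evaluated against the current playback state. The class names and state keys below are illustrative assumptions, not terminology from the patent.

```python
# Illustrative sketch of the three trigger modes: timeline-ordered, event,
# and condition triggers, each deciding whether a VR animation should fire.
class TimelineTrigger:
    """Fires when the playback clock reaches the editor-set time."""
    def __init__(self, fire_at_s):
        self.fire_at_s = fire_at_s
    def should_fire(self, state):
        return state.get("time_s", 0) >= self.fire_at_s

class EventTrigger:
    """Fires when a named event (e.g. a mouse click) has occurred."""
    def __init__(self, event_name):
        self.event_name = event_name
    def should_fire(self, state):
        return self.event_name in state.get("events", ())

class ConditionTrigger:
    """Fires when an attribute-based predicate is satisfied."""
    def __init__(self, predicate):
        self.predicate = predicate
    def should_fire(self, state):
        return self.predicate(state)

# The three examples from the text: a car starts at 1 min 30 s, a fan starts
# on a left mouse click, and water boils at 100 degrees C.
car_start = TimelineTrigger(90.0)
fan_start = EventTrigger("mouse_left_click")
boiling = ConditionTrigger(lambda s: s.get("water_temp_c", 0) >= 100)

state = {"time_s": 95.0, "events": {"mouse_left_click"}, "water_temp_c": 99.5}
```

With this state, the timeline and event triggers fire while the condition trigger does not, since the water temperature has not yet reached 100 degrees.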
S1522, configuring relative position parameters between the video matting and the 3D template.
The correspondence between the video matte and the VR elements in the 3D template can be configured, such as adjusting the position of the video matte within the three-dimensional scene of the 3D template.
S1523, configuring the position parameters of the subtitle templates in the 3D templates.
The position of the subtitle template can be configured in the 3D template, so that the text of the manuscript is displayed as subtitles in the composite video, and the corresponding audio is output in sync with the generated composite video.
S153, according to the configured parameters, performing VR simulation and VR animation generated from the 3D template and the video matte, applying the corresponding value changes to the attributes and/or states of the study objects in the 3D template, and obtaining the composite video corresponding to each broadcast item.
According to the configured parameters, VR simulation and VR animation generated from the 3D template and the video matte are performed, triggering the corresponding value changes in the attributes and/or states of the study objects in the 3D template, for example zooming the AR/VR model in or out according to the set adjustment information. Based on the changes of the various three-dimensional model materials of the 3D template, combined with the video matte and the playback of broadcast content (PPT annotation, animation playback, text display, video playback, etc.), the composite video corresponding to each broadcast item is generated.

Any material includes, but is not limited to: three-dimensional models, three-dimensional animations, special effects, fused-media access, algorithms, databases, data access, text, pictures, graphics, and video, all of which can be edited, combined, transplanted, and mixed. VR content is edited and produced based on 3D templates, and a 3D template may be any of a number of functional templates for any content. Associations can be realized between any attribute values within a 3D template, between attribute values of different 3D templates, and between 3D templates and external data structures and external algorithms, for realizing VR simulation and VR animation. A powerful motion-effect editing function supports 3D track animation and key-frame animation, realizing animation of any coordinate parameter, attitude parameter, appearance parameter, and attribute of any study object (including but not limited to a three-dimensional model). During 3D template production, the output effect can be previewed at any time, and animation and simulation effects can be conveniently modified in real time, realizing visual, interactive editing. A study object can have user-defined attributes, including but not limited to weight, speed, density, pressure, molecular weight, pH value, illumination, magnetic field strength, resistivity, and hardness, and arbitrary associations can be established between any attributes; attribute values can be assigned and modified directly in real time, or adjusted in a customized way by supplying a mathematical function. Various playback modes are supported, such as automatic, manual, timed, sequential, and hot-key-triggered playback, as well as combined playback of broadcast items, realizing various combined animation playback modes. Editing and playback are controlled through time tracks and event-triggered modes, realizing complex-logic dynamic virtual simulation of real scenes, for virtual simulation and data visualization; programming-free real-time connection of external data to any attribute is realized; real-time fusion of fused media such as social media and website information is realized; connection to motion-capture data is supported, for skeletal animation editing and production and for biological virtual simulation. The edited VR content can be played on various types of terminals, including but not limited to VR all-in-one machines, helmets, stereoscopic LED large screens/projection, and naked-eye 3D screens.
Taking courseware production as an example: add a lecture-hall 3D template (comprising a blackboard and a lectern); add a virtual large-screen model for playing and displaying the courseware; select a combined image of the target object from the AI digital person image package, and add the blue background preview video corresponding to that combined image to the current 3D template, generating a broadcast item. Set the relative positions of the three-dimensional lecture scene, the virtual large-screen model, and the AI digital person within the 3D template, and set entrance and exit animations for the virtual large screen (adjusting the parameters of model elements in the 3D template). Add the text content of the lecture to obtain the blue background AI digital person video and control instructions, and integrate the matted AI digital person video with the three-dimensional lecture scene, the text content, and the virtual large-screen model. Add a subtitle template and adjust its position, for displaying the target object's manuscript subtitles when generating the video. The virtual large screen displays the courseware content: pictures, videos, PPT, and so on. Finally, the composite video corresponding to the broadcast item is generated.
The above editing and production process can further generate a template: the content that needs to be changed or replaced when generating a video is exported as template parameters, and videos with different content can then be produced by modifying and setting those template parameters through a simple web page or mobile phone App, avoiding the complex editing and production process.
In the templated-editing-based AI digital person video generation method provided by the embodiment of the present invention, audio and video data of a target object are collected for AI model training to generate an AI digital person image package; templated image-text video editing and previewing are performed on the 3D template and the AI digital person image package to produce a play list containing broadcast items; broadcast items containing blue background preview videos are combined with text content to generate broadcast item manuscripts; AI reasoning is performed according to the AI digital person image model and each broadcast item manuscript to generate blue background AI digital person videos and control instructions; and the blue background AI digital person video is matted and then rendered and composited with the corresponding 3D template to obtain the composite video. In this way, composite videos containing AI digital person video are generated through templated editing and production, and AI digital person video content can be produced rapidly in batches merely by modifying template parameters, greatly improving the generation efficiency of AI digital person video content.
The embodiment of the invention also provides a template-editing-based AI digital human video generation device which can be configured in a user terminal or a management server and is used for executing any embodiment of the template-editing-based AI digital human video generation method. Specifically, referring to fig. 6, fig. 6 is a schematic block diagram of an AI digital human video generating apparatus based on templated editing according to an embodiment of the present invention.
As shown in fig. 6, the AI digital person video generation apparatus 100 based on templated editing includes an image package generating unit 110, a play list generating unit 120, a broadcast item manuscript generating unit 130, a play-out template acquisition unit 140, and a composite video generating unit 150.
The image package generating unit 110 is configured to generate, according to a preset AI model, an AI digital person image package corresponding to the audio and video data of the target objects collected in a blue box, where the AI digital person image package includes a plurality of combined images corresponding to each target object and a blue background preview video corresponding to each combined image.
In a specific embodiment, the image package generating unit 110 includes a subunit: the AI model training unit is used for training the AI model by using the audio and video data of the target object acquired in the blue box so as to obtain AI digital portrait models corresponding to each combined image; and the blue background preview video generation unit is used for respectively generating blue background preview videos corresponding to each combined image according to the AI digital portrait model, and each combined image is obtained by combining gestures, limb actions and expressions.
The play list generating unit 120 is configured to perform templatized image-text video editing and previewing on the 3D template and each blue background preview video in the AI digital avatar packet, so as to make a play list composed of play items corresponding to each blue background preview video.
In a specific embodiment, the play list generating unit 120 includes a subunit: and the scene element replacing unit is used for acquiring a blue background preview video of a combined image corresponding to each image identification in the AI digital person image package as a scene element to replace the image identification according to the image identification in the 3D template. And the broadcasting item generating unit is used for generating a video frame sequence with an Alpha channel by matting the blue background preview video added into the 3D template, combining the video frame sequence with the Alpha channel with the three-dimensional virtual element in the 3D template, and editing and previewing the templated text video so as to manufacture broadcasting items corresponding to each target object.
And a broadcast item manuscript generation unit 130, configured to combine a broadcast item including a blue background preview video with text content to generate a broadcast item manuscript corresponding to each broadcast item, where the text content includes text, emotion, and limb motion.
And a play-out template acquisition unit 140 for deriving the play-out items and changeable/replaceable contents in the play-out item manuscript as template parameters to convert the play-out list into a play-out template.
A composite video generating unit 150, configured to perform AI reasoning according to each of the broadcast item documents in the broadcast list or each of the broadcast item documents in the broadcast list corresponding to the broadcast template, so as to generate a corresponding blue background AI digital personal video and a control instruction, or generate corresponding facial expression data, skeleton model data and a control instruction; after carrying out image matting on the blue background AI digital human video corresponding to each broadcasting item manuscript, carrying out rendering synthesis on the 3D template corresponding to each broadcasting item manuscript to obtain a synthesized video corresponding to each broadcasting item; or, the facial expression data and the skeleton model data corresponding to each broadcasting item manuscript are used for driving the AI digital person 3D model to be rendered and synthesized with the 3D template corresponding to the broadcasting item manuscript, so that a synthesized video corresponding to each broadcasting item is obtained; the control instruction is used for performing play control on the broadcast content when the synthesized video is generated.
In a specific embodiment, the composite video generating unit 150 includes a subunit: and the image matting unit is used for matting the blue background AI digital human video corresponding to each broadcasting item manuscript to obtain video matting. The adjusting and configuring unit is used for adjusting the 3D template and configuring parameters; and the video generation unit is used for carrying out corresponding value change on the attribute and/or the state of the research object in the 3D template according to the configured parameters based on VR simulation and VR animation generated by the 3D template and the video matting, and obtaining the synthesized video corresponding to each broadcasting item.
In a specific embodiment, the adjustment configuration unit includes a subunit: the adjusting unit is used for adjusting the attribute and/or state value of the research object through an internally set relation so as to update the attribute and/or state value of the research object in real time; and/or, adjusting through the connection relation between the 3D templates to update the attribute and/or state value of the research object in real time; and/or adjusting based on the connection relation between the attribute and/or state value of the research object and external data and/or external algorithm so as to update the attribute and/or state value of the research object in real time; the first position parameter configuration unit is used for configuring relative position parameters between the video matting and the 3D template; and the second position parameter configuration unit is used for configuring the position parameters of the caption templates in the 3D templates.
The templated-editing-based AI digital person video generation apparatus provided by the embodiment of the present invention applies the above method: audio and video data of a target object are collected for AI model training to generate an AI digital person image package; templated image-text video editing and previewing are performed on the 3D template and the AI digital person image package to produce a play list containing broadcast items; broadcast items containing blue background preview videos are combined with text content to generate broadcast item manuscripts; AI reasoning is performed according to the AI digital person image model and each broadcast item manuscript to generate blue background AI digital person videos and control instructions; and the blue background AI digital person video is matted and then rendered and composited with the corresponding 3D template to obtain the composite video. In this way, composite videos containing AI digital person video are generated through templated editing and production, and AI digital person video content can be produced rapidly in batches merely by modifying template parameters, greatly improving the generation efficiency of AI digital person video content.
The above-described AI digital human video generating apparatus based on templated editing may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 7.
Referring to fig. 7, fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device may be a user terminal or a management server for performing the templated-editing-based AI digital person video generation method, to generate AI digital person video content in a templated-editing manner.
With reference to FIG. 7, the computer device 500 includes a processor 502, a memory, and a network interface 505, connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a templated editing-based AI digital human video generation method, wherein the storage medium 503 may be a volatile storage medium or a non-volatile storage medium.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform the templated editing-based AI digital human video generation method.
The network interface 505 is used for network communication, such as providing for transmission of data information, etc. It will be appreciated by those skilled in the art that the architecture shown in fig. 7 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting of the computer device 500 to which the present inventive arrangements may be implemented, as a particular computer device 500 may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
The processor 502 is configured to execute a computer program 5032 stored in a memory, so as to implement the corresponding functions in the above-mentioned template editing-based AI digital human video generation method.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 7 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 7, and will not be described again.
It should be appreciated that in an embodiment of the invention, the processor 502 may be a central processing unit (Central Processing Unit, CPU), the processor 502 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium. The computer readable storage medium stores a computer program which when executed by a processor implements the steps included in the above-described templated editing-based AI digital human video generation method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a logical function division, and there may be other division manners in actual implementation; units having the same function may be integrated into one unit; multiple units or components may be combined or integrated into another system; and some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a computer-readable storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned computer-readable storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disk, or other media capable of storing program code.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims (10)

1. An AI digital human video generation method based on templated editing, which is characterized by comprising the following steps:
generating an AI digital person image package corresponding to audio and video data of a target object acquired in a blue box according to a preset AI model, wherein the AI digital person image package comprises a plurality of combined images corresponding to each target object and a blue background preview video corresponding to each combined image; the AI digital person image package comprises a 2D AI digital person image data package and/or a 3D AI digital person image data package;
performing templated image-text video editing and previewing on the 3D template and each blue background preview video in the AI digital person image package to produce a broadcasting list composed of broadcasting items corresponding to each blue background preview video;
combining each broadcasting item containing a blue background preview video with text content to generate a broadcasting item manuscript corresponding to each broadcasting item, wherein the text content comprises text, emotion and limb actions;
exporting the changeable/replaceable contents in the broadcasting items and the broadcasting item manuscripts as template parameters so as to convert the broadcasting list into a broadcasting template;
performing AI inference according to each broadcasting item manuscript in a broadcasting list, or each broadcasting item manuscript in a broadcasting list corresponding to a broadcasting template, to generate a corresponding blue background AI digital person video and control instruction, or to generate corresponding facial expression data, skeleton model data and control instruction; after performing image matting on the blue background AI digital person video corresponding to each broadcasting item manuscript, rendering and synthesizing it with the 3D template corresponding to the broadcasting item manuscript to obtain a synthesized video corresponding to each broadcasting item; or driving the AI digital person 3D model with the facial expression data and skeleton model data corresponding to each broadcasting item manuscript and rendering and synthesizing it with the 3D template corresponding to the broadcasting item manuscript to obtain a synthesized video corresponding to each broadcasting item; the control instruction is used for play control of the broadcast content when the synthesized video is generated.
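As an illustrative sketch only (not part of the claims, and all names such as `BroadcastItem`, `export_template` and `instantiate` are hypothetical), the step of exporting the changeable/replaceable contents of broadcasting items as template parameters, and later filling those parameters to obtain a new broadcasting list, could look like this:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class BroadcastItem:
    persona_id: str  # which combined image's preview video to use
    text: str        # narration text fed to AI inference
    emotion: str     # emotion tag, e.g. "neutral"
    gesture: str     # limb-action tag, e.g. "wave"

def export_template(items, fields):
    """Export the named changeable fields of each item as template parameters."""
    template = []
    for i, item in enumerate(items):
        params = {f: f"${{item{i}.{f}}}" for f in fields}
        template.append(replace(item, **params))
    return template

def instantiate(template, values):
    """Fill a broadcasting template's parameters to obtain a new broadcasting list."""
    result = []
    for item in template:
        filled = {}
        for name, v in vars(item).items():
            if isinstance(v, str) and v.startswith("${") and v.endswith("}"):
                filled[name] = values[v[2:-1]]  # raises KeyError if a parameter is missing
            else:
                filled[name] = v  # non-parameter fields are kept as-is
        result.append(BroadcastItem(**filled))
    return result
```

Under this sketch, the same template can be re-instantiated with different manuscripts while the fixed fields (e.g. the persona) carry over unchanged.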
2. The templated editing-based AI digital person video generation method according to claim 1, wherein generating the AI digital person image package corresponding to the audio and video data of the target object collected in the blue box according to the preset AI model comprises:
training the AI model by using the audio and video data of the target object collected in the blue box to obtain an AI digital person image model corresponding to each combined image;
and respectively generating a blue background preview video corresponding to each combined image according to the AI digital person image model, wherein each combined image is obtained by combining postures, limb actions and expressions.
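For illustration only (not part of the claims; the tag values below are hypothetical), the combined images of claim 2 can be enumerated as the Cartesian product of posture, limb-action and expression tags, with one blue background preview video rendered per combination:

```python
from itertools import product

poses = ["standing", "seated"]
gestures = ["hands_down", "pointing"]
expressions = ["neutral", "smiling"]

# Each combined image is one (pose, gesture, expression) combination.
combined_images = [
    {"pose": p, "gesture": g, "expression": e}
    for p, g, e in product(poses, gestures, expressions)
]
# One blue-background preview video would be rendered per combination.
```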
3. The templated editing-based AI digital person video generation method according to claim 1, wherein performing templated image-text video editing and previewing on the 3D template and each blue background preview video in the AI digital person image package to produce a broadcasting list composed of broadcasting items corresponding to each of the blue background preview videos comprises:
obtaining, according to the image identifiers in the 3D template, a blue background preview video of the combined image corresponding to each image identifier in the AI digital person image package as a scene element to replace the image identifier;
and after performing image matting on the blue background preview video added to the 3D template to generate a video frame sequence with an Alpha channel, combining the video frame sequence with the Alpha channel with the three-dimensional virtual elements in the 3D template, and performing templated image-text video editing and previewing to produce the broadcasting items corresponding to each target object.
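A minimal sketch of the blue-background matting step (not the claimed implementation — production keyers handle spill, edges and noise far more carefully; `blue_key_matte` and its `margin` parameter are hypothetical):

```python
import numpy as np

def blue_key_matte(frame_rgb: np.ndarray, margin: int = 40) -> np.ndarray:
    """Naive chroma key: frame_rgb is (H, W, 3) uint8; returns (H, W, 4) uint8 RGBA
    whose Alpha channel is 0 where blue dominates (background)."""
    r = frame_rgb[..., 0].astype(np.int16)
    g = frame_rgb[..., 1].astype(np.int16)
    b = frame_rgb[..., 2].astype(np.int16)
    # A pixel is background when blue exceeds both other channels by `margin`.
    background = (b - np.maximum(r, g)) > margin
    alpha = np.where(background, 0, 255).astype(np.uint8)
    return np.dstack([frame_rgb, alpha])
```

Applied per frame, this yields the video frame sequence with an Alpha channel that is then combined with the three-dimensional virtual elements.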
4. The templated editing-based AI digital person video generation method of claim 1, wherein the broadcasting modes for broadcasting the broadcasting items in the broadcasting list include automatic broadcasting, manual broadcasting, timed broadcasting, sequential broadcasting, hot-key-triggered broadcasting and VR-controller-triggered broadcasting, and a broadcasting item can be broadcast alone or in combination with other broadcasting items.
5. The templated editing-based AI digital person video generation method of claim 1, wherein the control instruction is used for play control of PPT annotation, animation playback, text display, or video playback.
6. The templated editing-based AI digital person video generation method according to claim 1, wherein performing image matting on the blue background AI digital person video corresponding to each of the broadcasting item manuscripts and then rendering and synthesizing with the 3D template corresponding to the broadcasting item manuscript to obtain the synthesized video corresponding to each broadcasting item comprises:
performing image matting on the blue background AI digital person video corresponding to each broadcasting item manuscript to obtain a video matte;
adjusting the 3D template and configuring parameters;
and according to the configured parameters, triggering corresponding value changes of the attributes and/or states of the research objects in the 3D template based on the VR simulation and VR animation generated from the 3D template and the video matte, to obtain the synthesized video corresponding to each broadcasting item.
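The rendering-and-synthesis step of claim 6 reduces, per frame, to compositing the matted digital-person frame over the frame rendered from the 3D template. A sketch using the standard "over" operator (illustrative only; `composite_over` is a hypothetical name, and real renderers composite inside the 3D scene rather than in 2D):

```python
import numpy as np

def composite_over(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Alpha-composite a matted foreground (H, W, 4 uint8) over a rendered
    background (H, W, 3 uint8), returning the synthesized (H, W, 3 uint8) frame."""
    alpha = fg_rgba[..., 3:4].astype(np.float32) / 255.0  # (H, W, 1) for broadcasting
    fg = fg_rgba[..., :3].astype(np.float32)
    bg = bg_rgb.astype(np.float32)
    out = fg * alpha + bg * (1.0 - alpha)  # "over" operator
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```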
7. The templated editing-based AI digital person video generation method of claim 6, wherein adjusting the 3D template and configuring parameters comprises:
adjusting the attribute and/or state values of the research object through internally set relations so that they are updated in real time; and/or adjusting through the connection relations between 3D templates so that the attribute and/or state values of the research object are updated in real time; and/or adjusting based on the connection relations between the attribute and/or state values of the research object and external data and/or external algorithms so that the attribute and/or state values of the research object are updated in real time;
configuring the relative position parameters between the video matte and the 3D template;
and configuring the position parameters of the caption template in the 3D template.
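Purely as an illustration of the parameter configuration in claim 7 (not the claimed data format — every field name below is hypothetical), the relative placement of the video matte and the caption-template position could be expressed as a plain parameter record:

```python
# Hypothetical configuration record for one broadcasting item.
render_config = {
    "matte_placement": {             # relative position of the video matte
        "anchor": "bottom_center",   # within the 3D template's scene
        "offset": [0.0, -0.1, 0.5],  # scene-space x, y, z offset
        "scale": 1.0,
    },
    "caption_template": {
        "position": [0.5, 0.9],      # normalized screen coordinates
        "max_lines": 2,
    },
}
```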
8. An AI digital human video generation device based on templated editing, the device comprising:
the image package generation unit is used for generating an AI digital person image package corresponding to the audio and video data of the target object collected in the blue box according to a preset AI model, wherein the AI digital person image package comprises a plurality of combined images corresponding to each target object and a blue background preview video corresponding to each combined image; the AI digital person image package comprises a 2D AI digital person image data package and/or a 3D AI digital person image data package;
the broadcasting list generation unit is used for performing templated image-text video editing and previewing on the 3D template and each blue background preview video in the AI digital person image package to produce a broadcasting list composed of broadcasting items corresponding to each blue background preview video;
the broadcasting item manuscript generation unit is used for combining each broadcasting item containing a blue background preview video with text content to generate a broadcasting item manuscript corresponding to each broadcasting item, wherein the text content comprises text, emotion and limb actions;
the broadcasting template acquisition unit is used for exporting the changeable/replaceable contents in the broadcasting items and the broadcasting item manuscripts as template parameters so as to convert the broadcasting list into a broadcasting template;
the synthesized video generation unit is used for performing AI inference according to each broadcasting item manuscript in the broadcasting list, or each broadcasting item manuscript in the broadcasting list corresponding to the broadcasting template, to generate a corresponding blue background AI digital person video and control instruction, or to generate corresponding facial expression data, skeleton model data and control instruction; after performing image matting on the blue background AI digital person video corresponding to each broadcasting item manuscript, rendering and synthesizing it with the 3D template corresponding to the broadcasting item manuscript to obtain a synthesized video corresponding to each broadcasting item; or driving the AI digital person 3D model with the facial expression data and skeleton model data corresponding to each broadcasting item manuscript and rendering and synthesizing it with the 3D template corresponding to the broadcasting item manuscript to obtain a synthesized video corresponding to each broadcasting item; the control instruction is used for play control of the broadcast content when the synthesized video is generated.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the templated editing-based AI digital person video generation method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements the templated editing-based AI digital person video generation method according to any one of claims 1 to 7.
CN202210039411.6A 2022-01-13 2022-01-13 AI digital person video generation method, device and equipment based on templated editing Active CN114363712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210039411.6A CN114363712B (en) 2022-01-13 2022-01-13 AI digital person video generation method, device and equipment based on templated editing


Publications (2)

Publication Number Publication Date
CN114363712A CN114363712A (en) 2022-04-15
CN114363712B true CN114363712B (en) 2024-03-19

Family

ID=81110091



Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114885206A (en) * 2022-04-24 2022-08-09 上海墨百意信息科技有限公司 Audio and video synthesis method, device and system and storage medium
CN114900733B (en) * 2022-04-28 2023-07-21 北京生数科技有限公司 Video generation method, related device and storage medium
CN115442519B (en) * 2022-08-08 2023-12-15 珠海普罗米修斯视觉技术有限公司 Video processing method, apparatus and computer readable storage medium
CN115810074A (en) * 2023-02-06 2023-03-17 武汉织梦动漫文化设计有限公司 Digital animation production method of digital ink-imitating woodcut
CN116193098B (en) * 2023-04-23 2023-07-21 子亥科技(成都)有限公司 Three-dimensional video generation method, device, equipment and storage medium
CN117077722A (en) * 2023-09-07 2023-11-17 北京中科江南信息技术股份有限公司 AI digital intelligent person construction method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080010564A (en) * 2006-07-27 2008-01-31 주식회사 와이즌와이드 System for multimedia naration using 3d virtual agent and method thereof
KR20080104415A (en) * 2007-03-26 2008-12-03 이병환 System and method of editing moving picture and recording medium having the method embodied program
KR20120130627A (en) * 2011-05-23 2012-12-03 한국전자통신연구원 Apparatus and method for generating animation using avatar
CN110059199A (en) * 2019-04-11 2019-07-26 深圳迪乐普智能科技有限公司 A kind of implementation method and 3D PowerPoint of 3D PowerPoint
CN110996112A (en) * 2019-12-05 2020-04-10 成都市喜爱科技有限公司 Video editing method, device, server and storage medium
CN112669165A (en) * 2019-09-27 2021-04-16 徐蔚 Unified access method applying digital personal code chain
CN113572977A (en) * 2021-07-06 2021-10-29 上海哔哩哔哩科技有限公司 Video production method and device


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Research on the self-presentation behavior of "AI face-swapping" users in gaze-aware contexts; Deng Xiujun; Liu Mengqi; Modern Communication (Journal of Communication University of China); 2020-08-15 (08); full text *
Synthesizing video from image sequences using DES; Zhang Yujing; Chen Shengjian; Computer Technology and Development; 2010-06-10 (06); full text *
Research on a video editing and playback system based on intermediate files; Wang Daini; Cong Zhihai; Chang Chang; New Media Research; 2018-08-23 (15); full text *
Application of a networked automated caption production and broadcasting system on the broadcast line; Yan Xiaoyang; Chen Naxin; Zhang Xiaodong; Li Xin; Radio & TV Broadcast Engineering; 2008-12-15 (12); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant