CN114741541B - Method and device for interactive control of AI digital person to PPT based on templated editing - Google Patents

Method and device for interactive control of AI digital person to PPT based on templated editing

Info

Publication number: CN114741541B
Application number: CN202210369117.1A
Authority: CN (China)
Prior art keywords: digital person, PPT, page, frame, text content
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN114741541A
Inventor: 吴天生
Current Assignee: Guangdong Polytechnic Normal University (the listed assignees may be inaccurate)
Original Assignee: Guangdong Polytechnic Normal University
Application filed by Guangdong Polytechnic Normal University
Priority: CN202210369117.1A; published as CN114741541A, granted as CN114741541B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/40: Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43: Querying
    • G06F 16/438: Presentation of query results
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484: Interaction techniques based on GUIs for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and device for interactive control of a PPT by an AI digital person based on templated editing. The method comprises the following steps: determining the attribute connection relations between media material elements; generating a play template from the configured combination of display areas and display modes; extracting page information from a PPT file; constructing a 2D mapping relation between the lecture text content and each PPT page; generating, by inference from the lecture text content, AI digital person video frames, AI digital person voice frames, and subtitle frames; adding the AI digital person video frames and the page pictures to the play template for display; adding a prompt graphic and a laser-pointer mark to the corresponding display area according to the 2D mapping relation to generate an image frame; and outputting the image frame synchronously with the AI digital person voice frame. The invention belongs to the technical field of artificial intelligence. Because the prompt graphic and laser-pointer mark are placed according to the 2D mapping relation, the PPT background material can be controlled synchronously while the video is synthesized, which greatly extends the application functions of AI digital person videos.

Description

Method and device for interactive control of AI digital person to PPT based on templated editing
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and device for interactive control of a PPT by an AI digital person based on templated editing.
Background
The workflow of existing AI virtual anchor and virtual-image products is: anchor video acquisition -> data processing -> model training -> image output. In the production stage, videos are generated from the trained anchor image by adding selectable anchor emotions, background pictures, videos, standing or sitting postures, and so on, according to the input script and voice. In current AI digital person video generation systems, however, data flows in one direction only: input pictures and videos serve merely as a background to be composited with the AI digital person. Background materials such as 3D pictures, videos, and 3D templates cannot be controlled synchronously during synthesis, and in particular the PPT content cannot be marked in synchrony with the narration. The generated AI digital person videos therefore lack interactive functions, which limits their range of application. In short, the prior art cannot synchronously control PPT background material while synthesizing an AI digital person video.
Disclosure of Invention
The embodiments of the invention provide a method, device, equipment, and medium for interactive control of a PPT (PowerPoint presentation) by an AI (artificial intelligence) digital person based on templated editing, aiming to solve the prior-art problem that the PPT background material cannot be controlled synchronously while the AI digital person video is synthesized.
In a first aspect, an embodiment of the present invention provides a method for interactive control of a PPT by an AI digital person based on templated editing, the method comprising:
Importing media material elements and generating the attribute connections, attribute simulations, and combined animations corresponding to the media material elements according to specific logical relations and state-change requirements;
Setting, according to input setting parameters, the display area and display mode of the PPT page and the combination mode of the AI digital person with the attribute connections, attribute simulations, and combined animations, so as to generate a corresponding play template;
Importing a PPT file into the play template and extracting the page information corresponding to each PPT page in the PPT file, wherein the page information comprises the page picture and the lecture text content corresponding to each PPT page;
Establishing a 2D mapping relation between the lecture text content and the PPT pages;
Sending the lecture text content to an AI digital person inference model to generate the AI digital person video frame, AI digital person voice frame, and subtitle frame corresponding to each rendered composite frame;
Displaying the AI digital person video frame in the AI digital person display area of the play template according to the configured combination mode of the AI digital person with the attribute connections, attribute simulations, and combined animations;
Displaying the page picture in the PPT display area of the play template according to the configured display area and display mode of the PPT page;
Acquiring, according to the 2D mapping relation between the lecture text content and the PPT page, the region coordinates on the PPT page of the text corresponding to the current subtitle frame, and generating a prompt graphic for the text corresponding to the current AI digital person voice frame;
Rendering, according to the same 2D mapping relation, a circular bright spot as a laser-pointer mark at the region coordinates on the PPT page corresponding to the subtitle text, so as to render and generate an image frame;
and outputting the rendered composite image frame synchronously with the current AI digital person voice frame to a video board card, recording it to a file, or streaming it over the network.
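The data flow of the steps above can be sketched end to end. The following is a minimal, purely illustrative Python sketch; every name and structure here is a hypothetical placeholder invented for this sketch, not the patent's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class RenderFrame:
    video_frame: str   # placeholder for an AI digital person video frame
    voice_frame: str   # placeholder for an AI digital person voice frame
    subtitle: str      # subtitle text for this composite frame

def build_play_template(display_area, display_mode):
    """First two steps: bundle display settings into a play template."""
    return {"display_area": display_area, "display_mode": display_mode, "pages": []}

def import_ppt(template, pages):
    """Import step: attach (page picture, lecture text) pairs to the template."""
    template["pages"] = list(pages)
    return template

def synthesize(template, mapping, frames):
    """Compositing steps: pair each frame with the page region its subtitle maps to."""
    composited = []
    for f in frames:
        region = mapping.get(f.subtitle, (0, 0))  # 2D mapping lookup
        composited.append((f.video_frame, f.voice_frame, f.subtitle, region))
    return composited

template = build_play_template((0, 0, 1920, 1080), "fullscreen")
template = import_ppt(template, [("page1.png", "hello world")])
mapping = {"hello": (120, 80)}                # subtitle text -> page coordinates
frames = [RenderFrame("v0", "a0", "hello")]   # stand-ins for inferred frames
print(synthesize(template, mapping, frames))
```

The point of the sketch is only the ordering: the template exists before the PPT is imported, and the 2D mapping is consulted once per composite frame.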
In a second aspect, an embodiment of the present invention provides a device for interactive control of a PPT by an AI digital person based on templated editing, comprising:
a media material element attribute acquisition unit for importing media material elements and generating the attribute connections, attribute simulations, and combined animations corresponding to the media material elements according to specific logical relations and state-change requirements;
a play template generating unit for setting, according to input setting parameters, the display area and display mode of the PPT page and the combination mode of the AI digital person with the attribute connections, attribute simulations, and combined animations, so as to generate a corresponding play template;
a page information extraction unit for importing a PPT file into the play template and extracting the page information corresponding to each PPT page in the PPT file, wherein the page information comprises the page picture and the lecture text content corresponding to each PPT page;
a mapping relation construction unit for establishing a 2D mapping relation between the lecture text content and the PPT pages;
an inference generation unit for sending the lecture text content to the AI digital person inference model so as to generate the AI digital person video frame, AI digital person voice frame, and subtitle frame corresponding to each rendered composite frame;
an AI digital person video frame display unit for displaying the AI digital person video frame in the AI digital person display area of the play template according to the configured combination mode of the AI digital person with the attribute connections, attribute simulations, and combined animations;
a page picture display unit for displaying the page picture in the PPT display area of the play template according to the configured display area and display mode of the PPT page;
a prompt graphic generation unit for acquiring, according to the 2D mapping relation between the lecture text content and the PPT page, the region coordinates on the PPT page of the text corresponding to the current subtitle frame, and generating a prompt graphic for the text corresponding to the current AI digital person voice frame;
an image frame rendering unit for rendering, according to the same 2D mapping relation, a circular bright spot as a laser-pointer mark at the region coordinates on the PPT page corresponding to the subtitle text, so as to render and generate an image frame;
and an output unit for outputting the rendered composite image frame synchronously with the current AI digital person voice frame to a video board card, recording it to a file, or streaming it over the network.
In a third aspect, an embodiment of the present invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the method for interactive control of a PPT by an AI digital person based on templated editing according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the method for interactive control of a PPT by an AI digital person based on templated editing according to the first aspect.
The embodiments of the invention provide a method, device, equipment, and medium for interactive control of a PPT by an AI digital person based on templated editing. The attribute connection relations between media material elements are determined; a play template is generated from the configured combination of display areas and display modes; page information is extracted from a PPT file; a 2D mapping relation between the lecture text content and each PPT page is constructed; AI digital person video frames, voice frames, and subtitle frames are generated by inference from the lecture text content; the AI digital person video frames and the page pictures are added to the play template for display; a prompt graphic and a laser-pointer mark are added to the corresponding display area of the PPT page according to the 2D mapping relation to render an image frame; and the image frame is output synchronously with the AI digital person voice frame. In this way, page information can be extracted from the PPT file, the corresponding 2D mapping relation can be constructed, and the prompt graphic and laser-pointer mark can be placed in the corresponding display area of the PPT page according to that mapping, so that the PPT background material is controlled synchronously while the AI digital person video is synthesized, greatly extending the application functions of AI digital person videos.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of the method for interactive control of a PPT by an AI digital person based on templated editing provided by an embodiment of the invention;
FIG. 2 is a sub-flowchart of the method provided by an embodiment of the invention;
FIG. 3 is another flowchart of the method provided by an embodiment of the invention;
FIG. 4 is another sub-flowchart of the method provided by an embodiment of the invention;
FIG. 5 is another sub-flowchart of the method provided by an embodiment of the invention;
FIG. 6 is another sub-flowchart of the method provided by an embodiment of the invention;
FIG. 7 is a schematic block diagram of the device for interactive control of a PPT by an AI digital person based on templated editing provided by an embodiment of the invention;
FIG. 8 is a schematic block diagram of a computer device provided by an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to FIG. 1, FIG. 1 is a flowchart of the method for interactive control of a PPT by an AI digital person based on templated editing according to an embodiment of the invention. The method is applied to a user terminal or a management server and is executed by application software installed on it. The user terminal can be a terminal device such as a desktop computer, notebook computer, tablet computer, or mobile phone; the management server is a server deployed within an enterprise that executes the method, acquiring the parameter information and PPT file uploaded by the user terminal to generate the corresponding video for output to a video board card, recording to a file, or network streaming. As shown in FIG. 1, the method includes steps S101 to S110.
S101, importing media material elements and generating attribute connection, attribute simulation and combined animation corresponding to the media material elements according to specific logic relation and state change requirements.
Specifically, the user can import various media material elements through the template making unit. The media material elements include pictures, videos, Flash, web pages, streaming media, 3D models, animations, sounds, and so on. The various attribute connections, attribute simulations, and combined animations of these elements are generated according to specific logical relations and state-change requirements.
S102, setting a display area and a display mode of the PPT page and a combination mode of the AI digital person and the attribute connection, the attribute simulation and the combined animation according to the input setting parameters, so as to generate a corresponding playing template.
The user sets parameters through the template making unit: according to the input setting parameters, the display area and mode of the PPT page (e.g. windowed or full screen) and the compositing mode of the AI digital person with the graphic elements (e.g. picture-in-picture, full screen, etc.) are configured, so as to generate a corresponding play template. The graphic elements here are the attribute connections, attribute simulations, and combined animations.
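The setting parameters described above can be pictured as one small configuration object. The field names below are assumptions made for illustration, not the patent's actual schema.

```python
from dataclasses import dataclass

@dataclass
class PlayTemplate:
    ppt_area: tuple    # (x, y, width, height) of the PPT display area
    ppt_mode: str      # "windowed" or "fullscreen"
    person_mode: str   # compositing mode, e.g. "picture-in-picture"
    animations: list   # attribute connections / simulations / combined animations

    def validate(self):
        # Reject display modes outside the two modes named in the text.
        assert self.ppt_mode in ("windowed", "fullscreen")
        assert self.person_mode in ("picture-in-picture", "fullscreen")
        return True

tpl = PlayTemplate((0, 0, 1280, 720), "windowed", "picture-in-picture", ["fade-in"])
print(tpl.validate())
```

A real template would carry far more state (per-element animation timelines, z-order, and so on); the sketch only shows that the template is a reusable bundle of these settings.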
S103, importing a PPT file into the play template, and extracting page information corresponding to each PPT page in the PPT file, wherein the page information comprises page pictures and lecture text contents corresponding to each PPT page.
The play template produced above can be opened in the rendering and compositing unit, and the user can import a PPT file into it. The PPT file comprises a number of PPT pages; the page picture of each PPT page is obtained from the PPT file, and the lecture text content of each page is then extracted.
In one embodiment, as shown in FIG. 2, step S103 includes sub-steps S131 and S132.
S131, generating a page picture corresponding to each page in the PPT file.
For example, each PPT page in the PPT file may be captured as a screenshot to obtain the corresponding page picture.
S132, extracting the manuscript text content of each page in the PPT file.
If the PPT page stores its content as text, the content can be read directly to obtain the corresponding lecture text; for a PPT page whose content is embedded as a picture, OCR (Optical Character Recognition) can be used to extract the lecture text from the page, or it can be obtained by manual entry.
S104, establishing a 2D mapping relation between the lecture text content and the PPT page.
Specifically, two virtual windows can be constructed: one displays the lecture text content and the other displays the PPT page picture. A text segment is selected in the lecture text window according to punctuation marks, and the display area coordinates of the corresponding segment are selected in the PPT page window, thereby establishing the 2D mapping relation from the lecture text to the PPT page.
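The mapping described above is essentially a table from punctuation-delimited text segments to page rectangles. A minimal sketch, assuming the user-selected rectangles arrive as (x, y, width, height) tuples in the order the segments occur:

```python
import re

def build_mapping(lecture_text, regions):
    """Split the script on sentence punctuation and zip each segment with
    the user-selected display rectangle (x, y, w, h) on the PPT page."""
    segments = [s.strip() for s in re.split(r"[,;.!?，；。！？]", lecture_text) if s.strip()]
    return dict(zip(segments, regions))

mapping = build_mapping("First point. Second point.",
                        [(10, 20, 300, 40), (10, 70, 300, 40)])
print(mapping)
```

At playback time this dictionary is consulted per subtitle fragment to find where the prompt graphic and laser-pointer mark should be drawn.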
S105, sending the manuscript text content to an AI digital person reasoning model to generate an AI digital person video frame, an AI digital person voice frame and a caption frame corresponding to the rendering synthetic frame.
The obtained lecture text content is sent to a pre-trained AI digital person inference model, which generates the corresponding AI digital person video, AI digital person voice, and subtitles; the corresponding AI digital person video frames, voice frames, and subtitle frames are then obtained from them.
In one embodiment, as shown in fig. 3, step S1501 is included before step S105.
S1501, generating, from a preset AI model, an AI digital person inference model corresponding to the audio and video data of each target object acquired in a blue box, wherein the AI digital person inference model comprises a plurality of combined images for each target object and a blue-background preview video for each combined image; the AI digital person inference model comprises a 2D AI digital person and/or a 3D AI digital person.
Specifically, the audio and video data of each target object are collected in a blue box, i.e. a single-colour backdrop for video shooting, which can be red, green, blue, or another single colour. An AI model is trained on the characteristics of the target object's audio and video data, yielding the AI digital person inference model, which can include an AI digital person image library.
Specifically, the video and audio data of the target object are collected and an AI digital person inference model of the target object is generated on an AI digital person technology platform. The training process for a 2D AI digital person inference model comprises: acquiring video and audio of the target object in a blue box and generating a 2D AI digital person image of the target object on the AI digital person technology platform; collecting the corresponding blue-box video matting parameters of the target object; generating an AI digital person image preview video; and combining these with the target object's ID number to produce an AI digital person inference model containing the 2D AI digital person image. The training process for a 3D AI digital person inference model comprises: shooting the target object's video from multiple angles, or 3D-scanning the target object, while collecting its audio, and generating a 3D AI digital person image on the AI digital person technology platform; generating an AI digital person image preview video; and combining these with the target object's ID number to produce an AI digital person inference model containing the 3D AI digital person image. The group of AI digital person images generated by training for a target object forms that object's AI digital persona; the images can be stored in the AI digital person image library of the inference model so that they can be looked up and used by ID number.
The AI digital person images are of two specific types, 2D real-person images and 3D real-person images, and the image library can store 2D and/or 3D AI digital person images. A 2D AI digital person image includes, but is not limited to, the 2D image itself, a digital person ID, the corresponding blue-box video matting parameters, and a preview video; a 3D AI digital person image includes, but is not limited to, the 3D image itself, a digital person ID, and a preview video.
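The blue-box matting parameters above imply a chroma-key step: pixels close to the backdrop colour become transparent so the AI digital person can be composited over the PPT background. The following is a hedged, per-pixel sketch; the key colour and distance threshold are illustrative stand-ins for the patent's per-persona matting parameters.

```python
def chroma_key(pixels, key=(0, 0, 255), threshold=120):
    """Return (r, g, b, a) pixels; alpha is 0 where a pixel is close to the
    key colour (Manhattan distance below the threshold), 255 otherwise."""
    out = []
    for r, g, b in pixels:
        dist = abs(r - key[0]) + abs(g - key[1]) + abs(b - key[2])
        alpha = 0 if dist < threshold else 255
        out.append((r, g, b, alpha))
    return out

# One blue-ish backdrop pixel and one skin-tone pixel.
frame = [(10, 20, 240), (200, 180, 60)]
print(chroma_key(frame))
```

A production matting step would operate on whole frames with soft edges and spill suppression; the point here is only what the stored matting parameters parameterize.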
Specifically, each target object may correspond to a plurality of combined images, where a combined image is formed by combining a specific posture, a specific limb action, and a specific expression: for example, the posture is sitting or standing, the limb action is a hand-raised indication or a nod, and the expression is smiling, serious, and so on. For instance, the target object (a real person) records losslessly compressed video in a blue box in segments, with a resolution of 1280x720, 1920x1080, or 3840x2160, where the head region requires a resolution of not less than 192 x 192 pixels; videos are recorded for combinations of postures such as standing and sitting, expressions such as smiling and surprise, and hand actions such as a hand-raised indication.
Each combined image corresponds to one segment of audio and video. The image features of each segment are extracted, and the image features and audio features of the same segment are input into the AI model for training: the features serve as input, the corresponding audio and video serve as the training target, and the parameter values of the AI model are adjusted by feedback from the loss between the video frames and audio output during training and the training target. Each combined image of each target object can thus be trained into an AI digital person inference model, and each inference model generates a corresponding blue-background preview video, i.e. a short video composed of video frames and audio output by the model, showing the AI digital person image against a blue background; the preview video's duration can be, for example, 10, 15, or 30 seconds. An object image label is added to each blue-background preview video so that the preview video can be identified and found quickly.
The three-dimensional model image of a target object can be built by shooting its video from multiple angles for three-dimensional modelling, or by a three-dimensional scanning method. Because this approach directly builds a three-dimensional model of the target object, during AI inference it can, besides generating a blue-background AI digital person video, directly generate facial expression data and skeleton data to drive an inference model based on the AI digital person 3D model, through which the video is rendered and composited.
The input lecture text can be analysed intelligently to obtain the emotion features, limb action features, and similar information corresponding to it; the lecture text together with these features is then sent to the AI inference unit, which infers the AI digital person data. The AI digital person data contains: 1) a blue-background AI digital person video, or AI digital person 3D expression data, AI digital person 3D limb action data, etc.; and 2) the voice data corresponding to the lecture text.
In one embodiment, as shown in fig. 4, step S105 includes sub-steps S151, S152, S153, and S154.
S151, inferring, from the AI digital person inference model, the blue-background AI digital person video that conforms to the text information in the lecture text content.
In one embodiment, step S151 specifically includes: inferring, from the AI digital person inference model according to the text information in the lecture text content, video that conforms to the mouth shapes, emotions, and limb actions of the text, and combining it to generate the blue-background AI digital person video.
Specifically, an AI digital person image can be selected from the image library of the inference model; according to the lecture text content, emotion, limb actions, and so on, videos of that image conforming to the mouth shapes, emotions, and limb actions of the text are inferred from the model and combined, thereby generating the blue-background AI digital person video corresponding to the image.
In an embodiment, step S151 may further specifically include: inferring, according to the text information in the lecture text content, facial expression data conforming to the mouth shapes and emotions of the text and skeleton model data conforming to its limb actions from the AI digital person inference model; and restoring the AI digital person's facial video from the facial expression data and its limb action video from the skeleton model data, so as to generate the blue-background AI digital person video by combination.
Specifically, an AI digital person image can be selected from the image library of the inference model. According to the lecture text content and emotion, the facial expression data of the image conforming to the mouth shapes and emotions of the text are inferred from the model; facial expression data are data recording the change features of the AI digital person's face, from which the facial video can be restored. Likewise, according to the limb actions in the broadcast item file, the skeleton model data of the image conforming to those limb actions are inferred from the model; skeleton model data are data recording the change features of the AI digital person's limb actions, from which the limb action video can be restored.
In addition, keywords in the lecture text content can be matched and analyzed to obtain corresponding control instructions, where the control instructions are used to control annotation, animation playback, text display or video playback in the PPT.
S152, performing speech synthesis according to the text information in the lecture text content and the audio features in the AI digital person reasoning model to obtain the AI digital person voice.
Specifically, an AI digital person image can be selected from the image library of the AI digital person reasoning model, and speech synthesis is performed according to the text information in the lecture text content and the audio characteristics of the selected image to obtain the AI digital person voice corresponding to that image.
S153, splitting the lecture text content into paragraphs to obtain subtitles.
The lecture text content can be split into a subtitle comprising a plurality of text fragments according to a maximum number of characters, where each fragment contains no more than the maximum number of characters. For example, with a maximum of 30 characters, punctuation marks such as commas, semicolons and periods are taken as split nodes to segment the lecture content, yielding a subtitle containing a plurality of corresponding text fragments.
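The splitting step above can be sketched in Python; the punctuation set and the greedy merging of short clauses up to the character limit are illustrative assumptions, not details fixed by the embodiment:

```python
import re

def split_subtitles(text, max_chars=30):
    """Split lecture text into subtitle fragments no longer than max_chars,
    using clause-ending punctuation marks as split nodes."""
    # Cut after common Chinese/English clause punctuation, keeping the marks.
    parts = [p for p in re.split(r"(?<=[,;.!?，；。！？])", text) if p]
    fragments, current = [], ""
    for part in parts:
        # Greedily merge clauses while the fragment stays within the limit.
        if current and len(current) + len(part) > max_chars:
            fragments.append(current)
            current = part
        else:
            current += part
    if current:
        fragments.append(current)
    return fragments
```

Joining the returned fragments reproduces the original text, so no lecture content is lost by the split.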
S154, acquiring the AI digital person video frame, AI digital person voice frame and subtitle frame corresponding to the current rendering composition frame according to the timestamps of the AI digital person video, the AI digital person voice and the subtitle.
Specifically, the AI digital person video and the AI digital person voice have equal duration, and each text fragment in the subtitle corresponds to a particular time point in the video or voice. A video segment is a combination of multiple video frames and a voice segment a combination of multiple voice frames, so the time point at which a text fragment aligns with its corresponding video or voice content can be taken as that fragment's timestamp. Splitting by these timestamps yields the AI digital person video frame, AI digital person voice frame and subtitle frame corresponding to the current rendering composition frame.
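A minimal sketch of the timestamp lookup described above; the video frame rate and audio frame duration are illustrative assumptions:

```python
import bisect

def frames_at_timestamp(t_seconds, video_fps=25.0, audio_frame_ms=40.0):
    """Map a rendering-composition-frame timestamp to the indices of the
    AI digital person video frame and voice frame it falls inside."""
    video_index = int(t_seconds * video_fps)
    audio_index = int(t_seconds * 1000.0 / audio_frame_ms)
    return video_index, audio_index

def subtitle_index_at(t_seconds, fragment_starts):
    """fragment_starts[i] is the start timestamp of subtitle fragment i
    (sorted ascending); return the fragment active at t_seconds."""
    return max(0, bisect.bisect_right(fragment_starts, t_seconds) - 1)
```

Because video and voice share one timeline, the same timestamp indexes all three streams consistently.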
S106, displaying the AI digital person video frame in the AI digital person display area of the play template according to the set combination mode of the AI digital person with the attribute connection, attribute simulation and combined animation.
Specifically, the AI digital person video frame is inserted into the AI digital person display area of the play template according to the combination mode of the AI digital person with the attribute connection, attribute simulation and combined animation, and displayed in the specified manner.
S107, displaying the page picture in the PPT display area of the play template according to the set display area and display mode of the PPT page.
Specifically, according to the display area and display mode of the PPT page set in the play template, the PPT page picture is inserted into the PPT display area of the play template and displayed in the specified display mode.
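The display-mode step can be illustrated with a small helper that computes where the page picture lands inside its display area; the "contain"/"cover" mode names and the centering behavior are assumptions for illustration:

```python
def fit_rect(src_w, src_h, area_w, area_h, mode="contain"):
    """Compute the rectangle (x, y, w, h) at which a page picture of size
    src_w x src_h is drawn inside a display area of size area_w x area_h.
    'contain' letterboxes so the whole page is visible; 'cover' fills the
    area and may crop the page."""
    scale_contain = min(area_w / src_w, area_h / src_h)
    scale_cover = max(area_w / src_w, area_h / src_h)
    scale = scale_contain if mode == "contain" else scale_cover
    w, h = src_w * scale, src_h * scale
    # Center the scaled picture inside the display area.
    return ((area_w - w) / 2, (area_h - h) / 2, w, h)
```

A 16:9 page placed in a 4:3 area under "contain" gets horizontal bars above and below, preserving the slide's aspect ratio.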
S108, acquiring the region coordinates of the text content corresponding to the subtitle frame in the PPT page according to the 2D mapping relation between the lecture text content and the PPT page, and generating a prompt graphic of the text content corresponding to the current AI digital person voice frame.
In one embodiment, as shown in FIG. 5, step S108 includes sub-steps S181 and S182.
S181, acquiring the region coordinates of the text content corresponding to the caption frame in the PPT page according to the 2D mapping relation between the lecture text content and the PPT page, and determining the geometric vertex parameters of the PPT page in the 3D space according to the camera parameters of the 3D rendering; s182, overlaying a prompt graphic corresponding to the AI digital person voice frame to a corresponding PPT page picture through 3D mapping, so as to realize synchronous marking of the AI digital person voice and the corresponding PPT lecture text.
Specifically, mapping between the 2D space and the 3D space can be realized based on a projection matrix, thereby constructing the camera parameters of the 3D rendering. The PPT page picture and the AI digital person video frame displayed by the play template are presented as a 2D space at the camera end, while a 3D positional relationship can be constructed between them and the virtual camera to form a 3D space. The camera parameters include the position and orientation of the camera in the 3D space; they determine what the camera sees in the 3D space and how its view is displayed on the screen. A corresponding prompt graphic can be generated according to the AI digital person voice: the region coordinates of the text content corresponding to the subtitle frame on the PPT page are first obtained from the 2D mapping relation between the lecture text content and the PPT page, the geometric vertex parameters of the PPT page in the 3D space (i.e., the three-dimensional coordinates of the page's four corners) are determined according to the camera parameters, and the prompt graphic is overlaid onto the corresponding PPT page picture by 3D mapping, thereby realizing synchronous marking of the AI digital person voice and the corresponding PPT lecture text.
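The page-region-to-screen mapping described above can be sketched as follows: a normalized 2D region coordinate on the page is lifted into 3D by bilinear interpolation over the page quad's four geometric vertices, then projected to pixel coordinates through a pinhole camera model. The corner ordering and the camera intrinsics (focal length, principal point) are illustrative assumptions:

```python
def page_point_to_3d(u, v, corners):
    """Map normalized 2D page coordinates (u, v in [0, 1]) to a 3D point by
    bilinear interpolation over the page quad's four corner vertices.
    corners = [top_left, top_right, bottom_right, bottom_left], each (x, y, z)."""
    tl, tr, br, bl = corners
    top = [tl[i] + (tr[i] - tl[i]) * u for i in range(3)]
    bot = [bl[i] + (br[i] - bl[i]) * u for i in range(3)]
    return tuple(top[i] + (bot[i] - top[i]) * v for i in range(3))

def project_to_screen(p, focal=800.0, cx=640.0, cy=360.0):
    """Pinhole projection of a camera-space point to pixel coordinates;
    the camera looks down +z from the origin."""
    x, y, z = p
    return (cx + focal * x / z, cy + focal * y / z)
```

A point at the center of a page quad facing the camera projects to the principal point, as expected for an axis-aligned view.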
S109, acquiring the text content corresponding to the subtitle frame according to the 2D mapping relation between the lecture text content and the PPT page, and rendering a circular bright point at its region coordinates on the PPT page as a laser pen mark, so as to render and generate an image frame.
In one embodiment, as shown in FIG. 6, step S109 includes substeps S191, S192, and S193.
S191, acquiring the region coordinates of the text content corresponding to the subtitle frame in the PPT page according to the 2D mapping relation between the lecture text content and the PPT page, and calculating the 2D coordinates of each subtitle character in the PPT page; S192, converting the 2D coordinates of the subtitle characters into 3D coordinates through the camera parameters of the 3D rendering and the geometric vertex parameters of the PPT page in the 3D space; S193, converting the 3D coordinates of the subtitle characters into screen coordinates through the camera parameters of the 3D rendering; rendering a circular highlight at these coordinates realizes the laser pen mark in the PPT page.
The mapping between the 2D space and the 3D space can be realized based on the projection matrix to obtain the camera parameters of the 3D rendering, in the same way as in the preceding steps. The region coordinates of the text content corresponding to the subtitle frame in the PPT page are obtained according to the 2D mapping relation between the lecture text content and the PPT page, from which the 2D coordinates of each subtitle character in the page are calculated. These 2D coordinates are then converted into 3D coordinates: a view matrix is obtained from the camera parameters, and the geometric coordinates of the subtitle characters in the 2D space are mapped into the 3D scene through the view matrix and the projection matrix. Next, the 3D coordinates of the subtitle characters are converted into screen coordinates by mapping the 3D scene coordinates back to 2D screen coordinates through the view matrix and projection matrix, completing the conversion from 2D coordinates on the PPT page to screen coordinates. Rendering a circular bright point at these screen coordinates displays a circular highlight in the video, thereby simulating the use of a laser pen in the PPT page.
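Once the screen coordinates are known, the simulated laser pen reduces to rendering a circular bright point into the image frame. A minimal software-rendering sketch (a real implementation would draw this during 3D rendering, e.g. on the GPU):

```python
def draw_laser_dot(frame, cx, cy, radius=6, color=(255, 64, 64)):
    """Render a filled circular bright point (the simulated laser pen) onto
    an RGB frame stored as a nested list: frame[y][x] = (r, g, b)."""
    h, w = len(frame), len(frame[0])
    # Only scan the bounding box of the circle, clamped to the frame.
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                frame[y][x] = color
    return frame
```

The clamping keeps the dot partially visible even when the marked text sits at a page edge.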
S110, synchronously outputting the rendered and synthesized image frame together with the current AI digital person voice frame to a video board card, recording them to a file, or generating a network push stream for output.
The image frames synthesized by rendering can be combined with the current AI digital person voice frames and synchronously output to a video board card, synchronously recorded into a video file, or synchronously generated as a network push stream output (such as a live video stream).
In the interactive control method for the AI digital person to the PPT based on templated editing provided by the embodiment of the invention, the attribute connection relations between media material elements are determined and a play template is generated by combining the set display area and display mode; page information is extracted from a PPT file and a 2D mapping relation between the lecture text content and the PPT page is constructed; AI digital person video frames, AI digital person voice frames and subtitle frames are generated by inference according to the lecture text content; the AI digital person video frame and the page picture are respectively added to the play template for display; a prompt graphic and a laser pen mark are added to the corresponding display area of the PPT page according to the 2D mapping relation to render and generate an image frame; and the image frame and the AI digital person voice frame are synchronously output. Through the above method, page information can be extracted from the PPT file, a corresponding 2D mapping relation can be constructed, and a prompt graphic and a laser pen mark can be added in the corresponding display area of the PPT page according to that relation, so that synchronous control of the PPT background material is realized during synthesis of the AI digital person video, greatly enriching the application functions of the AI digital person video.
The embodiment of the invention also provides a template-editing-based AI digital person-to-PPT interaction control device which can be configured in a user terminal or a management server, and the template-editing-based AI digital person-to-PPT interaction control device is used for executing any embodiment of the template-editing-based AI digital person-to-PPT interaction control method. Specifically, referring to fig. 7, fig. 7 is a schematic block diagram of an AI digital person-to-PPT interaction control device based on templated editing according to an embodiment of the present invention.
As shown in fig. 7, the interactive control device 100 for AI digital person-to-PPT based on templated editing includes a media material element attribute acquisition unit 101, a play template generation unit 102, a page information extraction unit 103, a mapping relation construction unit 104, an inference generation unit 105, an AI digital person video frame display unit 106, a page picture display unit 107, a prompt graphic generation unit 108, an image frame rendering unit 109, and an output unit 110.
The media material element attribute obtaining unit 101 is configured to import a media material element and generate an attribute connection, an attribute simulation, and a combined animation corresponding to the media material element according to a specific logic relationship and a state change requirement.
The play template generating unit 102 is configured to set a display area and a display mode of the PPT page and a combination mode of the AI digital person and the attribute connection, the attribute simulation and the combined animation according to the input setting parameters, so as to generate a corresponding play template.
The page information extraction unit 103 is configured to import the PPT file into the play template and extract page information corresponding to each PPT page in the PPT file, wherein the page information includes the page picture and lecture text content corresponding to each PPT page.
The mapping relation construction unit 104 is configured to establish the 2D mapping relation between the lecture text content and the PPT page.
The inference generating unit 105 is configured to send the lecture text content to the AI digital person reasoning model to generate the AI digital person video frame, AI digital person voice frame and subtitle frame corresponding to the rendering composition frame.
The AI digital person video frame display unit 106 is configured to display the AI digital person video frame in the AI digital person display area of the play template according to the set combination mode of the AI digital person with the attribute connection, attribute simulation and combined animation.
And the page picture display unit 107 is configured to display the page picture in the PPT display area of the play template according to the display area and the display mode of the set PPT page.
The prompt graphic generation unit 108 is configured to obtain, according to the 2D mapping relationship between the lecture text content and the PPT page, a region coordinate of the text content corresponding to the subtitle frame on the PPT page, and generate a prompt graphic of the text content corresponding to the current AI digital person voice frame.
The image frame rendering unit 109 is configured to acquire the text content corresponding to the subtitle frame according to the 2D mapping relation between the lecture text content and the PPT page, and render a circular bright point at its region coordinates on the PPT page as a laser pen mark, so as to render and generate an image frame.
The output unit 110 is configured to synchronously output the rendered and synthesized image frame together with the current AI digital person voice frame to a video board card, record them to a file, or generate a network push stream for output.
The interactive control device for the AI digital person to the PPT based on templated editing provided by the embodiment of the invention applies the above interactive control method: it determines the attribute connection relations between media material elements, generates a play template by combining the set display area and display mode, extracts page information from the PPT file, constructs the 2D mapping relation between the lecture text content and the PPT page, generates AI digital person video frames, AI digital person voice frames and subtitle frames by inference according to the lecture text content, adds the AI digital person video frame and the page picture to the play template for display, adds a prompt graphic and a laser pen mark to the corresponding display area of the PPT page according to the 2D mapping relation to render and generate an image frame, and synchronously outputs the image frame and the AI digital person voice frame. In this way, page information can be extracted from the PPT file, a corresponding 2D mapping relation can be constructed, and a prompt graphic and a laser pen mark can be added in the corresponding display area of the PPT page according to that relation, so that synchronous control of the PPT background material is realized during synthesis of the AI digital person video, greatly enriching the application functions of the AI digital person video.
The above-described interactive control device for AI digital person-to-PPT based on templated editing may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 8.
Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device may be a user terminal or a management server configured to execute the interactive control method of the AI digital person to the PPT based on templated editing, so as to synchronously output to a video board card, record to a file, or generate a network push stream output according to the input parameter information and the PPT file.
With reference to FIG. 8, the computer device 500 includes a processor 502, a memory, and a network interface 505, connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a templated editing-based AI digital person-to-PPT interactive control method, wherein the storage medium 503 may be a volatile storage medium or a nonvolatile storage medium.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a templated editing-based AI digital person-to-PPT interactive control method.
The network interface 505 is used for network communication, such as providing for transmission of data information, etc. It will be appreciated by those skilled in the art that the architecture shown in fig. 8 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting of the computer device 500 to which the present inventive arrangements may be implemented, as a particular computer device 500 may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
The processor 502 is configured to run a computer program 5032 stored in a memory, so as to implement a corresponding function in the above-mentioned interactive control method of the AI digital person to the PPT based on templated editing.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 8 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 8, and will not be described again.
It should be appreciated that in embodiments of the present invention, the processor 502 may be a central processing unit (CPU); the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program when executed by a processor realizes the steps included in the above-mentioned interactive control method for the PPT by the AI digital person based on templated editing.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the units is merely a logical function division, there may be another division manner in actual implementation, or units having the same function may be integrated into one unit, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention is essentially or part of what contributes to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a computer-readable storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned computer-readable storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. An interactive control method of AI digital person to PPT based on templated editing is characterized in that the method comprises the following steps:
Importing media material elements and generating attribute connection, attribute simulation and combined animation corresponding to the media material elements according to specific logic relation and state change requirements;
Setting a display area and a display mode of the PPT page and a combination mode of the AI digital person with the attribute connection, attribute simulation and combined animation according to the input setting parameters, so as to generate a corresponding play template;
Importing a PPT file into the play template, and extracting page information corresponding to each PPT page in the PPT file, wherein the page information comprises page pictures and lecture text contents corresponding to each PPT page;
Establishing a 2D mapping relation between the lecture text content and the PPT page;
Sending the text content of the lecture to an AI digital person reasoning model to generate an AI digital person video frame, an AI digital person voice frame and a caption frame which correspond to the rendering synthetic frame;
Displaying an AI digital person video frame in an AI digital person display area of the playing template according to the combination mode of the set AI digital person and the attribute connection, the attribute simulation and the combined animation;
according to the display area and the display mode of the set PPT page, displaying the page picture in the PPT display area of the play template according to the display mode;
Acquiring the region coordinates of the text content corresponding to the subtitle frame in the PPT page according to the 2D mapping relation between the lecture text content and the PPT page to generate a prompt graphic of the text content corresponding to the current AI digital person voice frame;
acquiring the text content corresponding to the subtitle frame according to the 2D mapping relation between the lecture text content and the PPT page, and rendering a circular bright point at its region coordinates on the PPT page as a laser pen mark, so as to render and generate an image frame;
and synchronously outputting the rendered and synthesized image frame together with the current AI digital person voice frame to a video board card, recording them to a file, or generating a network push stream for output.
2. The interactive control method for the PPT by the AI digital person based on the templated editing according to claim 1, wherein the extracting page information corresponding to each page in the PPT file includes:
Generating a page picture corresponding to each page in the PPT file;
and extracting the text content of the lecture of each page in the PPT file.
3. The template editing-based AI digital person-to-PPT interactive control method according to claim 1, wherein before transmitting lecture text contents to the AI digital person inference model to generate AI digital person video frames, AI digital person voice frames, and subtitle frames corresponding to the rendering composition frames, further comprising:
Generating an AI digital person reasoning model corresponding to the audio and video data of the target object acquired in the blue box according to a preset AI model, wherein the AI digital person reasoning model comprises a plurality of combined images corresponding to each target object and a blue background preview video corresponding to each combined image; the AI digital person reasoning model comprises a 2D AI digital person and/or a 3D AI digital person.
4. The template editing-based AI digital person-to-PPT interactive control method according to claim 1, wherein the transmitting the lecture text content to the AI digital person inference model to generate AI digital person video frames, AI digital person voice frames, and subtitle frames corresponding to the rendering composition frames includes:
According to text information in the text content of the lecture, the blue background AI digital person video conforming to the text information is inferred from the AI digital person reasoning model;
performing voice synthesis according to text information in the text content of the lecture and audio features in the AI digital person reasoning model to obtain AI digital person voice;
Dividing paragraphs according to the text content of the lecture to obtain subtitles;
And acquiring the AI digital person video frame, the AI digital person voice frame and the caption frame corresponding to the rendering synthetic frame according to the time stamps of the AI digital person video, the AI digital person voice and the caption.
5. The interactive control method for PPT by AI digital person based on templated editing according to claim 4, wherein said reasoning out blue background AI digital person video conforming to text information from AI digital person reasoning model according to text information in lecture text content includes:
according to text information in the text content of the lecture, the video conforming to the mouth shape, emotion and limb actions of characters in the text information is inferred from an AI digital person inference model and combined to generate a blue background AI digital person video;
Or is:
According to text information in the text content of the lecture, deducing facial expression data conforming to the mouth shape and emotion of characters in the text information from an AI digital person reasoning model, and deducing skeleton model data conforming to limb actions in the text information from the AI digital person reasoning model;
And restoring according to the facial expression data to obtain a facial video of the AI digital person, and restoring according to the skeleton model data to obtain a limb action video of the AI digital person so as to generate a blue background AI digital person video in a combined mode.
6. The interactive control method for the AI digital person to the PPT based on the templated editing according to claim 1, wherein the obtaining, according to the 2D mapping relationship between the lecture text content and the PPT page, a hint graphic of the text content corresponding to the caption frame in the region coordinates of the PPT page to generate the text content corresponding to the current AI digital person voice frame includes:
Acquiring the region coordinates of the text content corresponding to the caption frame in the PPT page according to the 2D mapping relation between the text content of the lecture and the PPT page, and determining the geometric vertex parameters of the PPT page in the 3D space according to the camera parameters of the 3D rendering;
And overlaying the prompt graphic corresponding to the AI digital person voice frame to the corresponding PPT page picture through 3D mapping, so as to realize synchronous marking of the AI digital person voice and the corresponding PPT lecture text.
7. The templated editing-based AI digital person-to-PPT interactive control method according to claim 1, wherein obtaining, according to the 2D mapping relationship between the lecture text content and the PPT page, the region coordinates in the PPT page of the text content corresponding to the caption frame, and rendering a circular bright point at those coordinates as a laser-pointer marker so as to render and generate an image frame, comprises:
acquiring, according to the 2D mapping relationship between the lecture text content and the PPT page, the region coordinates in the PPT page of the text content corresponding to the caption frame, and calculating the 2D coordinates of each caption character in the PPT page;
converting the 2D coordinates of the caption characters into 3D coordinates through the camera parameters of the 3D rendering and the geometric vertex parameters of the PPT page in 3D space; and
converting the 3D coordinates of the caption characters into screen coordinates through the camera parameters of the 3D rendering, and rendering a circular bright point at the screen coordinates, thereby achieving laser-pointer marking in the PPT page.
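The coordinate chain above (page 2D, then world 3D, then screen) can be sketched in two steps: bilinear interpolation over the geometric vertices of the PPT page quad, then projection through the rendering camera's matrix. A minimal sketch under assumed conventions (column-vector 4x4 model-view-projection matrix, NDC in [-1, 1], top-left screen origin); the function names are illustrative, not from the patent:

```python
import numpy as np

def page_to_world(u, v, quad):
    """Bilinearly interpolate normalized page coordinates (u, v) in [0, 1]
    across the four 3D vertices of the PPT page quad (tl, tr, br, bl)."""
    tl, tr, br, bl = (np.asarray(p, dtype=float) for p in quad)
    top = tl + u * (tr - tl)
    bottom = bl + u * (br - bl)
    return top + v * (bottom - top)

def world_to_screen(point, mvp, width, height):
    """Project a 3D point through a 4x4 model-view-projection matrix,
    then map normalized device coordinates to pixel coordinates."""
    clip = mvp @ np.append(point, 1.0)
    ndc = clip[:3] / clip[3]
    x = (ndc[0] + 1.0) * 0.5 * width
    y = (1.0 - ndc[1]) * 0.5 * height  # flip y for a top-left screen origin
    return x, y
```

The circular bright point is then drawn at the returned pixel coordinates in the composited image frame.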
8. An AI digital person-to-PPT interactive control device based on templated editing, the device comprising:
the media material element attribute acquisition unit, which is used for importing media material elements and generating the attribute connections, attribute simulations and combined animations corresponding to the media material elements according to specific logical relationships and state-change requirements;
the play template generation unit, which is used for setting the display area and display mode of the PPT page, and the combination mode of the AI digital person with the attribute connections, attribute simulations and combined animations, according to input setting parameters, so as to generate a corresponding play template;
the page information extraction unit, which is used for importing a PPT file into the play template and extracting the page information corresponding to each PPT page in the PPT file, wherein the page information comprises the page picture and lecture text content corresponding to each PPT page;
the mapping relationship construction unit, which is used for establishing a 2D mapping relationship between the lecture text content and the PPT page;
the inference generation unit, which is used for sending the lecture text content to the AI digital person inference model so as to generate the AI digital person video frame, AI digital person voice frame and caption frame corresponding to each rendered composite frame;
the AI digital person video frame display unit, which is used for displaying the AI digital person video frame in the AI digital person display area of the play template according to the set combination mode of the AI digital person with the attribute connections, attribute simulations and combined animations;
the page picture display unit, which is used for displaying the page picture in the PPT display area of the play template according to the set display mode and display area of the PPT page;
the prompt graphic generation unit, which is used for acquiring, according to the 2D mapping relationship between the lecture text content and the PPT page, the region coordinates in the PPT page of the text content corresponding to the caption frame, and generating a prompt graphic for the text content corresponding to the current AI digital person voice frame;
the image frame rendering unit, which is used for acquiring, according to the 2D mapping relationship between the lecture text content and the PPT page, the region coordinates in the PPT page of the text content corresponding to the caption frame, and rendering a circular bright point at those coordinates as a laser-pointer marker so as to render and generate an image frame;
and the output unit, which is used for synchronously outputting the rendered composite image frame and the current AI digital person voice frame to a video board card, recording them to a file, or outputting them as a network push stream.
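The units of claim 8 describe a per-frame pipeline; a minimal skeleton of how one rendered composite frame could flow through it (all names and types are illustrative assumptions, not from the patent):

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Region = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page coordinates

@dataclass
class RenderFrame:
    avatar_frame: bytes  # AI digital person video frame
    voice_frame: bytes   # AI digital person voice frame
    caption: str         # caption text spoken in this frame

def compose_frame(frame: RenderFrame,
                  page_picture: bytes,
                  text_to_region: dict,
                  draw: Callable[[bytes, bytes, Optional[Region]], bytes]) -> tuple:
    """Mirror the per-frame work of the units above: look up the caption's
    region in the 2D mapping, draw avatar + page + laser dot, and pair the
    composited image with its synchronized voice frame for output."""
    region = text_to_region.get(frame.caption)              # mapping-relationship unit
    image = draw(frame.avatar_frame, page_picture, region)  # display / rendering units
    return image, frame.voice_frame                         # output unit: image and voice in sync
```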
9. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the templated editing-based AI digital person-to-PPT interactive control method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements the templated editing-based AI digital person-to-PPT interactive control method of any one of claims 1 to 7.
CN202210369117.1A 2022-04-08 2022-04-08 Method and device for interactive control of AI digital person to PPT based on templated editing Active CN114741541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210369117.1A CN114741541B (en) 2022-04-08 2022-04-08 Method and device for interactive control of AI digital person to PPT based on templated editing


Publications (2)

Publication Number Publication Date
CN114741541A CN114741541A (en) 2022-07-12
CN114741541B true CN114741541B (en) 2024-07-12

Family

ID=82280227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210369117.1A Active CN114741541B (en) 2022-04-08 2022-04-08 Method and device for interactive control of AI digital person to PPT based on templated editing

Country Status (1)

Country Link
CN (1) CN114741541B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107741971A (en) * 2017-10-10 2018-02-27 国网浙江省电力公司电力科学研究院 A kind of method of the online visual analysis PPT based on self-defined dynamic data
CN113194348A (en) * 2021-04-22 2021-07-30 清华珠三角研究院 Virtual human lecture video generation method, system, device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7490031B1 (en) * 2002-12-03 2009-02-10 Gang Qiu Mechanization of modeling, simulation, amplification, and intelligence of software
US20210042662A1 (en) * 2019-08-06 2021-02-11 Ninghua Albert Pu Interactive Information Capture and Retrieval with User-Defined and/or Machine Intelligence Augmented Prompts and Prompt Processing
CN112231498A (en) * 2020-09-29 2021-01-15 北京字跳网络技术有限公司 Interactive information processing method, device, equipment and medium


Also Published As

Publication number Publication date
CN114741541A (en) 2022-07-12

Similar Documents

Publication Publication Date Title
US8988436B2 (en) Training system and methods for dynamically injecting expression information into an animated facial mesh
US10789453B2 (en) Face reenactment
US5623587A (en) Method and apparatus for producing an electronic image
US6919892B1 (en) Photo realistic talking head creation system and method
Sifakis et al. Simulating speech with a physics-based facial muscle model
CN110557625A (en) live virtual image broadcasting method, terminal, computer equipment and storage medium
US20210264139A1 (en) Creating videos with facial expressions
US7027054B1 (en) Do-it-yourself photo realistic talking head creation system and method
CN114363712B (en) AI digital person video generation method, device and equipment based on templated editing
JP2009533786A (en) Self-realistic talking head creation system and method
US11393150B2 (en) Generating an animation rig for use in animating a computer-generated character based on facial scans of an actor and a muscle model
CN114025219B (en) Rendering method, device, medium and equipment for augmented reality special effects
KR102186607B1 (en) System and method for ballet performance via augumented reality
CN109035415B (en) Virtual model processing method, device, equipment and computer readable storage medium
CN112669422B (en) Simulated 3D digital person generation method and device, electronic equipment and storage medium
WO2019105600A1 (en) Avatar animation
US20040064321A1 (en) Coarticulation method for audio-visual text-to-speech synthesis
CN113453027A (en) Live video and virtual makeup image processing method and device and electronic equipment
CN113395569B (en) Video generation method and device
Tan et al. Style2talker: High-resolution talking head generation with emotion style and art style
US20080122867A1 (en) Method for displaying expressional image
CN114741541B (en) Method and device for interactive control of AI digital person to PPT based on templated editing
CN112637692A (en) Interaction method, device and equipment
CN113781611B (en) Animation production method and device, electronic equipment and storage medium
CN113891079A (en) Automatic teaching video generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant