CN113794930B - Video generation method, device, equipment and storage medium - Google Patents

Video generation method, device, equipment and storage medium

Info

Publication number
CN113794930B
CN113794930B
Authority
CN
China
Prior art keywords
multimedia
video
target
knowledge
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111064510.1A
Other languages
Chinese (zh)
Other versions
CN113794930A (en)
Inventor
于向丽
张煜
刘驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN202111064510.1A
Publication of CN113794930A
Application granted
Publication of CN113794930B
Legal status: Active
Anticipated expiration


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application provides a video generation method, apparatus, device and storage medium. The method acquires the target structured knowledge and target template selected by a user in response to the user's video generation operation; triggers a dialogue recording function to acquire a dialogue content video; inputs the dialogue content video, the target structured knowledge and the target template into a preset training model, which outputs target multimedia materials; and splices the target multimedia materials to obtain the target video. This solves the technical problems that manual video editing consumes substantial manpower and material resources, is costly, time-consuming and inefficient, and that the quality of videos generated by keyword searching is difficult to guarantee.

Description

Video generation method, device, equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a video generating method, apparatus, device, and storage medium.
Background
Short video is an Internet content transmission mode. With the popularization of mobile terminals and the acceleration of networks, short, flat and fast high-traffic content has gradually become widely used in various fields; for example, it has developed rapidly in customer service systems in the telecommunications field. In this environment, the ways in which operators interact with the outside world are no longer limited to traditional text and voice; increasingly, short videos are used for brand publicity, package introduction, activity promotion and the like.
Existing videos are basically recorded manually and edited in post-production, or assembled by searching the Internet for several related pictures according to keywords in the Chinese description of the video file to be created and splicing them into a video.
However, in the prior art, manually editing video requires a lot of manpower and material resources, with high cost, long time consumption and low efficiency, and the quality of video generated by keyword searching is difficult to guarantee.
Disclosure of Invention
The application provides a video generation method, apparatus, device and storage medium, which solve the technical problems in the prior art that manual video editing consumes substantial manpower and material resources, is costly, time-consuming and inefficient, and that the quality of videos generated by keyword searching is difficult to guarantee.
In a first aspect, the present application provides a video generating method, including:
responding to a video generation operation of a user, and acquiring target structured knowledge and a target template selected by the user, wherein the target structured knowledge comprises rule information for generating a target video;
triggering a dialogue recording function to acquire dialogue content videos;
inputting the dialogue content video, the target structured knowledge and the target template into a preset training model, and outputting a target multimedia material;
and performing splicing processing on the target multimedia materials to obtain a target video.
With the method and device, when a video needs to be generated, the target structured knowledge and target template selected by the user can be acquired; the dialogue recording function is then started, and the video content to be generated is obtained from the recorded dialogue content video. The dialogue content video, together with the target structured knowledge and target template selected by the user, is input into a preset training model, which screens the multimedia materials so that the target video can be generated from the screened materials.
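The four claimed steps can be pictured as a minimal pipeline. Everything below is a hypothetical sketch: the patent does not disclose the model internals or data formats, so all names, data shapes and the "screening" rule are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the four claimed steps; the patent does not
# specify the model or data formats at this level of the disclosure.

@dataclass
class GenerationRequest:
    structured_knowledge: dict   # rule information for generating the target video
    template: str                # target template selected by the user

def record_dialogue():
    # Stands in for the triggered dialogue recording function.
    return ["greeting clip", "package introduction clip", "unrelated chatter"]

def screen_materials(dialogue, req):
    # Stands in for the preset training model: keep only clips that the
    # rule information marks as relevant.
    wanted = req.structured_knowledge.get("keep", [])
    return [clip for clip in dialogue if any(w in clip for w in wanted)]

def splice(materials, template):
    # Stands in for the splicing step: join materials per the template.
    return template.join(materials)

req = GenerationRequest(structured_knowledge={"keep": ["package"]}, template=" | ")
target_video = splice(screen_materials(record_dialogue(), req), req.template)
print(target_video)
```

The point of the sketch is only the data flow: user selection plus recorded dialogue go into a screening model, and the surviving materials are spliced per the template.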
Optionally, before the inputting the dialogue content video, the target structured knowledge and the target template into a preset training model, the method further comprises:
acquiring a plurality of multimedia videos from a preset knowledge base;
splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
and performing model training according to the multimedia material samples and the multimedia video to obtain a preset training model.
The application provides a training method for the preset training model: multimedia videos acquired from a preset knowledge base are processed to obtain corresponding multimedia material samples, and training is performed on the multimedia videos and material samples. With the knowledge in the preset knowledge base as a reference, a preset training model capable of accurately screening video materials is obtained, which ensures the quality of the generated videos.
Optionally, the splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video includes:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and corresponding material scores of each multimedia material sample in a template;
Correspondingly, the training of the model according to the multimedia material sample and the multimedia video to obtain a preset training model comprises the following steps:
and carrying out model optimization on the training model according to the multimedia material sample and the material score to obtain a preset training model.
When training the model, it is first judged whether the multimedia video is formed by splicing multimedia materials. For a multimedia video obtained by splicing, the video can be split to obtain each multimedia material sample and its corresponding material score in the template. The training model can then be optimized with these samples and scores, yielding an accurate, optimized preset training model and further ensuring the accuracy of the model weights and the quality of video generation.
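This split-and-score path can be sketched as follows. The segment format, the scoring scheme and the toy "optimization" update are all invented for illustration; the patent does not disclose the actual loss or update rule.

```python
# Hypothetical sketch: split a spliced multimedia video into material
# samples with per-template material scores, then use the scores as
# supervision in a toy optimization step.

def split_spliced_video(video):
    """Each segment carries its material content and its score in the template."""
    samples = [seg["material"] for seg in video]
    scores = [seg["score"] for seg in video]
    return samples, scores

def optimize_step(weights, samples, scores, lr=0.1):
    # Toy "model optimization": nudge each material's weight toward its score.
    for sample, score in zip(samples, scores):
        old = weights.get(sample, 0.0)
        weights[sample] = old + lr * (score - old)
    return weights

video = [{"material": "intro", "score": 0.9},
         {"material": "ad", "score": 0.2}]
samples, scores = split_spliced_video(video)
weights = optimize_step({}, samples, scores)
print(weights)
```

After one step, highly scored materials carry larger weights, mirroring the idea that the scores steer which materials the model learns to prefer.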
Optionally, the splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video includes:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is not formed by splicing the multimedia materials, a template is acquired from the preset knowledge base;
splitting the multimedia video according to a template to obtain a plurality of multimedia materials, and obtaining a structured knowledge sample corresponding to the template;
performing information extraction processing on the plurality of multimedia materials to obtain structured knowledge information corresponding to the multimedia video;
correspondingly, the training of the model according to the multimedia material sample and the multimedia video to obtain a preset training model comprises the following steps:
and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
When training the model, it is first judged whether the multimedia video is formed by splicing multimedia materials. For a multimedia video that is not formed by splicing, the video is marked and split into multimedia materials by means of a template, and the structured knowledge information in the split materials is extracted. After splitting, the parts of the same template that correspond to the structured knowledge are located according to the content, the structured knowledge data of those parts is extracted and bound to the text content of the corresponding multimedia materials to form a group of training data, and the training data is input into the training model for training. This yields an accurate, optimized preset training model that can automatically screen materials, further ensuring the accuracy of the model weights and the quality of video generation.
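The binding step for non-spliced videos can be sketched as below. The delimiter convention, slot names and knowledge fields are all hypothetical; the patent only states that structured-knowledge data is bound to the text of the corresponding material to form training data.

```python
# Hypothetical sketch: split a non-spliced video's text by template slots,
# then bind each slot's text to the matching structured-knowledge entry
# to form one group of (knowledge, text) training pairs.

def split_by_template(video_text, template_slots):
    # Assume (for illustration) each slot's content follows a "slot:" marker
    # and runs until the next ";" separator.
    materials = {}
    for slot in template_slots:
        marker = slot + ":"
        if marker in video_text:
            after = video_text.split(marker, 1)[1]
            materials[slot] = after.split(";")[0].strip()
    return materials

def bind_training_pairs(materials, knowledge):
    # Bind structured-knowledge data to the text of the corresponding material.
    return [(knowledge[slot], text) for slot, text in materials.items()
            if slot in knowledge]

video_text = "intro: welcome; tariff: 5G package 99 yuan"
materials = split_by_template(video_text, ["intro", "tariff"])
pairs = bind_training_pairs(materials, {"tariff": {"price": 99}})
print(pairs)
```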
Optionally, the extracting information from the plurality of multimedia materials to obtain structured knowledge information corresponding to the multimedia video includes:
and recognizing the image, video and text content in the multimedia materials through a dataflow-programming-based symbolic mathematics system and natural language processing and recognition technology to obtain the structured knowledge information.
The application uses a symbolic mathematics system based on dataflow programming (TensorFlow) and natural language processing technology to recognize the image, video and text content in the multimedia materials, so that the structured knowledge information can be extracted accurately, further ensuring the accuracy of the model weights and the quality of video generation.
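The patent names TensorFlow and NLP recognition for this step but does not disclose the models, so the following is only a plain-Python regex stand-in that shows the shape of the output — structured knowledge fields pulled from material text. The field names and patterns are invented.

```python
import re

# Regex stand-in for the (undisclosed) TensorFlow/NLP extraction: pull
# structured fields out of a material's text content.

def extract_structured_knowledge(material_text):
    info = {}
    price = re.search(r"(\d+(?:\.\d+)?)\s*yuan", material_text)
    if price:
        info["price"] = float(price.group(1))
    data = re.search(r"(\d+)\s*GB", material_text, re.IGNORECASE)
    if data:
        info["data_gb"] = int(data.group(1))
    return info

print(extract_structured_knowledge("5G package, 30 GB data, 99 yuan per month"))
```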
Optionally, the performing splicing processing on the target multimedia material to obtain a target video includes:
and according to the target template, performing splicing processing on the target multimedia material to obtain a target video.
The application can splice the target multimedia materials according to the target template, thereby automatically generating multimedia material and realizing automatic construction of multimedia videos. This saves manpower and material resources, reduces cost, and improves video output and generation efficiency.
Optionally, after the splicing processing is performed on the target multimedia material to obtain a target video, the method further includes:
pushing the target video to a client of the user.
After the target video is generated, it can be pushed directly to the user's client, so that the user can obtain the desired information from it, improving the user experience.
In a second aspect, the present application provides a video generating apparatus, comprising:
the first acquisition module is used for responding to a video generation operation of a user and acquiring target structured knowledge and a target template selected by the user, wherein the target structured knowledge comprises rule information for generating a target video;
the first processing module is used for triggering a dialogue recording function and acquiring dialogue content videos;
the second processing module is used for inputting the dialogue content video, the target structured knowledge and the target template into a preset training model and outputting a target multimedia material;
and the third processing module is used for performing splicing processing on the target multimedia materials to obtain a target video.
Optionally, before the second processing module inputs the dialogue content video, the target structured knowledge and the target template into a preset training model, the apparatus further includes:
the second acquisition module is used for acquiring a plurality of multimedia videos from a preset knowledge base;
the splitting module is used for splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
and the training module is used for carrying out model training according to the multimedia material samples and the multimedia video to obtain a preset training model.
Optionally, the splitting module is specifically configured to:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and corresponding material scores of each multimedia material sample in a template;
correspondingly, the training module is specifically configured to:
and carrying out model optimization on the training model according to the multimedia material sample and the material score to obtain a preset training model.
Optionally, the splitting module is specifically configured to:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is not formed by splicing the multimedia materials, a template is acquired from the preset knowledge base;
splitting the multimedia video according to a template to obtain a plurality of multimedia materials, and obtaining a structured knowledge sample corresponding to the template;
performing information extraction processing on the plurality of multimedia materials to obtain structured knowledge information corresponding to the multimedia video;
correspondingly, the training module is specifically configured to:
and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
Optionally, the splitting module is further specifically configured to:
and recognizing the image, video and text content in the multimedia materials through a dataflow-programming-based symbolic mathematics system and natural language processing and recognition technology to obtain the structured knowledge information.
Optionally, the third processing module is specifically configured to:
and according to the target template, performing splicing processing on the target multimedia material to obtain a target video.
Optionally, after the third processing module performs splicing processing on the target multimedia material to obtain a target video, the apparatus further includes:
and the pushing module is used for pushing the target video to the client of the user.
In a third aspect, the present application provides a video generating apparatus comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory, causing the at least one processor to perform the video generation method as described above in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the video generation method according to the first aspect and the various possible designs of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the video generation method according to the first aspect and the various possible designs of the first aspect.
According to the video generation method, apparatus, device and storage medium, when a user needs to generate a video, the target structured knowledge and target template selected by the user can be acquired; the dialogue recording function is then started, and the video content to be generated is obtained from the recorded dialogue content video. The dialogue content video, together with the selected target structured knowledge and target template, is input into a preset training model, which screens the multimedia materials so that the target video can be generated from the screened materials.
Drawings
To more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the application, and a person skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a video generating system architecture according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a video generating method according to an embodiment of the present application;
fig. 3 is a flowchart of another video generating method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application.
Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the rapid development of customer service systems, the popularization of mobile terminals and the acceleration of networks, knowledge bases are no longer limited to the traditional text mode and are becoming more diversified. Short, fast, high-traffic content is gradually favored by more and more users and enterprises, and operators increasingly interact with the outside world through short videos, for example for brand publicity, package introduction and activity promotion. However, producing short videos often requires considerable labor cost, shooting time and editing time.
Existing videos are basically recorded manually and edited in post-production, or assembled by searching the Internet for several related pictures according to keywords in the Chinese description of the video file to be created and splicing them into a video. However, with the existing approaches, manual recording consumes considerable manpower and material resources, while a video generated only from keywords has thin, incoherent content; moreover, a video spliced from network pictures retrieved by keywords suffers from large differences in content style and the like, and its quality cannot be guaranteed. The prior art therefore has the technical problems of high cost, long time consumption, low efficiency and difficulty in guaranteeing video quality.
To solve the above problems, embodiments of the present application provide a video generation method, apparatus, device and storage medium. When a user needs to generate a video, the method can acquire the target structured knowledge and target template selected by the user, start a dialogue recording function, and obtain the video content to be generated from the recorded dialogue content video. The dialogue content video, together with the selected target structured knowledge and target template, is input into a preset training model that screens the multimedia materials, so that the target video is generated from the screened materials. This realizes automatic construction of multimedia videos, saves manpower, and improves video output and quality.
Optionally, fig. 1 is a schematic diagram of a video generating system architecture according to an embodiment of the present application. In fig. 1, the above architecture includes at least one of a receiving device 101, a processor 102, and a display device 103.
It should be understood that the architecture illustrated in the embodiments of the present application does not constitute a specific limitation on the architecture of the video generation system. In other possible embodiments of the present application, the architecture may include more or fewer components than those illustrated, combine some components, split some components, or arrange components differently, as determined by the actual application scenario; the present application is not limited herein. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
In a specific implementation, the receiving device 101 may be an input/output interface or a communication interface.
The processor 102 can acquire the target structured knowledge and target template selected by a user when the user needs to generate a video, then start the dialogue recording function and obtain the video content to be generated from the recorded dialogue content video. The dialogue content video, together with the selected target structured knowledge and target template, is input into a preset training model that screens the multimedia materials, so that the target video is generated from the screened materials. This realizes automatic construction of multimedia videos, saves manpower, and improves video output and quality.
The display device 103 may be used to display the above results or the like, or may be used to interact with the user.
The display device may also be a touch display screen for receiving user instructions while displaying the above to enable interaction with a user.
It should be understood that the above-described processor may be implemented by a processor that reads instructions in a memory and executes the instructions, or may be implemented by a chip circuit.
In addition, the network architecture and the service scenario described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided by the embodiments of the present application, and as a person of ordinary skill in the art can know, with evolution of the network architecture and occurrence of a new service scenario, the technical solution provided by the embodiments of the present application is also applicable to similar technical problems.
The following describes the technical scheme of the present application in detail with reference to specific examples:
optionally, fig. 2 is a schematic flow chart of a video generating method according to an embodiment of the present application. The execution body of the embodiment of the present application may be the processor 102 in fig. 1, and the specific execution body may be determined according to an actual application scenario. As shown in fig. 2, the method comprises the steps of:
s201: and responding to the video generation operation of the user, and acquiring target structuring knowledge and target templates selected by the user.
Wherein the target structuring knowledge comprises rule information for generating the target video.
Here, the video generation operation of the user may be a touch or input operation on the user terminal or the video generation device, through which the user selects the target structured knowledge and target template. The input operation may take multiple forms, such as voice input or text input.
Each template has corresponding structured knowledge, the structured knowledge comprises rule information for generating videos, and a user can select a target template and target structured knowledge from the pre-stored templates.
Optionally, these templates and structured knowledge are pre-stored in a pre-set knowledge base.
A knowledge base here refers to the set of rules applied in an expert-system design, together with the facts and data those rules involve. One kind of knowledge base is tied to a specific expert system, so no sharing problem arises; the other is consultative in nature and is shared rather than exclusive to an individual.
Structuring refers to summarizing and organizing gradually accumulated knowledge so that it becomes systematic and well-organized, achieving a clear outline.
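As a purely illustrative sketch (the field names and the toy knowledge base below are hypothetical, not taken from this application), a template and the structured knowledge bound to it could be represented as follows:

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    """A video template: an ordered list of named slots to fill with material."""
    name: str
    slots: list  # ordered slot names

@dataclass
class StructuredKnowledge:
    """Rule information bound to one template for generating a target video."""
    template_name: str
    rules: dict = field(default_factory=dict)  # slot name -> required keywords

# Toy preset knowledge base: template name -> (template, structured knowledge).
KNOWLEDGE_BASE = {
    "faq_demo": (
        Template("faq_demo", ["greeting", "question", "answer"]),
        StructuredKnowledge("faq_demo", {"answer": ["tariff", "refund"]}),
    ),
}

def select(template_name):
    """Mimic S201: return the user-selected target template and its knowledge."""
    return KNOWLEDGE_BASE[template_name]

template, knowledge = select("faq_demo")
print(template.slots)             # ['greeting', 'question', 'answer']
print(knowledge.rules["answer"])  # ['tariff', 'refund']
```

In such a representation, selecting a template automatically pins down the rule set used in the later steps.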
S202: and triggering a dialogue recording function to acquire dialogue content videos.
Alternatively, the user operation may be received to trigger the dialogue recording function, or the corresponding recording function may be automatically triggered.
S203: and inputting the dialogue content video, the target structured knowledge and the target template into a preset training model, and outputting to obtain the target multimedia material.
The preset training model is a model based on deep learning implementation.
The preset training model is used for selecting video content with higher matching degree, deleting irrelevant fragments in the dialogue process, complementing the lacking content according to the corresponding structural knowledge, and generating the screened and optimized multimedia material.
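How this screening might behave can be sketched with plain keyword matching standing in for the learned "matching degree" (the deep model itself is not specified here; the segment texts, rule keywords and placeholder format below are all assumptions):

```python
def screen_segments(segments, rules):
    """Keep dialogue segments that match the rule keywords, drop irrelevant
    ones, and insert placeholder material for required topics never covered.
    A real system would use a learned model; keyword overlap is a stand-in."""
    required = {kw for kws in rules.values() for kw in kws}
    kept, covered = [], set()
    for seg in segments:  # seg is the transcript text of one dialogue segment
        hits = {kw for kw in required if kw in seg}
        if hits:          # 'higher matching degree' -> keep the segment
            kept.append(seg)
            covered |= hits
    # 'complement the lacking content': one placeholder per missing keyword
    for missing in sorted(required - covered):
        kept.append(f"[stock material: {missing}]")
    return kept

rules = {"answer": ["tariff", "refund"]}
segments = ["hello there", "the tariff is 10 yuan", "unrelated chat"]
print(screen_segments(segments, rules))
# ['the tariff is 10 yuan', '[stock material: refund]']
```

The same three behaviors the description names (keep matching content, delete irrelevant segments, complement what is missing) appear as the three branches of the function.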
S204: and performing splicing treatment on the target multimedia materials to obtain a target video.
Optionally, performing stitching processing on the target multimedia material to obtain a target video, including:
and according to the target template, performing splicing processing on the target multimedia material to obtain a target video.
Here, the target multimedia material may be directly filled into the target template to obtain the target video, or the multimedia material may be spliced according to the format of the target template.
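A minimal sketch of the filling-and-splicing described above, assuming materials are keyed by hypothetical slot names and using string concatenation as a stand-in for real clip joining:

```python
def splice(template_slots, materials):
    """Fill the target multimedia material into the target template slot by
    slot, then 'splice' in template order (string join stands in for joining
    actual video clips)."""
    ordered = [materials[slot] for slot in template_slots if slot in materials]
    return " + ".join(ordered)

slots = ["greeting", "question", "answer"]
materials = {"answer": "answer.mp4", "greeting": "hi.mp4"}
print(splice(slots, materials))  # hi.mp4 + answer.mp4
```

Note that the template order, not the order materials arrive in, determines the final sequence.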
Optionally, after the target multimedia material is spliced to obtain the target video, the method further includes:
pushing the target video to the user's client.
After the target video is generated, the target video can be directly pushed to the client of the user, so that the user can obtain the desired information according to the target video, and the user experience is improved.
According to the embodiment of the application, the target multimedia material can be spliced according to the target template. This provides the capability of automatically generating multimedia material and realizes automatic construction of the multimedia video, which saves manpower, material resources and cost while improving video output and generation efficiency.
According to the embodiment of the application, when a user needs to generate a video, the target structured knowledge and the target template selected by the user are acquired, and the dialogue recording function is then started. The video content to be generated can be obtained from the recorded dialogue content video; the dialogue content video, together with the target structured knowledge and the target template selected by the user, is input into the preset training model, which screens the multimedia material so that the target video is generated from the screened material.
In a possible implementation manner, the embodiment of the present application provides a method for pre-training a model so that multimedia materials can be screened by the model. Correspondingly, fig. 3 is a schematic flow chart of another video generating method provided by an embodiment of the present application. As shown in fig. 3, the method includes:
S301: and responding to the video generation operation of the user, and acquiring target structuring knowledge and target templates selected by the user.
S302: and triggering a dialogue recording function to acquire dialogue content videos.
S303: and acquiring a plurality of multimedia videos from a preset knowledge base.
Optionally, a plurality of multimedia videos may be pre-stored in the preset knowledge base, and the preset knowledge base may be updated in real time by adding or removing multimedia videos, so as to improve its quality.
S304: and splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video.
Optionally, splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video, including:
judging whether the multimedia video is formed by splicing multimedia materials or not; if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and corresponding material scores of each multimedia material sample in the template.
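A toy sketch of this splitting step, where the slot:clip:score string encoding of a spliced video is purely an illustrative assumption:

```python
def split_spliced_video(video, delimiter="|"):
    """Split a multimedia video known to be spliced from materials into
    material samples, each carrying the material score it had in the
    template (the string encoding is invented for illustration)."""
    samples = []
    for part in video.split(delimiter):
        slot, clip, score = part.split(":")
        samples.append({"slot": slot, "clip": clip, "score": float(score)})
    return samples

video = "greeting:hi.mp4:0.9|answer:ans.mp4:0.7"
print(split_spliced_video(video))
```

Each recovered sample pairs a clip with the score used later for model optimization.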
Correspondingly, performing model training according to the multimedia material samples and the multimedia video to obtain a preset training model, including: and carrying out model optimization on the training model according to the multimedia material samples and the material scores to obtain a preset training model.
When the model is trained, the embodiment of the application first judges whether the multimedia video is formed by splicing multimedia materials. For a multimedia video obtained by splicing, the video can be split to obtain each multimedia material sample and its corresponding material score. The training model can then be optimized and trained with this data and these scores, yielding an accurate, optimized preset training model and thereby ensuring the accuracy of the model weights and the quality of video generation.
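As a toy stand-in for "model optimization according to the material samples and scores" (a real system would train a deep model over video features; the single scalar feature per sample and all numbers below are invented for illustration), a linear scorer can be fit by gradient descent:

```python
def fit_scorer(samples, scores, lr=0.05, epochs=200):
    """Learn a weight w so that w * feature approximates the material score
    each sample received in its template (squared-error SGD)."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, scores):
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)**2
    return w

# Hypothetical data: one scalar feature per material sample, plus its score.
features = [1.0, 2.0, 3.0]
scores = [2.1, 3.9, 6.2]
w = fit_scorer(features, scores)
print(round(w, 2))  # close to 2 (the least-squares slope is about 2.04)
```

Once fitted, such a scorer can rank candidate materials, which is the role the description assigns to the preset training model.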
Optionally, splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video, including:
judging whether the multimedia video is formed by splicing multimedia materials or not; if the multimedia video is not formed by splicing the multimedia materials, a template is acquired from a preset knowledge base; splitting the multimedia video according to the template to obtain a plurality of multimedia materials, and obtaining a structured knowledge sample corresponding to the template; and carrying out information extraction processing on the plurality of multimedia materials to obtain the structured knowledge information corresponding to the multimedia video.
Correspondingly, performing model training according to the multimedia material samples and the multimedia video to obtain a preset training model, including: and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
Optionally, the information extracting process is performed on the multiple multimedia materials to obtain structured knowledge information corresponding to the multimedia video, including: and extracting the image, video and text contents in the multimedia material by a symbolic mathematical system based on data stream programming and a natural language processing and identifying technology to obtain the structured knowledge information.
If the multimedia video is not obtained through splicing, for example a multimedia video captured by a collector, the structurally related multimedia video can be manually annotated and split into multimedia materials according to a template, and the images, video content and text content in the split multimedia materials are extracted using TensorFlow and natural language processing technology. Meanwhile, the part of the same template corresponding to the structured knowledge is found from the split content, and the structured knowledge data of that part is extracted and bound with the text content of the corresponding multimedia material as a group of training data; the corresponding images, video content and knowledge evaluations are assigned different weights as input parameters, and model training is performed using deep learning technology.
The embodiment of the application utilizes a symbolic mathematical system (TensorFlow) based on data flow programming (dataflow programming) and a natural language processing technology to identify images, videos and text contents in the multimedia materials, can accurately extract structural knowledge information in the multimedia materials, and further ensures the accuracy of model weight and the quality of video generation.
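The extraction step can be illustrated without the full TensorFlow/NLP stack: below, a simple keyword scan over each material's text stands in for image, video and text recognition (the materials, vocabulary and output shape are assumptions made for the sketch):

```python
import re

def extract_structured_knowledge(materials, vocabulary):
    """For each material, collect the vocabulary terms found in its text as
    a crude stand-in for the TensorFlow + NLP recognition described above."""
    knowledge = {}
    for name, text in materials.items():
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        knowledge[name] = sorted(tokens & vocabulary)
    return knowledge

materials = {"clip1": "Refund rules for the basic tariff", "clip2": "office chat"}
vocab = {"refund", "tariff", "contract"}
print(extract_structured_knowledge(materials, vocab))
# {'clip1': ['refund', 'tariff'], 'clip2': []}
```

The per-material keyword lists play the role of the structured knowledge information that is later bound with the material's text content as training data.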
When the model is trained, it is first judged whether the multimedia video is formed by splicing multimedia materials. A multimedia video that is not formed by splicing is annotated and split into multimedia materials according to a template, and the structured knowledge information in the split materials is extracted. Meanwhile, after splitting, the part of the same template corresponding to the structured knowledge is found, and its structured knowledge data is extracted and bound with the text content of the corresponding multimedia materials as a group of training data, which is input into the training model for model training to obtain the preset training model. In this way an accurate, optimized preset training model is obtained, material screening can be performed automatically by the model, and the accuracy of the model weights and the quality of video generation are ensured.
S305: model training is carried out according to the multimedia material samples and the multimedia video, and a preset training model is obtained.
S306: and inputting the dialogue content video, the target structured knowledge and the target template into a preset training model, and outputting to obtain the target multimedia material.
S307: and performing splicing treatment on the target multimedia materials to obtain a target video.
The embodiment of the application provides a training method for the preset training model: the multimedia videos acquired from the preset knowledge base are processed to obtain corresponding multimedia material samples, and training is performed on the multimedia videos and material samples with the knowledge in the preset knowledge base as a reference. This yields a preset training model that can accurately screen video materials, ensuring the quality of the generated videos.
Fig. 4 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application. As shown in fig. 4, the apparatus comprises: a first acquisition module 401, a first processing module 402, a second processing module 403, and a third processing module 404. The video generating apparatus may be the processor 102 itself described above, or a chip or an integrated circuit implementing the functions of the processor 102. The division into the first acquisition module 401, the first processing module 402, the second processing module 403 and the third processing module 404 is merely a division of logical functions; physically, they may be integrated or independent.
The first acquisition module is used for responding to video generation operation of a user and acquiring target structural knowledge and a target template selected by the user, wherein the target structural knowledge comprises rule information for generating a target video;
The first processing module is used for triggering a dialogue recording function and acquiring dialogue content videos;
the second processing module is used for inputting the dialogue content video, the target structured knowledge and the target template into a preset training model and outputting to obtain a target multimedia material;
and the third processing module is used for performing splicing processing on the target multimedia materials to obtain a target video.
Optionally, before the second processing module inputs the dialogue content video, the target structural knowledge and the target template into the preset training model, the apparatus further includes:
the second acquisition module is used for acquiring a plurality of multimedia videos in a preset knowledge base;
the splitting module is used for splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
the training module is used for carrying out model training according to the multimedia material samples and the multimedia video to obtain a preset training model.
Optionally, the splitting module is specifically configured to:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and corresponding material scores of each multimedia material sample in a template;
Correspondingly, the training module is specifically configured to:
and carrying out model optimization on the training model according to the multimedia material samples and the material scores to obtain a preset training model.
Optionally, the splitting module is specifically configured to:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is not formed by splicing the multimedia materials, a template is acquired from a preset knowledge base;
splitting the multimedia video according to the template to obtain a plurality of multimedia materials, and obtaining a structured knowledge sample corresponding to the template;
extracting information from the multimedia materials to obtain structured knowledge information corresponding to the multimedia video;
correspondingly, the training module is specifically configured to:
and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
Optionally, the splitting module is further specifically configured to:
and extracting the image, video and text contents in the multimedia material by a symbolic mathematical system based on data stream programming and a natural language processing and identifying technology to obtain the structured knowledge information.
Optionally, the third processing module is specifically configured to:
And according to the target template, performing splicing processing on the target multimedia material to obtain a target video.
Optionally, after the third processing module performs splicing processing on the target multimedia material to obtain the target video, the apparatus further includes:
and the pushing module is used for pushing the target video to the client of the user.
Fig. 5 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application, where the video generating apparatus may be the processor 102. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not limiting of the implementations of the application described and/or claimed in this document.
As shown in fig. 5, the video generating apparatus includes: a processor 501 and a memory 502; the components are interconnected using different buses and may be mounted on a common motherboard or in other manners as appropriate. The processor 501 may process instructions executed within the video generating apparatus, including instructions stored in or on the memory for displaying graphical information on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. One processor 501 is illustrated in fig. 5.
The memory 502 serves as a non-transitory computer-readable storage medium for storing non-transitory software programs, non-transitory computer-executable programs, and the program instructions/modules corresponding to the method of the video generating apparatus in the embodiment of the present application (e.g., the first acquisition module 401, the first processing module 402, the second processing module 403, and the third processing module 404 shown in fig. 4). By running the non-transitory software programs, instructions and modules stored in the memory 502, the processor 501 executes various functional applications and data processing, that is, implements the method of the video generating apparatus in the above method embodiments.
The video generating apparatus may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the video generating apparatus, and may be, for example, a touch screen, a keypad, a mouse with one or more buttons, a trackball, or a joystick. The output device 504 may be an output device such as the display device of the video generating apparatus. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
The video generating device of the embodiment of the present application may be used to execute the technical solutions of the above embodiments of the methods of the present application, and its implementation principle and technical effects are similar, and are not repeated here.
The embodiment of the application also provides a computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, and the computer executable instructions are used for realizing the video generation method of any one of the above when being executed by a processor.
The embodiment of the application also provides a computer program product, which comprises a computer program, and the computer program is used for realizing the video generation method of any one of the above steps when being executed by a processor.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (7)

1. A video generation method, comprising:
Responding to video generation operation of a user, and acquiring target structural knowledge and a target template selected by the user, wherein the target structural knowledge comprises rule information for generating a target video;
triggering a dialogue recording function to acquire dialogue content videos;
inputting the dialogue content video, the target structured knowledge and the target template into a preset training model, and outputting to obtain a target multimedia material; the target multimedia material is multimedia material which has high matching degree with video content, removes irrelevant fragments in the dialogue process and complements lacking content according to target structural knowledge;
splicing the target multimedia materials to obtain a target video;
before the dialogue content video, the target structured knowledge and the target template are input into a preset training model, the method further comprises the following steps:
acquiring a plurality of multimedia videos from a preset knowledge base;
splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
model training is carried out according to the multimedia material samples and the multimedia video, and a preset training model is obtained;
The splitting processing is performed on the multimedia video to obtain a multimedia material sample corresponding to the multimedia video, including:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and corresponding material scores of each multimedia material sample in a template;
correspondingly, the training of the model according to the multimedia material sample and the multimedia video to obtain a preset training model comprises the following steps:
according to the multimedia material samples and the material scores, carrying out model optimization on a training model to obtain a preset training model;
the splitting processing is performed on the multimedia video to obtain a multimedia material sample corresponding to the multimedia video, including:
judging whether the multimedia video is formed by splicing multimedia materials or not;
if the multimedia video is not formed by splicing the multimedia materials, a template is acquired from the preset knowledge base;
splitting the multimedia video according to a template to obtain a plurality of multimedia materials, and obtaining a structured knowledge sample corresponding to the template;
Information extraction processing is carried out on the plurality of multimedia materials, and structured knowledge information corresponding to the multimedia video is obtained;
correspondingly, the training of the model according to the multimedia material sample and the multimedia video to obtain a preset training model comprises the following steps:
and inputting the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into a training model for training to obtain a preset training model.
2. The method of claim 1, wherein the performing information extraction processing on the plurality of multimedia materials to obtain structured knowledge information corresponding to the multimedia video includes:
and extracting the image, video and text contents in the multimedia material by a symbolic mathematical system based on data stream programming and a natural language processing and identifying technology to obtain the structured knowledge information.
3. The method according to any one of claims 1 to 2, wherein the performing a splicing process on the target multimedia material to obtain a target video includes:
and according to the target template, performing splicing processing on the target multimedia material to obtain a target video.
4. The method according to any one of claims 1 to 2, further comprising, after said performing a splicing process on said target multimedia material to obtain a target video:
pushing the target video to a client of the user.
5. A video generating apparatus, comprising:
the first acquisition module is used for responding to video generation operation of a user and acquiring target structural knowledge and a target template selected by the user, wherein the target structural knowledge comprises rule information for generating a target video;
the first processing module is used for triggering a dialogue recording function and acquiring dialogue content videos;
the second processing module is used for inputting the dialogue content video, the target structural knowledge and the target template into a preset training model and outputting to obtain a target multimedia material; the target multimedia material is multimedia material which has high matching degree with video content, removes irrelevant fragments in the dialogue process and complements lacking content according to target structural knowledge;
the third processing module is used for performing splicing processing on the target multimedia materials to obtain a target video;
The second acquisition module is used for acquiring a plurality of multimedia videos in a preset knowledge base;
the splitting module is used for splitting the multimedia video to obtain a multimedia material sample corresponding to the multimedia video;
the training module is used for carrying out model training according to the multimedia material samples and the multimedia video to obtain a preset training model;
the splitting module is specifically used for judging whether the multimedia video is formed by splicing multimedia materials; if the multimedia video is formed by splicing multimedia materials, splitting the multimedia video to obtain a plurality of multimedia material samples and corresponding material scores of each multimedia material sample in a template;
the training module is specifically used for carrying out model optimization on the training model according to the multimedia material samples and the material scores to obtain a preset training model;
the splitting module is specifically used for judging whether the multimedia video is formed by splicing multimedia materials; if the multimedia video is not formed by splicing the multimedia materials, a template is acquired from a preset knowledge base; splitting the multimedia video according to the template to obtain a plurality of multimedia materials, and obtaining a structured knowledge sample corresponding to the template; extracting information from the multimedia materials to obtain structured knowledge information corresponding to the multimedia video;
The training module is specifically configured to input the structured knowledge sample, the multimedia video, the multimedia material and the structured knowledge information into the training model for training, so as to obtain a preset training model.
6. A video generating apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video generation method of any one of claims 1 to 4.
7. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are for implementing the video generation method of any of claims 1 to 4.
CN202111064510.1A 2021-09-10 2021-09-10 Video generation method, device, equipment and storage medium Active CN113794930B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111064510.1A CN113794930B (en) 2021-09-10 2021-09-10 Video generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111064510.1A CN113794930B (en) 2021-09-10 2021-09-10 Video generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113794930A CN113794930A (en) 2021-12-14
CN113794930B true CN113794930B (en) 2023-11-24

Family

ID=79183264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111064510.1A Active CN113794930B (en) 2021-09-10 2021-09-10 Video generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113794930B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117891971B (en) * 2024-03-18 2024-05-14 吉林省通泰信息技术有限公司 Video editing system management method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109120992A (en) * 2018-09-13 2019-01-01 北京金山安全软件有限公司 Video generation method and device, electronic equipment and storage medium
CN109660865A (en) * 2018-12-17 2019-04-19 杭州柚子街信息科技有限公司 Make method and device, medium and the electronic equipment of video tab automatically for video
CN109819179A (en) * 2019-03-21 2019-05-28 腾讯科技(深圳)有限公司 A kind of video clipping method and device
CN110855904A (en) * 2019-11-26 2020-02-28 Oppo广东移动通信有限公司 Video processing method, electronic device and storage medium
CN111105817A (en) * 2018-10-25 2020-05-05 国家新闻出版广电总局广播科学研究院 Training data generation method and device for intelligent program production
CN111209435A (en) * 2020-01-10 2020-05-29 上海摩象网络科技有限公司 Method and device for generating video data, electronic equipment and computer storage medium
CN111866585A (en) * 2020-06-22 2020-10-30 北京美摄网络科技有限公司 Video processing method and device
CN111914523A (en) * 2020-08-19 2020-11-10 腾讯科技(深圳)有限公司 Multimedia processing method and device based on artificial intelligence and electronic equipment
CN112073649A (en) * 2020-09-04 2020-12-11 北京字节跳动网络技术有限公司 Multimedia data processing method, multimedia data generating method and related equipment
CN112565825A (en) * 2020-12-02 2021-03-26 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and medium
CN112784078A (en) * 2021-01-22 2021-05-11 哈尔滨玖楼科技有限公司 Video automatic editing method based on semantic recognition
CN113079326A (en) * 2020-01-06 2021-07-06 北京小米移动软件有限公司 Video editing method and device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10741089B2 (en) * 2004-12-23 2020-08-11 Carl Wakamoto Interactive immersion system for movies, television, animation, music videos, language training, entertainment, video games and social networking
US20120195573A1 (en) * 2011-01-28 2012-08-02 Apple Inc. Video Defect Replacement
US20130272679A1 (en) * 2012-04-12 2013-10-17 Mario Luis Gomes Cavalcanti Video Generator System
WO2017132228A1 (en) * 2016-01-25 2017-08-03 Wespeke, Inc. Digital media content extraction natural language processing system

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109120992A (en) * 2018-09-13 2019-01-01 北京金山安全软件有限公司 Video generation method and device, electronic equipment and storage medium
CN111105817A (en) * 2018-10-25 2020-05-05 国家新闻出版广电总局广播科学研究院 Training data generation method and device for intelligent program production
CN109660865A (en) * 2018-12-17 2019-04-19 杭州柚子街信息科技有限公司 Make method and device, medium and the electronic equipment of video tab automatically for video
CN109819179A (en) * 2019-03-21 2019-05-28 腾讯科技(深圳)有限公司 Video clipping method and device
CN110855904A (en) * 2019-11-26 2020-02-28 Oppo广东移动通信有限公司 Video processing method, electronic device and storage medium
CN113079326A (en) * 2020-01-06 2021-07-06 北京小米移动软件有限公司 Video editing method and device and storage medium
CN111209435A (en) * 2020-01-10 2020-05-29 上海摩象网络科技有限公司 Method and device for generating video data, electronic equipment and computer storage medium
CN111866585A (en) * 2020-06-22 2020-10-30 北京美摄网络科技有限公司 Video processing method and device
CN111914523A (en) * 2020-08-19 2020-11-10 腾讯科技(深圳)有限公司 Multimedia processing method and device based on artificial intelligence and electronic equipment
CN112073649A (en) * 2020-09-04 2020-12-11 北京字节跳动网络技术有限公司 Multimedia data processing method, multimedia data generating method and related equipment
CN112565825A (en) * 2020-12-02 2021-03-26 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and medium
CN112784078A (en) * 2021-01-22 2021-05-11 哈尔滨玖楼科技有限公司 Video automatic editing method based on semantic recognition

Also Published As

Publication number Publication date
CN113794930A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
US20190205477A1 (en) Method for Processing Fusion Data and Information Recommendation System
US20190370305A1 (en) Method and apparatus for providing search results
CN105183787A (en) Information input method and apparatus
US20240147050A1 (en) Prop processing method and apparatus, and device and medium
CN107818168B (en) Topic searching method, device and equipment
CN110457214B (en) Application testing method and device and electronic equipment
CN104077294A (en) Information recommendation method, information recommendation device and information resource recommendation system
JP2017534097A (en) Two-dimensional code analysis method and apparatus, computer-readable storage medium, computer program product, and terminal device
CN114449327B (en) Video clip sharing method and device, electronic equipment and readable storage medium
CN112612690B (en) User interface information processing method and device, electronic equipment and storage medium
CN107729491B (en) Method, device and equipment for improving accuracy rate of question answer search
CN113794930B (en) Video generation method, device, equipment and storage medium
CN109683760B (en) Recent content display method, device, terminal and storage medium
CN114722292A (en) Book searching method, device, equipment and storage medium
CN108921138B (en) Method and apparatus for generating information
CN111259225A (en) New media information display method and device, electronic equipment and computer readable medium
CN113869063A (en) Data recommendation method and device, electronic equipment and storage medium
CN113626624A (en) Resource identification method and related device
CN116467607B (en) Information matching method and storage medium
CN107357481B (en) Message display method and message display device
CN110689285A (en) Test method, test device, electronic equipment and computer readable storage medium
CN113593614B (en) Image processing method and device
CN105446971A (en) Information display method and device
EP3916586A1 (en) Method and device for transmitting information
CN111026438B (en) Method, device, equipment and medium for extracting small program package and page key information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant