CN116405723A - Video production system, method, electronic device, and readable storage medium

Info

Publication number: CN116405723A
Application number: CN202310317989.8A
Authority: CN
Inventors: 张礼官
Original assignee: Hangzhou Simima Information Technology Co ltd
Current assignee: Hangzhou Simima Information Technology Co ltd
Priority date: 2023-03-28
Filing date: 2023-03-28
Publication date: 2023-07-07
Anticipated expiration: 2043-03-28
Also published as: CN116405723B

Abstract

The invention is applicable to the technical field of video production and provides a video production system, a method, an electronic device, and a readable storage medium. The method comprises the following steps: receiving an audio file and a video file; receiving key point binding information, wherein the key point binding information comprises a plurality of audio key points and video key frames; calculating a first time difference between two adjacent audio key points and a second time difference between the two corresponding video key frames; determining the error percentage between the first time difference and the second time difference; when the error percentage is smaller than a set percentage, aligning the audio key points with the video key frames by changing the speed of the audio or video; and when the error percentage is greater than or equal to the set percentage, aligning the audio key points with the video key frames by cutting the audio or video. This ensures that the audio key points and the video key frames in the finished video are fully aligned, and video production efficiency is high.

Description

Video production system, method, electronic device, and readable storage medium

Technical Field

The present invention relates to the field of video production technology, and in particular, to a video production system, a video production method, an electronic device, and a readable storage medium.

Background

With the continuous development of computer technology, short videos have become increasingly popular, and more and more people are making videos themselves. When a short video is shot, background music is often added to make the whole video more interesting. To blend the background music with the short video, the video often needs to be synchronized with the music at specific points (stuck points). If there is more than one stuck point, it is difficult for the video shooter to ensure that every video stuck point lands exactly on the corresponding point in the background music, so repeated shooting is often required, which makes video production inconvenient. Accordingly, there is a need for a video production system, method, electronic device, and readable storage medium that solve the above problems.

Disclosure of Invention

In view of the shortcomings of the prior art, an object of the present invention is to provide a video production system, a method, an electronic device and a readable storage medium, so as to solve the problems in the prior art.

The invention is realized in that a video production method comprises the following steps:

receiving an audio file and a video file;

receiving key point binding information, wherein the key point binding information comprises a plurality of audio key points and video key frames, and the audio key points and the video key frames are in one-to-one correspondence;

calculating a first time difference between two adjacent audio key points and a second time difference between two corresponding video key frames;

determining a percentage of error between the first time difference and the second time difference;

when the error percentage is smaller than the set percentage, aligning the audio key points with the video key frames by changing the speed of the audio or video;

when the error percentage is greater than or equal to the set percentage, aligning the audio key points with the video key frames by cutting the audio or video.

As a further scheme of the invention: the step of aligning the audio key points with the video key frames by changing the speed of the audio or video specifically comprises the following steps:

determining a mean time difference from the first time difference and the second time difference, the mean time difference = (first time difference + second time difference)/2;

determining a first speed change rate of the audio according to the first time difference and the mean time difference, and processing the audio segment between the two audio key points according to the first speed change rate;

and determining a second speed change rate of the video according to the second time difference and the mean time difference, and processing the video segment between the two video key frames according to the second speed change rate.

As a further scheme of the invention: the step of aligning the audio key points with the video key frames by cutting the audio or video specifically comprises the following steps:

when the first time difference is larger than the second time difference, determining a first difference value, wherein the first difference value is equal to the first time difference minus the second time difference, identifying the musical interlude, and cutting off an audio segment whose duration is equal to the first difference value, wherein the audio segment is preferentially taken from the musical interlude;

when the first time difference is smaller than the second time difference, determining a second difference value, wherein the second difference value is equal to the second time difference minus the first time difference, receiving a cuttable segment, and cutting off a video segment whose duration is equal to the second difference value, wherein the video segment belongs to the cuttable segment.

As a further scheme of the invention: the step of receiving the cuttable segment and cutting off the video segment equal to the second difference value specifically comprises the following steps:

receiving a cuttable segment, wherein the duration of the cuttable segment is longer than the second difference value;

determining a plurality of pairs of interval video frames according to the second difference value, wherein the time difference between the two frames of each pair is the second difference value;

and calculating the similarity between the two frames of each pair, determining the pair of interval video frames with the highest similarity, and cutting off the video segment between that pair of frames.

As a further scheme of the invention: the method further comprises the step of carrying out voice recognition on the audio file to obtain an audio subtitle, wherein the audio subtitle can be edited.

Another object of the present invention is to provide a video production system, the system comprising:

the audio and video file receiving module is used for receiving the audio files and the video files;

the key point binding determination module is used for receiving key point binding information, wherein the key point binding information comprises a plurality of audio key points and video key frames, and the audio key points and the video key frames are in one-to-one correspondence;

the time difference calculation module is used for calculating a first time difference between two adjacent audio key points and calculating a second time difference between two corresponding video key frames;

the error percentage judging module is used for judging the error percentage between the first time difference and the second time difference;

the first audio and video processing module is used for aligning the audio key points with the video key frames by changing the speed of the audio or video when the error percentage is smaller than the set percentage;

and the second audio and video processing module is used for aligning the audio key points with the video key frames by cutting the audio or video when the error percentage is greater than or equal to the set percentage.

As a further scheme of the invention: the first audio and video processing module comprises:

a mean time difference determining unit, used for determining a mean time difference from the first time difference and the second time difference, the mean time difference = (first time difference + second time difference)/2;

an audio segment speed change unit, used for determining a first speed change rate of the audio according to the first time difference and the mean time difference, and processing the audio segment between the two audio key points according to the first speed change rate;

and a video segment speed change unit, used for determining a second speed change rate of the video according to the second time difference and the mean time difference, and processing the video segment between the two video key frames according to the second speed change rate.

As a further scheme of the invention: the second audio/video processing module comprises:

and a video segment cutting unit, used for determining a second difference value when the first time difference is smaller than the second time difference, wherein the second difference value is equal to the second time difference minus the first time difference, receiving a cuttable segment, and cutting off a video segment whose duration is equal to the second difference value, wherein the video segment belongs to the cuttable segment.

Another object of the present invention is to provide an electronic device comprising a processor, a readable storage medium, and a computer program stored on the readable storage medium and capable of running on the processor, wherein the specific steps of the video production method are implemented when the processor executes the computer program.

The present invention also aims to provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor, implement specific steps in the video production method.

Compared with the prior art, the invention has the beneficial effects that:

the invention receives key point binding information input by a user, wherein the key point binding information comprises a plurality of audio key points and video key frames, and the audio key points and the video key frames are in one-to-one correspondence; calculates a first time difference between two adjacent audio key points and a second time difference between the two corresponding video key frames; and determines the error percentage between the first time difference and the second time difference. When the error percentage is smaller than the set percentage, the audio key points are aligned with the video key frames by changing the speed of the audio or video; when the error percentage is greater than or equal to the set percentage, the audio key points are aligned with the video key frames by cutting the audio or video. This ensures that the audio key points and the video key frames in the finished video are fully aligned, the video does not need to be shot repeatedly, and video production efficiency is high.

Drawings

Fig. 1 is a flow chart of a video production method.

Fig. 2 is a flowchart of aligning the audio key points with the video key frames by changing the speed of the audio or video in the video production method.

Fig. 3 is a flowchart of aligning the audio key points with the video key frames by cutting the audio or video in the video production method.

Fig. 4 is a flowchart of receiving a cuttable segment and cutting off a video segment equal to the second difference value in the video production method.

Fig. 5 is a schematic structural diagram of a video production system.

Fig. 6 is a schematic structural diagram of a first audio/video processing module in a video production system.

Fig. 7 is a schematic structural diagram of a second audio/video processing module in the video production system.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more clear, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Specific implementations of the invention are described in detail below in connection with specific embodiments.

As shown in fig. 1, an embodiment of the present invention provides a video production method, which includes the following steps:

S100, receiving an audio file and a video file;

S200, receiving key point binding information, wherein the key point binding information comprises a plurality of audio key points and video key frames, and the audio key points and the video key frames are in one-to-one correspondence;

S300, calculating a first time difference between two adjacent audio key points and a second time difference between the two corresponding video key frames;

S400, determining the error percentage between the first time difference and the second time difference;

S500, when the error percentage is smaller than the set percentage, aligning the audio key points with the video key frames by changing the speed of the audio or video;

S600, when the error percentage is greater than or equal to the set percentage, aligning the audio key points with the video key frames by cutting the audio or video.
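As an illustration of steps S200 and S300, the following is a minimal sketch that assumes the key point binding information arrives as timestamps in seconds. The names `Binding` and `segment_time_differences` are hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """One bound pair: an audio key point and its video key frame, in seconds."""
    audio_time: float
    video_time: float


def segment_time_differences(bindings):
    """Return (first_time_difference, second_time_difference) per adjacent pair.

    The first value is the gap between two adjacent audio key points; the
    second is the gap between the two corresponding video key frames.
    """
    ordered = sorted(bindings, key=lambda b: b.audio_time)
    return [
        (nxt.audio_time - cur.audio_time, nxt.video_time - cur.video_time)
        for cur, nxt in zip(ordered, ordered[1:])
    ]


# Three bound key points give two segments to process.
bindings = [Binding(2.0, 2.5), Binding(6.0, 6.0), Binding(10.0, 11.0)]
diffs = segment_time_differences(bindings)
```

Each tuple in `diffs` can then be passed to the error percentage check of step S400.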

It should be noted that, when a short video is shot, background music is often added to make the whole video more interesting. To blend the background music with the short video, the video often needs to be synchronized with the music at specific points (stuck points). If there is more than one stuck point, it is difficult for the video shooter to ensure that every video stuck point lands exactly on the corresponding point in the background music, so repeated shooting is often required, which makes video production inconvenient. The embodiment of the invention aims to solve these problems.

In the embodiment of the invention, the video producer first uploads an audio file and a video file and inputs key point binding information. The key point binding information comprises a plurality of audio key points and video key frames in one-to-one correspondence, and the aim of the embodiment is for the audio key points and the video key frames in the finished video to be fully aligned. The embodiment then automatically calculates the first time difference between two adjacent audio key points and the second time difference between the two corresponding video key frames, and determines the error percentage between them, where error percentage = |first time difference - second time difference| / max(first time difference, second time difference). The set percentage is a fixed value set in advance. When the error percentage is smaller than the set percentage, the audio key points are aligned with the video key frames by changing the speed of the audio or video. It is easy to understand that the speed change should not be too large, so this scheme is only suitable when the error percentage is small. When the error percentage is greater than or equal to the set percentage, the audio key points are aligned with the video key frames by cutting the audio or video. This ensures that the audio key points and the video key frames in the finished video are fully aligned, the video does not need to be shot repeatedly, and video production efficiency is high.
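The decision in steps S400 to S600 can be sketched as follows. The function names and the default threshold of 0.1 are assumptions made for illustration; the text only states that the set percentage is a fixed value chosen in advance.

```python
def error_percentage(first_diff, second_diff):
    """|first - second| / max(first, second), following the formula in the text."""
    longest = max(first_diff, second_diff)
    if longest <= 0:
        raise ValueError("time differences must be positive")
    return abs(first_diff - second_diff) / longest


def choose_processing(first_diff, second_diff, set_percentage=0.1):
    """Pick the processing branch; set_percentage=0.1 is a hypothetical value."""
    if error_percentage(first_diff, second_diff) < set_percentage:
        return "speed_change"  # S500: small mismatch, change playback speed
    return "cut"               # S600: large mismatch, cut audio or video


# Audio gap of 4.0 s against a video gap of 5.0 s: 1.0 / 5.0 = 20 % error.
branch_large = choose_processing(4.0, 5.0)
# Audio gap of 4.0 s against a video gap of 3.8 s: about 5 % error.
branch_small = choose_processing(4.0, 3.8)
```

Dividing by the larger of the two time differences keeps the result between 0 and 1 regardless of which stream is longer.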

As shown in Fig. 2, as a preferred embodiment of the present invention, the step of aligning the audio key points with the video key frames by changing the speed of the audio or video specifically includes:

S501, determining a mean time difference according to the first time difference and the second time difference, where the mean time difference = (first time difference + second time difference)/2;

S502, determining a first speed change rate of the audio according to the first time difference and the mean time difference, and processing the audio segment between the two audio key points according to the first speed change rate;

S503, determining a second speed change rate of the video according to the second time difference and the mean time difference, and processing the video segment between the two video key frames according to the second speed change rate.

In the embodiment of the invention, when the audio key points are aligned with the video key frames by changing speed, both the audio file and the video file are changed in speed so that they approach each other, which keeps each speed change smaller and makes the finished video look more natural. Specifically, the mean time difference is first determined from the first time difference and the second time difference, where mean time difference = (first time difference + second time difference)/2. The first speed change rate of the audio is then determined from the first time difference and the mean time difference, and the audio segment between the two audio key points is speed-changed at the first speed change rate; after processing, the time interval between the two audio key points is equal to the mean time difference. At the same time, the second speed change rate of the video is determined from the second time difference and the mean time difference, and the video segment between the two video key frames is processed at the second speed change rate; after processing, the time interval between the two video key frames is also equal to the mean time difference.
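A small worked sketch of steps S501 to S503 follows. The text does not give a formula for the speed change rates, so this example assumes a rate is a playback-speed multiplier equal to the original duration divided by the target duration; the function name is hypothetical.

```python
def speed_change_rates(first_diff, second_diff):
    """Return (mean_diff, audio_rate, video_rate).

    Assumption: a rate is a playback-speed multiplier, so a segment of
    duration d played at rate r lasts d / r seconds.
    """
    mean_diff = (first_diff + second_diff) / 2
    audio_rate = first_diff / mean_diff   # first speed change rate
    video_rate = second_diff / mean_diff  # second speed change rate
    return mean_diff, audio_rate, video_rate


# Audio gap 4.0 s, video gap 5.0 s: both segments are brought to 4.5 s.
mean_diff, audio_rate, video_rate = speed_change_rates(4.0, 5.0)
new_audio_duration = 4.0 / audio_rate
new_video_duration = 5.0 / video_rate
```

Under this assumption the audio is slowed slightly (rate below 1) and the video is sped up slightly (rate above 1), so each stream absorbs half of the mismatch.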

As shown in Fig. 3, as a preferred embodiment of the present invention, the step of aligning the audio key points with the video key frames by cutting the audio or video specifically includes:

S601, when the first time difference is larger than the second time difference, determining a first difference value, wherein the first difference value is equal to the first time difference minus the second time difference, identifying the musical interlude, and cutting off an audio segment whose duration is equal to the first difference value, wherein the audio segment is preferentially taken from the musical interlude;

S602, when the first time difference is smaller than the second time difference, determining a second difference value, wherein the second difference value is equal to the second time difference minus the first time difference, receiving a cuttable segment, and cutting off a video segment whose duration is equal to the second difference value, wherein the video segment belongs to the cuttable segment.

In the embodiment of the invention, when the error percentage is greater than or equal to the set percentage, the audio or video needs to be cut. Specifically, when the first time difference is greater than the second time difference, the first difference value is determined, which is equal to the first time difference minus the second time difference. The musical interlude, which contains no lyrics, is identified automatically, and an audio segment whose duration is equal to the first difference value is cut off, preferentially from the musical interlude, so that the musical content is affected as little as possible. When the first time difference is smaller than the second time difference, the second difference value is determined, which is equal to the second time difference minus the first time difference. The user is required to select a cuttable segment, which is a part of the video file that the user considers unimportant, and a video segment whose duration is equal to the second difference value is cut out of the cuttable segment. After cutting, the expression of the video content is essentially unaffected.
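The choice between S601 and S602 reduces to a comparison of the two time differences. The sketch below only computes which stream to cut and by how much; identifying the musical interlude and receiving the cuttable segment are outside its scope, and the function name is hypothetical.

```python
def cutting_plan(first_diff, second_diff):
    """Return (stream_to_cut, seconds_to_remove) for the cutting branch."""
    if first_diff > second_diff:
        # S601: audio segment is longer; remove the first difference value.
        return ("audio", first_diff - second_diff)
    if first_diff < second_diff:
        # S602: video segment is longer; remove the second difference value.
        return ("video", second_diff - first_diff)
    # Equal gaps need no cut (their error percentage would be zero anyway).
    return (None, 0.0)


plan_audio = cutting_plan(6.0, 4.0)
plan_video = cutting_plan(4.0, 5.5)
```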

As shown in Fig. 4, as a preferred embodiment of the present invention, the step of receiving a cuttable segment and cutting off a video segment equal to the second difference value specifically includes:

S6021, receiving a cuttable segment, wherein the duration of the cuttable segment is longer than the second difference value;

S6022, determining a plurality of pairs of interval video frames according to the second difference value, wherein the time difference between the two frames of each pair is the second difference value;

S6023, calculating the similarity between the two frames of each pair, determining the pair of interval video frames with the highest similarity, and cutting off the video segment between that pair of frames.

In the embodiment of the present invention, it is easy to understand that the duration of the cuttable segment selected by the user needs to be greater than the second difference value. A plurality of pairs of interval video frames are then determined according to the second difference value, with the time difference between the two frames of each pair equal to the second difference value. Once the pairs are determined, the similarity between the two frames of each pair is calculated, the pair with the highest similarity is identified, and the video segment between that pair of frames is cut off, so that the two frames of the pair become consecutive video frames. Because their similarity is high, the transition does not appear abrupt even though the two frames were originally separated.
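Steps S6021 to S6023 can be sketched as a search over frame pairs. The text does not specify the similarity measure, so this example uses the negative mean absolute pixel difference as a stand-in; the frame representation (flat lists of pixel values), the `fps` parameter, and all function names are likewise assumptions.

```python
def frame_similarity(frame_a, frame_b):
    """Hypothetical similarity: negative mean absolute pixel difference."""
    return -sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def best_cut(frames, fps, second_difference):
    """Return indices (i, j) of the most similar pair j - i frames apart.

    frames: frames of the cuttable segment, each a flat list of pixel values.
    The frames strictly between i and j are the ones to cut, so that frame i
    and frame j become consecutive.
    """
    gap = round(second_difference * fps)
    if gap <= 0 or gap >= len(frames):
        raise ValueError("cuttable segment must be longer than the second difference")
    start = max(
        range(len(frames) - gap),
        key=lambda i: frame_similarity(frames[i], frames[i + gap]),
    )
    return start, start + gap


def apply_cut(frames, pair):
    """Drop the frames strictly between the chosen pair."""
    i, j = pair
    return frames[: i + 1] + frames[j:]


# Six one-pixel frames at 1 fps; remove 2 s where the join is least visible.
frames = [[0.0], [5.0], [9.0], [5.5], [30.0], [40.0]]
pair = best_cut(frames, fps=1, second_difference=2.0)
trimmed = apply_cut(frames, pair)
```

In practice a perceptual measure over decoded frames would likely be used; the simple pixel difference here only illustrates the search structure.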

As a preferred embodiment of the invention, the method further comprises performing speech recognition on the audio file to obtain audio subtitles. The audio subtitles can be edited, so any speech recognition errors can be corrected manually, which makes the method convenient to use.

As shown in fig. 5, an embodiment of the present invention further provides a video production system, where the system includes:

an audio and video file receiving module 100 for receiving audio files and video files;

the key point binding determination module 200 is configured to receive key point binding information, where the key point binding information includes a plurality of audio key points and video key frames, and the audio key points and the video key frames are in one-to-one correspondence;

the time difference calculating module 300 is configured to calculate a first time difference between two adjacent audio key points and calculate a second time difference between two corresponding video key frames;

an error percentage determination module 400, configured to determine a percentage of error between the first time difference and the second time difference;

the first audio/video processing module 500, configured to align the audio key points with the video key frames by changing the speed of the audio or video when the error percentage is smaller than the set percentage;

and the second audio/video processing module 600, configured to align the audio key points with the video key frames by cutting the audio or video when the error percentage is greater than or equal to the set percentage.

As shown in fig. 6, as a preferred embodiment of the present invention, the first audio/video processing module 500 includes:

a mean time difference determining unit 501, configured to determine a mean time difference from the first time difference and the second time difference, where the mean time difference = (first time difference + second time difference)/2;

an audio segment speed change unit 502, configured to determine a first speed change rate of the audio according to the first time difference and the mean time difference, and process the audio segment between the two audio key points according to the first speed change rate;

and a video segment speed change unit 503, configured to determine a second speed change rate of the video according to the second time difference and the mean time difference, and process the video segment between the two video key frames according to the second speed change rate.

As shown in fig. 7, as a preferred embodiment of the present invention, the second audio/video processing module 600 includes:

an audio segment cutting unit 601, configured to determine a first difference value when the first time difference is greater than the second time difference, wherein the first difference value is equal to the first time difference minus the second time difference, identify the musical interlude, and cut off an audio segment whose duration is equal to the first difference value, wherein the audio segment is preferentially taken from the musical interlude;

and a video segment cutting unit 602, configured to determine a second difference value when the first time difference is smaller than the second time difference, wherein the second difference value is equal to the second time difference minus the first time difference, receive a cuttable segment, and cut off a video segment whose duration is equal to the second difference value, wherein the video segment belongs to the cuttable segment.

The embodiment of the invention also provides electronic equipment, which comprises a processor, a readable storage medium and a computer program which is stored on the readable storage medium and can run on the processor, wherein the specific steps in the video production method are realized when the processor executes the computer program.

The embodiment of the invention also provides a readable storage medium, wherein the readable storage medium stores a program or instructions which when executed by a processor realize specific steps in the video production method.

The foregoing description of the preferred embodiments of the present invention should not be taken as limiting the invention, but rather should be understood to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed in sequence, as they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including departures from the present disclosure that come within known or customary practice in the art to which the disclosure pertains. The specification and examples are to be considered exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A method of video production, the method comprising the steps of:

receiving an audio file and a video file;

2. The video production method according to claim 1, wherein the step of aligning the audio key points with the video key frames by changing the speed of the audio or video specifically comprises:

3. The video production method according to claim 1, wherein the step of aligning the audio key points with the video key frames by cutting the audio or video specifically comprises:

4. The video production method according to claim 3, wherein the step of receiving the cuttable segment and cutting off the video segment equal to the second difference value specifically comprises:

5. The method of claim 1, further comprising speech recognition of the audio file to obtain an audio subtitle, the audio subtitle being editable.

6. A video production system, the system comprising:

7. The video production system of claim 6, wherein the first audio-video processing module comprises:

8. The video production system of claim 6, wherein the second audio-video processing module comprises:

9. An electronic device comprising a processor, a readable storage medium and a computer program stored on the readable storage medium and capable of running on the processor, which when executed by the processor, performs the specific steps in the video production method according to any one of claims 1 to 5.

10. A readable storage medium, characterized in that it has stored thereon a program or instructions which, when executed by a processor, realizes the specific steps in the video production method according to any one of claims 1 to 5.