CN115460455B - Video editing method, device, equipment and storage medium

Video editing method, device, equipment and storage medium

Info

Publication number
CN115460455B
CN115460455B
Authority
CN
China
Prior art keywords
video data
original
point
original video
data
Prior art date
Legal status
Active
Application number
CN202211083089.3A
Other languages
Chinese (zh)
Other versions
CN115460455A (en)
Inventor
王传鹏
李腾飞
卢炬康
张婷
Current Assignee
Shanghai Hard Link Network Technology Co ltd
Original Assignee
Shanghai Hard Link Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Hard Link Network Technology Co ltd filed Critical Shanghai Hard Link Network Technology Co ltd
Priority to CN202211083089.3A
Publication of CN115460455A
Application granted
Publication of CN115460455B
Legal status: Active


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a video editing method, device, equipment and storage medium, wherein the method comprises: acquiring original video data for promoting a business object; marking clip points in the original video data when a clipping operation is received; detecting target subtitle data belonging to the same paragraph in the original video data; adjusting the clip points so as to preserve the target subtitle data; and clipping target video data from the original video data according to the adjusted clip points. Because target subtitle data belonging to the same paragraph has a certain semantic independence and integrity, the dubbing associated with it, and the pictures of the original video data corresponding to it, have the same independence and integrity. By adjusting the clip points during clipping, the target subtitle data is preserved and, correspondingly, the dubbing is preserved, avoiding cuts through the dubbing and ensuring its integrity.

Description

Video editing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a video editing method, apparatus, device, and storage medium.
Background
In scenarios that promote business objects such as games and electronic products, video data is often used to introduce the business object: it presents the information of the business object through pictures and sound, making it convenient for users to take in.
After the original video data is recorded, art staff clip it mainly with professional video editing tools, repeatedly dragging the playback progress to skim and cut the video. Where to cut depends mainly on the artist's reading of the picture content, so the problem of incomplete dubbing occurs frequently.
Disclosure of Invention
The invention provides a video editing method, device, equipment and storage medium to solve the problem of ensuring the integrity of dubbing when editing video data.
According to an aspect of the present invention, there is provided a video editing method, including:
acquiring original video data for popularizing a business object;
marking a clip point in the original video data when a clipping operation is received;
detecting target subtitle data belonging to the same paragraph in the original video data;
adjusting the clip points so as to preserve the target subtitle data;
and clipping target video data from the original video data according to the adjusted clip points.
According to another aspect of the present invention, there is provided a video editing apparatus comprising:
the original video data acquisition module is used for acquiring original video data for popularizing the business object;
the clip point labeling module is used for labeling clip points in the original video data when a clipping operation is received;
the target subtitle data detection module is used for detecting target subtitle data belonging to the same paragraph in the original video data;
the clip point adjusting module is used for adjusting the clip points so as to preserve the target subtitle data;
and the target video data clipping module is used for clipping target video data from the original video data according to the adjusted clip points.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the video editing method of any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing a computer program for causing a processor to implement the video editing method according to any of the embodiments of the present invention when executed.
In this embodiment, original video data for promoting a business object is acquired; clip points are marked in the original video data when a clipping operation is received; target subtitle data belonging to the same paragraph is detected in the original video data; the clip points are adjusted so as to preserve the target subtitle data; and target video data is clipped from the original video data according to the adjusted clip points. Because target subtitle data belonging to the same paragraph has a certain semantic independence and integrity, the dubbing associated with it, and the pictures of the original video data corresponding to it, have the same independence and integrity. By adjusting the clip points during clipping, the target subtitle data is preserved and, correspondingly, the dubbing is preserved, avoiding cuts through the dubbing and ensuring its integrity.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a video editing method provided according to a first embodiment of the present invention;
FIG. 2 is an exemplary diagram of an end segment provided in accordance with one embodiment of the present invention;
fig. 3 is an exemplary diagram of editing original video data provided according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a video editing apparatus according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a video editing method according to a first embodiment of the present invention, where the method may be performed by a video editing apparatus, which may be implemented in hardware and/or software, and the video editing apparatus may be configured in an electronic device, where editing points are adjusted according to semantic integrity of subtitle data, so as to ensure sound integrity. As shown in fig. 1, the method includes:
step 101, obtaining original video data for popularizing a business object.
Objects with service characteristics are distributed across different business scenarios and are recorded as business objects. A business object may be a physical object, such as a mobile phone, tablet computer or smart watch, or a virtual object, such as most third-party applications, e.g., games, short-video applications and shopping applications, which this embodiment does not limit.
In order for those skilled in the art to better understand the present invention, in this embodiment, a game is described as an example of a business object.
The types of games may include MOBA (Multiplayer Online Battle Arena), RPG (Role-Playing Game), SLG (strategy game), etc., which this embodiment does not limit.
For a given business object, an artist may prepare in advance one or more pieces of video data that can be clipped for different distribution channels, recorded as original video data. For example, the original video data has a long duration, greater than or equal to the duration limits of all channels, so the artist can trim it for a particular channel; it carries no background music, so the artist can configure background music for a particular channel; and so on.
Further, the content (including pictures and sounds) of the original video data is related to the service object, and can be used for introducing the service object and popularizing the service object.
Taking a game as an example, the content of the original video data falls into two main forms: game content and live-action drama. Game content may show a user controlling the game, a presenter introducing the game, or a presenter introducing the game while wearing in-game costumes. The drama form can be further divided into the following categories:
1. Pseudo food sharing
The original video data contains food-related material that attracts the user's attention, with a "play the game while eating" angle implanted.
2. Subject matter close to the user's daily life
The content of the original video data mirrors the user's current way of living, and the game is implanted into everyday activities such as playing games, eating, or buying snacks. The first half of the material is mainly a dialogue between two people, and the second half is the game placement section.
3. Exaggerated drama
The original video data contains scripted material in which some scenarios are exaggerated to attract the user's attention.
Of course, the above original video data is merely an example, and other original video data may be set according to actual situations when the present embodiment is implemented, which is not limited thereto. In addition, in addition to the original video data described above, those skilled in the art can use other original video data according to actual needs, which is not limited in this embodiment.
In practical applications, as shown in fig. 2, the original video data 210 has multi-frame image data 211, and in order to popularize a business object, information such as an icon (Logo) 212, banner information (Banner) 213, and an End Clip (EC) 214 is generally configured in different image data 211.
The icon (Logo) identifies the business object and may be textual (containing the name of the business object, such as "XX game") or graphical.
The banner information (Banner) is generally rectangular, typically located at the top and/or bottom of the image data, and may record information about the business object itself (e.g., in-game pictures, characters, the name) or information that entices the user to purchase or download the business object (e.g., a gift code).
As shown in fig. 3, the end clip EC carries an identifier for downloading the business object, for example, information of the business object itself (such as in-game pictures, characters, and a name such as "XX game") and a way to purchase or download it (such as the icon and name of an application distribution platform, e.g., "ABC App Store" or "EFG Play", or the icon and name of a shopping platform).
Further, as shown in fig. 2, the original video data may carry an end segment 214 at its end. To prevent the end segment 214 from interfering with the clip, whether an end segment exists can be detected at the end of the original video data from features such as duration (the end segment typically occupies the last 6 seconds) and color (the end segment has a transition picture in a distinctive color such as black); when the end segment 214 is detected, the end segment 214 is deleted.
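As an illustration of this detection, the following is a minimal sketch (using OpenCV) that flags an end segment when most frames in the final seconds of the video are predominantly dark; the 6-second window and the darkness thresholds are assumptions for the sketch, not values fixed by the embodiment.

```python
import cv2
import numpy as np

def has_end_segment(path, tail_seconds=6.0, dark_thresh=40, dark_ratio=0.8):
    """Heuristically check whether the tail of a video is an end segment.

    Assumed rule: an end segment is present when at least `dark_ratio`
    of the frames in the last `tail_seconds` are predominantly dark.
    """
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    start = max(0, total - int(tail_seconds * fps))
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)

    dark_frames, sampled = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        sampled += 1
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if np.mean(gray) < dark_thresh:    # mostly black picture
            dark_frames += 1
    cap.release()
    return sampled > 0 and dark_frames / sampled >= dark_ratio
```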
Step 102, when a clipping operation is received, clipping points are marked in the original video data.
In practical applications, various editing operations may be provided for the original video data, such as clipping (also called cropping), rotation, mirroring, segmentation, shifting, adding filters, adding text, adding special effects, adding animations, background settings, transition settings, stylization, and the like.
When the original video data is loaded, thumbnails of its pictures, its sound and the like are displayed on an interface along a time axis (track), and the interface provides controls representing the various editing operations; an artist can trigger those controls by clicking, long-pressing, dragging and similar gestures according to business needs, thereby triggering a clipping operation.
In this embodiment, if a clipping operation triggered by the artist is received, clip points may be marked in the original video data according to the instruction of the clipping operation, where a clip point is a time point on the time axis of the original video data at which clipping is performed; in general, clip points come in pairs, and the data between a pair of clip points is clipped out.
Further, in order to improve the flexibility of editing, various types of editing operations may be provided to the artist, and then, when the editing operation is received, the type of the editing operation is determined so that the editing point is marked in the original video data according to the indication of the type.
In one example, the type of clipping operation includes at least one of:
1. clip head
If the type is clip beginning, that is, the artist intends to clip the data at the head of the original video data, the starting point of the original video data may be marked as one clip point on the time axis of the original video data, and the time point a preset first time period after the starting point is marked as the other clip point.
The first time period may be set by an artist, or may be a default value (e.g., 10 seconds), etc.
2. Clip end
If the type is clip end, i.e., the artist intends to clip the data at the tail of the original video data, whether there is an end segment at the end of the original video data can be detected.
When the original video data contained an end segment and the end segment has been deleted, the ending point of the original video data is marked as one clip point on the time axis of the original video data, and the time point a preset second time period before the ending point is marked as the other clip point.
3. Intermediate of clips
If the type is clip middle, i.e., the artist intends to clip data from the middle portion of the original video data, whether there is an end segment at the end of the original video data can be detected.
When the original video data contained an end segment and the end segment has been deleted, two clip points can be marked on the time axis of the original video data based on the pictures of the original video data, using a model such as SwAV, BYOL or Self-Label, where the two clip points are time points other than the starting point and the ending point of the original video data.
4. Custom clipping
If the type is custom clipping, the artist intends to manually clip part of the data from the original video data. In this case, two time points can be read from the clipping operation, and the positions at those time points on the time axis of the original video data are marked as clip points; these two clip points can be any time points on the time axis, including the starting point and the ending point of the original video data.
Of course, the types of the above-described clipping operations are merely examples, and other types of clipping operations may be set according to actual situations when the present embodiment is implemented, which is not limited thereto. In addition, in addition to the types of the clipping operations described above, the person skilled in the art may employ other types of clipping operations according to actual needs, which is not limited in this embodiment.
If the end segment at the end of the original video data has been deleted and the clip points have been determined, a clip point may still cut through the dubbing (i.e., the audio data). In that case the later clip point can be moved backward on the time axis of the original video data by a preset step (e.g., 2 seconds) to lengthen the clip, reducing the chance that a clip point cuts the dubbing.
Of course, if the later clip point is the ending point of the original video data, it is kept as the ending point and not moved backward.
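A minimal sketch of how the four types of clipping operation could map to a pair of clip points, including the backward step for the later point described above; `head_len`, `tail_len` and `step` are illustrative defaults, and the fixed fractions standing in for the output of a SwAV/BYOL/Self-Label model are placeholders only.

```python
def mark_clip_points(op_type, duration, t1=None, t2=None,
                     head_len=10.0, tail_len=10.0, step=2.0):
    """Return a (start, end) pair of clip points on the timeline, in seconds."""
    if op_type == "head":
        points = (0.0, min(head_len, duration))
    elif op_type == "tail":              # end segment assumed already deleted
        points = (max(0.0, duration - tail_len), duration)
    elif op_type == "middle":
        # In the embodiment this pair comes from a self-supervised model
        # (SwAV / BYOL / Self-Label) run on the pictures; the fixed
        # fractions below are placeholders for that output.
        points = (duration * 0.25, duration * 0.75)
    elif op_type == "custom":
        points = (t1, t2)                # both time points read from the operation
    else:
        raise ValueError(op_type)

    start, end = points
    # Move the later clip point backward by `step` to reduce the chance
    # of cutting through dubbing, but never past the end of the video.
    if end < duration:
        end = min(duration, end + step)
    return start, end
```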
Step 103, detecting the target caption data belonging to the same paragraph in the original video data.
As shown in fig. 2, to promote the business object the original video data carries dubbing matched to its pictures, and, to make the original video data easy to follow, subtitle data 215 corresponding to the dubbing is displayed synchronously on the pictures of the original video data.
In practical applications, the frames of the original video data have a certain storyline, and the dubbing and the subtitle data are also set in coordination with the storyline, so that the dubbing and the subtitle data have a certain consistency semantically.
In this embodiment, the subtitle data is detected in the original video data, and the subtitle data is aggregated according to semantics to obtain the target subtitle data belonging to the same paragraph, where the target subtitle data belonging to the same paragraph has certain independence and integrity in terms of semantics.
In one embodiment of the present invention, step 103 may include the steps of:
in step 1031, the original video data is segmented into a plurality of segments by using the scene as a segmentation node.
In this embodiment, the original video data is segmented into a plurality of segments using cues such as the boundaries of scene switches (fade-in, fade-out, cut to black, etc.), the similarity between pictures, and color and structural features, where each segment holds one or more independent scenes.
In general, the data containing one independent scene in the original video data is segmented into one video segment; considering that data containing a single independent scene can be very short, a scene may be combined with adjacent scenes, so that data containing two or more connected scenes may also be segmented into one segment.
In one slicing manner, to extract the color and structural features of the original video data, each frame of image data may be converted from a first color space to a second color space, where the first color space is the RGB color space, representing red (Red), green (Green) and blue (Blue), and the second color space is the HSV color space, representing hue (Hue), saturation (Saturation) and value (Value).
Hue H ranges over 0-360°, measured counterclockwise from red: red is 0°, green 120° and blue 240°. Saturation S describes how vivid the color is; a spectral color whose white-light component is 0 has the highest saturation. Its range is usually 0% to 100%, and the larger the value, the more saturated the color. Value V represents the brightness of the color: for a light-source color it relates to the luminance of the illuminant, and for an object color it relates to the transmittance or reflectance of the object, typically ranging from 0% (black) to 100% (white).
In the second color space, every two adjacent frames of image data are traversed, their differences on all color channels are computed, and the differences are averaged.
If the average value is greater than a preset transition threshold, the color difference is large, and the time point between the two adjacent frames of image data is determined to be a transition point, where a transition point represents a scene switch; the original video data is segmented at the transition points to obtain a plurality of segments.
In this manner, considering that the pictures within a scene are stable and change little, scenes are detected with color and structural features: the computation is simple, reducing computation time while keeping scene detection accurate.
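A minimal sketch of this slicing, assuming OpenCV is available: each frame is converted to HSV, the per-channel absolute differences of adjacent frames are averaged, and a transition point is recorded when the average exceeds a threshold; the threshold value here is an assumption.

```python
import cv2
import numpy as np

def split_on_transitions(path, transition_thresh=30.0):
    """Split a video into (start, end) segments at scene transitions."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    transitions, prev_hsv, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV).astype(np.float32)
        if prev_hsv is not None:
            diff = np.abs(hsv - prev_hsv).mean()   # mean over the H, S, V channels
            if diff > transition_thresh:
                transitions.append(idx / fps)      # time point between the two frames
        prev_hsv, idx = hsv, idx + 1
    cap.release()
    # Segments are the intervals between consecutive transition points.
    bounds = [0.0] + transitions + [idx / fps]
    return list(zip(bounds[:-1], bounds[1:]))
```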
Step 1032, detecting original subtitle data belonging to the same sentence in the clip.
A segment holds one or more independent scenes, and the dubbing and subtitle data configured for those scenes are likewise independent; therefore, subtitle data can be detected separately in each segment of the original video data, and subtitles within the same segment can be regarded as semantically belonging to the same sentence, recorded as original subtitle data.
Typically, when recording the original video data and dubbing, one or more sentences are configured for the scenes in a segment, and thus, one or more independent sentences are typically contained in a segment.
In a specific implementation, for each frame of image data in a segment, optical character recognition (OCR) may be performed on the image data using a convolutional neural network, such as a Convolutional Recurrent Neural Network (CRNN), to obtain original text information.
In the picture of the original video data, there may be some text in the background in addition to the subtitle data, which may interfere with the recognition of the subtitle data.
In general, the subtitle data of the original video data stays on screen for a stretch of time and does not change with the scene, whereas text in the background changes with the scene.
Thus, for each piece of original text information, a first number is counted: the number of frames of image data in the segment in which the original text information appears.
If the first number is greater than or equal to a preset frame-count threshold, the original text information appears frequently and is most likely subtitle data, so the original text information is retained.
If the first number is smaller than the preset frame-count threshold, the original text information appears rarely and is most likely background text, so the original text information is filtered out.
The original text information is marked in the form of text boxes. Because subtitle data can be long and may contain gaps in the middle, the same subtitle data may be split into several text boxes. In that case, text boxes whose height ranges overlap in the vertical direction can be found, the original text information they mark can be regarded as lying on the same line, and the original text information on the same line is merged to obtain target text information.
Since subtitle data is generally located at a fixed position, such as the middle or lower part of the picture, a region where subtitles appear with high confidence can be marked in advance. The position of the target text information is compared with this region, and if the target text information lies inside the preset region, it is determined to be subtitle data belonging to the same sentence and recorded as original subtitle data.
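The sketch below illustrates the merging and filtering just described, assuming per-frame OCR results are already available as (text, x_left, y_top, y_bottom) tuples with coordinates normalized to [0, 1]; the frame-count threshold and the subtitle region are illustrative values, not ones fixed by the embodiment.

```python
from collections import Counter

def merge_same_line(boxes):
    """Merge OCR boxes whose vertical ranges overlap into single text lines.

    `boxes` is assumed to be [(text, x_left, y_top, y_bottom), ...] for
    one frame; boxes on one line are joined left to right.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b[2]):            # by y_top
        for line in lines:
            if box[2] < line["y1"] and box[3] > line["y0"]:  # height ranges overlap
                line["boxes"].append(box)
                line["y0"] = min(line["y0"], box[2])
                line["y1"] = max(line["y1"], box[3])
                break
        else:
            lines.append({"boxes": [box], "y0": box[2], "y1": box[3]})
    return [(" ".join(b[0] for b in sorted(l["boxes"], key=lambda b: b[1])),
             l["y0"], l["y1"]) for l in lines]

def detect_original_subtitles(frames_boxes, frame_thresh=10, region=(0.75, 1.0)):
    """Keep merged lines that recur often enough and sit in the subtitle region."""
    counts, spans = Counter(), {}
    for boxes in frames_boxes:                 # one OCR result list per frame
        for text, y0, y1 in merge_same_line(boxes):
            counts[text] += 1                  # the "first number" of frames
            spans[text] = (y0, y1)
    return [text for text, n in counts.items()
            if n >= frame_thresh               # frequent: not background text
            and region[0] <= spans[text][0] and spans[text][1] <= region[1]]
```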
Step 1033, comparing each original caption data to merge the original caption data into the target caption data belonging to the same paragraph.
In this embodiment, the original subtitle data of the segments are compared for semantic relatedness, and semantically related original subtitle data are merged and recorded as target subtitle data belonging to the same paragraph.
In a specific implementation, the original subtitle data of the segments may be traversed in order. If a segment has no original subtitle data, its original subtitle data is treated as null, and the original subtitle data of the segment before it and that of the segment after it then belong to different paragraphs.
During the traversal, the similarity between the original subtitle data of two adjacent segments is computed using methods such as TF-IDF (term frequency-inverse document frequency), BM25, text distance, SimHash, LSI (Latent Semantic Indexing), or deep learning.
For example, the longest continuous matching sequence may be found between the original subtitle data of two adjacent segments, where the sequence contains no blank elements.
On the one hand, the second number M of elements contained in the sequence is counted; on the other hand, the third number T of all elements contained in the original subtitle data of the two adjacent segments is counted.
A ratio M/T of the second number to the third number is calculated.
The ratio is multiplied by a preset coefficient α greater than 1 (e.g., 2) to obtain the similarity between the original subtitle data of the two adjacent segments, i.e., sim = α × M / T.
If the similarity is greater than or equal to a preset similarity threshold, the original subtitle data of the two adjacent segments are similar, and it can be determined that they belong to the same paragraph.
When the traversal finishes, the original subtitle data belonging to the same paragraph are merged to obtain the target subtitle data.
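A sketch of this similarity and merging: with α = 2 the formula sim = α × M / T coincides with difflib's ratio for a single matching block, and the junk predicate makes the longest match skip blank elements as required above; the similarity threshold is an assumed value.

```python
from difflib import SequenceMatcher

def subtitle_similarity(a, b, alpha=2.0):
    """sim = alpha * M / T, M = longest blank-free match, T = all elements."""
    m = SequenceMatcher(lambda ch: ch == " ", a, b)   # junk: blank elements
    match = m.find_longest_match(0, len(a), 0, len(b))
    t = len(a) + len(b)                                # third number T
    return alpha * match.size / t if t else 0.0        # second number M = match.size

def merge_into_paragraphs(segment_subs, sim_thresh=0.5):
    """Group per-segment subtitle strings into paragraphs.

    `segment_subs` holds one string per segment ('' when a segment has
    no subtitles, which always ends the current paragraph).
    """
    paragraphs, current = [], []
    for i, sub in enumerate(segment_subs):
        if not sub:                                    # empty segment breaks the paragraph
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        if current and subtitle_similarity(segment_subs[i - 1], sub) < sim_thresh:
            paragraphs.append(" ".join(current))       # dissimilar: start a new paragraph
            current = []
        current.append(sub)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```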
Step 104, adjusting the clip points to preserve the target subtitle data.
Because the target caption data belonging to the same paragraph has certain independence and integrity in terms of semantics, the dubbing associated with the target caption data has certain independence and integrity, and the picture of the original video data corresponding to the target caption data also has certain independence and integrity.
The clip points were placed by the artist and may therefore cut through a paragraph; the clip points can be finely adjusted to preserve the target subtitle data, i.e., so that the clipped target video data contains the complete target subtitle data.
In a specific implementation, the first time range spanned by the clip points can be compared with the second time ranges occupied by the target subtitle data to determine whether only complete target subtitle data is included between the clip points.
If so (i.e., the first time range does not partially overlap any second time range), the clip points are maintained and their time points are not adjusted.
If not (i.e., the first time range partially overlaps a second time range), the clip points can be moved forward and/or backward on the time axis of the original video data, and whether only complete target subtitle data is included between the clip points is determined again.
Further, if a clip point is a start point or an end point of the original video data, the clip point is maintained as the start point or the end point, and is not moved forward and/or backward on the time axis of the original video data.
Further, call the part of a second time range that overlaps the first time range the overlapping range, and the part that does not overlap it the non-overlapping range. When the first time range overlaps a second time range, there are generally two cases:
1. The overlapping range is smaller than the non-overlapping range
To keep the adjustment small, when the overlapping range is smaller than the non-overlapping range, the clip point lying inside the second time range may be moved forward or backward on the time axis of the original video data so that the adjusted first time range exits the second time range.
The clip point moves at least to coincide with an endpoint of the current second time range, and at most to coincide with an endpoint of the next second time range.
2. The overlapping range is greater than the non-overlapping range
When the overlapping range is greater than the non-overlapping range, the clip point lying inside the second time range may be moved forward or backward on the time axis of the original video data so that the adjusted first time range covers the second time range.
The clip point moves at least to coincide with an endpoint of the current second time range, and at most to coincide with an endpoint of the next second time range.
To keep the adjustment small, the clip point may be moved to coincide with the nearest endpoint of the current second time range.
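Both cases reduce to snapping a clip point that falls inside a paragraph's second time range to the nearer endpoint of that range, which either fully keeps or fully excludes the paragraph; the sketch below implements that reading, keeping the start and end of the video immovable as required above. Applied to the fig. 2 example that follows, T11 snaps to T22 and T12 snaps to T24.

```python
def snap(point, s, e):
    """Snap a clip point lying inside [s, e] to the nearer of the two endpoints."""
    return s if point - s < e - point else e

def adjust_clip_points(clip, subtitle_ranges, duration):
    """Adjust a (start, end) clip so no subtitle paragraph is cut through.

    `subtitle_ranges` holds the (start, end) second time ranges of the
    target subtitle data; clip points at 0 or `duration` stay in place.
    """
    start, end = clip
    for s, e in subtitle_ranges:
        if 0.0 < start < duration and s < start < e:
            start = snap(start, s, e)   # overlap smaller -> exit, greater -> cover
        if 0.0 < end < duration and s < end < e:
            end = snap(end, s, e)
    return start, end
```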
For example, as shown in FIG. 2, a clipping operation is triggered on the original video data 210, with custom clip points T11 and T12 composing a first time range 224.
Three pieces of target subtitle data are detected in the original video data 210, giving three second time ranges: second time range 221, second time range 222 and second time range 223, where the second time range 221 has endpoints T21 and T22, the second time range 222 has endpoints T23 and T24, and the second time range 223 has endpoints T25 and T26.
The first time range 224 partially overlaps the second time range 221, partially overlaps the second time range 222, and does not overlap the second time range 223.
Since the overlapping range with the second time range 221 is smaller than the non-overlapping range, the clip point T11 is moved backward on the time axis of the original video data; T11 can move to between T22 and T23, preferably coinciding with T22.
Since the overlapping range with the second time range 222 is greater than the non-overlapping range, the clip point T12 is moved backward on the time axis of the original video data; T12 can move to between T24 and T25, preferably coinciding with T24.
Of course, the above-described manner of adjusting the clip points is merely an example, and other manners of adjusting the clip points may be set according to actual situations when implementing the present embodiment, which is not limited thereto. In addition, other ways of adjusting the clipping points can be adopted by those skilled in the art according to actual needs, and the present embodiment is not limited thereto.
And step 105, clipping the target video data from the original video data according to the clipping point after adjustment.
As shown in FIG. 2, if the adjustment of the clip point is completed, the original video data 210 can be clipped to the clip point (e.g., clip point T 11 And clip point T 12 ) The data therebetween, denoted as target video data 220.
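Exporting the data between the adjusted clip points could then be as simple as the following sketch, which assumes an ffmpeg binary on the PATH; note that `-c copy` cuts on keyframes, so a re-encode would be needed for frame-exact boundaries.

```python
import subprocess

def export_clip(src, dst, start, end):
    """Cut the [start, end] span of `src` into `dst` using ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-ss", str(start), "-to", str(end),   # adjusted clip points, in seconds
         "-c", "copy", dst],
        check=True,
    )
```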
The target video data can then be output to the artist for review, and the artist post-processes it as editing requires, for example by changing the background music, applying stylization to the whole target video data, or adding an end segment.
Once post-processing is complete, the target video data can be distributed on a designated channel (such as news, short video, novel reading, or sports and health). When a client accesses the channel, the target video data is pushed to the client for playback; when a user is interested in the business object, the user can find it through the information in the target video data, for example by searching for and downloading the game from an application distribution platform.
In this embodiment, original video data for promoting a business object is acquired; clip points are marked in the original video data when a clipping operation is received; target subtitle data belonging to the same paragraph is detected in the original video data; the clip points are adjusted so as to preserve the target subtitle data; and target video data is clipped from the original video data according to the adjusted clip points. Because target subtitle data belonging to the same paragraph has a certain semantic independence and integrity, the dubbing associated with it, and the pictures of the original video data corresponding to it, have the same independence and integrity. By adjusting the clip points during clipping, the target subtitle data is preserved and, correspondingly, the dubbing is preserved, avoiding cuts through the dubbing and ensuring its integrity.
Example two
Fig. 4 is a schematic structural diagram of a video editing apparatus according to a second embodiment of the present invention. As shown in fig. 4, the apparatus includes:
an original video data obtaining module 401, configured to obtain original video data for promoting a business object;
a clipping point labeling module 402, configured to label clipping points in the original video data when a clipping operation is received;
a target caption data detection module 403, configured to detect target caption data belonging to the same paragraph in the original video data;
a clipping point adjusting module 404, configured to adjust the clip points so as to preserve the target subtitle data;
and the target video data clipping module 405 is configured to clip target video data from the original video data according to the adjusted clipping point.
In one embodiment of the invention, the end of the original video data has an end segment with an identification to download the business object;
the clipping point labeling module 402 includes:
a clipping type determining module, configured to determine a type of a clipping operation when the clipping operation is received;
the beginning marking module is used for, if the type is clip beginning, marking the starting point of the original video data as a clip point and marking a clip point after the starting point;
the end marking module is used for, if the type is clip end, marking the ending point of the original video data as a clip point when the end segment has been deleted from the original video data, and marking a clip point before the ending point;
the middle labeling module is used for, if the type is clip middle, marking clip points in the original video data based on the pictures of the original video data when the end segment has been deleted from the original video data;
the custom labeling module is used for, if the type is custom clipping, reading time points from the clipping operation and marking the positions at those time points in the original video data as clip points.
In one embodiment of the present invention, the target subtitle data detection module 403 includes:
the video slicing module is used for slicing the original video data into a plurality of fragments by taking a scene as a sliced node;
the original subtitle data detection module is used for detecting original subtitle data belonging to the same sentence in the fragment;
and the original subtitle data merging module is used for comparing each original subtitle data so as to merge the original subtitle data into target subtitle data belonging to the same paragraph.
In one embodiment of the invention, the video slicing module comprises:
a color space conversion module for converting, for each frame of image data in the original video data, the image data from a first color space representing red, green, blue to a second color space representing hue, saturation, and brightness;
a color difference calculating module, configured to calculate, in the second color space, differences between the image data of two adjacent frames on all color channels;
the average value calculation module is used for calculating an average value of the differences;
a transition point determining module, configured to determine that a transition point is between two adjacent frames of image data if the average value is greater than a preset transition threshold, where the transition point represents scene switching;
the transition point segmentation module is used for segmenting the original video data at the transition point to obtain a plurality of fragments.
In one embodiment of the present invention, the original subtitle data detecting module includes:
the optical character recognition module is used for executing optical character recognition on the image data aiming at each frame of image data in the fragment to obtain original text information;
The text information merging module is used for merging the original text information positioned in the same line to obtain target text information;
and the original subtitle data determining module is used for determining that the target text information is original subtitle data belonging to the same sentence if the target text information is positioned in a preset area.
In one embodiment of the present invention, the original subtitle data detecting module further includes:
a first quantity counting module for counting, for each of the original text information, a first quantity of the image data in which the original text information appears in the segment;
the original text information retaining module is used for retaining the original text information if the first number is greater than or equal to a preset frame number threshold;
and the original text information filtering module is used for filtering the original text information if the first number is smaller than a preset frame number threshold value.
In one embodiment of the present invention, the original subtitle data merging module includes:
the original subtitle data traversing module is used for traversing the original subtitle data corresponding to each fragment in sequence;
the similarity calculation module is used for calculating the similarity of the original subtitle data corresponding to the two adjacent fragments in the traversing process;
A paragraph determining module, configured to determine that the original subtitle data corresponding to two adjacent segments belong to the same paragraph if the similarity is greater than or equal to a preset similarity threshold;
and the paragraph merging module is used for merging the original caption data belonging to the same paragraph if the traversal is finished to obtain target caption data.
In one embodiment of the present invention, the similarity calculation module includes:
a sequence searching module, configured to search a sequence that is longest and continuously matched in the original subtitle data corresponding to two adjacent segments, where the sequence does not include blank elements;
a second number statistics module for counting a second number of elements contained in the sequence;
a third number statistics module, configured to count a third number of all elements included in the original subtitle data corresponding to two adjacent segments;
the ratio calculating module is used for calculating the ratio of the second quantity to the third quantity;
and the ratio multiplying module is used for multiplying the ratio by a preset coefficient to obtain the similarity between the original subtitle data corresponding to the two adjacent fragments, wherein the coefficient is larger than 1.
In one embodiment of the present invention, the clipping point adjustment module 404 includes:
the relation judging module is used for judging whether only complete target subtitle data is included between the clip points; if so, the maintaining module is called, and if not, the moving module is called;
a maintaining module for maintaining the clip points;
and the moving module is used for moving the clip points forward and/or backward and then calling the relation judging module again.
In one embodiment of the present invention, further comprising:
an ending segment deleting module, configured to delete an ending segment at the end of the original video data, where the ending segment has an identifier for downloading the service object;
and the clipping time length increasing module is used for moving the clipping point backwards according to a preset step length if the ending fragment is deleted so as to increase the clipping time length.
The video editing device provided by the embodiment of the invention can execute the video editing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the video editing method.
Example III
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the video editing method.
In some embodiments, the video editing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by the processor 11, one or more steps of the video editing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the video editing method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems On Chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
Example IV
Embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements a video editing method as provided by any of the embodiments of the present invention.
In implementations of the computer program product, the computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, or C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It should be appreciated that in the various flows shown above, steps may be reordered, added, or deleted. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (12)

1. A video editing method, comprising:
acquiring original video data for popularizing a business object;
marking a clip point in the original video data when a clipping operation is received;
detecting target subtitle data belonging to the same paragraph in the original video data;
adjusting the clip point to preserve the target subtitle data;
clipping target video data from the original video data according to the adjusted clip point;
wherein the end of the original video data is provided with an ending segment, and the ending segment is provided with an identifier for downloading the business object;
wherein marking a clip point in the original video data when a clipping operation is received comprises:
when a clipping operation is received, determining the type of the clipping operation;
if the type is clip-beginning, marking the start point of the original video data as a clip point, and marking a further clip point after the start point;
if the type is clip-end, marking the end point of the original video data as a clip point on the condition that the ending segment has been deleted from the original video data, and marking a further clip point before the end point;
if the type is clip-middle, marking clip points in the original video data based on the pictures of the original video data, on the condition that the ending segment has been deleted from the original video data;
and if the type is custom clip, reading a time point from the clipping operation and marking the position at that time point in the original video data as a clip point.
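By way of illustration only, a minimal Python sketch of this type-based dispatch; the operation field names, the fixed offset, and the helper mark_by_picture are assumptions for the example, not taken from the specification:

def mark_by_picture(op, duration):
    # Hypothetical helper: would invoke the picture-based scene detection
    # of claim 3; stubbed here so the sketch stays self-contained.
    raise NotImplementedError

def mark_clip_points(op, duration, offset=3.0):
    """Return (start, end) clip points in seconds for one clipping operation.
    op is a dict such as {"type": "begin"} or {"type": "custom", "time": 12.5}."""
    if op["type"] == "begin":                  # clip the beginning
        return 0.0, offset                     # start point, plus a point after it
    if op["type"] == "end":                    # clip the end; the ending segment
        return duration - offset, duration     # is assumed already deleted
    if op["type"] == "middle":                 # picture-based marking, cf. claim 3
        return mark_by_picture(op, duration)
    if op["type"] == "custom":                 # user-supplied time point
        return op["time"], min(op["time"] + offset, duration)
    raise ValueError("unknown clip type: %s" % op["type"])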
2. The method according to claim 1, wherein detecting target subtitle data belonging to the same paragraph in the original video data comprises:
dividing the original video data into a plurality of segments using scenes as division nodes;
detecting original subtitle data belonging to the same sentence in each segment;
and comparing each piece of original subtitle data to merge the original subtitle data into target subtitle data belonging to the same paragraph.
3. The method according to claim 2, wherein dividing the original video data into a plurality of segments using scenes as division nodes comprises:
for each frame of image data in the original video data, converting the image data from a first color space representing red, green, and blue into a second color space representing hue, saturation, and brightness;
calculating the differences between two adjacent frames of image data over all color channels of the second color space;
calculating an average value of the differences;
if the average value is greater than a preset transition threshold, determining that a transition point lies between the two adjacent frames of image data, the transition point representing a scene switch;
and cutting the original video data at each transition point to obtain the plurality of segments.
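A minimal sketch of this transition detection, assuming OpenCV (cv2) for the color space conversion and decoded BGR frames; the threshold value is only a placeholder to be tuned per material:

import cv2

def find_transition_points(frames, threshold=30.0):
    """Indices i such that a transition point lies between frame i-1 and i."""
    cuts, prev_hsv = [], None
    for i, frame in enumerate(frames):
        # first color space (red/green/blue) -> second (hue/saturation/value)
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        if prev_hsv is not None:
            diff = cv2.absdiff(hsv, prev_hsv)    # differences over all channels
            if diff.mean() > threshold:          # average vs. transition threshold
                cuts.append(i)                   # scene switch between i-1 and i
        prev_hsv = hsv
    return cuts

Cutting the frame sequence at the returned indices then yields the claimed segments.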
4. The method according to claim 2, wherein detecting original subtitle data belonging to the same sentence in each segment comprises:
for each frame of image data in the segment, performing optical character recognition on the image data to obtain original text information;
combining pieces of original text information located on the same line to obtain target text information;
and if the target text information is located within a preset area, determining that the target text information is original subtitle data belonging to the same sentence.
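A sketch of the line merging and region test, assuming the OCR engine returns positioned boxes as (text, x, y, w, h) tuples; the line tolerance and region bounds are illustrative assumptions:

def extract_subtitle_lines(boxes, region_top, region_bottom, line_tol=10):
    """Merge OCR boxes on the same text line, keep lines in the preset area."""
    boxes = sorted(boxes, key=lambda b: (b[2], b[1]))    # sort by y, then x
    lines = []                                           # [(merged_text, y), ...]
    for text, x, y, w, h in boxes:
        if lines and abs(y - lines[-1][1]) <= line_tol:  # same line: near-equal y
            lines[-1] = (lines[-1][0] + text, lines[-1][1])
        else:
            lines.append((text, y))
    # keep only target text information inside the preset (subtitle) area
    return [t for t, y in lines if region_top <= y <= region_bottom]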
5. The method of claim 4, wherein detecting original subtitle data belonging to the same sentence in the segment further comprises:
for each piece of original text information, counting a first number of frames of image data in the segment in which the original text information appears;
if the first number is greater than or equal to a preset frame-count threshold, retaining the original text information;
and if the first number is smaller than the preset frame-count threshold, filtering out the original text information.
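A short sketch of this frequency filter; per_frame_lines holds the recognized text lines of each frame in the segment, and the threshold of 5 frames is an assumption:

from collections import Counter

def filter_flicker(per_frame_lines, frame_threshold=5):
    """Keep text seen in at least frame_threshold frames; drop the rest."""
    counts = Counter(t for frame in per_frame_lines for t in set(frame))
    return {text for text, n in counts.items() if n >= frame_threshold}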
6. The method of claim 2, wherein comparing each piece of original subtitle data to merge the original subtitle data into target subtitle data belonging to the same paragraph comprises:
traversing the original subtitle data corresponding to each segment in sequence;
during the traversal, calculating the similarity between the original subtitle data corresponding to two adjacent segments;
if the similarity is greater than or equal to a preset similarity threshold, determining that the original subtitle data corresponding to the two adjacent segments belong to the same paragraph;
and when the traversal is finished, merging the original subtitle data belonging to the same paragraph to obtain the target subtitle data.
7. The method of claim 6, wherein calculating the similarity between the original subtitle data corresponding to the two adjacent segments comprises:
searching for the longest continuous matching sequence between the original subtitle data corresponding to the two adjacent segments, the sequence containing no blank elements;
counting a second number of elements contained in the sequence;
counting a third number of all elements contained in the original subtitle data corresponding to the two adjacent segments;
calculating the ratio of the second number to the third number;
and multiplying the ratio by a preset coefficient greater than 1 to obtain the similarity between the original subtitle data corresponding to the two adjacent segments.
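A sketch of claims 6 and 7 together, using Python's difflib to find the longest continuous match. Stripping spaces beforehand stands in for "no blank elements"; the coefficient of 2 (which makes the score resemble difflib's own ratio, except that only the longest block counts) and the 0.6 threshold are assumptions:

from difflib import SequenceMatcher

def subtitle_similarity(a, b, coefficient=2.0):
    """coefficient * (longest continuous match) / (total element count)."""
    a, b = a.replace(" ", ""), b.replace(" ", "")    # exclude blank elements
    if not a or not b:
        return 0.0
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return coefficient * m.size / (len(a) + len(b))  # second / third number

def merge_paragraphs(segment_subtitles, threshold=0.6):
    """Traverse segment subtitles in order; adjacent, similar ones belong to
    one paragraph, so only one copy of the repeated sentence is kept."""
    paragraphs = []
    for sub in segment_subtitles:
        if paragraphs and subtitle_similarity(paragraphs[-1], sub) >= threshold:
            continue                      # same paragraph as the previous segment
        paragraphs.append(sub)
    return paragraphs

A real merge might instead keep the longer of the two matched strings; the sketch keeps the first occurrence for simplicity.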
8. The method of claim 1, wherein said adjusting the clip point to preserve the target subtitle data comprises:
judging whether any complete piece of target subtitle data is contained between the clip points;
if so, keeping the clip points unchanged;
and if not, moving the clip points forwards and/or backwards and returning to the step of judging whether any complete piece of target subtitle data is contained between the clip points.
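A sketch of this adjustment loop; paragraphs are assumed to be (start, end) times in seconds, and the step size is arbitrary:

def adjust_clip_points(start, end, paragraphs, duration, step=0.5):
    """Widen (start, end) until one complete subtitle paragraph fits inside."""
    def holds_complete(s, e):
        return any(s <= ps and pe <= e for ps, pe in paragraphs)
    while not holds_complete(start, end):
        start = max(0.0, start - step)            # move a clip point forwards
        end = min(duration, end + step)           # and/or backwards
        if start == 0.0 and end == duration:      # whole video covered: stop
            break
    return start, end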
9. The method according to any one of claims 1-8, further comprising:
deleting an ending segment at the end of the original video data, wherein the ending segment is provided with an identifier for downloading the business object;
and if the ending segment is deleted, moving the clip point backwards by a preset step length so as to increase the clip duration.
10. A video editing apparatus, comprising:
an original video data acquisition module, configured to acquire original video data for popularizing a business object;
a clip point marking module, configured to mark a clip point in the original video data when a clipping operation is received;
a target subtitle data detection module, configured to detect target subtitle data belonging to the same paragraph in the original video data;
a clip point adjustment module, configured to adjust the clip point so as to preserve the target subtitle data;
a target video data clipping module, configured to clip target video data from the original video data according to the adjusted clip point;
wherein the end of the original video data is provided with an ending segment, and the ending segment is provided with an identifier for downloading the business object;
wherein the clip point marking module comprises:
a clip type determination module, configured to determine the type of a clipping operation when the clipping operation is received;
a beginning marking module, configured to, if the type is clip-beginning, mark the start point of the original video data as a clip point and mark a further clip point after the start point;
an end marking module, configured to, if the type is clip-end, mark the end point of the original video data as a clip point on the condition that the ending segment has been deleted from the original video data, and mark a further clip point before the end point;
a middle marking module, configured to, if the type is clip-middle, mark clip points in the original video data based on the pictures of the original video data, on the condition that the ending segment has been deleted from the original video data;
and a custom marking module, configured to, if the type is custom clip, read a time point from the clipping operation and mark the position at that time point in the original video data as a clip point.
11. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the video editing method of any of claims 1-9.
12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed, causes a processor to implement the video editing method of any of claims 1-9.
CN202211083089.3A 2022-09-06 2022-09-06 Video editing method, device, equipment and storage medium Active CN115460455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211083089.3A CN115460455B (en) 2022-09-06 2022-09-06 Video editing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115460455A CN115460455A (en) 2022-12-09
CN115460455B true CN115460455B (en) 2024-02-09

Family

ID=84302265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211083089.3A Active CN115460455B (en) 2022-09-06 2022-09-06 Video editing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115460455B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074583A (en) * 2023-02-09 2023-05-05 武汉简视科技有限公司 Method and system for correcting subtitle file time axis according to video clip time point


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8539355B2 (en) * 2005-04-15 2013-09-17 Autodesk, Inc. Audio/video editing node processing and manipulation
US7409248B2 (en) * 2005-04-15 2008-08-05 Autodesk Canada Co. Layer based paint operations
US20150098018A1 (en) * 2013-10-04 2015-04-09 National Public Radio Techniques for live-writing and editing closed captions
US11432049B2 (en) * 2019-08-29 2022-08-30 Snap Inc. Subtitle splitter

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63149759A (en) * 1986-12-15 1988-06-22 Hitachi Ltd Document editing device
JPH10200855A (en) * 1996-11-15 1998-07-31 Hitachi Denshi Ltd Method and device for image editing, and medium where program for making computer perform image editing process operation is recorded
GB201117011D0 (en) * 2011-10-04 2011-11-16 Thomas David J Video editing method and apparatus
CN105340014A (en) * 2013-05-31 2016-02-17 微软技术许可有限责任公司 Touch optimized design for video editing
CN105657537A (en) * 2015-12-23 2016-06-08 小米科技有限责任公司 Video editing method and device
CN108259965A (en) * 2018-03-31 2018-07-06 湖南广播电视台广播传媒中心 A kind of video clipping method and editing system
WO2020187086A1 (en) * 2019-03-21 2020-09-24 腾讯科技(深圳)有限公司 Video editing method and apparatus, device, and storage medium
CN111666446A (en) * 2020-05-26 2020-09-15 珠海九松科技有限公司 Method and system for judging AI automatic editing video material
CN111629252A (en) * 2020-06-10 2020-09-04 北京字节跳动网络技术有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN113364999A (en) * 2021-05-31 2021-09-07 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN113613065A (en) * 2021-08-02 2021-11-05 北京百度网讯科技有限公司 Video editing method and device, electronic equipment and storage medium
CN113676671A (en) * 2021-09-27 2021-11-19 北京达佳互联信息技术有限公司 Video editing method and device, electronic equipment and storage medium
CN113923479A (en) * 2021-11-12 2022-01-11 北京百度网讯科技有限公司 Audio and video editing method and device
CN114339399A (en) * 2021-12-27 2022-04-12 咪咕文化科技有限公司 Multimedia file editing method and device and computing equipment
CN114374874A (en) * 2022-01-17 2022-04-19 郭斌 System for efficiently and accurately editing video
CN114666637A (en) * 2022-03-10 2022-06-24 阿里巴巴(中国)有限公司 Video editing method, audio editing method and electronic equipment
CN114866805A (en) * 2022-04-25 2022-08-05 阿里巴巴(中国)有限公司 Video processing method, device and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
System for producing subtitles to internet audio-visual documents; Jan Nouza; 2015 38th International Conference on Telecommunications and Signal Processing; full text *
Research on segmentation and merging techniques for digital audio and video streams; Weng Chao; China Masters' Theses Full-text Database; full text *
Post-processing techniques and art in television editing; Wang Xiaohua; Video Engineering (《电视技术》), Vol. 46, No. 6; full text *
Analysis of shooting and editing characteristics of short teaching videos for brass instruments; Zhu Jian; Video Engineering (《电视技术》), Vol. 45, No. 10; full text *

Also Published As

Publication number Publication date
CN115460455A (en) 2022-12-09

Similar Documents

Publication Publication Date Title
US11615131B2 (en) Method and system for storytelling on a computing device via social media
US11321385B2 (en) Visualization of image themes based on image content
CN105849685B (en) editing options for image regions
US20170285922A1 (en) Systems and methods for creation and sharing of selectively animated digital photos
US11900683B2 (en) Setting ad breakpoints in a video within a messaging system
US11792491B2 (en) Inserting ads into a video within a messaging system
US11856255B2 (en) Selecting ads for a video within a messaging system
US20150130816A1 (en) Computer-implemented methods and systems for creating multimedia animation presentations
CN115460455B (en) Video editing method, device, equipment and storage medium
US9117275B2 (en) Content processing device, integrated circuit, method, and program
CN115661302A (en) Video editing method, device, equipment and storage medium
US20170061642A1 (en) Information processing apparatus, information processing method, and non-transitory computer readable medium
US20110304779A1 (en) Electronic Apparatus and Image Processing Method
CN115988259A (en) Video processing method, device, terminal, medium and program product
CN115661303A (en) Video processing method, device, equipment and storage medium
CN115550566A (en) Video editing method, device, equipment and storage medium
CN115695948A (en) Method, device and equipment for setting video cover and storage medium
CN115410008A (en) Video editing method, device, equipment and storage medium
CN115829828A (en) Method, equipment and storage medium for training and reconstructing game image reconstruction network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant