CN105530554B - Video abstract generation method and device

Info

Publication number: CN105530554B (granted; earlier publication CN105530554A)
Application number: CN201410570690.4A
Authority: CN (China)
Legal status: Active
Inventors: 董振江, 邓硕, 田玉敏, 唐铭谦, 冯艳
Assignee (original and current): Nanjing ZTE New Software Co Ltd
Priority applications: CN201410570690.4A; PCT/CN2014/094701 (published as WO2015184768A1)
Original language: Chinese (zh)

Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04N: Pictorial communication, e.g. television
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; content per se
    • H04N 21/85: Assembly of content; generation of multimedia applications
    • H04N 21/854: Content authoring
    • H04N 21/8549: Creating video summaries, e.g. movie trailer

Abstract

The invention provides a video summary generation method and device. The method comprises: dividing an original video into a plurality of fields of view; assigning each object track contained in the original video to the field of view to which it is closest, according to the proximity of the track to each field of view; computing an activity index for each field of view from the activity levels of the object tracks within it, and classifying each field of view as important or secondary according to whether its activity index exceeds a preset threshold; and processing the object tracks in the important and secondary fields of view in parallel, then merging the processed fields of view to generate the video summary. Because the object tracks in the important and secondary fields of view are processed in parallel, the method reduces the amount of computation required for track combination, speeds up the computation, and lets the user focus more simply and clearly on the main targets in the important fields of view.

Description

Video abstract generation method and device
Technical Field
The present invention relates to the field of image recognition, and in particular to a video summary generation method and device.
Background
Video summarization, also called video synopsis or video condensation, is a condensed representation of video content. Moving objects are extracted automatically or semi-automatically through moving-object analysis, the motion track of each object is analyzed, and the different objects are then spliced into a common background scene and combined in a certain way. With the development of video technology, video summarization plays an increasingly important role in video analysis and content-based video retrieval.
In the field of social public safety, video surveillance systems have become an important component of maintaining social order and strengthening social management. However, video recordings involve very large amounts of stored data and long storage times, and the traditional approach of searching recordings for clues and evidence consumes large amounts of manpower, material resources and time; its efficiency is extremely low, and the best opportunity to solve a case can be missed.
For the prior-art problem that an optimal summary video cannot be found quickly in large-scale video data, no effective solution has yet been proposed.
Disclosure of Invention
In order to overcome the defects of the prior art, the embodiments of the present invention provide a video summary generation method and device.
To solve the technical problem, the embodiments of the present invention adopt the following technical solutions:
According to one aspect of the embodiments of the present invention, a video summary generation method is provided, comprising: dividing an original video into a plurality of fields of view; assigning each object track contained in the original video to the field of view to which it is closest, according to the proximity of the track to each field of view; computing an activity index for each field of view from the activity levels of the object tracks within it, and classifying each field of view as important or secondary according to whether its activity index exceeds a preset threshold; and processing the object tracks in the important and secondary fields of view in parallel, then merging the processed fields of view to generate the video summary.
Wherein the dividing of the original video into a plurality of fields of view comprises: determining the direction of the scene in the original video; and dividing the original video into a plurality of fields of view according to the direction of the scene, wherein the directions of the fields of view are consistent with the direction of the scene.
Wherein the determining of the direction of the scene in the original video comprises: acquiring the start and end points of a plurality of object tracks in the scene of the original video; calculating the coordinate difference between the start and end points of each object track to determine the direction of the track; and judging the direction of the scene in the original video from the direction of the majority of the object tracks, wherein the direction of the scene is consistent with the direction taken by most of the tracks.
Wherein the assigning of each object track contained in the original video to the field of view to which it is closest, according to the proximity of the track to each field of view, comprises: acquiring a line-segment feature for each field of view, the line-segment feature comprising the coordinates of the start and end points of the field of view and the number of object tracks it contains; acquiring the coordinates of the start and end points of an object track and calculating the proximity of the track to each field of view; assigning each object track contained in the original video to the field of view to which it is closest, according to that proximity; and updating the line-segment feature of that closest field of view with the coordinates of the track's start and end points.
Wherein the computing of the activity index of each field of view from the activity levels of the object tracks within it, and the classifying of each field of view as important or secondary according to whether the activity index exceeds a preset threshold, comprise: the activity level being positively correlated with the object area corresponding to the track and with the track's duration, summing the activity levels of all object tracks in a field of view to obtain the activity index of that field of view; and classifying each field of view as important or secondary according to whether its activity index exceeds the preset threshold.
Optionally, the processing of the object tracks in the important and secondary fields of view in parallel and the merging of the processed fields of view to generate the video summary comprise: if the plurality of fields of view are all important, solving for the optimal object-track combination of each field of view with a first preset function, thereby determining the track combination corresponding to the optimal solution; and generating the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the processing of the object tracks in the important and secondary fields of view in parallel and the merging of the processed fields of view to generate the video summary comprise: if the plurality of fields of view are all secondary, solving for the optimal object-track combination of each field of view with a second preset function, thereby determining the track combination corresponding to the optimal solution; and generating the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the processing of the object tracks in the important and secondary fields of view in parallel and the merging of the processed fields of view to generate the video summary comprise: if the plurality of fields of view include both important and secondary fields of view: when two important fields of view are adjacent, merging them into one important field of view and solving for the optimal object-track combination of the merged field of view with a first preset function; when the important fields of view are not adjacent to each other, solving for the optimal object-track combination of each important field of view separately with the first preset function, thereby determining the track combination corresponding to each optimal solution; solving for the optimal object-track combination of each secondary field of view with a second preset function, thereby determining the track combination corresponding to each optimal solution; and generating the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the processing of the object tracks in the important and secondary fields of view in parallel and the merging of the processed fields of view to generate the video summary comprise: if the plurality of fields of view include both important and secondary fields of view: when two important fields of view are adjacent, merging them into one important field of view and solving for the optimal object-track combination of the merged field of view with a first preset function; when the important fields of view are not adjacent to each other, solving for the optimal object-track combination of each important field of view separately with the first preset function, thereby determining the track combination corresponding to each optimal solution; copying the object tracks in the secondary fields of view into the background image as they appear in the original video; and merging all the fields of view according to these processing results to generate the video summary.
According to another aspect of the embodiments of the present invention, a video summary generation apparatus is also provided, comprising: a first dividing module, configured to divide an original video into a plurality of fields of view; a classification module, configured to assign each object track contained in the original video to the field of view to which it is closest, according to the proximity of the track to each field of view; a second dividing module, configured to compute an activity index for each field of view from the activity levels of the object tracks within it, and to classify each field of view as important or secondary according to whether the activity index exceeds a preset threshold; and a merge processing module, configured to process the object tracks in the important and secondary fields of view in parallel and to merge the processed fields of view to generate the video summary.
Wherein the first dividing module comprises: a first calculation unit, configured to determine the direction of the scene in the original video; and a first dividing unit, configured to divide the original video into a plurality of fields of view according to the direction of the scene, the directions of the fields of view being consistent with the direction of the scene.
Wherein the first calculation unit comprises: a first acquisition unit, configured to acquire the start and end points of a plurality of object tracks in the scene of the original video; a difference calculation unit, configured to calculate the coordinate difference between the start and end points of each object track and determine the direction of the track; and a judging unit, configured to judge the direction of the scene in the original video from the direction of the majority of the object tracks, the direction of the scene being consistent with the direction taken by most of the tracks.
Wherein the classification module comprises: a second acquisition unit, configured to acquire a line-segment feature for each field of view, the line-segment feature comprising the coordinates of the start and end points of the field of view and the number of object tracks it contains; a distance calculation unit, configured to acquire the coordinates of the start and end points of an object track and calculate the proximity of the track to each field of view; a first classification unit, configured to assign each object track contained in the original video to the field of view to which it is closest, according to that proximity; and an updating unit, configured to update the line-segment feature of that closest field of view with the coordinates of the track's start and end points.
Wherein the second dividing module comprises: an activity index calculation unit, in which the activity level of a track is positively correlated with the object area corresponding to the track and with the track's duration, and the activity index of a field of view is obtained by summing the activity levels of all object tracks within it; and a second dividing unit, configured to classify each field of view as important or secondary according to whether its activity index exceeds a preset threshold.
Optionally, the merge processing module comprises: a first merging unit, configured to solve, if the plurality of fields of view are all important, for the optimal object-track combination of each field of view with a first preset function, thereby determining the track combination corresponding to the optimal solution; and a first processing unit, configured to generate the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the merge processing module comprises: a second merging unit, configured to solve, if the plurality of fields of view are all secondary, for the optimal object-track combination of each field of view with a second preset function, thereby determining the track combination corresponding to the optimal solution; and a second processing unit, configured to generate the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the merge processing module comprises: a third merging unit, configured, if the plurality of fields of view include both important and secondary fields of view, to merge two adjacent important fields of view into one important field of view and solve for the optimal object-track combination of the merged field of view with a first preset function; to solve, if the important fields of view are not adjacent to each other, for the optimal object-track combination of each important field of view separately with the first preset function, thereby determining the track combination corresponding to each optimal solution; and to solve for the optimal object-track combination of each secondary field of view with a second preset function, thereby determining the track combination corresponding to each optimal solution; and a third processing unit, configured to generate the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the merge processing module comprises: a fourth merging unit, configured, if the plurality of fields of view include both important and secondary fields of view, to merge two adjacent important fields of view into one important field of view and solve for the optimal object-track combination of the merged field of view with a first preset function; to solve, if the important fields of view are not adjacent to each other, for the optimal object-track combination of each important field of view separately with the first preset function, thereby determining the track combination corresponding to each optimal solution; and to copy the object tracks in the secondary fields of view into the background image as they appear in the original video; and a fourth processing unit, configured to merge all the fields of view according to these processing results to generate the video summary.
The embodiments of the present invention have the following beneficial effects: in the video summary generation method of the embodiments, processing the object tracks in the important and secondary fields of view in parallel reduces the amount of computation required for track combination, speeds up the computation, and lets the user focus more simply and clearly on the main targets in the important fields of view.
Drawings
FIG. 1 is a flow chart illustrating the basic steps of a video summary generation method according to an embodiment of the present invention;
FIG. 2 is a first application diagram of a video summary generation method according to an embodiment of the present invention;
FIG. 3 is a second application diagram of a video summary generation method according to an embodiment of the present invention;
FIG. 4 is a third application diagram of a video summary generation method according to an embodiment of the present invention;
FIG. 5 is a fourth application diagram of a video summary generation method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a video summary generation apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
Example one
FIG. 1 and FIG. 2 are schematic diagrams of an embodiment of the present invention. As shown in FIG. 1, an embodiment of the present invention provides a video summary generation method, comprising:
Step 101: dividing an original video into a plurality of fields of view;
Step 102: assigning each object track contained in the original video to the field of view to which it is closest, according to the proximity of the track to each field of view;
Step 103: computing an activity index for each field of view from the activity levels of the object tracks within it, and classifying each field of view as important or secondary according to whether the activity index exceeds a preset threshold;
Step 104: processing the object tracks in the important and secondary fields of view in parallel, and merging the processed fields of view to generate the video summary.
In this video summary generation method, the object tracks in the important and secondary fields of view are processed in parallel, which reduces the amount of computation required for track combination, speeds up the computation, and lets the user focus more simply and clearly on the main targets in the important fields of view.
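The four steps compose into a compact pipeline. The sketch below is a minimal, self-contained Python illustration of that composition; the band-based view split, the midpoint-based assignment, the area-times-duration activity level and all function names are assumptions of this sketch, not the patent's prescribed implementation, and the per-view processing is stubbed out.

```python
from concurrent.futures import ThreadPoolExecutor

def process_view(view):
    # Placeholder for the per-view track combination (first or second
    # preset function, depending on view["important"]).
    return view

def merge_views(views):
    # Placeholder for stitching the processed views into the summary.
    return views

def generate_summary(tracks, k, threshold):
    # Step 101: divide the scene into k fields of view (here: k vertical
    # bands over x-coordinates normalized to [0, 1)).
    views = [{"tracks": []} for _ in range(k)]
    # Step 102: assign each track to the closest field of view (here: by
    # the x-coordinate of its midpoint; the patent uses a line-segment
    # distance instead).
    for t in tracks:
        mid_x = (t["start"][0] + t["end"][0]) / 2
        views[min(int(mid_x * k), k - 1)]["tracks"].append(t)
    # Step 103: activity index = sum of (area x duration) over the view's
    # tracks; views above the threshold are "important".
    for v in views:
        v["activity"] = sum(t["area"] * t["duration"] for t in v["tracks"])
        v["important"] = v["activity"] > threshold
    # Step 104: process important and secondary views in parallel, merge.
    with ThreadPoolExecutor() as pool:
        processed = list(pool.map(process_view, views))
    return merge_views(processed)
```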
Further, step 101 in the above embodiment of the present invention specifically comprises:
determining the direction of the scene in the original video;
and dividing the original video into a plurality of fields of view according to the direction of the scene, wherein the directions of the fields of view are consistent with the direction of the scene.
That is, the original video can be divided into k fields of view according to actual requirements, where k is a positive integer.
In the above embodiment, the direction of the scene in the original video may be determined as follows:
First, the start and end points of a plurality of object tracks in the scene of the original video are acquired.
The plurality of tracks may be all of the tracks in the scene or only a part of them; for example, if the scene contains 100 object tracks, the scene direction may be computed from 20 of them or from all 100.
Then, the coordinate difference between the start and end points of each object track is calculated to determine the direction of the track.
Specifically, if the absolute difference of the vertical coordinates of the start and end points is greater than the absolute difference of the horizontal coordinates, the direction of the track is judged to be vertical; if it is smaller, the direction of the track is judged to be horizontal.
Finally, the direction of the scene in the original video is judged from the direction of the majority of the object tracks: the direction of the scene is consistent with the direction taken by most of the tracks.
That is, if most of the object tracks run horizontally or vertically, the scene direction is horizontal or vertical, respectively.
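A direct transcription of this rule (the coordinate-difference test plus the majority vote) might look as follows; breaking ties toward "horizontal" is an assumption of this sketch, since the text leaves the equality case open.

```python
def track_direction(start, end):
    # Vertical if the absolute y-difference between the start and end
    # points exceeds the absolute x-difference, horizontal otherwise
    # (the equality case is not specified in the text).
    dx = abs(end[0] - start[0])
    dy = abs(end[1] - start[1])
    return "vertical" if dy > dx else "horizontal"

def scene_direction(tracks):
    # tracks: (start_point, end_point) pairs; per the text, a subset of
    # the scene's tracks (e.g. 20 of 100) is enough.
    votes = [track_direction(s, e) for s, e in tracks]
    return max(("horizontal", "vertical"), key=votes.count)

# Two mostly horizontal tracks and one vertical one -> "horizontal".
print(scene_direction([((0, 0), (9, 1)), ((1, 2), (8, 3)), ((4, 0), (5, 7))]))
```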
Specifically, step 102 in the above embodiment of the present invention comprises:
acquiring a line-segment feature for each field of view, the line-segment feature comprising the coordinates of the start and end points of the field of view and the number of object tracks contained in it (the feature includes, but is not limited to, these quantities);
acquiring the coordinates of the start and end points of an object track and calculating the proximity of the track to each field of view, where the proximity may be calculated according to a distance calculation formula;
and assigning each object track contained in the original video to the field of view to which it is closest, according to that proximity.
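The distance formula itself is not fixed by the text; the sketch below uses the summed Euclidean distance between corresponding endpoints of the track and of each view's line-segment feature, one simple choice consistent with the quantities defined above (an assumption of this sketch).

```python
import math

def proximity(track, view):
    # Summed endpoint distance between the track's (start, end) points and
    # the view's line-segment feature; one possible "distance calculation
    # formula", not necessarily the patent's.
    (ts, te), (vs, ve) = track, view
    return math.dist(ts, vs) + math.dist(te, ve)

def closest_view_index(track, views):
    # Index of the field of view this track should be assigned to.
    return min(range(len(views)), key=lambda i: proximity(track, views[i]))
```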
In the embodiment of the present invention, preferably, after each object track has been added to a field of view, the line-segment feature of that closest field of view may also be updated according to the coordinates of the track's start and end points. The update includes n_k = n_k + 1, where n_k is the number of object tracks contained in the field of view before the track joins and n_k + 1 is the number after it joins. The start and end coordinates of the field of view, (x_s^k, y_s^k) and (x_e^k, y_e^k), are likewise updated from the abscissa and ordinate of the track's start point, x'_s and y'_s, and of its end point, x'_e and y'_e. The update formulas themselves appear only as images in the published text; a running average over the tracks joined so far, for example x_s^k = (n_k * x_s^k + x'_s) / (n_k + 1) applied to each of the four coordinates, is consistent with these definitions. In the embodiment of the present invention, the initial start and end points of a field of view may be taken as the start and end points of the first object track added to it.
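Since the published update formulas survive only as images, the sketch below implements the running-average reading described above; treat the averaging itself as an assumption, while the count update n_k = n_k + 1 and the seeding from the first track are stated in the text.

```python
def update_view(view, track):
    # view:  {"start": (x, y), "end": (x, y), "n": track count}
    # track: ((x'_s, y'_s), (x'_e, y'_e))
    (vxs, vys), (vxe, vye), n = view["start"], view["end"], view["n"]
    (txs, tys), (txe, tye) = track
    # Assumed running average of the start/end coordinates:
    view["start"] = ((n * vxs + txs) / (n + 1), (n * vys + tys) / (n + 1))
    view["end"] = ((n * vxe + txe) / (n + 1), (n * vye + tye) / (n + 1))
    view["n"] = n + 1  # n_k = n_k + 1 once the track has joined the view

# A view is seeded with the first track's start and end points:
view = {"start": (0.0, 0.0), "end": (10.0, 0.0), "n": 1}
update_view(view, ((2.0, 1.0), (12.0, 1.0)))
print(view)  # {'start': (1.0, 0.5), 'end': (11.0, 0.5), 'n': 2}
```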
Specifically, step 103 in the above embodiment of the present invention comprises:
the activity level of an object track being positively correlated with the object area corresponding to the track and with the track's duration, computing the activity index of a field of view by summing the activity levels of all object tracks within it;
wherein the object area of a track can be calculated from the height and width of the object itself;
and classifying each field of view as important or secondary according to whether its activity index exceeds a preset threshold.
To illustrate the division into important and secondary fields of view: suppose that in an actual scene the original video is divided into 3 fields of view. The activity indexes of the 3 fields of view are computed separately and compared with the preset threshold. Any field of view whose activity index is greater than the threshold is classified as important; if even the largest of the activity indexes is smaller than the threshold, all 3 fields of view are secondary.
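In code, with the activity level of a track taken as object area times duration (the product is one simple positively correlated combination, an assumption of this sketch):

```python
def activity_index(view):
    # Sum of per-track activity levels; "area x duration" is one simple
    # positively correlated choice.
    return sum(t["area"] * t["duration"] for t in view["tracks"])

def classify_views(views, threshold):
    # Split the fields of view on the preset threshold.
    important, secondary = [], []
    for v in views:
        (important if activity_index(v) > threshold else secondary).append(v)
    return important, secondary
```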
Specifically, step 104 in the above embodiment of the present invention comprises:
if the plurality of fields of view are all important, solving for the optimal object-track combination of each field of view with a first preset function, thereby determining the track combination corresponding to the optimal solution;
and generating the video summary from the optimal object-track combinations of all the fields of view.
As a preferred embodiment, the following first and second preset functions are provided to further illustrate the present invention. In the embodiment of the present invention, the first preset function uses a complex transfer mapping energy function to solve for the optimal object-track combination of each field of view, which may be computed with the following formula:
E(MAP) = E_a(BO) + α*E_tps(BO) + β*E_ntps(BO) + γ*E_tc(BO) + λ*E_tct(BO)
where E(MAP) is the complex transfer mapping energy function; BO is the set of object tracks within the important field of view; E_a(BO) is the activity energy cost, a penalty incurred if a target does not appear in the summary video; E_tps(BO) is the relative positive-order cost, a penalty incurred if a target is added to the summary video out of its original order; E_ntps(BO) is the relative reverse-order cost, a penalty incurred when two temporally related objects are added to the summary video in reverse order; E_tc(BO) is the pseudo-collision cost, a penalty incurred when the tracks of two objects that do not collide in the original video collide in the summary result; E_tct(BO) is the true-collision cost, under which a collision in the summary result between two objects that also collide in the original video incurs no penalty, E_tct(BO) taking a negative value; and α, β, γ, λ are preset weighting coefficients whose specific values can be chosen according to the needs of the actual scene.
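The five cost terms are characterized above only in prose, so the sketch below keeps them as caller-supplied functions and shows only the weighted sum and how a combination would be selected; everything beyond the formula itself is an assumption of this sketch.

```python
def complex_energy(BO, E_a, E_tps, E_ntps, E_tc, E_tct,
                   alpha, beta, gamma, lam):
    # Weighted sum from the formula above; E_tct contributes a negative
    # (rewarding) value for collisions preserved from the original video.
    return (E_a(BO)
            + alpha * E_tps(BO)
            + beta * E_ntps(BO)
            + gamma * E_tc(BO)
            + lam * E_tct(BO))

def best_combination(candidates, **terms_and_weights):
    # The optimal object-track combination is the candidate set BO that
    # minimizes the energy.
    return min(candidates,
               key=lambda BO: complex_energy(BO, **terms_and_weights))
```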
FIG. 2 is a first application diagram of the video summary generation method according to an embodiment of the present invention. This application mainly targets complex motion scenes, in which the moving objects are relatively large and relatively numerous. As shown in FIG. 2, the application is implemented through the following steps:
Step 201: initialize the number of fields of view.
That is, the original video is divided into a plurality of fields of view; the exact number may be determined according to actual needs, for example 3 or 5.
Step 202: calculate the direction of the fields of view.
Specifically, the direction of the fields of view is calculated from the direction of the scene in the original video: if the scene direction is horizontal or vertical, the corresponding fields of view are horizontal or vertical, respectively.
Step 203: calculate the field of view to which each object track belongs.
Specifically, the proximity of each object track to each field of view may be calculated with a distance calculation formula, and each object track contained in the original video is assigned to the field of view to which it is closest.
Step 204: update the straight-line model of the field of view.
Specifically, after each object track has been added to a field of view, the line-segment feature of that closest field of view may be updated from the coordinates of the track's start and end points, ready for the next object track to be added.
Step 205: calculate the activity index of each field of view.
Specifically, the activity index of a field of view is computed from the activity levels of the object tracks within it.
Step 206: compare the activity index of each field of view with the preset threshold.
A field of view whose activity index is greater than the preset threshold is judged important; otherwise it is judged secondary.
Step 207: process the object tracks with the first preset function.
Specifically, owing to the particular scenes in this application, the computed fields of view are all important; the first preset function is used to solve for the optimal object-track combination of each field of view, the track combination corresponding to the optimal solution is determined, and the video summary is generated.
Example two
As shown in FIG. 1 and FIG. 3, this embodiment of the present invention includes steps 101, 102, 103 and 104 of the first embodiment, but implements step 104 in a different manner. The parts identical to the first embodiment are not repeated; only the differences are described below:
Specifically, step 104 in this embodiment of the present invention comprises:
if the plurality of fields of view are all secondary, solving for the optimal object-track combination of each field of view with a second preset function, thereby determining the track combination corresponding to the optimal solution;
and generating the video summary from the optimal object-track combinations of all the fields of view.
As a preferred implementation, the second preset function in this embodiment uses a simple transfer mapping energy function, simple relative to the complex transfer mapping energy function of the first embodiment, to solve for the optimal object-track combination of each field of view. The formula itself appears only as an image in the published text; it is evaluated over pairs of moving object trajectories b_m and b_b within the secondary field of view, with γ a preset weight coefficient whose specific value can be chosen according to the actual scene.
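Reading the lost formula as a pairwise, collision-only energy over the secondary view's tracks (an assumption suggested by the b_m, b_b pairing and the single weight γ), a sketch is:

```python
from itertools import combinations

def simple_energy(tracks, collision_cost, gamma):
    # tracks: object trajectories of one secondary field of view;
    # collision_cost(b_m, b_b): penalty when the two (time-shifted)
    # trajectories collide in the summary.
    return sum(gamma * collision_cost(b_m, b_b)
               for b_m, b_b in combinations(tracks, 2))
```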
FIG. 3 is a second application diagram of the video summary generation method according to an embodiment of the present invention. This application mainly targets simple motion scenes, in which the moving objects are relatively small and relatively few. As shown in FIG. 3, the application is implemented through the following steps:
Step 301: initialize the number of fields of view.
That is, the original video is divided into a plurality of fields of view; the exact number may be determined according to actual needs, for example 3 or 5.
Step 302: calculate the direction of the fields of view.
Specifically, the direction of the fields of view is calculated from the direction of the scene in the original video: if the scene direction is horizontal or vertical, the corresponding fields of view are horizontal or vertical, respectively.
Step 303: calculate the field of view to which each object track belongs.
Specifically, the proximity of each object track to each field of view may be calculated with a distance calculation formula, and each object track contained in the original video is assigned to the field of view to which it is closest.
Step 304: update the straight-line model of the field of view.
Specifically, after each object track has been added to a field of view, the line-segment feature of that closest field of view may be updated from the coordinates of the track's start and end points, ready for the next object track to be added.
Step 305: calculate the activity index of each field of view.
Specifically, the activity index of a field of view is computed from the activity levels of the object tracks within it.
Step 306: compare the activity index of each field of view with the preset threshold.
A field of view whose activity index is greater than the preset threshold is judged important; otherwise it is judged secondary.
Step 307: process the object tracks with the second preset function.
Specifically, owing to the particular scenes in this application, the computed fields of view are all secondary; the second preset function is used to solve for the optimal object-track combination of each field of view, the track combination corresponding to the optimal solution is determined, and the video summary is generated.
Example three
As shown in FIG. 1 and FIG. 4, this embodiment of the present invention includes steps 101, 102, 103 and 104 of the first embodiment, but implements step 104 in a different manner. The parts identical to the first embodiment are not repeated; only the differences are described below:
Specifically, step 104 in this embodiment of the present invention comprises:
if the plurality of fields of view include both important and secondary fields of view: when two important fields of view are adjacent, merging them into one important field of view and solving for the optimal object-track combination of the merged field of view with a first preset function; when the important fields of view are not adjacent to each other, solving for the optimal object-track combination of each important field of view separately with the first preset function, thereby determining the track combination corresponding to each optimal solution; and solving for the optimal object-track combination of each secondary field of view with a second preset function, thereby determining the track combination corresponding to each optimal solution;
and generating the video summary from the optimal object-track combinations of all the fields of view.
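A sketch of the adjacency handling, assuming the fields of view are listed in spatial order so that adjacency in the scene is adjacency in the list:

```python
def merge_adjacent_important(views):
    # views: [{"important": bool, "tracks": [...]}, ...] in spatial order.
    merged = []
    for v in views:
        if v["important"] and merged and merged[-1]["important"]:
            # Two adjacent important views collapse into one.
            merged[-1]["tracks"] += v["tracks"]
        else:
            merged.append({"important": v["important"],
                           "tracks": list(v["tracks"])})
    return merged
# Afterwards, important views go to the first preset function and secondary
# views to the second, each view solved independently (hence in parallel).
```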
The optimal object-track combination within each important field of view can be solved with the first preset function, after which the track combination corresponding to the optimal solution is determined; a prior-art function may also be used for this purpose. As a preferred implementation, the first preset function in this embodiment uses the complex transfer mapping energy function to solve for the optimal object-track combination of each field of view, which may be computed with the following formula:
E(MAP) = E_a(BO) + α*E_tps(BO) + β*E_ntps(BO) + γ*E_tc(BO) + λ*E_tct(BO)
where E(MAP) is the complex transfer mapping energy function; BO is the set of object tracks within the important field of view; E_a(BO) is the activity energy cost, a penalty incurred if a target does not appear in the summary video; E_tps(BO) is the relative positive-order cost, a penalty incurred if a target is added to the summary video out of its original order; E_ntps(BO) is the relative reverse-order cost, a penalty incurred when two temporally related objects are added to the summary video in reverse order; E_tc(BO) is the pseudo-collision cost, a penalty incurred when the tracks of two objects that do not collide in the original video collide in the summary result; E_tct(BO) is the true-collision cost, under which a collision in the summary result between two objects that also collide in the original video incurs no penalty, E_tct(BO) taking a negative value; and α, β, γ, λ are preset weighting coefficients whose specific values can be chosen according to the needs of the actual scene.
The optimal object-track combination within each secondary field of view can be solved with the second preset function, after which the track combination corresponding to the optimal solution is determined; a prior-art function may also be used for this purpose. As a preferred implementation, the second preset function in this embodiment uses the simple transfer mapping energy function, simple relative to the complex transfer mapping energy function of the first embodiment, to solve for the optimal object-track combination of each field of view; as described in the second embodiment, it is evaluated over pairs of moving object trajectories b_m and b_b within the secondary field of view, with γ a preset weight coefficient whose specific value can be chosen according to the actual scene.
FIG. 4 is a third application diagram of the video summary generation method according to an embodiment of the present invention. This application mainly targets structurally complex motion scenes in which the moving objects are irregular: in some regions the motion is simple and the objects are few, while in other regions the relative motion is complex. As shown in FIG. 4, the application is implemented through the following steps:
Step 401: initialize the number of fields of view.
That is, the original video is divided into a plurality of fields of view; the exact number may be determined according to actual needs, for example 3 or 5.
Step 402: calculate the direction of the fields of view.
Specifically, the direction of the fields of view is calculated from the direction of the scene in the original video: if the scene direction is horizontal or vertical, the corresponding fields of view are horizontal or vertical, respectively.
Step 403: calculate the field of view to which each object track belongs.
Specifically, the proximity of each object track to each field of view may be calculated with a distance calculation formula, and each object track contained in the original video is assigned to the field of view to which it is closest.
Step 404: update the straight-line model of the field of view.
Specifically, after each object track has been added to a field of view, the line-segment feature of that closest field of view may be updated from the coordinates of the track's start and end points, ready for the next object track to be added.
Step 405: calculate the activity index of each field of view.
Specifically, the activity index of a field of view is computed from the activity levels of the object tracks within it.
Step 406: compare the activity index of each field of view with the preset threshold.
A field of view whose activity index is greater than the preset threshold is judged important; otherwise it is judged secondary.
Step 407: check whether two important fields of view are adjacent.
If two important fields of view are adjacent to each other, continue with step 408.
Step 408: merge, i.e. merge the two adjacent important fields of view.
Step 409: process the object tracks in the important fields of view with the first preset function;
Step 410: process the object tracks in the secondary fields of view with the second preset function;
and finally, generate the video summary from the optimal object-track combinations of all the fields of view.
Example four
As shown in FIG. 1 and FIG. 5, this embodiment of the present invention includes steps 101, 102, 103 and 104 of the first embodiment, but implements step 104 in a different manner. The parts identical to the first embodiment are not repeated; only the differences are described below:
Specifically, step 104 in this embodiment of the present invention comprises:
if the plurality of fields of view include both important and secondary fields of view: when two important fields of view are adjacent, merging them into one important field of view and solving for the optimal object-track combination of the merged field of view with a first preset function; when the important fields of view are not adjacent to each other, solving for the optimal object-track combination of each important field of view separately with the first preset function, thereby determining the track combination corresponding to each optimal solution; and copying the object tracks in the secondary fields of view into the background image as they appear in the original video;
and generating the video summary from the optimal object-track combinations of all the fields of view.
The optimal object-track combination within each important field of view can be solved with the first preset function, after which the track combination corresponding to the optimal solution is determined; a prior-art function may also be used for this purpose. As a preferred implementation, the first preset function in this embodiment uses the complex transfer mapping energy function to solve for the optimal object-track combination of each field of view, which may be computed with the following formula:
E(MAP) = E_a(BO) + α*E_tps(BO) + β*E_ntps(BO) + γ*E_tc(BO) + λ*E_tct(BO)
where E(MAP) is the complex transfer mapping energy function; BO is the set of object tracks within the important field of view; E_a(BO) is the activity energy cost, a penalty incurred if a target does not appear in the summary video; E_tps(BO) is the relative positive-order cost, a penalty incurred if a target is added to the summary video out of its original order; E_ntps(BO) is the relative reverse-order cost, a penalty incurred when two temporally related objects are added to the summary video in reverse order; E_tc(BO) is the pseudo-collision cost, a penalty incurred when the tracks of two objects that do not collide in the original video collide in the summary result; E_tct(BO) is the true-collision cost, under which a collision in the summary result between two objects that also collide in the original video incurs no penalty, E_tct(BO) taking a negative value; and α, β, γ, λ are preset weighting coefficients whose specific values can be chosen according to the needs of the actual scene.
The object tracks in the secondary fields of view are copied into the background image as they appear in the original video, and the video summary is finally generated.
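A sketch of this copy step with NumPy: each object patch from the original video is pasted into the background at its original position, with a boolean mask selecting the object's pixels (the patch/mask representation is an assumption of this sketch).

```python
import numpy as np

def paste_secondary_tracks(background: np.ndarray, patches) -> np.ndarray:
    # patches: iterable of (x, y, patch, mask) cut from the original video;
    # mask is a boolean array marking the object's pixels inside the patch.
    out = background.copy()
    for x, y, patch, mask in patches:
        h, w = patch.shape[:2]
        region = out[y:y + h, x:x + w]
        region[mask] = patch[mask]  # writes through to `out`
    return out
```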
FIG. 5 is a fourth application diagram of the video summary generation method according to an embodiment of the present invention. This application mainly targets structurally complex motion scenes in which the moving targets are irregular: in some regions the motion is simple and the targets are few, while in other regions the relative motion is complex. As shown in FIG. 5, the application is implemented through the following steps:
Step 501: initialize the number of fields of view.
That is, the original video is divided into a plurality of fields of view; the exact number may be determined according to actual needs, for example 3 or 5.
Step 502: calculate the direction of the fields of view.
Specifically, the direction of the fields of view is calculated from the direction of the scene in the original video: if the scene direction is horizontal or vertical, the corresponding fields of view are horizontal or vertical, respectively.
Step 503: calculate the field of view to which each object track belongs.
Specifically, the proximity of each object track to each field of view may be calculated with a distance calculation formula, and each object track contained in the original video is assigned to the field of view to which it is closest.
Step 504: update the straight-line model of the field of view.
Specifically, after each object track has been added to a field of view, the line-segment feature of that closest field of view may be updated from the coordinates of the track's start and end points, ready for the next object track to be added.
Step 505: calculate the activity index of each field of view.
Specifically, the activity index of a field of view is computed from the activity levels of the object tracks within it.
Step 506: compare the activity index of each field of view with the preset threshold.
A field of view whose activity index is greater than the preset threshold is judged important; otherwise it is judged secondary.
Step 507: check whether two important fields of view are adjacent.
If two important fields of view are adjacent to each other, continue with step 508.
Step 508: merge, i.e. merge the two adjacent important fields of view.
Step 509: process the object tracks in the important fields of view with the first preset function;
Step 510: copy the object tracks in the secondary fields of view into the background image as they appear in the original video;
and finally, generate the video summary from the optimal object-track combinations of all the fields of view.
Example five
As shown in FIG. 6, an embodiment of the present invention further provides a video summary generation apparatus, the apparatus 60 comprising:
a first dividing module 61, configured to divide an original video into a plurality of fields of view;
a classification module 62, configured to assign each object track contained in the original video to the field of view to which it is closest, according to the proximity of the track to each field of view;
a second dividing module 63, configured to compute an activity index for each field of view from the activity levels of the object tracks within it, and to classify each field of view as important or secondary according to whether the activity index exceeds a preset threshold;
and a merge processing module 64, configured to process the object tracks in the important and secondary fields of view in parallel and to merge the processed fields of view to generate the video summary.
Wherein the first dividing module 61 comprises: a first calculation unit, configured to determine the direction of the scene in the original video; and a first dividing unit, configured to divide the original video into a plurality of fields of view according to the direction of the scene, the directions of the fields of view being consistent with the direction of the scene.
Wherein the first calculation unit comprises: a first acquisition unit, configured to acquire the start and end points of a plurality of object tracks in the scene of the original video; a difference calculation unit, configured to calculate the coordinate difference between the start and end points of each object track and determine the direction of the track; and a judging unit, configured to judge the direction of the scene in the original video from the direction of the majority of the object tracks, the direction of the scene being consistent with the direction taken by most of the tracks.
Wherein the classification module 62 comprises: a second acquisition unit, configured to acquire a line-segment feature for each field of view, the line-segment feature comprising the coordinates of the start and end points of the field of view and the number of object tracks it contains; a distance calculation unit, configured to acquire the coordinates of the start and end points of an object track and calculate the proximity of the track to each field of view; a first classification unit, configured to assign each object track contained in the original video to the field of view to which it is closest, according to that proximity;
and an updating unit, configured to update the line-segment feature of that closest field of view with the coordinates of the track's start and end points.
Wherein the second dividing module 63 comprises: an activity index calculation unit, in which the activity level of a track is positively correlated with the object area corresponding to the track and with the track's duration, and the activity index of a field of view is obtained by summing the activity levels of all object tracks within it; and a second dividing unit, configured to classify each field of view as important or secondary according to whether its activity index exceeds a preset threshold.
Optionally, the merge processing module 64 comprises: a first merging unit, configured to solve, if the plurality of fields of view are all important, for the optimal object-track combination of each field of view with a first preset function, thereby determining the track combination corresponding to the optimal solution; and a first processing unit, configured to generate the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the merge processing module 64 comprises: a second merging unit, configured to solve, if the plurality of fields of view are all secondary, for the optimal object-track combination of each field of view with a second preset function, thereby determining the track combination corresponding to the optimal solution; and a second processing unit, configured to generate the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the merge processing module 64 comprises: a third merging unit, configured, if the plurality of fields of view include both important and secondary fields of view, to merge two adjacent important fields of view into one important field of view and solve for the optimal object-track combination of the merged field of view with the first preset function; to solve, if the important fields of view are not adjacent to each other, for the optimal object-track combination of each important field of view separately with the first preset function, thereby determining the track combination corresponding to each optimal solution; and to solve for the optimal object-track combination of each secondary field of view with the second preset function, thereby determining the track combination corresponding to each optimal solution; and a third processing unit, configured to generate the video summary from the optimal object-track combinations of all the fields of view.
Optionally, the merge processing module 64 comprises: a fourth merging unit, configured, if the plurality of fields of view include both important and secondary fields of view, to merge two adjacent important fields of view into one important field of view and solve for the optimal object-track combination of the merged field of view with the first preset function; to solve, if the important fields of view are not adjacent to each other, for the optimal object-track combination of each important field of view separately with the first preset function, thereby determining the track combination corresponding to each optimal solution; and to copy the object tracks in the secondary fields of view into the background image as they appear in the original video; and a fourth processing unit, configured to merge all the fields of view according to these processing results to generate the video summary.
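Structurally, the apparatus is the method's four modules composed in sequence; a minimal wiring sketch (all names illustrative, not the patent's):

```python
class VideoSummaryDevice:
    # Mirrors apparatus 60 of FIG. 6: modules 61-64 composed in order.
    def __init__(self, divider, classifier, grader, merger):
        self.divider = divider        # first dividing module 61
        self.classifier = classifier  # classification module 62
        self.grader = grader          # second dividing module 63
        self.merger = merger          # merge processing module 64

    def run(self, video):
        views = self.divider(video)
        views = self.classifier(video, views)
        important, secondary = self.grader(views)
        return self.merger(important, secondary)
```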
In the video summary generation method of the embodiments of the present invention, processing the object tracks in the important and secondary fields of view in parallel reduces the amount of computation required for track combination, speeds up the computation, and lets the user focus more simply and clearly on the main targets in the important fields of view.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (16)

1. A method for generating a video summary, characterized by comprising the following steps:
dividing an original video into a plurality of fields of view, comprising:
determining the direction of a scene in the original video;
dividing the original video into a plurality of fields of view according to the direction of the scene, wherein the directions of the fields of view are consistent with the direction of the scene;
assigning each object track contained in the original video to the field of view to which it is closest, according to the proximity of the object track to each field of view;
computing an activity index for each field of view from the activity levels of the object tracks within it, and classifying each field of view as important or secondary according to whether the activity index exceeds a preset threshold;
and processing the object tracks in each important field of view and each secondary field of view in parallel, and merging the processed fields of view to generate the video summary.
2. The method of claim 1, wherein determining the direction of the scene in the original video comprises:
acquiring the initial point and the end point of each of a plurality of object tracks in the scene of the original video;
calculating the coordinate difference between the initial point and the end point of each object track to determine the direction of the object track; and
determining the direction of the scene in the original video according to the direction of the majority of the object tracks, wherein the direction of the scene is consistent with the direction of the majority of the object tracks.
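By way of illustration only, the direction estimate of claim 2 can be sketched in Python. The (start, end) coordinate-pair representation of a track and the quantization of directions into horizontal/vertical are assumptions made for the example; the claim fixes only the coordinate-difference calculation and the majority rule.

```python
# A minimal sketch of the scene-direction estimate in claim 2 (illustrative).
from collections import Counter

def track_direction(start, end):
    """Classify a track from the coordinate difference of its endpoints."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    return "horizontal" if abs(dx) >= abs(dy) else "vertical"

def scene_direction(tracks):
    """The scene direction follows the majority of the track directions."""
    votes = Counter(track_direction(s, e) for s, e in tracks)
    return votes.most_common(1)[0][0]

# Three mostly horizontal tracks outvote one vertical track.
tracks = [((0, 0), (100, 5)), ((10, 40), (90, 38)),
          ((0, 60), (120, 55)), ((50, 0), (52, 80))]
print(scene_direction(tracks))  # -> 'horizontal'
```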
3. The method of claim 1, wherein dividing each object track contained in the original video into the view closest to that object track according to the degree of proximity between the object track and each view comprises:
acquiring a line segment feature for each view, the line segment feature comprising: the coordinates of the start and stop points of the view and the number of object tracks contained in the view;
acquiring the coordinates of the start and stop points of an object track, and calculating the degree of proximity between the object track and each view;
dividing each object track contained in the original video into its closest view according to the degree of proximity; and
updating the line segment feature of the closest view according to the coordinates of the start and stop points of the object track.
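The nearest-view assignment of claim 3 can likewise be sketched under stated assumptions: a summed endpoint distance stands in for the degree of proximity, and a running-mean update of the segment endpoints stands in for the feature update; neither choice is fixed by the claim.

```python
# A sketch of the track-to-view assignment in claim 3 (proximity measure
# and update rule are illustrative assumptions).
from dataclasses import dataclass
import math

@dataclass
class View:
    start: tuple       # line segment feature: start-point coordinates
    stop: tuple        # line segment feature: stop-point coordinates
    n_tracks: int = 0  # line segment feature: number of tracks assigned

def proximity(t_start, t_stop, view):
    """Smaller is closer: summed endpoint distances to the view's segment."""
    return math.dist(t_start, view.start) + math.dist(t_stop, view.stop)

def assign_track(t_start, t_stop, views):
    closest = min(views, key=lambda v: proximity(t_start, t_stop, v))
    # Update the closest view's segment feature (running mean of endpoints).
    n = closest.n_tracks
    closest.start = tuple((n * a + b) / (n + 1) for a, b in zip(closest.start, t_start))
    closest.stop = tuple((n * a + b) / (n + 1) for a, b in zip(closest.stop, t_stop))
    closest.n_tracks += 1
    return closest

views = [View((0, 20), (200, 20)), View((0, 80), (200, 80))]
chosen = assign_track((5, 18), (190, 25), views)
print(views.index(chosen), chosen.n_tracks)  # -> 0 1
```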
4. The method of claim 1, wherein computing the activity index for each view according to the activity degree of the object tracks in the view, and dividing the views into important views and secondary views according to whether the activity index exceeds a preset threshold, comprises:
computing the activity index of each view by summing the activity degrees of all the object tracks in the view, wherein the activity degree of an object track is positively correlated with the object area corresponding to the object track and with the duration of the object track; and
dividing the views into important views and secondary views according to whether the activity index exceeds the preset threshold.
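For claim 4, one concrete and purely illustrative choice of activity degree is the product of object area and track duration, which satisfies the positive-correlation requirement; any other positively correlated combination would equally fit the claim language.

```python
# A sketch of the activity index in claim 4; area * duration is an
# illustrative activity degree, not the one mandated by the claim.

def activity_index(view_tracks):
    """Sum the activity degrees of all object tracks in one view."""
    return sum(area * duration for area, duration in view_tracks)

def classify_views(views, threshold):
    """Split views into important / secondary by the activity index."""
    important = [v for v in views if activity_index(v) > threshold]
    secondary = [v for v in views if activity_index(v) <= threshold]
    return important, secondary

# Each view is a list of (object_area, duration) pairs.
views = [[(400, 12), (250, 8)], [(80, 3)]]
print(classify_views(views, threshold=1000))
# -> ([[(400, 12), (250, 8)]], [[(80, 3)]])
```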
5. The method of claim 1, wherein processing the object tracks in each important view and each secondary view in parallel, and combining the views obtained after the parallel processing to generate the video summary, comprises:
if the plurality of views are all important views, solving for the optimal solution of the object track combination of each view separately using a first preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and
generating the video summary according to the optimal object track combinations of all the views.
6. The method of claim 1, wherein processing the object tracks in each important view and each secondary view in parallel, and combining the views obtained after the parallel processing to generate the video summary, comprises:
if the plurality of views are all secondary views, solving for the optimal solution of the object track combination of each view separately using a second preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and
generating the video summary according to the optimal object track combinations of all the views.
7. The method of claim 1, wherein processing the object tracks in each important view and each secondary view in parallel, and combining the views obtained after the parallel processing to generate the video summary, comprises:
if the plurality of views include both important views and secondary views: merging two important views into one important view when they are adjacent, and solving for the optimal solution of the object track combination of the merged important view using a first preset function; when important views are not adjacent to each other, solving for the optimal solution of the object track combination of each important view separately using the first preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and solving for the optimal solution of the object track combination of each secondary view separately using a second preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and
generating the video summary according to the optimal object track combinations of all the views.
8. The method of claim 1, wherein processing the object tracks in each important view and each secondary view in parallel, and combining the views obtained after the parallel processing to generate the video summary, comprises:
if the plurality of views include both important views and secondary views: merging two important views into one important view when they are adjacent, and solving for the optimal solution of the object track combination of the merged important view using a first preset function; when important views are not adjacent to each other, solving for the optimal solution of the object track combination of each important view separately using the first preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and copying the object tracks in the secondary views into a background image obtained from the original video; and
combining all the views according to the processing results to generate the video summary.
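Claims 5 through 8 amount to one dispatch over the view types, sketched below. The dictionary representation of a view (an integer position along the scene direction plus a track list), the thread pool, and the two stand-in functions f1 and f2 are all assumptions for the example; the claims fix only that adjacent important views are merged, that the first preset function handles important views, and that secondary views get the cheaper second function (claims 5-7) or a direct background copy (claim 8).

```python
# A sketch of the parallel per-view processing in claims 5-8 (illustrative).
from concurrent.futures import ThreadPoolExecutor

def merge_adjacent(important):
    """Merge runs of adjacent important views into one, as in claims 7/8."""
    merged, run = [], []
    for view in sorted(important, key=lambda v: v["pos"]):
        if run and view["pos"] == run[-1]["pos"] + 1:
            run.append(view)
        else:
            if run:
                merged.append(run)
            run = [view]
    if run:
        merged.append(run)
    return [{"pos": r[0]["pos"], "tracks": sum((v["tracks"] for v in r), [])}
            for r in merged]

def process_views(important, secondary, f1, f2):
    """Apply f1 to (merged) important views and f2 to secondary views in
    parallel, returning per-view results ready to be combined."""
    groups = merge_adjacent(important)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(f1, groups)) + list(pool.map(f2, secondary))

def f1(view):  # stand-in for the first preset function
    return sorted(view["tracks"])

def f2(view):  # stand-in for the second preset function / background copy
    return view["tracks"]

important = [{"pos": 0, "tracks": [3, 1]}, {"pos": 1, "tracks": [2]}]
secondary = [{"pos": 3, "tracks": [9]}]
print(process_views(important, secondary, f1, f2))  # -> [[1, 2, 3], [9]]
```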
9. A video summary generation apparatus, characterized by comprising:
a first partitioning module, configured to divide an original video into a plurality of views, wherein the first partitioning module comprises:
a first computing unit, configured to determine the direction of a scene in the original video; and
a first dividing unit, configured to divide the original video into a plurality of views according to the direction of the scene, wherein the directions of the views are consistent with the direction of the scene;
a classification module, configured to divide each object track contained in the original video into the view closest to that object track, according to the degree of proximity between the object track and each view;
a second partitioning module, configured to compute an activity index for each view according to the activity degree of the object tracks in the view, and to divide the views into important views and secondary views according to whether the activity index exceeds a preset threshold; and
a merge processing module, configured to process the object tracks in each important view and each secondary view in parallel, and to combine the views obtained after the parallel processing to generate the video summary.
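For orientation, the module structure of claim 9 maps naturally onto a small class skeleton; every name below is hypothetical and stands in for the claimed modules rather than reproducing any actual implementation.

```python
# Hypothetical skeleton mirroring the apparatus of claim 9 (names illustrative).

class VideoSummaryApparatus:
    def __init__(self, first_partitioning, classification,
                 second_partitioning, merge_processing):
        self.first_partitioning = first_partitioning    # video -> views
        self.classification = classification            # tracks -> closest views
        self.second_partitioning = second_partitioning  # views -> important/secondary
        self.merge_processing = merge_processing        # parallel merge -> summary

    def summarize(self, video):
        views = self.first_partitioning(video)
        self.classification(video, views)
        important, secondary = self.second_partitioning(views)
        return self.merge_processing(important, secondary)
```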
10. The apparatus of claim 9, wherein the first computing unit comprises:
a first acquisition unit, configured to acquire the initial point and the end point of each of a plurality of object tracks in the scene of the original video;
a difference value calculation unit, configured to calculate the coordinate difference between the initial point and the end point of each object track and to determine the direction of the object track; and
a judging unit, configured to determine the direction of the scene in the original video according to the direction of the majority of the object tracks, wherein the direction of the scene is consistent with the direction of the majority of the object tracks.
11. The apparatus of claim 9, wherein the classification module comprises:
a second acquisition unit, configured to acquire a line segment feature of each view, the line segment feature comprising: the coordinates of the start and stop points of the view and the number of object tracks contained in the view;
a distance calculation unit, configured to acquire the coordinates of the start and stop points of an object track and to calculate the degree of proximity between the object track and each view;
a first classification unit, configured to divide each object track contained in the original video into its closest view according to the degree of proximity; and
an updating unit, configured to update the line segment feature of the closest view according to the coordinates of the start and stop points of the object track.
12. The apparatus of claim 9, wherein the second partitioning module comprises:
an activity index calculation unit, configured to compute the activity index of each view by summing the activity degrees of all the object tracks in the view, wherein the activity degree of an object track is positively correlated with the object area corresponding to the object track and with the duration of the object track; and
a second dividing unit, configured to divide the views into important views and secondary views according to whether the activity index exceeds the preset threshold.
13. The apparatus of claim 9, wherein the merge processing module comprises:
a first merging unit, configured to, if the plurality of views are all important views, solve for the optimal solution of the object track combination of each view separately using a first preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and
a first processing unit, configured to generate the video summary according to the optimal object track combinations of all the views.
14. The apparatus of claim 9, wherein the merge processing module comprises:
a second merging unit, configured to, if the plurality of views are all secondary views, solve for the optimal solution of the object track combination of each view separately using a second preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and
a second processing unit, configured to generate the video summary according to the optimal object track combinations of all the views.
15. The apparatus of claim 9, wherein the merge processing module comprises:
a third merging unit, configured to, if the plurality of views include both important views and secondary views: merge two important views into one important view when they are adjacent, and solve for the optimal solution of the object track combination of the merged important view using a first preset function; when important views are not adjacent to each other, solve for the optimal solution of the object track combination of each important view separately using the first preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and solve for the optimal solution of the object track combination of each secondary view separately using a second preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and
a third processing unit, configured to generate the video summary according to the optimal object track combinations of all the views.
16. The apparatus of claim 9, wherein the merge processing module comprises:
a fourth merging unit, configured to, if the plurality of views include both important views and secondary views: merge two important views into one important view when they are adjacent, and solve for the optimal solution of the object track combination of the merged important view using a first preset function; when important views are not adjacent to each other, solve for the optimal solution of the object track combination of each important view separately using the first preset function, thereby determining the optimal object track combination corresponding to the optimal solution; and copy the object tracks in the secondary views into a background image obtained from the original video; and
a fourth processing unit, configured to combine all the views according to the processing results to generate the video summary.
CN201410570690.4A 2014-10-23 2014-10-23 Video abstract generation method and device Active CN105530554B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410570690.4A CN105530554B (en) 2014-10-23 2014-10-23 Video abstract generation method and device
PCT/CN2014/094701 WO2015184768A1 (en) 2014-10-23 2014-12-23 Method and device for generating video abstract

Publications (2)

Publication Number Publication Date
CN105530554A (en) 2016-04-27
CN105530554B (en) 2020-08-07

Family

ID=54766027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410570690.4A Active CN105530554B (en) 2014-10-23 2014-10-23 Video abstract generation method and device

Country Status (2)

Country Link
CN (1) CN105530554B (en)
WO (1) WO2015184768A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227759B (en) * 2016-07-14 2019-09-13 中用科技有限公司 A kind of method and device of dynamic generation video frequency abstract
CN108959312B (en) 2017-05-23 2021-01-29 华为技术有限公司 Method, device and terminal for generating multi-document abstract
CN107995535B (en) * 2017-11-28 2019-11-26 百度在线网络技术(北京)有限公司 A kind of method, apparatus, equipment and computer storage medium showing video
CN110505534B (en) * 2019-08-26 2022-03-08 腾讯科技(深圳)有限公司 Monitoring video processing method, device and storage medium
CN111526434B (en) * 2020-04-24 2021-05-18 西北工业大学 Converter-based video abstraction method
CN112884808B (en) * 2021-01-26 2022-04-22 石家庄铁道大学 Video concentrator set partitioning method for reserving target real interaction behavior

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092963A (en) * 2013-01-21 2013-05-08 信帧电子技术(北京)有限公司 Video abstract generating method and device
CN103686453A (en) * 2013-12-23 2014-03-26 苏州千视通信科技有限公司 Method for improving video abstract accuracy by dividing areas and setting different particle sizes
JP5600040B2 (en) * 2010-07-07 2014-10-01 日本電信電話株式会社 Video summarization apparatus, video summarization method, and video summarization program

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8699806B2 (en) * 2006-04-12 2014-04-15 Google Inc. Method and apparatus for automatically summarizing video
US8503523B2 (en) * 2007-06-29 2013-08-06 Microsoft Corporation Forming a representation of a video item and use thereof
US8432965B2 (en) * 2010-05-25 2013-04-30 Intellectual Ventures Fund 83 Llc Efficient method for assembling key video snippets to form a video summary
CN102375816B (en) * 2010-08-10 2016-04-20 中国科学院自动化研究所 A kind of Online Video enrichment facility, system and method
CN102256065B (en) * 2011-07-25 2012-12-12 中国科学院自动化研究所 Automatic video condensing method based on video monitoring network
CN103092925B (en) * 2012-12-30 2016-02-17 信帧电子技术(北京)有限公司 A kind of video abstraction generating method and device
CN103200463A (en) * 2013-03-27 2013-07-10 天脉聚源(北京)传媒科技有限公司 Method and device for generating video summary
CN103345764B (en) * 2013-07-12 2016-02-10 西安电子科技大学 A kind of double-deck monitor video abstraction generating method based on contents of object

Similar Documents

Publication Publication Date Title
CN105530554B (en) Video abstract generation method and device
US11643076B2 (en) Forward collision control method and apparatus, electronic device, program, and medium
CN104200237B (en) One kind being based on the High-Speed Automatic multi-object tracking method of coring correlation filtering
CN105513349B (en) Mountainous area highway vehicular events detection method based on double-visual angle study
CN103593679A (en) Visual human-hand tracking method based on online machine learning
CN104216925A (en) Repetition deleting processing method for video content
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN102842036A (en) Intelligent multi-target detection method facing ship lock video monitoring
CN103985257A (en) Intelligent traffic video analysis method
CN107358622A (en) A kind of video information processing method and system based on visualization movement locus
CN108471497A (en) A kind of ship target real-time detection method based on monopod video camera
CN111191531A (en) Rapid pedestrian detection method and system
CN110688873A (en) Multi-target tracking method and face recognition method
CN109299700A (en) Subway group abnormality behavioral value method based on crowd density analysis
CN112149471A (en) Loopback detection method and device based on semantic point cloud
CN111738085B (en) System construction method and device for realizing automatic driving simultaneous positioning and mapping
CN116523970B (en) Dynamic three-dimensional target tracking method and device based on secondary implicit matching
CN106683113B (en) Feature point tracking method and device
CN112699842A (en) Pet identification method, device, equipment and computer readable storage medium
CN115330841A (en) Method, apparatus, device and medium for detecting projectile based on radar map
CN113963310A (en) People flow detection method and device for bus station and electronic equipment
CN104182990A (en) A method for acquiring a sequence image motion target area in real-time
Luo et al. An improved moment-preserving auto threshold image segmentation algorithm
Wang et al. AGR-Fcn: Adversarial generated region based on fully convolutional networks for single-and multiple-instance object detection
CN114419453B (en) Group target detection method based on electromagnetic scattering characteristics and topological configuration

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200715

Address after: No. 68 Bauhinia Road, Yuhuatai District, Nanjing, Jiangsu 210012

Applicant after: Nanjing Zhongxing New Software Co.,Ltd.

Address before: Legal Affairs Department, ZTE Building, Keji South Road, Hi-tech Industrial Park, Nanshan District, Shenzhen, Guangdong 518057

Applicant before: ZTE Corp.

GR01 Patent grant