CN102495907A - Video summary with depth information - Google Patents

Video summary with depth information

Info

Publication number
CN102495907A
CN102495907A · CN201110437761A
Authority
CN
China
Prior art keywords
moving objects
scene
animation
cutout
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201110437761XA
Other languages
Chinese (zh)
Other versions
CN102495907B (en)
Inventor
胡大鹏
李志前
周晓
麦振文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hong Kong Applied Science and Technology Research Institute ASTRI
Original Assignee
Hong Kong Applied Science and Technology Research Institute ASTRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hong Kong Applied Science and Technology Research Institute ASTRI filed Critical Hong Kong Applied Science and Technology Research Institute ASTRI
Priority to CN 201110437761 priority Critical patent/CN102495907B/en
Publication of CN102495907A publication Critical patent/CN102495907A/en
Application granted granted Critical
Publication of CN102495907B publication Critical patent/CN102495907B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a video summary with depth information. A computer-executable method for creating a summary video with depth information comprises the following steps: identifying moving objects in an input original video; generating an animated moving-object cutout for each identified moving object by copying and stacking the consecutive frames of the input original video that contain the image of the moving object; constructing a scene background using the structure of the scene in the input original video and estimating any missing parts; creating a three-dimensional scene using depth information of the foreground objects and the scene background in the input original video, and overlaying the animated moving-object cutouts on the three-dimensional scene according to their respective x, y, and depth positions in the three-dimensional scene, so as to present a dynamic 3D scene; and synthesizing the summary video using the dynamic 3D scene.

Description

Video summary with depth information
Technical field
The present invention relates generally to video analysis, indexing, and retrieval in video surveillance. In particular, the present invention relates to a method and system for analyzing and summarizing video to aid the search for, and identification of, content of interest.
Background art
Trying to locate particular content or events in a video clip is a tedious and time-consuming process. The viewer must carefully examine the entire video clip frame by frame, whether or not a given frame contains the scene of interest. The problem is even worse when reviewing long-term, continuously captured footage in video surveillance. Moreover, commercial and public-safety surveillance typically involves networks of hundreds of surveillance cameras, each capturing an endless video data stream. Billions of surveillance video cameras are installed around the world; in Shenzhen, a city in southern China, more than one million cameras are estimated to be deployed.
There is therefore a need for a method of summarizing or condensing a video clip so that only the portions likely to contain the content of interest are shown. Some traditional video summarization techniques compact object activity in time and present the result as a conventional two-dimensional motion picture. However, the condensed two-dimensional picture crowds the moving objects together, which can be difficult for human vision to digest. Other traditional techniques simply delete static frames from the source clip, which does not produce the best summary.
Summary of the invention
The object of the present invention is to provide a method of summarizing video that compacts object activity in time and presents the result in a three-dimensional scene. The additional depth dimension supplies cues that humans perceive naturally and effectively: parallax and disparity help the viewer comprehend the positions of the moving objects over time. Because the video summarization method produces a summary video carrying three-dimensional information, novel views of the captured scene can then be created with a virtual camera.
A further object of the present invention is to provide a summary video comprising two display regions: an appearing-object list and a scene review portion. The appearing-object list shows the animated moving-object cutouts with background information excluded, so that the user can focus on the objects alone. The scene review portion shows the three-dimensional scene view with the object cutouts.
Brief description of the drawings
Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings, in which:
Fig. 1 shows an embodiment of the summary video, comprising an appearing-object list and a scene review portion;
Fig. 2 shows an exemplary computer system application user interface overlaying the summary video, with the associated sorting and ranking features; and
Fig. 3 shows the original view of a three-dimensional scene and a new view captured from a different virtual camera viewpoint.
Detailed description
In the following description, method and system embodiments of video summarization with a depth dimension are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions, may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; the disclosure, however, is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
The present invention provides a computer-implemented method for summarizing video that first identifies the moving objects in an input original video and then synthesizes a summary video using three-dimensional information of the scene. Object identification can be based on selection criteria such as object shape and structure, color, exact appearance, and pattern of spatial movement.
Referring to Fig. 1, each moving-object cutout 122 is animated by copying and stacking the consecutive input original-video frames that contain the moving object's image and discarding the frame pixels surrounding the moving object. Each moving-object cutout is thus a group of consecutive video frames in fixed time order. Each motion sequence of an animated moving-object cutout is retained together with its spatial x, y, and depth position data in the scene. The frame groups of the animated moving-object cutouts are then saved in a non-volatile data store. The background 121 of the scene is constructed using the structure of each scene in the input original video; any missing parts can be estimated automatically, or the background can be supplied by the user.
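The cutout-extraction step can be sketched in a few lines. The following is an illustrative example only, not the disclosed implementation: the function name, the use of NumPy arrays, and the availability of per-frame binary object masks are assumptions. It shows how the object's pixels are copied, the surrounding pixels discarded, and the resulting frames stacked in time order:

```python
import numpy as np

def extract_cutout_frames(frames, masks):
    """Keep only the pixels inside each frame's moving-object mask,
    discard (zero out) the surrounding pixels, and stack the results
    into one animated cutout clip in time order."""
    cutouts = []
    for frame, mask in zip(frames, masks):
        cutout = np.zeros_like(frame)
        cutout[mask] = frame[mask]          # copy only object pixels
        cutouts.append(cutout)
    return np.stack(cutouts)                # shape: (time, H, W)

# Toy example: two 4x4 grayscale frames with a moving 2x2 object.
frames = [np.full((4, 4), 9, dtype=np.uint8) for _ in range(2)]
masks = [np.zeros((4, 4), dtype=bool) for _ in range(2)]
masks[0][1:3, 1:3] = True                   # object position in frame 0
masks[1][2:4, 2:4] = True                   # object has moved in frame 1
clip = extract_cutout_frames(frames, masks)
```

In practice the masks would come from the moving-object identification step, and the cutout frames would be stored together with the object's per-frame x, y, and depth positions.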
Referring also to Fig. 1, the computer-implemented video summarization method of the present invention synthesizes the output summary video from a dynamic 3D scene that shows both the static background 121 and the temporally shuffled, animated moving-object cutouts 122. Depth information of the background and the foreground objects is used to create the dynamic 3D scene. The frame groups of the animated moving-object cutouts are then fetched from the data store and overlaid on the three-dimensional scene according to each moving object's x, y, and depth position in the scene, presenting the dynamic 3D scene. The generation of the dynamic 3D scene can be described generally by the following steps:
1. The depth information of the background scene is known.
2. The 3D position of each moving object in every frame is known.
3. The automatically estimated or user-supplied background image texture is mapped onto the depth of the 3D scene.
4. The user may select a 3D proxy for each object type. For example, a prescribed primitive or a human 3D model may represent a person, and a prescribed primitive or a vehicle 3D model may represent a vehicle.
5. Each object cutout is given its assigned 3D proxy.
6. For each moving object, the frames of its object cutout are treated as textures to be mapped onto the selected 3D proxy (the object of step 4), imposing the object's own appearance.
7. The texture-mapped 3D proxies are placed in the 3D scene.
8. As time passes, the position of each texture-mapped 3D proxy is updated at each instant according to the 3D position of its moving object.
9. At the same time, the appearance (i.e., texture) of each 3D proxy is updated with the next frame of that object's cutout.
10. Steps (8) and (9) are repeated until every moving object has been shown; the objects then disappear.
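Steps (1) to (10) amount to a per-tick update loop over the animated cutouts. The sketch below is an illustrative simplification under assumed data structures — the `Cutout` record, the string "textures", and the scene-state dictionaries are not from the disclosure, and the actual texture mapping onto 3D proxies is elided:

```python
from dataclasses import dataclass

@dataclass
class Cutout:
    frames: list      # per-tick appearance (texture) of the object
    positions: list   # per-tick (x, y, depth) position in the 3D scene

def render_dynamic_scene(background_depth, cutouts):
    """Sketch of steps (7)-(10): at every tick, place each still-active
    cutout's current frame at its current 3D position, advance time,
    and stop once every cutout has played out."""
    ticks = max(len(c.frames) for c in cutouts)
    scene_states = []
    for t in range(ticks):                   # steps (8)-(10): advance time
        placed = []
        for c in cutouts:
            if t < len(c.frames):            # object still on screen
                # steps (6)-(7): the frame acts as the proxy's texture
                placed.append((c.positions[t], c.frames[t]))
        scene_states.append({"background_depth": background_depth,
                             "objects": placed})
    return scene_states

person = Cutout(frames=["p0", "p1"], positions=[(0, 0, 5), (1, 0, 5)])
car = Cutout(frames=["c0"], positions=[(3, 2, 9)])
states = render_dynamic_scene(background_depth=10.0, cutouts=[person, car])
```

Each element of `states` corresponds to one instant of the dynamic 3D scene; a renderer would draw the background at its depth and each texture-mapped proxy at its recorded position.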
According to one embodiment of present invention, the time sequencing of processing the mobiles cutout of animation can change.For example; Can overlap onto on the three-dimensional scenic in their each positions in scene through the frame group of processing the mobiles cutout of animation with two, be made into and appear at together at one time in the dynamic 3D scene and will appear at two objects in the scene at two different times.Therefore, can be through side by side a plurality of mobileses being illustrated in the length that shortens the summary video in the dynamic 3D scene in large quantities together, wherein mobiles can appear in the original video of input in different periods individually.The user can dispose how many mobiles cutouts can occur simultaneously and which mobiles cutout will appear in the dynamic 3D scene.
According to another embodiment, one frame can be selected from the frame group of an animated moving-object cutout and, after the full frame group has been played back in the dynamic 3D scene, overlaid on the three-dimensional scene at its own position in the scene. The frame can be selected from the cutout's frame group based on user-input selection criteria, or by a particular time order or position within the group. It then serves as a position marker for the object in the scene while the frame groups of other animated cutouts are still playing.
According to yet another embodiment, the dynamic 3D scene can be viewed from different angles with a virtual camera, so that snapshot images and videos of novel viewing angles of the three-dimensional scene can be generated. For example, Fig. 3 shows, on the left, the original view 301 of a three-dimensional scene and, on the right, a new view 302 captured from a viewpoint tilted slightly to the right, or counterclockwise, from the original viewpoint.
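A novel view amounts to re-projecting the known 3D scene positions through a rotated virtual camera. The pinhole-projection sketch below is an illustrative simplification: the single yaw angle, unit focal length, and function name are assumptions, not the disclosed rendering:

```python
import math

def project(point, cam_yaw_deg, f=1.0):
    """Project a 3D scene point (x, y, z) through a pinhole camera
    rotated by cam_yaw_deg about the vertical axis, yielding 2D
    image coordinates for the new virtual viewpoint."""
    x, y, z = point
    a = math.radians(cam_yaw_deg)
    # Rotate the scene about the y-axis (inverse of the camera yaw).
    xr = x * math.cos(a) - z * math.sin(a)
    zr = x * math.sin(a) + z * math.cos(a)
    return (f * xr / zr, f * y / zr)        # perspective divide

u0, v0 = project((0.0, 0.0, 5.0), cam_yaw_deg=0)    # original view 301
u1, v1 = project((0.0, 0.0, 5.0), cam_yaw_deg=10)   # tilted view 302
```

A point straight ahead of the original camera projects to the image center; under a 10-degree yaw it shifts sideways, which is exactly the parallax cue the depth dimension makes available.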
According to another embodiment, as shown in Fig. 1, the output summary video consists of the following two display regions: the appearing-object list 101 and the scene review portion 102. The appearing-object list 101 shows a snapshot or animation 111 of each moving object whose corresponding animated cutout 122 currently appears in the dynamic 3D scene. If a moving object in the appearing-object list 101 is shown as a snapshot, one frame from the frame group of its animated cutout is used as the snapshot. If it is shown as an animation, the frames of the cutout's frame group are played back without regard to the object's spatial movement in the scene. The scene review portion shows the virtual camera view of the dynamic 3D scene with the animated moving-object cutouts. The appearing-object list 101 can be placed vertically or horizontally at any position in the display, and may also overlap the scene review portion 102; in that case the appearing-object list 101 is rendered translucently to avoid occluding the scene review portion 102.
According to an embodiment, the top-to-bottom order in which the moving objects 111 appear in the appearing-object list 101 is the same as the time order in which the corresponding animated cutouts 122 appear in the dynamic 3D scene of the scene review portion 102, with the object at the top of the list corresponding to the most recently appearing animated cutout.
Referring to Fig. 2, according to various embodiments of the present invention, the summary video using the dynamic 3D scene can be incorporated into a computer system application with a user interface. One embodiment of the user interface includes a user-input criteria selection window 201 with object relevance criteria such as shape, color, object type, spatial movement or motion direction of a moving object, and, where the moving object is a vehicle, the license plate number. Each animated moving-object cutout is assigned a relevance number according to how closely it matches the selected relevance criteria, and the moving objects in the appearing-object list 202 are labeled with their respective relevance numbers. The snapshots or animations of the animated cutouts in the appearing-object list are sorted by their respective relevance rankings. In one embodiment, the relevance of the animated cutouts can also be used to determine the time order in which they appear in the dynamic 3D scene.
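Relevance numbering can be illustrated as simple criteria matching. The attribute names and the count-of-matches score below are assumptions for illustration, not the disclosed scoring method:

```python
def relevance(cutout_attrs, criteria):
    """Score a cutout by how many user-selected criteria it matches
    (e.g. shape, color, object type, motion direction, plate number).
    Attribute names here are illustrative only."""
    return sum(1 for k, v in criteria.items() if cutout_attrs.get(k) == v)

cutouts = [
    {"id": 1, "type": "vehicle", "color": "red", "plate": "AB123"},
    {"id": 2, "type": "person",  "color": "red"},
    {"id": 3, "type": "vehicle", "color": "blue"},
]
criteria = {"type": "vehicle", "color": "red"}  # from selection window 201
ranked = sorted(cutouts, key=lambda c: relevance(c, criteria), reverse=True)
```

The relevance numbers label the entries of the appearing-object list, and the same ranking could set the order in which the cutouts enter the dynamic 3D scene.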
Referring also to Fig. 2, in one embodiment the computer system application user interface includes a virtual camera controller 203 for adjusting the viewing angle of the dynamic 3D scene.
The embodiments disclosed herein may be implemented using general-purpose or special-purpose computing devices, computer processors, or electronic circuitry including, but not limited to, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), and other programmable logic devices configured or programmed according to the teachings of the present disclosure. Computer instructions or software code running in a general-purpose or special-purpose computing device, computer processor, or programmable logic device can readily be prepared by practitioners in the software or electronics arts based on the teachings of the present disclosure.
In some embodiments, the present invention includes computer storage media having computer instructions or software code stored therein that can be used to program a computer or microprocessor to perform any of the processes of the present invention. The media can include, but are not limited to, floppy disks, optical discs, Blu-ray discs, DVDs, CD-ROMs, magneto-optical disks, ROM, RAM, flash memory devices, or any type of media or devices suitable for storing instructions, code, and/or data.
The foregoing description of the present invention has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to practitioners skilled in the art.
The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention in its various embodiments and with the various modifications suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims (10)

1. A computer-implemented method for creating a summary video with depth information, comprising:
receiving, by a computer processor, an input original video;
identifying, by the computer processor, moving objects in the input original video, wherein said moving-object identification is based on selection criteria comprising object shape and structure, color, exact appearance, and pattern of spatial movement;
generating, by the computer processor, an animated moving-object cutout for each identified moving object by copying and stacking the consecutive frames of the input original video that contain the image of said each moving object and discarding the frame pixels surrounding said each moving object;
constructing, by the computer processor, a scene background using the scene in the input original video and estimating any missing parts;
creating, by the computer processor, a three-dimensional scene using depth information of the foreground objects and the scene background in the input original video, and overlaying said animated moving-object cutouts on the three-dimensional scene according to their respective x, y, and depth positions in the three-dimensional scene, presenting a dynamic 3D scene; and
synthesizing, by the computer processor, the summary video using the dynamic 3D scene.
2. The method of claim 1, wherein the dynamic 3D scene comprises a virtual camera for viewing the three-dimensional scene from various angles.
3. The method of claim 1, wherein said scene background is constructed automatically by the computer processor or supplied by a user.
4. The method of claim 1, wherein the time order in which said animated moving-object cutouts appear in the dynamic 3D scene is configurable.
5. The method of claim 4, wherein two or more moving objects that appeared at different times are shown together simultaneously in the dynamic 3D scene by overlaying each said animated moving-object cutout on the three-dimensional scene at the cutout's own position in the three-dimensional scene.
6. method according to claim 1 also comprises:
Use dynamic 3D scene to synthesize the summary video through computer processor, said summary video comprises two viewing areas: the object tabulation and the scene of appearance are looked back part;
The object tabulation of wherein said appearance shows current said snapshot or the animation of processing the mobiles cutout of animation that appears in the dynamic 3D scene; And
Wherein said scene is looked back the virtual video camera view that part shows the dynamic 3D scene with said mobiles cutout of processing animation.
7. The method of claim 6, wherein the order in which the snapshots or animations of the animated moving-object cutouts appear in the appearing-object list is the same as the time order in which the corresponding animated cutouts appear in the dynamic 3D scene of the scene review portion.
8. The method of claim 6, wherein each snapshot or animation of an animated moving-object cutout in the appearing-object list is labeled with a relevance number, the relevance number indicating the degree to which said animated cutout matches a set of user-selectable relevance criteria comprising shape, color, object type, spatial movement or motion direction of a moving object, and, where the moving object is a vehicle, the license plate number.
9. The method of claim 8, wherein the animated moving-object cutouts are ranked according to their respective relevances; and wherein said snapshots or animations of the animated cutouts in the appearing-object list are sorted by the respective relevance rankings of the animated cutouts.
10. The method of claim 8, wherein the animated moving-object cutouts are ranked according to their respective relevances; and wherein the time order in which the animated cutouts appear in the dynamic 3D scene is determined by the relevance rankings of the animated cutouts.
CN 201110437761 2011-12-23 2011-12-23 Video summary with depth information Active CN102495907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110437761 CN102495907B (en) 2011-12-23 2011-12-23 Video summary with depth information


Publications (2)

Publication Number Publication Date
CN102495907A (en) 2012-06-13
CN102495907B CN102495907B (en) 2013-07-03

Family

ID=46187732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110437761 Active CN102495907B (en) 2011-12-23 2011-12-23 Video summary with depth information

Country Status (1)

Country Link
CN (1) CN102495907B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1822645A (en) * 2005-02-15 2006-08-23 乐金电子(中国)研究开发中心有限公司 Mobile communication terminal capable of briefly offering activity video and its abstract offering method
CN101064825A (en) * 2006-04-24 2007-10-31 中国科学院自动化研究所 Mobile equipment based sport video personalized customization method and apparatus thereof
CN101262568A (en) * 2008-04-21 2008-09-10 中国科学院计算技术研究所 A method and system for generating video outline
CN101366027A (en) * 2005-11-15 2009-02-11 耶路撒冷希伯来大学伊森姆研究发展公司 Method and system for producing a video synopsis
CN101640809A (en) * 2009-08-17 2010-02-03 浙江大学 Depth extraction method of merging motion information and geometric information
US20110285813A1 (en) * 2009-01-27 2011-11-24 Telefonaktiebolaget Lm Ericsson (Publ) Depth and Video Co-Processing
CN102289490A (en) * 2011-08-11 2011-12-21 杭州华三通信技术有限公司 Video summary generating method and equipment


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107493441A (en) * 2016-06-12 2017-12-19 杭州海康威视数字技术股份有限公司 A kind of summarized radio generation method and device
CN107493441B (en) * 2016-06-12 2020-03-06 杭州海康威视数字技术股份有限公司 Abstract video generation method and device
CN107547804A (en) * 2017-09-21 2018-01-05 北京奇虎科技有限公司 Realize the video data handling procedure and device, computing device of scene rendering

Also Published As

Publication number Publication date
CN102495907B (en) 2013-07-03

Similar Documents

Publication Publication Date Title
US8719687B2 (en) Method for summarizing video and displaying the summary in three-dimensional scenes
CN106355153B (en) A kind of virtual objects display methods, device and system based on augmented reality
Fan et al. Heterogeneous information fusion and visualization for a large-scale intelligent video surveillance system
CN110363133B (en) Method, device, equipment and storage medium for sight line detection and video processing
CN104349155B (en) Method and equipment for displaying simulated three-dimensional image
CN108513123B (en) Image array generation method for integrated imaging light field display
US20140181630A1 (en) Method and apparatus for adding annotations to an image
CN106648098B (en) AR projection method and system for user-defined scene
US20100011297A1 (en) Method and system for generating index pictures for video streams
CN111862866B (en) Image display method, device, equipment and computer readable storage medium
CN107306332A (en) The image compensation of inaccessible directly view augmented reality system
CN105191287A (en) Method of replacing objects in a video stream and computer program
CN101657839A (en) System and method for region classification of 2D images for 2D-to-3D conversion
KR20140082610A (en) Method and apaaratus for augmented exhibition contents in portable terminal
CN102474636A (en) Adjusting perspective and disparity in stereoscopic image pairs
CN104486584A (en) City video map method based on augmented reality
US10809532B2 (en) Display method and display system
WO2014094874A1 (en) Method and apparatus for adding annotations to a plenoptic light field
JP2023172882A (en) Three-dimensional representation method and representation apparatus
CN105611267A (en) Depth and chroma information based coalescence of real world and virtual world images
WO2007048197A1 (en) Systems for providing a 3d image
CN111598996A (en) Article 3D model display method and system based on AR technology
CN113382224B (en) Interactive handle display method and device based on holographic sand table
CN102495907B (en) Video summary with depth information
KR20150022158A (en) Apparatus and method for learning mechanical drawing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant