AU2008264196A1 - Interactive video surveillance review and reporting system - Google Patents


Info

Publication number
AU2008264196A1
Authority
AU
Australia
Prior art keywords
video data
data sequence
video
category
collage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2008264196A
Inventor
Jonathan Anthony Duhigs
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2008264196A
Publication of AU2008264196A1
Legal status: Abandoned

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/76 Television signal recording
    • H04N 5/765 Interface circuits between an apparatus for recording and another apparatus
    • H04N 5/77 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
    • H04N 5/772 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera, the recording apparatus and the television camera being placed in the same enclosure
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B 27/034 Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/76 Television signal recording
    • H04N 5/765 Interface circuits between an apparatus for recording and another apparatus
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/76 Television signal recording
    • H04N 5/765 Interface circuits between an apparatus for recording and another apparatus
    • H04N 5/77 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 9/00 Details of colour television systems
    • H04N 9/79 Processing of colour television signals in connection with recording
    • H04N 9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N 9/82 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback, the individual colour picture signal components being recorded simultaneously only
    • H04N 9/8205 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback, the individual colour picture signal components being recorded simultaneously only, involving the multiplexing of an additional signal and the colour video signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Description

S&F Ref: 886354

AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo 146, Japan
Actual Inventor(s): Jonathan Anthony Duhigs
Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Interactive video surveillance review and reporting system

The following statement is a full description of this invention, including the best method of performing it known to me/us:

INTERACTIVE VIDEO SURVEILLANCE REVIEW AND REPORTING SYSTEM

FIELD OF INVENTION

The current invention relates to video surveillance systems and, in particular, to a system and method for constructing an interactive video collage. The current invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for constructing an interactive video collage.

DESCRIPTION OF BACKGROUND ART

Video surveillance systems are systems involving cameras and recording devices which survey an area and record activity. The recordings are then available for later review. Uses include monitoring of staff and customers for illegal behaviour such as stealing or vandalism, and locating evidence of events. For example, video data sequences may be useful for police investigations.

Early simple video surveillance systems involved a single camera recording continuously to a video tape format such as VHS. The camera in such a system has its own power supply and transmits a video signal via normal coaxial cable to a standard video recorder. The recording is controlled by pressing a record button on the recorder. More modern video surveillance systems utilise network cameras which use computer networks to transmit compressed digital video signals, through wires or wirelessly, to a central server.
The central server may manage concurrent signals from many cameras. Such systems are common in public spaces, transport interchanges, work places and retail shopping areas.

The number of cameras is now so large that it is usually not possible for any organisation to employ enough people to continuously view the cameras and react to events as they happen. If live viewing is used, for example in an airport or in a traffic monitoring office, usually a smaller team is employed which periodically reviews a set of cameras. The smaller team is able to focus on certain cameras when events occur. Because the amount of video data is larger than can be perpetually reviewed, implementations of video surveillance require the ability to review video data to find video data sequences relevant to previously known events, such as accidents, and to review the video data sequences to look for previously unknown events, such as illegal activities.

The number of cameras and volume of video data has given rise to many technologies that can assist in automatically handling much of the video data that the cameras transmit. Such technologies reduce the effort required to view and review the video data. Automatic processing of video data to locate video data where activity occurs is known. Sections of video data sequences can be marked or indexed so that a reviewer can view only the video data with activity.

Further, methods of detecting foreground objects within video data, which reduce the amount of video data that needs to be reviewed, are also known. These methods work on the premise that some activity can be disregarded as noise and camera motion. In accordance with one such method, a background is detected using a background modelling method (e.g., a 'mixed Gaussian' approach where a set of tolerances is defined such that each pixel can vary a certain amount and still be defined as part of the background).
Where sets of neighbouring pixels all exceed their background tolerance thresholds, a foreground object is detected, and the object can be formed by collecting the variation of pixels as the object moves across a scene. Methods are also known for classifying objects within captured video data into object types (e.g., vehicles, people and parcels) and movement types (e.g., direction, passing a tripwire, left behind, walking and running). A reviewer can therefore filter the video data into, for example, vehicles passing from left to right and passing a certain point (e.g., a barrier).

Most conventional object detection methods misinterpret some object events. Further, conventional object detection methods also require significant training to accurately detect objects in each given camera installation. Still further, a totally new object or event may not fit previously learnt or programmed categorisations.

A problem faced by organisations employing reviewers to review video data is that watching a video is a passive activity and there is no direct method to monitor the work of a reviewer except for directly watching the reviewer. Further, there is no method of understanding, at an organisational level, how much of the video data captured by a video surveillance camera is being reviewed and what outcomes have arisen from the level of review that has taken place.

Video sprites are known. Video sprites are graphical images of detected objects with reasonably exact outlines, such that the video sprites can be overlaid onto a background without occluding significantly more background than is absolutely necessary. Methods of creating interactive montages of video sprites, whereby detected objects are made into video sprites and a static scene is constructed using all of the sprites from a full length of video data, are known. When a sprite in the static scene is selected, a relevant section of video data is played.
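The per-pixel tolerance idea described above can be sketched as follows. This is a deliberately simplified illustration (a single global tolerance and a running-average background rather than a full mixed-Gaussian model) and is not the method claimed in the specification; all function names are invented for illustration.

```python
# Illustrative per-pixel background subtraction: a pixel whose value
# deviates from the background model by more than its tolerance is
# classified as foreground. Frames are nested lists of grey levels.

def detect_foreground(frame, background, tolerance):
    """Return a binary mask: 1 where the pixel exceeds its tolerance."""
    mask = []
    for row_f, row_b in zip(frame, background):
        mask.append([1 if abs(f - b) > tolerance else 0
                     for f, b in zip(row_f, row_b)])
    return mask

def update_background(background, frame, alpha=0.05):
    """Slowly adapt the background model towards the current frame."""
    return [[(1 - alpha) * b + alpha * f
             for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(frame, background)]

background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 80, 10], [10, 10, 82]]
print(detect_foreground(frame, background, tolerance=20))
# [[0, 1, 0], [0, 0, 1]]
```

Sets of neighbouring 1-pixels in the mask would then be grouped into candidate foreground objects, as the passage describes.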
Systems using such montage creation methods allow a reviewer to see all of the objects and events in a scene at a glance, and to select which objects to view. However, such systems are limited to being a visual index and offer no review and reporting functionality. Further, there may be too many sprites to create a montage in which no sprite is partially or fully occluded by other sprites. A single-scene montage system fails at a certain complexity level.

Thus a need clearly exists for a video surveillance system which allows efficient and flexible review of footage by human operators.

SUMMARY OF THE INVENTION

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.

According to one aspect of the present invention there is provided a method of constructing a collage summarizing a video data sequence, said method comprising the steps of:

detecting at least one foreground object within the video data sequence;

creating a sprite for each of said foreground objects;

displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;

determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and

removing the selected video sprite from the displayed collage.
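The review workflow of this first aspect can be sketched as follows: each sprite in the collage carries a reference to a portion of the video data sequence, and selecting a sprite together with a category records the association and removes the sprite from the collage. The classes and names here are hypothetical illustrations, not the claimed implementation.

```python
# Illustrative sketch of the collage review workflow: categorising a
# sprite records (video portion, category) and removes the sprite.

class Sprite:
    def __init__(self, sprite_id, start_frame, end_frame):
        self.sprite_id = sprite_id
        self.portion = (start_frame, end_frame)  # referenced video portion

class Collage:
    def __init__(self, sprites):
        self.sprites = {s.sprite_id: s for s in sprites}
        self.associations = []  # (portion, category) pairs

    def categorise(self, sprite_id, category):
        """Associate the sprite's video portion with a category, then
        remove the sprite from the displayed collage."""
        sprite = self.sprites.pop(sprite_id)
        self.associations.append((sprite.portion, category))
        return sprite

collage = Collage([Sprite("s1", 120, 310), Sprite("s2", 400, 560)])
collage.categorise("s1", "suspicious")
print(collage.associations)   # [((120, 310), 'suspicious')]
print(list(collage.sprites))  # ['s2']
```

As sprites are categorised and removed, the remaining collage shrinks, giving the operator a visible record of what is still unreviewed.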
According to another aspect of the present invention there is provided a method of adapting video playlists, said method comprising the steps of:

detecting at least one foreground object within at least one video data sequence;

creating a sprite for each of said foreground objects;

displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;

determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and

adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.

According to still another aspect of the present invention there is provided an apparatus for constructing a collage summarizing a video data sequence, said apparatus comprising:

means for detecting at least one foreground object within the video data sequence;

means for creating a sprite for each of said foreground objects;

means for displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;

means for determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and

means for removing the selected video sprite from the displayed collage.
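The playlist-adaptation aspect above can be sketched as a simple mapping from categories to lists of video-portion references. The field names and the `PlaylistManager` class are invented for illustration; the specification does not prescribe a data layout.

```python
# Illustrative sketch of category playlists: associating a video portion
# with a category appends a reference to that category's playlist.

from collections import defaultdict

class PlaylistManager:
    def __init__(self):
        self.playlists = defaultdict(list)  # category -> list of references

    def add_reference(self, category, camera_id, start_frame, end_frame):
        """Append a reference to the video portion to the category playlist."""
        ref = {"camera": camera_id, "start": start_frame, "end": end_frame}
        self.playlists[category].append(ref)
        return ref

manager = PlaylistManager()
manager.add_reference("theft", camera_id="cam-990",
                      start_frame=1200, end_frame=1530)
print(manager.playlists["theft"])
```

Storing references rather than copies of the video data means a category playlist can later be played back directly from the recorded sequences.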
According to still another aspect of the present invention there is provided an apparatus for adapting video playlists, said apparatus comprising:

means for detecting at least one foreground object within at least one video data sequence;

means for creating a sprite for each of said foreground objects;

means for displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;

means for determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and

means for adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.

According to still another aspect of the present invention there is provided a system for constructing a collage summarizing a video data sequence, said system comprising:

memory for storing data and a computer program;

a processor coupled to said memory for executing the computer program, said computer program comprising instructions for:

detecting at least one foreground object within the video data sequence;

creating a sprite for each of said foreground objects;

displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;

determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and

removing the selected video sprite from the displayed collage.
According to still another aspect of the present invention there is provided a system for adapting video playlists, said system comprising:

memory for storing data and a computer program;

a processor coupled to said memory for executing the computer program, said computer program comprising instructions for:

detecting at least one foreground object within at least one video data sequence;

creating a sprite for each of said foreground objects;

displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;

determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and

adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.

According to still another aspect of the present invention there is provided a computer readable medium having recorded thereon a computer program for constructing a collage summarizing a video data sequence, said computer program comprising:

code for detecting at least one foreground object within the video data sequence;

code for creating a sprite for each of said foreground objects;

code for displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;

code for determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and

code for removing the selected video sprite from the displayed collage.
According to still another aspect of the present invention there is provided a computer readable medium having recorded thereon a computer program for adapting video playlists, said computer program comprising:

code for detecting at least one foreground object within at least one video data sequence;

code for creating a sprite for each of said foreground objects;

code for displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;

code for determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and

code for adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.

Other aspects of the invention are also disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS

One or more embodiments of the invention will now be described with reference to the following drawings, in which:

Fig. 1 is a flow diagram showing a method of constructing a collage summarizing a video data sequence;

Figs. 2A-2C show an example of video frames showing a detectable object;

Figs. 3A-3C show an example of video frames showing a detectable object;

Fig. 4 is a collage of two detected objects overlaid over a detected background;

Fig. 5 shows a user interface of a video surveillance system;

Figs. 6A and 6B show interactions with the user interface of Fig. 5;

Fig. 7 shows another user interface of a video surveillance system;

Fig. 8 shows an activity report; and

Figs. 9A and 9B form a schematic block diagram of a video surveillance system upon which the arrangements described can be practiced.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have, for the purposes of this description, the same function(s) or operation(s), unless the contrary intention appears.

A method 100 of constructing a collage summarizing a video data sequence will be described in detail below with reference to Fig. 1.

Figs. 9A and 9B collectively form a schematic block diagram of a video surveillance system 900, upon which the various arrangements described can be practiced.

As seen in Fig. 9A, the system 900 comprises a central server in the form of a computer module 901. Input devices such as a keyboard 902, a mouse pointer device 903, a scanner 926 and a microphone 980, and output devices including a printer 915, a display device 914 and loudspeakers 917, are connected to the computer module 901. An external Modulator-Demodulator (Modem) transceiver device 916 may be used by the computer module 901 for communicating to and from a communications network 920 via a connection 921. The network 920 may be a wide-area network (WAN), such as the Internet or a private WAN. Where the connection 921 is a telephone line, the modem 916 may be a traditional "dial-up" modem. Alternatively, where the connection 921 is a high capacity (e.g., cable) connection, the modem 916 may be a broadband modem. A wireless modem may also be used for wireless connection to the network 920.

The computer module 901 typically includes at least one processor unit 905, and a memory unit 906, for example formed from semiconductor random access memory (RAM) and semiconductor read only memory (ROM).
The module 901 also includes a number of input/output (I/O) interfaces, including an audio-video interface 907 that couples to the video display 914, loudspeakers 917 and microphone 980, an I/O interface 913 for the keyboard 902, mouse 903 and scanner 926, and an interface 908 for the external modem 916 and printer 915. In some implementations, the modem 916 may be incorporated within the computer module 901, for example within the interface 908. The computer module 901 also has a local network interface 911 which, via a connection 923, permits coupling of the computer system 900 to a local computer network 922, known as a Local Area Network (LAN). As also illustrated, the local network 922 may also couple to the wide network 920 via a connection 924, which would typically include a so-called "firewall" device or a device of similar functionality. The interface 911 may be formed by an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement.

A video surveillance camera 990, mounted in a position so as to monitor a particular secure area, is connected to the network 922. Typically, a plurality of video surveillance cameras similar to the camera 990 will be connected to the network 922.

The interfaces 908 and 913 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 909 are provided and typically include a hard disk drive (HDD) 910. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 912 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD), USB-RAM and floppy disks, may then be used as appropriate sources of data to the system 900.
The components 905 to 913 of the computer module 901 typically communicate via an interconnected bus 904, and in a manner which results in a conventional mode of operation of the computer system 900 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems evolved therefrom.

The method 100 of constructing a collage may be implemented using the computer system 900, wherein the processes of Figs. 1 to 8, to be described, may be implemented as one or more software application programs 933 executable within the video surveillance system 900. In particular, the steps of the method 100 are effected by instructions 931 in the software that are carried out within the video surveillance system 900. The software instructions 931 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described method 100, and a second part and the corresponding code modules manage a user interface between the first part and the user.

The software 933 is typically stored in the HDD 910 or the memory 906. The software is loaded into the computer system 900 from a computer readable medium, and is then executed by the computer system 900. Thus, for example, the software may be stored on an optically readable CD-ROM medium 925 that is read by the optical disk drive 912. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the video surveillance system 900 preferably effects an advantageous apparatus for implementing the method 100.
In some instances, the application programs 933 may be supplied to the user encoded on one or more CD-ROMs 925 and read via the corresponding drive 912, or alternatively may be read by the user from the networks 920 or 922. Still further, the software can also be loaded into the video surveillance system 900 from other computer readable media. Computer readable storage media refers to any storage medium that participates in providing instructions and/or data to the video surveillance system 900 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external to the computer module 901. Examples of computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 901 include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or intranets, including e-mail transmissions and information recorded on websites and the like.

The second part of the application programs 933 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 914. Through manipulation of typically the keyboard 902 and the mouse 903, a user of the video surveillance system 900 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s).
Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 917 and user voice commands input via the microphone 980.

Fig. 9B is a detailed schematic block diagram of the processor 905 and a "memory" 934. The memory 934 represents a logical aggregation of all the memory modules (including the HDD 909 and semiconductor memory 906) that can be accessed by the computer module 901 in Fig. 9A.

When the computer module 901 is initially powered up, a power-on self-test (POST) program 950 executes. The POST program 950 is typically stored in a ROM 949 of the semiconductor memory 906. A hardware device such as the ROM 949 is sometimes referred to as firmware. The POST program 950 examines hardware within the computer module 901 to ensure proper functioning, and typically checks the processor 905, the memory (909, 906), and a basic input-output systems software (BIOS) module 951, also typically stored in the ROM 949, for correct operation. Once the POST program 950 has run successfully, the BIOS 951 activates the hard disk drive 910. Activation of the hard disk drive 910 causes a bootstrap loader program 952 that is resident on the hard disk drive 910 to execute via the processor 905. This loads an operating system 953 into the RAM memory 906, upon which the operating system 953 commences operation. The operating system 953 is a system level application, executable by the processor 905, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.

The operating system 953 manages the memory (909, 906) in order to ensure that each process or application program 933 running on the computer module 901 has sufficient memory in which to execute without colliding with memory allocated to another process.
Furthermore, the different types of memory available in the system 900 must be used properly so that each process can run effectively. Accordingly, the aggregated memory 934 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 900 and how such memory is used.

The processor 905 includes a number of functional modules, including a control unit 939, an arithmetic logic unit (ALU) 940, and a local or internal memory 948, sometimes called a cache memory. The cache memory 948 typically includes a number of storage registers 944-946 in a register section. One or more internal busses 941 functionally interconnect these functional modules. The processor 905 typically also has one or more interfaces 942 for communicating with external devices via the system bus 904, using a connection 918.

The application program 933 includes a sequence of instructions 931 that may include conditional branch and loop instructions. The program 933 may also include data 932 which is used in execution of the program 933. The instructions 931 and the data 932 are stored in memory locations 928-930 and 935-937, respectively. Depending upon the relative size of the instructions 931 and the memory locations 928-930, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in the memory location 930. Alternatively, an instruction may be segmented into a number of parts, each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 928-929.

In general, the processor 905 is given a set of instructions which are executed therein. The processor 905 then waits for a subsequent input, to which it reacts by executing another set of instructions.
Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 902, 903, data received from an external source across one of the networks 920, 922, data retrieved from one of the storage devices 906, 909, or data retrieved from a storage medium 925 inserted into the corresponding reader 912. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 934.

The disclosed arrangements use input variables 954 that are stored in the memory 934 in corresponding memory locations 955-958. The arrangements produce output variables 961 that are stored in the memory 934 in corresponding memory locations 962-965. Intermediate variables may be stored in memory locations 959, 960, 966 and 967.

The register section 944-946, the arithmetic logic unit (ALU) 940, and the control unit 939 of the processor 905 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 933. Each fetch, decode, and execute cycle comprises:

(a) a fetch operation, which fetches or reads an instruction 931 from a memory location 928;

(b) a decode operation in which the control unit 939 determines which instruction has been fetched; and

(c) an execute operation in which the control unit 939 and/or the ALU 940 execute the instruction.

Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 939 stores or writes a value to a memory location 932.

Each step or sub-process in the processes of Figs.
1 to 8 is associated with one or more segments of the program 933, and is performed by the register section 944-946, the ALU 940, and the control unit 939 in the processor 905 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 933.

The method 100 of constructing a collage may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the method 100. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.

The video surveillance camera 990 is configured to capture one or more video data sequences and transmit the video data, in the form of one or more video data streams, to the computer module 901 via the network 922. The camera 990 may also be configured to record the video data (i.e., store the video data streams on the hard disk drive 910), detect events and detect objects. The camera 990 may optionally transmit event metadata or object metadata to the computer module 901. The camera 990 may be configured to follow a predetermined surveillance pattern over time. For example, the camera 990 may be configured to follow a slow, reciprocating pan. Optionally, the camera 990 can receive instructions to adjust settings such as the pan, tilt and zoom of the camera 990.

The processor 905 of the computer module 901 can receive video data from the camera 990 in compressed or uncompressed format, along with metadata relevant to the video data. For example, the metadata may comprise the camera source and metadata generated by the camera 990 by video data analysis. The processor 905 may be configured to perform further analysis of the video data received from the camera 990, and further metadata can be produced. Video data and metadata can be stored on the hard disk drive 910.
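The event/object metadata a camera such as 990 might transmit alongside its video stream can be sketched as a simple record. The field names below are invented for illustration; the specification does not define a metadata format.

```python
# Hypothetical sketch of an object-metadata record accompanying a
# camera's video stream; all field names are illustrative only.

import json

def make_object_metadata(camera_id, frame, object_id, box, label=None):
    """Build a metadata record for one detected object in one frame."""
    return {
        "camera": camera_id,
        "frame": frame,          # frame index within the video data stream
        "object": object_id,
        "box": list(box),        # (x, y, w, h) bounding box in pixels
        "label": label,          # e.g. "person", or None if unclassified
    }

record = make_object_metadata("cam-990", 1520, "obj-7",
                              (40, 10, 20, 40), "person")
print(json.dumps(record))
```

Records like this, whether produced on the camera or by further analysis on the processor 905, could be stored alongside the video data for later indexing.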
The processor 905 may also be configured to send instructions to the camera 990 to adjust settings such as the pan, tilt and zoom of the camera 990.

Operators may interact with the computer module 901 using the keyboard 902 and mouse 903 in a conventional manner. Alternatively, the operator may interact with the computer module 901 using a computer client 995, such as a desktop general purpose computer or laptop computer, connected to the network 922. Alternatively, the client 995 may be connected to the computer module 901 via the network 920. In another alternative, the client 995 may be connected directly to the computer module 901.

The computer client 995 has a similar configuration to the computer module 901 and usually comprises a keyboard (not shown), a mouse (not shown) and an output device in the form of a display screen (not shown). However, other visual, auditory and tactile displays are also known and used. The operator, user or manager therefore interacts with the one or more video data sequences captured by the camera 990 and stored on the computer module 901, using the computer client 995 connected to the computer module 901 (or server). The one or more cameras (e.g., 990) of the system 900 may each have greater or lesser processing abilities.

The method 100 of constructing a collage summarizing a video data sequence will now be described with reference to Fig. 1. As described above, the method 100 may be implemented in the form of the software application program 933 resident on the hard disk drive 910 and being controlled in its execution by the processor 905.

The method 100 determines an interactive collage of video sprites, which can be interacted with to give enhanced review and management functions to an operator. The method 100 constructs a functional single-scene collage which allows the operator to add metadata to the recorded video data and prepare event-based video data sequences.
The method 100 allows a manager, for example, to review the completeness and effectiveness of the work performed by the operator or by a plurality of operators. The system 900 also provides a simple and intuitive user interface 500 (see Fig. 5), which will be described in detail below.

The method 100 begins at step 110, where the processor 905 receives a video data stream representing one or more video data sequences captured by the camera 990. The video data stream received at step 110 is adapted in steps 120 to 150 to construct an interactive video collage.

At step 120, the processor 905 performs the step of detecting at least one foreground object within the video data sequences using any suitable object detection method. Also at step 120, the processor 905 performs the step of creating video sprites for each of the detected foreground objects. Details of the detected objects are stored in the hard disk drive 910. Also at step 120, a reference to at least a portion of a corresponding video data sequence may be associated with each created sprite.

The method 100 continues at the next step 130, where the processor 905 generates a background image from at least one video frame of the video data sequences. In order to generate the background image, the processor 905 performs the step of detecting a background in the video data sequences for use in an interactive collage, as will be described below. Also at step 130, the processor 905 constructs an interactive collage (e.g., 400 as seen in Fig. 4) of the background image plus the sprites created at step 120, using a video frame of the video data sequences. Each of the sprites has a defined position within the collage. The interactive collage is stored by the processor 905 in the hard disk drive 910. The sprites are arranged in the collage to minimise overlapping. In one embodiment, the most recent items are placed in the collage last. No scaling is necessary and full occlusion is permitted.
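The compositing rule described above (most recent items placed last, with full occlusion permitted) can be sketched as follows. This is only an illustrative sketch; the field names and values are assumptions, not part of the specification:

```python
from dataclasses import dataclass

@dataclass
class Sprite:
    """A foreground object cut out of one video frame (fields are illustrative)."""
    label: str
    timestamp: float  # capture time of the source frame, in seconds
    x: int            # defined position within the collage
    y: int

def draw_order(sprites):
    """Composite order for the collage: oldest first, so the most recently
    captured sprites are placed last and may fully occlude older ones."""
    return sorted(sprites, key=lambda s: s.timestamp)

# Hypothetical sprites from the worked example below: a car seen early in
# the sequence and a person seen some time later.
car = Sprite("car", timestamp=10.0, x=40, y=120)
person = Sprite("person", timestamp=95.0, x=200, y=100)
order = draw_order([person, car])
```

Because no scaling is performed, placement reduces to choosing a draw order over the sprites' original frame positions.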
Methods such as scaling and splitting the video data into a series of interactive montages may be used at step 130.

In the following steps 140 and 150, the processor 905 performs the steps for adapting video playlists. In particular, at the step 140, the processor 905 creates a master playlist. The video data stream received at step 110 is segmented such that each foreground object detected at step 120 is associated with at least a portion of a particular video data sequence which contains the video data of the detected object. Parts of the received video data stream containing overlapping video data sequences comprising more than one object may therefore be associated with more than one video data sequence. Further, multiple video data sequences may contain references to the same video data. Initially, all the video data is associated with the master playlist such that the master playlist contains references to all of the video data sequences represented by the received video data stream.

Then at step 150, the processor 905 creates a category playlist which is initially empty. Categories for additional category playlists may be predefined, and the operator may determine the specification of the category playlists.

Once the interactive video collage is constructed and the playlists are created, the user interface 500 can be displayed on the display device 914 by the processor 905 at step 160. The user interface 500 is used by the processor 905 for displaying the interactive video collage constructed from the video frames of the video data sequence and from a plurality of the video sprites created at step 120. The video collage and video sprites are displayed in a display area 201 of the user interface 500, as seen in Fig. 5.
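The segmentation performed at step 140 can be sketched as follows; the representation of a sequence reference is an assumption made for illustration only:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SequenceRef:
    """Reference to a portion of the received video data stream (illustrative)."""
    object_id: str
    start: float   # seconds into the stream
    end: float

def build_master_playlist(detections):
    """Step 140 sketch: one sequence reference per detected foreground object.
    Overlapping references may cover the same underlying video data."""
    return [SequenceRef(obj, start, end) for obj, start, end in detections]

# Two hypothetical detections whose sequences overlap over seconds 12-14,
# so both references point at some of the same video data.
master = build_master_playlist([("car", 8.0, 14.0), ("person", 12.0, 20.0)])
```

Storing references rather than copies is one of the storage options described later for associating sequences with categories.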
If a sprite is selected by the operator, for instance, using the mouse 903 with a pointer positioned over the sprite, the relevant video data sequence corresponding to the selected video sprite is then played by the processor 905. Accordingly, the processor 905 will perform the step of playing back at least a portion of a video data sequence upon selection of a corresponding one of the sprites. In one embodiment, the video data sequence is played in place of the video collage in the display area 201 of the user interface 500, and the video collage is displayed after the video data sequence has finished playing.

At the next step 170, in response to selection of a sprite and operation of the mouse 903 by the operator in a conventional manner, the processor 905 represents the sprite being dragged from the display area where the video collage has been displayed. Once the sprite has been dragged, the sprite is moved from a position in the collage and follows a mouse cursor on the display device 914 until the drag is stopped. If the processor 905 detects release of the sprite inside the collage display area, at step 170, then the method 100 returns to step 160 and the sprite returns to the collage position defined for the sprite in step 130.

Otherwise, if the processor 905 detects that the sprite is released outside the collage display area, at step 170, then at the next step 180 the processor 905 performs the step of removing the selected sprite from the displayed collage completely. Steps 170 and 180 allow the operator to clear from the collage the objects which have been reviewed or which the operator decides do not need reviewing. Steps 170 and 180 give the operator a simple view of the progress of their work in relation to a review task as a whole.

As will be described by way of example below, different areas (e.g., 503, 504) of the user interface 500 are defined to drop the sprites in order to categorise the sprites.
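The release behaviour of steps 170 and 180 described above reduces to a simple hit test. The sketch below is illustrative only; the function and variable names are assumptions:

```python
def handle_release(collage, sprite, point, bounds):
    """Steps 170-180 sketch: a sprite released inside the collage display
    area snaps back to its defined position; one released outside is
    removed from the collage completely."""
    x, y = point
    left, top, right, bottom = bounds
    if left <= x <= right and top <= y <= bottom:
        return "restored"        # method returns to step 160
    collage.remove(sprite)       # step 180: cleared from the collage
    return "removed"

collage = ["car", "person"]
area = (0, 0, 320, 240)          # hypothetical display-area bounds
handle_release(collage, "car", (50, 60), area)    # released inside: car stays
handle_release(collage, "car", (400, 60), area)   # released outside: car removed
```

A drop onto a category area, discussed next, is a special case of release outside the collage display area.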
Each category is presented as an area (e.g., 503, 504) responsive to drop operations. These areas enable the sprites and the categories to be substantially simultaneously selectable.

The processor 905 may detect that the operator has dropped the sprite onto a predefined area (e.g., 503 and 504, as seen in Fig. 5) defining a predetermined metadata category for one or more video data sequences, at step 170. In this instance, at step 180, a reference to the video data sequence corresponding to the sprite dropped on the predefined area is added to a set of video data sequences corresponding to the category. The processor 905 also performs the step of recording data regarding the selection of the sprite. In particular, association metadata representing an association created between the category and the corresponding video data sequence is stored in the hard disk drive 910. Accordingly, at step 180, the processor 905 may perform the step of determining an association between at least a portion of a video data sequence and a predetermined category, upon selection of at least one of the sprites and the category using the displayed collage. The corresponding portion of the video data sequence is associated with the category.

In one embodiment, the association causes the corresponding portion of the video data sequence to be removed from the master playlist and added to the appropriate category playlist. A reference indicating at least a portion of the video data sequence may be added to the category playlist, based on the determined association. Accordingly, the category playlists are automatically managed by the visual interaction with video sprites.

Adding the video data sequences to the set may also be attained using any suitable method. For example, the video data sequences may be copied to a new storage location within the hard disk drive 910, where all video data sequences related to a particular metadata tag (or category) are placed.
In another example, the original video data sequences may be tagged with the metadata and retrieved on demand when that metadata tag is queried. In still another example, a reference to a particular video data sequence may be stored in a list and the particular video data sequence may be retrieved using the reference when the list is later read. In one embodiment, the video data sequences of a particular category playlist may be played back upon selection of the particular category playlist.

The method 100 will now be further described by way of example with reference to Figs. 2A to 6B.

Figs. 2A-2C show video frames 210, 220 and 230, respectively, from a video data sequence captured by the camera 990 and transmitted as a video stream to the computer module 901. The video frames 210, 220 and 230 are shown chronologically from top to bottom and represent a portion of video data from within a longer video data sequence captured by the camera 990. After being captured and transmitted to the computer module 901, the longer video data sequence is stored in the hard disk drive 910 by the processor 905. Figs. 3A-3C show another three video frames 310, 320 and 330 from a portion of the same longer video data sequence occurring some time later within the longer video data sequence.

In accordance with the example, the video frames 210, 220 and 230 are displayed within a display area 201 of the user interface on the display device 914. As seen in Figs. 2A-2C, the frames 210, 220 and 230 show a rarely changing background 202. A detectable foreground object, in this case a car represented by a sprite 203, is seen travelling from near the middle of the display area 201 in video frame 210, where the sprite 203 is relatively smaller, to the bottom left of the display area 201 in video frame 230, where the sprite 203 is relatively larger.
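The association step (180), the master-to-category playlist move and the reference-list storage option described above can be sketched together as follows. All names and the in-memory stores are illustrative assumptions; the specification leaves the storage method open:

```python
# Illustrative store: sequence references resolve to stored video data.
video_store = {"seq-car": b"<car frames>", "seq-person": b"<person frames>"}

master = ["seq-car", "seq-person"]   # step 140: everything starts in the master playlist
categories = {}                      # category name -> list of sequence references
associations = []                    # recorded association metadata

def categorise(seq_ref, category):
    """Step 180 sketch: move the reference from the master playlist to the
    category playlist and record the association metadata."""
    if seq_ref in master:
        master.remove(seq_ref)
    categories.setdefault(category, []).append(seq_ref)
    associations.append({"sequence": seq_ref, "category": category})

def play_category(category):
    """Resolve each stored reference back to its video data when the
    category playlist is later selected."""
    return [video_store[ref] for ref in categories.get(category, [])]

# Dropping the car sprite on a category area would trigger something like:
categorise("seq-car", "positive")
```

Storing references keeps a single copy of the video data while still allowing each category playlist to be played back on demand.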
The foreground object (i.e., the car) may be detected in the video frame 210, as at step 120 of the method 100, and the sprite 203 created for the car. In the example of Figs. 2A-2C, after frame 230, the car moves out of view of the camera 990, leaving the background 202 fully visible.

In Figs. 3A to 3C, a detectable foreground object, in this case a human figure represented by a sprite 301, is seen walking from the left side of the display area 201 in video frame 310 to the right side of the display area 201 in video frame 330, after which the person moves out of view of the camera 990. The foreground object (i.e., the person) may be detected in the video frame 310, as at step 120 of the method 100, and the sprite 301 created for the person.

Fig. 4 shows a collage 400 constructed as at step 130. The collage 400 contains the sprites 203 and 301 for each object (i.e., the person and the car). The collage 400 has been constructed using a single frame for each object. In particular, the frame 210 comprising the sprite 203 representing the car, and the frame 310 comprising the sprite 301 for the person, have been used to construct the collage 400. Accordingly, the collage 400 summarizes the longer video data sequence representing a scene captured by the camera 990.

Fig. 5 shows the user interface 500. The user interface 500 may be implemented as a dialog 501 displayed on the display device 914. The user interface 500 shows the video collage 400 displayed in the video display area 201. The collage 400 comprises the background 202, the video sprite 203 representing the car from frame 210 and the video sprite 301 representing the person of frame 310. The dialog 501 also provides two predefined drop areas, a positive category drop area 503 and a negative category drop area 504. The user can operate the mouse pointer 502, for example, with the mouse 903, to select an object in the collage 400, as at step 170.
Selecting the sprite 203 as shown in Fig. 5 causes the collage 400 to disappear, and the video data sequence containing the car, from moments before the first frame 210 shown in Fig. 2A to some moments after the last frame 230 shown in Fig. 2C, is played by the processor 905 in the video display area 201. The video data sequence may be retrieved from the hard disk drive 910.

In Fig. 6A, the user has dragged the sprite 203 representing the car using the pointer 502, as at step 170, so that the sprite 203 is held over the positive category drop area 503. In the example of Fig. 6A, the area 503 has indicated that the association will be made if the sprite 203 is dropped, as at step 180, by highlighting a boundary of the area 503 in a thicker line. The sprite 203 representing the car may then be dropped on the category area 503, as at step 180, thereby making a simultaneous selection of both the video sprite 203 and the positive category. As described above, association metadata representing an association created between the positive category and the corresponding video data sequence is stored in the hard disk drive 910. The association is determined based on a drag-and-drop operation. In response to the drag-and-drop operation, the sprite 203 disappears from the collage 400, the dialog 501 and the display device 914, as at step 180.

As seen in Fig. 6A, the collage 400 displayed in the display area 201 no longer shows the sprite 203, indicating that the video data sequence containing the car has been processed. Data is also stored within the hard disk drive 910 indicating that the video data sequence containing the car has been analysed as positive and has been associated with the positive category.

Fig. 6B shows a similar operation where the sprite 301 of the walking person is dragged and dropped onto the negative category area 504, as at steps 170 and 180, indicating that a corresponding video data sequence shows activity which has been analysed as negative.
As described above, association metadata representing an association created between the negative category and the corresponding video data sequence is stored in the hard disk drive 910. The video data sequence which includes the walking person is included in the set of video data sequences which have been associated with the negative category. Having analysed and processed both the foreground objects represented by the sprites 203 and 301, the video collage 400 displayed in the video display area 201 shows only the background image 202 with no sprites, indicating to the operator that they have completed analysing all of the detected foreground objects from this video data sequence stored on the hard disk drive 910. The clear background image 202 is a satisfying sign that the operator's task of viewing the stored video data sequence has been fully completed.

The stored association metadata is available for future use. For example, the operator or manager may press one of the association areas 503 or 504, and the associated video data sequence is played in the video display area 201. Hence, the work of playing the video data sequence and clearing the collage 400 has effortlessly created, in this example, two edited video sequences for later review.

Fig. 7 shows a further example where the operator defines categories (i.e., metadata tags) which they wish to use for the analysis of the video data sequence. As seen in Fig. 7, the dialog 501 showing the video display area 201 provides a button 701, which the operator can use to add one or more drop areas. The drop areas can be used to categorise the video data sequences associated with the displayed and drag-able sprites 203 and 301. In the example of Fig. 7, the operator selects the button 701 using the mouse 903, for example, in a conventional manner. In response to selection of the button 701, a new drop area 702 is added to the dialog 501 by the processor 905.
In the example of Fig. 7, the drop area 702 is given a default alphanumeric name which is instantly editable, as indicated by selection box 703. The operator can position a mouse pointer, for example, using the mouse 903, over the selection area 703 where, under execution of the processor 905, the pointer automatically becomes a text cursor 704 to indicate that the text can be changed. The operator may then replace the default name on the drop area 702 with a name of their choice, using the keyboard 902 in a conventional manner. A previously created and named drop category area 705 is shown already on the dialog 501. The operator is optionally able to drag the drop areas (e.g., 702, 705) above and below each other. In this instance, the drop areas are re-arranged to set grid positions. Accordingly, the operator can create the categories they require for analysing a video data sequence.

In one embodiment, the categories are predefined by a system setting. For example, a list of categories may be stored in a file resident within the hard disk drive 910. The system 900 may be configured such that a system administrator can edit the stored file so that all video analysis dialogs use the same default categories. In such an embodiment, the operator may be allowed to add custom categories to individual video analysis sessions.

Fig. 8 shows an analysis report 800 where an operator or a manager is able to review information regarding the analysis of video data as described above. The report 800 is arranged in three columns. The first column 801 lists titles, the second column 802 lists values and the third column 803 lists relative percentages of the values. Example data which has until now been difficult to compile and present is easy to record and present using the described method 100.
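The three-column report layout just described (titles, values, relative percentages) can be sketched as follows. The row titles and figures in this sketch are illustrative, not the exact ones quoted from Fig. 8:

```python
def pct(value, total):
    """Relative percentage as shown in the report's third column."""
    return f"{round(100 * value / total)}%"

def report_rows(total_value, items):
    """Build (title, value, percentage) rows for a report like the one in
    Fig. 8, measuring each item against the reference total."""
    rows = [("Total objects detected", total_value, "")]
    rows += [(title, value, pct(value, total_value)) for title, value in items]
    return rows

# Hypothetical figures: 50 of 200 detected objects actively viewed.
rows = report_rows(200, [("Total detected objects viewed", 50)])
```

Because every association is recorded as metadata at drop time, such a report can be compiled directly from the stored associations rather than from manual logs.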
Row 804 of the report 800 lists, for reference, "Total Video Data", indicating the total period of video data sequences which have been available for review (i.e., "120 hours: 32 minutes: 11 seconds").

Row 805, titled "Total Video Data viewed", indicates the amount of video data which has actually been reviewed by selecting sprites and watching video data sequences (i.e., "22 hours: 52 minutes: 30 seconds"). Row 805 does not include passive viewing where an operator has just pressed play and watched a long video data sequence, where it may be easy to become distracted.

Row 806, titled "Total objects detected", lists the total number of objects detected for reference (i.e., "5460"), and row 807, titled "Total detected objects viewed", indicates how many objects have actually been selected and viewed (i.e., "1538").

Row 808, titled "Total objects viewed", indicates the number of objects that the operator has actually actively removed from the collages (i.e., "1650"). Row 809, titled "removed without review", indicates how many of those removed objects were removed from the collage (e.g., 400) without the associated video data sequence being played (i.e., "112").

As described above, the third column 803 lists relative percentages of the values. For example, as seen in Fig. 8, the percentage of video data viewed compared to the total video data available is "19%". Similarly, the total detected objects viewed compared to the total number of detected objects is "33%". As also seen in Fig. 8, less than "1%" of the total objects have been removed without review.

The rows 810-813 list the objects, and therefore the video data sequences, associated with each category (metadata tag) that has been used to analyse the video data sequence. In the example of Fig.
8, the following categories have been defined:

(i) 'no category' 810, indicating the number of times the operator drags a sprite (e.g., 203) off the collage (e.g., 400) and not onto any category drop zone (i.e., "1289"). In the example of Fig. 8, "78%" of all of the video sequences are categorised as benign;

(ii) 'Traffic violation' 811, indicating the number of traffic violations viewed by the operator (i.e., "312"). In the example of Fig. 8, "19%" of all of the video sequences show traffic violations;

(iii) 'Suspicious behaviour' 812, indicating the number of times suspicious behaviour has been viewed by the operator (i.e., "45"). In the example of Fig. 8, "3%" of the total video sequences show suspicious behaviour; and

(iv) 'Purse theft 22-Oct' 813, indicating the number of purse thefts viewed by the operator. In the example of Fig. 8, "0.2%" of all of the video sequences show purse theft.

The processor 905 uses the report 800 for presenting a control element for each category. In particular, each of the rows 810-813 includes a control element (e.g., 821, 822, 823 and 824), such as an icon or button, which allows the operator to activate playback of the associated video data sequence. In response to selection of one of the control elements (e.g., 821, 822, 823 and 824), the processor 905 may be configured to perform the step of retrieving and playing back all of the video data from the video data sequences corresponding to the selected control element. For example, in response to activation of the icon 821, the processor 905 displays a video window on the display device 914 to play an edited video data sequence showing just the video data which were given no category.
As another example, in response to activation of the icon 822, the processor 905 displays a video window on the display device 914 to play an edited video data sequence showing just the video data which contains analysed video data sequences associated with traffic violations.

As still another example, in response to activation of the icon 823, the processor 905 displays a video window to play an edited video data sequence showing just the video data which contains analysed sequences associated with suspicious behaviour. As still another example, in response to activation of the icon 824, the processor 905 displays a video window to play an edited video data sequence showing just the video data which contains sequences related to a purse theft occurrence on the 22nd of October.

From the edited video data sequences, able to be selected using the icons 821-824, it is then simple to offer export functions which allow packaging and transmission of the retrieved video data sequences to third parties and other locations.

As described above, the described methods allow a simple and intuitive method for an organization to: analyse video data sequences; collect video data into edited sequences representing categories or events; report on the video surveillance review process; and quickly access the referenced edited video data sequences. The described methods provide unprecedented detail of information and unprecedented convenience for both the video reviewer and the organisation employing and utilising their work. This improvement is attained while simultaneously reducing the effort required to review a video data sequence. The described methods allow efficient access to video and provide a visual summary of video data sequences. The described methods allow for collection of salient video data sequences and for the tracking of surveillance viewing.
Industrial Applicability

The arrangements described are applicable to the computer and data processing industries and particularly to video surveillance systems.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (31)

1. A method of constructing a collage summarizing a video data sequence, said method comprising the steps of:
detecting at least one foreground object within the video data sequence;
creating a sprite for each of said foreground objects;
displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;
determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and
removing the selected video sprite from the displayed collage.

2. The method according to claim 1, further comprising the step of adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.

3. The method according to claim 2, further comprising the step of playing back the portion of said video data sequence upon selection of the category playlist.
4. The method according to claim 1, further comprising the step of associating a reference to at least a portion of the said at least one video data sequence with each sprite.
5. The method according to claim 4, wherein a master playlist comprises the associated references.
6. The method according to claim 1, further comprising the step of playing back at least a portion of said video data sequence upon selection of a corresponding one of said sprites.
7. The method according to claim 1, wherein the at least one sprite and said category are substantially simultaneously selectable.

8. The method according to claim 1, further comprising the step of associating corresponding portions of said video data sequence with said category.
9. The method according to claim 1, further comprising the step of detecting a background in said video data sequence for use in said collage.
10. The method according to claim 1, where the association is determined based on a drag-and-drop operation.
11. The method according to claim 1, further comprising the step of recording data regarding said selection.
12. The method according to claim 1, wherein said category is presented as an area responsive to drop operations.
13. The method according to claim 1, further comprising the step of presenting a control element for said category.
14. The method according to claim 13, further comprising the step of retrieving and playing all video data from said video data sequence corresponding to said control element.
15. A method of adapting video playlists, said method comprising the steps of:
detecting at least one foreground object within at least one video data sequence;
creating a sprite for each of said foreground objects;
displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;
determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and
adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.
16. The method according to claim 15, further comprising the step of playing back at least the portion of said video data sequence upon selection of the category playlist.
17. The method according to claim 15, further comprising the step of associating a reference to a section of the said at least one video data sequence with each sprite.
18. The method according to claim 17, wherein a master playlist comprises the associated references.
19. The method according to claim 15, further comprising the step of playing back at least a portion of said video data sequence upon selection of a corresponding one of said sprites.

20. The method according to claim 15, wherein the at least one sprite and said category are substantially simultaneously selectable.
21. The method according to claim 15, further comprising the step of associating corresponding portions of said video data sequence with said category.
22. The method according to claim 15, further comprising the step of detecting a background in said video data sequence for use in said collage.
23. The method according to claim 15, where the association is determined based on a drag-and-drop operation.
24. The method according to claim 15, further comprising the step of recording data regarding said selection.

25. The method according to claim 15, wherein said category is presented as an area responsive to drop operations.
26. The method according to claim 15, further comprising the step of presenting a control element for said category.
27. The method according to claim 26, further comprising the step of retrieving and playing all video data from said video data sequence corresponding to said control element.
28. An apparatus for constructing a collage summarizing a video data sequence, said apparatus comprising:
means for detecting at least one foreground object within the video data sequence;
means for creating a sprite for each of said foreground objects;
means for displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;
means for determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and
means for removing the selected video sprite from the displayed collage.

29. An apparatus for adapting video playlists, said apparatus comprising:
means for detecting at least one foreground object within at least one video data sequence;
means for creating a sprite for each of said foreground objects;
means for displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;
means for determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and
means for adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.
30. A system for constructing a collage summarizing a video data sequence, said system comprising:
memory for storing data and a computer program;
processor coupled to said memory for executing the computer program, said computer program comprising instructions for:
detecting at least one foreground object within the video data sequence;
creating a sprite for each of said foreground objects;
displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;
determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and
removing the selected video sprite from the displayed collage.

31. A system for adapting video playlists, said system comprising:
memory for storing data and a computer program;
processor coupled to said memory for executing the computer program, said computer program comprising instructions for:
detecting at least one foreground object within at least one video data sequence;
creating a sprite for each of said foreground objects;
displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;
determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and
adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.
32. A computer readable medium having recorded thereon a computer program for constructing a collage summarizing a video data sequence, said computer program comprising:
    code for detecting at least one foreground object within the video data sequence;
    code for creating a sprite for each of said foreground objects;
    code for displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;
    code for determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and
    code for removing the selected video sprite from the displayed collage.
33. A computer readable medium having recorded thereon a computer program for adapting video playlists, said computer program comprising:
    code for detecting at least one foreground object within at least one video data sequence;
    code for creating a sprite for each of said foreground objects;
    code for displaying a collage constructed from at least one video frame of the video data sequence and a plurality of said sprites;
    code for determining an association between at least a portion of the video data sequence and a predetermined category, upon selection of at least one of said sprites and said category using the displayed collage; and
    code for adding a reference indicating the portion of said video data sequence to a category playlist, based on the determined association.
34. A method of constructing a collage summarizing a video data sequence, said method being substantially as hereinbefore described with reference to any one of the embodiments as that embodiment is shown in the accompanying drawings.
35. A method of adapting video playlists, said method being substantially as hereinbefore described with reference to any one of the embodiments as that embodiment is shown in the accompanying drawings.

36. An apparatus for constructing a collage summarizing a video data sequence, said apparatus being substantially as hereinbefore described with reference to any one of the embodiments as that embodiment is shown in the accompanying drawings.
37. An apparatus for adapting video playlists, said apparatus being substantially as hereinbefore described with reference to any one of the embodiments as that embodiment is shown in the accompanying drawings.
38. A system for constructing a collage summarizing a video data sequence, said system being substantially as hereinbefore described with reference to any one of the embodiments as that embodiment is shown in the accompanying drawings.
39. A system for adapting video playlists, said system being substantially as hereinbefore described with reference to any one of the embodiments as that embodiment is shown in the accompanying drawings.

DATED this Twenty-fourth Day of December 2008
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
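Setting the claim language aside, the workflow recited in claims 28-33 is a simple interactive loop: foreground objects are detected, a sprite is created for each, the sprites are overlaid on a key frame as a collage, and when the reviewer selects a sprite together with a category, the covered portion of the sequence is associated with that category, a reference is added to the category playlist, and the sprite is removed from the collage. The following is a minimal Python sketch of that bookkeeping only, not the patented implementation: all names (`Sprite`, `Collage`, `ReviewSession.classify`) are invented for illustration, and object detection and collage rendering are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    """A foreground object extracted from the video data sequence."""
    object_id: int
    start_frame: int
    end_frame: int


@dataclass
class Collage:
    """A key frame with the detected-object sprites overlaid on it."""
    background_frame: int
    sprites: list


class ReviewSession:
    """Tracks category associations made by selecting sprites on a collage."""

    def __init__(self, collage: Collage):
        self.collage = collage
        self.playlists: dict = {}     # category -> [(start_frame, end_frame), ...]
        self.associations: list = []  # (object_id, category) pairs

    def classify(self, sprite: Sprite, category: str) -> None:
        # Determine the association between the portion of the sequence
        # covered by the sprite and the selected category.
        self.associations.append((sprite.object_id, category))
        # Add a reference indicating that portion to the category playlist.
        self.playlists.setdefault(category, []).append(
            (sprite.start_frame, sprite.end_frame))
        # Remove the selected sprite from the displayed collage.
        self.collage.sprites.remove(sprite)


# Usage: two detected objects; the reviewer files object 1 as "suspicious".
sprites = [Sprite(1, 10, 50), Sprite(2, 30, 90)]
session = ReviewSession(Collage(background_frame=0, sprites=list(sprites)))
session.classify(sprites[0], "suspicious")
```

Removing the sprite as it is classified gives the reviewer a shrinking collage, so the remaining sprites double as a to-do list for the review session.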
AU2008264196A 2008-12-24 2008-12-24 Interactive video surveillance review and reporting system Abandoned AU2008264196A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2008264196A AU2008264196A1 (en) 2008-12-24 2008-12-24 Interactive video surveillance review and reporting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2008264196A AU2008264196A1 (en) 2008-12-24 2008-12-24 Interactive video surveillance review and reporting system

Publications (1)

Publication Number Publication Date
AU2008264196A1 true AU2008264196A1 (en) 2010-07-08

Family

ID=42313419

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2008264196A Abandoned AU2008264196A1 (en) 2008-12-24 2008-12-24 Interactive video surveillance review and reporting system

Country Status (1)

Country Link
AU (1) AU2008264196A1 (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140022382A1 (en) * 2012-07-18 2014-01-23 Vivotek Inc. Video setting method
CN109791556A * 2016-07-28 2019-05-21 Kodak Alaris Inc. Method for automatically creating a collage from mobile video
CN109791556B * 2016-07-28 2024-04-09 Kodak Alaris Inc. Method for automatically creating a collage from mobile video
CN111954006A (en) * 2020-06-30 2020-11-17 深圳点猫科技有限公司 Cross-platform video playing implementation method and device for mobile terminal
US20220067417A1 (en) * 2020-09-01 2022-03-03 Northwestern University Bandwidth limited context based adaptive acquisition of video frames and events for user defined tasks
US11798254B2 (en) * 2020-09-01 2023-10-24 Northwestern University Bandwidth limited context based adaptive acquisition of video frames and events for user defined tasks

Similar Documents

Publication Publication Date Title
US8818038B2 (en) Method and system for video indexing and video synopsis
CN1326070C (en) Video information intelligent management system
EP0719046B1 (en) Method and apparatus for video data management
US6340971B1 (en) Method and device for keyframe-based video displaying using a video cursor frame in a multikeyframe screen
US6307550B1 (en) Extracting photographic images from video
US8107680B2 (en) Monitoring an environment
US20100110082A1 (en) Web-Based Real-Time Animation Visualization, Creation, And Distribution
US20130188063A1 (en) Interactive photo booth and associated systems and methods
Chen et al. Visual storylines: Semantic visualization of movie sequence
US20020063714A1 (en) Interactive, multimedia advertising systems and methods
US20100205203A1 (en) Systems and methods for video analysis
CN107862315A (en) Subtitle extraction method, video searching method, captions sharing method and device
US20090317056A1 (en) System and method for managing schedules of monitoring device
CN103456254A (en) Multi-touch interactive multimedia digital signage system
CN112083915A (en) Page layout method and device, electronic equipment and storage medium
AU2008264196A1 (en) Interactive video surveillance review and reporting system
CN103492977B (en) For configuring the method and system of the position sequence of camera
JP5344503B2 (en) Server, store analysis system, program
CN110308848B (en) Label interaction method and device and computer storage medium
CN112040304A (en) Hard disk video recorder system supporting wireless screen projection
Ansari et al. Implementation of a motion detection system
Ambardekar et al. Ground truth verification tool (GTVT) for video surveillance systems
US20070124766A1 (en) Video synthesizer
TWI564822B (en) Preselectable Video File Playback System, Method Using The Same, and Computer Program Product Using The Same
JP2007179302A (en) Store analyzing system, server for store analyzing system, and its control program

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application