AMENDED CLAIMS
received by the International Bureau on 9 October 2012 (09.10.2012)
Claim
1. An integrated intelligent server based system having a sensory input/data acquisition cum recording server group and/or an analytics server group adapted to facilitate fail-safe integration without involving any dedicated failover server and/or central control and/or optimised utilization of various sensory inputs for various utility applications, comprising at least one autonomous system having:
I) A) said sensory input acquisition cum recording server group comprising a plurality of acquisition cum recording servers which are operatively linked to assess respective server capacity and operate as a group to enable fail-safe support, such that when any of the servers in the group fail to operate, the remaining operative servers in the group are adapted to distribute and take over the sensory input load of the non-operative server/s to render the system fail-safe and self-sufficient; and/or
B) said analytics server group comprising a plurality of analytics servers for intelligent analysis including resource dependent analytical accuracy control, including means adapted for computing complexity of scenes and dynamically reconfiguring the analytical processing steps for optimal analysis without compromising subjective fidelity of the data, and controlling the available computational bandwidth and/or availability of computational and other resources for on-line and real-time and/or on demand for efficient and user friendly streaming/analysis/detection/alert generation of events and/or follow up actions; and
II) an intelligent interface for operative connection to said sensory input acquisition cum recording server group; and/or said analytics server group.
2. An integrated intelligent server based system as claimed in claim 1 wherein each of said acquisition cum recording servers is adapted for bandwidth optimized fail-safe recording and/or join-split mechanism for multi channel sensory data/video streaming without compromising subjective fidelity of the data.
3. An integrated intelligent server based system as claimed in claim 1 wherein each said analytics server is adapted for any one or more of (a) intelligent colour object analysis framework and colour coherent background estimation, (b) identifying moving, static, quasi-static objects, (c) enhanced object tracking, (d) content aware resource scheduling, (e) join split mechanism for multi channel video streaming, and (f) resource dependent accuracy control and establishing correlation amongst multiple types of sensory data obtained from the various sensory inputs.
4. An integrated intelligent server based system as claimed in claim 1 wherein said intelligent interface is operatively connected to any one or more of (a) user management and client access controller, (b) event controller and handler, and (c) event and/or selected segments of sensory data distributor.
5. An integrated intelligent server based system as claimed in claim 1 comprising operative client modules selectively comprising standalone surveillance client, internet browser, web client, any hand held devices including mobile device client, and remote event and/or notification receiver.
6. An integrated intelligent server based system as claimed in claim 1 wherein said acquisition cum recording server is adapted to (i) collect inputs from various sensory sources, archiving, tagging, and indexing to seamlessly map in a database or data-warehousing system involving any one or more of optimal usage of computing, communication and storage resources, facilitate efficient search, transcoding, retransmission, authentication of data, rendering and viewing of archived data at any point of time and (ii) stream input sensory data in real time or on demand including streaming video and other sensory content in multiple formats to multiple devices for purposes including live view in different matrix layout, relay of the content, local archiving, rendering of the sensory data in multiple forms and formats, by a fail-safe mechanism without affecting speed and performance of on-going operations and services.
7. An integrated intelligent server based system as claimed in claim 4 comprising means for auto registration of servers involving unique identification number, configuration data of the relevant server, means for recording sensory inputs in local storage in a fail-safe mechanism without involving any dedicated failover server and/or central control and streaming the data to client modules and means for bandwidth adaptive uploading to central storage systems without compromising subjective fidelity of the data.
8. An integrated intelligent server based system as claimed in claim 1 wherein said analytics server comprises:
(a) various sensory including video sensory input analytics engine; and
(b) analytics engine controller.
9. An integrated intelligent server based system as claimed in claim 1 wherein said intelligent interface is adapted for any one or more of (i) filtering and need based transmission of sensory inputs, (ii) directing distribution of alerts, (iii) providing a common gateway for heterogeneous entities.
10. An integrated intelligent server based system as claimed in claim 5 wherein said client module comprises means enabling user to receive, view, analyze, search sensory inputs and includes standalone surveillance clients, internet browsers, handheld devices, cell phones, PCs, Tablet PCs and the like.
11. An integrated intelligent server based system as claimed in claim 1 comprising a remote event receiver adapted to receive and display messages and alerts from various components of the system which can further be multicast or broadcast.
12. An integrated intelligent server based system as claimed in claim 1 comprising a central server adapted to serve as a gateway to a plurality of said autonomous systems and integrate the system into a single unified system.
13. An integrated intelligent server based system as claimed in claim 5 wherein each said acquisition cum recording server is adapted to accept requests through the intelligent interface and/or receive inputs from various other input sources, recording sensory inputs in local storage, intelligent uploading of the sensory input in a cluster of storage devices wherein said cluster comprises one or more network accessible storages in an efficient manner with fair share to individual sources utilizing optimal bandwidth in a cooperative manner, enabling searching of input and analytical sensory inputs and streaming of the sensory inputs in original or transcoded format to various other devices including surveillance clients.
14. An integrated intelligent server based system as claimed in claim 1 comprising means for recording sensory inputs in local storage in a fail-safe mechanism without involving any dedicated failover server and/or central control and intelligent streaming of stored inputs continuously or on trigger from any external or internal services wherein the data stream is first segmented into small granular clips or segments of programmable and variable length sizes and said clips stored in the said local storage of the server, the clip metadata being stored in the local database.
15. An integrated intelligent server based system as claimed in claim 14 wherein bandwidth adaptive data uploading from channels to central storage system via said local storage comprises allocating a data source to a server group with multiple servers in the group, said servers comprising the server group adapted to exchange their respective capacity information such that in case of a breakdown of any one or more of the servers in a group the remaining operative servers in the group share the load of the failed server/servers, each server also adapted to monitor the available bandwidth and also the data inflow rate for each channel into the server and accordingly adjust the upload rate for an input channel, means to segment the data stream into various sized clips and the rate of uploading the clips to the central storage adjusted depending upon the network bandwidth and data inflow rate for that particular channel.
16. An integrated intelligent server based system as claimed in claim 8 wherein said sensory input analytic engine comprises:
a scene analyzer comprising means to generate meta-data against each frame for analysis and computing the complexity of the scene such as to dynamically reconfigure the processing steps based thereon for optimal analysis results depending upon the availability of the computational and other resources for on-line and real-time detection of events and follow up actions and further feeding the metadata along with the scene complexity measure to a controller adapted to decide the rate at which the frames of said channel should be decoded and sent to the analytic engine for processing;
a rule engine adapted to maintain history of the metadata and correlate the data across multiple frames to thereby decide the behavioural patterns of the objects in the scene for further determinations; and
an event decider adapted to receive the behavioural patterns as detected by the rule engine and analyze the same to thereby detect various events in parallel and also to control user defined application of any external device for better decision making/study of the event identified.
17. An integrated intelligent server based system as claimed in claim 16 wherein said scene analyzer comprises means for intelligent scene adaptive colour coherent object analysis framework and method adaptive to the availability of computational bandwidth and memory enabling processing steps to be dynamically reconfigured.
18. An integrated intelligent server based system as claimed in claim 8 wherein said analytical engine controller comprises:
A) means to receive multiple sensory channel inputs and feed decoded frames of the multiple channels to the analytical engine wherein the said decoding and feeding of the decoded frames to the analytical engine is optimally controlled such that the number of frames sent per second for each channel is individually and automatically controlled depending on the requirement of the analytics engine and also on the computational bandwidth available in the system at any point of time; and means adapted to stream sensory data along with analytical results either as individual streams for each channel or as joined single stream data for all or user requested channels involving joining the channels and transmitting resulting combined single channel over IP network adapted to varying and low bandwidth network connectivity, or
B) means adapted to directly generate events without feeding any decoded frames to the analytical engine.
19. An integrated intelligent server based system as claimed in claim 1 wherein said intelligent interface is adapted to (i) auto register itself to the system, (ii) accept request from surveillance clients and relay the same to corresponding recording server and analytic server, (iii) receive configuration data from the surveillance clients and feed to the intended components of the system, (iv) receive event information from analytic server on-line and transmit to various recipients including remote event receiver, fetch outstanding event clips, if any, (v) periodically receive heartbeat signals along with status information from all active devices and relay that to other devices in same or other networks, (vi) stream live video, recorded video or event alerts at appropriate time, (vii) join multiple channel sensory inputs into a single combined stream to adapt to variable and low bandwidth network, (viii) enable search based on various criteria including data, time, event types, channels, signal features, and other system input and (ix) enable user to perform a user-interactive smart search to filter out desired segment of the sensory input from the database.
20. An integrated intelligent server based system as claimed in claim 1 wherein said acquisition cum recording server group comprises a plurality of sensory data recording servers adapted to: record inputs from single/multiple data sources in at least one local storage space database with the URL of the files stored; transfer the thus stored files from said local storage to a network based central storage provided for accessing the files for end use/applications, said transfer of sensory data from source to the central storage via said local storage being carried out taking into consideration the data download speed (inflow rate) from data source to server along with the availability of network bandwidth at any given point of time for efficient network bandwidth sharing amongst the multiple data sources to said storage device in the network.
21. An integrated intelligent server based system as claimed in claim 20 wherein said sensory data recording server is adapted to monitor available total network bandwidth and per channel inflow rate and based thereon decide rate of per channel video transfer from the server local storage to said central storage.
22. An integrated intelligent server based system as claimed in claim 21 wherein said sensory data from the source are recorded in the form of variable length clips wherein the clip duration is set by the user or set by the server itself.
23. An integrated intelligent server based system as claimed in claim 21 wherein said sensory data recording server is adapted for determining the optimal bit rate for uploading sensory inputs involving:
(a) average bit rate for each channel separately in periodic intervals wherein the sensory input streaming rate ($D_i$) of a particular source/camera ($C_i$) to the server is estimated and (b) identifying the available network bandwidth (B) at that instant from the system; and finally (c) calculating the frequency of clip upload for channel $i$, based on:
$U_i = \left[ B \times k \div \sum_j D_j \right] \times D_i$, where $0 < k < 1$, depending on how much of the remaining bandwidth is to be allocated for video uploading task.
24. An integrated intelligent server based system as claimed in claim 1 wherein the capacity of the respective servers in a server group is based on the memory, network bandwidth and current processor utilization within the server.
25. An integrated intelligent server based system as claimed in claim 24 wherein a server group is adapted to allocate any one of the operative servers in said group as the group master server and continuously monitor the servers in the group and their respective capacities and decide on the allocation and release of the input sensory source from any server within the Group.
26. An integrated intelligent server based system as claimed in claim 25 wherein the said group master server is adapted to release or add a sensory input source based on required (a) addition of an input source, (b) deletion of an existing input source, (c) addition of a new recording server to the system or when a failed server again re-operates and (d) when a running server stops functioning.
27. An integrated intelligent server based system as claimed in claim 1 wherein each said analytical server is adapted for multiple component colour object analysis in a scene favouring scene analytic applications comprising: multiple component colour coherent background estimation involving colour correlation of neighbouring pixels and inter-frame multiple component colour correlation using said multiple components as a composite data and using the relative values of these components to maintain accurate colour information and appearance of the true colour in the estimated background frame.
28. An integrated intelligent server based system as claimed in claim 27 wherein said analytical server is adapted for colour object analysis involving said unified colour coherent background estimation involving statistical pixel processing comprising: using R, G, B components as a composite single structure in a unified manner to thereby preserve the mutual relationship of these colour components in each individual pixel in order to maintain true colour appearances in the estimative colour background frame; continuously readjusting modelled or predicted values for each colour pixel in a frame with all sequential forthcoming frames of the colour video; and correlating the spatial distribution of the colour values in a local region to model the pixel background colour value.
29. An integrated intelligent server based system as claimed in claim 27 wherein said analytical server is adapted for colour object analysis involving said colour analysis of each pixel comprising accumulating the colours in the above window in different colour clusters k, each consisting of a mean representative colour pixel value $(\mu_R, \mu_G, \mu_B)_k$ with span of colour deviation $(\sigma_R, \sigma_G, \sigma_B)_k$ and a number of appearances ($\nu_k$) of a colour pixel in this cluster, and based thereon:
i) matching the colour pixel (R, G, B) with colour cluster k to confirm if the same is within the span of colour deviation;
ii) if the colour of the pixel does not match with any cluster, then creating a new colour cluster with mean value (R, G, B) and default chosen allowed threshold for deviation $(\sigma_{Th}, \sigma_{Th}, \sigma_{Th})$ and number of occurrences $\nu = 1$;
iii) splitting the colour clusters (p) which have a large $(\sigma_R, \sigma_G, \sigma_B)_p$ value and merging all the colour clusters which have very close mean representative values, the probability of occurrence then being adjusted in the same ratio of the estimated colour clusters for that population, to thereby achieve finer granular colour matching.
30. An integrated intelligent server based system as claimed in claim 1 wherein said analytical server is adapted for efficient face detection in video images and the like by limiting the search space involving motion detection technique and controlled computational requirements based on desired accuracy by carrying out prediction of number of iterations and temporal parameter "t".
31. An integrated intelligent server based system as claimed in claim 30 wherein said analytical server for said face detection is adapted for:
i) involving the grey image of cropped motion rectangular area from current frame to calculate said temporal parameter "t" and updating "t" with history and calculating possible number of iterations "nIterations";
ii) calculating scale factor, number of iterations and other parameters from look up table;
iii) using convolution on different scaled images to get probable face rectangles;
iv) grouping the probable faces with spatial information; and
v) obtaining therefrom the confirmed faces.
32. An integrated intelligent server based system as claimed in claim 1 comprising resource allocation for analytical servers involving : estimating scene complexity relevant for frequency of frame processing ; spawning of processor threads based on physical CPU cores involving a controller;
allocation of threads to video channels for analytical processing based on requirements; and feeding the frames for processing to a video analytics engine at an fps F, where F is calculated dynamically by the analytics engine itself depending upon its processing requirements based on scene complexity to thereby favour optimal sharing of resources eliminating unnecessary computing.
33. An integrated intelligent server based system as claimed in claim 32 wherein said scene complexity is determined based on (a) inter class difference of foreground and background (b) number of objects present and (c) extent of processing based on the particular processing task.
34. An integrated intelligent server based system as claimed in claim 32 comprising a Controller module for spawning a number of processing threads depending on the number of CPU cores present as available from the system hardware information and a task scheduler module for generating the sequence indicating the order in which the individual channels are to be served for analytics tasks.
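The controller and task scheduler described in the claim above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `worker_count`, `schedule` and `run_analytics` are hypothetical, `os.cpu_count()` reports logical rather than physical cores, and ordering channels by descending scene complexity is an assumed policy, since the claim does not fix how the serving sequence is chosen.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count():
    """Number of worker threads, taken from the CPU count reported by the OS.

    Assumption: os.cpu_count() returns logical cores; it may return None.
    """
    return os.cpu_count() or 1


def schedule(channel_complexity):
    """Return channel IDs in the order they are to be served.

    Assumed policy: channels with higher scene complexity are served first.
    """
    return sorted(channel_complexity, key=lambda ch: channel_complexity[ch], reverse=True)


def run_analytics(channel_complexity, analyse):
    """Run the analyse(channel_id) callable over all channels with a bounded thread pool."""
    order = schedule(channel_complexity)
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        results = list(pool.map(analyse, order))
    return dict(zip(order, results))


if __name__ == "__main__":
    complexity = {"ch1": 0.2, "ch2": 0.9, "ch3": 0.5}
    print(run_analytics(complexity, lambda ch: f"analysed {ch}"))
```

Bounding the pool by the core count keeps the number of concurrently running analytics tasks in line with the available hardware, which is the resource-sharing intent the claim describes.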
35. An integrated intelligent server based system as claimed in claim 1 comprising multi channel join-split mechanism adapted for low and/or variable bandwidth network link comprising:
a sender unit adapted to receive multi channel inputs from site to join and compress into a single channel and a receiver unit at the client site to receive the inputs and extract the individual channels for the purposes of end use, said sender unit adapted to combine while transmitting multi channel inputs into a single channel, frame by frame, and controlling the transmission bit rate to avoid jittery outputs and/or any interference between individual channels and/or starvation for any single channel.
36. An integrated intelligent server based system as claimed in claim 34 comprising means for encoding the stream with variable bit rate depending upon the available bandwidth from server to the client, a frame header is transmitted with each frame of the combined stream, said frame header containing meta data about the constituent streams, said receiver unit adapted to split the combined stream into constituent streams based on said frame header.
37. An integrated intelligent server based system as claimed in claim 34 wherein the sender unit is adapted to receive raw inputs or decode the inputs to raw input and store in a memory allocated for inputs from a defined channel and generate an initial fps on request from a client; on request of a subset of channels from the client, a sample module is adapted to take the current frame from the channel specific memory area at a fixed rate for those channels and combine them into a single frame along with generation of a look-up table to store the channel ID and its boundary within the combined frame, the combined frame finally being compressed and checked to identify all motion vectors which cross the allocated inter-frame boundary, and all such motion vectors being forcibly set to null to ensure that the video content of one constituent frame within the combined frame does not interfere with the content of another constituent frame; a frame header is composed with meta data information about the position of the individual channel frames within the combined frame, the resolution of the individual frames and the time stamp; said receiver unit is adapted to open a TCP connection with the sender and request all or selected channels including selectively specifying the format for compression, additional commands to get the existing channel information, resolution of the channels, the fps of the individual channels at the sender's end and other inputs directed to specifying the channels of interest and specifying other parameters such as the transmitting fps (f), initial bit rate etc.
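The joining of constituent frames with a look-up table, and the splitting at the receiver, can be illustrated with a minimal sketch. It makes several assumptions that are not in the claim: frames are plain nested lists of pixel values, all frames share the same height, and they are tiled side by side. Compression, motion-vector nulling, the frame header wire format and the TCP exchange are deliberately left out. The names `join_frames` and `split_frame` are hypothetical.

```python
def join_frames(frames):
    """Tile per-channel frames side by side into one combined frame.

    frames: {channel_id: list of rows}, all with the same number of rows (assumption).
    Returns (combined_frame, lookup) where lookup maps each channel ID to its
    boundary (x_offset, y_offset, width, height) inside the combined frame.
    """
    heights = {len(frame) for frame in frames.values()}
    if len(heights) != 1:
        raise ValueError("this sketch needs at least one frame and equal frame heights")
    height = heights.pop()
    combined = [[] for _ in range(height)]
    lookup = {}
    x_offset = 0
    for channel_id, frame in frames.items():
        width = len(frame[0])
        for combined_row, row in zip(combined, frame):
            combined_row.extend(row)
        lookup[channel_id] = (x_offset, 0, width, height)
        x_offset += width
    return combined, lookup


def split_frame(combined, lookup):
    """Recover the constituent frames from a combined frame using the look-up table."""
    return {
        channel_id: [row[x:x + w] for row in combined[y:y + h]]
        for channel_id, (x, y, w, h) in lookup.items()
    }


if __name__ == "__main__":
    cams = {"cam1": [[1, 2], [3, 4]], "cam2": [[5, 6, 7], [8, 9, 10]]}
    joined, table = join_frames(cams)
    print(joined, table)
    print(split_frame(joined, table) == cams)
```

In a real sender the look-up table contents would travel in the frame header, so that the receiver can split each combined frame without any other side channel.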
38. An integrated intelligent server based system as claimed in claim 16 wherein said event decider means comprises an enhanced object tracking system comprising: object tracking means in conjunction with one or more PTZ cameras wherein when an object is first detected in a fixed camera view of the said object tracking means the same is adapted to track the object and also generate and transmit the positional values along with a velocity prediction data to the PTZ camera controller;
said PTZ camera controller adapted to receive the positional information of the object in the PTZ camera view involving scene registration and coordinate transformation technique.
39. An integrated intelligent server based system as claimed in claim 37 adapted to carry out said coordinate transformation following:
a. identifying a set of points in the static camera as A, B, ... and also corresponding points A', B', ... in the PTZ camera by the user;
b. mapping any arbitrary point C in the static camera to the corresponding point C' in the PTZ camera view dynamically, wherein $a_x$, $b_x$, $c_x$ are x-coordinates of points A, B and C respectively in the static camera view and similarly $a'_x$, $b'_x$ and $c'_x$ are for the corresponding points in PTZ view, where point C' is interpolated with the help of points A and B, with a confidence factor $W_{AB}$, where $W_{AB} = (a_x - b_x) \div \min(c_x - b_x,\ c_x - a_x)$, is determined to be
$c'_{x,AB} = b'_x + \left[ (a'_x - b'_x) \times (c_x - b_x) \div (a_x - b_x) \right]$
and wherein similarly, an estimate of x-coordinate of the same point C' is generated for all pairs of points (A, B) in the static camera view based on:
$c'_x = \sum \left[ c'_{x,AB} \times W_{AB} \right] \div \sum W_{AB}$ and similarly generating also the y-coordinate $c'_y$ for the point C'.
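The pairwise interpolation and confidence-weighted averaging in the claim above can be sketched for one coordinate axis as follows. One deviation is an assumption of this sketch and not part of the claim: the confidence factor is computed from absolute differences, so that the weight stays positive whichever way a pair (A, B) is ordered, and pairs with a zero denominator are skipped. The function name `map_coordinate` is hypothetical.

```python
from itertools import combinations


def map_coordinate(c, static_pts, ptz_pts):
    """Map coordinate c from the static view to the PTZ view along one axis.

    static_pts[i] and ptz_pts[i] are the user-identified coordinates of the
    same reference point in the static and PTZ views respectively.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for i, j in combinations(range(len(static_pts)), 2):
        a, b = static_pts[i], static_pts[j]
        a_ptz, b_ptz = ptz_pts[i], ptz_pts[j]
        span = a - b
        nearest = min(abs(c - b), abs(c - a))
        if span == 0 or nearest == 0:
            continue  # degenerate pair: skipped in this sketch
        # Linear interpolation of C' from the pair (A, B).
        estimate = b_ptz + (a_ptz - b_ptz) * (c - b) / span
        # Confidence factor, using absolute values (assumption of this sketch).
        weight = abs(span) / nearest
        weighted_sum += estimate * weight
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no usable pair of reference points")
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Reference points related by x' = 100 + 2x, so x = 5 should map near 110.
    print(map_coordinate(5, [0, 10, 20], [100, 120, 140]))
```

The same function would be called a second time with the y-coordinates of the reference points to obtain the y-coordinate of C'.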
40. An integrated intelligent server based system as claimed in claim 1 wherein said acquisition cum recording servers and said analytical server are adapted to carry out intelligent automated traffic enforcement involving a video surveillance system with video analytic servers adapted for carrying out sequential analytical process (a) configuration means (b) incident detection means (c) incident audit means (d) reporting generation means (e) synchronization means and (f) user management means.
41. An integrated intelligent server based system as claimed in claim 1 comprising a site map server installed within each autonomous system and also within the centralized server gateway to the entire system which is adapted to receive request from any authorised components of the system and respond with positional data corresponding to any component linked, said site layer preferably multilayered and components linked to any spatial position of the map in any layer.
42. A method for cost-effective and efficient transferring/recording sensory data from single or multiple data sources to network accessible storage devices comprising:
at least one sensory data recording server adapted to record inputs from single/multiple data sources in at least one local storage space with the URL of the files stored in database;
transferring the thus stored files from said local storage to a network based central storage provided for accessing the files for end use/applications, said transfer of sensory data from source to the central storage via said local storage being carried out taking into consideration the data download speed (inflow rate) from data source to server along with the availability of network bandwidth at any given point of time for efficient network bandwidth sharing amongst multiple data sources to said storage device in the network.
43. A method as claimed in claim 41 wherein said sensory data recording server is adapted to monitor available total network bandwidth and per channel inflow rate and based thereon decide rate of per channel video transfer from the server local storage to said central storage.
44. A method as claimed in claim 42 wherein sensory data from the source are recorded in the form of variable length clips wherein the clip duration is set by the user or set by the server itself.
45. A method as claimed in claim 43 comprising the step of determining the optimal bit rate for uploading sensory inputs comprising the following steps :
(a) calculating the average bit rate for each channel separately in periodic intervals wherein the sensory input streaming rate ($D_i$) of a particular source/camera ($C_i$) to the server is estimated and (b) identifying the available network bandwidth (B) at that instant from the system; and finally (c) calculating the frequency of clip upload for channel $i$, based on:
$U_i = \left[ B \times k \div \sum_j D_j \right] \times D_i$, where $0 < k < 1$, depending on how much of the remaining bandwidth is to be allocated for video uploading task.
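As a worked illustration of the upload-rate rule in the claim above, the sketch below shares a fraction k of the available bandwidth B among channels in proportion to their measured inflow rates. The function name `clip_upload_rates`, the dictionary layout and the example numbers are hypothetical; only B, k and the per-channel inflow rates come from the claim.

```python
def clip_upload_rates(inflow_rates, bandwidth, k):
    """Per-channel upload rate: (bandwidth * k / sum of inflow rates) * own inflow rate.

    inflow_rates: {channel_id: estimated inflow rate of that channel}
    bandwidth:    available network bandwidth at this instant, same units
    k:            fraction of the bandwidth reserved for uploading, 0 < k < 1
    """
    if not 0 < k < 1:
        raise ValueError("k must satisfy 0 < k < 1")
    total_inflow = sum(inflow_rates.values())
    if total_inflow <= 0:
        return {channel: 0.0 for channel in inflow_rates}
    scale = bandwidth * k / total_inflow
    return {channel: scale * rate for channel, rate in inflow_rates.items()}


if __name__ == "__main__":
    # Two cameras with inflow 2 and 6 units, 10 units of bandwidth, half reserved for upload.
    print(clip_upload_rates({"cam1": 2.0, "cam2": 6.0}, bandwidth=10.0, k=0.5))
```

With these example numbers the two channels receive 1.25 and 3.75 units, which together use exactly the reserved 5 units and keep the 1:3 ratio of their inflow rates.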
46. A method for sensory input recording and live streaming in a multi-server environment comprising: a fail-safe server group without involving any dedicated failover server and mirror central control means;
each said server group comprising a plurality of acquisition cum recording servers;
said multiple recording servers adapted to exchange information amongst one another such that the left over capacity of each server is known along with the channel information of every other server, such that in case of any server failure in said server group the remaining active servers in the server group automatically distribute the required operative load amongst the remaining operative servers for a fail-safe recording and streaming of the sensory data.
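The load takeover described in the claim above can be sketched as a simple greedy redistribution. The sketch rests on assumptions that the claim does not state: capacity is modelled as a maximum channel count per server, each orphaned channel goes to the survivor with the most left over capacity, and the function name `redistribute` and the data layout are hypothetical.

```python
def redistribute(servers, failed_id):
    """Reassign the channels of a failed server to the remaining servers.

    servers: {server_id: {"capacity": max channel count, "channels": [channel ids]}}
    Returns a new mapping for the surviving servers; the input is not modified.
    """
    orphaned = list(servers[failed_id]["channels"])
    survivors = {
        sid: {"capacity": s["capacity"], "channels": list(s["channels"])}
        for sid, s in servers.items()
        if sid != failed_id
    }

    def leftover(sid):
        return survivors[sid]["capacity"] - len(survivors[sid]["channels"])

    for channel in orphaned:
        if not survivors:
            raise RuntimeError("no operative server left in the group")
        # Greedy choice (assumption): the survivor with the most left over capacity.
        target = max(survivors, key=leftover)
        if leftover(target) <= 0:
            raise RuntimeError("no spare capacity left in the group")
        survivors[target]["channels"].append(channel)
    return survivors


if __name__ == "__main__":
    group = {
        "A": {"capacity": 4, "channels": ["a1", "a2"]},
        "B": {"capacity": 4, "channels": ["b1"]},
        "C": {"capacity": 4, "channels": ["c1", "c2", "c3"]},
    }
    print(redistribute(group, "A"))
```

Because every server already knows the left over capacity and channel list of the others, any surviving server could run this computation without a dedicated failover server.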
47. A method as claimed in claim 45 wherein each recording server auto registers in the system and a database entry is created with the server ID whereby the said recording server gets listed in the database and is then ready for recording data from one or more sources.
48. A method as claimed in claim 45 wherein the recording is done by breaking the data streams into chunks or clips of small duration and the clips are initially stored in a local server storage space and periodically uploaded to one or more network attached storage in a round robin fashion.
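The clip-based recording and round robin upload in the claim above can be sketched in a few lines. The helper names `split_into_clips` and `assign_round_robin`, the use of whole seconds, and the storage labels are assumptions made for illustration; the actual local storage and network transfer are not modelled.

```python
from itertools import cycle


def split_into_clips(total_seconds, clip_seconds):
    """Break a recording of total_seconds into (start, end) clips of at most clip_seconds."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return [
        (start, min(start + clip_seconds, total_seconds))
        for start in range(0, total_seconds, clip_seconds)
    ]


def assign_round_robin(clips, storages):
    """Pair each clip with a network attached storage target in round robin order."""
    targets = cycle(storages)
    return [(clip, next(targets)) for clip in clips]


if __name__ == "__main__":
    clips = split_into_clips(25, 10)
    print(assign_round_robin(clips, ["nas1", "nas2"]))
```

Cycling through the storage targets spreads the upload load evenly when clips are of similar size.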
49. A method as claimed in claim 45 comprising a plurality of server groups which are operatively connected to network storage and as soon as a server is registered in a Group it generates a message describing its IP address, group ID and remaining capacity to handle more data source/cameras.
50. A method as claimed in claim 45 wherein the capacity of the respective servers in a server group is based on the memory, bandwidth and current processor utilization within the server.
51. A method as claimed in claim 45 comprising assigning the server operatively connected to the input sensory devices and the capacity of the server determined accordingly with continuous monitoring of required decrement or increment of capacity based on addition or removal of sensory input sources.
52. A method as claimed in claim 45 wherein a server group is adapted to allocate any one of the operative servers in said group as the group master server and continuously monitor the servers in the group and their respective capacities and decide on the allocation and release of the input sensory source from any server within the Group.
53. A method as claimed in claim 51 wherein the said group master server is adapted to release or add a sensory input source based on required (a) addition of an input source (b) deletion of an existing input source (c) addition of a new recording server to the system or when a failed server again re-operates and (d) when a running server stops functioning.
54. An intelligent and unified method of multiple component colour object analysis in a scene favouring scene analytic applications comprising:
multiple component colour coherent background estimation involving colour correlation of neighbouring pixels and inter-frame multiple component colour correlation using said multiple components as a composite data and using the relative values of these components to maintain accurate colour information and appearance of the true colour in the estimated background frame.
55. An intelligent and unified method of claim 54 wherein said multiple components comprise multi-spectral signals including human visible spectra Red (R), Green (G), Blue (B) signals and similar.
56. An intelligent and unified method of colour object analysis as claimed in claim 54 comprising (A) unified colour coherent background estimation involving statistical pixel processing; (B) removal of shadow and glare from the scene along with removal of electronics induced different types of noises in sensors and vibrations of sensors; (C) characterization of pixels in the foreground regions and extraction of moving and/or static objects.
57. An intelligent and unified method of colour object analysis as claimed in claim 54 comprising tracking variety of objects individually and generating related information for rule-engine based intelligent analytical applications.
58. An intelligent and unified method of colour object analysis as claimed in claim 54 wherein said unified colour coherent background estimation involving statistical pixel processing comprises: using R, G, B components as a composite single structure in a unified manner to thereby preserve the mutual relationship of these colour components in each individual pixel in order to maintain true colour appearances in the estimative colour background frame;
continuously readjusting modeled or predicted values for each colour pixel in a frame with all sequential forthcoming frames of the colour video; and correlating the spatial distribution of the colour values in a local region to model the pixel background colour value.
An intelligent and unified method of colour object analysis as claimed in claim 57 wherein for each pixel (x,y) in the input colour frame there is carried out (i) local window estimation (ii) colour analysis of each pixel and (iii) background frame construction based thereon.
An intelligent and unified method of colour object analysis as claimed in claim 57 wherein, if the pixel location in a current frame belongs to an object pixel in the previous frame, estimation of the colour background at that pixel location is skipped since the colour pixel is not representative of the background; otherwise, a local window of adaptive size $(k \cdot h,\; k \cdot w)$ centred on this pixel is computed for the background estimation using the colour pixel values within this window, where $k = \bar{I} / 255$, $\bar{I}$ being the average intensity of all the pixels in a window of size $(h, w)$, so that $k$ represents the normalized average intensity with $0 < k < 1$, and the processing window size reduces with the reduction of intensity in the region surrounding the pixel.
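The adaptive window sizing recited above can be illustrated with a minimal Python sketch. The function name `adaptive_window_size`, the greyscale list-of-lists input, the clipping of the window at the image border and the lower bound of one pixel are assumptions made for illustration only; the claim does not specify them.

```python
def adaptive_window_size(gray, x, y, h, w):
    """Return the adaptive window size (k*h, k*w) around pixel (x, y).

    gray is a 2D list of intensities in 0..255 (hypothetical input format).
    k is the average intensity in the (h, w) window divided by 255.
    """
    rows, cols = len(gray), len(gray[0])
    # Clip the nominal (h, w) window to the image borders (assumption).
    top, bottom = max(0, y - h // 2), min(rows, y + h // 2 + 1)
    left, right = max(0, x - w // 2), min(cols, x + w // 2 + 1)
    values = [gray[r][c] for r in range(top, bottom) for c in range(left, right)]
    k = sum(values) / (255.0 * len(values))  # normalized average intensity
    # Keep at least one pixel so the window never vanishes (assumption).
    return max(1, round(k * h)), max(1, round(k * w))


if __name__ == "__main__":
    mid_grey = [[128] * 20 for _ in range(20)]
    print(adaptive_window_size(mid_grey, 10, 10, 10, 10))
```

A darker neighbourhood yields a smaller k and hence a smaller processing window, which is the behaviour the claim describes.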
An intelligent and unified method of colour object analysis as claimed in claim 59 wherein said colour analysis of each pixel comprises accumulating the colours in the above window in different colour clusters k, each consisting of a mean representative colour pixel value $(\mu_R, \mu_G, \mu_B)_k$ with a span of colour deviation $(\sigma_R, \sigma_G, \sigma_B)$ and a number of appearances $(v_k)$ of a colour pixel in this cluster, and based thereon: i) matching the colour pixel (R, G, B) with colour cluster k to confirm whether the same is within the span of colour deviation; ii) if the colour of the pixel does not match any cluster, creating a new colour cluster with mean value (R, G, B), a default chosen allowed threshold for deviation $(\sigma_\eta, \sigma_\eta, \sigma_\eta)$ and number of occurrences v = 1; iii) splitting the colour cluster (p) which has a large $(\sigma_R, \sigma_G, \sigma_B)$ value and merging all the colour clusters which have very close mean representative values, the probability of occurrence then being adjusted in the same ratio of the estimated colour clusters for that population, to thereby achieve finer granular colour matching.
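Steps i) and ii) of the colour analysis above can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary layout of a cluster, the default deviation of 10 per channel and the running-mean update on a match are hypothetical choices, and the split/merge step iii) is deliberately left out.

```python
def matches(pixel, cluster):
    """True if every channel lies within the cluster's span of deviation."""
    return all(abs(p - m) <= s
               for p, m, s in zip(pixel, cluster["mean"], cluster["sigma"]))


def accumulate(pixels, default_sigma=10.0):
    """Accumulate (R, G, B) pixels of a local window into colour clusters."""
    clusters = []
    for pixel in pixels:
        for cluster in clusters:
            if matches(pixel, cluster):
                n = cluster["count"]
                # Running mean update on a match (assumption, not in the claim).
                cluster["mean"] = tuple((m * n + p) / (n + 1)
                                        for m, p in zip(cluster["mean"], pixel))
                cluster["count"] = n + 1
                break
        else:
            # No cluster matched: create a new one with v = 1.
            clusters.append({"mean": tuple(float(p) for p in pixel),
                             "sigma": (default_sigma,) * 3,
                             "count": 1})
    return clusters


if __name__ == "__main__":
    for c in accumulate([(100, 100, 100), (102, 101, 99), (200, 10, 10)]):
        print(c)
```

The cluster with the highest count relative to the whole population would then be a natural candidate for the background colour at that pixel.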
An intelligent and unified method of colour object analysis as claimed in claim 60 wherein background frame construction comprises constructing a colour background reference frame from representative colour values of the generated clusters, such that if the matched colour cluster has a significantly high occurrence relative to the overall population occurrence then the representative colour of the colour cluster is used as the value of the colour pixel in the colour background reference frame.
An intelligent and unified method of colour object analysis as claimed in claim 60 wherein the removal of the shadow, glare and sensor generated noises comprises removal of shadow and glare in the background and/or foreground segmentation process for dynamic scenes involving image characteristic parameters.
An intelligent and unified method of colour object analysis as claimed in claim 62 wherein said image characteristic parameters comprise (1) median intensity (I) of the image, (2) a sharpness parameter (S) of the image.
An intelligent and unified method of colour object analysis as claimed in claim 63 wherein said sharpness parameter of the image is obtained as follows:
every row of the input frame is filtered with a high pass filter, and the average of the filtered values over the whole image is taken as the horizontal sharpness parameter $S_H$;
every column of the input frame is filtered with the same high pass filter, and the average of the filtered values over the whole filtered image is taken as the vertical sharpness parameter $S_V$;
the maximum of $S_H$ and $S_V$ is the sharpness parameter (S) of the image.
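A minimal Python sketch of the sharpness computation follows. The claim does not name the high pass filter, so the absolute first difference between neighbouring pixels is used here purely as an assumed stand-in; the function names are likewise hypothetical.

```python
def high_pass(line):
    """Assumed high pass filter: absolute first difference along a line."""
    return [abs(b - a) for a, b in zip(line, line[1:])]


def mean_response(lines):
    """Average filtered value over all lines (rows or columns)."""
    values = [v for line in lines for v in high_pass(line)]
    return sum(values) / len(values) if values else 0.0


def sharpness(image):
    """Sharpness S = max(S_H, S_V) of a 2D list of intensities."""
    s_h = mean_response(image)                          # filter every row
    s_v = mean_response([list(c) for c in zip(*image)])  # filter every column
    return max(s_h, s_v)


if __name__ == "__main__":
    # A vertical edge produces a horizontal response and no vertical one.
    edge = [[0, 0, 255, 255] for _ in range(4)]
    print(sharpness(edge))
```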
66. An intelligent and unified method of colour object analysis as claimed in claim 64 wherein a ratio V of said median intensity (I) and said sharpness parameter (S) is used to characterize the scene.
An intelligent and unified method of colour object analysis as claimed in claim 65 comprising (a) adaptive threshold value calculation based on the value V in every frame of each said parameter (b) measurement of change in pixel's characteristics and (c) identification and removal of shadow and glare with or without sensor generated noises based on the comparative details under (a) and (b) above.
68. An intelligent and unified method of colour object analysis as claimed in claim 66 comprising static foreground formation involving multi level hierarchical estimation and characterization of the static foreground pixels.
69. An intelligent and unified method of colour object analysis as claimed in claim 67 comprising segmenting the detected foreground regions using suitable image processing based object clustering methods and morphological techniques.
70. A method of face detection in video images and the like comprising the step of limiting the search space involving a motion detection technique and controlling computational requirements based on desired accuracy by carrying out prediction of the number of iterations and of a temporal parameter "t".
71. A method of face detection in video images as claimed in claim 70 comprising the steps of:
i) involving the grey image of the cropped motion rectangular area from the current frame to calculate said temporal parameter "t", updating "t" with history and calculating the possible number of iterations "nIterations";
ii) calculating scale factor, no. of iterations and other parameter from look up table;
iii) using convolution on different scaled images to get probable face rectangles;
iv) grouping the probable faces with spatial information ; and
v) obtaining therefrom the confirmed faces.
A method of face detection in video images as claimed in claim 70 comprising using the convolution on probable face regions with a Haar feature set to confirm faces and publishing the confirmed faces based thereon.
A method of face detection in video images as claimed in claim 71 comprising the step of carrying out said temporal estimation "t" and prediction of the possible number of iterations "nIterations" as follows:
i. Generating the time taken to detect a face for an image of size M×N based on
$$T_{MN} \approx t \cdot \frac{(M - m)\,(N - n)}{\text{pixelShift} \cdot \text{pixelShift}}$$
where pixelShift is the window shift size and t is the time taken to process a single window area (fixed window size m×n) with the standard feature set.
ii. For multi-scale processing, ScaleFactor = f(M, N, m, n, nIteration).
iii. Total time taken to detect faces, $T = \sum_i T_{M'N'}$, where $M' = M / \text{ScaleFactor}^{\,i}$ and $N' = N / \text{ScaleFactor}^{\,i}$.
iv. T = f(M, N, t, pixelShift, nIteration), for a fixed size window.
v. Calculating the average t on the host machine and tuning the parameters pixelShift and nIteration accordingly, using the generated lookup table, to suit the bandwidth; and
vi. Optionally, to increase the accuracy, enabling a second pass upon the probable face regions detected by the first pass.
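The time estimate above can be worked through with a short Python sketch. The function names, the fixed scale factor passed in by the caller and the early stop once the scaled image is smaller than the window are assumptions for illustration; the claim leaves ScaleFactor as an unspecified function f.

```python
def single_scale_time(M, N, m, n, t, pixel_shift):
    """Estimated time T_MN for one M x N image scanned with an m x n window."""
    return t * (M - m) * (N - n) / (pixel_shift * pixel_shift)


def total_time(M, N, m, n, t, pixel_shift, scale_factor, n_iteration):
    """Sum the per-scale estimates over n_iteration scales.

    At scale i the image is assumed to shrink to
    (M / scale_factor**i, N / scale_factor**i).
    """
    total = 0.0
    for i in range(n_iteration):
        M_i = M / scale_factor ** i
        N_i = N / scale_factor ** i
        if M_i < m or N_i < n:
            break  # window no longer fits (assumption)
        total += single_scale_time(M_i, N_i, m, n, t, pixel_shift)
    return total


if __name__ == "__main__":
    # Hypothetical numbers: 100x100 image, 20x20 window, t = 1 time unit.
    print(single_scale_time(100, 100, 20, 20, 1.0, 2))
    print(total_time(100, 100, 20, 20, 1.0, 2, 2.0, 2))
```

Doubling pixelShift cuts the estimate by roughly a factor of four, which is why pixelShift and nIteration are the natural tuning parameters when t is measured on the host machine.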
74. A method of resource allocation for analytical processing involving multi channel environment comprising : estimating scene complexity relevant for frequency of frame processing ; spawning of processor threads based on physical CPU cores involving a controller;
allocation of threads to video channels for analytical processing based on requirements; and feeding the frames for processing to a video analytics engine at an fps F, where F is calculated dynamically by the analytics engine itself depending upon its processing requirements based on scene complexity to thereby favour optimal sharing of resources eliminating unnecessary computing.
75. A method as claimed in claim 74 wherein the scene complexity is calculated based on (a) inter class difference of foreground and background (b) number of objects present and (c) extent of processing based on the particular processing task.
76. A method as claimed in claim 74 wherein a Controller module spawns a number of processing threads depending on the number of CPU cores present as available from the system hardware information and a task scheduler module generates the sequence indicating the order in which the individual channels are to be served for analytics tasks.
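The controller and task scheduler of claims 74 to 76 might be sketched as below. Everything here is hypothetical: the round-robin serving order, the linear mapping from a complexity score in [0, 1] to a frame rate and the fps limits are illustrative assumptions, since the claims do not fix the scheduling policy or the formula for F.

```python
import os


def plan_threads(channels, cores=None):
    """Assign channels to one worker per CPU core in round-robin order."""
    cores = cores or os.cpu_count() or 1
    plan = {worker: [] for worker in range(cores)}
    for index, channel in enumerate(channels):
        plan[index % cores].append(channel)
    return plan


def frame_rate(complexity, min_fps=1.0, max_fps=25.0):
    """Map a scene complexity score in [0, 1] to a processing fps F."""
    c = min(1.0, max(0.0, complexity))
    return min_fps + c * (max_fps - min_fps)


if __name__ == "__main__":
    print(plan_threads(["cam1", "cam2", "cam3", "cam4", "cam5"], cores=2))
    print(frame_rate(0.5))
```

A quiet scene is then analysed at a low frame rate, leaving processing time for busier channels served by the same worker.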
A system for a multi channel join-split mechanism adapted for a low and/or variable bandwidth network link comprising: a sender unit adapted to receive multi channel inputs from a site, and to join and compress them into a single channel; and a receiver unit at the client site to receive the inputs and extract the individual channels for the purposes of end use, said sender unit being adapted to combine, while transmitting, the multi channel inputs into a single channel, frame by frame, and to control the transmission bit rate to avoid jittery outputs and/or any interference between individual channels and/or starvation for any single channel.
A system as claimed in claim 77 adapted for intelligent data compression without affecting the decoding process.
A system as claimed in claim 77 wherein said compression is intelligently controlled such that no motion vector crosses over the inter-frame boundary in the combined frame.
A system as claimed in claim 77 comprising means for encoding the stream with variable bit rate depending upon the available bandwidth from server to the client, a frame header is transmitted with each frame of the combined stream, said frame header containing meta data about the constituent streams, said receiver unit adapted to split the combined stream into constituent streams based on said frame header.
A system as claimed in claim 79 wherein the sender unit is adapted to receive raw inputs or decode the inputs to raw input and store in memory allocated for inputs from a defined channel and generate an initial fps on request from a client, on request of a subset of channel from the client, a sample module is adapted to take the current frame from the channel specific memory area at a fixed rate for those channels and combines to a single frame along with generation of a look-up table to store the channel ID and its boundary within the combined frame and finally compressed and checked to
identify all motion vectors which cross the allocated inter-frame boundary and forcibly set all such motion vectors to null to ensure that the video content of one constituent frame within the combined frame does not interfere with the content of another constituent frame, a frame header being composed with meta data information about the position of the individual channel frames within the combined frame, the resolution of the individual frames and the time stamp; said receiver unit is adapted to open a TCP connection with the sender and request all or selected channels, including selectively specifying the format for compression, additional commands to get the existing channel information, the resolution of the channels, the fps of the individual channels at the sender's end and other inputs directed to specifying the channels of interest and specifying other parameters such as the transmitting fps (f), initial bit rate etc.
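The look-up table of channel boundaries and the nulling of boundary-crossing motion vectors can be illustrated with the Python sketch below. The grid layout with equal tiles, the 16-pixel block size and the tuple formats are assumptions made for the example; a real encoder exposes motion vectors in its own codec-specific form.

```python
def build_layout(channel_ids, tile_w, tile_h, columns):
    """Look-up table: channel ID -> (x0, y0, x1, y1) inside the combined frame."""
    layout = {}
    for index, channel in enumerate(channel_ids):
        row, col = divmod(index, columns)
        x0, y0 = col * tile_w, row * tile_h
        layout[channel] = (x0, y0, x0 + tile_w, y0 + tile_h)
    return layout


def null_crossing_vectors(vectors, layout, block=16):
    """Set to (0, 0) every motion vector whose referenced block leaves its tile.

    vectors is a list of (bx, by, dx, dy): block top-left corner and vector.
    """
    result = []
    for bx, by, dx, dy in vectors:
        x0, y0, x1, y1 = next(r for r in layout.values()
                              if r[0] <= bx < r[2] and r[1] <= by < r[3])
        inside = (x0 <= bx + dx and bx + dx + block <= x1 and
                  y0 <= by + dy and by + dy + block <= y1)
        result.append((bx, by, dx, dy) if inside else (bx, by, 0, 0))
    return result


if __name__ == "__main__":
    layout = build_layout(["a", "b"], 64, 64, 2)
    print(layout)
    print(null_crossing_vectors([(48, 0, 8, 0), (0, 0, 4, 4)], layout))
```

The same look-up table, carried in the frame header, is what would let the receiver crop each constituent channel back out of the combined frame.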
82. A system as claimed in claim 79 wherein said receiver unit is further adapted to calculate receiving bit rates based on averages and to request a target bit rate from the sender unit, a bit rate controller at the server end being adapted to prepare the encoder for the new bit rate, flush the transmission queue and respond to the client with the new bit rate as set.
83. A system for enhanced object tracking comprising :
object tracking means in conjunction with one or more PTZ cameras wherein when an object is first detected in a fixed camera view of the said object tracking means the same is adapted to track the object and also generate and transmit the positional values along with a velocity prediction data to the PTZ camera controller;
said PTZ camera controller adapted to receive the positional information of the object in the PTZ camera view involving scene registration and coordinate transformation technique.
84. A system for enhanced object tracking as claimed in claim 83 wherein more than one object is tracked involving multiple PTZ cameras such as to cover a
wider range in the scene and to enhance multiple object tracking over a single framework.
A system for enhanced object tracking as claimed in claim 83 wherein said means of coordinate transformation from fixed camera view to PTZ camera view involves coordinate transformation technique comprising weighted interpolation method.
A system for enhanced object tracking as claimed in claim 84 which is adapted to carry out said coordinate transformation as follows: a. identifying, by the user, a set of points in the static camera view as A, B, etc. and the corresponding points A', B', etc. respectively in the PTZ camera view; b. mapping any arbitrary point C in the static camera view to the corresponding point C' in the PTZ camera view dynamically, wherein $A_x$, $B_x$, $C_x$ are the x-coordinates of points A, B and C respectively in the static camera view and similarly $A'_x$, $B'_x$ and $C'_x$ are those of the corresponding points in the PTZ view, and wherein the x-coordinate of C' as interpolated with the help of points A and B, with a confidence factor $W_{AB}$ where
$$W_{AB} = \frac{A_x - B_x}{\min(C_x - B_x,\; C_x - A_x)},$$
is determined to be
$$C'_{xAB} = B'_x + \frac{(A'_x - B'_x)\,(C_x - B_x)}{A_x - B_x},$$
and wherein similarly an estimate of the x-coordinate of the same point C' is generated for all pairs of points (A, B) in the static camera view based on
$$C'_x = \frac{\sum C'_{xAB}\, W_{AB}}{\sum W_{AB}},$$
and similarly generating also the y-coordinate $C'_y$ for the point C'.
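The weighted interpolation can be tried out with the Python sketch below, which handles one coordinate axis at a time. It applies the two formulas literally; skipping pairs whose denominators are zero and raising an error when no usable pair remains are assumptions added so that the example runs safely, and the function name is hypothetical.

```python
from itertools import combinations


def interpolate_coordinate(static_pts, ptz_pts, c):
    """Map static-view coordinate c to the PTZ view by weighted interpolation.

    static_pts[i] and ptz_pts[i] are the same axis coordinate of the i-th
    user-identified reference point in the static and PTZ views.
    """
    numerator = denominator = 0.0
    for (a, a_ptz), (b, b_ptz) in combinations(zip(static_pts, ptz_pts), 2):
        nearest = min(c - b, c - a)
        if a == b or nearest == 0:
            continue  # formulas undefined for this pair (assumption: skip it)
        weight = (a - b) / nearest                              # W_AB
        estimate = b_ptz + (a_ptz - b_ptz) * (c - b) / (a - b)  # C'_xAB
        numerator += estimate * weight
        denominator += weight
    if denominator == 0:
        raise ValueError("no usable pair of reference points")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical reference points related by x' = 2x + 10.
    print(interpolate_coordinate([0, 10, 20], [10, 30, 50], 5))
```

To map a bounding rectangle, the same function would be called for the x and y coordinates of each of its four corners.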
A system for enhanced object tracking as claimed in claim 85 wherein for a bounding rectangle to be mapped from the static view to the PTZ view, the
system is adapted to apply said coordinate transformation technique for all the four corner points of the rectangle.
88. A system for enhanced object tracking as claimed in claim 86 wherein the bounding rectangle corresponding to an object in the static camera view is associated with velocity prediction information, the system is adapted to apply that velocity prediction information to map the rectangle in the PTZ camera view.
89. An intelligent automated traffic enforcement system comprising :
a video surveillance system adapted to localize one or more number plates / license plates of vehicles, stationary or in motion, in the field of view of at least one camera without requiring the number plate to be fixed at a fixed location of the car, wherein the license plate can be reflective or non-reflective, independent of font and language, and using a normal security camera, and filtering out other texts from the field of view not related to the number plate, enabling processing of the localized number plate region with any Optical Character Recognition, and generating localized information of the number plate with or without other relevant composite information of the car (type, possible driver snapshot, shape and contour of the vehicle) in parallel to monitoring traffic; and an intelligent video analytical application for event detection based on the video feeds.
90. An intelligent automated traffic enforcement system as claimed in claim 89 wherein the process localizes a possible license plate in the field of view of the camera by (a) statistically analysing the correlation and relative contrast between the number plate content region and the background region surrounding this content, (b) a unique signature of the number plate content based on pixel intensity and its vertical and horizontal distribution, (c) colour features of the content and of the surrounding background.
91. An intelligent automated traffic enforcement system as claimed in claim 89 wherein said video analytic process is carried out in the sequence involving (a) configuration means ( b) incident detection means (c ) incident audit
means (d ) reporting generation means (e) synchronization means and (f) user management means.
An intelligent automated traffic enforcement system as claimed in claim 89 wherein said configuration means adapted to configure parameters for incident detection and management comprises (i) camera configuration means (ii) means for providing for virtual loops in regions where monitoring is required (iii) means for setting time limits for the monitoring activity (iv) means providing feed indicative of regular traffic moving directions for each camera (v) means providing for setting speed limits to detect over speeding vehicles (vi) means for setting the sensitivity and duration determining traffic abnormality and congestion.
An intelligent automated traffic enforcement system as claimed in claim 89 wherein said incident detection means is adapted to detect deviations from set parameters, analyze appropriate video feed and check for offence involving (a) recording by way of saving video feeds from various traffic locations of interest (b) generating alarm including alerts and/or notifications visual and/or sound based on any incident detection involving traffic violation and (c ) registering the incident against the extracted corresponding license plate number of the violating vehicle.
An intelligent automated traffic enforcement system as claimed in claim 89 wherein said incident audit means comprises:
Filter means adapted to reach to the incident if incident is an archived incident and in case of live incident means for viewing the details;
Means for generating details of the incident, a link to incident video and a link to license plate image of the vehicle;
Means for verification of the incident by playing the video and of the vehicle's registration number by viewing the license plate image and, if the license plate number is incorrect, means to enter the correct vehicle number of the incident image;
Means for updating incident status changed from "Pending"/"Acknowledged" to "Audit" and saving into the database.
Means to enter remark about the action taken while auditing the incident and finally the remark is saved in the database with possible re-verification for future reference.
95. An intelligent automated traffic enforcement system as claimed in claim 89 wherein said incident reporting means comprises means for automatized generation of incident detail reports and incident summary report and generation of offence report.
96. An intelligent automated traffic enforcement system as claimed in claim 89 wherein said synchronization means includes means adapted for synchronization with handheld device applications.
97. An intelligent automated traffic enforcement system as claimed in claim 89 wherein said user management means includes interface for administrative functions including (a) user creation and management (b) privilege assignment and (c) master data Management.
98. A computer readable medium adapted for enabling and operating an integrated intelligent sensory input/data acquisition cum recording server group and/or analytics server group adapted to facilitate fail-safe integration without involving any dedicated failover server and central control and/or optimised utilization of various sensory inputs for various utility applications comprising at least one autonomous system having :
I) A) said sensory input acquisition cum recording server group comprising plurality of acquisition cum recording servers which are operatively linked to assess respective server capacity and operate as a group to enable fail-safe support when any of the servers in the group fail to operate the remaining operative servers in the group are adapted to distribute and take over the sensory input load of the non-operative server/s to render the system fail safe and self sufficient ;
and/or B) said analytics server group comprising plurality of analytics server for intelligent analysis including resource dependent analytical accuracy control including means adapted for computing complexity of scenes and establishing correlation amongst multiple types of sensory data obtained from the various sensory inputs and dynamically reconfigure the analytical processing steps for optimal analysis and/or availability of computational and other resources for on-line and realtime and/or on demand for efficient and user friendly streaming/analysis/detection/alert generation of events and/or follow up actions; and
an intelligent interface for operative connection to said sensory input acquisition cum recording server group; and/or said analytics server group.
A computer readable medium adapted for enabling and operating a method for cost-effective and efficient transferring/recording of sensory data from single or multiple data sources to network accessible storage devices comprising : at least one sensory data recording server adapted to record inputs from single/multiple data sources in at least one local storage space with the URL of the files stored in a database;
transferring the thus stored files from said local storage to a network based central storage provided for accessing the files for end use/applications, said transfer of sensory data from source to the central storage via said local storage being carried out taking into consideration the data download speed(inflow rate) from data source to server along with the availability of network bandwidth at any given point of time for efficient network bandwidth sharing amongst multiple data sources to said storage device in the network.
100. A computer readable medium adapted for enabling and operating a method for sensory input recording and live streaming in a multi-server environment comprising : a fail-safe server group without involving any dedicated failover server and/or central control
Each said server group comprising plurality of acquisition cum recording servers
said multiple recording servers adapted to exchange information amongst one another and left over capacity of each server is known along with the channel information of every other server such that in case of any server failure in said server group the remaining active servers in the server group automatically distribute the required operative load amongst the remaining operative servers for a fail safe recording and streaming of the sensory data.
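The fail-safe redistribution described above might look like the following Python sketch. The greedy policy of handing each orphaned channel to the surviving server with the most leftover capacity, the dictionary data model and the function name are all hypothetical; the claim only requires that the remaining servers share the load using the exchanged capacity and channel information.

```python
def redistribute(capacity, channels, failed):
    """Reassign the failed server's channels among the surviving servers.

    capacity: server name -> maximum number of channels it can record.
    channels: server name -> list of channel IDs it currently records.
    Returns a new channel map that no longer contains the failed server.
    """
    survivors = {name: list(chs) for name, chs in channels.items() if name != failed}
    for channel in channels.get(failed, []):
        # Pick the survivor with the most leftover capacity (greedy assumption).
        target = max(survivors, key=lambda name: capacity[name] - len(survivors[name]))
        if capacity[target] - len(survivors[target]) <= 0:
            raise RuntimeError("no leftover capacity for channel %s" % channel)
        survivors[target].append(channel)
    return survivors


if __name__ == "__main__":
    capacity = {"s1": 4, "s2": 4, "s3": 4}
    channels = {"s1": ["a", "b"], "s2": ["c"], "s3": ["d", "e"]}
    print(redistribute(capacity, channels, "s3"))
```

Because every server is assumed to hold the same capacity and channel information, each survivor could run this computation locally and reach the same assignment without a central controller.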
101. A computer readable medium adapted for enabling and operating an intelligent and unified method of multiple component colour object analysis in a scene favouring scene analytic applications comprising :
Multiple component colour coherent background estimation involving colour correlation of neighbouring pixels and inter-frame multiple component colour correlation using said multiple components as a composite data and using the relative values of these components to maintain accurate colour information and appearance of the true colour in the estimated background frame.
102. A computer readable medium adapted for enabling and operating a method of face detection in video images and the like comprising the step of limiting the search space involving a motion detection technique and controlling computational requirements based on desired accuracy by carrying out prediction of the number of iterations and of a temporal parameter "t".
103. A computer readable medium adapted for enabling and operating a method of resource allocation for analytical processing involving multi channel environment comprising :
estimating scene complexity relevant for frequency of frame processing ; spawning of processor threads based on physical CPU cores involving a controller;
allocation of threads to video channels for analytical processing based on requirements; and feeding the frames for processing to a video analytics engine at an fps F, where F is calculated dynamically by the analytics engine itself depending
upon its processing requirements based on scene complexity to thereby favour optimal sharing of resources eliminating unnecessary computing.
104. A computer readable medium adapted for enabling and operating a system for multi channel join-split mechanism adapted for low and /or variable bandwidth network link comprising : a sender unit adapted to receive multi channel inputs from site to join and compress into a single channel and a receiver unit at the client site to receive the inputs and extract the individual channels for the purposes of end use said sender unit adapted to combine while transmitting multi channel inputs into a single channel, frame by frame, and controlling the transmission bit rate to avoid jittery outputs and/or any interference between individual channels and/or starvation for any single channel .
105. A computer readable medium adapted for enabling and operating a system for enhanced object tracking comprising :
object tracking means in conjunction with one or more PTZ cameras wherein when an object is first detected in a fixed camera view of the said object tracking means the same is adapted to track the object and also generate and transmit the positional values along with a velocity prediction data to the PTZ camera controller;
said PTZ camera controller adapted to receive the positional information of the object in the PTZ camera view involving scene registration and coordinate transformation technique.
106. A computer readable medium adapted for enabling and operating an intelligent automated traffic enforcement system comprising : a video surveillance system adapted to localize one or more number plates / license plates of vehicles, stationary or in motion, in the field of view of at least one camera without requiring the number plate to be fixed at a fixed location of the car, wherein the license plate can be reflective or non-reflective, independent of font and language, and using a normal security camera, and filtering out other
texts from the field of view not related to the number plate, enabling processing of the localized number plate region with any Optical Character Recognition, and generating localized information of the number plate with or without other relevant composite information of the car (type, possible driver snapshot, shape and contour of the vehicle) in parallel to monitoring traffic; and an intelligent video analytical application for event detection based on the video feeds.