CN109344285B - Monitoring-oriented video map construction and mining method and equipment - Google Patents


Info

Publication number
CN109344285B
Authority
CN
China
Prior art keywords
video
monitoring
monitored
monitored object
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811058356.5A
Other languages
Chinese (zh)
Other versions
CN109344285A (en)
Inventor
邹复好
李开
周檬
刘鹏坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Meitong Technology Co ltd
Original Assignee
Wuhan Meitong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Meitong Technology Co ltd filed Critical Wuhan Meitong Technology Co ltd
Priority to CN201811058356.5A priority Critical patent/CN109344285B/en
Publication of CN109344285A publication Critical patent/CN109344285A/en
Application granted granted Critical
Publication of CN109344285B publication Critical patent/CN109344285B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a monitoring-oriented video map construction and mining method and equipment. A graph database and a key-value database serve as the persistent storage of the knowledge map, and a deep convolutional neural network performs face recognition automatically, greatly reducing the manual review workload. A shared convolutional neural network and a deep deconvolution network, combined with automatic attribute labeling of pedestrian objects, convert unstructured video data into structured graph data for storage. The method meets security needs such as person retrieval, trajectory mining, and group-center mining based on reviewing surveillance videos; it can efficiently query the attribute information of nodes and efficiently, visually display the associations between them. The main-information and detailed-information database storage modes each play to their strengths, so data storage and retrieval are efficient.

Description

Monitoring-oriented video map construction and mining method and equipment
Technical Field
The embodiment of the invention relates to the technical field of computer vision, and in particular to a monitoring-oriented video map construction and mining method and device.
Background
With the development of technology, intelligent security has entered many aspects of daily life: residential districts, roads, motor vehicles, banks, and schools are all equipped with surveillance cameras. As intelligent security becomes digitized and large-scale, the volume of surveillance video keeps growing, and the data are large, unstructured, and hard to extract value from. Developing a framework that automatically structures surveillance videos is therefore of great practical significance for big-data analysis of surveillance video.
The knowledge graph is a key branch of artificial intelligence, widely applied in intelligent search, intelligent question answering, personalized recommendation, content distribution, and other fields. There are currently two ways to construct a knowledge graph: top-down and bottom-up. Either way, a knowledge base containing a large amount of knowledge must be built as the structural basis of the graph. Most existing knowledge bases are built on text, a few on pictures, and knowledge bases built on surveillance video are almost blank. A video map stores key information from the original video, such as people, vehicles, and objects, in a more compact form, converting operations on the raw video into operations on the video map and providing a stable, feasible basis for subsequent data analysis and mining. It is therefore valuable to evolve the knowledge graph into a video map oriented toward surveillance video.
Existing relational databases struggle with this mass of unstructured surveillance video data, and relying on relational tables to maintain the information and support subsequent data analysis gives unsatisfactory results.
Disclosure of Invention
Embodiments of the present invention provide a method and apparatus for building and mining a surveillance-oriented video map, which overcome the above problems or at least partially solve the above problems.
According to a first aspect of the embodiments of the present invention, there is provided a monitoring-oriented video map building and mining method, including:
acquiring the same monitored object in different monitored videos based on re-identification, and acquiring attribute information of the monitored object;
storing the object number of the monitored object in a graph database, storing the object number of the monitored object as the row key of a key-value database, storing the attribute information of the monitored object and the corresponding monitored video number under sub-keys of the key-value database, and establishing a monitored-object video map;
and acquiring the track information of the monitored object based on the video map of the monitored object.
Preferably, the monitored object is any object in the monitored video, including a pedestrian and a vehicle.
Preferably, in step S1, the acquiring the same monitored object in different monitored videos based on re-identification specifically includes:
preprocessing all monitoring videos based on a computer cluster, dividing a monitoring video stream into a picture stream according to frames, and acquiring a unique feature vector of a monitoring object in the picture stream containing complete monitoring object information;
and establishing a segment hash index based on the unique characteristic vector, re-identifying the monitored object based on the segment hash index, acquiring the monitored video information containing the monitored object, and obtaining the entering time and the leaving time of the monitored object in the monitored video.
Preferably, the establishing of the segment hash index based on the unique feature vector and the re-identification of the monitored object based on the segment hash index specifically include:
converting the unique feature vector into a hash code, and equally dividing the hash code into m segments, wherein the length of each segment is b/m bits, where b is the length of the hash code;
and inquiring candidate hash codes with the Hamming distance r from the hash codes in the m segments, and judging whether the objects corresponding to the candidate hash codes are the same as the monitored objects or not based on a trained combined Bayesian classifier.
Preferably, in step S1, the obtaining of the attribute information of the monitored object specifically includes:
and performing semantic segmentation and attribute labeling on the monitored object based on the trained neural network model to acquire attribute information of the monitored object.
Preferably, before storing the unique number information of the monitored object in the graph database, the method further includes:
and searching based on the unique feature vector and the attribute information to find whether a record of the same object exists; if so, taking the object number of that record as the number of the monitored object, and if not, allocating a new object number.
Preferably, the step S3 specifically includes:
based on the object number of the monitored object, acquiring the number of the monitored video from the key value type database, and acquiring the entering time and the leaving time of the monitored object in each monitored video;
and taking the mean value of the entering time and the leaving time as the occurrence time of the monitored object in the corresponding monitored video, acquiring the occurrence time of the monitored object in each monitored video, sequencing the occurrence time, and acquiring the track information of the monitored object based on the address of the camera corresponding to each monitored video.
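The trajectory-mining step above can be sketched as follows. The data layout is a hypothetical simplification: `kv_store` maps an object number to per-video (enter time, leave time) pairs, and `camera_addresses` maps a video number to its camera's location.

```python
def mine_trajectory(object_id, kv_store, camera_addresses):
    """Sketch of step S3: for each video the object appears in, take the mean
    of its enter/leave times as the occurrence time, sort by that time, and
    return the camera addresses in order (data layout is illustrative)."""
    appearances = []
    for video_no, (t_in, t_out) in kv_store[object_id].items():
        occurrence = (t_in + t_out) / 2           # mean of enter/leave time
        appearances.append((occurrence, camera_addresses[video_no]))
    appearances.sort()                            # order by occurrence time
    return [addr for _, addr in appearances]      # ordered camera addresses
```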
Preferably, the method further comprises the step of S4, performing suspicious group center mining based on the monitoring object video map:
s401, all monitoring object nodes p connected with each monitoring video node in the graph database0、p1…pnAnd the nodes are sorted from small to large according to the average value of the entering time and the leaving time, and the sorted nodes are
Figure BDA0001796398260000031
S402, the node sequence obtained in step S401 is shifted left by one position and paired with the original sequence to form a node-pair sequence: (p(i0), p(i1)), (p(i1), p(i2)), ..., (p(i(n-1)), p(in));
inputting the node-pair sequence into memory for computation, and for each pair of nodes whose means of entering and leaving times differ by less than a threshold K1, recording the pair of node numbers (p(a), p(b)) as a key-value pair with key (p(a), p(b)) and value 1;
S403, repeating the process of step S402 for each monitored video and merging all key-value pairs; if keys are the same, their values are added, and the keys of the key-value pairs whose value is greater than a threshold K2 are output;
S404, establishing an adjacency matrix for the related monitored-object group obtained in step S403 and counting the degree of each node; the node with the largest degree is the central node of the monitored-object group.
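Steps S401-S404 can be sketched as follows, assuming each video's visits are already available as (node id, mean of enter/leave time) pairs; the function and variable names, and the use of in-memory dictionaries in place of the databases, are illustrative rather than the patent's implementation.

```python
from collections import Counter

def mine_group_center(video_visits, k1, k2):
    """Sketch of S401-S404: video_visits maps each video id to a list of
    (node_id, mean_time) pairs; returns the group's central node."""
    pair_counts = Counter()
    for visits in video_visits.values():
        # S401: sort nodes by the mean of their enter/leave times
        ordered = sorted(visits, key=lambda v: v[1])
        # S402: pair each node with its successor (sequence shifted left by one)
        for (a, ta), (b, tb) in zip(ordered, ordered[1:]):
            if abs(tb - ta) < k1:          # co-occurrence within threshold K1
                pair_counts[frozenset((a, b))] += 1
    # S403: keep node pairs seen together more than K2 times
    related = [p for p, c in pair_counts.items() if c > k2]
    # S404: degree of each node in the relationship graph; max degree = center
    degree = Counter()
    for pair in related:
        for node in pair:
            degree[node] += 1
    return max(degree, key=degree.get) if degree else None
```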
Preferably, the method further comprises the step of S5, displaying the video map of the monitoring object, the track information of the monitoring object and the suspicious group center mining information on a webpage.
According to a second aspect of the embodiments of the present invention, there is provided a surveillance-oriented video atlas construction and mining apparatus, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the surveillance-oriented video atlas construction and mining method according to the first aspect of the embodiments of the present invention when executing the program.
The embodiment of the invention provides a monitoring-oriented video atlas construction and mining method and equipment. A graph database and a key-value database are constructed as the persistent storage of the knowledge map; a deep convolutional neural network recognizes human faces automatically, greatly reducing the manual review workload; and a shared convolutional neural network and a deep deconvolution network, combined with automatic attribute labeling of pedestrian objects, convert unstructured video data into structured graph data for storage. The method satisfies security requirements such as person retrieval, trajectory mining, and group-center mining based on reviewing surveillance videos; node attribute information can be queried efficiently, and the associations between nodes can be displayed efficiently and visually. The main-information and detailed-information database storage modes each play to their strengths, so data storage and retrieval are efficient.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a monitoring-oriented video graph construction and mining method according to an embodiment of the present invention;
fig. 2 is an effect display diagram of a video map of a monitored object according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating pedestrian re-identification based on segment hash indexing according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating attribute labeling learning according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of video map construction according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of track mining according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
With the rapid development of monitoring technology, video surveillance has become an important component of security systems, and demand for it keeps expanding across society: it is moving from traditional public security and banking into traffic, venues, communities, campuses, housing, and other fields, and is gradually integrating closely with the telecommunications and IT industries, which brings new technical means to video surveillance and provides a template for integrated video surveillance solutions. For example, video monitoring of streets, shopping malls, and other public places allows emergencies to be discovered and handled in real time, and stored surveillance videos can be reviewed retrospectively to find the corresponding details.
The Knowledge map (also called the scientific Knowledge map) originates in library and information science as a visualization of a knowledge domain: a family of graphs that display the development process and structure of knowledge, describe knowledge resources and their carriers with visualization technology, and mine, analyze, construct, draw, and display knowledge and the relations among its elements. Specifically, the knowledge graph combines theories and methods from mathematics, graphics, information visualization, and information science with citation analysis, co-occurrence analysis, and similar methods, using visual graphs to vividly display the core structure, development history, frontier fields, and overall framework of a discipline. It presents complex knowledge fields through data mining, information processing, knowledge measurement, and graph drawing, reveals their dynamic development, and provides practical, valuable references for research; it is therefore valuable to develop the knowledge map into a video map oriented toward surveillance video.
The essence of the video map is data modeling and data analysis on massive unstructured surveillance video data. The videos contain various entities such as pedestrians, vehicles, and articles; the number of entities is huge (taking pedestrians as an example, in a busy place with high pedestrian traffic, a single surveillance camera can capture tens of thousands of pedestrians in one day); and the relationships between entities are varied, including relationships between pedestrians and surveillance cameras and among pedestrians themselves. Existing relational databases struggle with this mass of unstructured surveillance video data, and relying on relational tables to maintain the information and support subsequent data analysis gives unsatisfactory results.
Aiming at the defects in the prior art, embodiments of the invention use a graph database and a key-value database as the persistent storage of the knowledge map in a monitoring-oriented video map construction and mining method and equipment. By combining the storage modes of the two databases, the video map can be built quickly and automatically, and intelligent data mining and data analysis can be performed on top of it. The invention is described below through several embodiments.
The embodiment provides a video map construction and mining method facing monitoring, as shown in fig. 1, including:
s1, acquiring the same monitored object in different monitored videos based on re-identification, and acquiring attribute information of the monitored object;
S2, storing the object number of the monitored object in a graph database, storing the object number of the monitored object as the row key of a key-value database, storing the attribute information of the monitored object and the corresponding monitored video number under sub-keys of the key-value database, and establishing a monitored-object video map;
and S3, acquiring the track information of the monitored object based on the monitored object video map.
In this embodiment, a definition of a video map for a surveillance video is proposed, where the video map includes three basic elements: the video graph comprises nodes, edges and attributes, wherein the nodes of the video graph represent entity objects, the entity objects can be pedestrians, vehicles or other moving objects, the edges represent the relationship between the two entity objects, and the nodes and the edges contain the respective attributes. The method provided by the embodiment of the invention can efficiently inquire the attribute information of the nodes by using the storage mode of the graph database, can efficiently and visually display the association relation between the nodes, and can exert respective advantages by using the storage modes of the main information database and the detailed information database, so that the data storage and retrieval efficiency is high.
Taking a pedestrian object as an example, the technical solution adopted is as follows: each person appearing under a camera is taken as a node of the graph; the face features extracted by deep learning, the person's attributes, and the person's appearance and leaving times are stored as the attributes of the person node in the main-information and detailed-information databases; the large number of person nodes and the existing camera nodes form the nodes of the graph, and each appearance of a person corresponds to an edge between a person node and a camera node. Graph-based mining algorithms can then be converted into graph computations on the graph. The effect of the monitored-object video map is shown in fig. 2.
On the basis of the above embodiment, acquiring the same monitored object in different monitored videos based on re-identification specifically includes:
preprocessing all monitoring videos based on a computer cluster, dividing a monitoring video stream into a picture stream according to frames, and acquiring a unique feature vector of a monitoring object in the picture stream containing complete monitoring object information;
and establishing a segment hash index based on the unique characteristic vector, re-identifying the monitored object based on the segment hash index, acquiring the monitored video information containing the monitored object, and obtaining the entering time and the leaving time of the monitored object in the monitored video.
In this embodiment, the object to be recognized is, for example, a pedestrian or a vehicle; the unique feature vector of a pedestrian is a face feature (iris features, facial features, and the like), and the unique feature vector of a vehicle may be its license plate. Specifically, taking a pedestrian as an example (the pedestrian in the following embodiments may also be replaced by a movable object such as a vehicle), the re-identification includes:
s11, inputting each path of monitoring video into a cluster consisting of a plurality of computers, performing video preprocessing, detecting the faces of pedestrians in the monitoring video and extracting face features;
s12, establishing a segmented index by using the face features obtained in the step S11, and carrying out pedestrian re-identification based on the segmented hash index;
specifically, in this embodiment, MTCNN (Multi-task convolutional neural network) is adopted as a face detection method, and key frames in a video are input to obtain face positions in video frames, and the face is aligned and corrected. The adopted face feature extraction network inputs aligned and corrected faces, the result of the last full connection layer of the network model is taken as the real-value feature of a face picture, if the face picture is a vehicle, the face picture can be corrected by the sum of the unique identification features corresponding to the vehicle, such as a license plate, and the face processing is replaced by the license plate processing.
Specifically, in this embodiment, the step S11 specifically includes:
S111, distributing all monitored video streams to be processed across a computer cluster composed of a plurality of computers, with the video streams evenly distributed to each computer by number of surveillance cameras; dividing the video streams into picture streams by frame, taking every 30th frame as a key frame and the rest as non-key frames;
S112, inputting the key frames obtained in step S111 into the MTCNN network and judging whether a final face window is produced in the output; if not, the recognition ends; if so, extracting the final face window image, correcting and aligning the face to the center, and storing the corrected face window image at a specified resolution;
and S113, extracting the face picture and the real-value feature vector of the mirror image of the picture by using a face feature extraction network.
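The key-frame sampling of step S111 can be illustrated with a minimal sketch; the frames here are placeholders for decoded images, and the detection and feature-extraction stages of S112-S113 are omitted since they depend on the trained MTCNN and feature networks.

```python
def split_key_frames(frames, interval=30):
    """Sketch of S111: every 30th frame of a decoded video stream is treated
    as a key frame; the rest are non-key frames. `frames` may be any sequence
    of decoded images (placeholders suffice for illustration)."""
    return [f for i, f in enumerate(frames) if i % interval == 0]
```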
On the basis of the foregoing embodiments, establishing a segment hash index based on the unique feature vector, and performing re-identification on a monitored object based on the segment hash index specifically includes:
converting the unique feature vector into a hash code, and equally dividing the hash code into m segments, wherein the length of each segment is b/m bits, and b is the length of the hash code;
and inquiring candidate hash codes with the Hamming distance r from the hash codes in the m segments, and judging whether the objects corresponding to the candidate hash codes are the same as the monitored objects or not based on a trained combined Bayesian classifier.
Because the embodiment of the present invention includes multiple cameras, and the number of pedestrians in each camera's surveillance video grows enormously over time, a pedestrian re-identification technique based on segment indexes is used. By building the correspondence between face features and pedestrians offline, segment by segment, the segment index greatly speeds up checking whether the database contains a target face. Step S12 mainly includes the following steps:
s121, converting the unique characteristic vector obtained in the step into a hash code;
S122, assuming the hash code y obtained in step S121 is b bits long, dividing it into m segments, each of length ⌊b/m⌋ or ⌈b/m⌉ bits;
s123, a rough matching stage is carried out, and all hash codes with the Hamming distance r from the hash code y in the step S122 are inquired in the m sections;
s124, combining the m sections of query results to form a candidate set;
S125, performing the exact-matching stage on the candidate set: assuming x1 and x2 respectively represent the real-valued features of the two pedestrian objects to be verified, a joint Bayesian classifier judges whether the two images show the same pedestrian.
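The rough-matching stage (S122-S124) can be sketched as a multi-index lookup over the m segments. This toy version uses bit strings as hash codes and performs only exact per-segment lookups; by the pigeonhole principle, any code within Hamming distance m − 1 of the query must agree exactly on at least one segment, so the union of the m per-segment lookups contains every such code. The structures and names are illustrative.

```python
from collections import defaultdict

def build_segment_index(codes, m):
    """Offline step: cut each b-bit code into m segments and build one
    inverted index per segment, mapping segment value -> ids holding it."""
    b = len(next(iter(codes.values())))
    seg_len = b // m
    index = [defaultdict(set) for _ in range(m)]
    for pid, code in codes.items():
        for s in range(m):
            index[s][code[s * seg_len:(s + 1) * seg_len]].add(pid)
    return index, seg_len

def query_candidates(index, seg_len, code):
    """Rough matching (S123-S124): union of exact matches on each of the m
    segments; the result is the candidate set for the exact-matching stage."""
    candidates = set()
    for s, seg_index in enumerate(index):
        candidates |= seg_index.get(code[s * seg_len:(s + 1) * seg_len], set())
    return candidates
```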
Specifically, in the present embodiment, when two objects belong to the same pedestrian, hypothesis HS holds; otherwise, when the two objects do not belong to the same pedestrian, hypothesis HI holds. Given two features x1 and x2, verification in fact compares which hypothesis is more likely, which can be formalized as the log-likelihood ratio of observing x1 and x2:

r(x1, x2) = log [ P(x1, x2 | HS) / P(x1, x2 | HI) ]    (1)

r(x1, x2) = x1' A x1 + x2' A x2 − 2 x1' G x2    (2)

In the above formulas (1) and (2), when r(x1, x2) > t, x1 and x2 are determined to correspond to the same pedestrian object, otherwise not; r(x1, x2) is the log-likelihood ratio of observing x1 and x2; A and G denote the model parameters obtained by parameter learning; Sμ and Sε denote covariance matrices.
In this embodiment, the segment hash index accelerates retrieval: it can be regarded as a first-layer screening, the joint Bayesian classifier's judgment is the second-layer screening, and the candidate set produced by the first layer is the input data set of the second. When training the joint Bayesian classifier, the solution is iterated continuously over the sample data, and the model parameters A and G are obtained through parameter learning and saved. A suitable threshold t is selected according to the experimental results; a schematic diagram of the algorithm is shown in fig. 3.
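The second-layer score can be sketched directly from the standard joint Bayesian closed form r(x1, x2) = x1ᵀAx1 + x2ᵀAx2 − 2x1ᵀGx2; since the original formula images are not reproduced here, this form is stated as an assumption, and the A, G values below are illustrative placeholders, not trained parameters. With A = G = I the ratio degenerates to the squared Euclidean distance, which the usage below exploits.

```python
import numpy as np

def joint_bayesian_ratio(x1, x2, A, G):
    """Log-likelihood ratio of the joint Bayesian verifier; A and G are the
    learned model parameters (placeholders here)."""
    return float(x1 @ A @ x1 + x2 @ A @ x2 - 2 * x1 @ G @ x2)

def same_person(x1, x2, A, G, t):
    """Second-layer screening: accept the pair when the ratio exceeds t."""
    return joint_bayesian_ratio(x1, x2, A, G) > t
```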
On the basis of the foregoing embodiments, acquiring the attribute information of the monitored object specifically includes:
s13, performing semantic segmentation and attribute labeling on the monitored object based on the trained neural network model, and acquiring attribute information of the monitored object.
In this embodiment, the human body structure is semantically segmented by combining an RPN (Region Proposal Network) and a deep deconvolution network, which specifically includes the following steps:
s131, sending the video key frames obtained in the steps into a public deep convolution neural network, and extracting universal local features and unique abstract features of pedestrians in a forward direction to obtain a feature activation matrix of a scene;
S132, regressing the pedestrian region with a deep RPN network, expressing the region of interest of each dynamic object as a quadruple [x, y, w, h], and spatially pooling the feature activation matrix according to the region of interest to obtain human-body features of consistent size that are independent of one another;
s133, reconstructing the shape distribution of the human body by using a depth deconvolution network which is in mirror symmetry with the depth convolution network;
s134, through a large amount of sample training, in the restoration process, pixel points of the same part of a human body or a vehicle are gradually gathered, pixel points of different parts are gradually far away, and finally contour segmentation at a pixel level is presented;
and S135, in the attribute classification stage, classifying each segmented part with a fully-connected + Softmax classifier to complete the attribute labeling of the pedestrian; repeating the labeling process yields the pedestrian's body characteristics, such as hair length, hair color, coat color, and trousers color. For a vehicle, the attributes may be the vehicle type, brand, color, tire type, and so on.
The RPN is a deep convolutional regressor used to generate proposals. All sliding windows share the RPN; using the anchor mechanism, proposals are generated by marking positive samples in advance and computing the loss value. The deep deconvolution network is stacked in mirror image with the deep convolutional network and can reconstruct the morphological distribution of the human body, thereby semantically segmenting it. The segmentation result serves as a mask when training the Softmax classifier corresponding to each part in attribute classification, yielding the classification result for each part. The whole process of semantic segmentation and attribute labeling is shown in FIG. 4.
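The fully-connected + Softmax attribute head of step S135 can be sketched per body part as follows; the weight matrix `W`, bias `b`, and label set are illustrative placeholders standing in for one trained per-part classifier, not the patent's parameters.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_attribute(features, W, b, labels):
    """One fully-connected layer followed by Softmax (S135): returns the
    attribute label with the highest probability for one segmented part."""
    probs = softmax(W @ features + b)
    return labels[int(np.argmax(probs))]
```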
On the basis of the above embodiments, before storing the unique number information of the monitored object in the graph database, the method further includes:
and searching based on the unique feature vector and the attribute information to determine whether a record of the same object already exists; if such a record is found, taking the object number of that record as the number of the monitored object, and if not, allocating a new object number.
In this embodiment, taking a pedestrian as an example, step S2 specifically includes:
S201, combining the obtained face features (unique feature vectors), hash features and attribute features (information) into structured data;
S202, performing coarse segmented hash indexing followed by exact matching to search whether a record of the same person exists in the face library;
S203, allocating a person number as the unique identifier of the person in the database according to the index result: if the person already exists, the existing number is reused; otherwise a new number is allocated;
S204, storing the person number obtained in step S203 in the graph database as the number of the node;
and S205, using the person number obtained in step S203 as the row key in the key-value database, and storing the hash feature and each attribute as sub-key values under the corresponding sub-key names.
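The coarse segmented hash index of step S202 (detailed further in claim 4) can be sketched with binary hash strings. By the pigeonhole principle, any code within Hamming distance r < m of the query must match it exactly in at least one of the m segments, so exact per-segment lookups yield a candidate set that is then verified by full Hamming distance. All codes and ids below are illustrative:

```python
def split_segments(code, m):
    """Split a binary hash string into m equal-length segments."""
    seg_len = len(code) // m
    return [code[i * seg_len:(i + 1) * seg_len] for i in range(m)]

def build_index(codes, m):
    """Segmented hash index: for each segment position, map the segment
    value to the ids of the codes that contain it."""
    index = [{} for _ in range(m)]
    for pid, code in codes.items():
        for i, seg in enumerate(split_segments(code, m)):
            index[i].setdefault(seg, set()).add(pid)
    return index

def query(index, code, codes, m, r):
    """Collect candidates sharing at least one exact segment with the
    query, then keep those within full Hamming distance r."""
    cands = set()
    for i, seg in enumerate(split_segments(code, m)):
        cands |= index[i].get(seg, set())
    return {pid for pid in cands
            if sum(a != b for a, b in zip(code, codes[pid])) <= r}

db = {"p1": "11110000", "p2": "11111111", "p3": "00001111"}
idx = build_index(db, m=4)
print(query(idx, "11110001", db, m=4, r=2))  # only p1 is within distance 2
```

In the method itself, the surviving candidates would then be passed to the trained joint Bayesian classifier for the final same-person decision.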
In this embodiment, the monitoring-oriented video atlas construction and mining method automatically performs face recognition with a deep convolutional neural network, greatly reducing the manual review workload. Using a shared convolutional neural network and a deep deconvolution network together with automatic attribute annotation of pedestrian objects, unstructured video data is converted into structured graph data for storage, which can well meet security requirements such as person retrieval, trajectory mining and group-center mining based on reviewing surveillance video. With the graph-database storage mode, the attribute information of nodes can be queried efficiently and the associations between nodes displayed efficiently and visually; the main-information/detailed-information two-database storage mode lets each database play to its strengths, so data storage and retrieval are efficient.
Before the pedestrian video atlas is constructed, the relevant information of pedestrians in the video is extracted in structured form. Segmented hash indexing and exact matching determine whether the existing database already contains a detected pedestrian, and the pedestrian numbers allocated by the backend are computed. Main information is stored in the graph database and detailed information in the key-value database, giving full play to the advantages of both: the graph database is good at storing a graph structure but not at storing large amounts of information, while the key-value database has no graph structure but can store large amounts of information. The three tables in the key-value database are the person table, the camera table and the relation table (the relation between persons and cameras, that is, which cameras each person appears in). A schematic diagram of the map construction is shown in FIG. 5.
The specific structure of person_table is shown in Table 1 below:
TABLE 1 person_table structure
RowKey Column
Person_Id Person_hash_feature
Person_Id Person_attribute_1
Person_Id Person_attribute_2
Person_Id Person_cameras
The specific structure of camera_table is as follows:
TABLE 2 camera_table structure
RowKey Column
Camera_Id Camera_producor
Camera_Id Camera_location
Camera_Id Camera_picture
Camera_Id Camera_latitudeAndLongitude
The specific structure of relationship_table is as follows:
TABLE 3 relationship_table structure
RowKey Column
Person_Id+Camera_Id Person_in_time-count
Person_Id+Camera_Id Person_out_time-count
Person_Id+Camera_Id Person_snapshot-count
Person_Id+Camera_Id count
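The three tables can be sketched as in-memory dictionaries in Python; the row-key and column names follow Tables 1-3 above, but all concrete values (ids, locations, times, counts) are illustrative:

```python
# HBase-style stand-in for the key-value store: table[row_key][column] = value.
person_table = {
    "P001": {
        "Person_hash_feature": "11110000",
        "Person_attribute_1": "hair:long",
        "Person_attribute_2": "coat:black",
        "Person_cameras": ["C01", "C02"],
    },
}
camera_table = {
    "C01": {"Camera_location": "Gate A", "Camera_latitudeAndLongitude": (30.5, 114.4)},
    "C02": {"Camera_location": "Hall B", "Camera_latitudeAndLongitude": (30.6, 114.5)},
}
# The relation-table row key is the concatenation Person_Id + Camera_Id.
relationship_table = {
    "P001C01": {"Person_in_time": 100.0, "Person_out_time": 160.0, "count": 3},
    "P001C02": {"Person_in_time": 300.0, "Person_out_time": 340.0, "count": 1},
}

def cameras_of(person_id):
    """Which cameras a person appeared in, looked up via the person table."""
    return person_table[person_id]["Person_cameras"]

print(cameras_of("P001"))
```

This split keeps the graph database holding only node numbers and edges, while bulky per-object detail lives in the key-value rows above.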
On the basis of the foregoing embodiments, acquiring trajectory information of a monitored object based on the monitored object video map specifically includes:
based on the object number of the monitored object, acquiring the number of the monitored video from the key value type database, and acquiring the entering time and the leaving time of the monitored object in each monitored video;
and taking the mean of the entering time and the leaving time as the appearance time of the monitored object in the corresponding monitoring video, obtaining the appearance time of the monitored object in each monitoring video, sorting these appearance times, and obtaining the trajectory information of the monitored object from the addresses of the cameras corresponding to the monitoring videos.
Specifically, in this embodiment, after the object-based video map is constructed, the specified pedestrian or vehicle trajectory may be mined on the monitoring object video map, and the step S3 specifically includes:
S301, according to the person number p, obtaining the camera numbers C1, C2, C3 from the person table of the key-value database;
S302, combining the person number p with each camera number C1, C2, C3 obtained in step S301 into new row keys, querying the entering-time and leaving-time columns in the relation table, and averaging the two times to obtain the times t1, t2, t3 at which the person appears in front of each camera;
S303, sorting the times t1, t2, t3 obtained in step S302 and outputting the addresses of the corresponding cameras by querying the camera table.
Specifically, in this embodiment, each pedestrian captured in front of a camera is automatically associated with that camera, with corresponding records in both the graph database and the key-value database. For a given target pedestrian, each associated camera and the corresponding appearance time can therefore be obtained, yielding a trajectory of camera locations strung together along the time line.
As shown in FIG. 6, the camera table holds the information related to each camera; this information is static and essentially never changes. Obtaining the numbers of the cameras where the pedestrian appears in time order connects the pedestrian's tracks over a period of time.
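Steps S301-S303 can be sketched over in-memory stand-ins for the tables; the layouts mirror Tables 1-3 and all ids, times and addresses are hypothetical. The sketch sorts the appearance times chronologically to string the camera locations along the time line:

```python
person_table = {"P001": {"Person_cameras": ["C01", "C02", "C03"]}}
camera_table = {"C01": "Gate A", "C02": "Hall B", "C03": "Exit C"}
relation_table = {  # row key = person id + camera id
    "P001C01": (100.0, 160.0),  # (entering time, leaving time)
    "P001C02": (300.0, 340.0),
    "P001C03": (200.0, 220.0),
}

def trajectory(person_id):
    """S301: cameras from the person table; S302: mean of entering and
    leaving time per camera from the relation table; S303: sort by that
    appearance time and output the camera addresses."""
    times = []
    for cam in person_table[person_id]["Person_cameras"]:
        t_in, t_out = relation_table[person_id + cam]
        times.append(((t_in + t_out) / 2, cam))
    return [camera_table[cam] for _, cam in sorted(times)]

print(trajectory("P001"))  # Gate A -> Exit C -> Hall B
```

With real stores, the dictionary lookups would become a graph-database query for the person node and key-value gets on the three tables.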
On the basis of the above embodiments, the method further includes:
s4, performing suspicious group center mining based on the monitoring object video map:
S401, all person nodes p0, p1, …, pn connected to each camera node in the graph database are sorted in ascending order by the mean of their entering time and leaving time, giving the sorted node sequence (p_k0, p_k1, …, p_kn);
S402, the node sequence obtained in step S401 is shifted left by one position and combined with the original sequence to form the node-pair sequence ((p_k0, p_k1), (p_k1, p_k2), …, (p_k(n-1), p_kn)); the node-pair sequence is loaded into memory for computation, and each pair of nodes whose difference between the means of entering and leaving times is less than a threshold K1 is counted and recorded as a key-value pair of the form ((p_a, p_b), count);
S403, repeating the process of step S402 for each camera and merging all key-value pairs: if two keys are equal, their values are added; the keys whose merged values are greater than a threshold K2 are output;
s404, establishing an adjacency matrix for the person group with the relationship in the step S403, and counting the degree of each node, wherein the node with the maximum degree is the central node of the person group.
The algorithm has two stages. The first stage searches the video map for pairs of nodes with a co-occurrence relation: the data is partitioned by camera number so that the pedestrians in each group all appeared in front of the same camera, each group is sorted by time, the adjacent pedestrian targets in each group that satisfy the time-difference condition are counted, and the identical pedestrian pairs across all groups are then merged. The second stage mines the central persons of the groups with co-occurrence relations; here the degree of each node is computed by an algorithm on the graph, and the node with the maximum degree is obtained. The first stage can be computed iteratively in memory to improve efficiency, and the second stage can be replaced by other algorithms to further improve speed.
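Both stages can be sketched on toy data in Python; the person ids, times and thresholds are illustrative, and the degree count stands in for the adjacency-matrix computation of step S404:

```python
from collections import Counter, defaultdict

def mine_group_center(appearances, k1, k2):
    """Sketch of S401-S404. `appearances` maps camera id -> list of
    (person, mean appearance time); k1 is the co-occurrence time
    threshold, k2 the minimum co-occurrence count. Returns the node of
    maximum degree in the resulting relation graph."""
    pairs = Counter()
    for cam, obs in appearances.items():
        obs = sorted(obs, key=lambda x: x[1])        # S401: sort by time
        for (a, ta), (b, tb) in zip(obs, obs[1:]):   # S402: left-shift pairing
            if tb - ta < k1:
                pairs[tuple(sorted((a, b)))] += 1
    degree = defaultdict(int)                        # S403 merge + S404 degree
    for (a, b), n in pairs.items():
        if n > k2:
            degree[a] += 1
            degree[b] += 1
    return max(degree, key=degree.get) if degree else None

cams = {
    "C01": [("p1", 10), ("p2", 12), ("p3", 50)],
    "C02": [("p1", 100), ("p2", 103), ("p3", 105)],
    "C03": [("p2", 200), ("p3", 202)],
}
print(mine_group_center(cams, k1=5, k2=1))  # p2 co-occurs with both others
```

The Counter plays the role of the in-memory key-value merge, so the first stage stays iterative and cheap, as noted above.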
On the basis of the above embodiments, the method further includes:
s5, displaying the map established in the steps S2-S4 and the mining result on a webpage.
S501, querying the numbers of all nodes on the graph database with the corresponding query language, including the person nodes p0, p1, …, pn and the camera nodes C0, C1, …, Cn;
S502, finding the basic information of all people and the basic information of the camera nodes in a key value type database to form a data file in a json format;
and S503, the web front end parses the json-format data file of step S502 using D3.js and displays the content of the video map on the webpage.
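The json assembly of step S502 can be sketched as follows; the "nodes"/"links" field names are a common D3.js force-layout convention assumed here, not a format specified by the method:

```python
import json

def build_graph_json(person_nodes, camera_nodes, relations):
    """Assemble a D3.js-friendly node/link document from the queried
    person nodes, camera nodes and person-camera relations."""
    nodes = [{"id": p, "type": "person"} for p in person_nodes]
    nodes += [{"id": c, "type": "camera"} for c in camera_nodes]
    links = [{"source": p, "target": c} for p, c in relations]
    return json.dumps({"nodes": nodes, "links": links})

doc = build_graph_json(["p0", "p1"], ["C0"], [("p0", "C0"), ("p1", "C0")])
print(doc)
```

The front end only needs to fetch this document and bind `nodes` and `links` to a force layout.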
The embodiment discloses a monitoring-oriented video atlas construction and mining device, which comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor performs the steps of the monitoring-oriented video atlas construction and mining method described in the method embodiments. Examples include:
acquiring the same monitored object in different monitored videos based on re-identification, and acquiring attribute information of the monitored object;
storing the object number of the monitored object based on a graph database, storing the object number of the monitored object based on a row key of a key value type database, storing the attribute information of the monitored object and the corresponding monitored video number based on a sub key of the key value type database, and establishing a monitored object video map;
and acquiring the track information of the monitored object based on the video map of the monitored object.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to execute the steps of the monitoring-oriented video atlas construction and mining method according to the above method embodiments, for example:
acquiring the same monitored object in different monitored videos based on re-identification, and acquiring attribute information of the monitored object;
storing the object number of the monitored object based on a graph database, storing the object number of the monitored object based on a row key of a key value type database, storing the attribute information of the monitored object and the corresponding monitored video number based on a sub key of the key value type database, and establishing a monitored object video map;
and acquiring the track information of the monitored object based on the video map of the monitored object.
The present embodiment further provides a non-transitory computer-readable storage medium, which stores computer instructions, where the computer instructions cause the computer to perform the steps of the monitoring-oriented video map building and mining method described in the foregoing method embodiments, for example, the steps include:
acquiring the same monitored object in different monitored videos based on re-identification, and acquiring attribute information of the monitored object;
storing the object number of the monitored object based on a graph database, storing the object number of the monitored object based on a row key of a key value type database, storing the attribute information of the monitored object and the corresponding monitored video number based on a sub key of the key value type database, and establishing a monitored object video map;
and acquiring the track information of the monitored object based on the video map of the monitored object.
To sum up, the embodiments of the present invention provide a monitoring-oriented video map construction and mining method and device. A graph database and a key-value database serve as the persistent storage of the knowledge map; a deep convolutional neural network performs face recognition automatically, greatly reducing the manual review workload; and a shared convolutional neural network and a deep deconvolution network, combined with automatic attribute annotation of pedestrian objects, convert unstructured video data into structured map data for storage. The method can well meet security requirements such as person retrieval, trajectory mining and group-center mining based on reviewing surveillance video; the attribute information of nodes can be queried efficiently and the associations between nodes visualized efficiently; and the main-information/detailed-information two-database storage mode plays to the strengths of each database, so data storage and retrieval are efficient.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A monitoring-oriented video map construction and mining method is characterized by comprising the following steps:
s1, acquiring the same monitored object in different monitored videos based on re-identification, and acquiring attribute information of the monitored object;
s2, storing the object number of the monitored object based on the graph database, storing the object number of the monitored object based on the row key of the key-value type database, storing the attribute information of the monitored object and the corresponding monitored video number based on the sub-key of the key-value type database, and establishing a monitored object video map;
s3, acquiring track information of the monitored object based on the monitored object video map;
the step S3 includes:
S301, according to the person number p, obtaining the camera numbers C1, C2, C3 from the person table of the key-value database;
S302, combining the person number p with each camera number C1, C2, C3 obtained in step S301 into new row keys, querying the entering-time and leaving-time columns in the relation table, and averaging the two times to obtain the times t1, t2, t3 at which the person appears in front of each camera;
S303, sorting the times t1, t2, t3 obtained in step S302 and outputting the addresses of the corresponding cameras by querying the camera table.
2. The surveillance-oriented video atlas building and mining method of claim 1, wherein the monitored object is any object in the surveillance video, including pedestrians and vehicles.
3. The monitoring-oriented video graph building and mining method according to claim 1, wherein in step S1, obtaining the same monitoring object in different monitoring videos based on re-recognition specifically comprises:
preprocessing all monitoring videos based on a computer cluster, dividing a monitoring video stream into a picture stream according to frames, and acquiring a unique feature vector of a monitoring object in the picture stream containing complete monitoring object information;
and establishing a segment hash index based on the unique characteristic vector, re-identifying the monitored object based on the segment hash index, acquiring the monitored video information containing the monitored object, and obtaining the entering time and the leaving time of the monitored object in the monitored video.
4. The monitoring-oriented video graph building and mining method according to claim 3, wherein a segment hash index is built based on the unique feature vector, and monitoring object re-identification is performed based on the segment hash index, specifically comprising:
converting the unique feature vector into a hash code, and dividing the hash code equally into m segments, the length of each segment being b/m bits, where b is the length of the hash code;
and inquiring candidate hash codes with the Hamming distance r from the hash codes in the m segments, and judging whether the objects corresponding to the candidate hash codes are the same as the monitored objects or not based on a trained combined Bayesian classifier.
5. The monitoring-oriented video graph constructing and mining method according to claim 3, wherein in step S1, acquiring attribute information of the monitoring object specifically includes:
and performing semantic segmentation and attribute labeling on the monitored object based on the trained neural network model to acquire attribute information of the monitored object.
6. The surveillance-oriented video atlas building and mining method of claim 3, before storing the unique number information of the surveillance object based on a graph database, further comprising:
and searching based on the unique feature vector and the attribute information to determine whether a record of the same object already exists; if such a record is found, taking the object number of that record as the number of the monitored object, and if not, allocating a new object number.
7. The monitoring-oriented video atlas constructing and mining method of claim 3, wherein the step S3 specifically comprises:
based on the object number of the monitored object, acquiring the number of the monitored video from the key value type database, and acquiring the entering time and the leaving time of the monitored object in each monitored video;
and taking the mean value of the entering time and the leaving time as the occurrence time of the monitored object in the corresponding monitored video, acquiring the occurrence time of the monitored object in each monitored video, sequencing the occurrence time, and acquiring the track information of the monitored object based on the address of the camera corresponding to each monitored video.
8. The monitoring-oriented video graph building and mining method according to claim 7, further comprising S4, performing suspicious group center mining based on the monitoring-object video graph:
S401, all monitored-object nodes p0, p1, …, pn connected to each monitoring-video node in the graph database are sorted in ascending order by appearance time, giving the sorted node sequence (p_k0, p_k1, …, p_kn);
S402, the node sequence obtained in step S401 is shifted left by one position and combined with the original sequence to form the node-pair sequence ((p_k0, p_k1), (p_k1, p_k2), …, (p_k(n-1), p_kn)); the node-pair sequence is loaded into memory for computation, and each pair of nodes whose difference between the means of entering and leaving times is less than a threshold K1 is counted and recorded as a key-value pair of the form ((p_a, p_b), count);
S403, repeating the process of step S402 for each monitoring video and merging all key-value pairs: if two keys are equal, their values are added; the keys whose merged values are greater than a threshold K2 are output;
s404, establishing an adjacency matrix for the monitored-object group with the relationship in step S403 and counting the degree of each node, wherein the node with the maximum degree is the central node of the monitored-object group.
9. The monitoring-oriented video graph building and mining method according to claim 8, further comprising S5, displaying the monitoring object video graph, the track information of the monitoring object, and the suspicious group center mining information on a webpage.
10. A surveillance-oriented video atlas construction and mining apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the surveillance-oriented video atlas construction and mining method according to any of claims 1 to 9.
CN201811058356.5A 2018-09-11 2018-09-11 Monitoring-oriented video map construction and mining method and equipment Active CN109344285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811058356.5A CN109344285B (en) 2018-09-11 2018-09-11 Monitoring-oriented video map construction and mining method and equipment


Publications (2)

Publication Number Publication Date
CN109344285A CN109344285A (en) 2019-02-15
CN109344285B true CN109344285B (en) 2020-08-07

Family

ID=65304963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811058356.5A Active CN109344285B (en) 2018-09-11 2018-09-11 Monitoring-oriented video map construction and mining method and equipment

Country Status (1)

Country Link
CN (1) CN109344285B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948447B (en) * 2019-02-21 2023-08-25 山东科技大学 Character network relation discovery and evolution presentation method based on video image recognition
CN109918184B (en) * 2019-03-01 2023-09-26 腾讯科技(深圳)有限公司 Picture processing system, method and related device and equipment
CN110187678B (en) * 2019-04-19 2021-11-05 广东省智能制造研究所 Information storage and digital application system of processing equipment in manufacturing industry
CN110245259B (en) * 2019-05-21 2021-09-21 北京百度网讯科技有限公司 Video labeling method and device based on knowledge graph and computer readable medium
CN110188224A (en) * 2019-05-27 2019-08-30 南京工程学院 A kind of general cognitive method based on the identification of emphasis human target
CN110210387B (en) * 2019-05-31 2021-08-31 华北电力大学(保定) Method, system and device for detecting insulator target based on knowledge graph
CN110532923A (en) * 2019-08-21 2019-12-03 深圳供电局有限公司 A kind of personage's trajectory retrieval method and its system
CN111314605A (en) * 2020-02-19 2020-06-19 杭州涂鸦信息技术有限公司 Merging method and system for face recognition among multiple equipment terminals
CN112148938B (en) * 2020-10-16 2023-05-26 成都中科大旗软件股份有限公司 Cross-domain heterogeneous data retrieval system and retrieval method
CN111968264A (en) * 2020-10-21 2020-11-20 东华理工大学南昌校区 Sports event time registration device
CN112861670B (en) * 2021-01-27 2022-11-08 华北电力大学(保定) Transmission line hardware detection method and system
CN113761221B (en) * 2021-06-30 2022-02-15 中国人民解放军32801部队 Knowledge graph entity alignment method based on graph neural network
CN113688251B (en) * 2021-07-27 2024-02-13 广东师大维智信息科技有限公司 Knowledge graph construction method and system in indoor sports event security field
CN113704487A (en) * 2021-07-29 2021-11-26 湖南五凌电力科技有限公司 Knowledge graph generation method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102207966A (en) * 2011-06-01 2011-10-05 华南理工大学 Video content quick retrieving method based on object tag
CN103955494A (en) * 2014-04-18 2014-07-30 大唐联智信息技术有限公司 Searching method and device of target object and terminal
CN104778224A (en) * 2015-03-26 2015-07-15 南京邮电大学 Target object social relation identification method based on video semantics
CN107609497A (en) * 2017-08-31 2018-01-19 武汉世纪金桥安全技术有限公司 The real-time video face identification method and system of view-based access control model tracking technique

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373626A (en) * 2015-12-09 2016-03-02 深圳融合永道科技有限公司 Distributed face recognition track search system and method
CN107370983B (en) * 2016-05-13 2019-12-17 腾讯科技(深圳)有限公司 method and device for acquiring track of video monitoring system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fast large-scale image retrieval method based on multiple hashing algorithms; Tang Xiaoman, Wang Yunfei, Zou Fuhao, Zhou Ke; Computer Engineering and Science; 2016-07-31; full text *
Research on key issues in microblog network propagation behavior; Xiong Xiaobing; China Doctoral Dissertations Full-text Database; 2014-01-15; full text *

Also Published As

Publication number Publication date
CN109344285A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
CN109344285B (en) Monitoring-oriented video map construction and mining method and equipment
Sindagi et al. Cnn-based cascaded multi-task learning of high-level prior and density estimation for crowd counting
CN109858390B (en) Human skeleton behavior identification method based on end-to-end space-time diagram learning neural network
CN110298404B (en) Target tracking method based on triple twin Hash network learning
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN108960184B (en) Pedestrian re-identification method based on heterogeneous component deep neural network
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
US11640714B2 (en) Video panoptic segmentation
Rabiee et al. Crowd behavior representation: an attribute-based approach
CN115860152B (en) Cross-modal joint learning method for character military knowledge discovery
CN110751027A (en) Pedestrian re-identification method based on deep multi-instance learning
CN114005085A (en) Dense crowd distribution detection and counting method in video
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN111159475B (en) Pedestrian re-identification path generation method based on multi-camera video image
CN112084895A (en) Pedestrian re-identification method based on deep learning
CN108052680A (en) Image data target identification Enhancement Method based on data collection of illustrative plates, Information Atlas and knowledge mapping
CN113553975A (en) Pedestrian re-identification method, system, equipment and medium based on sample pair relation distillation
CN111496784B (en) Space environment identification method and system for robot intelligent service
CN112906557A (en) Multi-granularity characteristic aggregation target re-identification method and system under multiple visual angles
CN104200222B (en) Object identifying method in a kind of picture based on factor graph model
CN112306985A (en) Digital retina multi-modal feature combined accurate retrieval method
CN111160077A (en) Large-scale dynamic face clustering method
CN114299342B (en) Unknown mark classification method in multi-mark picture classification based on deep learning
CN110717068A (en) Video retrieval method based on deep learning
Fofana et al. Optimal Flame Detection of Fires in Videos Based on Deep Learning and the Use of Various Optimizers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant