CN114677414A - Multi-camera target matching method and system with overlapped vision fields - Google Patents

Multi-camera target matching method and system with overlapped vision fields

Info

Publication number
CN114677414A
Authority
CN
China
Prior art keywords
camera
target
calculating
foreground
points
Prior art date
Legal status
Pending
Application number
CN202111406029.6A
Other languages
Chinese (zh)
Inventor
宁艳
陈志明
徐嘉星
王宁
Current Assignee
Jiangsu Fangtian Power Technology Co Ltd
Original Assignee
Jiangsu Fangtian Power Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Fangtian Power Technology Co Ltd filed Critical Jiangsu Fangtian Power Technology Co Ltd
Priority to CN202111406029.6A priority Critical patent/CN114677414A/en
Publication of CN114677414A publication Critical patent/CN114677414A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/292Multi-camera tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/223Analysis of motion using block-matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a multi-camera target matching method and system with overlapping fields of view. The method comprises the following steps: calculating the field-of-view boundary of the two cameras according to the first 5 frames of information from camera 1 and camera 2; extracting the foreground targets in the video of camera 1 and fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors; calculating the projection point in camera 2 of each target in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region; calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters; and updating the field-of-view boundary parameters by selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model.

Description

Multi-camera target matching method and system with overlapped vision fields
Technical Field
The present disclosure relates to the field of intelligent video surveillance technology, and in particular to a multi-camera target matching method and system with overlapping fields of view.
Background
With the rapid development of smart grids and smart cities, intelligent video surveillance technology is in a stage of rapid growth and makes a great contribution to security monitoring. However, as the number of surveillance cameras and the monitored range increase, so does the labor required for camera installation, offline calibration, measurement, and the analysis of surveillance video. In addition, as monitoring scenes become more complex and diverse, traditional single-camera target tracking and localization methods face huge challenges in practical applications. In recent years, multi-camera collaborative video surveillance has attracted increasing attention, and vision-based localization and tracking methods have gradually transitioned from the single-camera field to the multi-camera field, especially when there is occlusion between targets in a surveillance scene. Therefore, research on target localization and tracking based on multi-camera collaboration in intelligent video surveillance is of great significance.
In the prior art, target space localization systems based on homography or on the principal axis cannot segment individual targets in the foreground under severe occlusion: one foreground region may contain several targets, and if a rectangular frame is again used to enclose that foreground region, the frame may contain multiple targets. In that case the four vertices of the rectangular frame cannot stand in for the foreground information of any single target, or they represent wrong foreground information, so a reliable target localization result cannot be obtained from them. The prior art therefore has defects and needs to be improved.
Disclosure of Invention
One of the embodiments of the present specification provides a multi-camera target matching method with overlapping fields of view, the method comprising: S1, calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer; S2, extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm; S3, sampling the head foreground in the foreground image of step S2 to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region; S4, connecting the projection points with the camera center projection points, calculating candidate target foot points, weighting and summing the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters; S5, sampling the chest foreground in the foreground image of step S2 to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model; and S6, repeating from step S2 until the video ends.
One of the embodiments of the present specification provides a multi-camera target matching system with overlapping fields of view, used for the above multi-camera target matching method, the system comprising: a first calculation module for calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer; an extraction module for extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm; a sampling module for sampling the head foreground in the foreground image to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region; a second calculation module for connecting the projection points with the camera center projection points, calculating candidate target foot points, performing a weighted summation of the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters; a third calculation module for sampling the chest foreground in the foreground image to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model; and a repeating module for repeating the operation of the extraction module until the video ends.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments, like numbers indicate like structures, wherein:
FIG. 1 is an exemplary flow diagram of a multi-camera target matching method with overlapping fields of view, according to some embodiments of the present description;
FIG. 2 is another exemplary flow diagram of a method of multi-camera target matching with overlapping fields of view, according to some embodiments of the present description;
FIG. 3 is a block diagram of a multi-camera object matching system with overlapping fields of view, according to some embodiments of the present description;
FIG. 4 is a schematic diagram of gradient direction partitioning, shown in accordance with some embodiments herein.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system," "device," "unit," and/or "module" as used herein is a method for distinguishing between different components, elements, parts, portions, or assemblies of different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the terms "a," "an," and/or "the" are not limited to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps or elements are included and do not constitute an exclusive list; the method or apparatus may also comprise other steps or elements.
Flowcharts are used in this specification to illustrate the operations performed by the system according to embodiments of the present specification. It should be understood that the preceding or following operations are not necessarily performed exactly in the order shown. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to these processes, or one or more steps may be removed from them.
Fig. 1 is an exemplary flow diagram of a method of multi-camera object matching with overlapping fields of view according to some embodiments described herein, and fig. 2 is another exemplary flow diagram of a method of multi-camera object matching with overlapping fields of view according to some embodiments described herein. The method comprises the following steps:
S1, calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer;
S2, extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm;
S3, sampling the head foreground in the foreground image of step S2 to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region;
S4, connecting the projection points with the camera center projection points, calculating candidate target foot points, weighting and summing the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target (as illustrated in the sketch following these steps), and storing the field-of-view boundary parameters;
S5, sampling the chest foreground in the foreground image of step S2 to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model;
and S6, repeating from step S2 until the video ends.
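As a concrete illustration of the matching rule in step S4, the minimal Python sketch below selects, for each fused feature vector from camera 1, the closest candidate in camera 2 and accepts it only when the Euclidean distance is within the set threshold; the function and variable names are hypothetical and the sketch is not part of the original disclosure.

```python
import numpy as np

def match_targets(vectors_cam1, vectors_cam2, threshold):
    """For each fused feature vector from camera 1, return the index of the
    closest camera-2 candidate whose Euclidean distance is within `threshold`,
    or None when no candidate is close enough (hypothetical helper)."""
    matches = []
    for v1 in vectors_cam1:
        dists = [np.linalg.norm(np.asarray(v1) - np.asarray(v2))
                 for v2 in vectors_cam2]
        best = int(np.argmin(dists))
        matches.append(best if dists[best] <= threshold else None)
    return matches
```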
The method further comprises the following steps:
In step 210, the frame images of the two cameras at the same moment are denoted as image 1 and image 2.
Step 220, extracting the SIFT matching key points of image 1 and image 2, and filtering them with the RANSAC algorithm (a sketch follows these steps).
Step 230, selecting 4 pairs of spatially coplanar points from the filtered SIFT matching key-point pairs, with no 3 points collinear, and letting I1 and I2 be two independent projection invariants.
Step 240, calculating the target with the smallest distance to the field-of-view boundary, and judging whether that target is occluded.
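Step 220 is standard SIFT matching followed by RANSAC filtering. A possible OpenCV-based sketch is given below; the ratio-test and RANSAC threshold values are assumptions rather than values taken from the patent.

```python
import cv2
import numpy as np

def sift_ransac_matches(image1, image2, ratio=0.75, ransac_thresh=3.0):
    """Extract SIFT key points in both images, match the descriptors, and keep
    only the matches consistent with a RANSAC-estimated homography."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(image1, None)
    kp2, des2 = sift.detectAndCompute(image2, None)

    # Lowe's ratio test on the two nearest neighbours of each descriptor.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # RANSAC rejects key-point pairs that do not fit a common homography.
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, ransac_thresh)
    inliers = mask.ravel().astype(bool)
    return pts1[inliers], pts2[inliers], H
```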
In some embodiments, the overall framework of the present invention is a target matching method based on an adaptive field-of-view space model, i.e., the parameters of the field-of-view boundary model can be updated in time by the optimization method of the present invention. This part mainly comprises two modules: initialization of the field-of-view boundary model, and updating of the model parameters.
The frame images of the two cameras at the same moment are denoted as image 1 and image 2. First, the SIFT matching key points of image 1 and image 2 are extracted and filtered with the RANSAC algorithm, and 4 pairs of spatially coplanar points are selected from the filtered key-point pairs, with no 3 points collinear. Let I1 and I2 be two independent projection invariants, calculated by formula (1) [the formula is given only as an image in the original publication], whose inputs are the coordinates of each point in image i. In the two background images to be matched, the coordinates of 5 points in image 1 are known, and the projection invariants I1 and I2 are calculated from them. Using the corresponding relative positions of these points in image 2, the corresponding position of the 5th point in image 2 can then be calculated from the two projection invariants.
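Formula (1) is reproduced only as an image in the publication; for illustration, the sketch below uses one standard determinant-ratio formulation of the two independent projective invariants of five coplanar points, which should be read as an assumption rather than the patent's verbatim formula.

```python
import numpy as np

def projective_invariants(points):
    """Two projective invariants I1, I2 of five coplanar points.

    `points` is a list of five (x, y) image coordinates. This is the common
    determinant-ratio formulation, used here only as an illustrative stand-in
    for the patent's formula (1), which is given as an image."""
    p = [np.array([x, y, 1.0]) for x, y in points]  # homogeneous coordinates

    def det3(i, j, k):
        # determinant of the 3x3 matrix whose columns are points i, j, k (1-based)
        return np.linalg.det(np.column_stack((p[i - 1], p[j - 1], p[k - 1])))

    I1 = det3(4, 3, 1) * det3(5, 2, 1) / (det3(4, 2, 1) * det3(5, 3, 1))
    I2 = det3(4, 3, 2) * det3(5, 2, 1) / (det3(4, 2, 1) * det3(5, 3, 2))
    return I1, I2
```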
The steps of calculating the candidate target foot point are as follows:
The GPS coordinates gC1 and gC2 of the vertical projection points of the two camera centers onto the scene plane are measured with a GPS receiver; they are referred to as the camera center projection points for short.

Two vertices of the rectangular frame representing the pedestrian foreground region in each camera's foreground are projected onto the scene plane through the homography matrix, giving T1^0, T1^1, T2^0 and T2^1.

The projection points are connected with the corresponding camera center projection points, giving 4 straight lines on the scene plane: gC1T1^0, gC1T1^1, gC2T2^0 and gC2T2^1.

The intersection points p1, p2, p3 and p4 of these lines are calculated, and the intersection coordinates are taken as the candidate foot point coordinates.

Let l1 and l2 be two of the projected lines, with endpoints (x0, y0), (x1, y1) and (x2, y2), (x3, y3) respectively, so that l1: y = k1(x - x0) + y0 and l2: y = k2(x - x2) + y2, where k1 = (y1 - y0)/(x1 - x0) and k2 = (y3 - y2)/(x3 - x2) are the slopes of the two lines, which intersect at a point (x, y). Combining the two equations and solving gives x = (k1·x0 - k2·x2 + y2 - y0)/(k1 - k2) and y = k1(x - x0) + y0.

The candidate target foot points are obtained by solving the intersection points of the connecting lines from gC1 and gC2 in this way (see the sketch below). However, when the targets are not completely segmented after foreground detection, i.e., the targets are neither separated nor individually matched, the foreground region enclosed by the rectangular frame cannot represent the foreground region of any single pedestrian.
In some embodiments, two sets of parameters a1, b1, c1 and a2, b2, c2 may be calculated. First, the 1st, 3rd and 5th frame images of the two videos are selected and the field-of-view boundary parameters are calculated; the field-of-view boundary parameters of the nth (n > 5) frame image are then recorded as [formulas given as images in the original publication].
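The three parameters named above suggest a field-of-view boundary line of the form a·x + b·y + c = 0; since the update formulas are shown only as images in the publication, the following sketch merely illustrates the linear-weighting update of step S5 under that assumption, with illustrative weight values.

```python
import numpy as np

def update_boundary_model(history, current, weights=(0.25, 0.25, 0.5)):
    """Linearly weight 2 parameter sets (a, b, c) selected from the first 5
    frames with the current frame's parameters to update the boundary model.
    `history` holds the 2 selected sets, `current` the current-frame set; the
    weight values here are an illustrative assumption, not the patent's."""
    params = np.vstack([history[0], history[1], current])  # shape (3, 3)
    w = np.asarray(weights).reshape(3, 1)
    return (w * params).sum(axis=0)  # updated (a, b, c)
```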
FIG. 3 is a block diagram of a multi-camera object matching system with overlapping fields of view, according to some embodiments described herein. As shown in FIG. 3, the system includes: a first calculation module for calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer; an extraction module for extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm; a sampling module for sampling the head foreground in the foreground image to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region; a second calculation module for connecting the projection points with the camera center projection points, calculating candidate target foot points, performing a weighted summation of the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters; a third calculation module for sampling the chest foreground in the foreground image to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model; and a repeating module for repeating the operation of the extraction module until the video ends.
In some embodiments, the system further comprises: a setting module for setting the frame images of the two cameras at the same moment as image 1 and image 2; a second extraction module for extracting the SIFT matching key points of image 1 and image 2 and filtering them with the RANSAC algorithm; a matching module for selecting 4 pairs of spatially coplanar points from the filtered SIFT matching key-point pairs, with no 3 points collinear, and letting I1 and I2 be two independent projection invariants; and a fourth calculation module for calculating the target with the smallest distance to the field-of-view boundary and judging whether that target is occluded.
In some embodiments, the system further comprises: a fifth calculation module for calculating two sets of parameters a1, b1, c1 and a2, b2, c2, by first selecting the 1st, 3rd and 5th frame images of the two videos, calculating the field-of-view boundary parameters, and recording the field-of-view boundary parameters of the nth (n > 5) frame image as [formula given as an image in the original publication].
In some embodiments, the system further comprises: a coordinate module for measuring, with a GPS receiver, the GPS coordinates gC1 and gC2 of the vertical projection points of the camera centers onto the scene plane, referred to as the camera center projection points for short; a projection module for projecting two vertices of the rectangular frame representing the pedestrian foreground region in each camera foreground onto the scene plane through the homography matrix to obtain T1^0, T1^1, T2^0 and T2^1; a connecting module for connecting the projection points with the corresponding camera center projection points to obtain 4 straight lines on the scene plane; and a sixth calculation module for calculating the intersection points p1, p2, p3 and p4 of the straight lines and taking the intersection coordinates as the candidate foot point coordinates.
In view 1, the two targets occlude each other to a relatively large degree, and the vertices of the rectangular frame cannot represent the head point or foot point of any single pedestrian. If the candidate target foot points are found according to the method of the first case, a large positioning error results, because a small error in the two-dimensional image is amplified in the scene plane: for example, a distance of 20 meters may span only 10 pixels in the image plane, so each pixel represents a length of 2 meters; that is, a difference of one pixel in the image plane can cause an error of 2 meters in the scene plane, which is unacceptable.
In a video surveillance scene, only the foreground of the target can be obtained, and high-precision localization and tracking require a large amount of work to complete target matching. A commonly used matching algorithm is SURF feature point extraction and matching. Matching is an important task in target localization and tracking, but matching algorithms usually involve a large number of nonlinear optimization steps, which greatly limits real-time localization and tracking. In recent years, with the expansion of the video surveillance range and the growing demand for real-time monitoring, research on target localization and tracking that does not rely on explicit matching has been increasing. Meanwhile, as monitoring scenes become more complex, target localization and tracking under occlusion has also become a research hotspot. Most research focuses on obtaining candidate target foot points by sampling the foreground, then analyzing the candidate foot points, and finally localizing the target in the scene plane.
Therefore, in order to further reduce the localization error in this case, for target localization under severe occlusion the present invention samples pixels within the side-length range of the rectangular frame representing the foreground to obtain candidate target foot points, as sketched below.
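How the pixels within the side-length range of the rectangle are sampled is not detailed in the text; the following minimal sketch simply samples positions uniformly along the bottom side of the foreground rectangle as one possible reading (the function name and the sampling choice are assumptions).

```python
import numpy as np

def sample_rect_side(rect, num_samples=10):
    """Uniformly sample pixel positions along the bottom side of the rectangular
    foreground frame given as (x, y, w, h). The choice of side and the number
    of samples are illustrative assumptions."""
    x, y, w, h = rect
    xs = np.linspace(x, x + w, num_samples)
    return [(float(px), float(y + h)) for px in xs]
```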
FIG. 4 is a schematic illustration of gradient direction partitioning, shown in accordance with some embodiments herein.
As shown in FIG. 4, the target image may be decomposed into a number of cell units, and a gradient distribution histogram is calculated for each cell. To compute the gradient distribution histogram of a cell, the gradient direction (0-360 degrees) is divided into 4 bins (FIG. 4), the gradient magnitude and gradient direction of each pixel in the cell are calculated, and the gradient magnitude of each pixel is accumulated into the bin of its gradient direction, yielding the gradient distribution histogram of the cell.
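The per-cell computation just described can be written directly; the sketch below (the helper name is hypothetical) accumulates each pixel's gradient magnitude into one of 4 direction bins covering 0-360 degrees.

```python
import numpy as np

def cell_gradient_histogram(cell, n_bins=4):
    """Gradient distribution histogram of one cell: the gradient magnitude of
    every pixel is accumulated into the bin of its gradient direction, with
    0-360 degrees divided into `n_bins` bins."""
    gy, gx = np.gradient(cell.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    direction = np.degrees(np.arctan2(gy, gx)) % 360.0
    bins = (direction // (360.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist
```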
Adjacent 2 × 2 cells may be combined into a block, as shown in FIG. 4; the cell histograms are concatenated to form the gradient distribution histogram feature of the block and normalized with the L2 norm. Finally, the histogram features of all blocks are combined to form the final HOG feature of the target image.
In some embodiments, the target image may be resized to 128 × 64 pixels by bilinear interpolation, and then the color histogram feature and the HOG feature are extracted as described above. The color histogram feature is extracted in the HSV color space and has 8 × 8 × 8 = 512 dimensions; for the HOG feature, each cell is 8 × 8 pixels and adjacent 2 × 2 cells form a block, so the HOG feature has 128 × 2 × 2 × 4 = 2048 dimensions. Finally, the color histogram feature and the HOG feature are combined into the final 512 + 2048 = 2560-dimensional target feature vector.
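One possible way to assemble the fused feature described above, using OpenCV for the HSV histogram and scikit-image for the HOG descriptor, is sketched below; the exact block layout behind the stated 2048 HOG dimensions is not spelled out in the text, so the length produced by this sketch may differ slightly.

```python
import cv2
import numpy as np
from skimage.feature import hog

def fused_feature_vector(target_bgr):
    """Resize the target patch to 128 x 64, then concatenate an 8x8x8 HSV
    color histogram (512 dims) with a HOG descriptor computed from 8x8-pixel
    cells, 2x2-cell blocks, 4 orientation bins and L2 block normalization."""
    patch = cv2.resize(target_bgr, (64, 128), interpolation=cv2.INTER_LINEAR)

    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    color_hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 8, 8],
                              [0, 180, 0, 256, 0, 256]).flatten()
    color_hist /= (color_hist.sum() + 1e-6)

    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    hog_feat = hog(gray, orientations=4, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm='L2')

    return np.concatenate([color_hist, hog_feat])
```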
Having thus described the basic concepts, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad description. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not specifically described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Additionally, the order in which elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described in this specification, unless explicitly stated in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the foregoing description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, the features of an embodiment may be less than all of the features of a single embodiment disclosed above.
Where numerals describing the number of components, attributes or the like are used in some embodiments, it is to be understood that such numerals used in the description of the embodiments are modified in some instances by the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (8)

1. A method of multi-camera object matching with overlapping fields of view, the method comprising:
S1, calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer;
S2, extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm;
S3, sampling the head foreground in the foreground image of step S2 to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region;
S4, connecting the projection points with the camera center projection points, calculating candidate target foot points, weighting and summing the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters;
S5, sampling the chest foreground in the foreground image of step S2 to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model;
and S6, repeating from step S2 until the video ends.
2. The method of claim 1, wherein the method further comprises:
setting the frame images of the two cameras at the same moment as image 1 and image 2;
extracting the SIFT matching key points of image 1 and image 2, and filtering them with the RANSAC algorithm;
selecting 4 pairs of spatially coplanar points from the filtered SIFT matching key-point pairs, with no 3 points collinear, and letting I1 and I2 be two independent projection invariants;
and calculating the target with the smallest distance to the field-of-view boundary, and judging whether that target is occluded.
3. The method of claim 1, wherein the method further comprises:
calculating two sets of parameters a1, b1, c1 and a2, b2, c2, by first selecting the 1st, 3rd and 5th frame images of the two videos, calculating the field-of-view boundary parameters, and recording the field-of-view boundary parameters of the nth (n > 5) frame image as [formula given as an image in the original publication].
4. The method of claim 1, wherein the method further comprises:
measuring, with a GPS receiver, the GPS coordinates gC1 and gC2 of the vertical projection points of the camera centers onto the scene plane, referred to as the camera center projection points for short;
projecting two vertices of the rectangular frame representing the pedestrian foreground region in each camera foreground onto the scene plane through the homography matrix to obtain T1^0, T1^1, T2^0 and T2^1;
connecting the projection points with the corresponding camera center projection points to obtain 4 straight lines on the scene plane;
and calculating the intersection points p1, p2, p3 and p4 of the straight lines, and taking the intersection coordinates as the candidate foot point coordinates.
5. A multi-camera object matching system with overlapping fields of view, the system being configured to perform the method of any of claims 1-4, the system comprising:
a first calculation module for calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer;
an extraction module for extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm;
a sampling module for sampling the head foreground in the foreground image to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region;
a second calculation module for connecting the projection points with the camera center projection points, calculating candidate target foot points, performing a weighted summation of the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters;
a third calculation module for sampling the chest foreground in the foreground image to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model;
and a repeating module for repeating the operation of the extraction module until the video ends.
6. The system of claim 5, wherein the system further comprises:
a setting module for setting the frame images of the two cameras at the same moment as image 1 and image 2;
a second extraction module for extracting the SIFT matching key points of image 1 and image 2 and filtering them with the RANSAC algorithm;
a matching module for selecting 4 pairs of spatially coplanar points from the filtered SIFT matching key-point pairs, with no 3 points collinear, and letting I1 and I2 be two independent projection invariants;
and a fourth calculation module for calculating the target with the smallest distance to the field-of-view boundary and judging whether that target is occluded.
7. The system of claim 5, wherein the system further comprises:
a fifth calculation module for calculating two sets of parameters a1, b1, c1 and a2, b2, c2, by first selecting the 1st, 3rd and 5th frame images of the two videos, calculating the field-of-view boundary parameters, and recording the field-of-view boundary parameters of the nth (n > 5) frame image as [formula given as an image in the original publication].
8. The system of claim 5, wherein the system further comprises:
a coordinate module for measuring, with a GPS receiver, the GPS coordinates gC1 and gC2 of the vertical projection points of the camera centers onto the scene plane, referred to as the camera center projection points for short;
a projection module for projecting two vertices of the rectangular frame representing the pedestrian foreground region in each camera foreground onto the scene plane through the homography matrix to obtain T1^0, T1^1, T2^0 and T2^1;
a connecting module for connecting the projection points with the corresponding camera center projection points to obtain 4 straight lines on the scene plane;
and a sixth calculation module for calculating the intersection points p1, p2, p3 and p4 of the straight lines and taking the intersection coordinates as the candidate foot point coordinates.
CN202111406029.6A 2021-11-24 2021-11-24 Multi-camera target matching method and system with overlapped vision fields Pending CN114677414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111406029.6A CN114677414A (en) 2021-11-24 2021-11-24 Multi-camera target matching method and system with overlapped vision fields

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111406029.6A CN114677414A (en) 2021-11-24 2021-11-24 Multi-camera target matching method and system with overlapped vision fields

Publications (1)

Publication Number Publication Date
CN114677414A (en) 2022-06-28

Family

ID=82069912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111406029.6A Pending CN114677414A (en) 2021-11-24 2021-11-24 Multi-camera target matching method and system with overlapped vision fields

Country Status (1)

Country Link
CN (1) CN114677414A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912517A (en) * 2023-06-06 2023-10-20 阿里巴巴(中国)有限公司 Method and device for detecting camera view field boundary
CN116912517B (en) * 2023-06-06 2024-04-02 阿里巴巴(中国)有限公司 Method and device for detecting camera view field boundary


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination