CN114677414A - Multi-camera target matching method and system with overlapped vision fields - Google Patents

Multi-camera target matching method and system with overlapped vision fields

Info

Publication number
CN114677414A
Authority
CN
China
Prior art keywords
camera
target
calculating
foreground
points
Prior art date
Legal status
Pending
Application number
CN202111406029.6A
Other languages
Chinese (zh)
Inventor
宁艳
陈志明
徐嘉星
王宁
Current Assignee
Jiangsu Fangtian Power Technology Co Ltd
Original Assignee
Jiangsu Fangtian Power Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Fangtian Power Technology Co Ltd filed Critical Jiangsu Fangtian Power Technology Co Ltd
Priority to CN202111406029.6A priority Critical patent/CN114677414A/en
Publication of CN114677414A publication Critical patent/CN114677414A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/292Multi-camera tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/223Analysis of motion using block-matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a multi-camera target matching method and system with overlapping fields of view. The method comprises the following steps: calculating the field-of-view boundary of the two cameras according to the first 5 frames of information from camera 1 and camera 2; extracting the foreground targets in the video of camera 1 and fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors; calculating the projection point in camera 2 of each target in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region; calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters; and updating the field-of-view boundary parameters by selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model.

Description

Multi-camera target matching method and system with overlapped vision fields
Technical Field
The present disclosure relates to the field of intelligent video surveillance technology, and in particular to a multi-camera target matching method and system with overlapping fields of view.
Background
With the rapid development of smart grids and smart cities, intelligent video surveillance technology is in a stage of rapid growth and makes a great contribution to security monitoring. However, as the number of surveillance cameras and the monitored range increase, so does the labor required for camera installation, offline calibration, measurement, and the analysis of surveillance video. In addition, as monitoring scenes become more complex and diverse, traditional single-camera target tracking and localization methods face huge challenges in practical applications. In recent years, multi-camera collaborative video surveillance has attracted increasing attention, and vision-based localization and tracking methods have gradually transitioned from the single-camera field to the multi-camera field, especially when there is occlusion between targets in a surveillance scene. Therefore, research on target localization and tracking based on multi-camera collaboration in intelligent video surveillance is of great significance.
In the prior art, target space localization systems based on homography or on the principal axis cannot segment individual targets in the foreground under severe occlusion: one foreground region may contain several targets, and if a rectangular frame is again used to enclose that foreground region, the frame may contain multiple targets. In that case the four vertices of the rectangular frame cannot stand in for the foreground information of any single target, or they represent wrong foreground information, so a reliable target localization result cannot be obtained from them. The prior art therefore has defects and needs to be improved.
Disclosure of Invention
One of the embodiments of the present specification provides a multi-camera target matching method with overlapping fields of view, the method comprising: S1, calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer; S2, extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm; S3, sampling the head foreground in the foreground image of step S2 to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region; S4, connecting the projection points with the camera center projection points, calculating candidate target foot points, weighting and summing the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters; S5, sampling the chest foreground in the foreground image of step S2 to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model; and S6, repeating from step S2 until the video ends.
One of the embodiments of the present specification provides a multi-camera target matching system with overlapping fields of view, used for the above multi-camera target matching method, the system comprising: a first calculation module for calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer; an extraction module for extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm; a sampling module for sampling the head foreground in the foreground image to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region; a second calculation module for connecting the projection points with the camera center projection points, calculating candidate target foot points, performing a weighted summation of the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters; a third calculation module for sampling the chest foreground in the foreground image to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model; and a repeating module for repeating the operation of the extraction module until the video ends.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments, like numbers indicate like structures, wherein:
FIG. 1 is an exemplary flow diagram of a multi-camera target matching method with overlapping fields of view, according to some embodiments of the present description;
FIG. 2 is another exemplary flow diagram of a method of multi-camera target matching with overlapping fields of view, according to some embodiments of the present description;
FIG. 3 is a block diagram of a multi-camera object matching system with overlapping fields of view, according to some embodiments of the present description;
FIG. 4 is a schematic diagram of gradient direction partitioning, shown in accordance with some embodiments herein.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system," "device," "unit," and/or "module" as used herein is a method for distinguishing between different components, elements, parts, portions, or assemblies of different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the terms "a," "an," and/or "the" are not limited to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps or elements are included and do not constitute an exclusive list; the method or apparatus may also comprise other steps or elements.
Flowcharts are used in this specification to illustrate the operations performed by the system according to embodiments of the present specification. It should be understood that the preceding or following operations are not necessarily performed exactly in the order shown. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to these processes, or one or more steps may be removed from them.
Fig. 1 is an exemplary flow diagram of a method of multi-camera object matching with overlapping fields of view according to some embodiments described herein, and fig. 2 is another exemplary flow diagram of a method of multi-camera object matching with overlapping fields of view according to some embodiments described herein. The method comprises the following steps:
S1, calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer;
S2, extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm;
S3, sampling the head foreground in the foreground image of step S2 to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region;
S4, connecting the projection points with the camera center projection points, calculating candidate target foot points, weighting and summing the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target (as illustrated in the sketch following these steps), and storing the field-of-view boundary parameters;
S5, sampling the chest foreground in the foreground image of step S2 to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model;
and S6, repeating from step S2 until the video ends.
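As a concrete illustration of the matching rule in step S4, the minimal Python sketch below selects, for each fused feature vector from camera 1, the closest candidate in camera 2 and accepts it only when the Euclidean distance is within the set threshold; the function and variable names are hypothetical and the sketch is not part of the original disclosure.

```python
import numpy as np

def match_targets(vectors_cam1, vectors_cam2, threshold):
    """For each fused feature vector from camera 1, return the index of the
    closest camera-2 candidate whose Euclidean distance is within `threshold`,
    or None when no candidate is close enough (hypothetical helper)."""
    matches = []
    for v1 in vectors_cam1:
        dists = [np.linalg.norm(np.asarray(v1) - np.asarray(v2))
                 for v2 in vectors_cam2]
        best = int(np.argmin(dists))
        matches.append(best if dists[best] <= threshold else None)
    return matches
```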
The method further comprises the following steps:
In step 210, the frame images of the two cameras at the same moment are denoted as image 1 and image 2.
Step 220, extracting the SIFT matching key points of image 1 and image 2, and filtering them with the RANSAC algorithm (a sketch follows these steps).
Step 230, selecting 4 pairs of spatially coplanar points from the filtered SIFT matching key-point pairs, with no 3 points collinear, and letting I1 and I2 be two independent projection invariants.
Step 240, calculating the target with the smallest distance to the field-of-view boundary, and judging whether that target is occluded.
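Step 220 is standard SIFT matching followed by RANSAC filtering. A possible OpenCV-based sketch is given below; the ratio-test and RANSAC threshold values are assumptions rather than values taken from the patent.

```python
import cv2
import numpy as np

def sift_ransac_matches(image1, image2, ratio=0.75, ransac_thresh=3.0):
    """Extract SIFT key points in both images, match the descriptors, and keep
    only the matches consistent with a RANSAC-estimated homography."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(image1, None)
    kp2, des2 = sift.detectAndCompute(image2, None)

    # Lowe's ratio test on the two nearest neighbours of each descriptor.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # RANSAC rejects key-point pairs that do not fit a common homography.
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, ransac_thresh)
    inliers = mask.ravel().astype(bool)
    return pts1[inliers], pts2[inliers], H
```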
In some embodiments, the overall framework of the present invention is a target matching method based on an adaptive field-of-view space model, i.e., the parameters of the field-of-view boundary model can be updated in time by the optimization method of the present invention. This part mainly comprises two modules: initialization of the field-of-view boundary model, and updating of the model parameters.
The frame images of the two cameras at the same moment are denoted as image 1 and image 2. First, the SIFT matching key points of image 1 and image 2 are extracted and filtered with the RANSAC algorithm, and 4 pairs of spatially coplanar points are selected from the filtered key-point pairs, with no 3 points collinear. Let I1 and I2 be two independent projection invariants, calculated by formula (1) [the formula is given only as an image in the original publication], whose inputs are the coordinates of each point in image i. In the two background images to be matched, the coordinates of 5 points in image 1 are known, and the projection invariants I1 and I2 are calculated from them. Using the corresponding relative positions of these points in image 2, the corresponding position of the 5th point in image 2 can then be calculated from the two projection invariants.
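Formula (1) is reproduced only as an image in the publication; for illustration, the sketch below uses one standard determinant-ratio formulation of the two independent projective invariants of five coplanar points, which should be read as an assumption rather than the patent's verbatim formula.

```python
import numpy as np

def projective_invariants(points):
    """Two projective invariants I1, I2 of five coplanar points.

    `points` is a list of five (x, y) image coordinates. This is the common
    determinant-ratio formulation, used here only as an illustrative stand-in
    for the patent's formula (1), which is given as an image."""
    p = [np.array([x, y, 1.0]) for x, y in points]  # homogeneous coordinates

    def det3(i, j, k):
        # determinant of the 3x3 matrix whose columns are points i, j, k (1-based)
        return np.linalg.det(np.column_stack((p[i - 1], p[j - 1], p[k - 1])))

    I1 = det3(4, 3, 1) * det3(5, 2, 1) / (det3(4, 2, 1) * det3(5, 3, 1))
    I2 = det3(4, 3, 2) * det3(5, 2, 1) / (det3(4, 2, 1) * det3(5, 3, 2))
    return I1, I2
```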
The steps of calculating the candidate target foot point are as follows:
The GPS coordinates gC1 and gC2 of the vertical projection points of the two camera centers onto the scene plane are measured with a GPS receiver; they are referred to as the camera center projection points for short.

Two vertices of the rectangular frame representing the pedestrian foreground region in each camera's foreground are projected onto the scene plane through the homography matrix, giving T1^0, T1^1, T2^0 and T2^1.

The projection points are connected with the corresponding camera center projection points, giving 4 straight lines on the scene plane: gC1T1^0, gC1T1^1, gC2T2^0 and gC2T2^1.

The intersection points p1, p2, p3 and p4 of these lines are calculated, and the intersection coordinates are taken as the candidate foot point coordinates.

Let l1 and l2 be two of the projected lines, with endpoints (x0, y0), (x1, y1) and (x2, y2), (x3, y3) respectively, so that l1: y = k1(x - x0) + y0 and l2: y = k2(x - x2) + y2, where k1 = (y1 - y0)/(x1 - x0) and k2 = (y3 - y2)/(x3 - x2) are the slopes of the two lines, which intersect at a point (x, y). Combining the two equations and solving gives x = (k1·x0 - k2·x2 + y2 - y0)/(k1 - k2) and y = k1(x - x0) + y0.

The candidate target foot points are obtained by solving the intersection points of the connecting lines from gC1 and gC2 in this way (see the sketch below). However, when the targets are not completely segmented after foreground detection, i.e., the targets are neither separated nor individually matched, the foreground region enclosed by the rectangular frame cannot represent the foreground region of any single pedestrian.
In some embodiments, two sets of parameters a1, b1, c1 and a2, b2, c2 may be calculated. First, the 1st, 3rd and 5th frame images of the two videos are selected and the field-of-view boundary parameters are calculated; the field-of-view boundary parameters of the nth (n > 5) frame image are then recorded as [formulas given as images in the original publication].
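The three parameters named above suggest a field-of-view boundary line of the form a·x + b·y + c = 0; since the update formulas are shown only as images in the publication, the following sketch merely illustrates the linear-weighting update of step S5 under that assumption, with illustrative weight values.

```python
import numpy as np

def update_boundary_model(history, current, weights=(0.25, 0.25, 0.5)):
    """Linearly weight 2 parameter sets (a, b, c) selected from the first 5
    frames with the current frame's parameters to update the boundary model.
    `history` holds the 2 selected sets, `current` the current-frame set; the
    weight values here are an illustrative assumption, not the patent's."""
    params = np.vstack([history[0], history[1], current])  # shape (3, 3)
    w = np.asarray(weights).reshape(3, 1)
    return (w * params).sum(axis=0)  # updated (a, b, c)
```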
FIG. 3 is a block diagram of a multi-camera object matching system with overlapping fields of view, according to some embodiments described herein. As shown in FIG. 3, the system includes: a first calculation module for calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer; an extraction module for extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm; a sampling module for sampling the head foreground in the foreground image to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region; a second calculation module for connecting the projection points with the camera center projection points, calculating candidate target foot points, performing a weighted summation of the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters; a third calculation module for sampling the chest foreground in the foreground image to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model; and a repeating module for repeating the operation of the extraction module until the video ends.
In some embodiments, the system further comprises: a setting module for setting the frame images of the two cameras at the same moment as image 1 and image 2; a second extraction module for extracting the SIFT matching key points of image 1 and image 2 and filtering them with the RANSAC algorithm; a matching module for selecting 4 pairs of spatially coplanar points from the filtered SIFT matching key-point pairs, with no 3 points collinear, and letting I1 and I2 be two independent projection invariants; and a fourth calculation module for calculating the target with the smallest distance to the field-of-view boundary and judging whether that target is occluded.
In some embodiments, the system further comprises: a fifth calculation module for calculating two sets of parameters a1, b1, c1 and a2, b2, c2, by first selecting the 1st, 3rd and 5th frame images of the two videos, calculating the field-of-view boundary parameters, and recording the field-of-view boundary parameters of the nth (n > 5) frame image as [formula given as an image in the original publication].
In some embodiments, the system further comprises: a coordinate module for measuring, with a GPS receiver, the GPS coordinates gC1 and gC2 of the vertical projection points of the camera centers onto the scene plane, referred to as the camera center projection points for short; a projection module for projecting two vertices of the rectangular frame representing the pedestrian foreground region in each camera foreground onto the scene plane through the homography matrix to obtain T1^0, T1^1, T2^0 and T2^1; a connecting module for connecting the projection points with the corresponding camera center projection points to obtain 4 straight lines on the scene plane; and a sixth calculation module for calculating the intersection points p1, p2, p3 and p4 of the straight lines and taking the intersection coordinates as the candidate foot point coordinates.
In view 1, the two targets occlude each other to a relatively large degree, and the vertices of the rectangular frame cannot represent the head point or foot point of any single pedestrian. If the candidate target foot points are found according to the method of the first case, a large positioning error results, because a small error in the two-dimensional image is amplified in the scene plane: for example, a distance of 20 meters may span only 10 pixels in the image plane, so each pixel represents a length of 2 meters; that is, a difference of one pixel in the image plane can cause an error of 2 meters in the scene plane, which is unacceptable.
In a video surveillance scene, only the foreground of the target can be obtained, and high-precision localization and tracking require a large amount of work to complete target matching. A commonly used matching algorithm is SURF feature point extraction and matching. Matching is an important task in target localization and tracking, but matching algorithms usually involve a large number of nonlinear optimization steps, which greatly limits real-time localization and tracking. In recent years, with the expansion of the video surveillance range and the growing demand for real-time monitoring, research on target localization and tracking that does not rely on explicit matching has been increasing. Meanwhile, as monitoring scenes become more complex, target localization and tracking under occlusion has also become a research hotspot. Most research focuses on obtaining candidate target foot points by sampling the foreground, then analyzing the candidate foot points, and finally localizing the target in the scene plane.
Therefore, in order to further reduce the localization error in this case, for target localization under severe occlusion the present invention samples pixels within the side-length range of the rectangular frame representing the foreground to obtain candidate target foot points, as sketched below.
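How the pixels within the side-length range of the rectangle are sampled is not detailed in the text; the following minimal sketch simply samples positions uniformly along the bottom side of the foreground rectangle as one possible reading (the function name and the sampling choice are assumptions).

```python
import numpy as np

def sample_rect_side(rect, num_samples=10):
    """Uniformly sample pixel positions along the bottom side of the rectangular
    foreground frame given as (x, y, w, h). The choice of side and the number
    of samples are illustrative assumptions."""
    x, y, w, h = rect
    xs = np.linspace(x, x + w, num_samples)
    return [(float(px), float(y + h)) for px in xs]
```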
FIG. 4 is a schematic illustration of gradient direction partitioning, shown in accordance with some embodiments herein.
As shown in FIG. 4, the target image may be decomposed into a number of cell units, and a gradient distribution histogram is calculated for each cell. To compute the gradient distribution histogram of a cell, the gradient direction (0-360 degrees) is divided into 4 bins (FIG. 4), the gradient magnitude and gradient direction of each pixel in the cell are calculated, and the gradient magnitude of each pixel is accumulated into the bin of its gradient direction, yielding the gradient distribution histogram of the cell.
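The per-cell computation just described can be written directly; the sketch below (the helper name is hypothetical) accumulates each pixel's gradient magnitude into one of 4 direction bins covering 0-360 degrees.

```python
import numpy as np

def cell_gradient_histogram(cell, n_bins=4):
    """Gradient distribution histogram of one cell: the gradient magnitude of
    every pixel is accumulated into the bin of its gradient direction, with
    0-360 degrees divided into `n_bins` bins."""
    gy, gx = np.gradient(cell.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    direction = np.degrees(np.arctan2(gy, gx)) % 360.0
    bins = (direction // (360.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist
```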
Adjacent 2 × 2 cells may be combined into a block, as shown in FIG. 4; the cell histograms are concatenated to form the gradient distribution histogram feature of the block and normalized with the L2 norm. Finally, the histogram features of all blocks are combined to form the final HOG feature of the target image.
In some embodiments, the target image may be resized to 128 × 64 pixels by bilinear interpolation, and then the color histogram feature and the HOG feature are extracted as described above. The color histogram feature is extracted in the HSV color space and has 8 × 8 × 8 = 512 dimensions; for the HOG feature, each cell is 8 × 8 pixels and adjacent 2 × 2 cells form a block, so the HOG feature has 128 × 2 × 2 × 4 = 2048 dimensions. Finally, the color histogram feature and the HOG feature are combined into the final 512 + 2048 = 2560-dimensional target feature vector.
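One possible way to assemble the fused feature described above, using OpenCV for the HSV histogram and scikit-image for the HOG descriptor, is sketched below; the exact block layout behind the stated 2048 HOG dimensions is not spelled out in the text, so the length produced by this sketch may differ slightly.

```python
import cv2
import numpy as np
from skimage.feature import hog

def fused_feature_vector(target_bgr):
    """Resize the target patch to 128 x 64, then concatenate an 8x8x8 HSV
    color histogram (512 dims) with a HOG descriptor computed from 8x8-pixel
    cells, 2x2-cell blocks, 4 orientation bins and L2 block normalization."""
    patch = cv2.resize(target_bgr, (64, 128), interpolation=cv2.INTER_LINEAR)

    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    color_hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 8, 8],
                              [0, 180, 0, 256, 0, 256]).flatten()
    color_hist /= (color_hist.sum() + 1e-6)

    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    hog_feat = hog(gray, orientations=4, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm='L2')

    return np.concatenate([color_hist, hog_feat])
```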
Having thus described the basic concepts, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad description. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not specifically described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Additionally, the order in which elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described in this specification, unless explicitly stated in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the foregoing description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, the features of an embodiment may be less than all of the features of a single embodiment disclosed above.
Where numerals describing the number of components, attributes or the like are used in some embodiments, it is to be understood that such numerals used in the description of the embodiments are modified in some instances by the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (8)

1. A method of multi-camera object matching with overlapping fields of view, the method comprising:
S1, calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer;
S2, extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm;
S3, sampling the head foreground in the foreground image of step S2 to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region;
S4, connecting the projection points with the camera center projection points, calculating candidate target foot points, weighting and summing the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters;
S5, sampling the chest foreground in the foreground image of step S2 to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model;
and S6, repeating from step S2 until the video ends.
2. The method of claim 1, wherein the method further comprises:
setting the frame images of the two cameras at the same moment as image 1 and image 2;
extracting the SIFT matching key points of image 1 and image 2, and filtering them with the RANSAC algorithm;
selecting 4 pairs of spatially coplanar points from the filtered SIFT matching key-point pairs, with no 3 points collinear, and letting I1 and I2 be two independent projection invariants;
and calculating the target with the smallest distance to the field-of-view boundary, and judging whether that target is occluded.
3. The method of claim 1, wherein the method further comprises:
calculating two sets of parameters a1, b1, c1 and a2, b2, c2, by first selecting the 1st, 3rd and 5th frame images of the two videos, calculating the field-of-view boundary parameters, and recording the field-of-view boundary parameters of the nth (n > 5) frame image as [formula given as an image in the original publication].
4. The method of claim 1, wherein the method further comprises:
measuring, with a GPS receiver, the GPS coordinates gC1 and gC2 of the vertical projection points of the camera centers onto the scene plane, referred to as the camera center projection points for short;
projecting two vertices of the rectangular frame representing the pedestrian foreground region in each camera foreground onto the scene plane through the homography matrix to obtain T1^0, T1^1, T2^0 and T2^1;
connecting the projection points with the corresponding camera center projection points to obtain 4 straight lines on the scene plane;
and calculating the intersection points p1, p2, p3 and p4 of the straight lines, and taking the intersection coordinates as the candidate foot point coordinates.
5. A multi-camera object matching system with overlapping fields of view, the system being configured to perform the method of any of claims 1-4, the system comprising:
a first calculation module for calculating the field-of-view boundary of the two cameras according to the first n frames of information from camera 1 and camera 2, and collecting foreground images containing the target with camera 1 and camera 2, respectively, wherein n is a positive integer;
an extraction module for extracting the foreground targets in the video of camera 1, fusing the features of each foreground target according to a multi-feature fusion rule to obtain fused feature vectors, and detecting and framing the foreground image with the ViBe detection algorithm;
a sampling module for sampling the head foreground in the foreground image to form head sampling points, projecting the head sampling points, calculating the projection point in camera 2 of each foreground target to be matched in camera 1, determining the reliable region of the target to be matched, and then calculating the multi-feature fusion vector within the reliable region;
a second calculation module for connecting the projection points with the camera center projection points, calculating candidate target foot points, performing a weighted summation of the candidate target foot points to obtain the target foot point, calculating the Euclidean distance between the multi-feature fusion vectors to complete multi-camera target matching with overlapping fields of view, taking the target with the smallest distance within a set threshold as the matching target, and storing the field-of-view boundary parameters;
a third calculation module for sampling the chest foreground in the foreground image to form chest sampling points, projecting the chest sampling points, calculating the center of gravity of the foot points to update the field-of-view boundary parameters, selecting 2 groups of parameters from the first 5 frames preceding the current video frame, performing linear weighting in combination with the field-of-view boundary parameters of the current frame, and updating the field-of-view boundary model;
and a repeating module for repeating the operation of the extraction module until the video ends.
6. The system of claim 5, wherein the system further comprises:
a setting module for setting the frame images of the two cameras at the same moment as image 1 and image 2;
a second extraction module for extracting the SIFT matching key points of image 1 and image 2 and filtering them with the RANSAC algorithm;
a matching module for selecting 4 pairs of spatially coplanar points from the filtered SIFT matching key-point pairs, with no 3 points collinear, and letting I1 and I2 be two independent projection invariants;
and a fourth calculation module for calculating the target with the smallest distance to the field-of-view boundary and judging whether that target is occluded.
7. The system of claim 5, wherein the system further comprises:
a fifth calculation module for calculating two sets of parameters a1, b1, c1 and a2, b2, c2, by first selecting the 1st, 3rd and 5th frame images of the two videos, calculating the field-of-view boundary parameters, and recording the field-of-view boundary parameters of the nth (n > 5) frame image as [formula given as an image in the original publication].
8. The system of claim 5, wherein the system further comprises:
a coordinate module for measuring, with a GPS receiver, the GPS coordinates gC1 and gC2 of the vertical projection points of the camera centers onto the scene plane, referred to as the camera center projection points for short;
a projection module for projecting two vertices of the rectangular frame representing the pedestrian foreground region in each camera foreground onto the scene plane through the homography matrix to obtain T1^0, T1^1, T2^0 and T2^1;
a connecting module for connecting the projection points with the corresponding camera center projection points to obtain 4 straight lines on the scene plane;
and a sixth calculation module for calculating the intersection points p1, p2, p3 and p4 of the straight lines and taking the intersection coordinates as the candidate foot point coordinates.
CN202111406029.6A 2021-11-24 2021-11-24 Multi-camera target matching method and system with overlapped vision fields Pending CN114677414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111406029.6A CN114677414A (en) 2021-11-24 2021-11-24 Multi-camera target matching method and system with overlapped vision fields

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111406029.6A CN114677414A (en) 2021-11-24 2021-11-24 Multi-camera target matching method and system with overlapped vision fields

Publications (1)

Publication Number Publication Date
CN114677414A (en) 2022-06-28

Family

ID=82069912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111406029.6A Pending CN114677414A (en) 2021-11-24 2021-11-24 Multi-camera target matching method and system with overlapped vision fields

Country Status (1)

Country Link
CN (1) CN114677414A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912517A (en) * 2023-06-06 2023-10-20 阿里巴巴(中国)有限公司 Method and device for detecting camera view field boundary
CN116912517B (en) * 2023-06-06 2024-04-02 阿里巴巴(中国)有限公司 Method and device for detecting camera view field boundary


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination