CN114758016A - Camera equipment calibration method, electronic equipment and storage medium


Info

Publication number
CN114758016A
Authority
CN
China
Prior art keywords: dimensional, coordinates, camera, target object, transformation matrix
Prior art date
Legal status
Granted
Application number
CN202210671076.1A
Other languages
Chinese (zh)
Other versions
CN114758016B (en)
Inventor
区士超
布伦诺.卡尔达托
刘晓涛
Current Assignee
Super Node Innovative Technology Shenzhen Co ltd
Original Assignee
Super Node Innovative Technology Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Super Node Innovative Technology Shenzhen Co ltd
Priority to CN202210671076.1A
Publication of CN114758016A
Application granted
Publication of CN114758016B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]


Abstract

The application provides a camera equipment calibration method, an electronic device, and a storage medium. The method comprises: acquiring an image frame sequence of a target object from each camera device, and estimating the two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object as captured by each camera device to obtain the three-dimensional center coordinate of the three-dimensional skeleton coordinates; analyzing the two-dimensional joint coordinates and three-dimensional skeleton coordinates captured by each pair of camera devices according to a PNP algorithm to obtain a first transformation matrix and a second transformation matrix, determining the relative transformation matrix between each pair of camera devices from the first and second transformation matrices, and repeating the steps from obtaining the three-dimensional center coordinate of the three-dimensional skeleton coordinates until the relative transformation matrix between every two camera devices is obtained; and obtaining the three-dimensional pose of each camera device relative to the three-dimensional center coordinate of the target object according to the relative transformation matrices between every two camera devices and a predefined reference coordinate. The method aims to improve both the calibration efficiency and the calibration accuracy of camera equipment.

Description

Camera equipment calibration method, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a camera calibration method, an electronic device, and a storage medium.
Background
At present, with the adoption of distributed intelligent camera systems in artificial intelligence and computer vision, the calibration requirements for multi-camera equipment keep growing, as such systems often require hundreds or even thousands of cameras to be deployed for safety, security, or business analysis applications.
However, in practical applications, calibrating camera devices at such a scale by conventional camera calibration means requires not only moving a calibration board in front of each camera, but also ensuring that at least two cameras can simultaneously see the calibration features on the surface of the board. These harsh requirements limit where the cameras can be deployed, and the movement pattern required of the calibration board changes with each camera's deployment position, so calibrating a large-scale camera deployment is inefficient and its accuracy is hard to guarantee.
Disclosure of Invention
The application provides a camera calibration method, an electronic device, and a storage medium in which a person serves as the calibration basis: an image frame sequence of a tracked object in the area to be monitored is acquired, and the calibration of the cameras is completed from that sequence. This effectively avoids the problems encountered when calibrating a large deployment of cameras with a traditional calibration board, and improves both the calibration efficiency and the calibration accuracy of the cameras.
In a first aspect, an embodiment of the present application provides a method for calibrating an image capturing apparatus, where the method includes:
acquiring an image frame sequence of a target object based on each camera device, wherein the target object is a tracked object in a region to be monitored;
according to each image frame sequence, estimating two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object under the shooting of each camera by using a pre-trained neural network respectively;
performing fusion processing on each three-dimensional skeleton coordinate to obtain a three-dimensional center coordinate of the three-dimensional skeleton coordinate;
respectively acquiring the two-dimensional joint coordinates and the three-dimensional skeleton coordinates shot by each pair of camera equipment, and respectively analyzing the two-dimensional joint coordinates and the three-dimensional skeleton coordinates shot by each pair of camera equipment according to a PNP algorithm to obtain a first transformation matrix and a second transformation matrix; the first transformation matrix is a transformation matrix of each first image pickup apparatus relative to the three-dimensional center coordinates, the second transformation matrix is a transformation matrix of each second image pickup apparatus relative to the three-dimensional center coordinates, each pair of image pickup apparatuses is two image pickup apparatuses having a common field of view in the image pickup apparatuses, and each pair of image pickup apparatuses includes the first image pickup apparatus and the second image pickup apparatus;
determining a relative transformation matrix between each pair of image pickup apparatuses according to each first transformation matrix and each second transformation matrix;
repeatedly executing the step of fusing the three-dimensional skeleton coordinates to obtain the three-dimensional center coordinate of the three-dimensional skeleton coordinates, together with the subsequent steps, until a relative transformation matrix between every two camera devices among all the camera devices is obtained;
and obtaining the three-dimensional postures of the camera devices relative to the three-dimensional center coordinate of the target object according to the relative transformation matrix between every two camera devices and the predefined reference coordinate.
In a second aspect, an embodiment of the present application provides an electronic device, including a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, implement the steps of the method for calibrating an image capturing apparatus according to the first aspect.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor is caused to implement the steps of the method for calibrating an image capturing apparatus according to the first aspect.
The embodiments of the application provide a camera equipment calibration method, an electronic device, and a storage medium. First, an image frame sequence of a target object is acquired from each camera device, the target object being a tracked object in the area to be monitored. Then, according to each image frame sequence, the two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object as captured by each camera device are estimated, and the three-dimensional skeleton coordinates are fused to obtain the three-dimensional center coordinate of the three-dimensional skeleton coordinates. The two-dimensional joint coordinates and three-dimensional skeleton coordinates captured by each pair of camera devices are then acquired and analyzed according to a PNP algorithm to obtain a first transformation matrix and a second transformation matrix, where the first transformation matrix is the transformation matrix of each first camera device relative to the three-dimensional center coordinate, the second transformation matrix is the transformation matrix of each second camera device relative to the three-dimensional center coordinate, each pair of camera devices is two camera devices with a common field of view, and each pair includes a first camera device and a second camera device. The relative transformation matrix between each pair of camera devices is determined according to each first transformation matrix and each second transformation matrix, and the fusion step and the subsequent steps are repeated until the relative transformation matrix between every two camera devices among all the camera devices is obtained. Finally, the three-dimensional poses of the camera devices relative to the three-dimensional center coordinate of the target object are obtained according to the relative transformation matrices between every two camera devices and the predefined reference coordinate. Because the calibration of the multi-camera equipment is completed automatically from the image frame sequence of a tracked object in the area to be monitored, the cumbersome deployment of a calibration board during multi-camera calibration and the calibration errors it introduces are effectively avoided, and both the calibration efficiency and the calibration accuracy of the camera equipment are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure of the embodiments of the present application.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed for the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of a camera calibration method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a calibration method for an image capturing apparatus according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating an implementation of S202 in FIG. 2;
FIG. 4 is a schematic front view of the rendering result of rendering the three-dimensional poses finally calibrated by all cameras into a 3D interface;
FIG. 5 is a schematic top view of rendering results of rendering three-dimensional poses finally calibrated by all cameras into a 3D interface;
fig. 6 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are only some embodiments of the present application, rather than all of them. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor must the steps be performed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual order of execution may change according to the actual situation.
It should be noted that the camera calibration method, the electronic device, and the storage medium provided in the application can be applied to the calibration of multiple camera devices in a distributed camera system. The multi-camera equipment comprises at least two camera devices, which may include ordinary cameras, depth vision cameras, and the like. Specifically, an image frame sequence of a target object, the tracked object in the area to be monitored, may first be acquired from each camera device; the two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object as captured by each camera device are estimated from each image frame sequence; the three-dimensional skeleton coordinates are then fused to obtain the three-dimensional center coordinate of the three-dimensional skeleton coordinates; the two-dimensional joint coordinates and three-dimensional skeleton coordinates captured by each pair of camera devices are acquired and analyzed according to a PNP algorithm to obtain a first transformation matrix and a second transformation matrix, where the first transformation matrix is the transformation matrix of each first camera device relative to the three-dimensional center coordinate, the second transformation matrix is the transformation matrix of each second camera device relative to the three-dimensional center coordinate, each pair of camera devices is two camera devices with a common field of view, and each pair includes a first camera device and a second camera device; the relative transformation matrix between each pair of camera devices is determined according to each first transformation matrix and each second transformation matrix, and the fusion step and the subsequent steps are repeated until the relative transformation matrix between every two camera devices among all the camera devices is obtained; finally, the three-dimensional poses of the camera devices relative to the three-dimensional center coordinate of the target object are obtained according to the relative transformation matrices between every two camera devices and the predefined reference coordinate. Completing the calibration of multiple cameras automatically from the image frame sequence of the tracked object in the area to be monitored effectively avoids the complex deployment of a calibration board, eliminates the calibration errors caused by improper board deployment, and effectively improves both the calibration efficiency and the calibration accuracy of the cameras.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a calibration method for an image capturing apparatus according to an embodiment of the present application.
As shown in fig. 1, the method for calibrating a camera device provided in this embodiment is applied to a camera device calibration system 10, and the camera device calibration system 10 may be deployed in a large monitoring place, such as a mall, an airport, an exhibition hall, or a hospital, and includes a plurality of camera devices 101, an electronic device 102, and a target object 103. Wherein the plurality of image pickup apparatuses 101 are respectively communicatively connected to the electronic apparatus 102, and the plurality of image pickup apparatuses 101 are communicatively connected to each other.
It should be understood that each image pickup apparatus 101 may individually detect and track the target object 103 within the respective visual field range and capture a sequence of image frames of the target object 103 within the respective visual field range. The target object 103 may be a target pedestrian, a robot, or a trackable moving object, which walks within the monitored area, among others.
Each image pickup apparatus 101 transmits the image frame sequence of the target object 103 photographed by each to the electronic apparatus 102, and the electronic apparatus 102 acquires each image frame sequence of the target object 103, and estimates two-dimensional joint coordinates and three-dimensional bone coordinates of the target object 103 photographed by each image pickup apparatus 101, respectively. Specifically, in the present embodiment, it is exemplarily explained that the target object 103 is a human. The two-dimensional joint coordinates comprise two-dimensional coordinates of human joints such as human shoulder coordinates, elbow joint coordinates, waist coordinates and knee joint coordinates, and the three-dimensional skeleton coordinates comprise point coordinates of three-dimensional human body postures.
It should be understood that when a plurality of image pickup apparatuses 101 simultaneously observe the tracked target object 103, the two-dimensional joint coordinates of the target object 103 in the image frames captured by each image pickup apparatus 101 can be obtained by matching and mapping the joint feature points of the target object 103 captured by each image pickup apparatus 101 during tracking; the three-dimensional bone coordinates of the target object 103 as captured by each image pickup apparatus 101 can then be obtained from these two-dimensional joint coordinates. After the three-dimensional bone coordinates are fused to obtain their three-dimensional center coordinate, the two-dimensional joint coordinates and three-dimensional bone coordinates captured by each pair of image pickup apparatuses are analyzed with a Perspective-n-Point (PNP) algorithm to obtain a first transformation matrix and a second transformation matrix. The first transformation matrix is the transformation matrix of each first image pickup apparatus relative to the three-dimensional center coordinate, the second transformation matrix is the transformation matrix of each second image pickup apparatus relative to the three-dimensional center coordinate, each pair of image pickup apparatuses is two image pickup apparatuses with a common field of view among the image pickup apparatuses 101, and each pair includes a first image pickup apparatus and a second image pickup apparatus; the relative transformation matrix between each pair is then determined according to each first transformation matrix and each second transformation matrix. The fusion of the three-dimensional bone coordinates to obtain their three-dimensional center coordinate and the subsequent steps are repeated until the relative transformation matrix between every two image pickup apparatuses among all the image pickup apparatuses is obtained, after which the three-dimensional poses of the image pickup apparatuses 101 relative to the three-dimensional center coordinate of the target object 103 are obtained according to the relative transformation matrices between every two image pickup apparatuses and the predefined reference coordinate. In this way, no special calibration board is required, and workers do not have to devise a complex calibration-board movement scheme from experience or intuition for each camera's placement; the tracked target object simply walks through the cameras' monitoring areas in an ordinary way, and each camera in the multi-camera system is calibrated automatically, step by step. This greatly simplifies the calibration process of the multi-camera system, greatly improves calibration efficiency, avoids manual operation, and improves calibration accuracy.
Referring to fig. 2, fig. 2 is a schematic flowchart of a calibration method for a camera device according to an embodiment of the present disclosure. The method for calibrating the camera device is implemented by the electronic device shown in fig. 1, wherein the electronic device includes a terminal or a server. The terminal can be a handheld terminal device, a computer, a robot or an intelligent wearable device and the like; the server may be a single server, a local server, a cloud server, or a cluster of servers.
As can be seen from fig. 2, the method for calibrating an image capturing apparatus provided in the embodiment of the present application includes steps S201 to S207. The details are as follows:
S201, image frame sequences of the target object are acquired from the respective image capturing apparatuses.
The target object is a tracked object in the area to be monitored, for example a person, a robot, or another trackable object. An image frame sequence of the target object is obtained by each camera device, and the camera devices are then calibrated automatically, one by one, from those sequences. This removes the need for a calibration board: the board no longer has to be moved according to each camera's deployment scene and no complex board-movement scheme has to be devised, which greatly simplifies the calibration process of the multi-camera equipment and improves calibration efficiency, while the absence of manual involvement improves calibration accuracy.
Specifically, the tracked target object only needs to move within the monitoring area of each image capturing device. For example, when a tracked person walks normally through the monitoring areas, each image capturing device acquires an image frame sequence of that person, and the calibration of each image capturing device is completed step by step from these sequences, improving both the calibration efficiency and the calibration accuracy of the multi-camera equipment.
S202, respectively estimating two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object under the shooting of each image pickup device according to each image frame sequence.
In an embodiment, a pre-trained CNN may be used to identify the two-dimensional and three-dimensional joint points in every image frame of each image frame sequence. For example, when each frame contains a human body, the pre-trained neural network can identify the joint points of the human body in each frame, yielding the two-dimensional joint coordinates and three-dimensional bone coordinates of the human body captured by each image capturing device. The joint points of the human body include the head, shoulders, elbow joints, waist, knee joints, and the like.
Specifically, in order to reduce the recognition error of the pre-trained neural network, the target object may first be cropped out of the image frames captured by the cameras; this not only reduces the computation required of the pre-trained neural network but also reduces the error in detecting joint points. In addition, the three-dimensional human skeleton coordinates of the tracked target object can be computed over 3 consecutive frames and fused by taking the mean of the three-dimensional skeleton coordinates, further reducing the error introduced by joint estimation.
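Neither the person detector nor the pose network is named in the text; the following is only a minimal sketch of this crop-then-average step, assuming hypothetical person boxes and a placeholder `pose_net` callable:

```python
import numpy as np

def skeleton_over_3_frames(frames, boxes, pose_net):
    """frames: 3 consecutive images; boxes: matching (x0, y0, x1, y1) person
    boxes; pose_net: placeholder callable returning a (J, 3) skeleton per crop."""
    estimates = []
    for img, (x0, y0, x1, y1) in zip(frames, boxes):
        crop = img[y0:y1, x0:x1]           # restrict inference to the person
        estimates.append(pose_net(crop))   # hypothetical (J, 3) 3D skeleton
    return np.mean(estimates, axis=0)      # average to damp per-frame jitter
```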
It should be understood that, in the embodiment of the present application, the tracked object takes the place of a planar calibration board, so it needs to be a movable object such as a person: image capturing apparatuses observing the same area can easily observe the joint points of such an object, and its features are not lost by a slight change of viewing angle. Each camera can therefore acquire an image frame sequence of the tracked object, and the calibration of the cameras is completed in turn.
Illustratively, as shown in fig. 3, fig. 3 is a flowchart of a specific implementation of S202 in fig. 2. As can be seen from fig. 3, S202 includes steps S2021 and S2022. The details are as follows:
S2021, respectively extracting joint feature points of the target object in each image frame sequence, and performing matching mapping on the joint feature points to obtain two-dimensional joint coordinates of the target object shot by each camera.
In one embodiment, the extracting joint feature points of a target object in each image frame sequence, performing matching mapping on the joint feature points, and obtaining two-dimensional joint coordinates of the target object under shooting by each image capturing device includes: and detecting joint feature points of the target object in each image frame sequence according to the pre-trained two-dimensional posture detection model, and matching and mapping the joint feature points in each associated image frame to obtain two-dimensional joint coordinates of the target object shot by each camera device. Wherein the associated image frames are each image frame in the same image frame sequence.
Specifically, a continuous sequence of image frames, for example, 3 image frames, may be acquired as an input to a pre-trained two-dimensional posture detection model from each image capturing apparatus, and the pre-trained two-dimensional posture detection model may be used to detect two-dimensional joint coordinates of the target object in each image frame sequence.
Furthermore, since image frame sequences are acquired by multiple image capturing apparatuses, the two-dimensional joint coordinates identified in sequences from different apparatuses need to be matched and mapped to obtain the two-dimensional joint coordinates captured by each apparatus. Specifically, the two-dimensional joint coordinates detected in each image frame are matched to their counterparts; for example, the human head detected in a first image frame is matched to the human head detected in the adjacent second image frame. This matching is the process of associating the mapping. Although the pre-trained two-dimensional posture detection model outputs two-dimensional joint coordinates, it has difficulty distinguishing direction in practice; for example, when the target object is a person, it is hard to tell the left hand and foot from the right. Matching and mapping the two-dimensional joint coordinates detected in adjacent image frames therefore effectively corrects the recognition result.
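The patent does not specify the matching algorithm. As a minimal sketch, assuming each frame's joints arrive as a (J, 2) NumPy array, a one-to-one assignment by Euclidean distance between adjacent frames could look like this (all names are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_joints(prev_joints: np.ndarray, curr_joints: np.ndarray) -> np.ndarray:
    """Reorder curr_joints so that row k refers to the same physical joint as
    prev_joints[k], correcting e.g. left/right swaps between adjacent frames."""
    # Pairwise Euclidean distances between all joints of the two frames.
    cost = np.linalg.norm(prev_joints[:, None, :] - curr_joints[None, :, :], axis=-1)
    row, col = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return curr_joints[col]                 # row is already sorted 0..J-1
```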
S2022, performing three-dimensional skeleton calculation according to the two-dimensional joint coordinates to obtain the three-dimensional skeleton coordinates of the target object shot by each camera.
The method for calculating the three-dimensional skeleton according to the two-dimensional joint coordinates of the target object to obtain the three-dimensional skeleton coordinates of the target object shot by each camera device comprises the following steps: and inputting the two-dimensional joint coordinates of the target object into a pre-trained three-dimensional skeleton recognition model for calculation to obtain the three-dimensional skeleton coordinates of the target object shot by each camera.
S203, performing fusion processing on the three-dimensional skeleton coordinates to obtain three-dimensional center coordinates of the three-dimensional skeleton coordinates.
In an embodiment, the performing the fusion processing on each three-dimensional bone coordinate to obtain a three-dimensional center coordinate of the three-dimensional bone coordinate includes: and performing outlier joint removal and continuous frame mean processing on each three-dimensional skeleton coordinate to obtain a three-dimensional center coordinate of the three-dimensional skeleton coordinate.
In this embodiment, the three-dimensional bone coordinates of each camera are fused by removing outlier three-dimensional bone coordinate points and averaging the three-dimensional bone coordinates of consecutive frames, which optimizes the coordinates; the center coordinate of the optimized three-dimensional bone coordinates is then taken as the calibration reference coordinate, improving the calibration accuracy of each camera.
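The exact outlier rule is not given in the text; the sketch below assumes a simple median-distance test and per-joint averaging over the retained samples (the threshold and function names are illustrative):

```python
import numpy as np

def fuse_skeletons(skeletons):
    """skeletons: list of (J, 3) arrays from consecutive frames, in a common
    frame. Returns the fused skeleton and its three-dimensional center."""
    stack = np.stack(skeletons)                   # (N, J, 3)
    ref = np.median(stack, axis=0)                # robust per-joint reference pose
    dist = np.linalg.norm(stack - ref, axis=-1)   # (N, J) distance to the reference
    keep = dist < 3.0 * np.median(dist)           # drop outlier joint samples
    kept = np.where(keep[..., None], stack, np.nan)
    fused = np.nanmean(kept, axis=0)              # mean over the remaining frames
    return fused, fused.mean(axis=0)              # fused pose and its 3D center
```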
S204, the two-dimensional joint coordinates and the three-dimensional bone coordinates which are shot by each pair of camera shooting equipment are respectively obtained, and the two-dimensional joint coordinates and the three-dimensional bone coordinates which are shot by each pair of camera shooting equipment are respectively analyzed according to a PNP algorithm to obtain a first transformation matrix and a second transformation matrix.
Wherein the first transformation matrix is a transformation matrix of each first image pickup apparatus with respect to the three-dimensional center coordinates of a target object, the second transformation matrix is a transformation matrix of each second image pickup apparatus with respect to the three-dimensional center coordinates, and each pair of image pickup apparatuses is two image pickup apparatuses having a common field of view among the image pickup apparatuses, and each pair of image pickup apparatuses includes the first image pickup apparatus and the second image pickup apparatus.
S205, determining a relative transformation matrix between each pair of the image pickup devices according to each first transformation matrix and each second transformation matrix.
In a specific implementation, determining the relative transformation matrix between each pair of image capturing apparatuses according to each first transformation matrix and each second transformation matrix includes: determining a relative transformation matrix between each first image capturing apparatus and each second image capturing apparatus based on each first transformation matrix and each second transformation matrix. Each first transformation matrix and each second transformation matrix is a rotation-translation transformation matrix; the relative matrix specifically represents the rotation and translation between the three-dimensional pose of a first camera device relative to the three-dimensional center coordinate of the target object and the three-dimensional pose of the paired second camera device relative to that same center coordinate.
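As a minimal sketch of S204 and S205 built on OpenCV's PnP solver (the text names only "a PNP algorithm"; the intrinsic matrix K is assumed to be known, and the function names are illustrative):

```python
import cv2
import numpy as np

def pose_from_pnp(joints_2d, skeleton_3d, K):
    """Solve one camera's pose relative to the fused skeleton.
    joints_2d: (J, 2) pixel coordinates; skeleton_3d: (J, 3) points centered
    on the fused three-dimensional center coordinate; K: 3x3 intrinsics."""
    ok, rvec, tvec = cv2.solvePnP(skeleton_3d.astype(np.float64),
                                  joints_2d.astype(np.float64), K, None)
    assert ok, "PnP failed"
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T                          # 4x4 first/second transformation matrix

def relative_transform(T_first, T_second):
    # Maps points from the first camera's frame into the second camera's frame.
    return T_second @ np.linalg.inv(T_first)
```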
S206, repeatedly executing the fusion processing on the three-dimensional skeleton coordinates to obtain three-dimensional center coordinates of the three-dimensional skeleton coordinates, and the subsequent steps, until a relative transformation matrix between every two image pickup devices in all the image pickup devices is obtained.
It should be understood that, in specific implementation, the above steps S203 to S205 are repeatedly performed until a relative transformation matrix between every two image capturing apparatuses in all the image capturing apparatuses is obtained. Every two image pickup devices in all the image pickup devices represent any two image pickup devices, that is, the relative transformation matrix between every two image pickup devices is determined in the embodiment of the present application. This is because under a large imaging system, not every pair of imaging apparatuses can track a tracked object every moment. Therefore, in the present embodiment, the relative transformation matrix calculation is first performed with a pair of image pickup apparatuses that can observe the tracked object, and this process is repeatedly performed until the relative transformation matrix between any two image pickup apparatuses among all the image pickup apparatuses is obtained. The calibration of each camera device in the camera system can be effectively guaranteed, calibration is prevented from being missed, and meanwhile calibration efficiency is improved.
S207, obtaining the three-dimensional postures of the camera devices relative to the three-dimensional center coordinate of the target object according to the relative transformation matrix between every two camera devices and the predefined reference coordinate.
In one embodiment, the obtaining the three-dimensional postures of the image capturing apparatuses relative to the three-dimensional center coordinate of the target object according to the relative transformation matrix between each two image capturing apparatuses and a predefined reference coordinate comprises: determining the global coordinate of each camera device according to the relative transformation matrix between every two camera devices and the predefined reference coordinate; and respectively carrying out coordinate conversion on the three-dimensional skeleton coordinates of the target object shot by each camera shooting device according to the global coordinates and a preset scale factor to obtain the three-dimensional posture of the target object in each camera shooting device.
The method for converting the three-dimensional skeleton coordinates of the target object shot by each camera shooting device according to the global coordinates and the preset scale factors to obtain the three-dimensional postures of the target object in each camera shooting device comprises the following steps: and sequentially carrying out coordinate conversion on the three-dimensional skeleton coordinates of the target object shot by each camera shooting device according to the global coordinates, and optimizing the converted three-dimensional skeleton coordinates according to a preset scale factor to obtain the three-dimensional postures of the target object in each camera shooting device.
Wherein, determining the global coordinate of each image pickup apparatus according to the relative transformation matrix between every two image pickup apparatuses and the predefined reference coordinate respectively comprises: the coordinate of any camera is defined as a predefined reference coordinate, and the coordinate of each camera is converted based on the predefined reference coordinate according to a transformation matrix between the cameras, so that the global coordinate of each camera is obtained.
In the embodiment, each pair of camera devices capable of observing the object to be tracked simultaneously is taken, and the external parameters of each camera device are calibrated by using the PNP algorithm, so that the calibration of each camera device can be efficiently and automatically completed, the manual layout and movement of a calibration plate are avoided, and the calibration accuracy of each camera device is improved.
Specifically, by repeating the process of taking each pair of image pickup apparatuses capable of simultaneously observing the object to be tracked and calibrating the external parameters of each image pickup apparatus using the PNP algorithm, the transformation matrix between all the image pickup apparatuses can be found. In this embodiment, the transformation matrix is a relative rotation and translation transformation matrix. For example, the first transformation matrix is a first relative rotation and translation transformation matrix and the second transformation matrix is a second relative rotation and translation transformation matrix.
Further, by defining the coordinates of any one of the plurality of image pickup apparatuses as the predefined reference coordinates and unifying the coordinates of all the image pickup apparatuses step by step using the calculated relative rotation and translation transformation matrices between them, all the image pickup apparatuses can be placed under one global coordinate system, so that the calibration of each camera device is completed quickly and accurately.
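The step-by-step unification can be pictured as a breadth-first walk over the graph of already-calibrated camera pairs. The sketch below assumes each pairwise result T maps points from the first camera's frame into the second camera's frame; both this convention and the names are illustrative:

```python
import numpy as np
from collections import deque

def global_poses(rel, ref=0):
    """rel: dict {(i, j): T} where T maps camera-i coordinates to camera-j
    coordinates. Returns, per camera, the 4x4 transform into the reference
    camera's frame, i.e. the global coordinates of each camera."""
    pose = {ref: np.eye(4)}               # the reference camera defines the origin
    queue = deque([ref])
    while queue:
        k = queue.popleft()
        for (i, j), T in rel.items():
            if i == k and j not in pose:    # camera i known: place camera j
                pose[j] = pose[i] @ np.linalg.inv(T)
                queue.append(j)
            elif j == k and i not in pose:  # camera j known: place camera i
                pose[i] = pose[j] @ T
                queue.append(i)
    return pose
```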
Further, the preset scale factor is the scale factor of the three-dimensional pose calculated from the actual height of the object to be tracked. For example, if the distance between the head joint point and the foot joint point of the tracked object is computed to be 17 units while the real height of the person is 1.7 meters, the scale factor is 1/10. In this embodiment, the three-dimensional poses of the camera devices at different moments are optimized according to the scale factor of the tracked object, yielding the true calibration result of each camera device.
It should be understood that the actual height of the object to be tracked can be determined in advance, and that before the three-dimensional poses are optimized with the preset scale factor, the calculated three-dimensional poses of the respective image pickup apparatuses are only relative values.
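Carrying the 1.7 m / 17-unit example through, a sketch of applying the factor to a unified pose (names are illustrative) might be:

```python
import numpy as np

def apply_scale(pose, real_height_m, est_height_units):
    """Rescale a 4x4 camera pose to metric units; e.g. 1.7 / 17 gives 0.1."""
    s = real_height_m / est_height_units
    scaled = pose.copy()
    scaled[:3, 3] *= s    # only the translation carries scale; rotation is unitless
    return scaled
```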
As can be seen from the above analysis, in this embodiment the three-dimensional bone coordinates captured by the image capturing devices at different times are aggregated by averaging, so that the errors introduced by three-dimensional pose estimation and by PNP pose estimation are reduced and an accurate calibration result is obtained for each image capturing device.
In addition, the three-dimensional poses finally calibrated by all cameras can be rendered into a 3D interface. Exemplarily, as shown in fig. 4 and fig. 5, fig. 4 is a schematic front view of the rendering result of rendering the three-dimensional poses finally calibrated by all cameras into the 3D interface, and fig. 5 is a schematic top view of the same rendering result.
Only the image pickup apparatuses CAM0, CAM1, and CAM2 are illustrated in fig. 4 and fig. 5. It should be understood that, in practical applications, the image capturing system is not limited to the image pickup apparatuses CAM0, CAM1, and CAM2 and may include any number of image pickup apparatuses.
As can be seen from the above analysis, in the camera calibration method provided by the embodiments of the application, an image frame sequence of a target object, the tracked object in the area to be monitored, is first acquired from each camera device. Then, according to each image frame sequence, the two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object as captured by each camera device are estimated, and the three-dimensional skeleton coordinates are fused to obtain the three-dimensional center coordinate of the three-dimensional skeleton coordinates. The two-dimensional joint coordinates and three-dimensional skeleton coordinates captured by each pair of camera devices are acquired and analyzed according to a PNP algorithm to obtain a first transformation matrix and a second transformation matrix, where the first transformation matrix is the transformation matrix of each first camera device relative to the three-dimensional center coordinate, the second transformation matrix is the transformation matrix of each second camera device relative to the three-dimensional center coordinate, each pair of camera devices is two camera devices with a common field of view, and each pair includes a first camera device and a second camera device. The relative transformation matrix between each pair of camera devices is determined according to each first transformation matrix and each second transformation matrix, and the fusion step and the subsequent steps are repeated until the relative transformation matrix between every two camera devices among all the camera devices is obtained. Finally, the three-dimensional poses of the camera devices relative to the three-dimensional center coordinate of the target object are obtained according to the relative transformation matrices between every two camera devices and the predefined reference coordinate. Completing the calibration of the multi-camera equipment automatically from the image frame sequence of the tracked object in the area to be monitored effectively avoids the complex deployment of a calibration board and the calibration errors it introduces, improving both the calibration efficiency and the calibration accuracy of the camera equipment.
Referring to fig. 6, fig. 6 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure.
Illustratively, the electronic device 102 includes a processor 401 and a memory 402.
In addition, as can be seen from the foregoing description of the embodiments, the electronic device 102 may be a terminal device or a server. The terminal device includes, but is not limited to, a handheld terminal, a computer, a robot, or a smart wearable device, and the server may be a single server, a local server, a cloud server, a server cluster, or the like.
Illustratively, the processor 401 and memory 402 are connected by a bus 403, such as an I2C (Inter-integrated Circuit) bus 403.
The processor 401 may be used to provide computing and control capabilities to support the operation of the overall electronic device. The Processor 401 may be a Central Processing Unit (CPU), and the Processor 401 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Specifically, the Memory 402 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB disk, or a removable hard disk.
Those skilled in the art will appreciate that the structure shown in fig. 6 is a block diagram of only a portion of the structure related to the embodiments of the present application and does not limit the electronic device 102 to which the embodiments apply; a specific electronic device 102 may include more or fewer components than shown in the drawings, combine some components, or arrange the components differently.
The processor 401 is configured to run a computer program stored in the memory 402, and when executing the computer program, implement the steps of the above-mentioned calibration method for an image capturing apparatus provided in the embodiment of the present application.
In an embodiment, the processor 401 is configured to run a computer program stored in the memory 402 of the electronic device and to implement the following steps when executing the computer program:
respectively acquiring an image frame sequence of a target object based on each camera device, wherein the target object is a tracked object in a region to be monitored;
according to each image frame sequence, respectively estimating two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object under the shooting of each camera device;
performing fusion processing on the three-dimensional skeleton coordinates to obtain three-dimensional center coordinates of the three-dimensional skeleton coordinates;
respectively acquiring the two-dimensional joint coordinates and the three-dimensional skeleton coordinates shot by each pair of camera shooting equipment, and respectively analyzing the two-dimensional joint coordinates and the three-dimensional skeleton coordinates shot by each pair of camera shooting equipment according to a PNP algorithm to obtain a first transformation matrix and a second transformation matrix; the first transformation matrix is a transformation matrix of each first image pickup device relative to the three-dimensional center coordinate, the second transformation matrix is a transformation matrix of each second image pickup device relative to the three-dimensional center coordinate, each pair of image pickup devices is two image pickup devices with a common view field in each image pickup device, and each pair of image pickup devices comprises the first image pickup device and the second image pickup device;
determining a relative transformation matrix between each pair of the image pickup apparatuses according to each first transformation matrix and each second transformation matrix;
repeatedly executing fusion processing on the three-dimensional skeleton coordinates to obtain three-dimensional center coordinates of the three-dimensional skeleton coordinates until a relative transformation matrix between every two pieces of camera equipment in all the camera equipment is obtained;
and obtaining the three-dimensional postures of the camera devices relative to the three-dimensional center coordinate of the target object according to the relative transformation matrix between every two camera devices and the predefined reference coordinate.
In one embodiment, the estimating, from each of the image frame sequences, two-dimensional joint coordinates and three-dimensional bone coordinates of the target object captured by each of the image capturing apparatuses, respectively, includes:
respectively extracting joint feature points of the target object in each image frame sequence, and performing matching mapping on the joint feature points to obtain two-dimensional joint coordinates of the target object shot by each camera device;
and performing three-dimensional skeleton calculation according to the two-dimensional joint coordinates to obtain the three-dimensional skeleton coordinates of the target object shot by each camera device.
In an embodiment, the performing fusion processing on each three-dimensional bone coordinate to obtain a three-dimensional center coordinate of the three-dimensional bone coordinate includes:
and performing outlier joint removal and continuous frame mean processing on each three-dimensional skeleton coordinate to obtain a three-dimensional center coordinate of the three-dimensional skeleton coordinate.
In one embodiment, the obtaining the three-dimensional postures of the image capturing apparatuses relative to the three-dimensional center coordinate of the target object according to the relative transformation matrix between each two image capturing apparatuses and a predefined reference coordinate comprises:
determining the global coordinate of each camera device according to the relative transformation matrix between every two camera devices and the predefined reference coordinate;
and respectively carrying out coordinate conversion on the three-dimensional skeleton coordinates of the target object shot by each camera shooting device according to the global coordinates and preset scale factors to obtain the three-dimensional postures of the camera shooting devices relative to the three-dimensional center coordinates of the target object.
In an embodiment, the converting the three-dimensional bone coordinates of the target object captured by each imaging device according to the global coordinates and a preset scale factor to obtain the three-dimensional postures of the imaging devices relative to the three-dimensional center coordinates of the target object includes:
and according to the global coordinate, sequentially carrying out coordinate conversion on the three-dimensional skeleton coordinate of the target object shot by each camera shooting device, and optimizing the converted three-dimensional skeleton coordinate according to the preset scale factor to obtain the three-dimensional posture of each camera shooting device relative to the three-dimensional center coordinate of the target object.
In one embodiment, the determining global coordinates of each image capturing apparatus from each relative transformation matrix and the predefined reference coordinates includes:
determining a transformation matrix among the camera devices according to the relative transformation matrices;
defining the coordinate of any one of the camera devices as the predefined reference coordinate, and converting the coordinate of each camera device based on the predefined reference coordinate according to the transformation matrix among the camera devices to obtain the global coordinate of each camera device.
In an embodiment, the extracting joint feature points of the target object in each image frame sequence, performing matching mapping on the joint feature points, and obtaining the two-dimensional joint coordinates of the target object captured by each image capturing device respectively includes:
detecting joint feature points of a target object in each image frame sequence according to a pre-trained two-dimensional posture detection model, and matching and mapping the joint feature points in each associated image frame to obtain two-dimensional joint coordinates of the target object under shooting of each camera device, wherein the associated image frames are all image frames in the same image frame sequence.
In one embodiment, the performing three-dimensional bone calculation according to the two-dimensional joint coordinates to obtain the three-dimensional bone coordinates of the target object captured by each image capturing device includes:
inputting the two-dimensional joint coordinates into a pre-trained three-dimensional skeleton recognition model for calculation to obtain the three-dimensional skeleton coordinates of the target object shot by each camera.
It should be understood that the specific principle and implementation manner of the electronic device provided in this embodiment are the same as those of the implementation process in the method for calibrating the camera device in the foregoing embodiment, and are not described herein again.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor is caused to implement the following steps of the method for calibrating an image capturing apparatus:
respectively acquiring an image frame sequence of a target object based on each camera device, wherein the target object is a tracked object in a region to be monitored;
according to each image frame sequence, respectively estimating two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object under the shooting of each camera device;
performing fusion processing on the three-dimensional skeleton coordinates to obtain three-dimensional center coordinates of the three-dimensional skeleton coordinates;
respectively acquiring the two-dimensional joint coordinates and the three-dimensional skeleton coordinates shot by each pair of camera shooting equipment, and respectively analyzing the two-dimensional joint coordinates and the three-dimensional skeleton coordinates shot by each pair of camera shooting equipment according to a PNP algorithm to obtain a first transformation matrix and a second transformation matrix; the first transformation matrix is a transformation matrix of each first image pickup device relative to the three-dimensional center coordinate, the second transformation matrix is a transformation matrix of each second image pickup device relative to the three-dimensional center coordinate, each pair of image pickup devices is two image pickup devices with a common view field in each image pickup device, and each pair of image pickup devices comprises the first image pickup device and the second image pickup device;
determining a relative transformation matrix between each pair of the image pickup apparatuses according to each first transformation matrix and each second transformation matrix;
repeatedly executing fusion processing on the three-dimensional skeleton coordinates to obtain three-dimensional center coordinates of the three-dimensional skeleton coordinates until a relative transformation matrix between every two pieces of camera equipment in all the camera equipment is obtained;
and obtaining the three-dimensional postures of the camera devices relative to the three-dimensional center coordinate of the target object according to the relative transformation matrix between every two camera devices and the predefined reference coordinate.
In an embodiment, the estimating, according to each image frame sequence, the two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object captured by each image capturing device includes:
extracting joint feature points of the target object in each image frame sequence, and performing matching and mapping on the joint feature points to obtain the two-dimensional joint coordinates of the target object captured by each image capturing device; and
performing three-dimensional skeleton calculation according to the two-dimensional joint coordinates to obtain the three-dimensional skeleton coordinates of the target object captured by each image capturing device.
In an embodiment, the performing fusion processing on each three-dimensional skeleton coordinate to obtain the three-dimensional center coordinates of the three-dimensional skeleton coordinates includes:
removing outlier joints from each three-dimensional skeleton coordinate and averaging over consecutive frames to obtain the three-dimensional center coordinates of the three-dimensional skeleton coordinates.
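One plausible reading of this fusion step is sketched below; the outlier threshold and the averaging window are illustrative values, not taken from the specification:

```python
import numpy as np

def fuse_center(skeletons, outlier_thresh=0.3, window=5):
    """skeletons: (F, J, 3) three-dimensional skeleton coordinates over F frames;
    returns one fused three-dimensional center coordinate."""
    centers = []
    for frame in skeletons:
        med = np.median(frame, axis=0)
        keep = np.linalg.norm(frame - med, axis=1) < outlier_thresh  # drop outlier joints
        if not keep.any():
            keep = np.ones(len(frame), dtype=bool)  # degenerate frame: keep every joint
        centers.append(frame[keep].mean(axis=0))
    centers = np.asarray(centers)
    return centers[-min(window, len(centers)):].mean(axis=0)  # consecutive-frame mean
```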
In one embodiment, the obtaining the three-dimensional pose of each image capturing device relative to the three-dimensional center coordinates of the target object according to the relative transformation matrices between every two image capturing devices and a predefined reference coordinate includes:
determining the global coordinates of each image capturing device according to the relative transformation matrices between every two image capturing devices and the predefined reference coordinate; and
performing coordinate conversion on the three-dimensional skeleton coordinates of the target object captured by each image capturing device according to the global coordinates and a preset scale factor to obtain the three-dimensional pose of each image capturing device relative to the three-dimensional center coordinates of the target object.
In an embodiment, the converting, according to the global coordinates and a preset scale factor, the three-dimensional skeleton coordinates of the target object captured by each image capturing device to obtain the three-dimensional pose of each image capturing device relative to the three-dimensional center coordinates of the target object includes:
sequentially performing coordinate conversion on the three-dimensional skeleton coordinates of the target object captured by each image capturing device according to the global coordinates, and optimizing the converted three-dimensional skeleton coordinates according to the preset scale factor to obtain the three-dimensional pose of each image capturing device relative to the three-dimensional center coordinates of the target object.
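A minimal sketch of this conversion, assuming the global coordinate of a device is available as a 4x4 pose matrix and reading the scale-factor optimization as a simple rescaling (both assumptions, the value of the preset scale factor included):

```python
import numpy as np

PRESET_SCALE = 1.0  # preset scale factor (placeholder; e.g., model units to meters)

def skeleton_to_global(skeleton_3d, T_device_global, scale=PRESET_SCALE):
    """Map (N, 3) skeleton coordinates through the device's 4x4 global pose,
    then rescale the converted coordinates by the preset scale factor."""
    pts_h = np.hstack([skeleton_3d, np.ones((len(skeleton_3d), 1))])
    return scale * (pts_h @ T_device_global.T)[:, :3]
```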
In one embodiment, the determining the global coordinates of each image capturing device from the relative transformation matrices and the predefined reference coordinate includes:
determining transformation matrices among the image capturing devices according to the relative transformation matrices; and
defining the coordinates of any one of the image capturing devices as the predefined reference coordinate, and converting the coordinates of each image capturing device based on the predefined reference coordinate according to the transformation matrices among the image capturing devices to obtain the global coordinates of each image capturing device.
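One way to realize this chaining is a breadth-first traversal of the relative transforms outward from the chosen reference device; the graph traversal itself is an implementation assumption, not something the specification fixes:

```python
import numpy as np
from collections import deque

def global_poses(relative, ref_cam=0):
    """relative: {(i, j): 4x4 array T} with x_j = T @ x_i for device pairs
    sharing a common field of view; returns {device_id: 4x4 global pose}."""
    poses = {ref_cam: np.eye(4)}    # the reference device defines the origin
    adj = {}
    for (i, j), T in relative.items():
        adj.setdefault(i, []).append((j, T))
        adj.setdefault(j, []).append((i, np.linalg.inv(T)))
    queue = deque([ref_cam])
    while queue:
        i = queue.popleft()
        for j, T_ij in adj.get(i, []):
            if j not in poses:
                # x_ref = poses[i] @ x_i and x_j = T_ij @ x_i, hence:
                poses[j] = poses[i] @ np.linalg.inv(T_ij)
                queue.append(j)
    return poses
```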
In an embodiment, the extracting joint feature points of the target object in each image frame sequence and performing matching and mapping on the joint feature points to obtain the two-dimensional joint coordinates of the target object captured by each image capturing device includes:
detecting joint feature points of the target object in each image frame sequence using the pre-trained two-dimensional pose detection model, and matching and mapping the joint feature points across the associated image frames to obtain the two-dimensional joint coordinates of the target object captured by each image capturing device, wherein the associated image frames are the image frames belonging to the same image frame sequence.
In one embodiment, the performing three-dimensional skeleton calculation according to the two-dimensional joint coordinates to obtain the three-dimensional skeleton coordinates of the target object captured by each image capturing device includes:
inputting the two-dimensional joint coordinates into the pre-trained three-dimensional skeleton recognition model for calculation to obtain the three-dimensional skeleton coordinates of the target object captured by each image capturing device.
It should be understood that the specific principles and implementation of each step provided in this embodiment are the same as those of the image capturing device calibration method in the foregoing embodiments, and are not repeated here.
The computer-readable storage medium may be an internal storage unit of the electronic device of the foregoing embodiments, for example, a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the electronic device.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It should also be understood that the term "and/or" as used in this application and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
While the present application has been described with reference to specific embodiments, its scope of protection is not limited thereto, and those skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed herein. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (10)

1. A method for calibrating an image capturing device, the method comprising:
acquiring an image frame sequence of a target object by each image capturing device, wherein the target object is a tracked object in a region to be monitored;
estimating, according to each image frame sequence, the two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object captured by each image capturing device;
performing fusion processing on the three-dimensional skeleton coordinates to obtain three-dimensional center coordinates of the three-dimensional skeleton coordinates;
acquiring the two-dimensional joint coordinates and the three-dimensional skeleton coordinates captured by each pair of image capturing devices, and analyzing them according to the PnP algorithm to obtain a first transformation matrix and a second transformation matrix, wherein the first transformation matrix is the transformation matrix of each first image capturing device relative to the three-dimensional center coordinates, the second transformation matrix is the transformation matrix of each second image capturing device relative to the three-dimensional center coordinates, each pair of image capturing devices is two image capturing devices that share a common field of view, and each pair comprises a first image capturing device and a second image capturing device;
determining a relative transformation matrix between each pair of image capturing devices according to each first transformation matrix and each second transformation matrix;
repeating the steps from the fusion processing of the three-dimensional skeleton coordinates onward until a relative transformation matrix between every two image capturing devices among all the image capturing devices is obtained; and
obtaining the three-dimensional pose of each image capturing device relative to the three-dimensional center coordinates of the target object according to the relative transformation matrices between every two image capturing devices and a predefined reference coordinate.
2. The method according to claim 1, wherein the estimating, according to each image frame sequence, the two-dimensional joint coordinates and three-dimensional skeleton coordinates of the target object captured by each image capturing device comprises:
extracting joint feature points of the target object in each image frame sequence, and performing matching and mapping on the joint feature points to obtain the two-dimensional joint coordinates of the target object captured by each image capturing device; and
performing three-dimensional skeleton calculation according to the two-dimensional joint coordinates to obtain the three-dimensional skeleton coordinates of the target object captured by each image capturing device.
3. The method according to claim 1, wherein the performing fusion processing on the three-dimensional skeleton coordinates to obtain three-dimensional center coordinates of the three-dimensional skeleton coordinates comprises:
removing outlier joints from each three-dimensional skeleton coordinate and averaging over consecutive frames to obtain the three-dimensional center coordinates of the three-dimensional skeleton coordinates.
4. The method according to claim 1, wherein the obtaining the three-dimensional pose of each image capturing device relative to the three-dimensional center coordinates of the target object according to the relative transformation matrices between every two image capturing devices and a predefined reference coordinate comprises:
determining the global coordinates of each image capturing device according to the relative transformation matrices between every two image capturing devices and the predefined reference coordinate; and
performing coordinate conversion on the three-dimensional skeleton coordinates of the target object captured by each image capturing device according to the global coordinates and a preset scale factor to obtain the three-dimensional pose of each image capturing device relative to the three-dimensional center coordinates of the target object.
5. The method according to claim 4, wherein the performing coordinate conversion on the three-dimensional skeleton coordinates of the target object captured by each image capturing device according to the global coordinates and a preset scale factor comprises:
sequentially performing coordinate conversion on the three-dimensional skeleton coordinates of the target object captured by each image capturing device according to the global coordinates, and optimizing the converted three-dimensional skeleton coordinates according to the preset scale factor to obtain the three-dimensional pose of each image capturing device relative to the three-dimensional center coordinates of the target object.
6. The method according to claim 4, wherein the determining the global coordinates of each image capturing device according to the relative transformation matrices and the predefined reference coordinate comprises:
determining transformation matrices among the image capturing devices according to the relative transformation matrices; and
defining the coordinates of any one of the image capturing devices as the predefined reference coordinate, and converting the coordinates of each image capturing device based on the predefined reference coordinate according to the transformation matrices among the image capturing devices to obtain the global coordinates of each image capturing device.
7. The method according to claim 2, wherein the extracting joint feature points of the target object in each image frame sequence and performing matching and mapping on the joint feature points to obtain the two-dimensional joint coordinates of the target object captured by each image capturing device comprises:
detecting joint feature points of the target object in each image frame sequence using a pre-trained two-dimensional pose detection model, and matching and mapping the joint feature points across the associated image frames to obtain the two-dimensional joint coordinates of the target object captured by each image capturing device, wherein the associated image frames are the image frames belonging to the same image frame sequence.
8. The method according to claim 2, wherein the performing three-dimensional skeleton calculation according to the two-dimensional joint coordinates to obtain the three-dimensional skeleton coordinates of the target object captured by each image capturing device comprises:
inputting the two-dimensional joint coordinates into a pre-trained three-dimensional skeleton recognition model for calculation to obtain the three-dimensional skeleton coordinates of the target object captured by each image capturing device.
9. An electronic device comprising a memory and a processor;
the memory is configured to store a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement the steps of the image capturing device calibration method according to any one of claims 1 to 8.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the steps of the image capturing device calibration method according to any one of claims 1 to 8.
CN202210671076.1A 2022-06-15 2022-06-15 Camera equipment calibration method, electronic equipment and storage medium Active CN114758016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210671076.1A CN114758016B (en) 2022-06-15 2022-06-15 Camera equipment calibration method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210671076.1A CN114758016B (en) 2022-06-15 2022-06-15 Camera equipment calibration method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114758016A true CN114758016A (en) 2022-07-15
CN114758016B CN114758016B (en) 2022-09-13

Family

ID=82336909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210671076.1A Active CN114758016B (en) 2022-06-15 2022-06-15 Camera equipment calibration method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114758016B (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110037691A1 (en) * 2008-07-31 2011-02-17 Hiroshima University Three-dimensional object display control system and method thereof
CN102479386A (en) * 2010-11-24 2012-05-30 湘潭大学 Three-dimensional motion tracking method of upper half part of human body based on monocular video
CN107507243A (en) * 2016-06-14 2017-12-22 华为技术有限公司 A kind of camera parameters method of adjustment, instructor in broadcasting's video camera and system
CN109099883A (en) * 2018-06-15 2018-12-28 哈尔滨工业大学 The big visual field machine vision metrology of high-precision and caliberating device and method
CN109360230A (en) * 2018-11-08 2019-02-19 武汉库柏特科技有限公司 A kind of method for registering images and system based on 2D camera Yu 3D camera
CN109767474A (en) * 2018-12-31 2019-05-17 深圳积木易搭科技技术有限公司 A kind of more mesh camera calibration method, apparatus and storage medium
JPWO2020217360A1 (en) * 2019-04-24 2020-10-29
WO2021110497A1 (en) * 2019-12-04 2021-06-10 Valeo Schalter Und Sensoren Gmbh Estimating a three-dimensional position of an object
CN111028271A (en) * 2019-12-06 2020-04-17 浩云科技股份有限公司 Multi-camera personnel three-dimensional positioning and tracking system based on human skeleton detection
CN111426451A (en) * 2020-03-18 2020-07-17 深圳市德斯戈智能科技有限公司 Rapid combined calibration method for multiple 3D cameras
CN111462249A (en) * 2020-04-02 2020-07-28 北京迈格威科技有限公司 Calibration data acquisition method, calibration method and device for traffic camera
CN112381887A (en) * 2020-11-17 2021-02-19 广东电科院能源技术有限责任公司 Multi-depth camera calibration method, device, equipment and medium
CN112528831A (en) * 2020-12-07 2021-03-19 深圳市优必选科技股份有限公司 Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN112907727A (en) * 2021-01-25 2021-06-04 中国科学院空天信息创新研究院 Calibration method, device and system of relative transformation matrix
CN113077519A (en) * 2021-03-18 2021-07-06 中国电子科技集团公司第五十四研究所 Multi-phase external parameter automatic calibration method based on human skeleton extraction
CN113079369A (en) * 2021-03-30 2021-07-06 浙江大华技术股份有限公司 Method and device for determining image pickup equipment, storage medium and electronic device
CN113870366A (en) * 2021-10-18 2021-12-31 中国科学院长春光学精密机械与物理研究所 Calibration method and calibration system of three-dimensional scanning system based on pose sensor
CN114283203A (en) * 2021-12-08 2022-04-05 凌云光技术股份有限公司 Calibration method and system of multi-camera system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
YAN XU et al.: "Wide-Baseline Multi-Camera Calibration using Person Re-Identification", CVPR *
JI MENGMENG: "Research on Pedestrian Detection and Tracking Technology Based on a Three-Dimensional Human Body Model", China Master's Theses Full-text Database, Information Science and Technology *
WANG YIHENG: "Research and Application of High-Precision Calibration Methods in Multi-View Three-Dimensional Reconstruction", China Master's Theses Full-text Database, Information Science and Technology *
CHEN ZHENDA et al.: "Multi-Kinect Camera Calibration Based on Bone-Length Constraints", Cable TV Technology *
GAO JUNQIANG: "Large-Scale Three-Dimensional Scene Reconstruction Based on a Camera Array", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN114758016B (en) 2022-09-13

Similar Documents

Publication Publication Date Title
US10068344B2 (en) Method and system for 3D capture based on structure from motion with simplified pose detection
CN108986164B (en) Image-based position detection method, device, equipment and storage medium
KR101791590B1 (en) Object pose recognition apparatus and method using the same
US11270461B2 (en) System and method for posture sequence on video from mobile terminals
CN108628306B (en) Robot walking obstacle detection method and device, computer equipment and storage medium
CN109472828B (en) Positioning method, positioning device, electronic equipment and computer readable storage medium
US20140177944A1 (en) Method and System for Modeling Subjects from a Depth Map
JP6985532B2 (en) Data processing methods and devices, electronic devices and storage media
WO2016123913A1 (en) Data processing method and apparatus
CN112668549B (en) Pedestrian attitude analysis method, system, terminal and storage medium
CN113658211B (en) User gesture evaluation method and device and processing equipment
JP2017103602A (en) Position detection device, and position detection method and program
CN110926330A (en) Image processing apparatus, image processing method, and program
CN109740659B (en) Image matching method and device, electronic equipment and storage medium
CN115376034A (en) Motion video acquisition and editing method and device based on human body three-dimensional posture space-time correlation action recognition
CN113487674B (en) Human body pose estimation system and method
US11989928B2 (en) Image processing system
JP7498404B2 (en) Apparatus, method and program for estimating three-dimensional posture of subject
CN114758016B (en) Camera equipment calibration method, electronic equipment and storage medium
Phan et al. Towards 3D human posture estimation using multiple kinects despite self-contacts
KR20170001448A (en) Apparatus for measuring position of camera using stereo camera and method using the same
CN114093034A (en) Gait analysis method, device, equipment and storage medium
CN111435535B (en) Method and device for acquiring joint point information
Afanasyev et al. 3D Human Body Pose Estimation by Superquadrics.
Chen et al. A real-time photogrammetric system for acquisition and monitoring of three-dimensional human body kinematics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant