CN117635683A - Trolley indoor positioning method based on multiple cameras - Google Patents
- Publication number
- CN117635683A (application CN202311706393.3A)
- Authority
- CN
- China
- Prior art keywords
- camera
- trolley
- image
- coordinates
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a multi-camera-based indoor positioning method for a trolley, which comprises the following steps: step 1, calibrating the internal and external parameters of the multiple cameras, acquiring three-dimensional spatial information from two-dimensional images, obtaining the mapping relation between pixels in the image and objects in space through camera calibration, and calculating space coordinates from pixel coordinates; step 2, detecting the feature points of the trolley based on an improved YOLOv5 model; step 3, triangulating the pixel coordinates of the feature points detected in step 2 to obtain the depth information of the trolley and complete the indoor positioning task. In the invention, the multi-camera system obtains information from multiple angles and viewpoints simultaneously, thereby reducing errors and providing higher positioning precision; meanwhile, in a complex indoor environment, the multi-camera system can provide stable position estimation by handling occlusions and multipath effects through multiple perspectives.
Description
Technical Field
The invention relates to the technical field of indoor positioning, in particular to a trolley indoor positioning method based on multiple cameras.
Background
Against the broad background of automobile intelligence, autonomous driving is regarded as an important technology in the future traffic field: it can improve traffic safety, reduce traffic congestion and provide more travel options. The Baidu Apollo trolley is a test vehicle carrying autonomous driving technology, used for testing and verifying an autonomous driving system on actual roads. The Apollo trolley's positioning module depends on an IMU, GPS, laser radar, radar and a high-precision map; these sensors support GNSS positioning and LiDAR positioning simultaneously, with GNSS positioning outputting position and speed information and LiDAR positioning outputting position and heading information. The lack of GPS information for indoor positioning therefore presents challenges for the study of indoor autonomous driving of Apollo trolleys.
Existing indoor positioning technology covers a variety of methods and application fields: methods based on Wi-Fi, Bluetooth, ultrasonic waves, inertial navigation and so on are often applied to indoor navigation, indoor positioning and indoor remote control, and provide some solutions to positioning and navigation problems in indoor environments. However, current indoor positioning technologies have various problems and shortcomings in practical application. For example, accuracy in complex environments is limited: signal strength may be affected by obstacles, interference or multipath propagation, increasing the positioning error. Inertial navigation is susceptible to accumulated errors, leading to drift problems. Some high-precision indoor positioning techniques, such as systems based on ultrasound or lasers, are costly; deploying these systems requires expensive hardware and infrastructure investments, limiting their feasibility in large-scale applications.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a multi-camera-based indoor positioning method for a trolley, in which the multi-camera system obtains information from multiple angles and viewpoints simultaneously, thereby reducing errors and providing higher positioning precision; meanwhile, in a complex indoor environment, the multi-camera system can provide stable position estimation by handling occlusions and multipath effects through multiple perspectives.
In order to solve the above technical problems, the invention provides a multi-camera-based indoor positioning method for a trolley, which comprises the following steps:
step 1, calibrating internal and external parameters of a multi-camera, acquiring three-dimensional space information from a two-dimensional image, acquiring a mapping relation between pixels in the image and a space object through camera calibration, and calculating space coordinates by using pixel coordinates;
step 2, detecting characteristic points of the trolley based on an improved YOLOv5 model;
step 3, triangulating the pixel coordinates of the feature points detected in step 2 to obtain the depth information of the trolley and complete the indoor positioning task.
Preferably, in step 1, calibrating the internal parameter and the external parameter of the multi-camera, and acquiring three-dimensional space information from the two-dimensional image specifically includes the following steps:
step 11, acquiring camera internal parameters, and converting a camera coordinate system into an image coordinate system;
step 12, obtaining camera external parameters, and converting a world coordinate system into a camera coordinate system;
step 13, constructing a perspective projection matrix through the internal reference matrix and the external reference matrix to finish the calibration of the camera; the perspective projection matrix associates pixel coordinates on the image with actual coordinates in the three-dimensional space, so that points in the image are mapped into the three-dimensional space, and the functions of measuring and calculating of the camera are realized.
Preferably, in step 11, obtaining a camera internal reference, and implementing conversion from a camera coordinate system to an image coordinate system specifically includes the following steps:
step 111, fixedly mounting four Kinect cameras at four vertex angle positions of an indoor positioning site to obtain position information of a plurality of angles of the trolley;
step 112, calling the pyk4a interface to obtain the Kinect camera intrinsic matrix K.
Preferably, in step 12, obtaining the camera external parameters, and implementing the conversion from the world coordinate system to the camera coordinate system specifically includes the following steps:
step 121, recording a segment of video of the indoor positioning site with the multiple cameras, and splitting the video file into individual checkerboard images;
step 122, loading the images, selecting the four vertex positions in each image, and calling the cv2.findHomography() function to calculate the homography matrix of each camera, which maps points on the image to points in the real world;
step 123, decomposing a camera homography matrix, and obtaining external parameters of the camera, including a rotation matrix R and a translation matrix T, wherein:
T = (t_x, t_y, t_z)^T (2).
preferably, in step 13, the calibration of the camera is completed by constructing a perspective projection matrix through an internal reference matrix and an external reference matrix, which specifically includes the following steps:
step 131, setting the coordinates of a point in the world coordinate system as P_w = (x_w, y_w, z_w)^T and its coordinates in the camera coordinate system as P_c = (x_c, y_c, z_c)^T, then:
P_c = R·P_w + T
step 132, if the pixel coordinates of the point in the image coordinate system are (u, v), then:
z_c·(u, v, 1)^T = K·P_c
where K is the camera intrinsic matrix, c_x and c_y represent the offset of the camera optical axis in the pixel coordinate system, and f_x and f_y are the normalized focal lengths along the u-axis and v-axis.
Preferably, in step 2, based on the improved YOLOv5 model, the feature point detection for the trolley specifically includes the following steps:
step 21, setting the trolley feature points: four balls of different colors are bound at the front and rear corners of the trolley as its feature points; the colors of the feature points determine the forward and backward directions of the trolley, the positions of the feature points determine the position of the trolley, and the trolley itself is added as a further feature point to impose a position constraint on the other feature points;
step 22, constructing a small target detection deep learning model based on improved YOLOv 5: the YOLOv5 model is improved from three aspects of a feature extraction model, a loss function module and a non-maximum suppression module NMS, so that the detection precision of the YOLOv5 model on small target objects is effectively enhanced;
step 23, constructing a trolley detection data set;
and step 24, training the YOLOv 5-based target detection deep learning model constructed in the step 22.
Preferably, in step 22, the feature extraction model is improved as follows: under the camera's view, the trolley feature points occupy only a small proportion of the whole image, so a 4× downsampling branch for the original input picture is added to the YOLOv5 backbone network, and the 4×-downsampled original picture is fed into the feature fusion network to obtain a feature map of a new size; this feature map has a smaller receptive field and relatively rich position information, which improves the detection of small targets;
improvement loss function module: and (3) taking the EIoU loss function, taking the aspect ratio apart on the basis of the CIoU, adding the Focal focusing high-quality anchor frame, taking the aspect ratio influence factor apart on the basis of the CIoU penalty term, and calculating the length and width of the target frame and the anchor frame respectively, wherein the EIoU loss function comprises three parts of overlapping loss, center distance loss and width and height loss, the first two parts continue the method in the CIoU, the width and height loss makes the difference between the width and the height of the target frame and the anchor frame minimum, the convergence speed is faster, and the EIoU loss function formula is as follows:
where IOU (A, B) represents the ratio of the intersection to union of two rectangular boxes A and B, a commonly used IoU value, which is a standard method of evaluating the similarity of a prediction bounding box to a real bounding box; ρ 2 : center point b representing predicted bounding box and center point b of real bounding box gt Square distance between them; ρ 2 (h,h gt ) And ρ 2 (w,w gt ): representing the height h and width w of the predicted bounding box and the height h of the real bounding box, respectively gt And width w gt Square distance between them; h is a c And w c : center point coordinates representing the height and width of the predicted bounding box, respectively.
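As an illustrative sketch of the loss described above (not part of the claimed method), the EIoU computation can be written in NumPy; the helper name eiou_loss and the (x1, y1, x2, y2) box representation are assumptions for illustration:

```python
import numpy as np

def eiou_loss(box_p, box_t):
    """EIoU loss between a predicted and a target box, each given as (x1, y1, x2, y2).

    L_EIoU = 1 - IoU + rho^2(b, b_gt)/c^2 + rho^2(w, w_gt)/w_c^2 + rho^2(h, h_gt)/h_c^2
    where c, w_c, h_c are the diagonal, width and height of the smallest enclosing box.
    """
    x1p, y1p, x2p, y2p = box_p
    x1t, y1t, x2t, y2t = box_t

    # Overlap (IoU) term
    iw = max(0.0, min(x2p, x2t) - max(x1p, x1t))
    ih = max(0.0, min(y2p, y2t) - max(y1p, y1t))
    inter = iw * ih
    area_p = (x2p - x1p) * (y2p - y1p)
    area_t = (x2t - x1t) * (y2t - y1t)
    iou = inter / (area_p + area_t - inter + 1e-9)

    # Center-distance term, normalized by the squared enclosing-box diagonal
    cxp, cyp = (x1p + x2p) / 2, (y1p + y2p) / 2
    cxt, cyt = (x1t + x2t) / 2, (y1t + y2t) / 2
    wc = max(x2p, x2t) - min(x1p, x1t)   # enclosing box width
    hc = max(y2p, y2t) - min(y1p, y1t)   # enclosing box height
    c2 = wc ** 2 + hc ** 2 + 1e-9
    dist = (cxp - cxt) ** 2 + (cyp - cyt) ** 2

    # Width and height penalized separately: the EIoU refinement over CIoU
    wp, hp = x2p - x1p, y2p - y1p
    wt, ht = x2t - x1t, y2t - y1t
    loss_wh = (wp - wt) ** 2 / (wc ** 2 + 1e-9) + (hp - ht) ** 2 / (hc ** 2 + 1e-9)

    return 1 - iou + dist / c2 + loss_wh
```

For two identical boxes the loss is zero, and for disjoint boxes it exceeds 1, since the overlap term alone then contributes 1.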
Improved non-maximum suppression (NMS) module: the non-maximum suppression module is used in the prediction stage of target detection; NMS merges similar bounding boxes of the same target, and DIoU, which takes the distance between the center points of two boxes into account, is substituted for IoU as the NMS criterion:
s_i = s_i if DIoU(M, B_i) < ε, and s_i = 0 if DIoU(M, B_i) ≥ ε
where s_i is the classification confidence of box B_i, ε is the NMS threshold, and M is the box with the highest confidence; under DIoU-NMS, bounding boxes whose center points are far from that of M may contain different feature points and are retained, which reduces missed detections.
Preferably, in step 23, constructing the trolley detection data set specifically includes the steps of:
step 231, acquiring trolley videos of a plurality of angles by adopting the four Kinect cameras installed in the step 1, and dividing the videos into images;
step 232, performing feature point semiautomatic labeling on the acquired trolley image by using a labeling tool LabelImg;
step 233, converting the labeling data set into a yolo labeling format suitable for the YOLOv5 model;
step 234, constructing the trolley detection training set, verification set and test set by randomly selecting 80% of the trolley detection data set as the training set, 10% as the verification set and 10% as the test set.
Preferably, in step 24, the YOLOv 5-based object detection deep learning model constructed in the training step 22 specifically includes the following steps:
step 241, setting the training parameters and training with the stochastic optimization algorithm Adam, where the training batch size is Batch = 64, the Momentum = 0.9, the learning rate is initially lr = 0.001, and the number of training iterations is epoch = 300;
step 242, the trolley detection data set constructed in step 23 is fed into the YOLOv5-based target detection deep learning model constructed in step 22;
step 243, training a target detection deep learning model based on YOLOv5, adjusting the learning rate and the iteration times according to the average precision change and the loss change trend of the cross verification of the training set and the verification set until the precision change and the loss change gradually tend to a stable state, and determining the final learning rate and the iteration times;
and step 244, completing training of the target detection deep learning model based on the YOLOv5 according to the determined learning rate and iteration times, and obtaining the target detection deep learning model based on the YOLOv5 with good convergence.
Preferably, in step 3, triangulating the coordinates of the feature points detected in step 2 to obtain depth information of the trolley, and completing the indoor positioning target specifically includes the following steps:
step 31, for each frame of image, detecting the characteristic points of the trolley by using a trained target detection model, and acquiring the position and category information of the characteristic points in the image;
step 32, triangulating the positions of the pixel points of the same feature point in different cameras, calculating the coordinates of the feature point in a three-dimensional space, and obtaining depth information of the feature point; acquiring a group of images under multiple cameras at the same moment, and designating one point on one image as a point to be reconstructed; determining the coordinates of the characteristic points of each image according to the category consistency, and calculating corresponding three-dimensional coordinates of each characteristic point by utilizing a triangulation principle;
step 33, continuously detecting the feature points of the trolley in the real-time images and outputting the three-dimensional coordinates of the trolley to the trolley in real time through gRPC, so as to realize indoor autonomous driving of the trolley.
The beneficial effects of the invention are as follows: the depth images obtained from the four Kinect cameras and the multi-camera system can provide high-quality indoor positioning; the multi-camera system reduces errors and improves positioning accuracy without any extra base stations or beacons. Binding balls of different colors to the trolley as target feature points improves the reliability of target identification and positioning, and this diversity is conducive to reliable positioning under various illumination and scene conditions. Feature point detection is performed with an improved YOLOv5 model, whose object detection capability can be used directly to label the small-ball feature points, simplifying the preparation of training data. Since YOLOv5 has multi-target detection capability, it can detect several balls of different colors on the trolley simultaneously, and the trolley can then be accurately positioned from the relative positions and depth information of the feature points. The improved YOLOv5 model is not restricted to the specific small-ball feature points and can easily be extended to feature points of other shapes, sizes and colors, providing greater flexibility; because no additional base stations or beacons are relied on, the deployment and maintenance cost of the system is greatly reduced while its portability and scalability are increased. The detection of trolley feature points combined with the improved YOLOv5 model therefore brings improvements in real-time performance, accuracy and robustness to the indoor positioning method, while simplifying system deployment and maintenance. This multi-camera indoor positioning technology combines the advantages of high precision, stability, low cost and autonomous navigation, providing a more feasible and reliable solution to the indoor positioning problem and a new option for the fields of autonomous driving and robot navigation.
Drawings
FIG. 1 is a schematic flow chart of the trolley indoor positioning method according to the present invention.
Fig. 2 is a schematic diagram of a feature extraction module modified in the present invention.
FIG. 3 is a schematic diagram of a labelImg labeling interface of the present invention.
FIG. 4 is a schematic diagram of the model evaluation result of the present invention.
Fig. 5 is a schematic diagram of the trolley indoor positioning error analysis according to the present invention.
Fig. 6 is a diagram showing an example of the detection of a car object according to the present invention.
Fig. 7 is a schematic diagram of a three-dimensional reconstruction method of multiple images based on feature point constraints according to the present invention.
Detailed Description
As shown in fig. 1, a multi-camera-based trolley indoor positioning method includes the following steps:
step 1, calibrating internal and external parameters of a multi-camera, acquiring three-dimensional space information from a two-dimensional image, acquiring a mapping relation between pixels in the image and a space object through camera calibration, and calculating space coordinates by using pixel coordinates;
calibrating the internal and external parameters of the multi-camera, and acquiring three-dimensional space information from the two-dimensional image specifically comprises the following steps:
step 11, acquiring camera internal parameters, and converting a camera coordinate system into an image coordinate system; the method specifically comprises the following steps:
step 111, fixedly mounting four Kinect cameras at four vertex angle positions of an indoor positioning site to obtain position information of a plurality of angles of the trolley;
step 112, calling the pyk4a interface to obtain the Kinect camera intrinsic matrix K.
Step 12, obtaining camera external parameters, and converting a world coordinate system into a camera coordinate system; the method specifically comprises the following steps:
step 121, recording a segment of video of the indoor positioning site with the multiple cameras, and splitting the video file into individual checkerboard images;
step 122, loading the images, selecting the four vertex positions in each image, and calling the cv2.findHomography() function to calculate the homography matrix of each camera, which maps points on the image to points in the real world;
step 123, decomposing a camera homography matrix, and obtaining external parameters of the camera, including a rotation matrix R and a translation matrix T, wherein:
T = (t_x, t_y, t_z)^T (2).
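For illustration, the homography estimation in step 122 can be sketched with the direct linear transform, which performs the same computation cv2.findHomography carries out for four exact point correspondences; the function name homography_from_points is an assumption for this sketch:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst (four (x, y) pairs each)
    via the direct linear transform (DLT): stack two linear constraints per
    correspondence and take the null vector of the resulting 8x9 system."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # normalize so H[2, 2] = 1
```

With the four vertex positions of the site selected in each image as src and their known real-world positions as dst, the returned H maps image points to world points in homogeneous coordinates.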
step 13, constructing a perspective projection matrix through the internal reference matrix and the external reference matrix to finish the calibration of the camera; the perspective projection matrix associates pixel coordinates on the image with actual coordinates in the three-dimensional space, so that points in the image are mapped into the three-dimensional space, and the functions of measuring and calculating of the camera are realized. The method specifically comprises the following steps:
step 131, setting the coordinates of a point in the world coordinate system as P_w = (x_w, y_w, z_w)^T and its coordinates in the camera coordinate system as P_c = (x_c, y_c, z_c)^T, then:
P_c = R·P_w + T
step 132, if the pixel coordinates of the point in the image coordinate system are (u, v), then:
z_c·(u, v, 1)^T = K·P_c
where K is the camera intrinsic matrix, c_x and c_y represent the offset of the camera optical axis in the pixel coordinate system, and f_x and f_y are the normalized focal lengths along the u-axis and v-axis.
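A minimal sketch of the projection in steps 131-132, assuming example intrinsic values rather than the calibrated Kinect parameters:

```python
import numpy as np

def project_point(Pw, K, R, T):
    """Project a world point Pw into pixel coordinates:
    Pc = R @ Pw + T, then s * [u, v, 1]^T = K @ Pc with s = z_c."""
    Pc = R @ Pw + T          # world -> camera (extrinsics)
    uvw = K @ Pc             # camera -> image plane (intrinsics)
    return uvw[:2] / uvw[2]  # perspective division by the depth z_c

# Illustrative parameters (f_x = f_y = 600, principal point at (320, 240))
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                       # camera aligned with the world axes
T = np.array([0.0, 0.0, 2.0])       # world origin 2 m in front of the camera
uv = project_point(np.array([0.5, 0.25, 0.0]), K, R, T)
```

Here the world point (0.5, 0.25, 0) lands at pixel (470, 315): 600·0.5/2 + 320 and 600·0.25/2 + 240.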
Step 2, detecting characteristic points of the trolley based on an improved YOLOv5 model; the method specifically comprises the following steps:
step 21, setting the trolley feature points: four balls of different colors are bound at the front and rear corners of the trolley as its feature points; the colors of the feature points determine the forward and backward directions of the trolley, the positions of the feature points determine the position of the trolley, and the trolley itself is added as a further feature point to impose a position constraint on the other feature points;
step 22, constructing a small target detection deep learning model based on improved YOLOv 5: the YOLOv5 model is improved from three aspects of a feature extraction model, a loss function module and a non-maximum suppression module NMS, so that the detection precision of the YOLOv5 model on small target objects is effectively enhanced;
improving the feature extraction model: under the camera's view, the trolley feature points occupy only a small proportion of the whole image, so a 4× downsampling branch for the original input picture is added to the YOLOv5 backbone network, and the 4×-downsampled original picture is fed into the feature fusion network to obtain a feature map of a new size; this feature map has a smaller receptive field and relatively rich position information, which improves the detection of small targets, as shown in fig. 2;
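As a rough illustration of the added 4× downsampling branch, the grid sizes of the detection heads can be computed as follows; the 640-pixel input resolution and the P2-P5 head naming are assumptions for this sketch, not values fixed by the method:

```python
def feature_map_sizes(input_size=640, strides=(4, 8, 16, 32)):
    """Grid size of each detection head: the added 4x-downsampled head (stride 4)
    plus YOLOv5's usual 8x, 16x and 32x strides."""
    return {f"P{i + 2}": input_size // s for i, s in enumerate(strides)}
```

At a 640-pixel input this yields a new 160×160 feature map alongside the standard 80×80, 40×40 and 20×20 maps; the 160×160 map is the one with the small receptive field and rich position information used for the small ball feature points.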
improved loss function module: the EIoU loss function is adopted. On the basis of CIoU, the aspect-ratio influence factor in the penalty term is split apart and a Focal mechanism is added to focus on high-quality anchor boxes, so that the width and height of the target box and the anchor box are penalized separately. The EIoU loss consists of three parts: overlap loss, center-distance loss, and width-height loss. The first two continue the approach of CIoU, while the width-height loss directly minimizes the difference between the widths and heights of the target box and the anchor box, giving faster convergence. The EIoU loss formula is:
L_EIoU = 1 − IoU(A, B) + ρ²(b, b_gt)/c² + ρ²(w, w_gt)/w_c² + ρ²(h, h_gt)/h_c²
where b and b_gt are the centers of the predicted and ground-truth boxes, c is the diagonal of the smallest enclosing box, and w_c and h_c are its width and height;
improved non-maximum suppression (NMS) module: the non-maximum suppression module is used in the prediction stage of target detection; NMS merges similar bounding boxes of the same target, and DIoU, which takes the distance between the center points of two boxes into account, is substituted for IoU as the NMS criterion:
s_i = s_i if DIoU(M, B_i) < ε, and s_i = 0 if DIoU(M, B_i) ≥ ε
where s_i is the classification confidence of box B_i, ε is the NMS threshold, and M is the box with the highest confidence; under DIoU-NMS, bounding boxes whose center points are far from that of M may contain different feature points and are retained, which reduces missed detections.
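A minimal NumPy sketch of the DIoU-NMS criterion described above; the function names and the 0.5 threshold are illustrative assumptions:

```python
import numpy as np

def diou(a, b):
    """DIoU between boxes (x1, y1, x2, y2): IoU minus the normalized
    squared distance between the two box centers."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / (union + 1e-9)
    # Squared center distance over squared enclosing-box diagonal
    d2 = ((a[0] + a[2]) - (b[0] + b[2])) ** 2 / 4 + ((a[1] + a[3]) - (b[1] + b[3])) ** 2 / 4
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return iou - d2 / (cw ** 2 + ch ** 2 + 1e-9)

def diou_nms(boxes, scores, eps=0.5):
    """Keep the highest-score box M, suppress boxes with DIoU(M, b) >= eps, repeat."""
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        order = [i for i in order if diou(boxes[m], boxes[i]) < eps]
    return keep
```

Because the center-distance term is subtracted, a box that overlaps M but has a distant center scores a lower DIoU than its plain IoU and may survive suppression, which is how different feature points close together in the image avoid being merged.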
Step 23, constructing a trolley detection data set; the method specifically comprises the following steps:
step 231, acquiring trolley videos of a plurality of angles by adopting the four Kinect cameras installed in the step 1, and dividing the videos into images;
step 232, performing feature point semiautomatic labeling on the acquired trolley image by using a labeling tool LabelImg; the labelmg labeling interface is shown in figure 3.
Step 233, converting the labeling data set into a yolo labeling format suitable for the YOLOv5 model;
step 234, constructing the trolley detection training set, verification set and test set by randomly selecting 80% of the trolley detection data set as the training set, 10% as the verification set and 10% as the test set.
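The 80/10/10 split of step 234 can be sketched as follows; the fixed random seed is an illustrative assumption for reproducibility:

```python
import random

def split_dataset(samples, seed=0):
    """Randomly split samples into 80% train / 10% verification / 10% test."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded shuffle before slicing
    n = len(items)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```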
And step 24, training the YOLOv 5-based target detection deep learning model constructed in the step 22. The method specifically comprises the following steps:
step 241, setting the training parameters and training with the stochastic optimization algorithm Adam, where the training batch size is Batch = 64, the Momentum = 0.9, the learning rate is initially lr = 0.001, and the number of training iterations is epoch = 300;
step 242, the trolley detection data set constructed in step 23 is fed into the YOLOv5-based target detection deep learning model constructed in step 22;
step 243, training a target detection deep learning model based on YOLOv5, adjusting the learning rate and the iteration times according to the average precision change and the loss change trend of the cross verification of the training set and the verification set until the precision change and the loss change gradually tend to a stable state, and determining the final learning rate and the iteration times;
step 244, completing the training of the YOLOv5-based target detection deep learning model with the determined learning rate and number of iterations, and obtaining a well-converged YOLOv5-based target detection deep learning model. Model evaluation: the effect of the model is evaluated with five indexes including box_loss and obj_loss; the evaluation result is shown in fig. 4, where the index mAP@50 reaches more than 99%, which fully demonstrates that the accuracy of the target detection model obtained by the training of the invention is reliable; FIG. 5 plots the error between the three-dimensional coordinates located by the model and the real space coordinates of the trolley, with errors ranging from 0.01 m to 0.10 m; an example of trolley target detection is shown in fig. 6.
Step 3, triangulating the pixel coordinates of the feature points detected in step 2 to obtain the depth information of the trolley and complete the indoor positioning task. As shown in fig. 7, this specifically includes the following steps:
step 31, for each frame of image, detecting the characteristic points of the trolley by using a trained target detection model, and acquiring the position and category information of the characteristic points in the image;
step 32, triangulating the positions of the pixel points of the same feature point in different cameras, calculating the coordinates of the feature point in a three-dimensional space, and obtaining depth information of the feature point; acquiring a group of images under multiple cameras at the same moment, and designating one point on one image as a point to be reconstructed; determining the coordinates of the characteristic points of each image according to the category consistency, and calculating corresponding three-dimensional coordinates of each characteristic point by utilizing a triangulation principle;
step 33, continuously detecting the feature points of the trolley in the real-time images and outputting the three-dimensional coordinates of the trolley to the trolley in real time through gRPC, so as to realize indoor autonomous driving of the trolley.
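The triangulation of step 32 can be sketched with linear (DLT) two-view triangulation, the same scheme implemented by cv2.triangulatePoints; the projection matrices and point values below are illustrative assumptions, not the calibrated parameters of the four Kinect cameras:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2 are 3x4 projection matrices K[R|T]; uv1, uv2 are pixel coordinates
    of the same feature point, matched across cameras by its class (ball color)."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The 3D point is the null vector of A in homogeneous coordinates
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize -> (x, y, z)
```

With more than two cameras, two further rows per extra view can be stacked into A, so the four-camera setup over-determines the point and averages out detection noise.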
Claims (10)
1. A multi-camera-based trolley indoor positioning method, characterized by comprising the following steps:
step 1, calibrating internal and external parameters of a multi-camera, acquiring three-dimensional space information from a two-dimensional image, acquiring a mapping relation between pixels in the image and a space object through camera calibration, and calculating space coordinates by using pixel coordinates;
step 2, detecting characteristic points of the trolley based on an improved YOLOv5 model;
step 3, triangulating the pixel coordinates of the feature points detected in step 2 to obtain the depth information of the trolley and complete the indoor positioning task.
2. The multi-camera-based trolley indoor positioning method according to claim 1, wherein in step 1, calibrating the internal and external parameters of the multiple cameras and acquiring three-dimensional spatial information from the two-dimensional images specifically comprises the following steps:
step 11, acquiring camera internal parameters, and converting a camera coordinate system into an image coordinate system;
step 12, obtaining camera external parameters, and converting a world coordinate system into a camera coordinate system;
step 13, constructing a perspective projection matrix through the internal reference matrix and the external reference matrix to finish the calibration of the camera; the perspective projection matrix associates pixel coordinates on the image with actual coordinates in the three-dimensional space, so that points in the image are mapped into the three-dimensional space, and the functions of measuring and calculating of the camera are realized.
3. The multi-camera-based trolley indoor positioning method according to claim 2, wherein in step 11, acquiring the camera internal parameters and converting the camera coordinate system into the image coordinate system specifically comprises the following steps:
step 111, fixedly mounting four Kinect cameras at four vertex angle positions of an indoor positioning site to obtain position information of a plurality of angles of the trolley;
and step 112, calling the pyk4a interface to obtain the internal reference matrix K of each Kinect camera.
4. The multi-camera-based trolley indoor positioning method according to claim 2, wherein in step 12, obtaining the camera external parameters and converting the world coordinate system into the camera coordinate system specifically comprises the following steps:
step 121, recording a video of the indoor positioning site with the multiple cameras and splitting the video file into individual checkerboard images;
step 122, loading the images, selecting the four vertex positions in each image, and calling the cv2.findHomography() function to compute the homography matrix of the camera, which maps points on the image to points in the real world;
step 123, decomposing the camera homography matrix to obtain the external parameters of the camera, including the rotation matrix R and the translation vector T, wherein:
T = (t_x, t_y, t_z)^T (2).
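The homography estimation that cv2.findHomography() performs in step 122 can be illustrated with a plain NumPy direct linear transform (DLT); the corner coordinates below are hypothetical, and a production system would use OpenCV with outlier rejection:

```python
import numpy as np

def find_homography_dlt(src, dst):
    """Estimate the 3x3 homography H (dst ~ H @ src, in homogeneous
    coordinates) from >= 4 point correspondences via the DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the linear system A h = 0
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale so that H[2, 2] == 1

def apply_h(H, p):
    """Map a 2D point through a homography (with perspective division)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Hypothetical example: map the four pixel corners of a floor image
# onto a 2 m x 1 m patch of the positioning site
src = [(0, 0), (640, 0), (640, 480), (0, 480)]   # pixel corners
dst = [(0, 0), (2.0, 0), (2.0, 1.0), (0, 1.0)]   # world corners (metres)
H = find_homography_dlt(src, dst)
```

The image center (320, 240) then maps to the patch center (1.0, 0.5), confirming the estimated mapping.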
5. The multi-camera-based trolley indoor positioning method according to claim 2, wherein in step 13, constructing the perspective projection matrix from the internal and external reference matrices to complete the camera calibration specifically comprises the following steps:
step 131, let the coordinates of a point in the world coordinate system be P_w = (x_w, y_w, z_w)^T and its coordinates in the camera coordinate system be P_c = (x_c, y_c, z_c)^T; then:
step 132, if the pixel coordinates in the image coordinate system are (u, v), then:
wherein K is the camera internal reference matrix; c_x and c_y represent the offsets of the camera optical axis in the pixel coordinate system; and f_x and f_y are the normalized focal lengths on the u-axis and v-axis.
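A minimal sketch of the world-to-pixel mapping of steps 131 and 132, with made-up intrinsics (a real K comes from the calibration of step 11) and trivial extrinsics:

```python
import numpy as np

# Hypothetical intrinsics for illustration only
fx, fy = 600.0, 600.0   # normalized focal lengths on the u- and v-axis
cx, cy = 320.0, 240.0   # optical-axis offsets in the pixel coordinate system
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Extrinsics: rotation R and translation T (here the camera sits at the
# world origin, so the world and camera frames coincide)
R = np.eye(3)
T = np.zeros(3)

def world_to_pixel(P_w):
    """P_w = (x_w, y_w, z_w): apply extrinsics, then intrinsics."""
    P_c = R @ P_w + T        # world frame -> camera frame (step 12)
    u, v, w = K @ P_c        # camera frame -> homogeneous pixel (step 11)
    return u / w, v / w      # perspective division

u, v = world_to_pixel(np.array([0.5, -0.25, 2.0]))
# u = fx * x_c / z_c + cx = 600 * 0.5 / 2 + 320 = 470
# v = fy * y_c / z_c + cy = 600 * (-0.25) / 2 + 240 = 165
```

Composing K with [R|T] in this way yields the 3x4 perspective projection matrix of step 13.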
6. The multi-camera-based trolley indoor positioning method according to claim 1, wherein in step 2, detecting the feature points of the trolley based on the improved YOLOv5 model specifically comprises the following steps:
step 21, setting the trolley feature points: four balls of different colors are bound at the front and rear corners of the trolley as its feature points; the colors of the feature points determine the forward and backward directions of the trolley, and their positions determine the position of the trolley; meanwhile, the trolley itself is added as a feature point to impose a position constraint on the other feature points;
step 22, constructing a small-target detection deep learning model based on improved YOLOv5: the YOLOv5 model is improved in three aspects, namely the feature extraction model, the loss function module and the non-maximum suppression (NMS) module, which effectively enhances its detection precision on small target objects;
step 23, constructing a trolley detection data set;
and step 24, training the YOLOv5-based target detection deep learning model constructed in step 22.
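The direction determination of step 21 can be sketched as follows; the marker coordinates, the helper name and the convention that direction runs from the rear-pair midpoint to the front-pair midpoint are illustrative assumptions, not details from the patent:

```python
import math

def heading_from_markers(front_left, front_right, rear_left, rear_right):
    """Heading angle (radians, counter-clockwise from +x) of the trolley,
    computed from the floor-plane coordinates of the four corner balls:
    the direction vector runs from the midpoint of the rear pair of balls
    to the midpoint of the front pair."""
    front_mid_x = (front_left[0] + front_right[0]) / 2
    front_mid_y = (front_left[1] + front_right[1]) / 2
    rear_mid_x = (rear_left[0] + rear_right[0]) / 2
    rear_mid_y = (rear_left[1] + rear_right[1]) / 2
    return math.atan2(front_mid_y - rear_mid_y, front_mid_x - rear_mid_x)

# Trolley aligned with the +x axis: heading is 0
angle = heading_from_markers((1.0, 0.2), (1.0, -0.2), (0.0, 0.2), (0.0, -0.2))
```

Which ball is "front-left" etc. is resolved by color, which is why the patent assigns a distinct color to each corner.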
7. The multi-camera-based trolley indoor positioning method according to claim 6, wherein in step 22, the feature extraction model is modified as follows: the trolley feature points occupy only a small fraction of the whole image under the camera view, so a 4x downsampling stage for the original input picture is added on top of the YOLOv5 backbone network, and the 4x-downsampled picture is fed into the feature fusion network to obtain a feature map of a new size; this feature map has a smaller receptive field and relatively rich position information, which improves the detection of small targets;
the loss function module is improved: the EIoU loss function is adopted; on the basis of the CIoU penalty term, the aspect-ratio influence factor is split apart so that the width and height differences between the target box and the anchor box are computed separately, and Focal loss is added to focus on high-quality anchor boxes; the EIoU loss consists of three parts, namely overlap loss, center-distance loss and width-height loss; the first two follow the CIoU method, while the width-height loss directly minimizes the width and height differences between the target box and the anchor box, yielding faster convergence; the EIoU loss function formula is as follows:
where IoU(A, B) denotes the ratio of the intersection to the union of the two rectangular boxes A and B, the standard measure of similarity between a predicted bounding box and a ground-truth bounding box; ρ²(b, b^gt) denotes the squared distance between the center point b of the predicted bounding box and the center point b^gt of the ground-truth bounding box; ρ²(h, h^gt) and ρ²(w, w^gt) denote the squared differences between the height h and width w of the predicted bounding box and the height h^gt and width w^gt of the ground-truth bounding box; and h_c and w_c denote the height and width of the smallest box enclosing the two bounding boxes;
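Under the definitions above, the EIoU loss for axis-aligned boxes in (x1, y1, x2, y2) form can be sketched as follows; this is an illustrative re-implementation of the published EIoU formulation, not the patent's code:

```python
def eiou_loss(box_a, box_b):
    """EIoU loss for two axis-aligned boxes (x1, y1, x2, y2):
    overlap loss + center-distance loss + width/height loss."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # --- overlap loss term: 1 - IoU ---
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # --- smallest enclosing box; its diagonal/width/height normalize the penalties ---
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2
    # --- center-distance loss: squared center distance over squared diagonal ---
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # --- width/height loss: EIoU's addition over CIoU ---
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    return 1 - iou + d2 / c2 + (wa - wb) ** 2 / cw ** 2 + (ha - hb) ** 2 / ch ** 2

# Identical boxes: every term vanishes, so the loss is exactly 0
print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # → 0.0
```

Because the width and height penalties are separate, a box with the right aspect ratio but the wrong size is still penalized, which is the convergence advantage the claim describes.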
the non-maximum suppression (NMS) module is improved: the NMS module is used in the prediction stage of target detection to merge overlapping bounding boxes of the same target; DIoU, which additionally considers the distance between the center points of the two boxes, replaces IoU as the NMS criterion;
where s_i is the classification confidence, ε is the NMS threshold and M is the box with the highest confidence; since DIoU-NMS retains bounding boxes whose predicted center points are far apart, different feature points in overlapping boxes are preserved and missed detections are reduced.
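The DIoU-NMS rule can be sketched as follows: the highest-confidence box M is kept, and a remaining box is suppressed only when its DIoU with M exceeds the threshold ε. The boxes and scores below are toy values for illustration:

```python
def diou(a, b):
    """DIoU = IoU - (center distance)^2 / (enclosing-box diagonal)^2."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # squared center distance over squared enclosing-box diagonal
    d2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    return iou - d2 / (cw ** 2 + ch ** 2)

def diou_nms(boxes, scores, eps=0.5):
    """Keep the highest-confidence box M, drop boxes whose DIoU with M
    exceeds the threshold eps, and repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        order = [i for i in order if diou(boxes[m], boxes[i]) <= eps]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
# The near-duplicate of box 0 is suppressed; the distant box survives
print(diou_nms(boxes, scores))  # → [0, 2]
```

The center-distance penalty lowers the score of distant overlapping boxes, so two nearby feature points with far-apart centers both survive suppression, matching the claim's missed-detection argument.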
8. The multi-camera-based trolley indoor positioning method according to claim 6, wherein in step 23, constructing the trolley detection data set specifically comprises the following steps:
step 231, acquiring trolley videos from multiple angles with the four Kinect cameras installed in step 1, and splitting the videos into frames;
step 232, performing feature point semiautomatic labeling on the acquired trolley image by using a labeling tool LabelImg;
step 233, converting the labeled data set into the YOLO annotation format required by the YOLOv5 model;
and step 234, constructing the trolley detection training, verification and test sets: 80% of the trolley detection data set is randomly selected as the training set, 10% as the verification set and 10% as the test set.
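The 80/10/10 split of step 234 can be sketched with a seeded shuffle; the function name and seed are illustrative assumptions:

```python
import random

def split_dataset(items, seed=0):
    """Randomly split a dataset 80/10/10 into training, verification
    and test sets, as described in step 234."""
    items = list(items)
    random.Random(seed).shuffle(items)   # seeded for reproducibility
    n = len(items)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(1000))
# 1000 images -> 800 training, 100 verification, 100 test
```

Shuffling before slicing ensures each subset mixes frames from all four camera angles rather than whole contiguous video segments.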
9. The multi-camera-based trolley indoor positioning method according to claim 6, wherein in step 24, training the YOLOv5-based target detection deep learning model constructed in step 22 specifically comprises the following steps:
step 241, setting the training parameters: training uses the stochastic optimization algorithm Adam with batch size Batch = 64, Momentum = 0.9, initial learning rate lr = 0.001 and number of training iterations epoch = 300;
step 242, the trolley detection data set constructed in step 23 is fed into the YOLOv5-based target detection deep learning model constructed in step 22;
step 243, training the YOLOv5-based target detection deep learning model, adjusting the learning rate and the number of iterations according to the trends of the cross-validation average precision and loss on the training and verification sets, until the precision and loss curves gradually stabilize, thereby determining the final learning rate and number of iterations;
and step 244, completing the training with the determined learning rate and number of iterations, obtaining a well-converged YOLOv5-based target detection deep learning model.
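The "adjust until the loss stabilizes" procedure of step 243 resembles a reduce-on-plateau learning-rate schedule; the following sketch is an assumption about that behaviour (function name, factor and patience are hypothetical), not the patent's training code:

```python
def adjust_lr_on_plateau(losses, lr=0.001, factor=0.1, patience=5, min_delta=1e-4):
    """Walk through a sequence of per-epoch validation losses and reduce the
    learning rate by `factor` whenever no improvement larger than `min_delta`
    has been seen for `patience` consecutive epochs."""
    best = float("inf")
    stale = 0
    for loss in losses:
        if loss < best - min_delta:
            best = loss      # improvement: reset the plateau counter
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor  # plateau detected: decay the learning rate
                stale = 0
    return lr

# Loss improves for 3 epochs, then plateaus for 10: two plateau windows of
# 5 epochs each, so the rate is reduced twice (0.001 -> 1e-4 -> 1e-5)
losses = [1.0, 0.8, 0.6] + [0.6] * 10
final_lr = adjust_lr_on_plateau(losses)
```

The same stale-epoch counter can drive early stopping, which corresponds to fixing the final number of iterations once the curves stabilize.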
10. The multi-camera-based trolley indoor positioning method according to claim 1, wherein in step 3, triangulating the coordinates of the feature points detected in step 2 to obtain the depth information of the trolley and complete the indoor positioning specifically comprises the following steps:
step 31, for each frame of image, detecting the feature points of the trolley with the trained target detection model and acquiring the position and category information of each feature point in the image;
step 32, triangulating the pixel positions of the same feature point across the different cameras to compute its coordinates in three-dimensional space and obtain its depth information: a group of images is acquired from the multiple cameras at the same moment, a point on one image is designated as the point to be reconstructed, the coordinates of the corresponding feature point in each image are determined by category consistency, and the three-dimensional coordinates of each feature point are computed by the triangulation principle;
and step 33, continuously detecting the feature points of the trolley in the real-time images and sending the three-dimensional coordinates of the trolley back to it in real time via gRPC, thereby realizing indoor automatic driving of the trolley.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311706393.3A CN117635683A (en) | 2023-12-13 | 2023-12-13 | Trolley indoor positioning method based on multiple cameras |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117635683A true CN117635683A (en) | 2024-03-01 |
Family
ID=90018198
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311706393.3A Pending CN117635683A (en) | 2023-12-13 | 2023-12-13 | Trolley indoor positioning method based on multiple cameras |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117635683A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11915099B2 (en) | Information processing method, information processing apparatus, and recording medium for selecting sensing data serving as learning data | |
US10192113B1 (en) | Quadocular sensor design in autonomous platforms | |
JP7351293B2 (en) | Signal processing device, signal processing method, program, and mobile object | |
US10496104B1 (en) | Positional awareness with quadocular sensor in autonomous platforms | |
US10318822B2 (en) | Object tracking | |
De Silva et al. | Fusion of LiDAR and camera sensor data for environment sensing in driverless vehicles | |
EP3137850B1 (en) | Method and system for determining a position relative to a digital map | |
US20180260613A1 (en) | Object tracking | |
CN113870343B (en) | Relative pose calibration method, device, computer equipment and storage medium | |
US20210004566A1 (en) | Method and apparatus for 3d object bounding for 2d image data | |
CN112346463B (en) | Unmanned vehicle path planning method based on speed sampling | |
CN110163963B (en) | Mapping device and mapping method based on SLAM | |
CN103890606A (en) | Methods and systems for creating maps with radar-optical imaging fusion | |
CN103424112A (en) | Vision navigating method for movement carrier based on laser plane assistance | |
CN103065323A (en) | Subsection space aligning method based on homography transformational matrix | |
CN114413958A (en) | Monocular vision distance and speed measurement method of unmanned logistics vehicle | |
JP2017181476A (en) | Vehicle location detection device, vehicle location detection method and vehicle location detection-purpose computer program | |
KR101030317B1 (en) | Apparatus for tracking obstacle using stereo vision and method thereof | |
CN112068152A (en) | Method and system for simultaneous 2D localization and 2D map creation using a 3D scanner | |
Mallik et al. | Real-time Detection and Avoidance of Obstacles in the Path of Autonomous Vehicles Using Monocular RGB Camera | |
US20230401748A1 (en) | Apparatus and methods to calibrate a stereo camera pair | |
WO2022133986A1 (en) | Accuracy estimation method and system | |
WO2023040137A1 (en) | Data processing | |
CN117635683A (en) | Trolley indoor positioning method based on multiple cameras | |
US11348278B2 (en) | Object detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||