CN115497077A - Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium - Google Patents

Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium

Info

Publication number
CN115497077A
CN115497077A (application CN202211299656.9A)
Authority
CN
China
Prior art keywords
image
vehicle
binocular camera
target
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211299656.9A
Other languages
Chinese (zh)
Inventor
蔡登胜
李佳恒
陶佳伟
周文彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Liugong Machinery Co Ltd
Original Assignee
Guangxi Liugong Machinery Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Liugong Machinery Co Ltd filed Critical Guangxi Liugong Machinery Co Ltd
Priority to CN202211299656.9A priority Critical patent/CN115497077A/en
Publication of CN115497077A publication Critical patent/CN115497077A/en
Priority to PCT/CN2023/120389 priority patent/WO2024087962A1/en
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a carriage posture recognition system, a carriage posture recognition method, an electronic device and a storage medium. The system comprises a processor, a binocular camera, a first vehicle, a second vehicle and an identification member; the first vehicle includes a carriage; the second vehicle is used for unloading materials into the carriage of the first vehicle; the identification member is arranged on a side surface of the carriage and carries a plurality of reference points; the binocular camera is arranged on the second vehicle and is used for acquiring an image containing the identification member and sending the image to the processor; the processor is used for recognizing the image with an image recognition network model to obtain a target image, i.e. an image containing the recognition results of the plurality of reference points, determining a target spatial three-dimensional coordinate of each reference point in the target image with a binocular stereo geometric vision algorithm, and determining the attitude information of the carriage based on the target spatial three-dimensional coordinates of the reference points. The system has a simple structure and is suitable for large-scale outdoor unmanned operation scenarios.

Description

Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a carriage attitude recognition system, a carriage attitude recognition method, electronic equipment and a storage medium.
Background
Engineering machinery such as earth-moving machinery, mining machinery and road machinery requires cluster cooperative operation when upgraded to unmanned autonomous operation, for example an excavator discharging materials into a truck; in this application scenario, the excavator is required to be able to detect the position and attitude of the truck carriage in real time.
Generally, the position of the truck in a map coordinate system can be detected with the differential positioning principle of Real-Time Kinematic (RTK), and the map-coordinate position of the carriage can then be calculated from the size information of the truck. At the same time, the excavator can also obtain its own position in the map coordinate system with RTK positioning, and the relative position of the truck carriage and the excavator can be obtained by comparing the two, so that unloading can be completed.
This scheme can realize outdoor unmanned autonomous operation of the excavator, but its cost is high: a separate RTK mobile station must be installed on the truck and on the excavator to obtain their positions on the map, and a common base station must also be installed, so the cost may reach tens of thousands of yuan or more and large-scale application is impossible. Therefore, this scheme is not suitable for large-scale outdoor unmanned autonomous operation of engineering machinery.
Disclosure of Invention
The invention aims to solve the technical problems of existing excavator-based unmanned autonomous operation: complex system structure, low degree of intelligence, high cost and narrow application range.
To solve the above technical problem, in one aspect, the present application discloses a carriage posture recognition system, which includes a processor, a binocular camera, a first vehicle, a second vehicle and an identification member;
the first vehicle includes a carriage;
the second vehicle is used for unloading materials into the carriage of the first vehicle;
the identification member is arranged on a side surface of the carriage; the identification member is provided with a plurality of reference points;
the binocular camera is arranged on the second vehicle and is used for acquiring an image containing the identification member and sending the image to the processor;
the processor is in communication connection with the binocular camera; the processor is used for recognizing the image with the image recognition network model to obtain a target image; the target image is an image containing the recognition results of the plurality of reference points; determining a target three-dimensional coordinate of each of the plurality of reference points in the target image with a binocular stereo geometric vision algorithm; and determining the attitude information of the carriage based on the target three-dimensional coordinates of the reference points.
Optionally, the second vehicle comprises a discharging device and a main body structure which are connected;
the discharging device is connected with the first side surface of the main body structure;
the binocular camera is arranged on the first side face.
Optionally, the first side comprises a first mounting point and a second mounting point;
the first mounting point and the second mounting point are located on top of the first side;
a first preset distance exists between the first mounting point and the second mounting point;
the binocular camera comprises a first binocular camera and a second binocular camera;
the first binocular camera is arranged at the first mounting point;
the second binocular camera is arranged at the second mounting point.
Optionally, the identification member comprises at least two sub-identifiers;
the at least two sub-identifiers are respectively located at a first position point and a second position point on a second side surface of the carriage;
the second side surface is a surface facing the first side surface;
a second preset distance exists between the first position point and the second position point; the second preset distance is greater than half of the length of the second side surface along the first preset direction; the first preset direction is perpendicular to the second preset direction; the second preset direction is the direction of the extension line of the height direction of the first vehicle.
Optionally, the processor comprises an image recognition module and a position determination module;
the image recognition module is used for carrying out recognition processing on the image by utilizing an image recognition network model to obtain a target image; the target image is an image containing the identification results of the plurality of reference points, and the target image is sent to the position determining module;
the position determining module is used for determining a target three-dimensional coordinate of each reference point in the plurality of reference points in the target image by using a binocular stereo geometric vision algorithm, and determining the attitude information of the carriage based on the target three-dimensional coordinates of the reference points.
In another aspect, the present application further discloses a car posture identifying method, which includes:
acquiring an image containing a marker by using a binocular camera;
inputting the image into an image recognition network model to obtain a target image; the target image comprises a recognition result of each reference point in the plurality of reference points;
for each of the reference points, determining coordinates of the reference point in a pixel coordinate system based on the target image and the recognition result of the reference point;
acquiring camera parameters of the binocular camera;
determining a target three-dimensional coordinate of the reference point based on the camera parameters of the binocular camera and the coordinates of the reference point in the pixel coordinate system, using a binocular stereo geometric vision algorithm;
determining a target three-dimensional coordinate of the carriage based on the target three-dimensional coordinates of the reference points;
and determining the attitude information of the carriage based on the target three-dimensional coordinates of the carriage.
Optionally, the camera parameters include a distance between left and right eye cameras of the binocular camera, installation parameters, and internal parameters.
Optionally, determining the target three-dimensional coordinate of the reference point based on the camera parameters of the binocular camera and the coordinates of the reference point in the pixel coordinate system, determining the target three-dimensional coordinate of the carriage based on the target three-dimensional coordinates of the reference points, and determining the attitude information of the carriage based on the target three-dimensional coordinates of the carriage comprises:
determining the coordinates of the reference point in a camera coordinate system based on the distance between the left and right eye cameras of the binocular camera, the internal parameters of the binocular camera and the coordinates of the reference point in a pixel coordinate system;
determining a first coordinate transformation matrix based on the installation parameters of the binocular camera;
converting the three-dimensional coordinates of the reference point in a camera coordinate system into three-dimensional coordinates in a target vehicle coordinate system according to the first coordinate conversion matrix; the target vehicle coordinate system is a coordinate system where the second vehicle is located;
determining the three-dimensional coordinates of the compartment in the target vehicle coordinate system based on the three-dimensional coordinates of the reference points in the target vehicle coordinate system;
and determining the attitude information of the carriage based on the three-dimensional coordinates of the carriage in the target vehicle coordinate system.
In another aspect, the present application further provides an electronic device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the car attitude identification method.
In another aspect, the present application further provides a computer storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the car attitude identification method.
By adopting the technical scheme, the carriage posture identification method provided by the application has the following beneficial effects:
the carriage attitude recognition system comprises a processor, a binocular camera, a first vehicle, a second vehicle and an identification piece; the first vehicle includes a cabin; the second vehicle is used for unloading materials to the carriage of the first vehicle; the identification piece is arranged on the side surface of the carriage; the identification piece is provided with a plurality of reference points; the binocular camera is arranged on the second vehicle and used for acquiring an image containing the identification piece and sending the image to the processor; the processor is in communication connection with the binocular camera; the processor is used for identifying the image by using the image identification network model to obtain a target image; the target image is an image containing the recognition results of the plurality of reference points; determining a target three-dimensional coordinate of each reference point in a plurality of reference points in the target image by using a binocular camera solid geometry vision algorithm; and determining the attitude information of the carriage based on the target three-dimensional coordinates of the reference points. The carriage gesture recognition system is simple in structure and suitable for large-scale outdoor unmanned operation scenes.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a schematic structural diagram of an alternative car attitude identification system according to the present application;
FIG. 2 is a schematic illustration of an alternative second vehicle of the present application;
FIG. 3 is a schematic illustration of an alternative first vehicle configuration of the present application;
FIG. 4 is a schematic structural view of an alternative identification member of the present application;
FIG. 5 is a schematic flow chart diagram of an alternative car attitude identification method of the present application;
FIG. 6 is a schematic flow chart of an alternative car attitude identification method of the present application;
FIG. 7 is a schematic view of an alternative relationship between various coordinates according to the present application;
FIG. 8 is a schematic view of an alternative binocular stereoscopic camera model of the present application;
fig. 9 is a schematic diagram of another alternative relationship between multiple coordinates according to the present application.
The following is a supplementary description of the drawings:
1 - a first vehicle; 101 - a compartment; 2 - a second vehicle; 201 - a discharge device; 202 - a body structure; 2021 - a first side; 3 - a marker; 301 - a sub-identifier; 4 - a binocular camera; 401 - a first binocular camera; 402 - a second binocular camera; 5 - a processor.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic may be included in at least one implementation of the present application. In the description of the present application, it is to be understood that the terms "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings; they are used only for convenience in describing the present application and simplifying the description, and do not indicate or imply that the referred devices or elements must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. Moreover, the terms "first", "second", and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can be implemented in sequences other than those illustrated or described herein.
Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Generally, the existing methods for detecting the position of a vehicle compartment include the following.
The first is visual detection of the carriage position from a fixed position. For example, the position of a carriage moving back and forth is determined by visually detecting several key points of the carriage at a plurality of positions, and the work operation is then performed. This method is suitable for detecting the carriage position in a single scenario; it is repetitive detection with a low degree of intelligence and is not suitable for detecting the dynamic position and attitude of the truck carriage during outdoor unmanned autonomous operation of an excavator.
The second is point-to-point correction of the relative position with a laser. In the Automatic Guided Vehicle (AGV) industry, a laser transmitter installed on the AGV and a laser receiver on a shelf are used to correct the relative position of the AGV and the shelf and to guide the AGV to unload. This can only be used for accurate correction at the end of a fixed route and cannot be used for detecting the dynamic position and attitude of the truck carriage during outdoor unmanned autonomous operation of an excavator.
The third is truck detection in unmanned driving. In current unmanned driving, the truck in front is detected to obtain its spatial position relative to the ego vehicle. However, unmanned driving does not care about the precise position and attitude of the truck carriage; the truck is only detected as a whole and treated as an obstacle, so truck detection in unmanned driving cannot be directly used for accurate detection of the position and attitude of a truck carriage in outdoor unmanned autonomous operation of engineering machinery.
The fourth is RTK-based truck position and attitude detection as set forth in the Background section above. RTK can detect the position of the truck in a map coordinate system using the differential positioning principle, and the map-coordinate position of the carriage is then calculated from the size information of the truck. At the same time, the excavator can also obtain its own position in the map coordinate system with RTK positioning, and the relative position of the truck carriage and the excavator can be obtained by comparing the two, so that unloading can be completed. This scheme can realize outdoor unmanned autonomous operation of the excavator, but its cost is high: a separate RTK mobile station must be installed on the truck and on the excavator to obtain their positions on the map, and a common base station must also be installed, so the cost may reach tens of thousands of yuan or more and large-scale application is impossible. Therefore, this scheme is not suitable for large-scale outdoor unmanned autonomous operation of engineering machinery.
The above-mentioned way of detecting the car has the following disadvantages:
1) Low degree of intelligence. Multi-position visual carriage detection and point-to-point laser relative position correction have no intelligent detection capability and a low degree of intelligence, so they cannot be used to sense the dynamic position and attitude of the truck carriage in outdoor unmanned autonomous operation of an excavator.
2) Complex structure and high cost. Although RTK positioning technology can be used for outdoor unmanned autonomous operation of engineering machinery, its cost is high, large-scale application is impossible, and it is only suitable for some early exploratory research.
3) Not suitable for outdoor operation of engineering machinery. Existing mass-production schemes cannot be used for outdoor unmanned autonomous operation of engineering machinery. Truck detection in unmanned driving does detect the truck, but the truck as a whole is detected as an obstacle and the precise position and attitude of the carriage are not considered. Multi-position visual carriage detection and point-to-point laser relative position correction cannot be used for dynamic sensing of the carriage position and attitude of an unmanned autonomous operation truck because of their low degree of intelligence, and RTK positioning cannot be applied on a large scale because of its high cost.
To this end, referring to fig. 1, fig. 1 is a schematic structural diagram of an alternative carriage attitude recognition system according to the present application. The application discloses a carriage attitude recognition system which comprises a processor 5, a binocular camera 4, a first vehicle 1, a second vehicle 2 and an identification member 3. The first vehicle 1 includes a carriage 101; the second vehicle 2 is used for unloading materials into the carriage 101 of the first vehicle 1; the identification member 3 is arranged on a side surface of the carriage 101, and a plurality of reference points are arranged on the identification member 3; the binocular camera 4 is arranged on the second vehicle 2 and is used for acquiring an image containing the identification member 3 and sending the image to the processor 5; the processor 5 is in communication connection with the binocular camera 4; the processor 5 is configured to perform recognition processing on the image with an image recognition network model to obtain a target image, i.e. an image containing the recognition results of the plurality of reference points, to determine a target three-dimensional coordinate of each of the plurality of reference points in the target image with the stereo geometric vision algorithm of the binocular camera 4, and to determine the attitude information of the carriage 101 based on the target three-dimensional coordinates of the respective reference points.
Alternatively, the first vehicle 1 may be a vehicle having a cabin 101, such as a truck; the second vehicle 2 may be a vehicle having a robot arm, such as an excavator.
Alternatively, the processor 5 may be located on the second vehicle 2, or may be located on a terminal or server, independent of the vehicle.
Optionally, the server may include an independent physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud audio recognition model training, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like. The operating system running on the server may include, but is not limited to, an android system, an IOS system, linux, windows, unix, and the like.
Alternatively, the terminal may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a laptop computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and the like. The software running on the client may also be an application, an applet, or the like. Alternatively, the operating system running on the client may include, but is not limited to, an android system, an IOS system, linux, windows, unix, and the like.
In an alternative example, referring to fig. 2, fig. 2 is a schematic structural diagram of an alternative second vehicle of the present application. The second vehicle 2 comprises a connected discharge device 201 and a body structure 202; the discharge device 201 is connected to the first side 2021 of the main body structure 202; the binocular camera 4 is disposed on the first side 2021.
In order to further ensure that the binocular camera can acquire image information of all the identification members, in an alternative example the first side 2021 includes a first mounting point and a second mounting point; the first and second mounting points are located on top of the first side 2021; a first preset distance exists between the first mounting point and the second mounting point; the binocular camera 4 includes a first binocular camera 401 and a second binocular camera 402; the first binocular camera 401 is disposed at the first mounting point; the second binocular camera 402 is disposed at the second mounting point.
In order to enlarge the field of view of the binocular camera 4 and ensure the quality of the acquired images, optionally, the first preset distance is greater than half of the width of the first side 2021. To further improve the image acquisition effect, optionally, the first mounting point is located at the leftmost side of the first side 2021 and the second mounting point is located at the rightmost side of the first side 2021.
In an alternative example, referring to fig. 3, fig. 3 is a schematic structural diagram of an alternative first vehicle of the present application. In order to improve the accuracy of the subsequently determined carriage posture, the identification member 3 comprises at least two sub-identifiers 301; the at least two sub-identifiers 301 are respectively located at a first location point and a second location point on a second side surface of the carriage 101; the second side surface is a surface facing the first side surface 2021; a second preset distance exists between the first location point and the second location point; the second preset distance is greater than half of the length of the second side surface along the first preset direction; the first preset direction (e.g. the x-axis direction in fig. 3) is perpendicular to the second preset direction (e.g. the y-axis direction in fig. 3); the second preset direction is the direction of the extension line of the height direction of the first vehicle 1. The processor 5 can thus determine the spatial three-dimensional coordinates of the plurality of sub-identifiers 301 and thereby locate the attitude information of the carriage 101, such as the distance between the carriage 101 and the second vehicle 2 and the deflection angle relative to the second vehicle 2.
In this embodiment, the second side surface does not necessarily refer to a single surface. In an actual scenario the first vehicle 1 does not stop exactly in front of the second vehicle 2 and may form an included angle with it, so the second side surface may consist of the two side surfaces that face the first side surface 2021; accordingly, the sub-identifiers 301 may all be located on one of the two side surfaces, or a corresponding number of sub-identifiers may be located on each of them. This embodiment is explained taking the case where the sub-identifiers 301 are all located on the same side surface as an example.
Optionally, referring to fig. 4, fig. 4 is a schematic structural diagram of an optional identification member of the present application. The identifier 3 is provided with 5 reference points; in practice there may be more than 5, for example n = 6, 7, 8, 9, 10 and so on. The more reference points there are, the more accurate the finally determined coordinate data of the identifier 3 becomes; however, too many reference points increase the calculation time and make data processing take too long, so the number of reference points can be set according to the requirement.
In an alternative example, the processor 5 comprises an image recognition module and a position determination module; the image recognition module is used for performing recognition processing on the image with an image recognition network model to obtain a target image containing the recognition results of the plurality of reference points and sending the target image to the position determination module; the position determination module is used for determining a target three-dimensional coordinate of each of the plurality of reference points in the target image with the stereo geometric vision algorithm of the binocular camera 4, and for determining the attitude information of the carriage 101 based on the target three-dimensional coordinates of the respective reference points. Details of the carriage attitude recognition method are given in the following description.
The posture recognition system of the carriage 101 provided by the application has the following advantages:
1) High degree of intelligence. Based on deep convolutional neural network technology from the field of artificial intelligence, accurate detection of the position and attitude of the truck carriage 101 can be performed under various weather conditions and in various working environments using the images and stereo geometric vision of the binocular camera 4.
2) Single-ended detection. The binocular camera 4 performs single-ended detection, recognizing and measuring external objects much like human eyes. There is no need to install a mobile station on both the truck and the excavator as with RTK, where the RTK positioning result of the truck has to be sent to the unmanned excavator through a communication terminal and the relative position relationship is only obtained after the unmanned excavator compares the received truck positioning data with its own positioning data.
3) Low cost. Automotive-grade binocular cameras 4 currently available have waterproof and dustproof capability, and in large-scale application the cost can be reduced to a few thousand yuan, so the cost is low.
4) Meets the needs of large-scale outdoor unmanned autonomous operation of engineering machinery. Based on deep convolutional neural network technology from the field of artificial intelligence, the binocular camera 4 can detect the precise position and attitude of the truck carriage 101 under various weather conditions and in various working environments using images and binocular stereo geometric vision. At the same time the cost is low, so large-scale outdoor unmanned autonomous operation of engineering machinery can be realized.
A specific embodiment of the carriage attitude recognition method according to the present application is described below; referring to fig. 5, fig. 5 is a schematic flow chart of an optional carriage attitude recognition method according to the present application. The specification provides the method steps as in the embodiments or flowcharts, but more or fewer steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. In an actual system or server product, the steps may be executed sequentially or in parallel (for example on parallel processors or in a multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in fig. 5, the method may include:
s501: an image containing the identification member 3 is acquired with the binocular camera 4.
In the present embodiment, the binocular camera 4 is arranged in a manner as shown in fig. 2, and is described in detail in the above system section.
Optionally, the binocular camera 4 includes a left-eye camera and a right-eye camera; step S501 may then be expressed as: acquiring a first image containing the identification member 3 with the left-eye camera, the first image being a visible-light image; and acquiring a second image containing the identification member 3 with the right-eye camera, the second image being a visible-light image. In step S503 below, the input image then includes the first image and the second image, and the target three-dimensional coordinate calculation is performed based on the recognition results of the identifier in the two images.
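For illustration only, acquiring the first and second images from the two eyes of the binocular camera 4 might look like the following sketch (OpenCV, Python); the device indices are assumptions chosen for the example and depend on the actual hardware and driver, not on anything specified in this application.

```python
import cv2

# Assumed device indices for the left-eye and right-eye cameras of the
# binocular camera 4; the real indices depend on the hardware setup.
left_cam = cv2.VideoCapture(0)
right_cam = cv2.VideoCapture(1)

def grab_stereo_pair():
    """Grab one visible-light frame from each eye of the binocular camera."""
    ok_left, first_image = left_cam.read()     # first image (left eye)
    ok_right, second_image = right_cam.read()  # second image (right eye)
    if not (ok_left and ok_right):
        raise RuntimeError("failed to read a frame from one of the cameras")
    return first_image, second_image
```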
S503: inputting the image into an image recognition network model to obtain a target image; the target image includes a recognition result for each of the plurality of reference points.
Optionally, the image recognition network model includes a feature extraction network, a feature fusion network, and a prediction recognition network; step S503 may include: carrying out feature extraction operation on the image by using a feature extraction network to obtain a feature map set; performing feature fusion processing on the feature map set by using a feature fusion network to obtain a target feature map; and performing prediction processing on the target characteristic graph by using a prediction identification network to obtain a target image.
Optionally, the feature extraction network includes an input layer and a sub-feature extraction network. The input layer is used to normalize the image data before it is fed into the neural network for inference. There are various normalization methods; one form is to normalize the pixel values of an image from 0-255 to 0-1. The image pixel value normalization formula is as follows:
X1=x/255
where x represents the pixel value of a pixel of the image and X1 represents the value obtained by normalizing that pixel value.
For example, when x = 200, X1 = 0.78 is obtained after the normalization processing.
Image features are then extracted from the normalized input-layer image data through successive convolution and activation operations.
Optionally, in the feature extraction process, the convolution kernel may be set to 1×1 or to 3×3, which is not limited here.
After the convolution operation, the output of the current convolutional layer is obtained through an activation function. There are many types of activation functions; ReLU and Sigmoid are commonly used.
Optionally, the deep convolutional neural network Yolo5KeyPoints adopted in this scheme has generalization capability and can effectively recognize the identifier 3 containing the specific pattern and its 5 reference points in various weather conditions and working environments. Yolo5KeyPoints uses CSP-Darknet53 as the backbone feature extraction network and SPPF and CSP-PAN as the feature fusion network, and then uses a class prediction sub-network (class subnet) to predict the category of each grid cell, a detection box regression sub-network (box subnet) to regress the detection boxes, and a reference point regression sub-network (keypoint subnet) to regress the reference points, giving the final network prediction result. For reference point detection, Yolo5KeyPoints uses the Wing Loss function: when the error is large the parameter gradient is small, so the function is insensitive to outliers; when the error is small the parameter gradient is large, so the model converges better, which greatly improves the reference point detection accuracy.
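As a concrete illustration of the Wing Loss behaviour described above, a minimal PyTorch-style sketch is given below. It follows the published Wing Loss formulation rather than code from this application, and the default values of w and epsilon are assumptions.

```python
import torch

def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Wing Loss sketch for reference point regression (assumed parameter values).

    For small errors the loss behaves like a scaled logarithm, so the gradient
    is large and convergence is better; for large errors it behaves like L1,
    so the gradient is small and bounded and outliers do not dominate training.
    """
    diff = (pred - target).abs()
    c = w - w * torch.log(torch.tensor(1.0 + w / epsilon))
    loss = torch.where(diff < w,
                       w * torch.log(1.0 + diff / epsilon),
                       diff - c)
    return loss.mean()
```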
As can be seen from the above description, the backbone feature extraction network is generally followed by a feature fusion network, which fuses the feature maps of different scales extracted by the backbone network and then applies further convolutions to extract features.
The feature fusion network may include an enhanced feature extraction network implemented with feature pyramid stacking plus convolution operations; a typical enhanced feature extraction network is the FPN. The FPN fuses low-level high-resolution feature maps with high-level high-semantic feature maps, and each fused feature map layer is then predicted independently to obtain a prediction result.
After feature fusion and enhanced feature extraction, a 1×1 convolution is generally attached to obtain the network prediction result, namely the detection box regression result, the reference point regression result and the class confidence value.
Tests show that, with the position and attitude recognition algorithm for the carriage 101 deployed on a CPU, the running time is about 130 ms per inference, which meets the real-time requirement in the low-speed scenario of unmanned autonomous excavator operation; if the algorithm is deployed on AI acceleration hardware (such as a GPU or an AI chip), the running time can be further reduced to within 30 ms per inference.
The deep convolutional neural network Yolo5KeyPoints can recognize the identifier 3 containing the specific pattern and its 5 reference points only after data has been acquired and labeled and the model has been trained in advance.
An alternative process for training the image recognition network model described above is as follows:
1) Acquire a training sample data set, where the training sample data set comprises a plurality of sample images and, for each sample image, a corresponding label image; each sample image contains a marker 3 and a plurality of reference points located on the marker 3; the label image is the image corresponding to the labeled bounding box of the identifier 3 in the sample image and the labels of the reference points on the identifier 3.
In this embodiment, the sample images may be images of the identification member 3 captured by a binocular camera under various weather conditions and in various working environments; after capture, the images that meet the requirements are manually picked out and labeled. When selecting pictures, meaningful pictures from various weather conditions and working environments are chosen, each picture must completely contain the pattern of the identification member 3, and only one of any set of repeated pictures is kept.
The label image is generated as follows: data annotation is performed with the labelme tool to obtain, for each image, the bounding box of the identifier 3 and a label file with the positions of the 5 reference points.
The specific operation process is as follows: click the rectangle target button in the labelme labeling software, create a new target box framing the identifier 3 and enter the label name "SignBoard" of the box to obtain the label of the identifier 3; then click the create-point button, click the five reference points of the identifier 3 pattern in turn and enter the reference point names point1, point2, point3, point4 and point5 to obtain the labels of the reference points. Labeling all identifiers 3 and their 5 reference points in the whole picture gives the label image of that picture.
2) Construct a preset deep learning model and determine the preset deep learning model as the current deep learning model.
3) Perform the annotation prediction operation on the sample images in the sample data set based on the current deep learning model, and determine the annotation prediction results of the sample images.
4) Determine a loss value based on the sample image annotations and the annotation prediction results.
5) When the loss value is greater than a preset threshold, perform back-propagation based on the loss value, update the model weights of the current deep learning model to obtain a model with updated weights, and determine this model as the new current deep learning model, thereby completing one training iteration; then repeat the steps: perform the annotation prediction operation on the sample images based on the current deep learning model, determine the loss value between the sample image annotations and the prediction results, and if the loss value is greater than the threshold, update the model weights by back-propagation.
6) When the loss value is less than or equal to the preset threshold, or the preset maximum number of iterations is reached, determine the current deep learning model as the image recognition network model.
In an alternative embodiment, the model training process may also be as follows: load pre-trained weights and input the labeled data for model training. The image data is normalized by a pre-processing module, the normalized image data is fed into the network model and propagated forward to obtain a prediction result, and a loss function computes the deviation between the model output and the ground-truth values in the label file, i.e. the loss value. The loss value may consist of three parts: target classification loss, detection box regression loss and reference point regression loss. The loss is back-propagated to update the weights of every layer of the network, which completes one training pass; the model converges through continuous iterative training, and training of the whole model is finished when the convergence target or the maximum number of iterations is reached.
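A highly simplified sketch of this training loop is shown below (PyTorch-style Python). The model, data loader and the three individual loss functions are placeholders standing in for the actual Yolo5KeyPoints implementation, which is not reproduced here.

```python
def train(model, data_loader, optimizer, cls_loss, box_loss, kpt_loss,
          max_epochs, loss_threshold):
    """Sketch of the iterative training described above (assumed interfaces)."""
    for epoch in range(max_epochs):
        for images, targets in data_loader:
            images = images / 255.0          # pre-processing: normalize 0-255 to 0-1
            preds = model(images)            # forward propagation
            # total loss = target classification + detection box regression
            #            + reference point regression (e.g. Wing Loss)
            loss = cls_loss(preds, targets) + box_loss(preds, targets) + kpt_loss(preds, targets)
            optimizer.zero_grad()
            loss.backward()                  # back-propagate the loss
            optimizer.step()                 # update the weights of every layer
        if loss.item() <= loss_threshold:    # convergence target reached, stop training
            break
```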
The trained Yolo5KeyPoints model is then deployed on a CPU, a GPU or an AI chip for model inference, giving the recognition results of the identifier 3 and its 5 reference points in the images captured in real time by the binocular camera 4.
In the model training stage, a model with detection capability is obtained by training on field data. The model can be deployed on a CPU through the OpenCV library or the libtorch library, or on a GPU or AI chip through the libraries and deployment requirements provided by the manufacturer, which generally involves converting the model format.
The detection of the identifier 3 and its 5 reference points has three stages. When a real-time image is input, the image is normalized in the pre-processing stage; the normalized image is then fed into the Yolo5KeyPoints model for forward inference to obtain the prediction result of each grid point of the image; finally, the per-grid-point prediction results are post-processed, for example with non-maximum suppression, to obtain the final prediction result.
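The three-stage flow can be sketched as follows; model, nms and the layout of the outputs are assumed interfaces used only to illustrate the pre-processing, forward inference and post-processing stages, not the deployed implementation.

```python
import torch

def detect_marker(model, image, nms, conf_threshold=0.5, iou_threshold=0.45):
    """Sketch of the three-stage detection of the identifier 3 and its 5 reference points.

    model : the trained Yolo5KeyPoints network (assumed callable)
    image : H x W x 3 uint8 image from the binocular camera (NumPy array)
    nms   : a non-maximum-suppression routine (assumed interface)
    """
    # Stage 1: pre-processing - normalize pixel values from 0-255 to 0-1
    x = torch.from_numpy(image).float().permute(2, 0, 1).unsqueeze(0) / 255.0

    # Stage 2: forward inference - prediction result for each grid point of the image
    with torch.no_grad():
        grid_predictions = model(x)

    # Stage 3: post-processing of the per-grid-point predictions
    detections = nms(grid_predictions, conf_threshold, iou_threshold)
    # each detection is assumed to carry a box, a class confidence and 5 (u, v) reference points
    return detections
```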
In the autonomous operation of the unmanned excavator, the binocular camera 4 acquires image data in real time and the identifier 3 and its 5 reference points are recognized; then, in steps S505-S513, the spatial three-dimensional coordinates of the 5 reference points of the identifier 3 are obtained by calculation from the image coordinates of the 5 pairs of reference points with the stereo geometric vision of the binocular camera 4. The method has high detection robustness and high precision.
S505: for each of the reference points, the coordinates of the reference point in the pixel coordinate system are determined based on the target image and the recognition result of the reference point.
In the present embodiment, the coordinates of the reference point in the pixel coordinate system may be (u, v).
S507: the camera parameters of the binocular camera 4 are acquired.
In an alternative example, the camera parameters include a distance between the left and right eye cameras of the binocular camera 4, installation parameters, and internal parameters.
Optionally, the binocular camera 4 includes a left-eye camera and a right-eye camera, and the distance between the left eye and the right eye is T; the installation parameters of the binocular camera comprise the translation distances of the left-eye camera and the right-eye camera from the origin of the vehicle coordinate system, and the rotation angle relative to the origin of the vehicle coordinate system; the internal parameters of the binocular camera comprise the internal parameters of the left-eye camera and the internal parameters of the right-eye camera.
S509: and determining the target three-dimensional coordinates of the reference point based on the camera parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system by using a binocular camera 4 solid geometry vision algorithm.
S511: determining target three-dimensional coordinates of the compartment 101 based on the target three-dimensional coordinates of the respective reference points;
s513: the attitude information of the car 101 is determined based on the target three-dimensional coordinates of the car 101.
In an alternative example, referring to fig. 6, fig. 6 is a schematic flow chart of another alternative car attitude identification method according to the present application. Steps S509-S513 may be specifically set forth as:
s601: the coordinates of the reference point in the camera coordinate system are determined based on the distance between the left and right eye cameras of the binocular camera 4, the internal reference of the binocular camera 4, and the coordinates of the reference point in the pixel coordinate system.
Optionally, the internal parameters of the binocular camera include the internal parameters of the left-eye camera and the internal parameters of the right-eye camera.
An alternative embodiment of determining the coordinates of the reference point in the camera coordinate system is provided below. Referring to fig. 7, fig. 7 is a schematic diagram illustrating the relationship between the various coordinate systems in an alternative embodiment of the present disclosure. World coordinate system: Xw, Yw, Zw; camera coordinate system: Xc, Yc, Zc; image coordinate system: x, y; pixel coordinate system: u, v (reflecting the arrangement of the pixels on the camera CCD chip).
As can be seen from the figure, suppose (u0, v0) are the coordinates of O in the u-v coordinate system, and the length and width of a single pixel are dx and dy respectively; then the relationship between the pixel coordinate system and the image coordinate system is as follows:
u = x/dx + u0    (1)
v = y/dy + v0    (2)
the simultaneous writing of equation (1) and equation (2) into a matrix can be expressed as follows:
Figure BDA0003903962960000153
the coordinates (x, y) of any pixel point on the image in the image coordinate system can be solved by the above formula (3).
Referring to fig. 8, fig. 8 is a schematic view of an alternative camera model of the present application. O_l and O_r are the centers of the left and right objective projections of the binocular camera 4, and the line D between them is called the binocular baseline, i.e. the distance between the center points of the two cameras. P is a point in space, P_l is the imaging point of P on the left eye, P_r is the imaging point of P on the right eye, and the parallax is d = x_l - x_r.
According to the similar triangle theorem, triangle P O_l O_r is similar to triangle P P_l P_r, and the depth Z can be obtained from the following formula:
Z = f · D / d = f · D / (x_l - x_r)    (4)
where f is the focal length of the camera.
After the depth information Z, that is, Zc, is obtained from equation (4), Xc and Yc of the coordinates of P in the target binocular camera coordinate system (the binocular camera coordinate system takes the center of the left-eye camera as its origin, i.e. the camera coordinate system of fig. 7) can be further determined.
Referring to fig. 9, fig. 9 is a schematic diagram of the relationship between the coordinates according to another alternative embodiment of the present invention. Fig. 9 shows the left-eye image coordinate system of the binocular camera and the imaging point p of the space point P in the left-eye image; the horizontal coordinate X and the vertical coordinate Y of the space point P can be obtained with the similar triangle theorem. The coordinates of P(X, Y, Z) in the left-eye camera coordinate system of the binocular camera are P(Xc, Yc, Zc); p(x, y) are the image coordinates of the imaging point of the space point P(Xc, Yc, Zc) in the left-eye image coordinate system of the binocular camera.
Since triangle ABO_c is similar to triangle oCO_c, and triangle PBO_c is similar to triangle pCO_c, it can be deduced that:
x / Xc = y / Yc = f / Zc    (5)
From equation (5), Xc and Yc can be found as:
Xc = x · Zc / f,  Yc = y · Zc / f    (6)
All the coordinates of the spatial point P(Xc, Yc, Zc) are thus obtained. In this way, the three-dimensional coordinates of the 5 reference points on the identification member 3 in the camera coordinate system, i.e. the left-eye camera coordinate system, can be solved in sequence.
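Putting equations (1) to (6) together, the three-dimensional coordinates of a reference point in the left-eye camera coordinate system can be computed from its pixel coordinates in the two images. The sketch below is an illustration only; it assumes rectified cameras sharing the same intrinsic parameters, and all parameter names are chosen here rather than taken from the application.

```python
def pixel_to_camera_coords(u_left, v_left, u_right, f, D, u0, v0, dx, dy):
    """Triangulate a reference point into the left-eye camera coordinate system.

    u_left, v_left : pixel coordinates of the point in the left-eye image
    u_right        : horizontal pixel coordinate of the same point in the right-eye image
    f              : focal length (in the same metric unit as D, dx, dy)
    D              : binocular baseline, distance between the two optical centers
    u0, v0, dx, dy : principal point and pixel size (camera intrinsic parameters)
    """
    # equations (1)-(3): pixel coordinates -> image (metric) coordinates
    x_l = (u_left - u0) * dx
    y_l = (v_left - v0) * dy
    x_r = (u_right - u0) * dx

    # equation (4): depth from disparity, Zc = f * D / (x_l - x_r)
    Zc = f * D / (x_l - x_r)

    # equations (5)-(6): Xc = x * Zc / f, Yc = y * Zc / f
    Xc = x_l * Zc / f
    Yc = y_l * Zc / f
    return Xc, Yc, Zc
```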
S603: a first coordinate transformation matrix is determined based on the mounting parameters of the binocular camera.
S605: converting the three-dimensional coordinates of the reference point in the camera coordinate system into the three-dimensional coordinates in the target vehicle coordinate system according to the first coordinate conversion matrix; the target vehicle coordinate system is the coordinate system of the second vehicle 2.
In this embodiment, if the origin of the vehicle coordinate system is taken as the center of gyration of the second vehicle 2, the corresponding rotation matrix R and translation vector T can be determined from the translation distance and rotation angle of the binocular camera relative to this origin. When the second vehicle is an excavator, it comprises a mechanical arm and a base, the rotating end of the mechanical arm is rotatably connected to the base, and the center of gyration of the second vehicle lies on the rotation axis of the rotating end.
The three-dimensional coordinates of the reference point in the camera coordinate system are converted into three-dimensional coordinates in the coordinate system of the second vehicle 2, where the coordinate conversion matrix is expressed as follows:
$$M_1 = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \qquad (7)$$
The camera coordinate system is then related to the coordinate system of the second vehicle 2 according to equation (7) as follows, where (Xv, Yv, Zv) are the coordinates in the coordinate system of the second vehicle 2:
$$\begin{bmatrix} X_v \\ Y_v \\ Z_v \\ 1 \end{bmatrix} = M_1 \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \qquad (8)$$
Optionally, the coordinate transformation matrix, i.e., equation (7), may differ depending on how the vehicle coordinate system is defined.
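A minimal sketch of steps S603 and S605, assuming the rotation matrix R and translation vector T have already been derived from the mounting angle and offset of the binocular camera 4 relative to the origin of the second vehicle 2 coordinate system; the homogeneous-matrix form follows equations (7) and (8), and the numeric mounting values in the example are illustrative assumptions only.

```python
import numpy as np

def camera_to_vehicle(points_cam, R, T):
    """Map Nx3 points from the left-eye camera frame into the coordinate
    system of the second vehicle 2 using the first coordinate transformation
    matrix of equation (7), applied as in equation (8)."""
    M = np.eye(4)
    M[:3, :3] = R                                      # rotation part
    M[:3, 3] = T                                       # translation part
    pts = np.asarray(points_cam, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coordinates
    return (pts_h @ M.T)[:, :3]

# Illustrative mounting (assumed values): camera pitched 10 degrees downward,
# offset 0.5 m forward and 1.8 m above the rotation center of the second vehicle.
pitch = np.deg2rad(10.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(pitch), -np.sin(pitch)],
              [0.0, np.sin(pitch),  np.cos(pitch)]])
T = np.array([0.0, 1.8, 0.5])
```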
S607: the three-dimensional coordinates of the vehicle compartment 101 in the target vehicle coordinate system are determined based on the three-dimensional coordinates of the respective reference points in the target vehicle coordinate system.
Optionally, taking as an example the case in which the identifier 3 includes two sub-identifiers 301 and each sub-identifier 301 includes 5 reference points: for each sub-identifier 301, the coordinates of its 5 reference points may be screened, the reference-point coordinates with large numerical deviation are removed, and the coordinates of the remaining reference points that meet the requirement are averaged to obtain the three-dimensional coordinate data of the sub-identifier 301. In another alternative embodiment, weights may be set for the reference points located at different positions on the sub-identifier 301, and the three-dimensional coordinate data of the sub-identifier 301 is obtained by multiplying the coordinates of each reference point by the corresponding weight, as sketched below.
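A minimal sketch of the screening and averaging described above, assuming a median-based deviation test; the 0.1 m threshold and the optional per-point weights are illustrative assumptions rather than values given in the disclosure.

```python
import numpy as np

def fuse_sub_identifier(ref_points, max_dev=0.1, weights=None):
    """Fuse the vehicle-frame coordinates of one sub-identifier 301 from its
    reference points: discard points deviating from the component-wise median
    by more than max_dev metres, then take an (optionally weighted) mean."""
    pts = np.asarray(ref_points, dtype=float)                # shape (5, 3)
    keep = np.linalg.norm(pts - np.median(pts, axis=0), axis=1) <= max_dev
    if not keep.any():                                       # fall back if all were rejected
        keep[:] = True
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, dtype=float)
    pts, w = pts[keep], w[keep]
    return (pts * w[:, None]).sum(axis=0) / w.sum()          # weighted mean
```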
S609: the attitude information of the vehicle 101 is determined based on the three-dimensional coordinates of the vehicle 101 in the target vehicle coordinate system.
From the three-dimensional coordinates of the two sub-identifiers 301 in the coordinate system of the second vehicle 2, the attitude information of the carriage 101 can be determined, such as the relative distance and yaw angle between the carriage 101 and the second vehicle 2.
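For illustration, quantities such as the relative distance and yaw angle mentioned above could be derived from the two fused sub-identifier coordinates as follows; the axis convention and the returned quantities are assumptions made for this sketch.

```python
import numpy as np

def carriage_attitude(marker_a, marker_b):
    """Estimate the attitude of carriage 101 in the second-vehicle frame from
    the centres of the two sub-identifiers 301 on the carriage side wall
    (assumed convention: X forward, Y to the left, Z upward)."""
    a = np.asarray(marker_a, dtype=float)
    b = np.asarray(marker_b, dtype=float)
    side_center = (a + b) / 2.0                        # midpoint of the marked side wall
    distance = float(np.linalg.norm(side_center[:2]))  # planar distance to the rotation center
    side_vec = b - a                                   # direction along the side wall
    yaw_deg = float(np.degrees(np.arctan2(side_vec[1], side_vec[0])))
    return side_center, distance, yaw_deg
```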
In one possible embodiment, the image recognition module comprises a feature extraction sub-module, a feature fusion sub-module and a prediction recognition sub-module. The feature extraction sub-module is used for performing a feature extraction operation on the image by using a feature extraction network to obtain a feature map set; the feature fusion sub-module is used for performing feature fusion processing on the feature map set by using a feature fusion network to obtain a target feature map; the prediction recognition sub-module is used for performing prediction processing on the target feature map by using a prediction recognition network to obtain the target image.
In one possible embodiment, the position determining module includes:
a pixel coordinate determination module for determining, for each of the reference points, coordinates of the reference point in a pixel coordinate system based on the target image and the recognition result of the reference point;
a camera parameter acquiring module for acquiring the camera parameters of the binocular camera 4;
the target three-dimensional coordinate determination module is used for determining a target three-dimensional coordinate of the reference point based on the camera parameters of the binocular camera 4 and the coordinate of the reference point in a pixel coordinate system by utilizing a binocular camera 4 stereoscopic geometric vision algorithm; determining target three-dimensional coordinates of the compartment 101 based on the target three-dimensional coordinates of the respective reference points;
and the attitude information determination module is used for determining the attitude information of the carriage 101 based on the target three-dimensional coordinates of the carriage 101.
In one possible embodiment, the target three-dimensional coordinate determination module comprises a first coordinate determination module, a first coordinate transformation matrix determination module and a second coordinate determination module;
a first coordinate determination module, configured to determine a three-dimensional coordinate of the reference point in a camera coordinate system based on a distance between left and right eye cameras of the binocular camera 4, an internal reference of the binocular camera 4, and a coordinate of the reference point in a pixel coordinate system;
the first coordinate transformation matrix determination module is used for determining the first coordinate transformation matrix based on the installation parameters of the binocular camera;
the second coordinate determination module is used for converting the three-dimensional coordinates of the reference point in the camera coordinate system into the three-dimensional coordinates in the target vehicle coordinate system according to the first coordinate conversion matrix; the target vehicle coordinate system is the coordinate system of the second vehicle 2; determining the three-dimensional coordinates of the compartment 101 in the target vehicle coordinate system based on the three-dimensional coordinates of the reference points in the target vehicle coordinate system;
and the attitude information determination module is used for determining the attitude information of the carriage 101 based on the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system.
The specific implementation process of each module is the same as that described for the corresponding method step and is not repeated here. It should be noted that a module may be a physical sub-module in the processor, or may be a virtual module formed by a program.
As described above, based on the carriage attitude recognition system provided by the present application, an optional working process is as follows. The trained deep convolutional neural network model Yolo5KeyPoints is deployed to the CPU; one detection calculation of the Yolo5KeyPoints model, which yields the identifier 3 and its five reference points, takes about 130 ms. If the computing unit includes an image computing unit (GPU) or an AI acceleration chip, the Yolo5KeyPoints model can be deployed to the GPU or AI acceleration chip to improve real-time performance; in that case one detection calculation yielding the identifier 3 and its five reference points takes about 30 ms. The binocular stereo geometric vision algorithm that generates the spatial three-dimensional coordinates of the 5 reference points is deployed to the CPU.
The system is powered on, the binocular camera 4 and the truck compartment 101 position-and-attitude sensing system are started, the unmanned excavator control system is started, and the unmanned excavator enters the unmanned autonomous operation state.
The operator parks a first vehicle 1, such as a truck, at the parking position in preparation for loading.
After the truck is parked, the binocular camera 4 sends the captured 1280x720 RGB left-eye and right-eye color images to the deep convolutional neural network Yolo5KeyPoints model. Before inference, the Yolo5KeyPoints model preprocesses the 1280x720 RGB left-eye and right-eye color images; the preprocessing includes image normalization and image scaling. In the image normalization of the Yolo5KeyPoints preprocessing, the RGB value of each pixel is divided by 127.5 and then 1 is subtracted, normalizing all pixel values to the range [-1, 1]. The image scaling operation scales the original 1280x720 RGB image to a 1280x736 RGB image, which meets the size requirement of the model input. After preprocessing, the image data are sent to the Yolo5KeyPoints model for inference to obtain the model output, which comprises an object classification result, a detection-frame regression result and five reference-point regression results for each anchor frame on each grid point. After post-processing such as non-maximum suppression, the Yolo5KeyPoints model outputs the final target detection frame of the identification part 3 and the pixel coordinates of its 5 reference points.
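The preprocessing described above could be sketched as follows, for illustration only: pixel values are normalized to [-1, 1] by dividing by 127.5 and subtracting 1, and the 1280x720 frame is brought to the 1280x736 model input size; padding the bottom with 16 rows is an assumption here, since the disclosure states only the target size.

```python
import numpy as np
import cv2  # used only for the geometric padding step

def preprocess_for_yolo5keypoints(image_rgb):
    """Prepare one 1280x720 RGB frame for Yolo5KeyPoints inference."""
    assert image_rgb.shape[:2] == (720, 1280), "expects a 1280x720 frame"
    padded = cv2.copyMakeBorder(image_rgb, 0, 16, 0, 0,           # 720 -> 736 rows
                                cv2.BORDER_CONSTANT, value=(114, 114, 114))
    normalized = padded.astype(np.float32) / 127.5 - 1.0           # values in [-1, 1]
    return normalized[None]                                        # add a batch dimension
```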
After the pixel coordinates of the 5 reference points of the identification part 3 in the left-eye and right-eye images of the binocular camera 4 are obtained through neural network inference, the three-dimensional spatial coordinates of the 5 reference points are calculated from these pixel coordinates using the stereo geometric vision of the binocular camera 4. The carriage attitude recognition system is simple in structure and suitable for large-scale outdoor unmanned operation scenarios.
Embodiments of the present application also provide an electronic device comprising a processor and a memory, the memory having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the car attitude recognition method as described above.
Embodiments of the present application also provide a computer storage medium that can be disposed in a server to store at least one instruction, at least one program, a set of codes, or a set of instructions related to implementing the car attitude recognition method in the method embodiments, where the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the car attitude recognition method.
Optionally, in this embodiment, the storage medium may be located in at least one network server of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
It should be noted that: the sequence of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A carriage attitude recognition system is characterized by comprising a processor (5), a binocular camera (4), a first vehicle (1), a second vehicle (2) and an identification member (3);
the first vehicle (1) comprises a cabin (101);
the second vehicle (2) is used for unloading materials to a compartment (101) of the first vehicle (1);
the identification piece (3) is arranged on the side surface of the carriage (101); a plurality of reference points are arranged on the identification piece (3);
the binocular camera (4) is arranged on the second vehicle (2), and the binocular camera (4) is used for acquiring an image containing the identification piece (3) and sending the image to the processor (5);
the processor (5) is in communication connection with the binocular camera (4); the processor (5) is used for identifying the image by using an image identification network model to obtain a target image; the target image is an image containing the recognition results of the plurality of reference points; determining a target space three-dimensional coordinate of each reference point in a plurality of reference points in the target image by using a binocular camera (4) solid geometry vision algorithm; determining attitude information of the vehicle compartment (101) based on the target space three-dimensional coordinates of the respective reference points.
2. A car attitude recognition system according to claim 1, characterised in that the second vehicle (2) comprises a connected tripper device (201) and a body structure (202);
the discharge device (201) is connected to a first side (2021) of the body structure (202);
the binocular camera (4) is arranged on the first side surface (2021).
3. The car attitude recognition system according to claim 2, wherein the first side (2021) includes a first mounting point and a second mounting point;
the first and second mounting points are located on top of the first side (2021);
a first preset distance exists between the first mounting point and the second mounting point;
the binocular camera (4) comprises a first binocular camera (401) and a second binocular camera (402);
the first binocular camera (401) is arranged at the first mounting point;
the second binocular camera (402) is provided at the second mounting point.
4. A car attitude recognition system according to claim 2, characterized in that the identifier (3) comprises at least two sub-identifiers (301);
the at least two sub-identifiers (301) are respectively located at a first position point and a second position point of a second side face of the carriage (101);
the second side is the side facing the first side (2021);
a second preset distance exists between the first position point and the second position point; the second preset distance is greater than half of the length of the second side surface along the first preset direction; the first preset direction is vertical to the second preset direction; the second preset direction is an extension line direction of the height direction of the first vehicle (1).
5. A car attitude recognition system according to claim 1, wherein the processor (5) comprises an image recognition module and a position determination module;
the image identification module is used for identifying the image by using an image identification network model to obtain a target image; the target image is an image containing the recognition results of the plurality of reference points, and the target image is sent to a position determining module;
the position determination module is used for determining a target space three-dimensional coordinate of each reference point in a plurality of reference points in the target image by using a binocular camera (4) solid geometry vision algorithm; determining attitude information of the vehicle compartment (101) based on the target space three-dimensional coordinates of the respective reference points.
6. A car attitude recognition method implemented by the car attitude recognition system according to any one of claims 1 to 5, characterized by comprising:
acquiring an image containing the identification member (3) by using a binocular camera (4);
inputting the image into an image recognition network model to obtain a target image; the target image comprises a recognition result of each reference point in the plurality of reference points;
for each of the reference points, determining coordinates of the reference point in a pixel coordinate system based on the target image and the recognition result of the reference point;
acquiring camera parameters of the binocular camera (4);
determining a target space three-dimensional coordinate of the reference point based on camera parameters of the binocular camera (4) and the coordinate of the reference point under a pixel coordinate system by utilizing a binocular camera solid geometry vision algorithm;
determining target space three-dimensional coordinates of the carriage (101) based on the target space three-dimensional coordinates of the respective reference points;
determining attitude information of the car (101) based on the target space three-dimensional coordinates of the car (101).
7. The car posture recognition method according to claim 6, wherein the camera parameters include a distance between left and right eye cameras of the binocular camera (4), an installation parameter, and an internal reference.
8. The car posture recognition method according to claim 7, characterized in that the target space three-dimensional coordinates of the reference point are determined based on the camera parameters of the binocular camera (4) and the coordinates of the reference point in a pixel coordinate system; determining target space three-dimensional coordinates of the carriage (101) based on the target three-dimensional coordinates of the respective reference points; determining attitude information of the car (101) based on target three-dimensional coordinates of the car (101), including:
determining the three-dimensional coordinates of the reference point in a camera coordinate system based on the distance between the left and right eye cameras of the binocular camera (4), the internal reference of the binocular camera and the coordinates of the reference point in a pixel coordinate system;
determining a first coordinate transformation matrix based on the installation parameters of the binocular camera;
converting the three-dimensional coordinates of the reference point in a camera coordinate system into three-dimensional coordinates in a target vehicle coordinate system according to the first coordinate conversion matrix; the target vehicle coordinate system is a coordinate system where the second vehicle is located;
determining the three-dimensional coordinates of the compartment in the target vehicle coordinate system based on the three-dimensional coordinates of the reference points in the target vehicle coordinate system;
determining attitude information of the vehicle compartment (101) based on three-dimensional coordinates of the vehicle compartment in a target vehicle coordinate system.
9. An electronic device comprising a processor and a memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the car attitude recognition method according to any one of claims 6-8.
10. A computer storage medium, characterized in that at least one instruction or at least one program is stored in the computer storage medium, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the car attitude identification method according to any one of claims 6 to 8.
CN202211299656.9A 2022-10-24 2022-10-24 Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium Pending CN115497077A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211299656.9A CN115497077A (en) 2022-10-24 2022-10-24 Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium
PCT/CN2023/120389 WO2024087962A1 (en) 2022-10-24 2023-09-21 Truck bed orientation recognition system and method, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211299656.9A CN115497077A (en) 2022-10-24 2022-10-24 Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN115497077A true CN115497077A (en) 2022-12-20

Family

ID=84473534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211299656.9A Pending CN115497077A (en) 2022-10-24 2022-10-24 Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN115497077A (en)
WO (1) WO2024087962A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117495698A (en) * 2024-01-02 2024-02-02 福建卓航特种设备有限公司 Flying object identification method, system, intelligent terminal and computer readable storage medium
WO2024087962A1 (en) * 2022-10-24 2024-05-02 广西柳工机械股份有限公司 Truck bed orientation recognition system and method, and electronic device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107747941B (en) * 2017-09-29 2020-05-15 歌尔股份有限公司 Binocular vision positioning method, device and system
CN108876849B (en) * 2018-04-24 2021-11-23 哈尔滨工程大学 Deep learning target identification and positioning method based on auxiliary identification
CN210294888U (en) * 2019-06-18 2020-04-10 深圳诗航智能科技有限公司 Automatic following transport vehicle based on deep learning tracking target
CN111551151B (en) * 2020-06-04 2021-05-14 江苏集萃智能光电***研究所有限公司 Binocular vision-based near space vehicle relative pose measurement method and device
CN115170648B (en) * 2022-06-29 2023-04-07 广西柳工机械股份有限公司 Carriage pose determining method and device
CN115497077A (en) * 2022-10-24 2022-12-20 广西柳工机械股份有限公司 Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium


Also Published As

Publication number Publication date
WO2024087962A1 (en) 2024-05-02

Similar Documents

Publication Publication Date Title
US10192113B1 (en) Quadocular sensor design in autonomous platforms
CN111325796B (en) Method and apparatus for determining pose of vision equipment
US10496104B1 (en) Positional awareness with quadocular sensor in autonomous platforms
CN112233177B (en) Unmanned aerial vehicle pose estimation method and system
CN110176032B (en) Three-dimensional reconstruction method and device
US11703334B2 (en) Mobile robots to generate reference maps for localization
Ding et al. Vehicle pose and shape estimation through multiple monocular vision
CN115497077A (en) Carriage attitude recognition system, carriage attitude recognition method, electronic device and storage medium
Muñoz-Bañón et al. Targetless camera-LiDAR calibration in unstructured environments
US11209277B2 (en) Systems and methods for electronic mapping and localization within a facility
Peng et al. Globally-optimal contrast maximisation for event cameras
CN113052907B (en) Positioning method of mobile robot in dynamic environment
CN110969064A (en) Image detection method and device based on monocular vision and storage equipment
CN112166458A (en) Target detection and tracking method, system, equipment and storage medium
EP4210002A1 (en) Pose estimation refinement for aerial refueling
CN113822996B (en) Pose estimation method and device for robot, electronic device and storage medium
CN117367427A (en) Multi-mode slam method applicable to vision-assisted laser fusion IMU in indoor environment
Yoon et al. Targetless multiple camera-LiDAR extrinsic calibration using object pose estimation
CN110728684B (en) Map construction method and device, storage medium and electronic equipment
CN111724432B (en) Object three-dimensional detection method and device
CN115170648B (en) Carriage pose determining method and device
CN111899277A (en) Moving object detection method and device, storage medium and electronic device
Saeedi et al. 3D localization and tracking in unknown environments
JP3512919B2 (en) Apparatus and method for restoring object shape / camera viewpoint movement
CN115705670B (en) Map management method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination