WO2024087962A1 - Carriage posture recognition system, method, electronic device and storage medium - Google Patents

Carriage posture recognition system, method, electronic device and storage medium

Info

Publication number
WO2024087962A1
WO2024087962A1 (PCT application PCT/CN2023/120389; CN2023120389W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
carriage
vehicle
binocular camera
target
Prior art date
Application number
PCT/CN2023/120389
Other languages
English (en)
French (fr)
Inventor
蔡登胜
李佳恒
陶佳伟
周文彬
Original Assignee
广西柳工机械股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广西柳工机械股份有限公司
Publication of WO2024087962A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Definitions

  • the present invention relates to the field of computer technology, and in particular to a vehicle carriage posture recognition system, method, electronic equipment and storage medium.
  • Earth-moving machinery, mining machinery, road machinery and other engineering machinery need to work together in coordinated groups when being upgraded to unmanned autonomous operation, for example when an excavator unloads material into a truck.
  • the excavator needs to have the ability to detect the real-time position and posture of the truck compartment.
  • the differential positioning principle of real-time kinematic (RTK) carrier-phase technology can be used to detect the position of the truck in the map coordinate system, and the position of the compartment in the map coordinate system can be calculated from the truck's own dimensional information.
  • the excavator can also use RTK positioning to obtain its own position in the map coordinate system. By comparing the two, the relative position of the truck compartment and the excavator can be obtained, and then the unloading can be completed.
  • This solution can complete the unmanned autonomous operation of the excavator outdoors, but the problem that comes with it is the high cost.
  • it is necessary to install a set of RTK mobile stations on the truck and the excavator, and also install a public base station.
  • the cost may be tens of thousands to hundreds of thousands of yuan, which cannot be applied on a large scale. Therefore, this solution is not suitable for large-scale applications of unmanned autonomous outdoor operation of construction machinery.
  • the present invention aims to solve the technical problems that the system on which existing unmanned autonomous excavator operation is based has a complex structure, a low degree of intelligence, a high cost and a narrow range of application.
  • the present application discloses a vehicle carriage posture recognition system, which includes a processor, a binocular camera, a first vehicle, a second vehicle and an identification member;
  • the first vehicle includes a carriage
  • the second vehicle is used to unload materials into the compartment of the first vehicle
  • the identification piece is arranged on the side of the carriage; a plurality of reference points are arranged on the identification piece;
  • the binocular camera is arranged on the second vehicle, and is used to collect an image containing the identification element and send the image to the processor;
  • the processor is connected to the binocular camera for communication; the processor is used to perform recognition processing on the image using an image recognition network model to obtain a target image; the target image is an image containing the recognition results of the multiple reference points; the target three-dimensional coordinates of each of the multiple reference points in the target image are determined using a binocular camera stereo geometry vision algorithm; and the posture information of the carriage is determined based on the target three-dimensional coordinates of each reference point.
  • the second vehicle includes a connected unloading device and a main structure
  • the discharge device is connected to the first side surface of the main structure
  • the binocular camera is arranged on the first side surface.
  • the first side includes a first mounting point and a second mounting point
  • the first mounting point and the second mounting point are located at the top of the first side;
  • the binocular camera includes a first binocular camera and a second binocular camera
  • the first binocular camera is disposed at the first installation point
  • the second binocular camera is disposed at the second installation point.
  • the identification element includes at least two sub-identification elements
  • the at least two sub-identification members are respectively located at a first position point and a second position point on a second side surface of the carriage;
  • the second side surface is a surface facing the first side surface;
  • there is a second preset distance between the first position point and the second position point; the second preset distance is greater than half of the length of the second side along the first preset direction; the first preset direction is perpendicular to the second preset direction; the second preset direction is the extension line direction of the height direction of the first vehicle.
  • the processor includes an image recognition module and a position determination module
  • the image recognition module is used to perform recognition processing on the image using an image recognition network model to obtain a target image;
  • the target image is an image containing the recognition results of the multiple reference points, and the target image is sent to the position determination module;
  • the position determination module is used to determine the target three-dimensional coordinates of each of the multiple reference points in the target image using a binocular camera stereo geometry vision algorithm; and determine the posture information of the carriage based on the target three-dimensional coordinates of each reference point.
  • the present application also discloses a method for recognizing a carriage posture, which comprises:
  • an image containing the identification member is collected by using a binocular camera;
  • the image is input into an image recognition network model to obtain a target image;
  • the target image includes a recognition result of each of the multiple reference points;
  • for each of the reference points, the coordinates of the reference point in a pixel coordinate system are determined based on the target image and the recognition result of the reference point;
  • the camera parameters of the binocular camera are acquired; the target three-dimensional coordinates of the reference point are determined based on the camera parameters and the coordinates of the reference point in the pixel coordinate system using a binocular camera stereo geometry vision algorithm; the target three-dimensional coordinates of the carriage are determined based on the target three-dimensional coordinates of each reference point;
  • the posture information of the carriage is determined based on the target three-dimensional coordinates of the carriage.
  • the camera parameters include the distance between the left and right cameras of the binocular camera, installation parameters, and internal parameters.
  • the determining the target three-dimensional coordinates of the reference point based on the camera parameters of the binocular camera and the coordinates of the reference point in the pixel coordinate system; determining the target three-dimensional coordinates of the carriage based on the target three-dimensional coordinates of each reference point; and determining the posture information of the carriage based on the target three-dimensional coordinates of the carriage include:
  • the three-dimensional coordinates of the reference point in the camera coordinate system are converted into three-dimensional coordinates in the target vehicle coordinate system;
  • the target vehicle coordinate system is the coordinate system where the second vehicle is located;
  • the posture information of the carriage is determined based on the three-dimensional coordinates of the carriage in the target vehicle coordinate system.
  • the present application also provides an electronic device, which includes a processor and a memory, in which at least one instruction, at least one program, a code set or an instruction set is stored, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the above-mentioned carriage posture recognition method.
  • the present application also provides a computer storage medium, which stores at least one instruction or at least one program, and the at least one instruction or at least one program is loaded and executed by a processor to implement the above-mentioned carriage posture recognition method.
  • the carriage posture recognition method provided by the present application has the following beneficial effects:
  • the carriage posture recognition system includes a processor, a binocular camera, a first vehicle, a second vehicle and an identification piece; the first vehicle includes a carriage; the second vehicle is used to unload materials into the carriage of the first vehicle; the identification piece is arranged on the side of the carriage; a plurality of reference points are arranged on the identification piece; the binocular camera is arranged on the second vehicle, and the binocular camera is used to collect an image containing the identification piece and send the image to the processor; the processor is connected to the binocular camera for communication; the processor is used to use an image recognition network model to perform recognition processing on the image to obtain a target image; the target image is an image containing the recognition results of the plurality of reference points; and the stereoscopic geometric vision algorithm of the binocular camera is used to determine the target three-dimensional coordinates of each reference point in the target image; the posture information of the carriage is determined based on the target three-dimensional coordinates of each reference point.
  • the carriage posture recognition system provided by the present application has a simple structure and is suitable for large-scale outdoor unmanned operation scenarios.
  • FIG. 1 is a schematic structural diagram of an optional carriage posture recognition system of the present application.
  • FIG. 2 is a schematic structural diagram of an optional second vehicle of the present application.
  • FIG. 3 is a schematic structural diagram of an optional first vehicle of the present application.
  • FIG. 4 is a schematic structural diagram of an optional identification member of the present application.
  • FIG. 5 is a flow chart of an optional method for recognizing a carriage posture according to the present application.
  • FIG. 6 is a flow chart of another optional method for recognizing a carriage posture according to the present application.
  • FIG. 7 is a schematic diagram of the relationship between multiple optional coordinate systems of the present application.
  • FIG. 8 is a schematic diagram of an optional binocular stereo vision camera model of the present application.
  • FIG. 9 is another schematic diagram of the relationship between multiple optional coordinate systems of the present application.
  • “one embodiment” or “embodiment” as used herein refers to a specific feature, structure or characteristic that may be included in at least one implementation of the present application.
  • orientation or positional relationship indicated by the terms “upper”, “lower”, “top”, “bottom”, etc. is based on the orientation or positional relationship shown in the accompanying drawings, which is only for the convenience of describing the present application and simplifying the description, rather than indicating or implying that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore cannot be understood as a limitation of the present application.
  • the terms “first” and “second” are used only for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • the features defined as “first” and “second” may explicitly or implicitly include one or more of such features.
  • the terms “first”, “second”, etc. are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in an order other than those illustrated or described here.
  • the first method is to use vision to detect the position of the carriage at a fixed position. For example, vision is used to detect several key points of the carriage moving back and forth at multiple positions to determine the position of the carriage and then perform the operation.
  • This method is suitable for the position detection of the carriage that is repeated in a single scene. It is a repetitive detection with a low degree of intelligence and is not suitable for the dynamic position and posture detection of the carriage of an excavator or an outdoor unmanned autonomous truck.
  • the second method is to use point-to-point laser relative position correction.
  • in the automated guided vehicle (AGV) industry, a laser transmitter installed on the AGV and a laser receiver on the shelf are used to correct the relative position of the AGV and the shelf, guiding the AGV to unload.
  • This type of AGV's perception of the shelf position can only be used for precise correction at the end of a fixed route, and cannot be used for dynamic position and posture detection of the excavator's outdoor unmanned autonomous operation truck compartment.
  • the third is truck detection in unmanned driving.
  • the truck in front is also detected to obtain the spatial relative position of the truck in front and the vehicle.
  • the precise position and posture of the truck compartment are not concerned in unmanned driving.
  • the truck is only detected as a whole and is regarded as an obstacle. Therefore, truck detection in unmanned driving cannot be directly used for the precise detection of the position and posture of the truck compartment of unmanned autonomous outdoor operation of construction machinery.
  • the fourth type is RTK truck position and posture detection, as described in the background art above.
  • RTK can detect the position of the truck in the map coordinate system using the differential positioning principle, and at the same time, the map coordinate system position of the car body is inferred through the size information of the truck itself.
  • the excavator can also use RTK positioning to obtain its own position in the map coordinate system. By comparing the two, the relative position of the truck car body and the excavator can be obtained, and then the unloading is completed.
  • This solution can complete the unmanned autonomous operation of the excavator outdoors, but the problem that comes with it is the high cost, because in order to obtain the location information of the truck and the excavator in the map, it is necessary to install a set of RTK mobile stations on the truck and the excavator respectively, and also install a public base station.
  • the cost may be tens of thousands to hundreds of thousands of yuan, which cannot be realized on a large scale. Therefore, this solution is not suitable for large-scale applications of unmanned autonomous outdoor operations of construction machinery.
  • Multi-position visual cabin detection and point-to-point laser relative position correction detection do not have the ability of intelligent detection and have a low intelligence level. Therefore, they cannot be used for dynamic position and posture perception of truck cabins for outdoor unmanned autonomous operation of excavators.
  • although RTK positioning technology can be used in outdoor unmanned autonomous operation of engineering machinery, it is expensive and cannot be applied on a large scale; it is only suitable for some preliminary exploratory research.
  • FIG. 1 is a structural schematic diagram of an optional carriage posture recognition system of the present application.
  • the present application discloses a carriage posture recognition system, which includes a processor 5, a binocular camera 4, a first vehicle 1, a second vehicle 2 and an identification member 3; the first vehicle 1 includes a carriage 101; the second vehicle 2 is used to unload materials into the carriage 101 of the first vehicle 1; the identification member 3 is arranged on the side of the carriage 101; a plurality of reference points are arranged on the identification member 3; the binocular camera 4 is arranged on the second vehicle 2, and the binocular camera 4 is used to collect an image containing the identification member 3 and send the image to the processor 5; the processor 5 is connected to the binocular camera 4 in communication; the processor 5 is used to use an image recognition network model to perform recognition processing on the image to obtain a target image; the target image is an image containing the recognition results of the plurality of reference points; the stereoscopic geometric vision algorithm of the binocular camera 4 is used to determine the target three-dimensional coordinates of each of the multiple reference points in the target image; and the posture information of the carriage 101 is determined based on the target three-dimensional coordinates of each reference point.
  • the first vehicle 1 may be a vehicle having a carriage 101, such as a truck; and the second vehicle 2 may be a vehicle having a mechanical arm, such as an excavator.
  • the processor 5 may be located on the second vehicle 2, or may be independent of the vehicle and located on a terminal or a server.
  • the server may include an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud audio recognition model training, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the operating system running on the server may include but is not limited to Android, IOS, Linux, Windows, Unix, etc.
  • the terminal may include but is not limited to a client such as a smart phone, a desktop computer, a tablet computer, a laptop computer, a smart speaker, a digital assistant, an augmented reality (AR)/virtual reality (VR) device, a smart wearable device, etc. It may also be software running on the above client, such as an application program, a small program, etc.
  • the operating system running on the client may include but is not limited to Android system, IOS system, Linux, Windows, Unix, etc.
  • FIG2 is a schematic diagram of the structure of an optional second vehicle of the present application.
  • the second vehicle 2 includes a connected unloading device 201 and a main structure 202; the unloading device 201 is connected to a first side surface 2021 of the main structure 202; and the binocular camera 4 is disposed on the first side surface 2021.
  • the first side surface 2021 includes a first mounting point and a second mounting point; the first mounting point and the second mounting point are located at the top of the first side surface 2021; there is a first preset distance between the first mounting point and the second mounting point; the binocular camera 4 includes a first binocular camera 401 and a second binocular camera 402; the first binocular camera 401 is disposed at the first mounting point; the second binocular camera 402 is disposed at the second mounting point.
  • in order to widen the field of view of the binocular camera 4 and ensure the quality of the collected images, optionally, the first preset distance is greater than half the width of the first side surface 2021;
  • optionally, in order to further improve the image collection effect, the first mounting point is located at the leftmost side of the first side surface 2021 and the second mounting point is located at the rightmost side of the first side surface 2021.
  • FIG. 3 is a schematic diagram of the structure of an optional first vehicle of the present application.
  • the identification member 3 includes at least two sub-identification members 301; the at least two sub-identification members 301 are respectively located at the first position point and the second position point on the second side of the carriage 101; the second side is the surface facing the first side 2021; there is a second preset distance between the first position point and the second position point; the second preset distance is greater than half of the length of the second side along the first preset direction; the first preset direction (such as the x-axis direction in FIG. 3) is perpendicular to the second preset direction (such as the y-axis direction in FIG. 3);
  • the second preset direction is the extension line direction of the height direction of the first vehicle 1.
  • the processor 5 can determine the spatial three-dimensional coordinates of multiple sub-identification members 301, thereby locating the posture information of the carriage 101, such as the distance between the carriage 101 and the second vehicle 2 and the deflection angle relative to the second vehicle 2.
  • in this embodiment, the second side does not necessarily refer to a single surface.
  • in an actual scene, the first vehicle 1 may not stop directly in front of the second vehicle 2 and may instead be at a certain angle to it.
  • the second side may be two sides corresponding to the first side 2021.
  • the plurality of sub-identification elements 301 may be located on one of the two sides, or may be provided on both sides. This embodiment will be described by taking the case where the sub-identification elements 301 are all located on the same side as an example.
  • Figure 4 is a schematic diagram of the structure of an optional identification member of the present application.
  • the identification member 3 is provided with 5 reference points, and the number can be more than 5 according to actual needs, such as 6, 7, 8, 9 or 10; more reference points make the finally determined coordinate data of the identification member 3 more accurate, but too many reference points also increase the calculation time and make data processing take too long.
  • the number of reference points can be set according to needs.
  • the processor 5 includes an image recognition module and a position determination module; the image recognition module is used to use the image recognition network model to perform recognition processing on the image, obtain a target image containing the recognition results of the multiple reference points, and send the target image to the position determination module; the position determination module is used to use the stereoscopic geometric vision algorithm of the binocular camera 4 to determine the target three-dimensional coordinates of each of the multiple reference points in the target image; the posture information of the carriage 101 is determined based on the target three-dimensional coordinates of each reference point.
  • the posture recognition system of the carriage 101 provided in this application has the following advantages:
  • the binocular camera 4 uses images and stereoscopic geometric vision to accurately detect the position and posture of the truck compartment 101 of the truck in various weather conditions and various working environments.
  • the binocular camera 4 performs single-end detection, identifying and measuring external objects much like human eyes, so, unlike RTK, there is no need to install mobile stations on both the truck and the excavator; with RTK, the truck's positioning result must be sent to the unmanned excavator through a communication terminal, and after the unmanned excavator receives the truck's positioning data it compares it with its own positioning data to obtain the relative position relationship between the two.
  • the binocular camera 4 has reached automotive grade and is waterproof and dustproof; in large-scale applications its cost can be reduced to several thousand yuan, which is low.
  • the binocular camera 4 can accurately detect the position and posture of the carriage 101 of the truck in various weather conditions and various working environments. At the same time, the cost is low, and the large-scale application of unmanned outdoor autonomous operation of construction machinery can be realized.
  • Figure 5 is a flow chart of an optional method for recognizing a carriage posture of the present application.
  • this specification presents the method operation steps as in the embodiments or flow charts, but more or fewer operation steps may be included on the basis of conventional or non-inventive labor.
  • the order of steps listed in the embodiments is only one way of executing the order of many steps and does not represent the only execution order.
  • when the method is executed in an actual system or server product, the steps can be executed sequentially or in parallel according to the method shown in the embodiments or the accompanying drawings (for example, in an environment with a parallel processor 5 or multi-threaded processing).
  • the method may include:
  • S501 Use the binocular camera 4 to collect an image containing the identification part 3; the configuration of the binocular camera 4 is shown in FIG. 2, and the details are given in the description of the system part above.
  • the binocular camera 4 includes a left camera and a right camera; step S501 can be specifically described as: using the left camera to capture a first image including the identification part 3; the first image is a visible light image; using the right camera to capture a second image including the identification part 3; the second image is a visible light image.
  • the input image includes the first image and the second image, and the target three-dimensional coordinates are calculated based on the identification part recognition results of the two images.
  • S503 Input the image into an image recognition network model to obtain a target image; the target image includes a recognition result for each of the multiple reference points.
  • the image recognition network model includes a feature extraction network, a feature fusion network and a prediction recognition network; step S503 may include: using the feature extraction network to perform feature extraction operations on the image to obtain a feature atlas; using the feature fusion network to perform feature fusion processing on the feature atlas to obtain a target feature map; using the prediction recognition network to perform prediction processing on the target feature map to obtain a target image.
  • the feature extraction network includes an input layer and a sub-feature extraction network; the input layer is generally used to normalize the image data before it is fed into the neural network for inference.
  • there are many normalization methods, one of which is to normalize the image pixel values from the range 0-255 to the range 0-1, i.e. x1 = x / 255;
  • x represents the pixel value corresponding to a pixel point of the image;
  • x1 represents the value after the pixel value of the pixel point is normalized.
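  • As a minimal illustration of this normalization step (a sketch assuming 8-bit images and the simple divide-by-255 scheme described above, not code taken from the patent):

        import numpy as np

        def normalize_0_1(image: np.ndarray) -> np.ndarray:
            # Scale 8-bit pixel values x in [0, 255] to x1 = x / 255 in [0, 1].
            return image.astype(np.float32) / 255.0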
  • the normalized input layer image data is then subjected to convolution and activation operations to continuously extract image features.
  • the convolution kernel can be set to 1×1 or 3×3, and there is no restriction here.
  • the output of the current convolution layer is obtained through the activation function.
  • there are many activation functions, and the commonly used ones are ReLU and Sigmoid.
  • the deep convolutional neural network Yolo5KeyPoints used in this solution has generalization capabilities and can effectively identify specific pattern identification parts 3 and their 5 reference points in various weather conditions and working environments.
  • Yolo5KeyPoints uses CSP-Darknet53 as the backbone feature extraction network, SPPF and CSP-PAN as feature fusion networks, and then uses the category prediction subnet (class subnet) to predict the category of each grid, the detection box regression subnet (box subnet) to regress the detection box, and the reference point regression subnet (key point subnet) to regress the reference point to obtain the final network prediction result.
  • for the reference point detection loss function, Yolo5KeyPoints uses the Wing Loss function; Wing Loss has a small parameter gradient and is not sensitive to outliers when the error is large, and a large parameter gradient when the error is small, which makes the model converge better and greatly improves the accuracy of reference point detection.
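  • A sketch of the Wing Loss described above, in its commonly published form with width w and curvature epsilon (the text does not state the exact hyperparameter values, so w = 10 and eps = 2 below are illustrative assumptions):

        import numpy as np

        def wing_loss(pred: np.ndarray, target: np.ndarray, w: float = 10.0, eps: float = 2.0) -> float:
            # Logarithmic branch (large gradient) for small errors, linear branch
            # (bounded gradient, outlier-insensitive) for large errors.
            x = np.abs(pred - target)
            c = w - w * np.log(1.0 + w / eps)  # keeps the two branches continuous at |x| = w
            loss = np.where(x < w, w * np.log(1.0 + x / eps), x - c)
            return float(loss.mean())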
  • the backbone feature extraction network is generally followed by a feature fusion network, which fuses the feature maps of different scales extracted by the backbone feature extraction network, and then further performs convolution to extract features.
  • the feature fusion network may include an enhanced feature extraction network, which may be performed by stacking feature pyramids and performing convolution operations.
  • a typical enhanced feature extraction network is FPN.
  • the FPN network fuses low-level high-resolution feature maps with high-level high-semantic feature maps, and then performs a separate prediction on each layer of the fused feature maps to obtain a prediction result.
  • a 1x1 convolution is generally performed to obtain the network prediction results, namely the detection box regression results, reference point regression results and category confidence values.
  • the deep convolutional neural network Yolo5KeyPoints needs to collect data, annotate data, and train models in advance before it can have the ability to recognize specific pattern identification parts 3 and their 5 reference points.
  • the training sample data set includes each sample image in a plurality of sample images, and a corresponding label image; each of the sample images includes an identification piece 3 and a plurality of reference points located on the identification piece 3; the label image is an image corresponding to the identification piece 3 in the sample image, with a labeling box marked and each reference point on the identification piece 3 marked.
  • the sample images can be images of the identification part 3 in various weather conditions and various working environments taken by a binocular camera, and after the image acquisition is completed, the images that meet the requirements are manually selected for annotation.
  • images meaningful images in various weather conditions and various working environments are selected and the images completely contain the pattern of the identification part 3. Only one of the repeated images is retained.
  • the generation process of the label images is as follows: the data are labeled using the labelme tool to obtain, for each image, a label file containing the annotation boxes of the identification members 3 and the locations of the 5 reference points.
  • the specific operation process is as follows: click the button for creating a rectangular target box in the labeling software labelme, draw a new target box framing the identification member 3 and enter the label name "SignBoard" to obtain the identification member 3 label; then click the button for creating a point target, click the five reference points of the identification member 3 pattern and enter the reference point names point1, point2, point3, point4 and point5 to obtain the reference point labels; after all the identification members 3 and their 5 reference points in the entire image have been marked, the label image of this image is obtained.
  • the current deep learning model is taken as the target image recognition network model.
  • the model training process can also be as follows: load the pre-trained weights, and then input the labeled data for model training.
  • the image data is normalized by the preprocessing module, and the normalized image data is sent to the network model for forward propagation to obtain the prediction result.
  • the prediction result and the target true value in the label file are used to calculate the deviation between the model output and the target true value through the loss function, that is, the loss value.
  • the loss value can include three parts, namely target classification loss, detection box regression loss, and reference point regression loss.
  • the loss is backpropagated to update the weights of each layer of the network; at this point, one training iteration is completed.
  • the model converges through continuous iterative training. When the convergence target is reached or the maximum number of iterations is reached, the training of the entire model is completed.
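  • The training procedure described above (forward propagation, three-part loss, backpropagation, iteration until convergence) can be sketched roughly as follows; this is a generic PyTorch-style outline in which model, train_loader and compute_loss stand in for the Yolo5KeyPoints components, whose exact implementation is not given here:

        import torch

        def train(model, train_loader, compute_loss, max_epochs=300, lr=1e-3, device="cuda"):
            # One training iteration = forward pass, loss (classification + box
            # regression + reference point regression), backpropagation, weight update.
            model.to(device)
            optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
            for epoch in range(max_epochs):
                for images, targets in train_loader:   # targets hold boxes, classes, keypoints
                    preds = model(images.to(device))   # forward propagation
                    loss = compute_loss(preds, targets)
                    optimizer.zero_grad()
                    loss.backward()                    # backpropagate the loss
                    optimizer.step()                   # update the weights of each layer
            return model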
  • the trained Yolo5KeyPoints model is deployed to the CPU processor or GPU, AI chip to perform model inference and prediction, and obtain the recognition results of the identification part 3 and its 5 reference points in the real-time image captured by the binocular camera 4.
  • a model with detection capabilities is obtained through on-site data training.
  • the model can be deployed to a CPU processor through the opencv library or the libtorch library, and can also be deployed to GPUs and AI chips through the libraries and deployment toolchains provided by their manufacturers; generally, the model needs to be converted into the required model format first.
  • the detection process of identification part 3 and its five reference points includes three stages. When a real-time image is input, it first goes through the preprocessing stage to complete the image normalization operation; the normalized image is then sent to the Yolo5KeyPoints model for forward reasoning to obtain the prediction results of each grid point on the image; finally, the prediction results of each grid point are post-processed, such as non-maximum suppression, to obtain the final prediction results.
  • the binocular camera 4 collects image data in real time and completes the identification of the identification part 3 and its five reference points. Then, the following steps S505-S513 are used to calculate the image coordinates of the five pairs of reference points through the stereoscopic geometric vision of the binocular camera 4 to obtain the three-dimensional coordinates of the five reference points of the identification part 3.
  • This detection method has high robustness and high accuracy.
  • S505 For each reference point, determine the coordinates of the reference point in the pixel coordinate system based on the target image and the recognition result of the reference point; the coordinates of the reference point in the pixel coordinate system may be denoted (u, v).
  • S507 Acquire the camera parameters of the binocular camera 4; the camera parameters include the distance between the left and right cameras of the binocular camera 4, installation parameters, and internal parameters.
  • the binocular camera 4 includes a left camera and a right camera, and the distance between the left camera and the right camera is T; the installation parameters of the binocular camera include the translation distance of the left camera or the right camera from the origin of the vehicle coordinate system, and the rotation angle relative to the origin of the vehicle coordinate system; the internal parameters of the binocular camera include the internal parameters of the left camera and the internal parameters of the right camera.
  • S509 Using the stereoscopic geometric vision algorithm of the binocular camera 4, based on the camera parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system, determine the target three-dimensional coordinates of the reference point.
  • S511 Determine the target three-dimensional coordinates of the carriage 101 based on the target three-dimensional coordinates of each reference point;
  • S513 Determine the posture information of the carriage 101 based on the target three-dimensional coordinates of the carriage 101.
  • Steps S509-S513 can be specifically described as follows:
  • S601 Determine the coordinates of the reference point in the camera coordinate system based on the distance between the left and right cameras of the binocular camera 4, the internal parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system.
  • the internal reference of the binocular camera includes the internal reference of the left camera and the internal reference of the right camera.
  • FIG. 7 is a schematic diagram of the relationship between multiple optional coordinate systems of the present application.
  • World coordinate system Xw, Yw, Zw
  • Camera coordinate system Xc, Yc, Zc
  • Image coordinate system x, y
  • Pixel coordinate system u, v (reflecting the arrangement of pixels in the camera CCD chip).
  • the above formula (3) can be used to solve the coordinates of any pixel on the image in the image coordinate system, that is, (x, y).
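  • Formula (3) is not shown here; in the standard pinhole camera model it would relate the pixel coordinates (u, v) to the image coordinates (x, y) through the principal point (u_0, v_0) and the physical pixel sizes d_x, d_y (this reconstruction is an assumption based on the coordinate systems defined above):

        x = (u - u_0) \cdot d_x, \qquad y = (v - v_0) \cdot d_y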
  • O_l and O_r are the projection centers of the left and right cameras of the binocular camera 4, and the distance D between them is called the binocular baseline, that is, the distance between the optical center points of the two cameras.
  • P is a point in space
  • P_l is the imaging point of point P in the left camera;
  • P_r is the imaging point of point P in the right camera;
  • the parallax (disparity) is d = x_l - x_r;
  • triangle P O_l O_r is similar to triangle P P_l P_r;
  • from this similarity the depth Z can be calculated by the following formula: Z = f · D / d.
  • f is the focal length of the camera.
  • the coordinates Xc and Yc of P in the target binocular camera coordinate system (the binocular camera coordinate system is the coordinate system with the center of the left camera as the origin, i.e., the camera coordinate system in the coordinate relationship diagram of FIG. 7) can then be further determined.
  • FIG. 9 is another schematic diagram of the relationship between multiple optional coordinate systems of the present application.
  • the imaging point P l of the spatial point P in the left eye image can be obtained by using the similar triangle theorem.
  • the coordinates of P (X, Y, Z) in the left eye camera coordinate system of the binocular camera are P (Xc, Yc, Zc);
  • P (x, y) is the image coordinates of the imaging point of the spatial point P (Xc, Yc, Zc) in the left eye image coordinate system of the binocular camera.
  • the three-dimensional coordinates of the five reference points on the identification element 3 in the camera coordinate system can be solved in sequence, and the coordinates are the coordinates in the left camera coordinate system.
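  • A compact sketch of the binocular triangulation described above, assuming rectified images, a focal length f expressed in pixels, a baseline D, and the principal point (u0, v0) of the left camera; the symbols follow the description, but the exact formulas (4)-(6) are not reproduced in the text:

        def triangulate(u_l: float, v_l: float, u_r: float,
                        f: float, D: float, u0: float, v0: float):
            # Recover (Xc, Yc, Zc) in the left camera coordinate system from a
            # matched pair of pixel coordinates using the similar-triangle relations.
            d = u_l - u_r                  # disparity d = x_l - x_r (in pixels)
            if d <= 0:
                raise ValueError("disparity must be positive for a point in front of the cameras")
            Zc = f * D / d                 # depth from baseline and disparity
            Xc = (u_l - u0) * Zc / f       # back-project the left image point
            Yc = (v_l - v0) * Zc / f
            return Xc, Yc, Zc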
  • S603 Determine a first coordinate transformation matrix based on installation parameters of the binocular camera.
  • S605 According to the first coordinate conversion matrix, convert the three-dimensional coordinates of the reference point in the camera coordinate system into three-dimensional coordinates in the target vehicle coordinate system; the target vehicle coordinate system is the coordinate system where the second vehicle 2 is located.
  • the origin of the vehicle coordinate system is the rotation center point of the second vehicle 2
  • the corresponding rotation matrix R and translation vector T can be determined;
  • when the second vehicle is an excavator, the second vehicle includes a mechanical arm and a base, the rotating end of the mechanical arm can be rotationally connected to the base, and the rotation center point of the second vehicle is located on the rotation axis of the rotating end.
  • the three-dimensional coordinates of the reference point in the camera coordinate system are converted into the three-dimensional coordinates in the second vehicle 2 coordinate system.
  • the coordinate conversion matrix is expressed as formula (7) below;
  • when the installation parameters of the binocular camera 4 are different, the corresponding rotation matrix R and translation vector T are different, and therefore the above coordinate transformation matrix, that is, formula (7), is also different.
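  • Formula (7) is not reproduced in the text; with a rotation matrix R and translation vector T determined from the installation parameters, the conversion from camera coordinates to vehicle coordinates would, in the usual homogeneous form, read as follows (this reconstruction is an assumption):

        \begin{bmatrix} X_v \\ Y_v \\ Z_v \\ 1 \end{bmatrix}
        = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}
          \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}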
  • S607 Determine the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system based on the three-dimensional coordinates of each reference point in the target vehicle coordinate system.
  • the identification element 3 includes two sub-identification elements 301, and each sub-identification element 301 includes 5 reference points
  • the coordinates of the 5 reference points can be screened, and the data of the coordinates of the reference points with large numerical deviations can be eliminated, and the coordinate data of the remaining reference points that meet the requirements can be averaged to obtain the three-dimensional coordinate data of the sub-identification element 301.
  • weights can also be set for the coordinates of the reference points at different positions on the sub-identification element 301, and the three-dimensional coordinate data of the sub-identification element 301 can be obtained by multiplying the coordinates of each reference point by the corresponding weight.
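  • One possible sketch of the screening and averaging just described (the deviation threshold and the per-point weights are illustrative assumptions, not values given in the text):

        import numpy as np

        def marker_coordinate(points, weights=None, max_dev=0.2):
            # points: (n, 3) array of reference point coordinates of one sub-identification
            # member 301. Drop points far from the median, then (weighted-)average the rest.
            pts = np.asarray(points, dtype=float)
            med = np.median(pts, axis=0)
            keep = np.linalg.norm(pts - med, axis=1) < max_dev   # reject large deviations
            pts = pts[keep]
            if weights is None:
                return pts.mean(axis=0)
            w = np.asarray(weights, dtype=float)[keep]
            return (pts * w[:, None]).sum(axis=0) / w.sum()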
  • S609 Determine the posture information of the carriage 101 based on the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system.
  • the posture information of the carriage 101 can be determined based on the three-dimensional coordinates of the two sub-identification elements 301 in the coordinate system of the second vehicle 2 .
  • the posture information includes information such as the relative distance and the deflection angle between the carriage 101 and the second vehicle 2 .
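  • A minimal sketch of how the relative distance and deflection angle might be derived from the coordinates of the two sub-identification members 301 in the second vehicle's coordinate system (the axis convention follows FIG. 3 and is an assumption):

        import numpy as np

        def carriage_pose(p_first, p_second):
            # p_first, p_second: 3D coordinates of the two sub-identification members 301
            # in the target vehicle coordinate system. Returns (distance, yaw in degrees).
            p1, p2 = np.asarray(p_first, float), np.asarray(p_second, float)
            center = (p1 + p2) / 2.0
            distance = float(np.linalg.norm(center[:2]))            # planar distance to the vehicle origin
            side = p2 - p1                                          # direction along the carriage side wall
            yaw = float(np.degrees(np.arctan2(side[1], side[0])))   # deflection angle in the ground plane
            return distance, yaw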
  • the image recognition module includes a feature extraction submodule, a feature fusion submodule and a prediction recognition module; the feature extraction submodule is used to perform feature extraction operation on the image using a feature extraction network to obtain a feature atlas; the feature fusion submodule is used to perform feature fusion processing on the feature atlas using a feature fusion network to obtain a target feature map; the prediction recognition module is used to perform prediction processing on the target feature map using the prediction recognition network to obtain a target image.
  • the location determination module includes:
  • a pixel coordinate determination module used for determining, for each reference point, the coordinates of the reference point in a pixel coordinate system based on the target image and the recognition result of the reference point;
  • a camera parameter acquisition module used to acquire the camera parameters of the binocular camera 4;
  • the target three-dimensional coordinate determination module is used to determine the target three-dimensional coordinates of the reference point based on the camera parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system by using the stereoscopic geometric vision algorithm of the binocular camera 4;
  • and to determine the target three-dimensional coordinates of the carriage 101 based on the target three-dimensional coordinates of each reference point;
  • the posture information determination module is used to determine the posture information of the carriage 101 based on the target three-dimensional coordinates of the carriage 101.
  • the target three-dimensional coordinate determination module includes a first coordinate determination module, a first coordinate transformation matrix determination module and a second coordinate determination module;
  • a first coordinate determination module, used for determining the three-dimensional coordinates of the reference point in the camera coordinate system based on the distance between the left and right cameras of the binocular camera 4, the internal parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system;
  • a first coordinate transformation matrix determination module, used to determine the first coordinate transformation matrix based on the installation parameters of the binocular camera;
  • a second coordinate determination module is used to convert the three-dimensional coordinates of the reference point in the camera coordinate system into the three-dimensional coordinates in the target vehicle coordinate system according to the first coordinate conversion matrix;
  • the target vehicle coordinate system is the coordinate system where the second vehicle 2 is located; and determine the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system based on the three-dimensional coordinates of each reference point in the target vehicle coordinate system;
  • the posture information determination module is used to determine the posture information of the carriage 101 based on the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system.
  • as described above, based on the carriage posture recognition system provided by this application, an optional working process is as follows: the trained deep convolutional neural network model Yolo5KeyPoints is deployed on the CPU, where one detection calculation to obtain the identification member 3 and its five reference points takes about 130 ms; if the computing unit contains an image computing unit (GPU) or an AI acceleration chip, the neural network model Yolo5KeyPoints can be deployed on the GPU or AI acceleration chip to improve real-time performance, in which case one detection calculation to obtain the identification member 3 and its five reference points takes about 30 ms; the binocular stereoscopic geometric vision calculation, which converts the 5 pairs of reference point image coordinates into the three-dimensional space coordinates of the 5 reference points, is deployed on the CPU.
  • after the system is powered on, the binocular camera 4 based truck compartment 101 position and posture perception system is started, the unmanned excavator control system is started, and the unmanned excavator enters the unmanned autonomous operation state.
  • the operator parks the first vehicle 1 (such as a truck) in a parking space and prepares for loading.
  • the binocular camera 4 sends the captured left and right color images (1280x720 RGB images) to the deep convolutional neural network Yolo5KeyPoints model.
  • the left and right color images (1280x720 RGB images) are preprocessed, including image normalization and image scaling.
  • the image normalization uses the RGB value of each pixel in the image divided by 127.5 minus 1 to normalize all pixel values to -1 to 1; the image scaling operation is to scale the original 1280x720 RGB image to a 1280x736 RGB image to meet the model input image size requirements.
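  • One way to implement the preprocessing just described (a sketch using OpenCV; the text only says the 1280x720 frame is scaled to 1280x736, so plain resizing rather than padding is assumed):

        import cv2
        import numpy as np

        def preprocess(image_bgr: np.ndarray) -> np.ndarray:
            # Resize 1280x720 -> 1280x736 and normalize pixel values to [-1, 1]
            # via value / 127.5 - 1, as described above.
            img = cv2.resize(image_bgr, (1280, 736))   # (width, height)
            return img.astype(np.float32) / 127.5 - 1.0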
  • the image data is sent to the Yolo5KeyPoints model for inference to obtain the model output, which is the object classification result of each anchor box on each grid point, the detection box regression result, and the regression results of the five reference points.
  • the Yolo5KeyPoints model output is post-processed through non-maximum suppression to obtain the final identification part 3 target detection box and its 5 reference point pixel coordinates.
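  • A standard non-maximum suppression step of the kind referred to above might look like this (the IoU threshold is an illustrative assumption; the keypoints attached to each box simply follow the surviving boxes):

        import numpy as np

        def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.45):
            # boxes: (n, 4) array of [x1, y1, x2, y2]; returns indices of kept boxes.
            order = np.argsort(scores)[::-1]
            keep = []
            while order.size > 0:
                i = order[0]
                keep.append(int(i))
                if order.size == 1:
                    break
                rest = order[1:]
                xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
                yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
                xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
                yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
                inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
                area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
                area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
                iou = inter / (area_i + area_r - inter + 1e-9)
                order = rest[iou < iou_thresh]
            return keep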
  • the three-dimensional spatial coordinates of the five reference points are obtained by calculating the pixel coordinates of the five reference points using the stereoscopic geometric vision of the binocular camera 4.
  • the carriage posture recognition system has a simple structure and is suitable for large-scale outdoor unmanned operation scenarios.
  • An embodiment of the present application also provides an electronic device, which includes a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the carriage posture recognition method as described above.
  • An embodiment of the present application also provides a computer storage medium, which can be set in a server to store at least one instruction, at least one program, a code set or an instruction set related to the carriage posture recognition method in the method embodiments.
  • the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the above-mentioned carriage posture recognition method.
  • the storage medium may be located in at least one of the multiple network servers of the computer network.
  • the storage medium may include, but is not limited to, various media that can store program codes, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses a carriage posture recognition system and method, an electronic device and a storage medium. The system includes a processor, a binocular camera, a first vehicle, a second vehicle and an identification member; the first vehicle includes a carriage; the second vehicle is used to unload material into the carriage of the first vehicle; the identification member is arranged on a side of the carriage and carries a plurality of reference points; the binocular camera is arranged on the second vehicle and is used to collect an image containing the identification member and send the image to the processor; the processor is used to perform recognition processing on the image using an image recognition network model to obtain a target image, the target image being an image containing the recognition results of the plurality of reference points; to determine, using a binocular camera stereo geometric vision algorithm, the target spatial three-dimensional coordinates of each reference point in the target image; and to determine the posture information of the carriage based on the target spatial three-dimensional coordinates of each reference point. The carriage posture recognition system has a simple structure and is suitable for large-scale outdoor unmanned operation scenarios.

Description

Carriage posture recognition system, method, electronic device and storage medium

Technical Field

The present invention relates to the field of computer technology, and in particular to a carriage posture recognition system and method, an electronic device and a storage medium.

Background Art

Engineering machinery such as earth-moving machinery, mining machinery and road machinery requires coordinated fleet operation when being upgraded to unmanned autonomous operation, for example an excavator unloading material into a truck. In such an application scenario, the excavator needs the ability to detect the real-time position and posture of the truck carriage.

Usually, the differential positioning principle of real-time kinematic (RTK) carrier-phase technology can be used to detect the position of the truck in the map coordinate system, and the position of the carriage in the map coordinate system can be calculated from the dimensional information of the truck itself. The excavator can likewise use RTK positioning to obtain its own position in the map coordinate system; by comparing the two, the relative position of the truck carriage and the excavator can be obtained, and the unloading can then be completed.

Such a solution allows an excavator to perform unmanned autonomous operation outdoors, but it brings high cost: to obtain the map positions of the truck and the excavator, an RTK mobile station must be installed on each of the truck and the excavator, and a public base station must also be installed, which may cost tens of thousands to hundreds of thousands of yuan and cannot be applied on a large scale. Therefore, this solution is not suitable for large-scale application of unmanned autonomous outdoor operation of construction machinery.
Summary of the Invention

The present invention aims to solve the technical problems that the system on which existing unmanned autonomous excavator operation is based has a complex structure, a low degree of intelligence, a high cost and a narrow range of application.

To solve the above technical problems, in one aspect, the present application discloses a carriage posture recognition system, which includes a processor, a binocular camera, a first vehicle, a second vehicle and an identification member;

the first vehicle includes a carriage;

the second vehicle is used to unload material into the carriage of the first vehicle;

the identification member is arranged on a side of the carriage; a plurality of reference points are arranged on the identification member;

the binocular camera is arranged on the second vehicle, and the binocular camera is used to collect an image containing the identification member and send the image to the processor;

the processor is communicatively connected to the binocular camera; the processor is used to perform recognition processing on the image using an image recognition network model to obtain a target image, the target image being an image containing the recognition results of the plurality of reference points; to determine, using a binocular camera stereo geometric vision algorithm, the target three-dimensional coordinates of each of the plurality of reference points in the target image; and to determine the posture information of the carriage based on the target three-dimensional coordinates of each reference point.

Optionally, the second vehicle includes an unloading device and a main structure which are connected;

the unloading device is connected to a first side surface of the main structure;

the binocular camera is arranged on the first side surface.

Optionally, the first side surface includes a first mounting point and a second mounting point;

the first mounting point and the second mounting point are located at the top of the first side surface;

there is a first preset distance between the first mounting point and the second mounting point;

the binocular camera includes a first binocular camera and a second binocular camera;

the first binocular camera is arranged at the first mounting point;

the second binocular camera is arranged at the second mounting point.

Optionally, the identification member includes at least two sub-identification members;

the at least two sub-identification members are respectively located at a first position point and a second position point on a second side surface of the carriage;

the second side surface is the surface facing the first side surface;

there is a second preset distance between the first position point and the second position point; the second preset distance is greater than half of the length of the second side surface along a first preset direction; the first preset direction is perpendicular to a second preset direction; the second preset direction is the direction of the extension line of the height direction of the first vehicle.

Optionally, the processor includes an image recognition module and a position determination module;

the image recognition module is used to perform recognition processing on the image using an image recognition network model to obtain a target image, the target image being an image containing the recognition results of the plurality of reference points, and to send the target image to the position determination module;

the position determination module is used to determine, using a binocular camera stereo geometric vision algorithm, the target three-dimensional coordinates of each of the plurality of reference points in the target image, and to determine the posture information of the carriage based on the target three-dimensional coordinates of each reference point.

In another aspect, the present application also discloses a carriage posture recognition method, which includes:

collecting an image containing the identification member by using a binocular camera;

inputting the image into an image recognition network model to obtain a target image, the target image containing a recognition result for each of the plurality of reference points;

for each reference point, determining the coordinates of the reference point in a pixel coordinate system based on the target image and the recognition result of the reference point;

acquiring camera parameters of the binocular camera;

determining the target three-dimensional coordinates of the reference point based on the camera parameters of the binocular camera and the coordinates of the reference point in the pixel coordinate system by using a binocular camera stereo geometric vision algorithm;

determining the target three-dimensional coordinates of the carriage based on the target three-dimensional coordinates of each reference point;

determining the posture information of the carriage based on the target three-dimensional coordinates of the carriage.

Optionally, the camera parameters include the distance between the left and right cameras of the binocular camera, installation parameters and internal parameters.

Optionally, determining the target three-dimensional coordinates of the reference point based on the camera parameters of the binocular camera and the coordinates of the reference point in the pixel coordinate system, determining the target three-dimensional coordinates of the carriage based on the target three-dimensional coordinates of each reference point, and determining the posture information of the carriage based on the target three-dimensional coordinates of the carriage include:

determining the coordinates of the reference point in the camera coordinate system based on the distance between the left and right cameras of the binocular camera, the internal parameters of the binocular camera and the coordinates of the reference point in the pixel coordinate system;

determining a first coordinate transformation matrix based on the installation parameters of the binocular camera;

converting, according to the first coordinate transformation matrix, the three-dimensional coordinates of the reference point in the camera coordinate system into three-dimensional coordinates in a target vehicle coordinate system, the target vehicle coordinate system being the coordinate system of the second vehicle;

determining the three-dimensional coordinates of the carriage in the target vehicle coordinate system based on the three-dimensional coordinates of each reference point in the target vehicle coordinate system;

determining the posture information of the carriage based on the three-dimensional coordinates of the carriage in the target vehicle coordinate system.

In another aspect, the present application also provides an electronic device, which includes a processor and a memory, wherein at least one instruction, at least one program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the above carriage posture recognition method.

In another aspect, the present application also provides a computer storage medium, which stores at least one instruction or at least one program, and the at least one instruction or at least one program is loaded and executed by a processor to implement the above carriage posture recognition method.

By adopting the above technical solutions, the carriage posture recognition method provided by the present application has the following beneficial effects:

The carriage posture recognition system includes a processor, a binocular camera, a first vehicle, a second vehicle and an identification member; the first vehicle includes a carriage; the second vehicle is used to unload material into the carriage of the first vehicle; the identification member is arranged on a side of the carriage; a plurality of reference points are arranged on the identification member; the binocular camera is arranged on the second vehicle, and the binocular camera is used to collect an image containing the identification member and send the image to the processor; the processor is communicatively connected to the binocular camera; the processor is used to perform recognition processing on the image using an image recognition network model to obtain a target image, the target image being an image containing the recognition results of the plurality of reference points; to determine, using a binocular camera stereo geometric vision algorithm, the target three-dimensional coordinates of each of the plurality of reference points in the target image; and to determine the posture information of the carriage based on the target three-dimensional coordinates of each reference point. The carriage posture recognition system provided by the present application has a simple structure and is suitable for large-scale outdoor unmanned operation scenarios.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic structural diagram of an optional carriage posture recognition system of the present application;

FIG. 2 is a schematic structural diagram of an optional second vehicle of the present application;

FIG. 3 is a schematic structural diagram of an optional first vehicle of the present application;

FIG. 4 is a schematic structural diagram of an optional identification member of the present application;

FIG. 5 is a schematic flow chart of an optional carriage posture recognition method of the present application;

FIG. 6 is a schematic flow chart of another optional carriage posture recognition method of the present application;

FIG. 7 is a schematic diagram of the relationship between multiple optional coordinate systems of the present application;

FIG. 8 is a schematic diagram of an optional binocular stereo vision camera model of the present application;

FIG. 9 is another schematic diagram of the relationship between multiple optional coordinate systems of the present application.

The reference numerals in the drawings are explained as follows:
1 - first vehicle; 101 - carriage; 2 - second vehicle; 201 - unloading device; 202 - main structure; 2021 - first side surface; 3 - identification member; 301 - sub-identification member; 4 - binocular camera; 401 - first binocular camera; 402 - second binocular camera; 5 - processor.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Reference herein to "one embodiment" or "an embodiment" means a particular feature, structure or characteristic that may be included in at least one implementation of the present application. In the description of the present application, it should be understood that orientation or positional terms such as "upper", "lower", "top" and "bottom" are based on the orientations or positional relationships shown in the drawings, are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the present application. In addition, the terms "first" and "second" are used only for descriptive purposes and shall not be understood as indicating or implying relative importance or the number of the technical features indicated; thus a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. Moreover, the terms "first", "second" and the like are used to distinguish similar objects and need not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in orders other than those illustrated or described here.
Furthermore, the terms "comprise" and "have" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or server that comprises a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product or device.
In general, methods of detecting the position of a carriage include the following.
The first is vision-based carriage position detection at a fixed location. For example, vision is used at several positions to detect a few key points of a carriage moving back and forth, so as to determine the position of the carriage and then perform the operation. This method is suitable for repetitive carriage position detection in a single scenario; it is repetitive detection with a low degree of intelligence and is not applicable to dynamic position and posture detection of a truck carriage in outdoor unmanned autonomous excavator operation.
The second is point-to-point laser-based relative position correction. In the automated guided vehicle (AGV) industry, a laser emitter mounted on the AGV and a laser receiver mounted on a shelf are used to correct the relative position between the AGV and the shelf and guide the AGV to unload. This kind of shelf-position perception can only be used for precise end-point correction on a fixed route and cannot be used for dynamic position and posture detection of a truck carriage in outdoor unmanned autonomous excavator operation.
The third is truck detection in autonomous driving. Current autonomous driving systems also detect the truck ahead and obtain its spatial position relative to the ego vehicle, but they do not concern themselves with the precise position and posture of the truck carriage; the truck is detected as a whole and treated as an obstacle. Truck detection in autonomous driving therefore cannot be directly used for precise position and posture detection of a truck carriage in outdoor unmanned autonomous operation of construction machinery.
The fourth, as explained in the background art above, is RTK-based truck position and posture detection. RTK differential positioning can detect the position of the truck in the map coordinate system, and the position of the carriage in the map coordinate system can be derived from the truck's dimensional information; the excavator can likewise use RTK to obtain its own position in the map coordinate system, and comparing the two yields the relative position between the truck carriage and the excavator so that unloading can be completed. This scheme enables outdoor unmanned autonomous excavator operation, but at a high cost: an RTK rover station must be installed on both the truck and the excavator, and a shared base station must also be installed, which may cost tens of thousands to hundreds of thousands of yuan, so large-scale application is impossible. The scheme is therefore unsuitable for large-scale application of outdoor unmanned autonomous operation of construction machinery.
The above ways of detecting a carriage have the following drawbacks:
1) Low degree of intelligence. Neither multi-position vision carriage detection nor point-to-point laser relative position correction has intelligent detection capability, so they cannot be used for dynamic position and posture perception of a truck carriage in outdoor unmanned autonomous excavator operation.
2) Complex structure and high cost. Although RTK positioning can be used for outdoor unmanned autonomous operation of construction machinery, its cost is high; it cannot be applied at scale and is only suitable for early exploratory research.
3) Unsuitable for outdoor construction machinery operation. None of the existing mass-produced solutions can be used for outdoor unmanned autonomous operation of construction machinery: truck detection in autonomous driving treats the truck as a whole obstacle and does not concern itself with the precise position and posture of the carriage; multi-position vision carriage detection and point-to-point laser relative position correction cannot be used for dynamic perception of the position and posture of a truck carriage because of their low degree of intelligence; and RTK positioning cannot be used for large-scale application because of its high cost.
Therefore, referring to Fig. 1, which is a schematic structural diagram of an optional carriage posture recognition system of the present application, the present application discloses a carriage posture recognition system comprising a processor 5, a binocular camera 4, a first vehicle 1, a second vehicle 2 and a marker 3. The first vehicle 1 comprises a carriage 101; the second vehicle 2 is configured to unload material into the carriage 101 of the first vehicle 1; the marker 3 is arranged on a side face of the carriage 101 and is provided with a plurality of reference points; the binocular camera 4 is arranged on the second vehicle 2, captures an image containing the marker 3 and sends the image to the processor 5; the processor 5 is in communication connection with the binocular camera 4; the processor 5 recognizes the image using an image recognition network model to obtain a target image containing the recognition results of the plurality of reference points, determines a target three-dimensional coordinate for each of the plurality of reference points in the target image using a binocular stereo geometric vision algorithm, and determines the posture information of the carriage 101 based on the target three-dimensional coordinates of the respective reference points.
Optionally, the first vehicle 1 may be a vehicle having a carriage 101, such as a truck; the second vehicle 2 may be a vehicle having a mechanical arm, for example an excavator.
Optionally, the processor 5 may be located on the second vehicle 2, or may be independent of that vehicle and located on a terminal or a server.
Optionally, the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud audio recognition model training, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The operating system running on the server may include but is not limited to Android, iOS, Linux, Windows, Unix, and the like.
Optionally, the terminal may include but is not limited to clients such as smartphones, desktop computers, tablet computers, laptop computers, smart speakers, digital assistants, augmented reality (AR) / virtual reality (VR) devices and smart wearable devices, or software running on such clients, for example applications or mini-programs. Optionally, the operating system running on the client may include but is not limited to Android, iOS, Linux, Windows, Unix, and the like.
In an optional example, referring to Fig. 2, which is a schematic structural diagram of an optional second vehicle of the present application, the second vehicle 2 comprises an unloading device 201 and a main structure 202 connected to each other; the unloading device 201 is connected to a first side face 2021 of the main structure 202; the binocular camera 4 is arranged on the first side face 2021.
To further ensure that the binocular camera can capture the image information of all markers, in an optional example the first side face 2021 comprises a first mounting point and a second mounting point; the first mounting point and the second mounting point are located at the top of the first side face 2021; there is a first preset distance between the first mounting point and the second mounting point; the binocular camera 4 comprises a first binocular camera 401 and a second binocular camera 402; the first binocular camera 401 is arranged at the first mounting point; the second binocular camera 402 is arranged at the second mounting point.
To widen the field of view of the binocular cameras 4 and ensure the quality of the captured images, optionally the first preset distance is greater than half the width of the first side face 2021; optionally, to further improve the image capture effect, the first mounting point is located at the leftmost side of the first side face 2021 and the second mounting point at the rightmost side of the first side face 2021.
In an optional example, referring to Fig. 3, which is a schematic structural diagram of an optional first vehicle of the present application, to improve the accuracy of the subsequently determined carriage posture, the marker 3 comprises at least two sub-markers 301; the at least two sub-markers 301 are respectively located at a first position point and a second position point on a second side face of the carriage 101; the second side face is the face oriented towards the first side face 2021; there is a second preset distance between the first position point and the second position point; the second preset distance is greater than half the length of the second side face along a first preset direction (the x-axis direction in Fig. 3); the first preset direction is perpendicular to a second preset direction (the y-axis direction in Fig. 3); the second preset direction is the direction of the extension line of the height direction of the first vehicle 1. The processor 5 can thus determine the spatial three-dimensional coordinates of the multiple sub-markers 301 and thereby locate the posture information of the carriage 101, for example the distance between the carriage 101 and the second vehicle 2 and the deflection angle relative to the second vehicle 2.
In this embodiment, the above second side face is not necessarily a single face; in practice the first vehicle 1 does not always stop directly in front of the second vehicle 2 and may form an angle with it, in which case the second side face may be the two faces corresponding to the first side face 2021. Accordingly, the above multiple sub-markers 301 may be located on one of those two faces, or a corresponding number may be arranged on both faces. This embodiment is described by taking the case in which all sub-markers 301 are located on the same face as an example.
Optionally, referring to Fig. 4, which is a schematic structural diagram of an optional marker of the present application, the marker 3 is provided with 5 reference points; in practice there may be more than 5 as needed, for example n points such as 6, 7, 8, 9 or 10. The more reference points there are, the more accurate the finally determined coordinate data of the marker 3 become, but too many reference points also further increase the computation time and make data processing too slow, so the number of reference points can be set as needed.
In an optional example, the processor 5 comprises an image recognition module and a position determination module; the image recognition module recognizes the image using the image recognition network model to obtain the target image containing the recognition results of the plurality of reference points and sends the target image to the position determination module; the position determination module determines, using the binocular stereo geometric vision algorithm, a target three-dimensional coordinate for each of the plurality of reference points in the target image and determines the posture information of the carriage 101 based on the target three-dimensional coordinates of the respective reference points. Details can be found in the description of the carriage posture recognition method below.
The posture recognition system for the carriage 101 provided by the present application has the following advantages:
1) High degree of intelligence. Based on deep convolutional neural network technology from the field of artificial intelligence, the binocular camera 4, using images and binocular stereo geometric vision, can precisely detect the position and posture of the truck carriage 101 under various weather conditions and in various working environments.
2) Single-end detection. The binocular camera 4 performs single-end detection, recognizing and measuring external objects much like human eyes. There is no need, as with RTK, to install rover stations on both the truck and the excavator; with RTK the truck's positioning result must be sent via a communication terminal to the unmanned excavator, which can only obtain the relative position relationship after comparing the received truck positioning data with its own positioning data.
3) Low cost. Automotive-grade binocular cameras 4 with waterproof and dustproof capability are now available, and in large-scale applications their cost can be reduced to a few thousand yuan.
4) Suitable for large-scale outdoor unmanned autonomous operation of construction machinery. Based on deep convolutional neural network technology in the field of artificial intelligence, and using images and binocular stereo geometric vision, the binocular camera 4 can precisely detect the position and posture of the carriage 101 of a truck under various weather conditions and in various working environments; combined with its low cost, this enables large-scale application of outdoor unmanned autonomous operation of construction machinery.
A specific embodiment of a carriage posture recognition method of the present application is described below with reference to Fig. 5, which is a schematic flowchart of an optional carriage posture recognition method of the present application. This specification provides the method operation steps as in the embodiments or flowcharts, but more or fewer operation steps may be included based on routine or non-creative work. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one; when an actual system or server product executes the method, the steps may be executed sequentially or in parallel (for example in a parallel-processor 5 or multi-threaded environment) according to the method shown in the embodiments or drawings. Specifically, as shown in Fig. 5, the method may include:
S501: capturing, with the binocular camera 4, an image containing the marker 3.
In this embodiment, the arrangement of the binocular camera 4 is shown in Fig. 2 and described in detail in the system section above.
Optionally, the binocular camera 4 comprises a left camera and a right camera. Step S501 may be elaborated as: capturing, with the left camera, a first image containing the marker 3, the first image being a visible-light image; and capturing, with the right camera, a second image containing the marker 3, the second image being a visible-light image. In step S503 below, the input image includes the first image and the second image, and the target three-dimensional coordinates are subsequently calculated based on the marker recognition results of the two images.
S503: inputting the image into the image recognition network model to obtain the target image, the target image containing the recognition result for each of the plurality of reference points.
Optionally, the image recognition network model comprises a feature extraction network, a feature fusion network and a prediction and recognition network. Step S503 may include: performing a feature extraction operation on the image with the feature extraction network to obtain a feature map set; performing feature fusion on the feature map set with the feature fusion network to obtain a target feature map; and performing prediction on the target feature map with the prediction and recognition network to obtain the target image.
Optionally, the feature extraction network comprises an input layer and a sub-feature-extraction network. The input layer generally normalizes the image data before feeding them into the neural network for inference. There are several normalization methods; one of them maps pixel values from 0-255 to 0-1 according to the formula
X1 = x / 255
where x denotes the pixel value of a pixel of the image and X1 denotes the normalized value of that pixel.
For example, when x = 200, the normalized value is X1 ≈ 0.78.
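For illustration only (not part of the claimed method), this 0-1 input-layer normalization can be written in a couple of lines of Python with NumPy:

import numpy as np

def normalise_0_1(image_u8):
    # Input-layer normalization from the formula above:
    # map 8-bit pixel values 0-255 to the range 0-1.
    return image_u8.astype(np.float32) / 255.0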
The normalized input-layer image data are then passed through successive convolution and activation operations to extract image features.
Optionally, during feature extraction the convolution kernel may be set to 1x1 or 3x3, which is not limited here.
After the convolution operation, the output of the current convolutional layer is obtained through an activation function. There are many activation functions; ReLU and Sigmoid are commonly used.
Optionally, the deep convolutional neural network Yolo5KeyPoints used in this solution has generalization capability and can effectively recognize the marker 3 with the specific pattern and its 5 reference points under various weather conditions and in various working environments. Yolo5KeyPoints uses CSP-Darknet53 as the backbone feature extraction network, SPPF and CSP-PAN as the feature fusion network, and then a class prediction sub-network (class subnet) predicts the class of each grid cell, a box regression sub-network (box subnet) regresses the detection box, and a keypoint regression sub-network (key point subnet) regresses the reference points, yielding the final network prediction. For reference-point detection, Yolo5KeyPoints uses the Wing Loss function: when the loss is large the parameter gradient is small, so it is insensitive to outliers, and when the loss is small the parameter gradient is large, so the model converges better and the accuracy of reference-point detection is greatly improved.
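As a minimal illustrative sketch of the Wing Loss form described above (the threshold w and curvature eps below are assumed hyperparameters, not values specified in this application):

import torch

def wing_loss(pred, target, w=10.0, eps=2.0):
    # Wing loss for keypoint regression: small residuals (|x| < w) fall on a
    # logarithmic branch whose gradient stays comparatively large, so nearly
    # correct points keep being refined; large residuals fall on a linear
    # branch, so outliers do not dominate training.
    x = (pred - target).abs()
    c = w - w * torch.log(torch.tensor(1.0 + w / eps))
    return torch.where(x < w, w * torch.log(1.0 + x / eps), x - c).mean()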
As described above, the backbone feature extraction network is generally followed by a feature fusion network, which fuses the feature maps of different scales extracted by the backbone and then further extracts features by convolution.
The feature fusion network may include an enhanced feature extraction network, which may be implemented by feature pyramid stacking plus convolution operations; a typical enhanced feature extraction network is FPN. The FPN network fuses low-level high-resolution feature maps with high-level high-semantic feature maps and then makes predictions separately on each level of the fused feature maps.
The feature fusion and enhanced feature extraction networks are generally followed by 1x1 convolutions that produce the network predictions, namely the box regression results, the reference-point regression results and the class confidence values.
Tests show that when the above position and posture recognition algorithm for the carriage 101 is deployed on a CPU it runs in about 130 ms per pass, which satisfies the real-time requirement in the low-speed scenario of unmanned autonomous excavator operation; if it is deployed on AI acceleration hardware (such as a GPU or AI chip), real-time performance further improves to within 30 ms per pass.
The deep convolutional neural network Yolo5KeyPoints must first go through data collection, data labeling and model training before it can recognize the marker 3 with the specific pattern and its 5 reference points.
An optional process for training the image recognition network model for the above object recognition is as follows:
1) Obtaining a training sample data set, the training sample data set comprising each of a plurality of sample images and the corresponding label image; each sample image contains a marker 3 and the plurality of reference points on the marker 3; the label image is an image in which the marker 3 in the corresponding sample image is marked with an annotation box and each reference point on the marker 3 is marked.
In this embodiment, the sample images may be images of the marker 3 captured with the binocular camera under various weather conditions and in various working environments. After image collection, images that meet the requirements are manually selected for annotation: meaningful images from various weather conditions and working environments that completely contain the marker 3 pattern are chosen, and only one copy of any repeated image is kept.
The label images are generated as follows: data annotation is performed with the labelme tool to obtain, for each image, a label file containing the annotation box of the marker 3 and the positions of its 5 reference points.
The specific operation is as follows: click the create-rectangle button in the labelme annotation software, create a new box enclosing the marker 3 and enter the box name "SignBoard" to obtain the marker 3 label; then click the create-point button, click the five reference points of the marker 3 pattern and enter the reference point names point1, point2, point3, point4 and point5 to obtain the reference-point labels. Once all markers 3 and their 5 reference points in the whole image have been annotated, the label image of this image is obtained.
2) Constructing a preset deep learning model and determining the preset deep learning model as the current deep learning model.
3) Performing, based on the current deep learning model, an annotation prediction operation on the sample images in the sample data set to determine annotation prediction results of the sample images.
4) Determining a loss value based on the sample image annotations and the annotation prediction results.
5) When the loss value is greater than a preset threshold, performing back-propagation based on the loss value, updating the model weights of the current deep learning model to obtain a deep learning model with updated weights, and re-determining the deep learning model with updated weights as the current deep learning model, thereby completing one iteration of training; then repeating the steps of performing the annotation prediction operation on the sample images based on the current deep learning model, determining the loss value between the sample image annotations and the annotation prediction results, and performing back-propagation to update the model weights whenever the loss value is greater than the threshold.
6) When the loss value is less than or equal to the preset threshold, or the preset maximum number of iterations is reached, determining the current deep learning model as the image recognition network model for the object.
In an optional embodiment, the model training process may also be as follows: pre-trained weights are loaded and the annotated data are then fed in for model training. The image data are normalized by the pre-processing module; the normalized image data are fed into the network model for forward propagation to obtain prediction results; the deviation between the model output and the ground truth in the label file, i.e. the loss value, is computed by the loss function. The loss value may consist of three parts, namely the object classification loss, the detection box regression loss and the reference-point regression loss. The loss is back-propagated to update the weights of every layer of the network, completing one training pass; the model converges through repeated iterative training, and the whole model training is finished when the convergence target or the maximum number of iterations is reached. A brief sketch of such a loop is given below.
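A minimal sketch of the training loop described above, assuming PyTorch-style tensors and a caller-supplied criterion that combines the three loss terms (classification, box regression, reference-point regression); the names, threshold and epoch count are illustrative assumptions:

def train(model, loader, optimizer, criterion, max_epochs=100, loss_threshold=0.01):
    # One iteration = forward pass, loss computation, back-propagation and
    # weight update; training stops when the loss reaches the preset
    # threshold or the maximum number of iterations is reached.
    for epoch in range(max_epochs):
        last_loss = None
        for images, targets in loader:        # images already normalized
            preds = model(images)             # forward propagation
            loss = criterion(preds, targets)  # classification + box + keypoint losses
            optimizer.zero_grad()
            loss.backward()                   # back-propagation
            optimizer.step()                  # per-layer weight update
            last_loss = loss.item()
        if last_loss is not None and last_loss <= loss_threshold:
            break                             # convergence target reached
    return model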
The trained Yolo5KeyPoints model is then deployed to a CPU, GPU or AI chip for model inference, yielding the recognition results of the marker 3 and its 5 reference points for the images captured by the binocular camera 4 in real time.
In the model training stage, a model with detection capability is obtained by training on field data. The model can be deployed to a CPU via the opencv library or the libtorch library, and to a GPU or AI chip via the libraries and deployment requirements provided by the manufacturer, which generally requires conversion of the model format.
The detection of the marker 3 and its 5 reference points consists of three stages: when a real-time image is input, it first goes through the pre-processing stage, where image normalization is completed; the normalized image is then fed into the Yolo5KeyPoints model for forward inference to obtain the prediction for each grid cell of the image; finally the predictions of all grid cells are post-processed, for example by non-maximum suppression, to obtain the final prediction result. A sketch of these three stages follows.
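An illustrative, non-authoritative sketch of the three-stage detection pipeline; model is assumed to be a loaded Yolo5KeyPoints network returning raw per-grid predictions, and postprocess is a caller-supplied function (for example non-maximum suppression) that keeps the best box and reference-point candidates:

import numpy as np

def detect_marker(model, postprocess, bgr_image):
    # Stage 1: pre-processing - normalize pixel values and reorder to NCHW.
    img = bgr_image.astype(np.float32) / 255.0
    blob = img.transpose(2, 0, 1)[None]
    # Stage 2: forward inference - raw predictions for every grid cell.
    raw = model(blob)
    # Stage 3: post-processing, e.g. non-maximum suppression over candidate
    # boxes and their 5 reference points, yielding the final detections.
    return postprocess(raw)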
During unmanned autonomous excavator operation, the binocular camera 4 captures image data in real time; after the marker 3 and its 5 reference points have been recognized, the spatial three-dimensional coordinates of the 5 reference points of the marker 3 are obtained by computing the 5 pairs of reference-point image coordinates with binocular stereo geometric vision, using steps S505-S513 below. Detection in this manner is highly robust and highly accurate.
S505: for each reference point, determining the coordinates of the reference point in the pixel coordinate system based on the target image and the recognition result of the reference point.
In this embodiment, the coordinates of a reference point in the pixel coordinate system may be (u, v).
S507: obtaining the camera parameters of the binocular camera 4.
In an optional example, the camera parameters include the distance between the left and right cameras of the binocular camera 4, its mounting parameters and its intrinsic parameters.
Optionally, the binocular camera 4 comprises a left camera and a right camera, the distance between which is T; the mounting parameters of the binocular camera include the translation of the left camera or the right camera from the origin of the vehicle coordinate system and the rotation angle relative to that origin; the intrinsic parameters of the binocular camera include the intrinsic parameters of the left camera and of the right camera. These parameters can be grouped as sketched below.
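For illustration only, the camera parameters listed above can be held in a simple container; the field names are assumptions rather than terms defined by this application:

from dataclasses import dataclass
import numpy as np

@dataclass
class StereoParams:
    baseline: float        # distance T between the left and right cameras
    K_left: np.ndarray     # 3x3 intrinsic matrix of the left camera
    K_right: np.ndarray    # 3x3 intrinsic matrix of the right camera
    R_mount: np.ndarray    # 3x3 rotation of the camera relative to the vehicle origin
    t_mount: np.ndarray    # 3-vector translation of the camera from the vehicle origin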
S509: determining the target three-dimensional coordinate of the reference point using the binocular stereo geometric vision algorithm, based on the camera parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system.
S511: determining the target three-dimensional coordinates of the carriage 101 based on the target three-dimensional coordinates of the respective reference points.
S513: determining the posture information of the carriage 101 based on the target three-dimensional coordinates of the carriage 101.
In an optional example, referring to Fig. 6, which is a schematic flowchart of another optional carriage posture recognition method of the present application, steps S509-S513 may be elaborated as:
S601: determining the coordinates of the reference point in the camera coordinate system based on the distance between the left and right cameras of the binocular camera 4, the intrinsic parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system.
Optionally, the intrinsic parameters of the binocular camera include the intrinsic parameters of the left camera and of the right camera.
Optionally, an optional embodiment of determining the coordinates of a reference point in the camera coordinate system is given below. Referring to Fig. 7, which is a schematic diagram of the relationships among several coordinate systems of the present application: world coordinate system Xw, Yw, Zw; camera coordinate system Xc, Yc, Zc; image coordinate system x, y; pixel coordinate system u, v (reflecting the arrangement of the pixels on the camera's CCD chip).
As can be seen from the figure, assuming that (u0, v0) are the coordinates of O in the u-v coordinate system and that the length and width of one pixel are dx and dy respectively, the relationship between the pixel coordinate system and the image coordinate system is:

$$u = \frac{x}{d_x} + u_0 \quad (1), \qquad v = \frac{y}{d_y} + v_0 \quad (2)$$

Combining formula (1) and formula (2) and writing them in matrix form gives:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \quad (3)$$

From formula (3), the coordinates (x, y) of any pixel of the image in the image coordinate system can be solved.
Referring to Fig. 8, which is a schematic diagram of an optional camera model of the present application: Ol and Or are the projection centres of the left and right cameras of the binocular camera 4, and the line connecting them, of length D, is called the binocular baseline, i.e. the distance between the two camera centres. P is a point in space, Pl is the image of P in the left camera, Pr is the image of P in the right camera, and the disparity is d = xl - xr.
According to the similar-triangle theorem, triangle POlOr is similar to triangle PPlPr, so the depth Z can be obtained from:

$$Z = \frac{f \cdot D}{x_l - x_r} = \frac{f \cdot D}{d} \quad (4)$$

where f is the focal length of the camera.
After the depth information Z, i.e. Zc, is obtained from formula (4), Xc and Yc of P in the target binocular camera coordinate system can be further determined (the binocular camera coordinate system is the coordinate system whose origin is the centre of the left camera, i.e. the camera coordinate system in the coordinate relationship diagram of Fig. 7).
Referring to Fig. 9, which is another schematic diagram of the relationships among several coordinate systems of the present application: in the left-camera coordinate system of the binocular camera, with Pl the image of the spatial point P in the left image, the lateral coordinate X and the vertical coordinate Y of the spatial point P can be obtained by the similar-triangle theorem. The coordinates of P(X, Y, Z) in the left-camera coordinate system of the binocular camera are P(Xc, Yc, Zc); P(x, y) are the image coordinates of the image point of the spatial point P(Xc, Yc, Zc) in the left-image coordinate system of the binocular camera.
Since triangle ABOc is similar to triangle oCOc and triangle PBOc is similar to triangle pCOc, it can be derived that

$$\frac{x}{X_c} = \frac{f}{Z_c}, \qquad \frac{y}{Y_c} = \frac{f}{Z_c} \quad (5)$$

from which Xc and Yc are obtained as

$$X_c = \frac{x \cdot Z_c}{f}, \qquad Y_c = \frac{y \cdot Z_c}{f} \quad (6)$$

At this point all coordinates of the spatial point P(Xc, Yc, Zc) have been obtained. In the above manner, the three-dimensional coordinates of the 5 reference points on the marker 3 in the camera coordinate system, namely the left-camera coordinate system, can be solved one by one; a sketch of this computation is given below.
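For illustration, and under the assumption that the two cameras share the same focal length and principal point and that the images are rectified, the triangulation of formulas (4)-(6) can be sketched as follows (fx is the focal length expressed in pixels and (cx, cy) the principal point in pixels):

def triangulate(ul, vl, ur, fx, cx, cy, baseline):
    # (ul, vl) and (ur, .) are the pixel coordinates of the same reference
    # point in the left and right images; baseline is the distance D between
    # the optical centres of the left and right cameras.
    d = ul - ur                 # disparity, formula (4)
    Zc = fx * baseline / d      # depth in the left-camera frame
    Xc = (ul - cx) * Zc / fx    # formulas (5)-(6), with x and y measured
    Yc = (vl - cy) * Zc / fx    # from the principal point
    return Xc, Yc, Zc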
S603: determining the first coordinate transformation matrix based on the mounting parameters of the binocular camera.
S605: converting, according to the first coordinate transformation matrix, the three-dimensional coordinates of the reference point in the camera coordinate system into three-dimensional coordinates in the target vehicle coordinate system, the target vehicle coordinate system being the coordinate system of the second vehicle 2.
In this embodiment, if the origin of the vehicle coordinate system is the slewing centre of the second vehicle 2, the corresponding rotation matrix R and translation vector T can be determined by determining the translation distance and rotation angle of the binocular camera relative to that origin. When the second vehicle is an excavator, the second vehicle comprises a mechanical arm and a base, the rotating end of the mechanical arm is rotatably connected to the base, and the slewing centre of the second vehicle lies on the rotation axis of the rotating end.
The three-dimensional coordinates of a reference point in the camera coordinate system are converted into three-dimensional coordinates in the coordinate system of the second vehicle 2 by the coordinate transformation matrix

$$\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \quad (7)$$

so that, according to formula (7), the relationship between the camera coordinate system and the coordinate system of the second vehicle 2 is

$$\begin{bmatrix} X_v \\ Y_v \\ Z_v \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \quad (8)$$

Optionally, the above coordinate transformation matrix, i.e. formula (7), differs depending on how the vehicle coordinate system is defined. The transformation is sketched below.
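A minimal sketch of this coordinate transformation, assuming R and t are taken from the mounting parameters of the binocular camera relative to the slewing centre of the second vehicle:

import numpy as np

def camera_to_vehicle(p_cam, R, t):
    # Build the homogeneous transformation of formula (7) and apply
    # formula (8) to one point given in the left-camera coordinate system.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float)
    p_hom = np.append(np.asarray(p_cam, dtype=float), 1.0)
    return (T @ p_hom)[:3]      # point in the vehicle coordinate system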
S607: determining the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system based on the three-dimensional coordinates of the respective reference points in the target vehicle coordinate system.
Optionally, taking as an example a marker 3 comprising two sub-markers 301, each carrying 5 reference points: for each sub-marker 301, the coordinates of the 5 reference points can be screened, the coordinate data of reference points with large numerical deviation can be rejected, and the coordinate data of the remaining qualified reference points can be averaged to obtain the three-dimensional coordinate data of this sub-marker 301. In another optional embodiment, weights may instead be assigned to the coordinates of the reference points at different positions on the sub-marker 301, and the three-dimensional coordinate data of the sub-marker 301 are obtained by multiplying the coordinates of each reference point by the corresponding weight. A sketch of this fusion step follows.
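An illustrative sketch of this fusion step, with an assumed outlier threshold max_dev (in metres) chosen for the example rather than specified by this application:

import numpy as np

def sub_marker_position(points, max_dev=0.1):
    # points: (5, 3) array with the vehicle-frame coordinates of one
    # sub-marker's reference points. Points far from the median are
    # rejected and the rest are averaged; a weighted average could be
    # used instead, as noted above.
    pts = np.asarray(points, dtype=float)
    med = np.median(pts, axis=0)
    keep = np.linalg.norm(pts - med, axis=1) <= max_dev
    if not keep.any():
        return med
    return pts[keep].mean(axis=0)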
S609: determining the posture information of the carriage 101 based on the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system.
From the three-dimensional coordinates of the above two sub-markers 301 in the coordinate system of the second vehicle 2, the posture information of the carriage 101 can be determined.
The posture information includes, for example, the relative distance and the deflection angle between the carriage 101 and the second vehicle 2, as sketched below.
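For illustration only, and with assumed axis conventions (x lateral and y forward in the vehicle coordinate system), a coarse relative distance and deflection angle can be derived from the two sub-marker centres as follows:

import numpy as np

def carriage_pose(marker_a, marker_b):
    # marker_a, marker_b: 3D centres of the two sub-markers in the vehicle
    # coordinate system. The midpoint gives the planar distance to the
    # carriage side; the direction between the two markers gives its
    # deflection angle relative to the vehicle axes.
    a = np.asarray(marker_a, dtype=float)
    b = np.asarray(marker_b, dtype=float)
    centre = (a + b) / 2.0
    distance = float(np.linalg.norm(centre[:2]))
    yaw_deg = float(np.degrees(np.arctan2(b[1] - a[1], b[0] - a[0])))
    return distance, yaw_deg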
In a feasible embodiment, the image recognition module comprises a feature extraction sub-module, a feature fusion sub-module and a prediction and recognition module; the feature extraction sub-module is configured to perform a feature extraction operation on the image with the feature extraction network to obtain a feature map set; the feature fusion sub-module is configured to perform feature fusion on the feature map set with the feature fusion network to obtain a target feature map; the prediction and recognition module is configured to perform prediction on the target feature map with the prediction and recognition network to obtain the target image.
In a feasible embodiment, the position determination module comprises:
a pixel coordinate determination module, configured to determine, for each reference point, the coordinates of the reference point in the pixel coordinate system based on the target image and the recognition result of the reference point;
a camera parameter obtaining module, configured to obtain the camera parameters of the binocular camera 4;
a target three-dimensional coordinate determination module, configured to determine the target three-dimensional coordinate of the reference point using the binocular stereo geometric vision algorithm, based on the camera parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system, and to determine the target three-dimensional coordinates of the carriage 101 based on the target three-dimensional coordinates of the respective reference points;
a posture information determination module, configured to determine the posture information of the carriage 101 based on the target three-dimensional coordinates of the carriage 101.
In a feasible embodiment, the target three-dimensional coordinate determination module comprises a first coordinate determination module, a first coordinate transformation matrix determination module and a second coordinate determination module;
the first coordinate determination module is configured to determine the three-dimensional coordinates of the reference point in the camera coordinate system based on the distance between the left and right cameras of the binocular camera 4, the intrinsic parameters of the binocular camera 4 and the coordinates of the reference point in the pixel coordinate system;
the first coordinate transformation matrix determination module is configured to determine the first coordinate transformation matrix based on the mounting parameters of the binocular camera;
the second coordinate determination module is configured to convert, according to the first coordinate transformation matrix, the three-dimensional coordinates of the reference point in the camera coordinate system into three-dimensional coordinates in the target vehicle coordinate system, the target vehicle coordinate system being the coordinate system of the second vehicle 2, and to determine the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system based on the three-dimensional coordinates of the respective reference points in the target vehicle coordinate system;
the posture information determination module is configured to determine the posture information of the carriage 101 based on the three-dimensional coordinates of the carriage 101 in the target vehicle coordinate system.
The specific implementation of the above modules is the same as described for the above method and is not repeated here. It should be noted that the above modules may be physical sub-modules in the processor or virtual modules constituted by programs.
As described above, an optional working process of the carriage posture recognition system provided by the present application is as follows. The trained deep convolutional neural network model Yolo5KeyPoints is deployed to the CPU, where one detection pass of the Yolo5KeyPoints model yielding the marker 3 and its five reference points takes about 130 ms; if the computing unit contains an image computing unit (GPU) or an AI acceleration chip, the Yolo5KeyPoints model can be deployed to the GPU or AI acceleration chip to improve real-time performance, in which case one detection pass yielding the marker 3 and its five reference points takes about 30 ms. The algorithm in which binocular stereo geometric vision computes, from the 5 pairs of reference points, the spatial three-dimensional coordinates of the 5 reference points is deployed to the CPU.
The system is powered on, the binocular camera 4 based truck carriage 101 position and posture perception system and the unmanned excavator control system are started, and the unmanned excavator enters the unmanned autonomous operation state.
An operator parks the first vehicle 1 (for example a truck) into the parking spot, ready for loading.
After the truck has been parked, the binocular camera 4 sends the captured left and right 1280x720 RGB colour images to the deep convolutional neural network Yolo5KeyPoints model. Before inference, the Yolo5KeyPoints model pre-processes the left and right 1280x720 RGB images; the pre-processing includes image normalization and image scaling. In the Yolo5KeyPoints pre-processing, image normalization divides each pixel's RGB value by 127.5 and subtracts 1, normalizing all pixel values to the range -1 to 1; the image scaling operation scales the original 1280x720 RGB image to a 1280x736 RGB image to satisfy the model's input size requirement (a sketch of this pre-processing follows below). After pre-processing, the image data are fed into the Yolo5KeyPoints model for inference, and the model outputs, for each anchor box at each grid cell, the object classification result, the detection box regression result and the regression results of the five reference points. The Yolo5KeyPoints model output is then post-processed, for example by non-maximum suppression, to obtain the final marker 3 detection box and the pixel coordinates of its 5 reference points.
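A minimal sketch of this pre-processing step, assuming OpenCV is used for the scaling (whether to pad or plainly resize from 720 to 736 rows is an implementation choice; a plain resize is shown here):

import cv2
import numpy as np

def preprocess_1280x720(bgr_image):
    # Scale the 1280x720 frame to the 1280x736 model input size, then
    # normalize each pixel to the range -1..1 (x / 127.5 - 1) and reorder
    # the data into an NCHW batch of one.
    img = cv2.resize(bgr_image, (1280, 736))
    img = img.astype(np.float32) / 127.5 - 1.0
    return img.transpose(2, 0, 1)[None]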
After the pixel coordinates of the 5 reference points of the marker 3 in the left and right images of the binocular camera 4 have been obtained by neural network inference, the three-dimensional spatial coordinates of the 5 reference points are obtained by computing the 5 pairs of reference-point pixel coordinates with binocular stereo geometric vision. The carriage posture recognition system is simple in structure and suitable for large-scale outdoor unmanned operation scenarios.
An embodiment of the present application further provides an electronic device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set that is loaded and executed by the processor to implement the carriage posture recognition method described above.
An embodiment of the present application further provides a computer storage medium, which may be arranged in a server to store at least one instruction, at least one program, a code set or an instruction set related to implementing the carriage posture recognition method of the method embodiments; the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the above carriage posture recognition method.
Optionally, in this embodiment, the above storage medium may be located on at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the above storage medium may include but is not limited to various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
It should be noted that the above order of the embodiments of the present application is only for description and does not represent the merits of the embodiments. The above describes specific embodiments of this specification; other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results; in certain implementations, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described in a progressive manner; for identical or similar parts of the embodiments, reference may be made between them, and each embodiment focuses on its differences from the others. In particular, the device embodiments are described relatively briefly because they are substantially similar to the method embodiments, and reference may be made to the relevant description of the method embodiments.
A person of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of the present application and are not intended to limit the present application; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (10)

  1. A carriage posture recognition system, characterized by comprising a processor (5), a binocular camera (4), a first vehicle (1), a second vehicle (2) and a marker (3);
    the first vehicle (1) comprises a carriage (101);
    the second vehicle (2) is configured to unload material into the carriage (101) of the first vehicle (1);
    the marker (3) is arranged on a side face of the carriage (101); the marker (3) is provided with a plurality of reference points;
    the binocular camera (4) is arranged on the second vehicle (2), and the binocular camera (4) is configured to capture an image containing the marker (3) and send the image to the processor (5);
    the processor (5) is in communication connection with the binocular camera (4); the processor (5) is configured to recognize the image using an image recognition network model to obtain a target image, the target image being an image containing the recognition results of the plurality of reference points; to determine, using a binocular camera (4) stereo geometric vision algorithm, a target spatial three-dimensional coordinate for each of the plurality of reference points in the target image; and to determine posture information of the carriage (101) based on the target spatial three-dimensional coordinates of the respective reference points.
  2. The carriage posture recognition system according to claim 1, characterized in that the second vehicle (2) comprises an unloading device (201) and a main structure (202) connected to each other;
    the unloading device (201) is connected to a first side face (2021) of the main structure (202);
    the binocular camera (4) is arranged on the first side face (2021).
  3. The carriage posture recognition system according to claim 2, characterized in that the first side face (2021) comprises a first mounting point and a second mounting point;
    the first mounting point and the second mounting point are located at the top of the first side face (2021);
    there is a first preset distance between the first mounting point and the second mounting point;
    the binocular camera (4) comprises a first binocular camera (401) and a second binocular camera (402);
    the first binocular camera (401) is arranged at the first mounting point;
    the second binocular camera (402) is arranged at the second mounting point.
  4. The carriage posture recognition system according to claim 2, characterized in that the marker (3) comprises at least two sub-markers (301);
    the at least two sub-markers (301) are respectively located at a first position point and a second position point on a second side face of the carriage (101);
    the second side face is the face oriented towards the first side face (2021);
    there is a second preset distance between the first position point and the second position point; the second preset distance is greater than half the length of the second side face along a first preset direction; the first preset direction is perpendicular to a second preset direction; the second preset direction is the direction of the extension line of the height direction of the first vehicle (1).
  5. The carriage posture recognition system according to claim 1, characterized in that the processor (5) comprises an image recognition module and a position determination module;
    the image recognition module is configured to recognize the image using an image recognition network model to obtain a target image, the target image being an image containing the recognition results of the plurality of reference points, and to send the target image to the position determination module;
    the position determination module is configured to determine, using a binocular camera (4) stereo geometric vision algorithm, a target spatial three-dimensional coordinate for each of the plurality of reference points in the target image, and to determine the posture information of the carriage (101) based on the target spatial three-dimensional coordinates of the respective reference points.
  6. A carriage posture recognition method implemented with the carriage posture recognition system according to any one of claims 1-5, characterized in that the method comprises:
    capturing, with the binocular camera (4), an image containing the marker (3);
    inputting the image into the image recognition network model to obtain a target image, the target image containing a recognition result for each of the plurality of reference points;
    for each reference point, determining the coordinates of the reference point in the pixel coordinate system based on the target image and the recognition result of the reference point;
    obtaining camera parameters of the binocular camera (4);
    determining a target spatial three-dimensional coordinate of the reference point using a binocular stereo geometric vision algorithm, based on the camera parameters of the binocular camera (4) and the coordinates of the reference point in the pixel coordinate system;
    determining target spatial three-dimensional coordinates of the carriage (101) based on the target spatial three-dimensional coordinates of the respective reference points;
    determining posture information of the carriage (101) based on the target spatial three-dimensional coordinates of the carriage (101).
  7. The carriage posture recognition method according to claim 6, characterized in that the camera parameters include the distance between the left and right cameras of the binocular camera (4), its mounting parameters and its intrinsic parameters.
  8. The carriage posture recognition method according to claim 7, characterized in that determining the target spatial three-dimensional coordinate of the reference point based on the camera parameters of the binocular camera (4) and the coordinates of the reference point in the pixel coordinate system, determining the target spatial three-dimensional coordinates of the carriage (101) based on the target three-dimensional coordinates of the respective reference points, and determining the posture information of the carriage (101) based on the target three-dimensional coordinates of the carriage (101) comprises:
    determining the three-dimensional coordinates of the reference point in the camera coordinate system based on the distance between the left and right cameras of the binocular camera (4), the intrinsic parameters of the binocular camera and the coordinates of the reference point in the pixel coordinate system;
    determining a first coordinate transformation matrix based on the mounting parameters of the binocular camera;
    converting, according to the first coordinate transformation matrix, the three-dimensional coordinates of the reference point in the camera coordinate system into three-dimensional coordinates in a target vehicle coordinate system, the target vehicle coordinate system being the coordinate system of the second vehicle;
    determining the three-dimensional coordinates of the carriage in the target vehicle coordinate system based on the three-dimensional coordinates of the respective reference points in the target vehicle coordinate system;
    determining the posture information of the carriage (101) based on the three-dimensional coordinates of the carriage in the target vehicle coordinate system.
  9. An electronic device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set that is loaded and executed by the processor to implement the carriage posture recognition method according to any one of claims 6-8.
  10. A computer storage medium, characterized in that the computer storage medium stores at least one instruction or at least one program that is loaded and executed by a processor to implement the carriage posture recognition method according to any one of claims 6-8.
PCT/CN2023/120389 2022-10-24 2023-09-21 车厢姿态识别***、方法、电子设备及存储介质 WO2024087962A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211299656.9 2022-10-24
CN202211299656.9A CN115497077A (zh) 2022-10-24 2022-10-24 车厢姿态识别***、方法、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2024087962A1 true WO2024087962A1 (zh) 2024-05-02

Family

ID=84473534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/120389 WO2024087962A1 (zh) 2022-10-24 2023-09-21 车厢姿态识别***、方法、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN115497077A (zh)
WO (1) WO2024087962A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115497077A (zh) * 2022-10-24 2022-12-20 广西柳工机械股份有限公司 车厢姿态识别***、方法、电子设备及存储介质
CN117495698A (zh) * 2024-01-02 2024-02-02 福建卓航特种设备有限公司 飞行物识别方法、***、智能终端及计算机可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876849A (zh) * 2018-04-24 2018-11-23 哈尔滨工程大学 基于辅助标识的深度学习目标识别及定位方法
US20190204084A1 (en) * 2017-09-29 2019-07-04 Goertek Inc. Binocular vision localization method, device and system
CN210294888U (zh) * 2019-06-18 2020-04-10 深圳诗航智能科技有限公司 基于深度学习追踪目标的自动跟随运输车
CN111551151A (zh) * 2020-06-04 2020-08-18 江苏集萃智能光电***研究所有限公司 基于双目视觉的临近空间飞行器相对位姿测量方法及装置
CN115170648A (zh) * 2022-06-29 2022-10-11 广西柳工机械股份有限公司 一种车厢位姿确定方法及装置
CN115497077A (zh) * 2022-10-24 2022-12-20 广西柳工机械股份有限公司 车厢姿态识别***、方法、电子设备及存储介质


Also Published As

Publication number Publication date
CN115497077A (zh) 2022-12-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23881541

Country of ref document: EP

Kind code of ref document: A1