CN116385949A - Mobile robot region detection method, system, device and medium - Google Patents

Mobile robot region detection method, system, device and medium

Info

Publication number
CN116385949A
CN116385949A (application CN202310297047.8A)
Authority
CN
China
Prior art keywords
data set
mobile robot
trained
neural network
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310297047.8A
Other languages
Chinese (zh)
Other versions
CN116385949B (en)
Inventor
彭广德
王睿
满天荣
李卫铳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ligong Industrial Co ltd
Original Assignee
Guangzhou Ligong Industrial Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Ligong Industrial Co ltd filed Critical Guangzhou Ligong Industrial Co ltd
Priority to CN202310297047.8A priority Critical patent/CN116385949B/en
Publication of CN116385949A publication Critical patent/CN116385949A/en
Application granted granted Critical
Publication of CN116385949B publication Critical patent/CN116385949B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/005Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a region detection method, system, device and medium for a mobile robot. The method acquires image data and identifies and analyzes it through a multi-task neural network model to obtain a target drivable area of the mobile robot. The multi-task neural network model is trained as follows: a multi-category data set is acquired and labeled; the labeled data set is input into a primary network framework for training, which outputs a first data set and a second data set; the second data set is input into a secondary network framework to obtain a third data set; the first data set, the third data set and the radar interface data are fused, and the parameters of the multi-task neural network model are updated according to the fused data set to obtain the trained model. The region detection method saves model training resources, improves the accuracy and detection speed of the model, and reduces the cost of hardware computing power. The invention can be widely applied in the technical field of automatic navigation.

Description

Mobile robot region detection method, system, device and medium
Technical Field
The invention relates to the technical field of automatic navigation, in particular to a region detection method, a system, a device and a medium for a mobile robot.
Background
In recent years, making mobile robots more intelligent has become a prominent development trend, and automatic navigation is one of its relatively mature directions. Automatic navigation mainly monitors the drivable area in real time and plans an optimal route, which includes perception and early-warning detection of the drivable area by vision and radar. Using a convolutional neural network to process the information collected by real-time sensors, judge the drivable area and plan the path in real time can greatly improve the safety of the robot during operation and provide driving early-warning protection.
At present, a single convolutional neural network applied to a single classification problem can achieve high accuracy; however, if an image must be analyzed for n tasks at the same time, n models have to be trained separately and n models have to run at prediction time, wasting hardware resources and time. Some multi-task classification methods exist in which multiple task branches are attached directly after a shared underlying network and share its parameters. Because the tasks influence each other, training is very difficult: improving the accuracy of one task reduces the accuracy of the others, and it is hard to obtain high accuracy on all tasks simultaneously.
Accordingly, the problems of the prior art need to be solved and optimized.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the related art to a certain extent.
Therefore, an object of the embodiments of the present invention is to provide a region detection method for a mobile robot that not only effectively saves the training resources of the model, but also improves the accuracy and detection speed of the model and effectively reduces the cost of hardware computing power.
Another object of an embodiment of the present application is to provide a region detection system of a mobile robot.
In order to achieve the technical purpose, the technical scheme adopted by the embodiment of the application comprises the following steps:
in a first aspect, an embodiment of the present application provides a region detection method of a mobile robot, including:
acquiring image data to be detected from a mobile robot;
identifying and analyzing the image data through a trained multi-task neural network model to obtain a target drivable area of the mobile robot, wherein the multi-task neural network model comprises a primary network frame and a secondary network frame;
the multi-task neural network model is obtained through training the following steps:
Acquiring a multi-category data set and radar interface data, and labeling the multi-category data set to obtain a labeled data set;
inputting the annotation data set into an initialized primary network frame for training, to obtain a trained primary network frame and a first data set and a second data set which are output by the trained primary network frame, wherein the first data set is a characteristic image data set which does not require extraction of a candidate frame (region proposal), and the second data set is a characteristic image data set which requires candidate frame extraction;
inputting the second data set into a trained secondary network framework to obtain a third data set;
performing multi-element fusion on the first data set, the third data set and the radar interface data to obtain a fusion data set;
and updating parameters of the multi-task neural network model according to the fusion data set to obtain the trained multi-task neural network model.
In addition, the area detection method of the mobile robot according to the above embodiment of the present application may further have the following additional technical features:
further, in an embodiment of the present application, the identifying and analyzing the image data through the trained multi-task neural network model to obtain the target drivable area of the mobile robot includes:
Identifying the image data through the trained multi-task neural network model to obtain an initial drivable area of the mobile robot;
acquiring a preset judging rule, wherein the judging rule is used for representing judging logic corresponding to each task in the initial drivable area;
and analyzing the initial drivable region according to the judging rule and the calibration information to obtain the target drivable region.
Further, in one embodiment of the present application, the secondary network framework is trained by:
acquiring a first positive sample, wherein the first positive sample is used for representing a sample data set obtained after the labeling data set is input into an initialized primary network framework for processing;
slicing and feature extraction are carried out on the first data set according to the first positive sample, so that a feature sample set is obtained;
and inputting the characteristic sample set and the labeling data set into an initialized secondary network frame for training to obtain the trained secondary network frame.
Further, in one embodiment of the present application, the step of inputting the labeling data set into the initialized primary network frame for training comprises the following steps:
Preprocessing the marked data set to obtain a processed data set;
calculating the loss weight of the processing data set input to the initialized primary network framework; and updating parameters of the initialized primary network framework according to the loss weight.
Further, in one embodiment of the present application, the secondary network framework includes: a feature fusion module;
the feature fusion module is used for realizing the following steps:
acquiring a second data set output by the primary network framework;
performing convolution fusion on a plurality of feature image data in the second data set according to the feature fusion module to obtain a plurality of feature fusion data;
wherein the widths and heights of the characteristic image data differ from one another; the feature fusion data have the same number of channels but different widths and heights.
Further, in one embodiment of the present application, the second data set is obtained by:
acquiring an initial data set output by the trained primary network framework and a plurality of preset classification rules;
classifying the initial data set according to the plurality of preset classification rules;
and if the initial data set meets at least one of the plurality of preset classification rules, obtaining the second data set.
In a second aspect, an embodiment of the present application provides a region detection system of a mobile robot, including:
the acquisition module is used for acquiring image data to be detected from the mobile robot;
the recognition analysis module is used for carrying out recognition analysis on the image data through a trained multi-task neural network model to obtain a target drivable area of the mobile robot, wherein the multi-task neural network model comprises a primary network frame and a secondary network frame;
the multi-task neural network model is obtained through training the following steps:
acquiring a multi-category data set and radar interface data, and labeling the multi-category data set to obtain a labeled data set;
inputting the annotation data set into an initialized primary network frame for training, to obtain a trained primary network frame and a first data set and a second data set which are output by the trained primary network frame, wherein the first data set is a characteristic image data set which does not require candidate frame extraction, and the second data set is a characteristic image data set which requires candidate frame extraction;
inputting the second data set into a trained secondary network framework to obtain a third data set;
Performing multi-element fusion on the first data set, the third data set and the radar interface data to obtain a fusion data set;
and updating parameters of the multi-task neural network model according to the fusion data set to obtain the trained multi-task neural network model.
Further, in one embodiment of the present application, the secondary network framework is trained by:
acquiring a first positive sample, wherein the first positive sample is used for representing a sample data set obtained after the labeling data set is input into an initialized primary network framework for processing;
slicing and feature extraction are carried out on the first data set according to the first positive sample, so that a feature sample set is obtained;
and inputting the characteristic sample set and the labeling data set into an initialized secondary network frame for training to obtain the trained secondary network frame.
In a third aspect, an embodiment of the present application further provides an area detection device of a mobile robot, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the region detection method of the mobile robot of the first aspect described above.
In a fourth aspect, embodiments of the present application further provide a computer readable storage medium, in which a program executable by a processor is stored, the program executable by the processor being configured to implement the region detection method of the mobile robot of the first aspect.
The advantages and benefits of the present application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the present application.
The embodiments of the application disclose a region detection method, system, device and medium for a mobile robot. Image data to be detected are obtained from the mobile robot and are identified and analyzed through a trained multi-task neural network model to obtain a target drivable area of the mobile robot, the multi-task neural network model comprising a primary network frame and a secondary network frame. The multi-task neural network model is trained as follows: a multi-category data set and radar interface data are acquired, and the multi-category data set is labeled to obtain a labeled data set; the labeled data set is input into an initialized primary network frame for training, to obtain a trained primary network frame together with a first data set and a second data set output by it, wherein the first data set is a characteristic image data set which does not require candidate frame extraction and the second data set is a characteristic image data set which requires candidate frame extraction; the second data set is input into a trained secondary network framework to obtain a third data set; the first data set, the third data set and the radar interface data are fused to obtain a fusion data set; and the parameters of the multi-task neural network model are updated according to the fusion data set to obtain the trained multi-task neural network model. The region detection method not only effectively saves the training resources of the model, but also improves the accuracy and detection speed of the model and effectively reduces the cost of hardware computing power.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the following description refers to the accompanying drawings of the embodiments of the present application or of the related prior art. It should be understood that the drawings described below illustrate only some embodiments of the technical solutions of the present application, and that other drawings can be obtained from them by those skilled in the art without inventive labor.
Fig. 1 is a flow chart of a method for detecting a region of a mobile robot according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a multi-task neural network model according to an embodiment of the present application;
fig. 3 is a schematic diagram of an SE unit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a specific network framework for implementing step 150 according to an embodiment of the present application;
fig. 5 is a convolution fusion schematic diagram of a biFPN network module according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a specific network framework for implementing step 155 according to an embodiment of the present application;
FIG. 7 is a schematic diagram of another specific network framework for implementing step 150 according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a training principle of a secondary network framework for implementing step 150 according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a region detection system of a mobile robot according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an area detection device of a mobile robot according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
At present, a single convolutional neural network applied to a single classification problem can achieve high accuracy; however, if an image must be analyzed for n tasks at the same time, n models have to be trained separately and n models have to run at prediction time, wasting hardware resources and time. Some multi-task classification methods exist in which multiple task branches are attached directly after a shared underlying network and share its parameters. Because the tasks influence each other, training is very difficult: improving the accuracy of one task reduces the accuracy of the others, and it is hard to obtain high accuracy on all tasks simultaneously.
Therefore, the embodiment of the invention provides a region detection method for a mobile robot, which not only effectively saves the training resources of the model, but also improves the accuracy and detection speed of the model and effectively reduces the cost of hardware computing power.
Referring to fig. 1 and 2, in an embodiment of the present application, a method for detecting a region of a mobile robot includes:
step 110, obtaining image data to be detected from a mobile robot;
in this step, the mobile robot may be an industrial vehicle that loads cargo automatically or manually and travels, or tows a cargo vehicle, to a designated place along a set route. The image data to be detected from the mobile robot may be image data acquired by sensors of different types arranged at different positions on the mobile robot; specifically, it may be data output by a laser radar sensor, point cloud image data output by a sonar sensor, visual image data (including a bird's eye view) output by a visual sensor, and the like. The plurality of sensors working together can provide detection data covering all directions without dead angles for the area detection of the mobile robot.
Step 120, identifying and analyzing the image data through a trained multi-task neural network model to obtain a target drivable area of the mobile robot, wherein the multi-task neural network model comprises a primary network frame and a secondary network frame;
it may be appreciated that the step 120 of identifying and analyzing the image data through the trained multi-task neural network model to obtain the target drivable area of the mobile robot includes:
step 121, identifying the image data through the trained multi-task neural network model to obtain an initial drivable area of the mobile robot;
step 122, acquiring a preset judging rule, wherein the judging rule is used for representing judging logic corresponding to each task in the initial drivable area;
step 123, analyzing the initial drivable region according to the judging rule and the calibration information to obtain the target drivable region.
It can be appreciated that the initial drivable region is the preliminary screening result output by the trained multi-task neural network model; the calibration information may be the label results required for the target drivable area and the label scores corresponding to those results in the preliminary screening result, the label results including labels such as lane lines, semantic segmentation, people, machine tools and sundries; and the decision rule may be a preset decision threshold whose specific value can be set according to the actual requirement, which is not detailed here. Specifically, in the embodiment of the present application, the image data acquired by the plurality of sensors may be continuously inferred within 100 ms by the trained multi-task neural network model to obtain the initial drivable region; the drivable region of the mobile robot is then analyzed according to the label results, the label scores and the decision rule; and finally the target drivable region of the mobile robot is output in combination with a regional bird's-eye-view image acquired by a specific one of the plurality of sensors.
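As an illustration of how such decision rules might be applied, the following sketch filters the preliminary predictions with per-task score thresholds; the function name, record fields and threshold values are assumptions for illustration and are not taken from the patent.

```python
from typing import Dict, List

def select_target_region(predictions: List[Dict],
                         decision_thresholds: Dict[str, float]) -> List[Dict]:
    """Keep predictions whose label score passes the threshold set for that task."""
    target = []
    for pred in predictions:                 # each pred: {"label", "score", "mask" or "box"}
        threshold = decision_thresholds.get(pred["label"], 0.5)
        if pred["score"] >= threshold:
            target.append(pred)
    return target

# Assumed per-label thresholds; actual values would be set per application requirement.
thresholds = {"lane_line": 0.6, "person": 0.7, "machine_tool": 0.5, "debris": 0.5}
```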
The multi-task neural network model is obtained through training the following steps:
Step 130, acquiring a multi-category data set and radar interface data, and labeling the multi-category data set to obtain a labeled data set;
in this step, the multi-category data set may be a set of image data required by different tasks and acquired by a plurality of sensors; when labeling, each image is annotated with the one or more task labels that apply to it. In the embodiment of the application, the multi-category data set covers tasks such as factory workshop production state classification, pedestrian detection, obstacle detection, lane line detection, panoramic semantic segmentation and human body key points.
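A labeled record in such a multi-category data set could, for instance, look like the following sketch; all keys, paths and values are illustrative assumptions.

```python
# One labeled record of the multi-category data set (all keys and values assumed).
annotation = {
    "image": "frames/workshop_000123.png",            # hypothetical file path
    "workshop_state": "running",                      # production state classification
    "pedestrians": [[412, 200, 468, 330]],            # bounding boxes (x1, y1, x2, y2)
    "lane_lines": "masks/workshop_000123_lane.png",   # semantic segmentation mask
    "keypoints": [[430, 215], [440, 250]],            # human body key points
}
```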
Step 140, inputting the labeling data set into the initialized primary network frame for training to obtain a trained primary network frame, and outputting a first data set and a second data set by the trained primary network frame, wherein the first data set is a characteristic image data set which does not require candidate frame extraction, and the second data set is a characteristic image data set which requires candidate frame extraction;
it may be appreciated that the step 140 of inputting the labeling data set into the initialized primary network frame for training includes:
step 141, preprocessing the marked data set to obtain a processed data set;
step 142, calculating the loss weight of the processing data set input to the initialized primary network framework; and updating parameters of the initialized primary network framework according to the loss weight.
It is to be understood that preprocessing the labeled data set may include graying, binarization, image enhancement, noise reduction or similar processing of each image in the labeled data set. In the embodiment of the application, after preprocessing, the images in the processed data set all have the same size, which may specifically be 3×960×544. It will also be appreciated that the accuracy of the primary network framework's predictions may be measured by a loss function (Loss Function) defined on a single training sample, which quantifies the prediction error for that sample; specifically, the loss value is determined from the label of the training sample and the primary network framework's prediction for it. In actual training a data set contains many samples, so a cost function (Cost Function), defined on the whole training data set, is generally used to measure the overall error; it averages the prediction errors of all training samples and better reflects the prediction quality of the primary network framework. For a general primary network framework, the cost function plus a regularization term measuring model complexity can serve as the training objective function, from which the loss value over the whole training data set is obtained. Many common loss functions exist, such as the 0-1 loss, squared loss, absolute loss, logarithmic loss and cross-entropy loss, any of which can be used as the loss function of the primary network framework and are not detailed here. In the embodiments of the present application one of them, for example the cross-entropy loss, may be selected to determine the training loss value. The processed data set is used as the training data of the primary network framework; since the loss function includes loss weights, the parameters of the model can be updated with a back-propagation algorithm based on the trained loss values. It can further be understood that the first data set may be multi-task data, such as lane line segmentation, that the primary network framework predicts directly without feature extraction through an RPN candidate frame network structure.
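A minimal sketch of such a weighted, back-propagated training step, assuming a PyTorch-style model that returns one output per task, is shown below; the task names and loss weights are assumptions.

```python
import torch
import torch.nn as nn

# Assumed per-task loss weights; the patent does not specify values.
loss_weights = {"lane_seg": 1.0, "obstacle": 2.0, "state_cls": 0.5}
criterion = nn.CrossEntropyLoss()

def training_step(model, optimizer, batch):
    """One weighted multi-task training step with back-propagation."""
    optimizer.zero_grad()
    outputs = model(batch["image"])            # assumed: dict of per-task logits
    total_loss = torch.zeros(())
    for task, weight in loss_weights.items():
        total_loss = total_loss + weight * criterion(outputs[task], batch[task])
    total_loss.backward()                      # back-propagation of the weighted loss
    optimizer.step()                           # update the primary network's parameters
    return total_loss.item()
```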
It may be understood that in the embodiment of the present application, the primary network frame may adopt an ibn-resnet network structure, with a feature attention mechanism unit (SE unit, SE being short for Squeeze-and-Excitation Networks) added to the basic ibn-resnet framework. Referring to fig. 3, the feature attention mechanism unit explicitly models the relationship between the channels of the input image data: the weight of each channel is learned through Global Average Pooling followed by two FC layers, and each channel of the original image data is re-weighted accordingly. This enhances the more informative feature channels and suppresses the secondary channels, and allows feature information of different scales to be used simultaneously, which benefits the classification of targets of different scales.
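A minimal SE unit of the kind described above can be sketched as follows; the reduction ratio of 16 is a commonly used default assumed here, not a value taken from the patent.

```python
import torch
import torch.nn as nn

class SEUnit(nn.Module):
    """Squeeze-and-Excitation: learn a weight per channel and re-weight the input."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # global average pooling ("squeeze")
        self.fc = nn.Sequential(                   # two FC layers ("excitation")
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * weights                         # re-weight each channel
```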
It is understood that in the step 140, the second data set is obtained by:
step 143, obtaining an initial data set output by the trained primary network frame and a plurality of preset classification rules;
step 144, classifying the initial data set according to the plurality of preset classification rules;
Step 145, obtaining the second data set if the initial data set meets at least one of the preset classification rules.
It can be appreciated that the second data set may be multi-task data that requires feature extraction through the RPN candidate frame network structure, for example for target detection, key point detection and classification detection, and that must be input to the secondary network frame for prediction. The initial data set comprises the first data set and the second data set, and the preset classification rules include lane line classification rules, obstacle (including people, machine tools, sundries and the like) detection frame classification rules, human body key point detection rules and workshop production state classification rules. When an item of image data in the initial data set matches at least one of the preset classification rules, it is classified into the second data set, until all image data in the initial data set have been classified, giving the final second data set.
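The rule-based routing of steps 143 to 145 could be sketched as follows; the rule predicates and sample fields are assumptions.

```python
from typing import Callable, Iterable, List

Rule = Callable[[dict], bool]   # one preset classification rule over a sample

def build_second_dataset(initial_dataset: Iterable[dict], rules: List[Rule]) -> List[dict]:
    """Route a sample into the second data set if it matches at least one rule."""
    second = []
    for sample in initial_dataset:
        if any(rule(sample) for rule in rules):
            second.append(sample)
    return second

# Illustrative rules corresponding to the rule families named above (assumed fields).
rules = [
    lambda s: s.get("task") == "obstacle_detection",
    lambda s: s.get("task") == "human_keypoints",
    lambda s: s.get("task") == "workshop_state",
]
```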
Step 150, inputting the second data set into a trained secondary network framework to obtain a third data set;
it can be appreciated that the step 150, the secondary network framework, is trained by the following steps:
Step 151, obtaining a first positive sample, wherein the first positive sample is used for representing a sample data set obtained after the labeling data set is input to an initialized primary network frame for processing;
step 152, slicing and feature extraction are performed on the first data set according to the first positive sample, so as to obtain a feature sample set;
step 153, inputting the characteristic sample set and the labeling data set into an initialized secondary network frame for training to obtain the trained secondary network frame.
It can be understood that the first positive sample is a sample data set obtained by inference with the initialized primary network frame model; the first data set is then sliced according to the first positive sample to obtain the RPN candidate frames of the first data set, the feature data in the first data set are extracted according to the RPN candidate frames to obtain a feature sample set, and the secondary network frame is trained with the obtained feature sample set and the labeling data set to obtain the trained secondary network frame.
It can be understood that, in the embodiment of the present application, referring to fig. 4, the first positive sample is the image data of the Feature map layer in fig. 6; when the secondary network frame is trained, the RPN candidate frame module and the RoI Align module are used to perform slice-aligned feature extraction on the first positive sample and the first data set, so that the training speed and accuracy of the secondary network frame are higher.
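For illustration, the slice-aligned feature extraction can be sketched with torchvision's RoI Align; the feature-map size, stride and box coordinates below are assumptions.

```python
import torch
from torchvision.ops import roi_align

# Assumed stride-16 feature map for a 3 x 960 x 544 input image.
feature_map = torch.randn(1, 256, 34, 60)          # (N, C, H, W)
# Candidate boxes in (batch_index, x1, y1, x2, y2) image coordinates (illustrative).
boxes = torch.tensor([[0.0, 100.0, 80.0, 260.0, 240.0],
                      [0.0, 300.0, 120.0, 420.0, 300.0]])
# spatial_scale maps image coordinates onto the feature map (1/16 for stride 16).
patches = roi_align(feature_map, boxes, output_size=(7, 7), spatial_scale=1 / 16)
print(patches.shape)                               # torch.Size([2, 256, 7, 7])
```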
It will be appreciated that in step 150, the secondary network framework includes: a feature fusion module;
the feature fusion module is used for realizing the following steps:
step 154, obtaining a second data set output by the primary network framework;
step 155, performing convolution fusion on the plurality of feature image data in the second data set according to the feature fusion module to obtain a plurality of feature fusion data;
wherein the widths and heights of the characteristic image data differ from one another; the feature fusion data have the same number of channels but different widths and heights.
It will be appreciated that certain characteristic image data of the second data set may first be downsampled by different factors, specifically 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 and so on; the multiple feature image data produced by the different downsampling factors, whose widths and heights differ, are then output and input to the feature fusion module. After the feature fusion module convolves and fuses the multiple feature image data, the number of channels of the feature fusion data is adjusted so that the resulting feature fusion data have the same number of channels but different widths and heights.
It may be understood that, in this embodiment of the present application, the feature fusion module may be a biFPN network module. Specifically, as shown in fig. 5 and fig. 6, the second data set may be obtained from the ibn-resnet network structure in the primary network frame and processed by downsampling layers with six different sampling factors, whose outputs are denoted the P1, P2, P3, P4, P5 and P6 layers, collectively called the P series layers; the height of the P1 layer is twice the height of the P2 layer, its width is twice the width of the P2 layer, and the same relation holds between the successive P2, P3, P4, P5 and P6 layers, which is not repeated here. After the P1 to P6 layers are convolved and fused by the biFPN network module, the corresponding B1, B2, B3, B4, B5 and B6 layers are obtained, collectively called the B series layers. Within the B series layers the numbers of channels are the same while the widths and heights differ: the B1 layer has the largest feature maps, whose features are more global and abstract and benefit the detection of large targets, while the B6 layer has the smallest feature maps, whose features are more detailed and benefit the detection of small targets.
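The following sketch stands in for the biFPN fusion with a simplified FPN-style top-down pass: each P layer is projected to a common channel count and fused with its coarser neighbour, yielding B layers with the same number of channels but different widths and heights. A real biFPN additionally performs a bottom-up pass with learned fusion weights; all layer sizes and channel counts here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFusion(nn.Module):
    """Project each P layer to a shared channel count, then fuse top-down."""
    def __init__(self, in_channels, out_channels: int = 128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)

    def forward(self, p_layers):                   # p_layers: [P1, ..., P6], finest first
        b = [proj(p) for proj, p in zip(self.proj, p_layers)]
        for i in range(len(b) - 2, -1, -1):        # add the upsampled coarser level
            b[i] = b[i] + F.interpolate(b[i + 1], size=b[i].shape[-2:], mode="nearest")
        return b                                    # same channels, different H x W per level

# Illustrative P layers: height and width halve from one level to the next.
channels = [64, 128, 256, 512, 512, 512]
p = [torch.randn(1, c, 136 // 2**i, 240 // 2**i) for i, c in enumerate(channels)]
b = SimpleFusion(channels)(p)
```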
It can be further understood that, in the embodiment of the present application, each layer in the B series is output to a specific task head layer, and the corresponding task decoding output is obtained in the corresponding decoder layer, i.e. the features required by each task are obtained respectively. In the training of the secondary network frame, referring to fig. 7 and fig. 8, auxiliary learning head layers for the different tasks can be introduced and connected to the corresponding B1, B2, B3, B4, B5 and B6 layers, so that each task can acquire the task features it needs from the primary network frame without affecting the training of the other task branches and without an additional forward inference pass, achieving rapid convergence and improved accuracy while avoiding losses.
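The per-task head layers could be sketched as follows; the head structure, channel counts and task names are assumptions.

```python
import torch.nn as nn

class TaskHeads(nn.Module):
    """One small head per task on top of a fused B layer (structure assumed)."""
    def __init__(self, channels: int = 128):
        super().__init__()
        self.heads = nn.ModuleDict({
            "lane_seg":  nn.Conv2d(channels, 2, 1),    # lane-line mask logits
            "obstacle":  nn.Conv2d(channels, 5, 1),    # per-location class logits
            "keypoints": nn.Conv2d(channels, 17, 1),   # key-point heatmaps
        })

    def forward(self, b_layer):
        # Each task decodes its own features without touching the other branches.
        return {task: head(b_layer) for task, head in self.heads.items()}
```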
Step 160, performing multi-element fusion on the first data set, the third data set and the radar interface data to obtain a fusion data set;
and step 170, updating parameters of the multi-task neural network model according to the fusion data set to obtain the trained multi-task neural network model.
It can be understood that the radar interface data enable high-precision positioning of the mobile robot and provide an area basis for judging its drivable area, and the third data set is the prediction result of the trained secondary network frame on the second data set; therefore, performing multi-element fusion on the first data set, the third data set and the radar interface data can exploit the complementarity between the data to improve the quality of the fused data set. It can further be understood that updating the parameters of the multi-task neural network model according to the fused data set is similar to the training of the primary network frame and of the secondary network frame and is not repeated here. It is worth noting that combining multiple data sources allows the complex computation of drivable-region division to be completed in a single inference pass, greatly reducing the amount of code and the complexity of the prediction logic; in cooperation with the primary network frame, the secondary network frame and so on, the multi-task neural network model is easy to train and each task can achieve a high accuracy.
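The multi-element fusion of step 160 might, at the data level, be sketched as follows; the field names and record structure are assumptions.

```python
def fuse(first_dataset, third_dataset, radar_frames):
    """Merge per-frame outputs from both network stages with radar interface data."""
    fused = []
    for direct, refined, radar in zip(first_dataset, third_dataset, radar_frames):
        fused.append({
            "lane_segmentation": direct["mask"],     # predicted directly by the primary network
            "detections": refined["boxes"],          # refined by the secondary network
            "robot_pose": radar["pose"],             # high-precision localisation from radar
            "obstacle_ranges": radar["ranges"],
        })
    return fused
```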
The following describes in detail a region detection system of a mobile robot according to an embodiment of the present application with reference to the accompanying drawings.
Referring to fig. 9, a region detection system of a mobile robot according to an embodiment of the present application includes:
an acquisition module 101 for acquiring image data to be detected from a mobile robot;
the recognition analysis module 102 is configured to perform recognition analysis on the image data through a trained multi-task neural network model to obtain a target drivable area of the mobile robot, where the multi-task neural network model includes a primary network frame and a secondary network frame;
the multi-task neural network model is obtained through training the following steps:
acquiring a multi-category data set and radar interface data, and labeling the multi-category data set to obtain a labeled data set;
inputting the annotation data set into an initialized primary network frame for training, to obtain a trained primary network frame and a first data set and a second data set which are output by the trained primary network frame, wherein the first data set is a characteristic image data set which does not require candidate frame extraction, and the second data set is a characteristic image data set which requires candidate frame extraction;
Inputting the second data set into a trained secondary network framework to obtain a third data set;
performing multi-element fusion on the first data set, the third data set and the radar interface data to obtain a fusion data set;
and updating parameters of the multi-task neural network model according to the fusion data set to obtain the trained multi-task neural network model.
In some embodiments, the secondary network framework is trained by:
acquiring a first positive sample, wherein the first positive sample is used for representing a sample data set obtained after the labeling data set is input into an initialized primary network framework for processing;
slicing and feature extraction are carried out on the first data set according to the first positive sample, so that a feature sample set is obtained;
and inputting the characteristic sample set and the labeling data set into an initialized secondary network frame for training to obtain the trained secondary network frame.
It can be understood that the content in the above method embodiment is applicable to the system embodiment, and the functions specifically implemented by the system embodiment are the same as those of the above method embodiment, and the achieved beneficial effects are the same as those of the above method embodiment.
Referring to fig. 10, an embodiment of the present application further provides a region detection apparatus of a mobile robot, including:
at least one processor 201;
at least one memory 202 for storing at least one program;
the at least one program, when executed by the at least one processor 201, causes the at least one processor 201 to implement the above-described region detection method embodiment of the mobile robot.
Similarly, it can be understood that the content in the above method embodiment is applicable to the embodiment of the present apparatus, and the functions specifically implemented by the embodiment of the present apparatus are the same as those of the embodiment of the foregoing method, and the achieved beneficial effects are the same as those achieved by the embodiment of the foregoing method.
The embodiment of the present application also provides a computer readable storage medium, in which a program executable by the processor 201 is stored, where the program executable by the processor 201 is used to implement the above-mentioned embodiment of the area detection method of the mobile robot when executed by the processor 201.
Similarly, the content in the above method embodiment is applicable to the present computer-readable storage medium embodiment, and the functions specifically implemented by the present computer-readable storage medium embodiment are the same as those of the above method embodiment, and the beneficial effects achieved by the above method embodiment are the same as those achieved by the above method embodiment.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of this application are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the present application is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the functions and/or features may be integrated in a single physical device and/or software module or one or more of the functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present application. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Thus, those of ordinary skill in the art will be able to implement the present application as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the application, which is to be defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art or in part, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
In the foregoing description of the present specification, descriptions of the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like, are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present application have been described in detail, the present application is not limited to the embodiments, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (10)

1. A region detection method of a mobile robot, comprising:
acquiring image data to be detected from a mobile robot;
identifying and analyzing the image data through a trained multi-task neural network model to obtain a target drivable area of the mobile robot, wherein the multi-task neural network model comprises a primary network frame and a secondary network frame;
the multi-task neural network model is obtained through training the following steps:
acquiring a multi-category data set and radar interface data, and labeling the multi-category data set to obtain a labeled data set;
inputting the annotation data set into an initialized primary network frame for training to obtain a trained primary network frame, and a first data set and a second data set which are output by the trained primary network frame, wherein the first data set is a characteristic image data set which does not require candidate frame extraction, and the second data set is a characteristic image data set which requires candidate frame extraction;
inputting the second data set into a trained secondary network framework to obtain a third data set;
performing multi-element fusion on the first data set, the third data set and the radar interface data to obtain a fusion data set;
And updating parameters of the multi-task neural network model according to the fusion data set to obtain the trained multi-task neural network model.
2. The method for detecting the area of the mobile robot according to claim 1, wherein the identifying and analyzing the image data through the trained multi-task neural network model to obtain the target drivable area of the mobile robot comprises:
identifying the image data through the trained multi-task neural network model to obtain an initial drivable area of the mobile robot;
acquiring a preset judging rule, wherein the judging rule is used for representing judging logic corresponding to each task in the initial drivable area;
and analyzing the initial drivable region according to the judging rule and the calibration information to obtain the target drivable region.
3. The area detection method of a mobile robot according to claim 1, wherein the secondary network frame is trained by:
acquiring a first positive sample, wherein the first positive sample is used for representing a sample data set obtained after the labeling data set is input into an initialized primary network framework for processing;
Slicing and feature extraction are carried out on the first data set according to the first positive sample, so that a feature sample set is obtained;
and inputting the characteristic sample set and the labeling data set into an initialized secondary network frame for training to obtain the trained secondary network frame.
4. A method of area detection for a mobile robot according to claim 3, wherein the inputting the annotation dataset into an initialized primary network framework for training comprises:
preprocessing the marked data set to obtain a processed data set;
calculating the loss weight of the processing data set input to the initialized primary network framework; and updating parameters of the initialized primary network framework according to the loss weight.
5. The area detection method of a mobile robot according to claim 1, wherein the secondary network frame comprises: a feature fusion module;
the feature fusion module is used for realizing the following steps:
acquiring a second data set output by the primary network framework;
performing convolution fusion on a plurality of feature image data in the second data set according to the feature fusion module to obtain a plurality of feature fusion data;
wherein the widths and heights of the characteristic image data differ from one another; the feature fusion data have the same number of channels but different widths and heights.
6. The area detection method of a mobile robot according to claim 1, wherein the second data set is obtained by:
acquiring an initial data set output by the trained primary network framework and a plurality of preset classification rules;
classifying the initial data set according to the plurality of preset classification rules;
and if the initial data set meets at least one of the plurality of preset classification rules, obtaining the second data set.
7. A region detection system of a mobile robot, comprising:
an acquisition module, configured to acquire image data to be detected from the mobile robot;
a recognition and analysis module, configured to perform recognition and analysis on the image data through a trained multi-task neural network model to obtain a target drivable area of the mobile robot, wherein the multi-task neural network model comprises a primary network framework and a secondary network framework;
wherein the multi-task neural network model is trained through the following steps:
acquiring a multi-category data set and radar interface data, and annotating the multi-category data set to obtain an annotation data set;
inputting the annotation data set into an initialized primary network framework for training to obtain a trained primary network framework, and a first data set and a second data set output by the trained primary network framework, wherein the first data set is a feature image data set from which candidate boxes do not need to be extracted, and the second data set is a feature image data set from which candidate boxes need to be extracted;
inputting the second data set into a trained secondary network framework to obtain a third data set;
performing multi-element fusion on the first data set, the third data set and the radar interface data to obtain a fusion data set;
and updating parameters of the multi-task neural network model according to the fusion data set to obtain the trained multi-task neural network model.
8. The region detection system of a mobile robot according to claim 7, wherein the secondary network framework is trained through the following steps:
acquiring a first positive sample, wherein the first positive sample represents a sample data set obtained after the annotation data set is input into an initialized primary network framework for processing;
performing slicing and feature extraction on the first data set according to the first positive sample to obtain a feature sample set;
and inputting the feature sample set and the annotation data set into an initialized secondary network framework for training to obtain the trained secondary network framework.
9. A region detection device of a mobile robot, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the region detection method of a mobile robot as claimed in any one of claims 1-6.
10. A computer-readable storage medium in which a processor-executable program is stored, characterized in that the processor-executable program, when executed by a processor, implements the region detection method of a mobile robot according to any one of claims 1-6.
CN202310297047.8A 2023-03-23 2023-03-23 Mobile robot region detection method, system, device and medium Active CN116385949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310297047.8A CN116385949B (en) 2023-03-23 2023-03-23 Mobile robot region detection method, system, device and medium

Publications (2)

Publication Number Publication Date
CN116385949A (en) 2023-07-04
CN116385949B (en) 2023-09-08

Family

ID=86960833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310297047.8A Active CN116385949B (en) 2023-03-23 2023-03-23 Mobile robot region detection method, system, device and medium

Country Status (1)

Country Link
CN (1) CN116385949B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543600A (en) * 2018-11-21 2019-03-29 成都信息工程大学 A kind of realization drivable region detection method and system and application
CN110298262A (en) * 2019-06-06 2019-10-01 华为技术有限公司 Object identification method and device
WO2021226921A1 (en) * 2020-05-14 2021-11-18 Harman International Industries, Incorporated Method and system of data processing for autonomous driving
CN112418236A (en) * 2020-11-24 2021-02-26 重庆邮电大学 Automobile drivable area planning method based on multitask neural network
CN113343875A (en) * 2021-06-18 2021-09-03 深圳亿嘉和科技研发有限公司 Driving region sensing method for robot
US20230069215A1 (en) * 2021-08-27 2023-03-02 Motional Ad Llc Navigation with Drivable Area Detection
CN114677446A (en) * 2022-03-21 2022-06-28 华南理工大学 Vehicle detection method, device and medium based on roadside multi-sensor fusion
CN114816719A (en) * 2022-06-23 2022-07-29 小米汽车科技有限公司 Training method and device of multi-task model
CN115482518A (en) * 2022-09-26 2022-12-16 大连理工大学 Extensible multitask visual perception method for traffic scene

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FUWU YAN et al.: "LiDAR-Based Multi-Task Road Perception Network for Autonomous Vehicles", IEEE Access, vol. 8, pages 86753-86764, XP011789424, DOI: 10.1109/ACCESS.2020.2993578 *
MINGXING TAN et al.: "EfficientDet: Scalable and Efficient Object Detection", arXiv, pages 1-10 *
张凯祥 et al.: "Multi-task autonomous driving environment perception algorithm based on YOLOv5", Computer Systems & Applications, vol. 31, no. 9, pages 226-232 *
张迪: "Design of visual perception algorithm for autonomous driving based on monocular vision", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 03, pages 035-377 *

Also Published As

Publication number Publication date
CN116385949B (en) 2023-09-08

Similar Documents

Publication Publication Date Title
Kim et al. Surface crack detection using deep learning with shallow CNN architecture for enhanced computation
Mery Aluminum casting inspection using deep learning: a method based on convolutional neural networks
CN108171112B (en) Vehicle identification and tracking method based on convolutional neural network
Doshi et al. Road damage detection using deep ensemble learning
CN109800658B (en) Parking space type online identification and positioning system and method based on neural network
KR102166458B1 (en) Defect inspection method and apparatus using image segmentation based on artificial neural network
Li et al. Automatic recognition and analysis system of asphalt pavement cracks using interleaved low-rank group convolution hybrid deep network and SegNet fusing dense condition random field
CN114207541B (en) Trajectory prediction
Li et al. Automatic bridge crack identification from concrete surface using ResNeXt with postprocessing
CN114723709A (en) Tunnel disease detection method and device and electronic equipment
CN111738114A (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN113111875A (en) Seamless steel rail weld defect identification device and method based on deep learning
CN116626177A (en) Rail damage identification method and device
Anwar et al. YOLOv4 based deep learning algorithm for defects detection and classification of rail surfaces
Hoang et al. Optimizing YOLO performance for traffic light detection and end-to-end steering control for autonomous vehicles in Gazebo-ROS2
CN116385949B (en) Mobile robot region detection method, system, device and medium
Liu Learning-based defect recognitions for autonomous UAV inspections
Sun et al. Cross validation for CNN based affordance learning and control for autonomous driving
CN114092817B (en) Target detection method, storage medium, electronic device, and target detection apparatus
CN115620006A (en) Cargo impurity content detection method, system and device and storage medium
Kirthiga et al. A survey on crack detection in concrete surface using image processing and machine learning
Khidhir et al. Comparative Transfer Learning Models for End-to-End Self-Driving Car
Sheta et al. Autonomous robot system for pavement crack inspection based cnn model
Khrueakhrai et al. Railway track detection based on SegNet deep learning
Tang et al. Artificial intelligence approach for aerospace defect detection using single-shot multibox detector network in phased array ultrasonic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant