CN116681123B - Perception model training method, device, computer equipment and storage medium - Google Patents

Perception model training method, device, computer equipment and storage medium

Info

Publication number
CN116681123B
Authority
CN
China
Prior art keywords
pseudo
data
model
training
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310950761.2A
Other languages
Chinese (zh)
Other versions
CN116681123A (en)
Inventor
洪伟
李帅君
朱子凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foss Hangzhou Intelligent Technology Co Ltd
Original Assignee
Foss Hangzhou Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foss Hangzhou Intelligent Technology Co Ltd filed Critical Foss Hangzhou Intelligent Technology Co Ltd
Priority to CN202310950761.2A
Publication of CN116681123A
Application granted
Publication of CN116681123B
Legal status: Active


Classifications

    • B60W 60/001 Drive control systems specially adapted for autonomous road vehicles; planning or execution of driving tasks
    • B60W 50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • G06N 3/045 Neural network architectures; combinations of networks
    • G06N 3/096 Neural network learning methods; transfer learning
    • G06N 3/098 Neural network learning methods; distributed learning, e.g. federated learning
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 10/95 Hardware or software architectures specially adapted for image or video understanding, structured as a network, e.g. client-server architectures
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/588 Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
    • H04L 67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • B60W 2050/0028 Control system elements or transfer functions; mathematical models, e.g. for simulation
    • Y02T 10/40 Engine management systems (climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a perception model training method and apparatus, a computer device, and a storage medium. The method comprises the following steps: obtaining unlabeled data determined based on a terminal model; inputting the unlabeled data into a plurality of cloud models to generate first pseudo-labeled data, and self-training the cloud models based on the first pseudo-labeled data; inputting the unlabeled data into the terminal model to generate second pseudo-labeled data; and updating the terminal model through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain an updated terminal model. By adopting the method, the training efficiency of the perception model can be improved.

Description

Perception model training method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of intelligent perception technology, and in particular to a perception model training method, apparatus, computer device, and storage medium.
Background
Most current perception strategies in the autonomous driving field are developed and iterated on deep learning algorithms, whose effectiveness depends on the size of the model and on the quantity and quality of the labeled data. Mainly for cost reasons, the perception model actually deployed at the vehicle end is far smaller than the cloud perception model, so the cloud large model performs far better than the vehicle-end small model.
To improve the vehicle-end small model, the mainstream optimization direction is knowledge distillation, whose main idea is to transfer information from the cloud large model to the vehicle-end small model. For example, the intermediate features of the cloud large model, or its generated results, are used as pseudo labels to train the vehicle-end small model and improve its perception performance.
However, AI model training in such a separated data closed-loop system optimizes only the vehicle-end model, so the overall model training process is inefficient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a perception model training method, apparatus, computer device, and storage medium that can improve the efficiency of perception model training.
In a first aspect, the present application provides a method for training a perception model. The method comprises the following steps:
obtaining unlabeled data determined based on a terminal model;
inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the cloud models based on the first pseudo-labeled data;
inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
and updating the terminal model through distillation training according to the first pseudo labeling data and the second pseudo labeling data to obtain a terminal updating model.
In one embodiment, the unlabeled data comprises a plurality of consecutive frames of unlabeled data.
In one embodiment, the inputting the unlabeled data into the plurality of cloud models, generating the first pseudo-labeled data includes: identifying the unlabeled data through a plurality of cloud models to generate a plurality of first pseudo tags and a plurality of first intermediate features; tracking a plurality of first pseudo tags in continuous multi-frame unlabeled data through a target tracking algorithm to generate first tracking pseudo tags; performing association processing on the plurality of first pseudo tags and the plurality of first intermediate features according to the time sequence of the first tracking pseudo tags and the continuous multi-frame unlabeled data to generate first continuous frame pseudo tags and first continuous frame intermediate features; and taking the first continuous frame pseudo tag and the first continuous frame intermediate feature as first pseudo labeling data.
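By way of non-limiting illustration, the following Python sketch outlines this first pipeline: each cloud model infers per-frame pseudo labels and intermediate features, a multi-target tracker assigns persistent IDs, and labels and features are then associated in frame order. All names here (FrameResult, infer, associate) are assumptions of the sketch rather than APIs defined by the embodiment.

```python
from dataclasses import dataclass, field

@dataclass
class FrameResult:
    pseudo_labels: list      # per-object pseudo labels predicted for one frame
    features: object         # intermediate (backbone) feature map for the frame
    track_ids: list = field(default_factory=list)

def generate_first_pseudo_annotations(cloud_models, frames, tracker):
    """Run every cloud model over the unlabeled continuous frames, track the
    per-frame pseudo labels, and associate labels and features in frame order."""
    first_pseudo_data = []
    for model in cloud_models:                       # the plurality of cloud models
        results = [FrameResult(*model.infer(f)) for f in frames]
        for prev, cur in zip(results, results[1:]):  # tracking pseudo labels
            cur.track_ids = tracker.associate(prev, cur)
        con_pseudo_labels = [(r.track_ids, r.pseudo_labels) for r in results]
        con_features = [r.features for r in results]
        first_pseudo_data.append((con_pseudo_labels, con_features))
    return first_pseudo_data  # continuous-frame pseudo labels + intermediate features
```

The same skeleton applies to the terminal model in the next embodiment, with a single model in place of the list of cloud models.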
In one embodiment, the inputting the unlabeled data into the terminal model, generating the second pseudo-labeled data includes: identifying the unlabeled data through the terminal model to generate a plurality of second pseudo tags and a plurality of second intermediate features; tracking a plurality of second pseudo tags in continuous multi-frame unlabeled data through a target tracking algorithm to generate second tracking pseudo tags; performing association processing on the plurality of second pseudo tags and the plurality of second intermediate features according to the time sequence of the second tracking pseudo tags and the continuous multi-frame unlabeled data to generate second continuous frame pseudo tags and second continuous frame intermediate features; and taking the second continuous frame pseudo tag and the second continuous frame intermediate feature as second pseudo labeling data.
In one embodiment, updating the terminal model through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain the updated terminal model includes: determining difficult-case pseudo-label data according to the first pseudo-labeled data and the second pseudo-labeled data; and updating the terminal model through distillation training according to the difficult-case pseudo-label data to obtain the updated terminal model.
In one embodiment, determining the difficult-case pseudo-label data according to the first pseudo-labeled data and the second pseudo-labeled data includes: calculating a Euclidean distance and/or a loss value between the first pseudo-labeled data and the second pseudo-labeled data; and taking the first pseudo-labeled data whose Euclidean distance and/or loss value exceeds a preset threshold as the difficult-case pseudo-label data.
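A minimal sketch of this threshold-based difficult-case selection, assuming the paired pseudo annotations have been vectorized to equal-shape arrays; the optional loss_fn stands in for a detection-head loss such as CenterHeadLoss, and both it and the data layout are assumptions of the sketch:

```python
import numpy as np

def mine_hard_examples(first_data, second_data, dist_thresh,
                       loss_fn=None, loss_thresh=None):
    """Select difficult-case pseudo-label data: entries where the cloud (first)
    and terminal (second) pseudo annotations disagree beyond a preset threshold."""
    hard = []
    for big, small in zip(first_data, second_data):
        dist = float(np.linalg.norm(np.asarray(big) - np.asarray(small)))
        over = dist > dist_thresh
        if loss_fn is not None and loss_thresh is not None:
            over = over or loss_fn(big, small) > loss_thresh
        if over:
            hard.append(big)  # the first pseudo-labeled entry becomes hard-case data
    return hard
```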
In one embodiment, updating the terminal model through distillation training according to the difficult-case pseudo-label data to obtain the updated terminal model includes: performing center-point training with the second continuous-frame pseudo label as the label of the terminal model's detection head, and training the Euclidean distance between the first continuous-frame intermediate features and the terminal model's backbone network features, so as to update the terminal model and obtain the updated terminal model.
In one embodiment, the distillation training comprises a single-frame distillation training and a multi-frame time sequence distillation training, wherein the single-frame distillation training is performed based on intermediate features or pseudo tags of a cloud perception model; and the multi-frame time sequence distillation training is performed based on multi-frame results or pseudo labels of the cloud perception model or the cloud pre-labeling model.
In one embodiment, self-training the cloud models based on the first pseudo-labeled data includes: determining difference values among the plurality of first pseudo labels in the first pseudo-labeled data; removing data from the first pseudo-labeled data based on the difference values to obtain filtered first pseudo-labeled data; obtaining pre-labeled data; and self-training the cloud models according to the filtered first pseudo-labeled data and the pre-labeled data.
In a second aspect, the application further provides a device for training the perception model. The device comprises:
the acquisition module is used for acquiring unlabeled data determined based on the terminal model;
the self-training module is used for inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data and self-training the cloud models based on the first pseudo-labeled data;
the calculation module is used for inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
and the distillation training module is used for updating the terminal model through distillation training according to the first pseudo-annotation data and the second pseudo-annotation data to obtain a terminal updating model.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
obtaining unlabeled data determined based on a terminal model;
inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the cloud models based on the first pseudo-labeled data;
inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
and updating the terminal model through distillation training according to the first pseudo labeling data and the second pseudo labeling data to obtain a terminal updating model.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
obtaining unlabeled data determined based on a terminal model;
inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the cloud models based on the first pseudo-labeled data;
inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
and updating the terminal model through distillation training according to the first pseudo labeling data and the second pseudo labeling data to obtain a terminal updating model.
Through the above perception model training method, apparatus, computer device, storage medium, and computer program product, unlabeled data determined based on a terminal model is obtained; the unlabeled data is input into a plurality of cloud models to generate first pseudo-labeled data, and the cloud models are self-trained based on the first pseudo-labeled data; the unlabeled data is input into the terminal model to generate second pseudo-labeled data; and the terminal model is updated through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain an updated terminal model. This solves the problem of inefficient perception model training and achieves the technical effect of improving perception model training efficiency.
Drawings
FIG. 1 is a diagram of an application environment for a perception model training method in one embodiment;
FIG. 2 is a flow chart of a method of training a perception model in one embodiment;
FIG. 3 is a schematic diagram of cloud end model self-training in one embodiment;
FIG. 4 is a schematic diagram of a method of training a perception model in a preferred embodiment;
FIG. 5 is a diagram of a vehicle end perception model optimization architecture in one embodiment;
FIG. 6 is a schematic diagram of a cloud multi-model optimization method for a 2D vehicle-end perception minimodel according to another embodiment;
FIG. 7 is a block diagram of a perception model training apparatus in one embodiment;
fig. 8 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Intelligent driving refers to technology in which a robot assists driving and, under special conditions, completely replaces human driving; it is essentially a cognitive engineering problem of attention attraction and distraction. Intelligent driving uses a computer system to reach a state in which automatic driving is possible with little manual intervention. As an important product of the industrial revolution and of informatization, intelligent driving is an important component of the strategic emerging industries and an important branch of today's artificial intelligence era. Target perception technology is an important component of intelligent driving and the premise and foundation of intelligent control. In the related art, perception models are mostly trained by combining intelligent algorithms with deep learning; however, limited by the data processing capability of the vehicle end, a mode separating cloud model training from vehicle-end model training is often adopted, with the vehicle-end model obtained from the cloud model through transfer learning. This separated training mode leads to slow model optimization iteration and low model training efficiency. Accordingly, the embodiments of the present application provide a perception model training method to improve the training efficiency of the perception model.
The perception model training method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process, and may be integrated on the server 104 or placed on a cloud or other network server. The terminal 102 may be a vehicle-mounted computer, an in-vehicle head unit, or an Internet of Things device such as a smart speaker, smart television, smart air conditioner, or smart vehicle-mounted device. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a method for training a perception model is provided, and the method is applied to the terminal in fig. 1 for illustration, and includes the following steps:
step S201, unlabeled data determined based on a terminal model is obtained.
Specifically, the terminal model refers to a perception model deployed at the vehicle end, for example a perception model arranged on a vehicle. The model can be obtained by training a deep learning model on finely labeled data, that is, image data labeled manually or by high-precision machine labeling. The unlabeled data carries no labels and can be collected by vehicle-end sensors such as lidar and vehicle-mounted cameras.
Step S202, inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the cloud models based on the first pseudo-labeled data.
Specifically, the cloud model is a perception model trained on a large training set of precisely labeled data, such as a target detection model or a lane line perception model; preferably, on this basis, the perception algorithm can be combined with other perception models such as a Siamese-RPN tracking network or a weather recognition network to obtain the cloud model. After the unlabeled data is input into a cloud model, the cloud model performs inference, recognition, and labeling on it to generate the first pseudo-labeled data. Since the labeling accuracy depends on the accuracy of the cloud model, mislabeling may occur, hence the name pseudo-labeled data. After the pseudo-labeled data is obtained, the cloud model can undergo semi-supervised training combining other pre-labeled and unlabeled data, which improves the cloud model's update-iteration efficiency and recognition accuracy. The related art mostly adopts a separated data closed-loop system in which only the vehicle-end model is optimized during AI model training, so iteration and evolution are very inefficient; in this embodiment, self-training the cloud model realizes a tightly coupled data closed-loop system. The cloud model continuously evolves and iterates on the data screened by the vehicle-end small model, continuously improving labeling accuracy and dynamically supporting the needs of the training service.
And step S203, inputting the unlabeled data into a terminal model to generate second pseudo-labeled data.
And step S204, updating the terminal model through distillation training according to the first pseudo-annotation data and the second pseudo-annotation data to obtain a terminal updating model.
Specifically, distillation training is also known as knowledge distillation. Because of the terminal's limited computing capacity, the terminal model often adopts a lightweight network; yet in intelligent driving scenarios the terminal model needs extremely high precision to ensure driving safety. Knowledge therefore needs to be extracted from the large model and transferred to the small model, so that the small model can approach the effect of the large model; this is the purpose of distillation training. After the first pseudo-labeled data and the second pseudo-labeled data of the terminal model are obtained, the evolution and updating of the terminal model are realized through distillation training.
In the above perception model training method, the terminal model is not trained on the data directly; instead, the cloud models are optimized first, and the performance of the terminal model is then jointly optimized through distillation training, which improves the quality and efficiency of terminal model training.
In one embodiment, the unlabeled data comprises a plurality of consecutive frames of unlabeled data.
Specifically, the conventional knowledge distillation method does not consider continuous-frame characteristics, so the labeling results suffer from low accuracy and poor stability. In this embodiment, the unlabeled data is taken as continuous multi-frame unlabeled data, so that continuous-frame context in perception information such as camera data and radar data can be extracted and abnormal labeling results identified by data analysis, further improving the accuracy and stability of the labeling results. Compared with conventional knowledge distillation based on single-frame comparison, this perception model training method, designed around continuous-frame features, can correct pseudo labels whose classes jump unstably across the cloud model's continuous-frame results, preventing inaccurate single-frame cloud detections from limiting the precision gains of the terminal model; and intermediate knowledge distillation training on continuous-frame features improves the stability of the feature distribution, so the performance of the terminal model can be improved more efficiently and in a more targeted way.
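By way of non-limiting illustration, the following Python sketch shows one way such continuous-frame class-jump correction could be implemented: the per-track classes produced by the tracking step are replaced by each track's majority class, suppressing single-frame jumps. The function and data layout are assumptions of the sketch, not a fixed part of the embodiment.

```python
from collections import Counter

def smooth_track_classes(track_labels):
    """track_labels: {track_id: [predicted class for each frame]}.
    Replace each track's per-frame classes with its majority class,
    so one-frame class jumps in the cloud pseudo labels are corrected."""
    smoothed = {}
    for tid, classes in track_labels.items():
        majority, _ = Counter(classes).most_common(1)[0]
        smoothed[tid] = [majority] * len(classes)
    return smoothed

# e.g. smooth_track_classes({7: ["car", "car", "truck", "car"]})
# -> {7: ["car", "car", "car", "car"]}
```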
In one embodiment, the inputting the unlabeled data into the plurality of cloud models, generating the first pseudo-labeled data includes: identifying the unlabeled data through a plurality of cloud models to generate a plurality of first pseudo tags and a plurality of first intermediate features; tracking a plurality of first pseudo tags in continuous multi-frame unlabeled data through a target tracking algorithm to generate first tracking pseudo tags; performing association processing on the plurality of first pseudo tags and the plurality of first intermediate features according to the time sequence of the first tracking pseudo tags and the continuous multi-frame unlabeled data to generate first continuous frame pseudo tags and first continuous frame intermediate features; and taking the first continuous frame pseudo tag and the first continuous frame intermediate feature as first pseudo labeling data.
Specifically, the cloud models comprise cloud perception large models and cloud pre-labeling large models. The cloud perception large models mainly include a target detection model, a lane line detection model, and the like; the cloud pre-labeling large models add, on the basis of the cloud perception large models, classification and tracking networks such as a Siamese-RPN tracking network, weather recognition, and scene recognition. N cloud large models are trained using finely labeled continuous-frame data, denoted BigModels = {BM_1, BM_2, ..., BM_N}; a vehicle-end small model, i.e., the terminal model, is trained using the same finely labeled continuous-frame data, denoted SmallModel. The cloud large models BigModels run inference on unlabeled continuous-frame data to generate the first pseudo labels PseLabels = {PseLabel_1, PseLabel_2, ..., PseLabel_N} and the first intermediate features Bigfeatures = {Bigfeature_1, Bigfeature_2, ..., Bigfeature_N}. Using the SimpleTrack multi-target tracking method, first tracking pseudo labels TrackPseLabels = {TrackPseLabel_1, TrackPseLabel_2, ..., TrackPseLabel_N} are generated from PseLabels, and a second tracking pseudo label TrackSmallPseLabel is generated from the terminal model's second pseudo label SmallPseLabel. According to TrackPseLabels, PseLabels and Bigfeatures are associated by frame order and by the correspondence of each target between preceding and following frames, generating the first continuous-frame pseudo labels conPseLabels = {conPseLabel_1, conPseLabel_2, ..., conPseLabel_N} and the first continuous-frame intermediate features conBigfeatures = {conBigfeature_1, conBigfeature_2, ..., conBigfeature_N}. Each continuous-frame pseudo label conPseLabel is cut into combinations of n frames each, denoted conPseLabel = {{frame_1_conPseLabel, frame_2_conPseLabel, ..., frame_n_conPseLabel}, {frame_2_conPseLabel, frame_3_conPseLabel, ..., frame_{n+1}_conPseLabel}, ..., {frame_{T-n+1}_conPseLabel, ..., frame_T_conPseLabel}}; each first continuous-frame intermediate feature conBigfeature is likewise cut into combinations of n frames, denoted conBigfeature = {{frame_1_conBigfeature, frame_2_conBigfeature, ..., frame_n_conBigfeature}, {frame_2_conBigfeature, frame_3_conBigfeature, ..., frame_{n+1}_conBigfeature}, ..., {frame_{T-n+1}_conBigfeature, ..., frame_T_conBigfeature}}.
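The cutting of each continuous-frame sequence into overlapping n-frame combinations can be sketched as a plain sliding window; the helper name below is illustrative only:

```python
def cut_into_windows(con_items, n):
    """Cut a continuous-frame sequence of length T into overlapping windows of
    n frames: {f_1..f_n}, {f_2..f_{n+1}}, ..., {f_{T-n+1}..f_T}."""
    return [con_items[i:i + n] for i in range(len(con_items) - n + 1)]

# applied to both kinds of continuous-frame data, e.g.
# windows_labels   = cut_into_windows(con_pse_label, n)
# windows_features = cut_into_windows(con_big_feature, n)
```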
In one embodiment, the inputting the unlabeled data into the terminal model, generating the second pseudo-labeled data includes: identifying the unlabeled data through the terminal model to generate a plurality of second pseudo tags and a plurality of second intermediate features; tracking a plurality of second pseudo tags in continuous multi-frame unlabeled data through a target tracking algorithm to generate second tracking pseudo tags; performing association processing on the plurality of second pseudo tags and the plurality of second intermediate features according to the time sequence of the second tracking pseudo tags and the continuous multi-frame unlabeled data to generate second continuous frame pseudo tags and second continuous frame intermediate features; and taking the second continuous frame pseudo tag and the second continuous frame intermediate feature as second pseudo labeling data.
In one embodiment, updating the terminal model through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain the updated terminal model includes: determining difficult-case pseudo-label data according to the first pseudo-labeled data and the second pseudo-labeled data; and updating the terminal model through distillation training according to the difficult-case pseudo-label data to obtain the updated terminal model.
Specifically, the vehicle-end small model runs inference on the unlabeled data to generate target detection pseudo labels, and the tracking algorithm SimpleTrack is used to generate continuous-frame pseudo labels with tracking information. The pseudo labels generated by the vehicle-end small model are compared with the groups of pseudo labels generated by the cloud large models, for example by Euclidean distance and by the loss function CenterHeadLoss; the top-N with the largest differences are extracted, the corresponding first pseudo-labeled data and second pseudo-labeled data are determined to be difficult-case pseudo-label data, and the terminal model is updated through distillation training based on the difficult-case pseudo-label data to obtain the updated terminal model.
In one embodiment, determining the difficult-case pseudo-label data according to the first pseudo-labeled data and the second pseudo-labeled data includes: calculating a Euclidean distance and/or a loss value between the first pseudo-labeled data and the second pseudo-labeled data; and taking the first pseudo-labeled data whose Euclidean distance and/or loss value exceeds a preset threshold as the difficult-case pseudo-label data.
Specifically, n consecutive frames are randomly extracted from the unlabeled continuous-frame data, denoted frame_{t~t+n} = {frame_t, frame_{t+1}, ..., frame_{t+n}}. SmallModel runs inference on the unlabeled data frame_{t~t+n} to generate the pseudo label frame_{t~t+n}_SmallPseLabel and the intermediate feature frame_{t~t+n}_Smallfeature. Using the target tracking method, frame_{t~t+n}_conSmallPseLabel and frame_{t~t+n}_conSmallfeature are generated from frame_{t~t+n}_SmallPseLabel. The N cloud large model results corresponding to {frame_t, frame_{t+1}, ..., frame_{t+n}} are extracted from conPseLabels and conBigfeatures respectively, as frame_{t~t+n}_conPseLabels and frame_{t~t+n}_conBigfeatures. Using the mAP metric, frame_{t~t+n}_conSmallPseLabel is compared with each result in frame_{t~t+n}_conPseLabels, noted mAP_N = {mAP_1, mAP_2, ..., mAP_N}; the first pseudo-labeled data corresponding to the N largest differences are selected as frame_{t~t+n}_conPseLabel_top1~N = {frame_{t~t+n}_conPseLabel_top1, frame_{t~t+n}_conPseLabel_top2, ..., frame_{t~t+n}_conPseLabel_topN} and frame_{t~t+n}_conBigfeature_top1~N = {frame_{t~t+n}_conBigfeature_top1, frame_{t~t+n}_conBigfeature_top2, ..., frame_{t~t+n}_conBigfeature_topN}.
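A minimal sketch of this selection step, following the text's "top-N with the largest differences": map_fn is an assumed mAP implementation that scores agreement between the terminal model's window result and one cloud model's window result (higher mAP means smaller difference), and the data layout is likewise an assumption of the sketch.

```python
import numpy as np

def select_topn_by_disagreement(small_pred, cloud_preds, cloud_feats, map_fn, topn):
    """Score each cloud model's continuous-frame result against the terminal
    model's result with an mAP-style metric, then return the top-N pseudo
    labels and intermediate features with the largest difference (lowest mAP)."""
    scores = np.array([map_fn(small_pred, p) for p in cloud_preds])  # mAP_1..mAP_N
    picked = np.argsort(scores)[:topn]           # lowest agreement first
    return ([cloud_preds[i] for i in picked],    # conPseLabel_top1..topN
            [cloud_feats[i] for i in picked])    # conBigfeature_top1..topN
```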
In one embodiment, updating the terminal model through distillation training according to the difficult-case pseudo-label data to obtain the updated terminal model includes: performing center-point training with the second continuous-frame pseudo label as the label of the terminal model's detection head, and training the Euclidean distance between the first continuous-frame intermediate features and the terminal model's backbone network features, so as to update the terminal model and obtain the updated terminal model.
Specifically, the pseudo labels are extracted as labels for pseudo-label distillation training; the main methods used are Euclidean distance regression on intermediate features and pseudo-label supervision. The student network (SmallModel) runs inference on frame_{t~t+n} to generate the inference result frame_{t~t+n}_Output and the intermediate feature frame_{t~t+n}_Smallfeature. frame_{t~t+n}_conPseLabel_top1~N is used as the label of the SmallModel detection head and trained in the CenterPoint manner; frame_{t~t+n}_conBigfeature_top1~N is trained against Smallfeature, the backbone intermediate result of SmallModel, with the Euclidean distance (L2 loss).
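A minimal PyTorch sketch of one such distillation update, under the assumptions that the small model returns both its detection-head output and its backbone feature, and that head_loss_fn stands in for a CenterPoint-style head loss; the names and the alpha weighting are assumptions of the sketch, not fixed by the embodiment.

```python
import torch.nn.functional as F

def distillation_step(small_model, frames, pse_label_top, big_feature_top,
                      head_loss_fn, optimizer, alpha=1.0):
    """One distillation update: cloud continuous-frame pseudo labels supervise
    the detection head, and cloud intermediate features supervise the backbone
    feature with an L2 (MSE) loss."""
    preds, small_feature = small_model(frames)     # head output + backbone feature
    loss = head_loss_fn(preds, pse_label_top)      # pseudo-label supervision
    loss = loss + alpha * F.mse_loss(small_feature, big_feature_top)  # L2 distillation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```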
In one embodiment, the distillation training comprises a single-frame distillation training and a multi-frame time sequence distillation training, wherein the single-frame distillation training is performed based on intermediate features or pseudo tags of a cloud perception model; and the multi-frame time sequence distillation training is performed based on multi-frame results or pseudo labels of the cloud perception model or the cloud pre-labeling model.
Specifically, in the knowledge distillation process, the advantages and characteristics of each teacher network are exploited to improve the student network's performance in a targeted way. In the single-frame case, the intermediate features of a cloud large model or its pseudo labels are used; in the multi-frame case, the multi-frame results of the cloud large model or pre-labeling large model, or their pseudo labels, are used. The main methods are Euclidean distance regression on intermediate features and pseudo-label supervision. The training process is as follows: the student network runs inference on frame_{t~t+n} to generate the inference result frame_{t~t+n}_Output and the intermediate feature frame_{t~t+n}_Smallfeature. In the continuous-frame case, the continuous frames use frame_{t~t+n}_conBigfeature_top1~N and frame_{t~t+n}_conPseLabel_top1~N as the RNN_Loss labels, where RNN_Loss sums the multi-frame features and then trains with the L2 loss. In the single-frame case, the single frame uses frame_{t~t+n}_PseLabel_top1~N as the label of the SmallModel detection head, trained in the CenterPoint manner, and frame_{t~t+n}_Bigfeature_top1~N is trained against Smallfeature, the backbone intermediate result of SmallModel, with the Euclidean distance (L2 loss).
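The multi-frame RNN_Loss described above (sum the features over the frame window on both sides, then compare with the L2 loss) can be sketched as follows; equal per-frame feature shapes are an assumption of the sketch:

```python
import torch
import torch.nn.functional as F

def rnn_loss(small_feats, big_feats):
    """small_feats / big_feats: lists of per-frame feature tensors over one
    n-frame window. Sum each side over the window, then take the L2 loss."""
    small_sum = torch.stack(small_feats).sum(dim=0)
    big_sum = torch.stack(big_feats).sum(dim=0)
    return F.mse_loss(small_sum, big_sum)
```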
In one embodiment, the self-training the cloud models based on the first pseudo-annotation data includes: determining difference values among a plurality of first pseudo tags in the first pseudo tag data according to the first pseudo tag data; performing data rejection on the first pseudo labeling data based on the difference value to obtain rejected first pseudo labeling data; obtaining pre-labeling data; and performing self-training on the cloud models according to the first pseudo-annotation data and the pre-annotation data after being removed.
Specifically, as shown in FIG. 3, the main idea of pseudo-label large-model self-training is to clean the continuous-frame pseudo labels and then add them to the original data set to retrain the cloud large models, thereby improving their effectiveness. For example, the differences among the entries in conPseLabels are calculated, mainly by Euclidean distance (L2); the samples in conPseLabels with the greatest difference from the other labels are selected and removed, because samples with poor similarity to the other labels are, with high probability, mislabeled. The original data, i.e., the unlabeled data, is input; cloud large model training is carried out on it; pseudo-labeled data continues to be determined through model inference and tracking computation; continuous-frame pseudo labels are determined; and the pseudo labels with the largest distances are removed by measuring the distances between pseudo labels, forming a training closed loop.
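A minimal sketch of this cleaning step, assuming the N models' continuous-frame pseudo labels have been vectorized to equal-length arrays so that pairwise L2 distances are defined; the vectorization is an assumption of the sketch:

```python
import numpy as np

def reject_outlier_pseudo_labels(con_pse_labels, reject_k=1):
    """Compute pairwise L2 distances among the N models' pseudo labels and drop
    the reject_k labels farthest from all the others, i.e. those most likely
    to be mislabeled; the rest are kept for cloud self-training."""
    X = np.stack([np.asarray(l, dtype=float).ravel() for l in con_pse_labels])
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # N x N
    total = dists.sum(axis=1)              # each label's distance to the rest
    keep = sorted(np.argsort(total)[:len(X) - reject_k])
    return [con_pse_labels[i] for i in keep]
```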
A conventional knowledge distillation teacher-student network cannot continuously optimize the cloud large model, whereas the model training method provided by the embodiments of the present application optimizes the cloud large model continuously, improving labeling efficiency and accuracy, providing continuous-frame temporal information to the small model, and raising the optimization upper bound. Conventional teacher-student networks are mostly trained on single-frame data; the perception model training method of this embodiment corrects pseudo labels whose classes jump unstably in the cloud large model according to the continuous-frame results, preventing inaccurate single-frame cloud detections from limiting the vehicle-end small model's accuracy; and intermediate knowledge distillation training on continuous-frame features improves the stability of the feature distribution, so the vehicle-end small model's performance is improved more efficiently and in a more targeted way.
Conventional knowledge distillation teacher-student networks are mostly trained on a single cloud large model. The perception model training method of the embodiments of the present application uses multiple teacher networks to generate multiple pseudo labels, compares the intermediate features and results with those of the vehicle-end small model, and extracts the results with the greatest variability to train the vehicle-end small model; the aim is to mine the parts where the vehicle-end small model and the teacher networks differ most and to optimize them specifically, improving the model optimization effect more efficiently.
In a preferred embodiment, as shown in FIG. 4, a tightly coupled data closed-loop system is provided: the cloud perception large model and the cloud pre-labeling large model are iterated and evolved using finely labeled data and unlabeled data, and the vehicle-end small model is trained using knowledge transfer methods such as knowledge distillation, improving the vehicle-end small model's performance.
Specifically, high-value data screened by the vehicle-end perception small model can be labeled using the cloud pre-labeling large model to generate finely labeled data. High-value data refers to scene data rich in target results and to special-scene data. Scenes with many target results, for example with many pedestrians or vehicles, are scenes where the vehicle-end perception small model easily makes recognition errors; special scenes refer to rainy or night scenes and the like, where the probability of perception errors is high. This part of the data is therefore called high-value data. Pseudo-labeled data is generated by inference with the several old cloud perception large models and cloud pre-labeling large models. The cloud perception models and cloud pre-labeling models are optimized simultaneously using the pseudo-labeled data and the finely labeled data, yielding new, more accurate cloud perception and pre-labeling models. Then, in a teacher-student setup with the cloud models as the teacher network and the vehicle-end perception model as the student network, a new vehicle-end perception model is obtained from the old one by difficult-case pseudo-label mining and distillation training and is deployed at the vehicle end, realizing an efficient self-training closed loop and improving the perception performance of the cloud models.
The teacher-student network method based on difficult-case pseudo-label mining migrates the knowledge of the optimized cloud models to the vehicle-end small model, improving perception capability more efficiently exactly where the vehicle-end perception small model underperforms.
As shown in FIG. 5, the cloud large models and the knowledge distillation process jointly optimize the student vehicle-end perception small model through multiple teacher networks, specifically including: continuous-frame pseudo-label generation; pseudo-label large-model self-training; and difficult-case pseudo-label mining and distillation training as the core teacher-student network optimization algorithms.
In another preferred embodiment, as shown in FIG. 6, a cloud multi-model optimization method is provided for a 2D vehicle-end perception small model. Specifically, during vehicle-end operation the 2D vehicle-end perception small model produces detection inaccuracies caused by environmental changes, for example: lost detection of vehicles or pedestrians under strong light, inaccurate lane line detection, and false detections in heavy rain and fog. The data for these scenes therefore needs to be manually labeled to strengthen the vehicle-end small model's performance in them. Data from these environments is collected and uploaded to the cloud as finely labeled image data and route-mined unlabeled data, and pseudo labels are generated by cloud large model inference. Because the cloud large model performs far better than the vehicle-end small model, the generated pseudo labels can cover the targets in these scenes. The generated pseudo labels are used with the cloud large-model self-training strategy to optimize the cloud large model, further improving its effectiveness. The intermediate results and pseudo labels produced by the cloud large model are then taken out and optimization is carried out with the improved knowledge distillation method described above. Finally the optimized vehicle-end small model is deployed, forming a closed loop.
According to the above perception model training method, in the tightly coupled data closed-loop system the cloud large model and pre-labeling model can continuously evolve and iterate on the high-value data screened by the vehicle-end small model, continuously improving labeling accuracy and dynamically supporting the needs of the training service.
In AI model training, the vehicle-end small model is not trained directly on the data; rather, the cloud model and the cloud labeling large model are optimized first, and the vehicle-end small model's performance is then jointly optimized through the teacher-student network, improving the quality and efficiency of training.
The improvement of the student network is assisted by multiple teacher networks; the main approach is to compare the inference results of the several cloud large models with those of the vehicle-end small model and to take the results with the largest differences as pseudo labels for training the model.
A brand-new teacher-student distillation method is designed around continuous-frame features; its main idea is to pair the IDs of continuous-frame results with a tracking algorithm and extract the results within several frames as pseudo labels for training the vehicle-end small model, reducing the limitation on the vehicle-end small model's accuracy caused by inaccurate labeling results.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments; nor is the order of these sub-steps or stages necessarily sequential, as they may be performed in turn or alternately with at least part of the other steps or of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a perception model training device for realizing the above related perception model training method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the one or more perception model training devices provided below may be referred to the limitation of the perception model training method hereinabove, and will not be described herein.
In one embodiment, as shown in fig. 7, there is provided a perception model training apparatus, including: acquisition module 10, self-training module 20, calculation module 30, and distillation training module 40, wherein:
an obtaining module 10, configured to obtain unlabeled data determined based on a terminal model;
the self-training module 20 is configured to input the unlabeled data into a plurality of cloud models, generate first pseudo-labeled data, and perform self-training on the plurality of cloud models based on the first pseudo-labeled data;
the calculation module 30 is configured to input the unlabeled data into a terminal model, and generate second pseudo-labeled data;
and the distillation training module 40 is configured to update the terminal model through distillation training according to the first pseudo-annotation data and the second pseudo-annotation data, so as to obtain a terminal update model.
In one embodiment, the unlabeled data comprises a plurality of consecutive frames of unlabeled data.
The self-training module 20 is further configured to identify the unlabeled data through a plurality of cloud models, and generate a plurality of first pseudo tags and a plurality of first intermediate features; tracking a plurality of first pseudo tags in continuous multi-frame unlabeled data through a target tracking algorithm to generate first tracking pseudo tags; performing association processing on the plurality of first pseudo tags and the plurality of first intermediate features according to the time sequence of the first tracking pseudo tags and the continuous multi-frame unlabeled data to generate first continuous frame pseudo tags and first continuous frame intermediate features; and taking the first continuous frame pseudo tag and the first continuous frame intermediate feature as first pseudo labeling data.
The computing module 30 is further configured to identify the unlabeled data through the terminal model, and generate a plurality of second pseudo tags and a plurality of second intermediate features; tracking a plurality of second pseudo tags in continuous multi-frame unlabeled data through a target tracking algorithm to generate second tracking pseudo tags; performing association processing on the plurality of second pseudo tags and the plurality of second intermediate features according to the time sequence of the second tracking pseudo tags and the continuous multi-frame unlabeled data to generate second continuous frame pseudo tags and second continuous frame intermediate features; and taking the second continuous frame pseudo tag and the second continuous frame intermediate feature as second pseudo labeling data.
The distillation training module 40 is further configured to determine difficult-case pseudo-label data according to the first pseudo-labeled data and the second pseudo-labeled data, and to update the terminal model through distillation training according to the difficult-case pseudo-label data to obtain the updated terminal model.
The distillation training module 40 is further configured to calculate a Euclidean distance and/or a loss value between the first pseudo-labeled data and the second pseudo-labeled data, and to take the first pseudo-labeled data whose Euclidean distance and/or loss value exceeds a preset threshold as the difficult-case pseudo-label data.
The distillation training module 40 is further configured to perform center-point training with the second continuous-frame pseudo label as the label of the terminal model's detection head, and to perform Euclidean distance training between the first continuous-frame intermediate features and the terminal model's backbone network features, so as to update the terminal model and obtain the updated terminal model.
In one embodiment, the distillation training comprises a single-frame distillation training and a multi-frame time sequence distillation training, wherein the single-frame distillation training is performed based on intermediate features or pseudo tags of a cloud perception model; and the multi-frame time sequence distillation training is performed based on multi-frame results or pseudo labels of the cloud perception model or the cloud pre-labeling model.
The self-training module 20 is further configured to determine difference values among the plurality of first pseudo labels in the first pseudo-labeled data; remove data from the first pseudo-labeled data based on the difference values to obtain filtered first pseudo-labeled data; obtain pre-labeled data; and self-train the cloud models according to the filtered first pseudo-labeled data and the pre-labeled data.
The modules in the above-described perception model training apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in FIG. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, while the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store finely labeled data, unlabeled data, and pseudo-labeled data. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by a processor, implements a perception model training method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of part of the structure related to the present solution and does not limit the computer device to which the present solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
obtaining unlabeled data determined based on a terminal model;
inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the cloud models based on the first pseudo-labeled data;
inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
and updating the terminal model through distillation training according to the first pseudo labeling data and the second pseudo labeling data to obtain a terminal updating model.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
obtaining unlabeled data determined based on a terminal model;
inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the cloud models based on the first pseudo-labeled data;
inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
and updating the terminal model through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain a terminal update model.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the following steps:
obtaining unlabeled data determined based on a terminal model;
inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the cloud models based on the first pseudo-labeled data;
inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
and updating the terminal model through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain a terminal update model.
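By way of illustration and not limitation, a single distillation update may be sketched in PyTorch as below, assuming a center-point-style terminal detector. The cloud pseudo labels supervise the detection head as a center heatmap, and an L2 (Euclidean) term pulls the terminal backbone features toward the cloud intermediate features; `terminal_model.backbone`, `terminal_model.head`, and the simplified heatmap rendering are hypothetical placeholders.

```python
# Hypothetical single distillation step for the terminal model.
import torch
import torch.nn.functional as F

def heatmap_from_labels(labels, shape):
    """Render object centers as a binary target heatmap; a stand-in for the
    Gaussian splatting commonly used by center-point detection heads."""
    target = torch.zeros(shape)
    _, _, height, width = shape
    for box in labels:  # box[:2]: normalized (x, y) center in [0, 1]
        cx = min(max(int(float(box[0]) * (width - 1)), 0), width - 1)
        cy = min(max(int(float(box[1]) * (height - 1)), 0), height - 1)
        target[:, :, cy, cx] = 1.0
    return target

def distillation_step(terminal_model, frames, cloud_labels, cloud_features,
                      optimizer, feat_weight=0.5):
    """Supervise the detection head with cloud pseudo labels and match the
    terminal backbone features to the cloud intermediate features."""
    backbone_feats = terminal_model.backbone(frames)
    heatmap_pred = terminal_model.head(backbone_feats)

    # Label loss: cloud pseudo labels rendered as a center heatmap target.
    heatmap_target = heatmap_from_labels(cloud_labels, heatmap_pred.shape)
    label_loss = F.binary_cross_entropy_with_logits(heatmap_pred, heatmap_target)

    # Feature loss: Euclidean (squared L2) distance between intermediate features.
    feat_loss = F.mse_loss(backbone_feats, cloud_features.detach())

    loss = label_loss + feat_weight * feat_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.item())
```

The feature term requires the cloud and terminal feature maps to share a spatial shape; in practice a projection layer would be inserted when they differ.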
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
The user information (including but not limited to user device information, user personal information, etc.) and the data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties.
Those skilled in the art will appreciate that all or part of the processes in the methods described above may be implemented by a computer program instructing related hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided in the present application may include at least one of a relational database and a non-relational database; the non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
The above embodiments merely express several implementations of the present application, and their description is relatively specific and detailed, but should not be construed as limiting the scope of the application. It should be noted that several variations and improvements may be made by those skilled in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (7)

1. A method of training a perception model, the method comprising:
acquiring unlabeled data determined based on a terminal model, wherein the unlabeled data comprises continuous multi-frame unlabeled data;
inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the cloud models based on the first pseudo-labeled data;
wherein inputting the unlabeled data into the plurality of cloud models and generating the first pseudo-labeled data comprises:
identifying the unlabeled data through a plurality of cloud models to generate a plurality of first pseudo tags and a plurality of first intermediate features;
tracking a plurality of first pseudo tags in continuous multi-frame unlabeled data through a target tracking algorithm to generate first tracking pseudo tags;
performing association processing on the plurality of first pseudo tags and the plurality of first intermediate features according to the time sequence of the first tracking pseudo tags and the continuous multi-frame unlabeled data to generate first continuous frame pseudo tags and first continuous frame intermediate features;
taking the first continuous frame pseudo tag and the first continuous frame intermediate feature as the first pseudo-labeled data;
wherein performing self-training on the plurality of cloud models based on the first pseudo-labeled data comprises:
determining difference values among the plurality of first pseudo tags in the first pseudo-labeled data according to the first pseudo-labeled data;
performing data rejection on the first pseudo-labeled data based on the difference values to obtain rejected first pseudo-labeled data;
obtaining pre-labeled data, and performing self-training on the plurality of cloud models according to the rejected first pseudo-labeled data and the pre-labeled data;
inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
updating the terminal model through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain a terminal update model;
wherein updating the terminal model through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain the terminal update model comprises:
determining hard-case pseudo-label data according to the first pseudo-labeled data and the second pseudo-labeled data;
updating the terminal model through distillation training according to the hard-case pseudo-label data to obtain the terminal update model;
wherein the hard-case pseudo-label data is determined in the following manner: performing inference on the unlabeled data through the terminal model to generate a target detection pseudo tag, and generating a continuous frame pseudo tag with tracking information through a tracking algorithm; determining a plurality of difference values between the target detection pseudo tag and the plurality of first pseudo tags, and between the continuous frame pseudo tag with tracking information and the plurality of first pseudo tags, based on a Euclidean distance and a loss function; and determining a maximum value among the plurality of difference values, and determining the first pseudo-labeled data and the second pseudo-labeled data corresponding to the maximum value as the hard-case pseudo-label data;
wherein updating the terminal model through distillation training according to the hard-case pseudo-label data to obtain the terminal update model comprises:
performing center-point training by taking the second continuous frame pseudo tag as the label of a detection head of the terminal model, and training on the Euclidean distance between the first continuous frame intermediate feature and the backbone network feature of the terminal model, so as to update the terminal model and obtain the terminal update model.
2. The method of claim 1, wherein inputting the unlabeled data into a terminal model to generate second pseudo-labeled data comprises:
identifying the unlabeled data through the terminal model to generate a plurality of second pseudo tags and a plurality of second intermediate features;
tracking a plurality of second pseudo tags in continuous multi-frame unlabeled data through a target tracking algorithm to generate second tracking pseudo tags;
performing association processing on the plurality of second pseudo tags and the plurality of second intermediate features according to the time sequence of the second tracking pseudo tags and the continuous multi-frame unlabeled data to generate second continuous frame pseudo tags and second continuous frame intermediate features;
and taking the second continuous frame pseudo tag and the second continuous frame intermediate feature as the second pseudo-labeled data.
3. The method of claim 1, wherein determining the hard-case pseudo-label data according to the first pseudo-labeled data and the second pseudo-labeled data comprises:
calculating a Euclidean distance and/or a loss value between the first pseudo-labeled data and the second pseudo-labeled data;
and taking the pseudo-labeled data whose Euclidean distance and/or loss value exceeds a preset threshold as the hard-case pseudo-label data corresponding to the first pseudo-labeled data.
4. The perception model training method according to claim 1, wherein the cloud models comprise a cloud perception model and a cloud pre-labeling model, and the distillation training comprises single-frame distillation training and multi-frame time-sequence distillation training, wherein the single-frame distillation training is performed based on intermediate features or pseudo tags of the cloud perception model, and the multi-frame time-sequence distillation training is performed based on multi-frame results or pseudo tags of the cloud perception model or the cloud pre-labeling model.
5. A perception model training apparatus, the apparatus comprising:
the acquisition module is used for acquiring unlabeled data determined based on the terminal model, wherein the unlabeled data comprises continuous multi-frame unlabeled data;
the self-training module is used for inputting the unlabeled data into a plurality of cloud models, generating first pseudo-labeled data, and performing self-training on the plurality of cloud models based on the first pseudo-labeled data;
the self-training module is further used for identifying the unlabeled data through the plurality of cloud models to generate a plurality of first pseudo tags and a plurality of first intermediate features; tracking the plurality of first pseudo tags in the continuous multi-frame unlabeled data through a target tracking algorithm to generate first tracking pseudo tags; performing association processing on the plurality of first pseudo tags and the plurality of first intermediate features according to the time sequence of the first tracking pseudo tags and the continuous multi-frame unlabeled data to generate first continuous frame pseudo tags and first continuous frame intermediate features; taking the first continuous frame pseudo tag and the first continuous frame intermediate feature as the first pseudo-labeled data; determining difference values among the plurality of first pseudo tags in the first pseudo-labeled data according to the first pseudo-labeled data; performing data rejection on the first pseudo-labeled data based on the difference values to obtain rejected first pseudo-labeled data; and obtaining pre-labeled data, and performing self-training on the plurality of cloud models according to the rejected first pseudo-labeled data and the pre-labeled data;
the calculation module is used for inputting the unlabeled data into a terminal model to generate second pseudo-labeled data;
the distillation training module is used for updating the terminal model through distillation training according to the first pseudo-labeled data and the second pseudo-labeled data to obtain a terminal update model;
the distillation training module is further used for determining hard-case pseudo-label data according to the first pseudo-labeled data and the second pseudo-labeled data, and updating the terminal model through distillation training according to the hard-case pseudo-label data to obtain the terminal update model, wherein inference is performed on the unlabeled data through the terminal model to generate a target detection pseudo tag, and a continuous frame pseudo tag with tracking information is generated through a tracking algorithm; a plurality of difference values between the target detection pseudo tag and the plurality of first pseudo tags, and between the continuous frame pseudo tag with tracking information and the plurality of first pseudo tags, are determined based on a Euclidean distance and a loss function; a maximum value among the plurality of difference values is determined, and the first pseudo-labeled data and the second pseudo-labeled data corresponding to the maximum value are determined as the hard-case pseudo-label data; and center-point training is performed by taking the second continuous frame pseudo tag as the label of a detection head of the terminal model, and training is performed on the Euclidean distance between the first continuous frame intermediate feature and the backbone network feature of the terminal model, so as to update the terminal model and obtain the terminal update model.
6. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 4 when the computer program is executed.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 4.
CN202310950761.2A 2023-07-31 2023-07-31 Perception model training method, device, computer equipment and storage medium Active CN116681123B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310950761.2A CN116681123B (en) 2023-07-31 2023-07-31 Perception model training method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116681123A (en) 2023-09-01
CN116681123B (en) 2023-11-14

Family

ID=87781305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310950761.2A Active CN116681123B (en) 2023-07-31 2023-07-31 Perception model training method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116681123B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230169389A1 (en) * 2021-11-30 2023-06-01 International Business Machines Corporation Domain adaptation

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127666A (en) * 2020-01-15 2021-07-16 初速度(苏州)科技有限公司 Continuous frame data labeling system, method and device
WO2021143230A1 (en) * 2020-01-15 2021-07-22 初速度(苏州)科技有限公司 Labeling system, method and apparatus for continuous frame data
CN112949786A (en) * 2021-05-17 2021-06-11 腾讯科技(深圳)有限公司 Data classification identification method, device, equipment and readable storage medium
CN113807399A (en) * 2021-08-16 2021-12-17 华为技术有限公司 Neural network training method, neural network detection method and neural network detection device
WO2023045935A1 (en) * 2021-09-22 2023-03-30 北京智行者科技股份有限公司 Automated iteration method for target detection model, device and storage medium
CN114120319A (en) * 2021-10-09 2022-03-01 苏州大学 Continuous image semantic segmentation method based on multi-level knowledge distillation
CN113947196A (en) * 2021-10-25 2022-01-18 中兴通讯股份有限公司 Network model training method and device and computer readable storage medium
CN114445789A (en) * 2022-01-24 2022-05-06 上海宏景智驾信息科技有限公司 Automatic driving scene mining method based on semi-supervised transform detection
CN115311605A (en) * 2022-09-29 2022-11-08 山东大学 Semi-supervised video classification method and system based on neighbor consistency and contrast learning
CN115661615A (en) * 2022-12-13 2023-01-31 浙江莲荷科技有限公司 Training method and device of image recognition model and electronic equipment
CN115879535A (en) * 2023-02-10 2023-03-31 北京百度网讯科技有限公司 Training method, device, equipment and medium for automatic driving perception model
CN116402976A (en) * 2023-03-07 2023-07-07 嬴彻星创智能科技(上海)有限公司 Training method and device for three-dimensional target detection model
CN116453109A (en) * 2023-03-28 2023-07-18 上海高德威智能交通***有限公司 3D target detection method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Consistency regularization teacher-student semi-supervised learning method for target recognition in SAR images; Ye Tian et al.; The Visual Computer; 4179-4192 *
Research on domain-adaptive weakly supervised object detection algorithms based on deep learning; Ouyang Shengxiong; China Master's Theses Full-text Database; 1-77 *

Also Published As

Publication number Publication date
CN116681123A (en) 2023-09-01

Similar Documents

Publication Publication Date Title
Xu et al. Exploring categorical regularization for domain adaptive object detection
Mou et al. RiFCN: Recurrent network in fully convolutional network for semantic segmentation of high resolution remote sensing images
Khaliq et al. A holistic visual place recognition approach using lightweight cnns for significant viewpoint and appearance changes
Olid et al. Single-view place recognition under seasonal changes
Lopez-Antequera et al. Appearance-invariant place recognition by discriminatively training a convolutional neural network
Gomez-Ojeda et al. Training a convolutional neural network for appearance-invariant place recognition
Bai et al. Sequence searching with CNN features for robust and fast visual place recognition
CN103578119A (en) Target detection method in Codebook dynamic scene based on superpixels
CN110781262A (en) Semantic map construction method based on visual SLAM
CN112766170B (en) Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
CN113052184B (en) Target detection method based on two-stage local feature alignment
Zhou et al. Cross-weather image alignment via latent generative model with intensity consistency
CN113065409A (en) Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
CN115393745A (en) Automatic bridge image progress identification method based on unmanned aerial vehicle and deep learning
CN115546840A (en) Pedestrian re-recognition model training method and device based on semi-supervised knowledge distillation
CN114511627A (en) Target fruit positioning and dividing method and system
Yun et al. Target-style-aware unsupervised domain adaptation for object detection
CN116681123B (en) Perception model training method, device, computer equipment and storage medium
Ke et al. Dense small face detection based on regional cascade multi‐scale method
CN114067356B (en) Pedestrian re-recognition method based on combined local guidance and attribute clustering
CN112215205B (en) Target identification method and device, computer equipment and storage medium
Hou et al. Forest: A lightweight semantic image descriptor for robust visual place recognition
Feng et al. Incremental Learning-based Lane Detection for Automated Rubber-Tired Gantries in Container Terminal
Taha et al. Review of place recognition approaches: traditional and deep learning methods
CN115861997B (en) License plate detection and recognition method for key foreground feature guided knowledge distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant