CN111489365B - Training method of neural network, image processing method and device - Google Patents

Training method of neural network, image processing method and device

Info

Publication number
CN111489365B
CN111489365B (application CN202010278429.2A)
Authority
CN
China
Prior art keywords
image
semantic segmentation
network
information
parameter values
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010278429.2A
Other languages
Chinese (zh)
Other versions
CN111489365A (en)
Inventor
周千寓
程光亮
石建萍
马利庄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Lingang Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority to CN202010278429.2A
Publication of CN111489365A
Application granted
Publication of CN111489365B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a training method for a neural network, an image processing method and corresponding devices. The training method includes: performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image and the credibility information; and updating parameter values of the teacher network based on the updated parameter values of the student network. In the embodiments of the disclosure, the first semantic segmentation image, the second semantic segmentation image and the credibility information are used to control the student network and the teacher network to learn specific features of the target image, so that negative migration of the student network and the teacher network in migration learning is avoided.

Description

Training method of neural network, image processing method and device
Technical Field
The disclosure relates to the technical field of image processing, and in particular relates to a training method of a neural network, an image processing method and an image processing device.
Background
Image segmentation refers to the task of assigning a semantic label to each pixel of a given image. In the supervised or semi-supervised training of a semantic segmentation model, a large number of sample images first need to be labeled pixel by pixel, and the semantic segmentation model is then trained on the labeled samples. However, labeling a large number of sample images pixel by pixel takes a great deal of time and cost. To alleviate this, sample data sets are currently often constructed from simulated, synthesized sample images; however, because there is a certain difference between synthesized images and real images, the performance of a semantic segmentation network trained on synthesized images drops significantly when it performs semantic segmentation processing on real images.
Disclosure of Invention
The embodiment of the disclosure at least provides a training method, an image processing method and a device of a neural network.
In a first aspect, an embodiment of the present disclosure provides a training method of a neural network, including: carrying out semantic segmentation processing on a first noise image of the target image by utilizing a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information; updating the parameter values of the teacher network based on the updated parameter values of the student network.
The first semantic segmentation image, the second semantic segmentation image and the credibility information are used to drive the student network and the teacher network to produce consistent prediction results after the same target image is perturbed in different ways. The student network therefore learns specific features of the target image in the process of migrating based on the target image, that is, it performs migration learning in a specific direction; and because the parameter values of the teacher network are updated from the parameter values of the student network, the teacher network also performs migration learning in that specific direction, which avoids the problem of negative migration.
In a possible embodiment, the method further comprises: performing semantic segmentation processing on a style migration image of a source image by using a student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located; the updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information includes: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image.
In this way, semantic segmentation processing is performed on the style migration image of the source image by using the student network to obtain a third semantic segmentation image, and then the parameter value updating process of the student network is supervised based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image, so that the semantic segmentation precision of the student network and the teacher network can be further improved.
In a possible implementation manner, the updating the parameter value of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image, and the labeling information of the source image includes: determining a consistency loss based on the first semantically segmented image, the second semantically segmented image, and the confidence information; determining the weight of the consistency loss based on the current iteration times; determining semantic segmentation loss based on the third semantic segmentation image and annotation information of the source image; updating parameter values of the student network based on the consistency loss, the weight, and the semantic segmentation loss.
In this way, the weight of the consistency loss is determined according to the current iteration times, the adjustment process of the parameter values of the student network is supervised based on the consistency loss, the determined weight of the consistency loss and the semantic segmentation loss, and the influence of the consistency loss and the semantic segmentation loss on the parameter values of the student network and the teacher network is dynamically adjusted along with the increase of the iteration times of the student network and the teacher network, so that specific features in the target image are learned on the premise of guaranteeing the semantic segmentation precision of the student network and the teacher network.
In a possible implementation manner, the semantic segmentation processing is performed on the second noise image of the target image by using a teacher network to obtain a second semantic segmentation image, which includes: respectively carrying out semantic segmentation processing on a plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images; the second semantically segmented image is generated based on the plurality of intermediate semantically segmented images.
In this way, semantic segmentation processing is performed on a plurality of second noise images by the teacher network to obtain a plurality of intermediate semantic segmentation images, and the second semantic segmentation image is generated from these intermediate semantic segmentation images. More uncertainty information can thus be extracted from the second noise images, so that the credibility information of each pixel point in the resulting second semantic segmentation image is more discriminative, which further improves the efficiency of optimizing the parameter values of the student network.
In a possible implementation manner, the generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images includes: sequentially calculating pixel value mean values of pixel points at corresponding positions in a plurality of intermediate semantic segmentation images; and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
In this way, more uncertainty information can be extracted by taking the mean of the pixel values of the pixel points at corresponding positions in the plurality of intermediate semantic segmentation images.
In a possible implementation manner, the determining, based on the second semantically segmented image, the credibility information of each pixel point in the second semantically segmented image includes: determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
In this way, the information entropy of each pixel point in the second semantic segmentation image is extracted through the pixel value of each pixel point in the second semantic segmentation image, and then the credibility information of each pixel point in the second semantic segmentation image is determined based on the information entropy.
In a possible implementation manner, the determining the credibility information of each pixel in the second semantically segmented image based on the information entropy of each pixel in the second semantically segmented image and a predetermined information entropy threshold value includes: comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold; determining credibility information of each pixel point in the second semantic segmentation image based on the comparison result; and if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to the any pixel point is set as a preset value representing the credibility of the pixel value of the any pixel point, wherein the preset value is larger than 0.
In this way, the consistency loss between the first semantic segmentation image and the second semantic segmentation image considers only the credible pixel points in the second semantic segmentation image, so that when the parameter values of the student network are updated based on the consistency loss, the results of the semantic segmentation processing performed by the student network and the teacher network on the target image under different disturbances are driven to be consistent. Updating the parameter values of the teacher network based on the updated parameter values of the student network keeps the parameter values of the teacher network consistent with those of the student network, so that both the teacher network and the student network learn the specific features of the target image.
In a possible implementation manner, the information entropy threshold value is generated by adopting the following way: and determining the information entropy threshold based on the semantic segmentation type of the teacher network.
In a possible implementation, updating the parameter values of the teacher network based on the updated parameter values of the student network includes: performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
Therefore, the parameter value of the teacher network is an exponential moving average value based on the parameter value of the student network, so that the teacher network and the student network can be converged more quickly, and the training efficiency of the neural network is improved.
In a second aspect, an embodiment of the present disclosure further provides a training apparatus for a neural network, including: the first processing module is used for carrying out semantic segmentation processing on the first noise image of the target image by utilizing the student network to obtain a first semantic segmentation image; the second processing module is used for carrying out semantic segmentation processing on a second noise image of the target image by utilizing a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; a first updating module, configured to update parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information; and the second updating module is used for updating the parameter value of the teacher network based on the updated parameter value of the student network.
In a possible embodiment, the apparatus further comprises: the third processing module is used for carrying out semantic segmentation processing on the style migration image of the source image by utilizing the student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located; the first updating module is configured to, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image.
In a possible implementation manner, the first updating module is configured to, when updating the parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image, and the labeling information of the source image: determining a consistency loss based on the first semantically segmented image, the second semantically segmented image, and the confidence information; determining the weight of the consistency loss based on the current iteration times; determining semantic segmentation loss based on the third semantic segmentation image and annotation information of the source image; updating parameter values of the student network based on the consistency loss, the weight, and the semantic segmentation loss.
In a possible implementation manner, the second processing module is configured to, when performing semantic segmentation processing on the second noise image of the target image by using the teacher network, obtain a second semantic segmentation image: respectively carrying out semantic segmentation processing on a plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images; the second semantically segmented image is generated based on the plurality of intermediate semantically segmented images.
In a possible implementation manner, the second processing module is configured, when generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images, to: sequentially calculating pixel value mean values of pixel points at corresponding positions in a plurality of intermediate semantic segmentation images; and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
In a possible implementation manner, the second processing module is configured to, when determining, based on the second semantically segmented image, reliability information of each pixel point in the second semantically segmented image: determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
In a possible implementation manner, the second processing module is configured to, when determining the reliability information of each pixel point in the second semantically segmented image based on the information entropy of each pixel point in the second semantically segmented image and a predetermined information entropy threshold value: comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold; determining credibility information of each pixel point in the second semantic segmentation image based on the comparison result; and if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to the any pixel point is set as a preset value representing the credibility of the pixel value of the any pixel point, wherein the preset value is larger than 0.
In a possible implementation manner, the second processing module is further configured to generate the information entropy threshold in the following manner: and determining the information entropy threshold based on the semantic segmentation type of the teacher network.
In a possible implementation manner, the second updating module is configured to, when updating the parameter value of the teacher network based on the updated parameter value of the student network: performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
In a third aspect, an embodiment of the present disclosure further provides an image processing method, including: acquiring an image to be processed; and carrying out semantic segmentation processing on the image to be processed by using the neural network trained by the training method based on the neural network in any one of the first aspect to obtain a semantic segmentation result of the image to be processed.
In a fourth aspect, an embodiment of the present disclosure further provides an image processing apparatus, including: the acquisition module is used for acquiring the image to be processed; the processing module is used for carrying out semantic segmentation processing on the image to be processed by utilizing the neural network trained by the training method based on the neural network in any one of the first aspect to obtain a semantic segmentation result of the image to be processed.
In a fifth aspect, an embodiment of the present disclosure further provides an intelligent travel control method, including: acquiring an image acquired by a running device in the running process; detecting a target object in the image using a neural network trained based on the training method of any one of the first aspects; the running apparatus is controlled based on the detected target object.
In a sixth aspect, an embodiment of the present disclosure further provides an intelligent travel control apparatus, including: the data acquisition module is used for acquiring images acquired by the driving device in the driving process; a detection module for detecting a target object in the image using a neural network trained based on the training method of the neural network of any one of the first aspects; and the control module is used for controlling the running device based on the detected target object.
In a seventh aspect, an optional implementation manner of the disclosure further provides an electronic device including a processor and a memory, where the memory stores machine-readable instructions executable by the processor and the processor is configured to execute the machine-readable instructions stored in the memory; when executed by the processor, the machine-readable instructions perform the steps in the first aspect or any possible implementation manner of the first aspect, or the steps in the possible implementation manner of the third aspect, or the steps in the possible implementation manner of the fifth aspect.
In an eighth aspect, an alternative implementation manner of the present disclosure further provides a computer readable storage medium, where a computer program is stored, the computer program when executed performs the steps in the first aspect, or any possible implementation manner of the first aspect, or performs the steps in the possible implementation manner of the third aspect, or performs the steps in the possible implementation manner of the fifth aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below; they are incorporated in and constitute a part of the specification, show embodiments consistent with the present disclosure, and together with the description serve to illustrate the technical solutions of the present disclosure. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may derive other related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a method of training a neural network provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of a particular method for determining confidence information for each pixel in a second semantically segmented image provided by embodiments of the present disclosure;
FIG. 3 illustrates a flow chart of another neural network training method provided by embodiments of the present disclosure;
FIG. 4 is a schematic diagram showing a specific example of a training method of a neural network according to an embodiment of the present disclosure;
FIG. 5 shows a flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 6 shows a flow chart of an intelligent travel control method provided by an embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of a training apparatus for a neural network provided by an embodiment of the present disclosure;
fig. 8 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;
fig. 9 shows a schematic diagram of an intelligent travel control apparatus provided in an embodiment of the present disclosure;
fig. 10 shows a schematic diagram of an electronic device provided by an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It has been found that labeling sample images to form a labeled data set before training a neural network usually takes a great deal of time and cost. To reduce the labeling time and cost, neural networks are in many cases trained on computer-simulated synthesized images. However, because of a certain domain difference between synthesized images and real images, a neural network trained on synthesized images suffers a performance drop when it performs image processing tasks on real images. To solve this problem, supervised training with additional supervision signals is currently usually performed on an adversarial framework; for example, supervision signals such as depth, style, category constraints and decision boundaries are adopted on the basis of a generative adversarial network to perform migration learning of the neural network. However, in the migration learning process of the neural network, the learned features have great uncertainty, which may lead to the problem of negative migration.
Based on the above study, the disclosure provides a training method and device for a neural network, which controls a teacher network and a student network to generate consistent prediction results for unlabeled target images under different disturbances so as to supervise the migration learning of the student network, and updates the teacher network based on the parameter values of the student network, so that the teacher network and the student network learn specific features of the target images in the migration learning process and the problem of negative migration is avoided.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
For the sake of understanding the present embodiment, first, a detailed description will be given of a neural network training method disclosed in the embodiments of the present disclosure, where an execution subject of the neural network training method provided in the embodiments of the present disclosure is generally a computer device having a certain computing capability, where the computer device includes, for example: a terminal device or server or other processing device; in some possible implementations, the training method of the neural network may be implemented by a processor invoking computer readable instructions stored in a memory.
The following describes a training method of a neural network provided in an embodiment of the present disclosure.
In the embodiment of the present disclosure, before updating the parameter values of the Student Network (Student Network) and the Teacher Network (Teacher Network) based on S101 to S104, the parameter values of the Student Network and the Teacher Network may be initialized first.
By way of example, the teacher network and the student network may be initialized, for example, with a pre-trained semantic segmentation network.
Here, the pre-trained semantic segmentation network is, for example, a neural network trained based on a source image; in the embodiment of the disclosure, the processes S101 to S104 are processes of controlling the pre-trained semantic segmentation network to perform transfer learning from the source domain to the target domain based on the target image, so that performance of the semantic segmentation network in performing semantic segmentation processing on the image of the target domain is not reduced after the transfer learning is performed.
The image of the source domain includes, for example: synthesizing the images; the image of the target domain includes, for example: a real image.
After initializing parameter values of the student network and the teacher network, performing multiple iterations on the student network and the teacher network based on S101-S104, and determining the teacher network or the student network after the multiple iterations as a trained neural network. Here, each time the processes of S101 to S104 are performed, a round of iteration is performed on the student network and the teacher network.
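As an illustration only, the initialization described above can be sketched as follows; PyTorch and the helper name build_student_and_teacher are assumptions made for this sketch and are not prescribed by the disclosure.

```python
# A minimal sketch: both networks start from the same pre-trained semantic
# segmentation weights, and the teacher is only ever updated from the student,
# never by gradient descent.
import copy
import torch.nn as nn

def build_student_and_teacher(pretrained_seg_net: nn.Module):
    student = copy.deepcopy(pretrained_seg_net)   # updated by back-propagation
    teacher = copy.deepcopy(pretrained_seg_net)   # updated by EMA of the student
    for p in teacher.parameters():
        p.requires_grad_(False)                   # the teacher receives no gradients
    return student, teacher
```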
Referring to fig. 1, a flowchart of a neural network training method according to an embodiment of the disclosure is shown, where the method includes:
S101: Carrying out semantic segmentation processing on the first noise image of the target image by using the student network to obtain a first semantic segmentation image.
In a specific implementation, the first noise image may be obtained by injecting random noise into the target image, for example.
Illustratively, the random noise includes, for example, any one of Gaussian noise, white noise and the like, and may be specifically determined according to actual needs.
After random noise is injected into a target image, a first noise image is generated, and then semantic segmentation processing is carried out on the first noise image by utilizing a student network; when the student network performs semantic segmentation processing on the first noise image, a semantic segmentation result of each pixel point in the first noise image can be obtained; then forming a first semantic segmentation image based on the semantic segmentation result of each pixel point in the first noise image; the size of the first semantically segmented image is the same as the size of the first noisy image.
The pixel value of any pixel point a′ in the first semantic segmentation image is the semantic segmentation result of the pixel point a corresponding to that pixel point a′ in the first noise image.
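For example, the noise injection and the student-network forward pass of S101 may be sketched as follows; PyTorch, the Gaussian noise and the noise_std value are illustrative assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def student_forward_on_noisy_target(student, target_image, noise_std=0.1):
    """target_image: (B, 3, H, W). Returns the first noise image and the
    per-pixel class probabilities (B, C, H, W) forming the first segmentation image."""
    first_noise_image = target_image + noise_std * torch.randn_like(target_image)  # random noise injection
    logits = student(first_noise_image)                  # (B, C, H, W)
    first_seg_image = F.softmax(logits, dim=1)           # semantic segmentation result per pixel
    return first_noise_image, first_seg_image
```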
The training method of the neural network provided by the embodiment of the disclosure further comprises the following steps:
S102: Performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image.
In specific implementation, S102 and S101 have no sequential logic relationship; the execution may be synchronous or asynchronous.
The second noise image is generated in a similar manner to the first noise image, and may be obtained by injecting random noise into the target image, for example. Wherein different noise images of the target image are different in noise injected.
In one possible embodiment, there is a single second noise image; in this case, semantic segmentation processing is performed on the second noise image by the teacher network to obtain a semantic segmentation result of each pixel point in the second noise image, and the second semantic segmentation image is then formed based on the semantic segmentation result of each pixel point in the second noise image.
In another possible embodiment, there are multiple second noise images; in this case, semantic segmentation processing is performed on the plurality of second noise images of the target image by the teacher network to obtain an intermediate semantic segmentation image corresponding to each of the second noise images, and a second semantic segmentation image is then generated based on the plurality of intermediate semantic segmentation images.
Here, for example, the pixel value mean may be calculated in turn for the pixel points at corresponding positions in the plurality of intermediate semantic segmentation images, and the pixel value mean at any corresponding position may be determined as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
For example, suppose the size of the target image x_t is h×w and there are N second noise images A1, A2, …, AN, each obtained by injecting random noise into the target image x_t. The teacher network f_T performs semantic segmentation processing on each of them, and the intermediate semantic segmentation image of the i-th second noise image can be written as ŷ_T^(i) = f_T(A_i) ∈ R^(h×w×c), where h denotes the height of the target image, w its width, and c the number of semantic segmentation classes of the teacher network.
The second semantic segmentation image ȳ_T then satisfies, for example, the following formula (1):
ȳ_T = (1/N) · Σ_{i=1}^{N} ŷ_T^(i)    (1)
Thus, random noise is injected into the target image multiple times to generate a plurality of second noise images, and the second semantic segmentation image is obtained from the intermediate semantic segmentation images corresponding to these second noise images. More uncertainty information can be extracted from the second noise images in this way, so that the credibility information of each pixel point in the second semantic segmentation image obtained from them is more discriminative, which further improves the efficiency of optimizing the parameter values of the student network.
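A minimal sketch of the teacher-side processing and the averaging of formula (1); PyTorch, the number of noisy copies n_noisy and the noise_std value are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_mean_prediction(teacher, target_image, n_noisy=8, noise_std=0.1):
    """Formula (1): average the intermediate segmentation images obtained from
    N second noise images of the same target image."""
    preds = []
    for _ in range(n_noisy):
        second_noise_image = target_image + noise_std * torch.randn_like(target_image)
        preds.append(F.softmax(teacher(second_noise_image), dim=1))  # intermediate segmentation image
    second_seg_image = torch.stack(preds, dim=0).mean(dim=0)         # pixel-wise mean over the N images
    return second_seg_image
```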
After obtaining the second semantic segmentation image, referring to fig. 2, the embodiment of the disclosure further provides a specific method for determining reliability information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image, which includes:
S201: Determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image.
Here, the information entropy ℋ(h, w) of any pixel point (h, w) satisfies, for example, the following formula (2):
ℋ(h, w) = Σ_{c} p̄_T(h, w, c) · log p̄_T(h, w, c)    (2)
where p̄_T(h, w, c) denotes the pixel value of the pixel point (h, w) in the second semantic segmentation image for class c.
S202: Determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
Here, the information entropy threshold may be determined, for example, based on the semantic segmentation class of the teacher network.
The information entropy threshold H satisfies formula (3): it is a function of the hyperparameters a, b and c, of K_max = log C (where C denotes the number of semantic segmentation classes of the teacher network), of the current iteration round number t, and of the maximum number of iteration rounds t_max, so that the threshold changes with the training progress t/t_max.
for example, the information entropy of each pixel point in the second semantic segmentation image may be compared with a predetermined information entropy threshold; and then determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result.
And if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to the any pixel point is set as a preset value representing the credibility of the pixel value of the any pixel point, wherein the preset value is larger than 0.
In a specific implementation, as can be seen from the above formula (2), the value of the information entropy is a negative number. For a pixel point in the second semantic segmentation image, the smaller the value of its information entropy, the higher its credibility, that is, the more reliable the classification of the corresponding pixel point of the target image represented by the pixel value of that pixel point in the second semantic segmentation image. When determining the consistency loss between the first semantic segmentation image and the second semantic segmentation image, the pixel points with higher credibility in the second semantic segmentation image are taken into account and their influence on the consistency loss is increased; for pixel points with lower credibility, their influence on the consistency loss can be reduced or even removed.
Furthermore, for example, the preset value indicating that the pixel value is credible may be set to 1, and the preset value indicating that the pixel value is not credible may be set to 0.
For another example, the preset value for which the pixel value is trusted may be set to 1, the preset value for which the pixel value is not trusted may be set to 0.5, and so on.
Specific setting can be carried out according to actual needs.
Further, exemplarily, the credibility information M(h, w) of each pixel point in the second semantic segmentation image satisfies, for example, the following formula (4):
M(h, w) = I(|ℋ(h, w)| > H)    (4)
where H denotes the information entropy threshold and I(·) denotes a 0-1 indicator function: I(·) takes 1 when |ℋ(h, w)| > H, and takes 0 otherwise.
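The per-pixel information entropy of formula (2) and the credibility information of formula (4) may, for example, be computed as in the following sketch; PyTorch is assumed, and the small constant eps is added only for numerical stability.

```python
import torch

def reliability_mask(second_seg_image: torch.Tensor, entropy_threshold: float):
    """second_seg_image: (B, C, H, W) class probabilities from the teacher.
    Formula (2): per-pixel entropy  H = sum_c p * log(p)  (a negative value).
    Formula (4): mask = 1 where |H| exceeds the threshold, 0 otherwise,
    following the comparison described in the text."""
    eps = 1e-8
    entropy = (second_seg_image * (second_seg_image + eps).log()).sum(dim=1)  # (B, H, W), <= 0
    mask = (entropy.abs() > entropy_threshold).float()                         # credibility information
    return entropy, mask
```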
With the S101 and S102 described above in mind, the training method of the neural network provided in the embodiment of the present disclosure further includes:
S103: Updating parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information.
S104: updating the parameter values of the teacher network based on the updated parameter values of the student network.
In a specific implementation, for example, a consistency loss between the first semantically segmented image and the second semantically segmented image may be determined based on the first semantically segmented image, the second semantically segmented image, and the confidence information, and then parameter values of the student network may be updated based on the consistency loss.
In a specific implementation, as can be seen from the above formula (3), H is a time-dependent function. The consistency loss may be, for example, a mean square error between the first semantic segmentation image extracted by the student network and the second semantic segmentation image extracted by the teacher network, restricted to the credible pixel points; the consistency loss L_con satisfies, for example, the following formula (5):
L_con = Σ_{h,w} M(h, w) · ‖σ(f_S(x_t1))(h, w) − σ(f_T(x_t2))(h, w)‖² / Σ_{h,w} M(h, w)    (5)
where f_S denotes the student network, f_T denotes the teacher network, x_t1 denotes the first noise image, x_t2 denotes the second noise image, and σ denotes an activation function, for example a softmax activation function.
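A minimal sketch of the consistency loss of formula (5) as reconstructed above; PyTorch is assumed, and normalizing by the number of credible pixel points is one possible choice made only for this sketch.

```python
import torch

def consistency_loss(first_seg_image: torch.Tensor,
                     second_seg_image: torch.Tensor,
                     mask: torch.Tensor):
    """Per-pixel mean squared error between the student and teacher probability
    maps, counted only at credible pixel points (mask == 1)."""
    sq_err = (first_seg_image - second_seg_image).pow(2).sum(dim=1)  # (B, H, W)
    masked = sq_err * mask
    return masked.sum() / mask.sum().clamp(min=1.0)                   # average over credible pixels
```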
When updating the parameter values of the student network based on the consistency loss, for example, the parameter values of the student network are adjusted in a direction to reduce the consistency loss.
When updating the parameter values of the teacher network based on the updated parameter values of the student network, for example, an exponential moving average process may be performed on the parameter values of the parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
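The exponential moving average update of the teacher network may, for example, look as follows; PyTorch is assumed, and the decay value 0.99 is an illustrative assumption, since the disclosure only specifies that an exponential moving average is used. Because the teacher receives no gradients, this EMA step is its only parameter update.

```python
import torch

@torch.no_grad()
def update_teacher_ema(student, teacher, decay=0.99):
    """Replace each teacher parameter with an exponential moving average of the
    corresponding (already updated) student parameter."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)
```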
In a specific implementation, as can be seen from the above formula (4) and formula (5), when the semantic segmentation result represented by any pixel point in the second semantic segmentation image is credible, the value of the credibility information corresponding to that pixel point is 1; when it is not credible, the credibility information corresponding to that pixel point is 0. The consistency loss is therefore determined based only on the pixel points of the second semantic segmentation image whose semantic segmentation results are credible, so the consistency loss between the first semantic segmentation image and the second semantic segmentation image considers only the credible pixel points. As a result, when the parameter values of the student network are updated based on the consistency loss, the results of the semantic segmentation processing performed by the student network and the teacher network on the target image under different disturbances tend to be consistent. The parameter values of the teacher network are then updated based on the updated parameter values of the student network, so that the parameter values of the teacher network and of the student network change in consistent directions and both networks learn the specific features of the target image.
In the embodiment of the disclosure, the first noise image and the second noise image are images obtained by different disturbance on the target image; performing semantic segmentation processing on the first noise image by using a student network to obtain a first semantic segmentation image, performing semantic segmentation processing on the second noise image by using a teacher network to obtain a second semantic segmentation image, determining reliability information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image, updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image and the reliability information, and updating parameter values of the teacher network based on the updated parameter values of the student network; according to the process, through the first semantic segmentation image, the second semantic segmentation image and the credibility information, the student network and the teacher network are controlled to generate consistent prediction results after the same target image is disturbed, so that the student network can learn specific characteristics in the target image in the process of migrating based on the target image, namely, the student network migrates and learns towards a specific direction, and because the parameter value of the teacher network is updated according to the parameter value of the student network, the teacher network migrates and learns towards the specific direction, and the problem of negative migration is avoided.
Referring to fig. 3, another method for training a neural network is provided in an embodiment of the present disclosure, including:
S301: Carrying out semantic segmentation processing on the first noise image of the target image by using the student network to obtain a first semantic segmentation image.
S302: performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image.
The specific implementation process of S301 to S302 is similar to that of S101 to S102, and will not be repeated here.
S303: and carrying out semantic segmentation processing on the style migration image of the source image by utilizing the student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located.
In specific implementation, S303 has no sequential logic relationship with the above S301-S302; the execution may be synchronous or asynchronous.
Specifically, the style migration image of the source image can be obtained, for example, in the following manner:
performing style migration processing on the source image by utilizing a pre-trained style migration network to obtain a style migration image corresponding to the source image; the style migration network is trained by utilizing the source image and the target image.
In particular implementations, the style migration network is, for example, a generative adversarial network (Generative Adversarial Network, GAN), such as CycleGAN. The generative adversarial network can integrate the semantic information of the source domain carried in the source image with the semantic information of the target domain carried in the target image, so as to convert the source image into a style migration image that contains some features of the target image; the student network then performs semantic segmentation processing on the style migration image.
In addition, the style migration image can also be generated by using a style migration network with another architecture, for example a neural network based on VGG or GoogLeNet, which can be selected according to actual needs.
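As an illustration, applying a pre-trained style migration network to a source image can be sketched as follows; style_generator stands for any source-to-target image generator (for example the corresponding generator of a CycleGAN-style model) and is an assumption of this sketch.

```python
import torch

@torch.no_grad()
def stylize_source(style_generator, source_image):
    """Render the source image in the target-domain style; the generator is
    assumed to be trained separately using source and target images."""
    style_migration_image = style_generator(source_image)
    return style_migration_image
```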
With the S302 and S303 described above in mind, the training method of the neural network provided in the embodiment of the present disclosure further includes:
S304: Updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image.
In a specific embodiment, the parameter values of the student network may be updated, for example, in the following manner: generating a consistency loss of the first semantic segmentation image and the second semantic segmentation image based on the first semantic segmentation image, the second semantic segmentation image, and the confidence information; generating semantic segmentation loss based on the labeling information of the third semantic segmentation image and the source image; parameters of the student network are updated based on the consistency loss and the semantic segmentation loss.
Exemplarily, the semantic segmentation loss L_seg is, for example, a cross entropy loss for optimizing on the source images from the source domain, and satisfies the following formula (6):
L_seg = −(1/(H·W)) · Σ_{h=1}^{H} Σ_{w=1}^{W} Σ_{c=1}^{C} y_s(h, w, c) · log f_S(x̂_s)(h, w, c)    (6)
where H denotes the height of the style migration image, W denotes its width, C denotes the number of channels, y_s denotes the labeling information of the source image, x̂_s denotes the style migration image of the source image, f_S(·) denotes the student network, and f_S(x̂_s) is the third semantic segmentation image.
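A minimal sketch of the semantic segmentation loss of formula (6); PyTorch is assumed, and the source labels are taken as per-pixel class indices, which makes the library cross entropy equivalent to the one-hot form written above.

```python
import torch.nn.functional as F

def segmentation_loss(student, style_migration_image, source_labels):
    """Pixel-wise cross entropy between the student's prediction on the style
    migration image and the source annotation.
    source_labels: (B, H, W) long tensor of class indices."""
    logits = student(style_migration_image)        # (B, C, H, W); softmax gives the third segmentation image
    return F.cross_entropy(logits, source_labels)
```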
When updating the parameter values of the student network based on the semantic segmentation loss and the consistency loss, for example, the weight of the consistency loss may be determined according to the current iteration number, and then the parameter values of the student network may be updated according to the consistency loss, the weight of the consistency loss, and the semantic segmentation loss.
For example, the total loss of the student network is determined from the semantic segmentation loss and the consistency loss, where the total loss L_total satisfies, for example, the following formula (7):
L_total = L_seg + λ_con · L_con    (7)
where L_seg denotes the semantic segmentation loss, L_con denotes the consistency loss, and λ_con denotes the weight of the consistency loss. λ_con is, for example, a dynamic weight set as a rising function that increases with the number of iterations. The dynamic weight balances the semantic segmentation loss against the consistency loss: in the early stage of training the semantic segmentation loss dominates, and in the later stage the influence of the consistency loss gradually increases, so as to stably control the convergence of the parameter values of the neural network.
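The rising weight λ_con can, for example, be scheduled as in the following sketch; the exponential ramp-up used here is only one possible rising function and is an assumption, since the exact schedule is not reproduced in this text.

```python
import math

def consistency_weight(step, max_steps, max_weight=1.0):
    """A rising function of the iteration count for lambda_con: starts near 0
    and approaches max_weight as training progresses (illustrative schedule)."""
    progress = min(step / float(max_steps), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - progress) ** 2)

# Total loss per formula (7): L_total = L_seg + lambda_con * L_con
```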
In connection with S304, the training method of the neural network provided in the embodiment of the disclosure further includes:
S305: Updating the parameter values of the teacher network based on the updated parameter values of the student network.
Here, the specific implementation process of S305 is similar to S104 described above, and will not be described here again.
According to the embodiment of the disclosure, the semantic segmentation processing is carried out on the style migration image of the source image by utilizing the student network to obtain the third semantic segmentation image, and then the parameter value updating process of the student network is supervised based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image, so that the semantic segmentation precision of the student network and the teacher network can be further improved.
Referring to fig. 4, the embodiment of the disclosure further provides a specific example of a training method of a neural network, including:
step 1: image x of source s Inputting the source image x into a style migration network to obtain a source image x s Is a style migration image of (1)
Step 2: migrating styles to imagesAnd inputting the third semantic segmentation image into a student network to obtain a third semantic segmentation image.
Step 3: based on source image x s Is marked with information y s And a third semantic segmentation image to obtain a semantic segmentation penalty L seg
Step 4: for the target image x t Injecting random noise, generating a first noise image, and inputting the first noise image into a student network to obtain a first semantic segmentation image.
Step 5: for the target image x t Injecting random noise, generating N second noise images, and inputting the N second noise images into a teacher network to obtain a plurality of intermediate semantic segmentation images. And sequentially calculating pixel value mean values of pixel points at corresponding positions in the plurality of intermediate semantic segmentation images to obtain a second semantic segmentation image.
Step 7: and (3) calculating the information entropy of each pixel point in the second semantically segmented image according to the formula (2).
Step 8: and (3) performing reliability calculation according to the calculation of the formula (4) to obtain reliability information of each pixel point in the second semantic segmentation image.
Step 9: Obtain the consistency loss L_con between the first semantic segmentation image and the second semantic segmentation image according to the first semantic segmentation image, the second semantic segmentation image and the credibility information.
Step 10: Calculate the total loss L_total according to formula (7).
Step 11: Update the parameter values of the student network according to the total loss L_total.
Step 12: Perform exponential moving average processing on the updated parameter values of the student network, and update the parameter values of the teacher network based on the result of the exponential moving average processing.
Through the above process, one iteration round of the student network and the teacher network is completed.
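For reference, a compact PyTorch-style sketch of one such iteration is given below. All names (student, teacher, style_net, optimizer and so on) are placeholders, and the Gaussian noise, the mean-squared-error form of the consistency loss, the entropy threshold, the linear weight schedule and the EMA decay of 0.99 are assumptions made for illustration, since formulas (2), (4), (7) and the exact hyper-parameters are specified elsewhere in the disclosure.

```python
import torch
import torch.nn.functional as F

def train_one_iteration(student, teacher, style_net, optimizer,
                        source_image, source_label, target_image,
                        step, num_second=4, noise_std=0.05):
    # Steps 1-3: style migration of the source image and supervised semantic segmentation loss.
    with torch.no_grad():
        migrated = style_net(source_image)                       # style migration image of x_s
    seg_loss = F.cross_entropy(student(migrated), source_label)  # L_seg from the third segmentation image

    # Step 4: first noise image -> student network -> first semantic segmentation image.
    first_noise = target_image + noise_std * torch.randn_like(target_image)
    student_probs = torch.softmax(student(first_noise), dim=1)

    # Steps 5-6: N second noise images -> teacher network -> per-pixel mean -> second segmentation image.
    with torch.no_grad():
        teacher_probs = torch.stack(
            [torch.softmax(teacher(target_image + noise_std * torch.randn_like(target_image)), dim=1)
             for _ in range(num_second)], dim=0).mean(dim=0)

    # Steps 7-8: per-pixel information entropy and a hard credibility mask (low entropy taken as credible).
    num_classes = teacher_probs.shape[1]
    entropy = -(teacher_probs * (teacher_probs + 1e-8).log()).sum(dim=1, keepdim=True)
    credibility = (entropy < 0.5 * torch.log(torch.tensor(float(num_classes)))).float()

    # Step 9: credibility-weighted consistency loss between the two semantic segmentation images.
    con_loss = (credibility * (student_probs - teacher_probs) ** 2).mean()

    # Steps 10-11: total loss of formula (7) and update of the student parameters.
    lam = min(1.0, step / 10000.0)   # rising weight of the consistency loss
    loss = seg_loss + lam * con_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Step 12: exponential moving average update of the teacher parameters.
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.copy_(0.99 * t_p + 0.01 * s_p)
    return loss.item()
```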
It will be appreciated by those skilled in the art that, in the methods of the specific embodiments described above, the written order of the steps does not imply a strict order of execution; the actual execution order should be determined by the functions of the steps and their possible inherent logic.
Referring to fig. 5, an embodiment of the present disclosure further provides an image processing method, including:
S501: acquiring an image to be processed;
S502: performing semantic segmentation processing on the image to be processed by using the neural network trained by the neural network training method according to any embodiment of the present disclosure, to obtain a semantic segmentation result of the image to be processed.
When semantic segmentation processing is performed on the image to be processed, it is implemented by using the neural network trained by the neural network training method provided in the embodiments of the present disclosure. This neural network achieves higher semantic segmentation precision on the image to be processed, so that the obtained semantic segmentation result of the image to be processed is more accurate.
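A minimal usage sketch of this image processing method follows; segment and trained_net are placeholder names, and taking the per-pixel argmax of the class scores as the semantic segmentation result is an assumption used for illustration.

```python
import torch

@torch.no_grad()
def segment(trained_net, image_to_process):
    """image_to_process: a (1, 3, H, W) tensor. trained_net is a student or teacher network
    trained as described above; the result assigns one class index to each pixel."""
    logits = trained_net(image_to_process)   # (1, C, H, W) class scores
    return logits.argmax(dim=1)              # (1, H, W) semantic segmentation result
```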
Referring to fig. 6, an embodiment of the present disclosure further provides an intelligent driving control method, including:
S601: acquiring an image acquired by a running device in the running process;
S602: detecting a target object in the image by using a neural network trained based on the neural network training method described in any embodiment of the present disclosure;
S603: controlling the running device based on the detected target object.
In specific implementations, the running device is, for example, but not limited to, any of the following: an autonomous vehicle, a vehicle equipped with an advanced driving assistance system (Advanced Driving Assistance System, ADAS), a robot, or the like.
Controlling the running device may include, for example, controlling the running device to accelerate, decelerate, steer or brake, or playing a voice prompt to prompt the driver to control the running device to accelerate, decelerate, steer or brake.
The intelligent driving control method of the embodiment of the disclosure is implemented by using the neural network trained by the neural network training method of the embodiment of the disclosure. When this neural network performs semantic segmentation processing on images acquired during driving, a more accurate semantic segmentation result can be obtained, thereby ensuring higher safety during the execution of driving control.
Based on the same inventive concept, the embodiment of the present disclosure further provides a neural network training device corresponding to the neural network training method. Since the principle by which the device solves the problem is similar to that of the neural network training method of the embodiment of the disclosure, the implementation of the device may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to fig. 7, a schematic diagram of a training device for a neural network according to an embodiment of the disclosure is shown, where the device includes: a first processing module 71, a second processing module 72, a first updating module 73, and a second updating module 74; wherein,
a first processing module 71, configured to perform semantic segmentation processing on a first noise image of the target image by using the student network, so as to obtain a first semantic segmentation image;
a second processing module 72, configured to perform semantic segmentation processing on a second noise image of the target image by using a teacher network, so as to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
a first updating module 73, configured to update parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information;
A second updating module 74 for updating the parameter values of the teacher network based on the updated parameter values of the student network.
In a possible embodiment, the apparatus further comprises: a third processing module 75, configured to perform semantic segmentation processing on a style migration image of a source image by using a student network, to obtain a third semantic segmentation image, where the style migration image of the source image is an image obtained by migrating a style of the source image to a target domain where the target image is located;
the first updating module 73 is configured to, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information:
updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image.
In a possible implementation manner, the first updating module 73 is configured to, when updating the parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image, and the labeling information of the source image:
Determining a consistency loss based on the first semantically segmented image, the second semantically segmented image, and the confidence information; determining the weight of the consistency loss based on the current iteration times;
determining semantic segmentation loss based on the third semantic segmentation image and annotation information of the source image;
updating parameter values of the student network based on the consistency loss, the weight, and the semantic segmentation loss.
In a possible implementation manner, the second processing module 72 is configured to, when performing semantic segmentation processing on the second noise image of the target image by using the teacher network, obtain a second semantic segmented image:
respectively carrying out semantic segmentation processing on a plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images;
the second semantically segmented image is generated based on the plurality of intermediate semantically segmented images.
In a possible implementation manner, the second processing module 72 is configured, when generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images, to:
sequentially calculating pixel value mean values of pixel points at corresponding positions in a plurality of intermediate semantic segmentation images;
And determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
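A short sketch of this fusion step, assuming the teacher network's softmax outputs serve as the pixel values of the intermediate semantic segmentation images (the helper name second_segmentation_image is illustrative):

```python
import torch

@torch.no_grad()
def second_segmentation_image(teacher, second_noise_images):
    """second_noise_images: a list of (1, 3, H, W) second noise images of the same target image.
    Each is passed through the teacher network to obtain an intermediate semantic segmentation
    image; the pixel value mean over corresponding positions gives the second semantic
    segmentation image."""
    intermediates = [torch.softmax(teacher(x), dim=1) for x in second_noise_images]
    return torch.stack(intermediates, dim=0).mean(dim=0)   # (1, C, H, W)
```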
In a possible implementation manner, the second processing module 72 is configured to, when determining, based on the second semantically segmented image, reliability information of each pixel point in the second semantically segmented image:
determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image;
and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
In a possible implementation manner, the second processing module 72 is configured to, when determining the reliability information of each pixel point in the second semantically segmented image based on the information entropy of each pixel point in the second semantically segmented image and a predetermined information entropy threshold value:
comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold;
determining credibility information of each pixel point in the second semantic segmentation image based on the comparison result;
And if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value representing the credibility of the pixel value of that pixel point, wherein the preset value is larger than 0.
In a possible implementation manner, the second processing module 72 is further configured to generate the information entropy threshold in the following manner:
and determining the information entropy threshold based on the semantic segmentation type of the teacher network.
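The following sketch illustrates one possible reading of the entropy-based credibility computation together with a class-count-based threshold. The Shannon entropy, the hard 0/1 mask, the comparison direction (low entropy taken as credible) and the factor 0.5 are assumptions made for illustration; formulas (2) and (4) and the exact threshold rule are defined elsewhere in the disclosure.

```python
import math
import torch

def credibility_map(second_seg_image, num_classes, threshold_ratio=0.5):
    """second_seg_image: (1, C, H, W) per-pixel class probabilities of the second semantic
    segmentation image. Returns a (1, H, W) map that is 1.0 where the per-pixel information
    entropy is below the threshold and 0.0 elsewhere."""
    eps = 1e-8
    entropy = -(second_seg_image * (second_seg_image + eps).log()).sum(dim=1)  # (1, H, W)
    # log(num_classes) is the maximum possible entropy, so the threshold scales with the
    # number of semantic segmentation classes of the teacher network.
    threshold = threshold_ratio * math.log(num_classes)
    return (entropy < threshold).float()
```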
In a possible implementation manner, the second updating module 74 is configured to, when updating the parameter value of the teacher network based on the updated parameter value of the student network:
performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values;
and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
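A minimal sketch of this exponential moving average update; the decay value 0.99 and the name update_teacher are illustrative.

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, ema_decay=0.99):
    """Each parameter value of the teacher network is replaced by the exponential moving average
    of the corresponding student parameter:
    target = ema_decay * old_teacher_value + (1 - ema_decay) * updated_student_value."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.copy_(ema_decay * t_param + (1.0 - ema_decay) * s_param)
```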
In a possible embodiment, the method further comprises: a first generation module 76 is configured to generate the style migration image in the following manner:
performing style migration processing on the source image by utilizing a pre-trained style migration network to obtain a style migration image corresponding to the source image; the style migration network is trained by utilizing the source image and the target image.
In a possible embodiment, the method further comprises: an initialization module 77, configured to perform an initialization process on the teacher network and the student network using a pre-trained semantic segmentation network.
In a possible embodiment, the method further comprises: a second generating module 78, configured to generate the first noise image and the second noise image in the following manner:
injecting random noise into the target image to obtain the first noise image and the second noise image; wherein the noise corresponding to different noise images is different.
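A small sketch of this noise injection step; Gaussian noise with a fixed standard deviation is an assumption, as the disclosure only requires that random noise be injected and that different noise images receive different noise.

```python
import torch

def make_noise_images(target_image, num_second=4, noise_std=0.05):
    """target_image: (1, 3, H, W). Returns one first noise image (fed to the student network) and
    num_second second noise images (fed to the teacher network); each image receives an independent
    noise draw, so the noise corresponding to different noise images is different."""
    first_noise_image = target_image + noise_std * torch.randn_like(target_image)
    second_noise_images = [target_image + noise_std * torch.randn_like(target_image)
                           for _ in range(num_second)]
    return first_noise_image, second_noise_images
```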
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Referring to fig. 8, an embodiment of the present disclosure further provides an image processing apparatus including:
an acquiring module 81, configured to acquire an image to be processed;
the processing module 82 is configured to perform semantic segmentation processing on the image to be processed by using the neural network trained by the neural network training method according to any embodiment of the present disclosure, so as to obtain a semantic segmentation result of the image to be processed.
Referring to fig. 9, an embodiment of the present disclosure further provides an intelligent driving control device, which is characterized by including:
A data acquisition module 91, configured to acquire an image acquired by the driving device during driving;
a detection module 92 for detecting a target object in the image using a neural network trained based on the neural network training method according to any of the embodiments of the present disclosure;
a control module 93 for controlling the running apparatus based on the detected target object.
The embodiment of the present disclosure further provides an electronic device 10, as shown in fig. 10, which is a schematic structural diagram of the electronic device 10 provided in the embodiment of the present disclosure, including:
a processor 11 and a memory 12; the memory 12 stores machine readable instructions executable by the processor 11 which, when the electronic device is running, are executed by the processor to perform the steps of:
carrying out semantic segmentation processing on a first noise image of the target image by utilizing a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information; updating the parameter values of the teacher network based on the updated parameter values of the student network.
Or the following steps are realized: acquiring an image to be processed; performing semantic segmentation processing on the image to be processed by using a neural network trained based on the training method of the neural network according to any embodiment of the disclosure to obtain a semantic segmentation result of the image to be processed;
or the following steps are realized: acquiring an image acquired by a running device in the running process; detecting a target object in the image using a neural network trained based on the training method of the neural network described in any of the embodiments of the present disclosure; the running apparatus is controlled based on the detected target object.
The specific execution process of the above instruction may refer to the steps of the neural network training method described in the embodiments of the present disclosure, or the image processing steps, which are not described herein.
The disclosed embodiments also provide a computer-readable storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the neural network training method described in the method embodiment, or performs the steps of the image processing method described in the method embodiment, or performs the steps of the intelligent travel control method described in the method embodiment. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The computer program product of the training method and the image processing method of the neural network provided in the embodiments of the present disclosure includes a computer readable storage medium storing a program code, where the program code includes instructions for executing the training method, the steps of the image processing method, or the steps of the intelligent driving control method of the neural network described in the embodiments of the methods, and specifically, reference may be made to the embodiments of the methods described above, and details thereof will not be repeated herein.
The disclosed embodiments also provide a computer program which, when executed by a processor, implements any of the methods of the previous embodiments. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the system and apparatus described above may refer to the corresponding procedures in the foregoing method embodiments, and are not described here again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical functional division, and there may be other manners of division in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
Finally, it should be noted that the foregoing examples are merely specific embodiments of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the technical field may, within the technical scope disclosed by the present disclosure, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of the technical features thereof; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure and are intended to be included within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A method of training a neural network, comprising:
carrying out semantic segmentation processing on a first noise image of the target image by utilizing a student network to obtain a first semantic segmentation image;
performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
Updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information;
updating the parameter values of the teacher network based on the updated parameter values of the student network;
the method further comprises the steps of:
performing semantic segmentation processing on a style migration image of a source image by using a student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located;
the updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information includes:
updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image;
the updating the parameter value of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image comprises the following steps:
Determining a consistency loss based on the first semantically segmented image, the second semantically segmented image, and the confidence information; determining the weight of the consistency loss based on the current iteration times;
determining semantic segmentation loss based on the third semantic segmentation image and annotation information of the source image;
updating parameter values of the student network based on the consistency loss, the weight, and the semantic segmentation loss.
2. The training method according to claim 1, wherein performing semantic segmentation processing on the second noise image of the target image by using a teacher network to obtain a second semantic segmented image comprises:
respectively carrying out semantic segmentation processing on a plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images;
the second semantically segmented image is generated based on the plurality of intermediate semantically segmented images.
3. The training method of claim 2, wherein the generating the second semantically segmented image based on the plurality of intermediate semantically segmented images comprises:
sequentially calculating pixel value mean values of pixel points at corresponding positions in a plurality of intermediate semantic segmentation images;
And determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
4. A training method according to any of claims 1-3, wherein said determining the confidence information of each pixel point in the second semantically segmented image based on the second semantically segmented image comprises:
determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image;
and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
5. The training method of claim 4, wherein determining the confidence information for each pixel in the second semantically segmented image based on the information entropy of each pixel in the second semantically segmented image and a predetermined information entropy threshold comprises:
comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold;
determining credibility information of each pixel point in the second semantic segmentation image based on the comparison result;
And if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value representing the credibility of the pixel value of that pixel point, wherein the preset value is larger than 0.
6. The training method of claim 4, wherein the information entropy threshold is generated by:
and determining the information entropy threshold based on the semantic segmentation type of the teacher network.
7. The training method of any of claims 1-3, 5, 6, wherein updating the parameter values of the teacher network based on the updated parameter values of the student network comprises:
performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values;
and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
8. An image processing method, comprising:
acquiring an image to be processed;
performing semantic segmentation processing on the image to be processed by using the neural network trained by the training method based on the neural network according to any one of claims 1-7 to obtain a semantic segmentation result of the image to be processed.
9. An intelligent travel control method is characterized by comprising the following steps:
acquiring an image acquired by a running device in the running process;
detecting a target object in the image using a neural network trained based on the training method of any one of claims 1-7;
the running apparatus is controlled based on the detected target object.
10. A neural network training device, comprising:
the first processing module is used for carrying out semantic segmentation processing on the first noise image of the target image by utilizing the student network to obtain a first semantic segmentation image;
the second processing module is used for carrying out semantic segmentation processing on a second noise image of the target image by utilizing a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
a first updating module, configured to update parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information;
a second updating module, configured to update a parameter value of the teacher network based on the updated parameter value of the student network;
The apparatus further comprises: the third processing module is used for carrying out semantic segmentation processing on the style migration image of the source image by utilizing the student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located;
the first updating module is configured to, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information:
updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the labeling information of the source image;
the first updating module is configured to, when updating parameter values of the student network based on the first semantically-segmented image, the second semantically-segmented image, the credibility information, the third semantically-segmented image, and the labeling information of the source image:
determining a consistency loss based on the first semantically segmented image, the second semantically segmented image, and the confidence information; determining the weight of the consistency loss based on the current iteration times;
Determining semantic segmentation loss based on the third semantic segmentation image and annotation information of the source image;
updating parameter values of the student network based on the consistency loss, the weight, and the semantic segmentation loss.
11. An image processing apparatus, comprising:
the acquisition module is used for acquiring the image to be processed;
the processing module is used for carrying out semantic segmentation processing on the image to be processed by utilizing the neural network trained by the training method based on the neural network according to any one of claims 1-7 to obtain a semantic segmentation result of the image to be processed.
12. An intelligent travel control device, comprising:
the data acquisition module is used for acquiring images acquired by the driving device in the driving process;
a detection module for detecting a target object in the image using a neural network trained based on the training method of any one of claims 1-7;
and the control module is used for controlling the running device based on the detected target object.
13. An electronic device, comprising: a processor, a memory storing machine-readable instructions executable by the processor for executing the machine-readable instructions stored in the memory, which when executed by the processor, perform the steps of the method of any one of claims 1 to 9.
14. A computer-readable storage medium, on which a computer program is stored which, when being run by an electronic device, performs the steps of the method according to any one of claims 1 to 9.
CN202010278429.2A 2020-04-10 2020-04-10 Training method of neural network, image processing method and device Active CN111489365B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010278429.2A CN111489365B (en) 2020-04-10 2020-04-10 Training method of neural network, image processing method and device

Publications (2)

Publication Number Publication Date
CN111489365A CN111489365A (en) 2020-08-04
CN111489365B true CN111489365B (en) 2023-12-22

Family

ID=71794812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010278429.2A Active CN111489365B (en) 2020-04-10 2020-04-10 Training method of neural network, image processing method and device

Country Status (1)

Country Link
CN (1) CN111489365B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967597A (en) * 2020-08-18 2020-11-20 上海商汤临港智能科技有限公司 Neural network training and image classification method, device, storage medium and equipment
CN112150478B (en) * 2020-08-31 2021-06-22 温州医科大学 Method and system for constructing semi-supervised image segmentation framework
CN112070163B (en) * 2020-09-09 2023-11-24 抖音视界有限公司 Image segmentation model training and image segmentation method, device and equipment
CN112419326B (en) * 2020-12-02 2023-05-23 腾讯科技(深圳)有限公司 Image segmentation data processing method, device, equipment and storage medium
CN112633285B (en) * 2020-12-23 2024-07-23 平安科技(深圳)有限公司 Domain adaptation method, domain adaptation device, electronic equipment and storage medium
WO2023019444A1 (en) * 2021-08-17 2023-02-23 华为技术有限公司 Optimization method and apparatus for semantic segmentation model
CN114399640B (en) * 2022-03-24 2022-07-15 之江实验室 Road segmentation method and device for uncertain region discovery and model improvement
CN114708436B (en) * 2022-06-02 2022-09-02 深圳比特微电子科技有限公司 Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN114842457B (en) * 2022-06-29 2023-09-26 小米汽车科技有限公司 Model training and feature extraction method and device, electronic equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1969297A (en) * 2001-06-15 2007-05-23 索尼公司 Image processing apparatus and method and image pickup apparatus
CN106127810A (en) * 2016-06-24 2016-11-16 惠州紫旭科技有限公司 The recording and broadcasting system image tracking method of a kind of video macro block angle point light stream and device
CN106709918A (en) * 2017-01-20 2017-05-24 成都信息工程大学 Method for segmenting images of multi-element student t distribution mixed model based on spatial smoothing
CN110414526A (en) * 2019-07-31 2019-11-05 达闼科技(北京)有限公司 Training method, training device, server and the storage medium of semantic segmentation network
CN110458844A (en) * 2019-07-22 2019-11-15 大连理工大学 A kind of semantic segmentation method of low illumination scene
CN110827963A (en) * 2019-11-06 2020-02-21 杭州迪英加科技有限公司 Semantic segmentation method for pathological image and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10147185B2 (en) * 2014-09-11 2018-12-04 B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University Interactive segmentation
US10643320B2 (en) * 2017-11-15 2020-05-05 Toyota Research Institute, Inc. Adversarial learning of photorealistic post-processing of simulation with privileged information
US10769771B2 (en) * 2018-06-22 2020-09-08 Cnh Industrial Canada, Ltd. Measuring crop residue from imagery using a machine-learned semantic segmentation model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Lequan Yu 等.Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation.arXiv.2019,第1-9页. *
华敏杰.基于深度学习的图像语义分割算法概述.中国战略新兴产业.2018,第130页. *
郑宝玉 等.基于深度卷积神经网络的弱监督图像语义分割.南京邮电大学学报(自然科学版).2018,第5-16页. *

Also Published As

Publication number Publication date
CN111489365A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
CN111489365B (en) Training method of neural network, image processing method and device
US20200327409A1 (en) Method and device for hierarchical learning of neural network, based on weakly supervised learning
US20220363259A1 (en) Method for generating lane changing decision-making model, method for lane changing decision-making of unmanned vehicle and electronic device
CN110349190B (en) Adaptive learning target tracking method, device, equipment and readable storage medium
US11741356B2 (en) Data processing apparatus by learning of neural network, data processing method by learning of neural network, and recording medium recording the data processing method
CN111742333A (en) Method and apparatus for performing deep neural network learning
JP7295282B2 (en) Method for on-device learning of machine learning network of autonomous driving car through multi-stage learning using adaptive hyperparameter set and on-device learning device using the same
CN111914878B (en) Feature point tracking training method and device, electronic equipment and storage medium
CN111401557B (en) Agent decision making method, AI model training method, server and medium
Hornauer et al. Gradient-based uncertainty for monocular depth estimation
CN112200889A (en) Sample image generation method, sample image processing method, intelligent driving control method and device
CN116097277A (en) Method and system for training neural network models using progressive knowledge distillation
CN114139637A (en) Multi-agent information fusion method and device, electronic equipment and readable storage medium
KR102234917B1 (en) Data processing apparatus through neural network learning, data processing method through the neural network learning, and recording medium recording the method
CN113705402A (en) Video behavior prediction method, system, electronic device and storage medium
CN113625753A (en) Method for guiding neural network to learn maneuvering flight of unmanned aerial vehicle by expert rules
CN117372536A (en) Laser radar and camera calibration method, system, equipment and storage medium
US20230281981A1 (en) Methods, devices, and computer readable media for training a keypoint estimation network using cgan-based data augmentation
CN115879536A (en) Learning cognition analysis model robustness optimization method based on causal effect
Ertenli et al. Streaming multiscale deep equilibrium models
KR102157441B1 (en) Learning method for neural network using relevance propagation and service providing apparatus
KR20230038136A (en) Knowledge distillation method and system specialized for lightweight pruning-based deep neural networks
CN113689437A (en) Interactive image segmentation method based on iterative selection-correction network
CN106446524A (en) Intelligent hardware multimodal cascade modeling method and apparatus
KR102608304B1 (en) Task-based deep learning system and method for intelligence augmented of computer vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant