Disclosure of Invention
In view of the above, there is a need to provide a knowledge distillation method, apparatus, computer device and storage medium based on a structured example graph.
A knowledge distillation method based on a structured example graph, the method comprising:
acquiring a training image;
inputting the training image into a backbone network of a teacher network to obtain a first feature map, and inputting the training image into a backbone network of a student network to obtain a second feature map;
inputting the first feature map and the second feature map into a region proposal network respectively to obtain a first diagram to be detected containing a bounding box and a second diagram to be detected containing a bounding box;
encoding based on the object instances in the bounding box of the first diagram to be detected and the object instances in the bounding box of the second diagram to be detected to obtain a first structural diagram and a second structural diagram;
obtaining a distillation loss part by combining a preset distillation loss function based on the first structural diagram and the second structural diagram;
obtaining a basic loss part according to the distance between the detection result of the second image to be detected and the real label;
training the student network based on the distillation loss part and the base loss part.
In one embodiment, the object instances are characterized using nodes and corresponding edges; the nodes comprise foreground nodes and background nodes; and the encoding based on the object instances in the bounding box of the first diagram to be detected and the object instances in the bounding box of the second diagram to be detected to obtain the first structural diagram and the second structural diagram comprises the following steps:
calculating a classification loss value of the background node by using a preset classification loss function;
and after removing the background nodes whose classification loss value is smaller than a preset threshold, encoding the remaining background nodes together with the foreground nodes to obtain the first structural diagram and the second structural diagram.
In one embodiment, the training of the student network based on the distillation loss part and the base loss part comprises:
taking the sum of the distillation loss part and the base loss part as the global loss of the student network;
and adjusting the network parameters of the student network based on the global loss until the global loss meets a preset condition.
In one embodiment, the object instances are characterized using nodes and corresponding edges; the distillation loss part comprises a foreground node loss part, a background node loss part and an edge loss part; the distillation loss function is:

$$L_G = \lambda_1 L_V^{fg} + \lambda_2 L_V^{bg} + \lambda_3 L_E$$

$$L_V^{fg} = \frac{1}{N_{fg}} \sum_{i=1}^{N_{fg}} \left\lVert v_i^{T} - v_i^{S} \right\rVert_2^2,\qquad L_V^{bg} = \frac{1}{N_{bg}} \sum_{i=1}^{N_{bg}} \left\lVert u_i^{T} - u_i^{S} \right\rVert_2^2,\qquad L_E = \frac{1}{N} \sum_{i,j} \left( e_{ij}^{T} - e_{ij}^{S} \right)^2$$

wherein $L_G$ represents the distillation loss function; $L_V^{fg}$ represents the distillation loss of the foreground nodes; $L_V^{bg}$ represents the distillation loss of the background nodes; $L_E$ represents the edge loss; $\lambda_1$ is the distillation loss weight of the foreground nodes, $\lambda_2$ is the distillation loss weight of the background nodes, and $\lambda_3$ is the distillation loss weight of the edges; $N_{fg}$ represents the total number of foreground nodes, $v_i^{T}$ represents the node vector of the $i$th foreground node in the teacher network, and $v_i^{S}$ represents the node vector of the $i$th foreground node in the student network; $N_{bg}$ represents the total number of background nodes, $u_i^{T}$ represents the node vector of the $i$th background node in the teacher network, and $u_i^{S}$ represents the node vector of the $i$th background node in the student network; $N$ represents the total number of edges, $e_{ij}^{T}$ represents the correlation from the $i$th node to the $j$th node in feature space in the teacher network, and $e_{ij}^{S}$ represents the correlation of the $i$th node and the $j$th node in feature space in the student network.
In one embodiment, the base loss part includes a detection loss function and a KL divergence function.
A method of target detection, the method comprising:
training the student network for image instance detection by utilizing the steps in the embodiment of the knowledge distillation method based on the structured instance graph;
acquiring an image including an object to be detected;
inputting the image to the student network to cause the student network to output a target image containing a prediction box; the prediction box is used for identifying the class label and the position of the object to be detected.
A knowledge distillation apparatus based on a structured example graph, the apparatus comprising:
the image acquisition module is used for acquiring a training image;
the characteristic diagram acquisition module is used for inputting the training image into a backbone network of a teacher network to obtain a first characteristic diagram and inputting the training image into a backbone network of a student network to obtain a second characteristic diagram;
a to-be-detected map acquisition module, configured to input the first feature map and the second feature map into a region proposal network respectively to obtain a first to-be-detected map containing a bounding box and a second to-be-detected map containing a bounding box;
a structure diagram output module, configured to perform encoding based on the object instances in the bounding box of the first to-be-detected map and the object instances in the bounding box of the second to-be-detected map to obtain a first structural diagram and a second structural diagram;
a distillation loss acquisition module, configured to obtain a distillation loss part by combining a preset distillation loss function based on the first structural diagram and the second structural diagram;
a basic loss acquisition module for obtaining a basic loss part according to the distance between the detection result of the second to-be-detected image and the real label;
and the student network training module is used for training the student network based on the distillation loss part and the basic loss part.
An object detection apparatus, the apparatus comprising:
the image acquisition module is used for acquiring an image comprising an object to be detected;
a target image output module, configured to utilize the student network for image instance detection, which is trained in each step in the above embodiment of the knowledge distillation method based on the structured instance graph, to input the image to the student network, so that the student network outputs a target image including a prediction box; the prediction box is used for identifying the class label and the position of the object to be detected.
A computer device comprising a memory storing a computer program and a processor which, when executing the computer program, implements the steps of any of the above embodiments of the knowledge distillation method based on the structured example graph and the target detection method.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the above-described structured example graph-based knowledge distillation method and target detection method embodiments.
According to the above knowledge distillation method and apparatus based on the structured example graph, computer device and storage medium, a training image is acquired and input into the backbone network of a teacher network and the backbone network of a student network respectively to obtain a first feature map and a second feature map; the first feature map and the second feature map are respectively input into the region proposal network to obtain a first diagram to be detected containing a bounding box and a second diagram to be detected containing a bounding box; encoding is performed based on the object instances in each bounding box to obtain a first structural diagram and a second structural diagram; a distillation loss part is obtained based on the first structural diagram and the second structural diagram in combination with a preset distillation loss function; a base loss part is obtained according to the distance between the detection result of the second diagram to be detected and the real label; and the student network is trained based on the distillation loss part and the base loss part. The method uses an instance model with a graph structure and fuses the relationship between the foreground features and the background features in the image through the preset distillation loss function, so that the foreground features and background features in the teacher network can be effectively transferred to the student network; the student network can thus extract more effective knowledge from the teacher network, further improving the accuracy of target detection.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The knowledge distillation method based on the structured example graph provided by the present application can be understood with reference to the knowledge distillation structure schematic diagram shown in fig. 1. Knowledge distillation trains a compact neural network using knowledge gathered and extracted from a large model or a set of models. The large model or model set is called the teacher network, and the small, compact model is called the student network. The teacher network generally has high hardware requirements and usually needs a large server or a server cluster, while the student network can run on various personal computers, notebook computers, smartphones, tablet computers and portable wearable devices.
In one embodiment, as shown in fig. 2, a knowledge distillation method based on a structured example graph is provided, comprising the steps of:
step S201, acquiring a training image;
the training image is an image used for model construction, the image includes category labels of object categories, positions, ranges and other elements which are marked manually or by a machine, and one image includes an image surrounded by a bounding box with different colors, such as a character image surrounded by a red bounding box and a puppy image surrounded by a yellow bounding box.
Step S202, inputting a training image into a backbone network of a teacher network to obtain a first feature map, and inputting the training image into the backbone network of a student network to obtain a second feature map;
the backbone network refers to a front-end network used for extracting image features in a teacher network or a student network.
Specifically, the training images are respectively input into the teacher network and the student network. The backbone networks of both the teacher network and the student network are formed by convolution kernels; each convolution kernel represents one feature extraction mode, i.e. one convolution kernel extracts features of only one form. Feature extraction is performed on the training image through the backbone networks of the teacher network and the student network to obtain the first feature map and the second feature map, respectively; the feature maps include different features such as color, shape and texture.
Step S203, inputting the first feature map and the second feature map into a region proposal network respectively to obtain a first diagram to be detected containing a bounding box and a second diagram to be detected containing a bounding box;
As shown in fig. 1, the model further includes a region proposal network (RPN). The RPN is configured to output a diagram to be detected containing bounding boxes. The bounding boxes are mainly used to distinguish the foreground from the background in the first feature map and the second feature map, framing the foreground and the background separately so that they can be input into the next layer for further detection and analysis. In this step, the first feature map and the second feature map are processed by the RPN to obtain a first diagram to be detected containing a bounding box and a second diagram to be detected containing a bounding box, respectively.
Step S204, coding is carried out based on an object example in a boundary frame of a first diagram to be detected and an object example in a boundary frame of a second diagram to be detected, and a first structural diagram and a second structural diagram are obtained;
the object instance refers to a specific object which is expected to be identified in the image, such as a red balloon, a blue balloon, a pet dog, a building, a person and the like, different individuals in the same object are distinguished by the instance, for example, a plurality of people exist in the image, and each person is called an object instance.
Specifically, first, the first diagram to be detected containing a bounding box and the second diagram to be detected containing a bounding box obtained in step S203 are respectively passed to the next layer for detection; the next layer is the RoI pooling layer in fig. 1. The RoI pooling layer is used to pool (i.e. downsample) the first diagram to be detected and the second diagram to be detected, and outputs regions of interest (RoI, Region of Interest). The student network and the teacher network share the sampling regions in order to align their distillation objects, so that the teacher network and the student network can distill for the same object instance simultaneously. Further, a representation based on a graph structure is used in the present application, so the regions of interest also need to be represented by a graph structure, in which nodes and edges represent the object instances in the regions of interest. A node represents the attribute features of an object instance (such as color, texture and shape), and an edge represents the relationship features between nodes (such as the distance between nodes). These nodes and edges are all represented by corresponding matrices, so all nodes and edges need to be encoded to obtain the corresponding matrices. Finally, the graph structures of all the nodes are obtained: the teacher network yields the corresponding first structural diagram, and the student network yields the corresponding second structural diagram.
Optionally, the encoding process may be self-encoding using a corresponding model, or may be represented by manual encoding.
The relationship features include node correlation, which is measured using cosine similarity between node vectors.
Step S205, based on the first structural diagram and the second structural diagram, combining a preset distillation loss function to obtain a distillation loss part;
The preset distillation loss function can be regarded as the difference loss between the first structural diagram of the teacher network and the second structural diagram of the student network, and comprises the graph node loss $L_V$ and the graph edge loss $L_E$; the present application calculates these losses using the Euclidean distance function, i.e. the distance between corresponding object instances in the teacher network and the student network:

$$L_G = \lambda_1 L_V^{fg} + \lambda_2 L_V^{bg} + \lambda_3 L_E$$

wherein $L_G$ represents the distillation loss function; $L_V^{fg}$ represents the distillation loss of the foreground nodes; $L_V^{bg}$ represents the distillation loss of the background nodes; $L_E$ represents the edge loss; $\lambda_1$ is the distillation loss weight of the foreground nodes, $\lambda_2$ is the distillation loss weight of the background nodes, and $\lambda_3$ is the distillation loss weight of the edges. The distillation loss part is calculated using the above distillation loss function.
And step S206, obtaining a basic loss part according to the distance between the detection result of the second to-be-detected image and the real label.
Wherein, the real label refers to the label which is already marked by a machine or a man.
Specifically, there is also a loss function in the original classification task, which is determined according to the difference between the prediction result of the student network and the true tag, and the loss function may be, for example, softmax loss function (softmax loss) or bbox regression loss function (bbox regression loss, also called border regression loss function).
And step S207, training the student network based on the distillation loss part and the basic loss part.
Specifically, the sum of the distillation loss part and the base loss part is used as the global loss function. During training, the network parameters of the student network are continuously adjusted so that the value of the global loss function reaches a preset condition; the preset condition may be that the global loss function value reaches a minimum, or reaches a minimum within a certain range.
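As an illustration only, the parameter adjustment loop described above can be sketched in plain Python; `step_fn`, `loss_fn` and the tolerance-based stopping rule are hypothetical stand-ins for the actual optimizer update, global loss function and preset condition:

```python
def train_student(student_params, step_fn, loss_fn, tolerance=1e-3, max_iters=1000):
    """Toy training loop: step_fn updates the parameters given the current
    loss, and training stops once the global loss falls below the tolerance
    (the 'preset condition') or the iteration budget runs out."""
    for _ in range(max_iters):
        loss = loss_fn(student_params)
        if loss < tolerance:
            break
        student_params = step_fn(student_params, loss)
    return student_params
```

In the actual method the update would be a gradient step on the student network's weights; here the optimizer is abstracted behind `step_fn`.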
In this embodiment, a training image is acquired and input into the backbone network of the teacher network and the backbone network of the student network respectively to obtain a first feature map and a second feature map; the first feature map and the second feature map are respectively input into the region proposal network to obtain a first diagram to be detected containing a bounding box and a second diagram to be detected containing a bounding box; encoding is performed based on the object instances in each bounding box to obtain a first structural diagram and a second structural diagram; a distillation loss part is obtained based on the first structural diagram and the second structural diagram in combination with a preset distillation loss function; a base loss part is obtained according to the distance between the detection result of the second diagram to be detected and the real label; and the student network is trained based on the distillation loss part and the base loss part. The method uses an instance model with a graph structure and fuses the relationship between the foreground features and the background features in the image through the preset distillation loss function, so that the foreground features and background features in the teacher network can be effectively transferred to the student network; the student network can thus extract more effective knowledge from the teacher network, further improving the accuracy of target detection.
In one embodiment, the object instances are characterized using nodes and corresponding edges; the nodes comprise foreground nodes and background nodes; and step S204 includes: calculating a classification loss value of each background node by using a preset classification loss function; and after removing the background nodes whose classification loss value is smaller than a preset threshold, encoding the remaining background nodes together with the foreground nodes to obtain the first structural diagram and the second structural diagram.
Specifically, in the present application, object instances are characterized using nodes and edges, where the nodes comprise foreground nodes and background nodes. The present application establishes the nodes from the features obtained by RoI pooling, classifying the nodes based on IoU (Intersection over Union, i.e. the intersection area of two rectangular boxes divided by their union area). The node set in each graph $G$ is represented as

$$V = \{v_1^{fg}, \ldots, v_n^{fg}\} \cup \{v_1^{bg}, \ldots, v_m^{bg}\}$$

wherein $v_i^{fg}$ denotes the feature vector of the $i$th foreground feature and $v_i^{bg}$ denotes the feature vector of the $i$th background feature. In this node set, $n$ and $m$ represent the number of foreground object instances and the number of background object instances, respectively, in each graph.
The edge set in each graph $G$ is denoted as $E = [e_{ij}]_{k \times k}$, where $k$ represents the size of the node set and $e_{ij}$ represents the correlation of the $i$th node and the $j$th node in feature space:

$$e_{ij} := \mathrm{sim\_function}(v_i, v_j)$$

where $\mathrm{sim\_function}(v_i, v_j)$ is a correlation function between the $i$th node $v_i$ and the $j$th node $v_j$. In the present application, cosine similarity may be adopted as the measure of correlation between two nodes, that is:

$$e_{ij} = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}$$

where $v_i$ is the node vector of the $i$th node and $v_j$ is the node vector of the $j$th node. Cosine similarity is not affected by the norms of the two node vectors. We assume that there is an edge between every two nodes in graph $G$, i.e. graph $G$ is a complete graph. In practice, because the correlation metric is a symmetric function, $e_{ij} = e_{ji}$ for every $i$ and $j$; that is, the edge from node $i$ to node $j$ is equal to the edge from node $j$ to node $i$.
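The edge construction described above can be sketched in plain Python (the helper names are hypothetical); cosine similarity yields a symmetric $k \times k$ edge matrix over the complete graph:

```python
import math

def cosine_similarity(v_i, v_j):
    """Correlation of two node vectors; unaffected by the vectors' norms."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    return dot / (norm_i * norm_j)

def build_edge_set(nodes):
    """E = [e_ij]_{k x k} for a complete graph over k node vectors."""
    k = len(nodes)
    return [[cosine_similarity(nodes[i], nodes[j]) for j in range(k)]
            for i in range(k)]
```

Because cosine similarity is symmetric, the resulting matrix satisfies `E[i][j] == E[j][i]`, matching the property $e_{ij} = e_{ji}$ noted above.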
For the above complete graph, distilling the entire non-sparse edge similarity matrix may adversely affect training, because a large number of background nodes generate many useless edges, which contribute many unproductive loss values during distillation and hinder model convergence. On the other hand, if all edges related to the background are pruned so that the edge set contains only foreground nodes, a large amount of information is lost. Therefore, a background sample mining method is designed to select the background nodes that meet expectations and construct an informative graph.
The background sample mining method in the present application selects the nodes added to the graph based on the RoI classification loss: a preset classification loss function is used to calculate the classification loss value of each background node, and after the background nodes whose classification loss value is smaller than a preset threshold are removed, the remaining background nodes are added to the graph to be constructed. The remaining background nodes can be considered as nodes that are easily misclassified as foreground, so these nodes are encoded together with the foreground nodes. This background sample mining method is adopted for graph construction in both the teacher network and the student network, finally yielding the first structural diagram and the second structural diagram.
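A minimal sketch of this mining rule, assuming the per-node classification loss values have already been computed (all names hypothetical):

```python
def mine_background_nodes(background_nodes, classification_losses, threshold):
    """Keep only the background nodes whose RoI classification loss is at or
    above the threshold. Low-loss nodes are easily classified as background
    and are dropped; high-loss nodes are likely to be confused with the
    foreground and therefore carry useful information for the graph."""
    return [node for node, loss in zip(background_nodes, classification_losses)
            if loss >= threshold]
```

The kept nodes would then be encoded into the graph together with all foreground nodes, as described above.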
In this embodiment, by adopting the background sample mining method, some useless background nodes are removed and the nodes closely related to the foreground are retained, which simplifies the model, reduces the amount of computation, and ensures that effective knowledge is transferred from the teacher network to the student network without omission.
In an embodiment, the object instances are characterized using nodes and corresponding edges; the distillation loss part includes a foreground node loss part, a background node loss part and an edge loss part, and the distillation loss function is:

$$L_G = \lambda_1 L_V^{fg} + \lambda_2 L_V^{bg} + \lambda_3 L_E$$

$$L_V^{fg} = \frac{1}{N_{fg}} \sum_{i=1}^{N_{fg}} \left\lVert v_i^{T} - v_i^{S} \right\rVert_2^2,\qquad L_V^{bg} = \frac{1}{N_{bg}} \sum_{i=1}^{N_{bg}} \left\lVert u_i^{T} - u_i^{S} \right\rVert_2^2,\qquad L_E = \frac{1}{N} \sum_{i,j} \left( e_{ij}^{T} - e_{ij}^{S} \right)^2$$

wherein $L_G$ represents the distillation loss function, which can be regarded as the difference loss between the object instances of the teacher network and the object instances of the student network, comprising the graph node loss $L_V$ and the graph edge loss $L_E$; in the present application, these two losses are calculated using the Euclidean distance function. In the above formula, $L_V^{fg}$ represents the distillation loss of the foreground nodes; $L_V^{bg}$ represents the distillation loss of the background nodes; $L_E$ represents the edge loss; $\lambda_1$ is the distillation loss weight of the foreground nodes, $\lambda_2$ is the distillation loss weight of the background nodes, and $\lambda_3$ is the distillation loss weight of the edges; $N_{fg}$ represents the total number of foreground nodes, $v_i^{T}$ represents the node vector of the $i$th foreground node in the teacher network, and $v_i^{S}$ represents the node vector of the $i$th foreground node in the student network; $N_{bg}$ represents the total number of background nodes, $u_i^{T}$ represents the node vector of the $i$th background node in the teacher network, and $u_i^{S}$ represents the node vector of the $i$th background node in the student network; $N$ represents the total number of edges, $e_{ij}^{T}$ represents the correlation from the $i$th node to the $j$th node in feature space in the teacher network, and $e_{ij}^{S}$ represents the correlation of the $i$th node and the $j$th node in feature space in the student network.
According to cross-validation, $\lambda_1$ and $\lambda_3$ are both set to 0.5, and $\lambda_2$ is designed as an adaptive loss weight, where $\alpha$ is a hyperparameter that scales this loss to the same order of magnitude as the other losses.
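The distillation loss above can be sketched in plain Python as a weighted sum of mean squared teacher-student distances over foreground nodes, background nodes and edges. The normalization by node/edge counts and the fixed `lam2 = 0.5` are simplifying assumptions for illustration; as noted above, the application designs $\lambda_2$ adaptively:

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def graph_distillation_loss(fg_t, fg_s, bg_t, bg_s, edges_t, edges_s,
                            lam1=0.5, lam2=0.5, lam3=0.5):
    """L_G = lam1 * L_V^fg + lam2 * L_V^bg + lam3 * L_E, where each term is
    the mean squared distance between corresponding teacher/student node
    vectors (or edge correlations)."""
    l_fg = sum(squared_l2(t, s) for t, s in zip(fg_t, fg_s)) / max(len(fg_t), 1)
    l_bg = sum(squared_l2(t, s) for t, s in zip(bg_t, bg_s)) / max(len(bg_t), 1)
    n_edges = len(edges_t) * len(edges_t[0]) if edges_t else 1
    l_e = sum((edges_t[i][j] - edges_s[i][j]) ** 2
              for i in range(len(edges_t))
              for j in range(len(edges_t[0]))) / n_edges
    return lam1 * l_fg + lam2 * l_bg + lam3 * l_e
```

When the student's nodes and edges exactly match the teacher's, the loss is zero; any mismatch in node vectors or pairwise correlations increases it.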
The above embodiment adjusts the parameters of the student network by accounting for both the graph node loss and the graph edge loss in the distillation loss function. The graph node loss is the loss generated over the node set; it performs a pixel-level matching of the object instance features of the student network against those of the teacher network. Generally speaking, directly matching the feature maps between two networks is a simple and direct distillation method; however, in a detection model, not all pixels in the feature map subsequently generate classification and bounding box regression losses. Compared with using a global feature map, sampling the foreground and background features of the image to compute the graph node loss makes the student model focus more on the RoIs and on valuable knowledge. The graph edge loss is the loss generated over the edge set; it aligns the correlations produced by the nodes of the student network with those produced by the nodes of the teacher network. Distillation at the pixel level alone does not fully exploit the potential of knowledge transfer from the teacher network to the student network, since the correlation of high-order semantics is not well distilled in training; the designed edge loss function directly promotes the optimization of the pair-wise correlations. Therefore, in order to align the topological relationship between the student and the teacher, the edge loss is designed so as to capture the global structural information in the detection network model.
In one embodiment, the base loss part includes a detection loss function and a KL divergence function.
Specifically, the global loss function of the student network comprises the base loss part and the distillation loss part, and is expressed as follows:

$$L = L_{Det} + L_G + L_{logits}$$

wherein $L_{Det}$ is the detection loss function; commonly used detection loss functions include the L1, L2 and Smooth L1 losses, which can be selected and adjusted according to actual needs; $L_G$ is the distillation loss part described above; and $L_{logits}$ is the KL divergence (KLD) loss function on the classification and regression outputs.
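A minimal sketch of composing the global loss, assuming the detection loss and graph distillation loss values have already been computed, and using KL divergence over discrete class distributions for the logits term (function names hypothetical):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions; this plays the
    role of the L_logits term between teacher and student class outputs."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def global_loss(l_det, l_graph, teacher_probs, student_probs):
    """L = L_Det + L_G + L_logits."""
    return l_det + l_graph + kl_divergence(teacher_probs, student_probs)
```

When the student's class distribution matches the teacher's exactly, the KL term vanishes and the global loss reduces to the detection loss plus the graph distillation loss.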
In this embodiment, the base loss function and the distillation loss function together constitute the global loss function used to train the student network, which can further optimize the performance of the student network and enable it to learn more comprehensive knowledge from the teacher network.
In an embodiment, a target detection method is further provided, as shown in fig. 3, including the following steps:
step S301, acquiring an image including an object to be detected;
specifically, given any push, the image contains objects to be detected, such as a specific person and its location, which are contained in the image to be detected.
Step S302, inputting the image to the student network so that the student network outputs a target image containing a prediction frame; the prediction box is used for identifying the class label and the position of the object to be detected. The student network is obtained by training through the steps in the method embodiments.
Specifically, a student network usable for target detection is obtained by training with the above knowledge distillation method. The image to be detected is input into the trained student network, and the student network outputs a target image containing prediction boxes. A prediction box marks the category and position of a desired target area; for example, a desired person and that person's position in the image are marked with a box, with object A marked by a red box, object B by a green box, and so on.
In this embodiment, the student network is trained using the methods in the above knowledge distillation method embodiments. The student network can extract deeper knowledge from the teacher network, can be used to detect image targets and output target positions and classes, and its performance is thereby further improved.
It should be understood that, although the various steps in the flowcharts of figs. 1-3 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 1-3 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a knowledge distillation apparatus 400 based on a structured example graph, comprising: an image acquisition module 401, a feature map acquisition module 402, a to-be-detected map acquisition module 403, a structure diagram output module 404, a distillation loss acquisition module 405, a base loss acquisition module 406 and a student network training module 407, wherein:
an image acquisition module 401, configured to acquire a training image;
a feature map obtaining module 402, configured to input the training image to a backbone network of a teacher network to obtain a first feature map, and input the training image to a backbone network of a student network to obtain a second feature map;
a to-be-detected map acquisition module 403, configured to input the first feature map and the second feature map into a region proposal network respectively to obtain a first to-be-detected map containing a bounding box and a second to-be-detected map containing a bounding box;
a structure diagram output module 404, configured to perform encoding based on an object instance in a boundary frame of the first to-be-detected map and an object instance in a boundary frame of the second to-be-detected map to obtain a first structure diagram and a second structure diagram;
a distillation loss obtaining module 405, configured to obtain a distillation loss part based on the first structural diagram and the second structural diagram by combining a preset distillation loss function;
a base loss obtaining module 406, configured to obtain a base loss portion according to a distance between a detection result of the second to-be-detected map and the real label;
a student network training module 407, configured to train the student network based on the distillation loss part and the base loss part.
In one embodiment, the object instances are characterized using nodes and corresponding edges; the nodes comprise foreground nodes and background nodes; and the structure diagram output module 404 is further configured to calculate a classification loss value of each background node by using a preset classification loss function, and, after removing the background nodes whose classification loss value is smaller than a preset threshold, encode the remaining background nodes together with the foreground nodes to obtain the first structural diagram and the second structural diagram.
In one embodiment, the student network training module 407 is further configured to take the sum of the distillation loss part and the base loss part as the global loss of the student network, and to adjust the network parameters of the student network based on the global loss until the global loss satisfies a preset condition.
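The training loop above, with the global loss as the sum of the two parts, can be sketched with a toy one-parameter example. Both loss parts here are stand-in quadratics, the learning rate and tolerance (the "preset condition") are illustrative, and a real implementation would update the student network's parameters by backpropagation:

```python
def global_loss(distill_part, base_part):
    """Global loss of the student network: the sum of the distillation
    loss part and the base loss part, as described above."""
    return distill_part + base_part

def train_until_converged(theta, lr=0.1, tol=1e-4, max_steps=1000):
    """Adjust a single toy parameter by gradient descent on the global
    loss until it falls below a preset tolerance."""
    for _ in range(max_steps):
        distill = (theta - 2.0) ** 2   # stand-in distillation loss part
        base = (theta - 2.0) ** 2      # stand-in base loss part
        if global_loss(distill, base) < tol:
            break
        grad = 4.0 * (theta - 2.0)     # d(global loss)/d(theta)
        theta -= lr * grad
    return theta
```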
In one embodiment, the object instances are characterized using nodes and corresponding edges; the distillation loss part comprises a foreground node loss part, a background node loss part, and an edge loss part; the distillation loss function is:

$L_G = \lambda_1 L_{fg} + \lambda_2 L_{bg} + \lambda_3 L_E$

where

$L_{fg} = \frac{1}{N_{fg}} \sum_{i=1}^{N_{fg}} \lVert v_i^t - v_i^s \rVert^2, \quad L_{bg} = \frac{1}{N_{bg}} \sum_{i=1}^{N_{bg}} \lVert u_i^t - u_i^s \rVert^2, \quad L_E = \frac{1}{N} \sum_{i,j} \left( G_{ij}^t - G_{ij}^s \right)^2$

wherein $L_G$ denotes the distillation loss function; $L_{fg}$ denotes the distillation loss of the foreground nodes; $L_{bg}$ denotes the distillation loss of the background nodes; $L_E$ denotes the edge loss; $\lambda_1$ is the distillation loss weight of the foreground nodes, $\lambda_2$ the distillation loss weight of the background nodes, and $\lambda_3$ the distillation loss weight of the edges; $N_{fg}$ denotes the total number of foreground nodes, $v_i^t$ the node vector of the $i$-th foreground node in the teacher network, and $v_i^s$ the node vector of the $i$-th foreground node in the student network; $N_{bg}$ denotes the total number of background nodes, $u_i^t$ the node vector of the $i$-th background node in the teacher network, and $u_i^s$ the node vector of the $i$-th background node in the student network; $N$ denotes the total number of edges, $G_{ij}^t$ the correlation in feature space between the $i$-th node and the $j$-th node in the teacher network, and $G_{ij}^s$ the corresponding correlation in the student network.
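As a concrete illustration of the loss above, the following sketch computes the distillation loss with each part taken as a mean squared difference between the corresponding teacher and student quantities. The squared-error form, the function signature, and the array shapes are illustrative assumptions:

```python
import numpy as np

def distillation_loss(v_t_fg, v_s_fg, v_t_bg, v_s_bg, G_t, G_s,
                      lam1=1.0, lam2=1.0, lam3=1.0):
    """L_G = lam1*L_fg + lam2*L_bg + lam3*L_E.

    v_t_fg, v_s_fg: (N_fg, d) foreground node vectors, teacher / student.
    v_t_bg, v_s_bg: (N_bg, d) background node vectors, teacher / student.
    G_t, G_s:       edge correlation values (same shape), teacher / student.
    """
    L_fg = np.mean(np.sum((np.asarray(v_t_fg) - np.asarray(v_s_fg)) ** 2, axis=1))
    L_bg = np.mean(np.sum((np.asarray(v_t_bg) - np.asarray(v_s_bg)) ** 2, axis=1))
    L_E = np.mean((np.asarray(G_t) - np.asarray(G_s)) ** 2)
    return lam1 * L_fg + lam2 * L_bg + lam3 * L_E
```

When teacher and student agree exactly on every node and edge, the loss is zero; it grows as the student's structure diagram drifts from the teacher's.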
In one embodiment, the base loss part comprises a detection loss function and a KL divergence function.
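A minimal sketch of the KL divergence term, computed here between the class distribution predicted by the student and a reference distribution (for instance the teacher's soft output); the function name and the smoothing constant are illustrative assumptions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete class distributions.

    A small epsilon guards against log(0) when a class probability
    is exactly zero.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))
```

The divergence is zero when the two distributions coincide and strictly positive otherwise, so minimizing it pulls the student's class predictions toward the reference.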
In one embodiment, as shown in fig. 5, there is also provided an object detection apparatus 500, comprising an image acquisition module 501 and a target image output module 502, wherein:
the image acquisition module 501 is configured to acquire an image containing an object to be detected; and
the target image output module 502 is configured to input the image to the student network for image instance detection trained in the above embodiments of the knowledge distillation method based on the structured instance graph, so that the student network outputs a target image containing a prediction box, the prediction box identifying the class label and the position of the object to be detected.
For specific limitations of the knowledge distillation apparatus and the object detection apparatus based on the structured instance graph, reference may be made to the limitations of the knowledge distillation method and the object detection method based on the structured instance graph described above, which are not repeated here. The modules in the above apparatuses may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store image feature data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the knowledge distillation method or the object detection method based on the structured instance graph.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements the knowledge distillation method or the object detection method based on the structured instance graph. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
It will be appreciated by those skilled in the art that the structures shown in fig. 6 and fig. 7 are merely block diagrams of partial structures relevant to the present disclosure and do not limit the computer devices to which the present disclosure may be applied; a particular computer device may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the above embodiments of the knowledge distillation method based on the structured instance graph and of the object detection method.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored that, when executed by a processor, implements the steps of the above embodiments of the knowledge distillation method based on the structured instance graph and of the object detection method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; nevertheless, any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.