CN116227556A - Method, device, computer equipment and storage medium for acquiring target network model

Method, device, computer equipment and storage medium for acquiring target network model

Info

Publication number
CN116227556A
CN116227556A
Authority
CN
China
Prior art keywords
network model
target network
convolution
data set
target
Prior art date
Legal status
Pending
Application number
CN202310253048.2A
Other languages
Chinese (zh)
Inventor
江宁
卿海峰
石璐瑶
吴文青
Current Assignee
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN202310253048.2A
Publication of CN116227556A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to the technical field of artificial intelligence algorithms, and in particular to a method, a device, computer equipment and a storage medium for acquiring a target network model. The method comprises the following steps: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and, in the student network model, adding a residual structure between the first two adjacent convolution layers to obtain the target network model. As a result, in the convolution data set obtained after a training data set passes through the first convolution layer of the student network model, part of the convolution data is weighted into the remaining convolution data set after being processed by the residual structure, so that the local features of the convolution data set are enhanced before it is input into the second convolution layer. Because the added residual structure hardly affects the overall parameter count of the network model, the data are feature-enhanced and the performance of the network model is improved.

Description

Method, device, computer equipment and storage medium for acquiring target network model
Technical Field
The present invention relates to the field of artificial intelligence algorithms, and in particular, to a method, an apparatus, a computer device, and a storage medium for acquiring a target network model.
Background
As deep learning finds application in more and more fields, the functional complexity of deep learning network models keeps growing. Although recognition and classification accuracy improves as model structure and capacity increase, the matching hardware requirements become correspondingly demanding, so such models cannot be applied in miniaturized products.
Therefore, how to reduce the complexity of a network model while ensuring its performance is a technical problem to be solved at present.
Disclosure of Invention
The present invention has been made in view of the above problems, and provides a method, an apparatus, a computer device, and a storage medium for acquiring a target network model that overcome, or at least partially solve, the above problems.
In a first aspect, the present invention provides a method for obtaining a target network model, including:
acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
in the student network model, adding a residual structure between the first two adjacent convolution layers to obtain a target network model, so that, in the convolution data set obtained after a training data set passes through the first convolution layer in the target network model, part of the convolution data is weighted into the remaining convolution data set after being processed by the residual structure, and the local features of the convolution data set are enhanced before it is input into the second convolution layer.
Further, the residual structure includes:
a first influence-elimination layer, used for eliminating first abnormal data in the partial convolution data set;
a residual processing layer, used for performing data enhancement on the partial convolution data after the abnormal data are eliminated, and for weighting the result into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer, used for eliminating second abnormal data in the convolution data set with enhanced local features.
Further, the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = \sum_i p_i^2 / c

p_i' = \gamma \cdot p_i / \sqrt{v^2 + \epsilon} + \beta

where p_i is any datum of any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square result for any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square result, γ and β are trainable hyper-parameters, and ε is a positive number that prevents the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
Further, in the student network model, adding a residual structure between two initially adjacent convolution layers to obtain a target network model, further includes:
obtaining the local feature similarity difference and output difference between the target network model and the teacher network model, and the difference between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
Further, obtaining the local feature similarity difference between the target network model and the teacher network model includes:
acquiring a plurality of first feature outputs of the target network model and the second feature outputs at the same positions in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
Further, the determining the total network loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target includes:
determining the total network loss of the target network model, based on the local feature similarity difference and output difference between the target network model and the teacher network model and the difference between the target network model and the real target, in the following manner:

L_{LFNE} = \alpha L_{KD} + (1 - \alpha) L_{CE} + \beta L_{SLFN}

where L_{LFNE} is the total network loss of the target network model, L_{SLFN} is the local feature similarity difference between the target network model and the teacher network model, L_{KD} is the output difference between the target network model and the teacher network model, L_{CE} is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
Further, after determining the total loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target, the method further includes:
and adjusting parameters of the target network model based on the total loss of the target network model to obtain an optimized target network model.
In a second aspect, the present invention further provides an apparatus for obtaining a target network model, including:
an acquisition module, used for acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module, used for adding a residual structure between the first two adjacent convolution layers in the student network model to obtain a target network model, so that, in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data is weighted into the remaining convolution data set after being processed by the residual structure, and the local features of the convolution data set are enhanced before it is input into the second convolution layer.
In a third aspect, the invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps described in the first aspect when executing the program.
In a fourth aspect, the invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method steps described in the first aspect.
One or more technical solutions in the embodiments of the present invention at least have the following technical effects or advantages:
The invention provides a method for acquiring a target network model, which comprises the following steps: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and, in the student network model, adding a residual structure between the first two adjacent convolution layers to obtain the target network model. As a result, in the convolution data set obtained after a training data set passes through the first convolution layer of the student network model, part of the convolution data is weighted into the remaining convolution data set after being processed by the residual structure, so that the local features of the convolution data set are enhanced before it is input into the second convolution layer. Because the added residual structure hardly affects the overall parameter count of the network model, the data are feature-enhanced and the performance of the network model is improved.
Based on the VOC training data set, using YOLOv5m as the teacher network model and YOLOv5s as the student network model, the final target network model has a parameter size of only 7.4 M and achieves an accuracy of 67.35%.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also throughout the drawings, like reference numerals are used to designate like parts. In the drawings:
FIG. 1 is a flowchart illustrating steps of a method for acquiring a target network model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a student network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another student network model according to an embodiment of the invention;
FIG. 4 shows a schematic structural diagram of a residual structure LFNR in an embodiment of the invention;
FIG. 5 is a schematic diagram showing a teacher network model, a student network model and an added residual structure used in obtaining a target network model in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for acquiring a target network model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer device for implementing a method for acquiring a target network model in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example 1
The embodiment of the invention provides a method for acquiring a target network model, as shown in fig. 1, comprising the following steps:
s101, acquiring a teacher network model and a student network model, wherein the student network model is a model with a simplified teacher network model;
s102, adding a residual structure between two initial adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after the training data set passes through a first convolution layer in the target network model, part of the convolution data set is weighted into the residual convolution data set after the processing of the residual structure, so that the local characteristics of the convolution data set are enhanced, and then the local characteristics are input into a second convolution layer.
In order to apply high-performance recognition algorithms on small devices (such as cell phones, cameras, drones, etc.) to recognize objects, a low-power, high-performance small model is required.
The invention provides a method for acquiring a target network model; compared with existing recognition models, the obtained target network model effectively improves performance with no increase in parameter capacity.
The target network model in the invention is obtained by improving the knowledge distillation algorithm.
In the conventional knowledge distillation algorithm, sample data are input into an untrained student network model and a trained teacher network model to obtain their respective outputs; a divergence-based comparison difference between the two is determined from those outputs; the output of the student network model is compared with the standard result to obtain a cross-entropy loss; and finally the internal parameters of the student network model are adjusted based on the divergence difference and the cross-entropy loss.
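To make this conventional distillation objective concrete, below is a minimal PyTorch-style sketch; the function name, the temperature T, and the mixing weight alpha are illustrative assumptions rather than values taken from this patent.

```python
import torch
import torch.nn.functional as F

def vanilla_kd_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    labels: torch.Tensor,
                    T: float = 4.0, alpha: float = 0.9) -> torch.Tensor:
    # Divergence comparison between the softened student and teacher outputs
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    # Cross-entropy between the student output and the standard result
    ce = F.cross_entropy(student_logits, labels)
    # The student parameters are then adjusted against this combined loss
    return alpha * kd + (1.0 - alpha) * ce
```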
The method for acquiring the target network model in the invention comprises the following steps:
s101, acquiring a teacher network model and a student network model, wherein the student network model is a model with the simplified teacher network model.
The teacher network model is a trained model and the student network model is an untrained model. The teacher network model comprises a plurality of functional blocks (Blocks), and each functional block comprises a plurality of structural layers (Layers). The student network model has the same structure, but a smaller number of parameters.
Next, the student network model is improved, specifically as follows:
s102, adding a residual structure between two initial adjacent convolution layers in the student network model to obtain a target network model, so that in the convolution data set obtained after the training data set passes through a first convolution layer in the target network model, part of the convolution data set is weighted into the residual convolution data set after the processing of the residual structure, so that the local characteristics of the convolution data set are enhanced, and then the local characteristics are input into a second convolution layer.
The residual structure is added between the first two adjacent convolution layers of the student network model. Specifically, taking two kinds of student network model as examples: in the first kind, shown in Fig. 2, the initial convolution layer CONV of the student network model is followed by other structural layers (e.g., BN), and the residual structure LFNR is added between this first convolution layer CONV and the next convolution layer CONV that occurs later. In the second kind, shown in Fig. 3, the initial convolution layer CONV is directly followed by another convolution layer CONV, so the residual structure LFNR is added between these first two convolution layers CONV.
The residual structure is described in detail below.
As shown in fig. 4, a schematic structural diagram of the residual structure LFNR is shown. The residual structure includes:
a first influence-elimination layer RELU, used for eliminating first abnormal data in the partial convolution data set;
a residual processing layer LFN, used for performing data enhancement on the partial convolution data after the abnormal data are eliminated, and for weighting the result into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer RELU, used for eliminating second abnormal data in the convolution data set with enhanced local features.
The residual processing layer LFN performs a local feature normalization; its parameter count is small and has almost no effect on the overall parameter count of the network model.
Specifically, the residual processing layer LFN is configured to perform feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = \sum_i p_i^2 / c

p_i' = \gamma \cdot p_i / \sqrt{v^2 + \epsilon} + \beta

where p_i is any datum of any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square result for any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square result, γ and β are trainable hyper-parameters, and ε is a positive number that prevents the denominator from being zero.

The mean-square result v^2 is obtained by computing the mean square over the vectors P = {p_1, p_2, …, p_n} at the same position in the training data set, where n is the number of vectors; this mean-square processing enhances the numerical values of the local features. Normalizing the results into a unified interval then yields the vector set P', any vector p_i' of which is computed by the second equation above. Finally, any vector in the vector set P' is fused with the data at the same position in the remaining convolution data set, thereby obtaining the convolution data set with enhanced local features.
The eliminating influence layer RELU, namely the first eliminating influence layer and the second eliminating influence layer, is arranged before and after the residual processing layer LFN, and aims to eliminate abnormal data in the training data set, such as eliminating negative values in the training data set or eliminating larger values or smaller values generated after the residual processing layer LFN processing, so as to ensure that the training data are all effective data.
This feature-enhancement processing improves the performance of the student network model, and the target network model is thereby obtained.
Of course, the target network model can be further optimized by adjusting its parameters.
Specifically, after obtaining the target network model, the method further includes:
obtaining the local feature similarity difference and output difference between the target network model and the teacher network model, and the difference between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
The method for obtaining the local feature similarity difference between the target network model and the teacher network model comprises the following steps:
acquiring a plurality of first feature outputs of the target network model and the second feature outputs at the same positions in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
As shown in Fig. 5, if the teacher network model comprises three functional blocks (Blocks), the corresponding student network model also has three corresponding functional blocks. Taking one Block as an example:

\tilde{F}_t = \mathrm{MAXPOOL}(\mathrm{LFN}(F_t))

where F_t is the feature of the training data set input to the Block of the teacher network model, LFN denotes the local feature normalization, MAXPOOL denotes the max-pooling operation, and \tilde{F}_t is the second feature output of the Block in the teacher network model. Similarly,

\tilde{F}_s = \mathrm{MAXPOOL}(\mathrm{LFN}(F_s))

where F_s is the corresponding feature of the training data set input to the student network model, and \tilde{F}_s is the first feature output of the Block at the same position in the target network model.

After the first feature outputs and the second feature outputs are obtained, the local feature similarity difference between the target network model and the teacher network model is determined from them; since the network is divided into three functional blocks (Blocks), the above steps are repeated three times:

L_{SLFN} = \sum_{j=1}^{3} \mathrm{MSE}(\tilde{F}_s^{(j)}, \tilde{F}_t^{(j)})

where L_{SLFN} is the local feature similarity difference between the target network model and the teacher network model, and MSE denotes the mean square error.
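A minimal sketch of this block-wise local feature similarity loss, reusing the LFN module sketched above; it assumes the student and teacher feature maps at matching Blocks have compatible shapes, and the pooling window of 2 is an illustrative assumption.

```python
import torch.nn.functional as F

def slfn_loss(student_feats, teacher_feats, lfn_s, lfn_t):
    """L_SLFN: sum over matching Blocks of
    MSE(MAXPOOL(LFN(F_s)), MAXPOOL(LFN(F_t)))."""
    loss = 0.0
    for f_s, f_t in zip(student_feats, teacher_feats):  # three Blocks here
        z_s = F.max_pool2d(lfn_s(f_s), kernel_size=2)   # student branch
        z_t = F.max_pool2d(lfn_t(f_t), kernel_size=2)   # teacher branch
        loss = loss + F.mse_loss(z_s, z_t)
    return loss
```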
For the output difference L KD I.e. the difference between the final output of the target network model and the final output of the teacher network model. The final output of the teacher model is p t The final output of the student model is p s :
L KD =KL(p s ,p t )
For the difference L between the target network model and the real target CE Is the difference of the final output of the target network model from the real target. The label of the real target is y:
L CE =CE(p s ,y)
Thus, the total network loss of the target network model is obtained according to the following formula:

L_{LFNE} = \alpha L_{KD} + (1 - \alpha) L_{CE} + \beta L_{SLFN}

where α and β are hyper-parameters for tuning.
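Continuing the sketches above, the total network loss could be assembled as follows; the default values of α and β are placeholders for the tuned hyper-parameters, and softening the outputs with a temperature T is a common knowledge-distillation choice assumed here rather than stated in the patent.

```python
import torch.nn.functional as F

def total_loss(p_s, p_t, y, l_slfn, alpha=0.5, beta=1.0, T=4.0):
    """L_LFNE = alpha * L_KD + (1 - alpha) * L_CE + beta * L_SLFN."""
    l_kd = F.kl_div(F.log_softmax(p_s / T, dim=1),
                    F.softmax(p_t / T, dim=1),
                    reduction="batchmean") * (T * T)  # output difference
    l_ce = F.cross_entropy(p_s, y)                    # vs. the real target
    return alpha * l_kd + (1 - alpha) * l_ce + beta * l_slfn
```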
After determining the total network loss of the target network model, parameters of the target network model are adjusted based on the total network loss of the target network model to obtain an optimized target network model.
Specifically, the overall network loss is propagated in opposite phase by utilizing the gradient descent principle to train the parameters of the target network model, and the parameters are cycled back and forth to set times to obtain an optimized target network model.
Finally, the target network model may be validated.
The teacher network model, the student network model and the added residual structure used in obtaining the target network model are shown in Fig. 5.
By adding the residual structure to the student network model, the performance of the network can be greatly improved without affecting its overall parameter capacity. A lightweight network model can thus also meet the performance requirements of hardware deployment, reducing deployment and computation costs. By computing the total network loss, local characterization features can be extracted from the teacher network and passed to the student network to guide its learning. As a result, the performance of the student network can even exceed that of the teacher model, which effectively reduces the cost of the corresponding deployment hardware, lowers product power consumption, and addresses practical problems in applying deep learning models. The method suits future development demands and can be applied quickly and efficiently to models of ever-increasing complexity.
One or more technical solutions in the embodiments of the present invention at least have the following technical effects or advantages:
The invention provides a method for acquiring a target network model, which comprises the following steps: acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model; and, in the student network model, adding a residual structure between the first two adjacent convolution layers to obtain the target network model. As a result, in the convolution data set obtained after a training data set passes through the first convolution layer of the student network model, part of the convolution data is weighted into the remaining convolution data set after being processed by the residual structure, so that the local features of the convolution data set are enhanced before it is input into the second convolution layer. Because the added residual structure hardly affects the overall parameter count of the network model, the data are feature-enhanced and the performance of the network model is improved.
Based on the VOC training data set, using YOLOv5m as the teacher network model and YOLOv5s as the student network model, the final target network model has a parameter size of only 7.4 M and achieves an accuracy of 67.35%.
Example 2
Based on the same inventive concept, the embodiment of the present invention further provides an apparatus for acquiring a target network model, as shown in fig. 6, including:
an acquisition module 601, used for acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module 602, used for adding a residual structure between the first two adjacent convolution layers in the student network model to obtain a target network model, so that, in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data is weighted into the remaining convolution data set after being processed by the residual structure, and the local features of the convolution data set are enhanced before it is input into the second convolution layer.
In an alternative embodiment, the residual structure comprises:
a first influence-elimination layer, used for eliminating first abnormal data in the partial convolution data set;
a residual processing layer, used for performing data enhancement on the partial convolution data after the abnormal data are eliminated, and for weighting the result into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer, used for eliminating second abnormal data in the convolution data set with enhanced local features.
In an alternative embodiment, the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = \sum_i p_i^2 / c

p_i' = \gamma \cdot p_i / \sqrt{v^2 + \epsilon} + \beta

where p_i is any datum of any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square result for any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square result, γ and β are trainable hyper-parameters, and ε is a positive number that prevents the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
In an alternative embodiment, the method further comprises: a network total loss determination module comprising:
an acquisition unit, used for acquiring the local feature similarity difference and output difference between the target network model and the teacher network model, and the difference between the target network model and a real target;
and the determining unit is used for determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
In an alternative embodiment, the determining unit is configured to determine the total network loss of the target network model, based on the local feature similarity difference and output difference between the target network model and the teacher network model and the difference between the target network model and the real target, in the following manner:
L_{LFNE} = \alpha L_{KD} + (1 - \alpha) L_{CE} + \beta L_{SLFN}

where L_{LFNE} is the total network loss of the target network model, L_{SLFN} is the local feature similarity difference between the target network model and the teacher network model, L_{KD} is the output difference between the target network model and the teacher network model, L_{CE} is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
In an alternative embodiment, the network total loss determination module further includes:
and the adjusting unit is used for adjusting the parameters of the target network model based on the total loss of the target network model so as to obtain an optimized target network model.
Example 3
Based on the same inventive concept, an embodiment of the present invention provides a computer device, as shown in Fig. 7, comprising a memory 704, a processor 702, and a computer program stored on the memory 704 and executable on the processor 702, wherein the processor 702 implements the steps of the above-described method for acquiring a target network model when executing the program.
In Fig. 7, a bus architecture is represented by bus 700. Bus 700 may comprise any number of interconnected buses and bridges, and links together various circuits, including one or more processors, represented by processor 702, and memory, represented by memory 704. Bus 700 may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. Bus interface 706 provides an interface between bus 700 and the receiver 701 and transmitter 703. The receiver 701 and the transmitter 703 may be the same element, i.e., a transceiver, providing a unit for communicating with various other apparatus over a transmission medium. The processor 702 is responsible for managing bus 700 and general processing, while the memory 704 may be used to store data used by the processor 702 in performing operations.
Example 4
Based on the same inventive concept, an embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method of obtaining a target network model.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for such systems is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided to disclose the enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the means for acquiring a target network model, or of the computer device, according to embodiments of the invention may be implemented in practice using a microprocessor or a digital signal processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer-readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names.

Claims (10)

1. A method for obtaining a target network model, comprising:
acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
in the student network model, adding a residual structure between the first two adjacent convolution layers to obtain a target network model, so that, in the convolution data set obtained after a training data set passes through the first convolution layer in the target network model, part of the convolution data is weighted into the remaining convolution data set after being processed by the residual structure, and the local features of the convolution data set are enhanced before it is input into the second convolution layer.
2. The method of claim 1, wherein the residual structure comprises:
a first influence-elimination layer for eliminating first abnormal data in the partial convolution data set;
a residual processing layer for performing data enhancement on the partial convolution data after the abnormal data are eliminated, and for weighting the result into the remaining convolution data set to obtain a convolution data set with enhanced local features;
and a second influence-elimination layer for eliminating second abnormal data in the convolution data set with enhanced local features.
3. The method according to claim 1, wherein the residual structure is specifically configured to:
performing feature enhancement on the partial convolution data to obtain feature-enhanced data:

v^2 = \sum_i p_i^2 / c

p_i' = \gamma \cdot p_i / \sqrt{v^2 + \epsilon} + \beta

where p_i is any datum of any vector in the partial convolution data set, c is the number of channels of the partial convolution data set, v^2 is the mean-square result for any vector in the partial convolution data set, p_i' is any vector in the vector set obtained by normalizing the mean-square result, γ and β are trainable hyper-parameters, and ε is a positive number that prevents the denominator from being zero;

and fusing any vector in the normalized vector set with the data at the same position in the remaining convolution data set to obtain the convolution data set with enhanced local features.
4. The method of claim 1, wherein adding a residual structure between two initially adjacent convolution layers in the student network model, after obtaining a target network model, further comprises:
obtaining the local feature similarity difference and output difference between the target network model and the teacher network model, and the difference between the target network model and a real target;
and determining the total network loss of the target network model based on the local feature similarity difference and the output difference between the target network model and the teacher network model and the difference between the target network model and the real target.
5. The method of claim 4, wherein the obtaining the local feature similarity differences between the target network model and the teacher network model comprises:
acquiring a plurality of first feature outputs of the target network model and the second feature outputs at the same positions in the teacher network model;
and determining a local feature similarity difference between the target network model and the teacher network model based on the first feature output and the second feature output.
6. The method of claim 4, wherein the determining the total network loss for the target network model based on the local feature similarity differences between the target network model and the teacher network model, the output differences, and the differences between the target network model and the real target comprises:
determining a total network loss of the target network model based on local feature similarity differences between the target network model and the teacher network model, output differences, and differences of the target network model and a real target in the following manner:
L_{LFNE} = \alpha L_{KD} + (1 - \alpha) L_{CE} + \beta L_{SLFN}

where L_{LFNE} is the total network loss of the target network model, L_{SLFN} is the local feature similarity difference between the target network model and the teacher network model, L_{KD} is the output difference between the target network model and the teacher network model, L_{CE} is the difference between the target network model and the real target, and α and β are hyper-parameters for tuning.
7. The method of claim 4, wherein after determining the total loss of the target network model based on the local feature similarity difference between the target network model and the teacher network model, the output difference, and the difference between the target network model and the real target, the method further comprises:
and adjusting parameters of the target network model based on the total loss of the target network model to obtain an optimized target network model.
8. An apparatus for obtaining a target network model, comprising:
an acquisition module, used for acquiring a teacher network model and a student network model, wherein the student network model is a simplified version of the teacher network model;
and an adding module, used for adding a residual structure between the first two adjacent convolution layers in the student network model to obtain a target network model, so that, in the convolution data set obtained after a training data set passes through the first convolution layer in the student network model, part of the convolution data is weighted into the remaining convolution data set after being processed by the residual structure, and the local features of the convolution data set are enhanced before it is input into the second convolution layer.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method steps of any of claims 1 to 7 when the program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method steps of any of claims 1-7.
CN202310253048.2A, filed 2023-03-16, priority 2023-03-16: Method, device, computer equipment and storage medium for acquiring target network model. Status: Pending. Publication: CN116227556A (en).

Priority Applications (1)

Application Number: CN202310253048.2A · Priority/Filing Date: 2023-03-16 · Publication: CN116227556A (en) · Title: Method, device, computer equipment and storage medium for acquiring target network model

Applications Claiming Priority (1)

Application Number: CN202310253048.2A · Priority/Filing Date: 2023-03-16 · Publication: CN116227556A (en) · Title: Method, device, computer equipment and storage medium for acquiring target network model

Publications (1)

Publication Number: CN116227556A (en) · Publication Date: 2023-06-06

Family

Family ID: 86576858

Family Applications (1)

Application Number: CN202310253048.2A · Title: Method, device, computer equipment and storage medium for acquiring target network model

Country Status (1)

Country: CN · Publication: CN116227556A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993962A (en) * 2023-07-20 2023-11-03 广东南方智媒科技有限公司 Two-dimensional code detection method, device, equipment and readable storage medium
CN116993962B (en) * 2023-07-20 2024-04-26 广东南方智媒科技有限公司 Two-dimensional code detection method, device, equipment and readable storage medium

Similar Documents

Publication · Title
WO2021143396A1 (en) Method and apparatus for carrying out classification prediction by using text classification model
US20200334520A1 (en) Multi-task machine learning architectures and training procedures
US20200334457A1 (en) Image recognition method and apparatus
Chen et al. Sdae: Self-distillated masked autoencoder
CN112184508B (en) Student model training method and device for image processing
US20230095606A1 (en) Method for training classifier, and data processing method, system, and device
US9875294B2 (en) Method and apparatus for classifying object based on social networking service, and storage medium
US20180032835A1 (en) Image recognizing apparatus, computer-readable recording medium, image recognizing method, and recognition apparatus
CN116010713A (en) Innovative entrepreneur platform service data processing method and system based on cloud computing
CN113705769A (en) Neural network training method and device
CN111368937A (en) Image classification method and device, and training method, device, equipment and medium thereof
CN111738403B (en) Neural network optimization method and related equipment
US11636667B2 (en) Pattern recognition apparatus, pattern recognition method, and computer program product
CN112529146A (en) Method and device for training neural network model
US20240135174A1 (en) Data processing method, and neural network model training method and apparatus
CN110874627B (en) Data processing method, data processing device and computer readable medium
CN116227556A (en) Method, device, computer equipment and storage medium for acquiring target network model
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN110598837A (en) Artificial neural network adjusting method and device
CN113065634B (en) Image processing method, neural network training method and related equipment
CN111783936B (en) Convolutional neural network construction method, device, equipment and medium
CN115795355B (en) Classification model training method, device and equipment
CN116258871A (en) Fusion feature-based target network model acquisition method and device
CN116384516A (en) Cost sensitive cloud edge cooperative method based on ensemble learning
WO2023220878A1 (en) Training neural network trough dense-connection based knowlege distillation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination