CN115906986A - Network searching method and device, electronic equipment and storage medium


Info

Publication number
CN115906986A
Authority
CN
China
Prior art keywords: network, sub, training, search, networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211584290.XA
Other languages
Chinese (zh)
Inventor
Cai He
Zhang Zhaokai
Feng Tianpeng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oppo Chongqing Intelligent Technology Co Ltd
Original Assignee
Oppo Chongqing Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo Chongqing Intelligent Technology Co Ltd
Priority to CN202211584290.XA
Publication of CN115906986A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of the present disclosure relate to a network searching method and apparatus, an electronic device, and a storage medium, in the technical field of network search. The network searching method includes: acquiring a search space of a super-network; performing a first type of training on sub-networks in the super-network to obtain a training network; and dividing the super-network into a plurality of sub-networks according to the search space and performing a second type of training on each sub-network based on the training network until the network converges, so as to perform a network search; wherein the first type of training and the second type of training differ in training dimension. The technical solution of the embodiments of the present disclosure reduces the interference among different sub-networks and improves accuracy.

Description

Network searching method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of network search technologies, and in particular, to a network search method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Network architecture search selects a network structure with better performance from a pre-designed search space through an effective search strategy and evaluation method.
In the related art, the network search may be performed based on super-network weight sharing. This approach trains one complete large network that contains all candidate branches; at each training step, branches are selected as required, and all branches share a single set of weight parameters. Finally, candidate sub-networks are sampled and evaluated to select the optimal sub-network.
With this approach, the weight-sharing mechanism inevitably causes interference and mutual influence between sub-networks, which greatly reduces the consistency between a sub-network's performance inside the super-network and its true standalone performance, so the result is inaccurate.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides a network searching method and apparatus, an electronic device, and a storage medium, which overcome, at least to some extent, the problem of strong interference between different sub-networks during network search caused by the limitations and disadvantages of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a network searching method, including: acquiring a search space of a super-network; performing a first type of training on sub-networks in the super-network to obtain a training network; and dividing the super-network into a plurality of sub-networks according to the search space and performing a second type of training on each sub-network based on the training network until the network converges, so as to perform a network search; wherein the first type of training and the second type of training differ in training dimension.
According to a second aspect of the present disclosure, there is provided a network searching apparatus, including: a search space determining module, configured to acquire a search space of a super-network; a first training module, configured to perform a first type of training on sub-networks in the super-network to obtain a training network; and a second training module, configured to divide the super-network into a plurality of sub-networks according to the search space and perform a second type of training on each sub-network based on the training network until the network converges, so as to perform a network search; wherein the first type of training and the second type of training differ in training dimension.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the network search method of the first aspect described above and possible implementations thereof via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the network search method of the first aspect described above and possible implementations thereof.
According to the technical solutions provided by the embodiments of the present disclosure, on the one hand, performing the first type of training and the second type of training on the sub-networks in the super-network, and distinguishing and classifying the characteristics of different sub-networks, allows each sub-network to be trained along a different dimension. This avoids mutual interference between different sub-networks, reduces their mutual influence, improves the consistency and correlation between a sub-network's performance inside the super-network and its real performance, and improves the accuracy of the sub-networks. On the other hand, training the sub-networks along different dimensions improves the comprehensiveness and effectiveness of super-network training, the accuracy of super-network training, and the accuracy of the network structure obtained by the search.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 is a schematic diagram illustrating an application scenario in which the network search method and the network search apparatus according to the embodiment of the present disclosure may be applied.
Fig. 2 schematically illustrates a network search method according to an embodiment of the present disclosure.
Fig. 3 schematically illustrates performing a network search in an embodiment of the present disclosure.
Fig. 4 schematically illustrates a flowchart of determining a candidate network according to an embodiment of the disclosure.
Fig. 5 schematically illustrates determining candidate networks according to similarity in an embodiment of the present disclosure.
Fig. 6 schematically illustrates a flow diagram for performing a second type of training of an embodiment of the present disclosure.
Fig. 7 schematically illustrates an overall flowchart of a network searching method in the embodiment of the present disclosure.
Fig. 8 schematically illustrates a block diagram of a network searching apparatus in an embodiment of the present disclosure.
Fig. 9 schematically illustrates a block diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, the network search may be performed in the following ways. Search methods based on reinforcement learning and evolutionary algorithms take the real performance of models as the training labels of an evaluation model; their results are relatively accurate, but obtaining the real performance of a network structure requires large amounts of computing resources. To avoid this excessive computation, differentiable methods and super-network weight-sharing methods can be used; these two search strategies can evaluate candidate accuracy without training each network model to obtain its real performance.
The differentiable method mainly evaluates the real performance of a large model through a proxy model. However, the performance of the small proxy model must be highly consistent with that of the corresponding large model; the design requirements on the proxy model are therefore high, and results remain inaccurate in actual experiments and applications. The super-network weight-sharing method, by contrast, trains one complete large network that contains all candidate branches; branches are selected as required at each training step, all branches share a single set of weight parameters, and candidate sub-networks are finally sampled and evaluated to select the optimal sub-network. This method can directly select the corresponding sub-network from the super-network according to the evaluation results, without an intermediate proxy model. However, consistency between a sub-network's performance inside the super-network and the real performance obtained by training the sub-network alone becomes an important condition for reliable search results: the optimal sub-network in the super-network can be selected effectively only if its evaluated performance inside the super-network is consistent with its real performance.
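As a rough illustration of the related-art weight-sharing scheme (not the code of any particular library), the following Python sketch shows single-path sampling over a shared weight store; the toy search space, the weight store, and the training step are all assumptions made for illustration:

```python
import random

# Illustrative sketch of related-art super-network weight sharing: one set of
# shared weights covers all candidate branches; each training step samples one
# branch per layer (a sub-network) and updates only the weights on that path.
SEARCH_SPACE = {"layer1": [3, 5, 7], "layer2": [3, 5, 7], "layer3": [3, 5, 7]}

# Shared weight store: one entry per (layer, branch), shared by every
# sub-network that selects that branch.
shared_weights = {(layer, b): 0.0
                  for layer, bs in SEARCH_SPACE.items() for b in bs}

def sample_subnetwork():
    """Select one branch per layer; the chosen path defines a sub-network."""
    return {layer: random.choice(bs) for layer, bs in SEARCH_SPACE.items()}

def train_step(path):
    """Stand-in for one optimizer step: only the sampled path is updated."""
    for layer, branch in path.items():
        shared_weights[(layer, branch)] += 0.01  # placeholder "gradient step"

for _ in range(5):  # a real run would loop over the training data
    train_step(sample_subnetwork())
```

Because every sampled path writes into the same weight store, training one sub-network perturbs the weights seen by the others, which is precisely the interference discussed next.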
However, in existing weight-sharing search training of the super-network, the weight-sharing mechanism inevitably causes interference and mutual influence between sub-networks, which greatly reduces the consistency between the sub-networks' performance inside the super-network and their true standalone performance, so the result is inaccurate.
In order to solve the technical problem in the related art, the embodiments of the present disclosure provide a network search method, which may be applied to any scenario in which a network search is performed on any model. Fig. 1 is a schematic diagram illustrating a system architecture to which the network search method and apparatus according to the embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include a terminal 110 and a server 120. The terminal 110 may be any type of device capable of deploying a model and performing computation, for example, a computer, a smartphone, a smart television, a tablet computer, a smart wearable device (such as AR glasses), a robot, or an unmanned aerial vehicle. The server 120 may conduct the network search. During the network search, the server may perform plastic training on the sub-networks of the super-network in the width dimension, and then perform reinforcement training on the sub-networks in the depth dimension, thereby realizing super-network training. Further, a network search can be performed in the trained super-network to obtain a network structure, and the network structure obtained by the search can be sent to the terminal, so that the terminal processes an object to be processed based on that network structure. Alternatively, the terminal can itself perform the super-network training and the subsequent network search to obtain a network structure, and use the obtained network structure to process the object to be processed.
In some embodiments, where the terminal 110 and the server 120 are separate devices, the super-network may be obtained at the terminal 110 and sent to the server; the super-network training and the subsequent network search are performed at the server to obtain a network structure with optimal performance, and the server processes the data of the object to be processed to obtain a processing result. Further, the processing result may be sent to the terminal 110 for subsequent processing.
It should be noted that the network searching method provided by the embodiment of the present disclosure may be executed by the terminal 110 or the server 120, which is specifically defined according to the location of network deployment and the performance and actual requirements of the device.
Fig. 2 schematically illustrates a network search method in an embodiment of the present disclosure, which specifically includes the following steps:
step S210, obtaining a search space of the super-network;
step S220, performing a first type of training on the sub-networks in the super-network to obtain a training network;
step S230, dividing the super-network into a plurality of sub-networks according to the search space, and performing a second type of training on each sub-network based on the training network until the network converges, so as to perform a network search; wherein the first type of training and the second type of training differ in training dimension.
In the embodiment of the present disclosure, the super-network may be divided into a plurality of sub-networks according to the search space, where the sub-networks may be obtained by sampling. The first type of training may be performed on each sub-network in the width dimension to yield a corresponding training network.
Next, a second type of training may be performed on the sub-networks based on the training network until the network converges, so as to perform the network search; the first type of training and the second type of training differ in training dimension. That is, the second type of training is performed on the basis of the training network obtained through the first type of training, and may be reinforcement training in the depth dimension.
After training of the super-network is completed, a network search can be performed based on the trained super-network to obtain the required network structure.
In this technical solution, on the one hand, performing the first type of training and the second type of training on the sub-networks in the super-network, and distinguishing and classifying the characteristics of different sub-networks, allows each sub-network to be trained along a different dimension. This avoids mutual interference between different sub-networks, reduces their mutual influence, improves the consistency and correlation between a sub-network's performance inside the super-network and its real performance, and improves the accuracy of the sub-networks. On the other hand, training the sub-networks along different dimensions improves the comprehensiveness and effectiveness of super-network training, the accuracy of super-network training, and the accuracy of the network structure obtained by the search.
Next, referring to fig. 2, each step of the network searching method in the embodiment of the present disclosure will be described in detail.
In step S210, a search space of the super-network is acquired.
In the embodiment of the present disclosure, the super-network may be any type of super-network usable for network search. The search space may be constructed based on the ResNet48 network, whose core is the residual structure. Because a search space usually entails huge numbers of parameters and a huge amount of computation, while model deployment is constrained in some scenarios and on some devices, a network search is needed to determine a network structure with better performance and thereby improve model accuracy.
When the network search is carried out, the search space is explored according to a search strategy, so that an accurate network search is achieved. The search process is shown in fig. 3. First, a specific search space is specified; it may be constructed based on the ResNet48 network. Except for the input layer and the output layer, the type of each intermediate layer is selectable, and different restricted selection ranges can generally be set for different types of layers. The hyper-parameters of each layer of the network can also be selected, including the number of convolution kernels, the number of convolution kernel channels, the kernel height and width, and the horizontal and vertical strides.
TABLE 1
Search space: ResNet48
Number of stages: 4
Number of blocks per stage (Stage1 to Stage4): [2,3,4,5], [2,3,4,5], [2,3,4,5,6,7,8], [2,3,4,5]
Base channel number per stage (Stage1 to Stage4): [64, 128, 256, 512]
Conv layer channel scaling: [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7]
The specific structure of the search space of the super-network may be as shown in Table 1. In the embodiment of the present disclosure, the super-network may be divided into a plurality of parts according to network depth, from shallow to deep, with every layer within a part having the same network structure. Referring to Table 1, the number of parts may be 4, the specific number being determined by actual requirements. From shallow to deep, the parts may be denoted in turn as Stage1, Stage2, Stage3, and Stage4.
Referring to Table 1, the number of blocks per stage indicates the range of the number of blocks each part may contain, i.e., the depth search range of each part. For example, the block-number range of Stage1 may be [2,3,4,5], meaning that Stage1, the first part, may contain 2, 3, 4, or 5 layers. The base channel numbers [64,128,256,512] mean that Stage1 has 64 channels, Stage2 has 128 channels, Stage3 has 256 channels, and Stage4 has 512 channels. The Conv layer channel scaling lists the ratios by which the channel number of each part may be scaled; one ratio is selected per part to obtain the scaled channel number, e.g., 0.8 may be selected for Stage1 and 0.7 for Stage2. The searchable items therefore include: the number of blocks in different stages (blocks are configured as in ResNet) and the Conv layer channel scaling.
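As an aid to reading Table 1, the following sketch encodes the search space as a plain Python configuration and samples one architecture from it. The field names and the sampling routine are illustrative assumptions; the block and channel values are taken from Table 1:

```python
import random

# Table 1 search space as a plain configuration (field names are assumed).
RESNET48_SPACE = {
    "stage1": {"blocks": [2, 3, 4, 5], "base_channels": 64},
    "stage2": {"blocks": [2, 3, 4, 5], "base_channels": 128},
    "stage3": {"blocks": [2, 3, 4, 5, 6, 7, 8], "base_channels": 256},
    "stage4": {"blocks": [2, 3, 4, 5], "base_channels": 512},
}
CHANNEL_SCALES = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7]

def sample_architecture(space, scales):
    """Sample one depth (block count) and one channel scale per stage."""
    arch = {}
    for stage, cfg in space.items():
        scale = random.choice(scales)
        arch[stage] = {
            "blocks": random.choice(cfg["blocks"]),
            "channels": int(cfg["base_channels"] * scale),
        }
    return arch

print(sample_architecture(RESNET48_SPACE, CHANNEL_SCALES))
```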
In step S220, a first type of training is performed on the sub-networks in the super-network to obtain a training network.
In the embodiment of the disclosure, after the search space is determined, the network search can be performed according to the search strategy and a performance evaluation model, for example based on super-network weight-sharing search training. However, during weight-sharing search training, the weight-sharing mechanism inevitably causes interference and mutual influence between sub-networks, which greatly reduces the consistency between the sub-networks' performance inside the super-network and their true standalone performance, so the result is inaccurate.
To solve this technical problem, the super-network can be trained along two dimensions, depth and width, to mitigate the mutual interference between different sub-networks. A super-network may comprise a plurality of sub-networks of different structures. Illustratively, the sub-networks of the super-network may undergo reinforcement training and plastic training: reinforcement training reduces the interaction between sub-networks at different depths, while plastic training reduces the interference between sub-networks of different widths, so that mutual interference between different sub-networks in the super-network is reduced along both the depth and width dimensions. It should be noted that reinforcement training may be performed before or after plastic training; the order is not specifically limited here, as long as the sub-networks receive joint training in the depth and width dimensions.
In the embodiment of the present disclosure, in the training of the sub-networks, since the weights of the sub-networks are all inherited from the super-network, mutual interference may occur between different sub-networks, and the cause of the interference may be related to the size of the super-network and the number of sub-networks included in the super-network. Based on this, all sub-networks in the super-network can be trained from the perspective of network width. All sub-networks in the super-network can also be trained from a network depth perspective.
In some embodiments, a first type of training may first be performed on a subnetwork of the hyper-network to obtain a training network. The first type training may be any one of depth dimension training or width dimension training, and it may be determined whether the first type training is depth dimension training or width dimension training according to actual requirements, where the first type training is described as width dimension training as an example.
The first type of training may be plastic training, i.e., training in the width dimension. Plastic training trains the width dimension of the sub-networks in the super-network according to the similarity between sub-networks. It should be noted that in the first type of training the sub-networks are obtained by sampling; one or more sub-networks may be sampled each time, as actual requirements dictate.
To further reduce the interaction between sub-networks, a network plasticity training step may be added. Illustratively, the next candidate network to participate in training may be determined based on the similarity between different sub-networks; once obtained, the next round of training continues on that candidate network, until all candidate networks determined according to similarity have been trained, yielding the training network. Specifically, the current sub-network is trained until it converges, completing the current round of network training. Then, a similarity calculation is performed between the current sub-network and any other sub-network to determine the similarity between the two; a candidate network is determined according to that similarity and receives the next round of training; and these steps are repeated until training finishes, realizing the width-dimension training.
A flow chart for determining candidate networks is schematically shown in fig. 4, and referring to fig. 4, determining candidate networks may include the steps of:
in step S410, sampling the super network to obtain a current sub-network and any sub-network;
in step S420, width codes of the same layer of the current sub-network and any sub-network are obtained, and a similarity between the current sub-network and any sub-network is determined according to the width codes;
in step S430, the candidate network is determined according to the comparison result between the similarity and the similarity threshold.
In some embodiments, the current subnetwork and any subnetwork may both be sampled in a sampling manner. The sampling mode may be random sampling or other sampling modes, and is not limited in particular here. Any of the sub-networks may be a different sub-network than the current sub-network.
After the current sub-network and any other sub-network are determined, the width codes of the same layer of each sub-network may be obtained. A width code is assigned to each layer in a sub-network and describes the sub-network's characteristics in the width dimension. The width codes of the same layer in different sub-networks may be the same or different; no specific limitation applies here.
On this basis, the width codes of the current sub-network and any other sub-network at the same layer can be obtained. For example, the current sub-network may be the i-th sub-network and the other sub-network the j-th sub-network. The same layer means the layer with the same index in both sub-networks, e.g., the k-th layer, where k can be any value between 0 and n. The width codes are then the width code of the k-th layer in the i-th sub-network and the width code of the k-th layer in the j-th sub-network. Further, the distance between the current sub-network and the other sub-network can be determined from the differences between the width codes of their corresponding layers, specifically from the absolute values of those differences, and the similarity between the two sub-networks can be determined from this distance. The distance between the two can be calculated with reference to equation (1):
$$D_{ij} = \sum_{k=0}^{n} \left| L_k^i - L_k^j \right| \tag{1}$$

where $L_k^i$ is the width code of the k-th layer in the i-th sub-network, and $L_k^j$ is the width code of the k-th layer in the j-th sub-network.
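A minimal sketch of equation (1), assuming the width code of a sub-network is represented as the list of its per-layer channel-scaling choices:

```python
def width_distance(code_i, code_j):
    """Equation (1): sum over layers k of |L_k^i - L_k^j| for two width
    codes of equal length."""
    return sum(abs(a - b) for a, b in zip(code_i, code_j))

# Illustrative width codes: per-layer channel-scaling ratios.
d = width_distance([1.0, 0.9, 0.8], [1.0, 0.7, 0.8])
print(d)  # 0.2 (up to floating-point rounding)
```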
The similarity between sub-networks can thus be effectively judged from the distance between them. On this basis, a similarity threshold can be set, and the similarity between the two sub-networks is compared with this threshold to obtain a comparison result: the similarity is either greater than or less than the threshold. When the similarity is less than the threshold, the other sub-network is considered to differ substantially from the current sub-network, so its influence on the weights of the current sub-network is small, and it can serve as the candidate network of the current sub-network for the next round of training. When the similarity is greater than the threshold, the other sub-network is considered too similar to the current sub-network, so its influence on the weights of the current sub-network is large, and it is not determined as a candidate network. In that case, sampling of other sub-networks continues, and steps S410 to S430 are repeated until a candidate network is determined.
For example, referring to fig. 5, the width code 502 of the k-th layer of the current sub-network 501 and the width code 504 of the k-th layer of any other sub-network 503 may be determined. The distance 505 between the width codes 502 and 504 is then determined, giving the similarity 506 between the current sub-network and the other sub-network. When the similarity is smaller than the similarity threshold, the other sub-network is determined to be a candidate network 507; when the similarity is greater than the threshold, another sub-network is sampled, until a candidate network is determined. If no candidate network can be determined, training is considered complete.
After the candidate network is determined, it receives the next round of training until it converges, completing that round of network training. A similarity calculation is then performed between the candidate network and a next sampled sub-network to determine the similarity between them, so that the next candidate network participating in training is determined for the candidate network and trained in the following round. This iterates until the network converges, completing the width-dimension training of the super-network and yielding the training network.
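Putting the pieces together, a minimal sketch of the plastic-training loop is given below. The convergence stand-in, the mapping from distance to similarity, the threshold, and the round counts are all assumptions, since the disclosure fixes none of them:

```python
import random

SIM_THRESHOLD = 0.5   # assumed similarity threshold
NUM_LAYERS = 5

def sample_width_code():
    """Sample a sub-network, represented here by its per-layer width code."""
    return [random.choice([1.0, 0.9, 0.8, 0.7]) for _ in range(NUM_LAYERS)]

def similarity(code_i, code_j):
    """Assumed mapping: larger width-code distance means lower similarity."""
    dist = sum(abs(a - b) for a, b in zip(code_i, code_j))
    return 1.0 / (1.0 + dist)

def train_to_convergence(code):
    """Stand-in for training the sampled sub-network until it converges."""
    print("plastic training:", code)

current = sample_width_code()
for round_idx in range(10):                  # bounded rounds for illustration
    train_to_convergence(current)            # current round of training
    candidate = None
    for _ in range(20):                      # resample until one is found
        other = sample_width_code()
        if similarity(current, other) < SIM_THRESHOLD:
            candidate = other                # dissimilar enough: low weight
            break                            # interference with current net
    if candidate is None:                    # no dissimilar network remains:
        break                                # plastic training is complete
    current = candidate                      # next round's network
```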
In the embodiment of the disclosure, the similarity judgment allows sub-networks with different characteristics in the super-network to be trained to the greatest extent. The whole super-network is trained more effectively and comprehensively through multiple sub-networks, mutual interference between sub-networks of different widths is minimized along the width dimension, and the correlation and consistency between a sub-network's performance inside the super-network and its real performance are improved.
Next, with continued reference to fig. 2, in step S230, the super-network is divided into a plurality of sub-networks according to the search space, and a second type of training is performed on each sub-network based on the training network until the network converges, so as to perform a network search; the first type of training and the second type of training differ in training dimension.
In the disclosed embodiments, after the first type of training is performed, the second type of training may continue on its basis. The second type of training is likewise width-dimension or depth-dimension training, but in the dimension not used by the first type. For example, when the first type of training is plastic training in the width dimension, the second type is reinforcement training in the depth dimension; when the first type is reinforcement training in the depth dimension, the second type is plastic training in the width dimension. The execution order of reinforcement training and plastic training may be interchanged and is not specifically limited here. The following description takes the case where plastic training is performed first as the first type of training and reinforcement training is then performed as the second type of training.
A flow chart for performing a second type of training is schematically shown in fig. 6, and with reference to the flow chart shown in fig. 6, mainly comprises the following steps:
in step S610, determining a depth selection space based on the search space, and dividing the super network into a plurality of sub-networks in a depth dimension according to the depth selection space;
in step S620, each of the subnetworks is individually and intensively trained, and a target subnetwork in the subnetworks is complementarily trained.
In the embodiment of the present disclosure, for the depth-dimension reinforcement training, the original super-network may first be divided into a plurality of sub-networks. Since reinforcement training builds on plastic training, the training network corresponds to the original super-network, so the sub-networks that undergo reinforcement training based on the training network can be regarded as sub-networks obtained by dividing the original super-network. Illustratively, the original super-network may be divided into several different sub-networks from the network-depth perspective. The sub-networks used for the second type of training are divided differently from those used for the first type: the former are sub-supernets divided by depth, while the latter are sampled sub-networks. The number of sub-networks for the second type of training may be set according to actual requirements; for example, there may be five, differing in size and structure.
During sub-network division, a depth selection space is determined for each sub-network based on the search space of the super-network, and the super-network is divided into a plurality of sub-networks in the depth dimension according to the depth selection space. The sub-networks are taken from the original super-network and may be denoted sub-supernet 0, sub-supernet 1, sub-supernet 2, sub-supernet 3, and sub-supernet 4. Illustratively, the super-network may be divided into a plurality of parts representing different depths. For example, with the search space constructed from the ResNet48 network, the super-network is divided into four parts according to depth, from shallow to deep: Stage1, Stage2, Stage3, and Stage4, with every layer within a part having the same network structure.
After the parts are determined, the depth search ranges of the parts of the entire super-network may be determined based on the search space from step S210; the depth search ranges of the parts are combined in the depth dimension to obtain the depth selection space of the sub-networks in each part, and the super-network is divided into a plurality of sub-networks according to the determined depth selection spaces.
Specifically, when dividing the sub-networks, the third and fourth parts of the original super-network may be combined into a spliced part. The step of determining the depth selection space may then include: determining the depth selection space of the first part according to the depth search range of the first part in the search space, and determining the depth selection space of the second part according to the depth search range of the second part; and determining the depth selection space of the spliced part corresponding to the third and fourth parts by combining the depth search ranges of the third and fourth parts.
The structure of the sub-supernet that each sub-network represents can be expressed by the depth selection spaces in Table 2.
TABLE 2
(Table 2, reproduced in the original as an image, lists the depth selection spaces of sub-supernets 0 to 4; representative entries are described below.)
Referring to Table 2, it can be seen that sub-supernet 0, sub-supernet 1, sub-supernet 2, and sub-supernet 3 may each include the first part, the second part, and the spliced part formed from the third and fourth parts. However, the depth selection spaces, i.e., search ranges, of these parts differ across the sub-supernets. The depth selection spaces of the first and second parts of different sub-supernets may be the same, each determined from the depth search range of the corresponding part of the super-network in the search space: for example, the depth search range of the first part is [2,3,4,5], and the depth selection space of the first part of a sub-supernet is [5,4,3,2]; likewise, the depth search range of the second part is [2,3,4,5], and the depth selection space of the second part of a sub-supernet is [5,4,3,2]. The depth search range of the third part may be [2,3,4,5,6,7,8] and that of the fourth part [2,3,4,5]; the depth selection space of the spliced part composed of the third and fourth parts is obtained by combining and splicing their depth search ranges, with each value required to satisfy the respective part's depth search range. For example, the value of each part may be any one or more of the values of the corresponding part, and is not specifically limited here.
Referring to Table 2, for example, for sub-supernet 0 the depth selection space of the spliced part [Stage3, Stage4] may be any of [2,2], [2,3], and [2,4]; for sub-supernet 2 it may be any of [8,3], [8,4], and [8,5]. It should be added that sub-supernet 4 may cover all remaining sub-supernets in the search space of the super-network other than sub-supernets 0, 1, 2, and 3.
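The following sketch shows one way the depth selection spaces could be assembled from the search space. Only the two splice subsets quoted above are reproduced, since the full Table 2 is available only as an image in the original; the variable names and the partitioning are illustrative assumptions:

```python
from itertools import product

# Depth search ranges per stage, taken from Table 1.
STAGE_DEPTHS = {
    "stage1": [2, 3, 4, 5],
    "stage2": [2, 3, 4, 5],
    "stage3": [2, 3, 4, 5, 6, 7, 8],
    "stage4": [2, 3, 4, 5],
}

# Every legal [stage3, stage4] depth pair in the spliced part.
splice_space = list(product(STAGE_DEPTHS["stage3"], STAGE_DEPTHS["stage4"]))

# The two splice subsets quoted in the text (the rest follow Table 2).
sub_supernet_0_splice = [(2, 2), (2, 3), (2, 4)]
sub_supernet_2_splice = [(8, 3), (8, 4), (8, 5)]

# Each subset must satisfy the respective depth search ranges.
assert set(sub_supernet_0_splice) <= set(splice_space)
assert set(sub_supernet_2_splice) <= set(splice_space)
```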
In the embodiment of the disclosure, the super-network can be divided into a plurality of sub-networks with different depths from the depth dimension according to a plurality of parts, so that a plurality of groups of sub-super-networks with different depths can be obtained.
After the structure of each sub-network is determined, model training can be performed independently on each sub-network corresponding to the training network, on the basis of the training network obtained through the first type of training, to obtain trained sub-networks. For example, the model parameters of each sub-network may be updated individually until that sub-network converges, yielding a trained sub-network for each. The training mode may differ between sub-networks, but the training processes of different sub-networks are mutually independent and unaffected by any other sub-network.
After the training of each sub-network is complete, a target sub-network may be determined from the plurality of sub-networks. The target sub-network is an insufficiently trained sub-network, determined according to its degree of training. The degree of training may be represented by post-training performance, i.e., whether the trained sub-network satisfies a preset condition. When the performance of a trained sub-network does not satisfy the preset condition, that sub-network is determined as a target sub-network. The preset condition may be based on the sub-network's accuracy on the validation set or its performance inside the super-network: for example, if a sub-network's accuracy on the validation set is low, or its performance inside the super-network differs from that of the individually trained sub-network, the trained sub-network is considered not to satisfy the preset condition and is determined as a target sub-network. Further, to improve training accuracy, the target sub-network may be trained again, multiple times, until its performance satisfies the preset condition, achieving sufficient training.
For example, if sub-supernets 0, 1, 2, and 3 are insufficiently trained, they are the target sub-networks and are trained multiple times to achieve sufficient training.
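A minimal sketch of the reinforcement and supplementary training, assuming the preset condition is a validation-accuracy threshold; the threshold, the helpers, and the accuracy values below are all illustrative assumptions:

```python
ACC_THRESHOLD = 0.70  # assumed preset condition on the validation set

def train_independently(name):
    """Stand-in for training one sub-supernet in isolation until converged."""
    print("reinforcement training:", name)

def validation_accuracy(name):
    """Stand-in for evaluating a trained sub-supernet on the validation set."""
    return {"sub_supernet_0": 0.65, "sub_supernet_3": 0.68}.get(name, 0.75)

sub_supernets = [f"sub_supernet_{i}" for i in range(5)]
for net in sub_supernets:                    # independent reinforcement phase
    train_independently(net)

# Supplementary training: re-train only the insufficiently trained targets.
targets = [n for n in sub_supernets if validation_accuracy(n) < ACC_THRESHOLD]
for net in targets:
    train_independently(net)  # in practice repeated until the condition holds
```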
In the embodiment of the disclosure, the super network is divided into a plurality of sub networks in the depth dimension, and each sub network is subjected to the intensive training of the depth dimension on the basis of the training network, so that the correlation and consistency of the performance and the authenticity performance of the sub networks divided in the super network can be improved from the depth dimension, and the mutual interference and the mutual influence between the sub networks with different depths are reduced.
After the training of the sub-networks is completed, a network search may be conducted based on the trained super-network. Consistency between a sub-network's performance inside the super-network and the real performance obtained by training it alone is an important condition for reliable search results: the optimal sub-network in the super-network can be selected effectively only if its evaluated performance inside the super-network is consistent with its real performance. Because the sub-networks have been trained in both the width and depth dimensions, this consistency is improved, and the accuracy of the network search can therefore be improved.
Fig. 7 schematically shows a flowchart of the overall network searching method; referring to fig. 7, it mainly includes the following steps, sketched in code after the list:
in step S701, a search space of the super network is acquired;
in step S702, a current subnet and any subnet are acquired;
in step S703, determining similarity according to the width codes of the current sub-network and the same layer of any sub-network;
in step S704, it is determined whether the similarity is smaller than a similarity threshold; if yes, go to step S705; if not, go to step S702;
in step S705, determining any subnetwork as a candidate network, and performing a next round of training to obtain a training network;
in step S706, a plurality of subnetworks are acquired based on the training network;
in step S707, the plurality of sub-networks are subjected to reinforcement training.
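A compact, self-contained sketch tying steps S701 to S707 together; every helper, threshold, and count is an assumption made for illustration rather than the method's fixed form:

```python
import random

SIM_THRESHOLD = 0.5  # assumed

def sample_code(n=5):                         # S702: sample a sub-network
    return [random.choice([1.0, 0.9, 0.8, 0.7]) for _ in range(n)]

def sim(a, b):                                # S703: similarity from distance
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))

def train(tag):                               # stand-in for any training step
    print("train:", tag)

current = sample_code()                       # S701 + S702
for _ in range(10):                           # plastic-training rounds
    train(current)                            # S705: train candidate network
    candidate = next(
        (c for c in (sample_code() for _ in range(20))
         if sim(current, c) < SIM_THRESHOLD),  # S704: threshold check
        None,
    )
    if candidate is None:
        break
    current = candidate                       # S705: next round's network
for i in range(5):                            # S706: sub-supernets from the
    train(f"sub_supernet_{i}")                # training network; S707: reinforce
```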
According to the above technical solution, combining sub-network reinforcement training with sub-network plastic training relieves the mutual interference that weight sharing causes among different sub-networks within the same super-network, improves the consistency between the sub-networks' performance inside the super-network and their true standalone performance, improves training accuracy, and improves the overall model search effect.
On this basis, after the super-network is trained, mutual interference among sub-networks of different depths and widths within it is reduced and the consistency of sub-network performance is improved; because the sub-networks perform better, the trained super-network performs better as well. A network search is then performed based on the trained super-network so that a processing operation can be applied to an object to be processed. The object to be processed may be an image, speech, or text to be processed, depending on the type of processing operation and the actual application scenario. Accordingly, the network structure obtained from the trained super-network through the network search can be used to process the object to be processed and realize the function corresponding to the processing operation. The processing operation may be an operation in a target task scenario, determined by the type of the target task. The target task can be of various types, such as a classification task, a detection task, a segmentation task, or a recognition task, determined by the actual application scenario and requirements. The processing operation may therefore include, but is not limited to, a classification operation, a detection operation, a segmentation operation, and a recognition operation, so that various types of operations can be applied to the object to be processed based on the network structure; no specific limitation applies here.
In the disclosed embodiment, sub-network plastic training increases the Pearson correlation coefficient between the super-network and the sampled sub-networks from an initial 0.80263 to 0.81415, and sub-network reinforcement training further increases it from 0.81415 to 0.83768. The Pearson correlation coefficient measures how closely two data sets lie on a line, i.e., the degree of linear correlation between random variables.
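For reference, the Pearson coefficient between in-supernet and standalone sub-network accuracies can be computed as below; the accuracy arrays are invented for illustration and are not the experiment's data:

```python
import numpy as np

# Illustrative (not experimental) accuracy pairs for five sampled sub-networks.
supernet_acc = np.array([0.71, 0.74, 0.69, 0.77, 0.80])    # inside the supernet
standalone_acc = np.array([0.73, 0.75, 0.70, 0.79, 0.82])  # trained alone

r = np.corrcoef(supernet_acc, standalone_acc)[0, 1]        # Pearson coefficient
print(f"Pearson correlation: {r:.5f}")
```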
Therefore, the relevance and consistency between the performance of the sub-network and the real performance of the sub-network in the super network are effectively improved, and the effectiveness of the search method based on the inheritance of the weight of the super network is improved.
In the embodiment of the disclosure, for current architecture-search tasks based on super-network weight sharing, sub-network plastic training and reinforcement training are adopted to address the mutual interference between sub-networks of different widths and different depths, respectively. By distinguishing and classifying the characteristics of different sub-networks and adjusting the training steps and modes accordingly, the mutual interference between sub-networks is effectively improved and optimized. A search space is constructed based on the ResNet48 network, the experimental results demonstrate the advancement and effectiveness of the plastic and reinforcement training, and the performance of the weight-sharing-based search method is significantly improved.
In an embodiment of the present disclosure, a network searching apparatus is provided, and referring to fig. 8, the network searching apparatus 800 may include:
a search space determining module 801, configured to obtain a search space of a super network;
a first training module 802, configured to perform a first type training on a subnetwork in the super-network, to obtain a training network;
a second training module 803, configured to divide the super-network into a plurality of sub-networks according to the search space, and perform a second type of training on each of the sub-networks based on the training network until the network converges, so as to perform a network search; wherein the first type of training and the second type of training differ in training dimension.
In an exemplary embodiment of the present disclosure, the first training module includes: and the candidate network determining module is used for determining a next candidate network participating in training for the current sub-network of the super-network and training the candidate network to obtain the training network.
In an exemplary embodiment of the disclosure, the candidate network determination module includes: the sampling module is used for sampling the super network to obtain a current sub network and any sub network; the similarity determining module is used for acquiring width codes of the same layer of the current sub-network and any sub-network and determining the similarity between the current sub-network and any sub-network according to the width codes; and the comparison module is used for determining the candidate network according to the comparison result of the similarity and the similarity threshold.
In an exemplary embodiment of the present disclosure, the similarity determination module includes: and the distance determining module is used for determining the distance between the current sub-network and any sub-network according to the difference value between the width codes of the same layer of the current sub-network and any sub-network, and determining the similarity according to the distance.
In an exemplary embodiment of the present disclosure, the comparison module includes: and the comparison control module is used for determining that any sub-network is a candidate network corresponding to the current sub-network if the similarity between the current sub-network and any sub-network is smaller than a similarity threshold value.
In an exemplary embodiment of the present disclosure, the second training module includes: the sub-network dividing module is used for determining a depth selection space based on the search space and dividing the super network into a plurality of sub-networks in a depth dimension according to the depth selection space; and the strengthening training module is used for carrying out individual strengthening training on each sub-network and carrying out supplementary training on a target sub-network in the sub-networks.
In an exemplary embodiment of the present disclosure, the sub-network dividing module includes: a partitioning module for partitioning the hyper-network into a plurality of portions; a selection space determining module, configured to determine the depth selection space by combining depth search ranges corresponding to the multiple portions in the search space.
In an exemplary embodiment of the disclosure, the selection space determination module is configured to: determining a depth selection space of a first part according to a depth search range of the first part in a search space, and determining a depth selection space of a second part according to a depth search range of the second part; and determining the depth selection space of the spliced part corresponding to the third part and the fourth part by combining the depth search ranges of the third part and the fourth part.
It should be noted that, the specific details of each part in the network search apparatus have been described in detail in some embodiments of the network search method, and details that are not disclosed may refer to the embodiments of the method part, and thus are not described again.
Exemplary embodiments of the present disclosure also provide an electronic device. The electronic device may be the terminal 110 described above. In general, the electronic device may include a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the above-described method via execution of the executable instructions.
The following takes the mobile terminal 900 in fig. 9 as an example to describe the configuration of the electronic device. It will be appreciated by those skilled in the art that, apart from components specifically intended for mobile use, the configuration of fig. 9 can also be applied to fixed devices.
As shown in fig. 9, the mobile terminal 900 may specifically include: the mobile communication terminal comprises a processor 901, a memory 902, a bus 903, a mobile communication module 904, an antenna 1, a wireless communication module 905, an antenna 2, a display screen 906, a camera module 907, an audio module 908, a power supply module 909 and a sensor module 910.
Processor 901 may include one or more processing units, such as: the Processor 901 may include an AP (Application Processor), a modem Processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband Processor, and/or an NPU (Neural-Network Processing Unit), etc. The method of the exemplary embodiment may be performed by the AP, GPU or DSP, and when the method involves neural network related processing, may be performed by the NPU, e.g., the NPU may load neural network parameters and execute neural network related algorithm instructions.
An encoder may encode (i.e., compress) an image or video to reduce the data size for storage or transmission. A decoder may decode (i.e., decompress) the encoded data of an image or video to recover the image or video data. The mobile terminal 900 may support one or more encoders and decoders, for image formats such as JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats such as MPEG-1, MPEG-2, H.263, H.264, and HEVC (High Efficiency Video Coding).
The processor 901 may be connected to the memory 902 or other components via the bus 903.
The memory 902 may be used to store computer-executable program code, which includes instructions. The processor 901 executes various functional applications of the mobile terminal 900 and data processing by executing instructions stored in the memory 902. The memory 902 may also store application data, such as files for storing images, videos, and the like.
The communication function of the mobile terminal 900 may be implemented by the mobile communication module 904, the antenna 1, the wireless communication module 905, the antenna 2, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 904 may provide a mobile communication solution of 3G, 4G, 5G, etc. applied to the mobile terminal 900. The wireless communication module 905 may provide wireless communication solutions for wireless local area network, bluetooth, near field communication, etc. applied to the mobile terminal 900.
The display screen 906 is used to implement display functions, such as displaying a user interface, images, videos, and the like. The camera module 907 is used for implementing shooting functions, such as shooting images, videos, and the like, and may include a color temperature sensor array therein. The audio module 908 is used for implementing audio functions, such as playing audio, capturing voice, and the like. The power module 909 is used to implement power management functions such as charging batteries, powering devices, monitoring battery status, etc. The sensor module 910 may include one or more sensors for implementing corresponding sensing functions. For example, the sensor module 910 may include an inertial sensor for detecting a motion pose of the mobile terminal 900 and outputting inertial sensing data.
It should be noted that the embodiments of the present disclosure also provide a computer-readable storage medium, which may be included in the electronic device described in the foregoing embodiments, or may exist separately without being assembled into the electronic device.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the embodiments above.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims. It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (11)

1. A network search method, comprising:
acquiring a search space of a super-network;
performing first type training on a sub-network in the super-network to obtain a training network; and
dividing the super-network into a plurality of sub-networks according to the search space, and performing second type training on each sub-network based on the training network until the network converges, so as to perform a network search; wherein the first type training and the second type training differ in dimensionality.
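For orientation, the following is a minimal Python sketch of the two-phase flow recited in claim 1. All names here (`sample`, `train_step`, `split`, `evaluate`, the step counts) are hypothetical placeholders rather than functions or values from the disclosure, and the convergence test is reduced to a fixed step budget.

```python
def network_search(search_space, sample, train_step, split, evaluate,
                   warmup_steps=100, convergence_steps=100):
    # Phase 1: first type training, e.g. over the width dimension.
    # Repeatedly sample a sub-network and update the shared weights.
    for _ in range(warmup_steps):
        train_step(sample(search_space))

    # Phase 2: second type training over a different dimension, e.g. depth.
    # Divide the super-network into sub-networks and keep training each one.
    subnets = split(search_space)
    for _ in range(convergence_steps):  # stand-in for "until the network converges"
        for subnet in subnets:
            train_step(subnet)

    # Network search: return the best-scoring sub-network.
    return max(subnets, key=evaluate)
```

Training the split sub-networks separately in phase 2 is what distinguishes the second type training from the shared-weight sampling of phase 1; the two phases differ in the dimension along which sub-networks are drawn.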
2. The method of claim 1, wherein the performing first type training on a sub-network in the super-network to obtain a training network comprises:
determining a next candidate network to participate in training for a current sub-network of the super-network, and training the candidate network to obtain the training network.
3. The method of claim 2, wherein the determining a next candidate network to participate in training for the current sub-network of the super-network comprises:
sampling the super-network to obtain the current sub-network and another sub-network;
acquiring width codes of the same layer of the current sub-network and the other sub-network, and determining a similarity between the current sub-network and the other sub-network according to the width codes; and
determining the candidate network according to a result of comparing the similarity with a similarity threshold.
4. The method of claim 3, wherein the determining the similarity between the current sub-network and the other sub-network according to the width codes comprises:
determining a distance between the current sub-network and the other sub-network according to the differences between their width codes at the same layers, and determining the similarity according to the distance.
5. The method according to claim 3, wherein the determining the candidate network according to the result of comparing the similarity with the similarity threshold comprises:
if the similarity between the current sub-network and the other sub-network is smaller than the similarity threshold, determining the other sub-network to be a candidate network corresponding to the current sub-network.
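To make claims 3 to 5 concrete, here is a runnable Python sketch of the candidate selection. It assumes width codes are per-layer channel counts, uses an L1 distance over same-layer differences for claim 4, and maps distance to similarity with an invented formula; the threshold value is likewise illustrative.

```python
def width_distance(code_a, code_b):
    # Claim 4: a distance built from the same-layer differences of the width codes.
    return sum(abs(a - b) for a, b in zip(code_a, code_b))

def width_similarity(code_a, code_b):
    # Invented mapping: larger distance gives lower similarity, in (0, 1].
    return 1.0 / (1.0 + width_distance(code_a, code_b))

def pick_candidate(current_code, sampled_codes, threshold=0.5):
    # Claim 5: a sampled sub-network becomes the candidate when its
    # similarity to the current sub-network is below the threshold.
    for code in sampled_codes:
        if width_similarity(current_code, code) < threshold:
            return code
    return None

# Width codes (per-layer channel counts) for a 4-layer super-network.
current = [32, 64, 64, 128]
sampled = [[32, 64, 64, 128], [16, 32, 32, 64]]
print(pick_candidate(current, sampled))  # -> [16, 32, 32, 64]
```

Note that claim 5 accepts a candidate only when the similarity falls below the threshold, so training deliberately alternates between dissimilar sub-networks.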
6. The method of claim 1, wherein the dividing the super-network into a plurality of sub-networks according to the search space and performing second type training on each of the sub-networks based on the training network comprises:
determining a depth selection space based on the search space, and dividing the super-network into a plurality of sub-networks in the depth dimension according to the depth selection space; and
performing individual strengthening training on each sub-network, and performing supplementary training on each sub-network according to the training results.
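A hedged sketch of the training order in claim 6 follows: individual strengthening training per sub-network first, then supplementary training allocated according to the results. The loss-proportional budget rule below is an invented illustration; the claim does not specify how training results map to supplementary steps.

```python
def second_type_training(subnets, train_step, eval_loss,
                         strengthen_steps=50, extra_budget=100):
    # Individual strengthening training for every depth-split sub-network.
    for subnet in subnets:
        for _ in range(strengthen_steps):
            train_step(subnet)

    # Supplementary training according to the training results: here,
    # sub-networks with a higher remaining loss receive more extra steps.
    losses = [eval_loss(subnet) for subnet in subnets]
    total = sum(losses) or 1.0
    for subnet, loss in zip(subnets, losses):
        for _ in range(int(extra_budget * loss / total)):
            train_step(subnet)
```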
7. The method of claim 6, wherein the determining a depth selection space based on the search space comprises:
dividing the super-network into a plurality of parts; and
determining the depth selection space by combining the depth search ranges corresponding to the plurality of parts in the search space.
8. The method according to claim 7, wherein the determining the depth selection space by combining the depth search ranges corresponding to the plurality of parts in the search space comprises:
determining a depth selection space of a first part according to the depth search range of the first part in the search space, and determining a depth selection space of a second part according to the depth search range of the second part; and
determining a depth selection space of a spliced part corresponding to a third part and a fourth part by combining the depth search ranges of the third part and the fourth part.
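As a small Python illustration of claims 7 and 8, the example below assembles a depth selection space from per-part depth search ranges. The four parts, their ranges, and the rule that the spliced part sums the depth choices of the third and fourth parts are all assumptions made for illustration.

```python
from itertools import product

# Depth search ranges for four parts of the super-network (assumed values).
depth_ranges = {
    "part1": [2, 3, 4],
    "part2": [2, 3],
    "part3": [1, 2],
    "part4": [2, 4],
}

# Parts 1 and 2 keep their own selection spaces, while the spliced part
# covers the combined depth choices of parts 3 and 4.
spliced = sorted({d3 + d4 for d3, d4 in
                  product(depth_ranges["part3"], depth_ranges["part4"])})
depth_selection_space = {
    "part1": depth_ranges["part1"],
    "part2": depth_ranges["part2"],
    "part3+part4": spliced,
}
print(depth_selection_space)
# {'part1': [2, 3, 4], 'part2': [2, 3], 'part3+part4': [3, 4, 5, 6]}
```

Enumerating the combinations of these per-part spaces would then yield the depth-dimension sub-networks referred to in claim 6.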
9. A network search apparatus, comprising:
a search space determining module, configured to acquire a search space of a super-network;
a first training module, configured to perform first type training on a sub-network in the super-network to obtain a training network; and
a second training module, configured to divide the super-network into a plurality of sub-networks according to the search space, and to perform second type training on each sub-network based on the training network until the network converges, so as to perform a network search; wherein the first type training and the second type training differ in dimensionality.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the network search method of any one of claims 1-8 via execution of the executable instructions.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the network search method of any one of claims 1 to 8.
CN202211584290.XA 2022-12-09 2022-12-09 Network searching method and device, electronic equipment and storage medium Pending CN115906986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211584290.XA CN115906986A (en) 2022-12-09 2022-12-09 Network searching method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211584290.XA CN115906986A (en) 2022-12-09 2022-12-09 Network searching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115906986A true CN115906986A (en) 2023-04-04

Family

ID=86483790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211584290.XA Pending CN115906986A (en) 2022-12-09 2022-12-09 Network searching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115906986A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362294A (en) * 2023-05-30 2023-06-30 深圳比特微电子科技有限公司 Neural network searching method and device and readable storage medium
CN116362294B (en) * 2023-05-30 2023-09-12 深圳比特微电子科技有限公司 Neural network searching method and device and readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination