CN115600653B - Neural network model deployment method and device - Google Patents

Neural network model deployment method and device

Info

Publication number
CN115600653B
CN115600653B CN202211563787.3A
Authority
CN
China
Prior art keywords
neural network
parameters
deployment
network model
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211563787.3A
Other languages
Chinese (zh)
Other versions
CN115600653A
Inventor
伍德亮
唐巍
李本辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202211563787.3A
Publication of CN115600653A
Application granted
Publication of CN115600653B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/043 Distributed expert systems; Blackboards

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a neural network model deployment method and device. The method includes: determining target deployment parameters of a neural network model by using a first optimization algorithm, so that the inference duration when the neural network model performs inference on an image to be processed according to the target deployment parameters meets a preset requirement, where the target deployment parameters include target layer fusion parameters and target feature map segmentation parameters; and deploying the neural network model on an electronic device according to the target deployment parameters. In this solution, the deployment parameters of the neural network model are optimized with an optimization algorithm before deployment, so that the inference duration of the neural network model meets the requirement; that is, by optimizing the deployment parameters, an optimal grouping strategy for the neural network model and a splitting strategy for the feature maps are determined, which improves processing efficiency and shortens the inference duration.

Description

Neural network model deployment method and device
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a neural network model deployment method and device.
Background
When a neural network model is used to process data such as images, the amount of data to be processed directly affects the processing time: the larger the amount of data, the longer the inference takes. This is especially true for high-resolution images, whose data volume is large and whose processing is therefore slow. For terminal devices such as mobile phones and tablet computers, whose computing capability is limited, the effect on inference time is even more pronounced; when the amount of data to be processed is too large, the inference takes too long and the user experience suffers.
To shorten the inference time, a neural network model with a simpler structure could be used in place of the original, more complex model. However, on the one hand this greatly reduces the accuracy of the inference result, possibly below the accuracy requirement of the inference task; on the other hand, for many inference tasks a simple-structured model simply cannot replace the original one. The greatest drawback of this approach is that it can only handle data such as images with a small data volume, and cannot handle data with a large data volume.
Therefore, how to improve inference efficiency and shorten inference time is an urgent technical problem to be solved.
Disclosure of Invention
The present application provides a neural network model deployment method and device, which can improve the inference efficiency of a neural network model and shorten the inference time.
In a first aspect, a neural network model deployment method is provided, the method including: determining target deployment parameters of the neural network model by using a first optimization algorithm, so that the inference duration when the neural network model performs inference on an image to be processed according to the target deployment parameters meets a preset requirement, where the target deployment parameters include target layer fusion parameters and target feature map segmentation parameters; and deploying the neural network model on an electronic device according to the target deployment parameters.
In this technical solution, the deployment parameters of the neural network model are optimized with an optimization algorithm before deployment, so that the inference duration of the neural network model meets the requirement; that is, by optimizing the deployment parameters, an optimal grouping strategy for the neural network model and a splitting strategy for the feature maps are determined, which improves processing efficiency and shortens the inference duration. The solution does not degrade the neural network model, so the accuracy of the inference result is not reduced, and it is applicable to the application scenarios of most neural network models.
The deployment parameters can be understood as parameters describing, at deployment time, the width, height or depth of the feature map input to each group, and how the image to be processed is to be split. The target deployment parameters are the deployment parameters determined after optimization with the first optimization algorithm; when inference is performed according to the target deployment parameters, the inference duration can meet the preset requirement.
The layer fusion parameters can be understood as describing how the layers of the neural network model are grouped, including how many groups there are and which neural network layers each group contains. It should be understood that, in general, the groups are formed sequentially according to the layer-by-layer structure of the neural network rather than arbitrarily, because the neural network model processes data layer by layer in order. The target layer fusion parameters can be understood as the layer fusion parameters obtained after optimization with the optimization algorithm, that is, the layer fusion parameters to be adopted in the actual deployment.
The feature map segmentation parameters can be understood as describing how the original feature map input to each group is split; they may be expressed as the number of pieces and the direction of splitting, or as the sizes and number of the split feature maps. The original feature map can be understood as the feature map that would be input to each layer if no grouping were performed. It should be understood that, for the first neural network layer, the input may be the image to be processed itself or a feature vector of the image to be processed; there is no limitation. The feature map segmentation parameters may include, for example, the number of parts into which the original feature map is split along the width, height and depth directions, and the split positions or split modes. It should be understood that the above examples are intended to be illustrative only and not limiting. The target feature map segmentation parameters can be understood as the feature map segmentation parameters obtained after optimization with the optimization algorithm, that is, the feature map segmentation parameters to be adopted in the actual deployment.
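For illustration only, the following is a minimal sketch of one way such deployment parameters could be represented in code; the class names and fields are assumptions introduced here and are not part of the present disclosure.
```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GroupDeployment:
    """Deployment parameters for one fused group of neural network layers (illustrative)."""
    layer_indices: List[int]          # consecutive layer indices fused into this group
    splits_hwd: Tuple[int, int, int]  # number of parts the input feature map is cut into
                                      # along the height, width and depth directions

@dataclass
class DeploymentParams:
    """Layer fusion plus feature map segmentation parameters for the whole model (illustrative)."""
    groups: List[GroupDeployment]

# Example: a 7-layer model fused into 3 groups, each with its own split pattern.
params = DeploymentParams(groups=[
    GroupDeployment(layer_indices=[0, 1], splits_hwd=(1, 2, 1)),
    GroupDeployment(layer_indices=[2, 3, 4, 5], splits_hwd=(3, 2, 1)),
    GroupDeployment(layer_indices=[6], splits_hwd=(2, 1, 1)),
])
```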
The inference duration when the neural network model performs inference on the image to be processed according to the target deployment parameters can be understood as the time taken to obtain the inference result after the neural network model splits and infers according to the target layer fusion parameters and the target feature map segmentation parameters in the target deployment parameters. In other words, the inference duration is the time from when the image to be processed, or its feature vector, begins to be input to the first layer of the neural network model until the last layer of the neural network model finishes outputting the inference result.
The preset requirement may be, for example, that the inference duration falls within a preset duration range; that is, the preset requirement is satisfied when the inference duration is within the preset duration range, and is not satisfied otherwise.
With reference to the first aspect, in certain implementations of the first aspect, deploying the neural network model on the electronic device according to the target deployment parameters may include: dividing the neural network layers of the neural network model into a plurality of target groups according to the target layer fusion parameters, and fusing the neural network layers within each target group; and determining, according to the target feature map segmentation parameters, the number of feature maps input to each target group and the size of each feature map.
With reference to the first aspect, in certain implementations of the first aspect, determining the target deployment parameters of the neural network model by using the first optimization algorithm, so that the inference duration when the neural network model performs inference on the image to be processed according to the target deployment parameters meets the preset requirement, may include:
determining the deployment parameters of the ith optimization by using the first optimization algorithm, where i is an integer greater than or equal to zero;
counting the ith inference duration when the neural network model performs inference on a training image according to the deployment parameters of the ith optimization;
when the ith inference duration meets the preset requirement, determining the deployment parameters of the ith optimization as the target deployment parameters; or,
when the ith inference duration does not meet the preset requirement, adding 1 to i and repeating the above steps to obtain a new ith inference duration, until the ith inference duration meets the preset requirement, and then outputting the target deployment parameters.
It should be noted that the training image used in the optimization stage is generally not the image to be processed, but another image that can be used for inference; for example, it may be a training image read from a storage device of the electronic device, or a training image obtained from a network through a communication interface, and there is no limitation.
It should be further understood that, since the solution of the embodiments of the present application is aimed at optimizing how deployment is performed, it is not executed in the training stage of the neural network model; and after deployment is completed, the neural network model performs inference according to the target deployment parameters, so the solution is not executed in the inference stage either. In other words, the solution of the embodiments of the present application is executed during the period when the neural network model has been trained and is available, but has not yet been deployed on the electronic device. It should also be appreciated that the solution of the embodiments of the present application does not change the structure or parameters of the neural network model (that is, the weights of the neural network model), and its execution is not included in the time spent performing inference with the neural network model.
With reference to the first aspect, in certain implementations of the first aspect, the first optimization algorithm is a Bayesian optimization algorithm.
When i = 0, determining the deployment parameters of the ith optimization by using the first optimization algorithm may include:
randomly initializing N observation points to obtain an initialized observation data set, where the initialized observation data set includes initialized deployment parameters and the initialized inference durations obtained by substituting the initialized deployment parameters into a black-box function, and N is a positive integer; the initialized deployment parameters are the deployment parameters of the 0th optimization.
When i > 0, determining the deployment parameters of the ith optimization by using the first optimization algorithm may include:
estimating the distribution of the black-box function based on the data set obtained after the (i-1)th optimization by using a proxy model; and
determining the deployment parameters of the (N+i)th sampling point through an acquisition function, where the deployment parameters of the (N+i)th sampling point are the deployment parameters of the ith optimization.
The solution of the embodiments of the present application uses a Bayesian optimization algorithm for the search, which is well suited to black-box optimization scenarios: the structure and parameters of the neural network model are not changed, and the neural network model is simply regarded as a black-box task whose input is the deployment parameters and whose output is the inference duration, which makes it suitable for optimization with a Bayesian optimization algorithm.
Bayesian optimization continuously iterates new input parameters based on the input parameters (here, the deployment parameters) and the resulting output parameters (here, the inference duration), and finally obtains the set of parameters with the lowest inference duration, which is the optimal set of parameters. The deployment parameters may include, for example, how many groups the layers of the neural network model are divided into and how many parts the feature map is cut into along the height, width and depth, that is, the layer fusion parameters and the feature map segmentation parameters described above. Bayesian optimization is characterized by using a proxy model to estimate the distribution of the black-box function and then determining the next sampling point through an acquisition function, so the optimal value can be found relatively quickly; moreover, Bayesian optimization adopts an exploration strategy, so a globally optimal solution can be found without getting stuck in a local optimum.
It should be noted that the proxy model, the black-box function and the acquisition function are all components of the Bayesian optimization algorithm and participate in the optimization. The proxy model is mainly used to estimate the distribution of the black-box function. The black-box function can be regarded as a function representing the relationship between the input data and the output data; here it is the function relating the inference duration (the output data) to the deployment parameters (the input data). The acquisition function is the function used to determine the sampling points. The observation data set is a data set of coordinate pairs composed of the input data and output data of each observation point; updating the observation data set with the data of a sampling point can be understood as adding the sampling point, once its output is known, to the observation data set as a new observation point.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: updating the observation data set with the deployment parameters of the (N+i)th sampling point and the inference duration corresponding to the (N+i)th sampling point.
With reference to the first aspect, in certain implementations of the first aspect, the proxy model is a Gaussian process model, a Gaussian mixture model, a probabilistic random forest model, or a tree-structured Parzen estimator model.
With reference to the first aspect, in certain implementations of the first aspect, the acquisition function is an expected improvement function, a probability of improvement function, a lower confidence bound function, or an upper confidence bound function.
In a second aspect, a neural network model deployment apparatus is provided, the apparatus including units for performing any one of the methods of the first aspect, implemented in software and/or hardware.
In a third aspect, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements any one of the methods of the first aspect when executing the computer program.
In a fourth aspect, a chip is provided, including a processor configured to read and execute a computer program stored in a memory, where the computer program, when executed by the processor, implements any one of the methods of the first aspect.
Optionally, the chip further comprises a memory, the memory being electrically connected to the processor.
Optionally, the chip may further comprise a communication interface.
In a fifth aspect, there is provided a computer readable storage medium storing a computer program capable of implementing any one of the methods of the first aspect when the computer program is executed by a processor.
In a sixth aspect, there is provided a computer program product comprising a computer program capable of implementing any one of the methods of the first aspect when the computer program is executed by a processor.
Drawings
Fig. 1 is a comparative diagram of an implementation of reasoning using a neural network model.
Fig. 2 is a schematic flow chart of a method of deploying a neural network model according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a process for reasoning using a neural network model in an embodiment of the present application.
Fig. 4 is a schematic diagram of an implementation procedure of step S201.
Fig. 5 is a schematic flow chart of a bayesian optimization algorithm according to an embodiment of the present application.
Fig. 6 is a schematic diagram of the optimization effect of the embodiment of the present application.
Fig. 7 is a comparison of the reasoning process of the neural network model.
Fig. 8 is a schematic diagram of a deployment apparatus of a neural network model according to an embodiment of the present application.
Fig. 9 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Fig. 10 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following describes embodiments of the present application with reference to the drawings. The neural network model deployment method of the embodiments of the present application can be applied to various scenarios in which a neural network model needs to be deployed on an electronic device.
Fig. 1 is a comparative diagram of implementations of inference using a neural network model. Fig. 1 (a) shows the conventional implementation of inference with a neural network model: the neural network model is format-converted to generate an executable file, and the executable file is then used to perform inference on the data to be processed to obtain an inference result, where the data to be processed may be, for example, voice, video, audio, images or text. With this approach, if the amount of data to be processed is large, the inference takes too long.
To address this problem, the embodiments of the present application provide a new neural network model deployment scheme, which improves the overall processing efficiency of the neural network model mainly by optimizing how the layers are grouped and how the data to be processed is split. Fig. 1 (b) mainly takes image data as the data to be processed as an example, but it should be understood that the solution of the embodiments of the present application can also be applied to other kinds of data to be processed. In the solution of the embodiments of the present application, a step of obtaining target deployment parameters is added, that is, parameters describing how the layers are grouped and how the data to be processed is split. The neural network model is then deployed and format-converted according to the target deployment parameters to obtain an executable file. When the executable file is used to perform inference on an image to be processed, the image to be processed is first split according to the target feature map segmentation parameters corresponding to the first target group and then processed by that group; each subsequent target group likewise splits and infers according to its own target feature map segmentation parameters, and an inference result is finally obtained.
For ease of understanding, the following example assumes that the neural network model is a model for an image classification task containing 7 neural network layers, and that the target deployment parameters include target layer fusion parameters and target feature map segmentation parameters. At deployment time, the neural network model is divided into 3 target groups according to the target layer fusion parameters: layers 0-1 form the first target group, layers 2-5 form the second target group, and layer 6 forms the third target group. Further, assuming the resolution of the image to be processed is A×A, the target feature map segmentation parameters corresponding to the first group are: target feature map size A×(A/2), number 1×2 = 2, that is, the height is not split and the width is split into 2 equal parts. The target feature map segmentation parameters corresponding to the second group are: target feature map size (A/3)×(A/2), number 3×2 = 6, that is, the height is split into 3 equal parts and the width is split into 2 equal parts. The target feature map segmentation parameters corresponding to the third group are: target feature map size (A/2)×A, number 2×1 = 2, that is, the height is split into 2 equal parts and the width is not split. When the executable file is used for inference, the first group receives 2 feature maps of size A×(A/2), obtained by splitting the image to be processed. The second group receives 6 feature maps of size (A/3)×(A/2), obtained by combining the outputs of the first group into one feature map and splitting it again. The third group receives 2 feature maps of size (A/2)×A, obtained by combining the outputs of the second group into one feature map and splitting it again.
It should be understood that the numerical values in the foregoing example are only intended to illustrate the solution and impose no limitation. For example, the feature map does not have to be split into equal parts; the neural network model may perform other inference tasks, such as image super-resolution reconstruction or face recognition; and when the feature map is split, it may be split not only along the width and height but also along the depth. These variations are not listed one by one.
The embodiments of the present application mainly consider the processing of images, that is, the case where the data to be processed is an image. However, it should be understood that the solution of the embodiments of the present application can also be applied to other data to be processed, in which case the deployment parameters need to be adapted accordingly; for example, for audio data the feature map segmentation parameters may be replaced with audio segmentation parameters, and for text data they may be replaced with text segmentation parameters.
In the embodiments of the present application, the electronic device may be a terminal device such as a mobile phone, a computer, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (personal digital assistant, PDA) or a smart bracelet, or it may be a data processing device such as a host, a server or a cloud server. A data processing device has stronger computing capability and a shorter inference time than a terminal device, but its inference time can still be further shortened by the solution of the embodiments of the present application.
Fig. 2 is a schematic flow chart of a method of deploying a neural network model according to an embodiment of the present application. The method can be applied to the electronic equipment capable of deploying the neural network model. The steps of fig. 2 are described below.
S201, determining target deployment parameters of the neural network model by using a first optimization algorithm, so that the inference duration when the neural network model performs inference on an image to be processed according to the target deployment parameters meets a preset requirement, where the target deployment parameters include target layer fusion parameters and target feature map segmentation parameters.
The deployment parameters can be understood as parameters describing, at deployment time, the width, height or depth of the feature map input to each group, and how the image to be processed is to be split. The target deployment parameters are the deployment parameters determined after optimization with the first optimization algorithm; when inference is performed according to the target deployment parameters, the inference duration can meet the preset requirement.
The layer fusion parameters can be understood as describing how the layers of the neural network model are grouped, including how many groups there are and which neural network layers each group contains. It should be understood that, in general, the groups are formed sequentially according to the layer-by-layer structure of the neural network rather than arbitrarily, because the neural network model processes data layer by layer in order. The target layer fusion parameters can be understood as the layer fusion parameters obtained after optimization with the optimization algorithm, that is, the layer fusion parameters to be adopted in the actual deployment.
The feature map segmentation parameters can be understood as describing how the original feature map input to each group is split; they may be expressed as the number of pieces and the direction of splitting, or as the sizes and number of the split feature maps. The original feature map can be understood as the feature map that would be input to each layer if no grouping were performed. It should be understood that, for the first neural network layer, the input may be the image to be processed itself or a feature vector of the image to be processed; there is no limitation. The feature map segmentation parameters may include, for example, the number of parts into which the original feature map is split along the width, height and depth directions, and the split positions or split modes. For example, the feature map segmentation parameters corresponding to a certain group may specify splitting into 2 equal parts along the width direction, splitting into two parts in the ratio 1:2 along the height direction, and no splitting along the depth direction; here the equal division, the 1:2 ratio and the absence of splitting are the split positions or split modes, and width, height and depth are the split directions. Alternatively, the feature map segmentation parameters may include the sizes and number of the feature maps into which the original feature map is split. As an example, the feature map segmentation parameters corresponding to a certain group may specify splitting the original feature map into D feature maps of size width x height x depth = A×B×C, or splitting it into D1 feature maps of size A1×B1×C1, D2 feature maps of size A2×B2×C2, and D3 feature maps of size A3×B3×C3, where A×B×C, A1×B1×C1, A2×B2×C2 and A3×B3×C3 are feature map sizes and D, D1, D2 and D3 are the corresponding numbers. It should be understood that the above examples are intended to be illustrative only and not limiting. The target feature map segmentation parameters can be understood as the feature map segmentation parameters obtained after optimization with the optimization algorithm, that is, the feature map segmentation parameters to be adopted in the actual deployment.
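As a small illustration of the two equivalent representations mentioned above (parts per direction versus sizes and counts), the following sketch converts a number-of-parts description into the resulting tile size and count for an even split; the function name and layout are assumptions made here for illustration.
```python
def tiles_from_split(width: int, height: int, depth: int,
                     w_parts: int, h_parts: int, d_parts: int):
    """Return (tile_size, tile_count) for an even split of a width x height x depth
    feature map along the width, height and depth directions (illustrative)."""
    tile_size = (width // w_parts, height // h_parts, depth // d_parts)
    tile_count = w_parts * h_parts * d_parts
    return tile_size, tile_count

# An original feature map of width x height x depth = 256 x 128 x 64, split into
# 2 parts along the width, 2 along the height, and left whole along the depth:
size, count = tiles_from_split(256, 128, 64, w_parts=2, h_parts=2, d_parts=1)
print(size, count)  # (128, 64, 64) 4
```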
The inference duration when the neural network model performs inference on the image to be processed according to the target deployment parameters can be understood as the time taken to obtain the inference result after the neural network model splits and infers according to the target layer fusion parameters and the target feature map segmentation parameters in the target deployment parameters. In other words, the inference duration is the time from when the image to be processed, or its feature vector, begins to be input to the first layer of the neural network model until the last layer of the neural network model finishes outputting the inference result.
The preset requirement may be, for example, that the inference duration falls within a preset duration range; that is, the preset requirement is satisfied when the inference duration is within the preset duration range, and is not satisfied otherwise.
In one implementation, step S201 may include the following steps (a high-level sketch is given after this list):
determining the deployment parameters of the ith optimization by using the first optimization algorithm, where i is an integer greater than or equal to zero;
counting the ith inference duration when the neural network model performs inference on a training image according to the deployment parameters of the ith optimization;
when the ith inference duration meets the preset requirement, determining the deployment parameters of the ith optimization as the target deployment parameters; or,
when the ith inference duration does not meet the preset requirement, adding 1 to i and repeating the above steps to obtain a new ith inference duration, until the ith inference duration meets the preset requirement, and then outputting the target deployment parameters.
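The steps above amount to a measure-and-check loop; the sketch below illustrates it under stated assumptions. The deploy/infer calls and the optimizer interface are hypothetical placeholders (they are not APIs defined in this application), and the preset requirement is assumed to be a simple latency threshold.
```python
import time

def deploy_and_infer_ms(model, params, training_image) -> float:
    """Hypothetical black-box objective: deploy the model with the given deployment
    parameters, run inference once on a training image, and return the elapsed time in ms."""
    deployed = model.deploy(params)        # assumed deployment API, not from this application
    start = time.perf_counter()
    deployed.infer(training_image)         # assumed inference API, not from this application
    return (time.perf_counter() - start) * 1000.0

def search_target_params(model, training_image, optimizer, max_ms: float, max_iters: int = 1000):
    """Iterate: propose deployment parameters (ith optimization), measure the ith inference
    duration, and stop once the preset requirement (here: latency <= max_ms) is met."""
    params = optimizer.propose()               # i = 0: initial deployment parameters
    for _ in range(max_iters):
        latency_ms = deploy_and_infer_ms(model, params, training_image)
        if latency_ms <= max_ms:               # preset requirement satisfied
            return params                      # these become the target deployment parameters
        optimizer.observe(params, latency_ms)  # feed the result back to the optimizer
        params = optimizer.propose()           # i = i + 1: next candidate
    return params
```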
It should be noted that the training image used in the optimization stage is generally not the image to be processed, but another image that can be used for inference; for example, it may be a training image read from a storage device of the electronic device, or a training image obtained from a network through a communication interface, and there is no limitation.
It should be further understood that, since the solution of the embodiments of the present application is aimed at optimizing how deployment is performed, it is not executed in the training stage of the neural network model; and after deployment is completed, the neural network model performs inference according to the target deployment parameters, so the solution is not executed in the inference stage either. In other words, the solution of the embodiments of the present application is executed during the period when the neural network model has been trained and is available, but has not yet been deployed on the electronic device. It should also be appreciated that the solution of the embodiments of the present application does not change the structure or parameters of the neural network model (that is, the weights of the neural network model), and its execution is not included in the time spent performing inference with the neural network model.
For ease of understanding, step S201 may be referred to as the optimization stage and step S202 as the deployment stage. The process of performing inference with the neural network model is the inference stage mentioned here. It should be understood that a neural network model needs to be trained on a large amount of training data before it has an inference capability that meets the accuracy requirement of the inference task, so there is also a corresponding training stage. The solution of the embodiments of the present application takes place in neither the inference stage nor the training stage of the neural network model.
Regarding how to deploy a neural network model, one existing method is a device-cloud combined deployment, in which the complex computation is placed on the cloud device and the terminal device only performs acquisition and reception or carries a small amount of simple computation. However, such schemes do not consider how grouping the neural network model and splitting the inference data (the data to be processed and the feature maps generated in intermediate steps) could improve inference efficiency; they only consider how to make the overall scheme easy to implement. In such a scenario, the solution of the embodiments of the present application can still be used for deployment, that is, the cloud device or the terminal device performs optimization of the deployment parameters for the trained neural network model and deploys accordingly, thereby shortening the inference time. Other deployment schemes are often configured manually based on experience, which has low accuracy and no generality. It should be understood that, although device-cloud collaboration may involve dividing a complete neural network model into two sub-models deployed on the two sides, that division is only intended to make the two sub-models fit the terminal device and the cloud device respectively; it essentially turns one model into two sub-models and does not concern itself with shortening the inference time, and the two sub-models may even be trained separately, which shows that they can essentially be regarded as two independent models. In contrast, the solution of the embodiments of the present application does not hard-split the neural network model, that is, the structure of the neural network model is not changed; only the rules by which each layer processes data in the inference stage are changed, so as to shorten the inference time as much as possible.
In one implementation, the first optimization algorithm may be a Bayesian optimization algorithm. The solution of the embodiments of the present application uses a Bayesian optimization algorithm for the search, which is well suited to black-box optimization scenarios: the structure and parameters of the neural network model are not changed, and the neural network model is simply regarded as a black-box task whose input is the deployment parameters and whose output is the inference duration, which makes it suitable for optimization with a Bayesian optimization algorithm.
Bayesian optimization continuously iterates new input parameters based on the input parameters (here, the deployment parameters) and the resulting output parameters (here, the inference duration), and finally obtains the set of parameters with the lowest inference duration, which is the optimal set of parameters. The deployment parameters may include, for example, how many groups the layers of the neural network model are divided into and how many parts the feature map is cut into along the height, width and depth, that is, the layer fusion parameters and the feature map segmentation parameters described above. Bayesian optimization is characterized by using a proxy model to estimate the distribution of the black-box function and then determining the next sampling point through an acquisition function, so the optimal value can be found relatively quickly; moreover, Bayesian optimization adopts an exploration strategy, so a globally optimal solution can be found without getting stuck in a local optimum.
In some other possible implementations, a genetic or evolutionary algorithm may be used for the search, for example a population-based differential evolution algorithm. However, a genetic or evolutionary algorithm converges less well than the Bayesian optimization algorithm, and experiments show that its optimization result is not as good as that obtained with the Bayesian optimization algorithm. Even so, the inference duration can still be shortened to some extent compared with making no change at all.
In one example, when i = 0, determining the deployment parameters of the ith optimization by using the first optimization algorithm may include:
randomly initializing N observation points to obtain an initialized observation data set, where the initialized observation data set includes initialized deployment parameters and the initialized inference durations obtained by substituting the initialized deployment parameters into a black-box function, and N is a positive integer; the initialized deployment parameters are the deployment parameters of the 0th optimization.
In another example, when i > 0, determining the deployment parameters of the ith optimization by using the first optimization algorithm may include:
estimating the distribution of the black-box function based on the data set obtained after the (i-1)th optimization by using a proxy model; and
determining the deployment parameters of the (N+i)th sampling point through an acquisition function, where the deployment parameters of the (N+i)th sampling point are the deployment parameters of the ith optimization.
Based on this example, the method may further include: updating the observation data set with the deployment parameters of the (N+i)th sampling point and the inference duration corresponding to the (N+i)th sampling point.
It should be noted that the proxy model, the black-box function and the acquisition function are all components of the Bayesian optimization algorithm and participate in the optimization. The proxy model is mainly used to estimate the distribution of the black-box function. The black-box function can be regarded as a function representing the relationship between the input data and the output data; here it is the function relating the inference duration (the output data) to the deployment parameters (the input data). The acquisition function is the function used to determine the sampling points. The observation data set is a data set of coordinate pairs composed of the input data and output data of each observation point; updating the observation data set with the data of a sampling point can be understood as adding the sampling point, once its output is known, to the observation data set as a new observation point.
In one example, the proxy model is a Gaussian process (Gaussian process, GP) model, a Gaussian mixture model, a probabilistic random forest (probabilistic random forest, PRF) model, or a tree-structured Parzen estimator (tree-structured Parzen estimator, TPE) model.
In one example, the acquisition function is an expected improvement (expected improvement, EI) function, a probability of improvement (probability of improvement, PI) function, a lower confidence bound (lower confidence bound, LCB) function, or an upper confidence bound (upper confidence bound, UCB) function.
S202, deploying the neural network model on the electronic device according to the target deployment parameters.
Once the target deployment parameters are determined, the neural network model may be deployed on an electronic device according to the target deployment parameters, where the electronic device may be any of the electronic devices described above, for example a terminal device or a data processing device.
In one implementation, step S202 may include: dividing the neural network layers of the neural network model into a plurality of target groups according to the target layer fusion parameters, and fusing the neural network layers within each target group; and determining, according to the target feature map segmentation parameters, the number of feature maps input to each target group and the size of each feature map.
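As a toy illustration of the grouping step only (real layer fusion would also merge the computation of each group), the layers can be partitioned into consecutive target groups roughly as follows; the function and variable names are assumptions introduced here.
```python
def group_layers(layers: list, group_boundaries: list) -> list:
    """Partition an ordered list of layers into consecutive target groups.
    group_boundaries lists the (first, last) layer index of each group, e.g. [(0, 1), (2, 5), (6, 6)]."""
    return [layers[start:end + 1] for start, end in group_boundaries]

layers = [f"layer_{i}" for i in range(7)]
target_groups = group_layers(layers, [(0, 1), (2, 5), (6, 6)])
print(target_groups)
# [['layer_0', 'layer_1'], ['layer_2', 'layer_3', 'layer_4', 'layer_5'], ['layer_6']]
```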
In the scheme shown in fig. 2, the deployment parameters of the neural network model are optimized with an optimization algorithm before deployment, so that the inference duration of the neural network model meets the requirement; that is, by optimizing the deployment parameters, an optimal grouping strategy for the neural network model and a splitting strategy for the feature maps are determined, which improves processing efficiency and shortens the inference duration. The solution does not degrade the neural network model, so the accuracy of the inference result is not reduced, and it is applicable to the application scenarios of most neural network models.
To facilitate an understanding of the overall implementation and optimization of the scheme of fig. 2, further description is provided below in conjunction with the various figures.
Fig. 3 is a schematic diagram of a process of performing inference with a neural network model in an embodiment of the present application. Fig. 3 can be regarded as a specific example of the procedure shown in fig. 1 (b) and also as an example of applying the scheme shown in fig. 2. As shown in fig. 3, with the solution of the embodiments of the present application, performing step S201 produces a configuration file containing the target deployment parameters; in the inference stage, each group performs inference according to the sizes and numbers of the target feature maps recorded in the configuration file. In the example shown in fig. 3, the layers are grouped as follows: layer 0 forms one group by itself, and layers 1-4 form another group. The 512×512 input picture is split into (256+256)×(256+256) for the layer-0 group, that is, both the height and the width are halved; for the group of layers 1-4, the height is halved and the width is split into 4 parts. It should be understood, however, that fig. 3 is only one specific example and imposes no numerical limitation. For example, it may instead be assumed that the neural network model has 39 layers in total and the input size (that is, the size of the image to be processed) is 512×512. The inputs to the Bayesian optimization are then: how many groups the model is divided into, for example 3 groups of the 39 layers (many combinations are possible, such as [0], [1-10], [11-38], or [0-10], [11-20], [21-38], etc.), and how the height and width of the input are split, for example into two parts each: (256+256)×(256+256).
After deployment is completed, when a to-be-processed image is input, segmentation and processing are performed according to the deployed grouping and segmentation modes.
Fig. 4 is a schematic diagram of an implementation procedure of step S201.
S401, starting.
I.e. to start optimizing the deployment parameters.
S402, acquiring a neural network model.
That is, determining the neural network model whose deployment parameters are to be optimized.
S403, initializing deployment parameters.
It can be regarded as an example when i=0 in the above step S201.
S404, deployment is carried out according to the deployment parameters.
That is, the model is deployed according to the layer fusion parameters and the feature map segmentation parameters in the deployment parameters. In the first pass, following the initialization in step S403, the deployment parameters used are the initialized deployment parameters; in subsequent passes, after one or more rounds of optimization, the deployment parameters used are those produced by the most recent optimization.
The deployment parameters in steps S403 and S404 can be considered as examples of the deployment parameters obtained by the above-described "determination of the deployment parameter for the ith optimization using the first optimization algorithm, i being an integer greater than or equal to zero". Where S403 is an example when i=0, and S404 is an example when i is greater than or equal to 0.
S405, performing inference on the data to be processed by using the deployed neural network model.
S406, counting the inference duration.
Steps S405 and S406 can be regarded as an example of the step of "counting the ith inference duration when the neural network model performs inference on the training image according to the deployment parameters of the ith optimization".
S407, judging whether convergence has occurred; if yes, executing step S408, and if no, executing step S409.
Whether convergence has occurred can be understood as whether the inference duration falls within the preset duration range, which is one example of whether the preset requirement is satisfied.
S408, finishing the optimization and outputting the target deployment parameters.
Steps S407 and S408 can be regarded as one example of the step of "determining the i-th optimized deployment parameter as the target deployment parameter when the i-th inference duration satisfies the preset requirement".
S409, determining the next deployment parameter by using Bayesian optimization, and proceeding to step S404.
Steps S407 and S409 may be regarded as an example of the step of "when the i-th inference duration does not satisfy the preset requirement, repeating the above steps after adding 1 to i to obtain the i-th inference duration, until the i-th inference duration satisfies the preset requirement, and outputting the target deployment parameter".
Fig. 5 is a schematic flow chart of a bayesian optimization algorithm according to an embodiment of the present application.
S501, initializing deployment parameters.
It can be regarded as one example of step S201 above when i = 0, or as one example of step S403. The initialization process may include, for example: randomly initializing N observation points to obtain an initialized observation data set, that is, randomly selecting N groups of input parameters x(1), x(2), ..., x(N) (deployment parameters) and calculating the corresponding output parameters y(1), y(2), ..., y(N) (inference durations), so as to obtain an initialized observation data set {(x(1), y(1)), (x(2), y(2)), ..., (x(N), y(N))} (a data set of deployment parameters and the corresponding inference durations).
That is, N observation points may be randomly initialized to obtain an initialized observation data set, where the initialized observation data set includes initialized deployment parameters and the initialized inference durations obtained by substituting the initialized deployment parameters into the black-box function, and N is a positive integer; the initialized deployment parameters are the deployment parameters of the 0th optimization.
S502, estimating the distribution of the black-box function f(x) based on the observation data set by using the proxy model g(x).
S503, determining deployment parameters of the next sampling point through an acquisition function.
Assuming that the next sampling point is denoted by N+1, x(N+1) and the corresponding y(N+1) are determined.
S504, adding (x(N+1), y(N+1)) to the observation data set.
Steps S502 to S504 are executed in a loop until the termination condition is met; the process then exits and the target deployment parameters are output.
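The loop S501-S504 above can be sketched roughly as follows, using a Gaussian process surrogate from scikit-learn and a lower confidence bound acquisition (one of the acquisition functions listed earlier). The library choice, the random candidate search, and the encoding of deployment parameters as a numeric vector are all assumptions made here for illustration, not part of the present disclosure.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt(objective, sample_random_params, n_init=10, n_iter=50, n_cand=256, kappa=2.0):
    """Illustrative Bayesian optimization loop. objective(x) returns the inference duration
    for one encoded deployment-parameter vector x; sample_random_params() draws a random x."""
    # S501: randomly initialize N observation points (deployment parameters + inference durations).
    X = np.array([sample_random_params() for _ in range(n_init)], dtype=float)
    y = np.array([objective(x) for x in X], dtype=float)
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)       # S502: fit proxy model g(x)
        cand = np.array([sample_random_params() for _ in range(n_cand)], dtype=float)
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - kappa * sigma)]                    # S503: next sampling point (LCB)
        X = np.vstack([X, x_next])                                      # S504: extend the observation set
        y = np.append(y, objective(x_next))
    return X[np.argmin(y)]   # deployment parameters with the lowest observed inference duration
```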
For ease of understanding, assume for illustration that executing step S501 in the 0th optimization (that is, the initialization) determines the deployment parameters and corresponding inference durations of 10 observation points in the initialized observation data set. Step S502 is then executed to estimate the distribution of the black-box function f(x) from the data set formed by these 10 observation points using the proxy model g(x); this distribution hypothesis is denoted distribution 1. Step S503 is executed to determine the 11th sampling point through the acquisition function, that is, the deployment parameters to be adopted for the 1st optimization. Step S504 is executed to add the data of the 11th sampling point to the observation data set, and the resulting data set is referred to as the observation data set after the 1st optimization. Step S502 is then executed again, and the new distribution hypothesis of the black-box function f(x) estimated by the proxy model g(x) from the observation data set formed by the 11 observation points is denoted distribution 2. Step S503 is executed to determine the 12th sampling point, that is, the deployment parameters to be adopted for the 2nd optimization. And so on, until the termination condition is met. The termination condition may be a number of iterations, a number of sampling points, y(N+1) falling within a preset range, or the like; there is no limitation.
Bayesian optimization samples repeatedly (that is, exploits) around points where the mean μ of the random variable f(x) is particularly high or low, because an extreme mean indicates a high probability that the point is an extreme point; at the same time, Bayesian optimization also explores other possible points, namely points with a large standard deviation σ. Bayesian optimization can therefore find extreme points quickly while still exploring new points, so it does not get stuck in a local optimum. As described above, the acquisition function may take various suitable forms; here the EI function is taken as an example.
The EI function may satisfy the following formula:
EI(x, ξ) = (μ - f(x*) - ξ)Φ((μ - f(x*) - ξ)/σ) + σφ((μ - f(x*) - ξ)/σ);
where EI(x, ξ) represents the expected improvement, μ represents the mean, σ represents the standard deviation, x* represents the current optimal sampling point, f(x*) represents the current optimal value, ξ represents the elasticity factor, Φ(·) represents the cumulative distribution function, φ(·) represents the probability density function, (μ - f(x*) - ξ)Φ((μ - f(x*) - ξ)/σ) is the mean-dominated exploitation term, and σφ((μ - f(x*) - ξ)/σ) is the variance-dominated exploration term. The exploitation term may also be referred to as the utilization term.
In the Bayesian optimization algorithm, the next sampling point is the point at which EI attains its maximum value; for the above formula, that is a point where μ or σ is large, and for sampling points that have already been observed, EI = 0.
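For reference, the formula above can be evaluated directly; the following small sketch follows the maximization form given in the text, with scipy's norm.cdf and norm.pdf playing the roles of Φ and φ. It is only a numerical illustration of the formula, not an implementation taken from this application.
```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI(x, xi) = (mu - f_best - xi) * Phi(z) + sigma * phi(z), with
    z = (mu - f_best - xi) / sigma (maximization form, matching the formula above).
    mu, sigma: surrogate mean and standard deviation at x; f_best: current optimum f(x*)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)
    z = (mu - f_best - xi) / sigma
    exploitation = (mu - f_best - xi) * norm.cdf(z)   # mean-dominated (exploitation) term
    exploration = sigma * norm.pdf(z)                 # variance-dominated (exploration) term
    return exploitation + exploration

# For a point that has already been observed (sigma close to 0 and mu equal to f_best), EI is ~0.
print(expected_improvement(mu=1.0, sigma=1e-12, f_best=1.0))
```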
Fig. 6 is a schematic diagram of the optimization effect of an embodiment of the present application. In this example, the neural network model has 39 layers. Scheme A was determined manually by an experienced expert: the model is divided into 7 groups, and the feature map splits include splitting into 2 parts and into 4 parts. Scheme B uses the embodiment of the present application, in which the deployment parameters are determined by Bayesian optimization. Fig. 6 shows the trend of the inference duration during optimization for scheme A and scheme B, where the curve corresponding to scheme A is the dashed straight line and the curve corresponding to scheme B is the solid polyline. It can be seen that the curve of scheme B (the solid line) lies below the curve of scheme A (the dashed line), that is, the scheme of the embodiment of the present application outperforms scheme A. The abscissa in fig. 6 is the number of iterations; for example, 750 on the abscissa represents the 750th iteration. The ordinate is the inference duration; for example, 30 on the ordinate represents 30 milliseconds (ms). It should be appreciated that in each iteration the input data is the deployment parameters of that iteration, and each small circle represents the inference duration corresponding to one set of input parameters. As can be seen from the figure, most of the small circles lie below the dashed line. It can also be seen that the inference duration varies between 30 ms and 50 ms, and the best inference duration found is 1.6 ms lower than that corresponding to scheme A, that is, better than scheme A, corresponding to a latency improvement of 5.1%.
To further clarify the difference between inference under the solution of the embodiments of the present application and a deployment without grouping and feature map segmentation, a description is given below with reference to fig. 7.
Fig. 7 is a comparison of the reasoning process of the neural network model. Fig. 7 illustrates a 5-layer neural network model, where the resolution of the image to be processed is 1024×1024. However, it should be understood that neural network models of other structures are possible, and there is no limitation.
Fig. 7 (a) is a schematic diagram of the internal processing when inference is performed with the neural network model that has not been deployed in this way. As can be seen from fig. 7 (a), the feature map input to each neural network layer has a size of 1024×1024, and there is only a single execution path.
Fig. 7 (b) is a schematic diagram of the internal processing when inference is performed with the deployed neural network model. As shown in fig. 7 (b), the neural network model is divided into 3 groups according to the optimized deployment parameters: group 1 consists of layer 0 and layer 1, group 2 consists of layer 2 and layer 3, and group 3 consists of layer 4. The target feature map segmentation parameter of group 1 is to split the 1024×1024 input into two 512×1024 feature maps, so group 1 can process the two 512×1024 feature maps in parallel. The target feature map segmentation parameter of group 2 is to split the 1024×1024 feature map into four 512×512 feature maps, so group 2 can process the four 512×512 feature maps in parallel. The target feature map segmentation parameter of group 3 is to split the 1024×1024 feature map into two 1024×512 feature maps, so group 3 can process the two 1024×512 feature maps in parallel. In each case the processing efficiency is improved, on the one hand because the resolution of each feature map is reduced, and on the other hand because the feature maps can be processed in parallel.
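A rough sketch of the split-process-merge pattern described above is shown below, using a thread pool to process the tiles of one group in parallel; process_tile merely stands in for whatever computation the fused group would perform on a tile, and every name here is an assumption made for illustration.
```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def process_tile(tile: np.ndarray) -> np.ndarray:
    """Placeholder for the computation that one fused group applies to a single tile."""
    return tile * 1.0

def run_group_parallel(fmap: np.ndarray, h_parts: int, w_parts: int) -> np.ndarray:
    """Split an (H, W, C) feature map, process the tiles in parallel, then stitch them back."""
    rows = np.array_split(fmap, h_parts, axis=0)
    tiles = [t for r in rows for t in np.array_split(r, w_parts, axis=1)]
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(process_tile, tiles))
    # Reassemble the outputs in the same row-major order in which the tiles were produced.
    out_rows = [np.concatenate(outputs[i * w_parts:(i + 1) * w_parts], axis=1)
                for i in range(h_parts)]
    return np.concatenate(out_rows, axis=0)

fmap = np.zeros((1024, 1024, 3), dtype=np.float32)
out = run_group_parallel(fmap, h_parts=2, w_parts=1)   # two 512x1024 tiles, as in group 1 above
print(out.shape)  # (1024, 1024, 3)
```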
Therefore, compared with a scheme without grouping and feature map segmentation, the scheme provided by the embodiment of the present application yields a significant improvement in inference efficiency; and because an optimization algorithm is used to search for the optimal grouping and splitting scheme, the grouping and splitting are more scientific and reasonable than in a deployment scheme where grouping and splitting are determined directly by hand, so that a significant improvement in inference efficiency is still obtained.
The foregoing describes the methods of the embodiments of the present application, mainly with reference to the accompanying drawings. It should be understood that, although the steps in the flowcharts of the above embodiments are shown in order, these steps are not necessarily performed in the order shown in the figures. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different moments, and these steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with at least some of the other steps or stages. The apparatus of the embodiments of the present application is described below with reference to the accompanying drawings.
Fig. 8 is a schematic diagram of a deployment apparatus of a neural network model according to an embodiment of the present application. As shown in fig. 8, the apparatus 1000 includes an optimizing unit 1001 and a deploying unit 1002. The apparatus 1000 may be integrated into an electronic device as described in embodiments of the present application.
The apparatus 1000 can be used to perform any of the above methods of deploying neural network models. For example, the optimizing unit 1001 may be used to perform step S201, and the deploying unit 1002 may be used to perform step S202. For another example, the optimizing unit 1001 may be used to perform steps S401-S403, S405-S409, and the deploying unit 1002 may be used to perform step S404. For another example, the optimization unit 1001 may be used to perform steps S501-S504.
The apparatus 1000 may also be used to perform the process shown in fig. 1 (b), or to perform the process shown in fig. 3.
In one implementation, the apparatus 1000 may further include a storage unit for storing data such as deployment parameters. The storage unit may be integrated in the optimizing unit 1001 or the deploying unit 1002, or may be a unit independent from the optimizing unit 1001 and the deploying unit 1002.
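The cooperation between the optimizing unit 1001, the deploying unit 1002, and the optional storage unit can be illustrated with the following minimal Python sketch; the class, parameter names, and stubbed units are assumptions introduced for illustration, not an implementation of the apparatus 1000.

```python
class NeuralNetworkDeploymentApparatus:
    """Illustrative arrangement of the units described above.
    The callables passed in are placeholders for the real units."""

    def __init__(self, optimizing_unit, deploying_unit, storage_unit=None):
        self.optimizing_unit = optimizing_unit   # determines target deployment parameters
        self.deploying_unit = deploying_unit     # deploys the model with those parameters
        self.storage_unit = storage_unit or {}   # optionally caches deployment parameters

    def deploy(self, model, preset_requirement_ms):
        # Step corresponding to the optimizing unit: find target deployment parameters.
        params = self.optimizing_unit(model, preset_requirement_ms)
        self.storage_unit["target_deployment_parameters"] = params
        # Step corresponding to the deploying unit: apply them on the electronic device.
        return self.deploying_unit(model, params)

# Hypothetical usage with stubbed units.
apparatus = NeuralNetworkDeploymentApparatus(
    optimizing_unit=lambda model, req: {"groups": 3, "splits": (2, 4, 2)},
    deploying_unit=lambda model, params: f"{model} deployed with {params}",
)
print(apparatus.deploy("39-layer model", preset_requirement_ms=30.0))
```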
Fig. 9 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device 900 may include a processor 910, an external memory interface 920, an internal memory 921, a universal serial bus (universal serial bus, USB) interface 930, a charge management module 940, a power management module 941, a battery 942, an antenna 1, an antenna 2, a mobile communication module 950, a wireless communication module 960, an audio module 970, a speaker 970A, a receiver 970B, a microphone 970C, an earphone interface 970D, a sensor module 980, keys 990, a motor 991, an indicator 992, a camera 993, a display screen 994, and a subscriber identity module (subscriber identification module, SIM) card interface 995, etc. The sensor module 980 may include, among other things, a pressure sensor 980A, a gyroscope sensor 980B, a barometric sensor 980C, a magnetic sensor 980D, an acceleration sensor 980E, a distance sensor 980F, a proximity light sensor 980G, a fingerprint sensor 980H, a temperature sensor 980J, a touch sensor 980K, an ambient light sensor 980L, a bone conduction sensor 980M, and the like.
It should be understood that the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 900. In other embodiments of the present application, the electronic device 900 may include more or fewer components than illustrated, or certain components may be combined, or certain components may be split, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Illustratively, the processor 910 shown in fig. 9 may include one or more processing units, such as: the processor 910 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 900, among other things. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 910 for storing instructions and data. In some embodiments, the memory in the processor 910 is a cache memory. The memory may hold instructions or data that the processor 910 has just used or uses cyclically. If the processor 910 needs to use the instructions or data again, it may call them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 910, thereby improving the efficiency of the system.
In some embodiments, processor 910 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
In some embodiments, the I2C interface is a bi-directional synchronous serial bus including a serial data line (serial data line, SDA) and a serial clock line (serial clock line, SCL). The processor 910 may include multiple sets of I2C buses. The processor 910 may be coupled to the touch sensor 980K, a charger, a flash, the camera 993, etc., respectively, through different I2C bus interfaces. For example, the processor 910 may be coupled to the touch sensor 980K through an I2C interface, so that the processor 910 communicates with the touch sensor 980K through the I2C bus interface, implementing the touch function of the electronic device 900.
In some embodiments, the I2S interface may be used for audio communication. The processor 910 may include multiple sets of I2S buses. The processor 910 may be coupled to the audio module 970 by an I2S bus to enable communication between the processor 910 and the audio module 970.
In some embodiments, the audio module 970 may communicate audio signals to the wireless communication module 960 through an I2S interface to implement a function of answering a phone call through a bluetooth headset.
In some embodiments, the PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. The audio module 970 and the wireless communication module 960 may be coupled through a PCM bus interface.
In some embodiments, the audio module 970 may also communicate audio signals to the wireless communication module 960 through a PCM interface to enable answering a call through a bluetooth headset. It should be appreciated that both the I2S interface and the PCM interface may be used for audio communication.
In some embodiments, the UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. UART interfaces are typically used to connect the processor 910 with the wireless communication module 960. For example, the processor 910 communicates with a bluetooth module in the wireless communication module 960 through a UART interface to implement bluetooth functions. In some embodiments, the audio module 970 may communicate audio signals to the wireless communication module 960 through a UART interface to implement a function of playing music through a bluetooth headset.
In some embodiments, a MIPI interface may be used to connect processor 910 with peripheral devices such as display 994, camera 993, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. The processor 910 and the camera 993 communicate through the CSI interface to implement the photographing function of the electronic device 900. Processor 910 and display 994 communicate via a DSI interface to implement the display functions of electronic device 900.
In some embodiments, the GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. GPIO interfaces may be used to connect processor 910 with camera 993, display 994, wireless communication module 960, audio module 970, sensor module 980, and so forth. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
Illustratively, the USB interface 930 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 930 may be used to connect a charger to charge the electronic device 900, or may be used to transfer data between the electronic device 900 and a peripheral device. It may also be used to connect an earphone and play audio through the earphone. The interface may also be used to connect other electronic devices, such as AR devices.
It should be understood that the connection relationships between the modules illustrated in the embodiments of the present application are merely illustrative, and do not limit the structure of the electronic device 900. In other embodiments of the present application, the electronic device 900 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The charge management module 940 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 940 may receive a charging input of the wired charger through the USB interface 930. In some wireless charging embodiments, the charge management module 940 may receive wireless charging input through a wireless charging coil of the electronic device 900. The charging management module 940 may also provide power to the electronic device through the power management module 941 while charging the battery 942.
The power management module 941 is used to connect the battery 942, the charge management module 940 and the processor 910. The power management module 941 receives input from the battery 942 and/or the charge management module 940 and provides power to the processor 910, the internal memory 921, the external memory, the display 994, the camera 993, the wireless communication module 960, and the like. The power management module 941 may also be used to monitor battery capacity, battery cycle times, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 941 may also be provided in the processor 910. In other embodiments, the power management module 941 and the charge management module 940 may be disposed in the same device.
The wireless communication function of the electronic device 900 may be implemented by the antenna 1, the antenna 2, the mobile communication module 950, the wireless communication module 960, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 900 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example, the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 950 may provide a solution for wireless communication applied to the electronic device 900, such as at least one of the following: a second generation (2nd generation, 2G) mobile communication solution, a third generation (3rd generation, 3G) mobile communication solution, a fourth generation (4th generation, 4G) mobile communication solution, or a fifth generation (5th generation, 5G) mobile communication solution. The mobile communication module 950 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 950 may receive electromagnetic waves by the antenna 1, perform processing such as filtering and amplifying on the received electromagnetic waves, and then transmit them to the modem processor for demodulation. The mobile communication module 950 may also amplify the signal modulated by the modem processor, and the amplified signal is converted into electromagnetic waves by the antenna 1 and radiated. In some embodiments, at least some of the functional modules of the mobile communication module 950 may be disposed in the processor 910. In some embodiments, at least some of the functional modules of the mobile communication module 950 may be disposed in the same device as at least some of the modules of the processor 910.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to speaker 970A, receiver 970B, etc.), or displays images or video through display 994. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communications module 950 or other functional modules, independent of the processor 910.
The wireless communication module 960 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 900. The wireless communication module 960 may be one or more devices that integrate at least one communication processing module. The wireless communication module 960 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 910. The wireless communication module 960 may also receive a signal to be transmitted from the processor 910, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 of electronic device 900 is coupled to mobile communication module 950 and antenna 2 of electronic device 900 is coupled to wireless communication module 960 so that electronic device 900 may communicate with networks and other electronic devices via wireless communication techniques. The wireless communication technology may include at least one of the following communication technologies: global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, IR technologies. The GNSS may include at least one of the following positioning techniques: global satellite positioning system (global positioning system, GPS), global navigation satellite system (global navigation satellite system, GLONASS), beidou satellite navigation system (beidou navigation satellite system, BDS), quasi zenith satellite system (quasi-zenith satellite system, QZSS), satellite based augmentation system (satellite based augmentation systems, SBAS).
The electronic device 900 implements display functionality via a GPU, a display 994, and an application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display 994 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 910 may include one or more GPUs that execute program instructions to generate or change display information.
The display 994 is used to display images, videos, and the like. The display screen 994 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini-LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light-emitting diode, QLED), or the like. In some embodiments, the electronic device 900 may include 1 or N displays 994, N being a positive integer greater than 1.
The electronic device 900 may implement shooting functions through an ISP, a camera 993, a video codec, a GPU, a display 994, an application processor, and the like.
The ISP is used to process the data fed back by the camera 993. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, an ISP may be provided in the camera 993.
The camera 993 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, the electronic device 900 may include 1 or N cameras 993, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 900 is selecting a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 900 may support one or more video codecs. Thus, the electronic device 900 may play or record video in a variety of encoding formats, such as: dynamic picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent cognition of the electronic device 900 may be implemented by the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 920 may be used to connect an external memory card, such as a Secure Digital (SD) card, to expand the storage capability of the electronic device 900. The external memory card communicates with the processor 910 through the external memory interface 920 to implement a data storage function. For example, files such as music and video are stored in the external memory card.
The internal memory 921 may be used to store computer-executable program code including instructions. The processor 910 executes various functional applications of the electronic device 900 and data processing by executing instructions stored in the internal memory 921. The internal memory 921 may include a stored program area and a stored data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 900 (e.g., audio data, phonebook, etc.), and so forth. In addition, the internal memory 921 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
Electronic device 900 may implement audio functionality through audio module 970, speaker 970A, receiver 970B, microphone 970C, headphone interface 970D, and application processors, among others. Such as music playing, recording, etc.
The audio module 970 is used to convert digital audio information to an analog audio signal output and also to convert an analog audio input to a digital audio signal. The audio module 970 may also be used to encode and decode audio signals. In some embodiments, the audio module 970 may be disposed in the processor 910 or some functional modules of the audio module 970 may be disposed in the processor 910.
Speaker 970A, also known as a "horn," is configured to convert audio electrical signals into sound signals. The electronic device 900 may listen to music, or to hands-free conversations, through the speaker 970A.
A receiver 970B, also known as a "earpiece," is used to convert an audio electrical signal into an acoustic signal. When electronic device 900 is answering a telephone call or voice message, voice may be received by placing receiver 970B in close proximity to the human ear.
Microphone 970C, also known as a "mike," is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can speak with the mouth close to the microphone 970C, inputting a sound signal to the microphone 970C. The electronic device 900 may be provided with at least one microphone 970C. In other embodiments, the electronic device 900 may be provided with two microphones 970C, which can perform noise reduction in addition to collecting sound signals. In other embodiments, the electronic device 900 may also be provided with three, four, or more microphones 970C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, etc.
The earphone interface 970D is used to connect a wired earphone. The earphone interface 970D may be the USB interface 930, or may be a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface or a cellular telecommunications industry association of the USA (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 980A is configured to sense a pressure signal and convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 980A may be disposed on the display 994. The pressure sensor 980A is of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a capacitive pressure sensor comprising at least two parallel plates with conductive material. When a force is applied to the pressure sensor 980A, the capacitance between the electrodes changes. The electronic device 900 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display 994, the electronic device 900 detects the intensity of the touch operation from the pressure sensor 980A. The electronic device 900 may also calculate the location of the touch based on the detection signal of the pressure sensor 980A. In some embodiments, touch operations that act on the same touch location, but at different touch operation strengths, may correspond to different operation instructions. For example, when a touch operation with a touch operation intensity smaller than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. And executing an instruction for newly creating the short message when the touch operation with the touch operation intensity being greater than or equal to the first pressure threshold acts on the short message application icon.
The gyroscope sensor 980B may be used to determine the motion posture of the electronic device 900. In some embodiments, the angular velocities of the electronic device 900 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 980B. The gyro sensor 980B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 980B detects the shake angle of the electronic device 900, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 900 through reverse motion, thereby realizing anti-shake. The gyro sensor 980B may also be used in navigation and somatosensory game scenarios.
The air pressure sensor 980C is for measuring air pressure. In some embodiments, the electronic device 900 calculates altitude from barometric pressure values measured by the barometric pressure sensor 980C, aiding in positioning and navigation.
The magnetic sensor 980D includes a Hall sensor. The electronic device 900 may detect the opening and closing of a flip holster using the magnetic sensor 980D. In some embodiments, when the electronic device 900 is a flip phone, the electronic device 900 may detect the opening and closing of the flip according to the magnetic sensor 980D, and further set features such as automatic unlocking upon flip opening according to the detected open or closed state of the holster or of the flip cover.
The acceleration sensor 980E can detect the magnitude of the acceleration of the electronic device 900 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the electronic device 900 is stationary. It can also be used to recognize the posture of the electronic device, and is applied in applications such as landscape/portrait screen switching and pedometers.
The distance sensor 980F is used to measure distance. The electronic device 900 may measure distance by infrared or laser. In some embodiments, the electronic device 900 may range using the distance sensor 980F to achieve quick focus.
The proximity light sensor 980G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 900 emits infrared light outward through the light emitting diode. The electronic device 900 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it may be determined that an object is in the vicinity of the electronic device 900. When insufficient reflected light is detected, the electronic device 900 may determine that there is no object in the vicinity of the electronic device 900. The electronic device 900 may detect that the user holds the electronic device 900 in close proximity to the ear using the proximity sensor 980G, so as to automatically extinguish the screen for power saving purposes. The proximity light sensor 980G can also be used in holster mode, pocket mode to automatically unlock and lock the screen.
The ambient light sensor 980L is for sensing ambient light level. The electronic device 900 may adaptively adjust the brightness of the display 994 based on the perceived ambient light level. The ambient light sensor 980L may also be used to automatically adjust white balance when taking a photograph. Ambient light sensor 980L can also cooperate with proximity light sensor 980G to detect whether electronic device 900 is in a pocket to prevent false touches.
The fingerprint sensor 980H is for capturing a fingerprint. The electronic device 900 may utilize the collected fingerprint feature to unlock the fingerprint, access the application lock, photograph the fingerprint, answer the incoming call, etc.
The temperature sensor 980J is for detecting temperature. In some embodiments, the electronic device 900 utilizes the temperature detected by the temperature sensor 980J to execute a temperature processing strategy. For example, when the temperature reported by temperature sensor 980J exceeds a threshold, electronic device 900 performs a reduction in performance of a processor located near temperature sensor 980J in order to reduce power consumption to implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 900 heats the battery 942 to avoid abnormal shutdown of the electronic device 900 due to low temperatures. In other embodiments, when the temperature is below a further threshold, the electronic device 900 performs boosting of the output voltage of the battery 942 to avoid abnormal shutdown caused by low temperatures.
Touch sensor 980K, also referred to as a "touch panel". The touch sensor 980K may be disposed on the display 994, and the touch sensor 980K and the display 994 form a touch screen, which is also referred to as a "touch screen". The touch sensor 980K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 994. In other embodiments, the touch sensor 980K may be disposed on a surface of the electronic device 900 other than where the display 994 is located.
The bone conduction sensor 980M may acquire a vibration signal. In some embodiments, the bone conduction sensor 980M may acquire a vibration signal of a bone block vibrating with the human voice. The bone conduction sensor 980M may also contact the human pulse and receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 980M may also be provided in an earphone, combined into a bone conduction earphone. The audio module 970 may parse out a voice signal based on the vibration signal, acquired by the bone conduction sensor 980M, of the bone block vibrating with the voice, so as to realize a voice function. The application processor may parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 980M, so as to realize a heart rate detection function.
The keys 990 include a power-on key, a volume key, etc. The keys 990 may be mechanical keys. Or may be a touch key. The electronic device 900 may receive key inputs, generate key signal inputs related to user settings and function controls of the electronic device 900.
The motor 991 may generate a vibratory alert. The motor 991 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 991 may also correspond to different vibration feedback effects by touch operations applied to different areas of the display screen 994. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 992 may be an indicator light, which may be used to indicate a state of charge, a change in charge, an indication message, a missed call, a notification, or the like.
The SIM card interface 995 is used to connect a SIM card. A SIM card may be inserted into the SIM card interface 995 or removed from the SIM card interface 995 to achieve contact with and separation from the electronic device 900. The electronic device 900 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 995 may support Nano SIM cards, Micro SIM cards, and the like. The same SIM card interface 995 may be used to insert multiple cards simultaneously. The types of the multiple cards may be the same or different. The SIM card interface 995 may also be compatible with different types of SIM cards. The SIM card interface 995 may also be compatible with an external memory card. The electronic device 900 interacts with the network through the SIM card to implement functions such as calling and data communication. In some embodiments, the electronic device 900 employs an eSIM, namely an embedded SIM card. The eSIM card can be embedded in the electronic device 900 and cannot be separated from the electronic device 900.
The software system of the electronic device 900 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
Fig. 10 is a schematic hardware structure of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device 2000 includes: at least one processor 2001 (only one shown in fig. 10), a memory 2002, and a computer program 2003 stored in the memory 2002 and executable on the at least one processor 2001, the processor 2001 implementing steps in any of the methods described above when the computer program 2003 is executed.
It will be appreciated by those skilled in the art that fig. 10 is merely an example of an electronic device and is not meant to be limiting, and that in practice an electronic device may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 2001 may be a central processing unit (central processing unit, CPU), another general purpose processor, a digital signal processor (digital signal processor, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 2002 may be an internal storage unit of the electronic device 2000 in some embodiments, such as a hard disk or memory of the electronic device 2000. The memory 2002 may also be an external storage device of the electronic device 2000 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like, which are provided on the electronic device 2000. Memory 2002 may also optionally include both internal storage units and external storage devices of electronic device 2000. The memory 2002 is used to store an operating system, application programs, boot loader programs, data, and other programs, such as program code for the computer programs. The memory 2002 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The embodiment of the application also provides electronic equipment, which comprises: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing steps of any of the methods described above when the computer program is executed.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps that may implement the various method embodiments described above.
The present application provides a computer program product comprising a computer program for performing the steps of the method embodiments described above when the computer program is executed by a processor.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing device/electronic apparatus, recording medium, computer memory, read-only memory (ROM), random access memory (random access memory, RAM), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other manners. For example, the apparatus/device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for deploying a neural network model, comprising:
determining target deployment parameters of a neural network model by using a first optimization algorithm, so that the reasoning time length when the neural network model reasoning the image to be processed according to the target deployment parameters meets preset requirements, wherein the target deployment parameters comprise target layer fusion parameters and target feature graph segmentation parameters;
deploying the neural network model on the electronic device according to the target deployment parameters;
determining target deployment parameters of a neural network model by using a first optimization algorithm, so that the reasoning time length when the neural network model reasoning the image to be processed according to the target deployment parameters meets preset requirements, wherein the method comprises the following steps:
determining deployment parameters of the ith optimization by utilizing the first optimization algorithm, wherein i is an integer greater than or equal to zero;
according to the ith optimized deployment parameter, counting the ith reasoning time length when the neural network model reasoning the training image according to the ith optimized deployment parameter;
when the ith reasoning time length meets the preset requirement, determining the ith optimized deployment parameter as the target deployment parameter; or alternatively, the process may be performed,
When the ith reasoning time length does not meet the preset requirement, repeatedly executing the steps after adding 1 to the i to obtain the ith reasoning time length, and outputting the target deployment parameter until the ith reasoning time length meets the preset requirement;
the first optimization algorithm is a bayesian optimization algorithm,
when i=0, the determining, with the first optimization algorithm, the deployment parameter for the ith optimization includes:
randomly initializing N observation points to obtain an initialized observation data set; the initialized observation data set comprises initialized deployment parameters, the initialized deployment parameters are substituted into initialization reasoning time length obtained by black box function calculation, and N is a positive integer; the initialized deployment parameters are the deployment parameters optimized for the 0 th time;
when i >0, said determining, with said first optimization algorithm, the deployment parameters of the ith optimization, comprising:
estimating the distribution of the black box function based on the data set after the i-1 th optimization by using a proxy model;
and determining the deployment parameters of the (N+i)th sampling point through the acquisition function, wherein the deployment parameters of the (N+i)th sampling point are the ith optimized deployment parameters.
2. The method of claim 1, wherein deploying the neural network model on an electronic device according to the target deployment parameters comprises:
Dividing a neural network layer of the neural network model into a plurality of target groups according to the target layer fusion parameters, and fusing the neural network layers in the target groups;
and determining the number of feature graphs input to each target group and the size of each feature graph according to the target feature graph segmentation parameters.
3. The method according to claim 1, wherein the method further comprises:
and updating the observation data set by using the deployment parameters of the (N+i) th sampling point and the reasoning time length corresponding to the (N+i) th sampling point.
4. A method according to any one of claims 1 to 3, wherein the proxy model is a gaussian process model, a gaussian mixture model, a probabilistic random forest model or a tree structure estimation model.
5. A method according to any one of claims 1 to 3, wherein the acquisition function is a desired delta function, a probability delta function, a lower confidence boundary function or an upper confidence boundary function.
6. A neural network model deployment apparatus, comprising:
the optimizing unit is used for determining target deployment parameters of the neural network model by utilizing a first optimizing algorithm, so that the reasoning duration of the neural network model when the image to be processed is reasoning according to the target deployment parameters meets preset requirements, and the target deployment parameters comprise target layer fusion parameters and target feature graph segmentation parameters;
The deployment unit is used for deploying the neural network model on the electronic equipment according to the target deployment parameters;
the optimizing unit is specifically configured to:
determining deployment parameters of the ith optimization by utilizing the first optimization algorithm, wherein i is an integer greater than or equal to zero;
according to the ith optimized deployment parameter, counting the ith reasoning time length when the neural network model reasoning the training image according to the ith optimized deployment parameter;
when the ith reasoning time length meets the preset requirement, determining the ith optimized deployment parameter as the target deployment parameter; or alternatively, the process may be performed,
when the ith reasoning time length does not meet the preset requirement, repeatedly executing the steps after adding 1 to the i to obtain the ith reasoning time length, and outputting the target deployment parameter until the ith reasoning time length meets the preset requirement;
the first optimization algorithm is a bayesian optimization algorithm,
when i=0, the optimization unit is specifically configured to:
randomly initializing N observation points to obtain an initialized observation data set; the initialized observation data set comprises initialized deployment parameters, the initialized deployment parameters are substituted into initialization reasoning time length obtained by black box function calculation, and N is a positive integer; the initialized deployment parameters are the deployment parameters optimized for the 0 th time;
When i >0, the optimization unit is specifically configured to:
estimating the distribution of the black box function based on the data set after the i-1 th optimization by using a proxy model;
and determining the deployment parameters of the (N+i)th sampling point through the acquisition function, wherein the deployment parameters of the (N+i)th sampling point are the ith optimized deployment parameters.
7. The apparatus of claim 6, wherein the proxy model is a gaussian process model, a gaussian mixture model, a probabilistic random forest model, or a tree structure estimation model.
8. The apparatus of claim 6, wherein the acquisition function is a desired delta function, a probability delta function, a lower confidence boundary function, or an upper confidence boundary function.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN202211563787.3A 2022-12-07 2022-12-07 Neural network model deployment method and device Active CN115600653B (en)

GR01 Patent grant