CN111242298A - Training method and device for random network, storage medium and processor - Google Patents

Training method and device for random network, storage medium and processor

Info

Publication number
CN111242298A
CN111242298A
Authority
CN
China
Prior art keywords
network
network model
training
random
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911425437.9A
Other languages
Chinese (zh)
Inventor
Hao Jiakai
Zhao Guanghuai
Gao Peng
Hao Yi
Niu Haiyang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Smart Feeler Technology Co Ltd
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Beijing Borui Xianglun Technology Development Co Ltd
Original Assignee
Beijing Smart Feeler Technology Co Ltd
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Beijing Borui Xianglun Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Smart Feeler Technology Co Ltd, State Grid Corp of China SGCC, State Grid Beijing Electric Power Co Ltd, and Beijing Borui Xianglun Technology Development Co Ltd
Priority to CN201911425437.9A
Publication of CN111242298A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method and device for a random network, a storage medium, and a processor. The method includes: constructing a random network, wherein the random network comprises complex neurons; training the random network based on machine learning to obtain a first network model; fine-tuning the random network based on the first network model and training to obtain a second network model; and comparing the prediction accuracies of the first network model and the second network model, and replacing the first network model with the second network model when the prediction accuracy of the second network model is greater than that of the first network model. The invention solves the technical problem that the prior art cannot train a network model capable of effectively handling complex data types.

Description

Training method and device for random network, storage medium and processor
Technical Field
The present invention relates to the field of machine learning, and in particular, to a method and an apparatus for training a random network, a storage medium, and a processor.
Background
In recent years, artificial intelligence (AI) technology has advanced rapidly. Machines have become increasingly intelligent, able to reason and judge much as humans do, so they can serve people better and fundamentally change daily life. The neural network, an AI technology originating in the 1970s, gradually grew beyond the military projects of its early period, surpassed other AI technologies, and has raised the capability for complex reasoning, prediction, and decision-making on big data to a new level, in some fields matching or even exceeding humans. In recent years, the AI-driven AlphaGo algorithm defeated humanity's strongest Go (weiqi) players, demonstrating to the world the impressive intelligence that AI brings to machines.
Neural network technology employs complex multilayer network models containing enormous parameter sets, often running to tens of millions of parameters. The technique requires training the model on big data to acquire its intelligent characteristics. This training process is computationally very expensive and requires a professional AI engineer to spend a long time optimizing the network structure, adjusting the training parameters, and continuously experimenting and improving. Consequently, neural network automatic training technology (AutoML) has emerged in recent years; it can automatically preprocess big data (feature engineering), adjust the network structure, tune the network hyper-parameters, and finally train a high-performance network automatically. This greatly lowers the barrier to using the technology, allowing people without AI expertise to train high-quality network models.
However, while AutoML has achieved some success in processing big data of a single structure, such as image recognition, it is not applicable to all types of networks or all types of big data. For picture-detection problems AutoML performs well, but for problems with complex data types it is often ineffective and cannot automatically train a satisfactory network model. For example, consider predicting the purchase delay of an electricity-purchasing system, where the big data consists of multi-module data-transmission logs, weather data, and complaint data from the system. On such complex data combinations, AutoML does not perform well.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a training method and device for a random network, a storage medium, and a processor, to at least solve the technical problem that the prior art cannot train a network model capable of effectively handling complex data types.
According to one aspect of the embodiments of the invention, there is provided a training method for a random network, including: constructing a random network, wherein the random network comprises complex neurons; training the random network based on machine learning to obtain a first network model; fine-tuning the random network based on the first network model and training to obtain a second network model; and comparing the prediction accuracies of the first network model and the second network model, and replacing the first network model with the second network model when the prediction accuracy of the second network model is greater than that of the first network model.
Optionally, constructing the random network comprises: initializing the random network.
Optionally, training the random network based on machine learning to obtain the first network model includes: obtaining a predetermined amount of training data, wherein the training data includes different types of network parameters and corresponding prediction results; and deriving the first network model through machine-learning training using the training data.
Optionally, fine-tuning the random network based on the first network model and training to obtain the second network model includes: selecting at least one complex neuron, wherein the at least one complex neuron is configured to increase the complexity of the random network; modifying the network structure of the random network based on the at least one complex neuron; and training according to the modified network structure of the random network to obtain the second network model.
Optionally, comparing the prediction accuracies of the first network model and the second network model, and replacing the first network model with the second network model when the prediction accuracy of the second network model is greater than that of the first network model, further includes: calculating a first prediction accuracy of the first network model and a second prediction accuracy of the second network model from their respective prediction results; determining the difference between the first prediction accuracy and the second prediction accuracy; and judging whether the difference is within a threshold range: if it is, stopping training the random network; if it is not, continuing to train the random network.
According to another aspect of the embodiments of the invention, there is also provided a training apparatus for a random network, including: a construction module for constructing a random network, wherein the random network comprises complex neurons; a first training module for training the random network based on machine learning to obtain a first network model; a second training module for fine-tuning the random network based on the first network model and training to obtain a second network model; and a replacement module for comparing the prediction accuracies of the first network model and the second network model, and replacing the first network model with the second network model when the prediction accuracy of the second network model is greater than that of the first network model.
Optionally, the building module comprises: and the initialization unit is used for initializing the random network.
Optionally, the first training module includes: an obtaining unit for obtaining a predetermined amount of training data, wherein the training data includes different types of network parameters and corresponding prediction results; and a first training unit for obtaining the first network model through machine-learning training using the training data.
Optionally, the second training module includes: a selecting unit for selecting at least one complex neuron, wherein the at least one complex neuron is configured to increase the complexity of the random network; a modification unit for modifying the network structure of the random network based on the at least one complex neuron; and a second training unit for training according to the modified network structure of the random network to obtain the second network model.
Optionally, the replacement module further includes: a calculation unit for calculating a first prediction accuracy of the first network model and a second prediction accuracy of the second network model from their respective prediction results; a determining unit for determining the difference between the first prediction accuracy and the second prediction accuracy; and a judging unit for judging whether the difference is within a threshold range, stopping training of the random network if it is, and continuing to train the random network if it is not.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the random network training method described in any one of the above.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes the method for training a random network according to any one of the above.
In the embodiments of the invention, a random network comprising complex neurons is constructed; the random network is trained based on machine learning to obtain a first network model; the random network is fine-tuned based on the first network model and trained to obtain a second network model; and the prediction accuracies of the two models are compared, the second network model replacing the first when its prediction accuracy is greater. By automatically training and continuously adjusting the random network, the method adapts to more types of big-data analysis, which saves labor and reduces the expertise required for training, thereby solving the technical problem that the prior art cannot train a network model capable of effectively handling complex data types.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method of training a random network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the internal structure of a complex neuron, according to an embodiment of the invention;
FIG. 3 is a flow chart of a method of training a random network in accordance with an alternative embodiment of the present invention;
fig. 4 is a schematic diagram of a training apparatus for a random network according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a method for training a random network is provided. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
Fig. 1 is a flowchart of a training method of a random network according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
Step S102, constructing a random network, wherein the random network comprises complex neurons;
the random network is a simple network generated randomly, and the random network becomes more and more complex with the further training until the prediction effect is expected.
Step S104, training a random network based on machine learning to obtain a first network model;
step S106, fine tuning is carried out on the random network based on the first network model, and a second network model is obtained after training;
and S108, comparing the prediction accuracy of the first network model and the second network model, and replacing the first network model with the second network model under the condition that the prediction accuracy of the second network model is greater than that of the first network model.
The method can be applied to predicting electricity-purchase delays and is also suitable for other big-data analysis problems with complex components.
Through the above steps, a random network comprising complex neurons is constructed; the random network is trained based on machine learning to obtain a first network model; the random network is fine-tuned based on the first network model and trained to obtain a second network model; and the prediction accuracies of the two models are compared, the second network model replacing the first when its prediction accuracy is greater. By automatically training and continuously adjusting the random network, the method adapts to more types of big-data analysis, which saves labor and reduces the expertise required for training, thereby solving the technical problem that the prior art cannot train a network model capable of effectively handling complex data types.
Optionally, constructing the random network comprises: the random network is initialized.
Initializing the random network keeps its initial complexity low; the network complexity is then increased continuously during subsequent training, yielding a better training result.
Optionally, training the random network based on machine learning to obtain the first network model includes: acquiring a predetermined amount of training data, wherein the training data includes different types of network parameters and corresponding prediction results; and deriving the first network model through machine-learning training using the training data.
Training the random network on a large amount of data gives the resulting first network model better recognition capability, so its predictions are more accurate. For example, when a network parameter to be evaluated is input into the first network model, the model can accurately output the corresponding prediction result.
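Purely as an illustration (the patent specifies no framework, loss function, or optimizer), the following PyTorch sketch shows what this training step might look like; the name train_model and the assumption that loader yields (network-parameter, prediction-result) pairs are ours, not the patent's:

```python
import torch

def train_model(model, loader, epochs: int = 10, lr: float = 1e-3):
    """Train a network on (network parameters, prediction result) pairs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model  # serves as the "first network model"
```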
Optionally, fine-tuning the random network based on the first network model and training to obtain the second network model includes: selecting at least one complex neuron, wherein the at least one complex neuron is configured to increase the complexity of the random network; modifying the network structure of the random network based on the at least one complex neuron; and training according to the modified network structure of the random network to obtain the second network model.
A more flexible architecture is used in which a neuron is replaced by a more complex small neural network, called a complex neuron. Fig. 2 is a schematic diagram of the internal structure of a complex neuron according to an embodiment of the invention. As shown in Fig. 2, each complex neuron can flexibly adopt a structure of any complexity, from a conventional simple neuron to a much more complex neural network; which structure is adopted is determined automatically by the training process.
The interior of each complex neuron may be a simple conventional structure or a small neural network. For information at the same layer, the network automatically selects a series of complex neurons of different complexities to process it. Such a network has greater flexibility and can adjust its complexity automatically during training, whereas a conventional network cannot: its complexity must be tuned manually by an AI engineer through constant experimentation based on experience.
This more flexible network structure can therefore process big data with complex components and adapt to more types of big-data analysis problems. Because the network complexity is adjusted automatically during training, labor is saved and less specialized knowledge is required.
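The patent does not give a concrete implementation of a complex neuron. The following PyTorch sketch is one hypothetical way to realize a complex neuron, and a layer that mixes complex neurons of different internal complexities; all class names and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ComplexNeuron(nn.Module):
    """A node that is either a conventional neuron or a small sub-network."""

    def __init__(self, in_features: int, hidden: int = 0):
        super().__init__()
        if hidden == 0:
            # Conventional simple neuron: weighted sum plus activation.
            self.body = nn.Sequential(nn.Linear(in_features, 1), nn.Sigmoid())
        else:
            # More complex internal structure: a small neural network.
            self.body = nn.Sequential(
                nn.Linear(in_features, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Sigmoid(),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

class MixedComplexityLayer(nn.Module):
    """One layer whose nodes have different internal complexities."""

    def __init__(self, in_features: int, hidden_sizes: list[int]):
        super().__init__()
        self.neurons = nn.ModuleList(
            ComplexNeuron(in_features, h) for h in hidden_sizes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate each complex neuron's scalar output into one vector.
        return torch.cat([n(x) for n in self.neurons], dim=-1)
```

For instance, MixedComplexityLayer(8, [0, 0, 4]) would mix two plain neurons with one small sub-network in the same layer, in the spirit of Fig. 2.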
Optionally, comparing the prediction accuracies of the first network model and the second network model, and replacing the first network model with the second network model when the prediction accuracy of the second network model is greater than that of the first network model, further includes: calculating a first prediction accuracy of the first network model and a second prediction accuracy of the second network model from their respective prediction results; determining the difference between the first prediction accuracy and the second prediction accuracy; and judging whether the difference is within a threshold range: if it is, stopping training the random network; if it is not, continuing to train the random network.
In this way, it can be judged whether the predictive capability of the current network has reached the expected training effect, so that the trained network model predicts accurately, further improving its prediction and recognition capability.
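A minimal sketch of this compare-and-stop logic follows, assuming prediction accuracy is measured on held-out data and interpreting "within the threshold range" as a small accuracy difference (our reading, not a detail stated in the patent):

```python
def compare_models(first_acc: float, second_acc: float,
                   threshold: float = 1e-3):
    """Return (replace_first_with_second, stop_training)."""
    replace = second_acc > first_acc                 # keep the better model
    stop = abs(second_acc - first_acc) <= threshold  # gains have levelled off
    return replace, stop
```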
An alternative embodiment of the present application is described below.
Fig. 3 is a flowchart of a method for training a random network according to an alternative embodiment of the present invention, as shown in fig. 3, the method comprising the steps of:
(1) Initialize the random network: randomly generate a simple network; training then makes the network progressively more complex until the expected prediction performance is reached.
(2) Training: train the network with big data to obtain the best network parameters and prediction capability achievable under the current network configuration.
(3) Fine-tuning: randomly fine-tune the network by randomly selecting a complex neuron and randomly modifying that neuron's internal network structure. This increases the network complexity, though only by a small margin each time.
(4) Performance improvement: compare the prediction accuracy of the network before and after fine-tuning to judge whether the fine-tuned network has improved.
(5) Expected effect reached: judge whether the predictive capability of the current network has reached the expected training effect; if it has, stop training.
It should be noted that the above method introduces the concept of a complex neuron into the network, which increases the flexibility of both the network's complexity and its structure. A mechanism for randomly adjusting the network complexity is introduced into the automatic training process, so the optimal network complexity can be found automatically.
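Putting steps (1) to (5) together, and reusing the hypothetical train_model from the earlier sketch, the loop of Fig. 3 might be approximated as follows; evaluate() (returning prediction accuracy) and fine_tune() (randomly deepening one complex neuron) are assumed helpers, not functions defined by the patent:

```python
import copy

def auto_train(network, loader, evaluate, fine_tune,
               threshold: float = 1e-3, max_rounds: int = 100):
    train_model(network, loader)                    # (2) initial training
    best, best_acc = network, evaluate(network)
    for _ in range(max_rounds):
        candidate = fine_tune(copy.deepcopy(best))  # (3) random fine-tuning
        train_model(candidate, loader)
        acc = evaluate(candidate)                   # (4) compare accuracies
        gain = acc - best_acc
        if gain > 0:
            best, best_acc = candidate, acc         # keep the better model
        if abs(gain) <= threshold:                  # (5) expected effect reached
            break
    return best
```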
Example 2
According to another aspect of the embodiments of the present invention, there is also provided a training apparatus for a random network. Fig. 4 is a schematic diagram of the training apparatus according to an embodiment of the invention. As shown in Fig. 4, the apparatus includes: a construction module 42, a first training module 44, a second training module 46, and a replacement module 48. The apparatus is described in detail below.
A construction module 42 for constructing a random network;
a first training module 44, connected to the construction module 42, for training the random network based on machine learning to obtain a first network model;
a second training module 46, connected to the first training module 44, configured to perform fine tuning on the random network based on the first network model, and obtain a second network model after training;
and a replacement module 48, connected to the second training module 46, for comparing the prediction accuracies of the first network model and the second network model and replacing the first network model with the second network model when the prediction accuracy of the second network model is greater than that of the first network model.
The network model obtained by the apparatus can be applied to predicting electricity-purchase delays and is also suitable for other big-data analysis problems with complex components.
By automatically training and continuously adjusting the random network, the apparatus adapts to more types of big-data analysis, which saves labor and reduces the expertise required for training, thereby solving the technical problem that the prior art cannot train a network model capable of effectively handling complex data types.
It should be noted here that the construction module 42, the first training module 44, the second training module 46, and the replacement module 48 correspond to steps S102 to S108 in Example 1; the modules implement the same examples and application scenarios as the corresponding steps but are not limited to the disclosure of Example 1. These modules, as part of an apparatus, may be implemented in a computer system, such as one executing a set of computer-executable instructions.
Optionally, the construction module includes an initialization unit for initializing the random network.
Optionally, the first training module includes: an acquisition unit for acquiring a predetermined amount of training data, wherein the training data includes different types of network parameters and corresponding prediction results; and a first training unit for obtaining a first network model through machine-learning training using the training data.
Optionally, the second training module includes: a selecting unit for selecting at least one complex neuron, wherein the at least one complex neuron is configured to increase the complexity of the random network; a modification unit for modifying the network structure of the random network based on the at least one complex neuron; and a second training unit for training according to the modified network structure of the random network to obtain a second network model.
Optionally, the replacement module further includes: a calculation unit for calculating a first prediction accuracy of the first network model and a second prediction accuracy of the second network model from their respective prediction results; a determining unit for determining the difference between the first prediction accuracy and the second prediction accuracy; and a judging unit for judging whether the difference is within a threshold range, stopping training of the random network if it is, and continuing to train the random network if it is not.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus where the storage medium is located is controlled to execute the method for training the random network according to any one of the foregoing methods.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes a method for training a random network according to any one of the above methods.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that a person skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A method for training a random network, comprising:
constructing a random network, wherein the random network comprises complex neurons;
training the random network based on machine learning to obtain a first network model;
fine tuning the random network based on the first network model, and obtaining a second network model after training;
and comparing the prediction accuracy of the first network model and the second network model, and replacing the first network model with the second network model when the prediction accuracy of the second network model is greater than that of the first network model.
2. The method of claim 1, wherein constructing a random network comprises:
initializing the random network.
3. The method of claim 1, wherein training the random network based on machine learning to obtain a first network model comprises:
obtaining a predetermined amount of training data, wherein the training data comprises: different types of network parameters and corresponding prediction results;
deriving the first network model by machine learning training using the training data.
4. The method of claim 1, wherein fine-tuning the random network based on the first network model and training to obtain the second network model comprises:
selecting at least one complex neuron, wherein the at least one complex neuron is configured to increase the complexity of the random network;
modifying a network structure of the random network based on the at least one complex neuron;
and training according to the modified network structure of the random network to obtain the second network model.
5. The method of any of claims 1-4, wherein comparing the prediction accuracies of the first network model and the second network model, and replacing the first network model with the second network model when the prediction accuracy of the second network model is greater than that of the first network model, further comprises:
respectively calculating a first prediction accuracy of the first network model and a second prediction accuracy of the second network model according to the prediction results of the first network model and the second network model;
determining a difference between the first prediction accuracy and the second prediction accuracy;
judging whether the difference value is within a threshold range, and if the difference value is within the threshold range, stopping training the random network; and if the difference value is not in the threshold value range, continuing to train the random network.
6. An apparatus for training a random network, comprising:
a construction module for constructing a random network, wherein the random network comprises complex neurons;
the first training module is used for training the random network based on machine learning to obtain a first network model;
the second training module is used for carrying out fine tuning on the random network based on the first network model and obtaining a second network model after training;
a substitution module for comparing the prediction accuracy of the first network model and the second network model, and substituting the first network model with the second network model if the prediction accuracy of the second network model is greater than the prediction accuracy of the first network model.
7. The apparatus of claim 6, wherein the building module comprises:
and the initialization unit is used for initializing the random network.
8. The apparatus of claim 6, wherein the first training module comprises:
an obtaining unit configured to obtain a predetermined amount of training data, wherein the training data comprises: different types of network parameters and corresponding prediction results;
a first training unit, configured to obtain the first network model through machine learning training using the training data.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the training method of the random network according to any one of claims 1 to 5.
10. A processor, characterized in that the processor is configured to run a program, wherein the program is configured to perform the method for training a random network according to any one of claims 1 to 5 when running.
CN201911425437.9A 2019-12-31 2019-12-31 Training method and device for random network, storage medium and processor Pending CN111242298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911425437.9A CN111242298A (en) 2019-12-31 2019-12-31 Training method and device for random network, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911425437.9A CN111242298A (en) 2019-12-31 2019-12-31 Training method and device for random network, storage medium and processor

Publications (1)

Publication Number Publication Date
CN111242298A (en) 2020-06-05

Family

ID=70865322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911425437.9A Pending CN111242298A (en) 2019-12-31 2019-12-31 Training method and device for random network, storage medium and processor

Country Status (1)

Country Link
CN (1) CN111242298A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067183A (en) * 2021-11-24 2022-02-18 北京百度网讯科技有限公司 Neural network model training method, image processing method, device and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAO Jiakai et al.: "Automatic Training Method for Variable-Nested Deep Neural Networks", Proceedings of the 4th National "Smart Grid" Conference, pages 75-79 *


Similar Documents

Publication Publication Date Title
CN112181666B (en) Equipment assessment and federal learning importance aggregation method based on edge intelligence
CN111182637B (en) Wireless network resource allocation method based on generation countermeasure reinforcement learning
CN108898168A (en) The compression method and system of convolutional neural networks model for target detection
CN112069903B (en) Method and device for realizing face recognition end edge unloading calculation based on deep reinforcement learning
US20210133931A1 (en) Method and apparatus for editing image
CN108319974B (en) Data processing method, data processing device, storage medium and electronic device
CN109101913A (en) Pedestrian recognition methods and device again
CN110443784A (en) A kind of effective conspicuousness prediction model method
CN110401780A (en) A kind of method and device identifying fraudulent call
CN109711483A (en) A kind of power system operation mode clustering method based on Sparse Autoencoder
Gastaldo et al. Machine learning solutions for objective visual quality assessment
CN110297712A (en) A kind of ARIMA combination forecasting method towards block chain node load estimation
EP4390725A1 (en) Video retrieval method and apparatus, device, and storage medium
CN111242298A (en) Training method and device for random network, storage medium and processor
CN111291780B (en) Cross-domain network training and image recognition method
CN104636497A (en) Intelligent video data retrieval method
CN111160170B (en) Self-learning human behavior recognition and anomaly detection method
CN109597572B (en) Storage management method
CN113159109A (en) Wireless network flow prediction method based on data driving
CN110163049B (en) Face attribute prediction method, device and storage medium
CN110490057A (en) A kind of self-adaptive identification method and system based on face big data artificial intelligence cluster
CN110008880A (en) A kind of model compression method and device
Entezari et al. Class-dependent compression of deep neural networks
CN114298286A (en) Method for training lightweight convolutional neural network to obtain pre-training model
CN104636495B (en) A kind of content based video retrieval system method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination