CN113361575A - Model training method and device and electronic equipment

Model training method and device and electronic equipment

Info

Publication number
CN113361575A
Authority
CN
China
Prior art keywords
model
training
parameters
model parameters
updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110593535.4A
Other languages
Chinese (zh)
Other versions
CN113361575B (en)
Inventor
李建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110593535.4A priority Critical patent/CN113361575B/en
Publication of CN113361575A publication Critical patent/CN113361575A/en
Application granted granted Critical
Publication of CN113361575B publication Critical patent/CN113361575B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a model training method and device and an electronic device, relates to the technical field of artificial intelligence, and in particular to the technical fields of computer vision and deep learning. The specific implementation scheme is as follows: obtain a training sample; train a model with the training sample and update the model parameters of the model; and, in each round of training, normalize the model parameters in response to the update of the model parameters being completed. Because the model parameters are normalized after every update, their values fluctuate only slightly during training and do not change drastically, which can accelerate model training, save training time, and improve training efficiency.

Description

Model training method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a model training method, an apparatus, an electronic device, a storage medium, and a computer program product.
Background
At present, with the development of artificial intelligence technology, models are widely applied in fields such as image recognition, speech recognition, and text translation, making people's lives more convenient. For example, a face recognition model can recognize a face in a picture so that the user's identity can be determined from the recognized face; this can be applied in scenarios such as smart cities. However, current model training methods require a long training time, and the training efficiency is low.
Disclosure of Invention
A model training method, apparatus, electronic device, storage medium, and computer program product are provided.
According to a first aspect, there is provided a model training method comprising: obtaining a training sample; training a model by using the training sample, and updating model parameters of the model; and in the process of each training, responding to the end of the updating of the model parameters, and carrying out normalization processing on the model parameters.
According to a second aspect, there is provided a model training apparatus comprising: the acquisition module is used for acquiring a training sample; the updating module is used for training a model by utilizing the training sample and updating the model parameters of the model; and the normalization module is used for responding to the end of updating the model parameters in the process of each training and carrying out normalization processing on the model parameters.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of the first aspect of the disclosure.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the model training method of the first aspect of the present disclosure.
According to a fifth aspect, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the model training method of the first aspect of the disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow diagram of a model training method according to a first embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of training a model with a training sample and updating the model parameters of the model in a model training method according to a second embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of a normalization process for model parameters in a model training method according to a third embodiment of the present disclosure;
FIG. 4 is a schematic flow chart illustrating a normalization process of model parameters in a model training method according to a fourth embodiment of the present disclosure;
FIG. 5 is a block diagram of a model training apparatus according to a first embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a model training method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
AI (Artificial Intelligence) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. At present, AI technology has the advantages of a high degree of automation, high accuracy, and low cost, and is widely applied.
Computer Vision is machine vision that uses cameras and computers instead of human eyes to identify, track, and measure targets, and further performs image processing so that the result is an image more suitable for human observation or for transmission to an instrument for detection. Computer vision is a comprehensive discipline that includes computer science and engineering, signal processing, physics, applied mathematics and statistics, neurophysiology, cognitive science, and the like.
Deep Learning is a new research direction in the field of Machine Learning. It learns the internal regularities and representation levels of sample data so that a machine can acquire analysis and learning abilities similar to those of a human and can recognize data such as text, images, and sounds; it is widely applied in speech and image recognition.
Fig. 1 is a schematic flow chart diagram of a model training method according to a first embodiment of the present disclosure.
As shown in fig. 1, a model training method according to a first embodiment of the present disclosure includes:
and S101, obtaining a training sample.
It should be noted that the execution subject of the model training method of the embodiment of the present disclosure may be a hardware device with data information processing capability and/or necessary software for driving the hardware device to work. Alternatively, the execution body may include a workstation, a server, a computer, a user terminal and other intelligent devices. The user terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent household appliance, a vehicle-mounted terminal, and the like.
In the embodiment of the disclosure, a training sample can be obtained; the type of the training sample is not particularly limited here. It will be appreciated that different models may correspond to different training samples. For example, if the model is an image recognition model, the training sample may include an image and an image recognition reference result; if the model is a text translation model, the training sample may include a text and a text translation reference result; and if the model is a speech recognition model, the training sample may include speech and a speech recognition reference result.
In one embodiment, the training samples may be obtained by manual calibration.
S102, training the model by using the training sample, and updating the model parameters of the model.
In the embodiment of the disclosure, the model can be trained by using the training sample, and the model parameters of the model are updated.
In one embodiment, training the model using the training samples and updating the model parameters of the model may include inputting the training samples into the model, obtaining gradient information of a loss function of the model, and updating the model parameters of the model using the gradient information.
For example, suppose the model is an image recognition model and the training sample includes an image and an image recognition reference result. The image in the training sample can be input into the image recognition model, which recognizes the image and outputs an image recognition prediction result. Gradient information of the model's loss function can then be obtained from the image recognition prediction result and the image recognition reference result corresponding to the image, and the model parameters of the model can be updated using the gradient information.
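For illustration only (not part of the original disclosure), the following minimal sketch shows such an update step in PyTorch-style Python; the toy architecture, optimizer, and tensor shapes are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

# Toy image recognition model; the architecture is an illustrative assumption.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(8, 1, 28, 28)        # stand-in for images in the training sample
labels = torch.randint(0, 10, (8,))       # stand-in for image recognition reference results

predictions = model(images)               # image recognition prediction result
loss = criterion(predictions, labels)     # compare prediction with reference result
optimizer.zero_grad()
loss.backward()                           # gradient information of the loss function
optimizer.step()                          # update the model parameters with the gradient
```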
S103, in the process of each training, in response to the end of updating of the model parameters, performing normalization processing on the model parameters.
In the embodiment of the present disclosure, in the process of each training, the model parameters may be normalized (Normalization) in response to the end of the update of the model parameters. That is to say, after updating of the model parameters of each training is finished, the model parameters are normalized, and the model is continuously trained based on the normalized model parameters, so that the fluctuation degree of the values of the model parameters in the training process is small, extreme change is avoided, the model training can be accelerated, and the training time of the model is saved.
In the embodiment of the present disclosure, the normalization processing manner is not particularly limited. For example, a preset algorithm may be used for normalization, where the preset algorithm can be set according to the actual situation. For example, x' = (x - min) / (max - min) may be used, where x is a model parameter to be normalized, x' is the normalized model parameter, min is the minimum value among the model parameters to be normalized, and max is the maximum value among the model parameters to be normalized; this formula adjusts the model parameters into the range 0 to 1.
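A minimal sketch of this kind of preset normalization algorithm is shown below (an illustration, not the patent's implementation; the guard for an all-equal parameter tensor is an added assumption):

```python
import numpy as np

def min_max_normalize(params):
    """Normalize parameters into [0, 1] via x' = (x - min) / (max - min)."""
    p_min, p_max = params.min(), params.max()
    if p_max == p_min:                       # assumed guard: all parameters equal
        return np.zeros_like(params)
    return (params - p_min) / (p_max - p_min)

weights = np.array([-2.0, 0.5, 3.0, 7.0])
print(min_max_normalize(weights))            # [0.  0.2777...  0.5555...  1.]
```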
In an embodiment, the process of each training further includes, in response to the end of the normalization of the model parameters, judging whether a condition for ending training is satisfied; if so, model training ends, and if not, the model continues to be trained with the training sample. The condition for ending training may be set according to the actual situation, for example, but not limited to, the model accuracy reaching a preset threshold or the number of model iterations reaching a preset count threshold, which is not limited here.
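To tie the steps together, the following sketch shows only the control flow of one possible training loop (update, normalize after the update, then check an end-of-training condition); the toy linear model, learning rate, and fixed iteration budget are assumptions, not the disclosed method:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(size=(32, 4))                   # obtained training samples
y_train = rng.normal(size=32)                        # reference results
params = rng.normal(size=4)                          # model parameters
learning_rate, max_iterations = 0.05, 100            # assumed end-of-training condition

for _ in range(max_iterations):
    grad = 2 * x_train.T @ (x_train @ params - y_train) / len(x_train)
    params = params - learning_rate * grad           # update the model parameters
    p_min, p_max = params.min(), params.max()
    if p_max > p_min:                                 # normalize after every update
        params = (params - p_min) / (p_max - p_min)
```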
In summary, according to the model training method of the embodiment of the present disclosure, the model is trained with the training sample, the model parameters of the model are updated, and the model parameters are normalized in response to the update being completed in each round of training. As a result, the values of the model parameters fluctuate only slightly during training and do not change drastically, which can accelerate model training, save training time, and improve training efficiency.
On the basis of any of the above embodiments, training the model with the training sample in step S102 includes obtaining the actual size of the convolution kernel of the model, generating a candidate training size of the convolution kernel according to the actual size and the preset interval size, and training the model with the training sample, where the size of the convolution kernel of the model is any candidate training size.
In embodiments of the present disclosure, the actual size of the convolution kernel of the model may be obtained, where the actual size of the convolution kernel includes the actual height and the actual width of the convolution kernel. For example, the actual size of the convolution kernel for a layer of the model is 7 x 7, and the actual height and the actual width of the convolution kernel are both 7.
Further, a candidate training size for the convolution kernel may be generated based on the actual size and the preset interval size. The preset interval size comprises a preset height interval size and a preset width interval size, and can be set according to actual conditions. It should be noted that the candidate training size of the convolution kernel includes the actual size of the convolution kernel. For example, the preset space size may be 2 × 2, the preset height space size and the preset width space size are both 2, and when the actual size is 7 × 7, the generated candidate training sizes may include 7 × 7, 5 × 5, and 3 × 3.
Further, the model may be trained using the training samples, where the size of the convolution kernel of the model is any candidate training size. In one embodiment, the model may be trained using the training samples with the candidate training sizes taken in descending order; for example, when the candidate training sizes include 7 × 7, 5 × 5, and 3 × 3, the model with convolution kernels of sizes 7 × 7, 5 × 5, and 3 × 3 may be trained in sequence using the training samples.
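As a small illustrative sketch (assuming, for the example, a lower bound of 3 × 3 on the kernel size), candidate training sizes could be generated as follows:

```python
def candidate_kernel_sizes(actual_size, interval=2, minimum=3):
    """Generate candidate training sizes, e.g. 7 -> [7, 5, 3], including the actual size."""
    sizes = []
    size = actual_size
    while size >= minimum:
        sizes.append(size)
        size -= interval
    return sizes

print(candidate_kernel_sizes(7))    # [7, 5, 3]; training can then proceed in this descending order
```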
Therefore, the method can generate candidate training sizes according to the actual sizes of the convolution kernels of the models, and train the models with the sizes of the convolution kernels being any candidate training sizes by using the training samples.
On the basis of any of the above embodiments, the model may comprise a plurality of sub-models, at least two of which have model parameters that coincide with each other. Because coinciding model parameters are shared between at least two sub-models, the number of training passes of the model is reduced, and the training efficiency of the model is improved.
In one embodiment, the topology of the submodel is the same as the topology of the model, and the model parameters of the submodel are part of the model parameters of the model.
In one embodiment, the model may be a hyper-network model constructed by a NAS (Neural Architecture Search) algorithm, the hyper-network model having a Search space comprising a plurality of sub-models with coinciding model parameters between at least two of the sub-models.
Further, as shown in fig. 2, training the model with the training samples in step S102, and updating the model parameters of the model may include:
s201, training the sub-model by using the training sample, and updating the model parameters of the sub-model.
In the embodiment of the disclosure, the sub-model can be trained by using the training sample, and the model parameters of the sub-model are updated.
In one embodiment, the target sub-model of the current training may be obtained from a preset sub-model training queue, and the training sample is used to train the target sub-model so as to update the model parameters of the target sub-model. The sub-model training queue can be set according to the actual situation and is not particularly limited here. The number of target sub-models may be one or more, that is, the number of sub-models participating in each training process may be one or more, which is not limited here.
S202, updating model parameters of other submodels of the model based on the updated model parameters of the submodels.
In the embodiment of the disclosure, the model parameters of the rest submodels of the model can be updated based on the updated model parameters of the submodels, so that the model parameters of the submodels can be updated synchronously.
In an embodiment, updating the model parameters of the remaining submodels of the model based on the updated model parameters of the submodel may include obtaining an associated submodel corresponding to the submodel, where there is a first model parameter that coincides between the submodel and the associated submodel, and updating the first model parameter of the associated submodel based on the updated first model parameter of the submodel. It is understood that the number of associated submodels corresponding to a submodel may be one or more, and is not limited herein.
For example, suppose sub-model M1 and associated sub-model M2 have a coinciding first model parameter a. If the value of the first model parameter a of sub-model M1 after updating is 10 and the value of the first model parameter a of the associated sub-model M2 is 15, then the value of the first model parameter a of the associated sub-model M2 may be updated from 15 to 10.
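The following sketch illustrates that synchronization with plain dictionaries standing in for sub-model parameters; the parameter names and values simply mirror the example above and are not an actual implementation:

```python
submodel_m1 = {"a": 10.0, "b": 1.0}      # sub-model M1, parameter "a" just updated to 10
submodel_m2 = {"a": 15.0, "c": 2.0}      # associated sub-model M2 shares parameter "a"

def sync_shared_parameters(source, targets):
    """Copy each parameter of `source` into every target that holds the same parameter."""
    for target in targets:
        for name, value in source.items():
            if name in target:
                target[name] = value

sync_shared_parameters(submodel_m1, [submodel_m2])
print(submodel_m2["a"])                   # 10.0, updated from 15.0
```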
Therefore, the method can train the submodel by utilizing the training sample, update the model parameters of the submodel, and update the model parameters of the other submodels of the model based on the updated model parameters of the submodel, thereby being beneficial to reducing the training times of the model and improving the training efficiency of the model.
On the basis of any of the above embodiments, as shown in fig. 3, the normalizing the model parameters in step S103 may include:
s301, responding to the fact that the number of model layers participating in training at this time is smaller than the actual number of model layers, and determining the integral quantization parameter corresponding to the model based on the model parameters of all the model layers.
It is understood that the number of model layers involved in training each time may be smaller than the actual number of model layers, for example, the actual number of model layers is 5, and the number of model layers involved in training a certain time may be 4. Then, the model is trained by using the training sample, the model parameters of each layer in the model participating in the training at this time are updated, and the model parameters of each layer in the model not participating in the training at this time are not updated.
In the embodiment of the disclosure, in response to the number of model layers participating in the training at this time being smaller than the actual number of model layers, the overall quantization parameter corresponding to the model is determined based on the model parameters of all the model layers. Therefore, when the number of model layers participating in the training at this time is smaller than the actual number of model layers, the model parameters of all the model layers can be comprehensively considered to determine the overall quantization parameter of the model, which helps reduce the fluctuation of the values of the model parameters during training.
In the embodiments of the present disclosure, the type of the overall quantization parameter is not particularly limited. It is understood that different normalization processing manners may correspond to different overall quantization parameters. For example, continuing with the normalization x' = (x - min) / (max - min), the overall quantization parameters may be the maximum value and the minimum value among the model parameters of all layers of the model.
S302, based on the overall quantization parameter, the model parameters of each layer in the model are normalized.
In an embodiment of the present disclosure, the model parameters of each layer in the model may be normalized based on the overall quantization parameter. That is, the model parameters of each layer in the model participating in the training at this time are normalized, and the model parameters of each layer in the model not participating in the training at this time are also normalized.
For example, continuing with the normalization x' = (x - min) / (max - min), the overall quantization parameters are max and min, where max is the maximum value of the model parameters of all layers of the model and min is the minimum value of the model parameters of all layers of the model; the model parameters of each layer in the model can then be normalized based on the overall quantization parameters max and min.
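A sketch of this overall normalization, assuming for illustration that the layer parameters are held in a dictionary of arrays and that max > min:

```python
import numpy as np

def normalize_with_overall_min_max(layer_params):
    """Normalize every layer with one min/max taken over the parameters of ALL layers."""
    all_values = np.concatenate([p.ravel() for p in layer_params.values()])
    g_min, g_max = all_values.min(), all_values.max()
    return {name: (p - g_min) / (g_max - g_min) for name, p in layer_params.items()}

layers = {
    "layer1": np.array([-1.0, 2.0]),   # participated in this round of training
    "layer2": np.array([0.0, 4.0]),    # did not participate, but is still normalized
}
print(normalize_with_overall_min_max(layers))   # layer1 -> [0.0, 0.6], layer2 -> [0.2, 1.0]
```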
Therefore, when the number of model layers participating in the current training is smaller than the actual number of model layers, the method determines the overall quantization parameter corresponding to the model based on the model parameters of all the model layers and normalizes the model parameters of each layer in the model based on that overall quantization parameter. In this way, the overall quantization parameter of the model is determined by comprehensively considering the model parameters of all the model layers, which helps reduce the fluctuation of the values of the model parameters during training.
On the basis of any of the above embodiments, as shown in fig. 4, the normalizing the model parameters in step S103 may include:
s401, responding to the fact that the number of model layers participating in training at this time is equal to the actual number of model layers, and determining unit quantization parameters corresponding to any layer based on model parameters of any layer of the model.
It is understood that the number of model layers involved in training each time may be equal to the number of actual model layers, for example, the number of actual model layers is 5, and the number of model layers involved in training a certain time may be 5. The model is trained using the training samples at this time, and the model parameters of each layer in the model are updated.
In the embodiment of the disclosure, in response to that the number of model layers participating in the training at this time is equal to the actual number of model layers, the unit quantization parameter corresponding to any layer is determined based on the model parameter of any layer of the model. Therefore, when the number of model layers participating in training at this time is equal to the actual number of model layers, the unit quantization parameter corresponding to any layer can be determined by using the model parameter of any layer of the model, different layers of the model can correspond to different unit quantization parameters, and the flexibility is high.
In the embodiments of the present disclosure, the type of the unit quantization parameter is not particularly limited. It is understood that different normalization processing manners may correspond to different unit quantization parameters. For example, continuing with the normalization x' = (x - min) / (max - min), the unit quantization parameters corresponding to any layer may be the maximum value and the minimum value of the model parameters of that layer of the model.
S402, based on the unit quantization parameters, the model parameters of any layer are normalized.
In the embodiment of the present disclosure, the model parameter of any layer may be normalized based on the unit quantization parameter corresponding to any layer. That is, the model parameters of each layer in the model are normalized.
For example, continuing with the normalization x' = (x - min) / (max - min), the unit quantization parameters corresponding to any layer are max and min, where max is the maximum value of the model parameters of that layer of the model and min is the minimum value of the model parameters of that layer; the model parameters of that layer can then be normalized based on the unit quantization parameters max and min.
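For contrast, a sketch of the per-layer (unit) normalization under the same assumed dictionary layout; each layer now uses its own min and max:

```python
import numpy as np

def normalize_per_layer(layer_params):
    """Normalize each layer with its own unit quantization parameters (layer min and max)."""
    normalized = {}
    for name, params in layer_params.items():
        l_min, l_max = params.min(), params.max()
        normalized[name] = (params - l_min) / (l_max - l_min)   # assumes l_max > l_min
    return normalized

layers = {"layer1": np.array([-1.0, 2.0]), "layer2": np.array([0.0, 4.0])}
print(normalize_per_layer(layers))   # layer1 -> [0.0, 1.0], layer2 -> [0.0, 1.0]
```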
Therefore, when the number of model layers participating in training at this time is equal to the actual number of model layers, the method can determine the unit quantization parameter corresponding to any layer based on the model parameter of any layer of the model, and perform normalization processing on the model parameter of any layer based on the unit quantization parameter corresponding to any layer, and different layers of the model can correspond to different unit quantization parameters, so that the flexibility is high.
On the basis of any embodiment, when the model is a super-network model constructed by the NAS algorithm, after training of the super-network model is finished, the sub-models meeting the preset conditions can be screened from the search space of the super-network model. The preset conditions can be set according to actual conditions, for example, but not limited to, the accuracy of the submodel is greater than a preset accuracy threshold, the speed is greater than a preset speed threshold, and the occupied space is within a preset space range. Therefore, the method only needs to train one hyper-network model, and can screen out the sub-models meeting the preset conditions from the hyper-network model based on the preset conditions, thereby being beneficial to saving the model training time.
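As an illustration of such screening (the metrics, thresholds, and sub-model records below are made-up assumptions, not values from the disclosure):

```python
candidate_submodels = [
    {"name": "sub_a", "accuracy": 0.92, "latency_ms": 8.0, "size_mb": 12.0},
    {"name": "sub_b", "accuracy": 0.88, "latency_ms": 4.0, "size_mb": 6.0},
]

def meets_preset_conditions(sub, min_accuracy=0.90, max_latency_ms=10.0, max_size_mb=20.0):
    """Check a sub-model against example preset conditions on accuracy, speed, and size."""
    return (sub["accuracy"] >= min_accuracy
            and sub["latency_ms"] <= max_latency_ms
            and sub["size_mb"] <= max_size_mb)

selected = [s for s in candidate_submodels if meets_preset_conditions(s)]
print([s["name"] for s in selected])    # ['sub_a']
```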
Fig. 5 is a block diagram of a model training apparatus according to a first embodiment of the present disclosure.
As shown in fig. 5, the model training apparatus 500 of the embodiment of the present disclosure includes: an obtaining module 501, an updating module 502 and a normalizing module 503.
An obtaining module 501, configured to obtain a training sample;
an updating module 502, configured to train a model using the training sample, and update model parameters of the model;
and a normalization module 503, configured to perform normalization processing on the model parameters in response to the end of updating the model parameters in each training process.
In an embodiment of the present disclosure, the normalization module 503 is specifically configured to: responding to the fact that the number of model layers participating in training at this time is smaller than the actual number of model layers, and determining an overall quantization parameter corresponding to the model based on model parameters of all the model layers; and based on the overall quantization parameter, carrying out normalization processing on the model parameters of each layer in the model.
In an embodiment of the present disclosure, the normalization module 503 is specifically configured to: responding to the fact that the number of model layers participating in training at this time is equal to the actual number of model layers, and determining unit quantization parameters corresponding to any layer based on model parameters of any layer of the model; and normalizing the model parameters of any layer based on the unit quantization parameters.
In an embodiment of the present disclosure, the update module 502 is specifically configured to: acquiring the actual size of a convolution kernel of the model; generating a candidate training size of the convolution kernel according to the actual size and a preset interval size; and training the model by using the training samples, wherein the size of a convolution kernel of the model is any one of the candidate training sizes.
In one embodiment of the present disclosure, the model includes a plurality of sub-models, at least two of the sub-models having model parameters that coincide therebetween.
In an embodiment of the present disclosure, the update module 502 is specifically configured to: training the submodel by using the training sample, and updating the model parameters of the submodel; updating the model parameters of the other submodels of the model based on the updated model parameters of the submodels.
In summary, the model training apparatus of the embodiment of the present disclosure trains the model with the training sample, updates the model parameters of the model, and normalizes the model parameters in response to the update being completed in each round of training. As a result, the values of the model parameters fluctuate only slightly during training and do not change drastically, which can accelerate model training, save training time, and improve training efficiency.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 performs the various methods and processes described above, such as the model training methods described in fig. 1-4. For example, in some embodiments, the model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the model training method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the model training method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to an embodiment of the present disclosure, there is also provided a computer program product including a computer program, wherein the computer program, when executed by a processor, implements the model training method according to the above-mentioned embodiment of the present disclosure.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A model training method, comprising:
obtaining a training sample;
training a model by using the training sample, and updating model parameters of the model;
and in the process of each training, responding to the end of the updating of the model parameters, and carrying out normalization processing on the model parameters.
2. The method of claim 1, wherein the normalizing the model parameters comprises:
responding to the fact that the number of model layers participating in training at this time is smaller than the actual number of model layers, and determining an overall quantization parameter corresponding to the model based on model parameters of all the model layers;
and based on the overall quantization parameter, carrying out normalization processing on the model parameters of each layer in the model.
3. The method of claim 1, wherein the normalizing the model parameters comprises:
responding to the fact that the number of model layers participating in training at this time is equal to the actual number of model layers, and determining unit quantization parameters corresponding to any layer based on model parameters of any layer of the model;
and normalizing the model parameters of any layer based on the unit quantization parameters.
4. The method of claim 1, wherein the training the model with the training samples comprises:
acquiring the actual size of a convolution kernel of the model;
generating a candidate training size of the convolution kernel according to the actual size and a preset interval size;
and training the model by using the training samples, wherein the size of a convolution kernel of the model is any one of the candidate training sizes.
5. The method of any of claims 1-4, wherein the model comprises a plurality of sub-models, at least two of the sub-models having model parameters that coincide therebetween.
6. The method of claim 5, wherein the training the model with the training samples, updating model parameters of the model, comprises:
training the submodel by using the training sample, and updating the model parameters of the submodel;
updating the model parameters of the other submodels of the model based on the updated model parameters of the submodels.
7. A model training apparatus comprising:
the acquisition module is used for acquiring a training sample;
the updating module is used for training a model by utilizing the training sample and updating the model parameters of the model;
and the normalization module is used for responding to the end of updating the model parameters in the process of each training and carrying out normalization processing on the model parameters.
8. The apparatus according to claim 7, wherein the normalization module is specifically configured to:
responding to the fact that the number of model layers participating in training at this time is smaller than the actual number of model layers, and determining an overall quantization parameter corresponding to the model based on model parameters of all the model layers;
and based on the overall quantization parameter, carrying out normalization processing on the model parameters of each layer in the model.
9. The apparatus according to claim 7, wherein the normalization module is specifically configured to:
responding to the fact that the number of model layers participating in training at this time is equal to the actual number of model layers, and determining unit quantization parameters corresponding to any layer based on model parameters of any layer of the model;
and normalizing the model parameters of any layer based on the unit quantization parameters.
10. The apparatus of claim 7, wherein the update module is specifically configured to:
acquiring the actual size of a convolution kernel of the model;
generating a candidate training size of the convolution kernel according to the actual size and a preset interval size;
and training the model by using the training samples, wherein the size of a convolution kernel of the model is any one of the candidate training sizes.
11. The apparatus of any of claims 7-10, wherein the model comprises a plurality of sub-models, at least two of the sub-models having model parameters that coincide therebetween.
12. The apparatus of claim 11, wherein the update module is specifically configured to:
training the submodel by using the training sample, and updating the model parameters of the submodel;
updating the model parameters of the other submodels of the model based on the updated model parameters of the submodels.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the model training method according to any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements a model training method according to any one of claims 1-6.
CN202110593535.4A 2021-05-28 2021-05-28 Model training method and device and electronic equipment Active CN113361575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110593535.4A CN113361575B (en) 2021-05-28 2021-05-28 Model training method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110593535.4A CN113361575B (en) 2021-05-28 2021-05-28 Model training method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113361575A true CN113361575A (en) 2021-09-07
CN113361575B CN113361575B (en) 2023-10-20

Family

ID=77528083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110593535.4A Active CN113361575B (en) 2021-05-28 2021-05-28 Model training method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113361575B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037074A (en) * 2021-11-09 2022-02-11 北京百度网讯科技有限公司 Model pruning method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784490A (en) * 2019-02-02 2019-05-21 北京地平线机器人技术研发有限公司 Training method, device and the electronic equipment of neural network
CN109871940A (en) * 2019-01-31 2019-06-11 清华大学 A kind of multilayer training algorithm of impulsive neural networks
WO2019242222A1 (en) * 2018-06-21 2019-12-26 北京字节跳动网络技术有限公司 Method and device for use in generating information
CN110851566A (en) * 2019-11-04 2020-02-28 沈阳雅译网络技术有限公司 Improved differentiable network structure searching method
CN110956080A (en) * 2019-10-14 2020-04-03 北京海益同展信息科技有限公司 Image processing method and device, electronic equipment and storage medium
CN111753917A (en) * 2020-06-29 2020-10-09 北京小米松果电子有限公司 Data processing method, device and storage medium
CN111753862A (en) * 2019-03-29 2020-10-09 北京地平线机器人技术研发有限公司 Method and device for training neural network model and image recognition method
US20210064372A1 (en) * 2019-09-03 2021-03-04 International Business Machines Corporation Mixed precision capable hardware for tuning a machine learning model
CN112598082A (en) * 2021-01-07 2021-04-02 华中科技大学 Method and system for predicting generalized error of image identification model based on non-check set

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019242222A1 (en) * 2018-06-21 2019-12-26 北京字节跳动网络技术有限公司 Method and device for use in generating information
CN109871940A (en) * 2019-01-31 2019-06-11 清华大学 A kind of multilayer training algorithm of impulsive neural networks
CN109784490A (en) * 2019-02-02 2019-05-21 北京地平线机器人技术研发有限公司 Training method, device and the electronic equipment of neural network
US20200250542A1 (en) * 2019-02-02 2020-08-06 Beijing Horizon Robotics Technology Research And Development Co., Ltd. Neural Network Training Method, Neural Network Training Apparatus and Electronic Device
CN111753862A (en) * 2019-03-29 2020-10-09 北京地平线机器人技术研发有限公司 Method and device for training neural network model and image recognition method
US20210064372A1 (en) * 2019-09-03 2021-03-04 International Business Machines Corporation Mixed precision capable hardware for tuning a machine learning model
CN110956080A (en) * 2019-10-14 2020-04-03 北京海益同展信息科技有限公司 Image processing method and device, electronic equipment and storage medium
CN110851566A (en) * 2019-11-04 2020-02-28 沈阳雅译网络技术有限公司 Improved differentiable network structure searching method
CN111753917A (en) * 2020-06-29 2020-10-09 北京小米松果电子有限公司 Data processing method, device and storage medium
CN112598082A (en) * 2021-01-07 2021-04-02 华中科技大学 Method and system for predicting generalized error of image identification model based on non-check set

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SALIMANS, T. et al.: "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks", arXiv
袁群勇: "Research on Training Optimization Methods for Deep Neural Networks" (深度神经网络的训练优化方法研究), China Doctoral Dissertations Full-text Database

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037074A (en) * 2021-11-09 2022-02-11 北京百度网讯科技有限公司 Model pruning method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113361575B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN113657289B (en) Training method and device of threshold estimation model and electronic equipment
CN113361572B (en) Training method and device for image processing model, electronic equipment and storage medium
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN113177451A (en) Training method and device of image processing model, electronic equipment and storage medium
CN112528995B (en) Method for training target detection model, target detection method and device
EP4191478A1 (en) Method and apparatus for compressing neural network model
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
CN115456167A (en) Lightweight model training method, image processing device and electronic equipment
CN113361575B (en) Model training method and device and electronic equipment
CN112949818A (en) Model distillation method, device, equipment and storage medium
CN114415997B (en) Display parameter setting method and device, electronic equipment and storage medium
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN115292467A (en) Information processing and model training method, apparatus, device, medium, and program product
CN114330576A (en) Model processing method and device, and image recognition method and device
CN114119972A (en) Model acquisition and object processing method and device, electronic equipment and storage medium
CN115840867A (en) Generation method and device of mathematical problem solving model, electronic equipment and storage medium
CN113408632A (en) Method and device for improving image classification accuracy, electronic equipment and storage medium
CN112749707A (en) Method, apparatus, and medium for object segmentation using neural networks
CN113344213A (en) Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN113313049A (en) Method, device, equipment, storage medium and computer program product for determining hyper-parameters
CN113361574A (en) Training method and device of data processing model, electronic equipment and storage medium
CN115312042A (en) Method, apparatus, device and storage medium for processing audio
CN113205131A (en) Image data processing method and device, road side equipment and cloud control platform
CN113362218B (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant