CN112434717A - Model training method and device - Google Patents

Model training method and device

Info

Publication number
CN112434717A
Authority
CN
China
Prior art keywords
preset
model
parameters
model parameters
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910791960.7A
Other languages
Chinese (zh)
Other versions
CN112434717B (en)
Inventor
Zhao Fei (赵飞)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910791960.7A
Publication of CN112434717A
Application granted
Publication of CN112434717B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The embodiment of the invention provides a model training method and a model training device. The method comprises the steps of: first, obtaining multiple sets of model parameters of a preset model and a test data set, wherein each set of model parameters is obtained by training the preset model with a training data set under one preset scene; then, performing a generalization test, using the test data set, on the preset model using each set of model parameters, to obtain a generalization parameter corresponding to each set of model parameters; and further, calculating collaborative model parameters of the preset model under the multiple preset scenes according to the generalization parameters and the multiple sets of model parameters, and taking the preset model using the collaborative model parameters as the target model. Therefore, in the model training process, training data sets from multiple different scenes are taken as the basis, and the model parameters from those scenes are computed collaboratively, so the generalization performance of the obtained deep learning model can be improved and a model applicable to multiple scenes is obtained.

Description

Model training method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a model training method and device.
Background
In some scenarios, the electronic device needs to implement complex functions, such as target detection, target recognition, target segmentation, and behavior analysis, on the image through a deep learning model.
For example, if the electronic device needs to perform target detection on an image, first, sample images need to be acquired; then, the sample images are input into a preset deep learning model for training to obtain a target detection model; and then, after an image to be processed is acquired, the image to be processed may be processed by using the target detection model to obtain a target detection result.
However, a deep learning model obtained by the above method has poor generalization performance. For example, a target detection model trained on sample data acquired in scene A performs well on data to be processed that is acquired in scene A, but its target detection performance on data acquired in scene B is seriously degraded.
Therefore, a model training method applicable to multiple scenes is needed.
Disclosure of Invention
The embodiment of the invention aims to provide a model training method and a model training device so as to improve the generalization performance of an obtained deep learning model. The specific technical scheme is as follows:
the embodiment of the invention provides a model training method, which comprises the following steps:
acquiring a plurality of groups of model parameters and a test data set of a preset model, wherein each group of model parameters is obtained by training the preset model by using a training data set under a preset scene;
respectively carrying out generalization test on the preset model using each group of model parameters by using the test data set to obtain generalization parameters corresponding to each group of model parameters;
and calculating to obtain collaborative model parameters of the preset model under the plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters, and taking the preset model using the collaborative model parameters as a target model.
Optionally, the obtaining multiple sets of model parameters and test data sets of the preset model includes:
acquiring multiple groups of sample data under multiple preset scenes, wherein each group of sample data corresponds to each preset scene respectively;
determining a plurality of groups of training data sets and test data sets from the plurality of groups of sample data according to a preset grouping rule;
and aiming at each group of training data sets, training the group of training data sets by using a preset model to obtain model parameters of the preset model corresponding to the group of training data sets.
Optionally, the obtaining multiple sets of sample data in multiple preset scenarios includes:
acquiring multiple groups of sample data under multiple preset scenes and identifications of the multiple preset scenes;
the method further comprises the following steps:
acquiring an initial data set and identifications of all preset scenes;
judging whether a preset scene which does not acquire the sample data exists or not according to the identifiers of the plurality of preset scenes and the identifiers of all the preset scenes;
and if such a preset scene exists, selecting, for each preset scene for which sample data is not acquired, filling sample data for that preset scene from the initial data set.
Optionally, the determining, according to a preset grouping rule, multiple sets of training data sets and test data sets from the multiple sets of sample data includes:
dividing the multiple groups of sample data into multiple groups of training data sets and test data sets, wherein the test data sets comprise one or more groups; or, alternatively,
and selecting a first amount of data from the group of sample data as a training data set of the preset scene and a second amount of data from the group of sample data as a test data set of the preset scene aiming at each group of sample data under each preset scene.
Optionally, the calculating, according to the generalization parameters and the multiple sets of model parameters, cooperative model parameters of the preset model in the multiple preset scenes, and using the preset model using the cooperative model parameters as a target model includes:
normalizing the generalization parameters of each group of model parameters to obtain the normalization value of the generalization parameters of each group of model parameters, wherein the sum of the normalization values of the generalization parameters of each group of model parameters is 1;
and taking the normalization value as the weight of each group of model parameters, calculating the weighted average value of each group of model parameters to obtain the cooperative model parameters of the preset model under the plurality of preset scenes, and taking the preset model using the cooperative model parameters as a target model.
Optionally, the obtaining multiple sets of model parameters and test data sets of the preset model includes:
acquiring a plurality of groups of model parameters and test data sets of a preset model according to a preset period;
the step of respectively calculating the generalization parameters of each group of model parameters by using the test data set comprises the following steps:
aiming at each preset period, respectively calculating each group of model parameters obtained in the preset period and generalization parameters of the collaborative model parameters determined in the previous period by using the test data set obtained in the preset period;
the step of calculating and obtaining the collaborative model parameters of the preset model under the plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters comprises:
and for each preset period, calculating to obtain collaborative model parameters of the preset model in the preset period and under the plurality of preset scenes according to the generalization parameters obtained in the preset period, the plurality of groups of model parameters obtained in the preset period and the collaborative model parameters determined in the previous period.
The embodiment of the invention also provides a model training device, which comprises:
the system comprises an acquisition module, a test module and a processing module, wherein the acquisition module is used for acquiring a plurality of groups of model parameters and a test data set of a preset model, and each group of model parameters are obtained by training the preset model by utilizing a training data set under a preset scene;
the test module is used for respectively carrying out generalization test on the preset model using each group of model parameters by using the test data set to obtain generalization parameters corresponding to each group of model parameters;
and the calculation module is used for calculating and obtaining the collaborative model parameters of the preset model under the plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters, and taking the preset model using the collaborative model parameters as a target model.
Optionally, the obtaining module is specifically configured to:
acquiring multiple groups of sample data under multiple preset scenes, wherein each group of sample data corresponds to each preset scene respectively;
determining a plurality of groups of training data sets and test data sets from the plurality of groups of sample data according to a preset grouping rule;
and aiming at each group of training data sets, training the group of training data sets by using a preset model to obtain model parameters of the preset model corresponding to the group of training data sets.
Optionally, the obtaining module is specifically configured to:
acquiring multiple groups of sample data under multiple preset scenes and identifications of the multiple preset scenes;
the device further comprises:
acquiring an initial data set and identifications of all preset scenes;
judging whether a preset scene which does not acquire the sample data exists or not according to the identifiers of the plurality of preset scenes and the identifiers of all the preset scenes;
and if such a preset scene exists, selecting, for each preset scene for which sample data is not acquired, filling sample data for that preset scene from the initial data set.
Optionally, the obtaining module is specifically configured to:
dividing the multiple groups of sample data into multiple groups of training data sets and test data sets, wherein the test data sets comprise one or more groups; or, alternatively,
and selecting a first amount of data from the group of sample data as a training data set of the preset scene and a second amount of data from the group of sample data as a test data set of the preset scene aiming at each group of sample data under each preset scene.
Optionally, the calculation module is specifically configured to:
normalizing the generalization parameters of each group of model parameters to obtain the normalization value of the generalization parameters of each group of model parameters, wherein the sum of the normalization values of the generalization parameters of each group of model parameters is 1;
and taking the normalization value as the weight of each group of model parameters, calculating the weighted average value of each group of model parameters to obtain the cooperative model parameters of the preset model under the plurality of preset scenes, and taking the preset model using the cooperative model parameters as a target model.
Optionally, the obtaining module is specifically configured to obtain multiple sets of model parameters and test data sets of a preset model according to a preset period;
the test module is specifically configured to, for each preset period, respectively calculate each group of model parameters acquired in the preset period and generalization parameters of the collaborative model parameters determined in the previous period by using the test data set acquired in the preset period;
the calculation module is specifically configured to, for each preset period, calculate and obtain collaborative model parameters of the preset model in the preset period and under the multiple preset scenes according to the generalization parameters obtained in the preset period, the multiple sets of model parameters obtained in the preset period, and the collaborative model parameters determined in the previous period.
The embodiment of the invention also provides electronic equipment which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the model training methods when executing the program stored in the memory.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any of the above-mentioned model training methods.
Embodiments of the present invention also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any of the above-described model training methods.
The model training method and device provided by the embodiment of the invention can first obtain multiple sets of model parameters of a preset model and a test data set, wherein each set of model parameters is obtained by training the preset model with a training data set under a preset scene. Then, a generalization test is performed, using the test data set, on the preset model using each set of model parameters, to obtain the generalization parameters corresponding to each set of model parameters. Further, collaborative model parameters of the preset model under multiple preset scenes are obtained by calculation according to the generalization parameters and the multiple sets of model parameters, and the preset model using the collaborative model parameters is taken as the target model. Therefore, in the model training process, training data sets from multiple different scenes are taken as the basis, and the model parameters from those scenes are computed collaboratively, so the generalization performance of the obtained deep learning model can be improved and a model applicable to multiple scenes is obtained. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For example, a target detection model trained by sample data acquired in a scene a has a good target detection performance for data to be processed acquired in a scene a, and a target detection performance for data to be processed acquired in a scene B is seriously degraded.
In order to solve the above technical problem, the present invention provides a model training method, which can be applied to various electronic devices, such as a computer, a server, a webcam, and the like, and is not limited in this embodiment of the present invention.
The following generally describes a model training method provided in an embodiment of the present invention, where the model training method includes:
acquiring a plurality of groups of model parameters and a test data set of a preset model, wherein each group of model parameters is obtained by training the preset model by using a training data set under a preset scene;
respectively carrying out generalization test on the preset model using each group of model parameters by using the test data set to obtain generalization parameters corresponding to each group of model parameters;
and calculating to obtain cooperative model parameters of the preset model under a plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters, and taking the preset model using the cooperative model parameters as a target model.
As can be seen from the above, by applying the model training method provided by the embodiment of the present invention, in the model training process, the training data sets in a plurality of different scenes are taken as the basis, and the model parameters in the plurality of different scenes are cooperatively calculated, so that the generalization performance of the obtained deep learning model can be improved, and the model applicable to multiple scenes can be obtained.
The following describes the model training method provided by the embodiment of the present invention in detail by using specific embodiments.
As shown in fig. 1, a schematic flow chart of a model training method provided in an embodiment of the present invention includes the following steps:
s101: and acquiring a plurality of groups of model parameters and test data sets of the preset model.
Each group of model parameters is obtained by training a preset model by using a training data set under a preset scene. The preset model may be a model corresponding to a specific deep learning task, where the specific deep learning task may be target detection, tracking, recognition, segmentation, and the like, and each preset model may have a different network model structure, such as LSTM (Long Short-Term Memory network), ResNet (deep residual network), MobileNet, the Inception network, VGG (Visual Geometry Group network), and the like.
In one implementation, the sets of model parameters and test data sets of the predetermined model may be obtained by the electronic device (execution subject) from other devices.
For example, after the front-end device acquires a training data set for a certain scene, the front-end device may directly train the training data set to obtain a set of model parameters, and then the electronic device may obtain the model parameters from the front-end device. The front-end device may be an image capturing device, such as a camera, a monitoring probe, or the like, or an electronic device connected to the image capturing device, such as a mobile terminal connected to the image capturing device, and is not limited specifically.
Therefore, the bandwidth requirement for data summarization through network transmission can be reduced, and the model training has higher real-time performance.
Or, in another implementation manner, the multiple sets of model parameters and test data sets of the preset model may also be obtained by processing the acquired sample data by the electronic device (the execution subject). The sample data may be all data acquired by the image acquisition device, or may also be data selected according to a certain rule from the data acquired by the image acquisition device. The sample data may include any one or more types of data, such as RGB images, video, optical flow images, audio, text, and so forth.
For example, first, a plurality of sets of sample data under a plurality of preset scenes may be obtained, where each set of sample data corresponds to each preset scene. The data size of the sample data corresponding to each preset scene may be the same or different, for example, the scenes may be uniformly sampled, randomly sampled, sampled according to a certain weight ratio, and the like, and is not limited specifically.
Then, a plurality of sets of training data sets and test data sets can be determined from the plurality of sets of sample data according to a preset grouping rule. There are various ways to determine multiple sets of training data sets and test data sets from multiple sets of sample data according to a preset grouping rule.
For example, the multiple sets of sample data may be divided into multiple sets of training data sets and multiple sets of test data sets, where the test data set includes one or more sets. That is, for multiple sets of sample data under multiple preset scenarios, the sample data under some of the preset scenarios serve as training data sets, and the sample data under the other scenarios serve as test data sets.
Or, for each group of sample data in each preset scene, a first amount of data may be selected from the group of sample data to serve as a training data set of the preset scene, and a second amount of data may be selected from the group of sample data to serve as a test data set of the preset scene. That is, for multiple groups of sample data under multiple preset scenarios, a first amount of data in each group of sample data is a training data set, and a second amount of sample data is a testing data set. Wherein the sum of the first number and the second number is typically less than or equal to the total number of sample data per group.
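The per-scene split described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the function name, scene names, and counts are all invented for the example.

```python
import random

def split_scene_samples(samples, first_count, second_count, seed=0):
    """Split one scene's sample list into a training set (first amount)
    and a test set (second amount); their sum must not exceed the total."""
    assert first_count + second_count <= len(samples)
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[:first_count], shuffled[first_count:first_count + second_count]

# One sample group per preset scene; scene names and data are illustrative only.
scenes = {"scene_a": list(range(10)), "scene_b": list(range(10, 20))}
splits = {name: split_scene_samples(data, 7, 3) for name, data in scenes.items()}
```

Each scene thus contributes both a training set and a test set, matching the second grouping rule in the text.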
Furthermore, for each set of training data set, the set of training data set may be trained by using a preset model, so as to obtain model parameters of the preset model corresponding to the set of training data set.
In addition, in one implementation, when multiple groups of sample data under multiple preset scenes are acquired, the identifiers of the multiple preset scenes can be acquired synchronously, and according to the identifiers of the multiple preset scenes, it can be determined which preset scene each group of sample data corresponds to. Therefore, in the case that the set of preset scenes is fixed, it can be determined which preset scenes are missing sample data, that is, for which preset scenes no sample data has been acquired, and the missing data can be filled in.
For example, first, an initial data set and identifiers of all preset scenes may be obtained, then, according to the identifiers of the plurality of preset scenes and the identifiers of all preset scenes, it is determined whether there is a preset scene in which sample data is not obtained, and if so, for each preset scene in which sample data is not obtained, the sample data for filling the preset scene is selected from the initial data set.
All of the initial data may be used as filling sample data for a preset scene for which no sample data is acquired; or, in a case that the data size of the initial data set is large, a part of the data may be selected from the initial data set to be used as the filling sample data, which is not specifically limited.
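The scene-filling step might be sketched as below; this is an assumed, illustrative implementation (function and identifier names are invented), showing the comparison of scene identifiers followed by selection of filler data from the initial data set.

```python
def fill_missing_scenes(collected, all_scene_ids, initial_dataset, per_scene=None):
    """For every preset scene with no collected samples, draw filling
    sample data from the shared initial data set (all of it, or a slice)."""
    missing = [sid for sid in all_scene_ids if sid not in collected]
    filled = dict(collected)
    for sid in missing:
        # Use the whole initial data set, or just a part when it is large.
        filled[sid] = initial_dataset if per_scene is None else initial_dataset[:per_scene]
    return filled, missing

collected = {"scene_a": ["img1", "img2"]}  # scene_b acquired no sample data
filled, missing = fill_missing_scenes(
    collected, ["scene_a", "scene_b"], ["init1", "init2", "init3"], per_scene=2)
```

After the call, every preset scene has some sample data, so each scene can still contribute a set of model parameters.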
S102: and performing generalization test on the preset model using each group of model parameters by using the test data set to obtain the generalization parameters corresponding to each group of model parameters.
According to different preset models, different methods can be adopted to carry out generalization test on the preset model using each group of model parameters, so as to obtain the generalization parameters corresponding to each group of model parameters. The generalization parameters can reflect the adaptability of the preset model using each group of model parameters to the test data set, in other words, the better the generalization performance of the model, the more accurate the result obtained by processing the data to be processed in different scenes.
For example, if the predetermined model is a target classification model, the target classification accuracy of the target classification model using each set of model parameters may be calculated by using the target classification accuracy as a generalization parameter. The target classification accuracy of the target classification model refers to a ratio of the number of correctly classified samples to all the samples, and generally, the higher the target classification accuracy is, the better the classification effect of the target classification model is.
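As a small sketch of this generalization parameter (illustrative only; the patent does not prescribe code), classification accuracy over a test data set is simply the ratio of correct predictions:

```python
def classification_accuracy(predictions, labels):
    """Fraction of correctly classified test samples: the generalization
    parameter suggested for a target classification model."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy example: 3 of 4 test samples classified correctly.
acc = classification_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
```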
Or, if the preset model is a target detection model in the image processing field, the mean Average Precision (mAP) of target detection may be used as the generalization parameter, and the mAP of the target detection model using each set of model parameters is calculated. It can be understood that when target detection is performed on an image, the image may contain multiple targets of different classes; therefore, both the classification performance and the localization performance of the target detection model need to be evaluated. The mAP is the mean of the average precision values over each class of targets in the detected images; it is determined jointly by the classification performance and the localization performance of the target detection model, and therefore reflects the performance of the target detection model well.
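A minimal sketch of this metric, under a simplified average-precision definition (precision summed at each true positive and divided by the ground-truth count; real evaluators such as the PASCAL VOC or COCO protocols differ in details):

```python
def average_precision(is_tp, num_gt):
    """AP for one class: detections are ranked by confidence, and is_tp
    flags whether each detection matched a ground-truth box."""
    tp = 0
    precisions = []
    for rank, hit in enumerate(is_tp, start=1):
        if hit:
            tp += 1
            precisions.append(tp / rank)  # precision at this recall step
    return sum(precisions) / num_gt if num_gt else 0.0

def mean_average_precision(per_class):
    """mAP: mean of the per-class average precision values."""
    aps = [average_precision(flags, n_gt) for flags, n_gt in per_class]
    return sum(aps) / len(aps)

# Two classes: (ranked TP/FP flags, number of ground-truth targets).
m = mean_average_precision([([True, False, True], 2), ([True], 1)])
```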
In addition, in the case where the preset model is a target detection model in the image processing field, the generalization parameter may also be the IoU (Intersection over Union) of the target detection model. IoU is the ratio between the area of the intersection of two rectangular boxes and the area of their union. It can be understood that when target detection is performed on an image, a prediction box is generated to mark the position in the image of the target predicted by the target detection model; meanwhile, an original annotation box exists in the image, namely the actual position of the target in the image, and the target detection model can be evaluated by measuring the similarity between these two rectangular boxes. IoU is an index that evaluates the similarity between two rectangular boxes, so the ratio of the intersection to the union of the prediction box and the original annotation box can be computed as IoU, where the optimal case is complete overlap, i.e., a ratio of 1. That is, the closer the IoU of the target detection model is to 1, the better its performance.
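The intersection-over-union ratio just described can be computed directly for two axis-aligned boxes (this is the standard formula, shown here as an illustrative helper):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Complete overlap yields 1.0 (the optimal case in the text); disjoint boxes yield 0.0.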
It can be understood that different preset models have different characteristics, and corresponding generalization parameters may also be different, and in the embodiment of the present invention, optimally, a parameter that can most embody the performance of the preset model may be selected as the generalization parameter.
S103: and calculating to obtain cooperative model parameters of the preset model under a plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters, and taking the preset model using the cooperative model parameters as a target model.
For example, the generalization parameters of each set of model parameters may be normalized to obtain the normalized value of the generalization parameters of each set of model parameters, wherein the sum of the normalized values of the generalization parameters of each set of model parameters is 1. Then, the normalized value is used as the weight of each group of model parameters, the weighted average value of each group of model parameters is calculated, the cooperative model parameters of the preset model under a plurality of preset scenes are obtained, and the preset model using the cooperative model parameters is used as a target model.
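The normalize-then-weighted-average step can be sketched as follows. This treats each set of model parameters as a flat vector for illustration; real network parameters would be averaged tensor by tensor, and the function name is invented.

```python
import numpy as np

def collaborative_parameters(param_sets, generalization_scores):
    """Normalize the generalization scores so they sum to 1, then use
    them as weights in a weighted average of the per-scene parameter sets."""
    scores = np.asarray(generalization_scores, dtype=float)
    weights = scores / scores.sum()          # normalized values sum to 1
    params = np.stack([np.asarray(p, dtype=float) for p in param_sets])
    return weights, weights @ params         # weighted average per parameter

# Two scenes with generalization scores 0.9 and 0.6 (weights 0.6 and 0.4).
weights, collab = collaborative_parameters([[1.0, 2.0], [3.0, 4.0]], [0.9, 0.6])
```

Parameter sets from scenes where the model generalizes better thus contribute more to the collaborative parameters.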
In one implementation, multiple sets of model parameters and test data sets of the preset model may be obtained according to a preset period. Then, aiming at each preset period, each group of model parameters obtained in the preset period and generalization parameters of the collaborative model parameters determined in the previous period are respectively calculated by using the test data set obtained in the preset period. Furthermore, the collaborative model parameters of the preset model in the preset period and under a plurality of preset scenes can be calculated according to the generalization parameters obtained in the preset period, the plurality of sets of model parameters obtained in the preset period and the collaborative model parameters determined in the previous period.
The preset period may be very short, that is, the multiple sets of model parameters and test data sets of the preset model may be obtained in substantially real time; alternatively, the preset period may be any fixed period, for example, acquisition of new sets of model parameters and test data sets may be started every 10 minutes. The specific period is not limited here.
Alternatively, when the number of times of calculation of the preset model using the collaborative model parameters reaches a preset threshold, a new plurality of sets of model parameters and test data sets may be obtained, that is, when the preset model using the collaborative model parameters reaches a certain number of times, the preset model is adjusted once.
Or, in another implementation, the next period may be started, and new sets of model parameters and test data sets obtained, when the generalization parameters of the current period meet a preset condition. For example, if the preset model is a target detection model, a new period may be started when the mAP of the target detection model falls below a preset threshold.
That is, the collaborative model parameters can be updated according to the continually acquired sets of model parameters and test data sets of the preset model, so that collaborative model parameters with stronger generalization ability are obtained. In addition, this improves the quality of the training data and reduces the risk of model overfitting.
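A hedged sketch of the per-period update described above — scoring both the newly obtained parameter sets and the previous period's collaborative parameters on the new test data, then taking the score-weighted average of all of them — might look like this (the `evaluate` callback and all names are hypothetical assumptions, not the patent's notation):

```python
import numpy as np

def update_collaborative(prev_collab, new_param_sets, evaluate):
    """One preset period: score the new parameter sets and the
    previous collaborative parameters on this period's test data,
    then take the score-weighted average of all candidates."""
    candidates = [np.asarray(p, dtype=float) for p in new_param_sets]
    if prev_collab is not None:
        candidates.append(np.asarray(prev_collab, dtype=float))
    # evaluate() stands in for the generalization test on this
    # period's test data set and returns a non-negative score
    scores = np.array([evaluate(p) for p in candidates], dtype=float)
    weights = scores / scores.sum()
    return np.tensordot(weights, np.stack(candidates), axes=1)
```

Feeding the previous collaborative parameters back in as one more candidate is what lets each period refine, rather than discard, the result of the period before it.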
As can be seen from the above, by applying the model training method provided by the embodiment of the present invention, in the model training process, the training data sets in a plurality of different scenes are taken as the basis, and the model parameters in the plurality of different scenes are cooperatively calculated, so that the generalization performance of the obtained deep learning model can be improved, and the model applicable to multiple scenes can be obtained.
An embodiment of the present invention further provides a model training apparatus, as shown in fig. 2, which is a schematic structural diagram of the model training apparatus provided in the embodiment of the present invention. The apparatus includes:
an obtaining module 201, configured to obtain multiple sets of model parameters and a test data set of a preset model, where each set of model parameters is obtained by training the preset model with a training data set in a preset scene;
the test module 202 is configured to perform a generalization test on the preset model using each set of model parameters by using the test data set, respectively, to obtain a generalization parameter corresponding to each set of model parameters;
and a calculating module 203, configured to calculate, according to the generalization parameters and the multiple sets of model parameters, collaborative model parameters of the preset model in multiple preset scenes, and use the preset model using the collaborative model parameters as a target model.
In one implementation, the obtaining module 201 is specifically configured to:
acquiring multiple groups of sample data under multiple preset scenes, wherein each group of sample data corresponds to each preset scene respectively;
determining a plurality of groups of training data sets and test data sets from a plurality of groups of sample data according to a preset grouping rule;
and aiming at each group of training data sets, training the preset model by using the group of training data sets to obtain model parameters of the preset model corresponding to the group of training data sets.
In one implementation, the obtaining module 201 is specifically configured to:
acquiring a plurality of groups of sample data under a plurality of preset scenes and identifiers of the plurality of preset scenes;
the apparatus further includes:
acquiring an initial data set and identifications of all preset scenes;
judging whether a preset scene which does not acquire the sample data exists or not according to the identifiers of the plurality of preset scenes and the identifiers of all the preset scenes;
and if the preset scene exists, selecting, for each preset scene for which the sample data is not acquired, filling sample data of the preset scene from the initial data set.
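The missing-scene check performed by the obtaining module might be sketched as follows (illustrative only; the dictionary layout and the `k` cap on fill samples are assumptions not stated in the patent):

```python
def fill_missing_scenes(collected, all_scene_ids, initial_dataset, k=100):
    """collected: dict mapping scene id -> samples gathered for that scene.
    For any preset scene whose id appears in all_scene_ids but for which
    no samples were collected, draw up to k fill samples from the
    initial data set (assumed here to also be keyed by scene id)."""
    for scene_id in all_scene_ids:
        if not collected.get(scene_id):
            collected[scene_id] = initial_dataset.get(scene_id, [])[:k]
    return collected
```

Comparing the collected scene identifiers against the full identifier list is what makes every preset scene contribute at least some (possibly fill) training data.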
In one implementation, the obtaining module 201 is specifically configured to:
dividing a plurality of groups of sample data into a plurality of groups of training data sets and test data sets, wherein the test data sets comprise one or more groups; or,
and selecting a first amount of data from the group of sample data as a training data set of the preset scene and a second amount of data from the group of sample data as a test data set of the preset scene aiming at each group of sample data under each preset scene.
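The second grouping rule above — selecting a first amount of data as the training set and a second amount as the test set for each preset scene — might be sketched as follows (illustrative; the shuffle and fixed seed are assumptions added for a reproducible split):

```python
import random

def split_per_scene(scene_samples, n_train, n_test, seed=0):
    """scene_samples: dict mapping scene id -> list of samples.
    For each preset scene, pick n_train samples as that scene's
    training data set and n_test further samples as its test data set."""
    rng = random.Random(seed)
    train, test = {}, {}
    for scene_id, samples in scene_samples.items():
        shuffled = samples[:]          # copy so the input is untouched
        rng.shuffle(shuffled)
        train[scene_id] = shuffled[:n_train]
        test[scene_id] = shuffled[n_train:n_train + n_test]
    return train, test
```

Keeping the two slices disjoint per scene is what makes the later generalization test meaningful: no test sample was seen during that scene's training.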
In one implementation, the calculating module 203 is specifically configured to:
normalizing the generalization parameters of each group of model parameters to obtain the normalization value of the generalization parameters of each group of model parameters, wherein the sum of the normalization values of the generalization parameters of each group of model parameters is 1;
and taking the normalized value as the weight of each group of model parameters, calculating the weighted average value of each group of model parameters to obtain the cooperative model parameters of the preset model under a plurality of preset scenes, and taking the preset model using the cooperative model parameters as a target model.
In one implementation, the obtaining module 201 is specifically configured to obtain multiple sets of model parameters and test data sets of a preset model according to a preset period;
the test module 202 is specifically configured to, for each preset period, respectively calculate each group of model parameters obtained in the preset period and generalization parameters of the collaborative model parameters determined in the previous period by using the test data set obtained in the preset period;
the calculating module 203 is specifically configured to, for each preset period, calculate and obtain collaborative model parameters of the preset model in the preset period and under multiple preset scenes according to the generalization parameters obtained in the preset period, the multiple sets of model parameters obtained in the preset period, and the collaborative model parameters determined in the previous period.
As can be seen from the above, in the model training process, the model training device provided by the embodiment of the present invention performs collaborative computation on model parameters of a plurality of different scenes based on training data sets of the plurality of different scenes, so that the generalization performance of the obtained deep learning model can be improved, and a model applicable to multiple scenes can be obtained.
The embodiment of the present invention further provides an electronic device, as shown in fig. 3, which includes a processor 301, a communication interface 302, a memory 303, and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 communicate with each other through the communication bus 304.
a memory 303 for storing a computer program;
the processor 301, when executing the program stored in the memory 303, implements the following steps:
acquiring a plurality of groups of model parameters and a test data set of a preset model, wherein each group of model parameters is obtained by training the preset model by using a training data set under a preset scene;
respectively carrying out generalization test on the preset model using each group of model parameters by using the test data set to obtain generalization parameters corresponding to each group of model parameters;
and calculating to obtain cooperative model parameters of the preset model under a plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters, and taking the preset model using the cooperative model parameters as a target model.
As can be seen from the above, in the model training process, the electronic device provided by the embodiment of the present invention performs collaborative computation on model parameters of a plurality of different scenes based on training data sets in the plurality of different scenes, so that the generalization performance of the obtained deep learning model can be improved, and a model applicable to multiple scenes can be obtained.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which has instructions stored therein, which when run on a computer, cause the computer to perform the model training method described in any of the above embodiments.
In a further embodiment of the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the model training method of any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, the electronic device embodiment and the storage medium embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and in relation to the description, reference may be made to some portions of the description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A method of model training, the method comprising:
acquiring a plurality of groups of model parameters and a test data set of a preset model, wherein each group of model parameters is obtained by training the preset model by using a training data set under a preset scene;
respectively carrying out generalization test on the preset model using each group of model parameters by using the test data set to obtain generalization parameters corresponding to each group of model parameters;
and calculating to obtain collaborative model parameters of the preset model under the plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters, and taking the preset model using the collaborative model parameters as a target model.
2. The method of claim 1, wherein obtaining the plurality of sets of model parameters and test data sets of the predetermined model comprises:
acquiring multiple groups of sample data under multiple preset scenes, wherein each group of sample data corresponds to each preset scene respectively;
determining a plurality of groups of training data sets and test data sets from the plurality of groups of sample data according to a preset grouping rule;
and aiming at each group of training data sets, training the preset model by using the group of training data sets to obtain model parameters of the preset model corresponding to the group of training data sets.
3. The method according to claim 2, wherein the obtaining multiple sets of sample data under multiple preset scenarios comprises:
acquiring multiple groups of sample data under multiple preset scenes and identifications of the multiple preset scenes;
the method further comprises the following steps:
acquiring an initial data set and identifications of all preset scenes;
judging whether a preset scene which does not acquire the sample data exists or not according to the identifiers of the plurality of preset scenes and the identifiers of all the preset scenes;
and if the preset scene exists, selecting, for each preset scene for which the sample data is not acquired, filling sample data of the preset scene from the initial data set.
4. The method of claim 2, wherein determining sets of training data set and testing data set from the sets of sample data according to a predetermined grouping rule comprises:
dividing the multiple groups of sample data into multiple groups of training data sets and test data sets, wherein the test data sets comprise one or more groups; or,
and selecting a first amount of data from the group of sample data as a training data set of the preset scene and a second amount of data from the group of sample data as a test data set of the preset scene aiming at each group of sample data under each preset scene.
5. The method according to claim 1, wherein the calculating, according to the generalization parameters and the plurality of sets of model parameters, cooperative model parameters of the preset model under the plurality of preset scenarios, and using a preset model using the cooperative model parameters as a target model comprises:
normalizing the generalization parameters of each group of model parameters to obtain the normalization value of the generalization parameters of each group of model parameters, wherein the sum of the normalization values of the generalization parameters of each group of model parameters is 1;
and taking the normalization value as the weight of each group of model parameters, calculating the weighted average value of each group of model parameters to obtain the cooperative model parameters of the preset model under the plurality of preset scenes, and taking the preset model using the cooperative model parameters as a target model.
6. The method of claim 1, wherein obtaining the plurality of sets of model parameters and test data sets of the predetermined model comprises:
acquiring a plurality of groups of model parameters and test data sets of a preset model according to a preset period;
the step of respectively calculating the generalization parameters of each group of model parameters by using the test data set comprises the following steps:
aiming at each preset period, respectively calculating each group of model parameters obtained in the preset period and generalization parameters of the collaborative model parameters determined in the previous period by using the test data set obtained in the preset period;
the step of calculating and obtaining the collaborative model parameters of the preset model under the plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters comprises:
and for each preset period, calculating to obtain collaborative model parameters of the preset model in the preset period and under the plurality of preset scenes according to the generalization parameters obtained in the preset period, the plurality of groups of model parameters obtained in the preset period and the collaborative model parameters determined in the previous period.
7. A model training apparatus, the apparatus comprising:
the system comprises an acquisition module, a test module and a processing module, wherein the acquisition module is used for acquiring a plurality of groups of model parameters and a test data set of a preset model, and each group of model parameters are obtained by training the preset model by utilizing a training data set under a preset scene;
the test module is used for respectively carrying out generalization test on the preset model using each group of model parameters by using the test data set to obtain generalization parameters corresponding to each group of model parameters;
and the calculation module is used for calculating and obtaining the collaborative model parameters of the preset model under the plurality of preset scenes according to the generalization parameters and the plurality of groups of model parameters, and taking the preset model using the collaborative model parameters as a target model.
8. The apparatus of claim 7, wherein the obtaining module is specifically configured to:
acquiring multiple groups of sample data under multiple preset scenes, wherein each group of sample data corresponds to each preset scene respectively;
determining a plurality of groups of training data sets and test data sets from the plurality of groups of sample data according to a preset grouping rule;
and aiming at each group of training data sets, training the preset model by using the group of training data sets to obtain model parameters of the preset model corresponding to the group of training data sets.
9. The apparatus of claim 8, wherein the obtaining module is specifically configured to:
acquiring multiple groups of sample data under multiple preset scenes and identifications of the multiple preset scenes;
the device further comprises:
acquiring an initial data set and identifications of all preset scenes;
judging whether a preset scene which does not acquire the sample data exists or not according to the identifiers of the plurality of preset scenes and the identifiers of all the preset scenes;
and if the preset scene exists, selecting, for each preset scene for which the sample data is not acquired, filling sample data of the preset scene from the initial data set.
10. The apparatus of claim 8, wherein the obtaining module is specifically configured to:
dividing the multiple groups of sample data into multiple groups of training data sets and test data sets, wherein the test data sets comprise one or more groups; or,
and selecting a first amount of data from the group of sample data as a training data set of the preset scene and a second amount of data from the group of sample data as a test data set of the preset scene aiming at each group of sample data under each preset scene.
11. The apparatus of claim 7, wherein the computing module is specifically configured to:
normalizing the generalization parameters of each group of model parameters to obtain the normalization value of the generalization parameters of each group of model parameters, wherein the sum of the normalization values of the generalization parameters of each group of model parameters is 1;
and taking the normalization value as the weight of each group of model parameters, calculating the weighted average value of each group of model parameters to obtain the cooperative model parameters of the preset model under the plurality of preset scenes, and taking the preset model using the cooperative model parameters as a target model.
12. The apparatus of claim 7,
the acquisition module is specifically used for acquiring a plurality of groups of model parameters and test data sets of a preset model according to a preset period;
the test module is specifically configured to, for each preset period, respectively calculate each group of model parameters acquired in the preset period and generalization parameters of the collaborative model parameters determined in the previous period by using the test data set acquired in the preset period;
the calculation module is specifically configured to, for each preset period, calculate and obtain collaborative model parameters of the preset model in the preset period and under the multiple preset scenes according to the generalization parameters obtained in the preset period, the multiple sets of model parameters obtained in the preset period, and the collaborative model parameters determined in the previous period.
13. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
14. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN201910791960.7A 2019-08-26 2019-08-26 Model training method and device Active CN112434717B (en)

Publications (2)

Publication Number Publication Date
CN112434717A true CN112434717A (en) 2021-03-02
CN112434717B CN112434717B (en) 2024-03-08
