CN117009775A - Model training data acquisition method, model training method and device - Google Patents

Model training data acquisition method, model training method and device

Info

Publication number
CN117009775A
CN117009775A (application CN202311278048.4A)
Authority
CN
China
Prior art keywords
model
data
target
training
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311278048.4A
Other languages
Chinese (zh)
Inventor
张潇澜
李峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202311278048.4A priority Critical patent/CN117009775A/en
Publication of CN117009775A publication Critical patent/CN117009775A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211: Selection of the most significant subset of features
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiment of the present application provides a model training data acquisition method, a model training method and a device, including the following steps: acquiring a first target model; and filtering data subsets based on the first target model to obtain a target data set, where the data subsets are obtained by dividing raw data according to a preset dividing instruction input by a user, and the target data set is used for training the first target model. According to the embodiment of the present application, filtering the data subsets removes sample data of poor quality and guarantees the quality of the data set used for training the model. For a large-scale data set, the data set scale can be reduced while its quality is guaranteed, reducing the time consumed by subsequent model training; and removing poor-quality data reduces its influence on the model update direction during subsequent training.

Description

Model training data acquisition method, model training method and device
Technical Field
The application relates to the technical field of data processing, in particular to a model training data acquisition method, a model training method and a model training device.
Background
The data set contains rich real characteristic information, and model training is a process of continuously updating model parameters using data features. The larger the data scale, the more abundant the data features, and the better the performance and generalization of the trained model. However, the advent of large-scale data sets has created significant difficulties for model training, such as excessively long training times and hardware devices failing to keep up.
In view of the above problems, the related art performs model training through data parallelism when facing a large-scale data set. Taking object detection as an example: object detection is image segmentation based on the geometric and statistical characteristics of objects, combining segmentation and recognition of the object into one task; its accuracy and real-time performance are important capabilities of the whole system, especially in large-scale object detection processing. When the data set required for training is huge, in order to increase the training speed of the object detection model and the processing speed of object detection itself, the related art increases training throughput only by adding parallel training devices, thereby reducing overall training time. However, this optimizes the training process merely by increasing the number of devices: when facing the data sets required by more complex and standard object detection, a data set can reach TB-level text files, a training run can take several months to complete, and a large number of devices must be added, while training time still cannot be reduced to a great extent.
Disclosure of Invention
The embodiment of the present application aims to provide a model training data acquisition method, a model training method and a model training device, so as to solve the technical problem in the prior art that training time cannot be effectively reduced when facing complex, large-scale data sets. The specific technical scheme is as follows:
in a first aspect of the present application, there is provided a model training data acquisition method applied to a GPU, the model training data acquisition method including:
acquiring a first target model;
and filtering the data subset based on the first target model to obtain a target data set, wherein the data subset is obtained by dividing the original data based on a preset dividing instruction input by a user, and the target data set is used for training the first target model.
Optionally, the filtering the subset of data based on the first target model to obtain a target data set includes:
acquiring a deviation value corresponding to each sample in any one data subset;
and adding the sample to a target data set under the condition that the deviation value corresponding to the sample is detected to meet the preset filtering condition.
Optionally, the obtaining the deviation value corresponding to each sample in the arbitrary data subset includes:
acquiring all samples in any one data subset;
and inputting each sample into the first target model to perform model reasoning processing to obtain a deviation value corresponding to each sample, wherein the deviation value is the deviation between the reasoning result of the sample and the real result of the sample.
Optionally, the adding the sample to the target data set in a case where it is detected that the deviation value corresponding to the sample meets a preset filtering condition includes:
comparing the deviation value corresponding to the sample with a preset threshold value to obtain a comparison result, wherein the comparison result is a Boolean value;
adding the sample to the target data set in a case where the Boolean value is detected to be 1;
and removing the sample when the Boolean value is detected to be 0.
Optionally, the acquiring the first target model includes:
in a case where an initial model is loaded from a preset model library for the first time, taking the initial model as the first target model;
in a case where it is detected that the latest model of the current version is loaded from the preset model library, taking the latest model as the first target model.
Optionally, before the step of acquiring the first target model, the method comprises:
dividing the original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
screening processing is carried out in the data set to obtain an initial sample;
and performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model is used as the first target model when loaded from a preset model library for the first time.
In still another aspect of the present application, there is further provided a model training method applied to a GPU, the model training method including:
dividing the original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
screening processing is carried out in the data set to obtain an initial sample;
performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model is used as the first target model when loaded from a preset model library for the first time;
and carrying out model evolution processing on the data subset according to the first target model to obtain a second target model.
Optionally, the performing model evolution processing on the data subset according to the first target model to obtain a second target model includes:
acquiring a first target model;
filtering the data subset based on the first target model to obtain a target data subset;
training the first target model according to target samples in the target data subset to obtain a second target model, wherein the second target model is used for filtering the data subset again.
Optionally, after the step of performing model evolution processing on the subset of data according to the first target model to obtain a second target model, the method includes:
performing verification processing on the second target model according to a preset verification data set;
and if the performance corresponding to the second target model meets the preset condition, stopping training the second target model.
Optionally, the training the first target model according to the target samples in the target data subset, to obtain a second target model includes:
and training the first target model according to the target sample and a preset training hyper-parameter to obtain a second target model.
Optionally, the preset training hyper-parameters include at least one of:
training times, batch data size, learning rate and optimizer.
Optionally, after the step of acquiring the first target model, the method comprises:
and carrying out model evolution processing on all the data subsets based on the first target model and a preset sequence corresponding to the data subsets to obtain a target transverse model.
Optionally, performing model evolution processing on all the data subsets based on the first target model and a preset sequence corresponding to the data subsets, and obtaining the target transverse model includes:
performing model evolution processing on the data subset with the first priority based on the first target model and a preset sequence corresponding to the data subset to obtain a first target transverse model, wherein the first target transverse model is a model which is subjected to preset training rounds or reaches a preset training progress;
performing model evolution processing on the data subset with the second priority based on the first target transverse model and a preset sequence corresponding to the data subset to obtain a second target transverse model, wherein the second target transverse model is a model which has undergone preset training rounds or reached a preset training progress;
and taking the second target transverse model as the target transverse model under the condition that all the data subsets are detected to finish model evolution processing.
Optionally, after the step of acquiring the first target model, the method comprises:
and performing model evolution processing on all the data subsets based on the first target model, a preset sequence corresponding to the data subsets, and a single training round, to obtain a target longitudinal model.
Optionally, based on the first target model, performing model evolution processing on all data subsets in a preset sequence and a single training round corresponding to the data subsets, and obtaining a target longitudinal model includes:
based on the first target model, carrying out model evolution processing of a single training round on the data subset of the first priority according to a preset sequence corresponding to the data subset to obtain a first target longitudinal model;
based on the first target longitudinal model, performing model evolution processing of a single training round on the data subset of the second priority according to a preset sequence corresponding to the data subset to obtain a second target longitudinal model;
based on the second target longitudinal model, performing model evolution processing of a single training round on the data subset of the first priority according to a preset sequence corresponding to the data subset to obtain a third target longitudinal model;
based on the third target longitudinal model, performing model evolution processing of a single training round on the data subset of the second priority according to a preset sequence corresponding to the data subset to obtain a fourth target longitudinal model;
and taking the fourth target longitudinal model as the target longitudinal model under the condition that all the data subsets are detected to finish model evolution processing.
Optionally, the performing model evolution processing on the samples in the target data set according to the first target model and the target data set to obtain a second target model includes:
and carrying out model evolution processing on any one data subset based on the first target model to obtain a target hybrid model.
Optionally, the performing model evolution processing on any one data subset based on the first target model to obtain a target hybrid model includes:
performing model evolution processing on any one data subset based on the first target model to obtain a first target hybrid model;
performing model evolution processing on any one data subset based on the first target hybrid model to obtain a second target hybrid model;
and taking the second target hybrid model as the target hybrid model in a case where it is detected that all the data subsets have completed model evolution processing.
Optionally, the dividing the original data into the data sets including at least one data subset according to the preset dividing instruction input by the user includes:
and under the condition that a balanced division instruction sent by a user is received, extracting in the original data according to a preset sequence to obtain at least one data subset, wherein the number of samples in each data subset is the same.
Optionally, the dividing the original data into the data sets including at least one data subset according to the preset dividing instruction input by the user includes:
under the condition of receiving an unbalanced dividing instruction sent by a user, clustering the original data to obtain target original data of different categories;
determining sampling weights corresponding to the target original data according to the categories of the target original data;
and sampling the target original data according to the sampling weight to obtain a plurality of data subsets.
In still another aspect of the present application, there is further provided a model training data acquisition apparatus applied to a GPU, the apparatus including:
the acquisition module is used for acquiring a first target model;
the filtering module is used for filtering the data subset based on the first target model to obtain a target data set, wherein the data subset is obtained by dividing original data based on a preset dividing instruction input by a user, and the target data set is used for training the first target model.
In still another aspect of the present application, there is also provided a model training apparatus applied to a GPU, the apparatus including:
the dividing module is used for dividing the original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
the first screening module is used for screening in the data set to obtain an initial sample;
the pre-training module is used for performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model is used as the first target model when loaded from a preset model library for the first time;
and the model evolution module is used for carrying out model evolution processing on the data subset according to the first target model to obtain a second target model.
In yet another aspect of the present application, there is also provided a communication device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the model training data acquisition methods or the model training methods when executing the programs stored in the memory.
In yet another aspect of the present application, there is also provided a computer readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform any one of the model training data acquisition methods described above, or the model training method.
The model training data acquisition method provided by the embodiment of the present application acquires a first target model, and filters data subsets based on the first target model to obtain a target data set, where the data subsets are obtained by dividing raw data according to a preset dividing instruction input by a user, and the target data set is used for training the first target model. By filtering the data subsets, sample data of poor quality can be removed, guaranteeing the quality of the data set used for training the model. For a large-scale data set, the data set scale can be reduced while its quality is guaranteed, reducing the time consumed by subsequent model training; and removing poor-quality data reduces its influence on the model update direction during subsequent training.
In addition, by preprocessing and pre-training the raw data, the characteristics of the training data can be better mined. Splitting the training data into a plurality of data subsets as required reduces training time and lowers the hardware threshold, giving the optimized model training process a wider application range. In fields such as image classification, object detection, image segmentation and natural language processing, the efficiency and flexibility of an algorithm can be greatly improved, while the hardware cost and time consumption of model training are reduced and the flexibility of the training process is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart showing steps of a method for obtaining model training data according to an embodiment of the present application;
FIG. 2 is a flowchart showing steps of a model training method according to an embodiment of the present application;
FIG. 3 shows a second flowchart of steps of a model training method provided by an embodiment of the present application;
FIG. 4 shows a third flowchart of steps of a model training method provided by an embodiment of the present application;
FIG. 5 shows a fourth flowchart of steps of a model training method provided by an embodiment of the present application;
FIG. 6 shows a block diagram of a model training data acquisition apparatus according to an embodiment of the present application;
FIG. 7 shows a block diagram of a model training apparatus according to an embodiment of the present application;
FIG. 8 is a block diagram of a communication device according to an embodiment of the present application;
FIG. 9 shows a schematic diagram of model evolution in a processing unit according to an embodiment of the present application;
FIG. 10 shows a data filtering flow chart provided by an embodiment of the present application;
FIG. 11 illustrates a schematic view of transverse processing provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a model sequence of transverse processing provided by an embodiment of the present application;
FIG. 13 illustrates a schematic view of a longitudinal process provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of a model sequence of a longitudinal process provided by an embodiment of the present application;
FIG. 15 shows a schematic diagram of hybrid processing provided by an embodiment of the present application;
FIG. 16 is a schematic diagram of a model sequence of hybrid processing according to an embodiment of the present application;
FIG. 17 shows a schematic diagram of a model training process according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments of the present application in order to provide a better understanding of the application; however, the claimed application may be practiced without these specific details and with various changes and modifications based on the following embodiments. The following embodiments are divided for convenience of description only, which should not be construed as limiting specific implementations of the present application, and the embodiments may be combined with and refer to one another provided there is no contradiction.
It should be noted that, in the embodiment of the present application, the method is applied to a GPU, with a plurality of processing unit modules mounted on a GPU card. As shown in fig. 9, which shows a schematic diagram of model evolution within a processing unit provided by an embodiment of the present application, each Processing Unit (PU) comprises four processes corresponding to model evolution, namely the four basic processes of model loading, data filtering, model training and model saving.
Carrying out the model evolution process in processing units mounted on the GPU makes the training process more flexible: training can be stopped, or a model reloaded to resume training, at any time according to the actual needs of users.
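As an illustration of how such a processing unit might be organized, the following minimal Python sketch chains the four basic processes; the function names, the callable interface, and the comparison direction of the filter are assumptions made here for illustration, not the patent's actual implementation.

```python
# A minimal sketch of the four basic processes inside one Processing Unit (PU):
# model loading, data filtering, model training and model saving.
# All names below are illustrative assumptions.
from typing import Any, Callable, List

def pu_operation(load_model: Callable[[], Any],
                 save_model: Callable[[Any], None],
                 deviation: Callable[[Any, Any], float],
                 train: Callable[[Any, List[Any]], None],
                 subset: List[Any],
                 threshold: float) -> Any:
    model = load_model()                          # 1. model loading
    kept = [s for s in subset                     # 2. data filtering: keep samples
            if deviation(model, s) > threshold]   #    passing threshold T (assumed direction)
    train(model, kept)                            # 3. model training
    save_model(model)                             # 4. model saving
    return model
```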
Referring to fig. 1, a flowchart illustrating steps of a model training data acquisition method according to an embodiment of the present application is shown, where the method may include:
step 101, a first target model is acquired.
Further, in step 101, the obtaining a first target model includes:
in a case where an initial model is loaded from a preset model library for the first time, taking the initial model as the first target model;
in a case where it is detected that the latest model of the current version is loaded from the preset model library, taking the latest model as the first target model.
It should be noted that step 101 is the model loading module: a first target model is obtained, where the first target model is the latest model in the preset model library; if the model is loaded for the first time, the pre-trained initial model is used as the first target model.
Specifically, before step 101, the method further includes: dividing the original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user; screening processing is carried out in the data set to obtain an initial sample; and performing pre-training treatment according to the initial sample to obtain an initial model, wherein the initial model is used as a first target model when being loaded in a preset model library for the first time.
By loading the currently latest model, the latest model (namely the first target model) may be denoted as model M; if model M is loaded for the first time, the pre-trained model is loaded, otherwise the latest model in the current state is loaded.
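As a toy illustration of this loading rule, under the assumption that the preset model library behaves like a dictionary of checkpoints (an assumption made here for brevity):

```python
# Sketch of the loading rule: on first load, use the pre-trained initial
# model ckpt0; otherwise use the latest model in the current state.
def load_latest_or_pretrained(model_library: dict):
    return model_library.get("latest", model_library["ckpt0"])
```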
Step 102, filtering the data subset based on the first target model to obtain a target data set, wherein the data subset is obtained by dividing original data based on a preset dividing instruction input by a user, and the target data set is used for training the first target model.
It should be noted that step 101 above is the model loading process performed in a processing unit on the GPU card, and step 102 is the data filtering process performed in a processing unit on the GPU card. For the data filtering part, any one data subset is filtered using the latest model; any one data subset may be denoted as subset_i, where i indexes the data subsets into which the original data are divided. A new data set is obtained, and this latest data set may be denoted as F_subset_i, that is, the target data set.
It should be noted that, in the embodiment of the present application, the data subsets are based on different sample data: they may be divided into different data subsets according to attributes of the sample data, or may contain mixed sample data.
Specifically, the application applies to a target detection scene, for example, target detection performed on an object moving in real time. The data subset required by the user may be a data subset formed from image data, or a data subset formed from video frame data within video data; subsequent model training is then realized with the image data or video data as the data subset.
In addition, the method can be applied to other application scenes that need models. For example, the analysis of server data requires processing a large amount of data in real time; the method can then be used to process the model training data in the model training stage, so that subsequent operation and maintenance management of the server can proceed based on the model.
Further, the filtering the subset of data based on the first target model to obtain a target data set includes: acquiring a deviation value corresponding to each sample in any one data subset; and adding the sample to a target data set under the condition that the deviation value corresponding to the sample is detected to meet the preset filtering condition.
Further, the obtaining the deviation value corresponding to each sample in the arbitrary data subset includes: acquiring all samples in any one data subset; and inputting each sample into the first target model to perform model reasoning processing to obtain a deviation value corresponding to each sample, wherein the deviation value is the deviation between the reasoning result of the sample and the real result of the sample.
Further, the adding the sample to the target data set in a case where it is detected that the deviation value corresponding to the sample meets a preset filtering condition includes:
comparing the deviation value corresponding to the sample with a preset threshold value to obtain a comparison result, wherein the comparison result is a Boolean value;
adding the sample to the target data set in a case where the Boolean value is detected to be 1;
and removing the sample when the Boolean value is detected to be 0.
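A toy sketch of this Boolean comparison follows; the threshold value, the deviation values and the comparison direction below are illustrative assumptions.

```python
def filter_flag(deviation: float, threshold: float) -> bool:
    # True (Boolean value 1): add the sample to the target data set.
    # False (Boolean value 0): remove the sample.
    return deviation > threshold

T = 0.5                                            # preset threshold (assumed value)
deviations = {"s1": 0.9, "s2": 0.1, "s3": 0.7}     # made-up deviation values
target_data_set = [s for s, d in deviations.items() if filter_flag(d, T)]
# target_data_set == ["s1", "s3"]
```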
It should be noted that, as shown in fig. 10, which shows a data filtering flow chart provided by an embodiment of the present application, after the first target model is obtained, the latest model M is available. One sample is then taken from one of the data subsets in the data set, for example the 2nd sample s2 from subset1, and sample s2 is used as an input to model M for model reasoning.
Specifically, for example, when a user performs target detection to determine whether sample image data depicts a cat, a plurality of sample image data (such as a cat, a dog and a rabbit) are taken as inputs and fed into a preset target detection model. Each input then corresponds to two output values: one is the probability, determined by the preset target detection model, that the current sample is really a cat, namely the model reasoning result; the other is a loss value, which may be denoted loss, where loss represents the deviation value between the model reasoning result and the real result.
Therefore, each sample is input into the first target model for model reasoning processing, and a deviation value corresponding to each sample is obtained, where the deviation value may be the loss value in the model reasoning process.
The deviation value is not limited to the loss value; other measures may be adopted in other projects, for example IoU may be adopted as the deviation value. In general, the deviation value in the present application is a parameter related to a sample-quality indicator, used to represent sample variability.
In the present application, the data filtering process aims to explore the difference between a sample and the information represented by the current model M. A filtering threshold T is set, and the deviation value corresponding to each sample is compared against it to determine whether the sample is a good-quality sample needed for subsequent model training. The filtering threshold serves to filter out poor-quality sample data, leaving only good-quality sample data, and T can be modified dynamically as model training deepens, for example reduced as model performance improves.
Further, the data filtering process can be expressed functionally as FM(M(s), T), where M(s) is the process of model M reasoning over sample s, whose result is denoted result; the FM() function compares the reasoning result with the filtering threshold T and outputs a Boolean value. If result and the threshold satisfy the filtering criterion, FM() outputs True and sample s is added to the new filtered subset F_subset_i; otherwise FM() outputs False and no further operation is performed on sample s. The type of the filtering function FM() can be selected according to the user's needs and the domain the model belongs to; for example, the loss value or the precision of the sample after inference by model M may be used.
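The following hedged sketch puts the pieces together: FM() compares the reasoning result with T, filter_subset() builds F_subset_i, and update_threshold() shows one possible dynamic adjustment of T. All names, the per-sample-loss measure and the keep-high-loss direction are assumptions for illustration.

```python
from typing import Any, Callable, List

def FM(result: float, T: float) -> bool:
    """Compare the reasoning result with filtering threshold T (assumed: keep high loss)."""
    return result > T

def filter_subset(infer: Callable[[Any], float], subset: List[Any], T: float) -> List[Any]:
    F_subset = []
    for s in subset:
        result = infer(s)        # M(s): model M reasons over sample s
        if FM(result, T):        # True  -> add s to the filtered subset
            F_subset.append(s)   # False -> no further operation on s
    return F_subset

def update_threshold(T: float, decay: float = 0.95) -> float:
    return T * decay             # e.g., reduce T as model performance improves
```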
The model training data acquisition method provided by the embodiment of the present application acquires a first target model, and filters data subsets based on the first target model to obtain a target data set, where the data subsets are obtained by dividing raw data according to a preset dividing instruction input by a user, and the target data set is used for training the first target model. By filtering the data subsets, sample data of poor quality can be removed, guaranteeing the quality of the data set used for training the model. For a large-scale data set, the data set scale can be reduced while its quality is guaranteed, reducing the time consumed by subsequent model training; and removing poor-quality data reduces its influence on the model update direction during subsequent training.
Referring to fig. 2, a flowchart illustrating steps of a model training method according to an embodiment of the present application is shown, where the method may include:
it should be noted that, as shown in fig. 17, which shows a schematic diagram of a model training flow provided by an embodiment of the present application, the model training process in the present application includes model pre-training and the four processes of model evolution within the processing unit; model verification is then performed on the final model output by model evolution to determine whether the final model meets the user requirement.
Step 201, dividing original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
it should be noted that for step 201, reference may be made to the foregoing discussion, which is not repeated here.
Further, step 201, namely dividing the original data into data sets including at least one data subset according to a preset dividing instruction input by a user, includes:
and under the condition that a balanced division instruction sent by a user is received, extracting in the original data according to a preset sequence to obtain at least one data subset, wherein the number of samples in each data subset is the same.
In addition, step 201 may further include: under the condition of receiving an unbalanced dividing instruction sent by a user, clustering the original data to obtain target original data of different categories; determining sampling weights corresponding to the target original data according to the categories of the target original data; and sampling the target original data according to the sampling weight to obtain a plurality of data subsets.
It should be noted that, in the embodiment of the present application, a large-scale data set can cover more data features, which helps improve the performance and generalization ability of the neural network model. In the present application, the original large-scale data set Data is divided into k subsets, Data = {subset_1, subset_2, ..., subset_k}, and each data subset is trained independently.
The data set partitioning method may be designed according to the needs of the field and the characteristics of the data set. For example, if the data distribution is balanced, samples may be extracted randomly (or selected sequentially) to generate k equally sized data subsets. If the data distribution is unbalanced, the data may be clustered and different sampling weights set for different classes of data, or data enhancement may be performed for the classes with fewer samples; a certain number of samples is then selected from each class to generate a data subset. This ensures that each data subset covers samples of all classes with little variance in distribution.
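The two division strategies might be sketched as follows; the round-robin draw for the balanced case and the inverse-cluster-size weights for the unbalanced case are illustrative assumptions, and cluster_of stands for any clustering assignment of the user's choice.

```python
import random
from collections import defaultdict

def balanced_split(data, k):
    subsets = [[] for _ in range(k)]
    for i, sample in enumerate(data):        # sequential draw in a preset order
        subsets[i % k].append(sample)        # near-equal sample counts per subset
    return subsets

def unbalanced_split(data, cluster_of, k, subset_size):
    clusters = defaultdict(list)
    for sample in data:                      # cluster the raw data first
        clusters[cluster_of(sample)].append(sample)
    pool, weights = [], []
    for members in clusters.values():
        w = 1.0 / len(members)               # rarer classes get higher weight (assumed)
        pool.extend(members)
        weights.extend([w] * len(members))
    return [random.choices(pool, weights=weights, k=subset_size) for _ in range(k)]
```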
Therefore, in the training process, through dividing the data set, a certain sample is selected in a targeted manner according to the data selection standard to carry out iterative training on the model, and a proper sample can be better selected according to the actual requirement of a user to carry out training.
Step 202, screening processing is carried out in the data set to obtain an initial sample;
step 203, performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model is used as a first target model when being loaded in a preset model library for the first time;
It should be noted that in the above steps 202-203, an initial model first needs to be trained for the subsequent training process. The samples of the pre-training data set are derived from the original data set Data, and the initial samples are obtained by selectively extracting or screening the data set.
Specifically, the model is trained using the pre-training dataset, and an initial model is obtained and stored as an initial model ckpt0.
Step 204, performing model evolution processing on the data subset according to the first target model to obtain a second target model, wherein the target data set is generated by filtering the data subset based on the first target model.
Specifically, step 204 may include the steps of: acquiring a first target model; filtering the data subset based on the first target model to obtain a target data subset; training the first target model according to target samples in the target data subset to obtain a second target model, wherein the second target model is used for filtering the data subset again.
Specifically, target sample selection may preferentially choose data having a large difference from the information represented by the first target model M; for example, a sample s with a larger deviation (such as a larger loss) in the reasoning result of the first target model may be selected as a target sample.
In the embodiment of the application, after the target sample is determined, the first target model M may be trained based on the target sample to obtain the second target model.
It should be noted that, for model training, complete model training needs to train on the filtered valid samples corresponding to each data subset in the data set. After the data set is divided into a plurality of data subsets, each data subset has an independent training process; that is, a processing unit on a GPU card can serve as the training module for one data subset, so the training process is more flexible, and training can be stopped, or a model reloaded to resume training, at any time according to the actual needs of users.
Therefore, the model evolution process can select different modes of the evolution process according to the actual demands of users, including transverse process, longitudinal process and mixed process.
The three processes mentioned above are discussed in detail below.
Further, the training the first target model according to the target sample to obtain a second target model includes: training the first target model according to the target sample and preset training hyper-parameters to obtain the second target model, where the preset training hyper-parameters include at least one of the following: the number of training times, the batch data size, the learning rate and the optimizer.
Specifically, proper training hyper-parameters are set empirically to obtain a model with better performance. The training hyper-parameters include the number of training epochs, the batch data size batch_size, the learning rate learning_rate, the optimizer, and so on.
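For illustration, such a setting might look like the following; the concrete values are assumptions to be tuned empirically.

```python
# An illustrative hyper-parameter configuration using the names from the text.
train_hyper_params = {
    "epochs": 10,           # number of training rounds
    "batch_size": 64,       # batch data size
    "learning_rate": 1e-3,  # learning rate
    "optimizer": "adam",    # optimizer
}
```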
Step 205, performing verification processing on the second target model according to a preset verification data set;
Step 206, if the performance corresponding to the second target model meets the preset condition, stopping training the second target model.
It should be noted that in the foregoing steps 205-206, the finally trained model is verified on the verification data set, and whether to retrain the model is determined according to whether its performance meets the requirement. After it is determined that the training process on all the data subsets is completed, the optimal model is saved; it may be saved in the preset model library so that the user can use it directly next time, or, to optimize it further, the optimal model may be trained further as the first target model.
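A minimal sketch of this stopping check, assuming accuracy as the performance measure and 0.95 as the target (both assumptions):

```python
def should_stop_training(evaluate, model, val_set, target_acc=0.95):
    acc = evaluate(model, val_set)   # performance on the preset verification data set
    return acc >= target_acc         # True -> stop training and save the model
```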
The model training data acquisition method provided by the embodiment of the present application acquires a first target model, and filters data subsets based on the first target model to obtain a target data set, where the data subsets are obtained by dividing raw data according to a preset dividing instruction input by a user, and the target data set is used for training the first target model. By filtering the data subsets, sample data of poor quality can be removed, guaranteeing the quality of the data set used for training the model. For a large-scale data set, the data set scale can be reduced while its quality is guaranteed, reducing the time consumed by subsequent model training; and removing poor-quality data reduces its influence on the model update direction during subsequent training.
In addition, by preprocessing and pre-training the raw data, the characteristics of the training data can be better mined. Splitting the training data into a plurality of data subsets as required reduces training time and lowers the hardware threshold, giving the optimized model training process a wider application range. In fields such as image classification, object detection, image segmentation and natural language processing, the efficiency and flexibility of an algorithm can be greatly improved, while the hardware cost and time consumption of model training are reduced and the flexibility of the training process is improved.
Referring to fig. 3, a second step flowchart of a model training method provided by an embodiment of the present application is shown, where the method may include:
step 301, dividing original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
step 302, screening processing is carried out in the data set to obtain an initial sample;
step 303, performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model is used as a first target model when being loaded in a preset model library for the first time;
It should be noted that, the steps 301-303 are discussed with reference to the foregoing, and are not repeated herein.
Step 304, performing model evolution processing on all the data subsets based on a transverse circulation unit, the first target model and a preset sequence corresponding to the data subsets, until the data subsets are traversed, to obtain a target transverse model, wherein
the transverse circulation unit includes: carrying out model evolution processing on one data subset for a preset number of training rounds based on the preset sequence corresponding to the data subsets, or carrying out model evolution processing on one data subset in the preset sequence corresponding to the data subsets until a preset training progress is reached.
It should be noted that the transverse circulation unit is the minimum circulation unit for transverse processing of the model. Each cycle comprises performing model evolution processing on a data subset for a preset number of rounds according to the preset sequence corresponding to the data subsets, or performing model evolution processing on a data subset until a training requirement is met, for example a preset training progress or a preset training precision.
After one transverse circulation unit is completed, the obtained model repeats the transverse circulation unit in a loop until all k data subsets have been traversed, whereupon the final target transverse model is obtained.
That is, the transverse loop performs model evolution processing on the initial model according to the preset sequence of the data subsets and according to training precision and training rounds, iterating until all the data subsets have been traversed.
It should be noted that the above-mentioned preset training round refers to the number of Processing Unit (PU) operations required for performing model evolution processing on a data subset according to the actual requirement of the user; for example, the preset training round may be m PU operations. Specifically, the pre-training model ckpt0 completes model evolution on data subset1, that is, completes m PU operations (model loading, data filtering, model training and model saving), and a model ckpt1.m is obtained.
In addition, one model evolution may comprise Processing Unit (PU) operations for a preset number of training rounds, or may run until a preset training progress is reached; for example, in transverse training, if the user requires that the training progress on one data subset reach 80% before model evolution on the next data subset can begin, the preset training progress is 80%.
It should be noted that step 304 is the aforementioned transverse processing. As shown in fig. 11, which shows a schematic view of the transverse processing provided by an embodiment of the present application, transverse processing first performs the PU operation multiple times within one data subset, then performs the same operation on the next data subset, until all data subsets have been trained, at which point one complete model training process is finished.
Specifically, the pre-training model ckpt0 is first loaded and model evolution is completed on data subset1, that is, m PU operations (model loading, data filtering, model training and model saving) are completed, obtaining a model ckpt1.m. Then the ckpt1.m model is loaded, and m PU operations are likewise completed on data subset2 (the m value here may differ from the above), resulting in ckpt2.m. The cycle continues in turn until model evolution has been completed on all k data subsets, which completes one full training process and yields the model ckptk.m, namely the final optimal model.
Specifically, the resulting model sequence is shown in fig. 12, which shows a schematic diagram of a model sequence of transverse processing provided by an embodiment of the present application.
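Assuming a helper pu_step(model, subset) that performs one PU operation and returns the updated model (cf. the pu_operation sketch earlier; the name is an illustrative assumption), the transverse flow might be sketched as:

```python
def transverse_training(model, subsets, pu_step, rounds):
    # rounds[i] is the m value for subset i (may differ per subset)
    for subset, m in zip(subsets, rounds):
        for _ in range(m):            # m PU operations on this subset first
            model = pu_step(model, subset)
        # here the model corresponds to ckpt{i}.m
    return model                      # ckptk.m, the final optimal model
```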
According to the embodiment of the present application, filtering the data subsets removes sample data of poor quality and guarantees the quality of the data set used for training the model. For a large-scale data set, the data set scale can be reduced while its quality is guaranteed, reducing the time consumed by subsequent model training; and removing poor-quality data reduces its influence on the model update direction during subsequent training.
In addition, by preprocessing and pre-training the raw data, the characteristics of the training data can be better mined. Splitting the training data into a plurality of data subsets as required reduces training time and lowers the hardware threshold, giving the optimized model training process a wider application range. In fields such as image classification, object detection, image segmentation and natural language processing, the efficiency and flexibility of an algorithm can be greatly improved, while the hardware cost and time consumption of model training are reduced and the flexibility of the training process is improved.
In addition, the model evolution process is divided into a transverse process flow and a longitudinal process flow with the data subset as the division standard; the different processes can satisfy users' dynamic choices, and different model evolution process flows can be selected for different projects.
Referring to fig. 4, a third step flowchart of a model training method provided by an embodiment of the present application is shown, where the method may include:
step 401, dividing the original data into data sets including at least one data subset according to a preset dividing instruction input by a user;
Step 402, screening processing is carried out in the data set to obtain an initial sample;
step 403, performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model is used as the first target model when loaded from a preset model library for the first time;
it should be noted that, the steps 401-403 are discussed with reference to the foregoing, and are not repeated herein.
Step 404, performing model evolution processing on the data subset based on a longitudinal circulation unit, the first target model and a preset sequence corresponding to the data subset until a preset training round or a preset training progress is reached, so as to obtain a target longitudinal model.
Wherein the longitudinal circulation unit includes: carrying out model evolution processing on each of the data subsets in turn according to the preset sequence until the data subsets are traversed.
It should be noted that, in the embodiment of the present application, for longitudinal processing the minimum circulation unit is the longitudinal circulation unit: for the model, a single model evolution is performed on each data subset in the preset sequence of the data subsets, instead of model evolution for a preset number of training rounds. After each of the k data subsets has undergone a single model evolution, the longitudinal circulation unit is run again, performing a single model evolution on each data subset, until the obtained target longitudinal model reaches the preset number of training passes, or the model precision already satisfies the actual requirement, and model evolution stops.
It should be noted that, in the embodiment of the present application, one model evolution here refers to performing one processing unit operation on one data subset. Whereas, as described above, model evolution on one data subset may comprise multiple processing unit operations, in the longitudinal training process one processing unit operation is performed on a data subset and then model evolution moves on to the next data subset, one processing unit operation per subset at a time.
It should be noted that step 404 is the aforementioned longitudinal processing. As shown in fig. 13, which shows a schematic view of the longitudinal processing provided by an embodiment of the present application, the longitudinal processing executes PU operations alternately according to the data subset sequence, so as to complete model evolution.
Specifically, the pre-training model ckpt0 is first loaded, and one PU operation (model loading, data filtering, model training and model saving) is completed on data subset1, obtaining a model ckpt1.1. Then the ckpt1.1 model is loaded, and one PU operation is completed on data subset2, obtaining the ckpt2.1 model. The k data subsets are cycled through in turn, each subset completing one PU operation, yielding the ckptk.1 model. The process then returns to the first data subset1, and the above loop over the k data subsets is repeated. If the training process is complete after m such loops, the model ckptk.m is the final optimal model. The resulting model sequence is shown in fig. 14, which shows a schematic diagram of a model sequence of longitudinal processing provided by an embodiment of the present application.
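Under the same pu_step assumption as in the transverse sketch, the longitudinal flow might be sketched as:

```python
def longitudinal_training(model, subsets, pu_step, m):
    for _ in range(m):               # m full passes over all k subsets
        for subset in subsets:       # one PU operation per subset, in order
            model = pu_step(model, subset)
    return model                     # ckptk.m after the final pass
```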
According to the embodiment of the present application, filtering the data subsets removes sample data of poor quality and guarantees the quality of the data set used for training the model. For a large-scale data set, the data set scale can be reduced while its quality is guaranteed, reducing the time consumed by subsequent model training; and removing poor-quality data reduces its influence on the model update direction during subsequent training.
In addition, by preprocessing and pre-training the raw data, the characteristics of the training data can be better mined. Splitting the training data into a plurality of data subsets as required reduces training time and lowers the hardware threshold, giving the optimized model training process a wider application range. In fields such as image classification, object detection, image segmentation and natural language processing, the efficiency and flexibility of an algorithm can be greatly improved, while the hardware cost and time consumption of model training are reduced and the flexibility of the training process is improved.
In addition, the model evolution process is divided into a transverse process flow and a longitudinal process flow with the data subset as the division standard; the different processes can satisfy users' dynamic choices, and different model evolution process flows can be selected for different projects.
Referring to fig. 5, a fourth flowchart of steps of a model training method according to an embodiment of the present application is shown, where the method may include:
step 501, dividing original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
step 502, screening processing is carried out in the data set to obtain an initial sample;
step 503, performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model is used as a first target model when being loaded in a preset model library for the first time;
it should be noted that, the above steps 501-503 are discussed with reference to the foregoing, and are not repeated herein.
Step 504, performing model evolution processing on any one data subset based on the first target model until the data subsets are traversed, to obtain a target hybrid model.
It should be noted that, for hybrid processing, hybrid training performs model evolution on any one of the data subsets on the basis of the first target model to obtain a first target hybrid model, then performs model evolution on any one data subset other than the first-selected one on the basis of the first target hybrid model, and so on, until all the data subsets have been traversed and the target hybrid model is finally obtained.
That is, the hybrid processing does not consider the order of the data subsets; each subset is selected at random.
In the embodiment of the present application, fig. 15 shows a schematic diagram of the hybrid processing. As the figure shows, in the hybrid processing a data subset is randomly selected and single or multiple PU operations are performed on it to complete the model evolution process; there is no sequence requirement, so the most suitable data subset can be selected for training according to the actual requirements of the user.
As described earlier for data set division, sampling weights can be set according to data type in order to divide the data subsets. Accordingly, in the hybrid processing, the pre-training model, namely the initial model ckpt0, is loaded, an arbitrary data subset subseti is selected, and one or more processing unit operations (model loading, data filtering, model training and model saving) are completed to obtain model ckpt1; the ckpt1 model file is then loaded, an arbitrary data subset is again selected, and one or more PU operations are completed to obtain model ckpt2. This cycle continues until all the data subsets have been traversed, at which point the model evolution, i.e., one complete hybrid training process, is finished, and the final model ckptm obtained is the optimal model. The model sequence obtained in this process is shown in fig. 16, which is a schematic diagram of the model sequence of the hybrid processing according to an embodiment of the present application.
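The hybrid flow can likewise be sketched in Python, reusing the hypothetical pu_operation stub from the longitudinal sketch above; the choice of one to three PU operations per selected subset is an arbitrary assumption made for illustration.

import random

def hybrid_training(subsets):
    remaining = list(range(len(subsets)))      # subsets not yet traversed
    ckpt, step = "ckpt0", 0                    # pre-trained initial model
    while remaining:
        i = random.choice(remaining)           # random, order-free selection
        for _ in range(random.randint(1, 3)):  # one or more PU operations
            step += 1
            ckpt = pu_operation(ckpt, subsets[i], f"ckpt{step}")
        remaining.remove(i)                    # mark this subset as traversed
    return ckpt                                # final optimal model ckptm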
According to the embodiment of the present application, filtering the data subsets removes poor-quality sample data and ensures the quality of the data set used to train the model. When facing a large-scale data set, the data set scale can be reduced while its quality is guaranteed, which reduces the time consumed by subsequent model training; removing the poor-quality data also reduces the influence of low-quality samples on the model update direction during subsequent training.
In addition, preprocessing and pre-training the original data allows the characteristics of the training data to be mined more fully, and splitting the training data into a plurality of data subsets as required shortens training time and lowers the hardware threshold, giving the optimized model training process a wider application range. In the fields of image classification, target detection, image segmentation and natural language processing, this greatly improves the efficiency and flexibility of the algorithm, reduces the hardware cost and time consumption of model training, and improves the flexibility of the training process.
In addition, by taking the processing unit PU as the division standard, a hybrid processing flow can be obtained in which the transverse processing flow and the longitudinal processing flow exist at the same time; there are no specific requirements on the order, the number of operations, or the stopping criterion, and combining them dynamically with the actual situation improves the flexibility of the evolution process.
Referring to fig. 6, fig. 6 shows a model training data acquisition apparatus provided by an embodiment of the present application, the apparatus being applied to a GPU and comprising:
an obtaining module 601, configured to obtain a first target model;
the filtering module 602 is configured to perform filtering processing on a data subset based on the first target model to obtain a target data set, where the data subset is obtained by dividing original data based on a preset dividing instruction input by a user, and the target data set is used for training the first target model.
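The filtering performed by module 602 can be sketched as follows. The sketch assumes that poor-quality samples show a large deviation between the inference result and the real result, so samples whose deviation stays within a preset threshold are kept; the callable model and the field names are illustrative assumptions, not the claimed implementation.

def filter_by_deviation(model, subset, threshold):
    target_data = []
    for sample in subset:
        prediction = model(sample["x"])            # model reasoning on the sample
        deviation = abs(prediction - sample["y"])  # deviation from the real result
        keep = deviation <= threshold              # Boolean comparison result
        if keep:                                   # Boolean 1: add to the target data set
            target_data.append(sample)
        # Boolean 0: the sample is removed (simply not added)
    return target_data

# toy usage: a stand-in model keeps only the sample it predicts well
model = lambda x: 2 * x
print(filter_by_deviation(model, [{"x": 1, "y": 2}, {"x": 2, "y": 9}], 1.0))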
According to the embodiment of the present application, filtering the data subsets removes poor-quality sample data and ensures the quality of the data set used to train the model. When facing a large-scale data set, the data set scale can be reduced while its quality is guaranteed, which reduces the time consumed by subsequent model training; removing the poor-quality data also reduces the influence of low-quality samples on the model update direction during subsequent training.
Referring to fig. 7, fig. 7 shows a model training apparatus according to an embodiment of the present application, the apparatus being applied to a GPU and comprising:
A dividing module 701, configured to divide the original data into data sets including at least one data subset according to a preset dividing instruction input by a user;
a first screening module 702, configured to perform screening processing in the dataset to obtain an initial sample;
the pre-training module 703 is configured to perform pre-training processing according to the initial sample to obtain an initial model, where the initial model is a first target model when loaded in a preset model library for the first time;
and the model evolution module 704 is configured to perform model evolution processing on the data subset according to the first target model to obtain a second target model, where the target data set is generated by performing filtering processing on the data subset based on the first target model.
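One possible shape of the evolution loop run by module 704 is sketched below, including a validation-based stopping condition of the kind described in the embodiments; train_step and evaluate are hypothetical placeholders, and filter_by_deviation refers to the sketch given with the acquisition apparatus above.

def train_step(model, target_data):            # placeholder training update
    return model

def evaluate(model, validation_set):           # placeholder validation score
    return 1.0

def model_evolution(first_target, subsets, validation_set, target_score):
    model = first_target
    for subset in subsets:
        target_data = filter_by_deviation(model, subset, threshold=1.0)
        model = train_step(model, target_data)   # next (second) target model
        if evaluate(model, validation_set) >= target_score:
            break                                # preset stopping condition met
    return model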
According to the embodiment of the present application, filtering the data subsets removes poor-quality sample data and ensures the quality of the data set used to train the model. When facing a large-scale data set, the data set scale can be reduced while its quality is guaranteed, which reduces the time consumed by subsequent model training; removing the poor-quality data also reduces the influence of low-quality samples on the model update direction during subsequent training.
In addition, preprocessing and pre-training the original data allows the characteristics of the training data to be mined more fully, and splitting the training data into a plurality of data subsets as required shortens training time and lowers the hardware threshold, giving the optimized model training process a wider application range. In the fields of image classification, target detection, image segmentation and natural language processing, this greatly improves the efficiency and flexibility of the algorithm, reduces the hardware cost and time consumption of model training, and improves the flexibility of the training process.
The embodiment of the present application also provides a communication device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with one another via the communication bus 804,
a memory 803 for storing a computer program;
the processor 801, when executing the program stored in the memory 803, may implement the following steps:
acquiring a first target model;
and filtering the data subset based on the first target model to obtain a target data set, wherein the data subset is obtained by dividing the original data based on a preset dividing instruction input by a user, and the target data set is used for training the first target model.
Or dividing the original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
screening processing is carried out in the data set to obtain an initial sample;
performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model serves as a first target model when loaded from a preset model library for the first time;
and carrying out model evolution processing on the data subset according to the first target model to obtain a second target model, wherein the target data set is generated by filtering the data subset based on the first target model.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented in the figure by a single bold line only, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present application, there is also provided a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the model training data acquisition method or the model training method described in any of the above embodiments.
In yet another embodiment of the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the model training data acquisition method or the model training method described in any of the above embodiments.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner, identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (22)

1. A model training data acquisition method, which is applied to a GPU, the model training data acquisition method comprising:
acquiring a first target model;
and filtering the data subset based on the first target model to obtain a target data set, wherein the data subset is obtained by dividing original data based on a preset dividing instruction input by a user, and the target data set is used for training the first target model.
2. The model training data acquisition method of claim 1, wherein the filtering the data subset based on the first target model to obtain the target data set comprises:
acquiring a deviation value corresponding to each sample in any one data subset;
and adding the sample to a target data set under the condition that the deviation value corresponding to the sample is detected to meet the preset filtering condition.
3. The model training data acquisition method of claim 2, wherein the acquiring the deviation value corresponding to each sample in any one data subset comprises:
acquiring all samples in any one data subset;
and inputting each sample into the first target model to perform model reasoning processing to obtain a deviation value corresponding to each sample, wherein the deviation value is the deviation between the reasoning result of the sample and the real result of the sample.
4. The model training data acquisition method of claim 2, wherein the adding the sample to the target data set in the case where it is detected that the deviation value corresponding to the sample satisfies a preset filtering condition comprises:
comparing the deviation value corresponding to the sample with a preset threshold value to obtain a comparison result, wherein the comparison result is a Boolean value;
adding the sample to the target data set in the case where the Boolean value is detected to be 1;
and removing the sample in the case where the Boolean value is detected to be 0.
5. The model training data acquisition method of claim 1, wherein the acquiring the first target model comprises:
in the case where an initial model is loaded from a preset model library for the first time, taking the initial model as the first target model;
and in the case where a latest model of a current version is detected to be loaded in the preset model library, taking the latest model as the first target model.
6. The model training data acquisition method according to claim 1, characterized in that before the step of acquiring the first target model, the method comprises:
dividing the original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
screening processing is carried out in the data set to obtain an initial sample;
and performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model serves as the first target model when loaded from the preset model library for the first time.
7. A model training method, applied to a GPU, the model training method comprising:
dividing the original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
screening processing is carried out in the data set to obtain an initial sample;
performing pre-training processing according to the initial sample to obtain an initial model, wherein the initial model serves as a first target model when loaded from a preset model library for the first time;
and performing model evolution processing on the data subset according to the first target model to obtain a second target model.
8. The model training method of claim 7, wherein the performing model evolution processing on the data subset according to the first target model to obtain a second target model comprises:
acquiring a first target model;
filtering the data subset based on the first target model to obtain a target data set;
and training the first target model according to target samples in the target data set to obtain a second target model, wherein the second target model is used for filtering the data subset again.
9. The model training method of claim 7, wherein after the step of model evolution processing of the subset of data according to the first target model to obtain a second target model, the method comprises:
performing verification processing on the second target model according to a preset verification data set;
and if the performance corresponding to the second target model meets the preset condition, stopping training the second target model.
10. The model training method of claim 8, wherein training the first target model based on the target samples in the target data subset to obtain a second target model comprises:
and training the first target model according to the target sample and preset training hyper-parameters to obtain the second target model.
11. The model training method of claim 10, wherein the preset training hyper-parameters comprise at least one of:
training times, batch data size, learning rate and optimizer.
12. The model training method of claim 7, wherein said performing model evolution processing on the subset of data according to the first target model to obtain a second target model comprises:
and carrying out model evolution processing on all the data subsets based on a transverse circulation unit, the first target model and a preset sequence corresponding to the data subsets until the data subsets are traversed, so as to obtain the target transverse model.
13. The model training method of claim 12, wherein the lateral circulation unit comprises:
and carrying out model evolution processing on one data subset in a preset training round based on a preset sequence corresponding to the data subset, or carrying out model evolution processing on one data subset in a preset sequence corresponding to the data subset until a preset training progress is reached.
14. The model training method of claim 7, wherein the performing model evolution processing on the data subset according to the first target model to obtain a second target model comprises:
and carrying out model evolution processing on the data subset based on a longitudinal circulation unit, the first target model and a preset sequence corresponding to the data subset until a preset training round is reached or a preset training progress is reached, so as to obtain a target longitudinal model.
15. The model training method of claim 14, wherein the longitudinal circulation unit comprises:
and respectively carrying out model evolution processing on the data subsets according to the preset sequence until the data subsets are traversed.
16. The model training method of claim 7, wherein the performing model evolution processing on the data subset according to the first target model to obtain a second target model comprises:
and performing model evolution processing on any one data subset based on the first target model until all the data subsets are traversed, so as to obtain a target hybrid model.
17. The model training method of claim 7, wherein the dividing the raw data into the data sets including at least one subset of data according to the preset division instruction input by the user comprises:
and under the condition that a balanced division instruction sent by a user is received, extracting in the original data according to a preset sequence to obtain at least one data subset, wherein the number of samples in each data subset is the same.
18. The model training method of claim 7, wherein the dividing the raw data into the data sets including at least one subset of data according to the preset division instruction input by the user comprises:
under the condition of receiving an unbalanced dividing instruction sent by a user, clustering the original data to obtain target original data of different categories;
determining sampling weights corresponding to the target original data according to the categories of the target original data;
and sampling the target original data according to the sampling weight to obtain a plurality of data subsets.
19. A model training data acquisition apparatus for use with a GPU, the apparatus comprising:
the acquisition module is used for acquiring a first target model;
The filtering module is used for filtering the data subset based on the first target model to obtain a target data set, wherein the data subset is obtained by dividing original data based on a preset dividing instruction input by a user, and the target data set is used for training the first target model.
20. A model training apparatus for use with a GPU, the apparatus comprising:
the dividing module is used for dividing the original data into data sets comprising at least one data subset according to a preset dividing instruction input by a user;
the first screening module is used for screening in the data set to obtain an initial sample;
the pre-training module is used for performing pre-training treatment according to the initial sample to obtain an initial model, wherein the initial model is used as a first target model when being loaded in a preset model library for the first time;
and the model evolution module is used for carrying out model evolution processing on the data subset according to the first target model to obtain a second target model.
21. A communication device, comprising: a transceiver, a memory, a processor, and a program stored on the memory and executable on the processor;
The processor is configured to read a program in the memory to implement the model training data acquisition method according to any one of claims 1 to 6 or to implement the model training method according to any one of claims 7 to 18.
22. A readable storage medium storing a program, wherein the program when executed by a processor implements a model training data acquisition method according to any one of claims 1-6 or a model training method according to any one of claims 7-18.
CN202311278048.4A 2023-09-28 2023-09-28 Model training data acquisition method, model training method and device Pending CN117009775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311278048.4A CN117009775A (en) 2023-09-28 2023-09-28 Model training data acquisition method, model training method and device

Publications (1)

Publication Number Publication Date
CN117009775A true CN117009775A (en) 2023-11-07

Family

ID=88576574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311278048.4A Pending CN117009775A (en) 2023-09-28 2023-09-28 Model training data acquisition method, model training method and device

Country Status (1)

Country Link
CN (1) CN117009775A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016051179A (en) * 2014-08-29 2016-04-11 株式会社リコー Speech recognition method, speech evaluation method, speech recognition system, and speech evaluation system
CN108896857A (en) * 2018-07-06 2018-11-27 北京四方继保自动化股份有限公司 A kind of transformer complex working condition recognition methods based on deep learning
CN112116002A (en) * 2020-09-18 2020-12-22 北京旋极信息技术股份有限公司 Determination method, verification method and device of detection model
CN114842303A (en) * 2022-05-26 2022-08-02 杭州海康威视数字技术股份有限公司 Self-training optimization method and device, electronic equipment and computer-readable storage medium
CN115422574A (en) * 2022-08-15 2022-12-02 ***股份有限公司 Data processing method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙光灵 et al.: "面向分类的集成学习算法——基础理论与分析" (Ensemble Learning Algorithms for Classification: Basic Theory and Analysis), China Railway Publishing House Co., Ltd. (中国铁道出版社有限公司), 31 December 2022, pages 128-134 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination