CN111931492A - Data expansion mixing strategy generation method and device and computer equipment - Google Patents

Data expansion mixing strategy generation method and device and computer equipment

Info

Publication number
CN111931492A
CN111931492A (application CN202010686538.8A)
Authority
CN
China
Prior art keywords
data
strategy
training
preset
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010686538.8A
Other languages
Chinese (zh)
Other versions
CN111931492B (en)
Inventor
朱威
李恬静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010686538.8A priority Critical patent/CN111931492B/en
Priority claimed from CN202010686538.8A external-priority patent/CN111931492B/en
Priority to PCT/CN2020/118140 priority patent/WO2021139233A1/en
Publication of CN111931492A publication Critical patent/CN111931492A/en
Application granted granted Critical
Publication of CN111931492B publication Critical patent/CN111931492B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of artificial intelligence and provides a data expansion mixing strategy generation method, apparatus, and computer device. The method comprises the following steps: obtaining the strategy feedback data and training data of the current time; inputting the strategy feedback data of the current time into a preset mixing strategy search model to obtain a data expansion mixing strategy; expanding the training data according to the data expansion mixing strategy to obtain expanded training data; inputting the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixing strategy; taking the strategy feedback data corresponding to the data expansion mixing strategy as the strategy feedback data of the current time; and returning to the step of inputting the strategy feedback data of the current time into the preset mixing strategy search model, until the number of training iterations of the preset mixing strategy search model reaches the preset number, thereby obtaining the optimal data expansion mixing strategy. By adopting the method, data expansion efficiency can be improved.

Description

Data expansion mixing strategy generation method and device and computer equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for generating a data expansion hybrid strategy, a computer device, and a storage medium.
Background
With the continuous development of artificial intelligence, deep learning algorithms and machine learning have become increasingly popular. Deep learning algorithms such as neural network models require a large amount of training data to ensure the generalization ability of the model. Data enhancement (data expansion) is a common data processing means in machine learning and deep learning: it enables limited data to yield more data, increases the number and diversity (noise data) of training samples, and improves the robustness of a model. In natural language processing tasks, common data augmentation techniques include synonym replacement and back-translation.
At present, collecting labeled data for a natural language processing task requires a large amount of labor, and the collected data has limitations. The data expansion mixing strategy is usually designed manually, and it often happens that the strategy does not suit the data set or the expansion amount is too large, so that the trained model overfits and the efficiency of natural language data expansion is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data expansion mixing policy generation method, apparatus, computer device, and storage medium capable of improving natural language data expansion efficiency.
A data expansion mixing strategy generation method comprises the following steps:
acquiring strategy feedback data and training data of the current time;
inputting strategy feedback data of the current time into a preset hybrid strategy search model to obtain a data expansion hybrid strategy of the current time;
expanding the training data according to the data expansion mixing strategy to obtain expanded training data;
inputting the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixed strategy;
and taking the strategy feedback data corresponding to the data expansion mixing strategy as the strategy feedback data of the current time, returning to the step of inputting the strategy feedback data of the current time into the preset mixing strategy search model so as to update the data expansion mixing strategy until the training times of the preset mixing strategy search model reach the preset training times, and obtaining the optimal data expansion mixing strategy.
In one embodiment, the expanding the training data according to the data expansion mixing strategy to obtain expanded training data includes:
replacing any character in a sentence in the training data with a mask character by using the trained MLM model;
predicting characters corresponding to the mask characters according to the pre-trained language model to obtain predicted characters;
and if the confidence coefficient of the predicted character is larger than the preset threshold value, using the training data containing the predicted character as the expanded training data.
In one embodiment, the expanding the training data according to the data expansion mixing strategy to obtain expanded training data includes:
representing words in the training data as word vectors;
randomly representing byte segments of any sentence in the training data as target vectors;
calculating the similarity between the target vector and the word vector, and finding out the synonym vector of the target vector based on the similarity;
and replacing the byte segments with words corresponding to the synonym vectors to obtain the expanded training data.
In one embodiment, the expanding the training data according to the data expansion mixing strategy to obtain expanded training data includes:
generating, based on the training data, new training data by using a pre-trained generative model to obtain the expanded training data, wherein the pre-trained generative model is obtained by training on historical sentence data.
In one embodiment, generating new training data using the pre-trained generative model based on the training data to obtain the augmented training data comprises:
randomly removing byte fragments of any sentence in the training data to obtain a target sentence;
and for the removed byte segment in the target sentence, predicting corresponding new characters with the pre-trained generative model to obtain expanded training data.
In one embodiment, the step of inputting the policy feedback data of the current time to the preset hybrid policy search model to update the data expansion hybrid policy includes:
inputting the feedback data of the current time as return data to the preset hybrid strategy search model again, and updating the parameters of the preset hybrid strategy search model;
and generating a new data expansion mixing strategy based on the mixing strategy search model after the parameters are updated.
In one embodiment, the updating the parameters of the preset hybrid strategy search model includes:
and updating the parameters of the preset hybrid strategy search model according to a REINFORCE strategy gradient algorithm.
A data augmentation hybrid policy generation apparatus, the apparatus comprising:
the data acquisition module is used for acquiring strategy feedback data and training data of the current time;
the hybrid strategy acquisition module is used for inputting the strategy feedback data of the current time into a preset hybrid strategy search model to obtain a data expansion hybrid strategy of the current time;
the data expansion module is used for expanding the training data according to the data expansion mixing strategy to obtain expanded training data;
the strategy feedback data updating module is used for inputting the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixing strategy;
and the hybrid strategy updating module is used for taking the strategy feedback data corresponding to the data expansion hybrid strategy as the strategy feedback data of the current time, awakening the hybrid strategy acquisition module to execute the operation of inputting the strategy feedback data of the current time into the preset hybrid strategy search model so as to update the data expansion hybrid strategy until the training times of the preset hybrid strategy search model reach the preset training times, and obtaining the optimal data expansion hybrid strategy.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring strategy feedback data and training data of the current time;
inputting strategy feedback data of the current time into a preset hybrid strategy search model to obtain a data expansion hybrid strategy of the current time;
expanding the training data according to the data expansion mixing strategy to obtain expanded training data;
inputting the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixed strategy;
and taking the strategy feedback data corresponding to the data expansion mixing strategy as the strategy feedback data of the current time, returning to the step of inputting the strategy feedback data of the current time into the preset mixing strategy search model so as to update the data expansion mixing strategy until the training times of the preset mixing strategy search model reach the preset training times, and obtaining the optimal data expansion mixing strategy.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring strategy feedback data and training data of the current time;
inputting strategy feedback data of the current time into a preset hybrid strategy search model to obtain a data expansion hybrid strategy of the current time;
expanding the training data according to the data expansion mixing strategy to obtain expanded training data;
inputting the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixed strategy;
and taking the strategy feedback data corresponding to the data expansion mixing strategy as the strategy feedback data of the current time, returning to the step of inputting the strategy feedback data of the current time into the preset mixing strategy search model so as to update the data expansion mixing strategy until the training times of the preset mixing strategy search model reach the preset training times, and obtaining the optimal data expansion mixing strategy.
In the method, device, computer equipment, and storage medium for generating the data expansion mixing strategy, strategy feedback data is input into a preset mixing strategy search model to initially generate a data expansion mixing strategy, the training data is expanded according to the generated data expansion mixing strategy, the expanded training data is input into a preset recurrent neural network to update the strategy feedback data, and these steps are repeated so that the updated strategy feedback data is input into the preset mixing strategy search model to update the parameters of the mixing strategy search model, allowing the model to mature and yielding the optimal data expansion mixing strategy. With this scheme, the time consumed by strategy search can be reduced, the optimal data expansion mixing strategy can be constructed automatically from the training data, the precision and robustness of the model are improved, the efficiency of natural language data expansion is further improved, and labor and computation costs are saved.
Drawings
FIG. 1 is a diagram of an application environment of a data augmentation hybrid policy generation method in one embodiment;
FIG. 2 is a flow diagram illustrating a method for generating a hybrid strategy for data augmentation in one embodiment;
FIG. 3 is a flowchart illustrating the steps of augmenting training data according to a data augmentation hybrid strategy in one embodiment;
FIG. 4 is a flowchart illustrating another step of augmenting training data according to a data augmentation hybrid strategy;
FIG. 5 is a block diagram showing the structure of a data expansion mixing policy generation apparatus according to an embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The data expansion mixing strategy generation method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. Specifically, the user uploads training data and strategy feedback data constructed from natural language data to the server 104 through the terminal 102, performs the corresponding operation on the operation interface of the terminal 102, and sends a data expansion mixing strategy generation message to the server 104. In response to the message, the server 104 obtains the strategy feedback data and training data of the current time, inputs the strategy feedback data of the current time into a preset mixing strategy search model to obtain the data expansion mixing strategy of the current time, expands the training data according to the data expansion mixing strategy to obtain expanded training data, inputs the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixing strategy, takes the strategy feedback data corresponding to the data expansion mixing strategy as the strategy feedback data of the current time, and returns to the step of inputting the strategy feedback data of the current time into the preset mixing strategy search model to update the data expansion mixing strategy, until the number of training iterations of the preset mixing strategy search model reaches the preset number, thereby obtaining the optimal data expansion mixing strategy. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, or a portable wearable device, and the server 104 may be implemented by an independent server or by a server cluster formed of a plurality of servers.
In one embodiment, as shown in fig. 2, a data expansion mixing policy generation method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
step 202, obtaining strategy feedback data and training data of the current time.
The data enhancement strategy plays an important role in increasing the amount of training sample data, improving the stability and robustness of the model, and improving the adaptability and generalization of the model in the real world. In the data preparation phase, a training set and a development set (i.e., a validation set) are prepared. The present application aims to find the optimal data augmentation hybrid strategy by searching data augmentation (enhancement) strategies on the training data and testing their performance on the validation set through a feedback mechanism. At the beginning of algorithm execution, the strategy feedback data of the current time is the initial data expansion mixing strategy feedback data acquired in advance. The initial data expansion mixing strategy feedback data refers to feedback data obtained from the performance, on the development set, of the preset recurrent neural network after the training data has been expanded with a mixing strategy based on historical data. The training data is the data to be augmented and may be of different types. In this embodiment, a corresponding recurrent neural network for training on the training data is preset for each type of training data; if a certain type of training data is selected, the recurrent neural network for training that type of data is selected accordingly. For example, if the training data is a classification task data set, the recurrent neural network that trains the classification data may be a Text-CNN model for data classification, where the Text-CNN network parameters are shared during the data augmentation hybrid strategy search process.
Step 204, inputting the strategy feedback data of the current time into a preset hybrid strategy search model to obtain a data expansion hybrid strategy of the current time.
In this embodiment, the hybrid strategy search model is a controller composed of a recurrent neural network, and a plurality of data expansion sub-strategies are defined in the hybrid strategy search model. After the strategy feedback data of the current time is obtained, it is used as the input of the hybrid strategy search model; the hidden state of each step of the network is fed into a classifier, which determines one parameter of the hybrid strategy. The controller is randomly initialized and initially generates a data expansion mixing strategy at random.
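For illustration only, the following sketch shows one way such a controller could be organized: an LSTM cell whose hidden state at each step feeds a small classifier, one per hybrid-strategy parameter. The layer sizes, the decision order, and all names below are assumptions rather than details fixed by this embodiment.

```python
# A minimal sketch (not the patent's exact controller): an RNN controller that
# emits one hybrid-policy parameter per time step and feeds each sampled
# decision back in as the next input.
import torch
import torch.nn as nn


class PolicyController(nn.Module):
    def __init__(self, num_choices_per_step, hidden_size=64):
        super().__init__()
        # e.g. [10, 5, 10, 5, 10, 5] for (p_i, d_i) of three sub-strategies
        self.num_choices_per_step = num_choices_per_step
        self.rnn = nn.LSTMCell(hidden_size, hidden_size)
        self.embed = nn.Embedding(max(num_choices_per_step), hidden_size)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, n) for n in num_choices_per_step]
        )

    def sample_policy(self):
        """Sample one hybrid policy; return (choices, log_prob) for REINFORCE."""
        h = torch.zeros(1, self.rnn.hidden_size)
        c = torch.zeros(1, self.rnn.hidden_size)
        x = torch.zeros(1, self.rnn.hidden_size)  # dummy start input
        choices, log_probs = [], []
        for head in self.heads:
            h, c = self.rnn(x, (h, c))
            logits = head(h)                       # classifier on the hidden state
            dist = torch.distributions.Categorical(logits=logits)
            idx = dist.sample()
            choices.append(idx.item())
            log_probs.append(dist.log_prob(idx))
            x = self.embed(idx)                    # feed the decision back in
        return choices, torch.stack(log_probs).sum()
```

Calling `sample_policy()` returns both the sampled parameter indices and the summed log-probability that is later needed for the REINFORCE update.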
Step 206, expanding the training data according to the data expansion mixing strategy to obtain expanded training data.
After the data expansion mixing strategy is obtained, the training data is expanded according to the data expansion mixing strategy to obtain the expanded training data, which achieves the effect of updating the training data. Specifically, the data expansion mixing strategy may include any combination of sub-strategies such as back-translation of the data, generating a new sentence with a generative model, synonym replacement based on enhanced semantics, and predicted-character replacement.
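As a rough, hypothetical illustration of applying such a combination to the training set (the sub-strategy callables passed in, e.g. a back-translation or MLM-replacement function, are placeholders rather than operations named by the patent):

```python
# Illustrative only: apply a sampled hybrid policy, given as a list of
# (augment_fn, expansion_ratio p, influence_degree d) triples, to the data.
import random


def apply_hybrid_policy(train_data, policy):
    augmented = list(train_data)
    for augment_fn, p, d in policy:
        # p < 1: augment only a random fraction of the data once;
        # p >= 1: augment every example round(p) times.
        if p < 1:
            subset, repeats = random.sample(train_data, int(p * len(train_data))), 1
        else:
            subset, repeats = train_data, round(p)
        for sentence in subset:
            for _ in range(repeats):
                augmented.append(augment_fn(sentence, degree=d))
    return augmented
```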
Step 208, inputting the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixing strategy.
In this embodiment, the training data is a classification task data set, and the recurrent neural network corresponding to the classification task data set may be a classification network, specifically a Text-CNN network. After the data is expanded with the data expansion mixing strategy, the expanded data is used as new training data and input into the corresponding Text-CNN network, which is trained on the expanded training data for classification. The performance of the trained model on the development set is then used to obtain the feedback data: specifically, the model predicts labels for the development-set data, the predicted labels are compared with the standard answers, and a score such as accuracy is computed to obtain the strategy feedback data.
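A minimal sketch of turning the development-set comparison into a numeric feedback signal, assuming `model_predict` wraps the trained Text-CNN; plain accuracy is used here, and the embodiment only requires scoring "according to accuracy and the like":

```python
# Score the trained model on the development set; the score is the reward.
def policy_feedback(model_predict, dev_sentences, dev_gold_labels):
    predictions = [model_predict(s) for s in dev_sentences]
    correct = sum(p == g for p, g in zip(predictions, dev_gold_labels))
    return correct / len(dev_gold_labels)   # accuracy in [0, 1]
```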
Step 210, taking the strategy feedback data corresponding to the data expansion mixing strategy as the strategy feedback data of the current time, returning to step 204 to update the data expansion mixing strategy until the training times of the preset mixing strategy search model reach the preset training times, and obtaining the optimal data expansion mixing strategy.
In order to select the optimal data expansion mixing strategy, after the feedback data of the data expansion mixing strategy is obtained, the feedback data can be input into the mixing strategy search model again as the return (reward) of the data expansion mixing strategy of the current time, and the parameters of the mixing strategy search model are updated so as to update the data expansion mixing strategy of the current time. Then, according to the newly generated data expansion mixing strategy, the training data is expanded again, the expanded training data is input into the Text-CNN network again to obtain new strategy feedback data, and the new strategy feedback data is input into the mixing strategy search model again. These steps are repeated until the number of model training iterations reaches the preset number; training is then terminated, and the data expansion mixing strategy with the highest accuracy among the strategy feedback data (accuracy) obtained in each training round is selected as the optimal data expansion mixing strategy.
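Putting the pieces together, the search loop described in steps 202-210 might look like the sketch below; `decode_choices` and `train_and_evaluate` are hypothetical helpers standing in for the policy decoding and the Text-CNN training and development-set scoring, and the controller update follows the REINFORCE rule detailed later in this description.

```python
# Sketch of the outer search loop: sample a policy, augment, train and score,
# update the controller, and keep the best-scoring policy seen so far.
def search_best_policy(controller, optimizer, train_data, dev_data,
                       num_rounds=50):
    best_reward, best_policy = float("-inf"), None
    for _ in range(num_rounds):
        choices, log_prob = controller.sample_policy()
        policy = decode_choices(choices)                  # hypothetical decoder
        augmented = apply_hybrid_policy(train_data, policy)
        reward = train_and_evaluate(augmented, dev_data)  # hypothetical helper
        loss = -reward * log_prob                         # REINFORCE objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if reward > best_reward:
            best_reward, best_policy = reward, policy
    return best_policy
```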
In the method for generating the data expansion mixing strategy, strategy feedback data is input into a preset mixing strategy search model to generate a data expansion mixing strategy, the training data is expanded according to the generated data expansion mixing strategy, the expanded training data is input into a preset recurrent neural network to update the strategy feedback data, and these steps are repeated so that the updated strategy feedback data is input into the preset mixing strategy search model to update the parameters of the mixing strategy search model, allowing the model to mature and yielding the optimal data expansion mixing strategy. With this scheme, the time consumed by strategy search can be reduced, the optimal data expansion mixing strategy can be constructed automatically from the training data, the precision and robustness of the model are improved, the efficiency of natural language data expansion is further improved, and labor and computation costs are saved.
As shown in FIG. 3, in one embodiment, the hybrid strategy search model deploys a plurality of data expansion sub-strategies;
expanding the training data according to the data expansion mixing strategy, wherein the obtaining of the expanded training data comprises the following steps:
step 226, replacing any character in the sentence in the training data with a mask character by using the trained MLM model;
step 246, predicting characters corresponding to the mask characters according to the pre-trained language model to obtain predicted characters;
step 266, if the confidence of the predicted character is greater than the preset threshold, the training data containing the predicted character is used as the expanded training data.
In specific implementation, a plurality of data expansion sub-strategies are provided, and the generated data expansion mixing strategy is a combination of several of these sub-strategies. Specifically, the data expansion sub-strategies include a sentence-level expansion strategy that uses an MLM model. A trained MLM (Masked Language Model) may be used to convert a word in a sentence of the training data into a "[MASK]" character (mask character), the pre-trained language model predicts what should appear at the masked position to obtain a predicted character, and if the confidence of the predicted character is greater than 0.85, the new sentence containing the predicted character is kept as expanded training data. Specifically, the new character is predicted by the language model's pre-trained LM head (LMHead), which gives the probability that the word at the "[MASK]" position is each word in its vocabulary, and the word with the highest probability is taken as the predicted new character. In this embodiment, performing character replacement with an MLM model, unlike the traditional data expansion approach, allows new sentences to be generated quickly and achieves the effect of expanding the training data.
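A compact sketch of this sub-strategy using the Hugging Face fill-mask pipeline as a stand-in for the trained MLM; the model name and single-character masking are assumptions, while the 0.85 confidence threshold follows the embodiment:

```python
# Mask one character, let the MLM predict it, keep the sentence only if the
# top prediction is confident enough.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-chinese")  # assumed model


def mlm_augment(sentence, confidence_threshold=0.85):
    chars = list(sentence)
    i = random.randrange(len(chars))            # pick any character
    chars[i] = fill_mask.tokenizer.mask_token   # replace it with [MASK]
    best = fill_mask("".join(chars))[0]         # top prediction for the mask
    if best["score"] > confidence_threshold:
        return best["sequence"]                 # sentence with the predicted character
    return None                                 # discard low-confidence results
```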
As shown in fig. 4, in one embodiment, the expanding the training data according to the data expansion mixing strategy to obtain expanded training data includes:
step 216, representing words in the training data as word vectors;
step 236, randomly representing byte segments of any sentence in the training data as target vectors;
step 256, calculating the similarity between the target vector and the word vector, and finding out the synonym vector of the target vector based on the similarity;
step 276, replacing the byte segments with words corresponding to the synonym vector to obtain the expanded training data.
In particular, the data expansion sub-strategies further include a synonym replacement strategy based on enhanced semantics. Using this strategy to expand the training data may proceed as follows. First, a pre-training model is fine-tuned, the fine-tuning task being to determine whether two phrases (words) are synonyms. Then, all words in a preset knowledge base are represented as word vectors by using the pre-training model. Next, an n-gram in a sentence is randomly selected and represented as a vector by the pre-training model to obtain a target vector, and the preset knowledge base is searched with this target vector: the similarity between the target vector and the word vectors in the preset knowledge base is calculated to check whether there is a word vector similar to (synonymous with) the target vector. In this embodiment, if the similarity between the two vectors is greater than 0.95, they are regarded as synonyms, and the selected n-gram in the sentence can be replaced with the word corresponding to that word vector in the preset knowledge base to form a new sentence. It is understood that, in other embodiments, the similarity threshold may also be 0.96, 0.97, or another value, which may be determined according to the practical situation and is not limited here. In this embodiment, expanding the sentences in the training data in this way yields expanded training data and enriches the training data.
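The vector-similarity test could look like the following sketch, where `embed` stands in for the fine-tuned pre-training model and the knowledge base is just a word list; both are assumptions beyond what the text specifies, while the 0.95 threshold follows the embodiment:

```python
# Replace a selected n-gram with the most similar knowledge-base word,
# provided the cosine similarity exceeds the synonym threshold.
import numpy as np


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def synonym_replace(sentence, ngram, knowledge_base, embed, threshold=0.95):
    target_vec = embed(ngram)
    best_word, best_sim = None, threshold
    for word in knowledge_base:
        sim = cosine(target_vec, embed(word))
        if sim > best_sim:
            best_word, best_sim = word, sim
    if best_word is None:
        return None                       # no synonym above the threshold
    return sentence.replace(ngram, best_word, 1)
```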
In one embodiment, the expanding the training data according to the data expansion mixing strategy to obtain the expanded training data includes:
generating, based on the training data, new training data by using a pre-trained generative model to obtain the expanded training data, wherein the pre-trained generative model is obtained by training on historical sentence data.
In practical applications, the data expansion sub-strategies may further include a strategy that generates a new sentence with the generative model based on the sentences of the training data. In another embodiment, this may be done as follows: a byte segment of any sentence in the training data is randomly removed to obtain a target sentence, and for the removed byte segment in the target sentence, a pre-trained generative model predicts corresponding new characters to obtain the expanded training data. Specifically, a 3-gram (byte fragment) in the second half of a sentence in the training data may first be randomly selected and removed, and the pre-trained generative model then predicts 3 new characters for the removed 3-gram to form a new sentence. In this embodiment, expanding the sentences in the training data in this way generates new sentences, so that the expanded training data can be obtained quickly and the training data is enlarged.
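Sketched below under the assumption of a generic `generate(prefix, n)` wrapper around the pre-trained generative model (not an interface defined by the patent):

```python
# Drop an n-character segment from the second half of the sentence and let the
# generative model fill in n new characters.
import random


def generative_augment(sentence, generate, n=3):
    half = len(sentence) // 2
    if len(sentence) - half <= n:
        return None                                # sentence too short
    start = random.randrange(half, len(sentence) - n)
    prefix, suffix = sentence[:start], sentence[start + n:]
    new_chars = generate(prefix, n)                # predict n replacement characters
    return prefix + new_chars + suffix
```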
In one embodiment, the step of inputting the policy feedback data of the current time to the preset hybrid policy search model to update the data expansion hybrid policy includes:
inputting the feedback data of the current time as return data to the preset hybrid strategy search model again, and updating the parameters of the preset hybrid strategy search model;
and generating a new data expansion mixing strategy based on the mixing strategy search model after the parameters are updated.
In specific implementation, the feedback data of the current time can be obtained as follows: the data is expanded based on the data expansion mixing strategy of the historical time, the expanded data is used to train the network for one round (epoch) to obtain a training result (label data), the obtained labels are compared with the standard labels known in advance to obtain the accuracy, a score is computed based on the accuracy, and this score is used as the return (reward) of the data expansion mixing strategy. That is, the feedback data is input into the preset mixing strategy search model again, the parameters of the network are updated, and the network generates a new data expansion mixing strategy. In one embodiment, updating the parameters of the preset hybrid strategy search model includes updating the parameters of the preset hybrid strategy search model according to the REINFORCE policy gradient algorithm. Specifically, the parameter update of the preset hybrid strategy search model may proceed as follows. Assume the parameters of the preset hybrid strategy search model form a vector θ and the strategy is π(θ). The expected return is R = E[r | π(θ)], where r denotes the feedback data (i.e., the reward) of the current time. The gradient of the expected return with respect to the parameters is then ∇_θ R = E[r · ∇_θ log π(θ)], the gradient corresponding to π(θ). In practice it is approximated by the sample estimate (1/m) Σ_{k=1}^{m} r_k · ∇_θ log π(θ), so that the parameter update is θ ← θ + η · ∇_θ R, where η is the learning rate.
As its parameters are updated, the preset hybrid strategy search model gradually matures and generates better data expansion hybrid strategies. Cycling in this way for 50-80 epochs achieves good training; the number of training iterations can be adjusted according to the available resources. In general, one epoch corresponds to one pass over the training set, a model needs about 100 epochs for sufficient training, and an algorithm engineer typically spends around 200 epochs manually trying freely chosen data expansion strategies. Such a manually chosen strategy is usually far from optimal and typically leaves 3-4 points of accuracy unexploited. The hybrid data augmentation strategy has more than 1e+5 options, so naively training every possibility would require about 1e+5 × 100 epochs. In this embodiment, the parameters of the model are updated with the REINFORCE policy gradient algorithm, and after about 50 training rounds the hybrid strategy search model is well trained; the trained model then generates the optimal data expansion hybrid strategy. An optimized data expansion mixing strategy can therefore be obtained with less training time than an ordinary neural network would require, so that the model precision is significantly improved. Updating the parameters with the REINFORCE policy gradient algorithm fits the data better and trains the strategy quickly.
In practical applications, the data expansion sub-strategies are not limited to the three listed above. If the three types of data expansion sub-strategies are denoted s_0, s_1 and s_2, each strategy s_i is applied to the data with an expansion ratio p_i (0 ≤ p_i ≤ 5), meaning that the original training data is expanded p_i times using that strategy; when p_i < 1, only a randomly selected part of the data is expanded. Its influence degree d_i (0 < d_i ≤ 5) indicates how many words in a piece of data the expansion strategy affects; for example, d_0 = 2 means that two words in a sentence are randomly removed. Therefore, the two values p_i and d_i need to be determined for each sub-strategy. For convenience, p_i may be discretized into 10 equidistant values between 0 and 5. On this basis, a hybrid data expansion strategy composed of the three types of sub-strategies corresponds to (10 × 5)^3 = 125000, i.e. more than 1e+5, choices, so the model can provide a matched optimal data expansion strategy for different data task sets.
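For concreteness, a small sketch of this discretized search space; the exact grid values are an assumption, only the counts follow the text:

```python
# Three sub-strategies, each with an expansion ratio p_i (10 equidistant values
# in [0, 5]) and an influence degree d_i in {1, ..., 5}: (10 * 5) ** 3 policies.
from itertools import product

P_VALUES = [round(5 * k / 9, 2) for k in range(10)]   # 10 values from 0 to 5
D_VALUES = [1, 2, 3, 4, 5]

per_strategy = list(product(P_VALUES, D_VALUES))      # 50 settings per sub-strategy
search_space_size = len(per_strategy) ** 3            # three sub-strategies
print(search_space_size)                              # 125000, i.e. more than 1e+5
```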
It should be understood that although the steps in the flow charts of fig. 2-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and described and may be performed in other orders. Moreover, at least some of the steps in fig. 2-4 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided a data expansion mixing strategy generation apparatus, including: a data acquisition module 510, a hybrid policy acquisition module 520, a data expansion module 530, a policy feedback data update module 540, and a hybrid policy update module 550, wherein:
and a data obtaining module 510, configured to obtain the strategy feedback data and the training data at the current time.
The hybrid strategy obtaining module 520 is configured to input the strategy feedback data of the current time to the preset hybrid strategy search model, so as to obtain the data expansion hybrid strategy of the current time.
The data expansion module 530 is configured to expand the training data according to the data expansion hybrid strategy to obtain expanded training data.
And a strategy feedback data updating module 540, configured to input the expanded training data to a preset recurrent neural network for training, so as to obtain strategy feedback data corresponding to the data expansion hybrid strategy.
And a hybrid strategy updating module 550, configured to take the strategy feedback data corresponding to the data expansion hybrid strategy as the strategy feedback data at the current time, and wake up the hybrid strategy acquisition module to perform an operation of inputting the strategy feedback data at the current time to the preset hybrid strategy search model, so as to update the data expansion hybrid strategy until the training times of the preset hybrid strategy search model reach the preset training times, so as to obtain an optimal data expansion hybrid strategy.
In one embodiment, the data expansion module 530 is further configured to replace any character in a sentence in the training data with a mask character by using the trained MLM model, predict a character corresponding to the mask character according to the pre-trained language model to obtain a predicted character, and if the confidence of the predicted character is greater than a preset threshold, use the training data including the predicted character as the expanded training data.
In one embodiment, the data expansion module 530 is further configured to represent words in the training data as word vectors, randomly represent byte segments of any sentence in the training data as target vectors, calculate similarity between the target vectors and the word vectors, find out synonym vectors of the target vectors based on the similarity, replace the byte segments with words corresponding to the synonym vectors, and obtain expanded training data.
In one embodiment, the data expansion module 530 is further configured to generate new training data based on the training data by using a pre-trained generative model, which is obtained by training based on the historical sentence data, to obtain expanded training data.
In one embodiment, the data expansion module 530 is further configured to randomly remove byte segments of any sentence in the training data to obtain a target sentence, and predict, by using a pre-trained generation model, a corresponding new character for the removed byte segments in the target sentence to obtain the expanded training data.
In one embodiment, the hybrid policy updating module 550 is further configured to input the feedback data of the current time as the reward data to the preset hybrid policy search model again, and update the parameters of the preset hybrid policy search model; and generating a new data expansion mixing strategy based on the mixing strategy search model after the parameters are updated.
In one embodiment, the blending strategy updating module 550 is further configured to update the parameters of the preset blending strategy search model according to a REINFORCE strategy gradient algorithm.
For specific limitations of the data expansion mixing strategy generation apparatus, reference may be made to the above limitations of the data expansion mixing strategy generation method, which are not repeated here. Each module in the data expansion mixing strategy generation apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing feedback data, training data, a hybrid strategy search model and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data augmentation hybrid policy generation method.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: obtaining strategy feedback data and training data of the current time, inputting the strategy feedback data of the current time to a preset mixed strategy search model to obtain a data expansion mixed strategy of the current time, expanding the training data according to the data expansion mixed strategy to obtain expanded training data, inputting the expanded training data to a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixed strategy, taking the strategy feedback data corresponding to the data expansion mixed strategy as the strategy feedback data of the current time, returning to the step of inputting the strategy feedback data of the current time to the preset mixed strategy search model to update the data expansion mixed strategy until the training times of the preset mixed strategy search model reach the preset training times to obtain an optimal data expansion mixed strategy.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and replacing any character in a sentence in the training data with a mask character by using a trained MLM model, predicting the character corresponding to the mask character according to the pre-trained language model to obtain a predicted character, and taking the training data containing the predicted character as the expanded training data if the confidence coefficient of the predicted character is greater than a preset threshold value.
In one embodiment, the processor, when executing the computer program, further performs the steps of: representing words in the training data as word vectors, randomly representing byte segments of any sentence in the training data as target vectors, calculating the similarity between the target vectors and the word vectors, finding out synonym vectors of the target vectors based on the similarity, and replacing the byte segments with words corresponding to the synonym vectors to obtain the expanded training data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: generating, based on the training data, new training data by using a pre-trained generative model to obtain expanded training data, wherein the pre-trained generative model is obtained by training on historical sentence data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and randomly removing byte segments of any sentence in the training data to obtain a target sentence, and predicting a corresponding new character by adopting a pre-training generation model aiming at the removed byte segments in the target sentence to obtain the expanded training data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and inputting the feedback data of the current time as return data into the preset hybrid strategy search model again, updating the parameters of the preset hybrid strategy search model, and generating a new data expansion hybrid strategy based on the hybrid strategy search model after the parameters are updated.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and updating the parameters of the preset hybrid strategy search model according to a REINFORCE strategy gradient algorithm.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: obtaining strategy feedback data and training data of the current time, inputting the strategy feedback data of the current time to a preset mixed strategy search model to obtain a data expansion mixed strategy of the current time, expanding the training data according to the data expansion mixed strategy to obtain expanded training data, inputting the expanded training data to a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixed strategy, taking the strategy feedback data corresponding to the data expansion mixed strategy as the strategy feedback data of the current time, returning to the step of inputting the strategy feedback data of the current time to the preset mixed strategy search model to update the data expansion mixed strategy until the training times of the preset mixed strategy search model reach the preset training times to obtain an optimal data expansion mixed strategy.
In one embodiment, the computer program when executed by the processor further performs the steps of: and replacing any character in a sentence in the training data with a mask character by using a trained MLM model, predicting the character corresponding to the mask character according to the pre-trained language model to obtain a predicted character, and taking the training data containing the predicted character as the expanded training data if the confidence coefficient of the predicted character is greater than a preset threshold value.
In one embodiment, the computer program when executed by the processor further performs the steps of: representing words in the training data as word vectors, randomly representing byte segments of any sentence in the training data as target vectors, calculating the similarity between the target vectors and the word vectors, finding out synonym vectors of the target vectors based on the similarity, and replacing the byte segments with words corresponding to the synonym vectors to obtain the expanded training data.
In one embodiment, the computer program when executed by the processor further performs the steps of: generating, based on the training data, new training data by using a pre-trained generative model to obtain expanded training data, wherein the pre-trained generative model is obtained by training on historical sentence data.
In one embodiment, the computer program when executed by the processor further performs the steps of: and randomly removing byte segments of any sentence in the training data to obtain a target sentence, and predicting a corresponding new character by adopting a pre-training generation model aiming at the removed byte segments in the target sentence to obtain the expanded training data.
In one embodiment, the computer program when executed by the processor further performs the steps of: and inputting the feedback data of the current time as return data into the preset hybrid strategy search model again, updating the parameters of the preset hybrid strategy search model, and generating a new data expansion hybrid strategy based on the hybrid strategy search model after the parameters are updated.
In one embodiment, the computer program when executed by the processor further performs the steps of: and updating the parameters of the preset hybrid strategy search model according to a REINFORCE strategy gradient algorithm.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above-mentioned embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A data expansion mixing strategy generation method is characterized by comprising the following steps:
acquiring strategy feedback data and training data of the current time;
inputting the strategy feedback data of the current time into a preset hybrid strategy search model to obtain a data expansion hybrid strategy of the current time;
expanding the training data according to the data expansion mixing strategy to obtain expanded training data;
inputting the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixed strategy;
and taking the strategy feedback data corresponding to the data expansion mixing strategy as the strategy feedback data of the current time, returning to the step of inputting the strategy feedback data of the current time into the preset mixing strategy search model so as to update the data expansion mixing strategy until the training times of the preset mixing strategy search model reach the preset training times, and obtaining the optimal data expansion mixing strategy.
2. The method of claim 1, wherein said augmenting the training data according to the data augmentation hybrid strategy to obtain augmented training data comprises:
replacing any character in a sentence in the training data with a mask character by using a trained MLM model;
predicting characters corresponding to the mask characters according to a pre-trained language model to obtain predicted characters;
and if the confidence coefficient of the predicted character is larger than a preset threshold value, using the training data containing the predicted character as the expanded training data.
3. The method of claim 1, wherein said augmenting the training data according to the data augmentation hybrid strategy to obtain augmented training data comprises:
representing words in the training data as word vectors;
randomly representing byte segments of any sentence in the training data as target vectors;
calculating the similarity between the target vector and the word vector, and finding out the synonym vector of the target vector based on the similarity;
and replacing the byte segments with words corresponding to the synonym vector to obtain expanded training data.
4. The method of claim 1, wherein said augmenting the training data according to the data augmentation hybrid strategy to obtain augmented training data comprises:
and generating new training data by using a pre-trained generated model based on the training data to obtain expanded training data, wherein the pre-trained generated model is obtained by training based on historical sentence data.
5. The method of claim 4, wherein generating new training data using a pre-trained generative model based on the training data, and obtaining augmented training data comprises:
randomly removing byte fragments of any sentence in the training data to obtain a target sentence;
and predicting a corresponding new character by adopting a pre-trained generation model aiming at the removed byte segments in the target sentence to obtain expanded training data.
6. The method of claim 1, wherein the step of inputting the strategy feedback data of the current time into the preset hybrid strategy search model to update the data-augmented hybrid strategy comprises:
inputting the feedback data of the current time as return data to the preset hybrid strategy search model again, and updating the parameters of the preset hybrid strategy search model;
and generating a new data expansion mixing strategy based on the mixing strategy search model after the parameters are updated.
7. The method of claim 6, wherein updating the parameters of the preset blending strategy search model comprises:
and updating the parameters of the preset hybrid strategy search model according to a REINFORCE strategy gradient algorithm.
8. A data augmentation hybrid policy generation apparatus, the apparatus comprising:
the data acquisition module is used for acquiring strategy feedback data and training data of the current time;
the hybrid strategy acquisition module is used for inputting the strategy feedback data of the current time into a preset hybrid strategy search model to obtain a data expansion hybrid strategy of the current time;
the data expansion module is used for expanding the training data according to the data expansion mixing strategy to obtain expanded training data;
the strategy feedback data updating module is used for inputting the expanded training data into a preset recurrent neural network for training to obtain strategy feedback data corresponding to the data expansion mixing strategy;
and the hybrid strategy updating module is used for taking the strategy feedback data corresponding to the data expansion hybrid strategy as the strategy feedback data of the current time, awakening the hybrid strategy acquisition module to execute the operation of inputting the strategy feedback data of the current time into a preset hybrid strategy search model so as to update the data expansion hybrid strategy until the training times of the preset hybrid strategy search model reach the preset training times, and obtaining the optimal data expansion hybrid strategy.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010686538.8A 2020-07-16 2020-07-16 Data expansion mixing strategy generation method and device and computer equipment Active CN111931492B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010686538.8A CN111931492B (en) 2020-07-16 Data expansion mixing strategy generation method and device and computer equipment
PCT/CN2020/118140 WO2021139233A1 (en) 2020-07-16 2020-09-27 Method and apparatus for generating data extension mixed strategy, and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010686538.8A CN111931492B (en) 2020-07-16 Data expansion mixing strategy generation method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN111931492A true CN111931492A (en) 2020-11-13
CN111931492B CN111931492B (en) 2024-07-02


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177402A (en) * 2021-04-26 2021-07-27 平安科技(深圳)有限公司 Word replacement method and device, electronic equipment and storage medium
CN113268996A (en) * 2021-06-02 2021-08-17 网易有道信息技术(北京)有限公司 Method for expanding corpus, training method for translation model and product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200034749A1 (en) * 2018-07-26 2020-01-30 International Business Machines Corporation Training corpus refinement and incremental updating
CN110807109A (en) * 2019-11-08 2020-02-18 北京金山云网络技术有限公司 Data enhancement strategy generation method, data enhancement method and device
CN110852438A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Model generation method and device
CN111127364A (en) * 2019-12-26 2020-05-08 吉林大学 Image data enhancement strategy selection method and face recognition image data enhancement method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200034749A1 (en) * 2018-07-26 2020-01-30 International Business Machines Corporation Training corpus refinement and incremental updating
CN110807109A (en) * 2019-11-08 2020-02-18 北京金山云网络技术有限公司 Data enhancement strategy generation method, data enhancement method and device
CN110852438A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Model generation method and device
CN111127364A (en) * 2019-12-26 2020-05-08 吉林大学 Image data enhancement strategy selection method and face recognition image data enhancement method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Rowel Atienza, "Advanced Deep Learning with Keras" (Chinese edition, translated by Cai Lei, Pan Huaxian, Cheng Guojian), China Machine Press, 31 March 2020, pages: 247 - 250 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177402A (en) * 2021-04-26 2021-07-27 平安科技(深圳)有限公司 Word replacement method and device, electronic equipment and storage medium
WO2022227166A1 (en) * 2021-04-26 2022-11-03 平安科技(深圳)有限公司 Word replacement method and apparatus, electronic device, and storage medium
CN113177402B (en) * 2021-04-26 2024-03-01 Ping An Technology (Shenzhen) Co., Ltd. Word replacement method, device, electronic equipment and storage medium
CN113268996A (en) * 2021-06-02 2021-08-17 网易有道信息技术(北京)有限公司 Method for expanding corpus, training method for translation model and product

Also Published As

Publication number Publication date
WO2021139233A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
CN112288075B (en) Data processing method and related equipment
CN111226222A (en) Depth context based syntax error correction using artificial neural networks
CN109284397A (en) A kind of construction method of domain lexicon, device, equipment and storage medium
US11347995B2 (en) Neural architecture search with weight sharing
CN109325242B (en) Method, device and equipment for judging whether sentences are aligned based on word pairs and translation
CN113157897B (en) Corpus generation method, corpus generation device, computer equipment and storage medium
CN113407709A (en) Generative text summarization system and method
US20220383119A1 (en) Granular neural network architecture search over low-level primitives
Shi et al. Learning where to sample in structured prediction
US11481609B2 (en) Computationally efficient expressive output layers for neural networks
CN114881035A (en) Method, device, equipment and storage medium for augmenting training data
CN114398899A (en) Training method and device for pre-training language model, computer equipment and medium
CN111709229A (en) Text generation method and device based on artificial intelligence, computer equipment and medium
CN113297355A (en) Method, device, equipment and medium for enhancing labeled data based on countermeasure interpolation sequence
CN113626610A (en) Knowledge graph embedding method and device, computer equipment and storage medium
CN115879450B (en) Gradual text generation method, system, computer equipment and storage medium
CN111931492A (en) Data expansion mixing strategy generation method and device and computer equipment
CN111931492B (en) Data expansion mixing strategy generation method and device and computer equipment
CN114595681B (en) Text segmentation method and device
WO2023059831A1 (en) Using memory to augment self-attention in neural networks
US20200184349A1 (en) System and method for designing machine and deep learning models for an embedded platform
Horzyk et al. Associative text representation and correction
CN110866395B (en) Word vector generation method and device based on translator editing behaviors
CN112000777A (en) Text generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant