CN115098623A - Physical training data feature extraction method based on BERT

Physical training data feature extraction method based on BERT

Info

Publication number
CN115098623A
Authority
CN
China
Prior art keywords
training
bert
population
physical
physical training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210628842.6A
Other languages
Chinese (zh)
Inventor
刘翔 (Liu Xiang)
何晓 (He Xiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Shipbuilding Corp System Engineering Research Institute
Original Assignee
China Shipbuilding Corp System Engineering Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Shipbuilding Corp System Engineering Research Institute
Priority to CN202210628842.6A
Publication of CN115098623A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention provides a physical training data feature extraction method based on BERT. 80% of the daily training data of people who perform physical training is used as the training set of a model, 10% as the validation set and 10% as the test set, and K-fold cross-validation is performed. In the BERT part, the embeddings of the physical training items are pre-trained with word2vec and similar training sessions are clustered; the MLM is improved by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information during training. The fusion part fuses the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets preset conditions into the network. The output part classifies and predicts the age, sex, physiological-state and psychological-state distributions of the trainees and outputs the probability of the user categories that can achieve effective training with different training regimens. The model is trained until it converges.

Description

Physical training data feature extraction method based on BERT
Technical Field
The invention relates to the technical field of data processing, in particular to a physical training data feature extraction method based on BERT.
Background
With rising living standards and growing public fitness awareness, the number of people engaged in physical training keeps increasing, and physical training data of all kinds are accumulating rapidly. Only by performing feature extraction on a large volume of physical training data and mapping the high-dimensional raw data onto a low-dimensional manifold can the data become separable for classification and further analysis.
The journal article "Application of data mining technology in physical training of fire-fighting troops" (Fire Technique and Product Information, 2017(08): 74-77) uses a fuzzy classification method to classify trainees systematically and provide a basis for classified training, and combines the AHP method with the trainees' scores and standards on different items to perform a decomposed calculation.
However, in the technical solution of the above paper, both the fuzzy classification method and the AHP method rely on the subjective judgment of experts to determine key features. Subjective factors therefore carry excessive weight, implicit knowledge is difficult to mine, and scientific, effective feature extraction cannot be achieved, which degrades the quality of subsequent data analysis.
Disclosure of Invention
The embodiment of the invention provides a physical training data feature extraction method based on BERT. Through feature engineering, the corresponding key features are extracted according to the differences in the time-series physical training data, the feature dimension of the training data is reduced, the difficulty of analyzing physical training data is thereby lowered, the efficiency and effectiveness of data analysis are improved, and prediction of the training target effect is finally realized.
The embodiment of the invention provides a physical training data feature extraction method based on BERT, in which 80% of the daily training data of people who perform physical training is used as the training set of a model, 10% as the validation set and 10% as the test set, and K-fold cross-validation is performed. The method comprises the following parts:
a BERT part, which pre-trains the embeddings of the physical training items with word2vec and clusters similar training sessions, and which improves the MLM by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information during training;
a fusion part, which fuses the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets preset conditions into the network;
an output part, which classifies and predicts the age, sex, physiological-state and psychological-state distributions of the trainees and outputs the probability of the user categories that can achieve effective training with different training regimens;
the model is trained until it converges, whereupon training ends and the BERT part of the model serves as the training feature extractor.
In some embodiments of the present invention, fusing the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets the preset conditions into the network includes:
dividing the population that performs physical training and meets the preset conditions into a plurality of parts;
if a trainee has completed multiple training sessions and belongs to the first part of the population, using the remaining parts as the seed population, computing the age, sex, physiological-state and psychological-state distributions of each training session, and fusing them with a 12-layer Transformer.
In some embodiments of the present invention, the population that performs physical training and meets the preset conditions is divided into a plurality of parts, specifically:
the population that performs physical training and meets the preset conditions is divided into 64 parts.
In some embodiments of the present invention, the fusion with the 12-layer Transformer is specifically:
encoding with a BERT model containing 12 Transformer layers as the encoder, where each Transformer layer takes the hidden states of the previous layer as input and outputs hidden states with a higher degree of information interaction.
In some embodiments of the present invention, the preset condition is that a preset training duration is reached and training is performed at a preset frequency.
In some embodiments of the present invention, the BERT-based physical training data feature extraction method further includes:
when the improved MLM is used for prediction during training, using both the ID of the most recent specific training item and its attributes for prediction, so that the embeddings of different attributes are aligned in the same semantic space.
The BERT-based physical training data feature extraction method provided by the embodiment of the invention has the following advantages: after the physical training data are received, features are extracted with an improved BERT method. Each training session is treated as a word and a user's training sequence as a sentence, so the original problem is converted into a text classification problem in the NLP (natural language processing) field. With the help of the mature embedding methods of that field, the training-effect prediction problem is converted into a classification problem over the categories of trainees with stable training habits, thereby achieving both physical training data feature extraction and training-effect prediction. A BERT-based physical training feature extraction method is thus established, and the mature and effective language feature extraction capability of BERT enables effective extraction of physical training data features.
Drawings
FIG. 1 is a flowchart of the BERT-based physical training data feature extraction method according to an embodiment of the present invention;
FIG. 2 is a block diagram of the BERT-based physical training data feature extraction method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention is further described below with reference to the accompanying drawings and specific embodiments.
The phrases "in one embodiment," "in another embodiment," "in yet another embodiment," "in an embodiment," "in some embodiments," or "in other embodiments" may be used in this specification to refer to one or more of the same or different embodiments in accordance with the invention.
Specific embodiments of the present invention are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely examples of the invention, which can be embodied in various forms. Well-known and/or repeated functions and structures are not described in detail so as not to obscure the present invention unnecessarily. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to employ the present invention in virtually any appropriately detailed structure.
The embodiment of the invention provides a physical training data feature extraction method based on BERT. As shown in FIG. 1 and FIG. 2, the method uses 80% of the daily training data of a population performing physical training as the training set of a model, 10% as the validation set and 10% as the test set, performs K-fold cross-validation, and comprises the following parts:
a BERT (Bidirectional Encoder Representations from Transformers) part, which pre-trains the embeddings of the physical training items with word2vec and clusters similar training sessions, and which improves the MLM by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information during training;
a fusion part, which fuses the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets preset conditions into the network;
an output part, which classifies and predicts the age, sex, physiological-state and psychological-state distributions of the trainees and outputs the probability of the user categories that can achieve effective training with different training regimens;
the model is trained until it converges, whereupon training ends and the BERT part of the model serves as the training feature extractor.
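By way of illustration only, the following minimal Python sketch shows one way to perform the 80%/10%/10% split and K-fold cross-validation described above; the record format and function names are assumptions, not part of the patented method.

```python
import random
from sklearn.model_selection import KFold

def split_dataset(records, seed=42):
    """Split daily training records into 80% training, 10% validation, 10% test."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n = len(records)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (records[:n_train],
            records[n_train:n_train + n_val],
            records[n_train + n_val:])

def k_fold_splits(train_records, k=5):
    """Yield (train_indices, validation_indices) pairs for K-fold cross-validation."""
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    yield from kf.split(train_records)
```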
In some embodiments of the present invention, fusing the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets the preset conditions into the network includes:
dividing the population that performs physical training and meets the preset conditions into a plurality of parts;
if a trainee has completed multiple training sessions and belongs to the first part of the population, using the remaining parts as the seed population, computing the age, sex, physiological-state and psychological-state distributions of each training session, and fusing them with a 12-layer Transformer.
In some embodiments of the present invention, the population that performs physical training and meets the preset conditions is divided into a plurality of parts, specifically:
the population that performs physical training and meets the preset conditions is divided into 64 parts.
In some embodiments of the present invention, the fusion with the 12-layer Transformer is specifically:
encoding with a BERT model containing 12 Transformer layers as the encoder, where each Transformer layer takes the hidden states of the previous layer as input and outputs hidden states with a higher degree of information interaction.
In some embodiments of the present invention, the predetermined condition is that a predetermined training duration is satisfied and training is performed at a predetermined frequency.
In some embodiments of the present invention, the BERT-based physical training data feature extraction method further includes:
when the improved MLM is used for prediction during training, using both the ID of the most recent specific training item and its attributes for prediction, so that the embeddings of different attributes are aligned in the same semantic space.
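The following sketch illustrates, under an assumed record and vocabulary convention, how the improved masking could be implemented: rather than masking a whole training record, either the item ID or a single attribute is masked, and the model is asked to predict the masked piece from the remaining fields.

```python
import random

MASK_ID = 0  # reserved mask token id (an assumed vocabulary convention)

def mask_record(record, p_mask=0.15, rng=None):
    """Mask part of a training record: either the item id or one attribute field.

    `record` is a dict such as {"item_id": 17, "duration": 3, "intensity": 2}.
    Returns (masked_record, masked_field, original_value); the field is None
    if the record was left untouched.
    """
    rng = rng or random.Random(0)
    masked = dict(record)
    if rng.random() >= p_mask:
        return masked, None, None
    field = rng.choice(list(masked))   # mask the training item OR a single attribute, never the whole record
    original = masked[field]
    masked[field] = MASK_ID
    return masked, field, original
```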
According to the above technical scheme, the BERT-based physical training data feature extraction method provided by the embodiment of the invention has the following advantages: after the physical training data are received, features are extracted with an improved BERT method. Each training session is treated as a word and a user's training sequence as a sentence, so the original problem is converted into a text classification problem in the NLP (natural language processing) field. With the help of the mature embedding methods of that field, the training-effect prediction problem is converted into a classification problem over the categories of trainees with stable training habits, thereby achieving both physical training data feature extraction and training-effect prediction. A BERT-based physical training feature extraction method is thus established, and the mature and effective language feature extraction capability of BERT enables effective extraction of physical training data features.
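To make the "each training session is a word, a user's training history is a sentence" conversion concrete, the sketch below builds per-user token sequences and pre-trains item embeddings with gensim's word2vec before clustering similar items; the field names, dimensions and cluster count are assumptions, not values fixed by the invention.

```python
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

def build_sentences(daily_logs):
    """Turn each user's chronological training history into a 'sentence' of item tokens.

    `daily_logs` maps user_id -> list of (date, item_name) pairs.
    """
    sentences = []
    for log in daily_logs.values():
        ordered = sorted(log)                          # chronological order
        sentences.append([item for _, item in ordered])
    return sentences

def pretrain_item_embeddings(sentences, dim=128, n_clusters=10):
    """Pre-train training-item embeddings with word2vec and cluster similar items."""
    w2v = Word2Vec(sentences, vector_size=dim, window=5, min_count=1, workers=4)
    items = list(w2v.wv.index_to_key)
    vectors = [w2v.wv[item] for item in items]
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    return w2v, dict(zip(items, labels))
```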
To describe the above technical solution in detail, a specific example is given below. 80% of the daily training data of a population that performs physical training regularly is selected as the training set of the model, 10% as the validation set and 10% as the test set, and K-fold cross-validation is performed. FIG. 1 illustrates the model of the technical solution of the present invention. As shown in FIG. 1, the physical training feature extraction method of the present invention includes the following steps:
(1) BERT part: each training session is treated as a word and the training sequence of a user as a sentence, so the original problem can be converted into a text classification problem in the field of NLP (Natural Language Processing). A BERT model containing 12 Transformer layers is used as the encoder; each Transformer layer takes the hidden states of the previous layer as input and finally outputs hidden states with a higher degree of information interaction. BERT is trained with the following two-stage pre-training method: first, the embeddings of the physical training items are pre-trained with word2vec and similar training sessions are clustered; second, the MLM is improved by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information (the training item or its attributes). When the improved MLM (Masked Language Model) is used for prediction during training, not only the ID of the most recent specific item but also its attributes can be used for prediction, so that the embeddings of different attributes are aligned in the same semantic space.
(2) Fusion part: the age, sex, physiological-state and psychological-state distributions of the population that can perform physical training stably over a long period are fused into the network. The population is divided into 64 parts; if a trainee has completed n training sessions and belongs to the first part, the remaining parts can be used as the seed population, the age, sex, physiological-state and psychological-state distributions of each training session are computed, and a 12-layer Transformer is used for fusion.
(3) Output part: the age, sex, physiological-state and psychological-state distributions of the trainees are classified and predicted, and the probability of the user categories that can achieve effective training with different training regimens is output.
The model is trained until it converges, whereupon training ends and the BERT part of the model serves as the training feature extractor.
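As a closing illustration, the sketch below combines the pieces: the hypothetical `TrainingSequenceEncoder` defined earlier stands in for the BERT part, the fused demographic distributions enter as an extra feature vector, and a linear head outputs user-category probabilities; after convergence the encoder alone is reused as the training feature extractor. The pooling choice, dimensions and names are assumptions rather than details fixed by the invention.

```python
import torch
import torch.nn as nn

class TrainingEffectClassifier(nn.Module):
    """Output part: pools the encoder's hidden states, concatenates the fused
    demographic features, and outputs per-category probabilities of effective training."""

    def __init__(self, encoder, demo_dim, n_categories):
        super().__init__()
        self.encoder = encoder                         # the BERT part (feature extractor)
        hidden = encoder.tok.embedding_dim
        self.head = nn.Linear(hidden + demo_dim, n_categories)

    def forward(self, item_ids, demo_feats, pad_mask=None):
        h = self.encoder(item_ids, pad_mask)           # (batch, seq_len, hidden)
        pooled = h.mean(dim=1)                         # simple mean pooling (an assumption)
        logits = self.head(torch.cat([pooled, demo_feats], dim=-1))
        return torch.softmax(logits, dim=-1)           # user-category probabilities

# After training to convergence, the encoder alone yields the training features:
#   features = model.encoder(item_ids, pad_mask).mean(dim=1)
```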
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (6)

1. A BERT-based physical training data feature extraction method, characterized in that 80% of the daily training data of a population performing physical training is used as the training set of a model, 10% as the validation set and 10% as the test set, and K-fold cross-validation is performed, the method comprising the following parts:
a BERT part, which pre-trains the embeddings of the physical training items with word2vec and clusters similar training sessions, and which improves the MLM by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information during training;
a fusion part, which fuses the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets preset conditions into the network;
an output part, which classifies and predicts the age, sex, physiological-state and psychological-state distributions of the trainees and outputs the probability of the user categories that can achieve effective training with different training regimens;
and training the model until it converges, whereupon training ends and the BERT part of the model serves as the training feature extractor.
2. The BERT-based physical training data feature extraction method according to claim 1, wherein fusing the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets the preset conditions into the network comprises:
dividing the population that performs physical training and meets the preset conditions into a plurality of parts;
if a trainee has completed multiple training sessions and belongs to the first part of the population, using the remaining parts as the seed population, computing the age, sex, physiological-state and psychological-state distributions of each training session, and fusing them with a 12-layer Transformer.
3. The BERT-based physical training data feature extraction method according to claim 2, wherein the population that performs physical training and meets the preset conditions is divided into a plurality of parts, specifically:
the population that performs physical training and meets the preset conditions is divided into 64 parts.
4. The BERT-based physical training data feature extraction method according to claim 3, wherein the fusion with the 12-layer Transformer is specifically:
encoding with a BERT model containing 12 Transformer layers as the encoder, where each Transformer layer takes the hidden states of the previous layer as input and outputs hidden states with a higher degree of information interaction.
5. The BERT-based physical training data feature extraction method according to claim 4, wherein
the preset condition is that a preset training duration is reached and training is performed at a preset frequency.
6. The method of claim 5, further comprising:
when the improved MLM is used for prediction during training, using both the ID of the most recent specific training item and its attributes for prediction, so that the embeddings of different attributes are aligned in the same semantic space.
CN202210628842.6A 2022-06-06 2022-06-06 Physical training data feature extraction method based on BERT Pending CN115098623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210628842.6A CN115098623A (en) 2022-06-06 2022-06-06 Physical training data feature extraction method based on BERT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210628842.6A CN115098623A (en) 2022-06-06 2022-06-06 Physical training data feature extraction method based on BERT

Publications (1)

Publication Number Publication Date
CN115098623A true CN115098623A (en) 2022-09-23

Family

ID=83289393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210628842.6A Pending CN115098623A (en) 2022-06-06 2022-06-06 Physical training data feature extraction method based on BERT

Country Status (1)

Country Link
CN (1) CN115098623A (en)

Citations (8)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647612A (en) * 2019-09-18 2020-01-03 合肥工业大学 Visual conversation generation method based on double-visual attention network
US20210089724A1 (en) * 2019-09-25 2021-03-25 Google Llc Contrastive Pre-Training for Language Tasks
US20220139384A1 (en) * 2020-11-03 2022-05-05 Salesforce.Com, Inc. System and methods for training task-oriented dialogue (tod) language models
CN112732874A (en) * 2021-01-15 2021-04-30 北京明略昭辉科技有限公司 Training method and device for user identification call-through model
CN113434699A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Pre-training method of BERT model, computer device and storage medium
CN113723106A (en) * 2021-07-29 2021-11-30 北京工业大学 Zero sample text classification method based on label extension
CN114218921A (en) * 2021-11-30 2022-03-22 中国医学科学院医学信息研究所 Problem semantic matching method for optimizing BERT
CN114580418A (en) * 2022-03-04 2022-06-03 河北师范大学 Knowledge map system for police physical training

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张媛媛 (ZHANG Yuanyuan): "Application of data mining technology in physical training of fire-fighting troops" (数据挖掘技术在消防部队体能训练中的应用), Fire Technique and Product Information (消防技术与产品信息), no. 08, 15 August 2017 (2017-08-15) *
徐菲菲 (XU Feifei); 冯东升 (FENG Dongsheng): "Research on text word vectors and pre-trained language models" (文本词向量与预训练语言模型研究), Journal of Shanghai University of Electric Power (上海电力大学学报), no. 04, 15 August 2020 (2020-08-15) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination