CN115098623A - Physical training data feature extraction method based on BERT

Physical training data feature extraction method based on BERT

Info

Publication number
CN115098623A
Authority
CN
China
Prior art keywords
training
bert
population
physical
physical training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210628842.6A
Other languages
Chinese (zh)
Inventor
刘翔 (Liu Xiang)
何晓 (He Xiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Shipbuilding Corp System Engineering Research Institute
Original Assignee
China Shipbuilding Corp System Engineering Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Shipbuilding Corp System Engineering Research Institute
Priority to CN202210628842.6A
Publication of CN115098623A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention provides a physical training data feature extraction method based on BERT. 80% of the daily training data of people who perform physical training is used as the training set of a model, 10% as the validation set and 10% as the test set, and K-fold cross-validation is performed. In the BERT part, the embeddings of the physical training items are pre-trained with word2vec and similar training sessions are clustered; the MLM is improved by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information during training. The fusion part fuses the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets preset conditions into the network. The output part classifies and predicts the age, sex, physiological-state and psychological-state distributions of the trainees and outputs the probability of the user categories that can achieve effective training with different training regimens. The model is trained until it converges.

Description

Physical training data feature extraction method based on BERT
Technical Field
The invention relates to the technical field of data processing, in particular to a physical training data feature extraction method based on BERT.
Background
With rising living standards and growing public fitness awareness, the number of people engaged in physical training keeps increasing, and physical training data of all kinds are accumulating rapidly. Only by performing feature extraction on a large volume of physical training data and mapping the high-dimensional raw data onto a low-dimensional manifold can the data become separable for classification and further analysis.
The journal article "Application of data mining technology in physical training of fire-fighting troops" (Fire Technique and Product Information, 2017(08): 74-77) uses a fuzzy classification method to classify trainees systematically and provide a basis for classified training, and combines the AHP method with the trainees' scores and standards on different items to perform a decomposed calculation.
However, in the technical solution of the above paper, both the fuzzy classification method and the AHP method rely on the subjective judgment of experts to determine key features. Subjective factors therefore carry excessive weight, implicit knowledge is difficult to mine, and scientific, effective feature extraction cannot be achieved, which degrades the quality of subsequent data analysis.
Disclosure of Invention
The embodiment of the invention provides a physical training data feature extraction method based on BERT. Through feature engineering, the corresponding key features are extracted according to the differences in the time-series physical training data, the feature dimension of the training data is reduced, the difficulty of analyzing physical training data is thereby lowered, the efficiency and effectiveness of data analysis are improved, and prediction of the training target effect is finally realized.
The embodiment of the invention provides a physical training data feature extraction method based on BERT, in which 80% of the daily training data of people who perform physical training is used as the training set of a model, 10% as the validation set and 10% as the test set, and K-fold cross-validation is performed. The method comprises the following parts:
a BERT part, which pre-trains the embeddings of the physical training items with word2vec and clusters similar training sessions, and which improves the MLM by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information during training;
a fusion part, which fuses the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets preset conditions into the network;
an output part, which classifies and predicts the age, sex, physiological-state and psychological-state distributions of the trainees and outputs the probability of the user categories that can achieve effective training with different training regimens;
the model is trained until it converges, whereupon training ends and the BERT part of the model serves as the training feature extractor.
In some embodiments of the present invention, fusing the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets the preset conditions into the network includes:
dividing the population that performs physical training and meets the preset conditions into a plurality of parts;
if a trainee has completed multiple training sessions and belongs to the first part of the population, using the remaining parts as the seed population, computing the age, sex, physiological-state and psychological-state distributions of each training session, and fusing them with a 12-layer Transformer.
In some embodiments of the present invention, the population that performs physical training and meets the preset conditions is divided into a plurality of parts, specifically:
the population that performs physical training and meets the preset conditions is divided into 64 parts.
In some embodiments of the present invention, the fusion with the 12-layer Transformer is specifically:
encoding with a BERT model containing 12 Transformer layers as the encoder, where each Transformer layer takes the hidden states of the previous layer as input and outputs hidden states with a higher degree of information interaction.
In some embodiments of the present invention, the preset condition is that a preset training duration is reached and training is performed at a preset frequency.
In some embodiments of the present invention, the BERT-based physical training data feature extraction method further includes:
when the improved MLM is used for prediction during training, using both the ID of the most recent specific training item and its attributes for prediction, so that the embeddings of different attributes are aligned in the same semantic space.
The BERT-based physical training data feature extraction method provided by the embodiment of the invention has the following advantages: after the physical training data are received, features are extracted with an improved BERT method. Each training session is treated as a word and a user's training sequence as a sentence, so the original problem is converted into a text classification problem in the NLP (natural language processing) field. With the help of the mature embedding methods of that field, the training-effect prediction problem is converted into a classification problem over the categories of trainees with stable training habits, thereby achieving both physical training data feature extraction and training-effect prediction. A BERT-based physical training feature extraction method is thus established, and the mature and effective language feature extraction capability of BERT enables effective extraction of physical training data features.
Drawings
FIG. 1 is a flowchart of the BERT-based physical training data feature extraction method according to an embodiment of the present invention;
FIG. 2 is a block diagram of the BERT-based physical training data feature extraction method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention is further described below with reference to the accompanying drawings and specific embodiments.
The phrases "in one embodiment," "in another embodiment," "in yet another embodiment," "in an embodiment," "in some embodiments," or "in other embodiments" may be used in this specification to refer to one or more of the same or different embodiments in accordance with the invention.
Specific embodiments of the present invention are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely examples of the invention, which can be embodied in various forms. Well-known and/or repeated functions and structures are not described in detail so as not to obscure the present invention unnecessarily. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to employ the present invention in virtually any appropriately detailed structure.
The embodiment of the invention provides a physical training data feature extraction method based on BERT. As shown in FIG. 1 and FIG. 2, the method uses 80% of the daily training data of a population performing physical training as the training set of a model, 10% as the validation set and 10% as the test set, performs K-fold cross-validation, and comprises the following parts:
a BERT (Bidirectional Encoder Representations from Transformers) part, which pre-trains the embeddings of the physical training items with word2vec and clusters similar training sessions, and which improves the MLM by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information during training;
a fusion part, which fuses the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets preset conditions into the network;
an output part, which classifies and predicts the age, sex, physiological-state and psychological-state distributions of the trainees and outputs the probability of the user categories that can achieve effective training with different training regimens;
the model is trained until it converges, whereupon training ends and the BERT part of the model serves as the training feature extractor.
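By way of illustration only, the following minimal Python sketch shows one way to perform the 80%/10%/10% split and K-fold cross-validation described above; the record format and function names are assumptions, not part of the patented method.

```python
import random
from sklearn.model_selection import KFold

def split_dataset(records, seed=42):
    """Split daily training records into 80% training, 10% validation, 10% test."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n = len(records)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (records[:n_train],
            records[n_train:n_train + n_val],
            records[n_train + n_val:])

def k_fold_splits(train_records, k=5):
    """Yield (train_indices, validation_indices) pairs for K-fold cross-validation."""
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    yield from kf.split(train_records)
```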
In some embodiments of the present invention, fusing the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets the preset conditions into the network includes:
dividing the population that performs physical training and meets the preset conditions into a plurality of parts;
if a trainee has completed multiple training sessions and belongs to the first part of the population, using the remaining parts as the seed population, computing the age, sex, physiological-state and psychological-state distributions of each training session, and fusing them with a 12-layer Transformer.
In some embodiments of the present invention, the population that performs physical training and meets the preset conditions is divided into a plurality of parts, specifically:
the population that performs physical training and meets the preset conditions is divided into 64 parts.
In some embodiments of the present invention, the fusion with the 12-layer Transformer is specifically:
encoding with a BERT model containing 12 Transformer layers as the encoder, where each Transformer layer takes the hidden states of the previous layer as input and outputs hidden states with a higher degree of information interaction.
In some embodiments of the present invention, the predetermined condition is that a predetermined training duration is satisfied and training is performed at a predetermined frequency.
In some embodiments of the present invention, the BERT-based physical training data feature extraction method further includes:
when the improved MLM is used for prediction during training, using both the ID of the most recent specific training item and its attributes for prediction, so that the embeddings of different attributes are aligned in the same semantic space.
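The following sketch illustrates, under an assumed record and vocabulary convention, how the improved masking could be implemented: rather than masking a whole training record, either the item ID or a single attribute is masked, and the model is asked to predict the masked piece from the remaining fields.

```python
import random

MASK_ID = 0  # reserved mask token id (an assumed vocabulary convention)

def mask_record(record, p_mask=0.15, rng=None):
    """Mask part of a training record: either the item id or one attribute field.

    `record` is a dict such as {"item_id": 17, "duration": 3, "intensity": 2}.
    Returns (masked_record, masked_field, original_value); the field is None
    if the record was left untouched.
    """
    rng = rng or random.Random(0)
    masked = dict(record)
    if rng.random() >= p_mask:
        return masked, None, None
    field = rng.choice(list(masked))   # mask the training item OR a single attribute, never the whole record
    original = masked[field]
    masked[field] = MASK_ID
    return masked, field, original
```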
According to the above technical scheme, the BERT-based physical training data feature extraction method provided by the embodiment of the invention has the following advantages: after the physical training data are received, features are extracted with an improved BERT method. Each training session is treated as a word and a user's training sequence as a sentence, so the original problem is converted into a text classification problem in the NLP (natural language processing) field. With the help of the mature embedding methods of that field, the training-effect prediction problem is converted into a classification problem over the categories of trainees with stable training habits, thereby achieving both physical training data feature extraction and training-effect prediction. A BERT-based physical training feature extraction method is thus established, and the mature and effective language feature extraction capability of BERT enables effective extraction of physical training data features.
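To make the "each training session is a word, a user's training history is a sentence" conversion concrete, the sketch below builds per-user token sequences and pre-trains item embeddings with gensim's word2vec before clustering similar items; the field names, dimensions and cluster count are assumptions, not values fixed by the invention.

```python
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

def build_sentences(daily_logs):
    """Turn each user's chronological training history into a 'sentence' of item tokens.

    `daily_logs` maps user_id -> list of (date, item_name) pairs.
    """
    sentences = []
    for log in daily_logs.values():
        ordered = sorted(log)                          # chronological order
        sentences.append([item for _, item in ordered])
    return sentences

def pretrain_item_embeddings(sentences, dim=128, n_clusters=10):
    """Pre-train training-item embeddings with word2vec and cluster similar items."""
    w2v = Word2Vec(sentences, vector_size=dim, window=5, min_count=1, workers=4)
    items = list(w2v.wv.index_to_key)
    vectors = [w2v.wv[item] for item in items]
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    return w2v, dict(zip(items, labels))
```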
To describe the above technical solution in detail, a specific example is given below. 80% of the daily training data of a population that performs physical training regularly is selected as the training set of the model, 10% as the validation set and 10% as the test set, and K-fold cross-validation is performed. FIG. 1 illustrates the model of the technical solution of the present invention. As shown in FIG. 1, the physical training feature extraction method of the present invention includes the following steps:
(1) BERT part: each training session is treated as a word and the training sequence of a user as a sentence, so the original problem can be converted into a text classification problem in the field of NLP (Natural Language Processing). A BERT model containing 12 Transformer layers is used as the encoder; each Transformer layer takes the hidden states of the previous layer as input and finally outputs hidden states with a higher degree of information interaction. BERT is trained with the following two-stage pre-training method: first, the embeddings of the physical training items are pre-trained with word2vec and similar training sessions are clustered; second, the MLM is improved by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information (the training item or its attributes). When the improved MLM (Masked Language Model) is used for prediction during training, not only the ID of the most recent specific item but also its attributes can be used for prediction, so that the embeddings of different attributes are aligned in the same semantic space.
(2) Fusion part: the age, sex, physiological-state and psychological-state distributions of the population that can perform physical training stably over a long period are fused into the network. The population is divided into 64 parts; if a trainee has completed n training sessions and belongs to the first part, the remaining parts can be used as the seed population, the age, sex, physiological-state and psychological-state distributions of each training session are computed, and a 12-layer Transformer is used for fusion.
(3) Output part: the age, sex, physiological-state and psychological-state distributions of the trainees are classified and predicted, and the probability of the user categories that can achieve effective training with different training regimens is output.
The model is trained until it converges, whereupon training ends and the BERT part of the model serves as the training feature extractor.
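As a closing illustration, the sketch below combines the pieces: the hypothetical `TrainingSequenceEncoder` defined earlier stands in for the BERT part, the fused demographic distributions enter as an extra feature vector, and a linear head outputs user-category probabilities; after convergence the encoder alone is reused as the training feature extractor. The pooling choice, dimensions and names are assumptions rather than details fixed by the invention.

```python
import torch
import torch.nn as nn

class TrainingEffectClassifier(nn.Module):
    """Output part: pools the encoder's hidden states, concatenates the fused
    demographic features, and outputs per-category probabilities of effective training."""

    def __init__(self, encoder, demo_dim, n_categories):
        super().__init__()
        self.encoder = encoder                         # the BERT part (feature extractor)
        hidden = encoder.tok.embedding_dim
        self.head = nn.Linear(hidden + demo_dim, n_categories)

    def forward(self, item_ids, demo_feats, pad_mask=None):
        h = self.encoder(item_ids, pad_mask)           # (batch, seq_len, hidden)
        pooled = h.mean(dim=1)                         # simple mean pooling (an assumption)
        logits = self.head(torch.cat([pooled, demo_feats], dim=-1))
        return torch.softmax(logits, dim=-1)           # user-category probabilities

# After training to convergence, the encoder alone yields the training features:
#   features = model.encoder(item_ids, pad_mask).mean(dim=1)
```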
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (6)

1. A BERT-based physical training data feature extraction method, characterized in that 80% of the daily training data of a population performing physical training is used as the training set of a model, 10% as the validation set and 10% as the test set, and K-fold cross-validation is performed, the method comprising the following parts:
a BERT part, which pre-trains the embeddings of the physical training items with word2vec and clusters similar training sessions, and which improves the MLM by changing the original scheme of directly masking an entire training record together with its attributes into masking only part of the information during training;
a fusion part, which fuses the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets preset conditions into the network;
an output part, which classifies and predicts the age, sex, physiological-state and psychological-state distributions of the trainees and outputs the probability of the user categories that can achieve effective training with different training regimens;
and training the model until it converges, whereupon training ends and the BERT part of the model serves as the training feature extractor.
2. The BERT-based physical training data feature extraction method according to claim 1, wherein fusing the age, sex, physiological-state and psychological-state distributions of the population that performs physical training and meets the preset conditions into the network comprises:
dividing the population that performs physical training and meets the preset conditions into a plurality of parts;
if a trainee has completed multiple training sessions and belongs to the first part of the population, using the remaining parts as the seed population, computing the age, sex, physiological-state and psychological-state distributions of each training session, and fusing them with a 12-layer Transformer.
3. The BERT-based physical training data feature extraction method according to claim 2, wherein the population that performs physical training and meets the preset conditions is divided into a plurality of parts, specifically:
the population that performs physical training and meets the preset conditions is divided into 64 parts.
4. The BERT-based physical training data feature extraction method according to claim 3, wherein the fusion with the 12-layer Transformer is specifically:
encoding with a BERT model containing 12 Transformer layers as the encoder, where each Transformer layer takes the hidden states of the previous layer as input and outputs hidden states with a higher degree of information interaction.
5. The BERT-based physical training data feature extraction method according to claim 4, wherein
the preset condition is that a preset training duration is reached and training is performed at a preset frequency.
6. The method of claim 5, further comprising:
when the improved MLM is used for prediction during training, using both the ID of the most recent specific training item and its attributes for prediction, so that the embeddings of different attributes are aligned in the same semantic space.
CN202210628842.6A 2022-06-06 2022-06-06 Physical training data feature extraction method based on BERT Pending CN115098623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210628842.6A CN115098623A (en) 2022-06-06 2022-06-06 Physical training data feature extraction method based on BERT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210628842.6A CN115098623A (en) 2022-06-06 2022-06-06 Physical training data feature extraction method based on BERT

Publications (1)

Publication Number Publication Date
CN115098623A true CN115098623A (en) 2022-09-23

Family

ID=83289393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210628842.6A Pending CN115098623A (en) 2022-06-06 2022-06-06 Physical training data feature extraction method based on BERT

Country Status (1)

Country Link
CN (1) CN115098623A (en)

Citations (8)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647612A (en) * 2019-09-18 2020-01-03 合肥工业大学 Visual conversation generation method based on double-visual attention network
US20210089724A1 (en) * 2019-09-25 2021-03-25 Google Llc Contrastive Pre-Training for Language Tasks
US20220139384A1 (en) * 2020-11-03 2022-05-05 Salesforce.Com, Inc. System and methods for training task-oriented dialogue (tod) language models
CN112732874A (en) * 2021-01-15 2021-04-30 北京明略昭辉科技有限公司 Training method and device for user identification call-through model
CN113434699A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Pre-training method of BERT model, computer device and storage medium
CN113723106A (en) * 2021-07-29 2021-11-30 北京工业大学 Zero sample text classification method based on label extension
CN114218921A (en) * 2021-11-30 2022-03-22 中国医学科学院医学信息研究所 Problem semantic matching method for optimizing BERT
CN114580418A (en) * 2022-03-04 2022-06-03 河北师范大学 Knowledge map system for police physical training

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张媛媛 (ZHANG Yuanyuan): "Application of data mining technology in physical training of fire-fighting troops" (数据挖掘技术在消防部队体能训练中的应用), Fire Technique and Product Information (消防技术与产品信息), no. 08, 15 August 2017 (2017-08-15) *
徐菲菲 (XU Feifei); 冯东升 (FENG Dongsheng): "Research on text word vectors and pre-trained language models" (文本词向量与预训练语言模型研究), Journal of Shanghai University of Electric Power (上海电力大学学报), no. 04, 15 August 2020 (2020-08-15) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination