CN113343697A - Network protocol entity extraction method and system based on small sample learning - Google Patents

Network protocol entity extraction method and system based on small sample learning

Info

Publication number
CN113343697A
Authority
CN
China
Prior art keywords
network protocol
protocol entity
entity
text
entity extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110660203.3A
Other languages
Chinese (zh)
Inventor
李守斌
常志远
胡军
王青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN202110660203.3A
Publication of CN113343697A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a network protocol entity extraction method and system based on small sample learning. With only a small number of labeled RFC document samples, the method extracts network protocol entities from a large number of unlabeled RFC documents while maintaining high recognition precision. The method first mines as many potential network protocol entities in the RFC documents as possible, and then precisely re-identifies the potential entities found. Experiments show that when the disclosed model is trained on 5 manually labeled RFC documents, the precision of network protocol entity extraction reaches 88.4%. Compared with existing methods, the method achieves higher precision and better robustness in network protocol entity extraction, and it also recognizes network protocol entities that do not appear in the training set more reliably. The invention helps to realize automatic analysis of network protocols in the future and supports research on computer networks.

Description

Network protocol entity extraction method and system based on small sample learning
Technical Field
The invention belongs to the field of computer technology and provides a network protocol entity extraction method and system based on small sample learning. With only a small number of labeled RFC document samples, the method extracts network protocol entities from a large number of unlabeled RFC documents while maintaining high recognition precision, which is of significance for research in the field of computer networks.
Background
With the development of the internet, network security problems are becoming more important by the day. Because network protocols are the infrastructure of the internet, in-depth analysis of network protocols is very important. Many studies already target network protocols, for example using automated fuzz testing to find protocol vulnerabilities and thereby improve protocol security, or using network protocol recognition algorithms to prevent network attacks and further improve network security. Among these studies, network protocol analysis based on knowledge graphs is of particular importance. By means of data mining, information processing, knowledge measurement, graph drawing and the like, researchers use a knowledge graph to connect different kinds of information into a relationship network, analyze problems from the perspective of relationships, and reveal the dynamic development rules of a knowledge field. The extraction of network protocol entities is a key link in constructing a network protocol knowledge graph. RFC (Request for Comments) is a series of numbered documents that collect information about internet network protocols, as well as software documents for UNIX and the internet community; the basic internet communication protocols are specified in RFC documents. RFC documents have been written over a long time span, by many participating organizations, and cover many types of network protocols. As a result, the structure of RFC documents is not standardized or uniform, which makes the automatic extraction of network protocol entities very difficult.
Disclosure of Invention
To address the above problems, the invention provides a network protocol entity extraction method and system based on small sample learning. The method aims to extract network protocol entities accurately by fully learning the semantic features of the samples, so that the training effect on small samples is consistent with the training effect on large samples. The method is highly robust and also achieves high extraction precision for network protocol entities that do not appear in the training samples.
The technical scheme adopted by the invention is as follows:
a network protocol entity extraction method based on small sample learning comprises the following steps:
1) constructing a network protocol document set according to expert knowledge;
2) extracting fields and description information contained in a network protocol entity from the network protocol document set, and forming a network protocol information data set by the fields and the description information;
3) carrying out block processing on the network protocol information data set to form a network protocol text block set;
4) training a traditional machine learning model on the network protocol text block set to obtain a trained potential network protocol entity classifier;
5) training a network protocol entity accurate identification model based on a neural network by utilizing the network protocol text block set;
6) fusing the potential network protocol entity classifier and the network protocol entity precise identification model to obtain a network protocol entity extraction model based on small sample learning;
7) performing network protocol entity extraction on the network protocol text to be processed, using the network protocol entity extraction model based on small sample learning.
Further, step 1) uses heuristic rules or toolkits to preprocess the documents in the network protocol document set (i.e. RFC document set), and the steps include:
removing headers and footers in the text by a pattern matching method;
deleting charts: most charts consist of the symbol "+ -" or other special characters, so the line containing the symbol is first located in the text, and then, from that line downward, every line containing a special symbol is deleted until the word sparsity of a single line is above a threshold.
Further, the step 3) of performing block processing on the network protocol information data set includes: each sentence in the text is converted into a grammar tree structure by applying an NLP tool in the CoreNLP package, and each sentence can be segmented into a plurality of grammar phrases according to the grammar tree.
Further, step 3) divides description information in the network protocol text block set obtained after the block processing into positive and negative samples, and the samples are represented by vectorization and then used as input of the traditional machine learning model in step 4) to generate a classifier for predicting potential network protocol entities, namely the potential network protocol entity classifier.
Further, most negative samples among the potential network protocol entities in step 4) contain at least one of twelve parts of speech that the positive samples do not contain; a toolkit is used to extract the parts of speech of the words in each potential network protocol entity, and entities containing any of these parts of speech are removed. The twelve parts of speech are: adverb; infinitive marker "to"; verb, third-person singular present; interjection; cardinal number; modal verb; preposition; verb, gerund or present participle; coordinating conjunction; verb, non-third-person singular present; verb, base form; and possessive ending.
Further, step 5) performing word embedding processing on the network protocol text blocks in the network protocol text block set, dividing the network protocol text blocks according to a result set, inputting the divided network protocol text blocks into a network protocol field model, namely a network protocol entity accurate identification model, and training the network protocol entity accurate identification model sensitive to a protocol header field by using a neural network; the network protocol entity precise identification model comprises a linear aggregation layer and a nonlinear layer; the descriptive semantic information of the field information is ensured to be checked separately through the nonlinear layer, so that valuable information of the field information is reserved; all hidden states, i.e. intermediate results from the non-linear layer, are connected by the linear aggregation layer to fully exploit the inference results of the network.
Further, step 7) comprises:
1) preprocessing the network protocol text to be subjected to entity extraction, using the preprocessing method described above;
2) inputting the preprocessed network protocol text block set into the constructed potential network protocol entity classifier to obtain a potential network protocol entity set;
3) inputting the obtained potential network protocol entity set into the constructed network protocol entity accurate identification model;
4) inputting the output of the network protocol entity accurate identification model into a classification layer for classification to obtain the entity extraction result.
A network protocol entity extraction system based on small sample learning, comprising:
1) a model module, which comprises the network protocol entity extraction model constructed by the above method; the model receives as input the network protocol text from which entities are to be extracted;
2) a fusion module, which fuses the potential network protocol entity classifier and the network protocol entity precise identification model to obtain the network protocol entity extraction model;
3) a classification module, which inputs the result of the network protocol entity accurate identification model into a classification layer for classification to obtain the entity extraction result.
A storage medium having stored therein a computer program for executing the above-mentioned method of the present invention.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the above-described method of the invention.
Compared with the prior art, the invention has the advantages that: the invention provides a network protocol entity extraction method based on small sample learning, which can realize protocol entity extraction of unmarked RFC documents and keep higher identification precision by training with a small amount of RFC document samples with marks. The model is helpful for realizing the automatic analysis of network protocols in the future and providing help for the research on computer networks.
Drawings
FIG. 1 shows a model architecture diagram of potential entity mining of the present invention.
Fig. 2 shows a deep learning network protocol entity extraction model architecture diagram of the present invention.
Fig. 3 shows an RFC text diagram corresponding to different extraction methods of the present invention.
FIG. 4 is a diagram illustrating an exemplary block of text resulting from the blocking process of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific examples and the accompanying drawings.
The main content of the invention comprises:
1. network protocol document set
RFC is a series of numbered documents that gather information about internet network protocols, as well as software documents for UNIX and the internet community; the basic internet communication protocols are specified in RFC documents. Both the charts and the text in an RFC document contain network protocol entities (such as Offset, Sequence Number, Reserved, etc.). If a character matching method is used to extract protocol entities from the charts, the entities cannot be extracted correctly, because the same entity is represented differently in different documents. The text of an RFC document, however, contains detailed descriptions of the protocol entities, and entities can be found through semantic features, so the protocol entities can be extracted from the text.
2. Potential entity mining model (potential network protocol entity classifier)
The first-stage model architecture is shown in FIG. 1. The model input is the words of a block, which are converted by TF-IDF into a vector (F1 to F5 in FIG. 1) and then fed into an SVM classifier, which outputs the label of the block. The example block in the figure, the Sequence Number of the sender, is a mention of the entity Sequence Number in the result set and is therefore marked with a positive label. After all blocks have been classified, the set of positively labeled blocks is used as the input to the next stage.
3. Accurate entity recognition model (network protocol entity accurate recognition model)
To learn more entity features, the invention expands the sample set. After sample expansion, the accuracy of the entities extracted by the first stage of the method was found to drop significantly. To analyze the cause, the result set is compared with the extracted entities, and the extracted entity set is divided into positive and negative samples. The goal of the precise entity recognition model is to reduce the number of negative samples as much as possible while keeping the number of positive samples unchanged.
Analysis of the composition of the positive and negative samples shows that most negative sample blocks contain verb phrases or adverbs, whereas the positive samples do not. The parts of speech that appear only in negative samples are summarized in Table 1, and some of the negative samples are deleted by part-of-speech noise reduction. In the experiment, the part of speech of each word in a block is extracted with the NLP tool in the CoreNLP package, and every block in the sample set that contains a part of speech listed in Table 1 is deleted.
Table 1. Parts of speech contained only in negative samples. The first column is the part-of-speech tag used by the NLP tool.
Part-of-speech tag    Meaning
RB                    Adverb
TO                    Infinitive marker "to"
VBZ                   Verb, third-person singular present
UH                    Interjection
CD                    Cardinal number
MD                    Modal verb
IN                    Preposition
VBG                   Verb, gerund or present participle
CC                    Coordinating conjunction
VBP                   Verb, non-third-person singular present
VB                    Verb, base form
POS                   Possessive ending
After part-of-speech noise reduction is applied to the entity set, half of the blocks in the original negative samples are deleted while the number of positive samples is unchanged. The remaining negative samples, like the positive samples, are noun phrases with the same parts of speech, so they cannot be eliminated directly by part of speech.
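The part-of-speech noise reduction described above can be sketched as a simple filter. This is a minimal illustration, assuming that each block has already been tagged by a part-of-speech tagger (CoreNLP in the source) and is available as a list of (word, tag) pairs; the function name `pos_noise_reduction` and the hand-tagged sample blocks are hypothetical.

```python
# Tags listed in Table 1; a block containing any of them is dropped.
NOISE_TAGS = {"RB", "TO", "VBZ", "UH", "CD", "MD", "IN", "VBG", "CC", "VBP", "VB", "POS"}

def pos_noise_reduction(tagged_blocks):
    """Keep only the blocks in which no word carries a tag from NOISE_TAGS.

    tagged_blocks: list of blocks, each a list of (word, tag) pairs that a
    part-of-speech tagger produced beforehand (assumed input format).
    """
    return [
        block for block in tagged_blocks
        if not any(tag in NOISE_TAGS for _, tag in block)
    ]

# Hypothetical, hand-tagged example blocks.
blocks = [
    [("Sequence", "NNP"), ("Number", "NNP")],
    [("is", "VBZ"), ("sent", "VBN")],
    [("the", "DT"), ("ACK", "NNP"), ("control", "NN"), ("bit", "NN")],
    [("of", "IN"), ("the", "DT"), ("sender", "NN")],
]
kept = pos_noise_reduction(blocks)
print([" ".join(word for word, _ in block) for block in kept])
```

Only the two noun-phrase blocks survive the filter; blocks with a verb or a preposition are removed, which mirrors the observation that such tags occur only in negative samples.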
The invention uses AttBi-LSTM as the classifier, with the positive and negative samples remaining after part-of-speech noise reduction as input. A word2vec model is used for word vector conversion for the deep learning model, and TF-IDF is used for vector conversion for the SVM. The trained classifier learns a score function S_2(Pos, Neg) that calculates the probability that a block is a positive sample, where Pos denotes a positive sample and Neg a negative sample. The S_2 function is applied to a new block set to predict the positive samples in it, and these are selected as the final entity set, which further reduces the number of negative samples. Taking AttBi-LSTM (Bi-LSTM with an attention mechanism) as an example, as shown in FIG. 2, the input is a block that remains after part-of-speech noise reduction of the entities extracted in the first stage. The block consists of 5 words, and AttBi-LSTM classification outputs the positive or negative label of the block. The vector u in the figure represents word importance, and a_t is the normalized word weight. The sentence vector v, which sums up all the information in the sentence, is the weighted sum of the hidden states h_t, as given in formula (a), where a_t are the corresponding weights.
$$v = \sum_t a_t h_t \qquad \text{(a)}$$
In FIG. 2, the Embedding Layer (Embedding Layer) is used for word Embedding, the Bi-LSTM Layer is used for vector encoding, the Attention Layer (Attention Layer) is used for feature focusing, and the fully-connected Layer + Softmax is used for tag classification.
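The attention layer can be illustrated with a small numeric sketch. It assumes that the sentence vector is the attention-weighted sum v = Σ_t a_t h_t of the Bi-LSTM hidden states h_t and that the weights a_t are obtained by a softmax over per-word importance scores; how the source derives those scores from the vector u is not spelled out in the text, so the scores are simply passed in here. The function name `attention_pool` and the toy numbers are hypothetical.

```python
import math

def attention_pool(hidden_states, scores):
    """Return (weights, v), where weights are softmax-normalized scores a_t
    and v is the weighted sum of the hidden states, v = sum_t a_t * h_t."""
    top = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    v = [sum(a * h[i] for a, h in zip(weights, hidden_states)) for i in range(dim)]
    return weights, v

# Toy example: three words with 2-dimensional hidden states and equal scores.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
weights, v = attention_pool(hidden_states, scores)
print(weights, v)
```

With equal scores every word receives the same weight, so v is the plain average of the hidden states; raising one score shifts v toward that word's hidden state.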
Example:
Step one: Acquisition of network protocol text
First, RFC documents are selected that comply with two specific rules: 1) the header field name is on the same line as the size it occupies, and the two are separated by special characters such as a colon, a space or parentheses, for example (Type of Service: 8 bits); 2) the detailed description of the header field is directly below the field. The upper box of FIG. 3 is an example that conforms to these rules. The lower box is free-standing text from which entities cannot be extracted heuristically; there, extraction requires the small-sample-based network protocol entity extraction method of the invention (abbreviated as the FSL method in FIG. 3, where FSL stands for Few-Shot Learning), and "ACK control bit" and "sequence number" are the header fields. Then, a heuristic-based method is applied to extract entities from the RFC documents that meet the above rules, and these entities are collected into a result set.
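Rule 1 lends itself to a regular-expression heuristic. The sketch below is one possible reading, not the pattern used in the source: it assumes the size is written as a number followed by "bit(s)", "byte(s)" or "octet(s)", and the names `FIELD_PATTERN` and `parse_header_field` are hypothetical.

```python
import re

# Field name, then an optional colon or opening parenthesis, then size and unit.
FIELD_PATTERN = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9 /_-]*?)\s*[:(]?\s*"
    r"(?P<size>\d+)\s*(?P<unit>bits?|bytes?|octets?)\s*\)?\s*$"
)

def parse_header_field(line):
    """Return (name, size, unit) if the line looks like a header field
    definition, otherwise None."""
    match = FIELD_PATTERN.match(line)
    if match is None:
        return None
    return match.group("name"), int(match.group("size")), match.group("unit")

for line in [
    "Type of Service:  8 bits",
    "Sequence Number (32 bits)",
    "The checksum field is the 16 bit one's complement.",
]:
    print(repr(line), "->", parse_header_field(line))
```

A heuristic of this kind also matches ordinary phrases that happen to end in a size, such as "the next 8 bits", so in practice it only suits documents that follow both selection rules.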
Step two: network protocol text preprocessing
1) Since the protocol entities are extracted from the RFC text, information unrelated to the text describing the protocol fields should be deleted to reduce text noise. The headers and footers in the text are deleted first; their form is essentially fixed, so they can be removed by pattern matching. Next, the diagrams in the document are deleted. Most diagrams consist of the symbol "+ -" or other special characters, so we first locate the line in the text that contains the symbol and then delete lines from that line downward. Because lines belonging to a diagram have low word sparsity, a threshold is specified in advance, and the line-by-line deletion stops at the first line that contains no special symbol and whose word sparsity is higher than the threshold.
2) The cleaned network protocol description text is divided into blocks. The description part follows the header field definition and is extracted as follows: after a header field is located, the text is read line by line starting from the line below it, and reading stops when the definition of the next header field is reached. This operation is repeated to read the description of the next field; the description of the last field is read until the header field of the upper paragraph is matched. For each sample RFC, the extracted description text is saved separately for subsequent chunking. In the experiment, text segmentation is performed with the NLP tool in the CoreNLP package, which converts each sentence in the text into a syntax tree; according to the syntax tree, each sentence can be segmented into several syntactic phrases, as shown in FIG. 4.
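The diagram removal and the description extraction of step two can be sketched as follows. Several details are assumptions rather than facts from the source: the marker is taken to be "+-", the set of special characters is "+", "-" and "|", the word measure is the share of letters and digits among non-space characters, the threshold is 0.8, and a field definition line is assumed to look like "Name: N bits". For the last field the sketch simply reads to the end of the input. Header and footer removal and the CoreNLP-based chunking are not shown. The sample lines are made up for illustration.

```python
import re

DIAGRAM_MARKER = "+-"        # assumed spelling of the "+ -" symbol
SPECIAL_CHARS = set("+-|")   # assumed set of "other special characters"
# Assumed shape of a header-field definition line, e.g. "Source Port: 16 bits".
FIELD_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 ]*?)\s*:\s*\d+\s*bits?\s*$")

def word_ratio(line):
    """Share of non-space characters that are letters or digits (assumed measure)."""
    chars = [c for c in line if not c.isspace()]
    return sum(c.isalnum() for c in chars) / len(chars) if chars else 0.0

def remove_diagrams(lines, threshold=0.8):
    """Drop lines from a diagram marker down to the first wordy, symbol-free line."""
    kept, in_diagram = [], False
    for line in lines:
        has_special = any(c in SPECIAL_CHARS for c in line)
        if not in_diagram and DIAGRAM_MARKER in line:
            in_diagram = True
        elif in_diagram and not has_special and word_ratio(line) > threshold:
            in_diagram = False
        if not in_diagram:
            kept.append(line)
    return kept

def extract_descriptions(lines):
    """Map each header field to the text between its definition and the next one."""
    parts, current = {}, None
    for line in lines:
        match = FIELD_LINE.match(line)
        if match:
            current = match.group(1)
            parts[current] = []
        elif current is not None and line.strip():
            parts[current].append(line.strip())
    return {name: " ".join(text) for name, text in parts.items()}

rfc_lines = [
    "TCP Header Format",
    "+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+",
    "|          Source Port          |",
    "+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+",
    "",
    "Source Port: 16 bits",
    "  The source port number.",
    "Sequence Number: 32 bits",
    "  The sequence number of the first data octet",
    "  in this segment.",
]
clean_lines = remove_diagrams(rfc_lines)
descriptions = extract_descriptions(clean_lines)
print(descriptions)
```

The saved description strings are what a syntactic parser would then split into phrase-level blocks.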
Step three: latent entity mining model based on traditional machine learning
First, each text block of the training set is checked against the entity result set and given a positive or negative label. The text blocks of the training samples and the result set entities are combined into one overall corpus, and the TF-IDF method converts each text block and each result set entity into vectors of the same dimension. To describe the features behind each label, the cosine similarity between each text block and every result set entity is computed, and all the cosine values of a block are combined into the feature vector of that block. The cosine value set of every block is computed with a matrix structure, and the block features and labels are then fed into the SVM for training, which yields a preliminary SVM classifier for text blocks. Given a new vector of cosine values between a block and the result set, the classifier predicts the label of the corresponding block, and the text blocks predicted as positive are taken as the first-stage extracted entities of the RFC document. The first-stage test set is preprocessed and then fed into the SVM classifier to obtain the candidate entity set of the test set.
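The feature construction of this step can be sketched in plain Python. The smoothed IDF formula, the whitespace tokenization, and the labeling rule (a block is positive if it contains a result set entity as a substring) are assumptions made for the illustration; the source does not specify them. Training the SVM itself is not shown; the resulting `features` and `labels` would be passed to an SVM implementation, for example scikit-learn's `SVC`.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Return one TF-IDF vector per text over a shared vocabulary."""
    docs = [text.lower().split() for text in texts]
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0 for w in vocab}
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append([term_freq[w] * idf[w] for w in vocab])
    return vectors

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cosine_features(blocks, result_entities):
    """Feature vector of a block = its cosine similarity to every result set entity."""
    vectors = tfidf_vectors(blocks + result_entities)   # one overall corpus
    block_vecs = vectors[:len(blocks)]
    entity_vecs = vectors[len(blocks):]
    return [[cosine(b, e) for e in entity_vecs] for b in block_vecs]

# Hypothetical training blocks and result set.
blocks = ["the sequence number", "is sent by", "source port"]
result_entities = ["sequence number", "source port"]
features = cosine_features(blocks, result_entities)
labels = [int(any(entity in block for entity in result_entities)) for block in blocks]
print(features, labels)
```

Each row of `features` has one entry per result set entity, so all blocks share the same feature dimension regardless of their length.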
Step four: accurate entity recognition model based on neural network
First, wrongly segmented text blocks in the candidate entity set are filtered out by part-of-speech noise reduction. The result set entities of the first-stage test set are then introduced, and the block set remaining after part-of-speech noise reduction is divided into a positive class and a negative class. The positive label set consists of the blocks that contain an entity from any result set, and the negative label set consists of the remaining blocks. The positive and negative label sets are converted into word vectors before being fed into the second-stage classifier. We use the word2vec model for word vector conversion. The description texts about header fields in the sample RFC documents are merged into one overall training corpus. Word2vec obtains a language model by training on the corpus and converts input text into word vectors. Because our corpus is not large, the dimension of the word vectors is set to only 100. After the blocks in the positive and negative label sets are converted into word vectors by word2vec, they are divided into a training set and a test set at a ratio of 8:2 and fed into the AttBi-LSTM classifier for training. After training, a positive/negative label classifier is obtained.
Step five: Network protocol entity extraction
To extract protocol entities from an unlabeled RFC document, the first-stage SVM classifier first screens the segmented text to obtain the first-stage set of extracted text blocks. Part-of-speech noise reduction is then applied to this set, the remaining text blocks are screened by the second-stage deep learning classifier, and the set of positive sample blocks is extracted. This positive sample set is taken as the set of network protocol entities extracted from the unlabeled RFC document.
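The 8:2 division into training and test sets can be sketched as below. Shuffling with a fixed seed before the cut is an assumption added for reproducibility; the source only states the ratio. The function name `split_train_test` and the placeholder samples are hypothetical, and the word2vec conversion and AttBi-LSTM training are not shown.

```python
import random

def split_train_test(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples reproducibly and split them train_ratio : (1 - train_ratio)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Placeholder (block, label) pairs standing in for labeled, vectorized blocks.
samples = [(f"block {i}", i % 2) for i in range(10)]
train, test = split_train_test(samples)
print(len(train), len(test))
```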
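The three screening passes of step five compose into a short pipeline. In this sketch the two trained classifiers and the part-of-speech check are passed in as plain functions; the stand-ins used in the example are deliberately trivial placeholders, not the SVM or AttBi-LSTM models, and all names are hypothetical.

```python
def extract_entities(blocks, stage1_is_candidate, has_noise_pos, stage2_is_entity):
    """Two-stage extraction: first-stage screening, part-of-speech noise
    reduction, then second-stage screening of the remaining blocks."""
    candidates = [b for b in blocks if stage1_is_candidate(b)]
    denoised = [b for b in candidates if not has_noise_pos(b)]
    return [b for b in denoised if stage2_is_entity(b)]

# Trivial stand-ins for the trained components (assumptions for illustration).
def stage1_stub(block):
    return len(block.split()) <= 3            # stands in for the SVM classifier

def noise_stub(block):
    return any(word in {"is", "quickly"} for word in block.split())

def stage2_stub(block):
    return block[0].isupper()                 # stands in for the AttBi-LSTM classifier

blocks = [
    "Sequence Number",
    "is sent quickly",
    "the sender",
    "ACK control bit",
    "a long explanation of the field",
]
entities = extract_entities(blocks, stage1_stub, noise_stub, stage2_stub)
print(entities)
```

Keeping the stages as separate callables makes it straightforward to evaluate each stage alone, as is done for the single-model rows of the experimental comparison.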
Analysis: The invention provides a network protocol entity extraction method based on small sample learning to solve the entity extraction problem in the field of network protocols. Experiments show that the precision of the method (see Table 2) is significantly higher than that of either single entity extraction model, which demonstrates that the method is feasible. When the disclosed model is trained on 5 manually labeled RFC documents, the precision of network protocol entity extraction reaches 88.4%. Compared with existing methods, the method achieves higher precision and better robustness in network protocol entity extraction, and it also recognizes network protocol entities that do not appear in the training set more reliably.
Table 2. Experimental results
Model name        Precision    Recall    F1
SVM               80.2%        53.5%     64.1%
AttBi-LSTM        76.1%        54.7%     63.7%
Combined model    88.4%        58.5%     70.4%
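As a consistency check on Table 2, the F1 values can be recomputed from the other two columns, assuming those columns are precision and recall in percent and that F1 is their harmonic mean, F1 = 2PR / (P + R). The dictionary below only restates the numbers from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) in percent, copied from Table 2.
table2 = {
    "SVM": (80.2, 53.5, 64.1),
    "AttBi-LSTM": (76.1, 54.7, 63.7),
    "Combined model": (88.4, 58.5, 70.4),
}
for name, (p, r, reported) in table2.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.2f}%, reported F1 = {reported}%")
```

The recomputed values agree with the reported ones to within about 0.1 percentage point, which is the size of difference expected when the inputs have already been rounded to one decimal place.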
Based on the same inventive concept, another embodiment of the present invention provides a network protocol entity extraction system based on small sample learning, which includes:
the model module comprises a network protocol entity extraction model constructed by the method, and the model receives a network protocol text of an entity to be extracted as input;
the fusion module is used for fusing the potential network protocol entity classifier and the network protocol entity accurate identification model to obtain the network protocol entity extraction model;
and the classification module is used for inputting the result of the network protocol entity extraction model into a classification layer for classification to obtain an entity extraction result.
For the specific implementation of each module, refer to the description of the method of the invention.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
Although specific details of the invention, algorithms and figures are disclosed for illustrative purposes, these are intended to aid in the understanding of the contents of the invention and the implementation in accordance therewith, as will be appreciated by those skilled in the art: various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. The invention should not be limited to the preferred embodiments and drawings disclosed herein, but rather should be defined only by the scope of the appended claims.

Claims (10)

1. A network protocol entity extraction method based on small sample learning comprises the following steps:
constructing a network protocol document set according to expert knowledge;
extracting fields and description information contained in a network protocol entity from the network protocol document set, and forming a network protocol information data set by the fields and the description information;
carrying out block processing on the network protocol information data set to form a network protocol text block set;
training a machine learning model on the network protocol text block set to obtain a trained potential network protocol entity classifier;
training a network protocol entity accurate identification model based on a neural network by utilizing the network protocol text block set;
fusing the potential network protocol entity classifier and the network protocol entity precise identification model to obtain a network protocol entity extraction model based on small sample learning;
and performing network protocol entity extraction on the network protocol text to be subjected to entity extraction based on the network protocol entity extraction model based on the small sample learning.
2. The method of claim 1, wherein preprocessing documents in the network protocol document set using heuristic rules or toolkits comprises:
removing headers and footers in the text by a pattern matching method;
deleting charts: most charts consist of the symbol "+ -" or other special characters, so the line containing the symbol is first located in the text, and then, from that line downward, every line containing a special symbol is deleted until the word sparsity of a single line is above a threshold.
3. The method of claim 1, wherein the chunking the network protocol information dataset comprises: each sentence in the text is converted into a grammar tree structure by applying an NLP tool in the CoreNLP package, and each sentence can be segmented into a plurality of grammar phrases according to the grammar tree.
4. The method of claim 1, wherein description information in the segmented set of network protocol text segments is divided into positive and negative samples, and the samples are vectorized and represented to be used as an input of the machine learning model to generate the potential network protocol entity classifier.
5. The method of claim 1, wherein most negative samples of potential network protocol entities contain one or more of twelve parts of speech that positive samples do not contain; the parts of speech of each candidate network protocol entity are extracted by using a toolkit, and candidates containing any of these parts of speech are removed; the twelve parts of speech are adverbs, verb infinitives, singular verbs, interjections, quantifiers, modal verbs, prepositions, gerunds, conditional conjunctions, non-third-person singular verbs, base-form verbs, and possessive nouns.
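The part-of-speech filter of claim 5 can be sketched as a set-membership test over tagged candidates. The sketch assumes the candidates have already been tagged by an external toolkit with Penn Treebank tags. The mapping from the twelve parts of speech named in the claim to specific tags is a guess made for illustration, and the tags chosen for "quantifiers" (`CD`) and "conditional conjunctions" (`WRB`) are the least certain.

```python
# Assumed Penn Treebank tags for the twelve excluded parts of speech.
EXCLUDED_TAGS = {
    "RB",   # adverb
    "TO",   # infinitive marker
    "VBZ",  # third-person singular verb
    "UH",   # interjection
    "CD",   # quantifier (guess: cardinal number)
    "MD",   # modal verb
    "IN",   # preposition
    "VBG",  # gerund
    "WRB",  # conditional conjunction (guess)
    "VBP",  # non-third-person singular verb
    "VB",   # base-form verb
    "POS",  # possessive ending
}


def filter_candidates(tagged_phrases, excluded=EXCLUDED_TAGS):
    """Keep only phrases in which no token carries an excluded tag."""
    return [
        phrase
        for phrase in tagged_phrases
        if not any(tag in excluded for _, tag in phrase)
    ]
```

Each phrase is a list of `(word, tag)` pairs, so `filter_candidates` can be applied directly to tagger output; passing a different `excluded` set makes it easy to test alternative tag mappings.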
6. The method of claim 1, wherein the network protocol text blocks in the network protocol text block set are divided according to a result set and, after word embedding processing, input into the network protocol entity precise identification model for training, and a neural network is used to generate a network protocol entity precise identification model that is sensitive to protocol header fields; the network protocol entity precise identification model comprises a nonlinear layer and a linear aggregation layer; the nonlinear layer ensures that the descriptive semantic information of each piece of field information is examined separately, so that the valuable information of the field information is retained; the linear aggregation layer connects all hidden states, i.e. the intermediate results from the nonlinear layer, so as to make full use of the inference results of the network.
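One possible reading of the two layers in claim 6 is sketched below in plain Python: a nonlinear layer applied to each token embedding separately, followed by a linear layer over the concatenation of all resulting hidden states. The claim does not specify the activation function, the layer sizes, or how the weights are learned, so the `tanh` activation, the fixed sequence length implied by concatenation, and all names here are assumptions.

```python
import math


def nonlinear_layer(x, W, b):
    """Compute tanh(W·x + b) for a single token embedding x."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


def linear_aggregation(hidden_states, V, c):
    """Concatenate every hidden state and apply one linear map V·h + c."""
    concat = [h for state in hidden_states for h in state]
    return [
        sum(v * h for v, h in zip(row, concat)) + ci
        for row, ci in zip(V, c)
    ]


def forward(embeddings, W, b, V, c):
    """Apply the nonlinear layer per token, then aggregate all hidden states."""
    hidden = [nonlinear_layer(x, W, b) for x in embeddings]
    return linear_aggregation(hidden, V, c)
```

Because each token passes through the nonlinear layer on its own, no token's information is mixed with another's until the aggregation step, which is one way to read "examined separately" in the claim.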
7. The method of claim 1, wherein the performing network protocol entity extraction on the network protocol text to be subjected to entity extraction based on the small sample learning-based network protocol entity extraction model comprises:
preprocessing a network protocol text to be subjected to entity extraction;
inputting the preprocessed protocol text block set into the potential network protocol entity classifier to obtain a potential network protocol entity set;
inputting the obtained potential network protocol entity set into the network protocol entity precise identification model;
and inputting the output of the network protocol entity precise identification model into a classification layer for classification to obtain an entity extraction result.
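The four inference steps of claim 7 form a simple pipeline, sketched below with each stage passed in as a callable. The stage implementations used in the example are toy stand-ins chosen only to make the sketch runnable; the labels `FIELD` and `O` and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def extract_entities(
    text: str,
    preprocess: Callable[[str], List[str]],
    is_potential_entity: Callable[[str], bool],
    encode: Callable[[str], Sequence[float]],
    classify: Callable[[Sequence[float]], str],
) -> List[Tuple[str, str]]:
    """Run preprocessing, candidate filtering, encoding, and classification."""
    blocks = preprocess(text)                                 # step 1: text blocks
    candidates = [b for b in blocks if is_potential_entity(b)]  # step 2: coarse filter
    results = []
    for candidate in candidates:
        features = encode(candidate)                          # step 3: precise model
        results.append((candidate, classify(features)))       # step 4: classification layer
    return results


# Toy stand-ins for the four stages.
def toy_preprocess(text):
    return [part.strip() for part in text.split(";") if part.strip()]


def toy_filter(block):
    return block[0].isupper()


def toy_encode(block):
    words = block.split()
    return [sum(w[0].isupper() for w in words), len(words)]


def toy_classify(features):
    return "FIELD" if features[0] == features[1] else "O"
```

Keeping the coarse classifier ahead of the precise model means the more expensive model only sees candidates that survived the first filter, which is the fusion order the claim describes.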
8. A network protocol entity extraction system based on small sample learning, comprising:
a model module, comprising a network protocol entity extraction model constructed by the method of any one of claims 1 to 7, the model receiving as input a network protocol text from which entities are to be extracted;
a fusion module, configured to fuse the potential network protocol entity classifier and the network protocol entity precise identification model to obtain the network protocol entity extraction model;
and a classification module, configured to input the result of the network protocol entity extraction model into a classification layer for classification to obtain an entity extraction result.
9. A storage medium, wherein a computer program is stored in the storage medium, and the computer program, when executed, performs the method of any one of claims 1 to 7.
10. An electronic device, comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110660203.3A CN113343697A (en) 2021-06-15 2021-06-15 Network protocol entity extraction method and system based on small sample learning

Publications (1)

Publication Number Publication Date
CN113343697A (en) 2021-09-03

Family

ID=77477246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110660203.3A Pending CN113343697A (en) 2021-06-15 2021-06-15 Network protocol entity extraction method and system based on small sample learning

Country Status (1)

Country Link
CN (1) CN113343697A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912625A (en) * 2016-04-07 2016-08-31 北京大学 Linked data oriented entity classification method and system
US20170132529A1 (en) * 2000-09-28 2017-05-11 Intel Corporation Method and Apparatus for Extracting Entity Names and Their Relations
CN111259087A (en) * 2020-01-10 2020-06-09 中国科学院软件研究所 Computer network protocol entity linking method and system based on domain knowledge base
CN111274814A (en) * 2019-12-26 2020-06-12 浙江大学 Novel semi-supervised text entity information extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2021-09-03