CN116340530A - Intelligent design method based on mechanical knowledge graph - Google Patents
- Publication number
- CN116340530A (application number CN202310128512.5A)
- Authority
- CN
- China
- Prior art keywords
- entity
- model
- module
- relationship
- knowledge graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses an intelligent design method based on a mechanical knowledge graph, in the field of manufacturability evaluation of mechanical parts. The method comprises: converting industrial text document data into a computer-readable document form; labeling the preprocessed data source and dividing it into a training set, a test set and a verification set; performing entity classification and relationship classification on the preprocessed data source; and constructing, based on the training set, an entity recognition model and an entity relationship model, wherein the entity recognition model recognizes entities in the specific field and the entity relationship model extracts the relationships among the entities. The outputs of the entity relationship model and the entity recognition model are spliced with the preprocessed data source into triples, and the triples are stored in a background database. By defining the entities and relationships in the field, dividing the knowledge structure of the data source, and building entity recognition and relationship extraction models for the professional field, the method makes it easier to acquire the required professional knowledge from text data in a complex professional field and improves the recognition rate.
Description
Technical Field
The invention relates to the field of manufacturability evaluation of mechanical parts, and in particular to a knowledge graph construction and intelligent aided design method for evaluating the manufacturability of parts.
Background
With each new round of advances in industrial intelligence, smart manufacturing is gradually moving manufacturing data from data intelligence toward cognitive intelligence. How to transform the multi-source heterogeneous data accumulated in the manufacturing industry into specific industrial knowledge, and how to use that knowledge more conveniently, has become a main topic of current research. Multi-source heterogeneous manufacturing data is stored mainly as industrial documents, which exist primarily as natural-language text and process card tables. A qualified designer or evaluator needs knowledge from many fields in order to design and evaluate a part over its whole life cycle, or even over multiple life cycles. If information exchange between designers and machining personnel is insufficient or not timely, the designed part is liable to be impossible or difficult to machine, which increases manufacturing cost. At present, most document information is handled manually by technicians. Making maximum use of this information, turning it into knowledge that serves the intelligent factory and thereby improves enterprise competitiveness, is increasingly urgent. A method is therefore needed that obtains information from engineering documents to form usable knowledge and shares that knowledge with production designers and evaluators, breaking down information silos and improving product design and manufacturing quality.
At present, the entity recognition and word vectorization methods commonly used for graph construction and aided design mainly include the bidirectional long short-term memory network (Bi_LSTM), the bidirectional long short-term memory network with a conditional random field (BiLSTM+CRF), and word vectorization with the common Word2vec model. These methods have limitations: Chinese sentences generally have complex constituents and many polysemous words, and these methods cannot clearly distinguish certain terms or resolve polysemy. Chinese patent application CN202111587368.9 discloses an improved BiLSTM-CRF method for named entity recognition in electronic medical records, which combines input characters and labels by encoding and introduces a multi-head attention mechanism in the attention layer to obtain more useful information for extracting structured electronic medical record information. However, its word vectorization does not consider the relationship between contexts and does not extract key information between texts, so the accuracy of the recognition results may be reduced. Chinese patent application CN201910766428.X discloses a knowledge graph construction method that uses fine-tuned BERT (Bidirectional Encoder Representations from Transformers) for word vectorization and a classification algorithm to classify specific entity categories, which improves graph construction quality. However, the application field of that method is rather specific, its recognition accuracy for the text vocabulary of other designated fields is low, it cannot adapt to strongly domain-specific text, and the reusability of the resulting knowledge is poor.
Disclosure of Invention
The invention aims to solve the problems of existing part manufacturability evaluation and provides an intelligent design method based on a mechanical knowledge graph that recognizes accurately, improves production and design efficiency, and is convenient to use.
The invention adopts the following technical scheme to achieve this purpose:
Step (1): collecting industrial text document data, converting it into a computer-readable document form to obtain an original data source, and preprocessing the original data source to obtain a preprocessed data source;
Step (2): labeling the preprocessed data source, and dividing the labeled data source into a training set, a test set and a verification set;
Step (3): performing entity classification and relationship classification on the preprocessed data source, wherein the entity classification comprises a product design class, a part processing class, a part assembly class and a related data class, and the relationship classification comprises a causal relationship, a mutual exclusion relationship, a finite relationship, an initiating relationship and a fixed relationship; and constructing an entity recognition model and an entity relationship model based on the training set to obtain a complete mechanical knowledge graph, wherein the entity recognition model recognizes entities in the specific field and the entity relationship model extracts the relationships among the entities;
Step (4): splicing the outputs of the entity relationship model and the entity recognition model with the preprocessed data source into triples, and storing the triples in a background database.
The beneficial effects of the invention with the above technical scheme are as follows:
1. The invention defines the entities and relationships in the field and divides the knowledge structure of the data source, which provides a basis for the subsequent classification of entities and relationships.
2. The invention builds entity recognition and relationship extraction models for the professional field, which makes it easier to acquire the required professional knowledge from text data in a complex professional field and improves the recognition rate.
3. The invention extracts a large amount of text data in the field and stores it in graph form, which provides a basis for various downstream tasks.
4. The invention constructs question-answer templates that classify the relevant professional questions as far as possible. After the user enters a question, the system classifies it accordingly so that its answer can be queried conveniently.
5. The invention builds and trains a classification model for common questions in the field, which provides a basis for the knowledge question-answering module.
6. The invention builds an aided design platform on a front-end framework with a friendly human-computer interaction interface. The platform supports functions such as entity recognition, sentence question answering and entity query, is convenient for designers and evaluators to use, and improves production and design efficiency.
Drawings
The technical scheme of the invention is described clearly and completely below with reference to the accompanying drawings:
FIG. 1 is a design flow chart of the intelligent design method based on the mechanical knowledge graph of the invention;
FIG. 2 is a frame diagram of the data source of FIG. 1;
FIG. 3 is a graph of relationship types for the relationship classification of FIG. 1;
FIG. 4 is a block diagram of an entity identification and relationship extraction network model;
FIG. 5 is a schematic diagram of an auxiliary design platform module arrangement;
FIG. 6 is a question query parsing diagram of FIG. 5;
FIG. 7 is an entity class diagram in an embodiment.
Detailed Description
The invention extracts and saves a large amount of structured and unstructured data stored inside and outside the factory, defines the knowledge structure, the entities and the relationships between entities in the field, uses a deep learning method to construct dedicated extraction models, and finally stores the extraction results in a Neo4j graph database for later use. The specific steps are as follows:
Referring to FIG. 1, the data source is acquired in the data acquisition layer:
Relevant industrial text document data is collected from in-factory part design documents, process manuals and process cards, and from external textbooks, record documents, the long-accumulated experience of workers and experts, and the like. The collected text-form data is converted by OCR (Optical Character Recognition) into a computer-readable document form, yielding the original data source.
The original data source obtained by the data acquisition layer is input into the data processing layer and processed with common data preprocessing methods, such as regular expressions and stop-word removal, which clean out useless or misrecognized information to yield the required preprocessed data source.
The preprocessed data source is input into the entity recognition layer. There, entity classification is first performed on the preprocessed data source and the specific entity categories are defined; an entity recognition data set is constructed and used to build the entity recognition model; and text entity recognition is then performed with this model. The BERT+Bi_LSTM+CRF entity recognition model is used to recognize entities in the specific field, and the recognized text entity data is passed to the relationship recognition layer. The preprocessed data source in the entity recognition layer is also passed to the relationship recognition layer.
Entity classification (see FIG. 2) studies and analyzes the content of fields such as mechanical design and manufacturing evaluation and defines the important terms and concepts of those fields. The preprocessed data source is divided into four categories:
First category, product design: part design intent, feature design experience and part structure manufacturability.
Second category, part machining: part machining machine tools, part machining cutting tools and part machining manufacturability.
Third category, part assembly: part assembly type, part assembly experience and part assembly sequence.
Fourth category, related data: product design manuals, standard operation manuals and professional teaching materials.
Entities in this field are of many kinds and have differing characteristics. The various entities are analyzed and their common characteristics are extracted as the basis for classification; for example, product entity types can be further divided into holes, faces, cavities, grooves and the like.
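The preprocessing described above can be sketched in Python as follows. This is a minimal illustration only: the stop-word list, the set of characters kept, and the function name `preprocess` are assumptions made for the example, not details given in the patent.

```python
import re

# Hypothetical stop-word list; a real project would load a domain-specific one.
STOP_WORDS = {"the", "a", "an", "of"}


def preprocess(raw_text, stop_words=STOP_WORDS):
    """Clean OCR output: drop junk symbols, collapse whitespace, remove stop words."""
    cleaned_lines = []
    for line in raw_text.splitlines():
        # Keep word characters (including CJK), whitespace and basic punctuation.
        line = re.sub(r"[^\w\s.,;:()%/+-]", " ", line)
        # Collapse the runs of whitespace left behind by OCR noise.
        line = re.sub(r"\s+", " ", line).strip()
        tokens = [t for t in line.split(" ") if t and t.lower() not in stop_words]
        if tokens:  # skip lines that contained only noise
            cleaned_lines.append(" ".join(tokens))
    return cleaned_lines


print(preprocess("The hole ###  needs a  chamfer\n\n@@@"))
```

Lines that contain nothing but recognition noise are dropped entirely, which mirrors the removal of useless or misrecognized information.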
Constructing the entity recognition data set: the preprocessed data source is converted into text-entity label form, i.e. the data set is labeled. Entity text contained in the preprocessed data source is marked as [@entity text#entity class]. For example, the sentence "the hole and the end face need a chamfer transition" is manually marked with a labeling tool as "[@hole#B-hole] and [@end face#B-face] need [@chamfer#B-chamfer] transition", where B, I and O denote Begin, Inner and Other respectively. The marked text is saved as a txt file. A Python script is then written that uses regular expressions: the expression "[ @ | | [ # ]" filters the specific content of each mark out of the input file, and the expression "[ [ ] (." then removes the irrelevant content. The result pairs each character with its tag, for example: hole B-hole, O, terminal B-surface, surface I-surface, O, inverted B-inverted, angle I-angle, transition O (the Chinese source text is tagged character by character: "terminal" and "surface" are the two characters of "end face", and "inverted" and "angle" are the two characters of "chamfer").
The labeled data set is divided, according to the amount of text-entity label data, into a training set, a test set and a verification set in a certain proportion and used to build the entity recognition model; for example, a ratio of 8:1:1 chosen from past experience and results.
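The conversion from marked text to per-token B/I/O tags can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the patent's script: the regular expression `MARK`, the function `to_bio` and the whitespace tokenizer are all hypothetical, and the class label is accepted with or without a leading "B-".

```python
import re

# Matches one annotation of the form [@entity text#entity class] (assumed syntax).
MARK = re.compile(r"\[@(?P<text>[^#\]]+)#(?P<label>[^\]]+)\]")


def to_bio(marked, tokenize=str.split):
    """Convert '[@text#class]' markup into (token, tag) pairs with B/I/O tags."""
    pairs = []
    pos = 0
    for m in MARK.finditer(marked):
        # Text between annotations lies outside any entity.
        pairs += [(tok, "O") for tok in tokenize(marked[pos:m.start()])]
        # Accept labels written either as 'face' or as 'B-face'.
        label = re.sub(r"^[BI]-", "", m.group("label").strip(), flags=re.IGNORECASE)
        for i, tok in enumerate(tokenize(m.group("text"))):
            pairs.append((tok, ("B-" if i == 0 else "I-") + label))
        pos = m.end()
    pairs += [(tok, "O") for tok in tokenize(marked[pos:])]
    return pairs


example = "[@hole#B-hole] and [@end face#B-face] need [@chamfer#B-chamfer] transition"
for token, tag in to_bio(example):
    print(token, tag)
```

For Chinese text, passing `tokenize=list` splits each span into single characters, which gives the character-level tagging used in the example above.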
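A minimal sketch of the 8:1:1 split follows. The shuffling, the fixed seed and the function name `split_dataset` are assumptions added for reproducibility of the example; the patent only specifies the ratio.

```python
import random


def split_dataset(samples, ratios=(8, 1, 1), seed=42):
    """Shuffle and split samples into training, test and verification sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_test = len(items) * ratios[1] // total
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    verification = items[n_train + n_test:]  # remainder goes to verification
    return train, test, verification


train, test, verification = split_dataset(range(100))
print(len(train), len(test), len(verification))
```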
The entity recognition model uses a deep learning method. The TensorFlow framework is chosen to implement entity recognition, together with a BERT (Bidirectional Encoder Representations from Transformers) pre-trained model. BERT uses the Transformer encoder as its main language model, which captures longer-range dependencies and is more efficient than a recurrent neural network. The specific network structure of the entity recognition model is shown in FIG. 4 and comprises a BERT module, a forward LSTM module, a backward LSTM module and a CRF module. The text of the training set is first input in sequence into the BERT module, which converts it into 768-dimensional word vectors. The word vectors output by the BERT module are input into the forward LSTM module and the backward LSTM module simultaneously; the two outputs are concatenated and input into the CRF module for classification, which outputs the final result. The BERT pre-trained model mainly comprises a MASK text preprocessing layer, prediction of the relationship between preceding and following sentences, a word embedding layer and a Transformer feature encoding layer. The Transformer encoding layer generates dynamic word vectors through a self-attention mechanism, which makes it better suited to word vectorization in the mechanical field. To better suit Chinese text, the invention uses whole word masking: whenever a single character of a word is masked, the complete word to which that character belongs is masked.
For example, for the original text "hole and end face need chamfer", the masked text is "hole and [Mask][Mask] need chamfer". This better overcomes word ambiguity and generates 768-dimensional dynamic word vectors that contain context semantics.
LSTM (Long Short-Term Memory network) is a variant of the recurrent neural network (RNN). Its core consists of an input gate $I_t$, a forget gate $F_t$, an output gate $O_t$ and a memory Cell, with the formulas:
$$I_t = \sigma(w_i [h_{t-1}, x_t] + b_i)$$
$$F_t = \sigma(w_f [h_{t-1}, x_t] + b_f)$$
$$O_t = \sigma(w_o [h_{t-1}, x_t] + b_o)$$
where $w_i$ is the input gate weight; $h_{t-1}$ is the hidden layer vector; $x_t$ is the input data; $b_i$ is the input gate bias term; $\sigma$ is the sigmoid function; $w_f$ is the forget gate weight; $b_f$ is the forget gate bias term; $w_o$ is the output gate weight; and $b_o$ is the output gate bias term.
The memory Cell stores the historical memory content. After the retained portion of the past memory and the new content are determined, the Cell is updated as follows:
$$\mathrm{Cell} = \tanh(W_c [h_{t-1}, x_t] + b_c),$$
where $W_c$ is the hidden state weight and $b_c$ is the hidden state bias term.
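The gate and Cell computations can be illustrated with a single LSTM time step in NumPy. This is a sketch under assumptions: the final cell update `c_t = f_t * c_prev + i_t * cand` and the hidden output `h_t = o_t * tanh(c_t)` follow the standard LSTM formulation and are not spelled out in the text, and the names `lstm_step` and `init_params` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step using weights w_i, w_f, w_o, W_c and biases b_i, b_f, b_o, b_c."""
    z = np.concatenate([h_prev, x_t])                   # [h_{t-1}, x_t]
    i_t = sigmoid(params["w_i"] @ z + params["b_i"])    # input gate
    f_t = sigmoid(params["w_f"] @ z + params["b_f"])    # forget gate
    o_t = sigmoid(params["w_o"] @ z + params["b_o"])    # output gate
    cand = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate memory content
    c_t = f_t * c_prev + i_t * cand   # assumed standard update: keep old, add new
    h_t = o_t * np.tanh(c_t)          # assumed standard hidden output
    return h_t, c_t


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights and zero biases for the four transforms."""
    rng = np.random.default_rng(seed)
    params = {}
    for w, b in (("w_i", "b_i"), ("w_f", "b_f"), ("w_o", "b_o"), ("W_c", "b_c")):
        params[w] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim + input_dim))
        params[b] = np.zeros(hidden_dim)
    return params


params = init_params(input_dim=768, hidden_dim=4)
h, c = lstm_step(np.ones(768), np.zeros(4), np.zeros(4), params)
print(h.shape, c.shape)
```

A bidirectional layer would run this step once left to right and once right to left over the sequence and concatenate the two hidden vectors at each position.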
The input gate $I_t$ and the forget gate $F_t$ act together to discard useless information and pass the useful information on to the next time step. Bi_LSTM (Bidirectional Long Short-Term Memory network) runs a forward LSTM and a backward LSTM over each word sequence and then combines the two outputs at each time step, so that every time step has both forward and backward information. In the named entity recognition task, Bi_LSTM is good at handling long-distance text information but cannot handle the dependencies between adjacent labels: when recognizing mechanical entities it extracts bidirectional semantic information but does not consider the dependencies between the labels of an entity. It may therefore predict consecutive 'B-', 'B-' tags within one entity, or an entity that starts with 'I-'. To solve this problem, the invention uses a CRF (conditional random field) to model the relationship between neighboring labels. The characters of the text to be predicted are first passed through Bi_LSTM to obtain its output; the CRF then scores the entity character labels by combining the label score of each character with the transition score between adjacent character labels. The scoring formula is:
$$\mathrm{Score}(X, Y) = \sum_{i=1}^{n-1} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} E_{i, y_i}$$
where $A_{y_i, y_{i+1}}$ is the transition score between two adjacent labels in a mechanical design text; $E_{i, y_i}$ is the score of label $y_i$ for the $i$-th character of the mechanical text; and $n$ is the number of characters currently input. Comparing the total score of the correct labeling of the mechanical design text with the total scores of all possible labelings gives the probability $P(Y \mid X)$ of the correct labeling:
$$P(Y \mid X) = \frac{\exp(\mathrm{Score}(X, Y))}{\sum_{Y'} \exp(\mathrm{Score}(X, Y'))}$$
This gives the probability that the character labels are predicted correctly. When $P(Y \mid X)$ is close to 1, the labeling result is consistent with the model's prediction, and the entity recognition model for mechanical design has been trained effectively.
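The CRF scoring described above can be sketched in NumPy as follows. This is an illustration under assumptions: the array names `emissions` (per-character label scores from Bi_LSTM) and `transitions` (scores between adjacent labels) are hypothetical, start and end transitions are omitted, and the toy numbers are invented purely for the demonstration.

```python
import itertools

import numpy as np


def path_score(emissions, transitions, tags):
    """Score of one label path: per-character label scores plus adjacent-label transitions."""
    score = sum(emissions[i, t] for i, t in enumerate(tags))
    score += sum(transitions[a, b] for a, b in zip(tags, tags[1:]))
    return float(score)


def log_partition(emissions, transitions):
    """Log of the summed exp-scores over all label paths (forward algorithm)."""
    alpha = emissions[0].copy()
    for i in range(1, emissions.shape[0]):
        # alpha[j] = logsumexp_k(alpha[k] + transitions[k, j]) + emissions[i, j]
        scores = alpha[:, None] + transitions
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0)) + emissions[i]
    m = alpha.max()
    return float(m + np.log(np.exp(alpha - m).sum()))


def path_probability(emissions, transitions, tags):
    """P(Y|X): exp-score of this path divided by the summed exp-scores of all paths."""
    return float(np.exp(path_score(emissions, transitions, tags)
                        - log_partition(emissions, transitions)))


TAGS = ["O", "B", "I"]
emissions = np.array([[2.0, 0.5, 0.0],    # character 0 prefers O
                      [0.0, 1.0, 1.2],    # character 1 slightly prefers I
                      [0.0, 0.2, 1.5]])   # character 2 prefers I
transitions = np.zeros((3, 3))
transitions[0, 2] = -10.0                 # discourage O -> I (entity starting with I)
best = max(itertools.product(range(3), repeat=3),
           key=lambda p: path_score(emissions, transitions, p))
print([TAGS[t] for t in best], path_probability(emissions, transitions, best))
```

Without the transition penalty, picking the best label per character would give O, I, I, an entity that starts with 'I-'. With the penalty, the highest-scoring path starts the entity with 'B-'. The brute-force search is only for illustration; Viterbi decoding would be used in practice.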
The BERT+Bi_LSTM+CRF model is selected as the main entity recognition model. After tuning, the group of model parameters with the best recognition effect is selected as the final parameters; specifically, for the Bi_LSTM+CRF model, batch_size=32, the number of LSTM hidden layers is 2, dropout=0.5, epoch=80, and so on. Finally, a TensorBoard module is added to visualize the training curves.
Finally, the entity recognition model is evaluated on the test set. The evaluation index is the F-score:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where Precision is the precision rate; Recall is the recall rate; TP is the number of samples predicted positive that are actually positive; FP is the number predicted positive that are actually negative; and FN is the number predicted negative that are actually positive. If the F-score is below the predetermined target value, the model training step is repeated; if the F-score reaches the predetermined target, that parameter model is selected for use.
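The evaluation step can be sketched as a small helper. The balanced (F1) form of the F-score is assumed here, and the counts and the target value are invented for illustration.

```python
def f_score(tp, fp, fn):
    """Return (precision, recall, F-score) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical counts and target, for illustration only.
precision, recall, f = f_score(tp=8, fp=2, fn=2)
TARGET = 0.75
print(precision, recall, f, "accept" if f >= TARGET else "retrain")
```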
In the relation recognition layer, the type of relation classification is defined, an entity relation extraction data set is built according to the data source preprocessed in the input entity recognition layer, and an entity relation model is built by the entity relation extraction data set. A complete domain knowledge graph is constructed, not only domain entities are needed, but also relationships among the entities are needed to be obtained, and the text_CNN+LSTM is adopted to extract the relationships among the entities.
And combining the entity relation model, the entity recognition model and the preprocessed data source in the input entity recognition layer to splice the entity relation model, the entity recognition model and the preprocessed data source into a triplet.
Type of relationship classification: the relationship between the mechanical design and the evaluation field is complicated, the relationship between the entities in the field and the relationship are classified according to the specific entity relationship, and the relationship between the mechanical design and the evaluation field is divided into five relationship types, as shown in fig. 3, which are respectively: the five relationship types comprise most of the relationships in the field, and the relationships under each relationship category can be subdivided into a plurality of relationships.
The entity relation model is built in a similar way to the entity recognition model in the entity recognition layer. First, a relation extraction data set is made: the text data passed on by the entity recognition layer is labeled and stored in the form of a sentence plus a relation label. For example, for the sentence "a chamfer is needed between the hole and the end face", a relation exists and the sentence is labeled accordingly. The data set is divided into a training set, a test set and a validation set in the ratio 8:1:1.
A text convolutional neural network (textCNN) + LSTM is then used to classify the relations between entities. The specific structure is shown in FIG. 4 and consists mainly of a BERT module and a textCNN+LSTM module. The processed text data is input and converted into 768-dimensional vectors by the BERT module; the output is fed in sequence into a convolution layer, a pooling layer and an LSTM recurrent layer, and finally into a fully connected layer that outputs the result. Compared with other convolutional neural networks, the textCNN network has a simple structure and few parameters. To reduce the loss of semantic features, the network uses only one convolution layer and one pooling layer. The network contains 32 filters with a kernel size of 3, SAME padding and the ReLU activation function.
In essence, relation extraction can be understood as a text classification task. The model is evaluated with the F-score evaluation index, the model with the highest accuracy is selected as the final parameter model, and the entity relation model for the field is thereby built.
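The 8:1:1 division of the labeled data can be sketched as follows. This is a minimal illustration only: the function name, the fixed random seed and the sample records are assumptions, and the patent does not state how the split is implemented.

```python
import random


def split_dataset(samples, ratios=(8, 1, 1), seed=42):
    """Shuffle samples and split them into (train, test, validation)."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_test = len(items) * ratios[1] // total
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    validation = items[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


# Hypothetical "sentence + relation label" records.
records = [(f"sentence {i}", "relation label") for i in range(100)]
train_set, test_set, validation_set = split_dataset(records)
```

Putting any rounding remainder into the validation set keeps every record in exactly one subset.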
A script is written with third-party Python libraries to store the triples in MySQL and the Neo4j graph database for subsequent use and maintenance.
The triple data is passed to the data storage layer, which receives it and, combining the pre-stored existing third-party database with the two database systems MySQL and Neo4j, stores the triple data in the background database.
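One way such a storage script could prepare triples for Neo4j is sketched below. It only builds a parameterized Cypher MERGE statement and its parameters; the node label Entity, the property name and the helper function are assumptions for illustration, and the patent does not specify the schema or the library used.

```python
import re


def triple_to_cypher(head: str, relation: str, tail: str):
    """Build a Cypher MERGE statement and parameters for one triple."""
    # Relationship types cannot be passed as Cypher parameters, so the
    # relation is reduced to a safe identifier before being interpolated.
    rel_type = re.sub(r"\W+", "_", relation.strip()).strip("_").upper()
    if not rel_type or rel_type[0].isdigit():
        raise ValueError(f"cannot derive a relationship type from {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical triple derived from the labeling example in the text.
query, params = triple_to_cypher("hole", "requires chamfer", "end face")
```

MERGE is used instead of CREATE so that re-running the script does not duplicate nodes or relationships; the resulting query and parameters would then be handed to whichever Neo4j client the script uses.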
In the application layer, the background database is put to use; the layer mainly provides user login, entity query, auxiliary question answering, knowledge update and similar functions.
The application layer builds an auxiliary decision-making platform with the Django framework. The main structure of the platform is shown in figure 5 and consists of: Entity recognition (entity recognition module), Query (query module), Overview (overview module), and Question and answer (auxiliary question-answering module).
Entity recognition module: this module recognizes the content entered by the user, segments it into words and tags the parts of speech. It mainly uses the entity recognition model and a word segmentation model to determine whether the sentence entered by the user contains the required entities.
Query module: the web framework is connected to the Neo4j graph database to query entities, relations and node attributes in the graph database; the module mainly implements querying and modifying node contents and relations. The knowledge graph constructed by the invention is built and stored in the Neo4j graph database, so knowledge queries are executed as Cypher query statements in Neo4j and the results are returned. After the user submits the entity or relation to be queried on the web front end, the back end automatically generates the Cypher query statement for that node or relation, retrieves the data and returns the result to the front-end page, where it is visualized with plug-ins such as neovis.js and ECharts.
Overview module: this module mainly implements the graph display function, showing part of the nodes and relations contained in the database on the front-end interface.
Auxiliary question-answering module: first, entities are recognized from the query sentence; second, the question is parsed syntactically; finally, structured semantic triples are extracted from the natural-language question by matching the structural features of the dependency tree, providing a basis for subsequent question classification and template matching.
The invention performs auxiliary design and evaluation in question-and-answer form. First, word segmentation and syntactic analysis are performed on the user's question, the specific entities and grammatical relations in the sentence are extracted, and the sentence is classified. Then the extracted content is matched against pre-designed question templates to obtain the best-matching question. Finally, the Cypher statement corresponding to that question is constructed to search the graph database and obtain the final answer. (Cypher is a declarative graph query language that is expressive and can query and update graph data efficiently.) These functions are integrated and visualized with front-end tools in a single web page for convenient operation.
Many mature tools are available for recognizing entities in query sentences. The Language Technology Platform (LTP) is a complete Chinese natural language processing system that provides a rich, efficient and high-precision set of Chinese natural language processing modules from the bottom layer up. The invention therefore selects LTP for the preliminary natural language processing work, performing dependency parsing and semantic dependency analysis on the question; the recognition result is shown in figure 6. Because the built-in dictionary of LTP may recognize professional domain vocabulary poorly, the entities, relations and attribute values extracted into the domain knowledge base are compiled into a domain-specific dictionary, which is additionally loaded into LTP as an extension dictionary. For example, the part-of-speech tagging result for "How to machine a stepped hole?" is "how/r machine/v stepped hole/nm ?/wp", where r denotes a pronoun, v a verb, nm a noun recognized from the added dictionary, and wp a punctuation mark. Structured semantic triples are then extracted from the question by matching the structural features of the dependency tree. Finally, a Cypher statement is constructed to search the Neo4j database, and the query result is returned.
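To show how a tagging result of this form could be consumed downstream, the sketch below parses a "word/tag" string into pairs and keeps the verbs and domain nouns as feature words. It does not call LTP; the string format follows the example in the text, and the function names and the choice of tags to keep are assumptions.

```python
import re

# One token is some text, a slash, then a lowercase tag such as r, v, nm, wp.
TAGGED_TOKEN = re.compile(r"\s*(.+?)/([a-z]+)")


def parse_tagged(tagged: str):
    """Split a 'word/tag word/tag ...' string into (word, tag) pairs."""
    return [(word, tag) for word, tag in TAGGED_TOKEN.findall(tagged)]


def feature_words(pairs, keep_tags=("v", "nm")):
    """Keep verbs and domain-dictionary nouns as question feature words."""
    return [word for word, tag in pairs if tag in keep_tags]


pairs = parse_tagged("how/r machine/v stepped hole/nm ?/wp")
features = feature_words(pairs)
```

A regular expression is used rather than splitting on whitespace because a domain term such as "stepped hole" can itself contain a space in this English rendering.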
Defining question templates: entities or relations are recognized from the query sentence entered by the user. For example, from the sentence "How to machine a stepped hole?" the feature words "machine", "stepped hole" and "?" can be extracted. Question-template matching is then performed according to the feature words, and finally the corresponding result is retrieved from the graph.
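A minimal sketch of feature-word template matching is given below. The template names, keyword sets and scoring rule are hypothetical, since the template table of the patent is not reproduced in this text; the sketch only illustrates picking the template that shares the most keywords with the extracted feature words.

```python
# Hypothetical question templates; the real templates are not given here.
TEMPLATES = [
    {"name": "machining_method", "keywords": {"machine", "how"}},
    {"name": "design_requirement", "keywords": {"design", "require", "tolerance"}},
]


def match_template(feature_words, entities):
    """Return the name of the best-matching template, or None."""
    if not entities:
        return None  # a template needs at least one recognized entity
    words = set(feature_words)
    best_name, best_score = None, 0
    for template in TEMPLATES:
        score = len(template["keywords"] & words)  # shared keywords
        if score > best_score:
            best_name, best_score = template["name"], score
    return best_name


matched = match_template(["machine", "stepped hole", "?"], ["stepped hole"])
```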
Because the field lacks high-quality manually annotated question-answer pairs or structured Chinese data sets, the invention needs to construct question templates for mechanical design and evaluation to support intelligent querying of the graph system. Samples of the query part of specific design and machining questions are as follows:
Question classification model: in the invention, question classification mainly consists of two parts: 1. Keyword feature extraction, using a method that effectively avoids the influence of common words on the keywords and improves the relevance between the keywords and the text. 2. A naive Bayes text classifier is selected as the feature classification model; the required question data set is constructed, the model is evaluated with the same evaluation method, and the model with the best indexes is selected as the final model.
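The naive Bayes classification step can be illustrated with a small multinomial naive Bayes classifier using add-one smoothing. This is a sketch under assumptions: the training questions, class names and tokenization are invented for illustration, and the patent does not say which implementation or smoothing it uses.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical tokenized questions and classes.
questions = [
    ["how", "machine", "stepped hole"],
    ["how", "machine", "groove"],
    ["what", "tolerance", "hole"],
    ["what", "tolerance", "face"],
]
classes = ["machining_method", "machining_method",
           "design_requirement", "design_requirement"]
classifier = NaiveBayesTextClassifier().fit(questions, classes)
predicted = classifier.predict(["how", "machine", "hole"])
```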
Entity and relation feature words are extracted from the query sentence and combined into a Cypher query statement; the corresponding answer is retrieved from the Neo4j graph database and returned. The process is as follows:
Q: How is a stepped hole machined?
K: how/r machine/v stepped hole/nm ?/wp
A: Turning the stepped hole: rough turning, finish turning
Q denotes the specific question, K denotes the recognized question feature words and grammar set, and A denotes the answer returned to the user. A Cypher statement is generated from the entities and relations recognized in the question, for example: "MATCH (n)-[r]->(m) WHERE n.name='node name' RETURN n, r, m". The query result is output to the front-end interface and returned to the user, completing the query.
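The generation of the example Cypher statement can be sketched as below. The function name and the optional relation filter are assumptions; the sketch passes the node name as a query parameter ($name) instead of pasting it into the statement, which is a design choice of this illustration rather than something stated in the patent.

```python
def build_entity_query(entity_name, relation_type=None):
    """Build a Cypher query for the outgoing relations of one entity."""
    query = "MATCH (n)-[r]->(m) WHERE n.name = $name"
    params = {"name": entity_name}
    if relation_type is not None:
        # type(r) returns the relationship type as a string in Cypher.
        query += " AND type(r) = $rel"
        params["rel"] = relation_type
    query += " RETURN n, r, m"
    return query, params


# Entity recognized from the example question.
cypher, cypher_params = build_entity_query("stepped hole")
```

Using parameters keeps user-entered text out of the statement itself, which avoids quoting problems when an entity name contains an apostrophe.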
One embodiment of the invention is provided below:
Example 1
For any design model and historical design data, industrial text documents are first collected and organized from the enterprise's internal databases of similar or historical designs. The collected documents are then processed and cleaned using OCR and regular expressions. The cleaned results are as follows:
The axis of the hole should be perpendicular to the end face to reduce cutting difficulty
Deep holes should be avoided; the depth-to-diameter ratio L/D of a hole should be less than or equal to 5
The depth-to-diameter ratios of stepped holes should not differ too much
The size of the hole should conform to the standard specification
The dimensional tolerance A, positional tolerance B, shape tolerance C and roughness D of the hole should satisfy A > B > C > D
The width of the face should be as uniform as possible to avoid uneven impact cutting of the tool
Faces should avoid machined curved surfaces of complex structure
Long, slender cylindrical surfaces should be avoided; the length-to-diameter ratio L/D of a cylindrical surface should be less than or equal to 5
Planes with a larger area should be machined to as low a precision as practicable
The dimensional tolerance A, positional tolerance B, shape tolerance C and roughness D of the face should satisfy A > B > C > D
For ease of machining, a transition fillet should be provided between adjacent side walls of the cavity
Sharp corners between the side walls and the floor of the cavity should be avoided
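The regular-expression cleaning that produces a list like the one above could look roughly like the following sketch. The specific cleaning rules (dropping control characters, leading item numbers, repeated whitespace and duplicate lines, and normalizing "<=") are assumptions chosen for illustration; the patent only states that OCR output is cleaned with regular expressions.

```python
import re


def clean_ocr_line(line: str) -> str:
    """Apply a few assumed cleaning rules to one line of OCR output."""
    line = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", line)  # control characters
    line = re.sub(r"^\s*(?:\(\d+\)|\d+[.)、])\s*", "", line)   # leading item numbers
    line = re.sub(r"\s*(?:<=|≦)\s*", " ≤ ", line)             # normalize comparison sign
    return re.sub(r"\s+", " ", line).strip()                  # collapse whitespace


def clean_document(text: str):
    """Clean every line, dropping empty lines and exact duplicates."""
    cleaned, seen = [], set()
    for raw in text.splitlines():
        line = clean_ocr_line(raw)
        if line and line not in seen:
            seen.add(line)
            cleaned.append(line)
    return cleaned


sample = "1. Deep holes should be avoided\n\n2.  The ratio   L/D <= 5 \n3. Deep holes should be avoided"
rules = clean_document(sample)
```

The numbering rule is deliberately simple and would also strip a leading value such as "5." from a line that starts with a decimal number, so a production rule set would need to be tuned against the actual documents.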
The documents are classified according to the entity classification shown in fig. 2. As shown in fig. 7, they are divided into five general classes, including holes, faces, grooves and tools, and each general class is further divided into several entity classes; for example, the tool class is further divided into cutters, wrenches and machine tools. The specific classes are shown in fig. 7. The processed data is labeled to make the data sets, the entity recognition model and the entity relation recognition model are constructed and trained, and finally the triples are stored in the Neo4j graph database and visualized.
An auxiliary decision-making platform is built with the Django framework, and the trained entity recognition model is deployed to the entity recognition module. neovis.js is connected to the Neo4j graph database to implement the entity query function. Finally, the question-answering platform is built: question templates are first constructed and naive Bayes is adopted for question classification; the question entered by the user is then analyzed semantically and by part of speech with the LTP platform to understand the user's intent; the question is classified; and finally the entities and relations in the user's question are obtained, a Cypher statement is constructed, and the query result is returned to the front end.
Claims (10)
1. An intelligent design method based on a mechanical knowledge graph, characterized by comprising the following steps:
step (1): collecting industrial text document data, converting the industrial text document data into a document form that can be recognized by a computer to obtain an original data source, and preprocessing the original data source to obtain a preprocessed data source;
step (2): labeling the preprocessed data source, and dividing the labeled data source into a training set, a test set and a validation set;
step (3): performing entity classification and relationship classification on the preprocessed data source, wherein the entity classification is divided into a product design class, a part processing class, a part assembly class and a related data class, and the relationship classification is divided into a causal relationship, a mutual exclusion relationship, a finite relationship, an initiating relationship and a fixed relationship; and constructing an entity recognition model and an entity relation model respectively based on the training set to obtain a complete mechanical knowledge graph, wherein the entity recognition model recognizes entities in the specific field and the entity relation model extracts the relations among the entities;
step (4): splicing the outputs of the entity relation model and the entity recognition model together with the preprocessed data source into triples, and storing the triples in a background database.
2. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, characterized in that: in the step (1), the industrial text document data is converted into a document form that can be recognized by a computer through OCR (optical character recognition) technology, and the original data source is preprocessed by means of regular expressions and stop-word removal.
3. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, characterized in that: in the step (2), the text of the preprocessed data source is stored in txt.ann format, a Python script file is written, regular expressions are used to filter out the specific content of each annotation and remove irrelevant content, and the content is placed into an array to form text-entity labels.
4. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, characterized in that: in the step (3), the entity recognition model implements entity recognition with the TensorFlow framework and adopts a pre-trained model consisting of a BERT module, a forward LSTM module, a backward LSTM module and a CRF module.
5. The intelligent design method based on the mechanical knowledge graph as claimed in claim 4, characterized in that: the entity recognition model is evaluated using the test set, the evaluation index comprising an F-score, wherein precision is the precision rate; recall is the recall rate; TP is the number of samples predicted positive that are actually positive; FP is the number predicted positive that are actually negative; FN is the number predicted negative that are actually positive; training is repeated if the F-score is below a predetermined target value; and if the F-score reaches the predetermined target, the model with those parameters is selected for use.
6. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, characterized in that: in the step (3), the entity relation model adopts a text convolutional neural network to classify the relations among entities; the entity relation model is divided into a BERT module and a textCNN+LSTM module; text is converted into 768-dimensional vectors by the BERT module, the output is input in sequence into a convolution layer, a pooling layer and an LSTM recurrent layer, and finally the result is output.
7. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, characterized in that: in the step (4), a script is written with third-party Python libraries to store the triples in MySQL and the Neo4j graph database.
8. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, characterized in that: in the step (4), the background database is applied to provide the functions of user login, entity query, auxiliary question answering and knowledge update.
9. The intelligent design method based on the mechanical knowledge graph as claimed in claim 8, characterized in that: an auxiliary decision-making platform comprising an entity recognition module, a query module, an overview module and an auxiliary question-answering module is built.
10. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, characterized in that: the entity recognition module recognizes the content entered by the user, segments it into words and tags the parts of speech; the query module is connected to the Neo4j graph database through a web framework to query entities, relations and node attributes in the graph database; the overview module displays part of the nodes and relations contained in the database on a front-end interface; and the auxiliary question-answering module recognizes entities from the query sentence, parses the question syntactically, and extracts structured semantic triples from the natural-language question by matching the structural features of the dependency tree.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310128512.5A CN116340530A (en) | 2023-02-17 | 2023-02-17 | Intelligent design method based on mechanical knowledge graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116340530A true CN116340530A (en) | 2023-06-27 |
Family
ID=86883044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310128512.5A Pending CN116340530A (en) | 2023-02-17 | 2023-02-17 | Intelligent design method based on mechanical knowledge graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116340530A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080059607A1 (en) * | 1999-09-01 | 2008-03-06 | Eric Schneider | Method, product, and apparatus for processing a data request |
CN104933164A (en) * | 2015-06-26 | 2015-09-23 | 华南理工大学 | Method for extracting relations among named entities in Internet massive data and system thereof |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
US20200004832A1 (en) * | 2018-07-02 | 2020-01-02 | Babylon Partners Limited | Computer Implemented Method for Extracting and Reasoning with Meaning from Text |
CN111737496A (en) * | 2020-06-29 | 2020-10-02 | 东北电力大学 | Power equipment fault knowledge map construction method |
CN113010663A (en) * | 2021-04-26 | 2021-06-22 | 东华大学 | Adaptive reasoning question-answering method and system based on industrial cognitive map |
CN113312501A (en) * | 2021-06-29 | 2021-08-27 | 中新国际联合研究院 | Construction method and device of safety knowledge self-service query system based on knowledge graph |
CN113569054A (en) * | 2021-05-12 | 2021-10-29 | 浙江工业大学 | Knowledge graph construction method and system for multi-source Chinese financial bulletin document |
CN113723632A (en) * | 2021-08-27 | 2021-11-30 | 北京邮电大学 | Industrial equipment fault diagnosis method based on knowledge graph |
CN114911945A (en) * | 2022-04-13 | 2022-08-16 | 浙江大学 | Knowledge graph-based multi-value chain data management auxiliary decision model construction method |
CN115269857A (en) * | 2022-04-28 | 2022-11-01 | 东北林业大学 | Knowledge graph construction method and device based on document relation extraction |
Non-Patent Citations (1)
Title |
---|
崔硕等: "基于深度学习的机械领域知识图谱构建及应用", 《制造技术与机床》, 2 February 2023 (2023-02-02), pages 83 - 89 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116522233A (en) * | 2023-07-03 | 2023-08-01 | 国网北京市电力公司 | Method and system for extracting and classifying key point review content of research document |
CN117235929A (en) * | 2023-09-26 | 2023-12-15 | 中国科学院沈阳自动化研究所 | Three-dimensional CAD (computer aided design) generation type design method based on knowledge graph and machine learning |
CN117235929B (en) * | 2023-09-26 | 2024-06-04 | 中国科学院沈阳自动化研究所 | Three-dimensional CAD (computer aided design) generation type design method based on knowledge graph and machine learning |
CN118014072A (en) * | 2024-04-10 | 2024-05-10 | 中国电建集团昆明勘测设计研究院有限公司 | Construction method and system of knowledge graph for hydraulic and hydroelectric engineering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||