CN116340530A - Intelligent design method based on mechanical knowledge graph - Google Patents

Intelligent design method based on mechanical knowledge graph

Info

Publication number
CN116340530A
Authority
CN
China
Prior art keywords
entity
model
module
relationship
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310128512.5A
Other languages
Chinese (zh)
Inventor
张春燕
崔硕
王林
王岩岩
李骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202310128512.5A priority Critical patent/CN116340530A/en
Publication of CN116340530A publication Critical patent/CN116340530A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Probability & Statistics with Applications (AREA)
  • Animal Behavior & Ethology (AREA)
  • Manufacturing & Machinery (AREA)
  • Economics (AREA)
  • Human Computer Interaction (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an intelligent design method based on a mechanical knowledge graph, in the field of manufacturability evaluation of mechanical parts. The method comprises: converting industrial text document data into a computer-readable document form; labeling the preprocessed data source and dividing it into a training set, a test set and a validation set; performing entity classification and relationship classification on the preprocessed data source, and constructing an entity recognition model and an entity relationship model based on the training set, wherein the entity recognition model recognizes entities in the specific field and the entity relationship model extracts the relationships among the entities; and splicing the outputs of the entity relationship model and the entity recognition model, together with the preprocessed data source, into triples that are stored in a background database. By defining the entities and relationships of the field, dividing the knowledge structure of the data source, and building domain-specific entity recognition and relationship extraction models, the method makes it easier to obtain the required professional knowledge from text data in a complex professional field and improves the recognition rate.

Description

Intelligent design method based on mechanical knowledge graph
Technical Field
The invention relates to the field of evaluation of manufacturability of mechanical parts, in particular to a knowledge graph construction and intelligent aided design method for evaluating the manufacturability of parts.
Background
As a new round of industrial intelligence technology evolves, smart manufacturing is gradually moving manufacturing data from data intelligence to cognitive intelligence. How to transform the multi-source heterogeneous data accumulated in the manufacturing industry into specific industrial knowledge, and how to use that knowledge more conveniently, have become main topics of current research. In particular, multi-source heterogeneous manufacturing data is mainly stored in the form of industrial documents, which exist primarily as natural-language text and process card tables. A qualified design and evaluation engineer needs knowledge of many fields in order to design and evaluate a part over its whole life cycle, or even over multiple life cycles. If the information exchange between designers and machining personnel is insufficient or not timely, the designed part may be difficult or even impossible to machine, which increases manufacturing cost. At present, most document information is processed manually by technicians. Making maximum use of this information, turning it into knowledge that serves the intelligent factory and thereby improving the competitiveness of enterprises, is becoming more and more urgent. A method is therefore needed that obtains information from engineering documents to form usable knowledge, shares that knowledge with production designers and evaluators, breaks down information islands, and improves product design and manufacturing quality.
At present, the entity recognition and word vectorization methods commonly used for graph construction and aided design mainly include the bidirectional long short-term memory network (Bi_LSTM), the bidirectional long short-term memory network with a conditional random field (BiLSTM+CRF), word vectorization with the common Word2vec model, and the like. These methods have limitations: Chinese sentences generally have complex constituents and frequent word ambiguity, and these methods cannot clearly distinguish certain technical terms or ambiguous words. Chinese patent application No. CN202111587368.9 discloses an improved BiLSTM-CRF method for named entity recognition in electronic medical records; it combines input characters and labels through encoding, introduces a multi-head attention mechanism in the attention layer to obtain more useful information, and extracts structured electronic medical record information. However, when this method performs word vectorization, it does not consider the relationship between contexts and does not extract key information between texts, so the accuracy of the recognition results may be reduced. Chinese patent application No. CN201910766428.X discloses a knowledge graph construction method that uses fine-tuned BERT (Bidirectional Encoder Representations from Transformers) for word vectorization and a classification algorithm to classify specific entity categories, which improves graph construction quality. However, the application field of that method is rather specific: its recognition accuracy for the vocabulary of other designated fields is low, it cannot adapt to strongly domain-specific text, and the resulting knowledge is poorly reusable.
Disclosure of Invention
The invention aims to solve the problems of existing part manufacturability evaluation and provides an intelligent design method based on a mechanical knowledge graph that recognizes accurately, improves production and design efficiency, and is convenient to use.
To achieve this purpose, the invention adopts the following technical scheme:
Step (1): collect industrial text document data, convert it into a computer-readable document form to obtain an original data source, and preprocess the original data source to obtain a preprocessed data source;
Step (2): label the preprocessed data source, and divide the labeled data source into a training set, a test set and a validation set;
Step (3): perform entity classification and relationship classification on the preprocessed data source, wherein the entity classification comprises a product design class, a part processing class, a part assembly class and a related data class, and the relationship classification comprises a causal relationship, a mutual exclusion relationship, a finite relationship, an initiating relationship and a fixed relationship; construct an entity recognition model and an entity relationship model based on the training set to obtain a complete mechanical knowledge graph, wherein the entity recognition model recognizes entities in the specific field and the entity relationship model extracts the relationships among the entities;
Step (4): splice the outputs of the entity relationship model and the entity recognition model, together with the preprocessed data source, into triples, and store the triples in a background database.
The beneficial effects of the invention after adopting the above technical scheme are as follows:
1. The invention defines the entities and relationships of the field and divides the knowledge structure of the data source, providing a basis for the subsequent classification of entities and relationships.
2. The invention builds entity recognition and relationship extraction models for the professional field, which makes it easier to obtain the required professional knowledge from text data in a complex professional field and improves the recognition rate.
3. The invention extracts a large amount of text data in the field and stores it in graph form, providing a basis for various downstream tasks.
4. The invention constructs question-answer templates and classifies the relevant professional questions as far as possible. After the user enters a question, the system classifies it accordingly so that its answer can be queried conveniently.
5. The invention builds and trains a classification model for common questions in the field, providing a basis for the knowledge question-answering module.
6. The invention builds an auxiliary design platform on a front-end framework and provides a friendly human-computer interaction interface that supports functions such as entity recognition, sentence-level question answering and entity query, which is convenient for design and evaluation personnel and improves production and design efficiency.
Drawings
The technical scheme of the invention is clearly and completely described below with reference to the accompanying drawings;
FIG. 1 is a design flow chart of the intelligent design method based on the mechanical knowledge graph of the invention;
FIG. 2 is a frame diagram of the data source of FIG. 1;
FIG. 3 is a graph of relationship types for the relationship classification of FIG. 1;
FIG. 4 is a block diagram of an entity identification and relationship extraction network model;
FIG. 5 is a schematic diagram of an auxiliary design platform module arrangement;
FIG. 6 is a question query parsing diagram of FIG. 5;
FIG. 7 is an entity class diagram in an embodiment.
Detailed Description
The invention extracts and stores a large amount of structured and unstructured data held inside and outside the factory, specifically divides the knowledge base, the entities and the relationships between entities in the field, uses deep learning to construct dedicated extraction models, and finally stores the extraction results in a Neo4j graph database for later use. The method is as follows:
Referring to FIG. 1, the data source is acquired in the data acquisition layer:
Relevant industrial text document data, in text form, is collected from in-factory part design documents and process manuals, process cards, external textbooks, record documents, the long-accumulated experience of workers and experts, and the like. The collected text-form data is converted by OCR (Optical Character Recognition) into a computer-readable document form, giving the original data source.
The original data source is input into the data processing layer, where it is processed with common data preprocessing methods to obtain the preprocessed data source. These methods include regular expressions, stop-word removal and the like, and are used to remove useless or misrecognized information.
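The preprocessing step can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the character whitelist and the `STOP_WORDS` set are hypothetical placeholders, and tokens are split on whitespace for readability, whereas Chinese text would first need word segmentation.

```python
import re
from typing import List

# Hypothetical stop-word list; a real project would load a domain-specific one.
STOP_WORDS = {"the", "a", "an", "of", "and"}

def preprocess(text: str) -> List[str]:
    """Clean OCR output with a regular expression, then drop stop words."""
    # Replace characters that are not word characters, whitespace or basic
    # punctuation (assumed here to be OCR noise) with a space.
    text = re.sub(r"[^\w\s.,;:()\-]", " ", text)
    # split() with no argument also collapses repeated whitespace.
    tokens = text.split()
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]

print(preprocess("The hole ## and   the end face"))
```

Which characters count as noise depends on the OCR engine and the documents, so the pattern above should be treated as a starting point.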
The preprocessed data source is input into the entity recognition layer. In this layer, entity classification is first performed on the preprocessed data source and the specific entity categories are defined; an entity recognition data set is then constructed and used to build the entity recognition model. Text entity recognition is carried out with this model, which adopts a BERT+Bi_LSTM+CRF architecture to recognize entities in the specific field, and the recognized text entity data is passed to the relationship recognition layer. The preprocessed data source obtained in the entity recognition layer is also passed to the relationship recognition layer.
Entity classification, referring to FIG. 2, studies and analyzes the relevant content of fields such as mechanical design and manufacturing evaluation, and defines the important terms and concepts of those fields. The preprocessed data source is divided into four categories:
First category, product design: part design intent, feature design experience and part structure manufacturability.
Second category, part processing: part processing machine tools, part processing cutters and part processing manufacturability.
Third category, part assembly: part assembly types, part assembly experience and part assembly sequence.
Fourth category, related data: product design manuals, standard operation manuals and professional teaching materials.
The entities in the field are of many categories with differing characteristics. The various entities are analyzed and their common characteristics extracted as the basis for classification; for example, product entities can be further classified into holes, faces, cavities, grooves and the like.
Building the entity recognition data set: the preprocessed data source is converted into text-entity-label form, i.e. the data set is labeled. Entity text contained in the preprocessed data source is marked as [@entity text#entity class]. For example, the sentence "the hole and the end face need to have a chamfer transition" is manually marked with a labeling tool as "[@hole#B-hole] and [@end face#B-face] need [@chamfer#B-chamfer] transition", where B, I and O denote Begin, Inside and Other respectively. The labeled text is saved as a txt file. A Python script is then written that uses regular expressions: the content of each mark is first filtered out of the input file with the expression "[ @ | | [ # ]", and irrelevant content is then removed with the expression "[ [ ] (. The result is a character-level label sequence in which each character of the Chinese sentence carries one tag, for example: "hole" B-hole; "and" O; the two characters of "end face" B-face and I-face; "need" O; the two characters of "chamfer" B-chamfer and I-chamfer; "transition" O.
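The conversion from bracketed marks to B/I/O tags can be sketched as follows. This is a hedged illustration, not the script from the description: the regular expression `ANNOTATION` is an assumption written for the `[@entity text#entity class]` syntax, any leading `B-`/`I-` in the class is stripped and re-added, and tokens are whitespace-separated English words, whereas the original tags individual Chinese characters.

```python
import re
from typing import List, Tuple

# Assumed pattern for marks of the form [@entity text#entity class].
ANNOTATION = re.compile(r"\[@(?P<text>[^#\]]+)#(?P<label>[^\]]+)\]")

def to_bio(annotated: str) -> List[Tuple[str, str]]:
    """Turn an annotated sentence into (token, tag) pairs using B/I/O tags."""
    pairs: List[Tuple[str, str]] = []
    pos = 0
    for match in ANNOTATION.finditer(annotated):
        # Text between two marks belongs to no entity and is tagged O.
        pairs += [(tok, "O") for tok in annotated[pos:match.start()].split()]
        # Drop a B-/I- prefix written in the mark, then add it per token.
        label = re.sub(r"^[BbIi]-", "", match.group("label").strip())
        for i, tok in enumerate(match.group("text").split()):
            pairs.append((tok, ("B-" if i == 0 else "I-") + label))
        pos = match.end()
    pairs += [(tok, "O") for tok in annotated[pos:].split()]
    return pairs

sentence = "[@hole#B-hole] and [@end face#B-face] need [@chamfer#B-chamfer] transition"
for token, tag in to_bio(sentence):
    print(token, tag)
```

For Chinese input, replacing the `split()` calls with iteration over characters would give the character-level tagging the description refers to.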
According to the amount of text-entity-label data, the labeled data set is divided in a fixed proportion into a training set, a test set and a validation set, which are used to build the entity recognition model. For example, based on past experience and results, a ratio of 8:1:1 is used.
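A minimal sketch of such an 8:1:1 split is shown below. The function name `split_dataset`, the fixed seed and the shuffling step are illustrative assumptions; the description only fixes the ratio.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle the samples and split them into training, test and validation sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(len(items) * ratios[0])
    n_test = int(len(items) * ratios[1])
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # the remainder goes to the validation set
    return train, test, val

train, test, val = split_dataset(range(100))
print(len(train), len(test), len(val))
```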
The entity recognition model uses a deep-learning-based method: entity recognition is implemented in the TensorFlow framework with a BERT (Bidirectional Encoder Representations from Transformers) pre-trained model. BERT uses the Transformer encoder as its main language model, which captures longer-range dependencies and is more efficient than a recurrent neural network. The network structure of the entity recognition model is shown in FIG. 4 and comprises a BERT module, a forward LSTM module, a backward LSTM module and a CRF module. The text of the training set is first input sequentially into the BERT module, which converts it into 768-dimensional word vectors; the word vectors output by the BERT module are input simultaneously into the forward and backward LSTM modules; the two outputs are concatenated and input into the CRF module for classification, which outputs the final result. The pre-training part of the BERT model mainly comprises a MASK text preprocessing layer, prediction of the relationship between preceding and following sentences, a word embedding layer and a Transformer feature encoding layer. The Transformer encoding layer generates dynamic word vectors through the self-attention mechanism, which makes them better suited to word vectorization in the mechanical field. To better suit Chinese text, the invention masks text with whole-word masking: if a single character of a word is masked, all characters of the word it belongs to are masked as well.
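The whole-word (full-word) masking rule can be sketched in a few lines of Python. This is a simplified illustration, not BERT's actual pre-processing code: the sentence is assumed to be already segmented into words, the masking probability of 0.15 is an assumption, and the word list in the usage line is an illustrative segmentation.

```python
import random

def whole_word_mask(words, mask_prob=0.15, seed=0, mask_token="[MASK]"):
    """Mask whole words: either every character of a word is masked or none."""
    rng = random.Random(seed)
    output = []
    for word in words:
        if rng.random() < mask_prob:
            # One mask token per character, so the sequence length is unchanged.
            output.extend([mask_token] * len(word))
        else:
            output.extend(list(word))
    return output

# Illustrative segmentation: hole / and / end face / chamfer.
print(whole_word_mask(["孔", "与", "端面", "倒角"], mask_prob=0.5, seed=1))
```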
For example, for the original text "hole and end face need chamfer", the masked text is "hole and [Mask][Mask] need chamfer". This better overcomes word ambiguity and generates 768-dimensional dynamic word vectors that contain context semantics. LSTM (Long Short-Term Memory) is a variant of the recurrent neural network (RNN). Its core consists of the following structures: the input gate $I_t$, the forget gate $F_t$, the output gate $O_t$ and a memory cell, with the formulas as follows:
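$$
\begin{aligned}
I_t &= \sigma\left(w_i\,[h_{t-1}, x_t] + b_i\right) \\
F_t &= \sigma\left(w_f\,[h_{t-1}, x_t] + b_f\right) \\
O_t &= \sigma\left(w_o\,[h_{t-1}, x_t] + b_o\right)
\end{aligned}
$$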
where $w_i$ is the input gate weight; $h_{t-1}$ is the hidden-layer vector of the previous time step; $x_t$ is the input data; $b_i$ is the input gate bias term; $\sigma$ is the sigmoid function; $w_f$ is the forget gate weight; $b_f$ is the forget gate bias term; $w_o$ is the output gate weight; and $b_o$ is the output gate bias term.
The memory cell stores the historical memory content. After the part of the past memory to retain and the new content to add have been determined, the cell is updated as follows:
$$
\mathrm{Cell} = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right),
$$
where $W_c$ is the hidden-state weight and $b_c$ is the hidden-state bias term.
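To make the gate computations concrete, here is a minimal pure-Python sketch of one LSTM time step. The parameter names mirror the symbols defined above. The final cell and hidden-state updates follow the standard LSTM formulation and are an assumption here, since the text does not spell them out.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p holds the weights w_i, w_f, w_o, W_c and their biases."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(p["w_i"], z, p["b_i"])]  # input gate
    f_t = [sigmoid(a) for a in affine(p["w_f"], z, p["b_f"])]  # forget gate
    o_t = [sigmoid(a) for a in affine(p["w_o"], z, p["b_o"])]  # output gate
    cand = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate cell
    # Assumed standard update: keep part of the old memory, add new content.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With hidden size 2 and input size 1, each weight matrix has 2 rows and 3 columns because the gates act on the concatenated vector.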
The input gate $I_t$ and the forget gate $F_t$ act together to discard useless information and pass useful information on to the next time step. The basic idea of Bi_LSTM (Bidirectional Long Short-Term Memory) is to run a forward LSTM and a backward LSTM over each word sequence and then combine their outputs at the same time step, so that every time step has both forward and backward information. In the named entity recognition task, Bi_LSTM is good at handling long-distance text information but cannot handle the dependencies between adjacent labels: it extracts bidirectional semantic information while recognizing mechanical entities, but does not consider the dependencies between entity labels. An entity may therefore be predicted with consecutive 'B-', 'B-' tags, or may start with 'I-'. To solve this problem, the invention uses a CRF (conditional random field) to model the relationship between neighboring labels. The characters of the text to be predicted are passed through the Bi_LSTM to obtain an output; the CRF then scores the entity character labels by combining the label score of each character with the transition score between adjacent character labels. The scoring formula is as follows:
$$
s(X, Y) = \sum_{i=1}^{n-1} A_{y_i,\,y_{i+1}} + \sum_{i=1}^{n} E_{i,\,y_i}
$$
where $A_{y_i,\,y_{i+1}}$ represents the transition score between two adjacent labels in a text of the mechanical design field; $E_{i,\,y_i}$ represents the score of label $y_i$ for the $i$-th character of the mechanical text; and $n$ represents the number of characters currently input. The total score of the correct labeling of the mechanical design text is compared with the total scores of all possible labelings to obtain the probability $P(Y \mid X)$ of the correct labeling:
$$
P(Y \mid X) = \frac{\exp\left(s(X, Y)\right)}{\sum_{\tilde{Y}} \exp\left(s(X, \tilde{Y})\right)}
$$
This gives the probability that the character labels are predicted correctly. When $P(Y \mid X)$ is close to 1, the labeling result agrees with the model's prediction, and the entity recognition model for mechanical design has been trained effectively.
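The scoring and normalization described above can be checked on a toy example. The sketch below enumerates every possible label sequence by brute force, which is only feasible for very short inputs; practical CRF layers use dynamic programming instead. The function names and data layout are illustrative assumptions, and start/end transitions are omitted for brevity.

```python
import math
from itertools import product

def path_score(emissions, transitions, tags):
    """Sum of per-character label scores plus adjacent-label transition scores."""
    score = sum(emissions[i][tag] for i, tag in enumerate(tags))
    score += sum(transitions[(a, b)] for a, b in zip(tags, tags[1:]))
    return score

def path_probability(emissions, transitions, tags, labels):
    """Probability of one labeling relative to all possible labelings."""
    total = sum(
        math.exp(path_score(emissions, transitions, candidate))
        for candidate in product(labels, repeat=len(emissions))
    )
    return math.exp(path_score(emissions, transitions, tags)) / total

labels = ["B", "I", "O"]
emissions = [{"B": 2.0, "I": 0.1, "O": 0.5}, {"B": 0.2, "I": 1.5, "O": 0.3}]
transitions = {(a, b): 0.0 for a in labels for b in labels}
transitions[("O", "I")] = -5.0  # penalize an entity that starts with I
transitions[("B", "I")] = 1.0   # reward a well-formed B, I sequence
print(path_probability(emissions, transitions, ["B", "I"], labels))
```

The negative transition score for O followed by I is what lets the CRF suppress label sequences that a Bi_LSTM alone might emit.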
The BERT+Bi_LSTM+CRF model is selected as the main model for entity recognition. After tuning, the set of model parameters with the best recognition performance is selected as the final parameters; specifically, for the Bi_LSTM+CRF model, batch_size=32, the number of LSTM hidden layers is 2, dropout=0.5, epoch=80, and so on. Finally, a TensorBoard module is added to visualize the training curves.
Finally, the entity recognition model is evaluated on the test set. The evaluation index is the F-score, with the formulas as follows:
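$$
\mathrm{Precision} = \frac{TP}{TP + FP}
$$
$$
\mathrm{Recall} = \frac{TP}{TP + FN}
$$
$$
F\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
$$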
where Precision is the precision; Recall is the recall; TP is the number of samples predicted positive that are actually positive; FP is the number predicted positive but actually negative; and FN is the number predicted negative but actually positive. If the F-score is below the predetermined target value, the model training step is repeated; if the F-score reaches the target, the model with those parameters is selected for use.
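As a small worked example, the evaluation can be computed from the TP, FP and FN counts as below. The F-score is assumed to be the balanced F1 form, and the zero-division guards are an added convention.

```python
def f_score(tp, fp, fn):
    """Return (precision, recall, F-score) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

precision, recall, f1 = f_score(tp=8, fp=2, fn=2)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```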
In the relationship recognition layer, the types of relationship classification are defined, an entity relationship extraction data set is built from the preprocessed data source passed in from the entity recognition layer, and the entity relationship model is built from that data set. Constructing a complete domain knowledge graph requires not only the domain entities but also the relationships among them; Text_CNN+LSTM is used to extract the relationships among entities.
The outputs of the entity relationship model and the entity recognition model are combined with the preprocessed data source from the entity recognition layer and spliced into triples.
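One way to picture the splicing step is the sketch below. It is a hedged illustration only: `classify_relation` is a hypothetical callable standing in for the trained entity relationship model, and pairing every two recognized entities in a sentence is an assumption about how candidate pairs are formed.

```python
from itertools import combinations

def build_triples(sentence, entities, classify_relation):
    """Combine recognized entities and predicted relations into (head, relation, tail) triples."""
    triples = []
    for head, tail in combinations(entities, 2):
        relation = classify_relation(sentence, head, tail)
        if relation is not None:  # keep only pairs for which a relation is predicted
            triples.append((head, relation, tail))
    return triples

# Stub standing in for the relation model, for demonstration only.
def stub_model(sentence, head, tail):
    return "fixed relationship" if (head, tail) == ("hole", "chamfer") else None

print(build_triples("the hole needs a chamfer", ["hole", "end face", "chamfer"], stub_model))
```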
Types of relationship classification: relationships in the field of mechanical design and evaluation are intricate. According to the specific relationships between entities in the field, they are divided into five relationship types, as shown in FIG. 3: causal relationship, mutual exclusion relationship, finite relationship, initiating relationship and fixed relationship. These five types cover most relationships in the field, and each type can be further subdivided into several relationships.
The entity relationship model is built similarly to the entity recognition model in the entity recognition layer. First, a relationship extraction data set is produced: the text data passed from the entity recognition layer is labeled so that it takes the form of a sentence plus a relationship label. For example, the sentence "a chamfer is needed between the hole and the end face" contains a relationship and is labeled accordingly. The data set is divided into a training set, a test set and a validation set in the ratio 8:1:1. A text convolutional neural network (textCNN) combined with an LSTM is then used to classify the relationships among entities. The structure is shown in FIG. 4 and is mainly divided into a BERT module and a textCNN+LSTM module. The processed text data is input and converted by the BERT module into 768-dimensional vectors; the output is fed in turn into a convolution layer, a pooling layer and an LSTM recurrent layer, and finally a fully connected layer outputs the result. Compared with other convolutional neural networks, textCNN has a simple structure and few parameters. To reduce the loss of semantic features, the network uses only one convolution layer and one pooling layer. The convolution layer has 32 filters, a kernel size of 3, SAME padding and a ReLU activation function. In essence, relationship extraction can be understood as a text classification task. The model is evaluated with the F-score evaluation index, the model with the highest accuracy is selected as the final model, and the entity relationship model for the field is thereby built.
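The convolution and pooling stage can be illustrated without a deep-learning framework. The pure-Python sketch below applies a 1-D convolution with SAME padding and ReLU, followed by max pooling along the sequence. It assumes stride 1 and an odd kernel size; the pooling window of 2 is an assumption, and the demonstration uses one filter and 1-dimensional inputs instead of 32 filters over 768-dimensional vectors.

```python
def conv1d_same_relu(seq, kernels, biases):
    """seq: T vectors of size D; kernels: F filters of shape k x D. Returns T x F features."""
    k = len(kernels[0])
    pad = k // 2  # SAME padding for odd k and stride 1 keeps the length T
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + list(seq) + [zero] * pad
    features = []
    for t in range(len(seq)):
        window = padded[t:t + k]
        row = []
        for kernel, bias in zip(kernels, biases):
            s = bias + sum(
                w * x
                for k_row, vec in zip(kernel, window)
                for w, x in zip(k_row, vec)
            )
            row.append(max(0.0, s))  # ReLU activation
        features.append(row)
    return features

def max_pool(features, size=2):
    """Non-overlapping max pooling along the sequence axis."""
    return [
        [max(col) for col in zip(*features[i:i + size])]
        for i in range(0, len(features), size)
    ]

seq = [[1.0], [2.0], [3.0], [4.0]]
conv = conv1d_same_relu(seq, kernels=[[[1.0], [1.0], [1.0]]], biases=[0.0])
print(conv, max_pool(conv))
```

In the described network the pooled features would then be passed to the LSTM recurrent layer and the fully connected layer.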
A Python script using third-party libraries writes the triples into the MySQL and Neo4j databases for subsequent use and maintenance.
And transmitting the triple data into a data storage layer, receiving the triple data by the data storage layer, and combining the prestored existing third-party database with two database software of MySQL and Neo4j to store the triple data into a background database.
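The storage step can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used by the invention: the standard-library `sqlite3` module stands in for MySQL so that the example is self-contained, the table layout, the `Entity` node label and the `RELATION` relationship type are hypothetical, and the Cypher statement is only built here, not sent to a Neo4j server.

```python
import sqlite3

# Parameterized Cypher that creates both nodes and the edge only if missing.
# The label "Entity" and relationship type "RELATION" are assumed names.
MERGE_TRIPLE = (
    "MERGE (h:Entity {name: $head}) "
    "MERGE (t:Entity {name: $tail}) "
    "MERGE (h)-[:RELATION {type: $relation}]->(t)"
)


def triple_to_cypher(head, relation, tail):
    """Return the Cypher statement and its parameters for one triple."""
    return MERGE_TRIPLE, {"head": head, "relation": relation, "tail": tail}


def store_triples_relational(conn, triples):
    """Store triples in a relational table, skipping exact duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        "head TEXT NOT NULL, relation TEXT NOT NULL, tail TEXT NOT NULL, "
        "UNIQUE (head, relation, tail))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO triples (head, relation, tail) VALUES (?, ?, ?)",
        triples,
    )
    conn.commit()


def count_triples(conn):
    """Return the number of stored triples."""
    return conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
```

Passing values as parameters rather than concatenating them into the statement keeps entity names containing quotes from breaking the query. Because Cypher does not allow a relationship type to be supplied as a parameter, the relation name is kept in a property in this sketch.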
The application layer applies the background database and mainly provides functions such as user login, entity query, auxiliary question answering and knowledge updating.
The application layer builds an auxiliary decision-making platform with the Django framework. The main structure of the platform is shown in FIG. 5 and comprises: Entity recognition (entity recognition module), Query (query module), Overview (overview module), and Question and answer (auxiliary question-answering module).
Entity recognition module: this module recognizes the content input by the user, segments the entity words and tags parts of speech. It mainly uses the entity recognition model and a word segmentation model to determine whether the sentence input by the user contains the required entities.
Query module: the web framework is connected to the Neo4j graph database to query entities, relations and node attributes in the graph database, mainly providing query and modification of node contents and relations. The knowledge graph constructed by the invention is built and stored in the graph database Neo4j, so knowledge is retrieved through Cypher query statements, the query language used by Neo4j, and the query result is returned. After the user selects and submits the entity or relation to be queried on the web front end, the back end automatically generates a Cypher query statement for the node or relation, retrieves the data and returns the result to the front-end web page, where it is visualized through plug-ins such as Neovis.js and ECharts.
Overview module: this module mainly provides graph display, showing part of the nodes and relations contained in the database on the front-end interface.
Auxiliary question-answering module: first, entities are identified in the query sentence; second, the question is parsed syntactically; finally, structured semantic triples are extracted from the natural language question by matching structural features of the dependency tree, providing a basis for subsequent question classification and template matching.
The invention performs auxiliary design and evaluation in question-and-answer form. First, the question raised by the user is segmented into words and parsed, the specific entities and grammatical relations in the sentence are extracted, and the sentence is classified. The extracted content is then matched against pre-designed question templates to obtain the best-matching question, and the Cypher statement corresponding to that question is constructed to search the graph database and obtain the final answer. (Cypher is a declarative graph database query language that is expressive and can query and update graph data efficiently.) These functions are integrated, visualized with front-end tools and gathered into a single web page that is convenient for the user to operate.
Many mature tools are available for identifying entities in query sentences. The Language Technology Platform (LTP) is a complete Chinese natural language processing system that provides a full set of rich, efficient and high-precision Chinese natural language processing modules from the bottom up. The invention therefore selects the LTP platform for the preliminary natural language processing work, performing dependency parsing and semantic dependency analysis on the question; the recognition result is shown in FIG. 6. Finally, a Cypher statement is constructed to search the Neo4j database and the query result is returned. Because the LTP dictionary may recognize words of the professional domain insufficiently, the entities, relations and attribute values extracted from the domain knowledge base are compiled into a domain dictionary, which is additionally loaded into LTP as an extension dictionary. For example, the part-of-speech tagging result for "How to machine a stepped hole?" is "how/r process/v stepped hole/nm ?/wp", where r denotes a pronoun, v denotes a verb, nm denotes a noun identified from the added dictionary, and wp denotes a punctuation mark. Structured semantic triples are then extracted from the question through the structural features of the dependency tree.
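One way to assemble the domain dictionary described above is sketched below. The function names are hypothetical, and the one-term-per-line text layout is an assumption about the extension dictionary format rather than something stated in the source; the format expected by the LTP version in use should be checked.

```python
def build_domain_dictionary(triples, attribute_values=()):
    """Collect unique entity, relation and attribute terms from the knowledge base."""
    terms = set()
    for head, relation, tail in triples:
        terms.update((head, relation, tail))
    terms.update(attribute_values)
    # Drop empty entries and surrounding whitespace, then sort for a stable file.
    return sorted({term.strip() for term in terms if term and term.strip()})


def format_dictionary(terms):
    """Render the terms as dictionary text, one term per line (assumed layout)."""
    return "\n".join(terms) + "\n"
```

The resulting text could be written to a file and loaded as the extension dictionary before tagging questions.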
Defining question templates: entities or relations are identified in the query sentence entered by the user. For example, from the sentence "How to machine a stepped hole?" the feature words "machining", "stepped hole" and "?" can be extracted. Question template matching is then performed according to the feature words, and finally the corresponding result is retrieved from the graph.
Because the field has no high-quality manually annotated question-answer or structured Chinese data set, the invention constructs question templates for mechanical design and evaluation to support intelligent querying of the graph system. Samples of the query part of specific design and machining questions are as follows:
[Table images BDA0004082943880000081 and BDA0004082943880000091: sample question templates; not reproduced in the text]
Question classification model: in the invention, question classification is divided into two parts: 1. Feature extraction: the method used can effectively avoid the influence of common words on the keywords and improve the relevance between the keywords and the text. 2. A naive Bayes text classifier is selected as the feature classification model; the required question data set is constructed, the model is evaluated with the same evaluation method, and the model with the best indexes is selected as the final model.
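To illustrate the second part, the following is a small multinomial naive Bayes classifier with add-one (Laplace) smoothing written in plain Python. It is a sketch only: the class name, the smoothing choice and the question categories used in the usage note are assumptions, and the source does not state which implementation or features the invention uses.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesQuestionClassifier:
    """Multinomial naive Bayes over tokenized questions (illustrative sketch)."""

    def fit(self, questions, labels):
        self.class_counts = Counter(labels)      # number of questions per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()
        for tokens, label in zip(questions, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_questions = sum(self.class_counts.values())
        vocabulary_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / total_questions)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                numerator = self.word_counts[label][token] + 1
                score += math.log(numerator / (total_words + vocabulary_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For instance, after fitting on a handful of tokenized questions labelled with hypothetical categories such as "processing" and "design", `predict(["how", "process", "groove"])` returns the category whose training questions share the most tokens with the new question, weighted by the class prior.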
Entity and relation feature words are extracted from the query sentence and combined into a Cypher query statement; the corresponding answer is retrieved from the graph database Neo4j and returned. The process is as follows:
Q: How is a stepped hole machined?
K: how/r process/v stepped hole/nm ?/wp
A: Turning the stepped hole: rough turning, finish turning
Q represents the specific question set, K represents the recognized question feature words and grammatical tags, and A represents the answer returned to the user. A Cypher statement is generated from the entities and relations identified in the question, for example: MATCH (n)-[r]->(m) WHERE n.name = 'node name' RETURN n, r, m. The query result is output to the front-end interface and returned to the user, completing the query.
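The step from the tagged question K to a Cypher statement can be sketched as follows. The helper names are hypothetical, the tagged question is represented as (word, tag) pairs rather than parsed from LTP output, and the statement is passed the node name as a parameter instead of embedding it in the text, which is a design choice of this sketch rather than something stated in the source.

```python
def extract_features(tagged_tokens):
    """Pick domain entities (tag 'nm') and verbs (tag 'v') from (word, tag) pairs."""
    entities = [word for word, tag in tagged_tokens if tag == "nm"]
    relations = [word for word, tag in tagged_tokens if tag == "v"]
    return entities, relations


def build_cypher(entity_name):
    """Return the Cypher query for a node's outgoing relations and its parameters."""
    query = "MATCH (n)-[r]->(m) WHERE n.name = $name RETURN n, r, m"
    return query, {"name": entity_name}


# Tagged form of "How to machine a stepped hole?" from the example above.
tagged_question = [("how", "r"), ("process", "v"), ("stepped hole", "nm"), ("?", "wp")]
entities, relations = extract_features(tagged_question)
query, parameters = build_cypher(entities[0])
```

The query and parameters would then be handed to whatever Neo4j client the platform uses, and the returned nodes and relations rendered on the front end.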
One embodiment of the invention is provided below:
Example 1
For any design model and historical design data, industrial text documents are first collected and organized from the enterprise's internal databases of similar or historical designs. The collected documents are processed and cleaned using OCR and regular expressions; specific cleaning results are as follows:
The axis of a hole should be perpendicular to the end face, which reduces the cutting difficulty
Deep holes should be avoided; the ratio L/D of hole depth to diameter should be less than or equal to 5
The depth-to-diameter ratios of stepped holes should not differ too much
The size of a hole should meet the standard specification
The dimensional tolerance A, positional tolerance B, shape tolerance C and roughness D of a hole should satisfy A > B > C > D
The width of a face should be as uniform as possible to avoid uneven impact on the tool during cutting
Faces should avoid structurally complex machined curved surfaces
Long, slender cylindrical surfaces should be avoided; the ratio L/D of cylinder length to diameter should be less than or equal to 5
Planes with a larger area should be machined to a low precision wherever possible
The dimensional tolerance A, positional tolerance B, shape tolerance C and roughness D of a face should satisfy A > B > C > D
To facilitate machining, a transition fillet should be provided between adjacent side walls of a cavity
Sharp corners between the side walls and the floor of a cavity should be avoided
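A cleaning pass of the kind that yields lines like those above can be sketched with regular expressions as follows. The specific rules (removing control characters, collapsing whitespace, dropping blank lines and lines that contain only a page number) and the optional stop-word list are illustrative assumptions; the source does not list the actual expressions used.

```python
import re

CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f]")  # OCR debris, keeps tab and newline
WHITESPACE = re.compile(r"\s+")
PAGE_NUMBER = re.compile(r"\d+")


def clean_ocr_text(raw_text, stop_words=()):
    """Return cleaned, non-empty lines from raw OCR output."""
    cleaned_lines = []
    for line in raw_text.splitlines():
        line = CONTROL_CHARS.sub("", line)
        line = WHITESPACE.sub(" ", line).strip()
        # Skip blank lines and lines that consist only of digits (page numbers).
        if not line or PAGE_NUMBER.fullmatch(line):
            continue
        words = [word for word in line.split(" ") if word.lower() not in stop_words]
        cleaned_lines.append(" ".join(words))
    return cleaned_lines
```

Each returned line can then be labelled to form the data sets for the entity recognition and relation models.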
The documents are classified according to the entity categories shown in FIG. 2. As shown in FIG. 7, they are divided into five general classes, such as holes, faces, grooves and tools, and each general class is further divided into several entity classes; for example, the tool class is further divided into cutters, wrenches and machine tools. The specific classes are shown in FIG. 7. The processed data is labelled to make a data set, the entity recognition model and the entity relation recognition model are constructed and trained, and finally the triples are stored in the Neo4j graph database and visualized.
The auxiliary decision-making platform is built with the Django framework, and the trained entity recognition model is deployed to the entity recognition module. Neovis.js is connected to the Neo4j graph database to provide the entity query function. Finally, the question-answering platform is built: question templates are constructed first and naive Bayes is used to classify questions; the LTP platform performs semantic and part-of-speech analysis on the question input by the user to determine the user's intention; the question is then classified, the entities and relations in the user's question are obtained, and a Cypher statement is constructed and executed, with the result returned to the front end.

Claims (10)

1. An intelligent design method based on a mechanical knowledge graph is characterized by comprising the following steps:
step (1): collecting industrial text document data, converting the industrial text document data into a document form which can be identified by a computer, obtaining an original data source, and preprocessing the original data source to obtain a preprocessed data source;
step (2): labeling the preprocessed data sources, and dividing the labeled data sources into a training set, a testing set and a verification set;
step (3): performing entity classification and relationship classification on the preprocessed data source, wherein the entity classification is divided into a product design class, a part processing class, a part assembly class and a related data class, the relationship classification is divided into a causal relationship, a mutual exclusion relationship, a finite relationship, an initiating relationship and a fixed relationship, and an entity recognition model and an entity relationship model are respectively constructed based on the training set to obtain a complete mechanical knowledge graph, the entity recognition model recognizes entities in the specific field, and the entity relationship model realizes extraction of the relationships among the entities;
step (4): the entity relation model, the entity recognition model and the preprocessed data source are spliced into triples, and the triples are stored in a background database.
2. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, wherein the method is characterized in that: in the step (1), the industrial text document data is converted into a document form which can be recognized by a computer through an OCR (optical character recognition) technology, and the original data source is subjected to data preprocessing through a regular expression and a stop word removing method.
3. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, wherein the method is characterized in that: in the step (2), the text of the preprocessed data source is stored as a txt.ann format, a Python script file is written, a regular expression is used for filtering out specific contents of each mark, irrelevant contents are removed through the expression, and the contents are placed into an array to form a text-entity label.
4. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, wherein the method is characterized in that: in the step (3), the entity recognition model uses the TensorFlow framework to realize entity recognition, and adopts a pre-training model consisting of a BERT module, a forward LSTM module, a backward LSTM module and a CRF module.
5. The intelligent design method based on the mechanical knowledge graph as claimed in claim 4, wherein the method is characterized in that: the entity recognition model is evaluated with the test set, the evaluation indexes being
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where Precision is the precision rate; Recall is the recall rate; TP is the number of samples predicted as positive that are actually positive; FP is the number predicted as positive that are actually negative; FN is the number predicted as negative that are actually positive; training is repeated if the F-score is below a predetermined target value; if the F-score reaches the predetermined target value, the parametric model is selected for use.
6. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, wherein the method is characterized in that: in the step (3), the entity relation model adopts a text convolutional neural network to classify the relations among entities, the entity relation model is divided into a BERT module and a textCNN+LSTM module, the text is converted into 768-dimensional vectors through the BERT module, the output result is input in sequence into a convolution layer, a pooling layer and an LSTM recurrent layer, and finally the result is output.
7. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, wherein the method is characterized in that: in the step (4), a script is written using third-party libraries of Python to assemble the triples, and the triples are stored in MySQL and the Neo4j graph database.
8. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, wherein the method is characterized in that: in the step (4), the background database is applied, and the background database comprises the functions of user login, entity inquiry, auxiliary question and answer and knowledge updating.
9. The intelligent design method based on the mechanical knowledge graph as claimed in claim 8, wherein the method is characterized in that: an auxiliary decision-making platform comprising an entity identification module, a query module, an overview module and an auxiliary question-answering module is built.
10. The intelligent design method based on the mechanical knowledge graph as claimed in claim 1, wherein the method is characterized in that: the entity recognition module recognizes the content input by the user, segments the entity words and tags parts of speech; the query module is connected to the Neo4j graph database through a web framework to query the entities, relations and node attributes in the graph database; the overview module displays part of the nodes and relations contained in the database on a front-end interface; the auxiliary question-answering module identifies entities in query sentences, parses the questions syntactically, and extracts structured semantic triples from natural language questions by matching structural features of the dependency tree.
CN202310128512.5A 2023-02-17 2023-02-17 Intelligent design method based on mechanical knowledge graph Pending CN116340530A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310128512.5A CN116340530A (en) 2023-02-17 2023-02-17 Intelligent design method based on mechanical knowledge graph

Publications (1)

Publication Number Publication Date
CN116340530A true CN116340530A (en) 2023-06-27

Family

ID=86883044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310128512.5A Pending CN116340530A (en) 2023-02-17 2023-02-17 Intelligent design method based on mechanical knowledge graph

Country Status (1)

Country Link
CN (1) CN116340530A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059607A1 (en) * 1999-09-01 2008-03-06 Eric Schneider Method, product, and apparatus for processing a data request
CN104933164A (en) * 2015-06-26 2015-09-23 华南理工大学 Method for extracting relations among named entities in Internet massive data and system thereof
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
US20200004832A1 (en) * 2018-07-02 2020-01-02 Babylon Partners Limited Computer Implemented Method for Extracting and Reasoning with Meaning from Text
CN111737496A (en) * 2020-06-29 2020-10-02 东北电力大学 Power equipment fault knowledge map construction method
CN113010663A (en) * 2021-04-26 2021-06-22 东华大学 Adaptive reasoning question-answering method and system based on industrial cognitive map
CN113312501A (en) * 2021-06-29 2021-08-27 中新国际联合研究院 Construction method and device of safety knowledge self-service query system based on knowledge graph
CN113569054A (en) * 2021-05-12 2021-10-29 浙江工业大学 Knowledge graph construction method and system for multi-source Chinese financial bulletin document
CN113723632A (en) * 2021-08-27 2021-11-30 北京邮电大学 Industrial equipment fault diagnosis method based on knowledge graph
CN114911945A (en) * 2022-04-13 2022-08-16 浙江大学 Knowledge graph-based multi-value chain data management auxiliary decision model construction method
CN115269857A (en) * 2022-04-28 2022-11-01 东北林业大学 Knowledge graph construction method and device based on document relation extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
崔硕等: "基于深度学习的机械领域知识图谱构建及应用", 《制造技术与机床》, 2 February 2023 (2023-02-02), pages 83 - 89 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116522233A (en) * 2023-07-03 2023-08-01 国网北京市电力公司 Method and system for extracting and classifying key point review content of research document
CN117235929A (en) * 2023-09-26 2023-12-15 中国科学院沈阳自动化研究所 Three-dimensional CAD (computer aided design) generation type design method based on knowledge graph and machine learning
CN117235929B (en) * 2023-09-26 2024-06-04 中国科学院沈阳自动化研究所 Three-dimensional CAD (computer aided design) generation type design method based on knowledge graph and machine learning
CN118014072A (en) * 2024-04-10 2024-05-10 中国电建集团昆明勘测设计研究院有限公司 Construction method and system of knowledge graph for hydraulic and hydroelectric engineering

Similar Documents

Publication Publication Date Title
CN111639171B (en) Knowledge graph question-answering method and device
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN111813802B (en) Method for generating structured query statement based on natural language
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN110990590A (en) Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning
CN107766483A (en) The interactive answering method and system of a kind of knowledge based collection of illustrative plates
CN113987212A (en) Knowledge graph construction method for process data in numerical control machining field
CN110765277B (en) Knowledge-graph-based mobile terminal online equipment fault diagnosis method
CN116340530A (en) Intelligent design method based on mechanical knowledge graph
CN113312501A (en) Construction method and device of safety knowledge self-service query system based on knowledge graph
CN113962219A (en) Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN114238653B (en) Method for constructing programming education knowledge graph, completing and intelligently asking and answering
CN116127084A (en) Knowledge graph-based micro-grid scheduling strategy intelligent retrieval system and method
CN113919366A (en) Semantic matching method and device for power transformer knowledge question answering
CN115577086A (en) Bridge detection knowledge graph question-answering method based on hierarchical cross attention mechanism
CN113988071A (en) Intelligent dialogue method and device based on financial knowledge graph and electronic equipment
CN112925918A (en) Question-answer matching system based on disease field knowledge graph
CN116595195A (en) Knowledge graph construction method, device and medium
CN115659947A (en) Multi-item selection answering method and system based on machine reading understanding and text summarization
Sun A natural language interface for querying graph databases
CN111104503A (en) Construction engineering quality acceptance standard question-answering system and construction method thereof
CN114579709A (en) Intelligent question-answering intention identification method based on knowledge graph
CN114817454A (en) NLP knowledge graph construction method combining information content and BERT-BilSTM-CRF
CN117216221A (en) Intelligent question-answering system based on knowledge graph and construction method
CN116342167A (en) Intelligent cost measurement method and device based on sequence labeling named entity recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination