CN117370527A - Method and system for constructing traffic construction industry standard knowledge base by using ChatGPT - Google Patents

Method and system for constructing traffic construction industry standard knowledge base by using ChatGPT

Info

Publication number
CN117370527A
CN117370527A
Authority
CN
China
Prior art keywords
data
module
chatgpt
article
industry standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311380162.8A
Other languages
Chinese (zh)
Inventor
余莎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunji Smart Engineering Co ltd
Original Assignee
Yunji Smart Engineering Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunji Smart Engineering Co ltd filed Critical Yunji Smart Engineering Co ltd
Priority to CN202311380162.8A priority Critical patent/CN117370527A/en
Publication of CN117370527A publication Critical patent/CN117370527A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/322Trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for constructing a traffic and construction industry standard knowledge base by using ChatGPT. The method comprises the following steps: acquiring industry standard data; processing the industry standard data into article modules; processing the article modules into vector data with ChatGPT; and storing the vector data. The invention constructs a question-answering knowledge base system based on ChatGPT; during construction, the knowledge base does not need to be combed through manually, and ChatGPT can summarize the content of each article and form corresponding question points.

Description

Method and system for constructing traffic construction industry standard knowledge base by using ChatGPT
Technical Field
The invention relates to the technical field of knowledge bases, and in particular to a method and a system for constructing a traffic and construction industry standard knowledge base by using ChatGPT.
Background
With the development of digital twin platforms, various standards for the traffic and construction industries have gradually taken shape, including national, industry, group and enterprise standards for operations, construction and the like, as well as various standard methods for implementation and construction. At present, however, this knowledge is scattered and cannot readily be brought into daily work. For example, when writing a scheme, a large amount of reference material must be queried and compared before the scheme can be formed. In daily construction work, writing a construction standard requires knowledge of construction methods, regulations and other information. In model design, for example, designers must know the component construction size standards for the project characteristics of each industry and the coding requirements including element information, and must browse through the corresponding scene information before model design can begin. Designers therefore spend a great deal of time querying and collecting data, and can only carry out model design work after analysis and review. The traditional solution is to arrange for professionals to classify the knowledge and build a content-labelling database model in which content is returned by matching labels; this approach cannot satisfy users' personalized questions and additional business requirements, nor their more complex business scenarios.
Disclosure of Invention
Therefore, in order to overcome the defects of the prior art, the invention provides a method and a system for constructing a traffic and construction industry standard knowledge base by using ChatGPT, which reduce labor cost and are convenient to use.
The technical scheme of the invention is a method for constructing a traffic and construction industry standard knowledge base by using ChatGPT, which comprises the following steps:
acquiring industry standard data;
processing the industry standard data into an article module;
the ChatGPT processes the article module into vector data;
the vector data is stored.
Further, the step of processing the industry standard data into an article module comprises the following steps:
preprocessing the industry standard data into text data;
and splitting the text data into the article modules.
Further, the step of preprocessing the industry standard data into text data comprises: describing the picture information.
Further, the step of splitting the text data into the article modules comprises: performing encoding processing, redundant-character processing, segmentation into complete semantic segments, lexical analysis and directory tree analysis on the text data.
Further, the step of the ChatGPT processing the article module into vector data comprises:
the ChatGPT fine-tuning the article module according to the model;
and the ChatGPT processing the fine-tuned article module into the vector data.
The invention provides another technical scheme: a system for constructing a traffic and construction industry standard knowledge base by using ChatGPT, comprising:
the acquisition module is used for acquiring industry standard data;
the processing module is used for processing the industry standard data into an article module;
the ChatGPT is used for processing the article module into vector data;
and the vector database is used for storing the vector data.
Further, the processing module comprises a preprocessing module and a splitting module;
the preprocessing module is used for preprocessing the industry standard data into text data;
and the splitting module is used for splitting the text data into the article modules.
Further, the preprocessing module is configured to describe the picture information.
Further, the splitting module is configured to perform encoding processing, redundant-character processing, segmentation into complete semantic segments, lexical analysis and directory tree analysis on the text data.
Further, the ChatGPT is configured to:
fine-tune the article module according to the model;
and process the fine-tuned article module into the vector data.
The invention constructs a question-answering knowledge base system based on ChatGPT; during construction, the knowledge base does not need to be combed through manually, and ChatGPT can summarize the content of each article and form corresponding question points.
Drawings
Fig. 1 is a flow chart of the method for constructing a traffic and construction industry standard knowledge base using ChatGPT according to the present invention.
Fig. 2 is a schematic block diagram of the system for constructing a traffic and construction industry standard knowledge base using ChatGPT according to the present invention.
Detailed Description
For a thorough understanding of the objects, features and effects of the present invention, reference will be made to the following detailed description of the invention taken in conjunction with the accompanying drawings.
1. The invention provides a method for constructing a traffic and construction industry standard knowledge base by using ChatGPT, as shown in FIG. 1, which specifically comprises the following steps.
100. Acquiring industry standard data;
200. processing industry standard data into an article module;
300. The ChatGPT processes the article module into vector data;
400. vector data is stored.
In the present invention, in step 100, industry standard data is acquired. The industry standard data includes: national standards, industry standards, group standards and enterprise standards issued by the government, professional books, bidding requirements, various bidding documents, and regulatory documents covering the various processes of traffic construction engineering, construction safety and the like.
In the present invention, in step 200, the industry standard data is processed into article modules, comprising the following steps:
201. The industry standard data is preprocessed into text data.
Preprocessing refers to uniformly converting the data into a plain-text format while preserving the paragraph format; the industry standard data comprises text content, pictures and tables.
When processing a picture, the picture information needs to be described. This includes describing the specific content in the picture, the coloring of each of its modules and the scene range it applies to, creating a hash index for the picture, and storing the index with the picture description. Alternatively, the context of the picture can be analyzed and summarized through LlamaIndex, then divided into a content module, the module content tagged as 'picture content data', processed according to the flow described here, and stored in the vector database.
When processing a table, the table header and table content are converted into a txt document in CSV format. For standards containing many tables, such as the code dictionaries of various industries specified in national standards, a user who wants to obtain the corresponding code set through the industry standard can have codes generated automatically from the processed code knowledge base. For such data, we preferentially convert it into article sections consisting of table header and table content, name the stored files by table name and sequence number, add the same data content label (such as the pier coding standard of the national Industrial Internet channel coding standard 1.0) to each table file, and then build indexes for them separately in LlamaIndex, thereby obtaining the complete table data relationships. When a user needs this data for a relevant scenario, such as generating an Industrial Internet identification code, the system obtains a file list from the vector database according to the user's question, sends the data content to ChatGPT in batches, informs ChatGPT that the codes to be generated must follow the data in the table, and through several rounds of adjustment obtains the code information the user wants.
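As a rough illustration of the table step above, a parsed table can be rendered as CSV-format text, named by table name and sequence number, with the shared data-content label prepended. The function, file-naming pattern and label below are hypothetical — a minimal sketch, not the patented pipeline:

```python
import csv
import io

def table_to_csv_txt(table_name, seq_no, header, rows, label):
    """Render one table as CSV text prefixed with the shared data-content
    label, and return (file_name, content). File names follow the
    'table name + sequence number' convention described above."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    file_name = f"{table_name}_{seq_no}.txt"
    content = f"# {label}\n{buf.getvalue()}"
    return file_name, content

# Hypothetical pier-coding table, for illustration only
name, text = table_to_csv_txt(
    "pier_codes", 1,
    ["code", "component"],
    [["01", "pier shaft"], ["02", "pier cap"]],
    "pier coding standard",
)
```

Each such file would then be indexed separately (e.g. in LlamaIndex) so the table relationships can be queried later.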
202. The text data is split into article modules.
The text data is split uniformly by paragraph; this comprises encoding processing, redundant-character processing, segmentation into complete semantic segments, lexical analysis and directory tree analysis of the text data. Because of the ChatGPT maximum-token requirement (no more than 8191 tokens per block, the input length constraint of the OpenAI embeddings model), the document content must be split, clipping it into multiple article modules (text chunks).
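The paragraph-based splitting under a token budget can be sketched as follows. This is a simplified illustration: token counts are approximated by character length (a real pipeline would use a tokenizer such as tiktoken), and the characters-per-token ratio is an assumption:

```python
def split_into_modules(text, max_tokens=8191, chars_per_token=2):
    """Split text by paragraph into article modules (text chunks) that stay
    under the embeddings input limit. Tokens are approximated here by
    character count; swap in a real tokenizer for production use."""
    budget = max_tokens * chars_per_token
    modules, current = [], ""
    for para in text.split("\n\n"):
        candidate = (current + "\n\n" + para).strip() if current else para
        if len(candidate) <= budget:
            # paragraph still fits in the current module
            current = candidate
        else:
            # flush the current module and start a new one
            if current:
                modules.append(current)
            current = para
    if current:
        modules.append(current)
    return modules
```

Splitting on paragraph boundaries keeps each module a complete semantic segment, as the text above requires.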
Redundant-character processing: repeated characters, and data in the document that does not belong to business rules, such as factual descriptions and cases, can be given lower storage priority or deleted outright.
Lexical analysis: article slices need to be richly labelled along the time, space, logic and catalogue dimensions. Time dimension: data carrying state meaning, time or temporal meaning must be extracted separately as reference tags for the content context of the segment; for example, current rules, expired rules and planned rules represent present limits, past validity and future validity respectively. Space dimension: data containing position information, area information, national administrative region information and the like needs a space-dimension tag. Logic dimension: content rules containing keywords such as 'must' and 'may' need a logic-dimension tag.
Catalogue dimension: a catalogue label is added to the catalogue result of the article module. The document's paragraph catalogue results must be saved with the directory structure from the root directory to the current node.
Directory tree analysis: check whether the current directory is a final leaf directory; if so, add an end flag; if not, add downstream node information such as next=current execution standard coding rule.
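A minimal sketch of the directory tree analysis just described, assuming a hypothetical representation of the tree as nested dicts: leaf directories get an end flag, interior nodes get a next= pointer to downstream node information:

```python
def annotate_directory_tree(node):
    """Walk a directory tree (dicts with 'name' and 'children'). Leaf
    directories receive an end flag; non-leaf directories receive a
    'next' pointer naming the downstream node, as in the analysis above."""
    if not node.get("children"):
        node["end"] = True
    else:
        node["next"] = node["children"][0]["name"]
        for child in node["children"]:
            annotate_directory_tree(child)
    return node

# Hypothetical two-level catalogue for illustration
tree = annotate_directory_tree({
    "name": "standard",
    "children": [{"name": "coding rules", "children": []}],
})
```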
In the present invention, in step 300, ChatGPT processes the article modules into vector data, which comprises the following steps.
301. ChatGPT fine-tunes the article modules according to the model.
The article modules are learned and summarized, and converted into a number of vectors that carry summaries. During fine-tuning, content is de-duplicated and merged, and the data is read and summarized to form corresponding question sets. For OpenAI, determining the similarity of two pieces of text requires first turning each into a numeric vector (a vector embedding), something like a stack of coordinate values; comparing the vectors then yields a decimal between 0 and 1, and the closer it is to 1, the higher the similarity. The sent content and the returned results are merged according to the similarity result, and duplicates are removed.
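The similarity comparison and de-duplication described here can be sketched as below; the 0.95 near-duplicate threshold is an assumed value for illustration, not one given by the source:

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors; for typical text embeddings
    the result falls between 0 and 1, and closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedupe(vectors, threshold=0.95):
    """Keep only vectors that are not near-duplicates of one already kept."""
    kept = []
    for v in vectors:
        if all(cosine_similarity(v, k) < threshold for k in kept):
            kept.append(v)
    return kept
```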
302. ChatGPT's Embedding processes the fine-tuned article modules into vector data.
A calling program is written in Python to call the OpenAI API (application programming interface) in batches; the latest model at present is text-embedding-ada-002, which turns an article module into vector data.
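A hedged sketch of such a calling program: the batching helper is generic, and embed_modules shows the shape of a batch call to the 2023-era openai package's embedding endpoint. The call itself requires an installed package and an API key, is not executed here, and is an illustration rather than the exact production code:

```python
def batch(items, size):
    """Yield successive slices so each API request stays a manageable size."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_modules(modules, batch_size=16):
    """Embed article modules in batches via the OpenAI embeddings API
    (illustrative; requires `pip install openai` and OPENAI_API_KEY)."""
    import openai
    vectors = []
    for chunk in batch(modules, batch_size):
        resp = openai.Embedding.create(
            model="text-embedding-ada-002", input=chunk)
        vectors.extend(item["embedding"] for item in resp["data"])
    return vectors
```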
In the present invention, in step 400, the vector data is stored in a vector database. Note that the original text blocks and the numeric vectors need to be stored together, so that the original text can be recovered in the reverse direction from a numeric vector.
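Storing each original text block together with its vector, so the text can be recovered from a vector match, might look like this in-memory stand-in for a vector database (the record layout is an assumption for illustration):

```python
class SimpleVectorStore:
    """Minimal stand-in for a vector database: each record keeps the
    numeric vector together with its original text block, so the text can
    be recovered in the reverse direction from the vector."""
    def __init__(self):
        self.records = []  # list of (vector, original_text) pairs

    def add(self, vector, text):
        self.records.append((tuple(vector), text))

    def text_for(self, vector):
        """Reverse lookup: return the original text for an exact vector."""
        for v, text in self.records:
            if v == tuple(vector):
                return text
        return None

store = SimpleVectorStore()
# Hypothetical chunk and vector, for illustration only
store.add([0.1, 0.9], "Article 3.1: model requirements ...")
```

A real vector database would match by similarity rather than exact equality, but the pairing of vector and source text is the point being illustrated.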
Note on ChatGPT: ChatGPT is a large language model developed by the OpenAI team. It accepts user input and generates a corresponding natural language response. The model is trained on a large text corpus and can be used for various natural language processing tasks such as language understanding, text generation and machine translation. ChatGPT is open and can be used by developers and researchers to build a variety of language applications and tools.
Embedding: an embedding is a vector, i.e. a list of numbers that a machine can understand. Mapping natural language into such vectors helps machines understand the meaning of words and the relationships between words.
Vector database (vector database): a vector database is a database that stores information as vectors or arrays of numbers. Each piece of information is represented as a vector, where each number in the vector corresponds to a particular attribute or feature of the data.
LlamaIndex: LlamaIndex (formerly GPT Index) is a data framework for LLM applications to ingest, structure, and access private or domain-specific data.
2. The invention provides a system for constructing a traffic and construction industry standard knowledge base by using ChatGPT, as shown in FIG. 2, which comprises an acquisition module 21, a processing module 22, a ChatGPT 23 and a vector database 24.
An acquisition module 21, configured to acquire industry standard data;
a processing module 22 for processing industry standard data into an article module;
ChatGPT 23, for processing the article module into vector data;
vector database 24 for storing vector data.
In this embodiment, the acquiring module 21 acquires industry standard data, where the industry standard data includes: national standards, industry standards, group standards and enterprise standards issued by the government, professional books, bidding requirements, various bidding documents, and regulatory documents covering the various processes of traffic construction engineering, construction safety and the like.
In this embodiment, the processing module 22 includes a preprocessing module and a splitting module. The preprocessing module is used for preprocessing the industry standard data into text data. The splitting module is used for splitting the text data into article modules.
The preprocessing module is used for preprocessing the industry standard data into text data, as follows:
Preprocessing refers to uniformly converting the data into a plain-text format while preserving the paragraph format; the industry standard data comprises text content, pictures and tables.
When processing a picture, the picture information needs to be described. This includes describing the specific content in the picture, the coloring of each of its modules and the scene range it applies to, creating a hash index for the picture, and storing the index with the picture description. Alternatively, the context of the picture can be analyzed and summarized through LlamaIndex, then divided into a content module, the module content tagged as 'picture content data', processed according to the flow described here, and stored in the vector database.
When processing a table, the table header and table content are converted into a txt document in CSV format. For standards containing many tables, such as the code dictionaries of various industries specified in national standards, a user who wants to obtain the corresponding code set through the industry standard can have codes generated automatically from the processed code knowledge base. For such data, we preferentially convert it into article sections consisting of table header and table content, name the stored files by table name and sequence number, add the same data content label (such as the pier coding standard of the national Industrial Internet channel coding standard 1.0) to each table file, and then build indexes for them separately in LlamaIndex, thereby obtaining the complete table data relationships. When a user needs this data for a relevant scenario, such as generating an Industrial Internet identification code, the system obtains a file list from the vector database according to the user's question, sends the data content to ChatGPT in batches, informs ChatGPT that the codes to be generated must follow the data in the table, and through several rounds of adjustment obtains the code information the user wants.
The splitting module is configured to split the text data into article modules, as follows:
The text data is split uniformly by paragraph; this comprises encoding processing, redundant-character processing, segmentation into complete semantic segments, lexical analysis and directory tree analysis of the text data. Because of the ChatGPT maximum-token requirement (no more than 8191 tokens per block, the input length constraint of the OpenAI embeddings model), the document content must be split, clipping it into multiple article modules (text chunks).
Redundant-character processing: repeated characters, and data in the document that does not belong to business rules, such as factual descriptions and cases, can be given lower storage priority or deleted outright.
Lexical analysis: article slices need to be richly labelled along the time, space, logic and catalogue dimensions. Time dimension: data carrying state meaning, time or temporal meaning must be extracted separately as reference tags for the content context of the segment; for example, current rules, expired rules and planned rules represent present limits, past validity and future validity respectively. Space dimension: data containing position information, area information, national administrative region information and the like needs a space-dimension tag. Logic dimension: content rules containing keywords such as 'must' and 'may' need a logic-dimension tag.
Catalogue dimension: a catalogue label is added to the catalogue result of the article module. The document's paragraph catalogue results must be saved with the directory structure from the root directory to the current node.
Directory tree analysis: check whether the current directory is a final leaf directory; if so, add an end flag; if not, add downstream node information such as next=current execution standard coding rule.
In this embodiment, the ChatGPT 23 is configured to process the article modules into vector data.
The ChatGPT 23 fine-tunes the article modules according to the model.
The article modules are learned and summarized, and converted into a number of vectors that carry summaries. During fine-tuning, content is de-duplicated and merged, and the data is read and summarized to form corresponding question sets. For OpenAI, determining the similarity of two pieces of text requires first turning each into a numeric vector (a vector embedding), something like a stack of coordinate values; comparing the vectors then yields a decimal between 0 and 1, and the closer it is to 1, the higher the similarity. The sent content and the returned results are merged according to the similarity result, and duplicates are removed.
The Embedding of the ChatGPT 23 processes the fine-tuned article modules into vector data.
A calling program is written in Python to call the OpenAI API (application programming interface) in batches; the latest model at present is text-embedding-ada-002, which turns an article module into vector data.
In this embodiment, the vector database 24 is used to store the vector data. Note that the original text blocks and the numeric vectors need to be stored together, so that the original text can be recovered in the reverse direction from a numeric vector.
3. An embodiment is provided to verify the method and system for constructing a traffic and construction industry standard knowledge base using ChatGPT, comprising the following steps.
1. The user asks a question: the system needs to configure a number of scene templates, and the user selects different scenes to ask corresponding questions.
2. User question vectorization: the question posed by the user is converted into a numeric vector through ChatGPT Embedding, which makes it convenient to query the vector database for the data of each vector module converted during data processing.
3. User question relevance query: after the user's question is converted into a vector, candidate answers are queried in the vector library and extracted according to relevance. Specifically, after the numeric vector converted from the user's question is obtained, it is searched in the vector database; a result set is returned and scored by matching similarity (the higher the score, the better the match), and the related results are returned in descending order of matching degree.
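The relevance query in step 3 — score every stored vector against the question vector and return results best match first — can be sketched as follows (records and k value are illustrative assumptions):

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; closer to 1 means a better match."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_matches(query_vec, records, k=3):
    """records: (vector, text) pairs from the vector database. Returns the
    k highest-scoring (score, text) pairs in descending order of score."""
    scored = [(cosine(query_vec, v), text) for v, text in records]
    scored.sort(key=lambda st: st[0], reverse=True)
    return scored[:k]

# Hypothetical stored chunks, for illustration only
records = [([1.0, 0.0], "model requirements"),
           ([0.0, 1.0], "numbering conventions"),
           ([0.9, 0.1], "general provisions")]
results = top_matches([1.0, 0.0], records, k=2)
```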
4. Article module merging: the user question and the preset answers are combined, and the result set is added to the prompt of the request sent to ChatGPT.
For example, the user asks: "I want to read Highway Engineering Information Model Application Standard (JTGT 2421-2021).pdf and summarize the model requirements; return the result in Xmind format." The text paragraphs retrieved from the database are: "What is the highway engineering design information model application standard? What are the model requirements? What are the general provisions of the model requirements? What are the numbering conventions required by the model?" The final prompt then looks like the following.
and (5) returning a result:
Plain Text
@startmindmap
* Model requirements
** General provisions
*** Information included in the information model
**** Geometric information
***** Geometric shape
***** Spatial position
**** Attribute information
***** Identification code
***** Classification code
***** Position
***** Size
***** Quantity
***** Type
***** Material
***** Amount used
** Numbering conventions
*** Numbering rules for bridges, culverts, tunnels, monitoring facilities and the like in the information model
@endmindmap
The returned result is saved in PlantUML format and can then optionally be imported as an Xmind file.
5. Large-model questioning: the question is put to the ChatGPT large model and the final answer is returned.
6. Fine-tuning the model through user questions: when the answer returned for a question is wrong or unsuitable, the user can give negative feedback together with the correct answer, point out where ChatGPT went wrong, and ask it to revise; ChatGPT then changes the original model data according to the user's answers, thereby gradually perfecting the knowledge base.

Claims (10)

1. A method for constructing a standard knowledge base of traffic and construction industry by using ChatGPT, which is characterized by comprising the following steps:
acquiring industry standard data;
processing the industry standard data into an article module;
the ChatGPT processes the article module into vector data;
the vector data is stored.
2. The method of claim 1, wherein the step of processing the industry standard data into an article module comprises the steps of:
preprocessing the industry standard data into text data;
and splitting the text data into the article modules.
3. The method of claim 2, wherein the step of preprocessing the industry standard data into text data comprises: describing the picture information.
4. The method of claim 2, wherein the step of splitting the text data into the article modules comprises: performing encoding processing, redundant-character processing, segmentation into complete semantic segments, lexical analysis and directory tree analysis on the text data.
5. The method of claim 1, wherein the step of the ChatGPT processing the article module into vector data comprises:
the ChatGPT fine-tuning the article module according to the model;
and the ChatGPT processing the fine-tuned article module into the vector data.
6. A system for constructing a traffic, building industry standard knowledge base using ChatGPT, comprising:
the acquisition module is used for acquiring industry standard data;
the processing module is used for processing the industry standard data into an article module;
the ChatGPT is used for processing the article module into vector data;
and the vector database is used for storing the vector data.
7. The system of claim 6, wherein the processing module comprises a preprocessing module and a splitting module;
the preprocessing module is used for preprocessing the industry standard data into text data;
and the splitting module is used for splitting the text data into the article modules.
8. The system of claim 7, wherein the preprocessing module is configured to describe the picture information.
9. The system of claim 7, wherein the splitting module is configured to perform encoding processing, redundant-character processing, segmentation into complete semantic segments, lexical analysis and directory tree analysis on the text data.
10. The system of claim 6, wherein the ChatGPT is configured to:
fine-tune the article module according to the model;
and process the fine-tuned article module into the vector data.
CN202311380162.8A 2023-10-23 2023-10-23 Method and system for constructing traffic construction industry standard knowledge base by using ChatGPT Pending CN117370527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311380162.8A CN117370527A (en) 2023-10-23 2023-10-23 Method and system for constructing traffic construction industry standard knowledge base by using ChatGPT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311380162.8A CN117370527A (en) 2023-10-23 2023-10-23 Method and system for constructing traffic construction industry standard knowledge base by using ChatGPT

Publications (1)

Publication Number Publication Date
CN117370527A true CN117370527A (en) 2024-01-09

Family

ID=89407378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311380162.8A Pending CN117370527A (en) 2023-10-23 2023-10-23 Method and system for constructing traffic construction industry standard knowledge base by using ChatGPT

Country Status (1)

Country Link
CN (1) CN117370527A (en)

Similar Documents

Publication Publication Date Title
CN111753099B (en) Method and system for enhancing relevance of archive entity based on knowledge graph
US20220261427A1 (en) Methods and system for semantic search in large databases
CN110083805B (en) Method and system for converting Word file into EPUB file
US20170235841A1 (en) Enterprise search method and system
US6721451B1 (en) Apparatus and method for reading a document image
JP4343213B2 (en) Document processing apparatus and document processing method
CN112541490A (en) Archive image information structured construction method and device based on deep learning
CN109002499B (en) Discipline correlation knowledge point base construction method and system
CN113190687B (en) Knowledge graph determining method and device, computer equipment and storage medium
CN112231494A (en) Information extraction method and device, electronic equipment and storage medium
JPWO2004034282A1 (en) Content reuse management device and content reuse support device
CN117095419A (en) PDF document data processing and information extracting device and method
EP2544100A2 (en) Method and system for making document modules
CN117370527A (en) Method and system for constructing traffic construction industry standard knowledge base by using ChatGPT
CN116756395A (en) Electronic archiving method and system for urban construction archives
CN115203445A (en) Multimedia resource searching method, device, equipment and medium
Zaslavsky et al. Using copy-detection and text comparison algorithms for cross-referencing multiple editions of literary works
CN114997167A (en) Resume content extraction method and device
JP2003288332A (en) Method and system for supporting structured document creation
CN113434760B (en) Construction method recommendation method, device, equipment and storage medium
Paskali et al. Six Steps Toward Improving Discoverability of Ph. D. Dissertations
CN117493712B (en) PDF document navigable directory extraction method and device, electronic equipment and storage medium
CN110457659B (en) Clause document generation method and terminal equipment
JPH07296005A (en) Japanese text registration/retrieval device
Monostori et al. Using the MatchDetectReveal system for comparative analysis of texts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination