CN111626568B

CN111626568B - Knowledge base construction method and knowledge search method and system in natural disaster field

Info

Publication number: CN111626568B
Application number: CN202010373034.0A
Authority: CN
Inventors: 何原荣; 陈秋瑾; 苏群; 冷鹏; 何婷婷
Original assignee: Xiamen University of Technology
Current assignee: Xiamen University of Technology
Priority date: 2020-05-06
Filing date: 2020-05-06
Publication date: 2024-02-20
Anticipated expiration: 2040-05-06
Also published as: CN111626568A

Abstract

The invention discloses a method for constructing a knowledge base in the field of natural disasters, which is used to generate the knowledge base automatically. The method comprises the following steps: constructing a basic database in the natural disaster field; acquiring disaster text data; extracting structured disaster data from the disaster text data by a machine-like learning method; classifying the structured disaster data according to preset categories to obtain a disaster classification result; performing mining analysis on the structured disaster data with a data analysis algorithm to obtain a disaster mining analysis result; integrating a monitoring system related to natural disasters and extracting natural disaster monitoring data from the monitoring system; and expanding the basic database based on the data obtained in the above steps to form a natural disaster field knowledge base. The application also discloses a device for implementing the method, as well as a knowledge search method and a knowledge search system.

Description

Knowledge base construction method and knowledge search method and system in natural disaster field

Technical Field

The invention relates to the field of computers, in particular to a knowledge base construction method, a knowledge search method and a knowledge search system in the field of natural disasters.

Background

With global climate change and environmental degradation, natural disasters occur frequently and pose an increasingly serious threat to the lives and property of the nation and its people. At present, most university research institutions lack a search engine or expert knowledge base management system dedicated to the natural disaster field, which makes it inconvenient for researchers to quickly learn about recent disaster developments and to analyze disaster data.

A search engine or expert knowledge base management system for the natural disaster field must be built on a rich, authentic and reliable accumulation of historical disaster data. Extracting such data falls within the technical scope of big-data structuring, and data structuring is the most critical step in big data mining. With the rapid development of Internet technology, the information explosion has produced a huge amount of data online, most of which is stored in unstructured or semi-structured forms such as text. Big data mining therefore begins with a systematic study of how to mine unstructured text data. A web crawler is a program that automatically fetches web pages from the Internet for a search engine, and it is an important component of a search engine. After data on the Internet has been crawled, preprocessed, extracted and analyzed, each Web document exists as text data. That text must be converted into structured data before the knowledge it contains can be mined and inferred. At present, the main methods for extracting structured data from text data are string matching, regular expression matching, and neural networks in machine learning. When the data volume is large, string matching and regular expression matching have a low extraction rate. They also cannot cope with the exponential growth of Internet data and its constantly changing structure, which forces the code to be modified continually. Neural network methods in machine learning, on the other hand, have lower data accuracy and can cause performance bottlenecks when processing massive Internet data.

Disclosure of Invention

To solve the above technical problems, the invention provides a knowledge base construction method, a knowledge search method and a knowledge search system for the natural disaster field.

The invention is realized by the following technical scheme:

In a first aspect, a method for constructing a knowledge base in the natural disaster field is provided, including: constructing a basic database in the natural disaster field;

acquiring disaster text data;

extracting structured disaster data from the disaster text data by a machine-like learning method;

classifying the structured disaster data according to preset categories to obtain a disaster classification processing result;

performing mining analysis on the structured disaster data with a data analysis algorithm to obtain a disaster mining analysis result;

integrating a monitoring system related to natural disasters, and extracting natural disaster monitoring data from the monitoring system;

and expanding the basic database based on the disaster text data, the structured disaster data, the disaster classification processing result, the disaster mining analysis result and the monitoring data to form a natural disaster field knowledge base.

In a second aspect, a knowledge search method for the natural disaster field is provided, including:

obtaining keywords that match a search word input by a user;

obtaining data that matches the keywords from a knowledge base;

and sending the data to the user;

wherein the knowledge base is a natural disaster field knowledge base formed by expanding a basic knowledge base based on disaster text data, structured disaster data extracted from the disaster text data by a machine-like learning method, disaster classification processing results, disaster mining analysis results and monitoring data.

In a third aspect, a device for constructing a knowledge base in the natural disaster field is provided, including an initialization module, a data crawling module, a data processing module, a data classification module, a data analysis module, a system integration module and a knowledge base expansion module;

the initialization module is used for constructing a basic database in the natural disaster field;

the data crawling module is used for acquiring disaster text data;

the data processing module is used for extracting structured disaster data from the disaster text data by a machine-like learning method;

the data classification module is used for classifying the structured disaster data according to preset categories to obtain a disaster classification processing result;

the data analysis module is used for performing mining analysis on the structured disaster data with a data analysis algorithm to obtain a disaster mining analysis result;

the system integration module is used for integrating a monitoring system related to natural disasters and extracting natural disaster monitoring data from the monitoring system;

the knowledge base expansion module is used for expanding the basic database based on the disaster text data, the structured disaster data, the disaster classification processing result, the disaster mining analysis result and the monitoring data to form a natural disaster field knowledge base.

In a fourth aspect, a knowledge search system for the natural disaster field is provided, including:

a query module, used for obtaining keywords that match a search word input by a user;

a search module, used for obtaining data that matches the keywords from a knowledge base;

an output module, used for sending the data to the user;

wherein the knowledge base is constructed by a knowledge base construction module. This module expands a basic knowledge base based on disaster text data, structured disaster data extracted from the disaster text data by a machine-like learning method, disaster classification processing results, disaster mining analysis results and monitoring data to form a natural disaster field knowledge base.

The knowledge base construction method and device and the knowledge search method and system have the following beneficial effects. Structured disaster data is extracted from the disaster text data in advance by a machine-like learning method, which improves the data extraction rate. Classifying and mining the structured disaster data helps researchers quickly learn about recent disaster developments and analyze disaster data.

Drawings

To illustrate the technical solutions of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of a knowledge base construction method according to an embodiment of the present invention.

Fig. 2 is a flowchart of extracting structured disaster data from disaster text data according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of an information gathering processing model according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of an engine module corresponding to an information gathering and processing model according to an embodiment of the present invention.

Fig. 5 is a schematic technical route diagram of a knowledge base construction method according to an embodiment of the present invention.

Fig. 6 is a flowchart of a knowledge search method according to an embodiment of the present invention.

Fig. 7 is a schematic structural diagram of a knowledge base construction device according to an embodiment of the present invention.

Fig. 8 is a schematic diagram of a functional module of a knowledge search system according to an embodiment of the present invention.

Fig. 9 is a schematic diagram of natural disaster data based on knowledge tree presentation according to an embodiment of the present invention.

Fig. 10 is a schematic diagram of a keyword cloud chart provided by an embodiment of the present invention.

Fig. 11 is a schematic diagram of a typhoon cloud image provided by an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

Referring to fig. 1 of the specification, an embodiment of the invention provides a method for constructing a knowledge base in the field of natural disasters. The knowledge base construction method may include the steps of:

S101, constructing a basic database in the natural disaster field;

Specifically, the basic database may include a keyword library, a sample database, a structuring rule library, a knowledge tree library, a tag library, a target website library, and the like.

The keyword library stores keywords related to natural disasters. The keywords can be set with reference to academic websites, such as CNKI, VIP and Wanfang, and professional websites, such as Baidu, Sina Weibo, Zhihu, WeChat official accounts and related forums. Keywords can also be customized according to current search conditions, real-time trending news and the like, and then added to the database.

The sample database stores sample data, which can be added manually and consists of a sample source and a target data result.

The structuring rule library stores structuring rules, by which structured disaster data can be extracted from disaster text data obtained from the Internet.

The knowledge tree library stores each disaster category in the natural disaster field in a tree structure.

The labels in the tag library identify information in natural disaster data, such as the date and place of occurrence and the disaster level.

The target website library stores the URL (uniform resource locator) resources to be crawled, and maintains index relationships between the URL resources and the structured data.

In a specific embodiment, to improve read and write efficiency, the target website library may use a Redis cache database, while the other data is stored in a MySQL relational database. Index relationships are established between the URL resources in the target website library and the other data. For example, index relationships may link the URL resources in the target website library, the text data in the text library, and the structured data in the structured database.
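The index relationship between URL resources, text data and structured data can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL and an in-memory set in place of the Redis cache. The table and column names (`target_url`, `text_doc`, `structured_record`) are hypothetical and are not taken from the patent.

```python
import sqlite3

# Hypothetical schema: sqlite3 stands in for MySQL; a Python set stands in for the Redis URL cache.
SCHEMA = """
CREATE TABLE target_url (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
CREATE TABLE text_doc (
    id INTEGER PRIMARY KEY,
    url_id INTEGER NOT NULL REFERENCES target_url(id),
    body TEXT NOT NULL
);
CREATE TABLE structured_record (
    id INTEGER PRIMARY KEY,
    text_id INTEGER NOT NULL REFERENCES text_doc(id),
    field TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE INDEX idx_text_url ON text_doc(url_id);
CREATE INDEX idx_record_text ON structured_record(text_id);
"""


def open_store():
    """Return (relational store, URL cache)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn, set()


def save(conn, url_cache, url, body, fields):
    """Store one crawled page and its structured fields, linked through foreign keys."""
    url_cache.add(url)
    conn.execute("INSERT OR IGNORE INTO target_url(url) VALUES (?)", (url,))
    url_id = conn.execute("SELECT id FROM target_url WHERE url = ?", (url,)).fetchone()[0]
    text_id = conn.execute(
        "INSERT INTO text_doc(url_id, body) VALUES (?, ?)", (url_id, body)
    ).lastrowid
    conn.executemany(
        "INSERT INTO structured_record(text_id, field, value) VALUES (?, ?, ?)",
        [(text_id, k, v) for k, v in fields.items()],
    )
    conn.commit()
    return text_id


def records_for_url(conn, url):
    """Follow the index chain URL -> text -> structured fields."""
    rows = conn.execute(
        """SELECT r.field, r.value FROM target_url u
           JOIN text_doc t ON t.url_id = u.id
           JOIN structured_record r ON r.text_id = t.id
           WHERE u.url = ? ORDER BY r.field""",
        (url,),
    ).fetchall()
    return dict(rows)
```

In a real deployment, the cache would hold the URLs that are hot or still pending, and the relational tables would hold the durable text and structured records.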

S102, acquiring disaster text data;

In the embodiment of the invention, the preset disaster keywords to be crawled can be determined based on the target website to be crawled. These keywords can be customized or obtained from the keyword library. Original text data matching the preset disaster keywords is then acquired from the target website using distributed crawler technology and/or incremental crawler technology. Finally, the original text data is preprocessed to obtain disaster text data, which is stored in the text library.

In a specific embodiment, to implement a distributed crawler, the crawler program can be deployed on multiple hosts so that several crawler programs crawl collaboratively at the same time, each executing a different crawler task. Crawler tasks can also be executed collaboratively by multiple threads. This increases task concurrency and greatly improves crawling efficiency. Multiple distributed crawler servers can be deployed both at home and abroad.

Incremental crawler technology applies incremental updates to already downloaded web pages and crawls only newly generated or changed pages. This keeps the crawled pages reasonably up to date while reducing the amount of data downloaded and the time and storage consumed.

To track the state of data acquisition in real time, crawler tasks can be monitored, for example by running a monitoring task at regular intervals, which improves safety and maintainability.
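The two crawler ideas can be illustrated with a minimal sketch. URLs are partitioned across crawler hosts by a hash, and a page is treated as "changed" only when its content fingerprint differs from the last one seen. The `fetch` callable and the modulo partitioning scheme are assumptions made for illustration. This sketch still downloads each page to detect changes. A production incremental crawler would more likely use HTTP conditional requests (`If-Modified-Since` / `ETag`) to avoid downloading unchanged pages at all.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def assign_worker(url: str, n_workers: int) -> int:
    """Deterministically assign a URL to one of n crawler hosts (hypothetical partitioning scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_workers


class IncrementalCrawler:
    """Emits only URLs whose content is new or has changed since the previous round."""

    def __init__(self, fetch: Callable[[str], str]):
        self.fetch = fetch                      # injected downloader, e.g. an HTTP client wrapper
        self.fingerprints: Dict[str, str] = {}  # URL -> SHA-256 of the last seen content

    def crawl(self, urls: Iterable[str]) -> List[str]:
        changed = []
        for url in urls:
            content = self.fetch(url)
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self.fingerprints.get(url) != digest:
                self.fingerprints[url] = digest
                changed.append(url)
        return changed
```

Only the URLs returned by `crawl` would be passed on to preprocessing, so unchanged pages are not reprocessed or stored twice.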

In a specific embodiment, preprocessing the original text data includes the following steps.

(1) Text content extraction: information such as the text content and its logical structure is extracted from the original text data.

(2) Translation: the original text data is translated into the required language.

(3) Semantic recognition: the semantics formed by the keywords and their order are interpreted from the machine's perspective, so that the data can be structured better.

(4) Part-of-speech recognition: the parts of speech in a sentence are recognized within a given semantic context, and the part of speech of a whole article or comment is then inferred according to weights.

(5) Cleaning and deduplication: invalid data is removed.
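Two of these preprocessing steps can be sketched with the standard library alone: text content extraction, and cleaning with deduplication. Translation, semantic recognition and part-of-speech recognition are omitted because they would need external models. The whitespace-collapsing rule and the exact-match deduplication are simplifying assumptions.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def clean(text: str) -> str:
    """Collapse runs of whitespace; an empty result counts as invalid data."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(raw_pages):
    """Extract, clean and de-duplicate; returns unique, non-empty texts in input order."""
    seen, out = set(), []
    for html in raw_pages:
        text = clean(extract_text(html))
        if text and text not in seen:
            seen.add(text)
            out.append(text)
    return out
```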

In a specific embodiment, website-level deduplication can be performed during crawling. Each crawled target URL is given a unique identifier, namely a data fingerprint. When the data volume is small, deduplication can be achieved with a set, whose elements are unique. When the data volume is large, storing fixed-length identifiers instead of full URLs saves space. A Bloom Filter algorithm can also be introduced as a means of deduplication.

To check the crawled original text data for duplicates, a mark, namely a data fingerprint, is first attached to the keywords in the original text data and compared with the disaster text data in the database. A weight is assigned to each keyword according to its frequency of occurrence, because original text data containing higher-weight keywords is most likely to be a duplicate. Finally, the full original text data is compared in descending order of weight, and any duplicate is deleted or discarded according to the existing storage logic.
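The Bloom Filter mentioned above can be sketched as a URL deduplicator. It uses the standard sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The k bit positions are derived by double hashing of a single SHA-256 digest. The capacity and error rate are illustrative parameters, not values from the patent. A Bloom filter never reports a seen URL as new, but it may occasionally report a new URL as seen, with a probability of about `error_rate` once it holds `capacity` items.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set for URL de-duplication: no false negatives, tunable false positives."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        # m = -n ln p / (ln 2)^2 bits; k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def filter_new_urls(urls, seen: BloomFilter):
    """Yield only URLs not (probably) seen before, and remember them."""
    for url in urls:
        if url not in seen:
            seen.add(url)
            yield url
```

For example, `list(filter_new_urls(["a", "b", "a"], BloomFilter(1000)))` keeps only the first occurrence of each URL.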

S103, extracting structured disaster data from the disaster text data by adopting a machine-like learning method;

Extracting structured disaster data from the disaster text data aims mainly at mining the structure hidden in the disaster text data so that it can be analyzed conveniently. In other words, the required data types are extracted from the disaster text data according to certain rules, and the text data is thereby structured. String matching and regular expression matching have a low extraction rate, and neural-network methods in machine learning extract data automatically with low accuracy. To overcome both problems, the embodiment of the invention provides a machine-like learning method that extracts the required knowledge from text data semi-automatically, using the structuring rules defined in a rule base. To this end, a quasi-machine-learning data extraction model is constructed for the natural disaster field. The model can quickly extract content such as the release time. It serves as the training mode for semi-automatic extraction of text data, and also as the extraction means for other information such as authors, regions, URLs and article sources.

Specifically, referring to Fig. 2 of the specification, the step of extracting structured disaster data from the disaster text data by the machine-like learning method may include:

S201, acquiring keywords of the disaster text data;

S202, determining labels of the disaster text data;

wherein the labels include a disaster type label (typhoon, earthquake, etc.), year, month and day labels for the time, a region label, a disaster scale label, and the like.

S203, obtaining a structuring rule from a structuring rule base according to the keywords and the labels;

Specifically, the corresponding structuring rules are matched automatically from the structuring rule base according to the keywords and labels, in order of priority, and the rule with the highest priority is selected as the conversion rule. If no structuring rule matches, the text data of the current conversion is discarded. If several structuring rules with the same priority match the keywords and label information, the system issues a prompt, such as an alarm. The text data of the current conversion is then left unprocessed and waits to be reprocessed in the next cycle. Meanwhile, the relevant personnel (such as administrators or maintenance staff) can adjust the priorities of the conflicting structuring rules according to the prompt, so that data extraction is semi-automated.
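The selection logic of S203 can be sketched as follows. A matching rule is applied, text is discarded when nothing matches, and text is deferred with an alert when the top-priority rules tie. The matching criterion used here is an assumption, since the patent does not define one: a rule matches when it shares at least one keyword with the text and all of its required labels are present. The `StructuringRule` fields and the "larger number means higher priority" convention are also assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class StructuringRule:
    name: str
    keywords: FrozenSet[str]  # hypothetical: keywords the rule applies to
    labels: FrozenSet[str]    # hypothetical: labels the text must carry
    priority: int             # assumed convention: larger value = higher priority


class RuleConflict(Exception):
    """Raised when two or more matching rules share the highest priority."""


def select_rule(rules: List[StructuringRule], keywords, labels) -> Optional[StructuringRule]:
    kw, lb = set(keywords), set(labels)
    matched = [r for r in rules if r.keywords & kw and r.labels <= lb]
    if not matched:
        return None  # no rule matches: this round's text is discarded
    top = max(r.priority for r in matched)
    best = [r for r in matched if r.priority == top]
    if len(best) > 1:
        raise RuleConflict(", ".join(sorted(r.name for r in best)))
    return best[0]


def process(text_id, rules, keywords, labels, retry_queue, alerts):
    """One conversion cycle: return the rule to apply, or None if discarded or deferred."""
    try:
        return select_rule(rules, keywords, labels)
    except RuleConflict as exc:
        alerts.append(f"{text_id}: conflicting rules {exc}")
        retry_queue.append(text_id)  # reprocess next cycle, after staff adjust priorities
        return None
```

Raising an exception for ties keeps the conflict case explicit. The caller decides whether to alarm, queue or log it.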

S204, extracting structured disaster data from the disaster text data according to the structuring rule.

Specifically, information such as the author, disaster occurrence time, disaster grade or disaster area can be extracted from the disaster text data according to the configured structuring rules. Once the disaster text data has been converted into structured disaster data in advance, the structured disaster data is stored in the structured disaster database according to a certain logic, which improves the data extraction rate.

S104, classifying the structured disaster data according to preset classification to obtain a disaster classification processing result;

the preset classification may include news information, professional documents, thematic classification, knowledge trees, and popular journals.

S105, adopting a data analysis algorithm to carry out mining analysis on the structured disaster data to obtain a disaster mining analysis result;

Specifically, mining analysis of the structured disaster data can use related data analysis tools and algorithms, such as ECharts and knowledge graphs. These can analyze information such as the download and citation counts of trending news, the number of articles published in the current year, and the number of articles related to natural disasters. This helps users and researchers quickly learn about recent disaster developments and further analyze and study the disasters on that basis.
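The aggregation step that feeds such charts can be sketched server-side. The sketch counts structured records per year and turns them into an ECharts-style option object with `xAxis`, `yAxis` and `series`. It also produces keyword frequencies as `{name, value}` items of the kind a word-cloud series consumes. The record fields `year` and `keywords` are hypothetical.

```python
from collections import Counter


def articles_per_year(records):
    """Count structured records per publication year."""
    return Counter(r["year"] for r in records)


def bar_option(counts):
    """Build an ECharts-style bar chart option (xAxis / yAxis / series) from year counts."""
    years = sorted(counts)
    return {
        "xAxis": {"type": "category", "data": years},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": [counts[y] for y in years]}],
    }


def keyword_cloud(records, top_n=50):
    """Keyword frequencies as {name, value} items, e.g. for a word-cloud series."""
    freq = Counter(kw for r in records for kw in r["keywords"])
    return [{"name": k, "value": v} for k, v in freq.most_common(top_n)]
```

The resulting dictionaries can be serialized with `json.dumps` and passed to the front end, which renders them with ECharts.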

In a specific application, the detailed research information on natural disasters can be presented more intuitively according to the mining analysis result. Suitable forms include various charts, keyword cloud charts and institution collaboration co-occurrence charts. This makes it easier to study the degree of disaster impact, research trends, and the like.

S106, integrating a monitoring system related to the natural disasters, and extracting natural disaster monitoring data from the monitoring system;

Specifically, the embodiment of the invention can acquire monitoring data from an integrated typhoon monitoring system. Real-time data from the typhoon monitoring platform supplements and refines the typhoon knowledge system, making it easier for users to monitor and analyze typhoons. In another preferred embodiment, the knowledge base can be extended continuously by providing a calling interface through which resource information from other systems is obtained.

S107, expanding the basic database based on the disaster text data, the structured disaster data, the disaster classification processing result, the disaster mining analysis result and the monitoring data to form a natural disaster field knowledge base.

Specifically, index relationships are established between the text database and the structured database. The disaster classification processing result and the disaster mining analysis result are added to the text library and the structured database. Keywords are extracted from the disaster text data to update the keyword library. Whether to add sample data is determined according to the disaster text data, and the rule base is updated accordingly. In this way, a knowledge base for the natural disaster field is formed and is continuously refined during data acquisition and processing.

In a specific embodiment, the text data obtained from the Internet does not have a consistent format. The structuring rule base may therefore be extended continuously, at irregular intervals, so that structured disaster data can be extracted from the disaster text data quickly. To obtain a structuring rule (a rule for extracting structured data from text data), sample data can be customized, and the structuring rule can be extracted from the sample data automatically by a certain processing logic. As shown in Table 1, structuring rules for extracting content related to the release time from text data are obtained from two constructed samples.

TABLE 1 example of structuring rules

In a specific application, the structuring rule generated from sample data 1 converts text data into structured data of the form "year: year of the current date, month: A, day: B, hour: C, minute: D". The structuring rule generated from sample data 2 converts text data into structured data of the form "year: year of the current date, month: A, day: B, hour: 0, minute: 0; time zone: Indian time zone". By analogy, defining structuring rules for different sample data allows information such as the author, region, URL and article source to be extracted from the text data as well.
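One possible way to derive a structuring rule automatically from a (sample source, target result) pair can be sketched in a few lines. The target values in the sample text are replaced by named regex groups, and the surrounding literal text is escaped. Table 1's actual samples are not reproduced in this text, so the sample strings below are hypothetical. The induction method is an illustrative assumption, not the patent's disclosed logic. It further assumes the target values are digit strings that appear in the sample in the same order as the dictionary keys. Fields absent from the text (the current year, a fixed hour/minute, a time zone) come from defaults, mirroring the two output forms described above.

```python
import re
from datetime import date
from typing import Optional


def induce_rule(sample_source: str, target: dict, defaults: Optional[dict] = None):
    """Turn one (sample source, target result) pair into a regex-based structuring rule."""
    pattern, pos = "", 0
    for field, value in target.items():
        idx = sample_source.index(value, pos)  # raises ValueError if the value is absent
        pattern += re.escape(sample_source[pos:idx]) + rf"(?P<{field}>\d+)"
        pos = idx + len(value)
    pattern += re.escape(sample_source[pos:])
    return re.compile(pattern), dict(defaults or {})


def apply_rule(rule, text: str):
    """Extract a structured record from text, or None if the rule does not match."""
    regex, defaults = rule
    m = regex.search(text)
    if m is None:
        return None
    record = {"year": str(date.today().year)}  # year of the current date
    record.update(defaults)
    record.update(m.groupdict())
    return record


# Hypothetical samples in the spirit of Table 1.
rule1 = induce_rule("Updated 08-10 01:45", {"month": "08", "day": "10", "hour": "01", "minute": "45"})
rule2 = induce_rule(
    "Posted on 08/10 (IST)",
    {"month": "08", "day": "10"},
    {"hour": "0", "minute": "0", "time_zone": "Indian time zone"},
)
```

For instance, `apply_rule(rule2, "Posted on 11/03 (IST)")` yields month "11", day "03", hour and minute "0", and the Indian time zone.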

In machine learning, each category in a classification problem is called a class, a data point is called a sample, and the class to which a sample belongs is called its label.

In the embodiment of the invention, the information collection and processing model shown in Fig. 3 is provided to collect information related to natural disasters from massive data and process it. As shown in Fig. 3, the model mainly comprises an underlying basic data layer, a user data processing layer and a learning improvement layer. These layers represent only a logical division of functions and are not limiting. Fig. 4 is the engine block diagram corresponding to Fig. 3.

Specifically, the basic data layer includes a crawler engine, a text content extraction engine, a translation engine, a semantic recognition engine and a part-of-speech recognition engine, and mainly implements the function of step S102.

The crawler engine acquires text data, such as web page resources and file data, from the Internet.

The text content extraction engine uses the web page resources to determine the content frame of a page and the logical structure related to knowledge points, and it identifies and extracts the page's text content.

The translation engine converts the text content into the required language.

The semantic recognition engine interprets, from the machine's perspective, the semantics formed by keywords and their order, so that the data can be structured better.

The part-of-speech recognition engine recognizes the parts of speech of a sentence within a given semantic context, and then infers the part of speech of an article or comment according to weights.

Specifically, the user data processing layer includes a knowledge tree engine, a keyword generation engine and a structuring engine. It implements the function of step S103 and part of the function of step S107.

Starting from a preset knowledge tree, the knowledge tree engine adds to the knowledge tree library, in a certain proportion, the logical structures related to knowledge points that the text content extraction engine has identified. The professional knowledge tree thus keeps growing and is generated and expanded automatically.

The keyword generation engine preprocesses the text content extracted by the text content extraction engine. It computes the keywords of the text from a Chinese lexicon, the industry knowledge tree and the word frequencies in the text, and stores the resulting preset keywords in the keyword library.

After the text content extraction engine has extracted the text content, the structuring engine combines machine semantic understanding with the structuring rules preset by the system, or with user-defined structuring requirements. It extracts meaningful information from the text content as structured data and stores that data in the structured database.
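The keyword generation engine's frequency-based scoring can be sketched as follows. Terms are ranked by frequency in the text, terms that also appear in the knowledge tree get a boost, and the per-text keywords are accumulated into the keyword library. The tokenizer here handles only Latin-script words. Chinese text would first need a word segmenter backed by the Chinese lexicon, which this sketch does not include. The stopword list and the boost factor of 2.0 are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "was", "is", "at"}  # illustrative


def tokenize(text: str) -> List[str]:
    # Assumption: Latin-script tokenization only; Chinese text needs a word segmenter instead.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def text_keywords(text: str, tree_terms: Set[str], top_n: int = 5, tree_boost: float = 2.0) -> List[str]:
    """Rank candidate keywords by term frequency, boosting terms present in the knowledge tree."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    scored = {t: c * (tree_boost if t in tree_terms else 1.0) for t, c in counts.items()}
    return sorted(scored, key=lambda t: (-scored[t], t))[:top_n]


def update_keyword_library(library: Counter, texts: Iterable[str], tree_terms: Set[str]) -> Counter:
    """Accumulate each text's keywords into the keyword library (as document frequency)."""
    for text in texts:
        library.update(text_keywords(text, tree_terms))
    return library
```

Ties are broken alphabetically so the ranking is deterministic.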

Specifically, the learning improvement layer, i.e. the user interaction and machine learning self-improvement layer, implements part of the function of step S107.

For example, the preset keywords of texts stored in the database can be used so that, after the user inputs a keyword, structured or unstructured data is displayed as the user requires.

The top 10 associated keywords, drawn from the knowledge tree and the text library, can be displayed in order of weight, which narrows the scope of interaction. The user can then select several of these keywords to narrow the displayed text content or the corresponding range of structured data.

In addition, a machine learning algorithm can use the keywords suggested by the system, together with the user's selections, as learning samples to continuously refine the relevance between keywords.
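The suggestion-and-feedback loop can be sketched with association weights between keywords. Suggestions are the top-k related keywords by weight, and each time a user picks a suggested keyword, its weight is reinforced. The patent does not name the machine learning algorithm, so this simple additive counting update is a stand-in, and the class and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class KeywordAssociations:
    """Association weights between keywords, refined from user selections.

    A simple additive update stands in for the unspecified machine-learning algorithm.
    """

    def __init__(self, learning_rate: float = 1.0):
        self.weights: Dict[str, Dict[str, float]] = defaultdict(dict)
        self.learning_rate = learning_rate

    def seed(self, keyword: str, related: Dict[str, float]) -> None:
        """Initial weights, e.g. from knowledge-tree neighbours and text co-occurrence."""
        self.weights[keyword].update(related)

    def suggest(self, keyword: str, k: int = 10) -> List[str]:
        """Top-k associated keywords by weight (ties broken alphabetically)."""
        related = self.weights.get(keyword, {})
        return sorted(related, key=lambda t: (-related[t], t))[:k]

    def feedback(self, keyword: str, shown: List[str], chosen: List[str]) -> None:
        """Each shown keyword is a learning sample; chosen ones have their weight reinforced."""
        for term in shown:
            if term in chosen:
                current = self.weights[keyword].get(term, 0.0)
                self.weights[keyword][term] = current + self.learning_rate
```

With repeated selections, a keyword that users consistently pick rises in the top-10 list even if its initial weight was low.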

In a specific embodiment, the information collection and processing model may operate as follows.

A knowledge network is constructed from professional knowledge points, namely natural disaster relationships, and the keywords are initialized.

Data on the Internet is acquired in real time. Its text content is extracted together with the logical relationships between blocks, and the knowledge base is corrected and supplemented according to the authority of the text source.

The keywords of each text are computed from its content by word frequency, and common keywords are extracted across a large number of texts to keep supplementing and refining the knowledge base.

After users input keyword information, machine learning based on user interaction improves the degree of association between keywords.

Based on keyword associations, an index tree, namely a knowledge tree, is built to improve search efficiency, and the knowledge base is constructed quickly on top of it.

A particular advantage is that simple user-defined data processing logic, or processing logic customized after consultation with the user, enables customized search and research of content related to the natural disaster field.
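The knowledge tree used as an index can be sketched with nested nodes. Each node carries a name, keywords and children, and an inverted index maps each keyword to the path of the node that holds it. A keyword can then be located in one dictionary lookup instead of a tree walk. The tree fragment is a hypothetical example, not the patent's actual taxonomy. If a keyword appears on several nodes, the last one visited wins in this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment of a natural-disaster knowledge tree.
TREE = {
    "name": "natural disaster",
    "keywords": [],
    "children": [
        {"name": "meteorological", "keywords": ["typhoon", "rainstorm"], "children": [
            {"name": "typhoon", "keywords": ["storm surge", "landfall"], "children": []},
        ]},
        {"name": "geological", "keywords": ["earthquake", "landslide"], "children": []},
    ],
}


def build_index(node, path=()) -> Dict[str, Tuple[str, ...]]:
    """Map every keyword to the path of the tree node that carries it."""
    path = path + (node["name"],)
    index = {kw: path for kw in node["keywords"]}
    for child in node["children"]:
        index.update(build_index(child, path))
    return index


def locate(index: Dict[str, Tuple[str, ...]], keyword: str) -> Optional[Tuple[str, ...]]:
    """Category path for a keyword, or None if it is not in the tree."""
    return index.get(keyword.lower())


def keywords_under(index: Dict[str, Tuple[str, ...]], category: str) -> List[str]:
    """All keywords whose node path passes through the given category."""
    return sorted(kw for kw, p in index.items() if category in p)
```

Results can then be grouped by category path, which matches the category-by-category display of disaster information based on tree-node keywords.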

In a specific embodiment, the back-end architecture for constructing the knowledge base may be based on a mature framework, for example the SSM framework, which integrates the Spring, Spring MVC and MyBatis frameworks. The request flow is as follows. A page sends a request to the controller, and the controller invokes the business-layer logic. The business layer sends a request to the persistence layer, which interacts with the database and returns the result to the business layer. The business layer returns the result to the controller, which then renders the data through a view in the presentation layer.

In the embodiment of the invention, the technical route adopted for constructing the knowledge base is shown in Fig. 5 of the specification.

The system shown in Fig. 5 first sets keywords based on several authoritative large news websites (including Baidu, Sina Weibo, Zhihu, WeChat official accounts, related forums and the like) and professional academic websites (including CNKI, VIP, Wanfang and the like). It builds knowledge trees from these keywords and stores them in a database, then uses web crawler technology to crawl data according to the configured keywords, obtaining unstructured text data.

Next, the system preprocesses the unstructured text data with algorithms for word segmentation, semantic analysis and text extraction. Preprocessing covers deduplication, content analysis, data cleaning, text extraction, translation and metadata processing. The preprocessed text is then structured, i.e. converted into structured data, and stored in the database.

Finally, based on the knowledge tree model, the specific disaster information is displayed in structured form by category, according to the keywords associated with the tree nodes.

In addition, the system uses knowledge-graph techniques and algorithms to further mine and analyze the structured data. The detailed research information on natural disasters is then presented more intuitively through various charts, keyword cloud charts, institution collaboration co-occurrence charts and the like.

The knowledge base in the natural disaster field can serve various scenarios: for example, it can provide users with hot news or disaster data on current natural disasters, which users can use to analyze and study disasters, or it can support knowledge sharing among users. Knowledge search is one of these application scenarios.

Referring to fig. 6 of the specification, an embodiment of the invention further provides a knowledge search method in the natural disaster field. The search method may include the following steps:

S601: obtaining keywords that match the search term entered by the user;

S602: obtaining data matching the keywords from the knowledge base;

S603: sending the data matching the keywords to the user.
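Steps S601 to S603 can be sketched as three small functions. The matching rule used for S601 (a keyword matches when it contains, or is contained in, the search term, ignoring case), the Record type and the function names are assumptions for illustration; the embodiment may also consult the knowledge tree when matching.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical knowledge base entry: a title plus its tagged keywords."""
    title: str
    keywords: set = field(default_factory=set)


def match_keywords(term: str, vocabulary: set) -> set:
    """S601: keywords that contain, or are contained in, the search term (case-insensitive)."""
    t = term.strip().lower()
    return {k for k in vocabulary if t and (t in k.lower() or k.lower() in t)}


def find_records(keywords: set, kb: list) -> list:
    """S602: records tagged with at least one of the matched keywords."""
    return [r for r in kb if r.keywords & keywords]


def search(term: str, kb: list) -> list:
    """S601 to S603: return the titles that would be sent to the user."""
    vocabulary = set().union(*(r.keywords for r in kb))
    return [r.title for r in find_records(match_keywords(term, vocabulary), kb)]
```

Separating keyword matching from data retrieval mirrors the split between the query module and the search module described for the search system.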

The knowledge base is formed by expanding a basic knowledge base with disaster text data, structured disaster data obtained by converting the disaster text data with a machine-like learning method, disaster classification processing results, disaster mining analysis results and monitoring data, yielding the natural disaster domain knowledge base.

In a specific embodiment, the preset keywords stored in the knowledge base can be used so that, after the user enters a search term, structured or unstructured data is displayed as the user requires. Alternatively, after the user enters a search term, the system extracts it and, using the associated keywords in the knowledge tree and in the text base, displays the top 10 keywords ranked by a certain weight. This narrows the scope of interaction with the user: the user can select one or more of these keywords to narrow the range of displayed text data or of the corresponding structured data.
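The top-10 keyword suggestion and the narrowing step might look like the following sketch. The source only states that keywords are ranked "according to a certain weight"; the weighting used here (a weighted sum of knowledge-tree and text-base association strengths with hypothetical weights w_tree=2.0 and w_text=1.0), the alphabetical tie-break, and the rule that a document must carry every selected keyword are all assumptions.

```python
from collections import Counter


def suggest_keywords(term: str, tree_assoc: dict, text_assoc: dict,
                     w_tree: float = 2.0, w_text: float = 1.0, k: int = 10) -> list:
    """Rank keywords associated with `term` in the knowledge tree and the text base.

    tree_assoc and text_assoc map a search term to {keyword: association strength};
    both structures and the weights are illustrative assumptions.
    """
    scores = Counter()
    for kw, s in tree_assoc.get(term, {}).items():
        scores[kw] += w_tree * s
    for kw, s in text_assoc.get(term, {}).items():
        scores[kw] += w_text * s
    # Highest score first; ties broken alphabetically so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in ranked[:k]]


def narrow(docs: list, selected: list) -> list:
    """Keep only documents tagged with every keyword the user selected."""
    return [d for d in docs if set(selected) <= d["keywords"]]
```

Requiring all selected keywords (set inclusion) makes each additional selection shrink the result set, which matches the stated goal of narrowing the display range.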

The implementation of constructing a knowledge base and of performing knowledge search with it has been described above. These processes may be implemented by an apparatus or system; the knowledge base construction apparatus and the knowledge search system are described below.

Referring to fig. 7 of the specification, in order to implement a knowledge base construction process, an embodiment of the present invention further provides a device for constructing a knowledge base in the field of natural disasters. The knowledge base construction apparatus 700 includes: an initialization module 701, a data crawling module 702, a data processing module 703, a data classification module 704, a data analysis module 705, a system integration module 706, and a knowledge base expansion module 707;

the initialization module 701 is used for constructing a basic database in the natural disaster field;

the data crawling module 702 is configured to obtain disaster text data;

the data processing module 703 is configured to convert the disaster text data into structured disaster data by using a machine-like learning method;

the data classification module 704 is configured to classify the structured disaster data according to a preset classification, so as to obtain a disaster classification result;

the data analysis module 705 is configured to perform mining analysis on the structured disaster data by using a data analysis algorithm, so as to obtain a disaster mining analysis result;

The system integration module 706 is configured to integrate a monitoring system related to a natural disaster, and extract natural disaster monitoring data from the monitoring system;

the knowledge base expansion module 707 is configured to expand the basic database based on the disaster text data, the structured disaster data, the disaster classification processing result, the disaster mining analysis result and the monitoring data, so as to form a natural disaster domain knowledge base.
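The data flow between modules 701 to 707 can be pictured as a simple pipeline. In this sketch each module is passed in as a plain function; the KnowledgeBase container, the function signatures and the convention that the structuring step returns None for text it cannot structure are hypothetical, chosen only to show how the outputs of modules 702 to 706 feed the expanded knowledge base.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class KnowledgeBase:
    """Basic database (701) plus the content added by the expansion module (707)."""
    url_index: list
    texts: list = field(default_factory=list)
    structured: list = field(default_factory=list)
    categories: dict = field(default_factory=dict)
    analysis: dict = field(default_factory=dict)
    monitoring: list = field(default_factory=list)


def build_knowledge_base(
    url_index: list,
    crawl: Callable[[list], list],               # data crawling module 702
    structure: Callable[[str], Optional[dict]],  # data processing module 703
    classify: Callable[[dict], str],             # data classification module 704
    analyse: Callable[[list], dict],             # data analysis module 705
    monitor: Callable[[], list],                 # system integration module 706
) -> KnowledgeBase:
    kb = KnowledgeBase(url_index=url_index)      # initialization module 701
    kb.texts = crawl(url_index)
    # Keep only texts the structuring step could convert.
    kb.structured = [s for s in map(structure, kb.texts) if s is not None]
    for record in kb.structured:
        kb.categories.setdefault(classify(record), []).append(record)
    kb.analysis = analyse(kb.structured)
    kb.monitoring = monitor()
    return kb                                    # expanded knowledge base (707)
```

Injecting the modules as functions keeps the orchestration independent of how each module is implemented, so a crawler or classifier can be swapped without touching the pipeline.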

The knowledge base expansion module 707 is further configured to automatically extract a structuring rule from the customized sample data, and store the structuring rule in the structuring rule base.
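Claim 1 spells out how a structuring rule is selected: rules are matched against the typhoon keywords and labels in priority order, the highest-priority rule is used for conversion, unmatched text is discarded, and a match by several rules of the same priority triggers a prompt and defers the text to the next cycle. A minimal sketch of that decision logic follows. The Rule fields, the convention that a larger number means higher priority, treating only a tie at the top priority as a conflict, expressing a rule's extraction step as a regular expression with named groups, and discarding text whose pattern fails to match are assumptions for illustration.

```python
import logging
import re
from dataclasses import dataclass

log = logging.getLogger(__name__)


@dataclass(frozen=True)
class Rule:
    """A hypothetical structuring rule."""
    priority: int        # assumption: a larger number means a higher priority
    keyword: str         # typhoon keyword the rule applies to
    labels: frozenset    # labels the text must carry, e.g. {"year", "region"}
    pattern: str         # regex whose named groups become structured fields


def apply_rules(text: str, keywords: set, labels: set, rules: list) -> tuple:
    """Return ("ok", fields), ("discard", None) or ("defer", None)."""
    matched = [r for r in rules if r.keyword in keywords and r.labels <= labels]
    if not matched:
        return "discard", None              # no rule matched: drop this text
    top = max(r.priority for r in matched)
    best = [r for r in matched if r.priority == top]
    if len(best) > 1:
        # Same-priority conflict: prompt and leave the text for the next cycle.
        log.warning("%d rules share priority %d; deferring", len(best), top)
        return "defer", None
    m = re.search(best[0].pattern, text)
    if m is None:
        return "discard", None              # assumption: a non-matching pattern drops the text
    return "ok", m.groupdict()
```

Returning an explicit status rather than raising keeps the three outcomes named in the claim (convert, discard, defer) visible to the caller that schedules the next cycle.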

To facilitate management of the acquired data, the knowledge base construction device further includes a resource management module 708 for managing resources during knowledge base construction, including information document management, search condition management, crawling engine management, user management and log management.

Referring to fig. 8 of the specification, in order to implement a knowledge searching process, an embodiment of the present invention further provides a knowledge searching system in the natural disaster field. The knowledge search system includes: a query module 801, a search module 802, an output module 803, and a knowledge base construction module 804;

The query module 801 is configured to obtain keywords matching the search term entered by the user;

The searching module 802 is configured to obtain data matching the keyword through a knowledge base;

the output module 803 is configured to send the data to a user;

the knowledge base is constructed by a knowledge base construction module 804, which is used for expanding a basic knowledge base based on disaster text data, structured disaster data obtained by converting the disaster text data by a machine-like learning method, disaster classification processing results, disaster mining analysis results and monitoring data to form a knowledge base in the natural disaster field. The knowledge base construction module 804 corresponds to the knowledge base construction device 700.

The knowledge search system provided by the embodiment of the invention is similar to an intelligent search engine based on a knowledge base, and can display disaster text data or structured disaster data according to user needs.

To illustrate more clearly the method and device for constructing the knowledge base in the natural disaster field and the knowledge search method and system provided by the embodiments of the invention, the embodiments also provide a series of schematic interface diagrams. It should be noted that these drawings serve only to illustrate the embodiments clearly and are not intended to limit them.

Referring to fig. 9 of the specification, the data to be displayed is quickly retrieved from the knowledge base according to the search information entered by the user. Searches can be made by knowledge tree, disaster date range, keyword, source and title; the system retrieves from the database the natural disaster data matching the search conditions, including the title, release time, source, detail URL, summary and keywords. Each entry can be operated on with the article "hide" and "collect" functions, which improves the user experience: the user can hide invalid articles and collect important ones. Hidden articles can be restored and displayed again from the hidden-articles interface, and collected articles can be removed from the favorites.
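The search conditions and the hide/collect operations shown in fig. 9 can be sketched as a small in-memory class. The Article fields, the ArticleSearch class and its method names are hypothetical; search by knowledge tree is omitted, and all given conditions are combined with AND as an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Article:
    id: int
    title: str
    source: str
    released: date
    keywords: frozenset


class ArticleSearch:
    """Filters articles by search conditions and tracks hidden and collected articles."""

    def __init__(self, articles: list) -> None:
        self.articles = articles
        self.hidden = set()
        self.favorites = set()

    def search(self, start: Optional[date] = None, end: Optional[date] = None,
               keyword: Optional[str] = None, source: Optional[str] = None,
               title: Optional[str] = None) -> list:
        hits = []
        for a in self.articles:
            if a.id in self.hidden:
                continue  # hidden articles stay out of results until restored
            if start is not None and a.released < start:
                continue
            if end is not None and a.released > end:
                continue
            if keyword is not None and keyword not in a.keywords:
                continue
            if source is not None and a.source != source:
                continue
            if title is not None and title.lower() not in a.title.lower():
                continue
            hits.append(a.id)
        return hits

    def hide(self, article_id: int) -> None:
        self.hidden.add(article_id)

    def restore(self, article_id: int) -> None:
        self.hidden.discard(article_id)

    def collect(self, article_id: int) -> None:
        self.favorites.add(article_id)

    def remove_favorite(self, article_id: int) -> None:
        self.favorites.discard(article_id)
```

Hiding is modelled as a filter rather than a deletion, so a restored article reappears in later searches without any data being lost.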

Based on the mining analysis of the text data, after the user enters search terms, research trends in the natural disaster field can be displayed visually as histograms, line graphs, pie charts, keyword cloud charts and institution co-occurrence charts (also called author collaboration charts), together with information such as hot news, article download and citation counts, the number of articles published in the current year, and the number of articles related to natural disasters. Referring to fig. 10 of the specification, the natural disaster keyword cloud chart lets the user quickly grasp research trends on natural disasters.
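The keyword cloud and the institution co-occurrence chart both rest on simple counting, which might be sketched as follows. Counting each keyword once per document, and weighting each institution pair by the number of articles they share, are assumptions; the document dictionaries and function names are hypothetical, and rendering the charts is out of scope.

```python
from collections import Counter
from itertools import combinations


def keyword_frequencies(docs: list) -> Counter:
    """Word-cloud weights: the number of documents that mention each keyword."""
    return Counter(kw for d in docs for kw in set(d["keywords"]))


def institution_cooccurrence(docs: list) -> Counter:
    """Co-occurrence chart edge weights: articles shared by each pair of institutions."""
    pairs = Counter()
    for d in docs:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(d["institutions"])), 2):
            pairs[(a, b)] += 1
    return pairs
```

The pair counts can be fed directly to a graph-drawing tool as weighted edges, with thicker edges indicating closer collaboration.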

Through the integrated typhoon monitoring system, real-time data from a typhoon monitoring platform can be loaded to display typhoon cloud images and detailed typhoon information, making it convenient for users to monitor and analyze typhoon risk; this real-time view of typhoon conditions is particularly important in typhoon-prone areas. Referring to fig. 11 of the specification, the system can search by typhoon date range, year, month, name, number and other information, and automatically provides the typhoon year/season, number, name, place of formation, formation time, dissipation time, original website from which the data was acquired, minimum pressure, maximum wind speed and other information; the user can visit the original website to view more detailed information.
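A typhoon query over records with the fields listed above might be sketched like this. The TyphoonRecord field names and units (hPa, m/s) are hypothetical, year and month are matched against the formation date, a date range matches any typhoon whose lifetime overlaps it, and name matching is a case-insensitive substring test; all of these are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TyphoonRecord:
    number: str
    name: str
    genesis: date             # formation time
    dissipation: date         # dissipation time
    min_pressure_hpa: float
    max_wind_ms: float
    source_url: str           # original website the data was acquired from


def search_typhoons(records: list, year: Optional[int] = None, month: Optional[int] = None,
                    name: Optional[str] = None, number: Optional[str] = None,
                    start: Optional[date] = None, end: Optional[date] = None) -> list:
    hits = []
    for r in records:
        if year is not None and r.genesis.year != year:
            continue
        if month is not None and r.genesis.month != month:
            continue
        if name is not None and name.lower() not in r.name.lower():
            continue
        if number is not None and r.number != number:
            continue
        # Date range: keep typhoons whose lifetime overlaps [start, end].
        if start is not None and r.dissipation < start:
            continue
        if end is not None and r.genesis > end:
            continue
        hits.append(r)
    return hits
```

Using lifetime overlap for the date range means a typhoon that formed before the range but was still active during it is still returned.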

In summary, by acquiring natural disaster data in real time, the embodiment of the invention greatly facilitates collaboration among researchers: they can search for and download related literature, news and document data for management and research, and can also manage, accumulate and share natural disaster knowledge through the system provided by the embodiment. In addition, after the structured data is extracted semi-automatically, some inference and prediction is performed on the knowledge, so that users can understand the latest research trends in the field more clearly, providing valuable references for learning. Displaying the data in structured, classified and organized form based on the knowledge tree model makes browsing and querying convenient, and related data management and user permission management are also provided, moving the knowledge base toward artificial intelligence. The natural disaster knowledge base and search system are of great value for quickly and accurately assessing the losses caused by natural disasters and for future disaster relief work, so that the causes, prevention and patterns of disasters can be studied better and more comprehensively.

While the foregoing description illustrates and describes the preferred embodiments of the present invention, it is to be understood, as noted above, that the invention is not limited to the forms disclosed herein and is not to be construed as excluding other embodiments; it may be used in various other combinations, modifications and environments, and may be changed within the scope of the inventive concept described herein through the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims

1. A method for constructing a knowledge base in the natural disaster field, characterized by comprising the following steps:

constructing a basic database in the natural disaster field; the basic database comprises a structuring rule base and a target website base, wherein the structuring rule base stores structuring rules; the structuring rules are extracted based on sample data to expand the structuring rule base, the sample data being customized based on disaster text data; the target website base stores URL resources to be crawled, and indexes are established for the URL resources and the structured data;

automatically extracting the structuring rule from the self-defined sample data, and storing the structuring rule in the structuring rule base;

Acquiring disaster text data, wherein the disaster text data is data obtained by preprocessing original text data; the disaster text data comprises information of authors, disaster occurrence time, disaster grades and disaster areas;

acquiring disaster text data includes:

determining a preset disaster keyword;

acquiring original text data matched with a preset disaster keyword from the target website by adopting a distributed crawler technology and/or an incremental crawler technology;

preprocessing the original text data to obtain disaster text data;

the preprocessing of the original text data to obtain disaster text data comprises the following steps:

extracting text content, cleaning and de-duplicating data, translating, identifying semantics and identifying parts of speech of the original text data to obtain disaster text data;

extracting structured disaster data from the disaster text data by adopting a machine-like learning method; the machine-like learning method extracts the structured disaster data from the disaster text data based on the structured rules in the structured rule base in a semi-automatic manner;

the extracting the structured disaster data from the disaster text data by adopting a machine-like learning method comprises the following steps:

Obtaining typhoon keywords of the disaster text data;

determining typhoon labels of the disaster text data; the typhoon labels comprise disaster type labels, year labels, month labels, day labels, region labels and disaster scale labels; the labels are used to identify information in natural disaster data;

acquiring the structuring rule from the structuring rule base according to the typhoon keyword and the typhoon tag;

acquiring the structuring rule from the structuring rule base according to the typhoon keyword and the typhoon label comprises the following steps:

automatically matching, based on the typhoon keywords and the typhoon labels, the corresponding structuring rules in the structuring rule base in order of priority, and selecting the structuring rule with the highest priority as the conversion rule;

if no structuring rule is matched, discarding the disaster text data currently being converted;

if several structuring rules with the same priority all match the typhoon keyword and typhoon label information, the system issues a prompt, leaves the disaster text data currently being converted unprocessed, and waits to process it in the next cycle;

Extracting structured disaster data from the disaster text data according to the structuring rule;

integrating a monitoring system related to natural disasters, and extracting natural disaster monitoring data from the monitoring system;

2. The method of claim 1, wherein the integrating a natural disaster-related monitoring system comprises:

integrating the typhoon monitoring system.

3. A knowledge search method in the natural disaster field, characterized by comprising the following steps:

obtaining data matched with the keywords through a natural disaster field knowledge base;

transmitting the data to a user;

the natural disaster domain knowledge base is formed by expanding a basic database based on disaster text data, structured disaster data extracted from the disaster text data by a machine-like learning method, disaster classification processing results, disaster mining analysis results and monitoring data; the machine-like learning method semi-automatically extracts the structured disaster data from the disaster text data based on the structuring rules in a structuring rule base;

the disaster text data are data obtained by preprocessing original text data;

preprocessing the original text data to obtain disaster text data, wherein the preprocessing comprises the following steps:

extracting text content, cleaning and de-duplicating data, translating, identifying semantics and identifying parts of speech of the original text data to obtain disaster text data; the disaster text data comprises information of authors, disaster occurrence time, disaster grades and disaster areas;

the basic database comprises a structuring rule base and a target website base, wherein the structuring rule base stores structuring rules; the structuring rules are extracted based on sample data to expand the structuring rule base, the sample data being customized based on disaster text data; the target website base stores URL resources to be crawled, and indexes are established for the URL resources and the structured data;

extracting the structured disaster data from the disaster text data by the machine-like learning method comprises the following steps:

obtaining typhoon keywords of the disaster text data;

the step of obtaining the structuring rule from the structuring rule base according to the typhoon keyword and the typhoon label comprises the following steps:

and extracting structured disaster data from the disaster text data according to the structuring rule.

4. A device for constructing a knowledge base in the natural disaster field, characterized by comprising: an initialization module, a data crawling module, a data processing module, a data classification module, a data analysis module, a system integration module and a natural disaster domain knowledge base expansion module;

the initialization module is used for constructing a basic database in the natural disaster field; the basic database comprises a structuring rule base and a target website base, wherein the structuring rule base stores structuring rules; the structuring rules are extracted based on sample data to expand the structuring rule base, the sample data being customized based on disaster text data; the target website base stores URL resources to be crawled, and indexes are established for the URL resources and the structured data;

the data crawling module is used for acquiring disaster text data, wherein the disaster text data is obtained by preprocessing original text data; the disaster text data comprises information of authors, disaster occurrence time, disaster grades and disaster areas; acquiring disaster text data includes:

determining a preset disaster keyword;

acquiring original text data matched with a preset disaster keyword from a target website by adopting a distributed crawler technology and/or an incremental crawler technology;

preprocessing the original text data to obtain disaster text data;

the data processing module is used for extracting structured disaster data from the disaster text data by adopting a machine-like learning method; the machine-like learning method extracts the structured disaster data from the disaster text data based on the structured rules in the structured rule base in a semi-automatic manner;

obtaining typhoon keywords of the disaster text data;

the data analysis module is used for carrying out mining analysis on the structured disaster data by adopting a data analysis algorithm to obtain disaster mining analysis results;

the system integration module is used for integrating a monitoring system related to natural disasters and extracting natural disaster monitoring data from the monitoring system;

the natural disaster domain knowledge base expansion module is used for expanding the basic database based on the disaster text data, the structured disaster data, the disaster classification processing result, the disaster mining analysis result and the monitoring data to form a natural disaster domain knowledge base.

5. The device of claim 4, wherein the natural disaster domain knowledge base expansion module is further configured to automatically extract structuring rules from the customized sample data and store them in the structuring rule base.

6. A natural disaster domain knowledge search system, comprising:

the searching module is used for obtaining data matched with the keywords through a natural disaster field knowledge base;

the output module is used for sending the data to a user;

the natural disaster field knowledge base is constructed through a natural disaster field knowledge base construction module, and the natural disaster field knowledge base construction module is used for expanding a basic database based on disaster text data, structured disaster data extracted from the disaster text data by a machine-like learning method, disaster classification processing results, disaster mining analysis results and monitoring data to form a natural disaster field knowledge base;

the disaster text data are data obtained by preprocessing original text data; the disaster text data comprise information on authors, disaster occurrence time, disaster grade and disaster area;

the method for extracting the structured disaster data from the disaster text data by adopting the machine-like learning method comprises the following steps: the machine-like learning method extracts the structured disaster data from the disaster text data based on the structured rules in the structured rule base in a semi-automatic manner;

obtaining typhoon keywords of the disaster text data;