Data query method, apparatus and computer readable storage medium
Technical field
This disclosure relates to the technical field of data processing, and in particular to a data query method, an apparatus, and a computer-readable storage medium.
Background technique
With the rapid development of the internet, information exists in large quantities in unstructured text data, in large numbers of semi-structured tables and web pages, and in the structured data of production systems. Compared with other industries, the financial industry is far more sensitive to news events, and therefore depends heavily on the accuracy, comprehensiveness, and timeliness of news coverage. If more, and more complete, data and information can be aggregated and delivered to financial industry users in a timely manner, and the intricate relationships within it can be analyzed, users' working efficiency can be greatly improved, helping them make the most accurate judgments and obtain the maximum return.
Before making a final investment or decision, financial industry users generally go through the following steps. The first is manual data collection and processing: data is collected from various news sources (such as search engines, major news portals, government announcements, and forums) to obtain a large amount of news information, from which data that is of no use to the user is filtered out according to the user's personal experience. The second is analysis and summary classification: after the information is collected, the user extracts the key points of each news event and summarizes and classifies them, forming documents or reports or storing them in a structured data system (such as Excel or a database such as MySQL). Because the data volume is very large, the stored results tend to omit some of the information the original data provided, in order to make searching convenient and efficient. The last is decision-making: after classifying the data, the user analyzes it to judge the scope and degree of the influence that a news event has on the investment target, and then makes the final decision according to the analysis results.
Summary of the invention
The technical problem that the disclosure solves is how to realize more comprehensive data analysis.
According to one aspect of the embodiments of the present disclosure, a data query method is provided, including: receiving query information input by a user; performing natural language processing on the query information input by the user and parsing it to obtain a query keyword; determining, in a knowledge graph, a query entity corresponding to the query keyword; and returning to the user a data chain associated with the query entity in the knowledge graph.
In some embodiments, determining the query entity corresponding to the query keyword in the knowledge graph includes: determining, in the knowledge graph, a starting query entity corresponding to the query keyword. Returning to the user the data chain associated with the query entity in the knowledge graph includes: returning to the user a data chain in the knowledge graph that takes the starting query entity as its start node.
In some embodiments, determining the query entity corresponding to the query keyword in the knowledge graph includes: determining, in the knowledge graph, a starting query entity and an ending query entity corresponding to the query keyword. Returning to the user the data chain associated with the query entity in the knowledge graph includes: returning to the user a data chain in the knowledge graph that takes the starting query entity as its start node and the ending query entity as its end node.
In some embodiments, the data chains associated with the query entity in the knowledge graph are returned to the user in descending order of the number of relationships each data chain contains.
In some embodiments, the data query method further includes: collecting database data; performing natural language processing on the database data to extract entities and the relationships between entities; and generating the knowledge graph according to the entities and the relationships between entities.
In some embodiments, performing natural language processing on the database data to extract entities and the relationships between entities includes: extracting data keywords from the database data using a natural language processing model; taking the data keywords whose term frequency-inverse document frequency (TF-IDF) is higher than a first threshold as entities; and extracting the relationships between entities from the database data using the natural language processing model.
In some embodiments, collecting the database data includes: configuring a data source address list, a start page number, an end page number, and a collection time; automatically extracting, according to the collection time, the news data determined by the data source address list, the start page number, and the end page number; and parsing the news data to obtain its title and body text, which are stored in a database.
According to another aspect of the embodiments of the present disclosure, a data query device is provided, including: an information receiving module, configured to receive query information input by a user; a keyword parsing module, configured to perform natural language processing on the query information input by the user and parse it to obtain a query keyword; an entity determining module, configured to determine, in a knowledge graph, a query entity corresponding to the query keyword; and a data return module, configured to return to the user a data chain associated with the query entity in the knowledge graph.
In some embodiments, the entity determining module is configured to determine, in the knowledge graph, a starting query entity corresponding to the query keyword; and the data return module is configured to return to the user a data chain in the knowledge graph that takes the starting query entity as its start node.
In some embodiments, the entity determining module is configured to determine, in the knowledge graph, a starting query entity and an ending query entity corresponding to the query keyword; and the data return module is configured to return to the user a data chain in the knowledge graph that takes the starting query entity as its start node and the ending query entity as its end node.
In some embodiments, the data return module is configured to return to the user the data chains associated with the query entity in the knowledge graph in descending order of the number of relationships each data chain contains.
In some embodiments, the data query device further includes a knowledge graph generation module, configured to: collect database data; perform natural language processing on the database data to extract entities and the relationships between entities; and generate the knowledge graph according to the entities and the relationships between entities.
In some embodiments, the knowledge graph generation module is configured to: extract data keywords from the database data using a natural language processing model; take the data keywords whose term frequency-inverse document frequency (TF-IDF) is higher than a first threshold as entities; and extract the relationships between entities from the database data using the natural language processing model.
In some embodiments, the knowledge graph generation module is configured to: configure a data source address list, a start page number, an end page number, and a collection time; automatically extract, according to the collection time, the news data determined by the data source address list, the start page number, and the end page number; and parse the news data to obtain its title and body text, which are stored in a database.
According to yet another aspect of the embodiments of the present disclosure, another data query device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute the aforementioned data query method based on instructions stored in the memory.
According to still another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores computer instructions that, when executed by a processor, implement the aforementioned data query method.
After the user inputs query information, the disclosure returns the relevant data chains in the knowledge graph to the user. This fully shows the user the upstream and downstream associations of the relevant data, which broadens the range of data the user can analyze and helps the user carry out more comprehensive data analysis.
Other features and advantages of the disclosure will become apparent from the following detailed description of exemplary embodiments of the disclosure with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the disclosure; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flow diagram of one embodiment of the knowledge graph generation process.
Fig. 2 shows a schematic diagram of the knowledge graph.
Fig. 3 shows a flow diagram of one embodiment of the data query method of the disclosure.
Fig. 4 shows the complete workflow of data query combined with data collection.
Fig. 5 shows a schematic diagram of the data query device of one embodiment of the disclosure.
Fig. 6 shows a structural schematic diagram of the data query device of another embodiment of the disclosure.
Specific embodiments
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the disclosure, without creative effort, fall within the protection scope of the disclosure.
The inventors found that, in the related art, in order to let end users focus on analyzing events, the data collectors and the users are usually not the same person or the same team. As a result, the two sides hold different views on the key points of the collected information, which inevitably leads to the loss of important data or certain details, affecting the final data analysis and increasing the user's investment risk. Data analysis summaries are fixed into documents or databases to make them convenient for users to view. However, documents usually have fixed templates and databases have fixed fields, so this kind of data processing cannot dynamically analyze the causes and effects of events according to the attributes of the current information, which leaves the data incomplete. In addition, as data is stored incrementally, historical data gradually accumulates and slows down data queries.
The data query method of the disclosure is discussed in detail step by step below.
(1) Generating the knowledge graph
First, one embodiment of the knowledge graph generation process is described with reference to Fig. 1.
Fig. 1 shows a flow diagram of one embodiment of the knowledge graph generation process. As shown in Fig. 1, the process of generating the knowledge graph in this embodiment includes steps S102 to S106.
In step S102, database data is collected.
In the related art, the required information is collected entirely by hand from multiple news sources, so a large amount of manpower is needed every day for complicated and onerous work, which wastes time. Moreover, whenever a data source is added, additional manpower must be assigned to collect information from the new source.
To make data collection easier, this embodiment collects news data automatically, saving manpower and time. First, a data source address list, a start page number, an end page number, and a collection time are configured. Then, according to the collection time, the news data determined by the data source address list, the start page number, and the end page number is extracted automatically. Finally, the news data is parsed to obtain its title and body text, which are stored in a database for the subsequent natural language processing. If the goal is to generate a knowledge graph focused on financial events, news data related to finance and economics can be collected.
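The collection steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `CollectionConfig`, `page_urls`, `NewsParser`, and `parse_news` are hypothetical, the address list is assumed to hold URL templates with a `{page}` slot, the body text is assumed to sit in `<p>` elements, and downloading, scheduling at the collection time, and the database write are omitted.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class CollectionConfig:
    """Hypothetical container for the four configured items."""
    source_urls: list   # data source address list (assumed: templates with a {page} slot)
    start_page: int     # start page number
    end_page: int       # end page number (inclusive)
    collect_time: str   # collection time, e.g. "02:00"; a scheduler (not shown) would use it


def page_urls(config: CollectionConfig) -> list:
    """Enumerate every page determined by the address list and the page range."""
    return [
        template.format(page=page)
        for template in config.source_urls
        for page in range(config.start_page, config.end_page + 1)
    ]


class NewsParser(HTMLParser):
    """Collect the <title> text and the text of every <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current: Optional[str] = None  # tag currently being read, if relevant

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def parse_news(html: str) -> dict:
    """Return the title and body text of one news page, ready to be stored."""
    parser = NewsParser()
    parser.feed(html)
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"title": parser.title.strip(), "body": body}
```

In such a sketch, each URL returned by `page_urls` would be downloaded at the configured time and passed through `parse_news` before the resulting record is written to the database.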
In step S104, natural language processing is performed on the database data to extract entities and the relationships between entities.
Specifically, a natural language processing model can be used to extract data keywords from the database data. Take the following news data as an example: "On Monday (June 11), Zhang San arrived in Singapore and will soon hold a historic meeting with Li Si of Korea. It is reported that the one-on-one meeting between Zhang San and Li Si will be held at 9:15 a.m. local time on the 12th. After the talks, Zhang San will hold a press conference and will leave Singapore at around 8 p.m. to return to the United States. Spot gold traced a 'deep V' during the day, breaking through the 1,003 high. However, because little progress was made in the preliminary negotiations with the United States on the eve of the 'one-on-one meeting', and because the U.S. 10-year Treasury yield rose, safe-haven sentiment began to recede and the gold price hovered near the 1,300-dollar mark." In this embodiment, the natural language processing model first segments the text of the news data semantically, splitting it into words such as Zhang San, Singapore, and Korea. It then tags each word with its part of speech, for example Zhang San (person name) and Singapore (place name). Finally, it obtains the required tagged entities, including person names, place names, numerals, and so on.
Optionally, the natural language processing model can use the term frequency-inverse document frequency (TF-IDF) algorithm and take the data keywords whose TF-IDF is higher than a first threshold as entities. When the TF-IDF algorithm is applied, the TF-IDF is calculated as:
X = Y * lg(M / (N + 1))
Here, X denotes the term frequency-inverse document frequency (TF-IDF) of a word; Y denotes the term frequency of the word, that is, the total number of times the word appears in the article; M denotes the total number of documents in the corpus that stores all the news data; and N denotes the number of documents in the corpus that contain the word. The idea of the TF-IDF algorithm is that if a word or phrase appears frequently in one article and rarely in other articles, it is considered to discriminate well between articles. Taking the data keywords with larger TF-IDF values as entities helps generate a knowledge graph that represents the key points of the news.
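The formula and the threshold selection can be illustrated with a short sketch. It assumes that documents have already been segmented into word lists and reads lg as the base-10 logarithm; the function names `tf_idf` and `select_entities` and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(word, document, corpus):
    """Compute X = Y * lg(M / (N + 1)) for one word of one document."""
    y = Counter(document)[word]                  # Y: occurrences of the word in this article
    m = len(corpus)                              # M: total number of documents in the corpus
    n = sum(1 for doc in corpus if word in doc)  # N: number of documents containing the word
    return y * math.log10(m / (n + 1))


def select_entities(document, corpus, first_threshold):
    """Keep the keywords whose TF-IDF is higher than the first threshold."""
    scores = {word: tf_idf(word, document, corpus) for word in set(document)}
    return sorted(word for word, score in scores.items() if score > first_threshold)


# Illustrative corpus of three segmented articles (made-up data).
corpus = [
    ["gold", "price", "gold"],
    ["stock", "price"],
    ["meeting", "price"],
]
entities = select_entities(corpus[0], corpus, first_threshold=0.1)
```

In this sample, "gold" appears twice in the first article and in no other article, so it scores above the threshold, while "price" appears in every article and scores below zero, so it is not selected.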
The natural language processing model can further be used to extract the relationships between entities from the database data. For example, suppose the database data records the text "the influence of 618 on the stock market". The natural language processing model first segments the text into "618", "on", "stock market", and "influence", and then tags the part of speech of each word: "618 (noun)", "stock market (noun)", "on (preposition)". The nouns in the text are entities, and the relationship "influence" between the entities can be determined from the text; that is, the starting entity is "618" and the ending entity is "stock market". Those skilled in the art will understand that if the text contains only one entity, that entity is taken as the starting entity by default.
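As a rough illustration of the rule just described, the sketch below turns part-of-speech-tagged tokens into a (starting entity, relationship, ending entity) record. It assumes the segmentation and tagging have already been done by some natural language processing model; the tag name "relation" for the relationship word, the `Relation` type, and the function `extract_relation` are assumptions made for this example only.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    start: str               # starting entity
    relation: Optional[str]  # relationship word, if two entities were found
    end: Optional[str]       # ending entity, if present


def extract_relation(tagged_tokens):
    """Nouns are entities: the first is the start, the second (if any) the end."""
    entities = [word for word, tag in tagged_tokens if tag == "noun"]
    relation_words = [word for word, tag in tagged_tokens if tag == "relation"]
    if not entities:
        return None
    start = entities[0]
    # With only one entity, it defaults to the starting entity and there is no end.
    end = entities[1] if len(entities) > 1 else None
    relation = relation_words[0] if relation_words and end is not None else None
    return Relation(start, relation, end)


# Tagged tokens for "the influence of 618 on the stock market".
tokens = [
    ("618", "noun"),
    ("on", "preposition"),
    ("stock market", "noun"),
    ("influence", "relation"),  # assumed tag for the relationship word
]
triple = extract_relation(tokens)
```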
When an entity with the same content appears again, the existing entity is not replaced; instead, one more relationship is added on the entity's node, pointing to the next node. Eventually the large volume of data forms an intricate network, and this is how the knowledge graph takes shape.
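The node-reuse behaviour can be sketched with an in-memory structure. This is only an illustration of the idea, not the storage used by the disclosure; the class `KnowledgeGraph`, its `add_relation` method, and the "gold price" entity are hypothetical.

```python
class KnowledgeGraph:
    """Toy graph: entity name -> list of (relationship, next entity)."""

    def __init__(self):
        self.edges = {}

    def add_relation(self, start, relation, end):
        # An entity that already exists is reused, never replaced.
        outgoing = self.edges.setdefault(start, [])
        self.edges.setdefault(end, [])
        # Only a new relationship pointing to the next node is added.
        if (relation, end) not in outgoing:
            outgoing.append((relation, end))


graph = KnowledgeGraph()
graph.add_relation("618", "influence", "stock market")
graph.add_relation("618", "influence", "gold price")    # "618" is reused, one more relationship
graph.add_relation("618", "influence", "stock market")  # a repeated record changes nothing
```

After these three calls the graph holds three nodes, and the node "618" carries two outgoing relationships.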
In step S106, the knowledge graph is generated according to the entities and the relationships between entities.
Fig. 2 shows a schematic diagram of the knowledge graph. The knowledge graph can be stored in Neo4j, OrientDB, or TITAN, with Neo4j preferred. Neo4j is a high-performance graph database that stores structured data in a network (a graph) rather than in tables. It is an embedded, disk-based Java persistence engine with full transactional properties. Neo4j can also be regarded as a high-performance graph engine that has all the characteristics of a mature database. In this way, the structured information contained in the collected news data can be represented in the Neo4j database.
In the above embodiment, data from the data sources is collected automatically and events are processed automatically through natural language processing. The entities and the relationships between entities obtained by parsing are stored in the knowledge graph, so that users can query in the form of a knowledge graph. The entities and relationships related to an event can thus be shown to the user, helping the user quickly obtain information about related events.
(2) data query
One embodiment of the data query method of the disclosure is described below with reference to Fig. 3.
Fig. 3 shows a flow diagram of one embodiment of the data query method of the disclosure. As shown in Fig. 3, the data query method in this embodiment includes steps S302 to S308.
In step S302, query information input by the user is received.
In step S304, natural language processing is performed on the query information input by the user, and a query keyword is obtained by parsing.
In step S306, the query entity corresponding to the query keyword is determined in the knowledge graph.
In step S308, the data chain associated with the query entity in the knowledge graph is returned to the user.
Depending on the query information input by the user, the query process falls roughly into two cases.
In the first case, the query information input by the user is parsed using natural language processing to obtain the query keywords, the starting query entity and the ending query entity corresponding to the query keywords are determined in the knowledge graph, and the data chains in the knowledge graph that take the starting query entity as the start node and the ending query entity as the end node are returned to the user.
For example, the query information input by the user is "Zhang San meets with Li Si". Parsing "Zhang San meets with Li Si" yields "Zhang San" and "Li Si". The starting query entity "Zhang San" and the ending query entity "Li Si" are determined in the knowledge graph. Finally, two data chains are found: the first data chain is "Zhang San" -> "U.S." -> "one-on-one meeting" -> "Korea" -> "Li Si", and the second data chain is "Zhang San" -> "sanction" -> "Korea" -> "Li Si".
In the second case, the query information input by the user is parsed using natural language processing to obtain the query keyword, the starting query entity corresponding to the query keyword is determined in the knowledge graph, and the data chains in the knowledge graph that take the starting query entity as the start node are returned to the user.
When determining the data chains, the traversal can start from the starting query entity and follow a depth-first rule to visit the ending entities associated with it, then continue from each of those ending entities to the end nodes associated with them, until the traversal finishes or a certain condition is reached.
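A depth-first traversal of this kind can be sketched as follows. The adjacency dictionary, the function name `find_chains`, and the use of a maximum depth as the stopping condition are assumptions for illustration; relationship labels are omitted for brevity, and the sample graph reproduces the two chains of the example above.

```python
def find_chains(graph, start, end=None, max_depth=6):
    """Enumerate data chains from `start`, optionally ending at `end`.

    `graph` maps each entity to the entities its relationships point to.
    `max_depth` (an assumed stopping condition) limits the number of relationships.
    """
    chains = []

    def dfs(node, path):
        neighbours = [n for n in graph.get(node, []) if n not in path]  # avoid cycles
        depth = len(path) - 1  # number of relationships in the chain so far
        if end is not None:
            if node == end:
                chains.append(path)
                return
            if depth >= max_depth:
                return
        elif not neighbours or depth >= max_depth:
            # Open-ended query: the chain is complete when it cannot be extended.
            if depth > 0:
                chains.append(path)
            return
        for neighbour in neighbours:
            dfs(neighbour, path + [neighbour])

    dfs(start, [start])
    return chains


# Sample graph matching the "Zhang San meets with Li Si" example.
sample = {
    "Zhang San": ["U.S.", "sanction"],
    "U.S.": ["one-on-one meeting"],
    "one-on-one meeting": ["Korea"],
    "sanction": ["Korea"],
    "Korea": ["Li Si"],
    "Li Si": [],
}
both_ends = find_chains(sample, "Zhang San", "Li Si")  # first case: start and end entity
start_only = find_chains(sample, "Zhang San")          # second case: start entity only
```

Both calls yield the same two chains here, because every path from "Zhang San" in the sample graph ends at "Li Si".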
Optionally, in step S308, the data chains associated with the query entity in the knowledge graph are returned to the user in descending order of the number of relationships each data chain contains. That is, the more relationships exist between two hit entities, the higher the priority, and chains with more relationships are displayed first. Finally, according to the ranking obtained from the traversal, the node data with more associations is returned first.
For example, the first data chain "Zhang San" -> "U.S." -> "one-on-one meeting" -> "Korea" -> "Li Si" is returned before the second data chain "Zhang San" -> "sanction" -> "Korea" -> "Li Si".
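The ordering rule can be expressed in a few lines. In this sketch a data chain is assumed to be a list of entity names, so the number of relationships it contains is its length minus one; the function name `rank_chains` is hypothetical.

```python
def rank_chains(chains):
    """Sort data chains in descending order of the number of relationships."""
    # A chain of k entities contains k - 1 relationships.
    return sorted(chains, key=lambda chain: len(chain) - 1, reverse=True)


chains = [
    ["Zhang San", "sanction", "Korea", "Li Si"],                     # 3 relationships
    ["Zhang San", "U.S.", "one-on-one meeting", "Korea", "Li Si"],   # 4 relationships
]
ranked = rank_chains(chains)
```

Because Python's `sorted` is stable, chains with the same number of relationships keep the order in which the traversal produced them.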
Optionally, if no entity in the graph is hit during the query, the text information can be passed through middleware (such as Kafka) to the information collection stage, which executes a data collection instruction containing the specified text. Fig. 4 shows the complete workflow of data query combined with data collection.
In the above embodiment, the relevant data chains in the knowledge graph are returned to the user after the user inputs query information. This fully shows the user the upstream and downstream associations of the relevant data, which broadens the range of data the user can analyze and helps the user carry out more comprehensive data analysis.
In addition, relationships are the most important element in a graph database; through relationships, entities can be linked to one another to build complex related models. Each node in the graph database model directly contains a relationship list, which stores the relationship records between this node and other nodes. These relationship records are organized by type and direction, and can hold additional attributes. Whenever an operation similar to a JOIN in a relational database is run, the graph database uses this list to access the connected nodes directly, without searching for or matching records, which improves the efficiency and stability of information queries.
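The idea of a per-node relationship list can be illustrated with a small in-memory model. This sketch is not the storage format of Neo4j or any other graph database; the classes `Node` and `RelationRecord`, the functions `connect` and `neighbours`, and the attribute `source` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    # (relationship type, direction) -> list of relationship records
    relations: dict = field(default_factory=dict)


@dataclass(eq=False)
class RelationRecord:
    rel_type: str
    direction: str   # "out" or "in"
    other: Node      # the node at the other end of the relationship
    properties: dict = field(default_factory=dict)  # additional attributes


def connect(a, rel_type, b, **properties):
    """Record the relationship on both nodes, organized by type and direction."""
    a.relations.setdefault((rel_type, "out"), []).append(
        RelationRecord(rel_type, "out", b, properties))
    b.relations.setdefault((rel_type, "in"), []).append(
        RelationRecord(rel_type, "in", a, properties))


def neighbours(node, rel_type, direction="out"):
    """Follow the node's own list; no global search or record matching is needed."""
    return [record.other.name for record in node.relations.get((rel_type, direction), [])]


event = Node("618")
market = Node("stock market")
connect(event, "influence", market, source="news")
```

Looking up the nodes connected to `event` only reads the list stored on `event` itself, which is the property the paragraph above contrasts with a JOIN over tables.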
The data query device of one embodiment of the disclosure is described below with reference to Fig. 5.
Fig. 5 shows a schematic diagram of the data query device of one embodiment of the disclosure. As shown in Fig. 5, the data query device 50 in this embodiment includes:
An information receiving module 502, configured to receive query information input by a user.
A keyword parsing module 504, configured to perform natural language processing on the query information input by the user and parse it to obtain a query keyword.
An entity determining module 506, configured to determine, in a knowledge graph, a query entity corresponding to the query keyword.
A data return module 508, configured to return to the user a data chain associated with the query entity in the knowledge graph.
In some embodiments, the entity determining module 506 is configured to determine, in the knowledge graph, a starting query entity corresponding to the query keyword; and the data return module 508 is configured to return to the user a data chain in the knowledge graph that takes the starting query entity as its start node.
In some embodiments, the entity determining module 506 is configured to determine, in the knowledge graph, a starting query entity and an ending query entity corresponding to the query keyword; and the data return module 508 is configured to return to the user a data chain in the knowledge graph that takes the starting query entity as its start node and the ending query entity as its end node.
In some embodiments, the data return module 508 is configured to return to the user the data chains associated with the query entity in the knowledge graph in descending order of the number of relationships each data chain contains.
In the above embodiment, the relevant data chains in the knowledge graph are returned to the user after the user inputs query information. This fully shows the user the upstream and downstream associations of the relevant data, which broadens the range of data the user can analyze and helps the user carry out more comprehensive data analysis.
In some embodiments, the data query device 50 further includes a knowledge graph generation module 500, configured to: collect database data; perform natural language processing on the database data to extract entities and the relationships between entities; and generate the knowledge graph according to the entities and the relationships between entities.
In some embodiments, the knowledge graph generation module 500 is configured to: extract data keywords from the database data using a natural language processing model; take the data keywords whose term frequency-inverse document frequency (TF-IDF) is higher than a first threshold as entities; and extract the relationships between entities from the database data using the natural language processing model.
In some embodiments, the knowledge graph generation module 500 is configured to: configure a data source address list, a start page number, an end page number, and a collection time; automatically extract, according to the collection time, the news data determined by the data source address list, the start page number, and the end page number; and parse the news data to obtain its title and body text, which are stored in a database.
In the above embodiment, data from the data sources is collected automatically and events are processed automatically through natural language processing. The entities and the relationships between entities obtained by parsing are stored in the knowledge graph, so that users can query in the form of a knowledge graph. The entities and relationships related to an event can thus be shown to the user, helping the user quickly obtain information about related events.
Fig. 6 shows a structural schematic diagram of the data query device of another embodiment of the disclosure. As shown in Fig. 6, the data query device 60 of this embodiment includes a memory 610 and a processor 620 coupled to the memory 610. The processor 620 is configured to execute the data query method of any one of the foregoing embodiments based on instructions stored in the memory 610.
The memory 610 may include, for example, system memory and fixed non-volatile storage media. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
The data query device 60 may further include an input/output interface 630, a network interface 640, a storage interface 650, and so on. These interfaces 630, 640, and 650, the memory 610, and the processor 620 may be connected, for example, through a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
The disclosure further includes a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the data query method of any one of the foregoing embodiments.
Those skilled in the art should understand that embodiments of the disclosure may be provided as a method, a system, or a computer program product. Accordingly, the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, and so on) containing computer-usable program code.
The disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing describes only preferred embodiments of the disclosure and is not intended to limit the disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the disclosure shall fall within the protection scope of the disclosure.