CN110990590A - Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning

Info

Publication number
CN110990590A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911322390.3A
Other languages
Chinese (zh)
Inventor
闫宏飞
张霞
苗睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN201911322390.3A
Publication of CN110990590A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Abstract

The invention discloses a method for constructing a dynamic financial knowledge graph based on reinforcement learning and transfer learning. The method comprises the following steps: 1) constructing a financial knowledge graph from the structured and semi-structured data of each selected listed company, and inserting the entity mentions corresponding to the financial entities in the graph into a financial entity database; 2) obtaining a financial entity data set from the unstructured data associated with the selected listed companies; 3) training a financial entity recognition model using the financial entity data set and a standard entity recognition data set; 4) generating a financial entity linking data set and training a financial entity linking model on it; 5) using the trained financial entity linking model to find, in the financial knowledge graph, the financial entity corresponding to each entity mention in the unstructured data, and updating the financial knowledge graph; 6) extracting financial entity relations from the unstructured data using a financial relation extraction model and updating the financial knowledge graph.

Description

Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning
Technical Field
The invention relates to a method for constructing a dynamic financial knowledge graph. Specifically, a basic knowledge graph is built from structured and unstructured data related to A-share listed companies, the graph is then expanded and optimized using several models based on reinforcement learning, transfer learning, and related techniques, and the resulting dynamic financial knowledge graph is finally built and displayed. The invention belongs to the field of representation learning and data analysis.
Background
1. Dynamic financial knowledge map
The name "knowledge graph" originated from the knowledge base introduced by Google Inc. in 2012 to support the semantic organization of data on the web and provide intelligent search services. The entities and relations stored in a knowledge base map directly onto the nodes and edges of a graph, so the term "knowledge graph" has gradually become synonymous with "knowledge base".
Since the concepts of knowledge base and knowledge graph were introduced, many influential knowledge base projects have emerged in China and abroad. Early knowledge graphs, such as CYC and WordNet, were built mainly from expert knowledge. Later, with the development of the internet, knowledge bases built on large amounts of high-quality user-generated content appeared, such as WikiData, Freebase, and CN-DBpedia. With the rapid development of machine learning and deep learning, automatic graph construction has matured, and systems such as NELL have greatly increased the coverage and scale of knowledge graphs.
As knowledge graphs have developed, general-purpose knowledge graphs have gained ever wider coverage, but their depth falls far short of what applications in specialized domains require. The financial field depends heavily on data and therefore has a strong need for a financial knowledge graph. A financial knowledge graph turns massive financial data into standardized knowledge and can display it visually in a user-friendly way, greatly reducing the burden on financial practitioners. Moreover, compared with the medical field, financial data is relatively open and is already consolidated by the relevant financial institutions, which guarantees its completeness to a certain extent.
Compared with an ordinary knowledge graph, a dynamic knowledge graph adds a time dimension and records how the graph structure evolves over time. The difficulties are that dynamic data is hard to collect, that information from multiple sources can be inconsistent, and that the result must be displayed in a user-friendly way. Industry has so far invested little in research on dynamic knowledge graphs.
2. Learning algorithm
The construction of the dynamic financial knowledge graph mainly uses several families of learning algorithms, including deep learning, representation learning, transfer learning, and reinforcement learning.
Deep learning differs from shallow machine learning in that it can automatically extract highly abstract features from large amounts of data while training the model. The mainstream models are Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). The Long Short-Term Memory (LSTM) network is an extension of the RNN, designed mainly to address the vanishing and exploding gradients that arise when a recurrent network is trained on long sequences.
Representation learning maps an object into another space, for example f: X → Y. This document is mainly concerned with converting text into numerical data so that textual information can be encoded efficiently for machine learning algorithms. Word2Vec, proposed by Mikolov et al. in 2013, is a classic word representation learning algorithm.
Transfer learning transfers learned model parameters to a related new task to improve learning on that task. Its core is to find and exploit the similarity between the source domain and the target domain. If the two tasks are not similar and transfer learning is applied anyway, negative transfer occurs: the knowledge learned in the source domain harms learning in the target domain.
Reinforcement learning is an important class of machine learning algorithms modeled on how animals learn. Animals learn under a pattern of rewards and punishments: rewarded behavior is reinforced and punished behavior is weakened. Reinforcement learning trains an agent to obtain the maximum reward by deciding which action to take in which state.
3. Map construction related technology
The key techniques of knowledge graph construction currently fall into three parts: entity recognition, entity linking, and relation extraction.
Entity recognition, also called Named Entity Recognition (NER), identifies predefined kinds of objects in text, such as person names, place names, and organization names. Entity recognition algorithms divide into traditional algorithms and deep learning algorithms. Traditional algorithms can be subdivided into rule-based algorithms, unsupervised algorithms, and feature-based supervised algorithms. Deep learning approaches usually treat entity recognition as a sequence labeling problem and learn it with deep learning models.
Entity Linking (EL), also known as entity disambiguation, maps an entity mention in text (for example "Peking University" or its abbreviation "Beida") to the corresponding entity in a knowledge base. Entity linking has two main stages: a candidate set is generated, and the candidates are then ranked. For the top-ranked candidate, it must also be decided whether that entity is a reasonable mapping for the mention; if no reasonable mapping exists, NIL is returned. The candidate set is typically generated by counting observed links from mentions to entities and building a set of candidate entities for each mention. The main difficulty of entity linking is how to measure the relevance of a candidate to the current mention.
After entity recognition and entity linking, the result is a collection of isolated entities. Entities correspond to the nodes of the knowledge graph and relations to its edges, so graph construction is complete only once the relations among entities have been extracted from the corpus. The relation extraction task was introduced in 1998 as a task of MUC-7 at the Message Understanding Conference (MUC). Its development closely parallels that of entity recognition: from simple template matching, through unsupervised and supervised learning, to deep learning.
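The two-stage procedure described above (candidate generation from mention-to-entity counts, then ranking with a NIL fallback) can be sketched roughly as follows. This is only an illustration: the function names, the scoring rule (count-based prior multiplied by a caller-supplied context score), the threshold value, and the sample data are all assumptions, not part of the patent.

```python
from collections import Counter, defaultdict

def build_candidate_table(linked_mentions):
    """Count how often each mention string was linked to each entity id."""
    table = defaultdict(Counter)
    for mention, entity_id in linked_mentions:
        table[mention][entity_id] += 1
    return table

def link(mention, table, score_fn, nil_threshold=0.5):
    """Rank the candidates of a mention; return None (NIL) if none is good enough."""
    candidates = table.get(mention)
    if not candidates:
        return None
    total = sum(candidates.values())
    # Assumed scoring rule: prior P(entity | mention) times a context score in [0, 1].
    ranked = sorted(
        ((count / total) * score_fn(entity_id), entity_id)
        for entity_id, count in candidates.items()
    )
    best_score, best_entity = ranked[-1]
    return best_entity if best_score >= nil_threshold else None

# Made-up observations of (mention, linked entity) pairs.
observed = [
    ("Beida", "Peking University"),
    ("Beida", "Peking University"),
    ("Beida", "Beida Jade Bird"),
    ("PKU", "Peking University"),
]
table = build_candidate_table(observed)
best = link("Beida", table, score_fn=lambda entity_id: 1.0)
```

In practice the context score would come from a trained relevance model; the constant score used here only exercises the ranking and NIL logic.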
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention aims to provide a dynamic financial knowledge graph construction method based on reinforcement learning and transfer learning.
Based on data about A-share listed companies, the invention uses the background techniques described above to construct a dynamic knowledge graph with a time dimension. The invention consists of the following steps:
1) Construct a basic dynamic financial knowledge graph from A-share related structured and semi-structured data.
2) Train a financial entity recognition model built from BERT, BiLSTM, and CRF using a transfer learning algorithm.
3) Train a financial entity linking model using similarity features and prior knowledge features.
4) Use reinforcement learning to remove the noise introduced by distant supervision, and train a financial relation classification model.
5) Design and build a display website that dynamically shows how the structure of the knowledge graph changes over time.
The technical scheme of the invention is as follows:
A dynamic financial knowledge graph construction method based on reinforcement learning and transfer learning comprises the following steps:
1) constructing a financial knowledge graph from the structured and semi-structured data of each selected listed company; during construction, inserting the entity mentions corresponding to the financial entities in the graph into a financial entity database according to the mapping between entity mentions and financial entities;
2) for the unstructured data related to the selected listed companies, obtaining each sentence sequence that contains an entity; for the entity in each sentence sequence, querying whether a labeled entity exists in the financial entity database; filtering out the sentence sequences whose entities are not in the financial entity database; using the label that the entity carries in the financial entity database as the label of the corresponding entity in the sentence sequence; and taking the resulting set of sentence sequences as the financial entity data set;
3) training a financial entity recognition model using the financial entity data set and a standard entity recognition data set;
4) performing financial entity recognition on the unstructured data with the model trained in step 3), and retaining only the sentences whose entity mention corresponds to a unique financial entity in the financial entity database; then applying negative sampling to these sentences to obtain a financial entity linking data set with balanced positive and negative samples, and training a financial entity linking model on that data set;
5) using the trained financial entity linking model to find, in the financial knowledge graph, the financial entity corresponding to each entity mention in the unstructured data, linking the found financial entity to the mention, and updating the financial knowledge graph;
6) extracting financial entity relations from the unstructured data with a financial relation extraction model and updating the financial knowledge graph, to obtain a dynamically displayed financial knowledge graph.
Further, in step 1), the method for constructing the financial knowledge graph comprises the following steps: firstly, constructing a financial entity set of a listed company and a relationship set among financial entities in the financial entity set on the basis of structured data and semi-structured data of the listed company; then, acquiring relevant entity information of each financial entity in the financial entity set as an additional attribute value of the corresponding financial entity; and then constructing the financial knowledge graph based on the financial entity set and the corresponding relation set of each listed company.
Further, the financial entity recognition model is built on BERT, BiLSTM, and CRF. In order, it comprises an input layer, a BERT layer, a bidirectional long short-term memory (BiLSTM) network layer, and a conditional random field (CRF) layer as the output. The BERT layer processes the word sequence supplied by the input layer to obtain BERT semantic encodings and sends them to the BiLSTM layer; the BiLSTM layer gathers long-range sentence information in both directions, learns a representation of the sentence semantics, and passes it to the CRF layer; the CRF layer enforces the configured constraints so that the predicted labels are legal, and outputs the label sequence.
Further, the financial entity recognition model is trained with a transfer learning algorithm: first, the model is pre-trained on the standard entity recognition data set to obtain a model that can recognize the basic entity categories; then the sizes of the mapping (projection) layer and the CRF layer are changed to match the financial entity categories, while the weights of the other network layers are kept as initialization; finally, training continues on the financial entity data set to obtain the final financial entity recognition model.
Further, in step 5), the financial entity linking model determines the financial entity corresponding to an entity mention based on the text similarity features and context similarity features extracted between the mention and the financial entity.
Further, the financial knowledge graph comprises a plurality of financial entities; the relations between financial entities are represented in the graph as edges; the start point and end point of each edge are financial entities; and each edge carries a timestamp indicating when the edge was generated.
Further, the edges are financial relations, including shareholder relations, executive relations, legal-representative relations, concept affiliations, industry affiliations, location affiliations, custody relations, management relations, and product relations.
Further, the financial relation extraction model is a relation classification model based on BERT and reinforcement learning, comprising an instance selector and a relation classifier; the instance selector acts as a classifier that filters the distantly supervised data to produce training data for the relation classifier, and the relation classifier optimizes the instance selector through the feedback obtained from the classification task; the instance selector and the relation classifier are optimized with a reinforcement learning algorithm.
Further, the financial entity includes: companies, institutions, funds, products, individuals, concepts, industries, and locations.
Further, the semi-structured data includes executive data, company data, and shareholder data.
Compared with the prior art, the invention has the following positive effects:
The dynamic financial knowledge graph constructed by the invention has greater practical value because it can show how the relevant attributes change over time. Both the construction process and the algorithms adopt recent deep learning models and techniques, provide a framework and approach for constructing domain-specific dynamic graphs, and generalize well.
Drawings
FIG. 1 is a schematic diagram of a dynamic financial knowledge graph building process.
FIG. 2 is a schematic diagram of transfer learning in a financial entity identification task.
FIG. 3 is a schematic diagram of a BERT and reinforcement learning based financial relationship classification model.
FIG. 4 is a diagram of a knowledge-graph visualization interface.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The dynamic financial knowledge graph construction process is shown in FIG. 1. The process divides roughly into two parts: first, a basic dynamic financial knowledge graph is built from semi-structured and structured data; second, under the supervision of this basic graph, knowledge is extracted from unstructured data to expand the graph. The first part is mainly engineering work involving data processing, database construction, and website construction; the second part is the focus of the invention and emphasizes algorithm and model design.
First, data acquisition
The dynamic financial knowledge graph constructed in the invention is based on a large amount of crawled internet data, including: the list of A-share listed companies; basic information and profiles of the listed companies; their major shareholders and tradable-share shareholders (updated quarterly); executive information; and the news, announcements, and reports of the listed companies.
Because this task requires collecting a large amount of data, the crawler must capture data efficiently and cope with the anti-crawler measures of each website. The crawler is implemented on the Scrapy framework; a proxy pool is built and kept up to date, and the required data is crawled in parallel. The captured data is stored in MongoDB so that programs can index it quickly.
Second, basic dynamic financial knowledge map construction
Starting from the list of A-share listed companies obtained from Sina Finance, and using company data from CN-DBpedia and Xueqiu, the invention constructs a basic financial entity for each A-share listed company one by one. Semi-structured data such as executive data, company data, and shareholder data is then organized to construct the related person entities and other company entities. In addition, the Tushare tool is used to obtain fixed information about the stock concepts, industries, and locations of the listed companies, and related entity information is then obtained from an online encyclopedia, Wikipedia, and CN-DBpedia to serve as additional attribute values when the entities are constructed. That is, each entity obtained is looked up in these three sources to obtain attributes such as its summary and encyclopedia tags. After the basic financial entities have been constructed, the relation data present in the semi-structured and structured data is extracted as the set of relations between entities.
After the entity and relation sets have been obtained, the information among related entities is further expanded and enriched using structured data (such as the knowledge base CN-DBpedia), and a small finance-related network structure is extracted. Some information must be recorded during construction, such as the mapping from entity mentions to entities and the stocks in connection with which each financial entity appears. This information, together with the basic dynamic financial knowledge graph, supports the later graph expansion based on unstructured data.
Third, financial entity
An entity is something that exists objectively and is distinguishable from other things, such as a person, place, or organization. Based on A-share related data, the invention defines 8 types of financial entity of particular interest: companies, other institutions, funds, products, individuals, concepts, industries, and locations.
The entity definitions of the invention target the financial field and are more fine-grained than those of a general entity recognition data set. In deep learning models built for entity recognition, modules such as feature extraction serve the same function in both settings. The general entity recognition task and the financial entity recognition task are very similar in data format, model configuration, and purpose, so transfer learning between the two tasks is appropriate.
The model uses two data sets: a standard entity recognition data set and a financial entity recognition data set. The standard entity recognition data set is one commonly used in industry for the general Chinese entity recognition task; here it is the SIGHAN Bakeoff 2006 data set, which contains 3 basic entity types and 7 tags.
1. Financial entity identification
The financial entity recognition data set is built as follows. After the structured and semi-structured data has been preprocessed, the entity recognition interface of the Stanford CoreNLP tool is applied to the sentence sequences in the unstructured data to obtain the set of sentence sequences that contain entities. During construction of the basic dynamic financial knowledge graph, the mapping from each entity mention to its financial entity is inserted into the financial entity database. For each sentence sequence that contains an entity, the database is queried for a matching labeled entity, and sentence sequences whose entity is not in the database are filtered out. Finally, the label that the entity carries in the financial entity database is used as the label of the corresponding entity in the sentence sequence; the resulting set of sentence sequences is the financial entity data set.
For the entity recognition task, the invention combines a bidirectional long short-term memory network with a conditional random field, supplemented by the BERT language model, to form the financial entity recognition model, referred to as BERT_BiLSTM_CRF. From bottom to top, the model comprises an input layer, a BERT layer, a BiLSTM layer, and a CRF layer as the output.
The BERT layer takes the word sequence as input, obtains BERT semantic encodings, and sends them to the BiLSTM layer; the BiLSTM layer gathers long-range sentence information in both directions, learns a representation of the sentence semantics, and passes it to the CRF layer; the CRF layer adds constraints that keep the predicted labels legal, and finally produces the label sequence.
FIG. 2 depicts the transfer learning process in the financial entity recognition task. First, the model is pre-trained on the standard entity recognition data to obtain a BERT_BiLSTM_CRF model that can recognize the basic entity categories. Then the sizes of the mapping layer (Proj in FIG. 2) and the CRF layer are changed to match the financial entity categories, while the weights of the other network layers are kept as initialization. Finally, the BERT_BiLSTM_CRF model is trained on the financial entity recognition data to obtain the final financial entity recognition model.
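The filtering and relabeling step described above could look roughly like the sketch below. It assumes that an external NER tool has already produced token spans, that tokens are joined with spaces to form the mention string, that a sentence is kept only if every recognized entity is found in the database, and that labels are written in the BIO scheme; all names and sample data are hypothetical.

```python
def build_financial_entity_dataset(tagged_sentences, entity_db):
    """Keep sentences whose recognized entities are all in entity_db and relabel them.

    tagged_sentences: list of (tokens, spans); spans are (start, end) token offsets
                      produced by an off-the-shelf NER tool (end is exclusive).
    entity_db:        dict mapping an entity mention string to its financial label.
    """
    dataset = []
    for tokens, spans in tagged_sentences:
        labels = ["O"] * len(tokens)
        keep = bool(spans)                     # sentences without entities are skipped
        for start, end in spans:
            mention = " ".join(tokens[start:end])
            label = entity_db.get(mention)
            if label is None:                  # mention unknown to the database
                keep = False
                break
            labels[start] = "B-" + label
            for i in range(start + 1, end):
                labels[i] = "I-" + label
        if keep:
            dataset.append((tokens, labels))
    return dataset

# Hypothetical database entries and sentences.
entity_db = {"Acme Bank": "COMPANY", "Jane Doe": "PERSON"}
tagged = [
    (["Jane", "Doe", "joined", "Acme", "Bank"], [(0, 2), (3, 5)]),
    (["Foo", "Corp", "fell"], [(0, 2)]),       # "Foo Corp" is not in the database
]
dataset = build_financial_entity_dataset(tagged, entity_db)
```

For Chinese text the tokens would typically be single characters joined without a separator; the space-joined form is used here only to keep the example readable.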
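As an illustration of the kind of constraint a CRF layer can enforce, the sketch below checks label legality under the BIO tagging scheme, where an I-X tag may only follow B-X or I-X. The BIO scheme and the hard true/false mask are assumptions made for this example; a trained CRF normally learns transition scores rather than using a fixed mask, and the patent does not specify the constraint set.

```python
def is_legal_transition(prev_tag, tag):
    """Under BIO tagging, I-X may only follow B-X or I-X."""
    if tag.startswith("I-"):
        entity_type = tag[2:]
        return prev_tag in ("B-" + entity_type, "I-" + entity_type)
    return True

def transition_mask(tags):
    """mask[i][j] is True when moving from tags[i] to tags[j] is allowed."""
    return [[is_legal_transition(prev, cur) for cur in tags] for prev in tags]

def is_legal_sequence(sequence):
    """Check a whole label sequence, treating the sentence start like an O tag."""
    prev = "O"
    for tag in sequence:
        if not is_legal_transition(prev, tag):
            return False
        prev = tag
    return True

# Hypothetical tag set covering two of the financial entity types.
tags = ["O", "B-COMPANY", "I-COMPANY", "B-PERSON", "I-PERSON"]
mask = transition_mask(tags)
```

A decoder could use such a mask to rule out illegal paths, for example by assigning a very low score to every transition whose mask entry is False.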
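The weight reuse in this transfer step can be sketched in a framework-independent way as follows. The layer names ("bert", "bilstm", "proj", "crf"), the toy hidden size, and the tag counts are assumptions: 7 tags for the source task as stated in the text, and 17 for the target task on the assumption that the 8 financial entity types are tagged in a BIO scheme (8 × 2 + 1). Plain nested lists stand in for real tensors.

```python
import random

def init_layer(rows, cols, rng):
    """Small random matrix standing in for a freshly initialized layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def transfer_weights(source, hidden_size, num_target_tags,
                     resized=("proj", "crf"), seed=0):
    """Copy every layer from the source model except the label-dependent ones."""
    rng = random.Random(seed)
    target = {name: w for name, w in source.items() if name not in resized}
    # Label-dependent layers are re-created with the new tag count.
    target["proj"] = init_layer(hidden_size, num_target_tags, rng)     # hidden -> tag scores
    target["crf"] = init_layer(num_target_tags, num_target_tags, rng)  # tag -> tag transitions
    return target

HIDDEN = 4          # toy hidden size
SOURCE_TAGS = 7     # standard data set: 7 tags
TARGET_TAGS = 17    # assumed: 8 financial entity types in BIO tagging

rng = random.Random(42)
source_model = {
    "bert": init_layer(HIDDEN, HIDDEN, rng),
    "bilstm": init_layer(HIDDEN, HIDDEN, rng),
    "proj": init_layer(HIDDEN, SOURCE_TAGS, rng),
    "crf": init_layer(SOURCE_TAGS, SOURCE_TAGS, rng),
}
target_model = transfer_weights(source_model, HIDDEN, TARGET_TAGS)
```

Training would then continue on the financial entity data set starting from `target_model`, so only the resized layers begin from scratch.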
2. Financial entity linking
Because natural language is inherently ambiguous, the same entity mention may correspond to different entities in different contexts. Entity linking is an important way to resolve this ambiguity: it links an entity mention to a graph entity with a unique identifier, thereby disambiguating it. Financial entity mentions are also ambiguous, so linking the financial entity mentions in text is an essential step before the relation extraction task that follows.
The invention first performs financial entity recognition on the unstructured data with the financial entity recognition model trained above (BERT_BiLSTM_CRF). To keep the data set accurate, only sentences whose entity mention corresponds to a unique financial entity in the financial entity database are retained. Negative sampling is then applied to the retained sentences: an entity is sampled at random from the financial entity database, and the link from the mention to that entity is taken as a negative sample. The result is a financial entity linking data set with balanced positive and negative examples, which is used as the training set for the financial entity linking task.
In the invention, the financial entity linking task means that, after financial entities have been recognized in the unstructured text, the corresponding financial entity in the financial entity relational database (that is, the basic dynamic financial knowledge graph) is found for each entity mention. The features extracted for this task fall into two groups: similarity-based features and corpus prior knowledge features. The similarity features divide further into text similarity features and context similarity features. Text similarity features measure the similarity between the entity mention and the entity name, using, among others, the similarity of vectors produced by a trained Word2Vec model and Jaccard similarity. Context similarity is computed from the semantic similarity between the context of the mention and the content associated with the entity: sentence vectors are computed with Doc2Vec trained on a Wikipedia corpus and with pre-trained models such as Google's BERT, and the similarity of the resulting vectors is calculated. The corpus-based features are regularities discovered while constructing the basic dynamic financial knowledge graph. For example, some financial entities appear only in data related to certain stocks, and the stock_codes field of a financial entity records those stocks. For unstructured data related to a given A-share company, whether that company's stock code appears in the entity's stock_codes can then serve as a feature.
Finally, a support vector machine classifier is used to link the financial entities.
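A minimal sketch of this negative sampling step, assuming one random negative per positive (which yields a balanced data set) and hypothetical entity identifiers, might look like this:

```python
import random

def build_link_dataset(positives, entity_ids, seed=0):
    """Pair every positive link with one randomly sampled wrong entity.

    positives:  list of (sentence, mention, gold_entity_id).
    entity_ids: all entity ids in the financial entity database.
    Returns a list of (sentence, mention, entity_id, label) with label 1 or 0.
    """
    rng = random.Random(seed)
    dataset = []
    for sentence, mention, gold in positives:
        dataset.append((sentence, mention, gold, 1))
        wrong = [e for e in entity_ids if e != gold]
        if wrong:  # a negative can only be drawn if another entity exists
            dataset.append((sentence, mention, rng.choice(wrong), 0))
    return dataset

# Hypothetical entities and sentences.
entity_ids = ["E1", "E2", "E3"]
positives = [
    ("Acme Bank issued a notice.", "Acme Bank", "E1"),
    ("Jane Doe joined the board.", "Jane Doe", "E2"),
]
link_dataset = build_link_dataset(positives, entity_ids)
```

Fixing the random seed keeps the sampled negatives reproducible between runs.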
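To make the feature groups above concrete, the sketch below computes one feature of each kind for a (mention, candidate entity) pair: a character-level Jaccard similarity, a cosine similarity between precomputed vectors, and the stock-code prior. The stock_codes field comes from the text; the other field names, the character-level granularity, and the sample vectors are assumptions. The vectors are taken as given rather than produced by Word2Vec, Doc2Vec, or BERT.

```python
import math

def jaccard(a, b):
    """Jaccard similarity between the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def link_features(mention, context_vec, doc_stock_code, entity):
    """Feature vector for one (mention, candidate entity) pair."""
    return [
        jaccard(mention, entity["name"]),                           # text similarity
        cosine(context_vec, entity["vector"]),                      # context similarity
        1.0 if doc_stock_code in entity["stock_codes"] else 0.0,    # corpus prior
    ]

# Hypothetical candidate entity and mention context.
candidate = {"name": "Acme Bank", "vector": [1.0, 0.0], "stock_codes": {"000001"}}
features = link_features("Acme", [1.0, 0.0], "000001", candidate)
```

Feature vectors of this form, labeled 1 for correct links and 0 for sampled negatives, could then be passed to a support vector machine classifier, for example scikit-learn's SVC; the choice of library is an assumption, as the text names only the model family.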
Four, financial relationship extraction
Financial relationships refer to the resulting connections between financial entities, in the form of representations of edges in a graph. In the dynamic financial knowledge graph defined in the invention, the starting point and the ending point of the financial relationship are financial entities, and simultaneously, as the acquired data have time attributes and describe the dynamic change of the graph, each financial relationship is printed with a timestamp which represents the time generated by the relationship. Based on the defined 8 financial entities, we design and construct 9 financial relationships together, including stockholder relationship, high-management relationship, legal relationship, concept affiliation, industry affiliation, location affiliation, hosting relationship, management relationship and product relationship.
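One possible in-memory representation of such timestamped edges is sketched below. The class and field names, the English relation identifiers, and the snapshot query (which assumes an edge stays valid from its timestamp onward) are illustrative assumptions; the patent only states that each edge carries the time at which it was generated.

```python
from dataclasses import dataclass
from datetime import date

# Assumed identifiers for the 9 financial relations.
RELATION_TYPES = {
    "shareholder", "executive", "legal_representative", "concept", "industry",
    "location", "custody", "management", "product",
}

@dataclass(frozen=True)
class Edge:
    head: str          # start entity id
    relation: str      # one of RELATION_TYPES
    tail: str          # end entity id
    timestamp: date    # when the relation was generated

    def __post_init__(self):
        if self.relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.relation}")

def snapshot(edges, as_of):
    """Edges that had already been generated on the given date."""
    return [e for e in edges if e.timestamp <= as_of]

# Hypothetical edges.
edges = [
    Edge("Jane Doe", "executive", "Acme Bank", date(2018, 3, 31)),
    Edge("Acme Bank", "industry", "Banking", date(2017, 1, 1)),
    Edge("Foo Fund", "shareholder", "Acme Bank", date(2019, 6, 30)),
]
graph_2018 = snapshot(edges, date(2018, 12, 31))
```

Rendering successive snapshots in time order is one simple way a display website could show how the graph structure changes.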
Relation extraction determines whether two entities in a sentence are related and, if so, which type of relation holds. Relation classification determines which relation the two entities in a sentence belong to, so it can be regarded as a subtask of relation extraction. In the model of the invention, "NA" (Not Available) is treated as a special relation class, which makes the relation classification task fully equivalent to the relation extraction task.
The difficulty of the relation classification task likewise lies in the need for a large amount of labeled data, and the mainstream approach is distant supervision. Distant supervision means producing training data from the knowledge in an existing database and training the model on it. Traditional distant supervision loosely assumes that every sentence containing two entities expresses the relation recorded between those two entities in the database, which inevitably introduces a great deal of noise into the data.
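The labeling assumption and the "NA" class can be sketched as follows. The function name, the knowledge base layout and the sample sentences are hypothetical; the second sentence shows how the loose assumption produces a noisy label.

```python
def distant_label(sentences, kb):
    """Label each (text, head, tail) sentence with the relation recorded in kb.

    kb maps (head, tail) pairs to a relation name. Pairs missing from kb
    receive the special class "NA".
    """
    labeled = []
    for text, head, tail in sentences:
        relation = kb.get((head, tail), "NA")
        labeled.append((text, head, tail, relation))
    return labeled

# Hypothetical knowledge base and sentences.
kb = {("Fund F", "Company X"): "shareholder"}
sentences = [
    ("Fund F raised its stake in Company X.", "Fund F", "Company X"),
    ("Fund F and Company X both attended the forum.", "Fund F", "Company X"),
    ("Company Y met Company X.", "Company Y", "Company X"),
]
labeled = distant_label(sentences, kb)
relations = [relation for _, _, _, relation in labeled]
```

The second sentence is labeled "shareholder" even though it does not express that relation; this is the noise that a later filtering step has to remove.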
We construct a relation classification model based on BERT and reinforcement learning. As shown in FIG. 3, the model consists of two parts: an Instance Selector and a Relation Classifier. The instance selector acts as a classifier that screens the distantly supervised data to produce training data for the relation classifier, and the relation classifier in turn optimizes the instance selector through the feedback obtained from the classification task. Both the instance selector and the relation classifier are optimized by a reinforcement learning algorithm. The invention builds on the OpenNRE project of Tsinghua University, adds a BERT language representation module and an improved reinforcement learning part, and combines them into the financial relation classification model of the invention.
Finally, the trained financial entity recognition model, financial entity linking model and financial relation extraction model are applied to the massive unstructured data obtained by crawling, to perform recognition, linking and extraction. The extracted knowledge is used to continuously improve the dynamic financial knowledge graph database of the invention.
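The feedback loop between the two parts can be illustrated with a toy policy-gradient sketch. Several assumptions are made here that are not in the source: REINFORCE with a moving-average baseline is used as the reinforcement learning algorithm, the selector is a logistic keep/drop policy over two hand-made features, and the relation classifier is replaced by a fixed stub (`stub_logprob`) instead of a jointly trained BERT-based model. All names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def keep_prob(w, x):
    """Probability that the selector keeps an instance with features x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train_selector(instances, classifier_logprob, epochs=500, lr=0.1, seed=0):
    """Train a Bernoulli keep/drop policy with REINFORCE.

    The reward is the mean log-likelihood that the classifier assigns to the
    kept instances, so keeping noisy instances lowers the reward.
    """
    rng = random.Random(seed)
    w = [0.0] * len(instances[0])
    baseline = None
    for _ in range(epochs):
        probs = [keep_prob(w, x) for x in instances]
        actions = [1 if rng.random() < p else 0 for p in probs]
        kept = [x for x, a in zip(instances, actions) if a]
        if not kept:
            continue  # nothing selected, so no feedback this round
        reward = sum(classifier_logprob(x) for x in kept) / len(kept)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log pi(a | x) for a logistic policy is (a - p) * x.
        for x, a, p in zip(instances, actions, probs):
            for j, xj in enumerate(x):
                w[j] += lr * advantage * (a - p) * xj
    return w

def stub_logprob(x):
    """Stand-in classifier: confident on clean instances, unsure on noisy ones."""
    return math.log(0.9) if x[1] > 0 else math.log(0.2)

# Features are [bias, quality]; quality is +1 for clean and -1 for noisy instances.
instances = [[1.0, 1.0]] * 4 + [[1.0, -1.0]] * 4
weights = train_selector(instances, stub_logprob)
```

After training, `keep_prob(weights, [1.0, 1.0])` should exceed `keep_prob(weights, [1.0, -1.0])`, meaning the selector has learned to prefer instances on which the classifier receives higher likelihood. In a full system the classifier would also be retrained on the selected instances in each round.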
Visualization system
A knowledge graph is essentially a graph-structured knowledge base that fuses knowledge units from different sources and builds a graph by interconnecting them. Directly inspecting raw graph data is generally not an effective way to acquire knowledge; visualization converts the complex graph structure into an intuitive graphical form and, through human-computer interaction, helps people understand and master the knowledge graph more effectively.
The invention designs and implements an interactive dynamic knowledge graph visualization system. The interface of the system is shown in FIG. 4 and mainly comprises: (a) a knowledge graph relation view, (b) a time axis, (c) an entity information display view and (d) an entity search box.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A dynamic financial knowledge graph construction method based on reinforcement learning and transfer learning, comprising the following steps:
1) constructing a financial knowledge graph from the structured data and semi-structured data of each selected listed company; inserting the entity names corresponding to the financial entities in the financial knowledge graph into a financial entity database;
2) for unstructured data related to a selected listed company, acquiring each sentence sequence in the unstructured data that contains an entity; for the entity of each sentence sequence, querying whether a labeled entity exists in the financial entity database; filtering out the sentence sequences whose entities do not exist in the financial entity database; taking the label carried by the entity in the financial entity database as the annotation of the corresponding entity in the sentence sequence; and taking the resulting set of sentence sequences as a financial entity data set;
3) training a financial entity recognition model using the financial entity data set and a standard entity recognition data set;
4) performing financial entity recognition on the unstructured data using the financial entity recognition model trained in step 3), and retaining the sentences whose entity mention corresponds to a unique financial entity in the financial entity database; then performing negative sampling on the retained sentences to obtain a financial entity linking data set with balanced positive and negative samples, and training a financial entity linking model using the financial entity linking data set;
5) using the trained financial entity linking model to find, in the financial knowledge graph, the financial entity corresponding to each entity mention in the unstructured data, linking the found financial entity with the corresponding entity mention, and updating the financial knowledge graph;
6) using a financial relationship extraction model to extract financial entity relationships from the unstructured data and update the financial knowledge graph, obtaining a dynamically displayed financial knowledge graph.
2. The method of claim 1, wherein in step 1), the method for constructing the financial knowledge graph comprises: firstly, constructing a financial entity set of a listed company and a relationship set among financial entities in the financial entity set on the basis of structured data and semi-structured data of the listed company; then, acquiring relevant entity information of each financial entity in the financial entity set as an additional attribute value of the corresponding financial entity; and then constructing the financial knowledge graph based on the financial entity set and the corresponding relation set of each listed company.
3. The method of claim 1, wherein the financial entity recognition model is constructed based on BERT, BiLSTM and CRF; the financial entity recognition model comprises, in sequence, an input layer, a BERT layer, a bidirectional long short-term memory network layer, and a conditional random field layer as output; the BERT layer is used for processing the word sequence supplied by the input layer to obtain the BERT semantic encoding and sending the encoding to the bidirectional long short-term memory network layer; the bidirectional long short-term memory network layer is used for capturing long-range sentence information in both directions, learning a representation of the sentence semantics and passing it to the conditional random field layer; and the conditional random field layer is used for ensuring the validity of the predicted labels according to the set constraints to obtain the label sequence.
4. The method of claim 1, 2 or 3, wherein the financial entity recognition model is trained using a transfer learning algorithm: first, performing preliminary training of the financial entity recognition model on a standard entity recognition data set to obtain a financial entity recognition model capable of recognizing basic entity categories; then resizing the mapping layer and the CRF layer according to the financial entity categories while retaining the weights of the other network layers as initialization; and finally continuing to train the financial entity recognition model on the financial entity data set to obtain the final financial entity recognition model.
5. The method of claim 1, wherein in step 5), the financial entity linking model determines the financial entity corresponding to an entity mention based on the extracted text similarity features and context similarity features between the entity mention and the financial entity.
6. The method of claim 1, wherein the financial knowledge graph comprises a plurality of financial entities, the connections generated between the financial entities are represented in the financial knowledge graph as edges, the start point and end point of each edge are financial entities, and each edge is stamped with a timestamp representing the time at which the edge was generated.
7. The method of claim 6, wherein the edge is a financial relationship, the financial relationships including a shareholder relationship, a senior executive relationship, a legal representative relationship, a concept affiliation, an industry affiliation, a location affiliation, a custody relationship, a management relationship and a product relationship.
8. The method of claim 1, wherein the financial relationship extraction model is a relation classification model based on BERT and reinforcement learning, comprising an instance selector and a relation classifier; the instance selector acts as a classifier that screens distantly supervised data to serve as training data for the relation classifier, and the relation classifier optimizes the instance selector through the feedback obtained from the classification task; and the instance selector and the relation classifier are optimized by a reinforcement learning algorithm.
9. The method of claim 1, wherein the financial entities comprise: companies, institutions, funds, products, individuals, concepts, industries and locations.
10. The method of claim 1, wherein the semi-structured data comprises senior executive data, legal representative data and shareholder data.
CN201911322390.3A 2019-12-20 2019-12-20 Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning Pending CN110990590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911322390.3A CN110990590A (en) 2019-12-20 2019-12-20 Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911322390.3A CN110990590A (en) 2019-12-20 2019-12-20 Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning

Publications (1)

Publication Number Publication Date
CN110990590A true CN110990590A (en) 2020-04-10

Family

ID=70073278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911322390.3A Pending CN110990590A (en) 2019-12-20 2019-12-20 Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning

Country Status (1)

Country Link
CN (1) CN110990590A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RUI MIAO, XIA ZHANG, HONGFEI YAN, CHONG CHEN: "A dynamic Financial Knowledge Graph Based on Reinforcement learning and Transfer learning", 《2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA》 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522965A (en) * 2020-04-22 2020-08-11 重庆邮电大学 Question-answering method and system for entity relationship extraction based on transfer learning
CN111581973A (en) * 2020-04-24 2020-08-25 中国科学院空天信息创新研究院 Entity disambiguation method and system
CN111666374A (en) * 2020-05-15 2020-09-15 华东师范大学 Method for integrating additional knowledge information into deep language model
CN111767368A (en) * 2020-05-27 2020-10-13 重庆邮电大学 Question-answer knowledge graph construction method based on entity link and storage medium
CN111737594A (en) * 2020-06-24 2020-10-02 中网数据(北京)股份有限公司 Virtual network role behavior modeling method based on unsupervised label generation
CN112101029A (en) * 2020-08-18 2020-12-18 淮阴工学院 College instructor recommendation management method based on bert model
CN112101029B (en) * 2020-08-18 2024-05-03 淮阴工学院 Bert model-based university teacher recommendation management method
CN112101034B (en) * 2020-09-09 2024-02-27 沈阳东软智能医疗科技研究院有限公司 Method and device for judging attribute of medical entity and related product
CN112101034A (en) * 2020-09-09 2020-12-18 沈阳东软智能医疗科技研究院有限公司 Method and device for distinguishing attribute of medical entity and related product
CN112100401A (en) * 2020-09-14 2020-12-18 北京大学 Knowledge graph construction method, device, equipment and storage medium for scientific and technological service
CN112100401B (en) * 2020-09-14 2024-05-07 北京大学 Knowledge graph construction method, device, equipment and storage medium for science and technology services
CN113051365A (en) * 2020-12-10 2021-06-29 深圳证券信息有限公司 Industrial chain map construction method and related equipment
CN112905806A (en) * 2021-03-25 2021-06-04 哈尔滨工业大学 Knowledge graph materialized view generator and generation method based on reinforcement learning
CN113220899A (en) * 2021-05-10 2021-08-06 上海博亦信息科技有限公司 Intellectual property identity identification method based on academic talent information intellectual map
CN113377884A (en) * 2021-07-08 2021-09-10 中央财经大学 Event corpus purification method based on multi-agent reinforcement learning
CN114398492A (en) * 2021-12-24 2022-04-26 森纵艾数(北京)科技有限公司 Knowledge graph construction method, terminal and medium in digital field
CN114398492B (en) * 2021-12-24 2022-08-30 森纵艾数(北京)科技有限公司 Knowledge graph construction method, terminal and medium in digital field
CN114626530A (en) * 2022-03-14 2022-06-14 电子科技大学 Reinforced learning knowledge graph reasoning method based on bilateral path quality assessment
CN114741526B (en) * 2022-03-23 2024-02-02 中国人民解放军国防科技大学 Knowledge graph cloud platform in network space safety field
CN114741526A (en) * 2022-03-23 2022-07-12 中国人民解放军国防科技大学 Knowledge graph cloud platform in network space security field
CN114491541A (en) * 2022-03-31 2022-05-13 南京众智维信息科技有限公司 Safe operation script automatic arrangement method based on knowledge graph path analysis
CN115759104A (en) * 2023-01-09 2023-03-07 山东大学 Financial field public opinion analysis method and system based on entity recognition
CN115759104B (en) * 2023-01-09 2023-09-22 山东大学 Financial domain public opinion analysis method and system based on entity identification
CN115796280A (en) * 2023-01-31 2023-03-14 南京万得资讯科技有限公司 Entity identification entity linking system suitable for high efficiency and controllability in financial field
CN117312578A (en) * 2023-11-28 2023-12-29 烟台云朵软件有限公司 Construction method and system of non-genetic carrier spectrum
CN117312578B (en) * 2023-11-28 2024-02-23 烟台云朵软件有限公司 Construction method and system of non-genetic carrier spectrum

Similar Documents

Publication Publication Date Title
CN110990590A (en) Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN111428053B (en) Construction method of tax field-oriented knowledge graph
CN111737495B (en) Middle-high-end talent intelligent recommendation system and method based on domain self-classification
CN109271529B (en) Method for constructing bilingual knowledge graph of Xilier Mongolian and traditional Mongolian
CN106776711B (en) Chinese medical knowledge map construction method based on deep learning
CN112199511A (en) Cross-language multi-source vertical domain knowledge graph construction method
CN110502621A (en) Answering method, question and answer system, computer equipment and storage medium
CN112542223A (en) Semi-supervised learning method for constructing medical knowledge graph from Chinese electronic medical record
CN107180045B (en) Method for extracting geographic entity relation contained in internet text
CN103440287B (en) A kind of Web question and answer searching system based on product information structure
CN113806563B (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
CN109271505A (en) A kind of question answering system implementation method based on problem answers pair
CN111475623A (en) Case information semantic retrieval method and device based on knowledge graph
CN111639171A (en) Knowledge graph question-answering method and device
CN111651447B (en) Intelligent construction life-span data processing, analyzing and controlling system
WO2020010834A1 (en) Faq question and answer library generalization method, apparatus, and device
CN113535917A (en) Intelligent question-answering method and system based on travel knowledge map
CN112328800A (en) System and method for automatically generating programming specification question answers
CN111143574A (en) Query and visualization system construction method based on minority culture knowledge graph
CN113010663A (en) Adaptive reasoning question-answering method and system based on industrial cognitive map
CN115599899B (en) Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
Miao et al. A dynamic financial knowledge graph based on reinforcement learning and transfer learning
CN106897274B (en) Cross-language comment replying method
CN114911893A (en) Method and system for automatically constructing knowledge base based on knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20200410)