CN109299442A - Chinese chapter primary-slave relation recognition methods and system - Google Patents

Chinese discourse nuclearity (primary-secondary relation) recognition method and system

Info

Publication number
CN109299442A
Authority
CN
China
Prior art keywords
chapter
primary
slave relation
identified
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811168250.0A
Other languages
Chinese (zh)
Inventor
王体爽
李培峰
朱巧明
周国栋
张玉华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201811168250.0A
Publication of CN109299442A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a Chinese discourse nuclearity (primary-secondary relation) recognition method, comprising: reading the discourse unit annotations and the nuclearity type annotations in an annotated document set to obtain a set of discourse nuclearity relations, and applying left-subtree conversion to the non-binary relations in that set to obtain a set of binary discourse nuclearity relations. The annotated document set is a document set labeled with nuclearity types and is the basis for training the model of the invention. The nuclearity recognition system and method, based on a gated memory network, convert the discourse units to be predicted into word vectors and use the gated memory neural network to automatically capture the hidden features between discourse units and the information that is more important relative to the whole. Compared with existing methods and systems, the method and system of the invention improve Chinese discourse nuclearity recognition performance.

Description

Chinese discourse nuclearity recognition method and system
Technical field
The present invention relates to the technical field of discourse analysis, and in particular to a Chinese discourse nuclearity (primary-secondary relation) recognition method and system.
Background art
A discourse is a passage of text and the object of study in natural language understanding: natural language text formed through semantic association and structural organization. A discourse has seven basic properties: cohesion, coherence, intentionality, acceptability, informativity, situationality and intertextuality. Discourse analysis comprises three subtasks: structure construction, nuclearity recognition and relation classification. Discourse nuclearity describes the relation between the primary and secondary content within a discourse. Primary content is the part that is dominant and decisive in the discourse; secondary content is the part that is subordinate and not decisive. In Rhetorical Structure Theory (RST), nuclearity relations are divided into mononuclear and multinuclear relations.
A mononuclear relation consists of a nucleus and a satellite: the nucleus expresses the primary content and the satellite the secondary content. A multinuclear relation contains two or more nuclei. There are therefore three types of nuclearity relation: nucleus-satellite (NS), in which the left subtree is the primary part; satellite-nucleus (SN), in which the right subtree is the primary part; and nucleus-nucleus (NN), in which both subtrees are primary parts. NS and SN are mononuclear relations and NN is a multinuclear relation. Nuclearity research studies the semantic associations among sentences, sentence groups and paragraphs and their relative importance, which are a manifestation of coherence, one of the basic properties of discourse. Its purpose is to identify the primary and secondary content of a discourse and thereby understand its topic, line of development and main content. A discourse relation generally connects two discourse units that belong to the same relation layer. If one of the units can summarize the gist and content of that layer and can represent the layer in its relations with the rest of the text, the relation is mononuclear; if the two units are equally important, the relation is multinuclear. For example, of the two discourse units connected by a statement-example relation, one is the statement and the other the example; the example serves the statement, so the statement is the nucleus and the statement-example relation is mononuclear. In a coordination relation there may be two or more discourse units, and the nucleus may be filled by one or more of them, so a coordination relation may be either mononuclear or multinuclear.
The meaning of discourse nuclearity is illustrated below with a concrete example (chtb_0019, "Remarkable achievements in the construction of the Ningbo Bonded Area") from the Chinese Discourse Treebank (CDTB) [2].
Example 1: Since April of this year, China has adjusted the relevant preferential policies, with the bonded area exempt from licensing and, outside the bonded area, exempt from tax (a); the stability advantage of the bonded-area policy has become more apparent (b), and large numbers of domestic and foreign industrial processing projects have settled in the area one after another (c). By the end of December last year, a cumulative total of 1,614 enterprises had been set up in the area (d), with total investment reaching 1.2 billion US dollars (e), of which 260 are foreign-invested enterprises (f), with 1.103 hundred million US dollars of foreign capital actually used (g). In addition, many domestic enterprises have also connected with the world market through the bonded area (h).
The paragraph in Example 1 contains 8 elementary discourse units (a-h); its discourse structure tree is shown in Fig. 1. The leaf nodes (a-h) are elementary discourse units (EDUs), and the non-leaf nodes are relation nodes, each indicating the type of relation between the two children it connects. Leaf nodes and relation nodes are collectively called discourse units (DUs). Each arrow points to the nucleus, the more important unit in the nuclearity relation. For the cause-result relation node in Example 1, for instance, the left child a-b is the cause and the right child c is the result. The result is considered more important in this example, so the arrow points to the right child: leaf node c is the nucleus and the relation is satellite-nucleus. Starting from the root node in Fig. 1 and selecting the nucleus at each step until a leaf node is reached yields the elementary discourse unit "large numbers of domestic and foreign industrial processing projects have settled in the area one after another" (c), which can serve as a summary of the whole paragraph.
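The nucleus-following walk described above can be sketched as follows. This is a minimal illustration only, not the patent's implementation: the Node class and the nucleus_leaf function are hypothetical names, the choice of the left child for NN nodes is an arbitrary assumption, and the tree fragment covers only the cause-result node over (a-b, c) mentioned in the text; the nuclearity assigned inside a-b is invented for the example, since Fig. 1 is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A discourse unit: a leaf (EDU) carries text, a relation node has two children."""
    text: Optional[str] = None
    nuclearity: Optional[str] = None  # "NS", "SN" or "NN" on relation nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def nucleus_leaf(node: Node) -> str:
    """Follow the nucleus child from the given node down to a leaf EDU."""
    while node.text is None:
        # SN: the right child is the nucleus. NS: the left child is the nucleus.
        # NN has no single nucleus; this sketch arbitrarily follows the left child.
        node = node.right if node.nuclearity == "SN" else node.left
    return node.text


# Hypothetical fragment: the internal nuclearity of a-b is an assumption.
ab = Node(nuclearity="NN", left=Node(text="a"), right=Node(text="b"))
tree = Node(nuclearity="SN", left=ab, right=Node(text="c"))
print(nucleus_leaf(tree))
```

For the fragment above the walk ends at leaf c, matching the satellite-nucleus reading of the cause-result node in the text.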
At present, the main corpus resources covering discourse nuclearity are the English RST Discourse Treebank (RST-DT) [1] and the Chinese Discourse Treebank (CDTB) [2]. Most nuclearity recognition research focuses on the English corpus RST-DT and generally treats nuclearity recognition as an auxiliary step in rhetorical structure analysis, overlooking its importance in discourse structure analysis.
On RST-DT, most studies use traditional machine learning methods based on support vector machines (SVM), conditional random fields (CRF) and their variants. Hernault et al. [3] used two SVMs to implement a framework that builds discourse trees automatically, bottom-up. Joty et al. [4], exploiting the difference in relation distribution within and between sentences, used two dynamic conditional random field models to build discourse parsers at the intra-sentence and inter-sentence levels, and optimized discourse tree construction with a dynamic programming algorithm. Feng and Hirst [5] used two sets of linear-chain conditional random field models to identify discourse relations and nuclearity. Wang et al. [6] used a transition-based method to convert discourse tree construction into a shift-reduce sequence, and proposed a two-step model that first labels structure and nuclearity and then labels relations. Li et al. [7] proposed using dependency structures to represent the relation between two discourse units.
Relatively few studies on RST-DT use neural network methods. Li et al. [8] used a two-layer feed-forward neural network to determine the relation between two discourse units, and used a recursive neural network to obtain the representation of a discourse unit by computing over its subtrees. Li et al. [9] proposed an attention-based hierarchical Bi-LSTM network to learn the representations of discourse units, and used a tensor-based transformation function to capture the interactions between discourse unit features.
Compared with RST-DT, there is less research based on the CDTB corpus. Li [10] performed nuclearity recognition with a maximum entropy model using contextual, lexical and dependency tree features. Kong and Zhou [11] built an end-to-end discourse structure parser with a maximum entropy model using semantic similarity and contextual features. Xu et al. [12] proposed the TMN (Text Matching Network) model, which encodes the two discourse units with Bi-LSTM and CNN and then performs nuclearity recognition through three kinds of matching relations; their method reaches 69.0 (micro-averaged F1) on the CDTB corpus, clearly better than traditional feature engineering methods.
Bibliography:
[1] Carlson L, Okurowski M E, Marcu D. RST discourse treebank[M]. Linguistic Data Consortium, University of Pennsylvania. 2002.
[2] Li Y, Kong F, Zhou G. Building Chinese discourse corpus with connective-driven dependency tree structure[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 2105-2114.
[3] Hernault H, Prendinger H, Ishizuka M. HILDA: A discourse parser using support vector machine classification[J]. Dialogue & Discourse, 2010, 1(3): 1-33.
[4] Joty S, Carenini G, Ng R, et al. Combining intra- and multi-sentential rhetorical parsing for document-level discourse analysis[C]//Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 2013, 1: 486-496.
[5] Feng V W, Hirst G. A linear-time bottom-up discourse parser with constraints and post-editing[C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. 2014: 511-521.
[6] Wang Y, Li S, Wang H. A two-stage parsing method for text-level discourse analysis[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 184-188.
[7] Li S, Wang L, Cao Z, et al. Text-level discourse dependency parsing[C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. 2014: 25-35.
[8] Li J, Li R, Hovy E. Recursive deep models for discourse parsing[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 2061-2069.
[9] Li Q, Li T, Chang B. Discourse parsing with attention-based hierarchical neural networks[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016: 362-371.
[10] Li Yancui. Research on the Chinese discourse structure representation system and resource construction[D]. Suzhou: Suzhou University, doctoral dissertation, 2015.
[11] Kong F, Zhou G. A CDT-styled end-to-end Chinese discourse parser[J]. ACM Transactions on Asian and Low-Resource Language Information Processing. 2016(4): 26.
[12] Xu S, Li P, Zhou G, et al. Employing text matching network to recognise nuclearity in Chinese discourse[C]//Proceedings of the 2018 International Conference on Computational Linguistics. 2018.
The traditional technology has the following technical problems:
On the CDTB corpus, the TMN model proposed by Xu et al. [12] performs best on the nuclearity recognition task. TMN rests on two ideas: 1) the more semantically similar two discourse units are, the more likely their relation is multinuclear; 2) in a mononuclear relation, the unit closer to the paragraph topic is more likely to be the nucleus. Based on these two ideas, TMN introduces the semantic similarity between the two DUs and the similarity between each DU and the paragraph topic.
TMN tends to misidentify non-multinuclear relations with high semantic similarity as multinuclear. In Example 2, "agriculture" and "grain", and "harvest" and "output", are highly similar, and TMN misidentifies Example 2 as a multinuclear relation.
Example 2: Agriculture had a fairly good harvest (a), with annual total grain output reaching 7.6 kilograms (b). (NS relation)
In addition, when the sequence lengths of the two discourse units are unbalanced, TMN tends to identify the longer unit as the primary part and the shorter one as the secondary part. In Example 3 the word sequence lengths of the two EDUs are very unbalanced: b contains more information and comes out closer to the paragraph topic after matching, while a appears more distant from the topic than b, so TMN misidentifies Example 3 as a satellite-nucleus relation.
Example 3: Economic benefits (a): 25.8 billion yuan in taxes and fees turned over and 10 billion yuan in profit realized (b). (NS relation)
Summary of the invention
In view of the above technical problems, it is necessary to provide a Chinese discourse nuclearity recognition method and system that clearly improve Chinese discourse nuclearity recognition performance.
A Chinese discourse nuclearity recognition method, comprising:
Reading the discourse unit annotations and nuclearity type annotations in an annotated document set to obtain a set of discourse nuclearity relations, and applying left-subtree conversion to the non-binary relations in that set to obtain a set of binary discourse nuclearity relations; wherein the annotated document set is a document set labeled with nuclearity types;
Calling a word segmentation tool on each discourse unit in the binary nuclearity relation set to obtain the word features of each nuclearity relation and generate a binary nuclearity relation word feature set; calling a part-of-speech tagging tool on each discourse unit in the binary nuclearity relation word feature set to obtain the part-of-speech features of all words in each discourse unit and generate a binary nuclearity relation word and part-of-speech feature set;
Applying to each discourse unit of each document in a document set to be recognized the same processing as for the annotated document set, to obtain a to-be-recognized binary nuclearity relation word and part-of-speech feature set; wherein the document set to be recognized is a document set not labeled with nuclearity types;
Constructing each discourse unit of each nuclearity relation in the binary nuclearity relation word and part-of-speech feature set into an input form accepted by a neural network, to obtain an annotated document feature input set; building a neural network classifier based on a gated memory network with a deep learning tool, and training a discourse nuclearity recognition model with the annotated document feature input set as input; constructing the features of each discourse unit in the to-be-recognized binary nuclearity relation word and part-of-speech feature set into the same input format as the annotated document feature input set, to obtain a to-be-recognized document feature input set; then using the to-be-recognized document feature input set as model input to recognize the nuclearity relations in the document set to be recognized, obtaining the type of each nuclearity relation in that set and generating a nuclearity relation type set for the documents to be recognized.
A Chinese discourse nuclearity recognition system, comprising:
A nuclearity relation discourse unit extraction module, which reads the discourse unit annotations and nuclearity type annotations in an annotated document set to obtain a set of discourse nuclearity relations, and applies left-subtree conversion to the non-binary relations in that set to obtain a set of binary discourse nuclearity relations; the annotated document set is a document set labeled with nuclearity types;
A nuclearity relation discourse unit preprocessing module, which calls a word segmentation tool on each discourse unit in the binary nuclearity relation set to obtain the word features of each nuclearity relation and generate a binary nuclearity relation word feature set, and calls a part-of-speech tagging tool on each discourse unit in the binary nuclearity relation word feature set to obtain the part-of-speech features of all words in each discourse unit and generate a binary nuclearity relation word and part-of-speech feature set;
A to-be-recognized nuclearity relation discourse unit processing module, which applies to each discourse unit of each document in a document set to be recognized the same processing as for the annotated document set, to obtain a to-be-recognized binary nuclearity relation word and part-of-speech feature set; wherein the document set to be recognized is a document set not labeled with nuclearity types;
A to-be-recognized nuclearity relation recognition module, which constructs each discourse unit of each nuclearity relation in the binary nuclearity relation word and part-of-speech feature set into an input form accepted by a neural network, to obtain an annotated document feature input set; builds a neural network classifier based on a gated memory network with a deep learning tool, and trains a discourse nuclearity recognition model with the annotated document feature input set as input; constructs the features of each discourse unit in the to-be-recognized binary nuclearity relation word and part-of-speech feature set into the same input format as the annotated document feature input set, to obtain a to-be-recognized document feature input set; and then uses the to-be-recognized document feature input set as model input to recognize the nuclearity relations in the document set to be recognized, obtaining the type of each nuclearity relation in that set and generating a nuclearity relation type set for the documents to be recognized.
The above discourse nuclearity recognition system and method, based on a gated memory network, convert the discourse units to be predicted into word vectors and use the gated memory neural network to automatically capture the hidden features between discourse units and the information that is more important relative to the whole. Compared with existing methods and systems, the method and system of the invention improve Chinese discourse nuclearity recognition performance.
A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the above methods when executing the program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above methods.
A processor for running a program, wherein the program, when run, performs any one of the above methods.
Brief description of the drawings
Fig. 1 is a schematic diagram of the discourse structure tree of Example 1 in the background art, provided by an embodiment of the present application.
Fig. 2 is a flowchart of the Chinese discourse nuclearity recognition method of the invention.
Fig. 3 is a flowchart of nuclearity relation discourse unit extraction in the invention.
Fig. 4 is a flowchart of nuclearity relation discourse unit preprocessing in the invention.
Fig. 5 is a flowchart of the processing of discourse units of nuclearity relations to be recognized in the invention.
Fig. 6 is a flowchart of the recognition of nuclearity relations to be recognized in the invention.
Fig. 7 is a structural diagram of the Chinese discourse nuclearity recognition system of the invention.
Fig. 8 is a structural diagram of the nuclearity relation discourse unit extraction module of the invention.
Fig. 9 is a structural diagram of the nuclearity relation discourse unit preprocessing module of the invention.
Fig. 10 is a structural diagram of the to-be-recognized nuclearity relation discourse unit processing module of the invention.
Fig. 11 is a structural diagram of the to-be-recognized nuclearity relation recognition module of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Discourse nuclearity relation (Nuclearity): the relation between the primary and secondary content within a discourse.
Nucleus: the primary content in a nuclearity relation, that is, the part that is dominant and decisive in the discourse.
Satellite: the secondary content in a nuclearity relation, that is, the part that is subordinate and not decisive in the discourse.
Mononuclear relation (Mono-nuclear relation): a nuclearity relation type containing one nucleus.
Multinuclear relation (Multi-nuclear relation): a nuclearity relation type containing two or more nuclei.
Nucleus-satellite (Nucleus-Satellite): the left subtree is the primary part of the nuclearity relation and the right subtree is the secondary part.
Satellite-nucleus (Satellite-Nucleus): the right subtree is the primary part of the nuclearity relation and the left subtree is the secondary part.
Nucleus-nucleus (Nucleus-Nucleus): both the left and right subtrees are primary parts of the nuclearity relation.
Elementary discourse unit (Elementary Discourse Unit, EDU): a clause, sentence or phrase.
Precision: for a given nuclearity type, the ratio of the number of relations of that type that the system identifies correctly to the number of all relations it identifies as that type. One of the metrics for nuclearity recognition.
Recall: for a given nuclearity type, the ratio of the number of relations of that type that the system identifies correctly to the number of all relations of that type. One of the metrics for nuclearity recognition.
F1 measure (F1-Measure): one of the overall metrics for nuclearity recognition performance, the harmonic mean of recall (R) and precision (P), that is:
F1 = 2 × P × R / (P + R)
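The three metrics defined above can be computed per nuclearity type as in the following minimal sketch, which takes F1 as the standard harmonic mean 2PR / (P + R). The function name precision_recall_f1 and the toy label lists are illustrative assumptions, not data or code from the patent.

```python
def precision_recall_f1(gold, predicted, label):
    """Precision, recall and F1 for one nuclearity type (e.g. "NS")."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    n_predicted = sum(1 for p in predicted if p == label)  # all relations identified as label
    n_gold = sum(1 for g in gold if g == label)            # all relations truly of type label
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only.
gold = ["NS", "NS", "SN", "NN", "NN", "NS"]
pred = ["NS", "NN", "SN", "NN", "NS", "NS"]
print(precision_recall_f1(gold, pred, "NS"))
```

In the toy data, two of the three relations predicted as NS are correct and two of the three gold NS relations are found, so precision, recall and F1 for NS are all 2/3.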
To solve the technical problems of the traditional technology, a gated memory network (GMN) is applied here to the nuclearity recognition task, yielding a GMN-based Chinese nuclearity recognition model (GMN-Nu). Its architecture has three parts: 1) input and encoding; 2) a multi-layer gated memory network; 3) nuclearity recognition.
First, in the input and encoding layer, the model takes as input the words and parts of speech of two discourse units, DU1 and DU2. The two discourse units are encoded with Bi-LSTM and CNN to obtain global and local information. Taking Example 1 as an example, DU1 and DU2 can be the left and right children of any relation node in the discourse structure tree of Example 1.
Second, in the gated memory network layer, a multi-layer gated memory network extracts from the overall information, for each DU, a representation of the semantic information that is more important relative to the whole. The method merges the information of DU1 and DU2 to obtain the overall information, computes gate units with a sigmoid function, and applies them to DU1 and DU2 to obtain the semantic information that is more important relative to the whole.
Finally, in the nuclearity recognition layer, softmax is used to perform nuclearity recognition. Compared with existing methods and systems, the method and system of the invention clearly improve Chinese discourse nuclearity recognition performance.
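The gating idea in the second step can be illustrated with a very small sketch. It is not the GMN-Nu model: the text does not give the exact formulas, so everything here is an assumption made for illustration, namely that the two encoded units are merged by element-wise addition, that the gate uses per-dimension weights and biases, and that the same gate is applied to both units. The names gate_units, weight and bias are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_units(du1, du2, weight, bias):
    """Gate two encoded discourse units by their merged (overall) information.

    du1, du2: equal-length lists of floats (encoded DU1 and DU2).
    weight, bias: per-dimension gate parameters (assumed to be learned elsewhere).
    """
    if not (len(du1) == len(du2) == len(weight) == len(bias)):
        raise ValueError("all vectors must have the same length")
    # Merge the two units into one overall representation (assumed: element-wise sum).
    overall = [a + b for a, b in zip(du1, du2)]
    # A sigmoid gate in (0, 1) per dimension, computed from the overall information.
    gate = [sigmoid(w * o + c) for w, o, c in zip(weight, overall, bias)]
    # Apply the gate to each unit to keep what matters relative to the whole.
    gated1 = [g * a for g, a in zip(gate, du1)]
    gated2 = [g * b for g, b in zip(gate, du2)]
    return gated1, gated2


g1, g2 = gate_units([2.0, -4.0], [1.0, 3.0], weight=[0.5, -1.0], bias=[0.0, 0.1])
print(g1, g2)
```

In a real model the gated representations would feed further layers and a softmax classifier over NS, SN and NN; that part is omitted here.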
A Chinese discourse nuclearity recognition method, as shown in Fig. 2, comprising:
S10: reading the discourse unit annotations and nuclearity type annotations in the annotated document set to obtain a set of discourse nuclearity relations, and applying left-subtree conversion to the non-binary relations in that set to obtain a set of binary discourse nuclearity relations. The annotated document set is a document set labeled with nuclearity types and is the basis for training the model of the invention.
As shown in Fig. 3, the detailed process of S10 is as follows:
S101: reading, according to the tags, the discourse unit annotations and nuclearity type annotations in the annotated document set to obtain the set of discourse nuclearity relations. The format of an entry in the set is as follows:
"discourse unit 1 | discourse unit 2 | ...", "nuclearity type"
Take Example 4 as an example.
Example 4:
< Sentence="However, this practice of keeping the legal system in step with economic and social activity has been well received by domestic and foreign investors, | they believe that investing and doing business in Pudong New Area is well-regulated, follows the rules, and that their interests can be protected." Center="1"/>
< Sentence="They believe that investing and doing business in Pudong New Area is well-regulated, | follows the rules, | and that their interests can be protected." Center="3"/>
Here, Sentence holds the discourse units contained in the nuclearity relation, separated by "|"; a nuclearity relation contains two or more discourse units. Center gives the nuclearity type: "1" denotes a "nucleus-satellite" relation, "2" a "satellite-nucleus" relation and "3" a "nucleus-nucleus" relation. Two nuclearity relation instances are annotated in this example.
The nuclearity relations in Example 4 are represented as:
Example 5:
"However, this practice of keeping the legal system in step with economic and social activity has been well received by domestic and foreign investors, | they believe that investing and doing business in Pudong New Area is well-regulated, follows the rules, and that their interests can be protected.", "1"
"They believe that investing and doing business in Pudong New Area is well-regulated, | follows the rules, | and that their interests can be protected.", "3"
In Example 5, the content of the first pair of quotation marks gives the discourse units, separated by "|", and the content of the second pair gives the nuclearity type. The discourse units and the nuclearity type are separated by a comma.
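A record in the format just described can be turned into a list of discourse units and a nuclearity label as in the minimal sketch below. The function name parse_record and the NUCLEARITY table are hypothetical; the code-to-label mapping follows the Center values explained for Example 4, and the two fields are assumed to have already been separated, so commas inside the text cause no ambiguity.

```python
# Center value -> nuclearity type, as explained for Example 4.
NUCLEARITY = {"1": "NS", "2": "SN", "3": "NN"}


def parse_record(units_field: str, type_field: str):
    """Split the unit field on "|" and map the type code to a label."""
    units = [u.strip() for u in units_field.split("|")]
    if len(units) < 2:
        raise ValueError("a nuclearity relation needs at least two discourse units")
    return units, NUCLEARITY[type_field.strip()]


units, label = parse_record("unit one, | unit two, | unit three.", "3")
print(units, label)
```

Stripping whitespace around each unit is a convenience of this sketch; the original annotation may not contain spaces around "|".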
S102 converts binary for the chapter primary-slave relation of non-dualization according to chapter primary-slave relation set obtained The chapter primary-slave relation of change obtains binary chapter primary-slave relation set.
Specifically, all chapter primary-slave relations in chapter primary-slave relation set are carried out judging whether it is n-tuple relation, by It is separated between chapter unit with " | ", therefore is carried out each of chapter primary-slave relation set chapter primary-slave relation using " | " Chapter unit cutting is n-tuple relation if chapter unit number is greater than 2, carries out left subtree conversion: for one comprising multiple The chapter primary-slave relation node of child, first by chapter primary-slave relation node first child node from left to right and second Child nodes combine generation chapter primary-slave relation node identical with former father node, using the node as the first of former father node A child nodes, successively iteration, until reaching the most right child nodes of former father node.In this way, including n chapter list for one The polynary chapter primary-slave relation of member generates n-1 binary chapter primary-slave relation after conversion.A binary piece is obtained after conversion Chapter primary-slave relation set.For example, conversion front and back chapter primary-slave relation format is as follows for ternary relation:
Before conversion:
" chapter unit 1 | chapter unit 2 | chapter unit 3 ", " chapter primary-slave relation type "
After conversion:
" chapter unit 1 | chapter unit 2 ", " chapter primary-slave relation type "
" chapter unit 1+ chapter unit 2 | chapter unit 3 ", " chapter primary-slave relation type "
The chapter primary-slave relation in Example 5, "they think that investing working to Pudong New District has the art of composition, | say rule, | interests can be protected.", "3", is an n-ary relation containing three chapter units: "they think that investing working to Pudong New District has the art of composition,", "say rule,", and "interests can be protected."; its relation type is "core-core".
After binarization, this relation from Example 5 is represented as:
Example 6:
" they think that investing working to Pudong New District has the art of composition, | say rule, ", " 3 "
" they think that investing working to Pudong New District has the art of composition, say rule, | interests can be protected.", " 3 "
Example 6 contains two binary chapter primary-slave relations, i.e. the form the non-binary relation takes after being converted into binary relations; each chapter primary-slave relation contains exactly two chapter units.
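The left subtree conversion of S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `binarize` is hypothetical, and joining merged chapter units with a single space is an assumption made for readability of English examples (Chinese text would likely be concatenated directly).

```python
def binarize(relation: str, rel_type: str) -> list[tuple[str, str]]:
    """Left-branching binarization of one 'unit | unit | ...' relation.

    An n-unit relation yields n-1 binary relations, each carrying the
    relation type of the original (parent) relation.
    """
    units = [u.strip() for u in relation.split("|") if u.strip()]
    if len(units) <= 2:
        # Already binary: nothing to convert.
        return [(relation, rel_type)]

    result = []
    left = units[0]
    for right in units[1:]:
        result.append((f"{left} | {right}", rel_type))
        # The merged node becomes the first child for the next iteration.
        # Joining with a space is an assumption of this sketch.
        left = f"{left} {right}"
    return result
```

Applied to a three-unit relation of type "3", the sketch returns two binary relations of type "3", matching the before/after format shown for the ternary case.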
S20: call a word segmentation tool on each chapter unit in the binary chapter primary-slave relation set to obtain the word features of each chapter primary-slave relation, generating a binary chapter primary-slave relation word feature set. Then call a part-of-speech extraction tool on each chapter unit in that word feature set to obtain the part of speech of every word in each chapter unit, generating a binary chapter primary-slave relation word and part-of-speech feature set.
As shown in Fig. 4, the detailed process of S20 is as follows:
S201: call a word segmentation tool on each chapter unit in the binary chapter primary-slave relation set to obtain the word features of each chapter primary-slave relation, generating a binary chapter primary-slave relation word feature set. Punctuation marks are removed during segmentation. The format is as follows:
"word 1 word 2 word 3 ... | word 1 word 2 word 3 ...", "chapter primary-slave relation type"
Words are separated by spaces, and chapter units are separated by "|".
For example, "they think that investing working to Pudong New District has the art of composition, say rule, | interests can be protected.", "3" is represented after word segmentation as:
Example 7: " they think that investing working to Pudong New District has the art of composition to say rule | interests can be protected ", " 3 "
Example 7 has the form: "word set of chapter unit 1 | word set of chapter unit 2", "chapter primary-slave relation type".
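The word feature format of S201 can be sketched as below. The source does not name a segmentation tool, so the segmenter is passed in as a callable; `toy_segment` is a hypothetical stand-in that merely splits on word characters and is not a real Chinese word segmenter. The punctuation test based on Unicode categories is likewise an assumption of this sketch.

```python
import re
import unicodedata
from typing import Callable


def is_punct(token: str) -> bool:
    """True if every character of the token is Unicode punctuation."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def toy_segment(text: str) -> list[str]:
    """Hypothetical stand-in segmenter: word runs and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def word_features(relation: str, segment: Callable[[str], list[str]]) -> str:
    """Segment each chapter unit, drop punctuation, join words with spaces."""
    out = []
    for unit in relation.split("|"):
        words = [w for w in segment(unit) if w and not is_punct(w)]
        out.append(" ".join(words))
    return " | ".join(out)
```

In practice `segment` would wrap whichever Chinese word segmentation tool is available; only the output format (space-separated words, units separated by "|") is taken from the description above.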
S202: call a part-of-speech extraction tool on each chapter unit in the binary chapter primary-slave relation word feature set to obtain the part of speech of every word in each chapter unit, generating a binary chapter primary-slave relation word and part-of-speech feature set. The format is as follows:
"word 1 word 2 word 3 ... part-of-speech 1 part-of-speech 2 part-of-speech 3 ... | word 1 word 2 word 3 ... part-of-speech 1 part-of-speech 2 part-of-speech 3 ...", "chapter primary-slave relation type"
Words are separated by spaces, parts of speech are likewise separated by spaces, the words and the parts of speech are separated by " ", and chapter units are separated by "|". The words and parts of speech within each chapter unit correspond one to one.
For example, after part-of-speech feature extraction, the chapter units in Example 7 are represented as:
Example 8: "they think to Pudong New District invest working have the art of composition say rule r v v n v v v n v n | interests can be protected n v v v", "3"
Example 8 has the form: "word set of chapter unit 1 part-of-speech set of chapter unit 1 | word set of chapter unit 2 part-of-speech set of chapter unit 2", "chapter primary-slave relation type".
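A small sketch of how the word and part-of-speech format of S202 might be assembled from tagger output. The function name `pos_features` and the `(word, tag)` pair representation are hypothetical; the separator between the word list and the tag list is not legible in the source, so it is exposed as the parameter `sep`, with a space as an assumed default.

```python
def pos_features(tagged_units, rel_type, sep=" "):
    """Build '<words><sep><tags> | <words><sep><tags>' plus the relation type.

    tagged_units: one list of (word, tag) pairs per chapter unit, so words
    and tags correspond one to one by construction.
    sep: separator between words and tags (assumed; not given in the source).
    """
    parts = []
    for unit in tagged_units:
        words = " ".join(word for word, _ in unit)
        tags = " ".join(tag for _, tag in unit)
        parts.append(f"{words}{sep}{tags}")
    return " | ".join(parts), rel_type
```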
S30: apply the same method as S10 and S20 to each chapter unit of each document to be identified in the document set to be identified, obtaining a to-be-identified binary chapter primary-slave relation word and part-of-speech feature set. The document set to be identified is a document set that has not been annotated with chapter primary-slave relation types.
As shown in Fig. 5, the detailed process of S30 is as follows:
S301: taking the document set to be identified as input, invoke step S101 to generate a to-be-identified chapter primary-slave relation set. Each instance in this set has the following format:
" chapter unit 1 | chapter unit 2 | ... "
Take Example 9 as an example.
Example 9:
<Sentence="last year the province's total volume of imports and exports was nearly 20 billion dollars, | the disbursement of foreign capital was more than 4 billion dollars, | the overseas project contracting and labor service cooperation amount reached 350.5 million dollars.">
The chapter units in Example 9 are represented as:
Example 10:
"last year the province's total volume of imports and exports was nearly 20 billion dollars, | the disbursement of foreign capital was more than 4 billion dollars, | the overseas project contracting and labor service cooperation amount reached 350.5 million dollars."
S302: taking the to-be-identified chapter primary-slave relation set as input, invoke step S102 to generate a to-be-identified binary chapter primary-slave relation set.
Example 10 is an n-ary chapter primary-slave relation; after conversion it becomes:
Example 11:
"last year the province's total volume of imports and exports was nearly 20 billion dollars, | the disbursement of foreign capital was more than 4 billion dollars,"
"last year the province's total volume of imports and exports was nearly 20 billion dollars, the disbursement of foreign capital was more than 4 billion dollars, | the overseas project contracting and labor service cooperation amount reached 350.5 million dollars."
S303: taking the to-be-identified binary chapter primary-slave relation set as input, invoke step S201 to generate a to-be-identified binary chapter primary-slave relation word feature set.
For example, "last year the province's total volume of imports and exports was nearly 20 billion dollars, | the disbursement of foreign capital was more than 4 billion dollars," is represented after word segmentation as:
Example 12:
"last year the province's total volume of imports and exports was nearly 20 billion dollars | the disbursement of foreign capital was more than 4 billion dollars"
S304: taking the to-be-identified binary chapter primary-slave relation word feature set as input, invoke step S202 to generate a to-be-identified binary chapter primary-slave relation word and part-of-speech feature set.
For example, after part-of-speech feature extraction, the chapter units in Example 12 are represented as:
Example 13:
" the last year nearly 20,000,000,000 Mei Yuan t r n v n m q of province's total volume of imports and exports | the disbursement of foreign capital is more than four 1000000000 dollars n n v m q "
S40: construct each chapter unit of each chapter primary-slave relation in the binary chapter primary-slave relation word and part-of-speech feature set into an input form acceptable to a neural network, obtaining an annotated-document feature input set. Build a neural network classifier based on a gated memory network using a deep learning tool, and train a chapter primary-slave relation identification model with the annotated-document feature input set as input. Construct each chapter unit feature in the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set into the same input format as the annotated-document feature input set, obtaining a to-be-identified document feature input set; then feed this set to the model to identify the chapter primary-slave relations in the document set to be identified, obtaining the type of each chapter primary-slave relation in that document set and generating a to-be-identified document chapter primary-slave relation type set.
As shown in Fig. 6, the detailed process of S40 is as follows:
S401: construct each chapter unit of each chapter primary-slave relation in the binary chapter primary-slave relation word and part-of-speech feature set into an input form acceptable to a neural network, obtaining an annotated-document feature input set.
The annotated-document feature input set holds the word and part-of-speech features of each chapter unit of each chapter primary-slave relation.
The details are as follows:
The word and part of speech at each position of a chapter unit's word sequence are concatenated in the form Wi = [ei, pi], where ei denotes the word and pi denotes the part of speech. The result serves as the input of the neural network.
The features are initialized as vectors: words use word vectors pre-trained on the Chinese Wikipedia corpus, with 300 dimensions, and part-of-speech features are randomly initialized, with 50 dimensions.
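The concatenation Wi = [ei, pi] can be illustrated with plain Python lists. This is only a sketch under stated assumptions: `build_pos_table` and `unit_to_matrix` are hypothetical names, the uniform range used for random initialization and the zero vector for out-of-vocabulary words are choices of this sketch, and `word_vectors` stands in for the pre-trained 300-dimensional vectors.

```python
import random

WORD_DIM = 300  # dimension of the pre-trained word vectors
POS_DIM = 50    # dimension of the randomly initialized part-of-speech vectors


def build_pos_table(tags, rng):
    """Randomly initialize one POS_DIM vector per tag (range is an assumption)."""
    return {t: [rng.uniform(-0.05, 0.05) for _ in range(POS_DIM)] for t in tags}


def unit_to_matrix(words, tags, word_vectors, pos_table):
    """Return one row Wi = [ei, pi] (350 values) per word of a chapter unit."""
    if len(words) != len(tags):
        raise ValueError("words and parts of speech must correspond one to one")
    unknown = [0.0] * WORD_DIM  # assumed fallback for out-of-vocabulary words
    return [word_vectors.get(w, unknown) + pos_table[t] for w, t in zip(words, tags)]
```

Each row therefore has 300 + 50 = 350 values, and a chapter unit with k words becomes a k x 350 matrix.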
S402: build a neural network classifier based on a gated memory network using a deep learning tool (such as Keras), and train the chapter primary-slave relation identification model with the annotated-document feature input set as input.
The specific method is as follows:
(1) For the two chapter units in one binary chapter primary-slave relation, take their feature inputs as the input of the neural network.
(2) Call the long short-term memory (LSTM) network in the deep learning tool to encode the input of each chapter unit, obtaining a global information representation of each chapter unit. The LSTM dimension is set to 50.
(3) Call the convolutional neural network in the deep learning tool to encode the global information obtained for each chapter unit, and call the global max pooling tool in the deep learning tool to select the information representation of each chapter unit. The number of convolution kernels is set to 1024 and the convolution kernel window size is set to 2.
(4) Merge the information representations of the two chapter units to obtain the overall information: u = v1 ⊕ v2, where u denotes the overall information, v1 and v2 denote the information representations of the two chapter units, and ⊕ denotes element-wise addition.
(5) Compute the gate unit from the overall information: g = sigmoid(Wu + b), where g is the resulting gate unit, W is a parameter matrix, and b is a bias matrix; the sigmoid function is used as the threshold function of the neural network and maps variables into (0, 1).
(6) Apply the gate unit to the information representations v1 and v2 of the two chapter units: o1 = g ⊙ v1, o2 = g ⊙ v2, where o1 and o2 denote the memory information of the two chapter units and ⊙ denotes element-wise multiplication.
(7) Replace v1 and v2 in step (4) with the obtained o1 and o2 and repeat steps (4), (5) and (6) twice; then obtain the final overall information c from the resulting o1 and o2 using the method of step (4).
(8) Call the feedforward neural network in the deep learning tool with c as its input, and finally call the softmax function in the deep learning tool to classify the chapter primary-slave relation type, realizing the chapter primary-slave relation identification neural network model. The softmax function is the normalized exponential function: it compresses each element into a probability in (0, 1) such that all probabilities sum to 1, and the element with the maximum probability is taken as the chapter primary-slave relation type.
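The merge, gate and apply steps, and the final softmax, can be traced numerically with a dependency-free sketch. It is not the Keras model itself: the LSTM, convolution and feedforward layers are omitted, `v1` and `v2` are taken as given, and sharing the same W and b across rounds is an assumption. The default `rounds=3` reflects one reading of the text (one initial pass plus two repetitions).

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, mapping into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def add(a, b):
    """Element-wise addition (the merge u = v1 + v2)."""
    return [x + y for x, y in zip(a, b)]


def mul(a, b):
    """Element-wise multiplication (applying the gate)."""
    return [x * y for x, y in zip(a, b)]


def gate(u, W, b):
    """g = sigmoid(W u + b), with W a list of rows and b a list."""
    return [sigmoid(sum(w * x for w, x in zip(row, u)) + bi) for row, bi in zip(W, b)]


def gated_memory(v1, v2, W, b, rounds=3):
    """Run merge -> gate -> apply `rounds` times, then merge once more into c."""
    o1, o2 = v1, v2
    for _ in range(rounds):
        u = add(o1, o2)                  # overall information
        g = gate(u, W, b)                # gate unit
        o1, o2 = mul(g, o1), mul(g, o2)  # gated memory information
    return add(o1, o2)                   # final overall information c


def softmax(z):
    """Normalized exponential: outputs are positive and sum to 1."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]
```

With W and b all zero the gate is 0.5 everywhere, so three rounds scale each representation by 0.125; this makes the data flow easy to check by hand. The predicted relation type is the index of the largest softmax output.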
S403: construct each chapter unit feature in the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set into the same input format as the annotated-document feature input set, obtaining a to-be-identified document feature input set.
S404: call the Chinese chapter primary-slave relation identification model based on the gated memory neural network to identify the relation type of each chapter primary-slave relation, obtaining the type of each chapter primary-slave relation in the document set to be identified, and finally generate a to-be-identified document chapter primary-slave relation type set. Each instance in the to-be-identified document chapter primary-slave relation type set has the following format:
Example 14:
<Sentence="last year the province's total volume of imports and exports was nearly 20 billion dollars, | the disbursement of foreign capital was more than 4 billion dollars," Center="3"/>
In Example 14, Sentence holds the chapter units contained in the chapter primary-slave relation, which are identical to the chapter units in the document set to be identified, and Center holds the chapter primary-slave relation type that the model identified for this relation, where "1" denotes a "core-satellite" relation, "2" denotes a "satellite-core" relation, and "3" denotes a "core-core" relation.
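Reading such an output record back can be sketched as follows. The record is not well-formed XML (the tag has attributes but no element name), so this sketch uses a regular expression; the names `parse_record`, `RECORD` and `CENTER_LABELS` are hypothetical, and the pattern assumes the chapter units contain no double quotation marks.

```python
import re

# Label meanings as stated for the Center attribute.
CENTER_LABELS = {"1": "core-satellite", "2": "satellite-core", "3": "core-core"}

# Assumed record shape: <Sentence="unit | unit" Center="N"/>
RECORD = re.compile(r'<\s*Sentence="(?P<units>[^"]*)"\s+Center="(?P<center>[123])"\s*/>')


def parse_record(line):
    """Return (list of chapter units, relation type name) for one record."""
    match = RECORD.search(line)
    if match is None:
        raise ValueError("not a recognizable Sentence/Center record")
    units = [u.strip() for u in match.group("units").split("|")]
    return units, CENTER_LABELS[match.group("center")]
```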
A Chinese chapter primary-slave relation identification system, as shown in Fig. 7, comprises a chapter primary-slave relation chapter unit extraction module 10, a chapter primary-slave relation chapter unit preprocessing module 20, a to-be-identified chapter primary-slave relation chapter unit processing module 30, and a to-be-identified chapter primary-slave relation identification module 40.
The chapter primary-slave relation chapter unit extraction module 10 reads the chapter unit annotation information and the chapter primary-slave relation type annotation information in the annotated document set to obtain a chapter primary-slave relation set, and performs left subtree conversion on the non-binary chapter primary-slave relations in that set to obtain a binary chapter primary-slave relation set. The annotated document set is a document set annotated with chapter primary-slave relation types and is the basis for training the model of the present invention.
The chapter primary-slave relation chapter unit preprocessing module 20 calls a word segmentation tool on each chapter unit in the binary chapter primary-slave relation set to obtain the word features of each chapter primary-slave relation, generating a binary chapter primary-slave relation word feature set. It then calls a part-of-speech extraction tool on each chapter unit in that word feature set to obtain the part of speech of every word in each chapter unit, generating a binary chapter primary-slave relation word and part-of-speech feature set.
The to-be-identified chapter primary-slave relation chapter unit processing module 30 applies the same processing method as for the annotated document set to each chapter unit of each document to be identified in the document set to be identified, obtaining a to-be-identified binary chapter primary-slave relation word and part-of-speech feature set. The document set to be identified is a document set that has not been annotated with chapter primary-slave relation types.
The to-be-identified chapter primary-slave relation identification module 40 constructs each chapter unit of each chapter primary-slave relation in the binary chapter primary-slave relation word and part-of-speech feature set into an input form acceptable to a neural network, obtaining an annotated-document feature input set. It builds a neural network classifier based on a gated memory network using a deep learning tool and trains a chapter primary-slave relation identification model with the annotated-document feature input set as input. It constructs each chapter unit feature in the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set into the same input format as the annotated-document feature input set, obtaining a to-be-identified document feature input set, and then feeds this set to the model to identify the chapter primary-slave relations in the document set to be identified, obtaining the type of each chapter primary-slave relation in that document set and generating a to-be-identified document chapter primary-slave relation type set.
As shown in Fig. 8, the chapter primary-slave relation chapter unit extraction module 10 comprises a chapter unit and chapter primary-slave relation category reading unit 101 and a chapter primary-slave relation binarization conversion unit 102.
The chapter unit and chapter primary-slave relation category reading unit 101 reads, according to the labels, the chapter unit annotation information and the chapter primary-slave relation type annotation information in the annotated document set, obtaining a chapter primary-slave relation set.
The chapter primary-slave relation binarization conversion unit 102 converts, according to the chapter primary-slave relation set obtained, each non-binary chapter primary-slave relation into binary chapter primary-slave relations, obtaining a binary chapter primary-slave relation set.
Specifically, every chapter primary-slave relation in the set is checked to determine whether it is an n-ary relation. Because chapter units are separated by "|", each relation is split into chapter units on "|"; if the number of chapter units is greater than 2, the relation is n-ary and a left subtree conversion is applied. For a chapter primary-slave relation node with multiple children, the first and second child nodes (counting from the left) are first combined into a new chapter primary-slave relation node of the same type as the original parent node, and this new node becomes the first child of the original parent. The process is iterated until the rightmost child of the original parent is reached. In this way, an n-ary chapter primary-slave relation containing n chapter units yields n-1 binary chapter primary-slave relations, and the conversion produces a binary chapter primary-slave relation set.
As shown in Fig. 9, the chapter primary-slave relation chapter unit preprocessing module 20 comprises a chapter primary-slave relation chapter unit word segmentation unit 201 and a chapter primary-slave relation chapter unit part-of-speech extraction unit 202.
The chapter primary-slave relation chapter unit word segmentation unit 201 calls a word segmentation tool on each chapter unit in the binary chapter primary-slave relation set to obtain the word features of each chapter primary-slave relation, generating a binary chapter primary-slave relation word feature set. Punctuation marks are removed during segmentation.
The chapter primary-slave relation chapter unit part-of-speech extraction unit 202 calls a part-of-speech extraction tool on each chapter unit in the binary chapter primary-slave relation word feature set to obtain the part of speech of every word in each chapter unit, generating a binary chapter primary-slave relation word and part-of-speech feature set.
As shown in Fig. 10, the to-be-identified chapter primary-slave relation chapter unit processing module 30 comprises a to-be-identified chapter unit and chapter primary-slave relation category reading unit 301, a to-be-identified chapter primary-slave relation binarization conversion unit 302, a to-be-identified chapter primary-slave relation chapter unit word segmentation unit 303, and a to-be-identified chapter primary-slave relation chapter unit part-of-speech extraction unit 304.
The to-be-identified chapter unit and chapter primary-slave relation category reading unit 301 takes the document set to be identified as input and, using the same processing method as for the annotated document set, generates a to-be-identified chapter primary-slave relation set.
The to-be-identified chapter primary-slave relation binarization conversion unit 302 takes the to-be-identified chapter primary-slave relation set as input and, using the same processing method as for the chapter primary-slave relation set, generates a to-be-identified binary chapter primary-slave relation set.
The to-be-identified chapter primary-slave relation chapter unit word segmentation unit 303 takes the to-be-identified binary chapter primary-slave relation set as input and, using the same processing method as for the binary chapter primary-slave relation set, generates a to-be-identified binary chapter primary-slave relation word feature set.
The to-be-identified chapter primary-slave relation chapter unit part-of-speech extraction unit 304 takes the to-be-identified binary chapter primary-slave relation word feature set as input and, using the same processing method as for the binary chapter primary-slave relation word feature set, generates a to-be-identified binary chapter primary-slave relation word and part-of-speech feature set.
As shown in Fig. 11, the to-be-identified chapter primary-slave relation identification module 40 comprises an annotated-document feature input set construction unit 401, a model training unit 402, a to-be-identified document feature input set construction unit 403, and a to-be-identified chapter primary-slave relation identification unit 404.
The annotated-document feature input set construction unit 401 constructs each chapter unit of each chapter primary-slave relation in the binary chapter primary-slave relation word and part-of-speech feature set into an input form acceptable to a neural network, obtaining an annotated-document feature input set.
The word and part-of-speech features of each chapter unit of each chapter primary-slave relation in the annotated-document feature input set are constructed as follows:
The word and part of speech at each position of a chapter unit's word sequence are concatenated in the form Wi = [ei, pi], where ei denotes the word and pi denotes the part of speech. The result serves as the input of the neural network.
The features are initialized as vectors: words use word vectors pre-trained on the Chinese Wikipedia corpus, with 300 dimensions, and part-of-speech features are randomly initialized, with 50 dimensions.
The model training unit 402 builds a neural network classifier based on a gated memory network using a deep learning tool (such as Keras), and trains the chapter primary-slave relation identification model with the annotated-document feature input set as input.
The specific method is as follows:
(1) For the two chapter units in one binary chapter primary-slave relation, take their feature inputs as the input of the neural network.
(2) Call the long short-term memory (LSTM) network in the deep learning tool to encode the input of each chapter unit, obtaining a global information representation of each chapter unit. The LSTM dimension is set to 50.
(3) Call the convolutional neural network in the deep learning tool to encode the global information obtained for each chapter unit, and call the global max pooling tool in the deep learning tool to select the information representation of each chapter unit. The number of convolution kernels is set to 1024 and the convolution kernel window size is set to 2.
(4) Merge the information representations of the two chapter units to obtain the overall information: u = v1 ⊕ v2, where u denotes the overall information, v1 and v2 denote the information representations of the two chapter units, and ⊕ denotes element-wise addition.
(5) Compute the gate unit from the overall information: g = sigmoid(Wu + b), where g is the resulting gate unit, W is a parameter matrix, and b is a bias matrix; the sigmoid function is used as the threshold function of the neural network and maps variables into (0, 1).
(6) Apply the gate unit to the information representations v1 and v2 of the two chapter units: o1 = g ⊙ v1, o2 = g ⊙ v2, where o1 and o2 denote the memory information of the two chapter units and ⊙ denotes element-wise multiplication.
(7) Replace v1 and v2 in step (4) with the obtained o1 and o2 and repeat steps (4), (5) and (6) twice; then obtain the final overall information c from the resulting o1 and o2 using the method of step (4).
(8) Call the feedforward neural network in the deep learning tool with c as its input, and finally call the softmax function in the deep learning tool to classify the chapter primary-slave relation type, realizing the chapter primary-slave relation identification neural network model. The softmax function is the normalized exponential function: it compresses each element into a probability in (0, 1) such that all probabilities sum to 1, and the element with the maximum probability is taken as the chapter primary-slave relation type.
The to-be-identified document feature input set construction unit 403 constructs each chapter unit feature in the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set into the same input format as the annotated-document feature input set, obtaining a to-be-identified document feature input set.
The to-be-identified chapter primary-slave relation identification unit 404 calls the Chinese chapter primary-slave relation identification model based on the gated memory neural network to identify the relation type of each chapter primary-slave relation, obtaining the type of each chapter primary-slave relation in the document set to be identified, and finally generates a to-be-identified document chapter primary-slave relation type set.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments has been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments described above express only several implementations of the present invention, and although their description is relatively specific and detailed, they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A Chinese chapter primary-slave relation recognition method, characterized by comprising:
reading the chapter unit annotation information and the chapter primary-slave relation type annotation information in an annotated document set to obtain a chapter primary-slave relation set, and performing left subtree conversion on the non-binary chapter primary-slave relations in the chapter primary-slave relation set to obtain a binary chapter primary-slave relation set; wherein the annotated document set is a document set annotated with chapter primary-slave relation types;
calling a word segmentation tool on each chapter unit in the binary chapter primary-slave relation set to obtain the word features of each chapter primary-slave relation, generating a binary chapter primary-slave relation word feature set; calling a part-of-speech extraction tool on each chapter unit in the binary chapter primary-slave relation word feature set to obtain the part of speech of every word in each chapter unit, generating a binary chapter primary-slave relation word and part-of-speech feature set;
applying the same processing method as for the annotated document set to each chapter unit of each document to be identified in a document set to be identified, obtaining a to-be-identified binary chapter primary-slave relation word and part-of-speech feature set; wherein the document set to be identified is a document set not annotated with chapter primary-slave relation types;
constructing each chapter unit of each chapter primary-slave relation in the binary chapter primary-slave relation word and part-of-speech feature set into an input form acceptable to a neural network, obtaining an annotated-document feature input set; building a neural network classifier based on a gated memory network using a deep learning tool, and training a chapter primary-slave relation identification model with the annotated-document feature input set as input; constructing each chapter unit feature in the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set into the same input format as the annotated-document feature input set, obtaining a to-be-identified document feature input set; and then feeding the to-be-identified document feature input set to the model to identify the chapter primary-slave relations in the document set to be identified, obtaining the type of each chapter primary-slave relation in the document set to be identified and generating a to-be-identified document chapter primary-slave relation type set.
2. The Chinese chapter primary-slave relation recognition method according to claim 1, characterized in that "reading the chapter unit annotation information and the chapter primary-slave relation type annotation information in an annotated document set to obtain a chapter primary-slave relation set, and performing left subtree conversion on the non-binary chapter primary-slave relations in the chapter primary-slave relation set to obtain a binary chapter primary-slave relation set; wherein the annotated document set is a document set annotated with chapter primary-slave relation types;" specifically comprises:
reading, according to the labels, the chapter unit annotation information and the chapter primary-slave relation type annotation information in the annotated document set to obtain the chapter primary-slave relation set;
converting, according to the chapter primary-slave relation set obtained, each non-binary chapter primary-slave relation into binary chapter primary-slave relations to obtain the binary chapter primary-slave relation set.
3. The Chinese chapter primary-slave relation recognition method according to claim 1, characterized in that the step of "calling a word segmentation tool on each chapter unit in the binary chapter primary-slave relation set to obtain the word features of each chapter primary-slave relation and generate a binary chapter primary-slave relation word feature set; and calling a part-of-speech extraction tool on each chapter unit in the binary chapter primary-slave relation word feature set to obtain the part-of-speech features of all words of each chapter unit and generate a binary chapter primary-slave relation word and part-of-speech feature set" specifically comprises:
calling the word segmentation tool on each chapter unit in the binary chapter primary-slave relation set to obtain the word features of each chapter primary-slave relation and generate the binary chapter primary-slave relation word feature set, wherein punctuation marks are removed during word segmentation;
calling the part-of-speech extraction tool on each chapter unit in the binary chapter primary-slave relation word feature set to obtain the part-of-speech features of all words of each chapter unit and generate the binary chapter primary-slave relation word and part-of-speech feature set.
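The two preprocessing steps of claim 3 can be sketched as follows. The claim does not name the segmentation or part-of-speech tools, so this sketch substitutes a toy forward-maximum-matching segmenter and a lookup tagger over a hypothetical lexicon (`LEXICON`, with made-up tags); a real system would call an existing Chinese segmenter and tagger instead. What it does show faithfully is the order of operations: segment, drop punctuation, then tag every word.

```python
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping words to part-of-speech tags.
LEXICON: Dict[str, str] = {
    "我们": "r",
    "喜欢": "v",
    "自然": "n",
    "语言": "n",
    "自然语言": "n",
    "处理": "v",
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def is_punct(ch: str) -> bool:
    """True for any Unicode punctuation character (ASCII or full-width)."""
    return unicodedata.category(ch).startswith("P")


def segment(text: str) -> List[str]:
    """Forward maximum matching over LEXICON, dropping punctuation and spaces."""
    words: List[str] = []
    i = 0
    while i < len(text):
        if is_punct(text[i]) or text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in LEXICON:
                words.append(candidate)
                i += size
                break
    return words


def tag(words: List[str]) -> List[Tuple[str, str]]:
    """Attach a tag to each word; unknown words get the placeholder tag 'x'."""
    return [(word, LEXICON.get(word, "x")) for word in words]


unit = "我们喜欢自然语言处理。"
words = segment(unit)
print(words)       # ['我们', '喜欢', '自然语言', '处理']
print(tag(words))  # [('我们', 'r'), ('喜欢', 'v'), ('自然语言', 'n'), ('处理', 'v')]
```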
4. The Chinese chapter primary-slave relation recognition method according to claim 1, characterized in that the step of "applying, to each chapter unit of each to-be-identified document in a to-be-identified document collection, the same processing method as used for the annotated document collection to obtain a to-be-identified binary chapter primary-slave relation word and part-of-speech feature set, wherein the to-be-identified document collection is a collection of documents not annotated with chapter primary-slave relation types" specifically comprises:
taking the to-be-identified document collection as input and applying the same processing method as used for the annotated document collection to generate a to-be-identified chapter primary-slave relation set;
taking the to-be-identified chapter primary-slave relation set as input and applying the same processing method as used for the chapter primary-slave relation set to generate a to-be-identified binary chapter primary-slave relation set;
taking the to-be-identified binary chapter primary-slave relation set as input and applying the same processing method as used for the binary chapter primary-slave relation set to generate a to-be-identified binary chapter primary-slave relation word feature set;
taking the to-be-identified binary chapter primary-slave relation word feature set as input and applying the same processing method as used for the binary chapter primary-slave relation word feature set to generate the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set.
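Claim 4 amounts to reusing one ordered pipeline for both annotated and to-be-identified documents. The sketch below shows that idea only: the stage functions are placeholders (whitespace splitting stands in for real segmentation, and every tag is `"x"`), and the dictionary layout and the label `"left-primary"` are hypothetical, not taken from the patent.

```python
from typing import Callable, Dict, Iterable, List

Relation = Dict[str, object]
Stage = Callable[[List[Relation]], List[Relation]]


def run_pipeline(relations: Iterable[Relation], stages: List[Stage]) -> List[Relation]:
    """Apply the same ordered stages to any relation set, labeled or not."""
    result = list(relations)
    for stage in stages:
        result = stage(result)
    return result


# Placeholder stages (hypothetical); real ones would binarize, segment and tag.
def binarize(rels: List[Relation]) -> List[Relation]:
    return [dict(r, binary=True) for r in rels]


def add_words(rels: List[Relation]) -> List[Relation]:
    return [dict(r, words=[unit.split() for unit in r["units"]]) for r in rels]


def add_pos(rels: List[Relation]) -> List[Relation]:
    return [dict(r, pos=[["x"] * len(ws) for ws in r["words"]]) for r in rels]


STAGES: List[Stage] = [binarize, add_words, add_pos]

# Annotated input carries a relation type; to-be-identified input does not.
labeled = [{"units": ["a b", "c"], "label": "left-primary"}]
unlabeled = [{"units": ["d e f", "g h"], "label": None}]

train_features = run_pipeline(labeled, STAGES)
test_features = run_pipeline(unlabeled, STAGES)
print(test_features[0]["words"])  # [['d', 'e', 'f'], ['g', 'h']]
```

Keeping the stage list in one place is what guarantees that training and recognition inputs end up with the same fields and format.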
5. The Chinese chapter primary-slave relation recognition method according to claim 1, characterized in that the step of "constructing each chapter unit of each chapter primary-slave relation in the binary chapter primary-slave relation word and part-of-speech feature set into an input form acceptable to a neural network to obtain an annotated document feature input set; building a neural network classifier based on a gated memory network using a deep learning tool, and training a chapter primary-slave relation recognition model with the annotated document feature input set as input; constructing each chapter unit feature in the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set into the same input format as the annotated document feature input set to obtain a to-be-identified document feature input set; and then using the to-be-identified document feature input set as the model input to identify the chapter primary-slave relations in the to-be-identified document collection, obtaining the type of each chapter primary-slave relation in the to-be-identified document collection and generating a to-be-identified document chapter primary-slave relation type set" specifically comprises:
constructing each chapter unit of each chapter primary-slave relation in the binary chapter primary-slave relation word and part-of-speech feature set into an input form acceptable to a neural network to obtain the annotated document feature input set;
obtaining the word and part-of-speech features of each chapter unit of each chapter primary-slave relation in the annotated document feature input set;
building the neural network classifier based on a gated memory network using a deep learning tool, and training the chapter primary-slave relation recognition model with the annotated document feature input set as input;
constructing each chapter unit feature in the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set into the same input format as the annotated document feature input set to obtain the to-be-identified document feature input set;
calling the Chinese chapter primary-slave relation recognition model based on the gated memory neural network to identify the relation type of each chapter primary-slave relation, obtaining the type of each chapter primary-slave relation in the to-be-identified document collection, and finally generating the to-be-identified document chapter primary-slave relation type set.
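Claim 5 requires the word and part-of-speech features to be put into "an input form acceptable to a neural network" without saying what that form is. One common choice, assumed here, is fixed-length sequences of integer ids with padding. The reserved ids `PAD` and `UNK`, the function names, and the sample words are all hypothetical.

```python
from typing import Dict, List, Sequence

PAD, UNK = 0, 1  # reserved ids for padding and unknown tokens (assumed convention)


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign ids starting at 2, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for token in seq:
            if token not in vocab:
                vocab[token] = len(vocab) + 2
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(token, UNK) for token in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# The vocabulary is built from annotated data only (hypothetical sample).
train_words = [["我们", "喜欢", "语言"], ["语言", "处理"]]
word_vocab = build_vocab(train_words)

# A to-be-identified unit reuses the same vocabulary and length, so its
# encoding has the same format; "研究" is unseen and maps to UNK.
print(encode(["语言", "研究", "处理"], word_vocab, max_len=5))  # [4, 1, 5, 0, 0]
```

The same two functions can be applied to part-of-speech tag sequences with a separate vocabulary, giving one id sequence per feature type for each chapter unit.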
6. A Chinese chapter primary-slave relation recognition system, characterized by comprising:
a chapter primary-slave relation chapter unit extraction module, which reads the chapter unit annotation information and the chapter primary-slave relation type annotation information in an annotated document collection to obtain a chapter primary-slave relation set, and performs left-subtree conversion on the non-binary chapter primary-slave relations in the chapter primary-slave relation set to obtain a binary chapter primary-slave relation set, wherein the annotated document collection is a collection of documents annotated with chapter primary-slave relation types;
a chapter primary-slave relation chapter unit preprocessing module, which calls a word segmentation tool on each chapter unit in the binary chapter primary-slave relation set to obtain the word features of each chapter primary-slave relation and generate a binary chapter primary-slave relation word feature set, and calls a part-of-speech extraction tool on each chapter unit in the binary chapter primary-slave relation word feature set to obtain the part-of-speech features of all words of each chapter unit and generate a binary chapter primary-slave relation word and part-of-speech feature set;
a to-be-identified chapter primary-slave relation chapter unit processing module, which applies, to each chapter unit of each to-be-identified document in a to-be-identified document collection, the same processing method as used for the annotated document collection to obtain a to-be-identified binary chapter primary-slave relation word and part-of-speech feature set, wherein the to-be-identified document collection is a collection of documents not annotated with chapter primary-slave relation types;
a to-be-identified chapter primary-slave relation recognition module, which constructs each chapter unit of each chapter primary-slave relation in the binary chapter primary-slave relation word and part-of-speech feature set into an input form acceptable to a neural network to obtain an annotated document feature input set; builds a neural network classifier based on a gated memory network using a deep learning tool and trains a chapter primary-slave relation recognition model with the annotated document feature input set as input; constructs each chapter unit feature in the to-be-identified binary chapter primary-slave relation word and part-of-speech feature set into the same input format as the annotated document feature input set to obtain a to-be-identified document feature input set; and then uses the to-be-identified document feature input set as the model input to identify the chapter primary-slave relations in the to-be-identified document collection, obtaining the type of each chapter primary-slave relation in the to-be-identified document collection and generating a to-be-identified document chapter primary-slave relation type set.
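The recognition module relies on a classifier "based on a gated memory network". The claims give no architecture, so the sketch below assumes an LSTM-style cell with input, forget and output gates and shows only an untrained forward pass over a pair of chapter units. The class `GatedCell`, the layer sizes, the random weights, and the choice of three output types are all assumptions made for illustration; a real system would use a deep learning framework and train the weights.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Compute W @ x + b with plain lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores: Vector) -> Vector:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class GatedCell:
    """Single LSTM-style cell with input (i), forget (f) and output (o) gates."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        names = ("i", "f", "o", "g")  # three gates plus the candidate memory g
        self.w = {n: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                      for _ in range(hidden_size)] for n in names}
        self.b = {n: [0.0] * hidden_size for n in names}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in affine(self.w["i"], self.b["i"], z)]
        f = [sigmoid(v) for v in affine(self.w["f"], self.b["f"], z)]
        o = [sigmoid(v) for v in affine(self.w["o"], self.b["o"], z)]
        g = [math.tanh(v) for v in affine(self.w["g"], self.b["g"], z)]
        # The forget gate scales the old memory; the input gate admits new content.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def encode(self, sequence: List[Vector]) -> Vector:
        """Run the cell over a sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


# Two chapter units given as sequences of 2-dimensional feature vectors (toy data).
cell = GatedCell(input_size=2, hidden_size=4)
unit_a = [[0.1, 0.2], [0.3, 0.4]]
unit_b = [[0.5, 0.6]]
pair = cell.encode(unit_a) + cell.encode(unit_b)

# Untrained linear output layer over an assumed set of three relation types.
rng = random.Random(1)
out_w = [[rng.uniform(-0.1, 0.1) for _ in pair] for _ in range(3)]
probs = softmax(affine(out_w, [0.0] * 3, pair))
predicted_type = probs.index(max(probs))
```

Because the weights are random, `probs` carries no meaning here; the point is the data flow from two encoded chapter units to a distribution over relation types.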
7. The Chinese chapter primary-slave relation recognition system according to claim 1, characterized in that the chapter primary-slave relation chapter unit extraction module comprises:
a chapter unit and chapter primary-slave relation type reading unit, which reads, according to the tags, the chapter unit annotation information and the chapter primary-slave relation type annotation information in the annotated document collection to obtain the chapter primary-slave relation set;
a chapter primary-slave relation binarization conversion unit, which converts, according to the obtained chapter primary-slave relation set, the non-binary chapter primary-slave relations into binary chapter primary-slave relations to obtain the binary chapter primary-slave relation set.
8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method of any one of claims 1 to 5.
9. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the method of any one of claims 1 to 5.
CN201811168250.0A 2018-10-08 2018-10-08 Chinese chapter primary-slave relation recognition methods and system Pending CN109299442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811168250.0A CN109299442A (en) 2018-10-08 2018-10-08 Chinese chapter primary-slave relation recognition methods and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811168250.0A CN109299442A (en) 2018-10-08 2018-10-08 Chinese chapter primary-slave relation recognition methods and system

Publications (1)

Publication Number Publication Date
CN109299442A true CN109299442A (en) 2019-02-01

Family

ID=65161857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811168250.0A Pending CN109299442A (en) 2018-10-08 2018-10-08 Chinese chapter primary-slave relation recognition methods and system

Country Status (1)

Country Link
CN (1) CN109299442A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541337A (en) * 2020-12-16 2021-03-23 格美安(北京)信息技术有限公司 Document template automatic generation method and system based on recurrent neural network language model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105955956A (en) * 2016-05-05 2016-09-21 中国科学院自动化研究所 Chinese implicit discourse relation identification method

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
严为绒等: "篇章关系分析研究综述", 《中文信息学报》 *
严为绒等: "篇章关系分析研究综述", 《中文信息学报》, no. 04, 15 July 2016 (2016-07-15) *
奚雪峰 等: "汉语篇章微观话题结构建模与语料库构建", 《计算机研究与发展》 *
奚雪峰 等: "汉语篇章微观话题结构建模与语料库构建", 《计算机研究与发展》, 15 August 2017 (2017-08-15) *
李艳翠: "汉语篇章结构表示体系及资源构建研究", 《中国博士学位论文全文数据库信息科技辑》, 15 June 2016 (2016-06-15), pages 25 - 101 *
苏新宁 等: "数据挖掘理论与技术", 北京:科学技术文献出版社, pages: 42 - 44 *
蒋峰 等: "基于主题相似度的宏观篇章主次关系识别方法", 中文信息学报, no. 01, pages 47 - 54 *
褚晓敏 等: "自然语言处理中的篇章主次关系研究", 计算机学报, no. 04, pages 72 - 90 *

Similar Documents

Publication Publication Date Title
CN111581396B (en) Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN107229610B (en) A kind of analysis method and device of affection data
CN110427623A (en) Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium
CN111325029B (en) Text similarity calculation method based on deep learning integrated model
CN108628828A (en) A kind of joint abstracting method of viewpoint and its holder based on from attention
CN112559766A (en) Legal knowledge map construction system
CN115329088B (en) Robustness analysis method of graph neural network event detection model
CN113312480A (en) Scientific and technological thesis level multi-label classification method and device based on graph convolution network
CN115688776A (en) Relation extraction method for Chinese financial text
CN116245107B (en) Electric power audit text entity identification method, device, equipment and storage medium
CN113255321A (en) Financial field chapter-level event extraction method based on article entity word dependency relationship
CN116383399A (en) Event public opinion risk prediction method and system
CN113869055A (en) Power grid project characteristic attribute identification method based on deep learning
CN113901813A (en) Event extraction method based on topic features and implicit sentence structure
CN113869054A (en) Deep learning-based electric power field project feature identification method
Kanev et al. Metagraph knowledge base and natural language processing pipeline for event extraction and time concept analysis
CN117473054A (en) Knowledge graph-based general intelligent question-answering method and device
CN112163069A (en) Text classification method based on graph neural network node feature propagation optimization
CN109299442A (en) Chinese chapter primary-slave relation recognition methods and system
CN114065770B (en) Method and system for constructing semantic knowledge base based on graph neural network
CN116910190A (en) Method, device and equipment for acquiring multi-task perception model and readable storage medium
Yang et al. A general solution and practice for automatically constructing domain knowledge graph
CN114330350A (en) Named entity identification method and device, electronic equipment and storage medium
CN113435190A (en) Chapter relation extraction method integrating multilevel information extraction and noise reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination