CN106815207A - Information processing method and device for legal judgment documents - Google Patents

Information processing method and device for legal judgment documents

Info

Publication number
CN106815207A
Authority
CN
China
Prior art keywords
content
text
preset rules
target
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510869588.9A
Other languages
Chinese (zh)
Other versions
CN106815207B (en)
Inventor
胡斌
杜宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510869588.9A
Publication of CN106815207A
Application granted
Publication of CN106815207B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses an information processing method and device for legal judgment documents. The method includes: obtaining target text content of a legal judgment document; detecting whether the target text content contains text content that matches a first preset rule; if such text content is detected, extracting at least one target keyword from the matching text content according to the first preset rule; and storing the at least one target keyword in the same keyword set. The application solves the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords.

Description

Information processing method and device for legal judgment documents
Technical field
The application relates to the field of text processing, and in particular to an information processing method and device for legal judgment documents.
Background
When handling a case, criminal law practitioners often need to weigh the charge involved, the type of penalty, the length of the sentence, the legal basis of the judgment, and so on, using data from actual practice as a working reference. Such reference data usually comes from the large number of cases that the people's courts have decided and published, and is the result of big data analysis and statistics on those cases.
In the related art, when big data analysis and statistics are performed on cases, the legal judgment documents of all the cases are traversed on the fly to obtain the keywords they contain. Criminal cases tried and decided by the people's courts involve many items of penalty information, a large amount of information, complex content, and varied wording, and the laws on which judgments are based are also numerous. For example, there are many kinds of criminal charges, and both the type of penalty and the length of the sentence differ from charge to charge. Consequently, when this approach is used to query a data set of legal judgment documents, the full text is searched word by word, which puts heavy load on the server and takes a long time. In addition, the results found on the fly cannot reflect the correlation between keywords (for example, the correlation between items of penalty information), which hinders big data statistical analysis.
No effective solution has yet been proposed for the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords.
Summary of the invention
The embodiments of the application provide an information processing method and device for legal judgment documents, to at least solve the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords.
According to one aspect of the embodiments of the application, an information processing method for legal judgment documents is provided. The method includes: obtaining target text content of a legal judgment document; detecting whether the target text content contains text content that matches a first preset rule; if it is detected that the target text content contains text content that matches the first preset rule, extracting at least one target keyword from the matching text content according to the first preset rule; and storing the at least one target keyword in the same keyword set.
Further, detecting whether the target text content contains text content that matches the first preset rule includes: judging whether the target text content satisfies the following condition: at least one first preset feature keyword is present, and the at least one first preset feature keyword is located at a preset position. If it is judged that the target text content contains at least one first preset feature keyword located at the preset position, it is determined that the target text content contains text content that matches the first preset rule.
Further, the first preset rule includes multiple preset sub-rules. Detecting whether the target text content contains text content that matches the first preset rule includes: detecting in turn whether the target text content contains text content that matches each of the multiple preset sub-rules; and taking the first preset sub-rule for which matching text content is detected as the target sub-rule. Extracting at least one target keyword from the matching text content according to the first preset rule includes: extracting at least one target keyword from the target text content according to the target sub-rule.
Further, when it is detected that the target text content contains text content that matches the first preset rule, and before the at least one target keyword is extracted from that text content according to the first preset rule, the method also includes: detecting whether the text content that matches the first preset rule also matches a second preset rule. If it does, the at least one target keyword is extracted from the text content that matches the first preset rule according to the first preset rule.
Further, detecting whether the text content that matches the first preset rule also matches the second preset rule includes: detecting whether the part of speech of each second preset feature keyword in that text content is a preset part of speech, where the second preset feature keywords are the keywords obtained by splitting the text content that matches the first preset rule according to a third preset rule. If the part of speech of the second preset feature keywords is detected to be the preset part of speech, it is determined that the text content that matches the first preset rule also matches the second preset rule.
Further, before the at least one target keyword is stored in the same keyword set, the method also includes: converting any number in non-Arabic numeral form in the at least one target keyword into Arabic numeral form, where the number converted into Arabic numeral form is stored in the same keyword set.
According to another aspect of the embodiments of the application, an information processing device for legal judgment documents is also provided. The device includes: an acquiring unit, for obtaining target text content of a legal judgment document; a first detection unit, for detecting whether the target text content contains text content that matches a first preset rule; an extraction unit, for extracting at least one target keyword from the matching text content according to the first preset rule if it is detected that the target text content contains text content that matches the first preset rule; and a storage unit, for storing the at least one target keyword in the same keyword set.
Further, the first detection unit includes: a judging module, for judging whether the target text content satisfies the following condition: at least one first preset feature keyword is present, and the at least one first preset feature keyword is located at a preset position. If it is judged that the target text content contains at least one first preset feature keyword located at the preset position, it is determined that the target text content contains text content that matches the first preset rule.
Further, the first preset rule includes multiple preset sub-rules, and the first detection unit includes: a detection module, for detecting in turn whether the target text content contains text content that matches each of the multiple preset sub-rules; and a determining module, for taking the first preset sub-rule for which matching text content is detected as the target sub-rule. The extraction unit includes: an extraction module, for extracting at least one target keyword from the target text content according to the target sub-rule.
Further, the device also includes: a second detection unit, for detecting whether the text content that matches the first preset rule also matches a second preset rule. If it does, the at least one target keyword is extracted from the text content that matches the first preset rule according to the first preset rule.
In the embodiments of the application, the method comprises the following steps: obtaining target text content of a legal judgment document; detecting whether the target text content contains text content that matches a first preset rule; if so, extracting at least one target keyword from the matching text content according to the first preset rule; and storing the at least one target keyword in the same keyword set. This solves the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords. Because the target keywords extracted from one piece of matching text content are stored in the same keyword set, the keyword set finally obtained represents a class of correlated keywords in the legal judgment document, which achieves the technical effect of extracting correlated keywords from legal judgment documents.
Brief description of the drawings
The drawings described here are provided for a further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the information processing method for legal judgment documents according to a first embodiment of the application;
Fig. 2 is a flowchart of the information processing method for legal judgment documents according to a second embodiment of the application; and
Fig. 3 is a schematic diagram of the information processing device for legal judgment documents according to an embodiment of the application.
Detailed description
To help those skilled in the art better understand the solution of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the application, without creative effort, fall within the scope of protection of the application.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings of the application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to the embodiments of the application, a method embodiment of an information processing method for legal judgment documents is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of the information processing method for legal judgment documents according to the first embodiment of the application. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain the target text content of the legal judgment document.
A legal judgment document usually contains multiple natural paragraphs, each of which has a particular semantic orientation, and several paragraphs may share the same orientation. Before information processing is performed, the document can be divided by semantic orientation, and text content with the same orientation can be processed as target text content. For example, a legal judgment document contains one or more natural paragraphs describing the plaintiff, the defendant, the penalty, and so on. The natural paragraphs describing plaintiff content (or defendant content, or penalty content) can be collected as target text content. Target text content obtained this way represents everything the document says on a given topic, so information extracted from it is more complete and accurate.
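The grouping of paragraphs into target text content could look roughly like the following sketch. The topic names, the cue words in `TOPIC_CUES`, and the function `group_by_topic` are all hypothetical; the patent does not say how semantic orientation is determined, and a real system would work on Chinese text.

```python
from collections import defaultdict

# Hypothetical English cue words for each topic (assumption, not from the patent).
TOPIC_CUES = {
    "plaintiff": ("plaintiff",),
    "defendant": ("defendant",),
    "penalty": ("sentenced", "imprisonment"),
}

def group_by_topic(paragraphs):
    """Concatenate paragraphs that share a topic into one target text per topic."""
    grouped = defaultdict(list)
    for para in paragraphs:
        lowered = para.lower()
        for topic, cues in TOPIC_CUES.items():
            if any(cue in lowered for cue in cues):
                grouped[topic].append(para)
                break  # assign each paragraph to the first topic that matches
    return {topic: "\n".join(paras) for topic, paras in grouped.items()}

paragraphs = [
    "The plaintiff Li claims damages.",
    "The plaintiff further states the facts.",
    "Wang is sentenced to imprisonment.",
]
targets = group_by_topic(paragraphs)
```

Here `targets["plaintiff"]` holds both plaintiff paragraphs joined together, which is the "target text content" that later steps would match rules against.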
Step S104: detect whether the target text content contains text content that matches the first preset rule.
The first preset rule is a preset rule for performing text matching; it may be a regular expression or another kind of rule. The first preset rule is a matching rule set according to user requirements, and different first preset rules can be set for different types of target text content. For example, if the target text content represents plaintiff content and the user wants to obtain the plaintiff's name and the offence concerned, the first preset rule can be set to: .* commits .* crime. By traversing the full target text content, it can be detected whether the target text content contains text content that matches the first preset rule.
Step S106: if it is detected that the target text content contains text content that matches the first preset rule, extract at least one target keyword from the matching text content according to the first preset rule.
If it is detected that the target text content contains text content that matches the first preset rule, information is extracted from that content according to the first preset rule. For example, the target text content represents penalty content, and the first preset rule is: .* commits .* crime, sentenced .* penalty. If the target text content contains "Wang commits the crime of theft, sentenced to fixed-term imprisonment ...", extracting information from that content according to the first preset rule yields the target keywords: Wang, theft, fixed-term imprisonment.
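As an illustration of steps S104 and S106, the sketch below expresses a first preset rule as a regular expression with named groups. The pattern `FIRST_RULE`, its English wording, and the function `extract_keywords` are assumptions made for this example; the rules in the patent operate on Chinese text and their exact form is not given.

```python
import re

# Hypothetical English rendering of a rule of the form
# ".* commits .* crime, sentenced .* penalty".
FIRST_RULE = re.compile(
    r"(?P<name>\w+) commits the crime of (?P<charge>\w+), "
    r"sentenced to (?P<penalty>[\w\- ]+)"
)

def extract_keywords(target_text):
    """Return the target keywords if the rule matches, otherwise None."""
    match = FIRST_RULE.search(target_text)   # step S104: detect a match
    if match is None:
        return None
    return match.groupdict()                 # step S106: extract keywords

keywords = extract_keywords(
    "Wang commits the crime of theft, sentenced to fixed-term imprisonment."
)
```

Named groups keep each extracted keyword attached to its role (name, charge, penalty), which makes it straightforward to store the keywords of one match together.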
Step S108: store the at least one target keyword in the same keyword set.
Target keywords obtained in this way are usually logically related. For example, of the target keywords "Wang" and "theft" extracted above, "Wang" is the defendant's name and "theft" is the crime he committed. Storing logically related target keywords in the same keyword set therefore makes it easy, during later statistics and analysis, to obtain a group of correlated keywords (a keyword set) and to look up the other target keywords related to any one of them.
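A minimal in-memory sketch of such storage follows. The class `KeywordStore` and its methods are hypothetical; the patent only requires that keywords from one match end up in the same set, and does not prescribe a data structure or database.

```python
from dataclasses import dataclass, field

@dataclass
class KeywordStore:
    sets: list = field(default_factory=list)    # each entry is one keyword set
    index: dict = field(default_factory=dict)   # keyword -> indices of sets containing it

    def add(self, keywords):
        """Store all keywords from one match as a single keyword set."""
        self.sets.append(tuple(keywords))
        for kw in keywords:
            self.index.setdefault(kw, []).append(len(self.sets) - 1)

    def related(self, keyword):
        """Return the other keywords stored together with `keyword`."""
        result = []
        for i in self.index.get(keyword, []):
            result.extend(k for k in self.sets[i] if k != keyword)
        return result

store = KeywordStore()
store.add(["Wang", "theft", "fixed-term imprisonment"])
related_to_theft = store.related("theft")
```

Looking up "theft" returns "Wang" and "fixed-term imprisonment", which is the kind of correlation a word-by-word full-text search cannot provide.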
The information processing method for legal judgment documents of this embodiment obtains the target text content of a legal judgment document; detects whether the target text content contains text content that matches the first preset rule; if so, extracts at least one target keyword from the matching text content according to the first preset rule; and stores the at least one target keyword in the same keyword set. This solves the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords. Because the target keywords extracted from one piece of matching text content are stored in the same keyword set, the keyword set finally obtained represents a class of correlated keywords in the legal judgment document, which achieves the technical effect of extracting correlated keywords from legal judgment documents.
Preferably, detecting whether the target text content contains text content that matches the first preset rule includes: judging whether the target text content satisfies the following condition: at least one first preset feature keyword is present, and the at least one first preset feature keyword is located at a preset position. If it is judged that the target text content contains at least one first preset feature keyword located at the preset position, it is determined that the target text content contains text content that matches the first preset rule.
In this embodiment, the first preset rule requires the text under detection to contain at least one first preset feature keyword, and requires these keywords to be located at preset positions. For example, the first preset rule is: .* commits .* crime, sentenced .* penalty .* [month|year], which expresses: someone commits a certain crime and is sentenced to a certain penalty of so many months (or years). The rule requires the text under detection to contain the keywords commits, crime, sentenced, penalty, year, month, and so on (the at least one first preset feature keyword), and requires these keywords to stand in a logical positional relationship (located at preset positions). For example, "commits" is not followed by a line break, "crime" appears after "commits", and a preset number of characters lies between "commits" and "crime". This approach obtains the text information the user needs more accurately and with higher matching efficiency.
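The positional condition can be checked without a regular expression, as in the sketch below. The function `matches_position_rule`, the `max_gap` parameter, and the English keywords are assumptions for illustration; the patent does not specify how "preset position" is encoded.

```python
def matches_position_rule(text, keywords, max_gap=30):
    """Check that all keywords occur in order, each within max_gap characters
    of the previous one, with no line break in between."""
    pos = 0
    for i, kw in enumerate(keywords):
        found = text.find(kw, pos)
        if found == -1:
            return False          # keyword missing, or out of order
        if i > 0:
            gap = text[pos:found]
            if len(gap) > max_gap or "\n" in gap:
                return False      # too far apart, or separated by a line break
        pos = found + len(kw)
    return True

sentence = "Wang commits the crime of theft, sentenced to imprisonment of seven months"
ok = matches_position_rule(sentence, ["commits", "crime", "sentenced", "months"])
```

The check rejects text in which the keywords appear in the wrong order or are split across lines, which corresponds to the positional constraints described above.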
Optionally, the first preset rule includes multiple preset sub-rules. Detecting whether the target text content contains text content that matches the first preset rule includes: detecting in turn whether the target text content contains text content that matches each of the multiple preset sub-rules; and taking the first preset sub-rule for which matching text content is detected as the target sub-rule. Extracting at least one target keyword from the matching text content according to the first preset rule includes: extracting at least one target keyword from the target text content according to the target sub-rule.
In general, different first preset rules can be set for different target text content in a legal judgment document, and also for different extraction requirements. For example, different first preset rules can be set for plaintiff content, defendant content, penalty content, and so on. Within plaintiff content, different first preset rules can also be set for different extraction requirements (for example, extracting the plaintiff's name, sex, and place of origin; or extracting the plaintiff's age). Moreover, even for the same type of target text content and the same user requirement, the authors of legal judgment documents phrase things differently, so the first preset rule that matches the target text content may differ from document to document. For information processing of legal judgment documents, especially of large numbers of them, as many first preset rules as possible can be stored in a database in advance and classified by the type of text content they apply to. When a given class of text content is processed, the first preset rules of the corresponding class are matched one by one until some rule matches (the target sub-rule). Each preset rule within a class of first preset rules is one of the preset sub-rules described above.
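Trying sub-rules in order and stopping at the first match might be sketched as follows. The two patterns in `SUB_RULES` and the function `extract_with_first_match` are hypothetical examples of alternative phrasings, not rules taken from the patent.

```python
import re

# Hypothetical sub-rules covering two phrasings of the same information.
SUB_RULES = [
    re.compile(r"(?P<name>\w+) commits the crime of (?P<charge>\w+)"),
    re.compile(r"(?P<name>\w+) is guilty of (?P<charge>\w+)"),
]

def extract_with_first_match(target_text, sub_rules=SUB_RULES):
    """Return (target sub-rule, keywords) for the first sub-rule that matches."""
    for rule in sub_rules:
        match = rule.search(target_text)
        if match:
            return rule, match.groupdict()
    return None, {}

target_rule, keywords = extract_with_first_match("Li is guilty of fraud.")
```

Because matching stops at the first hit, the order of the sub-rules matters; more specific rules would normally be placed before more general ones.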
To improve matching accuracy, optionally, when it is detected that the target text content contains text content that matches the first preset rule, and before the at least one target keyword is extracted from that text content according to the first preset rule, the method also includes: detecting whether the text content that matches the first preset rule also matches a second preset rule. If it does, the at least one target keyword is extracted from the text content that matches the first preset rule according to the first preset rule.
In this embodiment, after it is detected that the target text content contains text content that matches the first preset rule, it can be detected whether that text content also matches the second preset rule. The second preset rule is a preset further restriction on the text the user needs. It can be set in correspondence with the first preset rule and further constrains it. The double restriction imposed by the two preset rules greatly improves the accuracy of the matched text information.
Preferably, detecting whether the text content that matches the first preset rule also matches the second preset rule includes: detecting whether the part of speech of each second preset feature keyword in that text content is a preset part of speech, where the second preset feature keywords are the keywords obtained by splitting the text content that matches the first preset rule according to a third preset rule. If the part of speech of the second preset feature keywords is detected to be the preset part of speech, it is determined that the text content that matches the first preset rule also matches the second preset rule.
Identifying text content purely from its literal form (it contains the first preset feature keywords, and they are located at the preset positions) may identify text content that is not what the user needs. For example, a keyword defined in the first preset rule is a verb, but matching against the target text content finds a noun written with the same characters. Because written Chinese is expressed in varied ways, relying only on literal matching produces matching errors in some cases. In this embodiment, the second preset rule therefore restricts the part of speech of the keywords in the text content that matches the first preset rule. The text content that matches the first preset rule is split according to the third preset rule to obtain the second preset feature keywords. The third preset rule may split the text by part of speech, in order from beginning to end, to obtain multiple keywords. It is then judged whether the part of speech of each second preset feature keyword is the preset part of speech; when it is, the text content that matches the first preset rule is determined to match the second preset rule as well, which effectively improves the accuracy of text matching.
For example, the first preset rule is: .* commits .* crime, sentenced .* penalty .* [month|year]; the corresponding second preset rule is: name + verb + legal term + comma + verb + legal term + measure word. Suppose the target text content contains the following description: The defendant Huang Lei commits the crime of theft, sentenced to fixed-term imprisonment of seven months. Matching the target text content against the first preset rule shows that this description satisfies the first preset rule. The content that matches the first preset rule is then matched against the second preset rule. Splitting the content yields the second preset feature keywords: Huang Lei, commits, crime of theft, comma (,), sentenced, fixed-term imprisonment, seven months. Specifically: Huang Lei/name + commits/verb + crime of theft/legal term + ,/comma + sentenced/verb + fixed-term imprisonment/legal term + seven months/measure word. The match succeeds. Then, according to the first preset rule, target keywords such as Huang Lei, crime of theft, fixed-term imprisonment, and seven months can be extracted. The extracted keywords can be packaged in a structured form and stored in a database for retrieval, statistics, clustering, and so on.
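The part-of-speech check can be sketched as a comparison between the tag sequence of the split text and the sequence required by the second preset rule. The toy `LEXICON` below stands in for a real word segmenter and part-of-speech tagger, which the patent does not name; the tag labels and function names are likewise assumptions.

```python
# Tag sequence required by the second preset rule in the example above.
SECOND_RULE = ["name", "verb", "legal_term", "comma", "verb", "legal_term", "measure_word"]

# Hypothetical toy lexicon; a real system would use a segmenter and POS tagger.
LEXICON = {
    "Huang Lei": "name",
    "commits": "verb",
    "crime of theft": "legal_term",
    ",": "comma",
    "sentenced": "verb",
    "fixed-term imprisonment": "legal_term",
    "seven months": "measure_word",
}

def tag(tokens):
    """Attach a part-of-speech tag to each token; unknown tokens get 'unknown'."""
    return [(tok, LEXICON.get(tok, "unknown")) for tok in tokens]

def matches_second_rule(tagged, rule=SECOND_RULE):
    """The second rule matches when the tag sequence equals the expected one."""
    return [pos for _, pos in tagged] == rule

tokens = ["Huang Lei", "commits", "crime of theft", ",",
          "sentenced", "fixed-term imprisonment", "seven months"]
second_rule_ok = matches_second_rule(tag(tokens))
```

A sentence that matches the first rule literally but has a different tag sequence (for example a noun where a verb is expected) would be rejected at this step.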
As another example, the first preset rule is: illegal gains of RMB .* yuan [^,.]* (|) to be recovered; the corresponding second preset rule is: verb + noun + measure word + comma + verb + verb.
To make the keyword set easier to manage uniformly, optionally, before the at least one target keyword is stored in the same keyword set, the method also includes: converting any number in non-Arabic numeral form in the at least one target keyword into Arabic numeral form, where the number converted into Arabic numeral form is stored in the same keyword set. For example, the character string "seven" is converted to the numeral "7".
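Since the source documents are Chinese, the non-Arabic numerals are presumably Chinese numerals such as 七 (seven). The sketch below makes that assumption and handles only simple values below ten thousand; the functions `chinese_to_int` and `normalize_numbers` are hypothetical names, not part of the patent.

```python
import re

DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}
NUMERAL_RUN = re.compile("[零一二两三四五六七八九十百千]+")

def chinese_to_int(text):
    """Convert a simple Chinese numeral (below 10000) to an int."""
    total, digit = 0, 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        else:
            # A bare unit such as "十" means 1 x 10.
            total += (digit or 1) * UNITS[ch]
            digit = 0
    return total + digit

def normalize_numbers(keyword):
    """Replace every run of Chinese numerals in a keyword with Arabic digits."""
    return NUMERAL_RUN.sub(lambda m: str(chinese_to_int(m.group())), keyword)

normalized = normalize_numbers("七个月")  # "seven months"
```

This simple version would also rewrite numerals that are part of ordinary words, so a production system would need to restrict conversion to keywords known to carry quantities.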
Fig. 2 is a flowchart of the information processing method for legal judgment documents according to the second embodiment of the application. This embodiment can serve as a preferred implementation of the embodiment shown in Fig. 1. As shown in Fig. 2, the method includes the following steps:
Step S202: extract the penalty paragraphs from the legal judgment document.
The penalty paragraphs (that is, the paragraphs describing the penalty, which serve as the target text content) are extracted from the legal judgment document. In the extraction process, a legal judgment document published by the people's court is analyzed, and the paragraphs that satisfy the condition are extracted from its full text according to penalty keywords. For example, the following regular expression can be used: judgment is as follows|ruling is as follows|fixed-term imprisonment|life imprisonment|death penalty. Matching this regular expression against the full text of the legal judgment document outputs the penalty paragraphs.
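Step S202 can be sketched as a paragraph filter driven by that alternation. The English cue phrases in `PENALTY_CUES` mirror the translated regular expression above and are an assumption; the function name `penalty_paragraphs` and the choice to split paragraphs on line breaks are also illustrative.

```python
import re

# English rendering of the penalty-keyword alternation (assumed wording).
PENALTY_CUES = re.compile(
    r"judgment is as follows|ruling is as follows|"
    r"fixed-term imprisonment|life imprisonment|death penalty",
    re.IGNORECASE,
)

def penalty_paragraphs(document):
    """Return the paragraphs (one per line) that contain a penalty cue."""
    return [p for p in document.split("\n")
            if p.strip() and PENALTY_CUES.search(p)]

document = (
    "The court heard the case.\n"
    "The judgment is as follows:\n"
    "Wang is sentenced to fixed-term imprisonment of seven months."
)
paragraphs = penalty_paragraphs(document)
```

Only the last two paragraphs are returned, so the more expensive rule matching in the next step runs on a small part of the document instead of the full text.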
Step S204: match the punishment rule information.
Word segmentation and part-of-speech analysis can be performed on the punishment paragraph, and the analysis result is then matched, one by one and in order, against a preset rule list (which includes multiple first preset rules and their corresponding second preset rules). The process exits as soon as one rule matches successfully, and the analysis result and the matching status are then output. This process determines whether the punishment paragraph contains text content that satisfies both the first preset rules and the second preset rules.
For example, the punishment paragraph contains the following description: "Defendant Huang Lei commits the crime of larceny and is sentenced to fixed-term imprisonment of seven months." The result obtained after word segmentation and part-of-speech analysis is: (Huang Lei/name + commits/verb + larceny/legal term + ,/comma + sentenced/verb + fixed-term imprisonment/legal term + seven months/measure word). It can be seen that this satisfies the following rules. First preset rules: .* commits .* crime, sentenced to .* punishment .* [month | year]; second preset rules: name + verb + legal term + comma + verb + legal term + measure word. Therefore, the preset rules match successfully.
It should be noted that if none of the preset rules in the rule list (a first preset rule and its corresponding second preset rule) matches successfully, the punishment paragraph is output to a failure record. A new rule corresponding to the punishment content of that paragraph can subsequently be obtained by manual analysis and added to the rule list, thereby improving the rule list.
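The matching loop of step S204, including the failure record, might look like the sketch below. Several things here are assumptions: the rules are English renderings, the part-of-speech tags are supplied by hand in place of a real word segmenter and tagger, and the names Rule, RULES, and match_rules are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    first: "re.Pattern"   # first preset rule: a regular expression
    second: List[str]     # second preset rule: expected part-of-speech sequence


# Hypothetical rule list, tried in order.
RULES = [
    Rule(re.compile(r"illegal gains.*recovered"),
         ["verb", "noun", "measure", "comma", "verb", "verb"]),
    Rule(re.compile(r"commits.*crime.*, sentenced to.*imprisonment.*(?:month|year)"),
         ["name", "verb", "legal_term", "comma", "verb", "legal_term", "measure"]),
]


def match_rules(paragraph: str,
                tagged: List[Tuple[str, str]],
                rules: List[Rule],
                failures: List[str]) -> Optional[int]:
    """Return the index of the first rule pair that matches, else record a failure."""
    pos_sequence = [pos for _, pos in tagged]
    for index, rule in enumerate(rules):
        if rule.first.search(paragraph) and pos_sequence == rule.second:
            return index  # exit on the first successful match
    failures.append(paragraph)  # kept for later manual analysis
    return None


paragraph = ("Defendant Huang Lei commits the crime of theft, "
             "sentenced to fixed-term imprisonment of seven months.")
# Hand-written stand-in for the output of a segmenter and POS tagger.
tagged = [("Huang Lei", "name"), ("commits", "verb"),
          ("crime of theft", "legal_term"), (",", "comma"),
          ("sentenced", "verb"), ("fixed-term imprisonment", "legal_term"),
          ("seven months", "measure")]
failures: List[str] = []
matched_index = match_rules(paragraph, tagged, RULES, failures)
print(matched_index, failures)
```

Paragraphs collected in failures correspond to the failure record from which new rules can be derived by hand.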
Step S206: extract the punishment data from the punishment paragraph.
Based on the preset-rule matching completed above, the relevant punishment data in the punishment paragraph can be extracted according to the first preset rules. For example, punishment keywords such as Huang Lei, larceny, fixed-term imprisonment, and seven months can be extracted from the punishment paragraph above. Alternatively, only the "larceny" contained in it may be extracted.
Step S208: store the punishment data in structured form.
The extracted data is encapsulated in a structured form and persisted in a database for retrieval, statistics, clustering, and similar uses. Optionally, before the punishment data is stored in structured form, numerals in non-Arabic form can be converted into Arabic numeral form to facilitate subsequent unified management of the punishment data. For example, the character string "seven" is converted to the value "7".
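As a rough illustration of step S208, the sketch below persists one keyword set as a single row using Python's built-in sqlite3 module. The application does not name a database or a schema; the table name, column names, and the store_keyword_set helper are all assumptions.

```python
import sqlite3


def store_keyword_set(conn: sqlite3.Connection, doc_id: str, keywords: dict) -> None:
    """Persist one keyword set as one row so related keywords stay together."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS punishment ("
        "doc_id TEXT, name TEXT, crime TEXT, penalty TEXT, term_months INTEGER)"
    )
    conn.execute(
        "INSERT INTO punishment VALUES "
        "(:doc_id, :name, :crime, :penalty, :term_months)",
        {"doc_id": doc_id, **keywords},
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
store_keyword_set(conn, "doc-001", {
    "name": "Huang Lei",
    "crime": "larceny",
    "penalty": "fixed-term imprisonment",
    "term_months": 7,  # already converted to Arabic numeral form
})
rows = conn.execute(
    "SELECT name, term_months FROM punishment WHERE crime = ?", ("larceny",)
).fetchall()
print(rows)
```

Storing the term as an integer is what makes later statistics (for example, average sentence length per crime) a plain query.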
The information processing method for law judgement documents of this embodiment enables effective extraction of punishment data from unstructured law judgement documents, and thereby obtains the correlation between the punishment information and the other information contained in the document. In addition, this embodiment encapsulates and stores the punishment data in a structured, multi-dimensional data format. With this pre-processed, multi-dimensional structured data, retrieval, statistics, and clustering of punishment data can be answered quickly in big-data and cloud-storage settings.
Below, according to an embodiment of the present application, a device embodiment of an information processing device for law judgement documents is provided.
It should be noted that the information processing device for law judgement documents according to the embodiment of the present application can be used to perform the information processing method for law judgement documents according to the embodiment of the present application, and that method can likewise be performed by this information processing device.
Fig. 3 is a schematic diagram of the information processing device for law judgement documents according to an embodiment of the present application. As shown in Fig. 3, the device includes: an acquiring unit 20, a first detection unit 40, an extraction unit 60, and a storage unit 80.
Acquiring unit 20, configured to obtain the target text content of the law judgement document.
First detection unit 40, configured to detect whether the target text content contains text content matching the first preset rules.
Extraction unit 60, configured to, if text content matching the first preset rules is detected in the target text content, extract at least one target keyword from the text content matching the first preset rules according to the first preset rules.
Storage unit 80, configured to store the at least one target keyword to the same keyword set.
In the information processing device for law judgement documents of this embodiment, the acquiring unit 20 obtains the target text content of the law judgement document; the first detection unit 40 detects whether the target text content contains text content matching the first preset rules; the extraction unit 60, when text content matching the first preset rules is detected in the target text content, extracts at least one target keyword from that text content according to the first preset rules; and the storage unit 80 stores the at least one target keyword to the same keyword set. This solves the technical problem in the related art that keywords extracted from law judgement documents cannot reflect the correlation between keywords. Because the keywords are extracted from text content that matches the first preset rules and are stored in the same keyword set, the resulting keyword set represents a class of correlated keywords in the law judgement document, thereby achieving the technical effect of extracting correlated keywords from law judgement documents.
Preferably, the first detection unit 40 includes a judging module, configured to judge whether the target text content satisfies the following condition: at least one first preset feature keyword exists, and the at least one first preset feature keyword is located at a preset position. If it is judged that the at least one first preset feature keyword exists in the target text content and is located at the preset position, it is determined that the target text content contains text content matching the first preset rules.
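One possible reading of the judging module is sketched below. The application does not define "preset position" precisely; this sketch assumes it means that the feature keywords must all occur in a fixed relative order, and the function name has_keywords_in_order is hypothetical.

```python
def has_keywords_in_order(text: str, feature_keywords: list) -> bool:
    """Check that every feature keyword occurs, each after the previous one."""
    position = 0
    for keyword in feature_keywords:
        found = text.find(keyword, position)
        if found == -1:
            return False  # keyword missing, or not at the expected position
        position = found + len(keyword)
    return True


sentence = ("Defendant Huang Lei commits the crime of theft, "
            "sentenced to fixed-term imprisonment of seven months.")
in_order = has_keywords_in_order(sentence, ["commits", "crime", "sentenced to", "months"])
out_of_order = has_keywords_in_order(sentence, ["sentenced to", "commits"])
print(in_order, out_of_order)
```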
Optionally, the first preset rules include multiple preset sub-rules. The first detection unit 40 includes: a detection module, configured to detect in sequence whether the target text content contains text content matching the multiple preset sub-rules; and a determining module, configured to take the first detected preset sub-rule for which matching text content exists as the target sub-rule. The extraction unit 60 includes: an extraction module, configured to extract at least one target keyword from the target text content according to the target sub-rule.
Optionally, the device further includes: a second detection unit, configured to detect whether the text content matching the first preset rules also matches the second preset rules. If it is detected that the text content matching the first preset rules matches the second preset rules, at least one target keyword is extracted from the text content matching the first preset rules according to the first preset rules.
The information processing device for law judgement documents includes a processor and a memory. The acquiring unit, the first detection unit, the extraction unit, the storage unit, the second detection unit, and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided; the extraction and structured storage of the various information in law judgement documents is accomplished by adjusting kernel parameters.
The memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: obtaining the target text content of a law judgement document; detecting whether the target text content contains text content matching the first preset rules; if text content matching the first preset rules is detected in the target text content, extracting at least one target keyword from the text content matching the first preset rules according to the first preset rules; and storing the at least one target keyword to the same keyword set.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units may be a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.

Claims (10)

1. An information processing method for law judgement documents, characterized by comprising:
Obtaining the target text content of a law judgement document;
Detecting whether the target text content contains text content matching first preset rules;
If text content matching the first preset rules is detected in the target text content, extracting at least one target keyword from the text content matching the first preset rules according to the first preset rules; and
Storing the at least one target keyword to the same keyword set.
2. The method according to claim 1, characterized in that detecting whether the target text content contains text content matching the first preset rules comprises:
Judging whether the target text content satisfies the following condition: at least one first preset feature keyword exists, and the at least one first preset feature keyword is located at a preset position,
Wherein, if it is judged that the at least one first preset feature keyword exists in the target text content and is located at the preset position, it is determined that the target text content contains text content matching the first preset rules.
3. The method according to claim 1, characterized in that the first preset rules include multiple preset sub-rules,
Detecting whether the target text content contains text content matching the first preset rules comprises: detecting in sequence whether the target text content contains text content matching the multiple preset sub-rules; and taking the first detected preset sub-rule for which the matching text content exists as the target sub-rule,
Extracting at least one target keyword from the text content matching the first preset rules according to the first preset rules comprises:
Extracting at least one target keyword from the target text content according to the target sub-rule.
4. The method according to claim 1, characterized in that, in the case where text content matching the first preset rules is detected in the target text content, before extracting at least one target keyword from the text content matching the first preset rules according to the first preset rules, the method further comprises:
Detecting whether the text content matching the first preset rules matches second preset rules,
Wherein, if it is detected that the text content matching the first preset rules matches the second preset rules, at least one target keyword is extracted from the text content matching the first preset rules according to the first preset rules.
5. The method according to claim 3, characterized in that detecting whether the text content matching the first preset rules matches the second preset rules comprises:
Detecting whether the part of speech of a second preset feature keyword in the text content matching the first preset rules is a preset part of speech, wherein the second preset feature keyword is a keyword obtained by splitting the text content matching the first preset rules according to third preset rules,
Wherein, if it is detected that the part of speech of the second preset feature keyword in the text content matching the first preset rules is the preset part of speech, it is determined that the text content matching the first preset rules matches the second preset rules.
6. The method according to claim 1, characterized in that, before the at least one target keyword is stored to the same keyword set, the method further comprises:
Converting numerals in non-Arabic form in the at least one target keyword into Arabic numeral form, wherein the numerals converted into Arabic numeral form are stored to the same keyword set.
7. An information processing device for law judgement documents, characterized by comprising:
An acquiring unit, configured to obtain the target text content of a law judgement document;
A first detection unit, configured to detect whether the target text content contains text content matching first preset rules;
An extraction unit, configured to, if text content matching the first preset rules is detected in the target text content, extract at least one target keyword from the text content matching the first preset rules according to the first preset rules; and
A storage unit, configured to store the at least one target keyword to the same keyword set.
8. The device according to claim 7, characterized in that the first detection unit comprises:
A judging module, configured to judge whether the target text content satisfies the following condition: at least one first preset feature keyword exists, and the at least one first preset feature keyword is located at a preset position,
Wherein, if it is judged that the at least one first preset feature keyword exists in the target text content and is located at the preset position, it is determined that the target text content contains text content matching the first preset rules.
9. The device according to claim 7, characterized in that the first preset rules include multiple preset sub-rules,
The first detection unit comprises: a detection module, configured to detect in sequence whether the target text content contains text content matching the multiple preset sub-rules; and a determining module, configured to take the first detected preset sub-rule for which the matching text content exists as the target sub-rule,
The extraction unit comprises: an extraction module, configured to extract at least one target keyword from the target text content according to the target sub-rule.
10. The device according to claim 7, characterized in that the device further comprises:
A second detection unit, configured to detect whether the text content matching the first preset rules matches second preset rules,
Wherein, if it is detected that the text content matching the first preset rules matches the second preset rules, at least one target keyword is extracted from the text content matching the first preset rules according to the first preset rules.
CN201510869588.9A 2015-12-01 2015-12-01 Information processing method and device for legal referee document Active CN106815207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510869588.9A CN106815207B (en) 2015-12-01 2015-12-01 Information processing method and device for legal referee document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510869588.9A CN106815207B (en) 2015-12-01 2015-12-01 Information processing method and device for legal referee document

Publications (2)

Publication Number Publication Date
CN106815207A true CN106815207A (en) 2017-06-09
CN106815207B CN106815207B (en) 2020-08-11

Family

ID=59108030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510869588.9A Active CN106815207B (en) 2015-12-01 2015-12-01 Information processing method and device for legal referee document

Country Status (1)

Country Link
CN (1) CN106815207B (en)

Also Published As

Publication number Publication date
CN106815207B (en) 2020-08-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant