CN106815207A - Information processing method and device for legal judgment documents - Google Patents
- Publication number
- CN106815207A (application CN201510869588.9A)
- Authority
- CN
- China
- Prior art keywords
- content
- text
- preset rules
- target
- keyword
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
This application discloses an information processing method and device for legal judgment documents. The method includes: obtaining the target text content of a legal judgment document; detecting whether the target text content contains text that matches a first preset rule; if such text is detected, extracting at least one target keyword from the matching text according to the first preset rule; and storing the at least one target keyword in the same keyword set. The application solves the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords.
Description
Technical field
This application relates to the field of text processing, and in particular to an information processing method and device for legal judgment documents.
Background art
When handling cases, criminal-law practitioners often need to consider the charge involved in a case, the type of penalty, the length of the sentence, the legal provisions on which the judgment is based, and so on, and use these as reference data in their practical work. Such reference data usually comes from the large number of cases that the people's courts have decided and published, and is obtained by big-data statistical analysis of those cases.
In the related art, big-data statistical analysis of cases traverses, at query time, the legal judgment documents of all cases in order to obtain the keywords they contain. Criminal cases tried by the people's courts involve many penalty-related information points; the information is voluminous, complex in content and varied in expression, and the legal provisions on which judgments rely are also numerous. For example, there are many kinds of criminal charges, and different charges carry different types of penalty and different sentence lengths. Consequently, when this approach is used to query a data set of legal judgment documents, searching the full text word by word puts heavy load on the server and takes a long time. In addition, results found on the fly cannot reflect the correlation between keywords (such as the correlation between items of penalty information), which hinders big-data statistical analysis.
No effective solution has yet been proposed for the technical problem that, in the related art, keywords extracted from legal judgment documents cannot reflect the correlation between keywords.
Summary of the invention
The embodiments of this application provide an information processing method and device for legal judgment documents, so as to at least solve the technical problem that, in the related art, keywords extracted from legal judgment documents cannot reflect the correlation between keywords.
According to one aspect of the embodiments of this application, an information processing method for legal judgment documents is provided. The method includes: obtaining the target text content of a legal judgment document; detecting whether the target text content contains text that matches a first preset rule; if such text is detected, extracting at least one target keyword from the matching text according to the first preset rule; and storing the at least one target keyword in the same keyword set.
Further, detecting whether the target text content contains text that matches the first preset rule includes: judging whether the target text content satisfies the following condition: at least one first preset feature keyword is present, and the at least one first preset feature keyword is located at a preset position. If the condition is satisfied, it is determined that the target text content contains text that matches the first preset rule.
Further, the first preset rule includes multiple preset sub-rules. Detecting whether the target text content contains text that matches the first preset rule includes: detecting in turn whether the target text content contains text that matches each of the preset sub-rules, and taking the first preset sub-rule for which matching text is detected as the target sub-rule. Extracting at least one target keyword from the matching text according to the first preset rule includes: extracting at least one target keyword from the target text content according to the target sub-rule.
Further, when text matching the first preset rule is detected in the target text content, and before at least one target keyword is extracted from that text according to the first preset rule, the method also includes: detecting whether the text that matches the first preset rule also matches a second preset rule. If it does, at least one target keyword is extracted from the text according to the first preset rule.
Further, detecting whether the text that matches the first preset rule also matches the second preset rule includes: detecting whether the part of speech of each second preset feature keyword in that text is a preset part of speech, where the second preset feature keywords are the keywords obtained by splitting the text that matches the first preset rule according to a third preset rule. If the part of speech of the second preset feature keywords is detected to be the preset part of speech, it is determined that the text that matches the first preset rule also matches the second preset rule.
Further, before the at least one target keyword is stored in the same keyword set, the method also includes: converting numbers in non-Arabic numeral form in the at least one target keyword into Arabic numeral form, where the numbers converted into Arabic numeral form are stored in the same keyword set.
According to another aspect of the embodiments of this application, an information processing device for legal judgment documents is also provided. The device includes: an acquiring unit for obtaining the target text content of a legal judgment document; a first detection unit for detecting whether the target text content contains text that matches a first preset rule; an extraction unit for extracting, if such text is detected, at least one target keyword from the matching text according to the first preset rule; and a storage unit for storing the at least one target keyword in the same keyword set.
Further, the first detection unit includes a judging module for judging whether the target text content satisfies the following condition: at least one first preset feature keyword is present, and the at least one first preset feature keyword is located at a preset position. If the condition is satisfied, it is determined that the target text content contains text that matches the first preset rule.
Further, the first preset rule includes multiple preset sub-rules. The first detection unit includes: a detection module for detecting in turn whether the target text content contains text that matches each of the preset sub-rules; and a determining module for taking the first preset sub-rule for which matching text is detected as the target sub-rule. The extraction unit includes an extraction module for extracting at least one target keyword from the target text content according to the target sub-rule.
Further, the device also includes a second detection unit for detecting whether the text that matches the first preset rule also matches a second preset rule. If it does, at least one target keyword is extracted from the text according to the first preset rule.
The embodiments of this application use a method comprising the following steps: obtaining the target text content of a legal judgment document; detecting whether the target text content contains text that matches a first preset rule; if such text is detected, extracting at least one target keyword from the matching text according to the first preset rule; and storing the at least one target keyword in the same keyword set. This solves the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords. Because the keywords are extracted only from text that matches the first preset rule and are stored together in one keyword set, the resulting keyword set represents a class of related keywords in the legal judgment document, which achieves the technical effect of extracting correlated keywords from legal judgment documents.
Brief description of the drawings
The drawings described here are provided for further understanding of this application and form part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the information processing method for legal judgment documents according to the first embodiment of this application;
Fig. 2 is a flowchart of the information processing method for legal judgment documents according to the second embodiment of this application; and
Fig. 3 is a schematic diagram of the information processing device for legal judgment documents according to an embodiment of this application.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of this application.
It should be noted that the terms "first", "second" and so on in the description, the claims and the above drawings of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. Moreover, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
According to the embodiments of this application, a method embodiment of an information processing method for legal judgment documents is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of the information processing method for legal judgment documents according to the first embodiment of this application. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain the target text content of the legal judgment document.
A legal judgment document generally contains multiple natural paragraphs, each with a particular topical focus, and several paragraphs may share the same focus. Before information processing is performed, the document can be divided according to topical focus, and the text sharing one focus can be processed as the target text content. For example, a legal judgment document includes one or more natural paragraphs describing the plaintiff, the defendant, the penalty and so on. The natural paragraphs describing the plaintiff (or the defendant, or the penalty) can be gathered together as the target text content. Target text content obtained in this way represents everything the document says on that topic, so information extracted from it is more complete and accurate.
Step S104: detect whether the target text content contains text that matches the first preset rule.
The first preset rule is a predefined rule for text matching. It may be a regular expression or some other kind of rule. It is set according to user needs, and different first preset rules may be set for different types of target text content. For example, if the target text content describes the plaintiff and the user wants to obtain the plaintiff's name and the legal provision the plaintiff violated, the first preset rule may be set to: .* commits .* crime. By traversing the full target text content, it can be detected whether it contains text that matches the first preset rule.
Step S106: if text matching the first preset rule is detected in the target text content, extract at least one target keyword from the matching text according to the first preset rule.
If the target text content is found to contain text that matches the first preset rule, information is extracted from that text according to the rule. For example, the target text content describes the penalty, and the first preset rule is: .* commits .* crime, sentenced to .* penalty. If the target text content includes "Wang commits the crime of theft, sentenced to fixed-term imprisonment ...", information extraction according to the first preset rule yields the target keywords: Wang, theft, fixed-term imprisonment.
Step S108: store the at least one target keyword in the same keyword set.
The target keywords obtained by the above method are generally logically related. For example, of the target keywords "Wang" and "theft" extracted above, "Wang" is the defendant's name and "theft" is the offence he committed. Storing logically related target keywords in the same keyword set makes it easy, during later statistics and analysis, to obtain a group of related keywords (the keyword set) and, starting from any one target keyword, to find the other target keywords related to it.
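Steps S104 to S108 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the Chinese pattern is an assumed rendering of the "commits ... crime, sentenced to ... penalty" rule, and the names `FIRST_RULE`, `extract_keyword_set` and the group names are hypothetical.

```python
import re

# Assumed Chinese rendering of the rule ".* commits .* crime, sentenced to .* penalty".
# 犯 = commits, 罪 = crime, 判 = sentenced, 刑 = penalty. Named groups are added
# here so that each keyword can be pulled out of the matched text.
FIRST_RULE = re.compile(
    r"(?P<defendant>[^，。]+?)犯(?P<offence>[^，。]+?)罪，"
    r"判处?(?P<penalty>[^，。]*?刑)"
)

def extract_keyword_set(target_text):
    """Return the related keywords as one group, or None if the rule does not match."""
    match = FIRST_RULE.search(target_text)
    if match is None:
        return None
    # One dict per match keeps the keywords together, preserving their relationship.
    return match.groupdict()

# "Wang commits the crime of theft, sentenced to fixed-term imprisonment of seven months."
print(extract_keyword_set("王某犯盗窃罪，判有期徒刑七个月。"))
```

Returning a single dictionary per match is one simple way to keep the defendant, offence and penalty associated with each other rather than as three unrelated hits.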
The information processing method of this embodiment obtains the target text content of a legal judgment document, detects whether the target text content contains text that matches a first preset rule, extracts at least one target keyword from the matching text according to the first preset rule if such text is detected, and stores the at least one target keyword in the same keyword set. This solves the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords. Because the keywords are extracted only from text that matches the first preset rule and are stored together, the resulting keyword set represents a class of related keywords in the legal judgment document, which achieves the technical effect of extracting correlated keywords from legal judgment documents.
Preferably, detecting whether the target text content contains text that matches the first preset rule includes: judging whether the target text content satisfies the following condition: at least one first preset feature keyword is present, and the at least one first preset feature keyword is located at a preset position. If the condition is satisfied, it is determined that the target text content contains text that matches the first preset rule.
In this embodiment, the first preset rule requires the text under test to contain at least one first preset feature keyword, with those keywords located at predetermined positions. For example, the first preset rule is: .* commits .* crime, sentenced to .* penalty .* [month|year], which expresses: someone commits a certain crime and is sentenced to a certain penalty of so many months (or years). The rule requires the text under test to contain the keywords "commits", "crime", "sentenced", "penalty", "year" and "month" (the at least one first preset feature keyword), and requires these keywords to stand in a logical positional relationship (to be located at preset positions). For example, "commits" is not followed by a line break, "commits" comes before "crime", and a predetermined number of characters lies between "commits" and "crime". This method obtains the text information the user needs more accurately and with higher matching efficiency.
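The positional condition can be illustrated with a small Python check. This is a sketch under assumptions: the function name `keywords_at_preset_positions` and the `max_gap` parameter are hypothetical, the Chinese keywords are an assumed rendering of "commits", "crime", "sentenced", "penalty" and "month", and the check uses only the first occurrence of each keyword, which is a simplification.

```python
def keywords_at_preset_positions(text, keywords, max_gap=20):
    """Check that the keywords occur in the given order, with at most max_gap
    characters and no line break between consecutive keywords."""
    prev_end = None
    for keyword in keywords:
        start = 0 if prev_end is None else prev_end
        index = text.find(keyword, start)
        if index == -1:
            return False  # keyword missing, or only present before the previous one
        if prev_end is not None:
            between = text[prev_end:index]
            if len(between) > max_gap or "\n" in between:
                return False
        prev_end = index + len(keyword)
    return True

sentence = "王某犯盗窃罪，判有期徒刑七个月"
print(keywords_at_preset_positions(sentence, ["犯", "罪", "判", "刑", "月"]))
```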
Optionally, the first preset rule includes multiple preset sub-rules. Detecting whether the target text content contains text that matches the first preset rule includes: detecting in turn whether the target text content contains text that matches each of the preset sub-rules, and taking the first preset sub-rule for which matching text is detected as the target sub-rule. Extracting at least one target keyword from the matching text according to the first preset rule includes: extracting at least one target keyword from the target text content according to the target sub-rule.
In general, different first preset rules can be set for different target text content in a legal judgment document, and also for different extraction needs. For example, different first preset rules can be set for the plaintiff content, the defendant content, the penalty content and so on. For the plaintiff content alone, different first preset rules can be set depending on what the user wants to extract (for example, the plaintiff's name, sex and place of origin, or the plaintiff's age). Moreover, even for the same type of target text content and the same user need, the first preset rule that matches may differ between legal judgment documents, because their authors phrase things differently. For information processing of legal judgment documents, and especially of large numbers of them, as many first preset rules as possible can be stored in a database in advance and classified according to the kinds of text they apply to. When a certain class of text is processed, the first preset rules of the corresponding class are tried one by one until one of them matches (that is, until the target sub-rule is found). Each preset rule within a class of first preset rules is one of the preset sub-rules mentioned above.
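The try-in-turn behaviour can be sketched as follows. The two sub-rules, their labels and the function `match_first_sub_rule` are hypothetical examples written for this illustration; the Chinese patterns are assumed renderings, not rules taken from the patent.

```python
import re

# Hypothetical class of sub-rules for penalty text, ordered from most to least specific.
SUB_RULES = [
    ("with_term", re.compile(
        r"(?P<defendant>[^，。]+?)犯(?P<offence>[^，。]+?)罪，"
        r"判处?(?P<penalty>[^，。]+?刑)(?P<term>[^，。]+?[月年])")),
    ("no_term", re.compile(
        r"(?P<defendant>[^，。]+?)犯(?P<offence>[^，。]+?)罪，"
        r"判处?(?P<penalty>[^，。]+?刑)")),
]

def match_first_sub_rule(text, sub_rules=SUB_RULES):
    """Try each sub-rule in turn; the first one that matches is the target sub-rule."""
    for label, pattern in sub_rules:
        match = pattern.search(text)
        if match:
            return label, match.groupdict()
    return None, None

print(match_first_sub_rule("王某犯盗窃罪，判处有期徒刑七个月。"))
print(match_first_sub_rule("李某犯诈骗罪，判处无期徒刑。"))
```

Ordering the list from specific to general means a sentence with a term of imprisonment is captured by the richer sub-rule before the fallback is considered.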
To improve matching accuracy, optionally, when text matching the first preset rule is detected in the target text content, and before at least one target keyword is extracted from that text according to the first preset rule, the method also includes: detecting whether the text that matches the first preset rule also matches a second preset rule. If it does, at least one target keyword is extracted from the text according to the first preset rule.
In this embodiment, after text matching the first preset rule is detected in the target text content, it can be detected whether that text also matches the second preset rule. The second preset rule is a predefined further restriction on the text the user needs. It can be set in correspondence with the first preset rule and further constrains it. Constraining the match with two preset rules greatly improves the accuracy of the matched text information.
Preferably, detecting whether the text that matches the first preset rule also matches the second preset rule includes: detecting whether the part of speech of each second preset feature keyword in that text is a preset part of speech, where the second preset feature keywords are the keywords obtained by splitting the text that matches the first preset rule according to a third preset rule. If the part of speech of the second preset feature keywords is detected to be the preset part of speech, it is determined that the text that matches the first preset rule also matches the second preset rule.
Identifying text purely at the literal level (by the presence of the first preset feature keywords at preset positions) may pick out text that is not what the user needs. For example, a keyword defined in the first preset rule is a verb, but matching against the target text content hits a noun written with the same characters. Because written Chinese expresses things in varied ways, relying on literal matching alone produces matching errors in some cases. In this embodiment, therefore, the second preset rule constrains the parts of speech of the keywords in the text that matches the first preset rule. The text that matches the first preset rule is split according to the third preset rule to obtain the second preset feature keywords. The third preset rule may be to split the text by part of speech, working from the beginning of the text to the end, to obtain multiple keywords. It is then judged whether the part of speech of each second preset feature keyword is the preset part of speech. If so, the text that matches the first preset rule is determined to match the second preset rule as well, which effectively improves the accuracy of text matching.
For example, the first preset rule is: .* commits .* crime, sentenced to .* penalty .* [month|year]; and the corresponding second preset rule is: personal name + verb + legal noun + comma + verb + legal noun + measure word. Suppose the target text content includes the following description: "Defendant Huang Lei commits the crime of theft and is sentenced to fixed-term imprisonment of seven months." Matching the target text content against the first preset rule shows that this sentence satisfies the first preset rule. The matched text is then matched against the second preset rule. Splitting the matched text yields the second preset feature keywords: Huang Lei, commits, crime of theft, comma (,), sentenced, fixed-term imprisonment, seven months. Specifically: Huang Lei/personal name + commits/verb + crime of theft/legal noun + ,/comma + sentenced/verb + fixed-term imprisonment/legal noun + seven months/measure word. The match therefore succeeds. Then, according to the first preset rule, target keywords such as Huang Lei, crime of theft, fixed-term imprisonment and seven months can be extracted. The extracted keywords can be packaged in a structured form and stored in a database for retrieval, statistics, clustering and so on.
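The part-of-speech check can be sketched in Python as below. A real system would obtain the tokens and tags from a Chinese word segmenter and tagger; here the tagged tokens are written by hand, and the tag names, `SECOND_RULE` and `matches_second_rule` are hypothetical.

```python
# Hypothetical tag sequence for the second preset rule:
# personal name + verb + legal noun + comma + verb + legal noun + measure word.
SECOND_RULE = ["name", "verb", "legal_noun", "comma", "verb", "legal_noun", "measure"]

def matches_second_rule(tagged_tokens, expected_tags=SECOND_RULE):
    """Compare the part-of-speech sequence of the split text with the expected one."""
    return [tag for _, tag in tagged_tokens] == list(expected_tags)

# Hand-tagged split of "Huang Lei commits the crime of theft, sentenced to
# fixed-term imprisonment of seven months" (assumed Chinese rendering).
tagged = [
    ("黄磊", "name"), ("犯", "verb"), ("盗窃罪", "legal_noun"), ("，", "comma"),
    ("判处", "verb"), ("有期徒刑", "legal_noun"), ("七个月", "measure"),
]
print(matches_second_rule(tagged))

# The same characters tagged differently ("犯" read as a noun) fail the check.
mistagged = [("黄磊", "name"), ("犯", "noun")] + tagged[2:]
print(matches_second_rule(mistagged))
```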
As another example, the first preset rule is: illegal gains of RMB .* yuan [^,.]* (|) to be recovered; and the corresponding second preset rule is: verb + noun + measure word + comma + verb + verb.
To make it easier to manage keyword sets in a uniform way, optionally, before the at least one target keyword is stored in the same keyword set, the method also includes: converting numbers in non-Arabic numeral form in the at least one target keyword into Arabic numeral form, where the converted numbers are stored in the same keyword set. For example, the character string "seven" is converted to the numeric value "7".
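As an illustration of the numeral conversion, the following sketch converts Chinese numerals below ten thousand into Arabic digits. It assumes the non-Arabic numerals are Chinese numerals, which the translated text does not state explicitly; `chinese_to_int` and `normalize_numerals` are hypothetical helper names, and the character-based replacement would also rewrite numeral characters that appear inside names, so it is only a starting point.

```python
import re

DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}
NUMERAL_RUN = re.compile("[" + "".join(DIGITS) + "".join(UNITS) + "]+")

def chinese_to_int(numeral):
    """Convert a Chinese numeral below 10000 (for example 二十五) to an int."""
    total, current = 0, 0
    for ch in numeral:
        if ch in DIGITS:
            current = DIGITS[ch]
        else:
            # A unit with no digit before it (as in 十二) counts as one unit.
            total += (current or 1) * UNITS[ch]
            current = 0
    return total + current

def normalize_numerals(keyword):
    """Replace each run of Chinese numerals in a keyword with Arabic digits."""
    return NUMERAL_RUN.sub(lambda m: str(chinese_to_int(m.group())), keyword)

print(normalize_numerals("七个月"))            # seven months
print(normalize_numerals("有期徒刑三年六个月"))  # three years and six months
```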
Fig. 2 is a flowchart of the information processing method for legal judgment documents according to the second embodiment of this application. This embodiment can serve as a preferred implementation of the embodiment shown in Fig. 1. As shown in Fig. 2, the method includes the following steps:
Step S202: extract the penalty paragraphs from the legal judgment document.
The penalty paragraphs (that is, the paragraphs describing the penalty, which serve as the target text content) are extracted from the legal judgment document. In the extraction process, a legal judgment document published by a people's court can be analysed, and the paragraphs that satisfy the condition are extracted from its full text according to penalty keywords. For example, a regular expression such as: judgment as follows | ruling as follows | fixed-term imprisonment | life imprisonment | death penalty, can be matched against the full text of the legal judgment document to output the penalty paragraphs.
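A minimal sketch of this paragraph filter is shown below. The Chinese alternatives are an assumed rendering of the penalty keywords listed above, paragraphs are assumed to be separated by line breaks, and `extract_penalty_paragraphs` is a hypothetical name.

```python
import re

# Assumed rendering of: judgment as follows | ruling as follows |
# fixed-term imprisonment | life imprisonment | death penalty.
PENALTY_MARKERS = re.compile(r"判决如下|裁定如下|有期徒刑|无期徒刑|死刑")

def extract_penalty_paragraphs(full_text):
    """Return the natural paragraphs that contain at least one penalty keyword."""
    paragraphs = (p.strip() for p in full_text.splitlines())
    return [p for p in paragraphs if p and PENALTY_MARKERS.search(p)]

document = (
    "被告人黄磊，男。\n"
    "本院认为，被告人黄磊犯盗窃罪。\n"
    "判决如下：被告人黄磊犯盗窃罪，判处有期徒刑七个月。"
)
print(extract_penalty_paragraphs(document))
```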
Step S204: match the penalty paragraph against the rules.
The penalty paragraph can be segmented into words and analysed for parts of speech. The result of the analysis is matched against the entries of a preset rule list (which contains multiple first preset rules and their corresponding second preset rules) one by one in order, stopping as soon as one match succeeds; the analysis result and the match are then output. In other words, this process judges whether the penalty paragraph contains text that satisfies both a first preset rule and its second preset rule.
For example, punishment paragraph has following description:Defendant's Huang Lei commissions of a theft, are sentenced to fixed-term imprisonment seven months.Enter
The result that obtains is after row participle and part of speech analysis:(yellow of heap of stone/name+criminal/verb+larceny/legal terms+,
/ comma+sentence/verb+fixed-term imprisonment/legal terms+seven months/measure word).It can be seen that, it meets following rule:
First preset rules:.* violate .* crime, sentence .* punishment .* [moon | year];And second preset rules:Name+verb+law name
Word+comma+verb+legal terms+measure word.Therefore, the match is successful for preset rules.
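The two-part check in this example can be sketched as follows. The sketch assumes the segmenter and tagger have already produced (word, part of speech) pairs; the tag names, the Chinese pattern `.*犯.*罪,判处.*刑.*[月年]` (a back-translation of the first preset rule quoted above) and the function name `matches_rule_pair` are all assumptions. The character class is written as `[月年]` because a `|` inside square brackets would be matched literally.

```python
import re
from typing import List, Tuple

# Hypothetical tagger output for the example sentence: (word, part of speech).
tagged = [
    ("黄磊", "person_name"), ("犯", "verb"), ("盗窃罪", "legal_term"),
    (",", "comma"), ("判处", "verb"), ("有期徒刑", "legal_term"),
    ("七个月", "measure_word"),
]

# First preset rule: assumed Chinese original of the pattern quoted in the text.
FIRST_RULE = re.compile(r".*犯.*罪,判处.*刑.*[月年]")
# Second preset rule: the expected part-of-speech sequence.
SECOND_RULE = ["person_name", "verb", "legal_term", "comma",
               "verb", "legal_term", "measure_word"]


def matches_rule_pair(tokens: List[Tuple[str, str]]) -> bool:
    """True only if the text matches the regex AND the tags match the sequence."""
    text = "".join(word for word, _ in tokens)
    pos_sequence = [pos for _, pos in tokens]
    return bool(FIRST_RULE.search(text)) and pos_sequence == SECOND_RULE


print(matches_rule_pair(tagged))
```

Requiring both checks means a sentence that merely contains the right characters, but with a different grammatical structure, is not accepted.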
It should be noted that if none of the preset rules in the rule list (each consisting of a first preset rule and its corresponding second preset rule) matches successfully, the penalty paragraph is output to a failure record. A new rule corresponding to the penalty content of that paragraph can subsequently be obtained through manual analysis and added to the rule list in order to improve it.
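Sequential matching against the rule list, with unmatched paragraphs written to a failure record, might look like the sketch below. The rule list contents are hypothetical (the fine-related rule is invented purely so that the list has more than one entry), and the failure record is modelled as a plain Python list rather than any particular log format.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, str]               # (word, part of speech)
Rule = Tuple[re.Pattern, List[str]]   # (first preset rule, second preset rule)

# Hypothetical rule list; the patterns are assumed, not taken from the source.
RULES: List[Rule] = [
    (re.compile(r"并处罚金.*元"), ["verb", "legal_term", "measure_word"]),
    (re.compile(r".*犯.*罪,判处.*刑.*[月年]"),
     ["person_name", "verb", "legal_term", "comma",
      "verb", "legal_term", "measure_word"]),
]


def match_rule_list(tokens: List[Token], rules: List[Rule],
                    failure_record: List[str]) -> Optional[int]:
    """Try the rules in order and return the index of the first one that matches.

    If no rule matches, append the paragraph text to failure_record
    and return None.
    """
    text = "".join(word for word, _ in tokens)
    pos_sequence = [pos for _, pos in tokens]
    for index, (pattern, expected_pos) in enumerate(rules):
        if pattern.search(text) and pos_sequence == expected_pos:
            return index              # stop at the first successful match
    failure_record.append(text)       # kept for later manual analysis
    return None


failures: List[str] = []
sentence = [("黄磊", "person_name"), ("犯", "verb"), ("盗窃罪", "legal_term"),
            (",", "comma"), ("判处", "verb"), ("有期徒刑", "legal_term"),
            ("七个月", "measure_word")]
print(match_rule_list(sentence, RULES, failures))
print(match_rule_list([("驳回", "verb"), ("上诉", "noun")], RULES, failures))
print(failures)
```

A reviewer can later inspect the failure record, write a new rule pair for each recurring pattern and append it to the rule list.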
Step S206: extract the penalty data from the penalty paragraph.
Based on the preset rule matched above, the relevant penalty data in the penalty paragraph can be extracted according to the first preset rule. For example, penalty keywords such as "Huang Lei", "crime of theft", "fixed-term imprisonment" and "seven months" can be extracted from the above penalty paragraph. Alternatively, only the "crime of theft" contained in it may be extracted.
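The text does not say how the first preset rule yields the keywords. One plausible reading, sketched here as an assumption, is that the rule is written with capturing groups so that each group delivers one keyword. The pattern, the group names and the leading 被告人 ("defendant") anchor are hypothetical.

```python
import re
from typing import Dict

# Assumed extraction form of the first preset rule: one named group per keyword.
EXTRACTION_RULE = re.compile(
    r"被告人(?P<defendant>.+?)犯(?P<offence>.+?罪),"
    r"判处(?P<penalty>.+?刑)(?P<term>.+?[月年])"
)


def extract_penalty_keywords(paragraph: str) -> Dict[str, str]:
    """Return the keywords captured by the rule, or an empty dict if it does not match."""
    match = EXTRACTION_RULE.search(paragraph)
    return match.groupdict() if match else {}


keywords = extract_penalty_keywords("被告人黄磊犯盗窃罪,判处有期徒刑七个月。")
print(keywords)
# Extracting only the offence, as in the alternative mentioned above:
print(keywords.get("offence"))
```

Because all four values come from a single match of a single rule, they can be stored together as one keyword set, which is what preserves the relationship between them.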
Step S208: store the penalty data in structured form.
The extracted data is encapsulated in a structured form and persisted in a database for use in retrieval, statistics, clustering and the like. Optionally, before the penalty data is stored in structured form, numerals in non-Arabic form can be converted into Arabic numeral form to facilitate subsequent unified management of the penalty data. For example, the character string "seven" is converted to the numerical value 7.
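The conversion and storage in step S208 can be sketched as follows. The sketch assumes Chinese numerals below 100, an in-memory SQLite database and a table layout invented for illustration; the source does not name a database or a schema, and the helper names are hypothetical.

```python
import re
import sqlite3
from typing import Dict, Tuple

DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}


def chinese_to_int(numeral: str) -> int:
    """Convert a Chinese numeral below 100, e.g. 七, 十二, 二十 or 三十五."""
    if "十" in numeral:
        tens, _, units = numeral.partition("十")
        return (DIGITS[tens] if tens else 1) * 10 + (DIGITS[units] if units else 0)
    return DIGITS[numeral]


def normalise_term(term: str) -> Tuple[int, str]:
    """Split a term such as 七个月 into an Arabic value and a unit."""
    match = re.fullmatch(r"([零一二两三四五六七八九十]+)(个月|年)", term)
    if match is None:
        raise ValueError(f"unsupported term: {term}")
    unit = "month" if match.group(2) == "个月" else "year"
    return chinese_to_int(match.group(1)), unit


def store_record(conn: sqlite3.Connection, record: Dict[str, str]) -> None:
    """Persist one keyword set as one row (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS penalties "
        "(defendant TEXT, offence TEXT, penalty TEXT, term_value INTEGER, term_unit TEXT)"
    )
    value, unit = normalise_term(record["term"])
    conn.execute(
        "INSERT INTO penalties VALUES (?, ?, ?, ?, ?)",
        (record["defendant"], record["offence"], record["penalty"], value, unit),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_record(conn, {"defendant": "黄磊", "offence": "盗窃罪",
                    "penalty": "有期徒刑", "term": "七个月"})
print(conn.execute("SELECT * FROM penalties").fetchall())
```

Storing the term as an integer plus a unit, rather than as the original string, is what makes later range queries and statistics over sentence lengths straightforward.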
The information processing method for legal judgment documents of this embodiment enables effective extraction of penalty data from unstructured legal judgment documents, and thereby obtains both the penalty information contained in a document and the correlation between the items of information. In addition, this embodiment encapsulates and stores the penalty data in a structured, multi-dimensional data format. With such pre-processed, multi-dimensional structured data, requests for retrieval, statistics, clustering and the like on penalty data can be answered quickly in big-data and cloud-storage settings.
An embodiment of an information processing device for legal judgment documents is provided below according to an embodiment of the application.
It should be noted that the information processing device for legal judgment documents according to the embodiment of the application can be used to perform the information processing method for legal judgment documents according to the embodiment of the application, and that this method can likewise be performed by this device.
Fig. 3 is a schematic diagram of the information processing device for legal judgment documents according to an embodiment of the application. As shown in Fig. 3, the device includes: an acquiring unit 20, a first detection unit 40, an extraction unit 60 and a storage unit 80.
The acquiring unit 20 is used to obtain the target text content of a legal judgment document.
The first detection unit 40 is used to detect whether the target text content contains text content that matches a first preset rule.
The extraction unit 60 is used to extract, if it is detected that the target text content contains text content that matches the first preset rule, at least one target keyword from the text content that matches the first preset rule according to the first preset rule.
The storage unit 80 is used to store the at least one target keyword in the same keyword set.
In the information processing device for legal judgment documents of this embodiment, the acquiring unit 20 obtains the target text content of a legal judgment document; the first detection unit 40 detects whether the target text content contains text content that matches the first preset rule; when such text content is detected, the extraction unit 60 extracts at least one target keyword from it according to the first preset rule; and the storage unit 80 stores the at least one target keyword in the same keyword set. This solves the technical problem in the related art that keywords extracted from legal judgment documents cannot reflect the correlation between keywords. Because the keywords extracted under one rule are stored together, the resulting keyword set represents a group of correlated keywords in the legal judgment document, which achieves the technical effect of extracting correlated keywords from legal judgment documents.
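The cooperation of the four units can be illustrated with the sketch below. It models each unit as a method of a single hypothetical class `JudgmentDocumentProcessor`; the class, the in-memory list used as storage and the Chinese rule pattern are assumptions made for illustration and are not the structure shown in Fig. 3.

```python
import re
from typing import List, Optional


class JudgmentDocumentProcessor:
    """Hypothetical software counterpart of units 20, 40, 60 and 80."""

    def __init__(self, first_rule: re.Pattern) -> None:
        self.first_rule = first_rule
        self.keyword_sets: List[List[str]] = []   # one inner list per keyword set

    def acquire(self, document: str) -> str:
        """Acquiring unit 20: obtain the target text content."""
        return document.strip()

    def detect(self, target_text: str) -> Optional[re.Match]:
        """First detection unit 40: look for text matching the first preset rule."""
        return self.first_rule.search(target_text)

    def extract(self, match: re.Match) -> List[str]:
        """Extraction unit 60: take the target keywords from the matched text."""
        return [group for group in match.groups() if group]

    def store(self, keywords: List[str]) -> None:
        """Storage unit 80: keep all keywords of one match in the same set."""
        self.keyword_sets.append(list(keywords))

    def process(self, document: str) -> bool:
        match = self.detect(self.acquire(document))
        if match is None:
            return False
        self.store(self.extract(match))
        return True


# Assumed rule with one capturing group per keyword.
rule = re.compile(r"被告人(.+?)犯(.+?罪),判处(.+?刑)(.+?[月年])")
processor = JudgmentDocumentProcessor(rule)
print(processor.process("被告人黄磊犯盗窃罪,判处有期徒刑七个月。"))
print(processor.process("驳回上诉,维持原判。"))
print(processor.keyword_sets)
```

Each successful match contributes exactly one inner list, so keywords from different sentences are never mixed in the same keyword set.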
Preferably, the first detection unit 40 includes a judging module, used to judge whether the target text content satisfies the following condition: at least one first preset feature keyword exists, and the at least one first preset feature keyword is located at a preset position. If it is judged that at least one first preset feature keyword exists in the target text content and that it is located at the preset position, it is determined that the target text content contains text content that matches the first preset rule.
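The judging module's condition can be sketched as below. The text does not define what a "preset position" is; this sketch assumes it is a range of character offsets within the target text, and the function name `satisfies_first_rule` and the example keywords are hypothetical.

```python
from typing import Iterable, Tuple


def satisfies_first_rule(text: str, feature_keywords: Iterable[str],
                         window: Tuple[int, int]) -> bool:
    """True if at least one feature keyword starts at an offset in [start, end)."""
    start, end = window
    for keyword in feature_keywords:
        index = text.find(keyword)
        while index != -1:
            if start <= index < end:
                return True
            index = text.find(keyword, index + 1)   # check later occurrences too
    return False


paragraph = "被告人黄磊犯盗窃罪,判处有期徒刑七个月。"
print(satisfies_first_rule(paragraph, ["判处"], (0, 20)))
print(satisfies_first_rule(paragraph, ["判处"], (0, 5)))
print(satisfies_first_rule(paragraph, ["死刑"], (0, 20)))
```

Other readings of "preset position" are equally compatible with the text, for example a position relative to another keyword, in which case only the window test would change.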
Optionally, the first preset rule includes multiple preset sub-rules. The first detection unit 40 includes: a detection module, used to detect in sequence whether the target text content contains text content that matches the multiple preset sub-rules; and a determining module, used to take the first preset sub-rule for which matching text content is detected as the target sub-rule. The extraction unit 60 includes an extraction module, used to extract at least one target keyword from the target text content according to the target sub-rule.
Optionally, the device also includes a second detection unit, used to detect whether the text content that matches the first preset rule also matches a second preset rule. If it is detected that the text content that matches the first preset rule matches the second preset rule, at least one target keyword is extracted from the text content that matches the first preset rule according to the first preset rule.
The information processing device for legal judgment documents includes a processor and a memory. The acquiring unit, the first detection unit, the extraction unit, the storage unit, the second detection unit and so on described above are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the extraction and structured storage of the various items of information in legal judgment documents is accomplished by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code that initializes the following method steps: obtaining the target text content of a legal judgment document; detecting whether the target text content contains text content that matches a first preset rule; if it is detected that the target text content contains text content that matches the first preset rule, extracting at least one target keyword from the text content that matches the first preset rule according to the first preset rule; and storing the at least one target keyword in the same keyword set.
The serial numbers of the above embodiments of the application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the application. It should be noted that a person of ordinary skill in the art can make several improvements and refinements without departing from the principle of the application, and these improvements and refinements should also be regarded as falling within the scope of protection of the application.
Claims (10)
1. An information processing method for legal judgment documents, characterized by comprising:
obtaining target text content of a legal judgment document;
detecting whether the target text content contains text content that matches a first preset rule;
if it is detected that the target text content contains text content that matches the first preset rule, extracting at least one target keyword from the text content that matches the first preset rule according to the first preset rule; and
storing the at least one target keyword in a same keyword set.
2. The method according to claim 1, characterized in that detecting whether the target text content contains text content that matches the first preset rule comprises:
judging whether the target text content satisfies the following condition: at least one first preset feature keyword exists, and the at least one first preset feature keyword is located at a preset position,
wherein, if it is judged that the at least one first preset feature keyword exists in the target text content and that the at least one first preset feature keyword is located at the preset position, it is determined that the target text content contains text content that matches the first preset rule.
3. The method according to claim 1, characterized in that the first preset rule comprises multiple preset sub-rules,
detecting whether the target text content contains text content that matches the first preset rule comprises: detecting in sequence whether the target text content contains text content that matches the multiple preset sub-rules; and taking the first preset sub-rule for which matching text content is detected as a target sub-rule,
extracting at least one target keyword from the text content that matches the first preset rule according to the first preset rule comprises:
extracting at least one target keyword from the target text content according to the target sub-rule.
4. The method according to claim 1, characterized in that, in the case where it is detected that the target text content contains text content that matches the first preset rule, and before at least one target keyword is extracted from the text content that matches the first preset rule according to the first preset rule, the method further comprises:
detecting whether the text content that matches the first preset rule matches a second preset rule,
wherein, if it is detected that the text content that matches the first preset rule matches the second preset rule, at least one target keyword is extracted from the text content that matches the first preset rule according to the first preset rule.
5. The method according to claim 3, characterized in that detecting whether the text content that matches the first preset rule matches the second preset rule comprises:
detecting whether the part of speech of a second preset feature keyword in the text content that matches the first preset rule is a preset part of speech, wherein the second preset feature keyword is a keyword obtained by splitting the text content that matches the first preset rule according to a third preset rule,
wherein, if it is detected that the part of speech of the second preset feature keyword in the text content that matches the first preset rule is the preset part of speech, it is determined that the text content that matches the first preset rule matches the second preset rule.
6. The method according to claim 1, characterized in that, before the at least one target keyword is stored in the same keyword set, the method further comprises:
converting numerals in non-Arabic numeral form in the at least one target keyword into Arabic numeral form, wherein the numerals converted into Arabic numeral form are stored in the same keyword set.
7. An information processing device for legal judgment documents, characterized by comprising:
an acquiring unit, used to obtain target text content of a legal judgment document;
a first detection unit, used to detect whether the target text content contains text content that matches a first preset rule;
an extraction unit, used to extract, if it is detected that the target text content contains text content that matches the first preset rule, at least one target keyword from the text content that matches the first preset rule according to the first preset rule; and
a storage unit, used to store the at least one target keyword in a same keyword set.
8. The device according to claim 7, characterized in that the first detection unit comprises:
a judging module, used to judge whether the target text content satisfies the following condition: at least one first preset feature keyword exists, and the at least one first preset feature keyword is located at a preset position,
wherein, if it is judged that the at least one first preset feature keyword exists in the target text content and that the at least one first preset feature keyword is located at the preset position, it is determined that the target text content contains text content that matches the first preset rule.
9. The device according to claim 7, characterized in that the first preset rule comprises multiple preset sub-rules,
the first detection unit comprises: a detection module, used to detect in sequence whether the target text content contains text content that matches the multiple preset sub-rules; and a determining module, used to take the first preset sub-rule for which matching text content is detected as a target sub-rule,
the extraction unit comprises: an extraction module, used to extract at least one target keyword from the target text content according to the target sub-rule.
10. The device according to claim 7, characterized in that the device further comprises:
a second detection unit, used to detect whether the text content that matches the first preset rule matches a second preset rule,
wherein, if it is detected that the text content that matches the first preset rule matches the second preset rule, at least one target keyword is extracted from the text content that matches the first preset rule according to the first preset rule.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510869588.9A CN106815207B (en) | 2015-12-01 | 2015-12-01 | Information processing method and device for legal referee document |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106815207A (en) | 2017-06-09 |
CN106815207B (en) | 2020-08-11 |
Family
ID=59108030
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510869588.9A Active CN106815207B (en) | 2015-12-01 | 2015-12-01 | Information processing method and device for legal referee document |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106815207B (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing. Applicant after: Beijing Guoshuang Technology Co.,Ltd. Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing. Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
GR01 | Patent grant | |