CN109977366A - A kind of catalogue generation method and device - Google Patents
- Publication number
- CN109977366A (application CN201711450681.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
Abstract
An embodiment of the invention provides a catalogue generation method. The method comprises: obtaining the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which a catalogue is to be generated in a document; selecting, according to the paragraph mark and the paragraph format, the paragraphs that serve as titles from those paragraphs; obtaining the hierarchical relationship between the selected paragraphs according to their segment numbers and format attributes; and generating the catalogue of the paragraphs according to the hierarchical relationship. With the catalogue generation method provided by the embodiment of the invention, the catalogue can be generated automatically, which improves the efficiency of catalogue generation and the user experience.
Description
Technical field
The present invention relates to the technical field of computer software applications, and in particular to a catalogue generation method and device.
Background art
A catalogue presents the structure and hierarchy of a document intuitively and helps the user locate content in the document quickly, which makes the document easier to read and understand.
However, with current methods of generating a catalogue, the user must manually pick out from the document the text that will serve as catalogue content, set information such as the title style and outline level for each piece of selected text one by one, and only then generate the catalogue on that basis. The generation process is therefore very cumbersome, the efficiency with which the user generates a catalogue is low, and the user experience is poor.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a catalogue generation method and device that improve the efficiency of catalogue generation and the user experience.
To solve the above problem, an embodiment of the present invention proposes a catalogue generation method, the method comprising:
obtaining the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which a catalogue is to be generated in a document;
selecting, according to the paragraph mark and the paragraph format, the paragraphs that serve as titles from the paragraphs for which the catalogue is to be generated;
obtaining the hierarchical relationship between the selected paragraphs according to the segment numbers and format attributes of the selected paragraphs;
generating, according to the hierarchical relationship, the catalogue of the paragraphs for which the catalogue is to be generated.
Preferably, selecting, according to the paragraph mark and the paragraph format, the paragraphs that serve as titles from the paragraphs for which the catalogue is to be generated comprises:
determining, among the paragraphs for which the catalogue is to be generated, the paragraphs whose paragraph mark does not belong to the preset non-title paragraph marks;
selecting, according to the paragraph format, the paragraphs that serve as titles from the determined paragraphs.
Preferably, selecting, according to the paragraph format, the paragraphs that serve as titles from the determined paragraphs comprises:
calculating, according to the paragraph format, the predicted value of each determined paragraph as a title;
selecting, according to the predicted value of each determined paragraph, the paragraphs that serve as titles from the determined paragraphs.
Preferably, the paragraph format of a paragraph comprises: the number format, the font size, the last character of the text and the text length;
calculating, according to the paragraph format, the predicted value of each determined paragraph as a title comprises:
calculating, according to the font size of the text in the paragraph, the font size difference between each determined paragraph and a preset title font size;
obtaining the predicted value corresponding to each prediction element of each determined paragraph according to the following formula, where the prediction elements of a paragraph comprise: the number format of the paragraph, the font size difference, the last character of the text in the paragraph and the length of the text in the paragraph:
predicted value of a prediction element = preset weight of the prediction element × the prediction element + preset bias of the prediction element;
calculating, from the predicted values obtained, the predicted value of each determined paragraph as a title.
Preferably, the non-title paragraph marks comprise:
a paragraph mark indicating a subdocument, a paragraph mark indicating a table, a paragraph mark indicating a catalogue field, a paragraph mark indicating a picture, and a paragraph mark indicating a blank paragraph.
Preferably, obtaining the hierarchical relationship between the selected paragraphs according to the segment numbers and format attributes of the selected paragraphs comprises:
dividing the selected paragraphs into paragraph groups according to the format attributes of the paragraphs;
determining the management interval of each paragraph in each paragraph group according to the segment numbers and the following rule:
when a paragraph has a next adjacent paragraph in its paragraph group, the management interval of the paragraph is: [segment number of the paragraph, segment number of the next adjacent paragraph in its paragraph group - 1]; when the paragraph has no next adjacent paragraph in its paragraph group, the management interval of the paragraph is: [segment number of the paragraph, segment number of the paragraph];
obtaining the hierarchical relationship between the selected paragraphs according to the segment-number order of the selected paragraphs, the management intervals of the selected paragraphs and the format attributes of the selected paragraphs.
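By way of illustration only, the grouping and management-interval rule above could be sketched in Python as follows. The function name `management_intervals` and the representation of a selected paragraph as a `(segment_number, format_attribute)` tuple are assumptions made for this sketch; they are not part of the embodiment.

```python
from collections import defaultdict


def management_intervals(titles):
    """Compute the management interval of every selected paragraph.

    titles: iterable of (segment_number, format_attribute) tuples
            (a hypothetical representation chosen for this sketch).
    Returns a dict {segment_number: (start, end)}.
    """
    # Group the selected paragraphs by format attribute.
    groups = defaultdict(list)
    for segment_number, format_attribute in titles:
        groups[format_attribute].append(segment_number)

    intervals = {}
    for segment_numbers in groups.values():
        segment_numbers.sort()
        for i, seg in enumerate(segment_numbers):
            if i + 1 < len(segment_numbers):
                # A next adjacent paragraph exists in the group:
                # the interval ends just before it.
                intervals[seg] = (seg, segment_numbers[i + 1] - 1)
            else:
                # Last paragraph of the group: the interval is [seg, seg].
                intervals[seg] = (seg, seg)
    return intervals
```

For example, with selected paragraphs at segment numbers 1 and 9 sharing one format attribute and paragraphs 3, 6 and 11 sharing another, paragraph 1 manages [1, 8] and paragraph 6 manages [6, 10], while the last paragraph of each group manages only itself.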
Preferably, obtaining the hierarchical relationship between the selected paragraphs according to the segment-number order of the selected paragraphs, the management intervals of the selected paragraphs and the format attributes of the selected paragraphs comprises:
obtaining, in segment-number order, the hierarchical relationship between every two adjacent paragraphs among the selected paragraphs in the following manner:
determining the interval relationship between the management interval of a first paragraph and the management interval of a second paragraph, where the first paragraph and the second paragraph are two selected paragraphs that are adjacent in segment-number order, the second paragraph following the first paragraph;
when the interval relationship is disjoint, judging whether the format attribute of the first paragraph and the format attribute of the second paragraph are the same;
if they are the same, determining that the first paragraph and the second paragraph are paragraphs of the same level;
if they are not the same, searching for a similar paragraph, where the similar paragraph is a selected paragraph that precedes the first paragraph in segment-number order and has the same format attribute as the second paragraph; if the similar paragraph exists, determining that the second paragraph and the similar paragraph are of the same level; if the similar paragraph does not exist, determining that, of the first paragraph and the second paragraph, the paragraph with the smaller segment number is the upper-level paragraph of the paragraph with the larger segment number;
when the interval relationship is not disjoint, performing the step of searching for the similar paragraph.
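The pairwise procedure above can be read as assigning a level to each selected paragraph. The following Python sketch is one possible reading and not the embodiment itself. It assumes that the first selected paragraph is at level 1, that "upper-level" means a level number smaller by one, and that the nearest preceding similar paragraph is used when several exist. The names `disjoint` and `assign_levels` are hypothetical.

```python
def disjoint(a, b):
    """True when the closed intervals a = (start, end) and b do not overlap."""
    return a[1] < b[0] or b[1] < a[0]


def assign_levels(titles, intervals):
    """Assign a level to each selected paragraph.

    titles:    list of (segment_number, format_attribute), sorted by segment number.
    intervals: dict {segment_number: (start, end)} of management intervals.
    Returns a dict {segment_number: level}; level 1 is the top level (assumption).
    """
    levels = {}
    for i, (seg, attr) in enumerate(titles):
        if i == 0:
            levels[seg] = 1
            continue
        first_seg, first_attr = titles[i - 1]
        if disjoint(intervals[first_seg], intervals[seg]) and first_attr == attr:
            # Disjoint intervals and identical format attributes: same level.
            levels[seg] = levels[first_seg]
            continue
        # Otherwise search, before the first paragraph, for a paragraph with
        # the same format attribute as the second paragraph.
        similar = next(
            (s for s, a in reversed(titles[: i - 1]) if a == attr), None
        )
        if similar is not None:
            levels[seg] = levels[similar]
        else:
            # No similar paragraph: the first paragraph is the upper level.
            levels[seg] = levels[first_seg] + 1
    return levels
```

With paragraphs 1 and 9 sharing one format attribute and paragraphs 3, 6 and 11 sharing another, this sketch places 1 and 9 at level 1 and 3, 6 and 11 at level 2.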
Preferably, judging whether the format attribute of the first paragraph and the format attribute of the second paragraph are the same comprises:
judging whether the first paragraph and the second paragraph both have a number;
if both have a number, judging, according to the number format of the first paragraph and the number format of the second paragraph, whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph;
if they do not both have a number, judging, according to the text settings of the first paragraph and the text settings of the second paragraph, whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph.
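A minimal sketch of this comparison, under stated assumptions, is given below. The number format is represented here as a hypothetical pattern string such as "1." or "1.1", and the text settings are reduced to two flags, centered and bold, which are the examples of text settings the description names. The type `TitleFormat` and the function `same_format_attribute` are illustrative names only.

```python
from typing import NamedTuple, Optional


class TitleFormat(NamedTuple):
    number_format: Optional[str]  # hypothetical pattern, None when unnumbered
    centered: bool                # text setting
    bold: bool                    # text setting


def same_format_attribute(first: TitleFormat, second: TitleFormat) -> bool:
    """Compare the format attributes of two paragraphs."""
    if first.number_format is not None and second.number_format is not None:
        # Both paragraphs are numbered: compare the number formats.
        return first.number_format == second.number_format
    # Not both numbered: compare the text settings instead.
    return (first.centered, first.bold) == (second.centered, second.bold)
```

Under this reading, two numbered paragraphs with different text settings still count as having the same format attribute as long as their number formats match.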
An embodiment of the present invention also provides a catalogue generation device, the device comprising:
a paragraph information obtaining module, configured to obtain the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which a catalogue is to be generated in a document;
a paragraph screening module, configured to select, according to the paragraph mark and the paragraph format, the paragraphs that serve as titles from the paragraphs for which the catalogue is to be generated;
a hierarchy analysis module, configured to obtain the hierarchical relationship between the selected paragraphs according to the segment numbers and format attributes of the selected paragraphs;
a catalogue generation module, configured to generate, according to the hierarchical relationship, the catalogue of the paragraphs for which the catalogue is to be generated.
Preferably, the paragraph screening module comprises:
a first screening submodule, configured to determine, among the paragraphs for which the catalogue is to be generated, the paragraphs whose paragraph mark does not belong to the preset non-title paragraph marks;
a second screening submodule, configured to select, according to the paragraph format, the paragraphs that serve as titles from the determined paragraphs.
Preferably, the second screening submodule comprises:
a predicted value calculation unit, configured to calculate, according to the paragraph format, the predicted value of each determined paragraph as a title;
a title selection unit, configured to select, according to the predicted value of each determined paragraph, the paragraphs that serve as titles from the determined paragraphs.
Preferably, the paragraph format of a paragraph comprises: the number format, the font size, the last character of the text and the text length;
the predicted value calculation unit is specifically configured to:
calculate, according to the font size of the text in the paragraph, the font size difference between each determined paragraph and a preset title font size;
obtain the predicted value corresponding to each prediction element of each determined paragraph according to the following formula, where the prediction elements of a paragraph comprise: the number format of the paragraph, the font size difference, the last character of the text in the paragraph and the length of the text in the paragraph:
predicted value of a prediction element = preset weight of the prediction element × the prediction element + preset bias of the prediction element;
calculate, from the predicted values obtained, the predicted value of each determined paragraph as a title.
Preferably, the non-title paragraph marks comprise:
a paragraph mark indicating a subdocument, a paragraph mark indicating a table, a paragraph mark indicating a catalogue field, a paragraph mark indicating a picture, and a paragraph mark indicating a blank paragraph.
Preferably, the hierarchy analysis module comprises:
a grouping submodule, configured to divide the selected paragraphs into paragraph groups according to the format attributes of the paragraphs;
an interval division submodule, configured to determine the management interval of each paragraph in each paragraph group according to the segment numbers and the following rule:
when a paragraph has a next adjacent paragraph in its paragraph group, the management interval of the paragraph is: [segment number of the paragraph, segment number of the next adjacent paragraph in its paragraph group - 1]; when the paragraph has no next adjacent paragraph in its paragraph group, the management interval of the paragraph is: [segment number of the paragraph, segment number of the paragraph];
a level division submodule, configured to obtain the hierarchical relationship between the selected paragraphs according to the segment-number order of the selected paragraphs, the management intervals of the selected paragraphs and the format attributes of the selected paragraphs.
Preferably, the level division submodule is specifically configured to obtain, in segment-number order, the hierarchical relationship between every two adjacent paragraphs among the selected paragraphs in the following manner:
determining the interval relationship between the management interval of a first paragraph and the management interval of a second paragraph, where the first paragraph and the second paragraph are two selected paragraphs that are adjacent in segment-number order, the second paragraph following the first paragraph;
when the interval relationship is disjoint, judging whether the format attribute of the first paragraph and the format attribute of the second paragraph are the same;
if they are the same, determining that the first paragraph and the second paragraph are paragraphs of the same level;
if they are not the same, searching for a similar paragraph, where the similar paragraph is a selected paragraph that precedes the first paragraph in segment-number order and has the same format attribute as the second paragraph; if the similar paragraph exists, determining that the second paragraph and the similar paragraph are of the same level; if the similar paragraph does not exist, determining that, of the first paragraph and the second paragraph, the paragraph with the smaller segment number is the upper-level paragraph of the paragraph with the larger segment number;
when the interval relationship is not disjoint, performing the step of searching for the similar paragraph.
Preferably, the level division submodule judging whether the format attribute of the first paragraph and the format attribute of the second paragraph are the same comprises:
judging whether the first paragraph and the second paragraph both have a number;
if both have a number, judging, according to the number format of the first paragraph and the number format of the second paragraph, whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph;
if they do not both have a number, judging, according to the text settings of the first paragraph and the text settings of the second paragraph, whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph.
An embodiment of the present invention also provides an electronic device comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the steps of any of the above methods when executing the program stored in the memory.
An embodiment of the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute any of the above catalogue generation methods.
The catalogue generation method and device provided by the embodiments of the present invention obtain the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which a catalogue is to be generated in a document, screen out the paragraphs that serve as titles from those paragraphs, divide the hierarchical structure of these paragraphs, and generate the catalogue automatically, which improves the efficiency of catalogue generation and the user experience. Of course, implementing any product or method of the present invention does not necessarily require achieving all of the above advantages at the same time.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a catalogue generation method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of another catalogue generation method provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of yet another catalogue generation method provided by an embodiment of the present invention;
Fig. 4 is an example of a catalogue generated with the solution provided by an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of a catalogue generation device in an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of another catalogue generation device in an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of yet another catalogue generation device in an embodiment of the present invention;
Fig. 8 is a structural diagram of an electronic device.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To solve the problem in the prior art that the process of generating a document catalogue is very cumbersome and the efficiency with which the user generates a catalogue is therefore low, the present invention proposes a catalogue generation method and device.
The catalogue generation method provided by the embodiments of the present invention is first described as a whole.
In one implementation of the present invention, the catalogue generation method includes:
obtaining the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which a catalogue is to be generated in a document;
selecting, according to the paragraph mark and the paragraph format, the paragraphs that serve as titles from the paragraphs for which the catalogue is to be generated;
obtaining the hierarchical relationship between the selected paragraphs according to the segment numbers and format attributes of the selected paragraphs;
generating, according to the hierarchical relationship, the catalogue of the paragraphs for which the catalogue is to be generated.
As can be seen from the above, when the catalogue of a document is generated with the solution provided by the embodiments of the present invention, the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which the catalogue is to be generated are obtained, the paragraphs that serve as titles are screened out from those paragraphs, the hierarchical structure of these paragraphs is divided, and the catalogue is generated automatically, which improves the efficiency of catalogue generation and the user experience.
The catalogue generation method provided by the embodiments of the present invention is described in detail below through specific embodiments.
Fig. 1 is a flow diagram of a catalogue generation method provided by an embodiment of the present invention, which includes the following steps:
Step S101: obtain the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which a catalogue is to be generated in a document.
In one implementation, the paragraph format of a paragraph includes the number format of the paragraph, the font size of the text, the last character of the text, the length of the text in the paragraph, and so on. The format attribute of a paragraph can be judged from the number format and the text settings of the paragraph, where the text settings include whether the paragraph is centered, whether it is bold, and the like. The segment number of a paragraph is the sequence number of the paragraph among all the paragraphs for which the catalogue is to be generated, arranged in order. The paragraph mark of a paragraph reflects the content of the paragraph; for example, the content of the paragraph may be a picture, a catalogue field or a subdocument.
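As an illustration only, the four pieces of information obtained in step S101 could be held in a small Python record such as the one below. All names (`ParagraphMark`, `ParagraphInfo` and their fields) are assumptions made for this sketch, and the way the format attribute is derived from the number format and the text settings is one possible reading of the description, not a definitive one.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ParagraphMark(Enum):
    """Kind of content a paragraph holds (illustrative names)."""
    TEXT = auto()
    SUBDOCUMENT = auto()
    TABLE = auto()
    CATALOGUE_FIELD = auto()
    PICTURE = auto()
    BLANK = auto()


@dataclass(frozen=True)
class ParagraphInfo:
    segment_number: int           # position among the paragraphs to be processed
    mark: ParagraphMark           # paragraph mark
    text: str
    font_size: float              # paragraph format: font size of the text
    number_format: Optional[str]  # paragraph format: e.g. "1."; None if unnumbered
    centered: bool = False        # text setting
    bold: bool = False            # text setting

    @property
    def last_char(self) -> str:
        """Paragraph format: last character of the text ('' if empty)."""
        return self.text[-1] if self.text else ""

    @property
    def text_length(self) -> int:
        """Paragraph format: length of the text."""
        return len(self.text)

    @property
    def format_attribute(self) -> tuple:
        """Format attribute derived from number format or text settings."""
        if self.number_format is not None:
            return ("numbered", self.number_format)
        return ("unnumbered", self.centered, self.bold)
```

Making the record immutable (`frozen=True`) keeps the derived format attribute hashable, which is convenient if paragraphs are later grouped by that attribute.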
In this step, the paragraphs for which the catalogue is to be generated may be all the paragraphs in the document, paragraphs selected by the user, or all the paragraphs within specific pages of the document; this can be determined by the user as needed and is not limited by the embodiments of the present invention.
Step S102: according to the paragraph mark and the paragraph format, select the paragraphs that serve as titles from the paragraphs for which the catalogue is to be generated.
In one implementation, the paragraphs for which the catalogue is to be generated can be traversed in order, each traversed paragraph being judged and the paragraphs that serve as titles being screened out, until all the paragraphs have been traversed.
Of course, the paragraphs can also be screened directly without regard to order, provided that every paragraph for which the catalogue is to be generated is screened; this is not limited by the embodiments of the present invention.
In this step, screening the paragraphs for which the catalogue is to be generated distinguishes the title paragraphs from the other paragraphs. Since titles are themselves a summary of the document content, the subsequent steps only need to divide the hierarchical relationship between the paragraphs that serve as titles, which makes catalogue generation more efficient.
Step S103: according to the segment numbers and format attributes of the selected paragraphs, obtain the hierarchical relationship between the selected paragraphs.
In this step, the hierarchical relationship between the selected paragraphs is the hierarchical relationship between the titles, and the hierarchical relationship between the titles reflects the hierarchical structure of the paragraphs for which the catalogue is to be generated.
Step S104: according to the hierarchical relationship, generate the catalogue of the paragraphs for which the catalogue is to be generated.
In one implementation, the generated catalogue can be displayed in the document, for example on the page before the paragraphs for which the catalogue is generated or on the page after them; this is not limited by the embodiments of the present invention.
As can be seen from the above, when a catalogue is generated with the solution provided by the embodiments of the present invention, the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which the catalogue is to be generated are obtained, the paragraphs that serve as titles are screened out from those paragraphs, the hierarchical structure of these paragraphs is divided, and the catalogue is generated automatically, which improves the efficiency of catalogue generation and the user experience.
Fig. 2 is a flow diagram of another catalogue generation method provided by an embodiment of the present invention, which includes the following steps:
Step S201: obtain the paragraph format, format attribute, segment number and paragraph mark of each paragraph for which a catalogue is to be generated in a document.
Step S202: determine, among the paragraphs for which the catalogue is to be generated, the paragraphs whose paragraph mark does not belong to the preset non-title paragraph marks.
Paragraphs with different content have different paragraph marks. Therefore, the paragraph marks that certainly do not belong to title paragraphs can be identified first, and then, according to these paragraph marks, the paragraphs that may be title paragraphs are screened out from all the paragraphs for which the catalogue is to be generated.
In one implementation, the preset non-title paragraph marks include: a paragraph mark indicating a subdocument, a paragraph mark indicating a table, a paragraph mark indicating a catalogue field, a paragraph mark indicating a picture, and a paragraph mark indicating a blank paragraph. They may of course also include other paragraph marks that establish that a paragraph is not a title paragraph.
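The screening in step S202 amounts to a set-membership filter. A minimal Python sketch follows; the string values used for the paragraph marks and the dictionary representation of a paragraph are assumptions chosen for illustration.

```python
# Preset non-title paragraph marks (string values are hypothetical labels).
NON_TITLE_MARKS = frozenset(
    {"subdocument", "table", "catalogue_field", "picture", "blank"}
)


def drop_non_title_paragraphs(paragraphs):
    """Keep only the paragraphs whose mark is not a preset non-title mark.

    paragraphs: list of dicts with at least a "mark" key (assumed shape).
    The order of the remaining paragraphs is preserved.
    """
    return [p for p in paragraphs if p["mark"] not in NON_TITLE_MARKS]
```

The paragraphs that survive this filter are only candidates; body-text paragraphs are still among them and are removed by the format-based screening of the next step.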
Step S203: according to the paragraph format, select the paragraphs that serve as titles from the determined paragraphs.
In the previous step, the paragraphs that may be titles were screened out from the paragraphs for which the catalogue is to be generated by using the paragraph marks. These paragraphs may still contain body text, that is, text paragraphs that are not titles. Therefore, in this step, the paragraph format is used to further select the paragraphs that serve as titles from the paragraphs screened out in the previous step.
In one implementation, the predicted value of each determined paragraph as a title can be calculated according to the paragraph format, and the paragraphs that serve as titles are then selected according to the predicted value.
Specifically, the predicted value of each paragraph as a title can be calculated in the following manner:
Step 1: according to the font size of the text in the paragraph, calculate the font size difference between each determined paragraph and a preset title font size.
Step 2: obtain the predicted value corresponding to each prediction element of each determined paragraph according to the following formula:
predicted value of a prediction element = preset weight of the prediction element × the prediction element + preset bias of the prediction element
Here, the prediction elements of a paragraph include: the number format of the paragraph, the font size difference, the last character of the text in the paragraph and the length of the text in the paragraph. The preset weight of each prediction element is determined by how strongly that element influences the prediction result. The preset bias of each prediction element is the maximum offset range allowed for that element in the algorithm and reflects the confidence interval of the element. Both are obtained by training a machine learning algorithm in advance.
Step 3: from the predicted values obtained, calculate the predicted value of each determined paragraph as a title.
In one implementation, a Sigmoid function can be applied to the predicted values of the prediction elements calculated in the previous step to obtain the final predicted value of each paragraph as a title.
Specifically, a threshold can be set for the predicted values calculated with the Sigmoid function: when the predicted value of a paragraph as a title is greater than the threshold, the paragraph is judged to be a title paragraph; when the predicted value is less than the threshold, the paragraph is judged to be a body-text paragraph. In one implementation, the threshold can be set to 0.5.
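The three calculation steps and the threshold test can be sketched in Python as follows. This is a sketch under several assumptions, none of which come from the embodiment: the weights and biases are made-up placeholder values (the description only says they are obtained by training), the per-element predicted values are summed before the Sigmoid function is applied, and the prediction elements are encoded as numbers (1 or 0 for "has a number format" and for "ends with sentence punctuation").

```python
import math

# Hypothetical preset weights and biases; real values would come from training.
WEIGHTS = {"number_format": 2.0, "font_size_diff": -0.5,
           "last_char": -1.5, "text_length": -0.02}
BIASES = {"number_format": 0.0, "font_size_diff": 0.5,
          "last_char": 0.0, "text_length": 0.0}


def element_prediction(name, value):
    """Predicted value of one element: preset weight * element + preset bias."""
    return WEIGHTS[name] * value + BIASES[name]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def title_score(elements):
    """Predicted value of a paragraph as a title.

    elements: dict mapping each prediction element name to its numeric
    encoding. Summing the per-element values is an assumption of this sketch.
    """
    return sigmoid(sum(element_prediction(n, v) for n, v in elements.items()))


def is_title(elements, threshold=0.5):
    """A paragraph is judged a title when its score exceeds the threshold."""
    return title_score(elements) > threshold
```

With these placeholder parameters, a short numbered paragraph in the title font size scores above 0.5, while a long unnumbered paragraph ending in a full stop scores well below it.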
It should be noted that the paragraph format of each paragraph includes many elements, such as the number format, the font size, the last character of the text, the text length, the line spacing and the character spacing. In the embodiment of the present invention, statistics were computed for each element by means of a machine learning algorithm, and the elements that performed best in training were selected as the basis for the subsequent calculation. That is, the number format, the font size, the last character of the text and the text length of a paragraph are ultimately used as the basis for calculating the predicted value of the paragraph as a title. The embodiment of the present invention merely takes the above as an example and does not limit the invention to it.
Step S204: according to the paragraph numbers and format attributes of the selected paragraphs, obtain the hierarchical relationships between the selected paragraphs.
Step S205: according to the hierarchical relationships, generate the catalogue of the paragraphs for which the catalogue is to be generated.
Step S201 is identical to step S101 of the embodiment shown in Fig. 1, and steps S204 to S205 are identical to steps S103 to S104 of the embodiment shown in Fig. 1, so they are not described again here.
As can be seen from the above, when a catalogue is generated with the scheme provided by the embodiment of the present invention, the obtained paragraph identifiers of the paragraphs for which the catalogue is to be generated are used to filter out the paragraphs that do not belong to the preset non-title paragraphs. The paragraphs serving as titles are then filtered out according to the paragraph format of each paragraph. Finally, the hierarchical relationships between the selected paragraphs are obtained according to their paragraph numbers and format attributes, and the catalogue is generated automatically, which improves the efficiency of catalogue generation and improves the user experience.
Fig. 3 is a flow diagram of another catalogue generation method provided by an embodiment of the present invention, which includes the following steps:
Step S301: obtain the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which the catalogue is to be generated.
Step S302: according to the paragraph identifiers and paragraph formats, select the paragraphs serving as titles from the paragraphs for which the catalogue is to be generated.
Step S303: according to the format attributes of the paragraphs, divide the selected paragraphs into paragraph groups.
In one implementation, paragraphs with the same format attribute are placed in one group, so that the paragraphs for which the catalogue is to be generated are divided into different paragraph groups.
Step S304: according to the paragraph numbers and the following formula, determine the management interval of each paragraph in each paragraph group.
Specifically, when a paragraph has a next adjacent paragraph in its paragraph group, the management interval of the paragraph is: [paragraph number of the paragraph, paragraph number of its next adjacent paragraph in the group - 1]. When a paragraph has no next adjacent paragraph in its paragraph group, the management interval of the paragraph is: [paragraph number of the paragraph, paragraph number of the paragraph].
For example, if the paragraph number of a paragraph is 1 and the paragraph number of its next adjacent paragraph in the same paragraph group is 4, the management interval of the paragraph is [1,3]. If the paragraph number of a paragraph is 6 and the paragraph has no next adjacent paragraph in the same paragraph group, the management interval of the paragraph is [6,6].
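Steps S303 and S304 can be sketched together as follows. The input shape (a list of `(paragraph_number, format_attribute)` pairs) and the function name `management_intervals` are assumptions made for this illustration; the format attribute is treated as an opaque, hashable label.

```python
from collections import defaultdict


def management_intervals(paragraphs):
    """Group paragraphs by format attribute, then compute each management interval.

    paragraphs: iterable of (paragraph_number, format_attribute) pairs.
    Returns a dict mapping paragraph_number -> (start, end), both inclusive.
    """
    groups = defaultdict(list)
    for number, fmt in paragraphs:
        groups[fmt].append(number)

    intervals = {}
    for numbers in groups.values():
        numbers.sort()
        for i, n in enumerate(numbers):
            if i + 1 < len(numbers):
                # A next adjacent paragraph exists in the same group.
                intervals[n] = (n, numbers[i + 1] - 1)
            else:
                # Last paragraph of its group.
                intervals[n] = (n, n)
    return intervals


# The two cases from the text: paragraph 1 followed by paragraph 4 in one
# group, and paragraph 6 with no successor in its group.
result = management_intervals([(1, "A"), (4, "A"), (6, "B")])
```

Here `result[1]` is `(1, 3)` and `result[6]` is `(6, 6)`, matching the example in the text.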
Step S305: following the order of the paragraph numbers of the selected paragraphs, and according to the management intervals and format attributes of the selected paragraphs, obtain the hierarchical relationships between the selected paragraphs.
In one implementation, following the order of the paragraph numbers of the selected paragraphs, the hierarchical relationship between two adjacent selected paragraphs is obtained in the following manner:
Step 1: determine the interval relationship between the management interval of the first paragraph and the management interval of the second paragraph.
Here, the first paragraph and the second paragraph are two selected paragraphs that are adjacent in paragraph-number order, with the second paragraph following the first. The relationship between two intervals is one of three kinds: disjoint, intersecting, and containing. In this scheme, the intersecting and containing relationships are together called non-disjoint relationships.
For example, if the management interval of the first paragraph is [1,1] and the management interval of the second paragraph is [2,2], the two intervals have no overlapping part, so the relationship between the intervals of the first and second paragraphs is disjoint. If the management interval of the first paragraph is [1,5] and the management interval of the second paragraph is [2,2], the management interval of the second paragraph is completely contained in that of the first paragraph, so the relationship is containing, i.e. a non-disjoint relationship. If the management interval of the first paragraph is [1,2] and the management interval of the second paragraph is [2,3], the two management intervals partly overlap, so the relationship is intersecting, which is also a non-disjoint relationship.
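The classification in Step 1 can be written as a small function. The function name and the string labels are assumptions for this sketch; intervals are inclusive `(start, end)` pairs as in the examples above.

```python
def interval_relation(first, second):
    """Classify two inclusive intervals as disjoint, containing, or intersecting."""
    (a1, b1), (a2, b2) = first, second
    if b1 < a2 or b2 < a1:
        return "disjoint"
    if (a1 <= a2 and b2 <= b1) or (a2 <= a1 and b1 <= b2):
        return "containing"
    return "intersecting"


def is_non_disjoint(first, second):
    """Intersecting and containing are both treated as non-disjoint."""
    return interval_relation(first, second) != "disjoint"
```

Applied to the three examples in the text, `interval_relation((1, 1), (2, 2))` gives `"disjoint"`, `interval_relation((1, 5), (2, 2))` gives `"containing"`, and `interval_relation((1, 2), (2, 3))` gives `"intersecting"`.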
Step 2:
First case:
The interval relationship between the management interval of the first paragraph and the management interval of the second paragraph is disjoint:
(1) Judge whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph.
In one implementation, whether the format attribute of the first paragraph is the same as that of the second paragraph is judged as follows:
First, judge whether both the first paragraph and the second paragraph are numbered.
If both are numbered, judge whether the format attributes of the two paragraphs are the same according to the numbering format of the first paragraph and the numbering format of the second paragraph. If the numbering formats are the same, the format attributes of the first and second paragraphs are judged to be the same.
If not both are numbered, i.e. neither the first nor the second paragraph is numbered, or only one of them is numbered, judge whether the format attributes of the two paragraphs are the same according to the text settings of the paragraphs. If the text settings are the same, the format attributes of the first and second paragraphs are judged to be the same.
In one case, the text settings of a paragraph include the font size, whether the text is centered, and whether the text is bold. When the font size, centering and bold settings are all the same, the text settings of the paragraphs are the same.
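The comparison described in (1) can be sketched as follows. The `Heading` record, its field names, and the use of a string label such as `"1."` or `"1.1."` to stand for a numbering format are assumptions for illustration; the text does not specify how a numbering format is represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Heading:
    numbering_format: Optional[str]  # None when the paragraph is not numbered
    font_size: float
    centered: bool
    bold: bool


def same_format(a: Heading, b: Heading) -> bool:
    """Compare numbering formats when both are numbered, else text settings."""
    if a.numbering_format is not None and b.numbering_format is not None:
        return a.numbering_format == b.numbering_format
    # Neither is numbered, or only one is: fall back to the text settings.
    return (a.font_size, a.centered, a.bold) == (b.font_size, b.centered, b.bold)


h1 = Heading("1.", 16, False, True)
h2 = Heading("1.", 14, False, False)   # same numbering format, other font
h3 = Heading("1.1.", 16, False, True)  # different numbering format
h4 = Heading(None, 16, False, True)    # unnumbered, same text settings as h1
h5 = Heading(None, 16, False, False)   # unnumbered, not bold
```

Under this reading, `same_format(h1, h2)` is true because only the numbering format is compared when both paragraphs are numbered, while `same_format(h4, h5)` is false because the bold setting differs.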
(2) If the format attribute of the first paragraph is the same as that of the second paragraph, the hierarchical relationship between the first paragraph and the second paragraph is determined to be: paragraphs of the same level.
If they are not the same, search for a similar paragraph. A similar paragraph is a selected paragraph that, in paragraph-number order, precedes the first paragraph and has the same format attribute as the second paragraph.
If a similar paragraph exists, the hierarchical relationship between the second paragraph and the similar paragraph is determined to be: same level.
If no similar paragraph exists, the hierarchical relationship between the first paragraph and the second paragraph is determined to be: the paragraph with the smaller paragraph number is the upper-level paragraph of the paragraph with the larger paragraph number.
In one implementation, when searching for a similar paragraph, the search starts from the paragraph preceding the first paragraph and proceeds recursively through the earlier paragraphs one by one, according to the paragraph number of each paragraph.
Second case:
The interval relationship between the management interval of the first paragraph and the management interval of the second paragraph is non-disjoint:
Execute the above step of searching for a similar paragraph:
If a similar paragraph exists, the hierarchical relationship between the second paragraph and the similar paragraph is determined to be: same level.
If no similar paragraph exists, the hierarchical relationship between the first paragraph and the second paragraph is determined to be: the paragraph with the smaller paragraph number is the upper-level paragraph of the paragraph with the larger paragraph number.
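The two cases of Step 2 can be combined into one decision function, sketched below. Paragraphs are represented as `(paragraph_number, format_attribute)` pairs, format attributes are compared by simple equality, and the result labels `"same_level"` and `"child_of"` are hypothetical names chosen for this illustration. The backward search is written as a loop rather than recursion.

```python
def adjacent_relation(first, second, earlier, disjoint):
    """Decide how `second` relates to the paragraphs before it.

    first, second: adjacent selected paragraphs as (number, format) pairs.
    earlier: selected paragraphs before `first`, in paragraph-number order.
    disjoint: True when the two management intervals are disjoint.
    Returns ("same_level", n) or ("child_of", n), where n is a paragraph number.
    """
    # First case: disjoint intervals and identical format attributes.
    if disjoint and first[1] == second[1]:
        return ("same_level", first[0])
    # Otherwise search backwards for a similar paragraph, i.e. an earlier
    # paragraph whose format attribute matches the second paragraph.
    for number, fmt in reversed(earlier):
        if fmt == second[1]:
            return ("same_level", number)
    # No similar paragraph: the first paragraph is the upper level.
    return ("child_of", first[0])


earlier = [(1, "A"), (2, "A"), (3, "A"), (4, "A")]
r1 = adjacent_relation((5, "A"), (6, "B"), earlier, disjoint=False)
r2 = adjacent_relation((9, "C"), (10, "B"),
                       earlier + [(5, "A"), (6, "B"), (7, "C"), (8, "C")],
                       disjoint=True)
```

In this sketch `r1` is `("child_of", 5)` because no earlier paragraph has format `"B"`, and `r2` is `("same_level", 6)` because paragraph 6 is the nearest earlier paragraph with the same format attribute as paragraph 10.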
Step S306: according to the hierarchical relationships, generate the catalogue of the paragraphs for which the catalogue is to be generated.
Steps S301 to S302 are identical to steps S101 to S102 of the embodiment shown in Fig. 1, and step S306 is identical to step S104 of the embodiment shown in Fig. 1, so they are not described again here.
As can be seen from the above, when a catalogue is generated with the scheme provided by the embodiment of the present invention, the paragraphs serving as titles are filtered out using the obtained paragraph identifiers and paragraph formats of the paragraphs for which the catalogue is to be generated. A management interval is then assigned to each paragraph according to the paragraph numbers of the selected paragraphs, and the hierarchical relationships between the selected paragraphs are obtained according to the format attribute of each paragraph and the relationships between the management intervals. The catalogue is generated automatically, which improves the efficiency of catalogue generation and improves the user experience.
For ease of understanding, the catalogue generation method shown in Fig. 3 is explained below with a specific example.
Fig. 4 shows a catalogue generated by applying the scheme provided by the embodiment of the present invention.
All the titles in the catalogue shown in Fig. 4 are the paragraphs serving as titles that were filtered out, according to paragraph identifier and paragraph format, from all the paragraphs for which the catalogue is to be generated.
1. According to the format attributes of the paragraphs, divide these paragraphs into paragraph groups.
As can be seen, "distinguishing hierarchy summary", "purpose", "conclusion", "algorithm", "verifying" and "points for attention" form one paragraph group; "1. automatic test" and "2. manual test" form one paragraph group; "1.1. specimen page source", "1.2. correlation data" and "1.3 scene" form one paragraph group; and "2.1. specimen page source", "2.2. method" and "2.3 conclusion" form one paragraph group.
2. Determine the management interval of each paragraph in each paragraph group.
The management interval of "distinguishing hierarchy summary" is [1,1]; the management interval of "purpose" is [2,2]; the management interval of "conclusion" is [3,3]; the management interval of "algorithm" is [4,4]; the management interval of "verifying" is [5,13]; the management interval of "points for attention" is [14,14];
The management interval of "1. automatic test" is [6,9]; the management interval of "2. manual test" is [10,10];
The management interval of "1.1. specimen page source" is [7,7]; the management interval of "1.2. correlation data" is [8,8]; the management interval of "1.3 scene" is [9,9];
The management interval of "2.1. specimen page source" is [11,11]; the management interval of "2.2. method" is [12,12]; the management interval of "2.3 conclusion" is [13,13].
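The intervals listed above can be reproduced from the interval formula of step S304. In the sketch below, the paragraph numbers of each group are read off from the intervals in the text, while the group labels and the helper name `intervals_for_group` are hypothetical.

```python
def intervals_for_group(numbers):
    """Management intervals for one paragraph group (inclusive bounds)."""
    ordered = sorted(numbers)
    result = {}
    for i, n in enumerate(ordered):
        # End just before the next paragraph of the group, or at n itself.
        end = ordered[i + 1] - 1 if i + 1 < len(ordered) else n
        result[n] = (n, end)
    return result


# Paragraph groups of the Fig. 4 example, keyed by made-up labels.
groups = {
    "unnumbered titles": [1, 2, 3, 4, 5, 14],
    "level 1.": [6, 10],
    "level 1.x": [7, 8, 9],
    "level 2.x": [11, 12, 13],
}

intervals = {}
for numbers in groups.values():
    intervals.update(intervals_for_group(numbers))
```

This yields `(5, 13)` for "verifying" (paragraph 5), `(6, 9)` for "1. automatic test" (paragraph 6) and `(10, 10)` for "2. manual test" (paragraph 10), in agreement with the values listed in the text.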
3. According to the management intervals and the format attributes of the selected paragraphs, obtain the hierarchical relationships between the selected paragraphs.
Between "distinguishing hierarchy summary" and "purpose", "purpose" and "conclusion", "conclusion" and "algorithm", "algorithm" and "verifying", "1.1. specimen page source" and "1.2. correlation data", "1.2. correlation data" and "1.3 scene", "2.1. specimen page source" and "2.2. method", and "2.2. method" and "2.3 conclusion", the management intervals are disjoint and the format attributes are the same; therefore, the two paragraphs in each of these pairs are at the same level;
Between "verifying" and "1. automatic test", and between "1. automatic test" and "1.1. specimen page source", the management intervals are non-disjoint, so paragraphs with the same format attribute are searched for recursively. It can be seen that no similar paragraph exists before "1. automatic test" or before "1.1. specimen page source"; therefore, "verifying" is the upper level of "1. automatic test", and "1. automatic test" is the upper level of "1.1. specimen page source";
Between "2. manual test" and "2.1. specimen page source", the management intervals are non-disjoint, so paragraphs with the same format attribute are searched for recursively. It can be seen that "2.1. specimen page source" has the same format attribute as "1.3 scene"; therefore, "2.1. specimen page source" and "1.3 scene" are at the same level;
Between "1.3 scene" and "2. manual test", and between "2.3 conclusion" and "points for attention", the management intervals are disjoint and the format attributes are not the same, so paragraphs with the same format attribute are searched for recursively. It can be seen that "2. manual test" has the same format attribute as "1. automatic test", and "points for attention" has the same format attribute as "verifying"; therefore, "2. manual test" and "1. automatic test" are at the same level, and "points for attention" and "verifying" are at the same level.
4. According to the hierarchical relationships, generate the catalogue of the paragraphs for which the catalogue is to be generated, i.e. the result shown in Fig. 4.
Corresponding to the above catalogue generation method, an embodiment of the present invention also provides a catalogue generating device.
Fig. 5 is a structural schematic diagram of a catalogue generating device in an embodiment of the present invention. The device includes:
A paragraph information obtaining module 510, configured to obtain the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which the catalogue is to be generated.
A paragraph screening module 520, configured to select, according to paragraph identifier and paragraph format, the paragraphs serving as titles from the paragraphs for which the catalogue is to be generated.
A hierarchy analysis module 530, configured to obtain the hierarchical relationships between the selected paragraphs according to the paragraph numbers and format attributes of the selected paragraphs.
A catalogue generation module 540, configured to generate, according to the hierarchical relationships, the catalogue of the paragraphs for which the catalogue is to be generated.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, the paragraph information obtaining module 510 obtains the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which the catalogue is to be generated; the paragraph screening module 520 filters out the paragraphs serving as titles from those paragraphs; the hierarchy analysis module 530 divides these paragraphs into a hierarchical structure; and finally the catalogue generation module 540 generates the catalogue automatically, which improves the efficiency of catalogue generation and improves the user experience.
Fig. 6 is a structural schematic diagram of another catalogue generating device in an embodiment of the present invention. The device includes:
A paragraph information obtaining module 610, configured to obtain the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which the catalogue is to be generated.
A paragraph screening module 620, including:
A first screening submodule 621, configured to determine, among the paragraphs for which the catalogue is to be generated, the paragraphs whose paragraph identifier does not belong to the preset non-title paragraph identifiers.
In one implementation, the preset non-title paragraph identifiers are: the paragraph identifier indicating a subdocument, the paragraph identifier indicating a table, the paragraph identifier indicating a catalogue field, the paragraph identifier indicating a picture, and the paragraph identifier indicating a blank paragraph.
A second screening submodule 622, configured to select, according to paragraph format, the paragraphs serving as titles from the determined paragraphs.
In one implementation, the second screening submodule 622 includes:
A predicted value calculation unit 622(a), configured to calculate, according to paragraph format, the predicted value of each determined paragraph as a title;
It is specifically configured as follows:
In one implementation, the paragraph format of a paragraph includes: numbering format, font size, last character of the text, and text length.
According to the font size of the text in a paragraph, calculate the font size difference between each determined paragraph and a preset title font size.
According to the following formula, obtain the predicted value corresponding to each prediction element of each determined paragraph, where the prediction elements of a paragraph are: the numbering format of the paragraph, the font size difference, the last character of the text in the paragraph, and the length of the text in the paragraph:
Predicted value corresponding to a prediction element = preset weight of the prediction element * the prediction element + preset bias of the prediction element.
According to the predicted values obtained, calculate the predicted value of each determined paragraph as a title.
A title selecting unit 622(b), configured to select, according to the predicted value of each determined paragraph, the paragraphs serving as titles from the determined paragraphs.
A hierarchy analysis module 630, configured to obtain the hierarchical relationships between the selected paragraphs according to the paragraph numbers and format attributes of the selected paragraphs.
A catalogue generation module 640, configured to generate, according to the hierarchical relationships, the catalogue of the paragraphs for which the catalogue is to be generated.
As can be seen from the above, when the catalogue of a document is generated with the scheme provided by the embodiment of the present invention, the first screening submodule 621 uses the paragraph identifiers obtained by the paragraph information obtaining module 610 to filter out the paragraphs that do not belong to the preset non-title paragraphs; the second screening submodule 622 then filters out the paragraphs serving as titles according to the paragraph format of each paragraph; the hierarchy analysis module 630 obtains the hierarchical relationships between the selected paragraphs according to their paragraph numbers and format attributes; and the catalogue generation module 640 generates the catalogue automatically, which improves the efficiency of catalogue generation and improves the user experience.
Fig. 7 is a structural schematic diagram of yet another catalogue generating device in an embodiment of the present invention. The device includes:
A paragraph information obtaining module 710, configured to obtain the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which the catalogue is to be generated.
A paragraph screening module 720, configured to select, according to paragraph identifier and paragraph format, the paragraphs serving as titles from the paragraphs for which the catalogue is to be generated.
A hierarchy analysis module 730, including:
A grouping submodule 731, configured to divide the selected paragraphs into paragraph groups according to the format attributes of the paragraphs.
An interval division submodule 732, configured to determine the management interval of each paragraph in each paragraph group according to the paragraph numbers and the following formula:
When a paragraph has a next adjacent paragraph in its paragraph group, the management interval of the paragraph is: [paragraph number of the paragraph, paragraph number of its next adjacent paragraph in the group - 1]; when a paragraph has no next adjacent paragraph in its paragraph group, the management interval of the paragraph is: [paragraph number of the paragraph, paragraph number of the paragraph].
A hierarchy division submodule 733, configured to obtain the hierarchical relationships between the selected paragraphs, following the order of the paragraph numbers of the selected paragraphs and according to the management intervals and format attributes of the selected paragraphs.
Specifically, determine the interval relationship between the management interval of the first paragraph and the management interval of the second paragraph, where the first paragraph and the second paragraph are two selected paragraphs that are adjacent in paragraph-number order, with the second paragraph following the first;
When the interval relationship is disjoint, judge whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph;
If they are the same, determine the hierarchical relationship between the first paragraph and the second paragraph to be: paragraphs of the same level;
If they are not the same, search for a similar paragraph, where the similar paragraph is a selected paragraph that, in paragraph-number order, precedes the first paragraph and has the same format attribute as the second paragraph; if the similar paragraph exists, determine the hierarchical relationship between the second paragraph and the similar paragraph to be: same level; if the similar paragraph does not exist, determine the hierarchical relationship between the first paragraph and the second paragraph to be: the paragraph with the smaller paragraph number is the upper-level paragraph of the paragraph with the larger paragraph number;
When the interval relationship is non-disjoint, execute the step of searching for a similar paragraph.
In one implementation, the hierarchy division submodule judges whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph as follows:
Judge whether both the first paragraph and the second paragraph are numbered;
If both are numbered, judge whether the format attribute of the first paragraph is the same as that of the second paragraph according to the numbering format of the first paragraph and the numbering format of the second paragraph;
If not both are numbered, judge whether the format attribute of the first paragraph is the same as that of the second paragraph according to the text settings of the paragraphs.
A catalogue generation module 740, configured to generate, according to the hierarchical relationships, the catalogue of the paragraphs for which the catalogue is to be generated.
As can be seen from the above, when a catalogue is generated with the scheme provided by the embodiment of the present invention, the paragraph screening module 720 filters out the paragraphs serving as titles using the paragraph identifiers and paragraph formats obtained by the paragraph information obtaining module 710. The grouping submodule 731 and the interval division submodule 732 then assign a management interval to each paragraph according to the paragraph numbers of the selected paragraphs. The hierarchy division submodule 733 obtains the hierarchical relationships between the selected paragraphs according to the format attribute of each paragraph and the relationships between the management intervals, and the catalogue generation module 740 generates the catalogue automatically, which improves the efficiency of catalogue generation and improves the user experience.
An embodiment of the present invention also provides an electronic device. As shown in Fig. 8, it includes a processor 801, a communication interface 802, a memory 803 and a communication bus 804, where the processor 801, the communication interface 802 and the memory 803 communicate with one another through the communication bus 804,
The memory 803 is configured to store a computer program;
The processor 801 is configured to implement the following steps when executing the program stored in the memory 803:
Obtain the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which the catalogue is to be generated;
According to paragraph identifier and paragraph format, select the paragraphs serving as titles from the paragraphs for which the catalogue is to be generated;
According to the paragraph numbers and format attributes of the selected paragraphs, obtain the hierarchical relationships between the selected paragraphs;
According to the hierarchical relationships, generate the catalogue of the paragraphs for which the catalogue is to be generated.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is drawn as a single thick line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), for example at least one magnetic disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and so on; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which the catalogue is to be generated are obtained, the paragraphs serving as titles are filtered out from those paragraphs, these paragraphs are divided into a hierarchical structure, and the catalogue is generated automatically, which improves the efficiency of catalogue generation and improves the user experience.
In another embodiment provided by the present invention, a computer-readable storage medium is also provided. Instructions are stored in the computer-readable storage medium, and when they are run on a computer, they cause the computer to execute the catalogue generation method described in any of the above embodiments.
In another embodiment provided by the present invention, a computer program product containing instructions is also provided. When it is run on a computer, it causes the computer to execute the catalogue generation method described in any of the above embodiments.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a Solid State Disk (SSD)), and so on.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for the parts that are the same or similar between embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the related parts, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention is included in the scope of protection of the present invention.
Claims (17)
1. A catalogue generation method, characterized in that the method comprises:
Obtaining the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in a document for which a catalogue is to be generated;
According to paragraph identifier and paragraph format, selecting the paragraphs serving as titles from the paragraphs for which the catalogue is to be generated;
According to the paragraph numbers and format attributes of the selected paragraphs, obtaining the hierarchical relationships between the selected paragraphs;
According to the hierarchical relationships, generating the catalogue of the paragraphs for which the catalogue is to be generated.
2. The method according to claim 1, characterized in that selecting, according to the paragraph identifiers and the paragraph formats, the paragraphs that serve as titles from the paragraphs for which the catalogue is to be generated comprises:
determining, among the paragraphs for which the catalogue is to be generated, the paragraphs whose paragraph identifier is not one of the preset non-title paragraph identifiers;
selecting, according to the paragraph formats, the paragraphs that serve as titles from the determined paragraphs.
3. The method according to claim 2, characterized in that selecting, according to the paragraph formats, the paragraphs that serve as titles from the determined paragraphs comprises:
calculating, according to the paragraph format, a predicted value of each determined paragraph being a title;
selecting, according to the predicted value of each determined paragraph, the paragraphs that serve as titles from the determined paragraphs.
4. The method according to claim 3, characterized in that the paragraph format of a paragraph comprises: numbering format, font size, last character of the text, and text length;
calculating, according to the paragraph format, a predicted value of each determined paragraph being a title comprises:
calculating, according to the font size of the text in the paragraph, the font size difference between each determined paragraph and a preset title font size;
obtaining, according to the following formula, the predicted value corresponding to each prediction element of each determined paragraph, wherein the prediction elements of a paragraph comprise: the numbering format of the paragraph, the font size difference, the last character of the text in the paragraph, and the length of the text in the paragraph:
predicted value corresponding to a prediction element = preset weight of the prediction element × the prediction element + preset offset of the prediction element;
calculating, according to the obtained predicted values, the predicted value of each determined paragraph being a title.
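As an illustrative, non-limiting sketch of the calculation recited in claim 4, the per-element formula can be written as a small Python function. The claim does not state how the prediction elements are encoded as numbers or how the per-element values are combined into the paragraph's overall predicted value; the numeric encoding, the summation in `title_score`, and all names below are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ElementParams:
    weight: float  # preset weight of the prediction element
    offset: float  # preset offset of the prediction element


def element_prediction(value: float, params: ElementParams) -> float:
    """Predicted value of one element: preset weight * element + preset offset."""
    return params.weight * value + params.offset


def title_score(elements: Dict[str, float], params: Dict[str, ElementParams]) -> float:
    """Overall predicted value of a paragraph being a title.

    Assumption: the per-element predicted values are simply summed.
    The claim only says the overall value is calculated from them.
    """
    return sum(element_prediction(elements[name], params[name]) for name in params)


# Hypothetical encoding: 1.0 means "has a heading-like numbering format";
# the font size difference is measured in points from the preset title size.
params = {
    "numbering_format": ElementParams(weight=2.0, offset=0.5),
    "font_size_diff": ElementParams(weight=-0.5, offset=0.0),
}
elements = {"numbering_format": 1.0, "font_size_diff": 2.0}
score = title_score(elements, params)
```

A paragraph would then be kept as a title when its score satisfies whatever selection criterion the implementation chooses, for example a threshold, which the claim leaves open.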
5. The method according to any one of claims 2-4, characterized in that the non-title paragraph identifiers comprise: a paragraph identifier indicating a subdocument, a paragraph identifier indicating a table, a paragraph identifier indicating a catalogue field, a paragraph identifier indicating a picture, and a paragraph identifier indicating a blank paragraph.
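The first screening step of claims 2 and 5 amounts to discarding every paragraph whose identifier belongs to a preset set. The sketch below is a minimal illustration; the string labels for the identifiers and the dictionary layout of a paragraph are hypothetical, since the claims do not specify how identifiers are represented.

```python
from typing import Dict, List

# Hypothetical labels for the non-title paragraph identifiers listed in claim 5.
NON_TITLE_IDENTIFIERS = frozenset(
    {"subdocument", "table", "catalogue_field", "picture", "blank"}
)


def candidate_paragraphs(paragraphs: List[Dict]) -> List[Dict]:
    """Keep only paragraphs whose identifier is not a preset non-title identifier."""
    return [p for p in paragraphs if p["identifier"] not in NON_TITLE_IDENTIFIERS]


paragraphs = [
    {"segment_number": 1, "identifier": "text"},
    {"segment_number": 2, "identifier": "table"},
    {"segment_number": 3, "identifier": "blank"},
    {"segment_number": 4, "identifier": "text"},
]
kept = candidate_paragraphs(paragraphs)
```

Only the paragraphs that survive this filter go on to the format-based selection of claim 3.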
6. The method according to claim 1, characterized in that obtaining the hierarchical relationship between the selected paragraphs according to the segment numbers and format attributes of the selected paragraphs comprises:
dividing the selected paragraphs into paragraph groups according to the format attributes of the paragraphs;
determining the management interval of each paragraph in each paragraph group according to the segment numbers and the following formula:
when a paragraph has a next adjacent paragraph in the paragraph group to which it belongs, the management interval of the paragraph is: [segment number of the paragraph, segment number of the next adjacent paragraph in the same paragraph group - 1]; when the paragraph has no next adjacent paragraph in the paragraph group to which it belongs, the management interval of the paragraph is: [segment number of the paragraph, segment number of the paragraph];
obtaining the hierarchical relationship between the selected paragraphs according to the order of the segment numbers of the selected paragraphs, and according to the management intervals and format attributes of the selected paragraphs.
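The grouping and interval rule of claim 6 can be illustrated with a short Python sketch. It assumes, for illustration only, that each selected paragraph is given as a `(segment_number, format_attribute)` pair and that format attributes can be compared for equality as plain values; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def management_intervals(
    paragraphs: List[Tuple[int, Hashable]]
) -> Dict[int, Tuple[int, int]]:
    """Map each segment number to its closed management interval (start, end)."""
    # Divide the selected paragraphs into groups by format attribute.
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for number, attribute in paragraphs:
        groups[attribute].append(number)

    intervals: Dict[int, Tuple[int, int]] = {}
    for numbers in groups.values():
        numbers.sort()
        for i, number in enumerate(numbers):
            if i + 1 < len(numbers):
                # A next adjacent paragraph exists in the same group.
                intervals[number] = (number, numbers[i + 1] - 1)
            else:
                # Last paragraph of its group: the interval is a single point.
                intervals[number] = (number, number)
    return intervals


selected = [(1, "A"), (3, "B"), (6, "B"), (9, "A"), (11, "B")]
intervals = management_intervals(selected)
# Group "A" holds segments 1 and 9; group "B" holds segments 3, 6 and 11.
```

In this example the paragraph with segment number 1 manages the interval [1, 8], which contains the intervals [3, 5] of segment 3 and part of [6, 10] of segment 6, while segment 9 and segment 11 each manage only themselves.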
7. The method according to claim 6, characterized in that obtaining the hierarchical relationship between the selected paragraphs according to the order of the segment numbers of the selected paragraphs, and according to the management intervals and format attributes of the selected paragraphs, comprises:
obtaining the hierarchical relationship between each two adjacent paragraphs among the selected paragraphs according to the order of the segment numbers of the selected paragraphs and in the following manner:
determining the interval relationship between the management interval of a first paragraph and the management interval of a second paragraph, wherein the first paragraph and the second paragraph are two of the selected paragraphs that are adjacent in segment-number order, and the second paragraph comes after the first paragraph in segment-number order;
when the interval relationship is a disjoint relationship, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph;
if they are the same, determining that the hierarchical relationship between the first paragraph and the second paragraph is: paragraphs at the same level;
if they are not the same, searching for a similar paragraph, wherein the similar paragraph is: a paragraph among the selected paragraphs that precedes the first paragraph in segment-number order and has the same format attribute as the second paragraph; if the similar paragraph exists, determining that the hierarchical relationship between the second paragraph and the similar paragraph is: same level; if the similar paragraph does not exist, determining that the hierarchical relationship between the first paragraph and the second paragraph is: the paragraph with the smaller segment number is the upper-level paragraph of the paragraph with the larger segment number;
when the interval relationship is not a disjoint relationship, performing the step of searching for a similar paragraph.
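The decision procedure of claim 7 for one pair of adjacent paragraphs can be sketched as follows. The sketch assumes that the management intervals have already been computed as closed `(start, end)` pairs keyed by segment number, that two intervals are disjoint when they share no segment number, and that the search for a similar paragraph returns the nearest preceding match; the claim does not say which match to take when several exist, and all names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Interval = Tuple[int, int]


def disjoint(a: Interval, b: Interval) -> bool:
    """True when two closed intervals share no segment number."""
    return a[1] < b[0] or b[1] < a[0]


def relate(
    paragraphs: List[Tuple[int, Hashable]],
    intervals: Dict[int, Interval],
    i: int,
) -> Tuple[str, int]:
    """Relate paragraphs[i] (first) and paragraphs[i + 1] (second).

    `paragraphs` must be sorted by segment number. Returns
    ("same_level", n) when the second paragraph is at the same level as the
    paragraph with segment number n, or ("child_of", n) when paragraph n is
    its upper-level paragraph.
    """
    first_no, first_attr = paragraphs[i]
    second_no, second_attr = paragraphs[i + 1]

    if disjoint(intervals[first_no], intervals[second_no]) and first_attr == second_attr:
        return ("same_level", first_no)

    # Otherwise search for a similar paragraph before the first paragraph.
    # Assumption: the nearest preceding match is used.
    for number, attribute in reversed(paragraphs[:i]):
        if attribute == second_attr:
            return ("same_level", number)

    # No similar paragraph: the smaller segment number is the upper level.
    return ("child_of", first_no)


selected = [(1, "A"), (3, "B"), (6, "B"), (9, "A"), (11, "B")]
intervals = {1: (1, 8), 3: (3, 5), 6: (6, 10), 9: (9, 9), 11: (11, 11)}
relations = [relate(selected, intervals, i) for i in range(len(selected) - 1)]
```

With these inputs, segment 3 becomes a lower-level paragraph of segment 1, segment 6 is at the same level as segment 3, and segment 9 is at the same level as segment 1.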
8. The method according to claim 7, characterized in that judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph comprises:
judging whether both the first paragraph and the second paragraph have numbering;
if both have numbering, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph according to the numbering format of the first paragraph and the numbering format of the second paragraph;
if they do not both have numbering, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph according to the text settings of the first paragraph and the text settings of the second paragraph.
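The comparison in claim 8 reduces to a two-branch check. In the sketch below, the `Heading` type, the use of `None` for a paragraph without numbering, and the representation of text settings as a tuple are assumptions introduced only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Heading:
    numbering_format: Optional[str]  # None when the paragraph has no numbering
    text_settings: Tuple             # hypothetical, e.g. (font, size, bold)


def same_format_attribute(a: Heading, b: Heading) -> bool:
    """Compare numbering formats when both paragraphs are numbered;
    otherwise fall back to comparing their text settings."""
    if a.numbering_format is not None and b.numbering_format is not None:
        return a.numbering_format == b.numbering_format
    return a.text_settings == b.text_settings


chapter = Heading("1.", ("SimHei", 16, True))
other_chapter = Heading("1.", ("SimSun", 14, False))
unnumbered = Heading(None, ("SimHei", 16, True))
```

Here `chapter` and `other_chapter` count as having the same format attribute because both are numbered with the same numbering format, even though their text settings differ, while `chapter` and `unnumbered` are compared on text settings alone.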
9. A catalogue generation device, characterized in that the device comprises:
a paragraph information obtaining module, configured to obtain the paragraph format, format attribute, segment number, and paragraph identifier of each paragraph in a document for which a catalogue is to be generated;
a paragraph screening module, configured to select, according to the paragraph identifiers and the paragraph formats, the paragraphs that serve as titles from the paragraphs for which the catalogue is to be generated;
a hierarchy analysis module, configured to obtain the hierarchical relationship between the selected paragraphs according to the segment numbers and format attributes of the selected paragraphs;
a catalogue generation module, configured to generate, according to the hierarchical relationship, the catalogue of the paragraphs for which the catalogue is to be generated.
10. The device according to claim 9, characterized in that the paragraph screening module comprises:
a first screening submodule, configured to determine, among the paragraphs for which the catalogue is to be generated, the paragraphs whose paragraph identifier is not one of the preset non-title paragraph identifiers;
a second screening submodule, configured to select, according to the paragraph formats, the paragraphs that serve as titles from the determined paragraphs.
11. The device according to claim 10, characterized in that the second screening submodule comprises:
a predicted value calculation unit, configured to calculate, according to the paragraph format, a predicted value of each determined paragraph being a title;
a title selecting unit, configured to select, according to the predicted value of each determined paragraph, the paragraphs that serve as titles from the determined paragraphs.
12. The device according to claim 11, characterized in that the paragraph format of a paragraph comprises: numbering format, font size, last character of the text, and text length;
the predicted value calculation unit is specifically configured to:
calculate, according to the font size of the text in the paragraph, the font size difference between each determined paragraph and a preset title font size;
obtain, according to the following formula, the predicted value corresponding to each prediction element of each determined paragraph, wherein the prediction elements of a paragraph comprise: the numbering format of the paragraph, the font size difference, the last character of the text in the paragraph, and the length of the text in the paragraph:
predicted value corresponding to a prediction element = preset weight of the prediction element × the prediction element + preset offset of the prediction element;
calculate, according to the obtained predicted values, the predicted value of each determined paragraph being a title.
13. The device according to any one of claims 10-12, characterized in that the non-title paragraph identifiers comprise: a paragraph identifier indicating a subdocument, a paragraph identifier indicating a table, a paragraph identifier indicating a catalogue field, a paragraph identifier indicating a picture, and a paragraph identifier indicating a blank paragraph.
14. The device according to claim 9, characterized in that the hierarchy analysis module comprises:
a grouping submodule, configured to divide the selected paragraphs into paragraph groups according to the format attributes of the paragraphs;
an interval division submodule, configured to determine the management interval of each paragraph in each paragraph group according to the segment numbers and the following formula:
when a paragraph has a next adjacent paragraph in the paragraph group to which it belongs, the management interval of the paragraph is: [segment number of the paragraph, segment number of the next adjacent paragraph in the same paragraph group - 1]; when the paragraph has no next adjacent paragraph in the paragraph group to which it belongs, the management interval of the paragraph is: [segment number of the paragraph, segment number of the paragraph];
a level division submodule, configured to obtain the hierarchical relationship between the selected paragraphs according to the order of the segment numbers of the selected paragraphs, and according to the management intervals and format attributes of the selected paragraphs.
15. The device according to claim 14, characterized in that:
the level division submodule is specifically configured to obtain the hierarchical relationship between each two adjacent paragraphs among the selected paragraphs according to the order of the segment numbers of the selected paragraphs and in the following manner:
determining the interval relationship between the management interval of a first paragraph and the management interval of a second paragraph, wherein the first paragraph and the second paragraph are two of the selected paragraphs that are adjacent in segment-number order, and the second paragraph comes after the first paragraph in segment-number order;
when the interval relationship is a disjoint relationship, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph;
if they are the same, determining that the hierarchical relationship between the first paragraph and the second paragraph is: paragraphs at the same level;
if they are not the same, searching for a similar paragraph, wherein the similar paragraph is: a paragraph among the selected paragraphs that precedes the first paragraph in segment-number order and has the same format attribute as the second paragraph; if the similar paragraph exists, determining that the hierarchical relationship between the second paragraph and the similar paragraph is: same level; if the similar paragraph does not exist, determining that the hierarchical relationship between the first paragraph and the second paragraph is: the paragraph with the smaller segment number is the upper-level paragraph of the paragraph with the larger segment number;
when the interval relationship is not a disjoint relationship, performing the step of searching for a similar paragraph.
16. The device according to claim 15, characterized in that the level division submodule judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph comprises:
judging whether both the first paragraph and the second paragraph have numbering;
if both have numbering, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph according to the numbering format of the first paragraph and the numbering format of the second paragraph;
if they do not both have numbering, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph according to the text settings of the first paragraph and the text settings of the second paragraph.
17. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-8 when executing the program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711450681.1A CN109977366B (en) | 2017-12-27 | 2017-12-27 | Catalog generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711450681.1A CN109977366B (en) | 2017-12-27 | 2017-12-27 | Catalog generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977366A (en) | 2019-07-05 |
CN109977366B (en) | 2023-10-31 |
Family
ID=67071916
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201711450681.1A | Active | CN109977366B (en) | 2017-12-27 | 2017-12-27 | Catalog generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977366B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030126165A1 (en) * | 2001-08-27 | 2003-07-03 | Segal Irit Haviv | Method for defining and optimizing criteria used to detect a contextually specific concept within a paragraph |
CN102375806A (en) * | 2010-08-23 | 2012-03-14 | 北大方正集团有限公司 | Document title extraction method and device |
WO2016128310A1 (en) * | 2015-02-13 | 2016-08-18 | Valipat | Method and system for automatically generating documents on the basis of an index |
CN107291677A (en) * | 2017-07-14 | 2017-10-24 | 北京神州泰岳软件股份有限公司 | A kind of PDF document header syntax tree generation method, device, terminal and system |
CN107301184A (en) * | 2016-04-14 | 2017-10-27 | 珠海金山办公软件有限公司 | It is a kind of to recognize the method and device that word or file generates catalogue |
Non-Patent Citations (4)
Title |
---|
JEONG, OK-RAN ET AL: "A word-salad filtering algorithm", 《LOGIC JOURNAL OF THE IGPL》, vol. 19, no. 5, pages 666 - 678 * |
仲勇 等: "文档目录轻松做", 《电脑迷》, no. 10, pages 72 * |
戴德宝: "Word环境下论文格式模板制作", 《电脑知识与技术》 * |
戴德宝: "Word环境下论文格式模板制作", 《电脑知识与技术》, no. 07, 5 March 2009 (2009-03-05), pages 177 - 178 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427614A (en) * | 2019-07-16 | 2019-11-08 | 深圳追一科技有限公司 | Construction method, device, electronic equipment and the storage medium of paragraph level |
CN110427614B (en) * | 2019-07-16 | 2023-08-08 | 深圳追一科技有限公司 | Construction method and device of paragraph level, electronic equipment and storage medium |
CN112307716A (en) * | 2019-07-25 | 2021-02-02 | 珠海金山办公软件有限公司 | Document content export method, export device, electronic equipment and storage medium |
CN110704573A (en) * | 2019-09-04 | 2020-01-17 | 平安科技(深圳)有限公司 | Directory storage method and device, computer equipment and storage medium |
WO2021042542A1 (en) * | 2019-09-04 | 2021-03-11 | 平安科技(深圳)有限公司 | Table of contents storage method and apparatus, computer device and storage medium |
CN110704573B (en) * | 2019-09-04 | 2023-12-22 | 平安科技(深圳)有限公司 | Catalog storage method, catalog storage device, computer equipment and storage medium |
CN113642320A (en) * | 2020-04-27 | 2021-11-12 | 北京庖丁科技有限公司 | Method, device, equipment and medium for extracting document directory structure |
CN113723078A (en) * | 2021-09-07 | 2021-11-30 | 杭州叙简科技股份有限公司 | Text logic information structuring method and device and electronic equipment |
CN113822023A (en) * | 2021-09-10 | 2021-12-21 | 厦门盈趣科技股份有限公司 | Automatic standard document generation method and system |
CN113822023B (en) * | 2021-09-10 | 2023-08-18 | 厦门盈趣科技股份有限公司 | Automatic standard document generation method and system |
CN114065708A (en) * | 2021-11-12 | 2022-02-18 | 珠海金山办公软件有限公司 | Method and device for processing document information, computer storage medium and terminal |
CN115995087A (en) * | 2023-03-23 | 2023-04-21 | 杭州实在智能科技有限公司 | Document catalog intelligent generation method and system based on fusion visual information |
Also Published As
Publication number | Publication date |
---|---|
CN109977366B (en) | 2023-10-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||