CN109977366A - Table of contents generation method and device - Google Patents


Info

Publication number
CN109977366A
Authority
CN
China
Prior art keywords
paragraph
attribute
catalogue
configuration
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711450681.1A
Other languages
Chinese (zh)
Other versions
CN109977366B (en)
Inventor
辛洋
蒙燕玲
皮霞林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Jinshan Mobile Technology Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Jinshan Mobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd, Guangzhou Jinshan Mobile Technology Co Ltd
Priority to CN201711450681.1A
Publication of CN109977366A
Application granted
Publication of CN109977366B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

An embodiment of the invention provides a table of contents generation method. The method comprises: obtaining the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in a document for which a table of contents is to be generated; selecting, according to the paragraph identifiers and paragraph formats, the paragraphs that serve as titles from those paragraphs; obtaining the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes; and generating the table of contents according to the hierarchical relationship. With the method provided by the embodiment of the invention, a table of contents can be generated automatically, which improves the efficiency of table of contents generation and the user experience.

Description

Table of contents generation method and device
Technical field
The present invention relates to the technical field of computer software applications, and in particular to a table of contents generation method and device.
Background art
A table of contents presents the structure and hierarchy of a document intuitively, helps the user quickly locate content in the document, and makes the document easier to understand and read.
However, current methods of generating a table of contents require the user to manually pick out the text to be used as table of contents entries and then set information such as the heading style and outline level for each selected piece of text one by one; only then can the table of contents be generated. The generation process is therefore cumbersome, users generate tables of contents inefficiently, and the user experience is poor.
Summary of the invention
Embodiments of the present invention aim to provide a table of contents generation method and device that improve the efficiency of table of contents generation and the user experience.
To solve the above problems, an embodiment of the present invention provides a table of contents generation method, the method comprising:
Obtaining the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in a document for which a table of contents is to be generated;
Selecting, according to the paragraph identifiers and paragraph formats, the paragraphs that serve as titles from the paragraphs for which the table of contents is to be generated;
Obtaining the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes;
Generating, according to the hierarchical relationship, the table of contents of the paragraphs for which it is to be generated.
Preferably, selecting the paragraphs that serve as titles according to the paragraph identifiers and paragraph formats comprises:
Determining, among the paragraphs for which the table of contents is to be generated, the paragraphs whose paragraph identifier does not belong to the preset non-title paragraph identifiers;
Selecting, according to the paragraph formats, the paragraphs that serve as titles from the determined paragraphs.
Preferably, selecting the paragraphs that serve as titles from the determined paragraphs according to the paragraph formats comprises:
Calculating, according to the paragraph formats, a predicted value of each determined paragraph being a title;
Selecting the paragraphs that serve as titles from the determined paragraphs according to their predicted values.
Preferably, the paragraph format of a paragraph comprises: number format, font size, last character of the text, and text length;
Calculating the predicted value of each determined paragraph being a title according to the paragraph formats comprises:
Calculating, according to the font size of the text in each paragraph, the font size difference between each determined paragraph and a preset title font size;
Obtaining, according to the following formula, the predicted value corresponding to each prediction element of each determined paragraph, where the prediction elements of a paragraph are the number format of the paragraph, the font size difference, the last character of the text in the paragraph, and the length of the text in the paragraph:
Predicted value of a prediction element = preset weight of the prediction element × prediction element + preset bias of the prediction element;
Calculating, from the obtained predicted values, the predicted value of each determined paragraph being a title.
Preferably, the non-title paragraph identifiers include:
The paragraph identifier indicating a subdocument, the paragraph identifier indicating a table, the paragraph identifier indicating a table of contents field, the paragraph identifier indicating a picture, and the paragraph identifier indicating a blank paragraph.
Preferably, obtaining the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes comprises:
Dividing the selected paragraphs into paragraph groups according to their format attributes;
Determining the management interval of each paragraph in each paragraph group according to the paragraph numbers and the following formula:
When a paragraph has a next adjacent paragraph in its paragraph group, the management interval of the paragraph is [paragraph number of the paragraph, paragraph number of the next adjacent paragraph in the group - 1]; when the paragraph has no next adjacent paragraph in its paragraph group, the management interval of the paragraph is [paragraph number of the paragraph, paragraph number of the paragraph];
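The management-interval formula can be made concrete with a short sketch. This is an illustration under stated assumptions, not the patentee's implementation: the function name `management_intervals` and the representation of each selected title paragraph as a `(paragraph_number, format_attribute)` pair are hypothetical, and the format attribute is assumed to be any hashable label.

```python
from collections import defaultdict


def management_intervals(paragraphs):
    """Compute the management interval of each selected title paragraph.

    paragraphs: iterable of (paragraph_number, format_attribute) pairs;
    format_attribute is any hashable label (hypothetical encoding).
    Returns a dict mapping paragraph_number -> (start, end), inclusive.
    """
    # One paragraph group per distinct format attribute.
    groups = defaultdict(list)
    for number, fmt in paragraphs:
        groups[fmt].append(number)

    intervals = {}
    for numbers in groups.values():
        numbers.sort()
        for i, number in enumerate(numbers):
            if i + 1 < len(numbers):
                # A next adjacent paragraph exists in the same group.
                intervals[number] = (number, numbers[i + 1] - 1)
            else:
                # Last paragraph of its group: [number, number].
                intervals[number] = (number, number)
    return intervals
```

For example, with titles `[(1, "H1"), (3, "H2"), (5, "H2"), (8, "H1"), (9, "H2")]` the sketch yields the intervals 1: (1, 7), 3: (3, 4), 5: (5, 8), 8: (8, 8) and 9: (9, 9). Note that, following the formula literally, the last paragraph of each group manages only itself.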
Obtaining the hierarchical relationship between the selected paragraphs according to the order of their paragraph numbers, their management intervals and their format attributes.
Preferably, obtaining the hierarchical relationship between the selected paragraphs according to the order of their paragraph numbers, their management intervals and their format attributes comprises:
Obtaining the hierarchical relationship between each two adjacent selected paragraphs according to the order of the paragraph numbers in the following manner:
Determining the interval relationship between the management interval of a first paragraph and the management interval of a second paragraph, where the first paragraph and the second paragraph are two selected paragraphs that are adjacent in paragraph-number order, and the second paragraph follows the first paragraph in that order;
When the interval relationship is disjoint, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph;
If they are the same, determining that the first paragraph and the second paragraph are paragraphs at the same level;
If they are not the same, searching for a similar paragraph, where a similar paragraph is a selected paragraph that precedes the first paragraph in paragraph-number order and has the same format attribute as the second paragraph; if a similar paragraph exists, determining that the second paragraph is at the same level as the similar paragraph; if no similar paragraph exists, determining that, of the first and second paragraphs, the paragraph with the smaller paragraph number is the upper-level paragraph of the paragraph with the larger paragraph number;
When the interval relationship is not disjoint, performing the step of searching for a similar paragraph.
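The pairwise rules can be sketched as a level-assignment pass over the selected paragraphs in paragraph-number order. Several details here are assumptions rather than statements from the patent: "at the same level" is modelled as equal outline level, "upper-level paragraph" as the second paragraph being one level deeper, the first title is placed at level 1, and when several similar paragraphs exist the nearest preceding one is used. The name `assign_levels` is hypothetical; the intervals are expected in the `{paragraph_number: (start, end)}` form produced by the management-interval formula.

```python
def assign_levels(paragraphs, intervals):
    """Assign outline levels to selected title paragraphs (a sketch).

    paragraphs: list of (paragraph_number, format_attribute) pairs.
    intervals: dict paragraph_number -> (start, end) management interval.
    Returns a dict paragraph_number -> level (1 = top level).
    """
    ordered = sorted(paragraphs, key=lambda p: p[0])
    levels = {}
    if not ordered:
        return levels
    levels[ordered[0][0]] = 1  # assumption: the first title is top level

    for i in range(1, len(ordered)):
        first_num, first_fmt = ordered[i - 1]
        second_num, second_fmt = ordered[i]
        a, b = intervals[first_num], intervals[second_num]
        disjoint = a[1] < b[0] or b[1] < a[0]

        if disjoint and first_fmt == second_fmt:
            # Same format attribute, disjoint intervals: same level.
            levels[second_num] = levels[first_num]
            continue

        # Similar paragraph: precedes the first paragraph and shares the
        # second paragraph's format attribute (nearest one, by assumption).
        similar = next(
            (num for num, fmt in reversed(ordered[: i - 1]) if fmt == second_fmt),
            None,
        )
        if similar is not None:
            levels[second_num] = levels[similar]
        else:
            # The smaller paragraph number is the upper-level paragraph.
            levels[second_num] = levels[first_num] + 1
    return levels
```

Applied to the titles `[(1, "H1"), (3, "H2"), (5, "H2"), (8, "H1"), (9, "H2")]` with intervals 1: (1, 7), 3: (3, 4), 5: (5, 8), 8: (8, 8), 9: (9, 9), the sketch places paragraphs 1 and 8 at level 1 and paragraphs 3, 5 and 9 at level 2.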
Preferably, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph comprises:
Judging whether the first paragraph and the second paragraph are numbered;
If they are numbered, judging, according to the number format of the first paragraph and the number format of the second paragraph, whether the format attribute of the first paragraph is the same as that of the second paragraph;
If they are not numbered, judging, according to the text settings of the first paragraph and the text settings of the second paragraph, whether the format attribute of the first paragraph is the same as that of the second paragraph.
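A minimal sketch of this format-attribute comparison follows. The `FormatInfo` record, its string encoding of number formats, and the restriction of "text settings" to centering and bold (the two examples the description gives) are hypothetical. The text does not say what happens when only one of the two paragraphs is numbered; the sketch assumes such paragraphs never match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatInfo:
    """Hypothetical per-paragraph data used for the format-attribute check."""
    number_format: Optional[str]  # e.g. "decimal", "decimal.decimal"; None if unnumbered
    centered: bool = False        # text setting
    bold: bool = False            # text setting


def same_format_attribute(first: FormatInfo, second: FormatInfo) -> bool:
    first_numbered = first.number_format is not None
    second_numbered = second.number_format is not None
    if first_numbered and second_numbered:
        # Both numbered: compare number formats.
        return first.number_format == second.number_format
    if not first_numbered and not second_numbered:
        # Neither numbered: compare text settings.
        return (first.centered, first.bold) == (second.centered, second.bold)
    # Assumption: a numbered and an unnumbered paragraph never match.
    return False
```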
An embodiment of the present invention further provides a table of contents generation device, the device comprising:
A paragraph information obtaining module, configured to obtain the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in a document for which a table of contents is to be generated;
A paragraph screening module, configured to select, according to the paragraph identifiers and paragraph formats, the paragraphs that serve as titles from the paragraphs for which the table of contents is to be generated;
A hierarchy analysis module, configured to obtain the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes;
A table of contents generation module, configured to generate, according to the hierarchical relationship, the table of contents of the paragraphs for which it is to be generated.
Preferably, the paragraph screening module comprises:
A first screening submodule, configured to determine, among the paragraphs for which the table of contents is to be generated, the paragraphs whose paragraph identifier does not belong to the preset non-title paragraph identifiers;
A second screening submodule, configured to select, according to the paragraph formats, the paragraphs that serve as titles from the determined paragraphs.
Preferably, the second screening submodule comprises:
A predicted value calculation unit, configured to calculate, according to the paragraph formats, a predicted value of each determined paragraph being a title;
A title selection unit, configured to select the paragraphs that serve as titles from the determined paragraphs according to their predicted values.
Preferably, the paragraph format of a paragraph comprises: number format, font size, last character of the text, and text length;
The predicted value calculation unit is specifically configured to:
Calculate, according to the font size of the text in each paragraph, the font size difference between each determined paragraph and a preset title font size;
Obtain, according to the following formula, the predicted value corresponding to each prediction element of each determined paragraph, where the prediction elements of a paragraph are the number format of the paragraph, the font size difference, the last character of the text in the paragraph, and the length of the text in the paragraph:
Predicted value of a prediction element = preset weight of the prediction element × prediction element + preset bias of the prediction element;
Calculate, from the obtained predicted values, the predicted value of each determined paragraph being a title.
Preferably, the non-title paragraph identifiers include:
The paragraph identifier indicating a subdocument, the paragraph identifier indicating a table, the paragraph identifier indicating a table of contents field, the paragraph identifier indicating a picture, and the paragraph identifier indicating a blank paragraph.
Preferably, the hierarchy analysis module comprises:
A grouping submodule, configured to divide the selected paragraphs into paragraph groups according to their format attributes;
An interval division submodule, configured to determine the management interval of each paragraph in each paragraph group according to the paragraph numbers and the following formula:
When a paragraph has a next adjacent paragraph in its paragraph group, the management interval of the paragraph is [paragraph number of the paragraph, paragraph number of the next adjacent paragraph in the group - 1]; when the paragraph has no next adjacent paragraph in its paragraph group, the management interval of the paragraph is [paragraph number of the paragraph, paragraph number of the paragraph];
A level division submodule, configured to obtain the hierarchical relationship between the selected paragraphs according to the order of their paragraph numbers, their management intervals and their format attributes.
Preferably, the level division submodule is specifically configured to obtain the hierarchical relationship between each two adjacent selected paragraphs according to the order of the paragraph numbers in the following manner:
Determining the interval relationship between the management interval of a first paragraph and the management interval of a second paragraph, where the first paragraph and the second paragraph are two selected paragraphs that are adjacent in paragraph-number order, and the second paragraph follows the first paragraph in that order;
When the interval relationship is disjoint, judging whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph;
If they are the same, determining that the first paragraph and the second paragraph are paragraphs at the same level;
If they are not the same, searching for a similar paragraph, where a similar paragraph is a selected paragraph that precedes the first paragraph in paragraph-number order and has the same format attribute as the second paragraph; if a similar paragraph exists, determining that the second paragraph is at the same level as the similar paragraph; if no similar paragraph exists, determining that, of the first and second paragraphs, the paragraph with the smaller paragraph number is the upper-level paragraph of the paragraph with the larger paragraph number;
When the interval relationship is not disjoint, performing the step of searching for a similar paragraph.
Preferably, the level division submodule judges whether the format attribute of the first paragraph is the same as the format attribute of the second paragraph by:
Judging whether the first paragraph and the second paragraph are numbered;
If they are numbered, judging, according to the number format of the first paragraph and the number format of the second paragraph, whether the format attribute of the first paragraph is the same as that of the second paragraph;
If they are not numbered, judging, according to the text settings of the first paragraph and the text settings of the second paragraph, whether the format attribute of the first paragraph is the same as that of the second paragraph.
An embodiment of the present invention further provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another via the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement any of the above method steps when executing the program stored in the memory.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above table of contents generation methods.
With the table of contents generation method and device provided by the embodiments of the present invention, the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in a document for which a table of contents is to be generated are obtained, the paragraphs serving as titles are screened out from these paragraphs, and the hierarchical structure of the title paragraphs is determined, so that the table of contents is generated automatically. This improves the efficiency of table of contents generation and the user experience. Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a table of contents generation method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of another table of contents generation method provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of yet another table of contents generation method provided by an embodiment of the present invention;
Fig. 4 is an example of a table of contents generated using the solution provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of a table of contents generation device in an embodiment of the present invention;
Fig. 6 is a structural diagram of another table of contents generation device in an embodiment of the present invention;
Fig. 7 is a structural diagram of yet another table of contents generation device in an embodiment of the present invention;
Fig. 8 is a structural diagram of an electronic device.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To solve the problem in the prior art that generating the table of contents of a document is cumbersome, so that users generate tables of contents inefficiently, the present invention proposes a table of contents generation method and device.
The table of contents generation method provided by the embodiments of the present invention is first described in general terms.
In one implementation of the present invention, the table of contents generation method comprises:
Obtaining the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in a document for which a table of contents is to be generated;
Selecting, according to the paragraph identifiers and paragraph formats, the paragraphs that serve as titles from the paragraphs for which the table of contents is to be generated;
Obtaining the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes;
Generating, according to the hierarchical relationship, the table of contents of the paragraphs for which it is to be generated.
As can be seen from the above, when the table of contents of a document is generated using the solution provided by the embodiments of the present invention, the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph for which the table of contents is to be generated are obtained, the paragraphs serving as titles are screened out from these paragraphs, and the hierarchical structure of the title paragraphs is determined, so that the table of contents is generated automatically. This improves the efficiency of table of contents generation and the user experience.
The table of contents generation method provided by the embodiments of the present invention is described in detail below through specific embodiments.
Fig. 1 is a flow diagram of a table of contents generation method provided by an embodiment of the present invention; the method comprises the following steps:
Step S101: obtain the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which a table of contents is to be generated.
In one implementation, the paragraph format of a paragraph includes the number format of the paragraph, the font size of the text, the last character of the text, the length of the text in the paragraph, and so on. The format attribute of a paragraph can be judged from its number format and its text settings, where the text settings include whether the paragraph is centered, whether it is bold, and so on. The paragraph number of a paragraph is its ordinal position among all paragraphs for which the table of contents is to be generated. The paragraph identifier reflects the content of the paragraph; for example, it can indicate that the paragraph contains a picture, a table of contents field or a subdocument.
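To make the four kinds of paragraph information concrete, the following sketch defines a hypothetical record for what step S101 collects. The class name `ParagraphInfo`, the identifier labels and the tuple encoding of the format attribute are illustrative assumptions, not data structures from the patent; the encoding simply follows the description that numbered paragraphs are characterized by their number format and unnumbered ones by their text settings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ParagraphInfo:
    """Hypothetical record of the data obtained in step S101 for one paragraph."""
    number: int                   # paragraph number: position among the processed paragraphs
    identifier: str               # paragraph identifier, e.g. "text", "table", "picture"
    text: str
    font_size: float              # in points
    number_format: Optional[str]  # None when the paragraph is unnumbered
    centered: bool = False        # text setting
    bold: bool = False            # text setting

    @property
    def last_char(self) -> str:
        return self.text[-1] if self.text else ""

    @property
    def text_length(self) -> int:
        return len(self.text)

    @property
    def format_attribute(self) -> Tuple:
        # Numbered paragraphs are characterized by their number format,
        # unnumbered ones by their text settings.
        if self.number_format is not None:
            return ("numbered", self.number_format)
        return ("text", self.centered, self.bold)
```

For instance, `ParagraphInfo(1, "text", "1. Introduction", 16.0, "decimal")` has last character `"n"`, text length 15 and format attribute `("numbered", "decimal")`.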
In this step, the paragraphs for which the table of contents is to be generated may be all paragraphs in the document, paragraphs selected by the user, or all paragraphs on specified pages of the document. They can be determined according to the user's needs; the embodiments of the present invention do not limit this.
Step S102: according to the paragraph identifiers and paragraph formats, select the paragraphs that serve as titles from the paragraphs for which the table of contents is to be generated.
In one implementation, the paragraphs can be traversed one by one, and each traversed paragraph is judged to determine whether it serves as a title, until all paragraphs have been traversed.
Alternatively, the paragraphs can be screened without regard to order, as long as every paragraph is screened; the embodiments of the present invention do not limit this.
In this step, screening distinguishes the title paragraphs from the other paragraphs. Since titles themselves summarize the content of the document, only the hierarchical relationship between the title paragraphs needs to be determined subsequently, which improves the efficiency of table of contents generation.
Step S103: obtain the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes.
In this step, the hierarchical relationship between the selected paragraphs is the hierarchical relationship between titles, which in turn reflects the hierarchical structure of the paragraphs for which the table of contents is to be generated.
Step S104: generate, according to the hierarchical relationship, the table of contents of the paragraphs for which it is to be generated.
In one implementation, the generated table of contents can be displayed in the document, for example on the page before or the page after the paragraphs for which it was generated; the embodiments of the present invention do not limit this.
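The patent leaves the output format of step S104 open. As one possible illustration, the sketch below renders title paragraphs as a plain-text outline indented by level. The function `render_toc`, its inputs and the four-space indent are hypothetical; page numbers and links, which a real table of contents would usually carry, are deliberately omitted.

```python
def render_toc(titles, levels, indent="    "):
    """Render a plain-text table of contents (illustrative only).

    titles: dict paragraph_number -> title text.
    levels: dict paragraph_number -> outline level (1 = top level).
    """
    lines = []
    # Emit titles in document order, indented one step per level below 1.
    for number in sorted(titles):
        lines.append(indent * (levels[number] - 1) + titles[number])
    return "\n".join(lines)
```

Ordering by paragraph number keeps the entries in document order, so the levels alone are enough to reconstruct the nesting.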
As can be seen from the above, when a table of contents is generated using the solution provided by the embodiments of the present invention, the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which the table of contents is to be generated are obtained, the paragraphs serving as titles are screened out from these paragraphs, and the hierarchical structure of the title paragraphs is determined, so that the table of contents is generated automatically. This improves the efficiency of table of contents generation and the user experience.
Fig. 2 is a flow diagram of another table of contents generation method provided by an embodiment of the present invention; the method comprises the following steps:
Step S201: obtain the paragraph format, format attribute, paragraph number and paragraph identifier of each paragraph in the document for which a table of contents is to be generated.
Step S202: determine, among the paragraphs for which the table of contents is to be generated, the paragraphs whose paragraph identifier does not belong to the preset non-title paragraph identifiers.
Paragraphs with different content have different paragraph identifiers. Therefore, the paragraph identifiers that certainly do not belong to title paragraphs can be determined first, and these identifiers can then be used to filter out, from all paragraphs for which the table of contents is to be generated, the paragraphs that may be title paragraphs.
In one implementation, the preset non-title paragraph identifiers include: the paragraph identifier indicating a subdocument, the paragraph identifier indicating a table, the paragraph identifier indicating a table of contents field, the paragraph identifier indicating a picture, and the paragraph identifier indicating a blank paragraph. They may of course also include other paragraph identifiers from which it can be determined that a paragraph is not a title paragraph.
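Step S202 amounts to a set-membership filter. The sketch below assumes each paragraph is a dict carrying an `"identifier"` key; the identifier labels in `NON_TITLE_IDENTIFIERS` and the function name `filter_non_titles` are hypothetical stand-ins for whatever identifiers the word processor actually exposes.

```python
# Hypothetical labels for the preset non-title paragraph identifiers:
# subdocument, table, table of contents field, picture, blank paragraph.
NON_TITLE_IDENTIFIERS = frozenset(
    {"subdocument", "table", "toc_field", "picture", "blank"}
)


def filter_non_titles(paragraphs):
    """Keep paragraphs whose identifier is not a preset non-title identifier.

    paragraphs: list of dicts with at least an "identifier" key.
    Order is preserved so paragraph numbers stay in document order.
    """
    return [p for p in paragraphs if p["identifier"] not in NON_TITLE_IDENTIFIERS]
```

Using a frozen set keeps the preset list immutable and makes each membership test constant time; extending the preset with further non-title identifiers, as the text allows, only means adding labels to the set.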
Step S203: according to the paragraph formats, select the paragraphs that serve as titles from the determined paragraphs.
In the previous step, the paragraphs that cannot be titles were filtered out using the paragraph identifiers. The remaining paragraphs may still contain text paragraphs, such as body-text paragraphs that are not titles. Therefore, in this step, the paragraph formats are used to further select the title paragraphs from the paragraphs retained in the previous step.
In one implementation, a predicted value of each determined paragraph being a title is calculated according to its paragraph format, and the title paragraphs are then selected according to the predicted values.
Specifically, the predicted value of each paragraph being a title can be calculated as follows:
Step 1: according to the font size of the text in each paragraph, calculate the font size difference between each determined paragraph and the preset title font size.
Step 2: obtain the predicted value corresponding to each prediction element of each determined paragraph according to the following formula:
Predicted value of a prediction element = preset weight of the prediction element × prediction element + preset bias of the prediction element
Here, the prediction elements of a paragraph are the number format of the paragraph, the font size difference, the last character of the text in the paragraph, and the length of the text in the paragraph. The preset weight of each prediction element is determined by how strongly that element influences the prediction result. The preset bias of each prediction element is the maximum offset range the algorithm allows for that element and reflects the element's confidence interval. Both the weights and the biases are obtained in advance by training a machine learning algorithm.
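Step 2 can be sketched as follows. The patent obtains the weights and biases by training and does not publish them, so every number here is a placeholder. The numeric encodings are also assumptions: numbering becomes 0/1, the font size difference is taken as an absolute value, and the last character becomes 1 when it is a sentence-ending punctuation mark from the assumed set `SENTENCE_END`.

```python
# Hypothetical parameters: the patent obtains weights and biases by training.
PRESET_TITLE_FONT_SIZE = 16.0
WEIGHTS = {"numbered": 2.0, "font_diff": -0.5, "ends_with_punct": -1.0, "length": -0.1}
BIASES = {"numbered": 0.25, "font_diff": 0.0, "ends_with_punct": 0.0, "length": 0.0}

# Assumed set of sentence-ending characters (full-width and ASCII).
SENTENCE_END = set("。．.!！?？;；:：")


def prediction_elements(text, font_size, numbered):
    """Encode the four prediction elements of one paragraph as numbers."""
    return {
        "numbered": 1.0 if numbered else 0.0,
        "font_diff": abs(font_size - PRESET_TITLE_FONT_SIZE),
        "ends_with_punct": 1.0 if text and text[-1] in SENTENCE_END else 0.0,
        "length": float(len(text)),
    }


def element_values(elements):
    """Predicted value per element = preset weight * element + preset bias."""
    return {name: WEIGHTS[name] * x + BIASES[name] for name, x in elements.items()}
```

A numbered 16 pt paragraph "Introduction" yields per-element values of about 2.25, 0.0, 0.0 and -1.2 under these placeholder parameters.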
Step 3: calculate, from the obtained predicted values, the predicted value of each determined paragraph being a title.
In one implementation, the Sigmoid function can be applied to the predicted values of the individual prediction elements calculated in the previous step to obtain the final predicted value of each paragraph being a title.
Specifically, when the Sigmoid function is used to calculate the predicted value of each paragraph being a title, a threshold can be set: when the predicted value of a paragraph is greater than the threshold, the paragraph is judged to be a title paragraph; when it is less than the threshold, the paragraph is judged to be a body-text paragraph. In one implementation, the threshold can be set to 0.5.
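The text says the Sigmoid function is applied to the per-element predicted values but not how they are combined. The sketch assumes they are summed first, as in logistic regression, and then compared with the 0.5 threshold. The names `title_score` and `is_title` are hypothetical, and treating a score exactly equal to the threshold as body text is also an assumption, since the patent only covers "greater than" and "less than".

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def title_score(element_values):
    """Assumption: per-element values are summed before the Sigmoid."""
    return sigmoid(sum(element_values))


def is_title(element_values, threshold=0.5):
    # Above the threshold -> title paragraph; otherwise body text
    # (a score exactly at the threshold is treated as body text by assumption).
    return title_score(element_values) > threshold
```

Given per-element values as a dict, pass `values.dict.values()`-style iterables directly, for example `is_title(element_values_dict.values())`; the per-element values 2.25, 0.0, 0.0 and -1.2 sum to 1.05, whose Sigmoid (about 0.74) exceeds 0.5, so that paragraph would be classed as a title.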
It should be noted that the paragraph format of each paragraph includes many elements, such as number format, font size, last character of the text, text length, line spacing and character spacing. The embodiments of the present invention use a machine learning algorithm to compute statistics on each element and, based on the training results, select the few elements with the best effect as the basis for subsequent calculation; that is, the predicted value of a paragraph being a title is finally calculated from the number format, font size, last character of the text and text length of the paragraph. These elements are, however, only an example and do not limit the present invention.
Step S204: obtain the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes.
Step S205: generate, according to the hierarchical relationship, the table of contents of the paragraphs for which it is to be generated.
Step S201 is the same as step S101 of the embodiment shown in Fig. 1, and steps S204 to S205 are the same as steps S103 to S104 of that embodiment; they are not described again here.
As can be seen from the above, when a table of contents is generated using the solution provided by the embodiments of the present invention, the obtained paragraph identifiers are used to filter out the paragraphs that do not belong to the preset non-title paragraphs, the paragraphs serving as titles are then selected according to the paragraph format of each paragraph, and the hierarchical relationship between the selected paragraphs is obtained according to their paragraph numbers and format attributes, so that the table of contents is generated automatically. This improves the efficiency of table of contents generation and the user experience.
Fig. 3 is a flowchart of another table-of-contents generation method provided by an embodiment of the present invention. The method includes the following steps:
Step S301: Obtain the paragraph format, format attribute, paragraph number, and paragraph identifier of each paragraph of the document from which the table of contents is to be generated.
Step S302: Select the paragraphs serving as titles from these candidate paragraphs according to their paragraph identifiers and paragraph formats.
Step S303: Divide the selected paragraphs into paragraph groups according to their format attributes.
In one implementation, paragraphs with identical format attributes are placed in the same group, so that the selected paragraphs are divided into different paragraph groups.
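A minimal sketch of the grouping in step S303, assuming each selected paragraph is given as a hypothetical (paragraph number, format attribute) pair, where the format attribute is any hashable value (for example a numbering pattern or a tuple of text settings):

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def group_by_format(paragraphs: Sequence[Tuple[int, Hashable]]) -> Dict[Hashable, List[int]]:
    """Place paragraphs with identical format attributes in the same group.

    Each group lists its paragraph numbers in ascending order, which is the
    order the management-interval rule of step S304 relies on.
    """
    groups: Dict[Hashable, List[int]] = {}
    for number, fmt in sorted(paragraphs, key=lambda p: p[0]):
        groups.setdefault(fmt, []).append(number)
    return groups
```

For example, `group_by_format([(1, "top"), (6, "N."), (2, "top"), (10, "N.")])` returns `{"top": [1, 2], "N.": [6, 10]}`.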
Step S304: Determine the management interval of each paragraph in each paragraph group from the paragraph numbers, using the following rule.
Specifically, if a paragraph has a next adjacent paragraph in its paragraph group, its management interval is [paragraph number of the paragraph, (paragraph number of the next adjacent paragraph in the group) − 1]; if it has no next adjacent paragraph in its group, its management interval is [paragraph number of the paragraph, paragraph number of the paragraph].
For example, if a paragraph's number is 1 and the next adjacent paragraph in the same group has number 4, the paragraph's management interval is [1,3]; if a paragraph's number is 6 and it has no next adjacent paragraph in its group, its management interval is [6,6].
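The interval rule of step S304 can be sketched as follows. The function name is hypothetical, and the input is one paragraph group given as its paragraph numbers in ascending order:

```python
from typing import List, Tuple


def management_intervals(group: List[int]) -> List[Tuple[int, int]]:
    """Compute the management interval of each paragraph in one group.

    A paragraph followed by another paragraph of the same group manages
    [own number, next number - 1]; the last paragraph manages [own, own].
    """
    intervals: List[Tuple[int, int]] = []
    for i, number in enumerate(group):
        if i + 1 < len(group):
            intervals.append((number, group[i + 1] - 1))
        else:
            intervals.append((number, number))
    return intervals
```

Applied to the group `[1, 4]` it yields `[(1, 3), (4, 4)]`, and to the single-paragraph group `[6]` it yields `[(6, 6)]`, matching the examples above.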
Step S305: Taking the selected paragraphs in order of paragraph number, obtain the hierarchical relationship between them according to their management intervals and format attributes.
In one implementation, the hierarchical relationship between each pair of adjacent selected paragraphs, taken in paragraph-number order, is obtained as follows:
Step 1: Determine the relationship between the management interval of the first paragraph and the management interval of the second paragraph.
Here, the first and second paragraphs are two selected paragraphs that are adjacent in paragraph-number order, with the second paragraph following the first. Two intervals may be disjoint, intersecting, or in a containment relationship; in this scheme, intersection and containment are together called non-disjoint relationships.
For example, if the management interval of the first paragraph is [1,1] and that of the second paragraph is [2,2], the two intervals have no overlapping part, so their relationship is disjoint. If the management interval of the first paragraph is [1,5] and that of the second is [2,2], the second interval lies entirely within the first, so the relationship is containment, which is non-disjoint. If the management interval of the first paragraph is [1,2] and that of the second is [2,3], the two intervals partially overlap, so the relationship is intersection, which is also non-disjoint.
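The interval comparison in Step 1 can be sketched as below, treating management intervals as closed integer intervals [start, end]. The function names and the returned labels are hypothetical:

```python
from typing import Tuple

Interval = Tuple[int, int]  # closed interval [start, end]


def interval_relation(first: Interval, second: Interval) -> str:
    """Classify two closed intervals as 'disjoint', 'contains' or 'intersects'."""
    (a_lo, a_hi), (b_lo, b_hi) = first, second
    if a_hi < b_lo or b_hi < a_lo:
        return "disjoint"  # no shared position
    if (a_lo <= b_lo and b_hi <= a_hi) or (b_lo <= a_lo and a_hi <= b_hi):
        return "contains"  # one interval lies entirely inside the other
    return "intersects"  # partial overlap


def is_non_disjoint(first: Interval, second: Interval) -> bool:
    """Containment and intersection are both treated as non-disjoint."""
    return interval_relation(first, second) != "disjoint"
```

The three examples above map to `"disjoint"`, `"contains"` and `"intersects"` respectively.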
Step 2:
Case 1:
The relationship between the management interval of the first paragraph and that of the second paragraph is disjoint:
(1) Determine whether the format attribute of the first paragraph is identical to that of the second paragraph.
In one implementation, this is determined as follows:
First, determine whether the first paragraph and the second paragraph are both numbered.
If both are numbered, compare the numbering format of the first paragraph with that of the second paragraph: if the numbering formats are identical, the format attributes of the two paragraphs are judged to be identical.
If not, that is, if neither paragraph is numbered or only one of them is, compare the text settings of the two paragraphs: if the text settings are identical, the format attributes of the two paragraphs are judged to be identical.
In one case, the text settings of a paragraph include its font size, whether it is centered, and whether it is bold; two paragraphs have identical text settings when their font size, centering, and bold settings are all the same.
(2) If the format attribute of the first paragraph is identical to that of the second paragraph, the first and second paragraphs are at the same level.
If they differ, search for a similar paragraph, that is, a selected paragraph that precedes the first paragraph in paragraph-number order and has the same format attribute as the second paragraph.
If a similar paragraph exists, the second paragraph is at the same level as the similar paragraph.
If no similar paragraph exists, then of the first and second paragraphs, the one with the smaller paragraph number is the upper-level paragraph of the one with the larger paragraph number.
In one implementation, the similar paragraph is searched for by paragraph number, starting from the paragraph immediately before the first paragraph and working backward through the earlier paragraphs one at a time.
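The format-attribute comparison of item (1) in Case 1 can be sketched as below. The `ParagraphFormat` container and its fields are hypothetical. The numbering format is modelled as a numbering pattern (for example `"N."` for "1." and "2.", `"N.N."` for "1.1."), so that "1." and "2." count as the same numbering format; that reading of "numbering format" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParagraphFormat:
    """Hypothetical container for the format fields used in the comparison."""
    numbering_pattern: Optional[str]  # e.g. "N." or "N.N."; None if unnumbered
    font_size: float
    centered: bool
    bold: bool


def same_format_attribute(first: ParagraphFormat, second: ParagraphFormat) -> bool:
    """Return True if the two paragraphs are judged to share a format attribute."""
    if first.numbering_pattern is not None and second.numbering_pattern is not None:
        # Both paragraphs are numbered: only the numbering formats are compared.
        return first.numbering_pattern == second.numbering_pattern
    # Neither is numbered, or only one is: compare the text settings instead.
    return (first.font_size == second.font_size
            and first.centered == second.centered
            and first.bold == second.bold)
```

Note that two numbered paragraphs with different font sizes still match if their numbering patterns agree, because the text settings are only consulted when at least one paragraph is unnumbered.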
Case 2:
The relationship between the management interval of the first paragraph and that of the second paragraph is non-disjoint:
Perform the similar-paragraph search described above:
If a similar paragraph exists, the second paragraph is at the same level as the similar paragraph.
If no similar paragraph exists, then of the first and second paragraphs, the one with the smaller paragraph number is the upper-level paragraph of the one with the larger paragraph number.
Step S306: Generate the table of contents of the candidate paragraphs according to the hierarchical relationship.
Steps S301 to S302 are the same as steps S101 to S102 of the embodiment of the invention shown in Fig. 1, and step S306 is the same as step S104 of that embodiment; they are not repeated here.
As can be seen from the above, when a table of contents is generated with the scheme provided by the embodiments of the present invention, the title paragraphs are selected according to the acquired paragraph identifiers and paragraph formats of the candidate paragraphs; a management interval is then determined for each selected paragraph from its paragraph number; the hierarchical relationship between the selected paragraphs is obtained from each paragraph's format attribute and the relationships between the management intervals; and the table of contents is generated automatically. This improves the efficiency of table-of-contents generation and the user experience.
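The two cases can be combined into one pass over the selected paragraphs. In this minimal sketch, the pairwise relations are turned into depths under stated assumptions: the first selected paragraph is at depth 1, "at the same level" means the same depth, and "upper-level paragraph" means the second paragraph is one level deeper than the first. Format attributes are compared with plain equality as a stand-in for the comparison described in Case 1, and all names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

# (paragraph_number, format_attribute, management_interval)
Selected = Tuple[int, Hashable, Tuple[int, int]]


def _disjoint(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if the closed intervals a and b share no position."""
    return a[1] < b[0] or b[1] < a[0]


def assign_levels(paragraphs: Sequence[Selected]) -> List[int]:
    """Return a depth (1 = top level) for each selected paragraph.

    `paragraphs` must be sorted by paragraph number; each adjacent pair
    (first, second) is handled as in Case 1 and Case 2.
    """
    levels: List[int] = []
    for i, (_, fmt, interval) in enumerate(paragraphs):
        if i == 0:
            levels.append(1)  # assumption: the first title is top level
            continue
        _, prev_fmt, prev_interval = paragraphs[i - 1]
        if _disjoint(prev_interval, interval) and prev_fmt == fmt:
            levels.append(levels[i - 1])  # Case 1, same format: same level
            continue
        # Search backward, starting just before the first paragraph of the pair.
        similar = next((j for j in range(i - 2, -1, -1)
                        if paragraphs[j][1] == fmt), None)
        if similar is not None:
            levels.append(levels[similar])  # same level as the similar paragraph
        else:
            levels.append(levels[i - 1] + 1)  # first is upper level of second
    return levels
```

Applied to the Fig. 4 example described later (paragraphs 1 to 14 with the management intervals listed there, top-level titles sharing one attribute, "1."/"2." titles another, and "x.y" titles a third), it returns `[1, 1, 1, 1, 1, 2, 3, 3, 3, 2, 3, 3, 3, 1]`.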
To aid understanding, the table-of-contents generation method shown in Fig. 3 is explained below with a specific example.
Fig. 4 shows a table of contents generated by applying the scheme provided by the embodiments of the present invention.
All the titles in the table of contents shown in Fig. 4 are the paragraphs that were selected as titles from all the candidate paragraphs according to their paragraph identifiers and paragraph formats.
1. Divide these paragraphs into paragraph groups according to their format attributes.
As can be seen, "distinguishing hierarchy summary", "purpose", "conclusion", "algorithm", "verifying", and "points for attention" form one paragraph group; "1. automatic test" and "2. manual test" form another; "1.1. specimen page source", "1.2. correlation data", and "1.3 scene" form another; and "2.1. specimen page source", "2.2. method", and "2.3 conclusion" form another.
2. Determine the management interval of each paragraph in each paragraph group.
The management interval of "distinguishing hierarchy summary" is [1,1]; that of "purpose" is [2,2]; that of "conclusion" is [3,3]; that of "algorithm" is [4,4]; that of "verifying" is [5,13]; and that of "points for attention" is [14,14];
The management interval of "1. automatic test" is [6,9], and that of "2. manual test" is [10,10];
The management interval of "1.1. specimen page source" is [7,7]; that of "1.2. correlation data" is [8,8]; and that of "1.3 scene" is [9,9];
The management interval of "2.1. specimen page source" is [11,11]; that of "2.2. method" is [12,12]; and that of "2.3 conclusion" is [13,13].
3. Obtain the hierarchical relationship between the selected paragraphs according to their management intervals and format attributes.
"distinguishing hierarchy summary" and "purpose", "purpose" and "conclusion", "conclusion" and "algorithm", "algorithm" and "verifying", "1.1. specimen page source" and "1.2. correlation data", "1.2. correlation data" and "1.3 scene", "2.1. specimen page source" and "2.2. method", and "2.2. method" and "2.3 conclusion": in each of these pairs the management intervals are disjoint and the format attributes are identical, so the two paragraphs of each pair are at the same level;
The management intervals of "verifying" and "1. automatic test", and of "1. automatic test" and "1.1. specimen page source", are non-disjoint, so a paragraph with the same format attribute is searched for backward. No earlier paragraph has the same format attribute as "1. automatic test" or as "1.1. specimen page source"; therefore, "verifying" is the upper-level paragraph of "1. automatic test", and "1. automatic test" is the upper-level paragraph of "1.1. specimen page source";
The management intervals of "2. manual test" ([10,10]) and "2.1. specimen page source" ([11,11]) do not overlap and their format attributes differ, so a paragraph with the same format attribute as "2.1. specimen page source" is searched for backward. "1.3 scene" has the same format attribute as "2.1. specimen page source"; therefore, "2.1. specimen page source" is at the same level as "1.3 scene";
The management intervals of "1.3 scene" and "2. manual test", and of "2.3 conclusion" and "points for attention", are disjoint and their format attributes differ, so a paragraph with the same format attribute is searched for backward. "2. manual test" has the same format attribute as "1. automatic test", and "points for attention" has the same format attribute as "verifying"; therefore, "2. manual test" is at the same level as "1. automatic test", and "points for attention" is at the same level as "verifying".
4. Generate the table of contents of the candidate paragraphs according to the hierarchical relationship, which gives the result shown in Fig. 4.
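Step 4 can be illustrated by rendering the titles as an indented outline from their levels. This sketch assumes a plain-text outline with one indentation step per level; an actual word processor would more likely insert a table-of-contents field, which is not modelled here. The function name is hypothetical, and the levels in the usage example follow the relationships derived above.

```python
from typing import List, Sequence


def render_toc(titles: Sequence[str], levels: Sequence[int],
               indent: str = "  ") -> List[str]:
    """Render titles as an indented outline, one line per entry."""
    if len(titles) != len(levels):
        raise ValueError("titles and levels must have the same length")
    # Level 1 is flush left; each deeper level adds one indentation step.
    return [indent * (level - 1) + title for title, level in zip(titles, levels)]
```

For instance, `render_toc(["verifying", "1. automatic test", "1.1. specimen page source"], [1, 2, 3])` indents the second entry by two spaces and the third by four.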
Corresponding to the above table-of-contents generation method, an embodiment of the present invention further provides a table-of-contents generation apparatus.
Fig. 5 is a schematic structural diagram of a table-of-contents generation apparatus according to an embodiment of the present invention. The apparatus includes:
A paragraph information acquisition module 510, configured to obtain the paragraph format, format attribute, paragraph number, and paragraph identifier of each paragraph of a document from which the table of contents is to be generated.
A paragraph screening module 520, configured to select the paragraphs serving as titles from those candidate paragraphs according to their paragraph identifiers and paragraph formats.
A hierarchy analysis module 530, configured to obtain the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes.
A table-of-contents generation module 540, configured to generate the table of contents of the candidate paragraphs according to the hierarchical relationship.
As can be seen from the above, in the scheme provided by the embodiments of the present invention, the paragraph information acquisition module 510 obtains the paragraph format, format attribute, paragraph number, and paragraph identifier of each candidate paragraph in the document; the paragraph screening module 520 selects the title paragraphs from the candidate paragraphs; the hierarchy analysis module 530 determines the hierarchical structure of these paragraphs; and the table-of-contents generation module 540 generates the table of contents automatically. This improves the efficiency of table-of-contents generation and the user experience.
Fig. 6 is a schematic structural diagram of another table-of-contents generation apparatus according to an embodiment of the present invention. The apparatus includes:
A paragraph information acquisition module 610, configured to obtain the paragraph format, format attribute, paragraph number, and paragraph identifier of each paragraph of a document from which the table of contents is to be generated.
A paragraph screening module 620, including:
A first screening submodule 621, configured to determine, among the candidate paragraphs, the paragraphs whose paragraph identifiers do not belong to the preset non-title paragraph identifiers.
In one implementation, the preset non-title paragraph identifiers are: a paragraph identifier indicating a subdocument, a paragraph identifier indicating a table, a paragraph identifier indicating a table-of-contents field, a paragraph identifier indicating a picture, and a paragraph identifier indicating a blank paragraph.
A second screening submodule 622, configured to select the paragraphs serving as titles from the determined paragraphs according to their paragraph formats.
In one implementation, the second screening submodule 622 includes:
A prediction value calculation unit 622(a), configured to calculate, according to the paragraph formats, the predicted value of each determined paragraph being a title;
it is specifically configured to:
In one implementation, the paragraph format of a paragraph includes its numbering format, font size, last text character, and text length.
Calculate, according to the font size of the text in each determined paragraph, the font-size difference between that paragraph and a preset title font size.
Obtain the predicted value corresponding to each prediction element of each determined paragraph according to the following formula, where the prediction elements of a paragraph are the paragraph's numbering format, the font-size difference, the last character of the text in the paragraph, and the length of the text in the paragraph:
Predicted value of a prediction element = prediction element × preset weight of the prediction element + preset bias of the prediction element.
Calculate, from the predicted values obtained, the predicted value of each determined paragraph being a title.
A title selection unit 622(b), configured to select the paragraphs serving as titles from the determined paragraphs according to their predicted values.
A hierarchy analysis module 630, configured to obtain the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes.
A table-of-contents generation module 640, configured to generate the table of contents of the candidate paragraphs according to the hierarchical relationship.
As can be seen from the above, when the scheme provided by the embodiments of the present invention is used to generate the table of contents of a document, the paragraph information acquisition module 610 obtains the paragraph identifiers of the candidate paragraphs; the first screening submodule 621 keeps only the paragraphs whose identifiers are not preset non-title identifiers; the second screening submodule 622 selects the title paragraphs according to each paragraph's format; the hierarchy analysis module 630 obtains the hierarchical relationship between the selected paragraphs from their paragraph numbers and format attributes; and the table-of-contents generation module 640 generates the table of contents automatically. This improves the efficiency of table-of-contents generation and the user experience.
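The first screening submodule 621 can be sketched as a set-membership filter. The `ParagraphKind` enumeration and its member names are hypothetical labels for the five non-title identifiers listed above, not identifiers from any real document API:

```python
from enum import Enum, auto
from typing import Iterable, List, Tuple


class ParagraphKind(Enum):
    """Hypothetical paragraph identifiers; the names are illustrative."""
    NORMAL = auto()
    SUBDOCUMENT = auto()
    TABLE = auto()
    TOC_FIELD = auto()
    PICTURE = auto()
    BLANK = auto()


# Identifiers that mark a paragraph as never being a title.
NON_TITLE_KINDS = frozenset({
    ParagraphKind.SUBDOCUMENT,
    ParagraphKind.TABLE,
    ParagraphKind.TOC_FIELD,
    ParagraphKind.PICTURE,
    ParagraphKind.BLANK,
})


def first_screen(paragraphs: Iterable[Tuple[int, ParagraphKind]]) -> List[int]:
    """Keep the numbers of paragraphs whose identifier is not a non-title identifier."""
    return [number for number, kind in paragraphs if kind not in NON_TITLE_KINDS]
```

Only the paragraphs that survive this filter are passed to the second screening submodule 622 for format-based prediction.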
Fig. 7 is a schematic structural diagram of yet another table-of-contents generation apparatus for a document according to an embodiment of the present invention. The apparatus includes:
A paragraph information acquisition module 710, configured to obtain the paragraph format, format attribute, paragraph number, and paragraph identifier of each paragraph of a document from which the table of contents is to be generated.
A paragraph screening module 720, configured to select the paragraphs serving as titles from those candidate paragraphs according to their paragraph identifiers and paragraph formats.
A hierarchy analysis module 730, including:
A grouping submodule 731, configured to divide the selected paragraphs into paragraph groups according to their format attributes.
An interval division submodule 732, configured to determine the management interval of each paragraph in each paragraph group from the paragraph numbers, using the following rule:
If a paragraph has a next adjacent paragraph in its paragraph group, its management interval is [paragraph number of the paragraph, (paragraph number of the next adjacent paragraph in the group) − 1]; if it has no next adjacent paragraph in its group, its management interval is [paragraph number of the paragraph, paragraph number of the paragraph].
A level division submodule 733, configured to obtain, taking the selected paragraphs in order of paragraph number, the hierarchical relationship between them according to their management intervals and format attributes.
Specifically, it determines the relationship between the management interval of a first paragraph and that of a second paragraph, where the first and second paragraphs are two selected paragraphs that are adjacent in paragraph-number order, with the second paragraph following the first;
when the interval relationship is disjoint, it determines whether the format attribute of the first paragraph is identical to that of the second paragraph;
if they are identical, it determines that the first and second paragraphs are at the same level;
if they differ, it searches for a similar paragraph, namely a selected paragraph that precedes the first paragraph in paragraph-number order and has the same format attribute as the second paragraph. If a similar paragraph exists, it determines that the second paragraph is at the same level as the similar paragraph; if none exists, it determines that, of the first and second paragraphs, the one with the smaller paragraph number is the upper-level paragraph of the one with the larger paragraph number;
when the interval relationship is non-disjoint, it performs the similar-paragraph search described above.
In one implementation, the level division submodule determines whether the format attribute of the first paragraph is identical to that of the second paragraph as follows:
It determines whether the first paragraph and the second paragraph are both numbered;
if both are numbered, it compares the numbering format of the first paragraph with that of the second paragraph to determine whether their format attributes are identical;
otherwise, it compares the text settings of the first and second paragraphs to determine whether their format attributes are identical.
A table-of-contents generation module 740, configured to generate the table of contents of the candidate paragraphs according to the hierarchical relationship.
As can be seen from the above, when a table of contents is generated with the scheme provided by the embodiments of the present invention, the paragraph information acquisition module 710 obtains the paragraph identifiers and paragraph formats of the candidate paragraphs; the paragraph screening module 720 selects the title paragraphs; the grouping submodule 731 and the interval division submodule 732 determine a management interval for each selected paragraph from its paragraph number; the level division submodule 733 obtains the hierarchical relationship between the selected paragraphs from each paragraph's format attribute and the relationships between the management intervals; and the table-of-contents generation module 740 generates the table of contents automatically. This improves the efficiency of table-of-contents generation and the user experience.
An embodiment of the present invention further provides an electronic device, as shown in Fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with one another through the communication bus 804;
the memory 803 is configured to store a computer program;
the processor 801 is configured to implement the following steps when executing the program stored in the memory 803:
Obtain the paragraph format, format attribute, paragraph number, and paragraph identifier of each paragraph of a document from which a table of contents is to be generated;
select the paragraphs serving as titles from those paragraphs according to their paragraph identifiers and paragraph formats;
obtain the hierarchical relationship between the selected paragraphs according to their paragraph numbers and format attributes;
generate the table of contents of those paragraphs according to the hierarchical relationship.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, which does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As can be seen from the above, in the scheme provided by the embodiments of the present invention, the paragraph format, format attribute, paragraph number, and paragraph identifier of each candidate paragraph in a document are obtained, the title paragraphs are selected from the candidate paragraphs, the hierarchical structure of these paragraphs is determined, and the table of contents is generated automatically. This improves the efficiency of table-of-contents generation and the user experience.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform any of the table-of-contents generation methods in the above embodiments.
In another embodiment provided by the present invention, a computer program product containing instructions is further provided. When run on a computer, it causes the computer to perform any of the table-of-contents generation methods in the above embodiments.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner: identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (17)

1. a kind of catalogue generation method, which is characterized in that the described method includes:
Obtain the paragraph format of the paragraph of catalogue to be generated in document, the attribute of a configuration, segment number and paragraph mark;
According to paragraph mark and paragraph format, the paragraph of title is selected as from the paragraph of catalogue to be generated;
According to the segment number and the attribute of a configuration of selected paragraph, the hierarchical relationship between selected paragraph is obtained;
According to described hierarchical relationship, the catalogue of the paragraph of catalogue to be generated is generated.
2. The method according to claim 1, wherein selecting, according to the paragraph identifiers and paragraph formats, the paragraphs serving as headings from the paragraphs for which the catalogue is to be generated comprises:
Determining, among the paragraphs for which the catalogue is to be generated, the paragraphs whose paragraph identifiers do not belong to preset non-heading paragraph identifiers;
Selecting, according to the paragraph formats, the paragraphs serving as headings from the determined paragraphs.
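The two-stage screening of claim 2 can be sketched as follows. The identifier strings in `NON_HEADING_IDS` are hypothetical stand-ins for the five categories listed in claim 5 (subdocument, table, catalogue field, picture, blank paragraph); the format-based test is passed in as a callable because claims 3 and 4 define it separately, and the toy predicate in the example is not the patent's test.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical encodings of the non-heading identifiers listed in claim 5.
NON_HEADING_IDS = frozenset({"subdocument", "table", "toc_field", "picture", "blank"})


@dataclass
class Paragraph:
    number: int      # paragraph number (position in the document)
    identifier: str  # hypothetical paragraph identifier, e.g. "text" or "table"
    text: str


def select_headings(
    paragraphs: Iterable[Paragraph],
    looks_like_heading: Callable[[Paragraph], bool],
) -> List[Paragraph]:
    # Stage 1: discard paragraphs whose identifier marks them as non-heading content.
    candidates = [p for p in paragraphs if p.identifier not in NON_HEADING_IDS]
    # Stage 2: apply the format-based test only to the remaining candidates.
    return [p for p in candidates if looks_like_heading(p)]


doc = [
    Paragraph(1, "text", "Introduction"),
    Paragraph(2, "table", "Cell"),
    Paragraph(3, "blank", ""),
    Paragraph(4, "text", "This body paragraph is far too long to be a heading."),
]
# Toy stand-in for the format test: short text counts as a heading.
print([p.number for p in select_headings(doc, lambda p: len(p.text) < 20)])
```

The short table cell would pass the toy format test, which illustrates why the identifier filter runs first: it removes content that can look like a heading without being one.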
3. The method according to claim 2, wherein selecting, according to the paragraph formats, the paragraphs serving as headings from the determined paragraphs comprises:
Calculating, according to the paragraph formats, a predicted value of each determined paragraph as a heading;
Selecting, according to the predicted values of the determined paragraphs, the paragraphs serving as headings from the determined paragraphs.
4. The method according to claim 3, wherein the paragraph format of a paragraph comprises: a number format, a font size, a last character of the text, and a text length;
Calculating, according to the paragraph formats, the predicted value of each determined paragraph as a heading comprises:
Calculating, according to the font size of the text in each paragraph, a font size difference between each determined paragraph and a preset heading font size;
Obtaining, according to the following formula, the predicted value corresponding to each prediction element of each determined paragraph, wherein the prediction elements of a paragraph comprise: the number format of the paragraph, the font size difference, the last character of the text in the paragraph, and the length of the text in the paragraph:
Predicted value corresponding to a prediction element = preset weight of the prediction element × prediction element + preset offset of the prediction element;
Calculating, according to the obtained predicted values, the predicted value of each determined paragraph as a heading.
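Claim 4 gives each prediction element a linear predicted value (weight × element + offset) but does not disclose the weights, the offsets, how the number format and last character become numbers, or how the per-element values are combined. The sketch below fills those gaps with hypothetical choices: a 0/1 encoding for numbering and for sentence-ending punctuation, the absolute font size difference, a plain sum, and a fixed threshold for the selection step of claim 3. Every numeric constant is a placeholder, not a value from the patent.

```python
from dataclasses import dataclass


@dataclass
class Features:
    numbered: bool    # whether the paragraph carries automatic numbering
    font_size: float  # font size of the paragraph text, in points
    last_char: str    # last character of the paragraph text
    length: int       # number of characters in the paragraph text


HEADING_FONT_SIZE = 16.0  # hypothetical preset heading font size
SENTENCE_END = set(".!?;。！？；")  # hypothetical sentence-ending characters

# Hypothetical (weight, offset) pairs, one per prediction element.
PARAMS = {
    "number_format": (1.0, 0.0),
    "size_diff": (-0.2, 0.5),
    "last_char": (-1.0, 0.0),
    "length": (-0.02, 0.5),
}


def element_values(f: Features) -> dict:
    """Encode each prediction element as a number (the encoding is an assumption)."""
    return {
        "number_format": 1.0 if f.numbered else 0.0,
        "size_diff": abs(f.font_size - HEADING_FONT_SIZE),
        "last_char": 1.0 if f.last_char in SENTENCE_END else 0.0,
        "length": float(f.length),
    }


def heading_score(f: Features) -> float:
    total = 0.0
    for name, value in element_values(f).items():
        weight, offset = PARAMS[name]
        total += weight * value + offset  # claim 4: weight * element + offset
    return total  # summing the per-element values is an assumption


def pick_headings(features, threshold=1.0):
    """Selection step of claim 3, modelled here as a fixed threshold (assumption)."""
    return [i for i, f in enumerate(features) if heading_score(f) >= threshold]


title = Features(numbered=True, font_size=16.0, last_char="n", length=12)
body = Features(numbered=False, font_size=11.0, last_char=".", length=200)
print(heading_score(title), heading_score(body), pick_headings([title, body]))
```

Negative weights on font size difference, trailing punctuation, and length push ordinary body text down, while numbering pushes a paragraph up; in practice such weights would have to be tuned or learned.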
5. The method according to any one of claims 2 to 4, wherein the non-heading paragraph identifiers comprise:
A paragraph identifier indicating a subdocument, a paragraph identifier indicating a table, a paragraph identifier indicating a catalogue (table-of-contents) field, a paragraph identifier indicating a picture, and a paragraph identifier indicating a blank paragraph.
6. The method according to claim 1, wherein obtaining, according to the paragraph numbers and format attributes of the selected paragraphs, the hierarchical relationship among the selected paragraphs comprises:
Dividing the selected paragraphs into paragraph groups according to their format attributes;
Determining, according to the paragraph numbers and the following rule, a management interval of each paragraph in each paragraph group:
When a paragraph has a next adjacent paragraph in its paragraph group, the management interval of the paragraph is [paragraph number of the paragraph, paragraph number of that next adjacent paragraph - 1]; when the paragraph has no next adjacent paragraph in its paragraph group, the management interval of the paragraph is [paragraph number of the paragraph, paragraph number of the paragraph];
Obtaining the hierarchical relationship among the selected paragraphs according to the order of their paragraph numbers and according to their management intervals and format attributes.
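The grouping and management-interval rule of claim 6 can be sketched as follows, assuming each selected paragraph is given as a `(paragraph_number, format_attribute)` pair and that format attributes are hashable values compared by equality. Each interval is returned as an inclusive `(start, end)` tuple; the function name is illustrative.

```python
from collections import defaultdict


def management_intervals(paragraphs):
    """Map each selected paragraph number to its inclusive management interval.

    paragraphs: iterable of (paragraph_number, format_attribute) pairs.
    """
    groups = defaultdict(list)  # one paragraph group per format attribute
    for number, attr in paragraphs:
        groups[attr].append(number)
    intervals = {}
    for numbers in groups.values():
        numbers.sort()
        for i, n in enumerate(numbers):
            if i + 1 < len(numbers):
                # A next adjacent paragraph exists in the same group.
                intervals[n] = (n, numbers[i + 1] - 1)
            else:
                # Last paragraph of its group: the interval covers only itself.
                intervals[n] = (n, n)
    return intervals


headings = [(1, "H1"), (3, "H2"), (5, "H2"), (8, "H1"), (10, "H2")]
print(management_intervals(headings))
```

With headings at paragraphs 1 and 8 sharing one format and 3, 5, and 10 sharing another, paragraph 1 manages [1, 7] and paragraph 5 manages [5, 9], so a management interval spans everything up to the next heading of the same format.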
7. The method according to claim 6, wherein obtaining the hierarchical relationship among the selected paragraphs according to the order of their paragraph numbers and according to their management intervals and format attributes comprises:
Obtaining, according to the order of the paragraph numbers of the selected paragraphs and in the following manner, the hierarchical relationship between every two adjacent paragraphs among the selected paragraphs:
Determining the interval relationship between the management interval of a first paragraph and the management interval of a second paragraph, wherein the first paragraph and the second paragraph are two of the selected paragraphs that are adjacent in paragraph-number order, and the second paragraph comes after the first paragraph in that order;
When the interval relationship is disjoint, judging whether the format attribute of the first paragraph is identical to the format attribute of the second paragraph;
If they are identical, determining that the hierarchical relationship between the first paragraph and the second paragraph is: same-level paragraphs;
If they are not identical, searching for a similar paragraph, wherein the similar paragraph is a paragraph among the selected paragraphs that precedes the first paragraph in paragraph-number order and has the same format attribute as the second paragraph; if the similar paragraph exists, determining that the second paragraph is at the same level as the similar paragraph; if no similar paragraph exists, determining that the hierarchical relationship between the first paragraph and the second paragraph is: the paragraph with the smaller paragraph number is the upper-level paragraph of the paragraph with the larger paragraph number;
When the interval relationship is not disjoint, performing the step of searching for the similar paragraph.
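The adjacent-pair rules of claim 7 can be expressed as a level assignment. The sketch below is one reading of the claim under stated assumptions: the first selected paragraph is placed at level 1; when several similar paragraphs exist, the nearest preceding one is used; and "upper-level paragraph" is modelled as the second paragraph sitting one level below the first. It repeats a compact version of the claim 6 interval rule so that it runs on its own; all names are illustrative.

```python
from collections import defaultdict


def management_intervals(paragraphs):
    """Inclusive management interval per paragraph number (rule of claim 6)."""
    groups = defaultdict(list)
    for number, attr in paragraphs:
        groups[attr].append(number)
    intervals = {}
    for numbers in groups.values():
        numbers.sort()
        for i, n in enumerate(numbers):
            end = numbers[i + 1] - 1 if i + 1 < len(numbers) else n
            intervals[n] = (n, end)
    return intervals


def disjoint(a, b):
    return a[1] < b[0] or b[1] < a[0]


def assign_levels(paragraphs):
    """paragraphs: (paragraph_number, format_attribute) pairs for the selected headings."""
    paragraphs = sorted(paragraphs)
    intervals = management_intervals(paragraphs)
    levels = {}
    for i, (number, attr) in enumerate(paragraphs):
        if i == 0:
            levels[number] = 1  # assumption: the first heading is top level
            continue
        prev_number, prev_attr = paragraphs[i - 1]
        if disjoint(intervals[prev_number], intervals[number]) and attr == prev_attr:
            levels[number] = levels[prev_number]  # same-level paragraphs
            continue
        # Similar paragraph: an earlier heading, before the first paragraph of the
        # pair, with the same format attribute; the nearest one is used (assumption).
        similar = next((n for n, a in reversed(paragraphs[:i - 1]) if a == attr), None)
        if similar is not None:
            levels[number] = levels[similar]
        else:
            levels[number] = levels[prev_number] + 1  # first paragraph is the parent
    return levels


print(assign_levels([(1, "H1"), (3, "H2"), (5, "H2"), (8, "H1"), (10, "H2")]))
```

In this five-heading example, paragraphs 1 and 8 come out at level 1 and paragraphs 3, 5, and 10 at level 2: a non-disjoint pair means the first paragraph's interval still covers the second, so the second is either matched to an earlier heading of its own format or placed beneath the first.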
8. The method according to claim 7, wherein judging whether the format attribute of the first paragraph is identical to the format attribute of the second paragraph comprises:
Judging whether the first paragraph and the second paragraph are numbered;
If they are numbered, judging, according to the number format of the first paragraph and the number format of the second paragraph, whether the format attribute of the first paragraph is identical to that of the second paragraph;
If they are not numbered, judging, according to the text settings of the first paragraph and the text settings of the second paragraph, whether the format attribute of the first paragraph is identical to that of the second paragraph.
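Claim 8 compares format attributes by number format when the paragraphs are numbered and by text settings otherwise. The sketch below is hypothetical in three respects: the number format is derived from the visible numbering label by replacing each run of digits with `N`; the text settings are reduced to font name, size, and bold; and a numbered paragraph is treated as never matching an unnumbered one, a case the claim leaves open.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextSettings:
    font: str
    size_pt: float
    bold: bool


@dataclass(frozen=True)
class HeadingPara:
    label: Optional[str]  # visible numbering label such as "1.2" or "(3)"; None if unnumbered
    text: TextSettings


def number_format(label: str) -> str:
    """Reduce a numbering label to a pattern, e.g. '2.10' -> 'N.N', '(3)' -> '(N)'."""
    return re.sub(r"\d+", "N", label)


def same_format_attribute(a: HeadingPara, b: HeadingPara) -> bool:
    if a.label is not None and b.label is not None:
        # Both numbered: compare number formats only.
        return number_format(a.label) == number_format(b.label)
    if a.label is None and b.label is None:
        # Neither numbered: compare text settings.
        return a.text == b.text
    return False  # assumption: numbered and unnumbered paragraphs never match
```

For example, "1.2" and "3.10" share the pattern `N.N` and therefore the same format attribute even when their fonts differ, while "1" and "1.1" do not.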
9. A catalogue generation device, characterized in that the device comprises:
A paragraph information obtaining module, configured to obtain the paragraph format, format attribute, paragraph number, and paragraph identifier of each paragraph in a document for which a catalogue is to be generated;
A paragraph screening module, configured to select, according to the paragraph identifiers and paragraph formats, the paragraphs serving as headings from the paragraphs for which the catalogue is to be generated;
A hierarchy analysis module, configured to obtain, according to the paragraph numbers and format attributes of the selected paragraphs, the hierarchical relationship among the selected paragraphs;
A catalogue generation module, configured to generate, according to the hierarchical relationship, the catalogue of the paragraphs for which the catalogue is to be generated.
10. The device according to claim 9, wherein the paragraph screening module comprises:
A first screening submodule, configured to determine, among the paragraphs for which the catalogue is to be generated, the paragraphs whose paragraph identifiers do not belong to preset non-heading paragraph identifiers;
A second screening submodule, configured to select, according to the paragraph formats, the paragraphs serving as headings from the determined paragraphs.
11. The device according to claim 10, wherein the second screening submodule comprises:
A predicted value calculation unit, configured to calculate, according to the paragraph formats, a predicted value of each determined paragraph as a heading;
A heading selection unit, configured to select, according to the predicted values of the determined paragraphs, the paragraphs serving as headings from the determined paragraphs.
12. The device according to claim 11, wherein the paragraph format of a paragraph comprises: a number format, a font size, a last character of the text, and a text length;
The predicted value calculation unit is specifically configured to:
Calculate, according to the font size of the text in each paragraph, a font size difference between each determined paragraph and a preset heading font size;
Obtain, according to the following formula, the predicted value corresponding to each prediction element of each determined paragraph, wherein the prediction elements of a paragraph comprise: the number format of the paragraph, the font size difference, the last character of the text in the paragraph, and the length of the text in the paragraph:
Predicted value corresponding to a prediction element = preset weight of the prediction element × prediction element + preset offset of the prediction element;
Calculate, according to the obtained predicted values, the predicted value of each determined paragraph as a heading.
13. The device according to any one of claims 10 to 12, wherein the non-heading paragraph identifiers comprise:
A paragraph identifier indicating a subdocument, a paragraph identifier indicating a table, a paragraph identifier indicating a catalogue (table-of-contents) field, a paragraph identifier indicating a picture, and a paragraph identifier indicating a blank paragraph.
14. The device according to claim 9, wherein the hierarchy analysis module comprises:
A grouping submodule, configured to divide the selected paragraphs into paragraph groups according to their format attributes;
An interval division submodule, configured to determine, according to the paragraph numbers and the following rule, a management interval of each paragraph in each paragraph group:
When a paragraph has a next adjacent paragraph in its paragraph group, the management interval of the paragraph is [paragraph number of the paragraph, paragraph number of that next adjacent paragraph - 1]; when the paragraph has no next adjacent paragraph in its paragraph group, the management interval of the paragraph is [paragraph number of the paragraph, paragraph number of the paragraph];
A level division submodule, configured to obtain the hierarchical relationship among the selected paragraphs according to the order of their paragraph numbers and according to their management intervals and format attributes.
15. The device according to claim 14, wherein:
The level division submodule is specifically configured to obtain, according to the order of the paragraph numbers of the selected paragraphs and in the following manner, the hierarchical relationship between every two adjacent paragraphs among the selected paragraphs:
Determining the interval relationship between the management interval of a first paragraph and the management interval of a second paragraph, wherein the first paragraph and the second paragraph are two of the selected paragraphs that are adjacent in paragraph-number order, and the second paragraph comes after the first paragraph in that order;
When the interval relationship is disjoint, judging whether the format attribute of the first paragraph is identical to the format attribute of the second paragraph;
If they are identical, determining that the hierarchical relationship between the first paragraph and the second paragraph is: same-level paragraphs;
If they are not identical, searching for a similar paragraph, wherein the similar paragraph is a paragraph among the selected paragraphs that precedes the first paragraph in paragraph-number order and has the same format attribute as the second paragraph; if the similar paragraph exists, determining that the second paragraph is at the same level as the similar paragraph; if no similar paragraph exists, determining that the hierarchical relationship between the first paragraph and the second paragraph is: the paragraph with the smaller paragraph number is the upper-level paragraph of the paragraph with the larger paragraph number;
When the interval relationship is not disjoint, performing the step of searching for the similar paragraph.
16. The device according to claim 15, wherein the level division submodule judging whether the format attribute of the first paragraph is identical to the format attribute of the second paragraph comprises:
Judging whether the first paragraph and the second paragraph are numbered;
If they are numbered, judging, according to the number format of the first paragraph and the number format of the second paragraph, whether the format attribute of the first paragraph is identical to that of the second paragraph;
If they are not numbered, judging, according to the text settings of the first paragraph and the text settings of the second paragraph, whether the format attribute of the first paragraph is identical to that of the second paragraph.
17. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement the method steps of any one of claims 1 to 8 when executing the program stored in the memory.
CN201711450681.1A 2017-12-27 2017-12-27 Catalog generation method and device Active CN109977366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711450681.1A CN109977366B (en) 2017-12-27 2017-12-27 Catalog generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711450681.1A CN109977366B (en) 2017-12-27 2017-12-27 Catalog generation method and device

Publications (2)

Publication Number Publication Date
CN109977366A (en) 2019-07-05
CN109977366B CN109977366B (en) 2023-10-31

Family

ID=67071916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711450681.1A Active CN109977366B (en) 2017-12-27 2017-12-27 Catalog generation method and device

Country Status (1)

Country Link
CN (1) CN109977366B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427614A (en) * 2019-07-16 2019-11-08 深圳追一科技有限公司 Construction method, device, electronic equipment and the storage medium of paragraph level
CN110704573A (en) * 2019-09-04 2020-01-17 平安科技(深圳)有限公司 Directory storage method and device, computer equipment and storage medium
CN112307716A (en) * 2019-07-25 2021-02-02 珠海金山办公软件有限公司 Document content export method, export device, electronic equipment and storage medium
CN113642320A (en) * 2020-04-27 2021-11-12 北京庖丁科技有限公司 Method, device, equipment and medium for extracting document directory structure
CN113723078A (en) * 2021-09-07 2021-11-30 杭州叙简科技股份有限公司 Text logic information structuring method and device and electronic equipment
CN113822023A (en) * 2021-09-10 2021-12-21 厦门盈趣科技股份有限公司 Automatic standard document generation method and system
CN114065708A (en) * 2021-11-12 2022-02-18 珠海金山办公软件有限公司 Method and device for processing document information, computer storage medium and terminal
CN115995087A (en) * 2023-03-23 2023-04-21 杭州实在智能科技有限公司 Document catalog intelligent generation method and system based on fusion visual information

Citations (5)

Publication number Priority date Publication date Assignee Title
US20030126165A1 (en) * 2001-08-27 2003-07-03 Segal Irit Haviv Method for defining and optimizing criteria used to detect a contextually specific concept within a paragraph
CN102375806A (en) * 2010-08-23 2012-03-14 北大方正集团有限公司 Document title extraction method and device
WO2016128310A1 (en) * 2015-02-13 2016-08-18 Valipat Method and system for automatically generating documents on the basis of an index
CN107291677A (en) * 2017-07-14 2017-10-24 北京神州泰岳软件股份有限公司 A kind of PDF document header syntax tree generation method, device, terminal and system
CN107301184A (en) * 2016-04-14 2017-10-27 珠海金山办公软件有限公司 It is a kind of to recognize the method and device that word or file generates catalogue

Non-Patent Citations (4)

Title
JEONG, OK-RAN ET AL: "A word-salad filtering algorithm", 《LOGIC JOURNAL OF THE IGPL》, vol. 19, no. 5, pages 666 - 678 *
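仲勇 (Zhong Yong) et al.: "文档目录轻松做" ("Making document catalogues easily"), 《电脑迷》, no. 10, page 72 *
戴德宝 (Dai Debao): "Word环境下论文格式模板制作" ("Making thesis format templates in the Word environment"), 《电脑知识与技术》 *
戴德宝 (Dai Debao): "Word环境下论文格式模板制作" ("Making thesis format templates in the Word environment"), 《电脑知识与技术》, no. 07, 5 March 2009 (2009-03-05), pages 177 - 178 *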

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN110427614A (en) * 2019-07-16 2019-11-08 深圳追一科技有限公司 Construction method, device, electronic equipment and the storage medium of paragraph level
CN110427614B (en) * 2019-07-16 2023-08-08 深圳追一科技有限公司 Construction method and device of paragraph level, electronic equipment and storage medium
CN112307716A (en) * 2019-07-25 2021-02-02 珠海金山办公软件有限公司 Document content export method, export device, electronic equipment and storage medium
CN110704573A (en) * 2019-09-04 2020-01-17 平安科技(深圳)有限公司 Directory storage method and device, computer equipment and storage medium
WO2021042542A1 (en) * 2019-09-04 2021-03-11 平安科技(深圳)有限公司 Table of contents storage method and apparatus, computer device and storage medium
CN110704573B (en) * 2019-09-04 2023-12-22 平安科技(深圳)有限公司 Catalog storage method, catalog storage device, computer equipment and storage medium
CN113642320A (en) * 2020-04-27 2021-11-12 北京庖丁科技有限公司 Method, device, equipment and medium for extracting document directory structure
CN113723078A (en) * 2021-09-07 2021-11-30 杭州叙简科技股份有限公司 Text logic information structuring method and device and electronic equipment
CN113822023A (en) * 2021-09-10 2021-12-21 厦门盈趣科技股份有限公司 Automatic standard document generation method and system
CN113822023B (en) * 2021-09-10 2023-08-18 厦门盈趣科技股份有限公司 Automatic standard document generation method and system
CN114065708A (en) * 2021-11-12 2022-02-18 珠海金山办公软件有限公司 Method and device for processing document information, computer storage medium and terminal
CN115995087A (en) * 2023-03-23 2023-04-21 杭州实在智能科技有限公司 Document catalog intelligent generation method and system based on fusion visual information

Also Published As

Publication number Publication date
CN109977366B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN109977366A (en) A kind of catalogue generation method and device
JP6134437B2 (en) Data transfer monitoring system, data transfer monitoring method, and base system
CN109308284B (en) Report menu generation method and device, computer equipment and storage medium
CN108920242B (en) Navigation bar generation method and device
CN108345485A (en) identification method and device for interface view
CN106570025B (en) Data filtering method and device
US11681764B2 (en) System and method for monitoring internet activity
CN104408077B (en) Picture shows method, shows system and terminal
KR101744892B1 (en) System and method for data searching using time series tier indexing
CN104699837B (en) Method, device and server for selecting illustrated pictures of web pages
CN103186666A (en) Method, device and equipment for searching based on favorites
CN114219373A (en) Method, system, device and medium for generating digital process visual flow chart
WO2013041022A1 (en) Url navigation page generation method, device and program
CN111309970A (en) Data retrieval method and device, electronic equipment and storage medium
CN106599009A (en) Display method and device for map data
CN109743309A (en) A kind of illegal request recognition methods, device and electronic equipment
US20150081710A1 (en) Data typing with probabilistic maps having imbalanced error costs
JP2012252529A5 (en)
US7953705B2 (en) Autonomic retention classes
JP2009245162A (en) Display control device, display control method, and display control program
CN106937173A (en) Video broadcasting method and device
EP2400446A1 (en) Equipment managing apparatus, equipment managing method, and equipment managing system
CN110427557A (en) Main broadcaster's recommended method, device, electronic equipment and computer readable storage medium
CN105205062A (en) Data storage method and data reading method and device
CN106874354A (en) A kind of daily record data screening technique and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant