CN109885818A - Method and system for converting PowerPoint presentations to Beamer presentations

Publication number: CN109885818A
Application number: CN201910097017.6A
Other versions: CN109885818B (en)
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Granted, Active
Inventors: 宋军, 张坤, 徐衡, 曹威, 夏雨, 吴雅笛
Original and current assignee: China University of Geosciences
Application filed by: China University of Geosciences
Priority: CN201910097017.6A
Prior art keywords: powerpoint, data, beamer, formula, text
Classification: Information Retrieval, Db Structures And Fs Structures Therefor

Abstract

The present invention provides a method and system for converting PowerPoint presentations to Beamer presentations, comprising: source file data extraction, in which all slides are obtained from the original presentation supplied by the user and the data of the text paragraphs on each slide are then obtained; source file data analysis, in which a transfer learning technique is used to analyze the text data and its attributes according to the information recorded in variables and the way the data are stored, to distinguish content with different attributes, and to convert data in other formats; and target file generation, in which a blank presentation in the target format is defined and the analyzed and converted information from the original presentation is written into the Beamer presentation in sequence according to the distinguished position information.

Description

Method and system for converting PowerPoint presentations to Beamer presentations
Technical field
The present invention relates to document conversion and extraction techniques, and in particular to a method and system for converting PowerPoint presentations to Beamer presentations.
Background art
With the rise of cloud computing and the mobile Internet, office software is undergoing profound change at both the market and the technical level. Presentations, one product of office software, are generally composed of text, pictures and the like, to which audio and video files with dynamic display effects may be added, and they are very widely used in business, education, government and other fields.
Microsoft PowerPoint, PowerPoint for short, is a presentation program developed by Microsoft Corporation and one of the components of the Microsoft Office suite. It is widely used by business people, teachers, students and trainers. According to statistics from Microsoft developers, about 300 million presentations are made with PowerPoint every year.
Beamer is a free presentation tool based on LaTeX. Owing to TeX's excellent control of mathematical formulas and its document typesetting capabilities, reports for discussions or meetings of a more specialized nature generally use Beamer presentations, and Beamer is widely used in fields such as professional reports and scientific research.
Transfer learning is a currently popular research direction in artificial intelligence and machine learning, and a new way of thinking about learning. Machine learning is an important method of artificial intelligence and at present its fastest-developing and most effective one. Machine learning lets a machine acquire knowledge from data automatically and apply it to new problems. Transfer learning, an important branch of machine learning, focuses on transferring knowledge that has already been learned to a new problem: when the original data are insufficient, data from other domains are migrated to augment them and so improve the accuracy of the algorithm.
Clustering is a well-known family of unsupervised learning algorithms. Given a data set, a clustering algorithm groups the data by some index, putting data with similar index values together to form different classes. K-means is the most widely used clustering algorithm. As with most conventional machine learning algorithms, its effectiveness is limited by the original data: when these are insufficient, its accuracy is limited.
Making a presentation with Beamer is more complicated than making one with PowerPoint and requires professional document preparation and programming skills, while a conventional machine learning algorithm used alone classifies with insufficient granularity. Using transfer learning together with a clustering algorithm to convert presentations quickly and accurately therefore lowers the difficulty for non-professionals of making Beamer presentations and improves the applicability and accessibility of Beamer presentations.
Summary of the invention
The technical problem addressed by the invention is that PowerPoint presentations and Beamer presentations cannot be converted flexibly, and in particular that conventional machine learning algorithms classify a single document with insufficient granularity; to solve it, the invention provides a method and system for converting PowerPoint presentations to Beamer presentations.
A method for converting a PowerPoint presentation to a Beamer presentation, comprising:
S1: Use Apache POI to extract data from the PowerPoint source file: preprocess the source file, obtain its paragraph information, then extract and save the data comprising text, pictures, tables and formulas;
S2: Analyze the source file data: from the content extracted from the PowerPoint source file, collect the font size, line count and horizontal layout position of each paragraph's text into the source data set T_a, take preset historical PowerPoint-to-Beamer conversion records as the migration data set T_b, and merge the two into the training data set T; define the Euclidean distance function dist_ed and the minimized squared error function E for the K-means clustering algorithm; run the transfer learning algorithm, initializing the paragraph weight vector w and computing the weight distribution p_t over T; run the clustering algorithm on T, calling dist_ed and E to assign the paragraphs to k classes, then compute the migration error rate ε_t and update the weight vector; iterate a set number of times to obtain the final classifier h_t, and save the classification results for text, pictures, tables and formulas; scale, denoise and binarize each formula, then convert it into the target formula through OCR and semantic conversion, producing a formatted Beamer formula;
S3: Use JACOB to generate the target Beamer file: determine the writing position of each saved text, picture, table and formula in the preset Beamer template, write them into the target Beamer file in sequence, and complete the conversion of the presentation.
Further, the use of Apache POI in step S1 to extract the source file data specifically comprises:
S11: Call the system file selection dialog FileDialog so that the user can upload the Microsoft PowerPoint presentation to be converted;
S12: After the upload completes, obtain the data of all slides in the presentation through the getSlides method of the HSLFSlideShow object in POI;
S13: Extract the text data: read the text content, text font size, paragraph format and paragraph index from the file through the "Item", "Range", "Text", "Font" and "Size" parameters provided by the JACOB component;
S14: Extract the data in the remaining formats: obtain the pictures in the presentation through POI's GETALLPictures method and the tables through the GETTables method, extract the pictures through FileOutputStream and the formulas through the Clipboard, and save the extracted data.
Further, the source file data analysis of step S2 specifically comprises:
S21: Examine how the text data are stored in the presentation; collect the font size, line count and horizontal layout position of each paragraph's text into the source data set T_a, of length m; load the preset historical PowerPoint-to-Beamer conversion records in the same format as the migration data set T_b, of length n; merge the two into the training data set T, of length m+n;
S22: Define the notation for a text paragraph sample of the data set and for a cluster centroid, where i = 1, 2, ..., s is the paragraph index and j = 1, 2, ..., t indexes the features; on this notation define the Euclidean distance function dist_ed with which the K-means algorithm computes the distance between each cluster centroid and a paragraph;
define the minimized squared error function E with which the K-means algorithm fits the cluster centroids,
where the centroid of cluster C_i is the mean vector of that cluster;
S23: Run the migration algorithm and initialize the paragraph weight vector, where w denotes the initial weight of each paragraph's text; the weights adjust the influence of the migration data on the source data;
S24: Compute the weight distribution p_t over the data set T, which supplies the weight term for the K-means training data; p_t is computed from the weight vector w_t;
S25: Run the clustering algorithm on the data set T, calling the Euclidean distance function dist_ed and the minimized squared error function E to assign the paragraphs to k classes;
S26: From the K-means clustering result, compute the migration error rate ε_t,
where h_t denotes the classifier's classification result on T_a and c denotes the clustering algorithm's classification result on T_a; set the update parameters, including β_t = ε_t/(1 − ε_t), and compute and update the weight vector from this error rate;
S27: Return to step S24 and iterate until the set number of iterations N is reached, obtaining the final classifier h_t, and save the classification results;
S28: Handle formulas according to their type: when a formula is in picture format, scale, denoise and binarize the formula picture from the PowerPoint presentation, then convert it into the target formula through OCR and semantic conversion, producing a formatted Beamer formula.
Further, the method for the introducing JACOB realization file destination generation of step S3 includes:
S31, classification results are read, the title of storage, content of text, table, picture and formula is corresponding with source file Data establish mapping relations;
Tetra- kinds of S32, built-in Bergen, Berkeley, Ilmaneau, Marburg target Beamer templates, are selected according to user It selects, using JACOB component definition one new Beamer PowerPoint, the mesh generated in file is determined according to above-mentioned mapping relations Mark the position of element;
S33, the data flow that file destination is generated by object element, are sequentially written in target for file destination data flow In Beamer file, final Beamer PowerPoint is generated.
A system for converting a PowerPoint presentation to a Beamer presentation, comprising:
Source file data extraction module: uses Apache POI to extract data from the PowerPoint source file: the source file is first preprocessed to obtain its paragraph information, and the data comprising text, pictures, tables and formulas are then extracted and saved;
Source file data analysis module: from the content extracted from the PowerPoint source file, collects the font size, line count and horizontal layout position of each paragraph's text into the source data set T_a, takes preset historical PowerPoint-to-Beamer conversion records as the migration data set T_b, and merges the two into the training data set T; defines the Euclidean distance function dist_ed and the minimized squared error function E for the K-means clustering algorithm; runs the transfer learning algorithm, initializing the paragraph weight vector w and computing the weight distribution p_t over T; runs the clustering algorithm on T, calling dist_ed and E to assign the paragraphs to k classes, then computes the migration error rate ε_t and updates the weight vector; iterates a set number of times to obtain the final classifier h_t, and saves the classification results for text, pictures, tables and formulas; scales, denoises and binarizes each formula, then converts it into the target formula through OCR and semantic conversion, producing a formatted Beamer formula;
Target file generation module: uses JACOB to generate the target Beamer file: determines the writing position of each saved text, picture, table and formula in the preset Beamer template, writes them into the target Beamer file in sequence, and completes the conversion of the presentation.
Compared with the prior art, the invention has the following advantages: it fills the technical gap in converting PowerPoint presentations to Beamer presentations and lowers the difficulty for non-professionals of making Beamer presentations; and it uses transfer learning to provide a finer-grained document data analysis scheme, improving the applicability and accessibility of Beamer presentations.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments, in which:
Fig. 1 is a flow chart of the method for converting a PowerPoint presentation to a Beamer presentation in an embodiment of the invention;
Fig. 2 is a flow chart of the transfer-learning-based accurate analysis of the source file data in an embodiment of the invention;
Fig. 3 shows the four template styles available for conversion in an embodiment of the invention;
Fig. 4 shows the conversion list interface in an embodiment of the invention;
Fig. 5 shows the overall conversion result in an embodiment of the invention.
Detailed description of the embodiments
For a clearer understanding of the technical features, objects and effects of the invention, specific embodiments are now described in detail with reference to the drawings.
The present invention provides a method for converting a PowerPoint presentation to a Beamer presentation which, as shown in Fig. 1, comprises source file data extraction, source file data analysis and target file generation.
1. The source PowerPoint presentation passes through data extraction, data analysis and file generation in turn to yield the target Beamer presentation. The three steps are described separately below.
S1: Source file data extraction. The file is first preprocessed to obtain the paragraph information of the source file, after which the text data and the data in other formats are extracted. The invention uses a different extraction method for each kind of data object in the source PowerPoint presentation and reprocesses the extracted data so that they better fit the data format of the target file.
S2: Source file data analysis. This step classifies the content of the source file accurately and converts its formulas. The accurate classification uses transfer learning to provide a finer-grained analysis of the source file data: taking into account the positions of, and relations between, the elements inside a Beamer presentation, the content of the source PowerPoint presentation is classified, while basic conversion quality is preserved, so that the result matches the actual file more closely. Formulas are analyzed separately because the PowerPoint presentation and the target Beamer presentation use different formula formats.
S3: Target file generation. The system analyzes the target file data stream from the stored text, picture, table and formula data and the position records obtained in the source file analysis. The preset Beamer template is loaded, the file data stream is written into the target Beamer file in sequence, and the conversion of the presentation is complete.
2. The invention uses Apache POI to extract the source file data. The detailed procedure is as follows:
S11: When the program is running, clicking the upload button calls the system file selection dialog FileDialog, with which the user selects the Microsoft PowerPoint presentation to be converted;
S12: After the upload completes, the getSlides method of the HSLFSlideShow object in POI returns an array of all the normal slides found in the slide show, giving the data of every slide in the presentation.
S13: Extract the text data: read the text content, text font size, paragraph format and paragraph index from the file through the "Item", "Range", "Text", "Font" and "Size" parameters provided by the JACOB component.
S14: Extract the data in the other formats: obtain the pictures in the presentation through POI's GETALLPictures method and the tables through the GETTables method, extract the pictures through FileOutputStream and the formulas through the Clipboard, and save the extracted data for the next analysis step.
3. In the accurate-analysis stage of the data analysis step, a clustering algorithm improved by transfer learning classifies the content of the source PowerPoint presentation. Fig. 2 is the flow chart of this transfer-learning-based analysis in the embodiment. Research shows that a clustering algorithm used alone does not classify file content well; classification errors are especially likely when the file is short. Because the formats within a document differ markedly, the data analysis automatically clusters text of the same format and separates text of different formats, and the transfer learning algorithm uses the system's experience of classifying historical files to help classify the new document, improving the accuracy of the weak classifier. For the K-means clustering algorithm, a Euclidean distance function and a centroid distance function are defined, and similar text content in the source PowerPoint presentation is assigned to the same category. In essence, the portion of the historical data that shares features with the file to be classified is used to help classify that file. Because the migrated files augment the source file, classification accuracy improves. Paragraphs with different font sizes are thereby identified as first-level headings, second-level headings, body text, formulas and so on, which improves the accuracy, applicability and scope of the classification. The procedure is as follows:
S21: Read the source file text data recorded during extraction. By examining how the text data are stored in the presentation, take each paragraph's font size, line count and horizontal layout position as the input matrix of the K-means clustering algorithm and set it as the source data set T_a, of length m; load the historical conversion file information in the same format as the migration data set T_b, of length n; merge the two into the training data set T, of length m+n;
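The data layout of S21 can be illustrated with a minimal Python sketch. The class name Paragraph, its field names and the helper build_training_set are hypothetical, and placing the source rows before the migrated rows in T is an assumption; the text only states that the two sets are merged into a set of length m+n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paragraph:
    """Layout features of one text paragraph (field names are illustrative)."""
    font_size: float    # point size of the paragraph text
    line_count: int     # number of lines the paragraph occupies
    left_offset: float  # horizontal layout position

    def features(self):
        # One row of the K-means input matrix.
        return [self.font_size, float(self.line_count), self.left_offset]


def build_training_set(source, migrated):
    """Merge T_a (source, length m) and T_b (migrated, length n) into T.

    Assumption: source rows are placed first, migrated rows after them.
    """
    rows = [p.features() for p in source] + [p.features() for p in migrated]
    return rows, len(source), len(migrated)
```

Keeping the two lengths alongside T lets later steps tell source rows from migrated rows by index alone.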
S22: Define the notation for a text paragraph sample of the data set and for a cluster centroid, where i = 1, 2, ..., s is the paragraph index and j = 1, 2, ..., t indexes the features, that is, the kinds of position information. On this notation a Euclidean distance function dist_ed is defined,
which computes the distance between each cluster centroid and a paragraph; clusters are divided according to this distance. Then, for the cluster division C = {C_1, C_2, ..., C_k} produced by the K-means algorithm, a minimized squared error function E is defined,
where the centroid of cluster C_i is the mean vector of that cluster.
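The formulas of S22 are not reproduced in this text. Assuming they follow the standard K-means definitions, and writing x_i = (x_{i1}, ..., x_{it}) for a paragraph sample and \mu = (\mu_1, ..., \mu_t) for a centroid (both symbols are assumptions), they would take the following form. Note that i indexes paragraphs in the first formula and clusters in the second.

```latex
\mathrm{dist}_{ed}(x_i, \mu) = \sqrt{\sum_{j=1}^{t} \left( x_{ij} - \mu_j \right)^2}

E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2,
\qquad
\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
```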
S23: Run the migration algorithm and initialize the paragraph weight vector. The weights adjust the influence of the migration data on the source data: the smaller the weight, the smaller the influence. The size of the weight distinguishes transferable from non-transferable data within the migration data.
Here w denotes the initial weight of each paragraph's text.
S24: Compute the weight distribution p_t over the data set T, which supplies the weight term for the K-means training data; p_t is computed from the weight vector w_t.
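Steps S23 and S24 can be sketched as follows. The initial weight values and the normalization formula are not reproduced in this text, so two assumptions are made here: every paragraph starts with the same weight, and p_t is w_t divided by its sum, as in TrAdaBoost-style transfer learning. The function names are hypothetical.

```python
def initial_weights(m, n, source_weight=1.0, migrated_weight=1.0):
    """One weight per paragraph: m source rows first, then n migrated rows.

    Assumption: uniform starting weights; the text does not give the values.
    """
    return [source_weight] * m + [migrated_weight] * n


def weight_distribution(weights):
    """Normalize the weight vector w_t into a distribution p_t that sums to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]
```

For example, weight_distribution(initial_weights(2, 2)) gives every paragraph the same share of the total weight.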
S25: Count the k distinct byte counts, which represent the k classes, and use k as the hyperparameter of the K-means clustering algorithm. Run the clustering algorithm on the data set T, calling the Euclidean distance function dist_ed and the minimized squared error function E to assign the paragraphs to k classes.
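A minimal sketch of the clustering in S25 is given below. How the weight distribution enters K-means is not spelled out in the text; this sketch assumes the weights are used when averaging cluster members into a centroid, and it takes the initial centroids as an argument. The names dist_ed, squared_error and weighted_kmeans are illustrative.

```python
import math


def dist_ed(x, mu):
    """Euclidean distance between a paragraph's features and a centroid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))


def squared_error(rows, labels, centroids):
    """Sum of squared distances from each row to its assigned centroid (E)."""
    return sum(dist_ed(x, centroids[lab]) ** 2 for x, lab in zip(rows, labels))


def weighted_kmeans(rows, weights, centroids, max_iter=100):
    """Lloyd iterations with per-row weights; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each paragraph joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist_ed(x, centroids[c]))
            for x in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the weighted mean of its members.
        for c in range(len(centroids)):
            members = [(x, w) for x, w, lab in zip(rows, weights, labels) if lab == c]
            total = sum(w for _, w in members)
            if total > 0:
                centroids[c] = [
                    sum(w * x[j] for x, w in members) / total
                    for j in range(len(centroids[c]))
                ]
    return labels, centroids
```

With uniform weights this reduces to ordinary K-means; lowering the weight of a migrated row reduces its pull on the centroid of its cluster.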
S26: From the K-means clustering result, compute the migration error rate ε_t,
where h_t denotes the classifier's classification result on T_a and c denotes the clustering algorithm's classification result on T_a. Set the update parameters, including β_t = ε_t/(1 − ε_t), and update the weight vector from this error rate: the new weight of each migration data item is reduced if the item was classified incorrectly and increased if it was classified correctly.
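The error rate and weight update formulas of S26 are not reproduced in this text. Only \beta_t = \epsilon_t / (1 - \epsilon_t) is given explicitly. The description matches the TrAdaBoost transfer learning algorithm, so the following block shows that algorithm's standard forms as an assumption, not as the patent's own formulas. The parameter \beta and its definition in particular are taken from TrAdaBoost.

```latex
\epsilon_t =
  \frac{\sum_{x_i \in T_a} w_i^t \, \lvert h_t(x_i) - c(x_i) \rvert}
       {\sum_{x_i \in T_a} w_i^t}

\beta = \frac{1}{1 + \sqrt{2 \ln n / N}},
\qquad
\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}

w_i^{t+1} =
\begin{cases}
  w_i^t \, \beta^{\lvert h_t(x_i) - c(x_i) \rvert},      & x_i \in T_b \\
  w_i^t \, \beta_t^{-\lvert h_t(x_i) - c(x_i) \rvert},   & x_i \in T_a
\end{cases}
```

Under these forms a misclassified migrated paragraph has its weight multiplied by \beta < 1, while correctly classified migrated paragraphs keep their weight and so gain relative weight after normalization, which agrees with the behaviour described in S26.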
S27: Repeat steps S24 to S26 until the set number of iterations N is reached. As the migration algorithm iterates over the data, it gradually lowers the weights of the non-transferable data and so gradually separates the transferable from the non-transferable data in the historical data; the migration algorithm stops when the iteration count reaches the set value. At that point the transferable data in the historical data follow the same feature distribution as the data to be classified. The final classifier h_t is thus obtained, and the classification results are saved.
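One round of the loop that S27 repeats can be sketched as follows. The update rule is the standard TrAdaBoost one and is an assumption, since the formulas are not reproduced in this text. The function names, the ordering of rows (m source rows first) and the clamping of the error rate are also assumptions. Each entry of mismatch is 1 where h_t and c disagree on a row and 0 where they agree.

```python
import math


def migration_error(weights, mismatch, m):
    """Weighted error rate on the m source rows (assumed to come first)."""
    source_total = sum(weights[:m])
    return sum(w * e for w, e in zip(weights[:m], mismatch[:m])) / source_total


def update_weights(weights, mismatch, m, n_iterations):
    """Return (new_weights, eps) after one TrAdaBoost-style round."""
    n = len(weights) - m
    if m < 1 or n < 1:
        raise ValueError("need at least one source row and one migrated row")
    # Clamp so that beta_t stays finite and below 1 (assumption).
    eps = min(max(migration_error(weights, mismatch, m), 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_iterations))
    updated = []
    for i, (w, e) in enumerate(zip(weights, mismatch)):
        if i < m:
            updated.append(w * beta_t ** (-e))  # source row: raise weight on error
        else:
            updated.append(w * beta ** e)       # migrated row: lower weight on error
    return updated, eps
```

A driver would call this once per iteration, renormalize the returned weights into p_t, recluster, and stop after N rounds.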
S28: Formula conversion. Converting formulas requires further analysis of the source file and depends on the formula type. When a formula is in picture format, the position information in the PowerPoint presentation is consulted first; the formula picture is then scaled, denoised and binarized, and converted into the target formula through OCR and semantic conversion, producing a formatted Beamer formula.
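S28 names scaling, denoising and binarization without specifying the algorithms. The sketch below assumes nearest-neighbour scaling, a 3x3 median filter and a fixed threshold, applied to a grayscale image held as a list of rows; these choices and the function names are illustrative. The OCR and semantic conversion stages are not shown.

```python
from statistics import median_low


def scale_nearest(img, factor):
    """Nearest-neighbour upscaling by an integer factor."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def median_denoise(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(median_low(window))
        out.append(row)
    return out


def binarize(img, threshold=128):
    """Map grey levels below the threshold to 1 (ink) and the rest to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in img]
```

A single dark speck on a white background is removed by the median filter, so it does not survive binarization.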
4. The invention uses JACOB to generate the target file. The detailed procedure is as follows:
S31: Read the classification results and establish mapping relations between the stored titles, text content, tables, pictures and formulas and the position data of the source file.
S32: Four target Beamer templates are built in: Bergen, Berkeley, Ilmenau and Marburg. According to the user's choice, define a new Beamer presentation with the JACOB component and determine the positions of the target elements in the generated file from the above mapping relations.
S33: Generate the file data stream of the target file from the target elements and write it into the target Beamer file in sequence, producing the final Beamer presentation.
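The writing step of S31 to S33 can be illustrated by emitting Beamer source directly. This sketch does not use JACOB; it is plain Python string assembly, and the slide dictionary layout, the element kinds and the function names are hypothetical. The theme names are the four templates listed in S32.

```python
def escape_tex(text):
    """Escape characters that have special meaning in LaTeX text."""
    specials = {
        "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
        "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    }
    return "".join(specials.get(ch, ch) for ch in text)


def render_beamer(slides, theme="Berkeley"):
    """Write classified elements into Beamer frames, in slide order."""
    lines = [r"\documentclass{beamer}", r"\usetheme{%s}" % theme, r"\begin{document}"]
    for slide in slides:
        lines.append(r"\begin{frame}{%s}" % escape_tex(slide["title"]))
        for kind, content in slide["elements"]:
            if kind == "text":
                lines.append(escape_tex(content))
            elif kind == "formula":
                lines.append(r"\[ %s \]" % content)  # content is already LaTeX
            elif kind == "picture":
                lines.append(r"\includegraphics[width=0.8\textwidth]{%s}" % content)
        lines.append(r"\end{frame}")
    lines.append(r"\end{document}")
    return "\n".join(lines)
```

Calling render_beamer with theme="Marburg" and one slide containing a text element and a formula element yields a complete document that a LaTeX installation with Beamer can compile.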
Theoretical significance and practical value of the invention: it addresses the difficulty that traditional document editing software has in supporting conversion between multiple document types, and in particular the insufficient granularity of conventional machine learning algorithms when classifying a single document, and provides tool support for users who need to convert between document types online. It lowers the difficulty and raises the efficiency of producing professional presentations, giving university teachers and students, researchers and others an efficient presentation conversion method.
The embodiments of the invention have been described above with reference to the drawings, but the invention is not limited to these specific embodiments, which are illustrative rather than restrictive. Those of ordinary skill in the art, guided by the invention, may devise many other forms without departing from its purpose or the scope protected by the claims, and all of these fall within the protection of the invention.

Claims (5)

1. A method for converting a PowerPoint presentation to a Beamer presentation, characterized by comprising:
S1: Use Apache POI to extract data from the PowerPoint source file: preprocess the source file, obtain its paragraph information, then extract and save the data comprising text, pictures, tables and formulas;
S2: Analyze the source file data: from the content extracted from the PowerPoint source file, collect the font size, line count and horizontal layout position of each paragraph's text into the source data set T_a, take preset historical PowerPoint-to-Beamer conversion records as the migration data set T_b, and merge the two into the training data set T; define the Euclidean distance function dist_ed and the minimized squared error function E for the K-means clustering algorithm; run the transfer learning algorithm, initializing the paragraph weight vector w and computing the weight distribution p_t over T; run the clustering algorithm on T, calling dist_ed and E to assign the paragraphs to k classes, then compute the migration error rate ε_t and update the weight vector; iterate a set number of times to obtain the final classifier h_t, and save the classification results for text, pictures, tables and formulas; scale, denoise and binarize each formula, then convert it into the target formula through OCR and semantic conversion, producing a formatted Beamer formula;
S3: Use JACOB to generate the target Beamer file: determine the writing position of each saved text, picture, table and formula in the preset Beamer template, write them into the target Beamer file in sequence, and complete the conversion of the presentation.
2. The method for converting a PowerPoint presentation to a Beamer presentation according to claim 1, characterized in that the use of Apache POI in step S1 to extract the source file data specifically comprises:
S11: Call the system file selection dialog FileDialog so that the user can upload the Microsoft PowerPoint presentation to be converted;
S12: After the upload completes, obtain the data of all slides in the presentation through the getSlides method of the HSLFSlideShow object in POI;
S13: Extract the text data: read the text content, text font size, paragraph format and paragraph index from the file through the "Item", "Range", "Text", "Font" and "Size" parameters provided by the JACOB component;
S14: Extract the data in the remaining formats: obtain the pictures in the presentation through POI's GETALLPictures method and the tables through the GETTables method, extract the pictures through FileOutputStream and the formulas through the Clipboard, and save the extracted data.
3. a kind of powerpoint presentation according to claim 1 is to Beamer PowerPoint conversion method, special Sign is that the specific method of the source file data analysis of step S2 includes:
The mode that S21, statistics text data are stored in PowerPoint, by the corresponding font size of each paragraph text, line number, level Placement position summarizes as set of source data Ta, the length is m, load preset PowerPoint according to same format and convert Beamer historical information is as migrating data collection Tb, the length is n;The two is merged into training dataset T, the length is m+n;
S22, definition data set text data paragraph sample are expressed asMass center is expressed asWherein i=1,2 ..., s indicate paragraph call number, and j=1,2 ..., t indicate feature Number calculates the Euclidean distance function of every cluster mass center He the paragraph distance further according to above-mentioned symbol definition for K-means algorithm:
Define the minimum squared error function of K-means algorithm fitting cluster mass center:
WhereinIt is cluster CiMean vector;
S23, migration algorithm is executed, initializes the weight vectors of paragraph, w indicates the initial weight of each paragraph text, the weight For adjusting migrating data to the influence of source data:
S24, it calculates for the weight distribution p on data set Tt, for the weight item of K-means algorithm training data, weight point Cloth ptAccording to weight vectors wtIt is calculated:
S25: it executes clustering algorithm and data set T is clustered, by calling Euclidean distance function distedWith minimum square mistake Difference function E incorporates different paragraphs into k class;
S26: according to the cluster result of K-means algorithm, computation migration error rate ∈t:
htPresentation class device is in TaUpper classification results, c indicate that clustering algorithm is sorted in TaUpper classification results, setting And βt=∈t/(1-∈t) and calculated according to the error rate and update weight vector:
S27: return step S24 is iterated, until reaching the number of iterations N of setting, to obtain final classification device ht, and Classification results are saved;
S28: handle each formula type separately: when a formula is stored as a picture, scale, denoise, and binarize the formula picture from the PowerPoint presentation, then convert it into the target formula by OCR and semantic conversion, generating a formatted Beamer presentation formula.
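The formulas that steps S22 to S26 refer to are not reproduced in the text above. As a sketch only, assuming the standard K-means definitions and the standard TrAdaBoost weighting scheme, they could be written as follows. The symbols x_{ij}, \mu_{lj}, w_{t,i} and the constant \beta are assumptions introduced here for illustration and are not taken from the claims.

```latex
% Requires amsmath. Assumed notation: x_i is the feature vector of paragraph i,
% \mu_l is a cluster centroid, w_{t,i} is the weight of sample i in iteration t.
\begin{align*}
\mathrm{dist}_{ed}(x_i, \mu_l) &= \sqrt{\sum_{j=1}^{t} (x_{ij} - \mu_{lj})^2} \\
E &= \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2,
  \qquad \mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x \\
p_t &= \frac{w_t}{\sum_{i=1}^{m+n} w_{t,i}} \\
\epsilon_t &= \frac{\sum_{x_i \in T_a} w_{t,i}\, |h_t(x_i) - c(x_i)|}{\sum_{x_i \in T_a} w_{t,i}} \\
\beta_t &= \frac{\epsilon_t}{1 - \epsilon_t},
  \qquad \beta = \frac{1}{1 + \sqrt{2 \ln n / N}} \\
w_{t+1,i} &=
  \begin{cases}
    w_{t,i}\, \beta_t^{-|h_t(x_i) - c(x_i)|}, & x_i \in T_a \\
    w_{t,i}\, \beta^{|h_t(x_i) - c(x_i)|}, & x_i \in T_b
  \end{cases}
\end{align*}
```

Under this assumed scheme, a misclassified migration sample loses weight because \beta < 1, while a misclassified source sample gains weight whenever \epsilon_t < 1/2.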
4. The method for converting a PowerPoint presentation into a Beamer presentation according to claim 1, characterized in that step S3, which introduces JACOB to generate the target file, comprises:
S31: read the classification results and establish mapping relations between the stored titles, text content, tables, pictures, and formulas and the corresponding data in the source file;
S32: four target Beamer templates are built in: Bergen, Berkeley, Ilmenau, and Marburg; according to the user's selection, define a new Beamer presentation using the JACOB component, and determine the positions of the target elements in the generated file according to the above mapping relations;
S33: generate the data stream of the target file from the target elements, and write the data stream in sequence into the target Beamer file, generating the final Beamer presentation.
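Steps S31 to S33 can be illustrated with a minimal sketch that writes already-classified slide elements into Beamer source text. It does not use JACOB; it only builds a LaTeX string in plain Python. The `Slide` container and the `escape` and `render_beamer` helpers are hypothetical names introduced for this example, and only titles, text paragraphs, and formulas are handled.

```python
from dataclasses import dataclass, field

# The four built-in themes named in step S32.
THEMES = ("Bergen", "Berkeley", "Ilmenau", "Marburg")

# A few characters with special meaning in LaTeX text mode.
# Backslash, tilde, and caret are deliberately not handled in this sketch.
_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
             "_": r"\_", "{": r"\{", "}": r"\}"}


@dataclass
class Slide:
    """Hypothetical container for the classified elements of one slide."""
    title: str
    paragraphs: list = field(default_factory=list)  # body text, in source order
    formulas: list = field(default_factory=list)    # LaTeX math strings


def escape(text):
    """Escape the special characters listed in _SPECIALS."""
    return "".join(_SPECIALS.get(ch, ch) for ch in text)


def render_beamer(slides, theme="Berkeley"):
    """Write each slide's elements, in order, into one Beamer frame."""
    if theme not in THEMES:
        raise ValueError(f"unknown theme: {theme}")
    lines = [r"\documentclass{beamer}", rf"\usetheme{{{theme}}}", r"\begin{document}"]
    for slide in slides:
        lines.append(rf"\begin{{frame}}{{{escape(slide.title)}}}")
        if slide.paragraphs:
            lines.append(r"\begin{itemize}")
            lines += [rf"\item {escape(p)}" for p in slide.paragraphs]
            lines.append(r"\end{itemize}")
        lines += [rf"\[ {formula} \]" for formula in slide.formulas]
        lines.append(r"\end{frame}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Calling `render_beamer([Slide("Overview", ["First point"])], theme="Marburg")` returns a complete document string that can be saved as a `.tex` file and compiled with a LaTeX engine.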
5. A system for converting a PowerPoint presentation into a Beamer presentation, characterized by comprising:
a source file data extraction module, which introduces Apache POI to extract data from the PowerPoint source file: the source file is first preprocessed to obtain its paragraph information, and then the data, comprising text, pictures, tables, and formulas, is extracted and saved;
a source file data analysis module, which, based on the content extracted from the PowerPoint source file, collects the font size, number of lines, and horizontal layout position of each paragraph's text into the source data set T_a, takes preset historical records of PowerPoint-to-Beamer conversions as the migration data set T_b, and merges the two into the training data set T; defines the Euclidean distance function dist_ed and the minimum squared error function E for the K-means clustering algorithm; executes the transfer learning algorithm, initializing the weight vector w of the paragraphs and computing the weight distribution p_t over the data set T; executes the clustering algorithm on T, calling dist_ed and E to group the paragraphs into k classes, then computes the migration error rate ε_t and updates the weight vector; iterates the set number of times to obtain the final classifier h_t, and saves the classification results for text, pictures, tables, and formulas; and applies scaling, denoising, and binarization to the formulas, then converts them into target formulas by OCR and semantic conversion, generating formatted Beamer presentation formulas;
a target file generation module, which introduces JACOB to generate the Beamer target file: for the saved text, pictures, tables, and formulas, it determines the writing positions in the preset Beamer template and writes the text, pictures, tables, and formulas in sequence into the target Beamer file, completing the conversion of the presentation.
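The analysis module's loop of weighting, clustering, and re-weighting can be sketched in plain Python. This is an illustration under stated assumptions, not the patented implementation: the weight update follows the standard TrAdaBoost form, the clustering is a weighted variant of Lloyd's K-means iterations, and all function names other than `dist_ed` are hypothetical. How the predicted and reference classes are derived from the clusters is not specified in the claims, so the sketch takes them as inputs.

```python
import math


def dist_ed(x, mu):
    """Euclidean distance between a paragraph feature vector and a centroid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: dist_ed(p, centroids[k]))
            for p in points]


def weighted_kmeans(points, p, centroids, rounds=10):
    """Lloyd iterations in which point i pulls its centroid in proportion to p[i]."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's list
    for _ in range(rounds):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            total = sum(p[i] for i in members)
            if total > 0:  # an empty cluster keeps its previous centroid
                dims = range(len(centroids[k]))
                centroids[k] = [sum(p[i] * points[i][d] for i in members) / total
                                for d in dims]
    return assign(points, centroids), centroids


def squared_error(points, labels, centroids):
    """E: sum of squared distances from each point to its assigned centroid."""
    return sum(dist_ed(x, centroids[lab]) ** 2 for x, lab in zip(points, labels))


def weight_distribution(w):
    """p_t: the weight vector normalised to sum to 1."""
    total = sum(w)
    return [wi / total for wi in w]


def error_rate(w, predicted, reference, source_idx):
    """Weighted share of source items (T_a) whose predicted class differs."""
    total = sum(w[i] for i in source_idx)
    return sum(w[i] for i in source_idx if predicted[i] != reference[i]) / total


def update_weights(w, predicted, reference, source_idx, n_migration, n_rounds, eps):
    """Assumed TrAdaBoost-style update: misclassified migration items shrink,
    misclassified source items grow (for eps < 1/2)."""
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_migration) / n_rounds))
    beta_t = eps / (1.0 - eps)
    source = set(source_idx)
    new_w = []
    for i, wi in enumerate(w):
        miss = 1 if predicted[i] != reference[i] else 0
        new_w.append(wi * (beta_t ** -miss if i in source else beta ** miss))
    return new_w
```

One iteration would normalise the weights with `weight_distribution`, cluster with `weighted_kmeans`, compute `error_rate` on the source items, and pass the result to `update_weights`; the loop stops after the set number of iterations N.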
CN201910097017.6A 2019-01-31 2019-01-31 Method and system for converting PowerPoint presentation into Beamer presentation Active CN109885818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910097017.6A CN109885818B (en) 2019-01-31 2019-01-31 Method and system for converting PowerPoint presentation into Beamer presentation

Publications (2)

Publication Number Publication Date
CN109885818A true CN109885818A (en) 2019-06-14
CN109885818B CN109885818B (en) 2020-11-27

Family

ID=66927592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910097017.6A Active CN109885818B (en) 2019-01-31 2019-01-31 Method and system for converting PowerPoint presentation into Beamer presentation

Country Status (1)

Country Link
CN (1) CN109885818B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104142961A (en) * 2013-05-10 2014-11-12 北大方正集团有限公司 Logical processing device and logical processing method for composite diagram in format document
US9147129B2 (en) * 2011-11-18 2015-09-29 Honeywell International Inc. Score fusion and training data recycling for video classification
CN107357765A (en) * 2017-07-14 2017-11-17 北京神州泰岳软件股份有限公司 Word document flaking method and device
CN107526106A (en) * 2017-08-28 2017-12-29 电子科技大学 Quick seismic waveform sorting technique based on semi-supervised algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
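Zibin Li et al.: "Cluster and dynamic-TrAdaBoost-based transfer learning for text classification", ICNC-FSKD 2017 *
Liu Anfei et al.: "K-means remote sensing image classification algorithm based on the AdaBoost algorithm", Journal of Beijing Electronic Science and Technology Institute *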

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046841A (en) * 2019-12-26 2020-04-21 中孚安全技术有限公司 Character extraction method, system, terminal and storage medium of PowerPoint file
CN112560406A (en) * 2020-12-17 2021-03-26 中科三清科技有限公司 Method and device for generating forecast consultation demonstration manuscript
CN112560406B (en) * 2020-12-17 2021-09-07 中科三清科技有限公司 Method and device for generating forecast consultation demonstration manuscript

Also Published As

Publication number Publication date
CN109885818B (en) 2020-11-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190614

Assignee: WUHAN TIMES GEOSMART TECHNOLOGY Co.,Ltd.

Assignor: China University of Geosciences (Wuhan City)

Contract record no.: X2022420000021

Denomination of invention: A conversion method and system from PowerPoint presentation to Beamer presentation

Granted publication date: 20201127

License type: Common License

Record date: 20220302