CN112783957A - Method and system for importing word document format for English reading - Google Patents

Info

Publication number
CN112783957A
CN112783957A CN201911095769.5A CN201911095769A CN112783957A CN 112783957 A CN112783957 A CN 112783957A CN 201911095769 A CN201911095769 A CN 201911095769A CN 112783957 A CN112783957 A CN 112783957A
Authority
CN
China
Prior art keywords
document
word
reading
module
word document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911095769.5A
Other languages
Chinese (zh)
Inventor
郭永福
苏德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Linrui Education Technology Co ltd
Original Assignee
Shanghai Linrui Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-11
Filing date
2019-11-11
Publication date
2021-05-11
Application filed by Shanghai Linrui Education Technology Co ltd
Priority to CN201911095769.5A
Publication of CN112783957A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a method and a system for the formatted import of English reading Word documents, relating to the field of reading-material data import. The system comprises a client and a network server in communication connection with the client. The client is provided with a document uploading module; the network server is provided with a document identification module, a document reading module, a document identification and analysis module and a document storage module. The document uploading module is used for editing and uploading a document; the document identification module is used for calling a storage method to identify the document according to the document identification information; the document reading module is used for reading document information according to the document identification information; the document identification and analysis module is used for converting the document into formatted data. Unstructured data are stored in a structured form, which facilitates subsequent classified query, rendering and display, and statistical analysis; the document is converted into a concise, standard HTML format so that it can be distributed quickly through web pages.

Description

Method and system for importing word document format for English reading
Technical Field
The invention relates to the field of importing reading-material data, and in particular to a method and a system for the formatted import of English reading Word documents.
Background
With the growing demand for English learning, many English teachers urgently need to import materials they have compiled themselves, such as English practice test papers and teaching aids, into various English material libraries; English reading materials for junior and senior middle school likewise need to be imported in Word document format.
However, comparable products currently offer no function for the formatted import of Word documents: an English reading Word document must first be converted into plain text, and the plain text is then imported. This process loses the format and style, and pictures and tables are filtered out, so the content is damaged. Moreover, the technology for splitting and identifying content is immature, for example in identifying the question stem, the answer and the analysis in English reading material.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a method and a system for the formatted import of English reading Word documents, which store the unstructured data of an English reading Word document in a structured form so as to facilitate subsequent classified query, rendering and display, and statistical analysis; and which, while keeping the original format and style, convert the Word document into a concise, standard HTML format so that it can be distributed quickly through web pages.
The invention discloses a method for the formatted import of an English reading Word document, which comprises the following steps:
Step 1: the network server defines the document detection rule attributes, the basic document structure, the document detection and import method and the document object entity class, and defines a document template according to the requirements of the document question type;
Step 2: the client edits the Word document with reference to the document template and uploads the Word document to the network server;
Step 3: the network server obtains the Word document, serializes the Word document into an object, extracts each node data element in the Word document, checks and screens the elements against the predefined document detection rule attributes and basic document structure, and then stores the attributes; it records the start index and the end index of each node data element of the Word document, creates a Word document object entity class through the predefined document detection and import method and the document object instance class, and assigns the attributes one by one;
Step 4: substituting the start index and the end index of each node data element of the Word document into the instantiated Word document object;
Step 5: reading the structure attribute of the document object according to the start index and the end index of each node data element of the Word document, looping over the child nodes of the structure attribute, judging whether each child node is a paragraph object, and if so, casting the child node to a paragraph;
Step 6: determining the conversion method corresponding to each node data element of the Word document, and executing the corresponding conversion;
Step 7: extracting the start index and the end index of each node data element of the Word document, looping over them, and converting the node content into HTML page code; applying standard formatting and filtering methods written with regular expressions so that the data are correct and complete, and assigning the page code to the document object instance class;
Step 8: performing a data persistence operation on the document object instance class after assignment, storing it in a database, and feeding back a result to the client;
Step 9: the client extracts the structured data for rendering, realizing the functions of online reading, answering and automatic reading.
In an embodiment of the invention, the document template is defined according to the requirements of English reading question types and includes the attributes, question stem, answer and analysis of the English reading question types.
In an embodiment of the invention, the client renders the structured data through HTML, CSS and JS technologies, so as to realize the functions of online reading, answering and automatic reading.
In an embodiment of the invention, the database is a relational database.
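As an illustration of the template fields named above, the following is a minimal sketch of what a document object entity class could look like. It is written in Python for brevity rather than in the language of the described implementation, and the class name `ReadingQuestion`, its field names and the helper `to_row` are assumptions made for this example; the source only states that the template covers attributes, question stem, answer and analysis.

```python
from dataclasses import dataclass, field, asdict


# Hypothetical entity class: the field names are illustrative, not taken from the source.
@dataclass
class ReadingQuestion:
    stem: str                 # question stem
    answer: str               # reference answer
    analysis: str             # explanation of the answer
    attributes: dict = field(default_factory=dict)  # e.g. grade, question type


def to_row(question: ReadingQuestion) -> dict:
    """Flatten the entity into a dict that maps naturally onto table columns."""
    row = asdict(question)
    # Attributes are stored as sorted "key=value" pairs so the row stays flat.
    row["attributes"] = ";".join(
        f"{key}={value}" for key, value in sorted(question.attributes.items())
    )
    return row
```

Keeping the four template parts as separate fields is what later allows each part to be queried, rendered or checked on its own.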
A system for the formatted import of English reading Word documents comprises a client and a network server in communication connection with the client, wherein the client is provided with a document uploading module, and the network server is provided with a document identification module, a document reading module, a document identification and analysis module and a document storage module;
the document uploading module is used for editing and uploading Word documents;
the document identification module is used for calling a storage method to identify a document according to the Word document identification information;
the document reading module is used for reading Word document information according to the document identification information;
the document identification and analysis module is used for performing fragment decomposition, content analysis and data splicing on the Word document content and converting the Word document into formatted data;
the document storage module is used for storing the formatted and converted data.
As described above, the method and system for the formatted import of English reading Word documents of the invention have the following advantages:
1. The invention stores the unstructured data of the Word document in a structured form, which facilitates subsequent operations such as classified query, rendering and display, and statistical analysis.
2. The invention converts the English reading Word document into a concise, standard HTML format while keeping its original format and style, so that it can be distributed quickly through web pages.
Drawings
Fig. 1 is a block diagram showing a system configuration disclosed in the embodiment of the present invention.
Fig. 2 shows a flowchart of the system operation disclosed in the embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
The invention provides a method for the formatted import of an English reading Word document. The method is based on the C# language, converts Word document content into HTML (hypertext markup language) web page content, stores the converted content using SQL Server (a structured query language database) and Redis (a database) data storage technologies, and feeds back the result to the user. The method comprises the following steps:
Step 1: defining the document detection rule attributes, the basic document structure, the document detection and import method and the document object entity class through the network server, and defining a document template according to the requirements of the document question type;
the document template is defined according to the requirements of English reading question types, and includes the attributes, question stems, answers and analysis of the English reading question types;
Step 2: the client edits the Word document with reference to the document template and uploads the Word document to the network server;
Step 3: the network server obtains the Word document and serializes the Word document into an object using the interface component DeserializeObject;
each node data element in the Word document is extracted using the Aspose.Words component, checked and screened against the predefined document detection rule attributes and basic document structure, and the attributes are then stored;
the start index StartIndex and the end index EndIndex of each node data element of the Word document are recorded, the document object instance class is created and instantiated through the predefined document detection and import method and the document object instance class, and its attributes are assigned one by one;
Step 4: substituting the start index StartIndex and the end index EndIndex of each node data element of the Word document into the instantiated Word document object;
Step 5: reading the structure attribute of the document object according to the start index StartIndex and the end index EndIndex of each node data element of the Word document, looping over the child nodes of the structure attribute, judging whether each child node is a paragraph object, and if so, casting the child node to a paragraph;
Step 6: determining the conversion method corresponding to each node data element of the Word document, and executing the corresponding conversion;
Step 7: extracting the start index StartIndex and the end index EndIndex of each node data element of the Word document, looping over them, and converting the node content into HTML page code;
standard formatting and filtering methods written with regular expressions, namely FilterNoIndrentStyle(), FilterColorStyle(), GetParagrPhTxt() and FilterStyle(), together with the Replace method of string, are applied so that the data are correct and complete, and the page code is assigned to the document object instance class;
wherein FilterNoIndexStyle(), FilterColorStyle(), GetParagrPhTxt(), FilterStyle(), string and Replace are all common methods in programming;
Step 8: performing a data persistence operation on the document object instance class after assignment, storing it in a relational database, and feeding back a result to the client;
Step 9: the client extracts the structured data and renders it through HTML, CSS (cascading style sheets) and JS (JavaScript) technologies, realizing the functions of online reading, answering and automatic reading.
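The core of steps 3 to 7 (recording a start and end index for each part of the document, looping over the nodes in that range, choosing a conversion by node type, and cleaning the resulting HTML with regular expressions) can be sketched as follows. This is a simplified illustration under stated assumptions, not the described implementation: it is written in Python instead of C#, it works on a hand-built list of nodes instead of a parsed Word file, and the `Node` class, the section markers such as `[Stem]` and all function names are hypothetical.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class Node:
    kind: str   # "paragraph" or "table" in this sketch
    text: str   # paragraph text, or the cells of one table row separated by "|"


# Hypothetical section markers that a template might require.
MARKERS = {"[Stem]": "stem", "[Answer]": "answer", "[Analysis]": "analysis"}


def index_sections(nodes):
    """Record the (start_index, end_index) of the nodes belonging to each section."""
    ranges = {}
    current = None
    for i, node in enumerate(nodes):
        name = MARKERS.get(node.text.strip()) if node.kind == "paragraph" else None
        if name:
            current = name
            ranges[current] = [i + 1, i]  # empty range until content follows
        elif current:
            ranges[current][1] = i
    return {name: (start, end) for name, (start, end) in ranges.items()}


def node_to_html(node):
    """Pick the conversion that matches the node type."""
    if node.kind == "paragraph":
        return f"<p>{html.escape(node.text)}</p>"
    if node.kind == "table":
        cells = "".join(
            f"<td>{html.escape(cell.strip())}</td>" for cell in node.text.split("|")
        )
        return f"<table><tr>{cells}</tr></table>"
    return ""  # unknown node types are skipped in this sketch


def clean_html(fragment):
    """Regex-based filtering: drop inline style attributes and empty paragraphs."""
    fragment = re.sub(r'\sstyle="[^"]*"', "", fragment)
    return re.sub(r"<p>\s*</p>", "", fragment)


def convert(nodes):
    """Convert every indexed section into a cleaned HTML string."""
    result = {}
    for name, (start, end) in index_sections(nodes).items():
        parts = [node_to_html(node) for node in nodes[start:end + 1]]
        result[name] = clean_html("".join(parts))
    return result
```

A short usage example: calling `convert` on a node list that contains a `[Stem]` marker followed by a paragraph and a table row, then an `[Answer]` marker followed by one paragraph, yields a dictionary with one HTML string per section. The converters here never emit `style` attributes, so the style filter in `clean_html` only matters for HTML produced by a real converter that carries Word formatting over.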
As shown in Fig. 1, a system for the formatted import of English reading Word documents includes a client and a network server communicatively connected to the client, where the client is provided with a document uploading module, and the network server is provided with a document identification module, a document reading module, a document identification and analysis module and a document storage module;
the document uploading module is used for editing and uploading Word documents;
the document identification module is used for calling a storage method to identify a document according to the Word document identification information;
the document reading module is used for reading Word document information according to the document identification information;
the document identification and analysis module is used for performing fragment decomposition, content analysis and data splicing on the Word document content and converting the Word document into formatted data;
and the document storage module is used for storing the formatted and converted data.
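To make the role of the document storage module more concrete, the sketch below stores converted question data in a relational table and queries it by type. It is an assumption-laden illustration: it uses Python's built-in `sqlite3` module in place of the SQL Server database named in the description, and the table name `reading_question`, its columns and the function names are invented for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) question table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reading_question (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               question_type TEXT NOT NULL,
               stem_html TEXT NOT NULL,
               answer TEXT NOT NULL,
               analysis_html TEXT NOT NULL
           )"""
    )
    return conn


def save_question(conn, question_type, stem_html, answer, analysis_html):
    """Persist one converted question and return its generated id."""
    cursor = conn.execute(
        "INSERT INTO reading_question "
        "(question_type, stem_html, answer, analysis_html) VALUES (?, ?, ?, ?)",
        (question_type, stem_html, answer, analysis_html),
    )
    conn.commit()
    return cursor.lastrowid


def find_by_type(conn, question_type):
    """Classified query: return (id, stem_html, answer) rows for one question type."""
    rows = conn.execute(
        "SELECT id, stem_html, answer FROM reading_question "
        "WHERE question_type = ? ORDER BY id",
        (question_type,),
    )
    return rows.fetchall()
```

Storing the stem, answer and analysis in separate columns is one way to support the classified query and statistical analysis that the description lists as benefits of structured storage.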
As shown in fig. 2, the system works as follows:
(1) a user edits a Word document according to the basic notes of the document and the definition requirements of the question types, then selects the Word document file in a web page of the client and uploads it;
(2) the network server intercepts and checks the uploaded file at the port, automatically matches and verifies it against the template, passes documents that meet the requirements and stores the file on the server, and gives an explanation of the reason for documents that do not meet the requirements;
(3) the client submits to the network server; the network server reads the data, calling the saved file according to the document identification information stored on the server;
(4) after reading the file information, the network server executes an analysis command, executes an identification command after the file has been preliminarily split, and performs operations such as fragment decomposition, content analysis and data splicing on the Word document content to convert it into formatted data;
(5) after processing the content, the network server executes a data storage command, performs a data persistence operation on the document object instance class that has been assigned the structured data, and stores it in a relational database;
(6) the network server feeds back a result to the client and gives a feedback information prompt;
(7) in use, the structured data stored in step (5) are extracted from the relational database and rendered on the client, realizing the functions of online reading, answering and automatic reading.
In conclusion, the invention stores the unstructured data of the English reading Word document in a structured form, which facilitates subsequent classified query, rendering and display, and statistical analysis; and, while keeping the original format and style, it converts the Word document into a concise, standard HTML format so that it can be distributed quickly through web pages. The invention therefore effectively overcomes various defects in the prior art and has high industrial utilization value.
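The upload check in step (2) of the workflow, where the server either releases a document or explains why it was rejected, might look like the following sketch. It is illustrative only: the required markers, the allowed file extensions and the function name `check_upload` are assumptions, since the source does not specify the matching rules.

```python
import os

# Hypothetical template rules; the source does not list the actual checks.
REQUIRED_MARKERS = ("[Stem]", "[Answer]", "[Analysis]")
ALLOWED_EXTENSIONS = (".doc", ".docx")


def check_upload(filename, paragraphs):
    """Return (accepted, reasons); reasons explains every failed check."""
    reasons = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        reasons.append(f"unsupported file type: {extension or 'none'}")
    present = {paragraph.strip() for paragraph in paragraphs}
    for marker in REQUIRED_MARKERS:
        if marker not in present:
            reasons.append(f"missing section: {marker}")
    return (not reasons, reasons)
```

Collecting all reasons before returning, instead of stopping at the first failure, lets the feedback prompt in step (6) tell the user everything that needs fixing in one pass.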
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (5)

1. A method for the formatted import of an English reading Word document, characterized by comprising the following steps:
Step 1: the network server defines the document detection rule attributes, the basic document structure, the document detection and import method and the document object entity class, and defines a document template according to the requirements of the document question type;
Step 2: the client edits the Word document with reference to the document template and uploads the Word document to the network server;
Step 3: the network server obtains the Word document, serializes the Word document into an object, extracts each node data element in the Word document, checks and screens the elements against the predefined document detection rule attributes and basic document structure, and then stores the attributes; it records the start index and the end index of each node data element of the Word document, creates a Word document object entity class through the predefined document detection and import method and the document object instance class, and assigns the attributes one by one;
Step 4: substituting the start index and the end index of each node data element of the Word document into the instantiated Word document object;
Step 5: reading the structure attribute of the document object according to the start index and the end index of each node data element of the Word document, looping over the child nodes of the structure attribute, judging whether each child node is a paragraph object, and if so, casting the child node to a paragraph;
Step 6: determining the conversion method corresponding to each node data element of the Word document, and executing the corresponding conversion;
Step 7: extracting the start index and the end index of each node data element of the Word document, looping over them, and converting the node content into HTML page code; applying standard formatting and filtering methods written with regular expressions so that the data are correct and complete, and assigning the page code to the document object instance class;
Step 8: performing a data persistence operation on the document object instance class after assignment, storing it in a database, and feeding back a result to the client;
Step 9: the client extracts the structured data for rendering, realizing the functions of online reading, answering and automatic reading.
2. The method for the formatted import of an English reading Word document of claim 1, wherein: the document template is defined according to the requirements of English reading question types, and comprises the attributes, question stems, answers and analysis of the English reading question types.
3. The method for the formatted import of an English reading Word document of claim 1, wherein: the client renders the structured data through HTML, CSS and JS technologies, realizing the functions of online reading, answering and automatic reading.
4. The method for the formatted import of an English reading Word document of claim 1, wherein: the database is a relational database.
5. A system for the formatted import of English reading Word documents, characterized by comprising a client and a network server in communication connection with the client, wherein the client is provided with a document uploading module, and the network server is provided with a document identification module, a document reading module, a document identification and analysis module and a document storage module;
the document uploading module is used for editing and uploading Word documents;
the document identification module is used for calling a storage method to identify a document according to the Word document identification information;
the document reading module is used for reading Word document information according to the document identification information;
the document identification and analysis module is used for performing fragment decomposition, content analysis and data splicing on the Word document content and converting the Word document into formatted data;
and the document storage module is used for storing the formatted and converted data.
CN201911095769.5A 2019-11-11 2019-11-11 Method and system for importing word document format for English reading Pending CN112783957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911095769.5A CN112783957A (en) 2019-11-11 2019-11-11 Method and system for importing word document format for English reading

Publications (1)

Publication Number Publication Date
CN112783957A true CN112783957A (en) 2021-05-11

Family

ID=75749795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911095769.5A Pending CN112783957A (en) 2019-11-11 2019-11-11 Method and system for importing word document format for English reading

Country Status (1)

Country Link
CN (1) CN112783957A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361256A (en) * 2021-06-24 2021-09-07 上海真虹信息科技有限公司 Rapid Word document parsing method based on Aspose technology

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199871A (en) * 2014-08-19 2014-12-10 南京富士通南大软件技术有限公司 High-speed test question inputting method for intelligent teaching
CN104267953A (en) * 2014-09-27 2015-01-07 昆明钢铁集团有限责任公司 Control and method for importing Word test questions based on browser
CN104268226A (en) * 2014-09-27 2015-01-07 昆明钢铁集团有限责任公司 Control and method for copying and uploading Word picture based on browser
CN104298652A (en) * 2013-07-19 2015-01-21 深圳习习网络科技有限公司 Electronic test paper format conversion method and device
CN106202003A (en) * 2016-06-23 2016-12-07 广东小天才科技有限公司 Test question content processing method and system
CN106802937A (en) * 2016-12-30 2017-06-06 江苏中育优教科技发展有限公司 The conversion method and system of Word document
CN107203627A (en) * 2017-05-27 2017-09-26 山东浪潮通软信息科技有限公司 The method of mutual phase transformation between a kind of structural data and Word document
CN108595389A (en) * 2018-04-25 2018-09-28 华中科技大学 A method of Word document is converted into txt plain text documents
CN108614839A (en) * 2016-12-13 2018-10-02 上海宝信软件股份有限公司 WORD documents based on browser turn html page visualizing editing method and system
CN109002483A (en) * 2018-06-22 2018-12-14 平安科技(深圳)有限公司 Document management method, device, computer equipment and storage medium
CN109408783A (en) * 2018-09-06 2019-03-01 广州城市信息研究所有限公司 Electronic document online editing method and system
CN109947836A (en) * 2019-03-21 2019-06-28 江西风向标教育科技有限公司 English paper structural method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210511