Disclosure of Invention
In view of the above, the present invention provides a document processing method, apparatus, device and storage medium, so as to improve the processing efficiency of a document on the basis of satisfying the requirement of online question bank entry.
In a first aspect, an embodiment of the present application provides a document processing method, where the method includes:
obtaining, from a document to be processed, the pictures contained in the document to be processed and a document to be analyzed, and storing each obtained picture in a remote server, wherein the document to be processed is a Word document containing multiple media elements of different types, and the document to be analyzed is the document to be processed in xml format;
aiming at each acquired picture, establishing a corresponding replacement path of the picture in the document to be analyzed according to the position of the picture in the document to be processed and the storage address of the picture in the remote server;
analyzing the document to be analyzed, and determining an xml tag corresponding to each media element in the document to be analyzed, wherein the xml tags at least comprise: a formula tag used for representing a formula and a picture tag used for representing a picture;
and converting the document to be analyzed into a display document in a target format based on each determined xml tag and the replacement path corresponding to each picture, wherein the target format is a document format capable of being displayed to a user in a webpage.
Optionally, the obtaining, from the document to be processed, the picture and the document to be analyzed included in the document to be processed includes:
decompressing the document to be processed by using the decompression tool ZipArchive to obtain the pictures contained in the document to be processed and the document to be analyzed.
Optionally, the storing each acquired picture in a remote server includes:
determining the insertion sequence of the picture in the document to be processed according to the position of the picture in the document to be processed;
and taking the insertion sequence of the picture in the document to be processed as the storage file name of the picture, and storing the picture in the remote server under that storage file name.
Optionally, the establishing a corresponding replacement path of the picture in the document to be analyzed according to the position of the picture in the document to be processed and the storage address of the picture in the remote server includes:
searching a storage file name matched with the insertion sequence from the storage file names of the pictures stored in the remote server by using the insertion sequence of the picture in the document to be processed as a target file name;
extracting a storage address where the picture of the target file name is located from the remote server as a target storage address;
and taking the extracted target storage address as a corresponding replacement path of the picture in the document to be analyzed, wherein the replacement path is used for loading the picture in a webpage in a mode of accessing a remote picture.
Optionally, when the determined xml tag is the picture tag, the converting the document to be parsed into a presentation document in a target format based on each determined xml tag and the replacement path corresponding to each picture includes:
acquiring the replacement path corresponding to a target picture aiming at each picture tag, wherein the target picture is a picture marked by the picture tag in the document to be analyzed;
searching a replacement picture of the target picture from the remote server by using a replacement path corresponding to the target picture in a remote access mode, wherein the replacement picture is the target picture which can be loaded in a webpage;
and displaying the searched replacement picture to a user at a first target vacancy in the display document, wherein the first target vacancy is a position in the display document where the target picture needs to be inserted.
Optionally, when the determined xml tag is the formula tag, the converting the document to be parsed into a presentation document in a target format based on each determined xml tag and the replacement path corresponding to each picture includes:
for each formula tag, obtaining the formula data line marked by the formula tag from the document to be analyzed, wherein the formula data line is the sequence of numbers, letters and operation symbols forming a target formula, and the target formula is the formula marked by the formula tag in the document to be analyzed;
marking each character included in the formula data line by using region interval marks, and determining a word segmentation marking result of the formula data line, wherein a region interval mark is a sub-tag of the formula tag used for identifying a type of character;
adjusting the display position of each character in the formula data line by using a cascading style sheet according to the format of the target formula, to obtain a formula to be imported for display in a webpage;
and displaying the formula to be imported to a user at a second target vacancy in the display document, wherein the second target vacancy is the position in the display document where the target formula needs to be inserted.
Optionally, the determining the xml tag corresponding to each media element in the document to be parsed further includes:
judging whether the document to be analyzed contains a target media element, wherein the target media element is as follows: a formula inserted in a picture format;
and if the document to be analyzed contains the target media element, taking the picture tag as an xml tag corresponding to the target media element.
In a second aspect, an embodiment of the present application provides a document processing apparatus, including:
a data acquisition module, configured to obtain, from a document to be processed, the pictures contained in the document to be processed and a document to be analyzed, and to store each obtained picture in a remote server, wherein the document to be processed is a Word document containing multiple media elements of different types, and the document to be analyzed is the document to be processed in xml format;
the resource replacing module is used for establishing a corresponding replacing path of each acquired picture in the document to be analyzed according to the position of the picture in the document to be processed and the storage address of the picture in the remote server;
a document analysis module, configured to analyze the document to be analyzed, and determine an xml tag corresponding to each media element in the document to be analyzed, where the xml tags at least include: a formula tag used for representing a formula and a picture tag used for representing a picture;
and the document conversion module is used for converting the document to be analyzed into a display document in a target format based on each determined xml tag and the replacement path corresponding to each picture, wherein the target format is a document format which can be displayed to a user in a webpage.
Optionally, the data obtaining module is further configured to:
decompressing the document to be processed by using the decompression tool ZipArchive to obtain the pictures contained in the document to be processed and the document to be analyzed.
Optionally, the data obtaining module is further configured to:
determining the insertion sequence of the picture in the document to be processed according to the position of the picture in the document to be processed;
and taking the insertion sequence of the picture in the document to be processed as the storage file name of the picture, and storing the picture in the remote server under that storage file name.
Optionally, the resource replacing module is further configured to:
searching a storage file name matched with the insertion sequence from the storage file names of the pictures stored in the remote server by using the insertion sequence of the picture in the document to be processed as a target file name;
extracting a storage address where the picture of the target file name is located from the remote server as a target storage address;
and taking the extracted target storage address as a corresponding replacement path of the picture in the document to be analyzed, wherein the replacement path is used for loading the picture in a webpage in a mode of accessing a remote picture.
Optionally, when the determined xml tag is the picture tag, the document conversion module is further configured to:
acquiring the replacement path corresponding to a target picture aiming at each picture tag, wherein the target picture is a picture marked by the picture tag in the document to be analyzed;
searching a replacement picture of the target picture from the remote server by using a replacement path corresponding to the target picture in a remote access mode, wherein the replacement picture is the target picture which can be loaded in a webpage;
and displaying the searched replacement picture to a user at a first target vacancy in the display document, wherein the first target vacancy is a position in the display document where the target picture needs to be inserted.
Optionally, when the determined xml tag is the formula tag, the document conversion module is further configured to:
for each formula tag, obtaining the formula data line marked by the formula tag from the document to be analyzed, wherein the formula data line is the sequence of numbers, letters and operation symbols forming a target formula, and the target formula is the formula marked by the formula tag in the document to be analyzed;
marking each character included in the formula data line by using region interval marks, and determining a word segmentation marking result of the formula data line, wherein a region interval mark is a sub-tag of the formula tag used for identifying a type of character;
adjusting the display position of each character in the formula data line by using a cascading style sheet according to the format of the target formula, to obtain a formula to be imported for display in a webpage;
and displaying the formula to be imported to a user at a second target vacancy in the display document, wherein the second target vacancy is the position in the display document where the target formula needs to be inserted.
Optionally, the document parsing module is further configured to:
judging whether the document to be analyzed contains a target media element, wherein the target media element is as follows: a formula inserted in a picture format;
and if the document to be analyzed contains the target media element, taking the picture tag as an xml tag corresponding to the target media element.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the document processing method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the document processing method.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
A document to be processed in Word format, containing multiple media elements of different types, is converted into a document to be analyzed in xml format, and the pictures contained in the document to be processed are stored in a remote server. Because the document to be analyzed is in xml format, media elements such as pictures and formulas in it cannot be displayed directly in a web page. On the one hand, for media elements of the picture type, for each picture contained in the document to be processed, a corresponding replacement path of the picture in the document to be analyzed is established according to the position of the picture in the document to be processed and the storage address of the picture in the remote server; the media element can then be displayed normally on the web page by accessing the remote picture through its replacement path.
On the other hand, after the document to be analyzed is parsed, the determined xml tag of each media element can be used, when the document to be analyzed is converted into a display document that can be displayed to a user in a web page, to identify the media element corresponding to a formula tag as a formula, which solves the technical problem that the COM component in the prior art cannot identify formulas in a question document. The document processing method provided by the present application can therefore process the pictures and the formulas in a question document separately, converting each into a format that can be displayed to users in a web page, so that document processing efficiency is effectively improved while the requirement of online question bank entry is satisfied.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the present invention provide a document processing method, apparatus, device and storage medium, which are described below through embodiments.
Example one
FIG. 1 is a flowchart illustrating a document processing method provided by an embodiment of the present application, wherein the method includes steps S101-S104; specifically, the method comprises the following steps:
S101, obtaining, from a document to be processed, the pictures contained in the document to be processed and a document to be analyzed, and storing each obtained picture in a remote server.
Specifically, the document to be processed is a word document containing a plurality of media elements of different types, and the document to be analyzed is the document to be processed in an xml format. The media elements can be pictures, formulas, words, tables and other elements used for showing document information to users.
By way of example, a question document created by a user for the subject of mathematics may be used as the document to be processed, where the document to be processed includes at least the characters, tables, pictures and mathematical formulas appearing in the mathematics questions. The document to be processed may be a Word document with the docx suffix; its format is converted from docx to xml to obtain the document to be analyzed corresponding to the document to be processed.
In this embodiment, as an optional embodiment, the obtaining, from a document to be processed, a picture and a document to be parsed included in the document to be processed includes:
decompressing the document to be processed by using the decompression tool ZipArchive to obtain the pictures contained in the document to be processed and the document to be analyzed.
Illustratively, still taking the document to be processed in the above example, the document to be processed is decompressed by using the decompression tool ZipArchive to obtain the document to be parsed in xml format and a media picture resource folder, where the media picture resource folder consists of the pictures contained in the document to be processed.
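The decompression step can be sketched as follows. This is a minimal illustration in Python using the standard zipfile module, not the ZipArchive tool named in this embodiment; the entry names word/document.xml and word/media/ reflect the usual layout of a docx package and are assumptions here, and the helper names split_docx and make_demo_docx are hypothetical.

```python
import io
import zipfile


def split_docx(docx_bytes: bytes) -> tuple[bytes, dict[str, bytes]]:
    """Return (document xml, pictures) read from a docx package.

    Assumes the usual docx layout: the body in word/document.xml and
    the pictures under word/media/.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        document_xml = package.read("word/document.xml")
        pictures = {
            name: package.read(name)
            for name in package.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        }
    return document_xml, pictures


def make_demo_docx() -> bytes:
    """Build a tiny stand-in package so the sketch runs without a real file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<w:document/>")
        package.writestr("word/media/image1.png", b"demo-picture-bytes")
    return buffer.getvalue()


document_xml, pictures = split_docx(make_demo_docx())
```

Here document_xml plays the role of the document to be parsed and pictures plays the role of the media picture resource folder.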
Specifically, in this embodiment of the present application, as an optional embodiment, the storing each acquired picture in a remote server includes:
determining the insertion sequence of the picture in the document to be processed according to the position of the picture in the document to be processed;
and taking the insertion sequence of the picture in the document to be processed as the storage file name of the picture, and storing the picture in the remote server under that storage file name.
Illustratively, suppose a document to be processed contains 10 mathematics questions, where picture a appears in the first question, picture b in the fifth question, and picture c in the eighth question. The insertion order of picture a in the document to be processed is then determined to be picture 1, that of picture b to be picture 2, and that of picture c to be picture 3. After pictures a, b and c are obtained from the document to be processed, they are stored in the remote server with picture 1 as the storage file name of picture a, picture 2 as that of picture b, and picture 3 as that of picture c.
It should be noted that, when processing a plurality of documents to be processed, the remote server establishes a separate picture folder for each document to be processed according to the document name of each document to be processed, so as to store and distinguish pictures contained in different documents to be processed.
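The naming rule above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the pictures are already listed in the order in which they appear in the document, the folder is named after the document without its suffix, and the original picture suffix is kept. The function name storage_names and the sample file names are hypothetical.

```python
from pathlib import PurePosixPath


def storage_names(document_name: str, pictures_in_order: list[str]) -> dict[str, str]:
    """Map each picture to '<document folder>/<insertion order><suffix>'."""
    folder = PurePosixPath(PurePosixPath(document_name).stem)
    return {
        original: str(folder / f"{index}{PurePosixPath(original).suffix}")
        for index, original in enumerate(pictures_in_order, start=1)
    }


# Pictures a, b and c in the order in which they appear in the document.
names = storage_names("algebra quiz.docx", ["a.png", "b.jpg", "c.png"])
```

Keeping one folder per document means that the first picture of two different documents never collides on the server.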
S102, aiming at each acquired picture, establishing a corresponding replacement path of the picture in the document to be analyzed according to the position of the picture in the document to be processed and the storage address of the picture in the remote server.
Specifically, in this embodiment, as an optional embodiment, the establishing, according to the position of the picture in the document to be processed and the storage address of the picture in the remote server, a corresponding replacement path of the picture in the document to be analyzed includes:
searching a storage file name matched with the insertion sequence from the storage file names of the pictures stored in the remote server by using the insertion sequence of the picture in the document to be processed as a target file name;
extracting a storage address where the picture of the target file name is located from the remote server as a target storage address;
and taking the extracted target storage address as a corresponding replacement path of the picture in the document to be analyzed, wherein the replacement path is used for loading the picture in a webpage in a mode of accessing a remote picture.
Illustratively, taking picture a in the above example, since the insertion order of picture a in the document to be processed is picture 1, the picture whose storage file name is picture 1 is searched for in the picture folder of the document to be processed stored in the remote server. Suppose the storage address of the found picture 1 in the remote server is www/media/picture 1 (picture 1 stored under the media folder in the remote server). This storage address is taken as the replacement path of picture a at its position in the first mathematics question in the document to be analyzed, so that picture a in the first mathematics question can be shown to the user in the final page of the online question bank web page by accessing the remote picture, where the address for accessing the remote picture is http://domain name/media/picture 1.
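The lookup of a replacement path by insertion order can be sketched as follows. This is a minimal Python illustration that assumes the stored file names are the insertion order plus a suffix; the base address http://example.com/media, the sample names and the function name replacement_paths are hypothetical.

```python
from urllib.parse import quote


def replacement_paths(base_url: str, stored_names: list[str]) -> dict[int, str]:
    """Map insertion order -> remote address of the picture stored under that order."""
    paths: dict[int, str] = {}
    for name in stored_names:
        # File name without folder and without suffix, e.g. "quiz/1.png" -> "1".
        stem = name.rsplit("/", 1)[-1].split(".", 1)[0]
        if stem.isdigit():
            paths[int(stem)] = f"{base_url.rstrip('/')}/{quote(name)}"
    return paths


paths = replacement_paths(
    "http://example.com/media",
    ["quiz/1.png", "quiz/2.jpg", "quiz/notes.txt"],
)
```

Entries whose names are not an insertion order, such as quiz/notes.txt above, are ignored.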
S103, analyzing the document to be analyzed, and determining the xml tags corresponding to the media elements in the document to be analyzed.
Specifically, as an optional embodiment, the document to be parsed in xml format may be read by using the DOMDocument class of the PHP scripting language, where DOMDocument provides the initial (top-most) access entry of the document to be parsed. Each media element in the document to be parsed is read as a node, and the xml tag corresponding to the media element is then parsed from the attribute information of each media element read.
Specifically, the xml tag at least comprises: formula tags for characterizing formulas and picture tags for characterizing pictures.
Illustratively, a document to be analyzed containing media elements of different types, such as text, pictures, tables and formulas, is analyzed. If the xml tag corresponding to media element x is the <oMath> tag, it can be determined that media element x is a formula; if the xml tag corresponding to media element y is the <imagedata> tag, it can be determined that media element y is a picture; if the xml tag corresponding to media element z is the <tbl> tag, it can be determined that media element z is a table. Here the <oMath> tag corresponds to the formula tag, and the <imagedata> tag corresponds to the picture tag.
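The tag-based classification can be sketched as follows. This is a minimal Python illustration using xml.etree.ElementTree rather than the PHP DOM reader mentioned in this embodiment; the namespace addresses in the sample are placeholders, and the names TAG_KINDS, local_name and classify_media are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the example above; the mapping itself is illustrative.
TAG_KINDS = {"oMath": "formula", "imagedata": "picture", "tbl": "table"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return tag.rsplit("}", 1)[-1]


def classify_media(xml_text: str) -> list[str]:
    """Return the kind of each recognised media element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        TAG_KINDS[local_name(element.tag)]
        for element in root.iter()
        if local_name(element.tag) in TAG_KINDS
    ]


# Placeholder namespaces; a real document declares its own.
SAMPLE = """<doc xmlns:m="urn:demo-math" xmlns:v="urn:demo-vml" xmlns:w="urn:demo-word">
  <w:p><m:oMath/></w:p>
  <w:p><v:imagedata/></w:p>
  <w:tbl/>
</doc>"""

kinds = classify_media(SAMPLE)
```

A formula inserted as a picture carries a picture tag in the xml, so under this sketch it is classified as a picture without any special handling.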
It should be noted that, considering that the existence form of the formula in the document to be processed created by the user is not unique, for example: the user can insert an editable version of the formula through the formula editor, and can also insert a formula in a picture format. Therefore, in this embodiment of the present application, as an optional embodiment, the determining an xml tag corresponding to each media element in the document to be parsed further includes:
judging whether the document to be analyzed contains a target media element, wherein the target media element is as follows: a formula inserted in a picture format;
and if the document to be analyzed contains the target media element, taking the picture tag as an xml tag corresponding to the target media element.
Illustratively, if it is determined that a formula inserted in a picture format exists in the document to be parsed, the formula in the picture format is taken as a target media element, and the target media element is processed in a picture processing manner.
S104, converting the document to be analyzed into a display document in a target format based on each determined xml tag and the replacement path corresponding to each picture.
Specifically, the target format refers to a document format that can be presented to a user in a web page.
It should be noted that the document to be parsed is a document in xml format, and media elements such as pictures and formulas in it cannot be displayed directly in a web page. The type of each media element contained in the document to be parsed can therefore be identified from the determined xml tags. A media element of the picture type can be displayed normally on the web page by accessing the remote picture through the replacement path corresponding to that media element. For a media element of the formula type, the MathJax js (front-end integration) tool can be used to convert the <oMath> tag (i.e., the formula tag) into div (division) tags and span (inline) tags, which are common in web documents in html format. The div tag divides a data block into independent element parts, and the span tag is an inline tag of the hypertext markup language that is mostly used to group inline elements in a document. After the tag conversion is completed, the display position of each number, letter or operation symbol in the formula can be adjusted in combination with a style sheet, so that a formula with the correct style is displayed to the user in the web page.
In a possible implementation, when the xml tag is determined to be the picture tag, fig. 2 shows a flowchart of a method for processing a picture in a document to be parsed, which is provided by an embodiment of the present application, and as shown in fig. 2, when step S104 is executed, the method further includes S201-S203; specifically, the method comprises the following steps:
S201, for each picture tag, obtaining the replacement path corresponding to the target picture.
Specifically, the target picture is a picture marked by the picture tag in the document to be parsed.
Taking the example of step S101 as an example, the document to be processed collectively includes 10 mathematical topics, where a picture a appears in the first mathematical topic, a picture b appears in the fifth mathematical topic, and a picture c appears in the eighth mathematical topic, and since the document to be analyzed is only the document to be processed in xml format, the positions of the picture a, the picture b, and the picture c in the document to be analyzed are the same as the positions in the document to be processed, after the document to be analyzed is analyzed, the xml tags corresponding to the picture a, the picture b, and the picture c are all picture tags, and the storage address of the picture a is obtained from the remote server as a replacement path of the picture a, and is denoted as Xa; acquiring a storage address of the picture b as a replacement path of the picture b, and recording the storage address as Xb; and acquiring the storage address of the picture c as a replacement path of the picture c, and recording the storage address as Xc.
S202, searching for the replacement picture of the target picture from the remote server in a remote access mode by using the replacement path corresponding to the target picture.
Specifically, the replacement picture is the target picture that can be loaded in a web page.
Taking picture a as an example, picture a is the first picture inserted in the document to be parsed, the storage address of picture a in the remote server (i.e., the replacement path of picture a) is denoted as Xa, and the storage file name of picture a in the remote server is picture 1. Using the replacement path of picture a, the replacement picture of picture a, namely picture 1, can be found in the remote server by remote access.
S203, displaying the searched replacement picture to the user at the first target vacancy in the display document.
Specifically, the first target slot is a position in the presentation document where the target picture needs to be inserted.
Illustratively, still taking picture a in the above example, because picture a is located in the first mathematics question of the document to be parsed, the first target vacancy of picture a in the display document is also in the first mathematics question. When the document to be parsed is converted into the display document, the replacement picture of picture a, namely picture 1, can therefore be found in the remote server by remote access using the replacement path of picture a, inserted into the first mathematics question of the display document, and displayed to the user in the form of a web page, so that the user can conveniently answer online in the web page.
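Filling a first target vacancy can be sketched as follows. This is a minimal Python illustration that assumes a question has already been split into text parts and picture parts, and that a mapping from insertion order to replacement path is available; the function name render_question, the sample address and the surrounding div are hypothetical.

```python
from html import escape


def render_question(parts: list[tuple[str, object]], paths: dict[int, str]) -> str:
    """Render text parts as escaped text and picture parts as remote <img> tags."""
    rendered = []
    for kind, value in parts:
        if kind == "picture":
            # The replacement path becomes the address the browser loads.
            url = escape(paths[value], quote=True)
            rendered.append(f'<img src="{url}" alt="picture {value}">')
        else:
            rendered.append(escape(str(value)))
    return "<div>" + "".join(rendered) + "</div>"


question_html = render_question(
    [("text", "Find x < 3."), ("picture", 1)],
    {1: "http://example.com/media/quiz/1.png"},
)
```

The browser then loads the picture from the remote server when the page is opened, which is the remote access described in S202.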
In a possible implementation, when the determined xml tag is the formula tag, fig. 3 shows a flowchart of a method for processing a formula in a document to be parsed, which is provided by an embodiment of the present application, and as shown in fig. 3, when step S104 is executed, the method further includes S301-S304; specifically, the method comprises the following steps:
S301, for each formula tag, obtaining the formula data line marked by the formula tag from the document to be analyzed.
Specifically, the formula data line is the sequence of numbers, letters and operation symbols constituting a target formula, and the target formula is the formula marked by the formula tag in the document to be parsed. Illustratively, the <oMath> tag is used as the formula tag in the document to be parsed, and the target formula marked by the <oMath> tag is $y = \frac{1}{2} \times m + b$; the formula data line corresponding to the target formula is then obtained as: y, =, $\frac{1}{2}$, ×, m, +, b.
S302, marking each character included in the formula data line by using region interval marks, and determining a word segmentation marking result of the formula data line, wherein a region interval mark is a sub-tag of the formula tag used for identifying a type of character.
Specifically, as an optional embodiment, an <oMath> tag may be used as the formula tag to mark media elements belonging to a formula in the document to be parsed, where the <oMath> tag includes multiple sub-tags for identifying different types of characters in the formula, for example: an <mi> sub-tag for marking letters, an <mn> sub-tag for marking numbers, an <mo> sub-tag for marking operation symbols, and so on. By using the sub-tags in the formula tag as the region interval marks, each character included in the formula data line can be marked, and the word segmentation marking result of the formula data line is obtained.
Illustratively, taking the target formula $y = \frac{1}{2} \times m + b$ in the above example, marking each character in the formula data line with the sub-tags of the <oMath> tag yields: <mi>y</mi> <mo>=</mo> <mn>1</mn> <mo>-</mo> <mn>2</mn> <mo>×</mo> <mi>m</mi> <mo>+</mo> <mi>b</mi>, where the </mi> sub-tag indicates the end of the character range marked by the <mi> sub-tag, the </mo> sub-tag indicates the end of the character range marked by the <mo> sub-tag, and the </mn> sub-tag indicates the end of the character range marked by the <mn> sub-tag. The formula data line is thus divided into independent characters, and the word segmentation marking result of the formula data line is obtained as: y, =, 1, -, 2, ×, m, +, b.
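The word segmentation marking can be sketched as follows. This is a minimal Python illustration that assumes the marked formula data line is available as well-formed xml with the sub-tags as direct children of the formula tag; the names KINDS, segment and LINE are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sub-tag names from the example above and the character type each one marks.
KINDS = {"mi": "letter", "mn": "number", "mo": "operator"}


def segment(formula_xml: str) -> list[tuple[str, str]]:
    """Return (character type, character) pairs in the order they appear."""
    root = ET.fromstring(formula_xml)
    return [
        (KINDS[child.tag], child.text or "")
        for child in root
        if child.tag in KINDS
    ]


LINE = (
    "<oMath><mi>y</mi><mo>=</mo><mn>1</mn><mo>-</mo><mn>2</mn>"
    "<mo>×</mo><mi>m</mi><mo>+</mo><mi>b</mi></oMath>"
)

tokens = segment(LINE)
characters = [character for _, character in tokens]
```

Keeping the character type next to each character lets the later styling step treat letters, numbers and operation symbols differently.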
S303, adjusting the display position of each character in the formula data line by using a cascading style sheet according to the format of the target formula, to obtain a formula to be imported for display in a webpage.
Specifically, taking the sub-tags included in the <oMath> tag (i.e., the sub-tags in the formula tag) as the region interval marks, once the word segmentation marking result of the formula data line has been obtained, the display position of each independent character of the word segmentation marking result in the web page can be set precisely in combination with the css style sheet (also called a cascading style sheet).
Illustratively, taking the character "y" in the above example, the independent letter y can be recognized from the <mi> sub-tag used for marking letters. Each character marked by a sub-tag is converted into a well-defined tag carrying a specific class attribute, so that the character y can be displayed in the webpage through the class attribute in combination with the CSS style sheet. An example of the tag conversion is as follows:
<mi>y</mi> is converted to:
<mjx-mi class="mjx-n">
<mjx-c class="mjx-c79">
</mjx-c>
</mjx-mi>
where c79 denotes the character y (79 is the hexadecimal code point of y);
thus, the character y can be displayed in the webpage by means of the class attribute of the tag;
and for the characters "1", "-" and "2" in the above example, they form a fraction in the target formula, with 1 as the numerator above a horizontal line and 2 as the denominator below it. Through the hierarchical relationship between the tags, in combination with the CSS style sheet, the recognized characters "1", "-" and "2" can therefore be displayed in the webpage as that fraction.
A specific processing example is as follows:
<mjx-table space="4">
<mjx-row size="s"></mjx-row> (numerator 1)
<mjx-line></mjx-line> (horizontal line)
<mjx-row size="s"></mjx-row> (denominator 2)
</mjx-table>
Thus, the width occupied by each character is defined by the space attribute, and the size at which each character is displayed in the webpage is defined by the size attribute. The character "1" as the numerator and the character "2" as the denominator are both given size s, slightly smaller than the other characters. In this way the characters "1", "-" and "2" in the formula data line are adjusted according to their format in the target formula, yielding the formula to be imported that is finally displayed in the webpage.
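The tag conversion and the fraction layout described above can be sketched in Python as follows. This is an illustration, not the implementation of the embodiment. It assumes that the class suffix is the character's Unicode code point in hexadecimal (consistent with c79 for y) and that the glyph elements are nested inside the rows; the helper names glyph, to_mjx and fraction_markup are hypothetical, and the style sheet rules that position and draw the glyphs are not shown.

```python
def glyph(char):
    """Return the <mjx-c> element for one character, e.g. y -> mjx-c79."""
    # Assumption: the class suffix is the hexadecimal Unicode code point.
    return f'<mjx-c class="mjx-c{ord(char):X}"></mjx-c>'


def to_mjx(sub_tag, char):
    """Convert one marked character, e.g. ("mi", "y"), into class-based markup."""
    return f'<mjx-{sub_tag} class="mjx-n">{glyph(char)}</mjx-{sub_tag}>'


def fraction_markup(numerator, denominator, space=4, size="s"):
    """Build the table layout for a fraction: numerator row, line, denominator row."""
    top = "".join(glyph(c) for c in numerator)
    bottom = "".join(glyph(c) for c in denominator)
    return (
        f'<mjx-table space="{space}">'
        f'<mjx-row size="{size}">{top}</mjx-row>'
        "<mjx-line></mjx-line>"
        f'<mjx-row size="{size}">{bottom}</mjx-row>'
        "</mjx-table>"
    )


print(to_mjx("mi", "y"))
print(fraction_markup("1", "2"))
```

Keeping the numerator and denominator in separate rows lets the style sheet shrink them through the size attribute without touching the other characters of the formula.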
S304, displaying the formula to be imported to the user at a second target vacancy in the display document.
Specifically, the second target vacancy is the position in the display document where the target formula needs to be inserted.
Illustratively, if the target formula is located in the second mathematics question of the document to be analyzed, the second target vacancy is in the second mathematics question of the display document. The formula to be imported, obtained by adjusting the target formula in step S303, is imported into the second mathematics question of the display document and displayed to the user, so that the user can conveniently answer the question online in the webpage.
EXAMPLE II
Fig. 4 is a schematic structural diagram of a document processing apparatus provided in an embodiment of the present application, where the apparatus includes:
a data obtaining module 401, configured to obtain, from a to-be-processed document, a picture and a to-be-analyzed document that are included in the to-be-processed document, and store each obtained picture in a remote server, where the to-be-processed document is a word document that includes multiple different types of media elements, and the to-be-analyzed document is the to-be-processed document in an xml format;
a resource replacement module 402, configured to, for each obtained picture, establish a corresponding replacement path of the picture in the document to be analyzed according to a location of the picture in the document to be processed and a storage address of the picture in the remote server;
a document parsing module 403, configured to parse the document to be analyzed and determine the xml tag corresponding to each media element in the document to be analyzed, where the xml tags at least include: a formula tag used for representing a formula and a picture tag used for representing a picture;
a document conversion module 404, configured to convert, based on each determined xml tag and the replacement path corresponding to each picture, the document to be analyzed into a display document in a target format, where the target format is a document format that can be displayed to a user in a webpage.
Optionally, the data obtaining module 401 is further configured to:
decompressing the document to be processed by using the decompression tool ZipArchive to obtain the pictures contained in the document to be processed and the document to be analyzed.
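The decompression step can be sketched as follows. Python's standard zipfile module is used here in place of the decompression tool named above, purely for illustration. The sketch relies on the standard layout of a .docx package, in which the main document part is word/document.xml and embedded pictures are stored under word/media/; the function names and the tiny in-memory sample archive are hypothetical.

```python
import io
import zipfile


def extract_docx_parts(docx_bytes):
    """Return (document xml text, {archive path: picture bytes}) from a .docx."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        document_xml = archive.read("word/document.xml").decode("utf-8")
        pictures = {
            name: archive.read(name)
            for name in archive.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        }
    return document_xml, pictures


def make_sample_docx():
    """Build a minimal stand-in archive so the sketch runs without a real file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.png", b"\x89PNG")
    return buffer.getvalue()


xml_text, pictures = extract_docx_parts(make_sample_docx())
print(xml_text, sorted(pictures))
```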
Optionally, the data obtaining module 401 is further configured to:
determining the insertion sequence of the picture in the document to be processed according to the position of the picture in the document to be processed;
and taking the insertion sequence of the picture in the document to be processed as the storage file name of the picture, and storing the picture in the remote server under that storage file name.
Optionally, the resource replacing module 402 is further configured to:
searching a storage file name matched with the insertion sequence from the storage file names of the pictures stored in the remote server by using the insertion sequence of the picture in the document to be processed as a target file name;
extracting a storage address where the picture of the target file name is located from the remote server as a target storage address;
and taking the extracted target storage address as a corresponding replacement path of the picture in the document to be analyzed, wherein the replacement path is used for loading the picture in a webpage in a mode of accessing a remote picture.
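The naming and lookup scheme above can be sketched in Python. This is a simplified illustration under stated assumptions: a dictionary stands in for the remote server, the address prefix BASE_URL is hypothetical, and the function names are not taken from the embodiment.

```python
BASE_URL = "https://files.example.com/pictures/"  # hypothetical storage address prefix


def store_pictures(pictures_in_order, remote):
    """Store each picture under its 1-based insertion sequence as the file name."""
    for sequence, data in enumerate(pictures_in_order, start=1):
        remote[str(sequence)] = data


def replacement_path(sequence, remote):
    """Find the stored picture named after `sequence` and return its address."""
    target_name = str(sequence)
    if target_name not in remote:
        raise KeyError(f"no stored picture named {target_name}")
    return BASE_URL + target_name


remote_store = {}  # stands in for the remote server
store_pictures([b"first-picture", b"second-picture"], remote_store)
print(replacement_path(2, remote_store))
```

Because the file name is the insertion sequence itself, the replacement path of a picture can be derived from its position in the document without keeping a separate mapping table.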
Optionally, when the determined xml tag is the picture tag, the document conversion module 404 is further configured to:
for each picture tag, acquiring the replacement path corresponding to a target picture, wherein the target picture is the picture marked by that picture tag in the document to be analyzed;
retrieving a replacement picture of the target picture from the remote server by remote access, using the replacement path corresponding to the target picture, wherein the replacement picture is the target picture in a form that can be loaded in a webpage;
and displaying the retrieved replacement picture to the user at a first target vacancy in the display document, wherein the first target vacancy is the position in the display document where the target picture needs to be inserted.
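How a replacement picture might be placed at its first target vacancy can be sketched as below. Everything specific in this sketch is an assumption made for illustration: the display document is treated as HTML, each vacancy is represented by a hypothetical <picture-slot/> placeholder, and the replacement paths are supplied in insertion order.

```python
import re
from html import escape


def insert_pictures(display_html, replacement_paths):
    """Replace the n-th placeholder with an <img> that loads the n-th path."""
    remaining = list(replacement_paths)

    def to_img(match):
        if not remaining:
            raise ValueError("more picture slots than replacement paths")
        # Escape the path so it is safe inside the src attribute.
        return f'<img src="{escape(remaining.pop(0), quote=True)}">'

    return re.sub(r"<picture-slot\s*/>", to_img, display_html)


page = "<p>Question 1</p><picture-slot/><p>Question 2</p><picture-slot />"
result = insert_pictures(
    page,
    [
        "https://files.example.com/pictures/1",
        "https://files.example.com/pictures/2",
    ],
)
print(result)
```

The browser then fetches each picture from the remote address when the page is rendered, which corresponds to loading the picture by remote access.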
Optionally, when the determined xml tag is the formula tag, the document conversion module 404 is further configured to:
for each formula tag, obtaining the formula data line marked by the formula tag from the document to be analyzed, wherein the formula data line is the arrangement of numbers, letters and operation symbols that make up a target formula, and the target formula is the formula marked by that formula tag in the document to be analyzed;
marking each character included in the formula data line by using region interval marks, and determining the word segmentation marking result of the formula data line, wherein the region interval marks are the sub-tags in the formula tag that identify different types of characters;
adjusting the display position of each character in the formula data line by using a cascading style sheet according to the format of the target formula, to obtain a formula to be imported for display in a webpage;
and displaying the formula to be imported to the user at a second target vacancy in the display document, wherein the second target vacancy is the position in the display document where the target formula needs to be inserted.
Optionally, the document parsing module 403 is further configured to:
judging whether the document to be analyzed contains a target media element, wherein the target media element is as follows: a formula inserted in a picture format;
and if the document to be analyzed contains the target media element, taking the picture tag as an xml tag corresponding to the target media element.
EXAMPLE III
As shown in fig. 5, an embodiment of the present application provides a computer device 500 for executing the document processing method in the present application, the device includes a memory 501, a processor 502 and a computer program stored on the memory 501 and executable on the processor 502, wherein the processor 502 implements the steps of the document processing method when executing the computer program.
Specifically, the memory 501 and the processor 502 may be general-purpose memory and processor, and are not limited to specific examples, and the document processing method can be executed when the processor 502 executes a computer program stored in the memory 501.
Corresponding to the document processing method in the present application, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, performs the steps of the document processing method described above.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk, and when the computer program stored on the storage medium is run, the above-described document processing method can be performed.
In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The above-described system embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and there may be other divisions in actual implementation, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of systems or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.